Split Tornado logs by component
DIRAC offers the ability to write logs for each component. Logs can be found in <DIRAC FOLDER>/startup/<DIRAC COMPONENT>/log/current.
In the case of Tornado, logs come from many components and can be hard to sort.
Fluent-bit makes it possible to collect logs from files, rearrange their content, and send them elsewhere, for example to an ELK instance or simply to other files. With ELK, information can then be monitored and displayed through Kibana and Grafana, using filters to sort the logs; with plain files, you simply get separate log files, one per component.
The idea is to handle logs independently from DIRAC. It is also possible to collect server metrics such as CPU, memory and disk usage, which makes it possible to correlate logs with server usage.
DIRAC Configuration
First of all, you should configure a JSON Log Backend in your Resources and Operations sections, like:
Resources
{
  LogBackends
  {
    StdoutJson
    {
      Plugin = StdoutJson
    }
  }
}

Operations
{
  Defaults
  {
    Logging
    {
      DefaultBackends = StdoutJson
    }
  }
}
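With this backend, each component writes one JSON document per log line to its log/current file. The exact fields depend on the DIRAC version; judging from the fields used in the Fluent-bit configuration below, a Tornado log line looks roughly like this (values are purely illustrative):

{"asctime": "2024-01-01 12:00:00,123", "levelname": "INFO", "componentname": "Tornado", "customname": "", "tornadoComponent": "DataManagement/TornadoFileCatalog", "message": "Request received", "varmessage": ""}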
Fluent-bit Installation
On each DIRAC server, install Fluent-bit (https://docs.fluentbit.io):
curl https://raw.githubusercontent.com/fluent/fluent-bit/master/install.sh | sh
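The package installs the binary under /opt/fluent-bit/bin (the path used later in this guide); a quick way to check the installation is to print the version:

/opt/fluent-bit/bin/fluent-bit --version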
Fluent-bit Configuration
Edit /etc/fluent-bit/fluent-bit.conf and add:
@INCLUDE dirac-json.conf
Create the following files in /etc/fluent-bit.
dirac-json.conf (add all the components you need and choose the outputs you want):
[SERVICE]
    flush        1
    log_level    info
    parsers_file dirac-parsers.conf

[INPUT]
    name         cpu
    tag          metric
    Interval_Sec 10

[INPUT]
    name         mem
    tag          metric
    Interval_Sec 10

[INPUT]
    name         disk
    tag          metric
    Interval_Sec 10

[INPUT]
    name          tail
    parser        dirac_parser_json
    path          <DIRAC FOLDER>/startup/<DIRAC_COMPONENT>/log/current
    Tag           log.<DIRAC_COMPONENT>.log
    Mem_Buf_Limit 50MB

[INPUT]
    name          tail
    parser        dirac_parser_json
    path          <DIRAC FOLDER>/startup/<ANOTHER_DIRAC_COMPONENT>/log/current
    Tag           log.<ANOTHER_DIRAC_COMPONENT>.log
    Mem_Buf_Limit 50MB

[FILTER]
    Name   modify
    Match  log.*
    Rename log message
    Add    levelname DEV

[FILTER]
    Name  modify
    Match *
    Add   hostname ${HOSTNAME}

[FILTER]
    Name   Lua
    Match  log.*
    script dirac.lua
    call   add_raw

[FILTER]
    Name         rewrite_tag
    Match        log.tornado
    Rule         $tornadoComponent .$ $TAG.$tornadoComponentclean.log false
    Emitter_Name re_emitted

#[OUTPUT]
#    name  stdout
#    match *

[OUTPUT]
    Name     file
    Match    log.*
    Path     /vo/dirac/logs
    Mkdir    true
    Format   template
    Template {raw}

[OUTPUT]
    name            es
    host            <host>
    port            <port>
    logstash_format true
    logstash_prefix <index prefix>
    tls             on
    tls.verify      off
    tls.ca_file     <path_to_ca_file>
    tls.crt_file    <path_to_crt_file>
    tls.key_file    <path_to_key_file>
    match           log.*

[OUTPUT]
    name            es
    host            <host>
    port            <port>
    logstash_format true
    logstash_prefix <index prefix>
    tls             on
    tls.verify      off
    tls.ca_file     <path_to_ca_file>
    tls.crt_file    <path_to_crt_file>
    tls.key_file    <path_to_key_file>
    match           metric
dirac-json.conf is the main file; it defines the different steps:

- [SERVICE] references our JSON parser definition (for the DIRAC JSON log backend)
- [INPUT] describes each DIRAC component log file and the way it will be parsed (JSON), as well as the CPU, memory and disk metric inputs
- [FILTER] applies modifications to the parsed data: adding a levelname DEV whenever a log is not well formatted (typically a print in the code), adding fields like hostname to know which host the logs come from, and more complex treatments such as the dirac.lua script (described later)
- [OUTPUT] describes where the formatted logs go; here stdout, files on disk and Elasticsearch.
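Note that the file output above does not set a fixed file name; in that case Fluent-bit should use the tag as the file name, so after the rewrite_tag filter the Tornado logs end up in one file per component under /vo/dirac/logs, for example (illustrative name):

/vo/dirac/logs/log.tornado.DataManagement_TornadoFileCatalog.log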
dirac-parsers.conf:
[PARSER]
    Name        dirac_parser_json
    Format      json
    Time_Key    asctime
    Time_Format %Y-%m-%d %H:%M:%S,%L
    Time_Keep   On
dirac-parsers.conf describes the source format to be parsed and the time field (here asctime) used as the reference timestamp.
dirac.lua:
-- Build a "raw" (human readable) log line from the parsed JSON record and,
-- for Tornado logs, add a sanitised component name used to rewrite the tag.
function add_raw(tag, timestamp, record)
    local new_record = record
    if record["asctime"] ~= nil then
        -- Well formatted DIRAC JSON log
        local raw = record["asctime"] .. " [" .. record["levelname"] .. "] [" .. record["componentname"] .. "] "
        if record["tornadoComponent"] ~= nil then
            -- Replace characters that cannot appear in a tag/file name
            local patterns = {"/"}
            local str = record["tornadoComponent"]
            for i, v in ipairs(patterns) do
                str = string.gsub(str, v, "_")
            end
            new_record["tornadoComponentclean"] = str
            raw = raw .. "[" .. record["tornadoComponent"] .. "] "
        else
            raw = raw .. "[]"
        end
        raw = raw .. "[" .. record["customname"] .. "] " .. record["message"] .. " " .. record["varmessage"] .. " [" .. record["hostname"] .. "]"
        new_record["raw"] = raw
    else
        -- Not a JSON-formatted log (e.g. a bare print): keep the message with a timestamp
        new_record["raw"] = os.date("%Y-%m-%d %H:%M:%S %Z") .. " [" .. record["levelname"] .. "] " .. record["message"] .. " [" .. record["hostname"] .. "]"
    end
    -- Return code 2: the record was modified, the original timestamp is kept
    return 2, timestamp, new_record
end
dirac.lua performs the main transformation on the parsed logs: it builds a new record depending on whether the log contains the special tornadoComponent field, then cleans and formats it before it is sent to the outputs.
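As a quick sanity check, the function can be exercised outside Fluent-bit with a plain Lua interpreter; the record below is purely illustrative (invented values) and the helper file name is only a suggestion:

-- test_add_raw.lua (run with: lua test_add_raw.lua, with dirac.lua in the same directory)
dofile("dirac.lua")

local record = {
    asctime = "2024-01-01 12:00:00,123",
    levelname = "INFO",
    componentname = "Tornado",
    tornadoComponent = "DataManagement/TornadoFileCatalog",
    customname = "",
    message = "Request received",
    varmessage = "",
    hostname = "dirac-server-01",
}

local code, ts, out = add_raw("log.tornado", 0, record)
print(out["raw"])
-- 2024-01-01 12:00:00,123 [INFO] [Tornado] [DataManagement/TornadoFileCatalog] [] Request received  [dirac-server-01]
print(out["tornadoComponentclean"])
-- DataManagement_TornadoFileCatalog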
Testing
Before sending logs to Elasticsearch, the configuration can be tested on standard output by uncommenting:
[OUTPUT]
    name  stdout
    match *
…and commenting out the Elasticsearch outputs.
Then run:
/opt/fluent-bit/bin/fluent-bit -c /etc/fluent-bit/fluent-bit.conf
NOTE: When everything is OK, uncomment the Elasticsearch outputs and comment out the stdout output.
Service
sudo systemctl start/stop fluent-bit.service
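To have Fluent-bit start automatically at boot and to check that it is running, the usual systemd commands apply:

sudo systemctl enable fluent-bit.service
sudo systemctl status fluent-bit.service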
Dashboards
If logs are sent to an ELK instance, dashboards are available here.
On disk
If logs are sent to local files, logrotate is mandatory.
For one week of log retention, the logrotate configuration file /etc/logrotate.d/diraclogs should look like:
/vo/dirac/logs/* {
    rotate 7
    daily
    missingok
    notifempty
    compress
    delaycompress
    create 0644 diracsgm dirac
    sharedscripts
    postrotate
        /bin/kill -HUP `cat /var/run/syslogd.pid 2>/dev/null` 2>/dev/null || true
    endscript
}
along with a crontab line like:
0 0 * * * logrotate /etc/logrotate.d/diraclogs
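The configuration can be checked without actually rotating anything by running logrotate in debug mode:

logrotate -d /etc/logrotate.d/diraclogs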