Centralized logging for DIRAC components with the ELK stack and MQ services

The ELK stack is now a de facto standard for centralizing logs. DIRAC allows you to send the logs of your Agents and Services to a MessageQueue system, from which the ELK stack can be fed.

We will not detail the installation of the MQ or the ELK stack here.

DIRAC Configuration

First of all, you should configure an MQ service in your Resources section that will be used for logging. See Message Queues for details. It would look something like:

Resources
{
  MQServices
  {
    messagebroker.cern.ch
    {
      MQType = Stomp
      Port = 61713
      User = myUser
      Password = myPassword
      Host = messagebroker.cern.ch
      Queues
      {
        lhcb.dirac.logging
        {
        }
      }
    }
  }
}

Next, you should define a logging backend using the messageQueue plugin. See MessageQueueBackend and Send a log record in different outputs for details. It would look something like:

LogBackends
{
  mqLogs
  {
    MsgQueue = messagebroker.cern.ch::Queues::lhcb.dirac.logging
    Plugin = messageQueue
  }
}
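With this backend in place, each log record is published on the queue as a JSON document. The exact payload is produced by the MessageQueueBackend; judging from the field names used in the ElasticSearch template further down this page (`message`, `varmessage`), it resembles the sketch below. All field names other than those two, and all values, are hypothetical and given for illustration only:

```python
import json
from datetime import datetime, timezone

# Illustrative log record as it might appear on the MQ. "message" and
# "varmessage" follow the ElasticSearch template on this page; the other
# field names and all values are hypothetical.
record = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "componentname": "DataManagement/FileCatalog",  # hypothetical
    "levelname": "INFO",
    "message": "Request processed",
    "varmessage": "lfn=/lhcb/user/some/file",
}

# What actually travels over the wire: a single JSON string per record.
payload = json.dumps(record)
```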

Finally, you have to instruct your services or agents to use that log backend; again, see Send a log record in different outputs. For the DFC (DIRAC File Catalog), for example, it would look something like:

FileCatalog
{
  [...]
  LogBackends = stdout, mqLogs
}
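Note that the MsgQueue value used above is a `::`-separated path into the Resources/MQServices section: broker host, then the Queues subsection, then the queue name. As a quick illustration of that format (this is not DIRAC's actual parsing code):

```python
def parse_mq_descriptor(descriptor):
    """Split a MsgQueue descriptor such as
    'messagebroker.cern.ch::Queues::lhcb.dirac.logging'
    into (broker, queue). Illustrative only, assuming the
    broker::Queues::queueName format shown in the configuration above.
    """
    broker, section, queue = descriptor.split("::")
    if section != "Queues":
        raise ValueError("expected a Queues descriptor, got %r" % section)
    return broker, queue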

From the DIRAC point of view, that’s all there is to do.

Logstash and ELK configurations

The suggested logstash configuration (/etc/logstash/conf.d/configname) can be found at https://gitlab.cern.ch/ai/it-puppet-module-dirac/-/blob/qa/code/templates/logstash.conf.erb (check the full documentation).
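If you cannot access the linked file, a minimal pipeline would look roughly like the following sketch. It assumes the logstash-input-stomp plugin is installed, that the broker and credentials mirror the DIRAC configuration above, and that Elasticsearch listens on localhost; the linked template remains the authoritative version:

```
input {
  stomp {
    host        => "messagebroker.cern.ch"
    port        => 61713
    user        => "myUser"
    password    => "myPassword"
    destination => "/queue/lhcb.dirac.logging"
    codec       => json
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]          # assumed ES endpoint
    index => "lhcb-dirac-logs-%{+YYYY.MM.dd}"
  }
}
```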

The ElasticSearch template lhcb-dirac-logs_default looks like:

{
  "order": 1,
  "template": "lhcb-dirac-logs-*",
  "settings": {
    "index.number_of_shards": "1",
    "index.search.slowlog.threshold.fetch.warn": "1s",
    "index.search.slowlog.threshold.query.warn": "10s",
    "indexing.slowlog.level": "info",
    "indexing.slowlog.threshold.index.warn": "10s",
    "indexing.slowlog.threshold.index.info": "5s"
  },
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "message_field": {
            "path_match": "message",
            "match_mapping_type": "string",
            "mapping": {
              "type": "text",
              "norms": false,
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        },
        {
          "varmessage_field": {
            "path_match": "varmessage",
            "match_mapping_type": "*",
            "mapping": {
              "type": "text",
              "norms": false,
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        },
        {
          "string_fields": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "text",
              "norms": false,
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        }
      ],
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "@version": {
          "type": "keyword"
        },
        "geoip": {
          "dynamic": true,
          "properties": {
            "ip": {
              "type": "ip"
            },
            "location": {
              "type": "geo_point"
            },
            "latitude": {
              "type": "half_float"
            },
            "longitude": {
              "type": "half_float"
            }
          }
        }
      }
    }
  }
}
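The three dynamic templates above are tried in order: `message` and `varmessage` get a dedicated mapping, and any other string field falls through to `string_fields` (text with a 256-character `keyword` sub-field). A small Python sketch of that first-match logic, as an approximation of Elasticsearch's behaviour rather than its implementation:

```python
import fnmatch

# The ordered dynamic templates from the ES template above, reduced to
# (name, path pattern, mapping-type pattern). "*" matches anything.
DYNAMIC_TEMPLATES = [
    ("message_field", "message", "string"),
    ("varmessage_field", "varmessage", "*"),
    ("string_fields", "*", "string"),
]

def matching_template(field_path, field_type):
    """Return the name of the first dynamic template applying to a field,
    mimicking (approximately) Elasticsearch's first-match rule."""
    for name, path_pattern, type_pattern in DYNAMIC_TEMPLATES:
        if fnmatch.fnmatch(field_path, path_pattern) and \
           fnmatch.fnmatch(field_type, type_pattern):
            return name
    return None  # field keeps Elasticsearch's default dynamic mapping
```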

Kibana dashboard

A dashboard for the logs can be found here in both JSON and NDJSON formats, as ES services are dropping support for .json imports in newer versions.