What to Log

Phil Packer, Public Blog, 18 May 2022

I'm not sure whether you all know about the National Cyber Security Centre (NCSC) - they are the 'user friendly' wing of GCHQ and publish lots of good articles about cyber security. I had the pleasure of working with them on a piece of "Critical National Infrastructure" some years back and found their ideas practical and helpful - although TBH it was the first and only time I've had meetings with an engineer who had their own minder, who sat alongside, said nothing and just took lots of notes!

Anyway, I recently came across a very interesting article from them and I wanted an excuse to reference it here and discuss some of the issues they raise.

So first off, you might find it useful to read this article from the NCSC.

My reading of that article is that it is mostly about logging for security purposes rather than for diagnostic purposes, which is a good and necessary thing of course, but logging can be much, much more than that.

In fact I'd go so far as to say, if you aren't doing good logging, how do you know that your network and systems are working properly?

It's also important to remember that logging is like gardening - it's not a 'fire and forget' task - you need to keep tuning it as your network and attack surface change. Beware that many companies will offer you AI-based logging systems that claim to remove this need, though I have my doubts!

Security Information and Event Management (SIEM)

What we are talking about here is a very cut-down approach to Security Information and Event Management (SIEM), which is a huge topic in its own right; large companies make small fortunes advising on it and supplying big software suites to help with it.

However, many SMEs and even some large ones get by with much more straightforward approaches.

Commercial and Free Tools

I'm not planning to give a full rundown of what is a huge market, but here are a few packages to look at, some of which I've used and some of which I haven't:

Splunk - aggregates and dashboards stuff in a very straightforward fashion. It also has installable agents for on-system data collection.
Elasticsearch/Logstash/Kibana (the 'ELK' stack) - this is the open-source equivalent of the above, but as with all open-source stuff you may find it needs more care and feeding than you might like.
Nagios - an open-source monitoring tool; log management and analysis is offered through the separate Nagios Log Server product.
SolarWinds Security Event Manager (SEM) - commercial product. I've not actually used it, but it's spoken well of (if you have $$$). Not cheap!
AlienVault OSSIM - I've not used this either, but it looks interesting.

What to log

My standard answer would be "it depends". Whilst I appreciate that isn't very helpful, here are some of the things I think should always be logged, and some "it depends" items too.

If one needs a quick fix, one can just log messages of a given severity level or above (which does rather assume that all the clients label each message with a sensible severity, of course!).
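As a rough sketch of that quick fix, assuming messages arrive in standard syslog form with a `<PRI>` prefix: the PRI value encodes facility × 8 + severity, and lower severity numbers are more serious (0 = Emergency, 7 = Debug), so "a given level or above" means a numerically lower or equal severity. The function names and sample messages below are hypothetical.

```python
import re
from typing import Iterable, Iterator, Optional

# Syslog severity names indexed by numeric code (RFC 5424): 0 is the most severe.
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]

# The <PRI> field at the start of a syslog message, e.g. "<34>".
PRI_RE = re.compile(r"^<(\d{1,3})>")


def severity_of(line: str) -> Optional[int]:
    """Return the numeric severity (0-7) of a syslog line, or None if there is no PRI."""
    match = PRI_RE.match(line)
    if match is None:
        return None
    return int(match.group(1)) % 8  # PRI = facility * 8 + severity


def at_or_above(lines: Iterable[str], threshold: str = "warning") -> Iterator[str]:
    """Yield lines at least as severe as `threshold`, plus any line with no PRI."""
    limit = SEVERITIES.index(threshold)
    for line in lines:
        sev = severity_of(line)
        # Lower number = more severe. Unlabelled lines are kept rather than silently
        # dropped, since a client that doesn't set a severity is worth noticing too.
        if sev is None or sev <= limit:
            yield line


sample = [
    "<34>1 2022-05-18T10:00:00Z sw1 sshd - - - login failed",  # auth facility, crit
    "<190>1 2022-05-18T10:00:01Z sw1 app - - - heartbeat",     # local7 facility, info
]
print(list(at_or_above(sample)))  # only the first line is printed
```

Keeping unlabelled lines is a design choice in this sketch; if your clients are well behaved you may prefer to drop them.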

All Devices

Authentication failures
Authentication successes
General 'out of resource' errors e.g. out of memory, storage, CPU etc.
System restart messages of all sorts - bad actors installing malware will quite often have to force a reboot to complete their installation.

Network switches

Routing state changes of any kind, be that OSPF, RIP, BGP or whatever
LACP state changes
Spanning-tree state changes
Flapping ports
Error-disable activity messages on switch ports - very handy for detecting if someone has added an unauthorised unmanaged switch to the network!
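One way to spot flapping ports automatically is to count link up/down transitions per interface within a sliding time window. This sketch assumes Cisco IOS-style `%LINK-3-UPDOWN` messages with timestamps already parsed; the message pattern, the threshold of 5 changes and the 60-second window are illustrative assumptions to tune for your own kit.

```python
import re
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Assumed Cisco IOS-style link message; other vendors will need a different pattern.
UPDOWN_RE = re.compile(r"%LINK-\d-UPDOWN: Interface ([^,\s]+), changed state to (?:up|down)")


def find_flapping(events, threshold=5, window=timedelta(seconds=60)):
    """Return the (device, interface) pairs with >= `threshold` link changes within `window`.

    `events` is an iterable of (timestamp, device, message) tuples sorted by timestamp.
    """
    recent = defaultdict(deque)  # (device, interface) -> times of recent link changes
    flapping = set()
    for ts, device, message in events:
        match = UPDOWN_RE.search(message)
        if match is None:
            continue
        key = (device, match.group(1))
        times = recent[key]
        times.append(ts)
        # Forget link changes that have slid out of the time window.
        while ts - times[0] > window:
            times.popleft()
        if len(times) >= threshold:
            flapping.add(key)
    return flapping


start = datetime(2022, 5, 18, 9, 0, 0)
demo = [
    (start + timedelta(seconds=5 * i), "sw1",
     f"%LINK-3-UPDOWN: Interface GigabitEthernet1/0/5, changed state to {'down' if i % 2 == 0 else 'up'}")
    for i in range(6)
]
print(find_flapping(demo))  # {('sw1', 'GigabitEthernet1/0/5')}
```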

Ideally one would capture the whole device config on a regular basis and compare it with the previous version. This is very handy for detecting unauthorised config changes (see also Never Be The Last Person Who Touched The Firewall).
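A minimal sketch of that comparison using Python's standard `difflib` module, assuming you already save each device's running config as text on a schedule. Lines that change on every pull without any real config change are filtered out first so they don't drown out genuine changes; the two Cisco IOS-style prefixes used for that are an assumption to adjust for your own devices.

```python
import difflib

# Lines that change on every pull without any real config change. These are
# Cisco IOS-style examples (an assumption); adjust for your own devices.
VOLATILE_PREFIXES = ("! Last configuration change", "! NVRAM config last updated")


def stable_lines(config_text: str) -> list:
    """Split a config into lines, dropping the volatile ones."""
    return [ln for ln in config_text.splitlines() if not ln.startswith(VOLATILE_PREFIXES)]


def config_diff(previous: str, current: str, name: str = "device") -> str:
    """Return a unified diff of two config snapshots, or '' if nothing meaningful changed."""
    diff = difflib.unified_diff(
        stable_lines(previous),
        stable_lines(current),
        fromfile=f"{name} (previous)",
        tofile=f"{name} (current)",
        lineterm="",
    )
    return "\n".join(diff)


old = "hostname sw1\n! Last configuration change at 09:00\ninterface Gi1/0/1\n shutdown\n"
new = "hostname sw1\n! Last configuration change at 17:30\ninterface Gi1/0/1\n no shutdown\n"
print(config_diff(old, new, "sw1"))
```

In practice `previous` and `current` would be successive saved snapshots from your backup job, and any non-empty result is worth raising as an alert or sending to your log collector.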

Things to look for when analysing logs

Very often, analysing logs is about looking for things that should be there, to confirm correct operation, and noticing things that should not be there, such as strange or unexpected messages and errors. Ideally it should be possible to switch on additional logging that shows the flow and operation of software. This helps you understand what is happening, and may also help you detect when messages that should be there are not being generated. Some illustrative examples:

Always carry out some simple checks to ensure that all logs are accurately time-synchronised.
The same user logging in, successfully or unsuccessfully, on multiple devices at the same time - this might be legit, it might not.
Any gaps in the logs (which could be a sign of an attacker deleting stuff).
Log hogs - devices or systems producing lots of log noise. This can highlight a fault, but the noise can also mask real problems.
Timestamp issues, which can also indicate strange behaviour or misconfiguration.
Martian packets - packets whose source or destination address is impossible, or should never be seen on the interface they arrived on (e.g. private addresses arriving from the internet).
Multiple devices reporting the same problem at the same time, which usually indicates a common cause.
Excessive log entries for the same issue, either continuously through the logs or sporadically.
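Gaps are fairly easy to check for mechanically. Here is a minimal sketch, assuming each message has already been parsed into a (device, timestamp) pair, that reports any device whose logs go quiet for longer than a threshold you choose. The 15 minutes used here is an arbitrary assumption; a busy core switch and a quiet edge device will need different values.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_gaps(records, max_quiet=timedelta(minutes=15)):
    """Return (device, last_before_gap, first_after_gap) for silences longer than `max_quiet`.

    `records` is an iterable of (device, timestamp) pairs in any order.
    """
    by_device = defaultdict(list)
    for device, ts in records:
        by_device[device].append(ts)

    gaps = []
    for device, stamps in by_device.items():
        stamps.sort()
        # Compare each message time with the next one from the same device.
        for earlier, later in zip(stamps, stamps[1:]):
            if later - earlier > max_quiet:
                gaps.append((device, earlier, later))
    return gaps


t = datetime(2022, 5, 18, 9, 0)
sample = [("fw1", t), ("fw1", t + timedelta(minutes=5)), ("fw1", t + timedelta(minutes=40)),
          ("sw1", t), ("sw1", t + timedelta(minutes=10))]
for device, before, after in find_gaps(sample):
    print(f"{device}: nothing logged between {before:%H:%M} and {after:%H:%M}")
# fw1: nothing logged between 09:05 and 09:40
```

This only looks between messages, so a device that has stopped logging altogether needs a separate "last seen" check. A gap is a prompt to investigate rather than proof of tampering: the device may simply have had nothing to say, or its clock may be wrong.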

How to Log

You should get logging information off the box you are protecting ASAP, to avoid it being overwritten or compromised by a bad actor.
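For your own applications, Python's standard library can do this directly: `logging.handlers.SysLogHandler` sends each record to a remote syslog collector as it is logged, so a copy leaves the box straight away. A minimal sketch, where `loghost.example.com` is a placeholder for your collector and the choice of the auth facility is an assumption; with a (host, port) address this handler uses plain, unauthenticated UDP.

```python
import logging
import logging.handlers


def make_remote_logger(host: str, port: int = 514, name: str = "app") -> logging.Logger:
    """Return a logger that forwards each record to a remote syslog collector over UDP."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(
        address=(host, port),  # UDP is the default transport for a (host, port) address
        # Auth facility, since the usage below logs authentication events (an assumption).
        facility=logging.handlers.SysLogHandler.LOG_AUTH,
    )
    # Prefix each message with the program name so the collector can tell sources apart.
    handler.setFormatter(logging.Formatter(f"{name}: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


# Usage, with a placeholder collector address:
# log = make_remote_logger("loghost.example.com")
# log.warning("authentication failure for user %s", "alice")
```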

Preferably one should use a write-only channel such as syslog (although classic syslog has no integrity protection of its own). The traditional way of doing this is plain, unencrypted UDP on port 514, but there has been an IETF standard way of doing it since 2009 (!): RFC 5425, which carries syslog over TLS (normally on TCP port 6514) and so protects messages against eavesdropping, modification and masquerading in transit.
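To show what RFC 5425 actually puts on the wire, here is a minimal sketch using only Python's standard library: an RFC 5424-format syslog message is prefixed with its length in octets and a space ("octet-counting" framing) and sent over a TLS connection that verifies the collector's certificate. The collector name and CA file path are placeholders, and in production you would normally use your syslog daemon's built-in TLS support rather than hand-rolled code like this.

```python
import socket
import ssl
from datetime import datetime, timezone


def rfc5424_message(pri: int, hostname: str, app: str, msg: str, when: datetime) -> bytes:
    """Build a minimal RFC 5424 message; PROCID, MSGID and STRUCTURED-DATA are nil ("-")."""
    timestamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # MSG is sent without a BOM, i.e. as plain octets.
    return f"<{pri}>1 {timestamp} {hostname} {app} - - - {msg}".encode("utf-8")


def rfc5425_frame(message: bytes) -> bytes:
    """Apply RFC 5425 octet-counting framing: MSG-LEN SP SYSLOG-MSG."""
    return str(len(message)).encode("ascii") + b" " + message


def send_over_tls(frames, host: str, port: int = 6514, cafile=None) -> None:
    """Send framed messages to a TLS syslog collector, verifying its certificate."""
    context = ssl.create_default_context(cafile=cafile)
    with socket.create_connection((host, port), timeout=10) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            for frame in frames:
                tls.sendall(frame)


# Usage, with a placeholder collector and CA bundle:
# msg = rfc5424_message(34, "sw1", "sshd", "authentication failure", datetime.now(timezone.utc))
# send_over_tls([rfc5425_frame(msg)], "loghost.example.com", cafile="/path/to/ca.pem")
```

The length prefix lets the receiver find message boundaries on a TCP stream without relying on newlines inside the message.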

The NCSC have a "basic logging toolkit" to help with this: Logging Made Easy.

Once you have all the log messages from across your estate in (approximately) the same place, and ideally in a relatively sane format (syslog format is common and will save you time munging log formats later), you can then look at how to process them.
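As a sketch of that "sane format" step, the function below splits an RFC 5424 syslog line into named fields with a regular expression, so later processing can work on a dictionary rather than raw text. It assumes RFC 5424 input only; older BSD-style (RFC 3164) messages and vendor-specific formats would need their own patterns, and the structured-data handling here is deliberately simplistic (it doesn't cope with escaped `]` characters).

```python
import re
from typing import Optional

RFC5424_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>(?P<version>\d{1,2}) "
    r"(?P<timestamp>\S+) (?P<host>\S+) (?P<app>\S+) (?P<procid>\S+) (?P<msgid>\S+) "
    r"(?P<sd>-|(?:\[[^\]]*\])+)"  # nil, or one or more [SD-ELEMENT]s
    r"(?: (?P<msg>.*))?$"
)


def parse_rfc5424(line: str) -> Optional[dict]:
    """Parse an RFC 5424 syslog line into a dict of fields, or None if it doesn't match."""
    match = RFC5424_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    pri = int(fields.pop("pri"))
    fields["facility"], fields["severity"] = divmod(pri, 8)  # PRI = facility * 8 + severity
    fields["msg"] = fields["msg"] or ""
    return fields


record = parse_rfc5424("<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 - 'su root' failed")
print(record["host"], record["facility"], record["severity"], record["msg"])
```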

Logging hardware

This is inevitably tied up with the "how much to log" question, and the related second question "How far back do you want to be able to go?", as obviously the more days of logs you want to keep, the more disk you need. On the plus side, each log entry is likely to be a fairly small line of text, so it's not like you are re-inventing YouTube or anything like that!
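A quick back-of-the-envelope estimate shows why. The figures in this sketch are assumptions for illustration, not measurements: say 500 devices, averaging 2,000 messages per device per day at around 200 bytes each, kept for 180 days.

```python
def log_storage_bytes(devices: int, msgs_per_device_per_day: int,
                      avg_bytes_per_msg: int, retention_days: int) -> int:
    """Raw storage needed for retained log text, before compression or index overhead."""
    return devices * msgs_per_device_per_day * avg_bytes_per_msg * retention_days


# Illustrative (assumed) figures: 500 devices, 2,000 messages/device/day,
# ~200 bytes per message, kept for 180 days.
total = log_storage_bytes(500, 2_000, 200, 180)
print(f"{total / 1e9:.0f} GB")  # prints "36 GB"
```

Search indexes add overhead on top of the raw text, while compression can shrink it considerably, so treat the result as an order-of-magnitude figure.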

As an example of how far you can go, a colleague of ours was the security manager for a large-ish gambling site, and he had a setup that captured every byte of data flowing into and out of his network from the internet and stored it for several weeks in searchable form - just in case he needed to replay any suspicious traffic for investigation. Now, I'm not suggesting you ought to go that far, but nonetheless keeping all your log messages for, say, several months would seem a reasonable starting point.

Analysing Logs

If you are trying to analyse an issue, it can be useful to start with a relatively wide time window but look only at log messages of relatively high severity. That may highlight a narrower timeframe in which a suspicious message occurs, at which point you can inspect that narrower window more rigorously, looking at messages of every severity.
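The two-pass approach described above might look like this, assuming log records have already been parsed into (timestamp, severity, device, message) tuples using the standard syslog severity numbers (0 is the most severe). The `Record` type, the "err or worse" cut-off and the five-minute zoom-in margin are all assumptions to adjust per investigation.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Record(NamedTuple):
    ts: datetime
    severity: int  # syslog severity: 0 = emergency ... 7 = debug
    device: str
    message: str


def serious_in_window(records, start, end, max_severity=3):
    """Pass 1: wide window, keeping only severity 'err' (3) or worse."""
    return [r for r in records if start <= r.ts <= end and r.severity <= max_severity]


def everything_around(records, moment, margin=timedelta(minutes=5)):
    """Pass 2: narrow window around a suspicious moment, all severities, in time order."""
    return sorted(
        (r for r in records if moment - margin <= r.ts <= moment + margin),
        key=lambda r: r.ts,
    )


t = datetime(2022, 5, 18, 12, 0)
logs = [
    Record(t, 6, "sw1", "heartbeat"),
    Record(t + timedelta(hours=2), 2, "sw1", "fan failure"),
    Record(t + timedelta(hours=2, minutes=-1), 5, "sw1", "temperature rising"),
    Record(t + timedelta(hours=5), 6, "sw2", "heartbeat"),
]
for hit in serious_in_window(logs, t, t + timedelta(hours=6)):
    print([r.message for r in everything_around(logs, hit.ts)])
```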

A general point would be that one is trying to find patterns. Sometimes these will be repeated instances of the same message on the same device, or perhaps a cluster of similar messages around the same time across multiple devices. One pattern we've seen recently was that any given "incident" of one message type on a given switch was followed, about 10 seconds later, by a different set of message types on the same device. The obvious conclusion was that the latter were somehow a consequence of the former, presumably driven in part by timeout settings in the various protocols running on the switch.
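A simple way to test a hunch like that is to measure, for each occurrence of message type A on a device, how long it is until the next message of type B on the same device. If the delays cluster tightly (around 10 seconds in the case above), a causal link is likely. This sketch assumes records are already parsed into time-sorted (timestamp, device, message type) tuples; the type labels "A" and "B" are hypothetical, as is the one-minute cut-off.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def follow_on_delays(events, first_type, second_type, max_delay=timedelta(minutes=1)):
    """For each `first_type` event, return the delay to the next `second_type` on the same device.

    `events` is an iterable of (timestamp, device, msg_type) tuples sorted by timestamp.
    Occurrences whose follow-on arrives later than `max_delay` are skipped.
    """
    pending = defaultdict(list)  # device -> times of first_type events awaiting a follow-on
    delays = []
    for ts, device, msg_type in events:
        if msg_type == first_type:
            pending[device].append(ts)
        elif msg_type == second_type and pending[device]:
            for start in pending[device]:
                if ts - start <= max_delay:
                    delays.append(ts - start)
            pending[device].clear()
    return delays


t0 = datetime(2022, 5, 18, 8, 0)
events = [
    (t0, "sw1", "A"),
    (t0 + timedelta(seconds=10), "sw1", "B"),
    (t0 + timedelta(seconds=100), "sw1", "A"),
    (t0 + timedelta(seconds=111), "sw1", "B"),
    (t0 + timedelta(seconds=115), "sw2", "B"),  # no preceding A on sw2, so ignored
]
print([d.total_seconds() for d in follow_on_delays(events, "A", "B")])  # [10.0, 11.0]
```

From there, looking at the minimum, maximum and median of the delays (e.g. with `statistics.median`) tells you how consistent the relationship is.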

Another general idea is that it's worth regularly inspecting the logs for anything hogging the limelight; indeed, you could even write a simple tool that counts the top 10 log messages each day. The culprits can be simple things like "flapping" edge ports, which don't of themselves cause huge problems for the overall network but may consume large amounts of log space and possibly even switch CPU, making it difficult to see the wood for the trees when a real problem comes along.
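The "top 10" tool could be as simple as the following sketch, which counts messages per (host, message) pair over one day's worth of lines using `collections.Counter`. It assumes RFC 5424-style lines, and it normalises the message text by replacing runs of digits with `#`, so repeats that differ only in counters or port numbers are grouped together; both the line format and that normalisation rule are assumptions to adapt to your own logs.

```python
import re
from collections import Counter

# Assumed RFC 5424-style line: <PRI>VERSION TIMESTAMP HOST APP PROCID MSGID SD MSG
LINE_RE = re.compile(r"^<\d+>\d+ \S+ (?P<host>\S+) (?:\S+ ){3}(?:-|(?:\[[^\]]*\])+) ?(?P<msg>.*)$")
DIGITS_RE = re.compile(r"\d+")


def top_talkers(lines, n=10):
    """Return the n most frequent (host, normalised message) pairs with their counts."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that aren't in the assumed format
        # Collapse numbers so repeats differing only in counters/port numbers group together.
        key = (match.group("host"), DIGITS_RE.sub("#", match.group("msg")))
        counts[key] += 1
    return counts.most_common(n)


day = [
    "<189>1 2022-05-18T00:00:01Z sw3 ifmgr - - - Interface Gi1/0/7 changed state to down",
    "<189>1 2022-05-18T00:00:02Z sw3 ifmgr - - - Interface Gi1/0/7 changed state to down",
    "<134>1 2022-05-18T00:00:06Z fw1 fwd - - - session opened",
]
for (host, msg), count in top_talkers(day):
    print(f"{count:6d}  {host}  {msg}")
```

Run over the previous day's file each morning, a report like this quickly shows which devices are hogging the log.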

And Finally

If you would like full confidence that your network is fit for use and that nothing untoward is happening on it, please contact us on 0203 805 7795 and talk directly to one of the Layer3 Systems team.
