Is change detection even worth the effort?

The idea behind File Integrity Monitoring (FIM) is simple, straightforward and powerful. So you know there has to be a catch, right?

Once you have a system in a “known good” state, record that state so that you can capture all changes as the system drifts. That gives you a baseline against which changes are compared, and something to alert you when they happen. Take a look at everything that has changed and everything will become clear (supposedly). You will certainly know what changed, but will you know whether the change was an ‘authorized’ change or the dreaded ‘unauthorized’ change?

Authorized change is simple to understand (notice I didn’t say easy to ‘identify’… that’s for another blog post). It can be the result of patching the system, upgrading an application, and so on. Unauthorized change can also be any number of things. It could be human error (someone edited the wrong configuration file, deleted the wrong files, or changed permissions incorrectly) or, worse, someone could have left a little backdoor ‘present’ on your system.

Unfortunately, traditional FIM does not differentiate between authorized and unauthorized changes, and in my opinion, this is the fundamental flaw. Change is change as far as FIM is concerned. A human brain needs to determine whether the change is good or bad. This makes sense on the surface, but when we look more closely, this gap is exactly what I believe will allow the next generation of FIM products to succeed in the market.

Let’s evaluate a real-life example, and let’s keep things simple. We are tasked with monitoring the FIM output for all of our company’s web servers. Our company is medium-sized, so this amounts to 20 different servers. Naturally all 20 servers should be configured identically, so let’s assume that this is the case. Of course, it is VERY unlikely that all 20 servers are configured identically. Wouldn’t it be nice to know which systems were not configured properly? Stay tuned, reader… We’re almost there.

So, we are monitoring change for the 20 web servers. Everything is running fine and our FIM product is monitoring the systems without any issues. This weekend, though, is when the change window opens up for our web servers. You are on top of things and have a great line of communication open with the operations teams involved, from the team deploying the 5 patches to the team upgrading your web server application software. Well done for knowing ahead of time that things are going to be happening! This is not always the case (let’s be honest, it is almost *never* the case).

Let’s keep things simple again. Each patch ‘changes’ 100 files. Some files are deleted, some are added, and some are just changes to pre-existing files. In total, though, let’s use the number 100 per patch (some patches change significantly more files, but I like simple math). The web server software upgrade is a little larger: there are a total of 500 file and file permission changes with that upgrade. Once the change window closes, you are looking at 1,000 file changes for each server that you are monitoring. 1,000 changes for each of 20 servers of course means that you get to evaluate 20,000 file changes (yay).

Certainly, there will be some duplication, so it will help if your FIM solution has a de-duplication capability. The sad truth, though, is that most admins out there will simply approve all of these changes automatically, because they know that the massive number of changes occurred during the authorized change window and therefore all of the changes were authorized, right? You can of course see the fallacy here. A smart ‘bad person’ will wait until the change window to drop their backdoor payload, and it has a much stronger chance of slipping right through with the rest of the authorized changes that took place during the window.
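
For anyone who wants to plug in their own numbers, here is the simple math as a small Python sketch. The constant names are mine; the figures are the ones from the scenario above.

```python
# Figures from the scenario above; swap in your own environment's numbers.
PATCHES = 5
CHANGES_PER_PATCH = 100
UPGRADE_CHANGES = 500
SERVERS = 20

# Changes one server accumulates during the change window.
changes_per_server = PATCHES * CHANGES_PER_PATCH + UPGRADE_CHANGES

# Changes someone has to review across the whole fleet.
total_changes = changes_per_server * SERVERS

print(f"{changes_per_server:,} changes per server")
print(f"{total_changes:,} changes to review")
```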

Post-deployment changes needing review

| Server | Traditional FIM | SignaCert Integrity |
|---|---|---|
| Webserver 1 | 1,000 changes | 0 |
| Webserver 2 | 1,000 changes | 0 |
| Webserver 13 | 1,001 changes (+1 unauthorized change) | 1 unauthorized change |
| Webserver 16 | 999 changes (-1 authorized change) | 1 authorized change that didn’t occur |
| Webserver 20 | 1,000 changes | 0 |
| Total post-deployment changes to review | 20,000 changes | 2 changes |

Who is to blame here?   The first reaction is that the admins are just too lazy to do their job properly.  I strongly believe that is absolutely NOT the right answer.  Computers have always been able to do the ‘busy work’ faster, cheaper and much more methodically than humans do.  We have somehow all fallen into the trap of blaming the administrator rather than blaming the software vendors that are placing such a heavy burden on today’s already over-burdened administrators.  Surely there is a better way…

The fundamental problem lies in the inability to properly identify ‘authorized’ change. Anything not categorized as ‘authorized’ is therefore ‘unauthorized’, and that is where your attention needs to be focused. If we are able to identify authorized change, then we can ‘hide’ it from our main view of our 20 servers so that we can focus all of our attention on that one new binary called ‘notepad.exe’ that has been dropped into the C:\Windows\Temp directories of all 20 of our web servers. That one file is the needle in the haystack that deserves our attention, rather than the 20,000 other (red herring) file changes.

Remember, change is not necessarily bad. In fact, the lack of change could be even more painful than the act of change itself. If one of the five patches did not properly deploy to one of your 20 servers, the results could be disastrous for that one server. Does your FIM document the fact that (authorized) change did NOT occur, or does it only show you when change DOES occur? Again, it is all about identifying authorized change, which is supposed to happen. Anything that deviates from that is suspect, and demands a human brain to analyze.
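
To make that concrete, here is a minimal Python sketch of the idea, not how any particular product does it. It assumes you already have the set of paths that were supposed to change and the set of paths that actually changed on a server. The `triage` function and every path except notepad.exe are made up for illustration.

```python
def triage(authorized: set[str], observed: set[str]) -> dict[str, set[str]]:
    """Split one server's changes into the two buckets that need a human."""
    return {
        # Changed on the server, but nobody approved it.
        "unauthorized": observed - authorized,
        # Approved, but it never showed up on the server.
        "missing": authorized - observed,
    }

# Hypothetical change lists for a single web server.
authorized = {
    r"C:\inetpub\bin\app.dll",
    r"C:\Windows\System32\patched.dll",
}
observed = {
    r"C:\inetpub\bin\app.dll",
    r"C:\Windows\Temp\notepad.exe",
}

result = triage(authorized, observed)
print("Unauthorized:", sorted(result["unauthorized"]))
print("Missing:", sorted(result["missing"]))
```

Everything that appears in both sets is authorized change that happened as planned, so it never reaches the review list at all.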

Traditional FIM works by following a simple process. A ‘snapshot’ is taken of each system to mark a known good point in time. As time moves on, the FIM software compares the current state with the baseline for the same system in order to document any deviations. Notice the problem yet? 20 different servers mean 20 different baselines. We are hoping that all 20 baselines are identical, but we all know that entropy demands its pound of flesh, so it is highly unlikely that 20 production servers are identical in every way. This means that our 20 servers are each being compared against a ‘known good’ baseline that differs from the others to some degree.
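
If you have never looked under the hood, the snapshot-and-compare process can be sketched in a few lines of Python. This is a deliberately stripped-down illustration: it only hashes file contents, while real FIM products typically also track permissions, ownership and other attributes. The `snapshot` and `compare` functions and the file names are my own inventions.

```python
import hashlib
import os
import tempfile

def snapshot(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    state = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as fh:
                state[os.path.relpath(full, root)] = hashlib.sha256(fh.read()).hexdigest()
    return state

def compare(baseline, current):
    """Report what was added, removed or modified since the baseline."""
    common = set(baseline) & set(current)
    return {
        "added": sorted(set(current) - set(baseline)),
        "removed": sorted(set(baseline) - set(current)),
        "modified": sorted(p for p in common if baseline[p] != current[p]),
    }

# Demo on a throwaway directory standing in for one server.
with tempfile.TemporaryDirectory() as root:
    for name, text in [("httpd.conf", "Listen 80\n"), ("old.log", "boot\n")]:
        with open(os.path.join(root, name), "w") as fh:
            fh.write(text)
    baseline = snapshot(root)  # the "known good" state

    # The system drifts: one edit, one deletion, one new file.
    with open(os.path.join(root, "httpd.conf"), "w") as fh:
        fh.write("Listen 8080\n")
    os.remove(os.path.join(root, "old.log"))
    with open(os.path.join(root, "backdoor.bin"), "wb") as fh:
        fh.write(b"\x00")

    report = compare(baseline, snapshot(root))

print(report)
```

Notice that the report tells you what changed, but says nothing about whether any of it was supposed to change.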

Next-generation FIM products need to be able to identify one of the 20 servers as a ‘Golden System’ and then compare the other 19 servers against this one system that we have taken the time to properly configure. Now all 20 of our servers will be compared against a single, known-good baseline. While this will show us any changes (deltas) over time for these 20 systems, there is another significant advantage, which is easy to miss.

Not only will we be detecting change over time for the 20 servers, but we will also be able to detect configuration differences between the 19 servers and the one ‘gold master’ server that we took so much time to configure properly. I will now have the ability to monitor for system ‘Correctness’! That is significant and worth some consideration. Now I have a way to detect, for free, when one system deviates from a state known to be properly configured. Bonus!

How does this affect our original problem? How can I differentiate between authorized and unauthorized change with this new paradigm? The answer is simple, straightforward and powerful (but don’t let that scare you). The single gold standard system simply needs to be brought into your change process a little earlier than the other servers. As your organization tests the patches that are going to be deployed into production, include the gold standard server. Once the authorized changes (patches plus upgrades) are properly deployed, tell the new FIM application, such as SignaCert Integrity, that the state of the gold system has changed and that these changes are authorized (a single button click). That’s all you need to do, because all of our servers will now be compared against this new baseline of authorized changes.

The only alerts I will see now are for systems that did NOT receive the authorized changes, or that picked up changes the gold system does not have. I now have a system that shows me what I am TRULY interested in seeing. I want to know if any of our 20 production web servers have any unauthorized changes. It is nice to be able to document all authorized changes as well, but do not distract me with what is supposed to happen. Hit me over the head when something occurs that is NOT supposed to happen. Tell me when new unauthorized files such as C:\Windows\Temp\Notepad.exe show up. Tell me when patch number 3 of 5 did not properly deploy to my server. When the most important data is flashing in front of me with blinky lights, it is much easier to discover (and therefore FIX) the root issue than to manually hunt through a static list of 20,000 entries, which nobody has the time to analyze properly.

It turns out there is a better way: SignaCert Integrity. I will be blogging more about the other unique benefits of using SignaCert Integrity over the coming weeks.
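
For the curious, here is a toy Python simulation of the gold-system idea using the numbers from this post. It is only a sketch under loud assumptions: it is not how SignaCert Integrity is implemented, the strings "v1" and "v2" stand in for real file hashes, the file names are invented, and all 20 servers are assumed to start out identical.

```python
def deviations(baseline, server):
    """Paths whose state on the server differs from the baseline (either direction)."""
    paths = set(baseline) | set(server)
    return sorted(p for p in paths if baseline.get(p) != server.get(p))

# State before the change window: 1,000 files, all at "v1".
before = {f"file{i:04d}": "v1" for i in range(1000)}

# The gold system is patched first and its new state is approved as the baseline.
gold = {path: "v2" for path in before}

# The fleet after the change window: most servers match the gold system exactly.
fleet = {f"webserver{n}": dict(gold) for n in range(1, 21)}
fleet["webserver13"][r"C:\Windows\Temp\notepad.exe"] = "unknown"  # dropped backdoor
fleet["webserver16"]["file0042"] = "v1"  # one authorized change never landed

# Traditional FIM: every server is compared against its own pre-window snapshot.
traditional = sum(len(deviations(before, state)) for state in fleet.values())

# Gold-based FIM: every server is compared against the approved gold state.
flagged = {}
for name, state in fleet.items():
    diff = deviations(gold, state)
    if diff:
        flagged[name] = diff

print("Traditional review queue:", traditional)
print("Gold-based review queue:", flagged)
```

The traditional count comes out to the same 20,000 entries as before, while the gold-based comparison leaves only the extra file on webserver 13 and the change that never landed on webserver 16.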