
What Can Systems Administrators Learn From Fighter Pilots?

November 15, 2010

What can Systems Administrators learn from Fighter Pilots?  At face value, this sounds like a ridiculous question.  Flying a fighter plane is completely different from running a network of servers, right?  I propose it's not as different as you might think.

There is a military strategy developed by USAF Colonel John Boyd called the "OODA loop."  According to Col. Boyd, decision-making occurs in a recurring cycle of Observe-Orient-Decide-Act.

The four steps of the OODA loop are: observe the situation, orient yourself by interpreting what you observe, decide on a course of action, and act on that decision.

 

There are two sides of a dogfight: action and reaction.  The pilot who is in reaction mode will typically lose the dogfight.  To win, a pilot must force his opponent into reacting to him.

To apply this to systems administration requires only a small leap of faith.  A Systems Administrator is usually managing a complex mix of network infrastructure, servers, and services.  If that Administrator is always reacting to problems, they will always be evading missiles.



Like the fighter pilot, the Systems Administrator must perform an action, or series of actions, that pushes him to the action side of the bubble.  Here are a few examples of how a Systems Administrator can be proactive.

First, while working on a project, a server gets upgraded to a newer version of the operating system.  During this upgrade the Systems Administrator notices a flaw in the way a service was configured: the configuration worked on the older version of the OS but no longer works after the upgrade.  The Administrator reacts by fixing the configuration on the upgraded server.  But being proactive, the Administrator also goes through the other servers that run that service and have not yet been upgraded, and reconfigures them to use the new, improved configuration.  By proactively fixing the future problem, the Administrator has now eliminated something that later, when upgrading the OS on those other servers, he would have had to react to.

Next, part of a Systems Administrator's job is typically to take trouble tickets from users.  The worst of these is when a customer, client, or co-worker calls in saying a server is down.  After reacting and bringing the server back online, the Systems Administrator looks for evidence of why the server failed.  For this example, let's say the system ran out of swap and crashed when it no longer had available RAM or swap space to write objects.  A proactive Systems Administrator will build a monitoring system that pays attention to the system metrics.  This system has alerting rules, configurable by the Administrator, that trigger a warning when 70% of swap is used, an error when 80% of swap is used, and a critical warning when 90% of swap is used.  With this new monitoring system in place, the Administrator can now proactively restart services that are consuming memory before the server fails.  This prevents a service outage and spares the users from experiencing downtime.
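To make the alerting rules concrete, here is a minimal sketch in Python of the 70/80/90 percent swap thresholds described above.  It is an illustration only, not a real monitoring system: the function names (`swap_used_percent`, `classify_swap`, `read_swap_kb`) are hypothetical, and reading `SwapTotal` and `SwapFree` from `/proc/meminfo` assumes a Linux host.

```python
from typing import Optional, Tuple

# Thresholds from the example above: percent of swap used -> severity.
# Ordered from most to least severe so the first match wins.
SWAP_THRESHOLDS = [(90.0, "critical"), (80.0, "error"), (70.0, "warning")]


def swap_used_percent(total_kb: int, free_kb: int) -> float:
    """Return the percentage of swap in use; 0.0 if no swap is configured."""
    if total_kb <= 0:
        return 0.0
    return 100.0 * (total_kb - free_kb) / total_kb


def classify_swap(percent_used: float) -> Optional[str]:
    """Map a swap usage percentage to a severity, or None below all thresholds."""
    for threshold, severity in SWAP_THRESHOLDS:
        if percent_used >= threshold:
            return severity
    return None


def read_swap_kb(meminfo_path: str = "/proc/meminfo") -> Tuple[int, int]:
    """Read (SwapTotal, SwapFree) in kB from a Linux meminfo-style file."""
    values = {}
    with open(meminfo_path) as f:
        for line in f:
            key, _, rest = line.partition(":")
            if key in ("SwapTotal", "SwapFree"):
                values[key] = int(rest.split()[0])
    return values.get("SwapTotal", 0), values.get("SwapFree", 0)
```

A scheduled job could call `classify_swap(swap_used_percent(*read_swap_kb()))` every minute or so and notify the Administrator whenever the result is not `None`.  How the alert is delivered (email, pager, chat) is left out here on purpose.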

Finally, as Systems Administrators our skill set should be constantly evolving.  It's not uncommon to learn new techniques or methods while on the job.  When a new method or technique proves superior to the previous one, it is a good idea to apply it to previous installations.  Even though nothing has technically failed, it is usually worth the time and effort to make the changes.  Another benefit of backporting these changes to the other installations is that it reduces the number of configurations you need to understand, dramatically simplifying the troubleshooting process if something bad does happen.
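One way to see how many distinct configurations you are actually carrying is to fingerprint each server's copy of a config file and group the servers by fingerprint.  The sketch below is a hypothetical helper, not part of any particular tool: it assumes you have already collected each host's config text into a dictionary, and it treats lines starting with `#` as comments, which will not hold for every config format.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def fingerprint(config_text: str) -> str:
    """Hash a config after dropping blank lines and whole-line '#' comments."""
    meaningful = [
        line.strip()
        for line in config_text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]
    return hashlib.sha256("\n".join(meaningful).encode("utf-8")).hexdigest()


def group_by_config(configs: Dict[str, str]) -> Dict[str, List[str]]:
    """Group host names by the fingerprint of their configuration text."""
    groups = defaultdict(list)
    for host, text in sorted(configs.items()):
        groups[fingerprint(text)].append(host)
    return dict(groups)
```

If `group_by_config` returns more than one group, each extra group is another configuration you would have to keep in your head while troubleshooting, and a candidate for backporting.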

In short, Colonel Boyd's strategy is to work through the Observe-Orient-Decide-Act loop quickly and efficiently enough to get onto the action side of the action/reaction bubble: process the available information faster than the opponent, then act to force the opponent to react.  Applied to Systems Administrators, this means proactively discovering changes we can make to prevent problems from ever occurring.  It also means using tools to monitor our networks so that, as often as possible, we know about problems before our end users do.

 

Have you found being proactive to be beneficial overall?  What techniques have you used to be proactive in your systems administration?

 
