Savings Plan for System Admin – It’s about Time, not Money.


  There are several ways a UNIX administrator can save time in the regular job: writing effective automation scripts for daily routine tasks, installing handy desktop tools that give quick and easy access to the entire UNIX environment, and maintaining proper monitoring dashboards that support proactive system administration. But all of these techniques save time only during normal daily work, when nothing disastrous is happening in the environment.


Which situations do we consider the worst in a UNIX admin job? A few examples:

  • Natural disasters
  • Complete data center power outages
  • Server failures due to power shortage
  • Hardware component failures on critical servers
  • User and system database corruption for mission-critical applications
  • Complete application failures
  • Complete network failures due to network component failure
  • Web server or other essential server failures due to security vulnerabilities

The bad part is that these worst moments often come without notice, so we can't keep skilled people sitting on the floor waiting for them all the time. The regular support administrators (with beginner and intermediate skills) who work on routine system admin tasks tend to have their minds settled in that routine for long periods, so it takes them a little while to understand the problem and take the action needed to resolve it quickly. We can't simply blame them for not reacting promptly, because it is not always a skill problem; it is how the human mind works.

If that worst moment does happen, do you know what saves our time and gives us breathing space to work with all the UNIX staff available on the floor, leveraging their skills irrespective of experience and knowledge? If you are guessing some kind of information document, you are nearly right: it is the RUN BOOK.

A runbook lets the team work at the required level of expertise during a disaster, even when the skill levels actually available are a little lower than required. A runbook also improves problem response and resolution times, so the team can deliver support within the SLA (Service Level Agreement). In the current competitive market it is highly important to deliver service within the agreed SLAs.

What does a runbook contain?

A runbook lists all the instructions that a UNIX administrator needs to perform for day-to-day operations, and it also contains the information needed to respond to emergency situations. It should contain everything necessary to enable a staff member to perform any process, from taking a backup to failing over to a remote site.

A runbook with day-to-day operational instructions is considered a UNIX operational runbook. A runbook with instructions for critical disaster recovery and failover operations is considered a BCM (Business Continuity Management) runbook. Depending on the size of the organization and its IT infrastructure, system administrators maintain one or more runbooks for each application and environment.

Normally, the following information should be included in a runbook:

  • Resource information about the data center and its hardware and software
  • Contact information for each resource involved in the runbook instructions, or the related tools for finding that contact information
  • Process information, including step-by-step procedures for operational and emergency processes
  • A version number and revision history, with a proper comment for every update made
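
As a rough illustration, the four kinds of information listed above could be modelled as a small data structure. This is only a sketch: the class and field names (Runbook, Contact, Procedure, Revision, missing_sections) are made up for this example and do not come from any standard or tool.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Contact:
    role: str   # hypothetical example: "Storage on-call"
    name: str
    phone: str


@dataclass
class Procedure:
    title: str
    steps: list[str]          # ordered, step-by-step instructions
    emergency: bool = False   # True for disaster recovery procedures


@dataclass
class Revision:
    version: str
    changed_on: date
    comment: str              # what was updated and why


@dataclass
class Runbook:
    application: str
    resources: list[str] = field(default_factory=list)
    contacts: list[Contact] = field(default_factory=list)
    procedures: list[Procedure] = field(default_factory=list)
    revisions: list[Revision] = field(default_factory=list)

    def current_version(self) -> str:
        """Return the version of the latest revision, or "unversioned"."""
        return self.revisions[-1].version if self.revisions else "unversioned"

    def missing_sections(self) -> list[str]:
        """List which of the four recommended sections are still empty."""
        sections = {
            "resources": self.resources,
            "contacts": self.contacts,
            "procedures": self.procedures,
            "revisions": self.revisions,
        }
        return [name for name, items in sections.items() if not items]
```

A helper like missing_sections() could serve as a quick completeness check before a runbook is signed off, but the real content still has to be written and verified by people who know the environment.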

When do we first prepare a runbook, and how do we validate its instructions?

Normally, runbooks are first prepared when the applications or servers are placed in the production environment. They should then be updated at least once a year by performing proper disaster recovery (DR) tests. For financial organisations in particular, performing regular DR tests and updating runbooks regularly is a mandatory requirement.
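
The "at least once a year" rule is easy to check mechanically. Here is a minimal sketch, assuming each runbook records the date of its last DR-test validation; the function name and the 365-day default are assumptions for illustration only.

```python
from datetime import date, timedelta


def is_review_overdue(last_validated: date, today: date, max_age_days: int = 365) -> bool:
    """Return True when the last DR-test validation is older than max_age_days."""
    return today - last_validated > timedelta(days=max_age_days)
```

A scheduled job could run such a check across all runbooks and flag the stale ones to their owners.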

Is a runbook the same as the documents we see on unixadminschool.com?

Absolutely not. A runbook is a document with a highly customized set of instructions that works for a specific application, environment, or organization. Runbooks are not generic technical documents; they are confidential and cannot be shared outside the organization.

Some aggressive sysadmins ask: why do you need a runbook? Why can't you script or automate everything and scrap the runbook?

I recommend that you first prepare the runbook with manual instructions and then automate each task in it, but I don't advise automating the entire runbook as a single automation task, because during a disaster we want a controlled recovery. In fact, when a disaster happens, scripts are often the first thing to break, so we can't simply rely on automation in those situations.
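
One way to picture "automate each task, but keep the recovery controlled" is a runner that executes the automated steps one at a time and asks the operator before each one. This is only a sketch under that assumption; run_controlled and its confirm callback are hypothetical names, not part of any existing tool.

```python
from typing import Callable, Iterable, Tuple

# A step is a description plus an action that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_controlled(steps: Iterable[Step], confirm: Callable[[str], bool]) -> list[str]:
    """Run steps in order, asking the operator before each one.

    Stops at the first declined or failed step so that a human can take
    over with the manual instructions. Returns the completed descriptions.
    """
    completed = []
    for description, action in steps:
        if not confirm(description):
            break              # operator chose to stop here
        try:
            ok = action()
        except Exception:
            ok = False         # a broken script counts as a failed step
        if not ok:
            break
        completed.append(description)
    return completed
```

In an interactive session, confirm could be something like lambda d: input(f"Run '{d}'? [y/N] ").strip().lower() == "y". Because the runner stops at the first failure, the operator always knows which manual step in the runbook to continue from.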

Here is the conclusion: invest your time in preparing a runbook for the project you work on or deliver, and get good returns in the long term.

If you work in a team with a few expert-level people and more intermediate or beginner-level people, and you both work on projects and spend a lot of time resolving issues escalated by other system administrators, then this is the right action for you.

What are your thoughts about runbooks, and what would you suggest to others?

Ramdev

I started unixadminschool.com (aka gurkulindia.com) in 2009 as my own personal reference blog, and some time later I realized that my learnings might be helpful to other UNIX admins if I managed my knowledge base in a more user-friendly format. The result is today's unixadminschool.com. You can connect with me at https://www.linkedin.com/in/unixadminschool/

7 Responses

  1. Muneer EP says:

    Preparing a full-fledged RUN_BOOK requires a lot of effort. I have started. Thanks, Ramdev.
    Some useful references:

    http://technet.microsoft.com/en-us/library/cc917702.aspx
    http://wiki.servicenow.com/index.php?title=Runbook_Example_-_Restart_a_Linux_Server

  2. Muneer EP says:

    Ramdev: Nowadays I keep hearing one idea, that a system admin has to speak the business language. Do you have any comment? I would appreciate it if you could shed some light on this.

  3. Ramdev Ramdev says:

    Hi Muneer, you actually made my article complete with the sample runbook link :). Thanks for the example runbook.

  4. Ramdev Ramdev says:

    Hi Muneer, "a system admin has to speak the business language": yes. The idea has been around for a long time, but it was not as popular as it is nowadays. The reason is simple: industry calculations changed a lot in the post-recession period (after 2008), and management wants to see a clear connection between every technology component (e.g. server, software, application, tool) and the business service of the organization. That gives management a clear picture of how each component adds value to the end-user business service and how much it costs.

    Here is a hypothetical scenario to explain the sentence:

    Assume I am a system admin sitting in the line of a specific business function that is not doing very well (I mean, not a profitable business). Now I want to request a new project to install a technology that can solve a very old existing problem in the environment, one that causes multiple random server crashes.

    On the system admin side, we know that the technology is very promising, that it solves a very critical problem, and that it will make our server environment stable. The problem is that if we explain it the same way to the business people, our project has very little chance of approval, because not every business manager sees business value in it (perhaps not every business manager knows what a server crash means and how it affects the business).

    But if you rephrase your project requirement in business language, explaining how much the new technology can increase the availability and reliability of the business service, and also put a little focus on how much cost it will save by avoiding extra maintenance hours per year, then we have a better chance of getting the project approved.

    Hope this gives you some idea.

  5. Muneer says:

    Good comments , Thanks Ramdev

