Sysadmin Best Practices: Configuring a loghost on Sun Fire 3800, 4800, 4900, 6800 and E6900

It is considered a ‘Best Practice’ to configure a ‘loghost’ for each server and production domain. A ‘loghost’ on a Solaris platform can permanently save messages that are logged in the System Controller’s NVRAM buffer. This ensures that they are not lost to a power event or by rolling off the small first-in, first-out buffer in the System Controller. Properly stored, they can be accessed quickly if a domain outage occurs, even if the server controlling the domain is unresponsive. When sent to Sun engineers, these files can speed up troubleshooting and help resolve problems quickly and accurately.

Configuration Best Practices:

1) Log files to an independent Solaris platform, not to the domain(s) associated with the server. This avoids the situation in which the domain that is down is the same domain that was collecting the loghost data.

That situation creates two problems. First, the data needed for troubleshooting becomes inaccessible. Second, as soon as Solaris goes down on the domain that is logging, all further messages from the System Controller are lost. Failing to follow this practice increases the time it takes to troubleshoot severe problems on the server and could drastically increase downtime.

2) The loghost can be any type of Solaris machine running on the same subnet as the System Controllers. If the machine can ping the System Controllers, it can generally be used.

3) Capture both platform and domain logs on the loghost. Platform logs give a snapshot of what is happening globally in the server, and domain logs give a much more detailed report of the errors occurring on a failing domain.

Getting Started:

The first step in creating a ‘loghost’ is to configure the ‘loghost’ setting on the System Controller (SC) so that it points to the IP address of the admin workstation. Then configure the syslog.conf file on the admin workstation itself.

1. On the System Controller(SC) of the Sun Fire platform, do the following:

First, log on to the SC, either by telnet, ssh or through the serial port, then issue the following command:

r12-1a:SC> setupplatform -p loghost

Loghosts
--------

Loghost []: 172.16.40.10
Log Facility [local0]: local0

Note: You must not use a domain on the platform itself as the loghost.

The format is slightly different on firmware releases below 5.12.5; if you are still running one, a firmware upgrade is long overdue. This setting sends platform messages with facility local0 to the admin workstation at IP address 172.16.40.10. The local0 facility is used to distinguish messages coming from the platform.

Log Facility defaults to local0. This setting allows syslogd on the admin workstation to determine which file to log the message into.

For domains, set up the loghost the same way, but use the ‘setupdomain’ command from each domain shell. For instance, to configure the loghost facility for domains A and B:

r12-1a:A> setupdomain -p loghost

Loghosts

Loghost []: 172.16.40.10

Log Facility [local0]: local1

r12-1a:B> setupdomain -p loghost

Loghosts

Loghost []: 172.16.40.10

Log Facility [local0]: local2


NOTE: local8 and higher are not allowed and won’t work – there are 8 local facilities, local0-local7. See the syslog.conf(4) man page for more information.
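The facility restriction above is easy to check before typing a value into the SC prompt. The following is a minimal sketch, assuming a POSIX-style shell on the admin workstation; the function name is_local_facility is invented for this illustration and is not part of any Sun tool.

```shell
# Return 0 if the argument is one of local0..local7, non-zero otherwise.
# (is_local_facility is a hypothetical helper name for this sketch.)
is_local_facility() {
    case "$1" in
        local[0-7]) return 0 ;;
        *)          return 1 ;;
    esac
}

# Example: report whether a candidate facility name is usable.
for facility in local2 local8; do
    if is_local_facility "$facility"; then
        echo "$facility: valid"
    else
        echo "$facility: not a valid local facility"
    fi
done
```

The case pattern matches exactly one digit in the range 0 to 7, so names such as local8 or local10 are rejected.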

It’s also possible to leave the Log Facility set to its default of “local0”, but if this is done without a separate logging package on the loghost (such as the Sun Fire(TM) System Controller Logger, SUNWsclog), it will be more difficult to separate messages coming from the platform and the domains into different message files on the admin workstation.


2. On the admin workstation, do the following:

Now configure the syslog.conf file on the admin workstation to place the messages into specific message files. To do this, create the following message files and add the marked lines to /etc/syslog.conf:

% ssh -l root 172.16.40.10
# touch /var/adm/messages.platform
# touch /var/adm/messages.domainA
# touch /var/adm/messages.domainB
# vi /etc/syslog.conf
[…]
#
# non-loghost machines will use the following lines to cause "user"
# log messages to be logged locally.
#
ifdef(`LOGHOST', ,
user.err /dev/sysmsg
user.err /var/adm/messages
user.alert `root, operator'
user.emerg *
)
local0.notice /var/adm/messages.platform <= ADD THIS LINE
local1.notice /var/adm/messages.domainA <= ADD THIS LINE
local2.notice /var/adm/messages.domainB <= ADD THIS LINE
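Each of the three added rules has the same shape: a facility selector, a TAB, and a log file. As a minimal sketch (assuming a POSIX-style shell with printf; the function names loghost_rule and loghost_rules are made up for this illustration), the lines can be generated rather than typed, so that the separator is always a real TAB:

```shell
# Print one syslog.conf rule: "<facility>.notice<TAB><logfile>".
# (loghost_rule and loghost_rules are hypothetical helper names.)
loghost_rule() {
    printf '%s.notice\t%s\n' "$1" "$2"
}

# Facility-to-file mapping taken from the example in this article.
loghost_rules() {
    loghost_rule local0 /var/adm/messages.platform
    loghost_rule local1 /var/adm/messages.domainA
    loghost_rule local2 /var/adm/messages.domainB
}

# Print the rules for review.
loghost_rules
```

After reviewing the output and backing up /etc/syslog.conf, the same function can be redirected with >> to append the rules to the file.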


NOTE: You must use TABs between the two entries; spaces will not work.

Then restart the syslog daemon:

# /etc/init.d/syslog stop
# /etc/init.d/syslog start

or just force it to re-read the configuration file:

# pkill -HUP syslogd
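Since the TAB requirement is easy to get wrong in an editor, a quick scan of the file can help. This is only a sketch under a few assumptions: a POSIX-style shell and grep, and the hypothetical function name rules_without_tab. It treats any line that is not a comment, not blank, and not part of the m4 ifdef wrapper as a rule, which is a simplification of the real syslog.conf grammar.

```shell
# Print rule lines from a syslog.conf-style file that contain no TAB.
# Comment lines, blank lines, and the m4 wrapper lines ("ifdef..." and
# a line starting with ")") are skipped. Hypothetical helper for this sketch.
rules_without_tab() {
    tab=$(printf '\t')
    grep -v "^[ $tab]*#" "$1" |
        grep -v "^[ $tab]*\$" |
        grep -v '^ifdef' |
        grep -v '^)' |
        grep -v "$tab"
}

# Example usage (prints nothing when every rule is TAB-separated):
#   rules_without_tab /etc/syslog.conf
```

Any line it prints is a candidate for a space-separated rule and is worth checking by hand before signalling syslogd.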

3. To test that your loghost configurations are working correctly, do the following:

To test the platform shell loghost file, run ‘setfailover off’ followed by ‘setfailover on’ on the main SC. You should see the messages appear in the /var/adm/messages.platform file.
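On the admin workstation, watching for the test message can be scripted instead of re-reading the file by hand. The sketch below assumes a POSIX-style shell (for example ksh, since it uses arithmetic expansion); wait_for_log is a hypothetical name, and the search pattern in the usage comment is a placeholder, because the exact text of the failover messages is not given in this article.

```shell
# Poll a log file until a line matching the pattern appears or the
# timeout (in seconds, default 30) expires.
# Returns 0 on a match, 1 on timeout. Hypothetical helper for this sketch.
wait_for_log() {
    file=$1
    pattern=$2
    timeout=${3:-30}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if grep "$pattern" "$file" >/dev/null 2>&1; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# Example usage; 'failover' is an assumed pattern, adjust it to the
# message text you actually see:
#   wait_for_log /var/adm/messages.platform 'failover' 60 && echo "loghost OK"
```

The same function works for the domain files (/var/adm/messages.domainA and so on) when testing the domain shells.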

To test the domain shell loghost files, there are three methods. The first requires an outage of the domain, whereas the second and third do not. The third is very simple and non-intrusive.

Method 1

From the domain shell, perform a “setkeyswitch on”. Note that the keyswitch must initially be in the off position. If the keyswitch is initially in the on position, you will need to perform a “setkeyswitch off” followed by a “setkeyswitch on”. If the domain loghost setup is correct, you should see the output from LPOST appear in the domain shell’s loghost file.

Method 2

If an outage is not acceptable, the following procedure may be performed. This procedure relies on either a spare System Board being available or a Dynamic Reconfiguration (DR) operation.

Spare System Board available

If there is a spare System Board available perform the following:

1) First, add the spare System Board to the domain for which you wish to test the loghost setup, at the platform shell.

2) Power off the System Board if it is powered on. If it is powered off already, ignore this step.

3) Power on the System Board

4) Perform a “testboard SB#” in the domain shell you are testing, where # is the System Board number.

The testboard command will cause an LPOST to be run on the specified System Board only, and will cause output to be displayed to the domain loghost (if configured correctly), as well as to the console.

5) You can then re-assign the System Board to another domain for testing, and follow the steps above, for each domain that is to be tested.

No spare System Boards, DR can be performed

If there are no spare System Boards, and a DR operation can be performed on a System Board in one of the domains, perform the following:

1) DR a System Board out from a domain on the system, via the cfgadm command from Solaris.

2) Ensure the System Board is assigned to the domain you wish to test. You may need to perform a deleteboard/addboard operation at the platform level.

3) Power off the System Board if it is powered on. If it is powered off already, ignore this step.

4) Power on the System Board

5) Perform a “testboard SB#” in the domain shell you are testing, where # is the System Board number.

The testboard command will cause an LPOST to be run on the specified System Board only, and will cause output to be displayed to the domain loghost (if configured correctly), as well as to the console.

6) You can then re-assign the System Board to another domain for testing, and follow the steps above for each domain that is to be tested.

7) Once finished, assign the System Board to the original domain, and DR the board back into the domain.


Method 3

If an outage is not acceptable and you do not have a spare board or do not want to use DR, then do the following:

1) Look at the current keyswitch setting in the domain shell

2) If it is ‘on’ then perform “setkeyswitch secure” and then “setkeyswitch on”. If it is ‘secure’ then perform “setkeyswitch on” and then “setkeyswitch secure”.

The setkeyswitch command will cause a transition of the keyswitch between the ‘on’ and ‘secure’ positions and will cause output to be displayed to the domain loghost (if configured correctly), as well as to the domain buffer (you may check it with the “showlogs” command).
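The rule in step 2 can be written down as a small lookup so that the keyswitch always ends in the position it started in. This is a sketch only: keyswitch_test_sequence is a hypothetical helper meant to run on the admin workstation, and it merely prints the two commands to type at the domain shell; it does not talk to the SC.

```shell
# Given the current keyswitch position ("on" or "secure"), print the two
# setkeyswitch commands from Method 3, in order. Any other position is
# rejected, because Method 3 only covers the on/secure transition.
keyswitch_test_sequence() {
    case "$1" in
        on)     printf '%s\n' 'setkeyswitch secure' 'setkeyswitch on' ;;
        secure) printf '%s\n' 'setkeyswitch on' 'setkeyswitch secure' ;;
        *)      echo "unsupported keyswitch position: $1" >&2
                return 1 ;;
    esac
}

# Example: commands to issue when the domain is currently 'on'.
keyswitch_test_sequence on
```

In both cases the second command restores the original position, so the domain is left as it was found.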

Ramdev


