Solaris Troubleshooting: Debugging the Solaris Name Service Switch

The processing of the name service switch configuration file, /etc/nsswitch.conf, is subtle. Existing tools, such as “truss”, do not always provide sufficient information about how a lookup is performed. However, when the NSS_OPTIONS environment variable is set, the name service switch reports how it processes /etc/nsswitch.conf to perform a lookup, and whether each step succeeds or fails.

Steps to Follow
Solaris 8 and later provide an undocumented and unsupported environment variable called NSS_OPTIONS. It accepts an option called “debug_eng_loop”, which takes an optional value. The value can be either zero or non-zero, and defaults to 1 when omitted. For example, the following two settings have an identical effect:

NSS_OPTIONS=debug_eng_loop

NSS_OPTIONS=debug_eng_loop=1

Setting “debug_eng_loop” to zero turns off debugging. To enable debugging, “debug_eng_loop” must be set to a non-zero value, for example:

NSS_OPTIONS=debug_eng_loop=-2

NSS_OPTIONS=debug_eng_loop=2

The only difference between setting “debug_eng_loop” to 1 and setting it to a value greater than 1 is that the latter also displays the “NSS: loop: sleeping …” messages.
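The rules above can be summarised in a small shell sketch. The function name debug_eng_loop_effect is hypothetical and is not part of Solaris; it only classifies a numeric value according to the behaviour described in this article, and does not parse NSS_OPTIONS the way libc does.

```shell
# Hypothetical helper (not part of Solaris): describe what a given
# debug_eng_loop value does, following the rules stated in this article.
debug_eng_loop_effect() {
    value=${1:-1}    # the article states that the default value is 1
    if [ "$value" -eq 0 ]; then
        echo "off"
    elif [ "$value" -gt 1 ]; then
        echo "lookups and sleep messages"
    else
        echo "lookups only"
    fi
}
```

Under these rules a negative value such as -2 is non-zero, so it enables lookup tracing, but it is not greater than 1, so it would not add the sleep messages.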

If the name service cache daemon, nscd, is running, little if any debugging output will be seen, because the results of most lookups are retrieved from the cache and so bypass the name service switch. Temporarily disabling the daemon, or one of its individual caches, will typically show more.

For example, with caching enabled for “passwd” lookups, there may be no debugging output if the lookup result is already in cache:

$ NSS_OPTIONS=debug_eng_loop=2 ; export NSS_OPTIONS

$ getent passwd foo

foo:xxx:2324:10:Foo Bar:/home/foo:/bin/ksh

$
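Exporting NSS_OPTIONS as in the example above leaves debugging switched on for every later command started from that shell. One possible alternative, sketched here, is to set the variable for a single command with the standard env utility. The wrapper name nss_trace is a hypothetical name chosen for illustration.

```shell
# Hypothetical wrapper: run one command with NSS debugging enabled,
# without exporting NSS_OPTIONS into the calling shell.
nss_trace() {
    env NSS_OPTIONS=debug_eng_loop=2 "$@"
}

# Example: nss_trace getent passwd foo
```

Because env only changes the environment of the command it starts, the calling shell is left untouched.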

If we disable the cache, then we can see more:

# nscd -e passwd,no

$ getent passwd foo

NSS_retry(0): ‘passwd’: trying ‘files’ … result=NOTFOUND, action=CONTINUE

NSS: ‘passwd’: continue …

NSS_retry(0): ‘passwd’: trying ‘nisplus’ … result=SUCCESS, action=RETURN

NSS: ‘passwd’: return.

foo:xxx:2324:10:Foo Bar:/home/foo:/bin/ksh

The above debugging output indicates that “foo” is a user whose entry is stored on NIS+. To enable the cache for “passwd” lookups again, we run:

# nscd -e passwd,yes
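The disable, look up, re-enable sequence above is easy to leave half finished. A minimal sketch of a wrapper follows, assuming root privileges and the “nscd -e” syntax shown above. The function name with_cache_disabled and the NSCD override variable are hypothetical names introduced here for illustration.

```shell
# Hypothetical helper: run a command with one nscd cache disabled,
# then re-enable that cache whether or not the command succeeded.
# NSCD may be set to a different binary or to a stub for a dry run.
with_cache_disabled() {
    db=$1
    shift
    "${NSCD:-nscd}" -e "$db,no" || return 1
    "$@"
    status=$?
    "${NSCD:-nscd}" -e "$db,yes"
    return "$status"
}

# Example (as root): with_cache_disabled passwd getent passwd foo
```

The wrapper returns the exit status of the wrapped command, so a failed lookup is still visible to the caller after the cache has been switched back on.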

Some lookups do not consult the cache. For example, among “group” lookups, only those made through getgrgid(), getgrgid_r(), getgrnam(), and getgrnam_r() consult the cache. This is why running “groups” produces debugging output even when the cache is enabled for “group” lookups:

$ groups

NSS_retry(0): ‘group’: trying ‘files’ … result=NOTFOUND, action=CONTINUE

NSS: ‘group’: continue …

NSS_retry(0): ‘group’: trying ‘nisplus’ … result=SUCCESS, action=RETURN

NSS: ‘group’: return.

staff cte techies bld-i386 sssp

But if the cache is disabled, we see much more:

$ groups

NSS_retry(0): ‘passwd’: trying ‘files’ … result=NOTFOUND, action=CONTINUE

NSS: ‘passwd’: continue …

NSS_retry(0): ‘passwd’: trying ‘nisplus’ … result=SUCCESS, action=RETURN

NSS: ‘passwd’: return.

NSS_retry(0): ‘group’: trying ‘files’ … result=NOTFOUND, action=CONTINUE

NSS: ‘group’: continue …

NSS_retry(0): ‘group’: trying ‘nisplus’ … result=SUCCESS, action=RETURN

NSS: ‘group’: return.

NSS_retry(0): ‘group’: trying ‘files’ … result=SUCCESS, action=RETURN

NSS: ‘group’: return.

staff

NSS_retry(0): ‘group’: trying ‘files’ … result=NOTFOUND, action=CONTINUE

NSS: ‘group’: continue …

NSS_retry(0): ‘group’: trying ‘nisplus’ … result=SUCCESS, action=RETURN

NSS: ‘group’: return.

cte

NSS_retry(0): ‘group’: trying ‘files’ … result=SUCCESS, action=RETURN

NSS: ‘group’: return.

techies

NSS_retry(0): ‘group’: trying ‘files’ … result=NOTFOUND, action=CONTINUE

NSS: ‘group’: continue …

NSS_retry(0): ‘group’: trying ‘nisplus’ … result=SUCCESS, action=RETURN

NSS: ‘group’: return.

bld-i386

NSS_retry(0): ‘group’: trying ‘files’ … result=NOTFOUND, action=CONTINUE

NSS: ‘group’: continue …

NSS_retry(0): ‘group’: trying ‘nisplus’ … result=SUCCESS, action=RETURN

NSS: ‘group’: return.

sssp

The order of the above lookups is:

1. Get the UID from the “passwd” database.

2. Get the group membership list from the “group” database.

3. Convert the GIDs into names from the “group” database.
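In a long trace like the one above it can be hard to see which source answered each lookup. The following filter is a sketch that assumes the line layout shown in this article (NSS_retry(N), then the database, the source, a result and an action). The debug format is undocumented and may differ between releases, and the function name nss_summary is hypothetical.

```shell
# Hypothetical filter: read NSS debug output on stdin and print one line
# per attempt in the form: database source result action
nss_summary() {
    awk '
    /^NSS_retry\(/ {
        db = $2; src = $4; result = ""; action = ""
        # Drop quotes, colons and any other punctuation around the names.
        gsub(/[^A-Za-z0-9_-]/, "", db)
        gsub(/[^A-Za-z0-9_-]/, "", src)
        for (i = 5; i <= NF; i++) {
            if ($i ~ /^result=/) { result = substr($i, 8); sub(/,$/, "", result) }
            if ($i ~ /^action=/) { action = substr($i, 8) }
        }
        # Lines with no result yet (a retry still in progress) are skipped.
        if (result != "") print db, src, result, action
    }'
}

# Example, with NSS_OPTIONS already exported: groups 2>&1 | nss_summary
```

The ordinary output of the command (the group names in this case) is dropped by the filter, so only the lookup attempts remain.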

For error conditions, other information is given. For example, if the name service switch has to retry a lookup, we will see:

result=TRYAGAIN, action=TRYAGAIN_FOREVER

NSS: loop: sleeping 5 …

NSS_retry(123): ‘passwd’: trying ‘nisplus’ …

The above debugging output indicates the 123rd retry. The name service switch code also includes a back-off algorithm; the back-off sleep times typically look like this:

NSS: loop: sleeping 1 …

NSS: loop: sleeping 2 …

NSS: loop: sleeping 4 …

NSS: loop: sleeping 5 …

NSS: loop: sleeping 5 …

These times are measured in seconds and are currently limited to a maximum of 5 seconds.
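The sleep times listed above are consistent with a delay that starts at 1 second, doubles after each retry, and is capped at 5 seconds. The doubling rule is an inference from the observed values, not something taken from the libc source. Under that assumption the sequence can be reproduced with a short sketch (the function name nss_backoff_delays is hypothetical):

```shell
# Print the first N back-off delays in seconds, assuming (not confirmed
# from the source code) a delay that starts at 1, doubles each retry,
# and is capped at 5.
nss_backoff_delays() {
    count=${1:-5}
    delay=1
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "$delay"
        delay=$((delay * 2))
        if [ "$delay" -gt 5 ]; then
            delay=5
        fi
        i=$((i + 1))
    done
}
```

Calling nss_backoff_delays 5 prints 1, 2, 4, 5 and 5 on separate lines, matching the sleep times shown above.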

In …/usr/src/lib/libc/port/gen/nss_common.c, __parse_environment() and set_option() are responsible for parsing and storing the NSS_OPTIONS value, respectively. In nss_search(), if __nss_debug_eng_loop is non-zero, output_loop_diag_a() and output_loop_diag_b() are called to produce the debugging output. If __nss_debug_eng_loop is greater than 1, nss_search() also prints the “NSS: loop: sleeping …” messages.

In the example for running “groups”, the command makes lookups in the following order:

– getpwuid()
– _getgroupsbymember()
– getgrgid()

Even when the cache is enabled for “group” lookups, calling _getgroupsbymember() does not consult the cache. In fact, this function has to perform its duty for both the “files” and “nisplus” repositories.

Ramdev


I started unixadminschool.com (aka gurkulindia.com) in 2009 as my own personal reference blog. Later I realized that my learnings might be helpful to other Unix admins if I managed my knowledge base in a more user-friendly format. The result is today's unixadminschool.com. You can connect with me at https://www.linkedin.com/in/unixadminschool/

Responses

  1. Tim says:

    If I’ve added a new nsswitch module, but it’s not working, how can I get information about what the problem is (e.g. can’t find the library, library won’t load)?
    For my new module, the debug output just says
    NSS: ‘passwd’: continue …
    This also happens if I put non-existent module names in the config file.

  2. Ramdev says:

    Hi Tim, you should try “truss -f -twrite,send -wall -p <pid>”, which will give more information about the system calls happening outside of NSS.
