Solaris Troubleshooting: Debugging the Solaris Name Service Switch
The processing of the name service switch configuration file, /etc/nsswitch.conf, is subtle. Existing tools, such as “truss”, do not always provide sufficient information about how a lookup is performed. However, when the NSS_OPTIONS environment variable is set, the name service switch reports how it processes /etc/nsswitch.conf to perform a lookup, and whether each step succeeds or fails.
Steps to Follow
Solaris 8 and later provide an undocumented and unsupported environment variable called NSS_OPTIONS. It accepts an option called “debug_eng_loop”, which takes an optional value. The value can be either zero or non-zero; if it is omitted, it defaults to 1. For example, the following settings have an identical effect:
NSS_OPTIONS=debug_eng_loop
NSS_OPTIONS=debug_eng_loop=1
Setting “debug_eng_loop” to zero turns off debugging. To enable debugging, “debug_eng_loop” must be set to a non-zero value, for example:
NSS_OPTIONS=debug_eng_loop=-2
NSS_OPTIONS=debug_eng_loop=2
The only difference between setting “debug_eng_loop” to 1 and setting it to a value greater than 1 is that the latter also displays the “NSS: loop: sleeping …” messages.
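To avoid leaving NSS_OPTIONS exported in a login session, the variable can be set for a single command only. The following is a minimal sketch, assuming a POSIX-style shell such as ksh; the function name nss_debug is hypothetical and is not part of Solaris.

```shell
# nss_debug LEVEL COMMAND [ARGS...]
# Run COMMAND with NSS_OPTIONS=debug_eng_loop=LEVEL in its environment only.
# The function body runs in a subshell, so nothing leaks into the caller.
nss_debug() (
    level=$1
    shift
    # env(1) sets the variable just for the command it launches.
    env NSS_OPTIONS="debug_eng_loop=$level" "$@"
)
```

For example, “nss_debug 2 getent passwd foo” runs one lookup with the “sleeping” messages enabled, and the calling shell's environment is left unchanged.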
If the name service cache daemon, nscd, is running, then little if any debugging output will be seen, because the results of most lookups are retrieved from its cache and thus bypass the name service switch. Temporarily disabling the daemon, or the cache for an individual database, will typically show more.
For example, with caching enabled for “passwd” lookups, there may be no debugging output if the lookup result is already in cache:
$ NSS_OPTIONS=debug_eng_loop=2 ; export NSS_OPTIONS
$ getent passwd foo
foo:xxx:2324:10:Foo Bar:/home/foo:/bin/ksh
$
If we disable the cache, then we can see more:
# nscd -e passwd,no
$ getent passwd foo
NSS_retry(0): ‘passwd’: trying ‘files’ … result=NOTFOUND, action=CONTINUE
NSS: ‘passwd’: continue …
NSS_retry(0): ‘passwd’: trying ‘nisplus’ … result=SUCCESS, action=RETURN
NSS: ‘passwd’: return.
foo:xxx:2324:10:Foo Bar:/home/foo:/bin/ksh
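The above debugging output shows that the “files” source did not have the entry (NOTFOUND) and that the lookup then succeeded in “nisplus”, which indicates that “foo” is a user whose entry is stored in NIS+. To enable the cache for “passwd” lookups again, we run: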
# nscd -e passwd,yes
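Because a forgotten “nscd -e passwd,yes” leaves caching off, the disable, run, re-enable sequence can be wrapped so that the cache is switched back on after the command finishes. This is only a sketch, assuming a POSIX-style shell and sufficient privileges to run nscd; the function name with_cache_disabled and the NSCD override variable are hypothetical, and the sketch does not handle interruption by signals.

```shell
# with_cache_disabled DATABASE COMMAND [ARGS...]
# Disable the nscd cache for DATABASE, run COMMAND, then re-enable the cache.
# NSCD (hypothetical) may name a stand-in for nscd, e.g. for a dry run.
with_cache_disabled() (
    db=$1
    shift
    nscd_cmd=${NSCD:-nscd}

    # Stop if the cache could not be disabled (for example, not root).
    "$nscd_cmd" -e "$db,no" || exit 1

    "$@"
    status=$?

    # Re-enable the cache whether or not the command succeeded.
    "$nscd_cmd" -e "$db,yes"
    exit "$status"
)
```

For example, with NSS_OPTIONS already exported, “with_cache_disabled passwd getent passwd foo” would show the switch debugging output for that one lookup and return the exit status of getent.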
Some lookups do not consult the cache. For example, the only “group” lookups that consult the cache are those made through getgrgid(), getgrgid_r(), getgrnam(), and getgrnam_r(). This is why running “groups” produces debugging output even when the cache is enabled for “group” lookups:
$ groups
NSS_retry(0): ‘group’: trying ‘files’ … result=NOTFOUND, action=CONTINUE
NSS: ‘group’: continue …
NSS_retry(0): ‘group’: trying ‘nisplus’ … result=SUCCESS, action=RETURN
NSS: ‘group’: return.
staff cte techies bld-i386 sssp
But if the cache is disabled, we see much more:
$ groups
NSS_retry(0): ‘passwd’: trying ‘files’ … result=NOTFOUND, action=CONTINUE
NSS: ‘passwd’: continue …
NSS_retry(0): ‘passwd’: trying ‘nisplus’ … result=SUCCESS, action=RETURN
NSS: ‘passwd’: return.
NSS_retry(0): ‘group’: trying ‘files’ … result=NOTFOUND, action=CONTINUE
NSS: ‘group’: continue …
NSS_retry(0): ‘group’: trying ‘nisplus’ … result=SUCCESS, action=RETURN
NSS: ‘group’: return.
NSS_retry(0): ‘group’: trying ‘files’ … result=SUCCESS, action=RETURN
NSS: ‘group’: return.
staff
NSS_retry(0): ‘group’: trying ‘files’ … result=NOTFOUND, action=CONTINUE
NSS: ‘group’: continue …
NSS_retry(0): ‘group’: trying ‘nisplus’ … result=SUCCESS, action=RETURN
NSS: ‘group’: return.
cte
NSS_retry(0): ‘group’: trying ‘files’ … result=SUCCESS, action=RETURN
NSS: ‘group’: return.
techies
NSS_retry(0): ‘group’: trying ‘files’ … result=NOTFOUND, action=CONTINUE
NSS: ‘group’: continue …
NSS_retry(0): ‘group’: trying ‘nisplus’ … result=SUCCESS, action=RETURN
NSS: ‘group’: return.
bld-i386
NSS_retry(0): ‘group’: trying ‘files’ … result=NOTFOUND, action=CONTINUE
NSS: ‘group’: continue …
NSS_retry(0): ‘group’: trying ‘nisplus’ … result=SUCCESS, action=RETURN
NSS: ‘group’: return.
sssp
The order of the above lookups is:
1. Get the UID from the “passwd” database.
2. Get the group membership list from the “group” database.
3. Convert the GIDs into names from the “group” database.
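Long traces like the ones above are easier to read when each “NSS_retry” line is reduced to its four fields. The following sketch assumes the debugging lines have the layout shown in this article (database, the word “trying”, source, then result= and action=); the function name nss_summary is hypothetical. It strips whatever quote characters surround the names, so it does not depend on the quoting style.

```shell
# nss_summary: read name service switch debugging output on stdin and
# print "database source result action" for every NSS_retry line.
# All other lines (command output, "NSS: ... continue") are ignored.
nss_summary() {
    awk '
        $1 ~ /^NSS_retry\(/ && $3 == "trying" {
            db = $2
            src = $4
            # Drop quotes, colons and any other punctuation around the names.
            gsub(/[^A-Za-z0-9_]/, "", db)
            gsub(/[^A-Za-z0-9_]/, "", src)
            res = ""
            act = ""
            for (i = 5; i <= NF; i++) {
                if ($i ~ /^result=/) {
                    res = $i
                    sub(/^result=/, "", res)
                    sub(/,$/, "", res)
                }
                if ($i ~ /^action=/) {
                    act = $i
                    sub(/^action=/, "", act)
                }
            }
            print db, src, res, act
        }'
}
```

A possible use is “groups 2>&1 | nss_summary”, which captures the debugging lines whichever stream they are written to. On Solaris the legacy /usr/bin/awk lacks gsub(), so nawk or /usr/xpg4/bin/awk would need to be substituted.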
For error conditions, other information is given. For example, if the name service switch has to retry a lookup, we will see:
result=TRYAGAIN, action=TRYAGAIN_FOREVER
NSS: loop: sleeping 5 …
NSS_retry(123): ‘passwd’: trying ‘nisplus’ …
The above debugging output indicates the 123rd retry. The name service switch code also includes a back-off algorithm; the back-off sleep times typically look like this:
NSS: loop: sleeping 1 …
NSS: loop: sleeping 2 …
NSS: loop: sleeping 4 …
NSS: loop: sleeping 5 …
NSS: loop: sleeping 5 …
These times are measured in seconds and are currently limited to a maximum of 5 seconds.
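The sequence above is consistent with a delay that doubles after each retry and is capped at 5 seconds. The sketch below reproduces that sequence under this assumption; it is an illustration only and is not taken from the libc source. The function name backoff_delays is hypothetical.

```shell
# backoff_delays COUNT [MAX]
# Print COUNT successive sleep times: start at 1 second, double each
# time, and never exceed MAX seconds (default 5).
backoff_delays() (
    count=$1
    max=${2:-5}
    delay=1
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "$delay"
        delay=$((delay * 2))
        if [ "$delay" -gt "$max" ]; then
            delay=$max
        fi
        i=$((i + 1))
    done
)
```

Running “backoff_delays 5” prints 1, 2, 4, 5 and 5 on separate lines, matching the sleep times in the messages above.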
In …/usr/src/lib/libc/port/gen/nss_common.c, __parse_environment() and set_option() are responsible for parsing and storing the NSS_OPTIONS value, respectively. As for nss_search(), if __nss_debug_eng_loop is more than 1, then it prints the “NSS: loop: sleeping …” messages. If __nss_debug_eng_loop is non-zero, nss_search() calls output_loop_diag_a() and output_loop_diag_b() for debugging output.
In the example for running “groups”, the command makes lookups in the following order:
– getpwuid()
– _getgroupsbymember()
– getgrgid()
Even when the cache is enabled for “group” lookups, calling _getgroupsbymember() does not consult the cache. In fact, this function has to perform its duty for both the “files” and “nisplus” repositories.
If I’ve added a new nsswitch module, but it’s not working, how can I get information about what the problem is (e.g. can’t find the library, library won’t load)?
For my new module, the debug output just says
NSS: ‘passwd’: continue …
This also happens if I put non-existent module names in the config file.
Hi Tim, you should try with “truss -f -twrite,send -wall -p ” so that it will give more information about the system calls happening outside of NSS.