Running “ls -la /” hangs, yet running “ls -la” on other top-level directories (e.g. ls -la /usr) does NOT hang. The system log (/var/adm/messages) shows NFS-related errors, even though this system is NOT a true NFS client.
Here are some sample errors that may appear in the /var/adm/messages file when “ls -la /” hangs:
Mar 28 09:23:19 moe nfs: [ID 333984 kern.notice] NFS server for volume management (/vol) not responding still trying
Mar 28 09:31:13 moe nfs: [ID 664466 kern.notice] NFS getattr failed for server for volume management (/vol): error 23 (RPC: Unitdata error)
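To see quickly which mount point such messages refer to, the log can be filtered for the two error patterns shown above. This is a minimal sketch, not a standard tool: the function name nfs_error_mounts is hypothetical, and it assumes a POSIX shell (for example ksh on Solaris) and that the mount point appears in parentheses as in the sample lines.

```shell
#!/bin/sh
# Hypothetical helper: print each mount point named in an NFS
# "not responding" or "getattr failed" message, once per mount point.
# Assumes the path appears in parentheses, e.g. "(/vol)".
nfs_error_mounts() {
    logfile="${1:-/var/adm/messages}"
    awk '/nfs: / && (/not responding/ || /getattr failed/)' "$logfile" |
        sed -n 's|.*(\(/[^)]*\)).*|\1|p' |
        sort -u
}
```

Calling nfs_error_mounts with no argument reads /var/adm/messages; a different log file can be passed as the first argument.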
In this particular situation, the client had mounted a CD remotely from another system, and that system was shut down before it unshared the CD and before the client could unmount the remote CD mount. The tail end of a truss shows that ls was hanging on /vol as well (line numbers have been added; the hang is at lines 256-257):
# cd /
# truss -fall -vall -wall -rall ls -la
249 4884/1: lstat64("./xfn", 0xFFBEFAC0) = 0
250 4884/1: d=0x04680002 i=7 m=0040555 l=1 u=0 g=0 sz=1
251 4884/1: at = Mar 27 14:15:01 EST 2002 [ 1017256501 ]
252 4884/1: mt = Mar 27 14:15:01 EST 2002 [ 1017256501 ]
253 4884/1: ct = Mar 8 20:24:51 EST 2002 [ 1015637091 ]
254 4884/1: bsz=8192 blks=1 fs=autofs
255 4884/1: acl("./xfn", GETACLCNT, 0, 0x00000000) = 4
256 4884/1: lstat64("./vol", 0xFFBEFAC0) (sleeping...)
257 4884/1: lstat64("./vol", 0xFFBEFAC0) Err#131 ECONNRESET
258 4884/1: Received signal #2, SIGINT [default]
259 4884/1: *** process killed ***
Err#131 ECONNRESET means ‘Connection reset by peer’: the connection was forcibly closed by the peer. This normally results from the loss of the connection on the remote host because of a timeout or a reboot.
Follow the instructions below to troubleshoot:
When “ls -la /” hangs, check the /etc/mnttab file for the PID associated with /vol, then run “ps -ef | grep vol-PID”. If it does not come back with any processes, the process recorded for /vol is no longer running. Use the commands below to get the PID and check for it:
# grep "/vol" /etc/mnttab
moe:vold(pid222) /vol nfs ignore,dev=39c0001 1017179906
# ps -ef | grep 222 <== returns no output
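The same check can be scripted. The sketch below is only an illustration: the function names vol_pid and check_vol_mount are hypothetical, it assumes a POSIX shell (for example ksh on Solaris), and it assumes the first mnttab field has the form host:vold(pidNNN) as in the sample entry.

```shell
#!/bin/sh
# Hypothetical helper: print the PID recorded in the mnttab entry for /vol.
# Assumes the first field looks like "moe:vold(pid222)".
vol_pid() {
    mnttab="${1:-/etc/mnttab}"
    awk '$2 == "/vol" { print $1 }' "$mnttab" |
        sed -n 's/.*(pid\([0-9][0-9]*\)).*/\1/p'
}

# Report whether the recorded PID still belongs to a running process.
check_vol_mount() {
    pid=$(vol_pid "$1")
    if [ -z "$pid" ]; then
        echo "no /vol entry with a PID found"
    elif ps -p "$pid" >/dev/null 2>&1; then
        echo "pid $pid is still running"
    else
        echo "pid $pid is not running"
    fi
}
```

Run check_vol_mount with no argument to read /etc/mnttab. Using ps -p avoids the false matches that “ps -ef | grep 222” can produce when 222 appears elsewhere in the process listing.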
The real solution is to unmount /vol:
# umount /vol
You may have to force the unmount. The -f option (forcibly unmount) is only available in the Solaris 8 Operating Environment and later.
# umount -f /vol
If you are running a release earlier than Solaris 8, you may have to reboot to clear the “ls -la” hang.
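The sequence above (normal unmount, then forced unmount, then reboot as a last resort) can be wrapped in a small function. This is a sketch under assumptions: release_mount and the UMOUNT override variable are hypothetical names introduced here, and a POSIX shell is assumed.

```shell
#!/bin/sh
# UMOUNT is a hypothetical override for testing; it defaults to the real umount.
UMOUNT="${UMOUNT:-umount}"

# Hypothetical wrapper: try a normal unmount first, then a forced one.
release_mount() {
    mnt="$1"
    if $UMOUNT "$mnt" 2>/dev/null; then
        echo "unmounted $mnt"
    elif $UMOUNT -f "$mnt" 2>/dev/null; then
        echo "force-unmounted $mnt"
    else
        echo "could not unmount $mnt; a reboot may be needed"
        return 1
    fi
}
```

As root, “release_mount /vol” would attempt both forms of umount in turn. On a release without the -f option the second attempt simply fails and the function reports that a reboot may be needed.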
Note: Check the /var/statmon/sm and /var/statmon/sm.bak directories to see whether there is still a connection open for this server. If there is, the system may try to remount the filesystem after a reboot. The system will not try to remount if the umount command succeeds.
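One way to look for such leftover entries is sketched below. The function name statmon_entries is hypothetical, a POSIX shell is assumed, and the sketch assumes that the files in these directories are named after the remote host.

```shell
#!/bin/sh
# Hypothetical helper: list files under sm and sm.bak whose names
# contain the given server name. The base directory can be overridden.
statmon_entries() {
    server="$1"
    base="${2:-/var/statmon}"
    for dir in "$base/sm" "$base/sm.bak"; do
        [ -d "$dir" ] || continue
        for f in "$dir"/*; do
            [ -e "$f" ] || continue          # skip an unmatched glob
            case "${f##*/}" in
                *"$server"*) echo "$f" ;;
            esac
        done
    done
}
```

For example, “statmon_entries remote-host” (where remote-host stands for the name of the system that exported the CD) prints any matching entries and prints nothing if none remain.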