
Solaris Troubleshooting NFS: ls -la hangs on root (/)

Running "ls -la /" hangs, yet running "ls -la" on directories under root (e.g. ls -la /usr) does NOT hang. The system log (/var/adm/messages) shows NFS-related errors, even though this system is NOT a true NFS client.

 

Here are some sample errors that may appear in the /var/adm/messages file when "ls -la /" hangs:

Mar 28 09:23:19 moe nfs: [ID 333984 kern.notice] NFS server for volume management (/vol) not responding still trying
Mar 28 09:31:13 moe nfs: [ID 664466 kern.notice] NFS getattr failed for server for volume management (/vol): error 23 (RPC: Unitdata error)
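
On a busy system the log can be long, so it may help to filter it down to the NFS lines that matter. The following is a minimal sketch, assuming the syslog layout of the two samples above; the function name nfs_errors is made up for illustration.

```shell
#!/bin/sh
# nfs_errors FILE
# Print kernel NFS messages that report an unresponsive server or a
# failed call. Assumes the " nfs: " tag seen in the sample log lines.
# (The function name is hypothetical, not a Solaris utility.)
nfs_errors() {
    awk '/ nfs: / && (/not responding/ || /failed/)' "$1"
}

# Example, run on the affected host:
#   nfs_errors /var/adm/messages
```

If the only server named in the output is "volume management (/vol)", the errors point at the volume manager mount rather than at an ordinary NFS mount.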

In this particular situation, the client had mounted a CD remotely from another system. That system was shut down before it unshared the CD and before the client could unmount the remote CD mount. The tail end of a truss trace shows that ls was hanging on /vol (line numbers are shown, and the hang is at lines 256-257):

# cd /
# truss -fall -vall -wall -rall ls -la


.
.
.
249   4884/1:                 lstat64("./xfn", 0xFFBEFAC0)                                       = 0
250   4884/1:                         d=0x04680002 i=7         m=0040555 l=1   u=0         g=0         sz=1
251   4884/1:                                 at = Mar 27 14:15:01 EST 2002   [ 1017256501 ]
252   4884/1:                                 mt = Mar 27 14:15:01 EST 2002   [ 1017256501 ]
253   4884/1:                                 ct = Mar   8 20:24:51 EST 2002   [ 1015637091 ]
254   4884/1:                         bsz=8192   blks=1         fs=autofs
255   4884/1:                 acl("./xfn", GETACLCNT, 0, 0x00000000)                   = 4
256   4884/1:                 lstat64("./vol", 0xFFBEFAC0)       (sleeping...)
257   4884/1:                 lstat64("./vol", 0xFFBEFAC0)                                       Err#131 ECONNRESET
258   4884/1:                         Received signal #2, SIGINT [default]
259   4884/1:                                 *** process killed ***
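
When the trace is long, the blocked call can be picked out by searching for the "(sleeping...)" marker that truss prints. The sketch below assumes the trace was saved to a file with truss -o and that the blocked call takes a quoted path as its first argument, as lstat64 does here; the function name blocked_paths and the file /tmp/ls.truss are made up for illustration.

```shell
#!/bin/sh
# blocked_paths FILE
# Print the first (quoted) argument of every call that truss marked as
# "(sleeping...)". Calls whose first argument is not a quoted string
# are skipped. (Hypothetical helper, not part of truss.)
blocked_paths() {
    sed -n 's/.*[(]"\([^"]*\)".*(sleeping.*/\1/p' "$1"
}

# Example: save the trace, interrupt it once it hangs, then inspect it.
#   cd /
#   truss -o /tmp/ls.truss -fall -vall -wall -rall ls -la
#   blocked_paths /tmp/ls.truss
```

For the trace above this would report ./vol, the same entry identified by reading lines 256-257.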


Err#131 ECONNRESET means "Connection reset by peer": the connection was forcibly closed by the peer. This normally results from the loss of the connection on the remote host because of a timeout or a reboot.


Follow the instructions below to troubleshoot:


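When "ls -la /" hangs, check the /etc/mnttab file for the PID associated with /vol, then run "ps -ef | grep vol-PID" (where vol-PID is the PID from the mnttab entry). If that does not come back with any processes, the vold process recorded for the /vol mount is no longer running. Use the commands below to get the PID and check it: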

 

# grep "/vol" /etc/mnttab

moe:vold(pid222)               /vol       nfs         ignore,dev=39c0001           1017179906

# ps -ef | grep 222        <== returns no output
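
The same check can be scripted so that the PID is read from the mnttab entry instead of being typed by hand. This is a minimal sketch, assuming the vold(pidN) form shown in the entry above; the function names vold_pid and pid_alive are made up for illustration.

```shell
#!/bin/sh
# vold_pid MNTTAB
# Print the PID embedded in the first field of the /vol entry,
# e.g. "moe:vold(pid222)" gives 222. (Hypothetical helper.)
vold_pid() {
    awk '$2 == "/vol" { print $1 }' "$1" |
        sed -n 's/.*(pid\([0-9][0-9]*\)).*/\1/p'
}

# pid_alive PID
# Succeed if a process with exactly this PID exists.
pid_alive() {
    ps -p "$1" >/dev/null 2>&1
}

# Example, run on the affected host:
#   pid=$(vold_pid /etc/mnttab)
#   if pid_alive "$pid"; then
#       echo "vold (pid $pid) is running"
#   else
#       echo "vold (pid $pid) is gone; the /vol mount is likely stale"
#   fi
```

Using ps -p rather than grep avoids false matches, since grep 222 also matches any ps line that merely contains those digits.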

The real solution is to unmount /vol:

# umount /vol

You may have to force the unmount. The -f option (forced unmount) is available only in the Solaris 8 Operating Environment and later releases.

# umount -f /vol

If you are running a release older than Solaris 8, you may have to reboot to clear the "ls -la" hang.

Note: Check the /var/statmon/sm and /var/statmon/sm.bak directories to see whether an entry for this server is still present. If there is one, the system may try to remount the filesystem after a reboot. The system will not try to remount it if the umount command succeeds.
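
The check in the note can be done in one pass over both directories. A minimal sketch, assuming the default /var/statmon location named above; the function name statmon_entries is made up for illustration.

```shell
#!/bin/sh
# statmon_entries [BASE]
# List every entry found under BASE/sm and BASE/sm.bak.
# BASE defaults to /var/statmon. (Hypothetical helper.)
statmon_entries() {
    base=${1:-/var/statmon}
    for d in "$base/sm" "$base/sm.bak"; do
        [ -d "$d" ] || continue
        for f in "$d"/*; do
            # An unmatched glob stays literal, so test before printing.
            if [ -e "$f" ]; then
                echo "$f"
            fi
        done
    done
}

# Example, run on the affected host:
#   statmon_entries | grep moe
```

No output means neither directory holds an entry, which is the state to expect after a clean unmount.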

Knowledge article shared by: Ramdev

I started unixadminschool.com (aka gurkulindia.com) in 2009 as my own personal reference blog. Some time later I realized that my learnings might be helpful to other Unix admins if I managed my knowledge base in a more user-friendly format. The result is today's unixadminschool.com.



8 Responses to Solaris Troubleshooting NFS: ls -la hangs on root (/)

  1. Saurabh on December 23, 2011 at 5:50 pm

    Hi Ram, could you please explain the switches in the command
    # truss -fall -vall -wall -rall ls -la
    Can this be used every time for the above symptoms?

  2. Gowtham on December 25, 2011 at 6:52 am

    Hi Ramdev,

    I am using Solaris 10 in VMware (Workstation 5 compatibility) and connecting to it from XShell on Windows. I am facing a problem here.

    After I connect to the Solaris system, the session disconnects after a few seconds. What might be the problem? Can you help me?

    Thanks in advance..
    Gowtham.

    • Ramdev on December 25, 2011 at 9:16 am

      @Gowtham: 1. Please check whether you can ping your VMware Solaris IP. If it does not ping, your Windows firewall may be blocking you. 2. If it pings, check whether XShell is using telnet or ssh. In Solaris 10, telnet is disabled by default. 3. If you don't know whether it is telnet or ssh, just download PuTTY and connect to the Solaris IP.

  3. Yogesh Raheja on December 25, 2011 at 8:19 am

    @Gowtham, are you using Windows 7?

  4. Sayan M on December 26, 2011 at 3:13 am

    This post is helpful for troubleshooting stale mount issues in an environment where NFS is used a lot and stale mounts have been created by network or power outages (for example, a large setup with NAS/filers or centralized home directory automounts).

    Good Post and thanks for sharing

    • Ramdev on December 26, 2011 at 3:37 am

      Hi Sayan, thanks for explaining with a good example.

  5. MS on December 30, 2011 at 4:37 pm

    helpful note…
