Solaris Troubleshooting: CDE login problems
1.) Check the related logfiles for error messages:
- /var/dt/Xerrors (note that dtlogin recycles this file; in particular, it may be empty after dtlogin has been restarted)
- $HOME/.dt/*log* (startlog, startlog.old, startlog.older, errorlog, errorlog.old, and errorlog.older)
- /var/adm/messages
- /var/adm/messages on the home directory server, especially if users with a local home directory can log in, but users with a mounted home directory cannot log in
- On Sun Ray, also check /var/opt/SUNWut/log/messages.
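The logfiles listed above can be reviewed in one pass. The following is a minimal sketch, assuming ksh or another POSIX-style shell; the function name show_cde_logs and the 20-line tail are illustrative choices, not part of CDE.

```shell
# Hypothetical helper: print the last 20 lines of each CDE-related logfile.
# Usage: show_cde_logs HOME_DIR [LOGFILE ...]
show_cde_logs() {
    home_dir=$1
    shift
    # Extra logfiles given on the command line, then $HOME/.dt/*log*
    for f in "$@" "$home_dir"/.dt/*log*; do
        if [ -s "$f" ]; then
            echo "==== $f"
            tail -20 "$f"
        elif [ -f "$f" ]; then
            # Exists but has no content (e.g. Xerrors after a dtlogin restart)
            echo "==== $f (empty)"
        fi
        # Files that do not exist are skipped silently
    done
}
```

A possible call for the affected user would be: show_cde_logs "$HOME" /var/dt/Xerrors /var/adm/messages (on Sun Ray, add /var/opt/SUNWut/log/messages).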
2.) Get a detailed description of the problem:
There are many known root causes for CDE login failure. Identifying all factors that might have contributed to the failure is the most important step of a detailed investigation.
- Confirm that it is CDE login that is failing, rather than login to some application, the screenlock, or Gnome login.
- Does telnet/rlogin/ssh work? If this already fails (rather than being turned off), then this needs to be fixed first.
- Is there any corefile generated (check whether coreadm with global setid core dumps is enabled)?
- Is the dtgreet screen displayed at all, or does failure already occur before the dtgreet screen is displayed?
- The user enters name and password, and hits Return. At what point do they notice a problem? What exactly do they see on the screen? Does the CDE splash screen appear? The CDE backdrop? Some windows?
- Does the user get dropped back to the dtgreet screen, or is CDE login hanging? Does the user get any error message on the screen?
- What processes are running? This tells us how many (if any) of the user processes got started.
- On Sun Ray, what sequence of icons appears on the screen? If there is a green newt, what icons are displayed in addition to the green newt?
- Does CDE login fail for all locales, or only for specific locales, such as ISO8859-15 locales, or the Japanese, Korean, or Chinese locales?
- Does login using Gnome or OpenWindows software (if applicable) work?
- Is there a long timeout before CDE login fails (there are several 300-second timeouts where dtwm and other processes wait for ToolTalk requests to complete)? If yes, how long is that timeout?
- Is any “special” CDE login method used or enabled, such as remote CDE login, smartcard login (“smartcard -c enable”, or desktop.useSmartCard=True in /etc/smartcard/desktop.properties), or Sun Ray Non Smartcard Mobility (NSCM)?
- Are there other, similarly configured systems where CDE login still works? If yes, what are the differences?
- Is the problem related to certain authentication methods, such as LDAP?
- Is the problem intermittent, or strictly reproducible?
- Is there anything else known to be wrong on the same system?
3.) If you don’t get to the dtgreet screen at all:
- /var/dt/Xerrors is the most important logfile here.
- There may be a CDE misconfiguration, a screen resolution the monitor cannot display, missing framebuffer drivers, a wrong version of the framebuffer drivers, or missing or misconfigured framebuffer devices.
- Confirm in the ps output that dtgreet is not running. If dtgreet is running, the root cause is likely the framebuffer, the monitor, and/or the cable and adapters used to connect them.
- Check that the dtlogin master process (pid in /var/dt/Xpid) is up and running. If it is not running, check
– whether it has been turned off (“/usr/dt/bin/dtconfig -d”, or by removing /etc/rc2.d/S99dtlogin). On Solaris 10 without the “dtlogin smf patch” installed, an enabled dtlogin looks like this:
% svcs -a | grep dtlogin
legacy_run Apr_20 lrc:/etc/rc2_d/S99dtlogin
– Solaris 10 with the “dtlogin smf patch” installed:
% svcs -a | grep cde-login
online Jan_29 svc:/application/graphical-login/cde-login:default
– whether old dtlogin child processes are still running, but with parent process id 1, indicating that the dtlogin master process has crashed.
– on non-Sun Ray systems, whether dtlogin can be restarted from the command line.
- If dtchooser is used, check the configuration in /etc/dt/config/Xaccess or /usr/dt/config/Xaccess.
- Check whether the system is overloaded, or running out of swap space.
- Check the permissions of /usr/bin/login
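The check of the dtlogin master process described above can be sketched as follows. This is a minimal sketch, assuming ksh or another POSIX-style shell run as root; the function name dtlogin_master_status is hypothetical, and only the pid file location /var/dt/Xpid comes from the text.

```shell
# Hypothetical helper: report whether the dtlogin master process is alive.
# Usage: dtlogin_master_status [PIDFILE]   (default: /var/dt/Xpid)
dtlogin_master_status() {
    pidfile=${1:-/var/dt/Xpid}
    if [ ! -r "$pidfile" ]; then
        echo "no pid file: $pidfile"
        return 2
    fi
    # First line of the pid file; empty if the file has no content
    read pid < "$pidfile" || pid=
    # kill -0 sends no signal; it only tests whether the process exists.
    # Without sufficient privileges it also fails for processes that exist,
    # which is why this sketch assumes it is run as root.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "dtlogin master running (pid $pid)"
    else
        echo "dtlogin master NOT running (stale pid: $pid)"
        return 1
    fi
}
```

A non-zero return value means that the pid file is missing or that the recorded process no longer exists.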
If the customer has a similar system where CDE login works fine:
- Check differences, especially CDE configuration files, and patch level.
- Mount /usr, and/or /etc/dt, from the system where CDE login works fine.
Check whether the problem is user specific:
- Does the same problem occur for root?
- If CDE login fails for the root user too, this rules out most file permission issues, and it rules out misbehaviour of the home directory server or any other problem with the home directory file system.
- Does the same problem occur for a newly created local user with a completely empty $HOME (no hidden dot-files)?
- If yes, this rules out user environment issues caused by the user’s dot-files, and it rules out misbehaviour of the home directory server.
- Beware of multiple users sharing the same UID, and of startup scripts which change the UID during login.
If all users are affected by the problem:
- Check system CDE startup scripts, especially those in /etc/dt/config and /usr/dt/config. If custom startup scripts exist in /etc/dt/config, check whether the problem persists if they are moved away.
- Check /etc/profile and /etc/.login, which get sourced before the users’ .profile/.login.
- Check that rpc.ttdbserverd is up and running (if it is turned on in inetd.conf; if rpc.ttdbserverd is simply turned off, that’s OK on Solaris 7 and higher). On Solaris 10, check as follows:
% svcs -a | grep rpc_tcp
online Apr_20 svc:/network/rpc-100083_1/rpc_tcp:default
- Ensure that the /etc/hosts file contains an entry for localhost, as shown below, and an entry for the machine name.
127.0.0.1 localhost
- Check that the IP address listed for the machine in /etc/hosts matches the IP address configured on the interface. Run “ifconfig -a” to check the interfaces’ IP addresses. Verify that the first name after the machine’s IP address in /etc/hosts is the same as the name defined in /etc/hostname.*, where * is the name of the interface.
- Check that rpcbind and automountd are running.
- Check whether some file systems are full.
- Check /tmp for old /tmp/dtdbcache* files which didn’t get cleaned up.
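The /etc/hosts checks above (an entry for localhost and an entry for the machine name) can be sketched as a small lookup. This is a minimal sketch, assuming ksh or another POSIX-style shell; the function name hosts_has_name is hypothetical. It prints the address of every active line that maps the given name, so the result can be compared with the “ifconfig -a” output.

```shell
# Hypothetical helper: print the address(es) that a hosts file maps NAME to.
# Returns 0 if at least one active (non-comment) entry was found, else 1.
# Usage: hosts_has_name NAME [HOSTS_FILE]   (default: /etc/hosts)
hosts_has_name() {
    name=$1
    file=${2:-/etc/hosts}
    found=1
    while read addr names; do
        # Skip blank lines and lines that start with a comment
        case $addr in ''|'#'*) continue ;; esac
        # Drop a trailing comment, then compare each name on the line
        for n in ${names%%#*}; do
            if [ "$n" = "$name" ]; then
                echo "$addr"
                found=0
            fi
        done
    done < "$file"
    return $found
}
```

Possible calls would be hosts_has_name localhost and hosts_has_name "$(uname -n)"; more than one output line for the same name points to conflicting entries.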
If only some users are affected by the problem, check file permissions and user environment.
- $HOME/.dt/*log* are very important logfiles here.
- Does the user have full read and write access rights on their home directory, especially on $HOME/.Xauthority and $HOME/.TTauthority?
- Check whether the user’s quota is full or whether the file system which carries the user’s home directory is full.
- Does backing out $HOME/.dt help (mv $HOME/.dt $HOME/.dt_backup)?
- If yes, check for any user script holding up the session manager execution flow (by waiting for input, etc.), and check for any conflict with startup applications (Oracle, etc.).
- Does backing out the user’s dot-files help?
- If yes, check which of the startup files causes the login failure.
- Beware of interactive prompts in dot-files executed on login.
- Check /tmp for old /tmp/dtdbcache* files which didn’t get cleaned up.
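Leftover dtdbcache files can be located with find. This is a minimal sketch, assuming ksh or another POSIX-style shell; the function name stale_dtdbcache and the default age threshold of one day are assumptions, since the text does not say how old a file must be to count as stale.

```shell
# Hypothetical helper: list dtdbcache files that were not modified recently.
# Usage: stale_dtdbcache [DIR] [DAYS]   (defaults: /tmp and 1)
stale_dtdbcache() {
    dir=${1:-/tmp}
    days=${2:-1}
    # -mtime +N matches files whose age in whole days is greater than N
    find "$dir" -name 'dtdbcache*' -type f -mtime +"$days" -print
}
```

The sketch only lists the files; whether they can be removed safely should be decided after checking that no CDE session of the owning user is still running.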
If only users with a mounted home directory are affected:
- If only non-local users are affected, check whether the home directory server is up and reachable.
- Check for dramatic date/time differences to the home directory server.
- If only some users with a mounted home directory are affected, check whether different file systems are used for their home directories.
- Check the mount point for the home directories.
- If the home directory server’s rpc.ttdbserverd is turned on in the inetd.conf, check whether it is down or hanging (then dtwm will not come up on Solaris 7 and higher).
- Check that both hostname and IP address of the home directory server can be resolved all the time.
- Check whether unusual naming services are used.
- Have a look at the Solaris version and patch level of the home directory server.
- If applicable, create a NIS user with a local home directory, and check whether this user can log in.
If remote CDE login fails:
You will need to have a look at both the display system and the remote system.
- Check whether users can log in at the console.
- Check /usr/dt/config/Xaccess, or /etc/dt/config/Xaccess if it exists.
- Check whether any special startup method for remote CDE login is used.
- Which OS and X client are used for remote CDE login, and which OS is the login target? Remote CDE login issues can be hardware specific!
- If remote login from a PC X-Client fails, to rule out configuration issues and bugs of the PC X-Client, check whether remote login from a Solaris system works.
- Check whether IP address and hostname of the display system can be resolved.
- The problem might be a font or font server issue.
- Enabling smartcard login turns off CDE remote login.
4.) In depth troubleshooting methods:
Before the user logs in, attach a “truss -edaf” to the dtlogin child process (you can add -u to get library calls as well).
Example:
# ps -ef | grep dtgreet
root 3136 3083 0 15:44:31 ? 0:00 dtgreet -display :2 <-----
root 3264 3263 0 15:59:38 pts/6 0:00 grep dtgreet
# ptree 3136
988 /usr/dt/bin/dtlogin -daemon
3083 /usr/dt/bin/dtlogin -daemon <----- dtlogin child process
3136 dtgreet -display :2
# truss -edaf -o /var/tmp/truss.out -p 3083
In the truss output, first check whether the Xsession script is reached. If it is, cross-reference the startlog, the truss output, and the /usr/dt/bin/Xsession script to identify which subprocess fails first.
Make sure that coredumpsize is not set to 0, and turn on coreadm with at least the following settings:
global core dumps: enabled
per-process core dumps: enabled
global setid core dumps: enabled
As root:
# mkdir /var/core
# coreadm -e global -e global-setid -g /var/core/%f.%t
# coreadm
global core file pattern: /var/core/%f.%t
init core file pattern: core
global core dumps: enabled
per-process core dumps: enabled
global setid core dumps: enabled
per-process setid core dumps: disabled
global core dump logging: disabled
To turn off coreadm again:
# coreadm -d global -d global-setid
# coreadm
global core file pattern: /var/core/%f.%t
init core file pattern: core
global core dumps: disabled
per-process core dumps: enabled
global setid core dumps: disabled
per-process setid core dumps: disabled
global core dump logging: disabled
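The coredumpsize check mentioned above uses the csh limit name; in sh and ksh the same limit is shown by “ulimit -c”. The following is a minimal sketch, assuming ksh or another POSIX-style shell; the function name core_limit_ok is hypothetical.

```shell
# Hypothetical helper: succeed if the core file size limit is non-zero.
# Resource limits are inherited, so run this in the environment that
# starts the failing process, not just in an unrelated root shell.
core_limit_ok() {
    limit=$(ulimit -c)
    if [ "$limit" = "0" ]; then
        echo "core file size limit is 0: no core files will be written"
        return 1
    fi
    echo "core file size limit: $limit"
}
```

One way to use it is to call the function from the same startup file that is suspected of lowering the limit.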
Add debug code to the /usr/dt/bin/Xsession script, or to $HOME/.dtprofile.
Example 1: to turn on ToolTalk trace mode, add the following lines at the end of $HOME/.dtprofile:
DTSOURCEPROFILE=true
dtstart_ttsession="$DT_BINPATH/ttsession -t;tttrace -aFo /tmp/tttrace.out"
Example 2: the Xsession script itself contains various examples of how to use the Log function to create debug output.
If you suspect a hostname resolution issue, turn on nscd debug mode:
% more /etc/nscd.conf
[…]
logfile /var/adm/nscd.log <---- uncomment
# enable-cache hosts no
debug-level 9 <---- set debug level here
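After editing, it is easy to overlook a line that is still commented out. The following is a minimal sketch, assuming ksh or another POSIX-style shell, that prints only the active logfile and debug-level settings; the function name nscd_debug_settings is hypothetical.

```shell
# Hypothetical helper: print the active (uncommented) logfile and
# debug-level settings from an nscd.conf-style file.
# Usage: nscd_debug_settings [CONF_FILE]   (default: /etc/nscd.conf)
nscd_debug_settings() {
    conf=${1:-/etc/nscd.conf}
    while read key value rest; do
        # Commented lines start with "#", so their first word never matches
        case $key in
            logfile|debug-level) echo "$key $value" ;;
        esac
    done < "$conf"
}
```

If the function prints nothing, neither setting is active in the file.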