Solaris Troubleshooting – Common Error Messages

Below is a reference to common Solaris error messages, with the likely cause of each and tips for troubleshooting it.

***** FILE SYSTEM WAS MODIFIED *****

Cause This comment from the fsck(1M) command tells you that it changed the filesystem it was checking.

Action If fsck was checking the root filesystem, reboot the system immediately to avoid corrupting the / partition. If fsck was checking a mounted filesystem, unmount that filesystem and run fsck again, so that work done by fsck is not undone when in-memory file tables are written out to disk.

** Phase 1 – Check Blocks and Sizes

Cause The fsck(1M) command is checking the filesystem shown in the messages that are displayed before this one. The first phase checks the inode list, finds bad or duplicate blocks, and verifies the inode size and format.

Action If more than a dozen errors occur during this important phase, you might want to restore the filesystem from backup tapes. Otherwise it is fine to proceed with fsck.

** Phase 1b – Rescan For More DUPS

Cause The fsck(1M) command detected duplicate blocks while checking a filesystem, so fsck is rescanning the filesystem to find the inode that originally claimed that block.

Action If fsck executes this optional phase, you will see additional DUP/BAD messages in phases 2 and 4.

** Phase 2 – Check Pathnames

Cause The fsck(1M) command is checking a filesystem, and fsck is now removing directory entries pointing to bad inodes that were discovered in phases 1 and 1b. This phase might ask you to remove files, salvage directories, fix inodes, reallocate blocks, and so on.

Action If more than a dozen errors occur during this important phase, you might want to restore the filesystem from backup tapes. Otherwise it is fine to proceed with fsck.

** Phase 3 – Check Connectivity

Cause The fsck(1M) command is checking a filesystem, and fsck is now verifying the integrity of directories. You might be asked to adjust, create, expand, reallocate, or reconnect directories.

Action You can usually answer yes to all these questions without harming the filesystem.

** Phase 4 – Check Reference Counts

Cause The fsck(1M) command is checking a filesystem, and fsck is now checking link count information obtained in phases 2 and 3. You might be asked to clear or adjust link counts.

Action You can usually answer yes to all these questions without harming the filesystem.

** Phase 5 – Check Cyl groups

Cause The fsck(1M) command is checking a filesystem, and fsck is now checking the free-block and used-inode maps. You might be asked to salvage free blocks or summary information.

Action You can usually answer yes to all these questions without harming the filesystem.

451 timeout waiting for input during variable

Cause When sendmail(1M) reads from anything that might time out, such as an SMTP connection, it sets a timer to the value of the r processing option before reading begins. If the read doesn’t complete before the timer expires, this message appears and reading stops. (Usually this is during RCPT.) The mail message is then queued for later delivery.

Action If you see this message often, increase the value of the r processing option in the /etc/mail/sendmail.cf file. If the timer is already set to a large number, look for hardware problems such as poor network cabling or connections.

See Also For more information about setting the timer, see the section describing the

550 variable… Host unknown

Cause This sendmail(1M) message indicates that the destination host machine, specified by the address portion after the @ (at-sign), was not found during DNS (Domain Name System) lookup.

Action Use the nslookup(1M) command to verify that the destination host exists in that or other domains, perhaps with a slightly different spelling. Failing that, contact the intended recipient and ask for a proper address.

Sometimes this return message indicates that the intended host is merely down, rather than unknown. If a DNS record contains an unknown alternate host, and the primary host is down, sendmail returns a “Host unknown” message from the alternate host.

For uucp mail addresses, the “Host unknown” message probably means that the destination hostname is not listed in the /etc/uucp/Systems file.

Technical Notes Returning “Host unknown” for a host that is merely down is a known sendmail version 8.6.7 bug.

550 variable… User unknown

Cause This sendmail(1M) message indicates that the intended recipient, specified by the address portion before the @ (at-sign), could not be located on the destination host machine.

Action Check the e-mail address and try again, perhaps with a slightly different spelling. If this doesn’t work, contact the intended recipient and ask for a proper address.

554 variable… Local configuration error

Cause This sendmail(1M) message usually indicates that the local host is trying to send mail to itself.

Action Check the value of the $j macro in the /etc/mail/sendmail.cf file to ensure that this value is a fully-qualified domain name.

Technical Notes When the sending system provides its hostname to the receiving system (in the SMTP HELO command), the receiving system compares its name to the sender’s name. If these are the same, the receiving system issues this error message and closes the connection. The name provided in the HELO command is the value of the $j macro.

A

A command window has exited because its child exited.

Cause The argument to a cmdtool(1) or a shelltool(1) window looks like it is supposed to be a command, but the system cannot find the command.

Action To run this command inside a cmdtool or a shelltool, make sure the command is spelled correctly and is in your search path (if necessary, use a full path name). If you intended this argument as an option setting, use a minus sign (-) at the beginning of the option.

Technical Notes Both the cmdtool and the shelltool are OpenWindows terminal emulators.

admintool: Received communication service error 4

Cause AdminTool could not start a display method because a remote procedure call timed out, so it could not send the request. This error results when admintool tries to access the NIS or NIS+ tables when networking is not enabled.

Action Verify the system network status with ifconfig -a to make sure the system is connected to the network. Make sure the Ethernet cable is connected and the system is configured to run NIS or NIS+.

answerbook: XView error: NULL pointer passed to xv_set

Cause The AnswerBook navigator window comes up, but the document viewer window does not. This message appears on the console, and the message “Could not start new viewer” appears in the navigator window. This situation indicates that you have an unknown client or a problem with the network naming service.

Action Run the ypmatch(1) or nismatch(1) command to determine if the client hostname is in the hosts map. If it isn’t, add it to the NIS hosts map on the NIS master server. Then make sure the /etc/hosts file on the client contains an IP address and entry for that hostname followed by loghost (reboot if you changed the /etc/hosts file). Check that the ypmatch or nismatch client hosts command returns the same IP host address as in the /etc/hosts file. Finally, quit all existing AnswerBooks and restart.

Arg list too long

Cause The system could not handle the number of arguments given to a command or program when it combined those arguments with the environment’s exported shell variables. The argument list limit is the size of the argument list plus the size of the environment’s exported shell variables.

Action The easiest solution is to reduce the size of the parent process environment by unsetting extraneous environment variables. (See the man page for the shell you’re using to find out how to list and change your environment variables.) Then run the program again.

Technical Notes An argument list longer than ARG_MAX bytes was presented to a member of the exec() family of system calls.

The symbolic name for this error is E2BIG, errno=7.
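To see how much room the environment is taking, it can help to compare its size with the system limit. The following is a minimal sketch, assuming a POSIX-style shell (on Solaris, ksh or /usr/xpg4/bin/sh rather than the legacy /bin/sh) and that getconf(1) is available; the variable names are illustrative only.

```shell
#!/bin/sh
# Limit (in bytes) on the argument list plus environment passed to exec().
arg_max=$(getconf ARG_MAX)

# Approximate size of the exported environment, in bytes.
# tr strips the padding that some wc implementations add.
env_bytes=$(env | wc -c | tr -d ' ')

echo "ARG_MAX:     $arg_max"
echo "environment: $env_bytes"
echo "headroom:    $((arg_max - env_bytes))"
```

If the headroom is small, unsetting large exported variables before rerunning the command is the first thing to try.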

Argument out of domain

Cause This is a programming error or a data input error.

Action Ask the program’s author to fix this condition, or supply data in a different format.

Technical Notes This indicates an attempt to evaluate a mathematical programming function at a point where its value is not defined. The argument of a programming function in the math package (3M) is out of the domain of the function. This could happen when taking the square root, power, or log of a negative number, when computing a power to a non-integer, or when passing an out-of-range argument to a hyperbolic programming function.

To help pinpoint a program’s math errors, use the matherr(3M) facility.

The symbolic name for this error is EDOM, errno=33.

Arguments too long

Cause This C shell error message indicates that there are too many arguments after a command. For example, this can happen by invoking rm * in a huge directory. The C shell cannot handle more than 1706 arguments.

Action Temporarily start a Bourne shell with sh and run the command again. The Bourne shell dynamically allocates command line arguments. Return to your original shell by typing exit.
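As an alternative to switching shells (this is not part of the advice above, only a common workaround), find(1) can pass the file names to rm in batches, so the shell never has to expand a huge argument list. The sketch below assumes a POSIX-style shell and a find that supports the -exec ... {} + form; the scratch directory and the file names are made up for the demonstration.

```shell
#!/bin/sh
# Build a scratch directory holding many files (names are hypothetical).
dir=$(mktemp -d)
i=0
while [ "$i" -lt 2000 ]; do
    : > "$dir/file$i"
    i=$((i + 1))
done

# find hands the names to rm in batches that fit the argument limit,
# so no single command line ever carries all 2000 names.
find "$dir" -type f -name 'file*' -exec rm -f {} +

remaining=$(find "$dir" -type f | wc -l | tr -d ' ')
echo "files remaining: $remaining"
rmdir "$dir"
```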

assertion failed: variable, file variable, line number

Cause A condition in the program that was never expected to happen has happened.

Action Contact the vendor or author of the program to ask why it failed. If you have the source code for the program, you can look at the file and line number where the assertion failed. This might give you an idea of how to run the program differently.

Technical Notes This message results from a diagnostic macro called assert() that a programmer inserted into the specified line of a source file. The expression that evaluated to false precedes the file name and line number.

automountd[number]: No network locking on variable: contact admin to install server change

Action See the similar message “WARNING: No network locking on variable: contact admin to install server change” for details. If the server is not changed, data loss is possible in applications that depend on locking.

automountd[number]: server variable not responding

Cause This automounter message indicates that the system tried to mount a filesystem from an NFS server that is either down or extremely slow to respond. In some cases this message indicates that the network link to the NFS server is broken, although that condition produces other error messages as well.

Action If you are the system administrator responsible for the non-responding NFS server, check it out to see whether the machine needs repair or rebooting. Encourage your user community to report such problems quickly but only once. When the NFS server is back in operation, the automounter will be able to access the requested filesystem.

automount[number]: variable: Not a directory

Cause The file specified after the first colon is not a valid mount point because it is not a directory.

Action Ensure that the mount point is a directory, and not a regular file or a symbolic link.
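A quick way to classify a candidate mount point before handing it to the automounter is sketched below. It assumes a POSIX-style shell; check_mount_point is a hypothetical helper written for this example, not an automounter command, and /mnt is only a sample path.

```shell
#!/bin/sh
# check_mount_point PATH: report what kind of object PATH is.
# The symlink test comes first because [ -d ] follows symbolic links.
check_mount_point() {
    if [ -L "$1" ]; then
        echo "symlink"
    elif [ -d "$1" ]; then
        echo "directory"
    elif [ -e "$1" ]; then
        echo "not a directory"
    else
        echo "missing"
    fi
}

check_mount_point /mnt
```

Only a result of "directory" indicates a usable mount point.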

B

Bad address

Cause The system encountered a hardware fault in attempting to access a parameter of a programming function.

Action Check if the bad address resulted from supplying the wrong device or option to a command. If that is not the problem, contact the vendor or author of the program for an update.

Technical Notes This error could occur any time a function that takes a pointer argument is passed an invalid address. Because processors differ in their ability to detect bad addresses, on some architectures passing bad addresses can result in undefined behaviors.

The symbolic name for this error is EFAULT, errno=14.

BAD/DUP FILE I=i OWNER=o MODE=m SIZE=s MTIME=t CLEAR?

Cause While checking inode link counts during phase 4, fsck(1M) found a file (or directory) that either does not exist or exists somewhere else.

Action To clear the inode of its reference to this file or directory, answer yes. With the -p (preen) option, fsck automatically clears bad or duplicate file references, so answering yes to this question seldom causes a problem.

Bad file number

Cause Generally this is a program error, not a usage error.

Action Contact the vendor or author of the program for an update.

Technical Notes Either a file descriptor refers to no open file, or a read (or write) request is made to a file that is open only for writing (or reading).

The symbolic name for this error is EBADF, errno=9.

number BAD I=number

Cause Upon detecting an out-of-range block, fsck(1M) prints the bad block number and its containing inode (after I=).

Action In fsck phases 2 and 4, you will decide whether or not to clear these bad blocks. Before committing to repair with fsck, you could determine which file contains this inode by passing the inode number to the ncheck(1M) command:

# ncheck -i inum filesystem

bad module/chip at: variable

Cause This message from the memory management system often appears with parity errors, and indicates a bad memory module or chip at the position listed. Data loss is possible if the problem occurs other than at boot time.

Action Replace the memory module or chip at the indicated position. Refer to the vendor’s hardware manual for help finding this location.

BAD SUPER BLOCK: variable

Cause This message from fsck(1M) indicates that a filesystem’s super-block is damaged beyond repair and must be replaced. At boot time (with the -p option) this message is prefaced by the filesystem’s device name. After this message comes the actual damage recognized (see Action). Unfortunately fsck does not print the number of the damaged super-block.

Action The most common cause of this error is overlapping disk partitions. Do not immediately rerun fsck as suggested by the lines that display after the error message. First make sure that you have a recent backup of the filesystem involved; if not, try to back up the filesystem now using ufsdump(1M). Then run the format(1M) command, select the disk involved, and print out the partition information.

# format

: N

> partition

> print

Note whether the overlap occurs at the beginning or end of the filesystem involved. Then run newfs(1M) with the -N option to print out the filesystem parameters, including the location of backup super-blocks.

# newfs -N /dev/dsk/device

Select a super-block from a non-overlapping area of the disk, but note that in most cases you have only one chance to select the proper replacement super-block, which fsck soon propagates to all the cylinders. If you select the wrong replacement super-block, data corruption will probably occur, and you will have to restore from backup tapes. After you select a new super-block, provide fsck with the new master super-block number:

# fsck -o b=NNNN /dev/dsk/device

Technical Notes Specific reasons for a damaged super-block include: a wrong magic number, out of range NCG (number of cylinder groups) or CPG (cylinders per group), the wrong number of cylinders, a preposterously large super-block size, and trashed values in the super-block. These reasons are generally not meaningful because a corrupt super-block is usually extremely corrupt.

BAD TRAP

Cause A bad trap can indicate faulty hardware or a mismatch between hardware and its configuration information. Data loss is possible if the problem occurs other than at boot time.

Action If you recently installed new hardware, verify that the software was correctly configured. Check the kernel traceback displayed on the console to see which device generated the trap. If the configuration files are correct, you will probably have to replace the device.

In some cases, the bad trap message indicates a bad or down-rev CPU.

Technical Notes A hardware processor trap occurred, and the kernel trap handler was unable to restore system state. This is a fatal error that usually precedes a panic, after which the system performs a sync, dump, and reboot. The following conditions can cause a bad trap: a system text or data access fault, a system data alignment error, or certain kinds of user software traps.

bad trap = number

Action See the message “BAD TRAP” for details.

/bin/sh: variable: too big

Cause This Bourne shell message indicates a classic “no memory” error. While trying to load the program specified after the first colon, the shell noticed that the system ran out of virtual memory (swap space).

Action See the message “Not enough space” for information on reconfiguring your system to add more swap space.

Block device required

Cause A raw (character special) device was specified where a block device was required, such as during a call to the mount(1M) command.

Action To see which block devices are available, use ls -l to look in /devices. Then specify a block device instead of a character device. Block device modes start with a b, whereas raw character device modes start with a c.
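The same distinction can be checked from a script with the shell's file tests rather than by reading ls -l output. This is a minimal sketch assuming a POSIX-style shell; device_kind is a hypothetical helper, and /dev/null is used only because it is a character device on typical Unix systems.

```shell
#!/bin/sh
# device_kind PATH: classify PATH as a block device, character device, or other.
# Both tests follow symbolic links, such as those from /dev into /devices.
device_kind() {
    if [ -b "$1" ]; then
        echo "block"
    elif [ -c "$1" ]; then
        echo "character"
    else
        echo "other"
    fi
}

device_kind /dev/null
```

A path that reports "character" is the raw device; mount(1M) needs the one that reports "block".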

Technical Notes The symbolic name for this error is ENOTBLK, errno=15.

Boot device: /iommu/sbus/variable/variable/sd@3,0

Cause This message always appears at the beginning of rebooting. If there is a problem, the system hangs, and no other messages appear. This condition is caused by conflicting SCSI targets for the boot device, which is almost always target 3.

Action The boot device is usually the machine’s internal disk drive, target 3. Make sure that external and secondary disk drives are targeted to 1, 2, or 0, and do not conflict with each other. Also make sure that tape drives are targeted to 4 or 5, and CD drives to 6, avoiding any conflict with each other or with the disk drives. You can set a device’s target number using pushbutton switches or a dial on the back near the SCSI cables. If the targeting of the internal disk drive is in question, check it by powering off the machine, removing all external drives, turning the power on, and running the probe-scsi-all or probe-scsi command from the PROM monitor.

Broadcast Message from root (pts/number) on server [date]

Cause This message from the wall(1M) command gets transmitted to all users logged into a system. You could see it during an rlogin or telnet session, or on terminals connected to a timesharing system.

Action Carefully read the broadcast message. Often this broadcast is followed by a shutdown warning.

See the message “The system will be shut down in number minutes” for details about system shutdown.

Broken pipe

Cause This condition is often normal, and the message is merely informational (as when piping many lines to the head program). The condition occurs when a write on a pipe does not find a reading process. This usually generates a signal to the executing program, but this message displays when the program ignores the signal.

Action Check the process at the end of the pipe to see why it exited.

Technical Notes The symbolic name for this error is EPIPE, errno=32.
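Both behaviors described above can be reproduced from a shell. The sketch below assumes a POSIX-style shell with yes(1) and head(1) available; the exact exit status and the wording of the error text vary between shells and systems, so neither is shown here.

```shell
#!/bin/sh
# Case 1: default signal handling. head exits after one line, and the
# writer (yes) is normally terminated by SIGPIPE on a later write.
first_line=$( (yes; echo "writer exit status: $?" >&2) | head -n 1 )
echo "reader got: $first_line"

# Case 2: SIGPIPE ignored in the writer. The failed write now returns an
# error (EPIPE) instead of killing the process, so echo reports the
# failure on stderr and the loop ends.
(trap '' PIPE; while echo y; do :; done) | head -n 1
```

In the first case the writer usually dies silently; in the second, the write error becomes visible because the signal no longer ends the process.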

Bus Error

Cause A process has received a signal indicating that it attempted to perform I/O to a device that is restricted or that does not exist. This message is usually accompanied by a core dump, except on read-only filesystems.

Action Use a debugger to examine the core file and determine what program fault or system problem led to the bus error. If possible, check the program’s output files for data corruption that might have occurred before the bus error.

Technical Notes Bus errors can result from either programming error or device corruption on your system. Some common causes of bus errors are: invalid file descriptors, unreasonable I/O requests, bad memory allocation, misaligned data structures, compiler bugs, and corrupt boot blocks.

C

Cannot allocate colormap entry for “variable”

Cause This message from libXt (X Intrinsics library) indicates that the system colormap was full even before the color name specified in quotes was requested. Some applications can continue after this message. Other applications, such as Workspace Properties Color, fail to come up when the colormap is full.

Action Exit the programs that make heavy use of the colormap, then restart the failed application and try again.

Can’t create public message device (Device busy)

Cause This message comes from the lp print scheduler, indicating that it is either extremely busy or hung.

Action If print jobs are coming out of the printer in question, wait until they are finished and then resubmit this print job. If you see this message again, the lp system is probably hung.

See the message “lp hang” for a procedure to clear the queue.

Technical Notes If lp is unable to create a device for printer messages, the message FIFO could be already in use, or locked by another print job.

Can’t invoke /etc/init, error number

Cause This message can appear while a system is booting, indicating that the init program is missing or corrupted. Note that /etc/init is a symbolic link to /sbin/init.

Action Boot the miniroot so you can replace init. Halt the machine by typing Stop-A or by pressing the reset button. Reboot single-user from CDROM, the net, or diskette. For example, type boot cdrom -s at the ok prompt to boot from CDROM. After the system comes up and gives you a # prompt, mount the device corresponding to the original / partition somewhere, with a command similar to the mount command below. Then copy the init program from the miniroot to the original / partition, and reboot the system.

# mount /dev/dsk/c0t3d0s0 /mnt

# cp /sbin/init /mnt/sbin/init

# reboot

If this doesn’t work, other files might be corrupted, and you might need to reinstall the entire system.

Technical Notes The error number is 2 if /sbin/init is missing, or 8 if /sbin/init has an incorrect executable format. This is usually followed by a “panic: icode” message. The system tries to reboot itself, but goes into a loop, because rebooting is impossible without init.

can’t synchronize with hayes

Cause This message sometimes appears when using a modem that the system regards as a “Hayes” type modem, which includes most modems manufactured today. The message can be caused by incorrect switch settings, by poor cable connections, or by not turning the modem on.

Action Check that the modem is on and that the cables between the modem and your system are securely connected. Check the internal and external modem switch settings. Turn the modem off and then on again, if necessary.

cd: Too many arguments

Cause The C shell’s cd(1) command takes only one argument. Either more than one directory was specified, or a directory name containing a space was specified. Directory names with spaces are easy to create with File Manager.

Action Use only one directory name. To change to a directory whose name contains spaces, enclose the directory name in double (“) or single (‘) quotes, or use File Manager.
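The quoting forms look like this in practice. The sketch is written for a POSIX-style shell, although double and single quotes work the same way for this simple case in the C shell; the directory name is made up for the example, and each cd runs in a subshell so the caller's working directory is left alone.

```shell
#!/bin/sh
# Create a scratch directory whose name contains a space.
base=$(mktemp -d)
mkdir "$base/My Documents"

# Unquoted, the name would be split into two arguments to cd.
( cd "$base/My Documents" && pwd )      # double quotes
( cd "$base"/'My Documents' && pwd )    # single quotes around the name
( cd "$base"/My\ Documents && pwd )     # backslash-escaped space
```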

Channel number out of range

Cause The system has run out of stream devices. This error results when a stream head attempts to open a minor device that does not exist or that is currently in use.

Action Check that the stream device in question exists and was created with an appropriate number of minor devices. Make sure that the hardware corresponds to this configuration. If the stream device configuration is correct, try again later when more system resources might be available.

Technical Notes The symbolic name for this error is ECHRNG, errno=37.

chmod: ERROR: invalid mode

Cause This message from the chmod(1) command indicates a problem in the first non-option argument.

Action If you are specifying a numeric file mode, you can provide any number of digits (although only the final one to four are considered), but all digits must be between 0 and 7. If you are specifying a symbolic file mode, use the syntax provided in the chmod usage message to avoid the “invalid mode” error message:

Usage: chmod [ugoa][+-=][rwxlstugo] file …

Note that some combinations of symbolic keyletters produce no error message but fail to have any effect. The first group, [ugoa], is truly optional. The second group, [+-=], is mandatory for chmod to have an effect. The third group, [rwxlstugo], is also mandatory for effect, and can be used in combination when that combination does not conflict.
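A few valid numeric and symbolic modes, followed by one that should be rejected, are sketched below. This assumes a POSIX-style shell and mktemp(1); the scratch file and the variable names are illustrative.

```shell
#!/bin/sh
f=$(mktemp)

chmod 644 "$f"     # numeric mode: rw-r--r--
chmod u+x "$f"     # symbolic mode: add execute permission for the owner
chmod go-r "$f"    # symbolic mode: remove read permission for group and others

# The first ten characters of ls -l show the file type and permission bits.
perms=$(ls -l "$f" | cut -c1-10)
echo "$perms"

# 8 is not an octal digit, so chmod should refuse this mode.
if chmod 648 "$f" 2>/dev/null; then
    rejected=no
else
    rejected=yes
fi
echo "mode 648 rejected: $rejected"
rm -f "$f"
```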

Command not found

Cause The C shell could not find the program you gave as a command.

Action Check the form and spelling of the command line. If that looks correct, echo $path to see if the user’s search path is correct. When communications are garbled, it is possible to unset a search path to such an extent that only built-in shell commands are available. Here is a command to reset a basic search path:

% set path = (/usr/bin /usr/ccs/bin /usr/openwin/bin .)

If the search path looks correct, check the directory contents along the search path to see if programs are missing or if directories are not mounted.

See Also For more information about the C shell, see csh(1).

Connection closed.

Cause This message can appear when using rlogin(1) to another system if the remote host cannot create a process for this user, if the user takes too long to type the correct password, if the user interrupts the network connection, or if the remote host goes down. Data loss is possible if files were modified and not saved before the connection closed.

Action Just try again. If the other system has gone down, wait for it to reboot first.

Connection closed by foreign host.

Cause When a user telnets to another system, this message can appear if the user takes too long to type the correct password, if the remote host cannot create a login for this user, or if the remote host goes down or terminates the connection. Data loss is possible if files were modified and not saved before the connection closed.

Action Just try again. If the other system has gone down, wait for it to reboot first.

[Connection closed. Exiting]

Cause After using the talk(1) command to communicate with another user, the other person enters an interrupt (usually Control-c), and this message appears on your screen.

Action Sending an interrupt like this is the usual way of exiting the talk program. The talk session is over and you can return to your work.

Connection refused

Cause No connection could be made because the target machine actively refused it. This happens either when trying to connect to an inactive service or when a service process is not present at the requested address.

Action Activate the service on the target machine, or start it up again if it has disappeared. If for security reasons you do not intend to provide this service, inform the user community, possibly suggesting an alternative.

Technical Notes The symbolic name for this error is ECONNREFUSED, errno=146.

Connection timed out

Cause This occurs either when the destination host is down or when problems in the network cause lost transmission.

Action First check the operation of the host system, for example by using ping(1M) and ftp(1), then repair or reboot as necessary. If that doesn’t solve the problem, check the network cabling and connections.

Technical Notes No connection was established in a specified time. A connect or send request failed because the destination host did not properly respond after a reasonable interval. (The timeout period is dependent on the communication protocol.)

The symbolic name for this error is ETIMEDOUT, errno=145.

console login: ^J^M^Q^K^K^P

Cause This usually occurs because OpenWindows exited abnormally, leaving the system’s keyboard in the wrong mode. The characters that appear when someone attempts to log in are garbage transliterations of what that person types.

Action From another machine, log in remotely to this system, then run this command:

$ /usr/openwin/bin/kbd_mode -a

This puts the console back into ASCII mode. Note that kbd_mode is not a windows program; it just fixes the console mode.

Technical Notes The usual cause of this problem is an automated script, run from cron, that periodically clears out the /tmp directory. Ensure that any such scripts do not remove the /tmp/.X11-pipe or /tmp/.X11-unix directories, or any files therein.

core dumped

Cause A core file contains an image of memory at the point of software failure, and is used by programmers to find the reason for the failure.

Action To see which program produced a core file, run either the file(1) command or the adb(1) command. The following examples show the output of the file and adb commands on a core file from the dtmail program.

$ file core

core: ELF 32-bit MSB core file SPARC Version 1, from ‘dtmail’

$ adb core

core file = core – program ‘dtmail’

SIGSEGV 11: segmentation violation

^D (use Control-d to quit the program)

Ask the vendor or author of this program for a debugged version.

Technical Notes Some signals, such as SIGQUIT, SIGBUS, and SIGSEGV, produce a core dump. See the signal(5) man page for a complete list.

If you have the source code for the program, you can try compiling it with cc -g, and debugging it yourself using dbx or a similar debugger. The where directive of dbx provides a stack trace.

On mixed networks, it can be difficult to discern which machine architecture produced a particular core dump, since adb on one type of system generally cannot read a core file from another type of system, and will produce an “unrecognized file” message. Run adb on various machine architectures until you find the right one.

The term “core” is archaic – ferrite core memory was supplanted by silicon RAM in the 1970s, although spaceships still employ core memory for its imperviousness to radiation.

Could not initialize tooltalk (tt_open): TT_ERR_NOMP

Cause Various desktop tools display or print this message when the ttsession(1) process is not available. The ToolTalk service generally tries to restart ttsession if it is not running, so this error indicates that the ToolTalk service is either not installed or is not installed correctly.

Action Verify that the ttsession command exists in /usr/openwin/bin or /usr/dt/bin. If this command is not present, ToolTalk is not installed correctly. The packages constituting ToolTalk are the runtime SUNWtltk, developer support SUNWtltkd, and the manual pages SUNWtltkm. CDE ToolTalk packages have the same names with “.2” appended.

Technical Notes The full TT_ERR_NOMP message string reads as follows: “No ttsession is running, probably because tt_open() has not been called yet. If this is returned from tt_open() it means ttsession could not be started, which generally means ToolTalk is not installed on the system.”

Could not start new viewer

Cause This message appears in the AnswerBook navigator window, along with an XView error message on the console.

Action See the message “answerbook: XView error: NULL pointer passed to xv_set” for details.

cpio: Bad magic number/header.

Cause A cpio(1) archive has either become corrupted or was written out with an incompatible version of cpio.

Action Use the -k option to cpio to skip I/O errors and corrupted file headers. This might permit you to extract other files from the cpio archive. To extract files with corrupted headers, try editing the archive with a binary editor such as emacs. Each cpio file header contains a filename as a string.

Cross-device link

Cause An attempt was made to make a hard link to a file on another device, such as on another filesystem.

Action Establish a symbolic link using ln -s instead. Symbolic links are permitted across filesystem boundaries.

Technical Notes The symbolic name for this error is EXDEV, errno=18.
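A script that does not know in advance whether two paths share a filesystem can try a hard link first and fall back to a symbolic link. The following is a rough sketch assuming a POSIX-style shell; link_or_symlink is a hypothetical helper, and it falls back on any ln failure, not only on EXDEV.

```shell
#!/bin/sh
# link_or_symlink SRC DST: try a hard link, then fall back to a symlink.
# SRC should be an absolute path so the symlink resolves from anywhere.
link_or_symlink() {
    if ln "$1" "$2" 2>/dev/null; then
        echo "hard link"
    else
        ln -s "$1" "$2" && echo "symbolic link"
    fi
}

work=$(mktemp -d)
echo "data" > "$work/target.txt"
link_or_symlink "$work/target.txt" "$work/alias.txt"
cat "$work/alias.txt"
```

Here both paths sit in the same scratch directory, so the hard link normally succeeds; with the destination on another filesystem, the symbolic-link branch would be taken.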

D

data access exception

Cause This message can result from running an old version of the operating system that does not support new hardware, or by running an operating system that is not configured for new hardware. It can also result from incorrectly installed DSIMMs or from a disk problem.

Action Upgrade your operating system to a version that supports the new hardware or machine architecture. For example, upgrading a SPARCstation 2 (with sun4c kernel architecture) to a SPARCstation 20 (with sun4m kernel architecture) requires an operating system upgrade or reconfiguration.

Data fault

Cause This is a kind of bad trap that usually causes a system panic. When this message appears after a bad trap message, a system text or data access fault probably occurred. In the absence of a bad trap message, this message might indicate a user text or data access fault. Data loss is possible if the problem occurs other than at boot time.

Action Make sure the machine can reboot, then check the log file /var/adm/messages for hints about what went wrong. See the message “BAD TRAP” for more information.

Deadlock situation detected/avoided

Cause A programming deadlock situation was detected and avoided.

Action If the system had not detected and avoided a deadlock, a piece of software would have hung. Run the program again. The deadlock might not reoccur.

Technical Notes This error usually relates to file and record locking, but can also apply to mutexes, semaphores, condition variables, and read/write locks.

The symbolic name for this error is EDEADLK, errno=45.

Device busy

Cause An attempt was made to mount a device that was already mounted or to unmount a device containing an active file (such as an open file, a current directory, a mount point, or a running program). This message also occurs when trying to enable accounting that is already enabled.

Action To unmount a device containing active processes, close all the files under that mount point, quit any programs started from there, and change directories out of that hierarchy. Then try to unmount again.

Technical Notes Mutexes, semaphores, condition variables, and read/write locks set this error condition to indicate that a lock is held.

The symbolic name for this error is EBUSY, errno=16.

/dev/rdsk/variable: CAN’T CHECK FILE SYSTEM.

Cause The system cannot automatically clean (preen) this filesystem because it appears to be set up incorrectly or is having hard disk problems. This message asks that you run fsck(1M) manually, since data corruption might already have occurred.

Action Run fsck to clean the filesystem in question. See the message “/dev/rdsk/variable: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.” for proper procedures.

/dev/rdsk/variable: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.

Cause At boot time the /etc/rcS script runs the fsck(1M) command to check the integrity of filesystems marked “fsck” in /etc/vfstab. If fsck cannot repair a filesystem automatically, it interrupts the boot procedure and produces this message. When fsck gets into this state, it cannot repair a filesystem without losing one or more files, so it wants to defer this responsibility to you, the administrator. Data corruption has probably already occurred.

Action First run fsck -n on the filesystem, to see how many and what type of problems exist. Then run fsck again to repair the filesystem. If you have a recent backup of the filesystem, you can generally answer “y” to all the fsck questions. It’s a good idea to keep a record of all problematic files and inode numbers for later reference. To run fsck yourself, specify options as recommended by the boot script. For example:

# fsck /dev/rdsk/c0t4d0s0

Usually the files lost during fsck repair are those that were created just before a crash or power outage, and they cannot be recovered. If you lose important files, you can recover them from backup tapes.

If you don’t have a backup, ask an expert to run fsck for you.

Directory not empty

Cause The directory operation that was attempted, such as directory removal with rmdir, can be performed only on an empty directory.

Action To remove the directory, first remove all the files that it contains. A quick way to remove a non-empty directory hierarchy is with the rm -r command.

Technical Notes The symbolic name for this error is ENOTEMPTY, errno=93.
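
The difference between rmdir and rm -r can be sketched as follows. The directory and file names are hypothetical, and mktemp -d is assumed to be available.

```shell
#!/bin/sh
# Hypothetical directory containing a single file.
workdir=$(mktemp -d)
mkdir "$workdir/project"
touch "$workdir/project/notes.txt"

# rmdir refuses because the directory still contains a file.
if rmdir "$workdir/project" 2>/dev/null; then
    rmdir_result="removed"
else
    rmdir_result="refused"
fi
echo "rmdir: $rmdir_result"

# rm -r removes the directory together with everything beneath it.
rm -r "$workdir/project"
```

Since rm -r deletes the whole hierarchy without asking, check the path carefully before running it, especially as superuser.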

Disc quota exceeded

Cause The user’s disk limit has been exceeded on a user filesystem, usually because a file was just created or enlarged beyond the limit. This almost always refers to a magnetic disk, and not to an optical disc. Any data created after this condition occurs will be lost.

Action The user can delete files to bring disk usage under the limit, or the server administrator can use the edquota(1M) command to increase the user’s disk limit.

Technical Notes The symbolic name for this error is EDQUOT, errno=49.

dumptm: Cannot open ‘/dev/rmt/variable‘: Device busy

Cause During filesystem backup, the dump program cannot open the tape drive because some other process is holding it open.

Action Find the process that has the tape drive open, and either kill(1) the process or wait for it to finish.

# ps -ef | grep /dev/rmt

# kill -9 processID

DUP/BAD I=i OWNER=o MODE=m SIZE=s MTIME=t FILE=f . REMOVE?

Cause During phase 1, fsck(1M) found duplicate blocks or bad blocks associated with the file or directory specified after FILE= whose inode number appears after I= (with other information).

Action To remove this file or directory, answer yes. If you end up removing more than a few files in this manner, data loss will result, so it might be preferable to restore the filesystem from backup tapes.

numberDUP I=number

Cause Upon detecting a block that is already claimed by another inode, fsck(1M) prints the duplicate block number and its containing inode (after I=).

Action In fsck phases 2 and 4, you will decide whether or not to clear these bad blocks. Before committing to repair with fsck, you could determine which file contains this inode by passing the inode number to the ncheck(1M) command:

# ncheck -i inum filesystem

E

error: DPS has not initialized or server connection failed

Cause This message appears when trying to run AnswerBook with a generic X11 window server or on a generic X terminal.

Action Running AnswerBook requires Display PostScript (DPS), or a NeWS server, or the Adobe DPS NS remote display software. In addition, a complete LaserWriterII Type-1 font set (including Palatino) should be installed on the X server. To find out if your X server has DPS, run xdpyinfo(1) to verify the presence of an “Adobe-DPS-Extension” line. X servers without this line don’t know about DPS.

ERROR: missing file arg (cm3)

Cause An attempt was made to run some sccs(1) operation that requires a filename, such as create, edit, delget, or prt.

Action Supply the appropriate filename after the SCCS operation.

ERROR [SCCS/s.variable]: ‘SCCS/p.variable‘ nonexistent (ut4)

Cause An attempt was made to sccs edit or sccs get a file that is not yet under SCCS control.

Action Run sccs create on that file to place it under SCCS control.

ERROR [SCCS/s.variable]: writable ‘variable‘ exists (ge4)

Cause An attempt was made to sccs edit a file that is writable, probably because it is already checked out.

Action Run sccs info to see who has the file checked out. If it is you, go ahead and edit it. If it is somebody else, ask that person to check in the file.

esp0: data transfer overrun

Cause When a user tries to mount a CDROM on a third-party CD drive, mount(1M) fails with the above error, followed by the “sr0: SCSI transport failed” message. The CD drive probably comes from a vendor unknown to the system.

Action Third-party CD drives generally have an 8192 block size, as opposed to the 512 block size on supported Sun drives. Check with the vendor to see if any special configuration is possible to allow the drive to operate on a Sun workstation.

Event not found

Cause This C shell message indicates that a user tried to repeat a command from the history list, but that command or number does not exist in the list.

Action Run the C shell history command to display recent events in the history list. If a user often tries to run commands that have disappeared from the history list, make the list longer by setting history to a higher value.

EXCESSIVE BAD BLKS I=number . CONTINUE?

Cause During phase 1, fsck(1M) found more than 10 bad (out-of-range) blocks associated with the specified inode number.

Action With this many bad blocks, it might be preferable to restore the filesystem from backup tapes.

EXCESSIVE DUP BLKS I=number . CONTINUE?

Cause During phase 1, fsck(1M) found more than 10 duplicate (previously claimed) blocks associated with the specified inode number.

Action With this many duplicate blocks, it might be preferable to restore the filesystem from backup tapes.

Exec format error

Cause This often happens when trying to run software compiled for different systems or architectures, such as when executing Solaris 2.x programs on a SunOS 4.1.x system, or when trying to execute SPARC-specific programs on an x86 machine. On a Solaris 2.x system, it can also occur if the Binary Compatibility Package was not installed.

Action Make sure that the software matches the architecture and system you’re using. The file(1) command can help you determine the target architecture. If you’re using SunOS 4.1.x software on a Solaris 2.x system, make sure that the Binary Compatibility Package is installed. You can check for it using this command:

$ pkginfo | grep SUNWbcp

Technical Notes A request was made to execute a file that, although it has the appropriate permissions, does not start with a valid format.

The symbolic name for this error is ENOEXEC, errno=8.

See Also See the a.out(4) man page for a description of executable files.

F

fd0: unformatted diskette or no diskette in the drive

Cause This message appears on the system console to indicate that the floppy driver fd(7) could not read the label on a diskette. Usually this is either because a new diskette has not yet been formatted, or a formatted diskette has become corrupted. This message often appears along with “read failed” and “bad format” messages after volcheck(1) is run.

Action If you are certain that the diskette contains no data, run fdformat -d to format the diskette in DOS format. (You can also format a diskette in UFS format if you like, although then it is not transportable to most other systems.) When the diskette is formatted, you can write on it, if it was not corrupted beyond repair.

File exists

Cause The name of an existing file was mentioned in an inappropriate context. For example, it is not allowed to establish a link to an existing file, or to overwrite an existing file when the csh(1) noclobber option is set.

Action Look at the names of files in the directory, then try again with a different name or after renaming or removing the existing file.

Technical Notes The symbolic name for this error is EEXIST, errno=17.
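
Both situations can be sketched in a POSIX-style shell. The file names are hypothetical, and set -C is used here as the POSIX shell counterpart of the csh noclobber option (an assumption about your shell; older Bourne shells might not support it).

```shell
#!/bin/sh
# Hypothetical files in a scratch directory.
workdir=$(mktemp -d)
echo "first" > "$workdir/report.txt"
echo "other" > "$workdir/other.txt"

# With noclobber set, ">" refuses to overwrite an existing file.
set -C
if ( echo "second" > "$workdir/report.txt" ) 2>/dev/null; then
    overwrite_result="overwritten"
else
    overwrite_result="refused"
fi
set +C

# ln without -f refuses to create a link whose name is already taken.
if ln "$workdir/other.txt" "$workdir/report.txt" 2>/dev/null; then
    link_result="created"
else
    link_result="refused"
fi

contents=$(cat "$workdir/report.txt")
echo "overwrite: $overwrite_result, link: $link_result, contents: $contents"
```

In both cases the existing file is left untouched, so retrying with a different name or after renaming the old file is safe.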

File locking deadlock

Cause This is a programming problem, in some cases unavoidable.

Action All a user can do is restart the program and hope deadlock does not reoccur.

Technical Notes In the file locking subsystem, two processes tried to modify some lock at the same time. In the multithreading subsystem, two threads became deadlocked and could not continue. When a program using the threads library encounters this error, it should restart the deadlocked threads.

The symbolic name for this error is EDEADLOCK, errno=56.

filemgr: mknod: Permission denied

Cause File Manager issues this message and fails to come up whenever the /tmp/.removable directory is owned by another user and is not 1777 mode. This can happen, for example, when multiple users share a workstation.

Action Have the original owner change the mode (chmod(1)) of this directory back to 1777, its default creation mode. Rebooting the workstation also resolves this problem.

Technical Notes This is a known problem that was fixed in Solaris 2.4.

File name too long

Cause The specified file name has too many characters.

Action If a file name or path name component is too long, devise a shorter name. If the total path name is longer than PATH_MAX characters, first change to an intermediate directory, then specify a shorter path name. Newly-created data will be lost unless written to another file with a shorter name.

Technical Notes In a UFS or NFS-mounted UFS filesystem, the length of a path name component exceeds MAXNAMLEN (255) characters, or the total length of the path name exceeds PATH_MAX (1024) characters. In a System V filesystem, the length of a path name component exceeds NAME_MAX (14) characters while no-truncation mode is in effect. These values are defined in the /usr/include/limits.h(4) file.

The symbolic name for this error is ENAMETOOLONG, errno=78.
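
To see the limits that apply to a particular directory, something like the following sketch can help. It assumes the getconf utility is available, falls back to the 255 and 1024 values quoted above if it is not, and uses a hypothetical scratch directory.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# Ask the system for the limits on this filesystem; fall back to the
# values quoted in the text if getconf is unavailable (an assumption).
name_max=$(getconf NAME_MAX "$workdir" 2>/dev/null || echo 255)
path_max=$(getconf PATH_MAX "$workdir" 2>/dev/null || echo 1024)
echo "component limit: $name_max, path limit: $path_max"

# Build a file name exactly one character longer than the component limit.
long_name=""
i=0
while [ "$i" -le "$name_max" ]; do
    long_name="${long_name}x"
    i=$((i + 1))
done

# Creating the file is expected to fail with "File name too long".
if touch "$workdir/$long_name" 2>/dev/null; then
    create_result="created"
else
    create_result="rejected"
fi
echo "create: $create_result"
```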

FILE SYSTEM STATE IN SUPERBLOCK IS WRONG; FIX?

Cause The fsck(1M) command has just checked a filesystem, and has determined that the filesystem is clean. The filesystem’s superblock, however, still thinks the filesystem is “dirty” in some way.

Action If you believe that the filesystem is adequately repaired, answer yes to mark the filesystem as clean.

Technical Notes Different “dirty” filesystem types are listed in /usr/include/sys/fs/ufs_fs.h, and include FSACTIVE, FSBAD, FSFIX, FSLOG, and FSSUSPEND.

File table overflow

Cause The kernel file table is full because too many files are open on the system. Temporarily, no more files can be opened. New data created under this condition will probably be lost.

Action Simply waiting often gives the system time to close files. However, if this message occurs often, reconfigure the kernel to allow more open files. To increase the size of the file table in Solaris 2.x, increase the value of maxusers in the /etc/system file. The default maxusers value is the amount of main memory in MB, minus 2.

Technical Notes The symbolic name for this error is ENFILE, errno=23.

File too large

Cause The file size exceeded the limit specified by ulimit(1), or the file size exceeds the maximum supported by the file system. New data created under this condition will probably be lost.

Action In the C shell, use the limit command to see or set the default file size. In the Bourne or Korn shells, use the ulimit -a command. Even when the shells claim that the file size is unlimited, in fact the system limit is FCHR_MAX (usually 1 gigabyte).

Technical Notes The symbolic name for this error is EFBIG, errno=27.
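
The effect of the file size limit can be sketched in a Bourne-style shell as follows. The file name is hypothetical, and the limit is lowered only inside a subshell so the calling shell is unaffected. The block size used by ulimit -f varies by shell (512 bytes in POSIX shells, 1024 bytes in bash), so the sketch only assumes that one block is far smaller than 64 KB.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# Lower the limit to one block inside a subshell, then try to write 64 KB.
# The write is cut short, either by a signal or by a "File too large" error.
if (
    ulimit -f 1
    dd if=/dev/zero of="$workdir/big.dat" bs=1024 count=64
) >/dev/null 2>&1; then
    write_result="completed"
else
    write_result="stopped"
fi

# Count how many bytes actually reached the file.
written_bytes=$(( $(wc -c < "$workdir/big.dat") ))
echo "write $write_result after $written_bytes bytes"
```

Running ulimit -f with no argument in the same shell shows the current limit before you change it.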

FREE BLK COUNT(S) WRONG IN SUPERBLK . SALVAGE?

Cause During phase 5, fsck(1M) detected that the actual number of free blocks in the filesystem did not match the superblock’s free block count. The df(1M) command accesses this free block count when measuring filesystem capacity.

Action Generally you can answer yes to this question without harming the filesystem.

fsck: Can’t open /dev/dsk/variable

Cause The fsck(1M) command cannot open the disk device, because although a similar filesystem exists, the partition specified does not.

Action Run the mount(1M) or the format(1M) command to see what filesystems are configured on the machine. Then run fsck again on an existing partition.

fsck: Can’t stat /dev/dsk/variable

Cause The fsck(1M) command cannot open the disk device, because the specified filesystem does not exist.

Action Run the mount(1M) or the format(1M) command to see what filesystems are configured on the machine. Then run fsck again on an existing filesystem.

G

giving up

Cause This message appears in the SCSI log to indicate that a read or write operation has been retried until it timed out. With SCSI disk the timeout period is usually 30 seconds; with tape the period is usually 20 attempts. Timeout periods are generally coded into the drivers.

Action Check that all SCSI devices are connected and powered on. Make sure that SCSI target numbers are correct and not in conflict. Verify that all cables are no longer than six meters, total, and that all SCSI connections are properly terminated.

Technical Notes The scsi_log(9F) routine usually displays messages on the system console and in the /var/adm/messages file. Run the dmesg(1M) command to see the most recent message buffer.

Graphics Adapter device /dev/fb is of unknown type

Cause The /dev/fb driver is either missing or corrupted.

Action See “InitOutput: Error loading module for /dev/fb” for details.

group.org_dir: NIS+ servers unreachable

Cause This is the second of three messages that an NIS+ client prints when it cannot locate an NIS+ server on the network.

Action See the message “hosts.org_dir: NIS+ servers unreachable” for details.

H

/home/variable: No such file or directory

Cause An attempt was made to change to a user’s home directory, but either that user does not exist or the user’s fileserver has not shared (exported) that filesystem.

Action To check on the existence of a particular user, run the ypmatch(1) or nismatch(1) command, specifying the user name and then the passwd map.

To export filesystems from the remote fileserver, become superuser on that system and run the share(1M) command with the appropriate options. If that system is sharing (exporting) filesystems for the first time, also invoke /etc/init.d/nfs.server start to begin NFS service.

See Also For more information on sharing filesystems, see the share_nfs(1M) man page.

Host is down

Cause A transport connection failed because the destination host was down. For example, mail delivery was attempted over several days, but the destination machine was not available during any of these attempts.

Action Report this error to the system administrator for the host. If you are the person responsible for this system, check to see if the machine needs repair or rebooting.

Technical Notes This error results from status information delivered by the underlying communication interface. If there is no known connection to the host, a different message us

Ramdev


I started unixadminschool.com (aka gurkulindia.com) in 2009 as my own personal reference blog, and later realized that my learnings might be helpful to other Unix admins if I managed my knowledge base in a more user-friendly format. The result is today's unixadminschool.com. You can connect with me at https://www.linkedin.com/in/unixadminschool/
