It’s an unfortunate fact that many programmers are lazy about error messages. Very often, all you get is a cryptic “Error 5”, and you may be lucky to get that: sometimes all you get is an error return that you have to examine yourself with “echo $?”. You can’t even depend on that being the actual Unix error, but even if it is, what does it mean?
Well, every Unix/Linux system includes various “.h” files that describe the numeric errors returned by kernel system calls. Unfortunately, those files are only a little bit more illuminating than the numeric errors themselves. For example, here’s a couple of lines from a Linux system:
#define EPERM 1 /* Operation not permitted */
...
#define EACCES 13 /* Permission denied */
What’s the difference? When would you get one versus the other? This article attempts to more fully explain what these errors mean and to give examples of what might cause them.
I’m only going to look at the first 32 of these; there are many more, but these are the more common. Understand that the numeric codes can vary from Unix to Unix: you really need to look in the /usr/include files to find the symbolic names, and even those are used in slightly different ways. Certain system calls on a BSD system, for example, will return a different error result than they will on a SysV Unix. However, most of that kind of thing is esoteric detail of concern only to programmers working on multiple platforms.
Even where the error numbers and the symbolic constants are the same, the comments may vary. For example, while SCO Unix and Linux systems would look almost exactly alike for the first 30 or 40 errors, some of the comments are markedly different, and higher numbered errors are defined completely differently. So, the thing to keep in mind is that just because you’ve seen a particular error on a particular platform doesn’t mean it is the same somewhere else. There’s also nothing that prevents a programmer from misusing these constants in their own error returns, either through ignorance or simple misunderstanding of the historical use of these.
And it also means that the descriptions of what might cause a specific error are heavily dependent on that word “might”. Please keep that in mind as you read this.
On a Linux system with source installed, you can cd to /usr/src/linux*/kernel and do a grep -l for the symbolic constant you are interested in. For example, here’s the places where EPERM is referenced on a 7.2 Red Hat system:
acct.c
capability.c
fork.c
kmod.c
module.c
printk.c
ptrace.c
sched.c
signal.c
sys.c
sysctl.c
time.c
uid16.c
For other Unix systems, pawing through documentation is the only way. For this article, I used:
Again, keep in mind that this is all examples, and may not apply to your specific platform. The system calls shown as examples may not be the only functions that will return these errors; you really need access to the source to know that.
This error is returned by kernel routines when the calling process lacks the necessary authority. For example, if you, as an ordinary user, call the “setuid()” function trying to change to someone else’s ID, it will fail and EPERM will be returned. The “getpgrp()” function uses this return if you try to get the process group ID of a process that is not part of your login session.
This one’s easy: You are trying to open a file that doesn’t exist and you haven’t specified to create it. However, it can also be returned for trying to open a non-existent IPC channel, or if one of the directories in a pathname does not exist.
The kill() function returns this if you pass it a non-existent PID. Trying to delete a non-existent route from the routing table also uses this.
When a program reads from a “slow” device (a terminal, for example), that read can sit returning nothing for a long time, and it may be that the process is sent a signal during that wait. The programmer needs to know whether the read returned because it got its data, because there is no more data, or if a signal interrupted it. That’s the purpose of this error, though it is also used for the pause() function and some IPC functions.
The catchall for all manner of unexpected hardware errors. It could be from a physical error, but additionally, an orphaned process (a process whose parent has died) that attempts to read from standard input will get this. BSD systems return this if you try to open a pty device that is already in use. An attempt to read from a stream that is closed will return EIO, as will a disk read or write that is outside of the physical bounds of the device. An open of /dev/tty when the process has no controlling tty will spit back EIO also.
This can be the result of opening a FIFO write-only, with O_NDELAY set, but no process is reading the FIFO. It may also be returned if I/O is attempted on a sub-device of a driver that does not exist (for example, a tape device that has not been defined in the kernel), or if I/O is attempted beyond the limits of the device.
This can be returned by exec() when too much argument data is passed. You could see that, for example, if you tried to run “ls *” in a directory with too many files, because the shell expands the wildcard into one enormous argument list. But it can also come from attempting to pass too much data to an IPC message queue, and from trying to do too many operations in a semop() (semaphore) call.
Ask the kernel to run a binary it doesn’t recognize as valid and this is what you get. Assuming you aren’t trying to execute arbitrary data and haven’t copied a binary from some other OS, you probably have a corrupt file.
When a program opens a file, open() returns a numeric file descriptor. Further calls to read() or write() use that descriptor- if it is not valid (never opened it, or closed it prior to the read or write), this is returned.
When a program spawns off a child process, it may wait() for the exit status of the child. If it tries to wait() for a child that doesn’t exist, or re-issues a wait() for the same child, it gets this. This is also used when the parent has set its signals so that children can exit without being waited for; in that case it just indicates that all children have exited.
Good advice in general, but EAGAIN is generally used with non-blocking I/O in the case where part or all of the data you wanted to read or write can’t be completed just now because of your non-blocking request. This is true for files as well as IPC communications. Depending on your platform, this or EACCES may be returned by fcntl() when it cannot grant a lock you have requested. On BSD platforms, bind() can return this when trying to bind a reserved port number if all are in use.
If you try to exec() another process or just ask for more memory in this process, the kernel will give back this if it can’t give you what you need.
Simply, file permissions don’t let you do the open() you requested. But also see EAGAIN above. This can also be returned by getspnam() when you aren’t root.
A bad memory address, specifically one that doesn’t belong to the current process. Typically a programming error causes this.
Try to set disk quotas on something that isn’t a block device and this is the error you’ll get. Mounting/unmounting and other filesystem related functions will also use this return.
Trying to unmount a file system that is in use will generate this. Although less likely in practice, trying to remove a directory that has a filesystem mounted on it will also complain in this manner. And, while a filesystem is being mounted or unmounted, a process that attempts to access it will find it locked and will get this error.
You get this when you explicitly try to open a file with O_CREAT and O_EXCL set, or try to create a new IPC structure with IPC_CREAT and IPC_EXCL, but the file or IPC structure already exists. The link() function also fails with EEXIST if the “new” file already exists.
You can’t link across filesystems (that’s what symbolic links are for). Trying to rename a directory to some other filesystem is the same problem.
Any ioctl requests will generate this when applied to a device that doesn’t support ioctl’s, like /dev/null. Inappropriate requests (reading from a write-only device) may return this or EINVAL; there seems to be plenty of confusion as to which to use.
Any system call that expects a directory and doesn’t get one will complain with this.
Attempting write() on a directory will get you this.
EINVAL gets used a lot. TCP has the concept of “out of band data” (urgent data). If a reading process checks for this, and there isn’t any, it gets EINVAL. The plock() function (which locks areas of a process into memory) returns this if you attempt to use it twice on the same memory segment. If you try to specify SIGKILL or SIGSTOP to sigaction(), you’ll get this return. The readv() and writev() calls complain this way if you give them too large an array of buffers. As mentioned above, drivers may return this for inappropriate ioctl() calls. The mmap() call will return this if you’ve specified a specific address but that address can’t be used. An lseek() to before the beginning of a file returns this. Streams use this if you attempt to link a stream onto itself. It’s used for many IPC errors also.
When the system itself can open no more files, this is the error returned.
When a process tries to exceed the maximum number of open file descriptors allowed, open() returns this. The “file” could also be a network socket.
While a lot of people programming Unix and Linux may never have seen a real typewriter, would anyone ever confuse a computer with a typewriter? Seriously, this is the generic and time-honored Unix complaint when you try to do something that needs a character device. Ioctl’s return this when applied to ordinary files, for example. So will attempts to get or set attributes (tcgetattr(), tcsetattr()) on something not a terminal device. So will tcdrain(), tcflush(), tcflow() and tcsendbreak().
A “text” file is a program, that is, an executing binary. It’s illegal to write to a binary while it is executing, simply because allowing that complicates swapping and paging. Interestingly, some Unixes don’t have this at all: Unixware, for example, returns different errors: see http://docsrv.caldera.com/SDK_porting/kernel_compat_errnos.html.
You’ve tried to extend a file beyond the maximum supported size. That could be the maximum size supported by the file system or it could be a per-process limit imposed on you specifically.
Ooops. Time for a larger disk. IPC creates can also return this.
You aren’t allowed to seek on a pipe. Socket calls can also return this.
Not much to be said about that.
Too many links on a file system would be my guess, but I can’t find a thing on this in source or books. Perhaps you’d get this if you had a recursive directory looping back on itself.
Fairly obvious- the reading or writing side of a pipe drops out of the game.
Here’s some odd ones:
#define ENOPKG 65
#define EISNAM 139
Here’s a program that will display the error acronyms and string messages for any platform it’s compiled on.
Tested on OSR5.0.{56} and RH 6.1 Linux
-- hops
/*
 * Print out err nums and msgs up to sys_nerr or hit null
 * return
 * Note sys_nerr not defined on some systems..
 * Hops 27-Apr-95
 * add ENO* display as well (osr5) 30-Mar-99
 * Build on Linux (RH 6.1) Sept 2002
 */
#include
#include
#include

char *enostr(int a);
int estrmatch(char *s);

#ifdef ERRNO_SCOMAX /* override past sys_errno */
#define MAX_ERRNO 501
#else
#define MAX_ERRNO sys_nerr
#endif

int main(int ac, char **av)
{
    int a;
    char *s;

    if (ac != 2 ) {
        printf("Errno\tAcronym Msg String\n");
        for (a=1; a
A.P. Lawrence provides SCO Unix and Linux consulting services http://www.pcunix.com