capabilities(7) -- Linux man page
NAME
capabilities - overview of Linux capabilities
For the purpose of performing permission checks, traditional Unix implementations distinguish two categories of processes: privileged processes (whose effective user ID is 0, referred to as superuser or root), and unprivileged processes (whose effective UID is non-zero). Privileged processes bypass all kernel permission checks, while unprivileged processes are subject to full permission checking based on the process's credentials (usually: effective UID, effective GID, and supplementary group list).
Starting with kernel 2.2, Linux provides an (as yet incomplete) system of capabilities, which divide the privileges traditionally associated with superuser into distinct units that can be independently enabled and disabled.
As at Linux 2.4.20, the following capabilities are implemented:
- CAP_CHOWN: Allow arbitrary changes to file UIDs and GIDs (see chown(2)).
- CAP_DAC_OVERRIDE: Bypass file read, write, and execute permission checks. (DAC = "discretionary access control".)
- CAP_DAC_READ_SEARCH: Bypass file read permission checks and directory read and execute permission checks.
- CAP_FOWNER: Bypass permission checks on operations that normally require the file system UID of the process to match the UID of the file (e.g., utime(2)), excluding those operations covered by CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH; ignore sticky bit on file deletion.
- CAP_FSETID: Don't clear set-user-ID and set-group-ID bits when a file is modified; permit setting of the set-group-ID bit for a file whose GID does not match the file system GID or any of the supplementary GIDs of the calling process.
- CAP_IPC_LOCK: Permit memory locking (mlock(2), mlockall(2), shmctl(2)).
- CAP_IPC_OWNER: Bypass permission checks for operations on System V IPC objects.
- CAP_KILL: Bypass permission checks for sending signals (see kill(2)).
- CAP_LEASE: (Linux 2.4 onwards) Allow file leases to be established on arbitrary files (see fcntl(2)).
- CAP_LINUX_IMMUTABLE: Allow setting of the EXT2_APPEND_FL and EXT2_IMMUTABLE_FL ext2 extended file attributes.
- CAP_MKNOD: (Linux 2.4 onwards) Allow creation of special files using mknod(2).
- CAP_NET_ADMIN: Allow various network-related operations (e.g., setting privileged socket options, enabling multicasting, interface configuration, modifying routing tables).
- CAP_NET_BIND_SERVICE: Allow binding to Internet domain reserved socket ports (port numbers less than 1024).
- CAP_NET_BROADCAST: (Unused) Allow socket broadcasting, and listening to multicasts.
- CAP_NET_RAW: Permit use of RAW and PACKET sockets.
- CAP_SETGID: Allow arbitrary manipulations of process GIDs and supplementary GID list; allow forged GID when passing socket credentials via Unix domain sockets.
- CAP_SETPCAP: Grant or remove any capability in the caller's permitted capability set to or from any other process.
- CAP_SETUID: Allow arbitrary manipulations of process UIDs (setuid(2), etc.); allow forged UID when passing socket credentials via Unix domain sockets.
- CAP_SYS_ADMIN: Permit a range of system administration operations including: quotactl(2), mount(2), swapon(2), sethostname(2), setdomainname(2), IPC_SET and IPC_RMID operations on arbitrary System V IPC objects; allow forged UID when passing socket credentials.
- CAP_SYS_BOOT: Permit calls to reboot(2).
- CAP_SYS_CHROOT: Permit calls to chroot(2).
- CAP_SYS_MODULE: Allow loading and unloading of kernel modules; allow modifications to capability bounding set.
- CAP_SYS_NICE: Allow raising process nice value (nice(2), setpriority(2)) and changing of the nice value for arbitrary processes; allow setting of real-time scheduling policies for calling process, and setting scheduling policies and priorities for arbitrary processes (sched_setscheduler(2), sched_setparam(2)).
- CAP_SYS_PACCT: Permit calls to acct(2).
- CAP_SYS_PTRACE: Allow arbitrary processes to be traced using ptrace(2).
- CAP_SYS_RAWIO: Permit I/O port operations (iopl(2) and ioperm(2)).
- CAP_SYS_RESOURCE: Permit: use of reserved space on ext2 file systems; ioctl(2) calls controlling ext3 journaling; disk quota limits to be overridden; resource limits to be increased (see setrlimit(2)); RLIMIT_NPROC resource limit to be overridden; msg_qbytes limit for a message queue to be raised above the limit in /proc/sys/kernel/msgmnb (see msgop(2) and msgctl(2)).
- CAP_SYS_TIME: Allow modification of system clock (settimeofday(2), adjtimex(2)); allow modification of real-time (hardware) clock.
- CAP_SYS_TTY_CONFIG: Permit calls to vhangup(2).
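Capability sets are handled as bit masks, one bit per capability. The following is a minimal sketch, not part of any kernel or libcap interface, of converting between capability names and a mask. The bit positions are an assumption: they are taken to match include/linux/capability.h for the 2.4 series and should be checked against the header on the target system. The helper names cap_mask and cap_names are hypothetical.

```python
# Bit positions are assumed to match include/linux/capability.h (2.4 series);
# verify against the header before relying on them.
CAP_BITS = {
    "CAP_CHOWN": 0, "CAP_DAC_OVERRIDE": 1, "CAP_DAC_READ_SEARCH": 2,
    "CAP_FOWNER": 3, "CAP_FSETID": 4, "CAP_KILL": 5, "CAP_SETGID": 6,
    "CAP_SETUID": 7, "CAP_SETPCAP": 8, "CAP_LINUX_IMMUTABLE": 9,
    "CAP_NET_BIND_SERVICE": 10, "CAP_NET_BROADCAST": 11,
    "CAP_NET_ADMIN": 12, "CAP_NET_RAW": 13, "CAP_IPC_LOCK": 14,
    "CAP_IPC_OWNER": 15, "CAP_SYS_MODULE": 16, "CAP_SYS_RAWIO": 17,
    "CAP_SYS_CHROOT": 18, "CAP_SYS_PTRACE": 19, "CAP_SYS_PACCT": 20,
    "CAP_SYS_ADMIN": 21, "CAP_SYS_BOOT": 22, "CAP_SYS_NICE": 23,
    "CAP_SYS_RESOURCE": 24, "CAP_SYS_TIME": 25, "CAP_SYS_TTY_CONFIG": 26,
    "CAP_MKNOD": 27, "CAP_LEASE": 28,
}


def cap_mask(*names):
    """Build a bit mask with one bit set per named capability."""
    mask = 0
    for name in names:
        mask |= 1 << CAP_BITS[name]
    return mask


def cap_names(mask):
    """Return the names of the capabilities set in mask, lowest bit first."""
    ordered = sorted(CAP_BITS.items(), key=lambda item: item[1])
    return [name for name, bit in ordered if mask & (1 << bit)]
```

For instance, cap_names(cap_mask("CAP_NET_RAW", "CAP_CHOWN")) lists the two names in bit order, CAP_CHOWN first.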
Process Capabilities
Each process has three capability sets containing zero or more of the above capabilities:
- Effective: the capabilities used by the kernel to perform permission checks for the process.
- Permitted: the capabilities that the process may assume (i.e., a limiting superset for the effective and inheritable sets). If a process drops a capability from its permitted set, it can never re-acquire that capability (unless it execs a set-UID-root program).
- Inheritable: the capabilities preserved across an execve(2).
In the current implementation, a process is granted all permitted and effective capabilities (subject to the operation of the capability bounding set described below) when it execs a set-UID-root program, or if a process with a real UID of zero execs a new program.
A child created via fork(2) inherits copies of its parent's capability sets.
Using capset(2), a process may manipulate its own capability sets, or, if it has the CAP_SETPCAP capability, those of another process.
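The relationship between the three sets can be illustrated with a simplified model. This sketch only encodes the two properties stated above (the permitted set limits the other two sets, and a dropped permitted capability cannot be regained without an exec); it is not the kernel's exact capset(2) validation, and the names CapSets and request_change are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapSets:
    """The three per-process capability sets, each held as a bit mask."""
    effective: int
    permitted: int
    inheritable: int


def is_subset(a, b):
    """True if every bit set in a is also set in b."""
    return a & ~b == 0


def request_change(old, new):
    """Return new if it respects the simplified rules, else raise.

    Assumed rules (a model of the text, not the kernel source):
      - the permitted set may only shrink;
      - effective and inheritable must lie within the new permitted set.
    """
    if not is_subset(new.permitted, old.permitted):
        raise PermissionError("permitted set may only shrink")
    if not is_subset(new.effective, new.permitted):
        raise PermissionError("effective set exceeds permitted set")
    if not is_subset(new.inheritable, new.permitted):
        raise PermissionError("inheritable set exceeds permitted set")
    return new
```

In this model, a process that moves from permitted mask 0b11 to 0b01 cannot later request 0b11 again: the second call raises PermissionError.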
Capability bounding set
When a program is execed, the permitted and effective capabilities are ANDed with the current value of the so-called capability bounding set, defined in the file /proc/sys/kernel/cap-bound. This parameter can be used to place a system-wide limit on the capabilities granted to all subsequently executed programs. (Confusingly, this bit mask parameter is expressed as a signed decimal number in /proc/sys/kernel/cap-bound.)
Only the init process may set bits in the capability bounding set; other than that, the superuser may only clear bits in this set.
On a standard system the capability bounding set always masks out the CAP_SETPCAP capability. To remove this restriction, modify the definition of CAP_INIT_EFF_SET in include/linux/capability.h and rebuild the kernel.
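Because the bounding set is shown as a signed decimal number, it can help to convert it to an unsigned bit mask before inspecting individual bits. The sketch below assumes the set is a 32-bit mask and that CAP_SETPCAP occupies bit 8 (as in include/linux/capability.h); both are assumptions to verify on the target kernel. The function names are hypothetical.

```python
CAP_SETPCAP = 8        # bit position, assumed from include/linux/capability.h
MASK32 = 0xFFFFFFFF    # assumes the bounding set is a 32-bit mask


def bound_to_mask(signed_value):
    """Convert the signed decimal form to an unsigned 32-bit mask."""
    return signed_value & MASK32


def mask_to_bound(mask):
    """Convert an unsigned 32-bit mask to the signed decimal form."""
    mask &= MASK32
    return mask - (1 << 32) if mask & (1 << 31) else mask


def has_cap(signed_value, bit):
    """True if the given capability bit is present in the bounding set."""
    return bool(bound_to_mask(signed_value) & (1 << bit))
```

Under these assumptions, a bounding set with every bit set except CAP_SETPCAP is the mask 0xFFFFFEFF, which reads as -257 in signed decimal form.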
Current and Future Implementation
A full implementation of capabilities requires:
- that for all privileged operations, the kernel check whether the process has the required capability in its effective set.
- that the kernel provide system calls allowing a process's capability sets to be changed and retrieved.
- file system support for attaching capabilities to an executable file, so that a process gains those capabilities when the file is execed.
As at Linux 2.4.20, only the first two of these requirements are met.
Eventually, it should be possible to associate three capability sets with an executable file, which, in conjunction with the capability sets of the process, will determine the capabilities of a process after an exec:
- Allowed: this set is ANDed with the process's inherited set to determine which inherited capabilities are permitted to the process after the exec.
- Forced: the capabilities automatically permitted to the process, regardless of the process's inherited capabilities.
- Effective: those capabilities in the process's new permitted set that are also to be set in the new effective set. (F(effective) would normally be either all zeroes or all ones.)
In the meantime, since the current implementation does not support file capability sets, during an exec:
- All three file capability sets are initially assumed to be cleared.
- If a set-UID-root program is being execed, or the real user ID of the process is 0 (root), then the file allowed and forced sets are defined to be all ones (i.e., all capabilities set).
- If a set-UID-root program is being executed, then the file effective set is defined to be all ones.
During an exec, the kernel calculates the new capabilities of the process using the following algorithm:
    P'(permitted) = (P(inherited) & F(allowed)) | (F(forced) & cap_bset)
    P'(effective) = P'(permitted) & F(effective)
    P'(inherited) = P(inherited)    [i.e., unchanged]

where:
- P denotes the value of a process capability set before the exec
- P' denotes the value of a capability set after the exec
- F denotes a file capability set
- cap_bset is the value of the capability bounding set.
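The algorithm above, together with the interim rules for the file capability sets, can be written out as a short sketch. It follows the listed rules literally and is an illustration rather than kernel code; the 32-bit ALL_CAPS mask, the function names, and the default real_uid of 1000 are assumptions made for the example.

```python
ALL_CAPS = 0xFFFFFFFF  # "all ones"; assumes 32-bit capability sets


def file_sets(setuid_root, real_uid):
    """Return (allowed, forced, effective) file sets under the interim rules."""
    allowed = forced = effective = 0      # all three start out cleared
    if setuid_root or real_uid == 0:
        allowed = forced = ALL_CAPS       # allowed and forced become all ones
    if setuid_root:
        effective = ALL_CAPS              # effective becomes all ones
    return allowed, forced, effective


def caps_after_exec(p_inherited, cap_bset, setuid_root=False, real_uid=1000):
    """Return (P'(permitted), P'(effective), P'(inherited)) after an exec."""
    f_allowed, f_forced, f_effective = file_sets(setuid_root, real_uid)
    permitted = (p_inherited & f_allowed) | (f_forced & cap_bset)
    effective = permitted & f_effective
    inherited = p_inherited               # unchanged across the exec
    return permitted, effective, inherited
```

With a bounding set of 0xFFFFFEFF and an empty inherited set, execing a set-UID-root program yields permitted and effective sets equal to the bounding set, while an ordinary program run by an unprivileged user yields empty sets.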
NOTES
The libcap package provides a suite of routines for setting and getting process capabilities that is more comfortable and less likely to change than the interface provided by capset(2) and capget(2).
CONFORMING TO
No standards govern capabilities, but the Linux capability implementation is based on the withdrawn POSIX 1003.1e draft standard.
BUGS
There is as yet no file system support allowing capabilities to be associated with executable files.
SEE ALSO
capget(2), prctl(2)