    seccomp: Introduce addfd ioctl to seccomp user notifier · 7cf97b12
    Sargun Dhillon authored

    The current SECCOMP_RET_USER_NOTIF API allows for syscall supervision over
    an fd. It is often used in settings where a supervising task emulates
    syscalls on behalf of a supervised task in userspace, either to further
    restrict the supervisee's syscall abilities or to circumvent kernel
    enforced restrictions the supervisor deems safe to lift (e.g. actually
    performing a mount(2) for an unprivileged container).

    While SECCOMP_RET_USER_NOTIF allows for the interception of any syscall,
    only a certain subset of syscalls could be correctly emulated. Over the
    last few development cycles, the set of syscalls which can't be emulated
    has been reduced due to the addition of pidfd_getfd(2). With this we are
    now able to, for example, intercept syscalls that require the supervisor
    to operate on file descriptors of the supervisee such as connect(2).

    However, syscalls that cause new file descriptors to be installed cannot
    currently be correctly emulated since there is no way for the supervisor
    to inject file descriptors into the supervisee. This patch adds a
    new addfd ioctl to remove this restriction by allowing the supervisor to
    install file descriptors into the intercepted task. By implementing this
    feature via seccomp the supervisor effectively instructs the supervisee
    to install a set of file descriptors into its own file descriptor table
    during the intercepted syscall. This way it is possible to intercept
    syscalls such as open(2) or accept(2), and install (or replace, as with
    dup2(2)) the supervisor's resulting fd into the supervisee. One
    replacement use-case would be to redirect the stdout and stderr of a
    supervisee into log file descriptors opened by the supervisor.
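
    The install-or-replace behaviour described above can be sketched from
    the supervisor's side as follows. This is a minimal sketch, not the
    patch's own test code: it assumes the uapi names as they appear in
    <linux/seccomp.h> on kernels that carry this feature (struct
    seccomp_notif_addfd, SECCOMP_IOCTL_NOTIF_ADDFD, SECCOMP_ADDFD_FLAG_SETFD),
    and the helper name supervisor_addfd() and its target_fd convention are
    hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/seccomp.h>

/*
 * Ask the kernel to install the supervisor's `srcfd` into the task that is
 * blocked in the notification identified by `notif_id`.
 *
 * target_fd <  0: let the kernel pick a free fd number in the supervisee.
 * target_fd >= 0: replace that fd number in the supervisee, like dup2(2).
 *
 * Returns the fd number allocated in the supervisee, or -1 with errno set.
 * (Hypothetical helper; only the struct, flag and ioctl names are uapi.)
 */
static int supervisor_addfd(int notify_fd, uint64_t notif_id,
			    int srcfd, int target_fd)
{
	struct seccomp_notif_addfd addfd;

	/* Zero everything so unused fields do not carry stack garbage. */
	memset(&addfd, 0, sizeof(addfd));
	addfd.id = notif_id;
	addfd.srcfd = (uint32_t)srcfd;

	if (target_fd >= 0) {
		/* Replacement mode: newfd is only honoured with SETFD. */
		addfd.flags = SECCOMP_ADDFD_FLAG_SETFD;
		addfd.newfd = (uint32_t)target_fd;
	}

	/* newfd_flags is left at 0 here; it can carry O_CLOEXEC. */
	return ioctl(notify_fd, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd);
}
```

    To emulate open(2), a supervisor would typically open the file itself,
    call the helper with target_fd set to -1, and then report the returned
    fd number as the syscall's result in its notification response. For the
    log redirection use-case, it would call the helper with target_fd set
    to 1 or 2.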

    The ioctl handling is based on the discussions[1] of how Extensible
    Arguments should interact with ioctls. Instead of building size into
    the addfd structure, make it a function of the ioctl command (which
    is how sizes are normally passed to ioctls). To support forward and
    backward compatibility, mask out the direction and size bits of the
    command and match on the remaining type and number. The size (and any
    future direction) checks are done along with the
    copy_struct_from_user() logic.
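
    The masking step can be illustrated in userspace with the generic ioctl
    encoding macros from <linux/ioctl.h>. This is a sketch under stated
    assumptions, not the kernel's dispatch code: the demo_* structs, the
    magic 'D', command number 3 and demo_dispatch() are all hypothetical.
    It shows two revisions of an argument struct producing different
    command values that still match the same case label once direction and
    size are masked out, while the caller's size stays recoverable from the
    command.

```c
#include <stdint.h>
#include <linux/ioctl.h>

/* Hypothetical argument struct and a later, larger revision of it. */
struct demo_args_v0 {
	uint64_t id;
	uint32_t flags;
	uint32_t srcfd;
};				/* 16 bytes */

struct demo_args_v1 {
	uint64_t id;
	uint32_t flags;
	uint32_t srcfd;
	uint64_t extra;
};				/* 24 bytes */

#define DEMO_IOC_MAGIC	'D'	/* hypothetical ioctl type */
#define DEMO_IOCTL_V0	_IOW(DEMO_IOC_MAGIC, 3, struct demo_args_v0)
#define DEMO_IOCTL_V1	_IOW(DEMO_IOC_MAGIC, 3, struct demo_args_v1)

/* Strip the direction and size bits so only type and number remain. */
#define DEMO_EA_IOCTL(cmd)	((cmd) & ~(IOC_INOUT | IOCSIZE_MASK))

/*
 * Returns the struct size the caller encoded in `cmd` if it is any
 * revision of the demo ioctl, or -1 if the command is unknown.
 */
static long demo_dispatch(unsigned long cmd)
{
	switch (DEMO_EA_IOCTL(cmd)) {
	case DEMO_EA_IOCTL(DEMO_IOCTL_V0):
		/* The size travels in the command, not in the struct. */
		return (long)_IOC_SIZE(cmd);
	default:
		return -1;
	}
}
```

    A real handler would pass the recovered size on to the copy-in step,
    which is where the size check described above takes place.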

    As a note, the seccomp_notif_addfd structure is laid out based on 8-byte
    alignment without requiring packing as there have been packing issues
    with uapi highlighted before[2][3]. Although we could overload the
    newfd field and use -1 to indicate that it is not to be used, doing
    so requires changing the size of the fd field, and introduces struct
    packing complexity.
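
    The layout claim can be checked with a local mirror of the structure.
    The sketch below assumes the field list id, flags, srcfd, newfd,
    newfd_flags as found in the uapi header on kernels with this feature;
    the commit text itself only names the newfd field, so treat the rest
    as an assumption and compare against <linux/seccomp.h>. The names
    addfd_layout and addfd_layout_padding() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror for illustration only; not the uapi definition itself. */
struct addfd_layout {
	uint64_t id;		/* offset  0 */
	uint32_t flags;		/* offset  8 */
	uint32_t srcfd;		/* offset 12 */
	uint32_t newfd;		/* offset 16 */
	uint32_t newfd_flags;	/* offset 20 */
};

/* One 64-bit field followed by four 32-bit fields fills 24 bytes. */
_Static_assert(sizeof(struct addfd_layout) == 24,
	       "no padding expected without a packed attribute");
_Static_assert(sizeof(struct addfd_layout) % 8 == 0,
	       "size is a multiple of the 8-byte alignment");
_Static_assert(offsetof(struct addfd_layout, newfd_flags) == 20,
	       "last field ends exactly at the struct boundary");

/* Bytes of compiler-inserted padding: struct size minus the field sizes. */
static size_t addfd_layout_padding(void)
{
	return sizeof(struct addfd_layout) -
	       (sizeof(uint64_t) + 4 * sizeof(uint32_t));
}
```

    Keeping newfd as a separate 32-bit field paired with a flag bit, rather
    than overloading it with -1, is what lets the fields tile the struct
    this way.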

    Cc: Christoph Hellwig <>
    Cc: Christian Brauner <>
    Cc: Tycho Andersen <>
    Cc: Jann Horn <>
    Cc: Robert Sesek <>
    Cc: Chris Palmer <>
    Cc: Al Viro <>
    Suggested-by: Matt Denton <>
    Signed-off-by: Sargun Dhillon <>
    Reviewed-by: Will Drewry <>
    Co-developed-by: Kees Cook <>
    Signed-off-by: Kees Cook <>