Skip to content
  • Christian Brauner's avatar
    clone: add CLONE_PIDFD · b3e58382
    Christian Brauner authored
    
    
    This patchset makes it possible to retrieve pid file descriptors at
    process creation time by introducing the new flag CLONE_PIDFD to the
    clone() system call.  Linus originally suggested to implement this as a
    new flag to clone() instead of making it a separate system call.  As
    spotted by Linus, there is exactly one bit for clone() left.
    
    CLONE_PIDFD creates file descriptors based on the anonymous inode
    implementation in the kernel that will also be used to implement the new
    mount api.  They serve as a simple opaque handle on pids.  Logically,
    this makes it possible to interpret a pidfd differently, narrowing or
    widening the scope of various operations (e.g. signal sending).  Thus, a
    pidfd cannot just refer to a tgid, but also a tid, or in theory - given
    appropriate flag arguments in relevant syscalls - a process group or
    session. A pidfd does not represent a privilege.  This does not imply it
    cannot ever be that way but for now this is not the case.
    
    A pidfd comes with additional information in fdinfo if the kernel supports
    procfs.  The fdinfo file contains the pid of the process in the callers
    pid namespace in the same format as the procfs status file, i.e. "Pid:\t%d".
    
    As suggested by Oleg, with CLONE_PIDFD the pidfd is returned in the
    parent_tidptr argument of clone.  This has the advantage that we can
    give back the associated pid and the pidfd at the same time.
    
    To remove worries about missing metadata access this patchset comes with
    a sample program that illustrates how a combination of CLONE_PIDFD, and
    pidfd_send_signal() can be used to gain race-free access to process
    metadata through /proc/<pid>.  The sample program can easily be
    translated into a helper that would be suitable for inclusion in libc so
    that users don't have to worry about writing it themselves.
    
    Suggested-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: default avatarChristian Brauner <christian@brauner.io>
    Co-developed-by: default avatarJann Horn <jannh@google.com>
    Signed-off-by: default avatarJann Horn <jannh@google.com>
    Reviewed-by: default avatarOleg Nesterov <oleg@redhat.com>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: "Eric W. Biederman" <ebiederm@xmission.com>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: David Howells <dhowells@redhat.com>
    Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
    Cc: Andy Lutomirsky <luto@kernel.org>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Aleksa Sarai <cyphar@cyphar.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    b3e58382