Skip to content
  • Rik van Riel's avatar
    mm,fork: introduce MADV_WIPEONFORK · d2cd9ede
    Rik van Riel authored
    Introduce MADV_WIPEONFORK semantics, which result in a VMA being empty
    in the child process after fork.  This differs from MADV_DONTFORK in one
    important way.
    
    If a child process accesses memory that was MADV_WIPEONFORK, it will get
    zeroes.  The address ranges are still valid, they are just empty.
    
    If a child process accesses memory that was MADV_DONTFORK, it will get a
    segmentation fault, since those address ranges are no longer valid in
    the child after fork.
    
    Since MADV_DONTFORK also seems to be used to allow very large programs
    to fork in systems with strict memory overcommit restrictions, changing
    the semantics of MADV_DONTFORK might break existing programs.
    
    MADV_WIPEONFORK only works on private, anonymous VMAs.
    
    The use case is libraries that store or cache information, and want to
    know that they need to regenerate it in the child process after fork.
    
    Examples of this would be:
     - systemd/pulseaudio API checks (fail after fork) (replacing a getpid
       check, which is too slow without a PID cache)
     - PKCS#11 API reinitialization check (mandated by specification)
     - glibc's upcoming PRNG (reseed after fork)
     - OpenSSL PRNG (reseed after fork)
    
    The security benefits of a forking server having a re-inialized PRNG in
    every child process are pretty obvious.  However, due to libraries
    having all kinds of internal state, and programs getting compiled with
    many different versions of each library, it is unreasonable to expect
    calling programs to re-initialize everything manually after fork.
    
    A further complication is the proliferation of clone flags, programs
    bypassing glibc's functions to call clone directly, and programs calling
    unshare, causing the glibc pthread_atfork hook to not get called.
    
    It would be better to have the kernel take care of this automatically.
    
    The patch also adds MADV_KEEPONFORK, to undo the effects of a prior
    MADV_WIPEONFORK.
    
    This is similar to the OpenBSD minherit syscall with MAP_INHERIT_ZERO:
    
        https://man.openbsd.org/minherit.2
    
    [akpm@linux-foundation.org: numerically order arch/parisc/include/uapi/asm/mman.h #defines]
    Link: http://lkml.kernel.org/r/20170811212829.29186-3-riel@redhat.com
    
    
    Signed-off-by: default avatarRik van Riel <riel@redhat.com>
    Reported-by: default avatarFlorian Weimer <fweimer@redhat.com>
    Reported-by: default avatarColm MacCártaigh <colm@allcosts.net>
    Reviewed-by: default avatarMike Kravetz <mike.kravetz@oracle.com>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Andy Lutomirski <luto@amacapital.net>
    Cc: Dave Hansen <dave.hansen@intel.com>
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: Helge Deller <deller@gmx.de>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Will Drewry <wad@chromium.org>
    Cc: <linux-api@vger.kernel.org>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    d2cd9ede