• David Windsor's avatar
    usercopy: Prepare for usercopy whitelisting · 8eb8284b
    David Windsor authored
    
    
    This patch prepares the slab allocator to handle caches having annotations
    (useroffset and usersize) defining usercopy regions.
    
    This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
    whitelisting code in the last public patch of grsecurity/PaX based on
    my understanding of the code. Changes or omissions from the original
    code are mine and don't reflect the original grsecurity/PaX code.
    
    Currently, hardened usercopy performs dynamic bounds checking on slab
    cache objects. This is good, but still leaves a lot of kernel memory
    available to be copied to/from userspace in the face of bugs. To further
    restrict what memory is available for copying, this creates a way to
    whitelist specific areas of a given slab cache object for copying to/from
    userspace, allowing much finer granularity of access control. Slab caches
    that are never exposed to userspace can declare no whitelist for their
    objects, thereby keeping them unavailable to userspace via dynamic copy
    operations. (Note, an implicit form of whitelisting is the use of constant
    sizes in usercopy operations and get_user()/put_user(); these bypass
    hardened usercopy checks since these sizes cannot change at runtime.)
    
    To support this whitelist annotation, usercopy region offset and size
    members are added to struct kmem_cache. The slab allocator receives a
    new function, kmem_cache_create_usercopy(), that creates a new cache
    with a usercopy region defined, suitable for declaring spans of fields
    within the objects that get copied to/from userspace.
    
    In this patch, the default kmem_cache_create() marks the entire allocation
    as whitelisted, leaving it semantically unchanged. Once all fine-grained
    whitelists have been added (in subsequent patches), this will be changed
    to a usersize of 0, making caches created with kmem_cache_create() not
    copyable to/from userspace.
    
    After the entire usercopy whitelist series is applied, less than 15%
    of the slab cache memory remains exposed to potential usercopy bugs
    after a fresh boot:
    
    Total Slab Memory:           48074720
    Usercopyable Memory:          6367532  13.2%
             task_struct                    0.2%         4480/1630720
             RAW                            0.3%            300/96000
             RAWv6                          2.1%           1408/64768
             ext4_inode_cache               3.0%       269760/8740224
             dentry                        11.1%       585984/5273856
             mm_struct                     29.1%         54912/188448
             kmalloc-8                    100.0%          24576/24576
             kmalloc-16                   100.0%          28672/28672
             kmalloc-32                   100.0%          81920/81920
             kmalloc-192                  100.0%          96768/96768
             kmalloc-128                  100.0%        143360/143360
             names_cache                  100.0%        163840/163840
             kmalloc-64                   100.0%        167936/167936
             kmalloc-256                  100.0%        339968/339968
             kmalloc-512                  100.0%        350720/350720
             kmalloc-96                   100.0%        455616/455616
             kmalloc-8192                 100.0%        655360/655360
             kmalloc-1024                 100.0%        812032/812032
             kmalloc-4096                 100.0%        819200/819200
             kmalloc-2048                 100.0%      1310720/1310720
    
    After some kernel build workloads, the percentage (mainly driven by
    dentry and inode caches expanding) drops under 10%:
    
    Total Slab Memory:           95516184
    Usercopyable Memory:          8497452   8.8%
             task_struct                    0.2%         4000/1456000
             RAW                            0.3%            300/96000
             RAWv6                          2.1%           1408/64768
             ext4_inode_cache               3.0%     1217280/39439872
             dentry                        11.1%     1623200/14608800
             mm_struct                     29.1%         73216/251264
             kmalloc-8                    100.0%          24576/24576
             kmalloc-16                   100.0%          28672/28672
             kmalloc-32                   100.0%          94208/94208
             kmalloc-192                  100.0%          96768/96768
             kmalloc-128                  100.0%        143360/143360
             names_cache                  100.0%        163840/163840
             kmalloc-64                   100.0%        245760/245760
             kmalloc-256                  100.0%        339968/339968
             kmalloc-512                  100.0%        350720/350720
             kmalloc-96                   100.0%        563520/563520
             kmalloc-8192                 100.0%        655360/655360
             kmalloc-1024                 100.0%        794624/794624
             kmalloc-4096                 100.0%        819200/819200
             kmalloc-2048                 100.0%      1257472/1257472
    
    Signed-off-by: default avatarDavid Windsor <dave@nullcore.net>
    [kees: adjust commit log, split out a few extra kmalloc hunks]
    [kees: add field names to function declarations]
    [kees: convert BUGs to WARNs and fail closed]
    [kees: add attack surface reduction analysis to commit log]
    Cc: Pekka Enberg <penberg@kernel.org>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: linux-mm@kvack.org
    Cc: linux-xfs@vger.kernel.org
    Signed-off-by: default avatarKees Cook <keescook@chromium.org>
    Acked-by: default avatarChristoph Lameter <cl@linux.com>
    8eb8284b