1. 14 Jun, 2018 1 commit
  2. 12 Jun, 2018 1 commit
    • Kees Cook's avatar
      treewide: kmalloc() -> kmalloc_array() · 6da2ec56
      Kees Cook authored
      The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
      patch replaces cases of:
              kmalloc(a * b, gfp)
              kmalloc_array(a * b, gfp)
      as well as handling cases of:
              kmalloc(a * b * c, gfp)
              kmalloc(array3_size(a, b, c), gfp)
      as it's slightly less ugly than:
              kmalloc_array(array_size(a, b), c, gfp)
      This does, however, attempt to ignore constant size factors like:
              kmalloc(4 * 1024, gfp)
      though any constants defined via macros get caught up in the conversion.
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
      The tools/ directory was manually excluded, since it has its own
      implementation of kmalloc().
      The Coccinelle script used for this was:
      // Fix redundant parens around sizeof().
      type TYPE;
      expression THING, E;
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
  3. 14 May, 2018 1 commit
  4. 01 Feb, 2018 1 commit
  5. 29 Jan, 2018 1 commit
  6. 15 Jan, 2018 1 commit
    • David Windsor's avatar
      exofs: Define usercopy region in exofs_inode_cache slab cache · 2b06a9e3
      David Windsor authored
      The exofs short symlink names, stored in struct exofs_i_info.i_data and
      therefore contained in the exofs_inode_cache slab cache, need to be copied
      to/from userspace.
      cache object allocation:
                  oi = kmem_cache_alloc(exofs_inode_cachep, GFP_KERNEL);
                  return &oi->vfs_inode;
                  inode->i_link = (char *)oi->i_data;
      example usage trace:
              readlink_copy(..., link):
                  copy_to_user(..., link, len);
              (inlined in vfs_readlink)
              generic_readlink(dentry, ...):
                  struct inode *inode = d_inode(dentry);
                  const char *link = inode->i_link;
                  readlink_copy(..., link);
      In support of usercopy hardening, this patch defines a region in the
      exofs_inode_cache slab cache in which userspace copy operations are
      This region is known as the slab cache's usercopy region. Slab caches
      can now check that each dynamically sized copy operation involving
      cache-managed memory falls entirely within the slab's usercopy region.
      This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
      whitelisting code in the last public patch of grsecurity/PaX based on my
      understanding of the code. Changes or omissions from the original code are
      mine and don't reflect the original grsecurity/PaX code.
      Signed-off-by: default avatarDavid Windsor <dave@nullcore.net>
      [kees: adjust commit log, provide usage trace]
      Cc: Boaz Harrosh <ooo@electrozaur.com>
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
  7. 23 Aug, 2017 1 commit
    • Christoph Hellwig's avatar
      block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Christoph Hellwig authored
      This way we don't need a block_device structure to submit I/O.  The
      block_device has different life time rules from the gendisk and
      request_queue and is usually only available when the block device node
      is open.  Other callers need to explicitly create one (e.g. the lightnvm
      passthrough code, or the new nvme multipathing code).
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
      Note that all the block drivers generally want request_queue or
      sometimes the gendisk, so this removes a layer of indirection all
      over the stack.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
  8. 01 Aug, 2017 1 commit
    • Jeff Layton's avatar
      fs: convert a pile of fsync routines to errseq_t based reporting · 3b49c9a1
      Jeff Layton authored
      This patch converts most of the in-kernel filesystems that do writeback
      out of the pagecache to report errors using the errseq_t-based
      infrastructure that was recently added. This allows them to report
      errors once for each open file description.
      Most filesystems have a fairly straightforward fsync operation. They
      call filemap_write_and_wait_range to write back all of the data and
      wait on it, and then (sometimes) sync out the metadata.
      For those filesystems this is a straightforward conversion from calling
      filemap_write_and_wait_range in their fsync operation to calling
      Acked-by: default avatarJan Kara <jack@suse.cz>
      Acked-by: default avatarDave Kleikamp <dave.kleikamp@oracle.com>
      Signed-off-by: default avatarJeff Layton <jlayton@redhat.com>
  9. 05 Jul, 2017 1 commit
  10. 09 May, 2017 1 commit
  11. 20 Apr, 2017 1 commit
  12. 14 Jan, 2017 1 commit
    • Peter Zijlstra's avatar
      locking/atomic, kref: Add kref_read() · 2c935bc5
      Peter Zijlstra authored
      Since we need to change the implementation, stop exposing internals.
      Provide kref_read() to read the current reference count; typically
      used for debug messages.
      Kills two anti-patterns:
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
  13. 10 Dec, 2016 1 commit
  14. 28 Oct, 2016 1 commit
  15. 11 Oct, 2016 1 commit
  16. 28 Sep, 2016 1 commit
  17. 27 Sep, 2016 2 commits
  18. 22 Sep, 2016 1 commit
  19. 07 Jun, 2016 1 commit
  20. 09 May, 2016 1 commit
  21. 02 May, 2016 1 commit
    • Al Viro's avatar
      make ext2_get_page() and friends work without external serialization · be5b82db
      Al Viro authored
      Right now ext2_get_page() (and its analogues in a bunch of other filesystems)
      relies upon the directory being locked - the way it sets and tests Checked and
      Error bits would be racy without that.  Switch to a slightly different scheme,
      _not_ setting Checked in case of failure.  That way the logics becomes
      	if Checked => OK
      	else if Error => fail
      	else if !validate => fail
      	else => OK
      with validation setting Checked or Error on success and failure resp. and
      returning which one had happened.  Equivalent to the current logics, but unlike
      the current logics not sensitive to the order of set_bit, test_bit getting
      reordered by CPU, etc.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
  22. 01 May, 2016 1 commit
  23. 18 Apr, 2016 1 commit
  24. 10 Apr, 2016 1 commit
  25. 04 Apr, 2016 1 commit
    • Kirill A. Shutemov's avatar
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov authored
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      This promise never materialized.  And unlikely will.
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      Let's stop pretending that pages in page cache are special.  They are
      The changes are pretty straight-forward:
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
       - page_cache_get() -> get_page();
       - page_cache_release() -> put_page();
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      virtual patch
      expression E;
      + E
      expression E;
      + E
      + PAGE_SHIFT
      + PAGE_SIZE
      + PAGE_MASK
      expression E;
      + PAGE_ALIGN(E)
      expression E;
      - page_cache_get(E)
      + get_page(E)
      expression E;
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  26. 22 Jan, 2016 1 commit
    • Al Viro's avatar
      wrappers for ->i_mutex access · 5955102c
      Al Viro authored
      parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      inode_foo(inode) being mutex_foo(&inode->i_mutex).
      Please, use those for access to ->i_mutex; over the coming cycle
      ->i_mutex will become rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
  27. 15 Jan, 2016 1 commit
    • Vladimir Davydov's avatar
      kmemcg: account certain kmem allocations to memcg · 5d097056
      Vladimir Davydov authored
      Mark those kmem allocations that are known to be easily triggered from
      userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
      memcg.  For the list, see below:
       - threadinfo
       - task_struct
       - task_delay_info
       - pid
       - cred
       - mm_struct
       - vm_area_struct and vm_region (nommu)
       - anon_vma and anon_vma_chain
       - signal_struct
       - sighand_struct
       - fs_struct
       - files_struct
       - fdtable and fdtable->full_fds_bits
       - dentry and external_name
       - inode for all filesystems. This is the most tedious part, because
         most filesystems overwrite the alloc_inode method.
      The list is far from complete, so feel free to add more objects.
      Nevertheless, it should be close to "account everything" approach and
      keep most workloads within bounds.  Malevolent users will be able to
      breach the limit, but this was possible even with the former "account
      everything" approach (simply because it did not account everything in
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: default avatarVladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  28. 12 Dec, 2015 1 commit
    • Hugh Dickins's avatar
      osd fs: __r4w_get_page rely on PageUptodate for uptodate · 3066a967
      Hugh Dickins authored
      Commit 42cb14b1
       ("mm: migrate dirty page without
      clear_page_dirty_for_io etc") simplified the migration of a PageDirty
      pagecache page: one stat needs moving from zone to zone and that's about
      It's convenient and safest for it to shift the PageDirty bit from old
      page to new, just before updating the zone stats: before copying data
      and marking the new PageUptodate.  This is all done while both pages are
      isolated and locked, just as before; and just as before, there's a
      moment when the new page is visible in the radix_tree, but not yet
      PageUptodate.  What's new is that it may now be briefly visible as
      PageDirty before it is PageUptodate.
      When I scoured the tree to see if this could cause a problem anywhere,
      the only places I found were in two similar functions __r4w_get_page():
      which look up a page with find_get_page() (not using page lock), then
      claim it's uptodate if it's PageDirty or PageWriteback or PageUptodate.
      I'm not sure whether that was right before, but now it might be wrong
      (on rare occasions): only claim the page is uptodate if PageUptodate.
      Or perhaps the page in question could never be migratable anyway?
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Tested-by: default avatarBoaz Harrosh <ooo@electrozaur.com>
      Cc: Benny Halevy <bhalevy@panasas.com>
      Cc: Trond Myklebust <trond.myklebust@primarydata.com>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  29. 09 Dec, 2015 1 commit
    • Al Viro's avatar
      don't put symlink bodies in pagecache into highmem · 21fc61c7
      Al Viro authored
      kmap() in page_follow_link_light() needed to go - allowing to hold
      an arbitrary number of kmaps for long is a great way to deadlocking
      the system.
      new helper (inode_nohighmem(inode)) needs to be used for pagecache
      symlinks inodes; done for all in-tree cases.  page_follow_link_light()
      instrumented to yell about anything missed.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
  30. 09 Nov, 2015 1 commit
  31. 23 Jun, 2015 1 commit
  32. 11 May, 2015 1 commit
  33. 15 Apr, 2015 1 commit
  34. 12 Apr, 2015 2 commits
  35. 17 Feb, 2015 1 commit
    • Matthew Wilcox's avatar
      vfs: remove get_xip_mem · e748dcd0
      Matthew Wilcox authored
      All callers of get_xip_mem() are now gone.  Remove checks for it,
      initialisers of it, documentation of it and the only implementation of it.
       Also remove mm/filemap_xip.c as it is now empty.  Also remove
      documentation of the long-gone get_xip_page().
      Signed-off-by: default avatarMatthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Andreas Dilger <andreas.dilger@intel.com>
      Cc: Boaz Harrosh <boaz@plexistor.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  36. 20 Jan, 2015 2 commits
  37. 19 Oct, 2014 1 commit