1. 15 Mar, 2016 24 commits
    • Joonsoo Kim's avatar
      mm/slab: avoid returning values by reference · 70f75067
      Joonsoo Kim authored
      
      
      Returing values by reference is bad practice.  Instead, just use
      function return value.
      
      Signed-off-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Suggested-by: default avatarChristoph Lameter <cl@linux.com>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      70f75067
    • Joonsoo Kim's avatar
      mm/slab: introduce new slab management type, OBJFREELIST_SLAB · b03a017b
      Joonsoo Kim authored
      
      
      SLAB needs an array to manage freed objects in a slab.  It is only used
      if some objects are freed so we can use free object itself as this
      array.  This requires additional branch in somewhat critical lock path
      to check if it is first freed object or not but that's all we need.
      Benefits is that we can save extra memory usage and reduce some
      computational overhead by allocating a management array when new slab is
      created.
      
      Code change is rather complex than what we can expect from the idea, in
      order to handle debugging feature efficiently.  If you want to see core
      idea only, please remove '#if DEBUG' block in the patch.
      
      Although this idea can apply to all caches whose size is larger than
      management array size, it isn't applied to caches which have a
      constructor.  If such cache's object is used for management array,
      constructor should be called for it before that object is returned to
      user.  I guess that overhead overwhelm benefit in that case so this idea
      doesn't applied to them at least now.
      
      For summary, from now on, slab management type is determined by
      following logic.
      
      1) if management array size is smaller than object size and no ctor, it
         becomes OBJFREELIST_SLAB.
      
      2) if management array size is smaller than leftover, it becomes
         NORMAL_SLAB which uses leftover as a array.
      
      3) if OFF_SLAB help to save memory than way 4), it becomes OFF_SLAB.
         It allocate a management array from the other cache so memory waste
         happens.
      
      4) others become NORMAL_SLAB.  It uses dedicated internal memory in a
         slab as a management array so it causes memory waste.
      
      In my system, without enabling CONFIG_DEBUG_SLAB, Almost caches become
      OBJFREELIST_SLAB and NORMAL_SLAB (using leftover) which doesn't waste
      memory.  Following is the result of number of caches with specific slab
      management type.
      
      TOTAL = OBJFREELIST + NORMAL(leftover) + NORMAL + OFF
      
      /Before/
      126 = 0 + 60 + 25 + 41
      
      /After/
      126 = 97 + 12 + 15 + 2
      
      Result shows that number of caches that doesn't waste memory increase
      from 60 to 109.
      
      I did some benchmarking and it looks that benefit are more than loss.
      
      Kmalloc: Repeatedly allocate then free test
      
      /Before/
      [    0.286809] 1. Kmalloc: Repeatedly allocate then free test
      [    1.143674] 100000 times kmalloc(32) -> 116 cycles kfree -> 78 cycles
      [    1.441726] 100000 times kmalloc(64) -> 121 cycles kfree -> 80 cycles
      [    1.815734] 100000 times kmalloc(128) -> 168 cycles kfree -> 85 cycles
      [    2.380709] 100000 times kmalloc(256) -> 287 cycles kfree -> 95 cycles
      [    3.101153] 100000 times kmalloc(512) -> 370 cycles kfree -> 117 cycles
      [    3.942432] 100000 times kmalloc(1024) -> 413 cycles kfree -> 156 cycles
      [    5.227396] 100000 times kmalloc(2048) -> 622 cycles kfree -> 248 cycles
      [    7.519793] 100000 times kmalloc(4096) -> 1102 cycles kfree -> 452 cycles
      
      /After/
      [    1.205313] 100000 times kmalloc(32) -> 117 cycles kfree -> 78 cycles
      [    1.510526] 100000 times kmalloc(64) -> 124 cycles kfree -> 81 cycles
      [    1.827382] 100000 times kmalloc(128) -> 130 cycles kfree -> 84 cycles
      [    2.226073] 100000 times kmalloc(256) -> 177 cycles kfree -> 92 cycles
      [    2.814747] 100000 times kmalloc(512) -> 286 cycles kfree -> 112 cycles
      [    3.532952] 100000 times kmalloc(1024) -> 344 cycles kfree -> 141 cycles
      [    4.608777] 100000 times kmalloc(2048) -> 519 cycles kfree -> 210 cycles
      [    6.350105] 100000 times kmalloc(4096) -> 789 cycles kfree -> 391 cycles
      
      In fact, I tested another idea implementing OBJFREELIST_SLAB with
      extendable linked array through another freed object.  It can remove
      memory waste completely but it causes more computational overhead in
      critical lock path and it seems that overhead outweigh benefit.  So, this
      patch doesn't include it.
      
      Signed-off-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b03a017b
    • Joonsoo Kim's avatar
      mm/slab: factor out debugging initialization in cache_init_objs() · 10b2e9e8
      Joonsoo Kim authored
      
      
      cache_init_objs() will be changed in following patch and current form
      doesn't fit well for that change.  So, before doing it, this patch
      separates debugging initialization.  This would cause two loop iteration
      when debugging is enabled, but, this overhead seems too light than debug
      feature itself so effect may not be visible.  This patch will greatly
      simplify changes in cache_init_objs() in following patch.
      
      Signed-off-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      10b2e9e8
    • Joonsoo Kim's avatar
      mm/slab: factor out slab list fixup code · d8410234
      Joonsoo Kim authored
      
      
      Slab list should be fixed up after object is detached from the slab and
      this happens at two places.  They do exactly same thing.  They will be
      changed in the following patch, so, to reduce code duplication, this
      patch factor out them and make it common function.
      
      Signed-off-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d8410234
    • Joonsoo Kim's avatar
      mm/slab: make criteria for off slab determination robust and simple · 3217fd9b
      Joonsoo Kim authored
      
      
      To become an off slab, there are some constraints to avoid bootstrapping
      problem and recursive call.  This can be avoided differently by simply
      checking that corresponding kmalloc cache is ready and it's not a off
      slab.  It would be more robust because static size checking can be
      affected by cache size change or architecture type but dynamic checking
      isn't.
      
      One check 'freelist_cache->size > cachep->size / 2' is added to check
      benefit of choosing off slab, because, now, there is no size constraint
      which ensures enough advantage when selecting off slab.
      
      Signed-off-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3217fd9b
    • Joonsoo Kim's avatar
      mm/slab: do not change cache size if debug pagealloc isn't possible · f3a3c320
      Joonsoo Kim authored
      
      
      We can fail to setup off slab in some conditions.  Even in this case,
      debug pagealloc increases cache size to PAGE_SIZE in advance and it is
      waste because debug pagealloc cannot work for it when it isn't the off
      slab.  To improve this situation, this patch checks first that this
      cache with increased size is suitable for off slab.  It actually
      increases cache size when it is suitable for off-slab, so possible waste
      is removed.
      
      Signed-off-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f3a3c320
    • Joonsoo Kim's avatar
      mm/slab: clean up cache type determination · 158e319b
      Joonsoo Kim authored
      
      
      Current cache type determination code is open-code and looks not
      understandable.  Following patch will introduce one more cache type and
      it would make code more complex.  So, before it happens, this patch
      abstracts these codes.
      
      Signed-off-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      158e319b
    • Joonsoo Kim's avatar
      mm/slab: align cache size first before determination of OFF_SLAB candidate · 832a15d2
      Joonsoo Kim authored
      
      
      Finding suitable OFF_SLAB candidate is more related to aligned cache
      size rather than original size.  Same reasoning can be applied to the
      debug pagealloc candidate.  So, this patch moves up alignment fixup to
      proper position.  From that point, size is aligned so we can remove some
      alignment fixups.
      
      Signed-off-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      832a15d2
    • Joonsoo Kim's avatar
      mm/slab: put the freelist at the end of slab page · 2e6b3602
      Joonsoo Kim authored
      
      
      Currently, the freelist is at the front of slab page.  This requires
      extra space to meet object alignment requirement.  If we put the
      freelist at the end of a slab page, objects could start at page boundary
      and will be at correct alignment.  This is possible because freelist has
      no alignment constraint itself.
      
      This gives us two benefits: It removes extra memory space for the
      freelist alignment and remove complex calculation at cache
      initialization step.  I can't think notable drawback here.
      
      I mentioned that this would reduce extra memory space, but, this benefit
      is rather theoretical because it can be applied to very few cases.
      Following is the example cache type that can get benefit from this
      change.
      
        size align num before after
          32    8  124  4100  4092
          64    8   63  4103  4095
          88    8   46  4102  4094
         272    8   15  4103  4095
         408    8   10  4098  4090
          32   16  124  4108  4092
          64   16   63  4111  4095
          32   32  124  4124  4092
          64   32   63  4127  4095
          96   32   42  4106  4074
      
      before means whole size for objects and aligned freelist before applying
      patch and after shows the result of this patch.
      
      Since before is more than 4096, number of object should decrease and
      memory waste happens.
      
      Anyway, this patch removes complex calculation so looks beneficial to
      me.
      
      [akpm@linux-foundation.org: fix kerneldoc]
      Signed-off-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2e6b3602
    • Joonsoo Kim's avatar
      mm/slab: remove object status buffer for DEBUG_SLAB_LEAK · 249247b6
      Joonsoo Kim authored
      
      
      Now, we don't use object status buffer in any setup. Remove it.
      
      Signed-off-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      249247b6
    • Joonsoo Kim's avatar
      mm/slab: alternative implementation for DEBUG_SLAB_LEAK · d31676df
      Joonsoo Kim authored
      
      
      DEBUG_SLAB_LEAK is a debug option.  It's current implementation requires
      status buffer so we need more memory to use it.  And, it cause
      kmem_cache initialization step more complex.
      
      To remove this extra memory usage and to simplify initialization step,
      this patch implement this feature with another way.
      
      When user requests to get slab object owner information, it marks that
      getting information is started.  And then, all free objects in caches
      are flushed to corresponding slab page.  Now, we can distinguish all
      freed object so we can know all allocated objects, too.  After
      collecting slab object owner information on allocated objects, mark is
      checked that there is no free during the processing.  If true, we can be
      sure that our information is correct so information is returned to user.
      
      Although this way is rather complex, it has two important benefits
      mentioned above.  So, I think it is worth changing.
      
      There is one drawback that it takes more time to get slab object owner
      information but it is just a debug option so it doesn't matter at all.
      
      To help review, this patch implements new way only.  Following patch
      will remove useless code.
      
      Signed-off-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d31676df
    • Joonsoo Kim's avatar
      mm/slab: clean up DEBUG_PAGEALLOC processing code · 40b44137
      Joonsoo Kim authored
      
      
      Currently, open code for checking DEBUG_PAGEALLOC cache is spread to
      some sites.  It makes code unreadable and hard to change.
      
      This patch cleans up this code.  The following patch will change the
      criteria for DEBUG_PAGEALLOC cache so this clean-up will help it, too.
      
      [akpm@linux-foundation.org: fix build with CONFIG_DEBUG_PAGEALLOC=n]
      Signed-off-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      40b44137
    • Joonsoo Kim's avatar
      mm/slab: use more appropriate condition check for debug_pagealloc · 40323278
      Joonsoo Kim authored
      
      
      debug_pagealloc debugging is related to SLAB_POISON flag rather than
      FORCED_DEBUG option, although FORCED_DEBUG option will enable
      SLAB_POISON.  Fix it.
      
      Signed-off-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      40323278
    • Joonsoo Kim's avatar
      mm/slab: activate debug_pagealloc in SLAB when it is actually enabled · a307ebd4
      Joonsoo Kim authored
      
      
      Signed-off-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a307ebd4
    • Joonsoo Kim's avatar
      mm/slab: remove the checks for slab implementation bug · 260b61dd
      Joonsoo Kim authored
      
      
      Some of "#if DEBUG" are for reporting slab implementation bug rather
      than user usecase bug.  It's not really needed because slab is stable
      for a quite long time and it makes code too dirty.  This patch remove
      it.
      
      Signed-off-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      260b61dd
    • Joonsoo Kim's avatar
      mm/slab: remove useless structure define · 6fb92430
      Joonsoo Kim authored
      
      
      It is obsolete so remove it.
      
      Signed-off-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6fb92430
    • Joonsoo Kim's avatar
      mm/slab: fix stale code comment · 12c61fe9
      Joonsoo Kim authored
      
      
      This patchset implements a new freed object management way, that is,
      OBJFREELIST_SLAB.  Purpose of it is to reduce memory overhead in SLAB.
      
      SLAB needs a array to manage freed objects in a slab.  If there is
      leftover after objects are packed into a slab, we can use it as a
      management array, and, in this case, there is no memory waste.  But, in
      the other cases, we need to allocate extra memory for a management array
      or utilize dedicated internal memory in a slab for it.  Both cases
      causes memory waste so it's not good.
      
      With this patchset, freed object itself can be used for a management
      array.  So, memory waste could be reduced.  Detailed idea and numbers
      are described in last patch's commit description.  Please refer it.
      
      In fact, I tested another idea implementing OBJFREELIST_SLAB with
      extendable linked array through another freed object.  It can remove
      memory waste completely but it causes more computational overhead in
      critical lock path and it seems that overhead outweigh benefit.  So,
      this patchset doesn't include it.  I will attach prototype just for a
      reference.
      
      This patch (of 16):
      
      We use freelist_idx_t type for free object management whose size would be
      smaller than size of unsigned int.  Fix it.
      
      Signed-off-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      12c61fe9
    • Jesper Dangaard Brouer's avatar
      mm: new API kfree_bulk() for SLAB+SLUB allocators · ca257195
      Jesper Dangaard Brouer authored
      
      
      This patch introduce a new API call kfree_bulk() for bulk freeing memory
      objects not bound to a single kmem_cache.
      
      Christoph pointed out that it is possible to implement freeing of
      objects, without knowing the kmem_cache pointer as that information is
      available from the object's page->slab_cache.  Proposing to remove the
      kmem_cache argument from the bulk free API.
      
      Jesper demonstrated that these extra steps per object comes at a
      performance cost.  It is only in the case CONFIG_MEMCG_KMEM is compiled
      in and activated runtime that these steps are done anyhow.  The extra
      cost is most visible for SLAB allocator, because the SLUB allocator does
      the page lookup (virt_to_head_page()) anyhow.
      
      Thus, the conclusion was to keep the kmem_cache free bulk API with a
      kmem_cache pointer, but we can still implement a kfree_bulk() API fairly
      easily.  Simply by handling if kmem_cache_free_bulk() gets called with a
      kmem_cache NULL pointer.
      
      This does increase the code size a bit, but implementing a separate
      kfree_bulk() call would likely increase code size even more.
      
      Below benchmarks cost of alloc+free (obj size 256 bytes) on CPU i7-4790K
      @ 4.00GHz, no PREEMPT and CONFIG_MEMCG_KMEM=y.
      
      Code size increase for SLAB:
      
       add/remove: 0/0 grow/shrink: 1/0 up/down: 74/0 (74)
       function                                     old     new   delta
       kmem_cache_free_bulk                         660     734     +74
      
      SLAB fastpath: 87 cycles(tsc) 21.814
        sz - fallback             - kmem_cache_free_bulk - kfree_bulk
         1 - 103 cycles 25.878 ns -  41 cycles 10.498 ns - 81 cycles 20.312 ns
         2 -  94 cycles 23.673 ns -  26 cycles  6.682 ns - 42 cycles 10.649 ns
         3 -  92 cycles 23.181 ns -  21 cycles  5.325 ns - 39 cycles 9.950 ns
         4 -  90 cycles 22.727 ns -  18 cycles  4.673 ns - 26 cycles 6.693 ns
         8 -  89 cycles 22.270 ns -  14 cycles  3.664 ns - 23 cycles 5.835 ns
        16 -  88 cycles 22.038 ns -  14 cycles  3.503 ns - 22 cycles 5.543 ns
        30 -  89 cycles 22.284 ns -  13 cycles  3.310 ns - 20 cycles 5.197 ns
        32 -  88 cycles 22.249 ns -  13 cycles  3.420 ns - 20 cycles 5.166 ns
        34 -  88 cycles 22.224 ns -  14 cycles  3.643 ns - 20 cycles 5.170 ns
        48 -  88 cycles 22.088 ns -  14 cycles  3.507 ns - 20 cycles 5.203 ns
        64 -  88 cycles 22.063 ns -  13 cycles  3.428 ns - 20 cycles 5.152 ns
       128 -  89 cycles 22.483 ns -  15 cycles  3.891 ns - 23 cycles 5.885 ns
       158 -  89 cycles 22.381 ns -  15 cycles  3.779 ns - 22 cycles 5.548 ns
       250 -  91 cycles 22.798 ns -  16 cycles  4.152 ns - 23 cycles 5.967 ns
      
      SLAB when enabling MEMCG_KMEM runtime:
       - kmemcg fastpath: 130 cycles(tsc) 32.684 ns (step:0)
       1 - 148 cycles 37.220 ns -  66 cycles 16.622 ns - 66 cycles 16.583 ns
       2 - 141 cycles 35.510 ns -  51 cycles 12.820 ns - 58 cycles 14.625 ns
       3 - 140 cycles 35.017 ns -  37 cycles 9.326 ns - 33 cycles 8.474 ns
       4 - 137 cycles 34.507 ns -  31 cycles 7.888 ns - 33 cycles 8.300 ns
       8 - 140 cycles 35.069 ns -  25 cycles 6.461 ns - 25 cycles 6.436 ns
       16 - 138 cycles 34.542 ns -  23 cycles 5.945 ns - 22 cycles 5.670 ns
       30 - 136 cycles 34.227 ns -  22 cycles 5.502 ns - 22 cycles 5.587 ns
       32 - 136 cycles 34.253 ns -  21 cycles 5.475 ns - 21 cycles 5.324 ns
       34 - 136 cycles 34.254 ns -  21 cycles 5.448 ns - 20 cycles 5.194 ns
       48 - 136 cycles 34.075 ns -  21 cycles 5.458 ns - 21 cycles 5.367 ns
       64 - 135 cycles 33.994 ns -  21 cycles 5.350 ns - 21 cycles 5.259 ns
       128 - 137 cycles 34.446 ns -  23 cycles 5.816 ns - 22 cycles 5.688 ns
       158 - 137 cycles 34.379 ns -  22 cycles 5.727 ns - 22 cycles 5.602 ns
       250 - 138 cycles 34.755 ns -  24 cycles 6.093 ns - 23 cycles 5.986 ns
      
      Code size increase for SLUB:
       function                                     old     new   delta
       kmem_cache_free_bulk                         717     799     +82
      
      SLUB benchmark:
       SLUB fastpath: 46 cycles(tsc) 11.691 ns (step:0)
        sz - fallback             - kmem_cache_free_bulk - kfree_bulk
         1 -  61 cycles 15.486 ns -  53 cycles 13.364 ns - 57 cycles 14.464 ns
         2 -  54 cycles 13.703 ns -  32 cycles  8.110 ns - 33 cycles 8.482 ns
         3 -  53 cycles 13.272 ns -  25 cycles  6.362 ns - 27 cycles 6.947 ns
         4 -  51 cycles 12.994 ns -  24 cycles  6.087 ns - 24 cycles 6.078 ns
         8 -  50 cycles 12.576 ns -  21 cycles  5.354 ns - 22 cycles 5.513 ns
        16 -  49 cycles 12.368 ns -  20 cycles  5.054 ns - 20 cycles 5.042 ns
        30 -  49 cycles 12.273 ns -  18 cycles  4.748 ns - 19 cycles 4.758 ns
        32 -  49 cycles 12.401 ns -  19 cycles  4.821 ns - 19 cycles 4.810 ns
        34 -  98 cycles 24.519 ns -  24 cycles  6.154 ns - 24 cycles 6.157 ns
        48 -  83 cycles 20.833 ns -  21 cycles  5.446 ns - 21 cycles 5.429 ns
        64 -  75 cycles 18.891 ns -  20 cycles  5.247 ns - 20 cycles 5.238 ns
       128 -  93 cycles 23.271 ns -  27 cycles  6.856 ns - 27 cycles 6.823 ns
       158 - 102 cycles 25.581 ns -  30 cycles  7.714 ns - 30 cycles 7.695 ns
       250 - 107 cycles 26.917 ns -  38 cycles  9.514 ns - 38 cycles 9.506 ns
      
      SLUB when enabling MEMCG_KMEM runtime:
       - kmemcg fastpath: 71 cycles(tsc) 17.897 ns (step:0)
       1 - 85 cycles 21.484 ns -  78 cycles 19.569 ns - 75 cycles 18.938 ns
       2 - 81 cycles 20.363 ns -  45 cycles 11.258 ns - 44 cycles 11.076 ns
       3 - 78 cycles 19.709 ns -  33 cycles 8.354 ns - 32 cycles 8.044 ns
       4 - 77 cycles 19.430 ns -  28 cycles 7.216 ns - 28 cycles 7.003 ns
       8 - 101 cycles 25.288 ns -  23 cycles 5.849 ns - 23 cycles 5.787 ns
       16 - 76 cycles 19.148 ns -  20 cycles 5.162 ns - 20 cycles 5.081 ns
       30 - 76 cycles 19.067 ns -  19 cycles 4.868 ns - 19 cycles 4.821 ns
       32 - 76 cycles 19.052 ns -  19 cycles 4.857 ns - 19 cycles 4.815 ns
       34 - 121 cycles 30.291 ns -  25 cycles 6.333 ns - 25 cycles 6.268 ns
       48 - 108 cycles 27.111 ns -  21 cycles 5.498 ns - 21 cycles 5.458 ns
       64 - 100 cycles 25.164 ns -  20 cycles 5.242 ns - 20 cycles 5.229 ns
       128 - 155 cycles 38.976 ns -  27 cycles 6.886 ns - 27 cycles 6.892 ns
       158 - 132 cycles 33.034 ns -  30 cycles 7.711 ns - 30 cycles 7.728 ns
       250 - 130 cycles 32.612 ns -  38 cycles 9.560 ns - 38 cycles 9.549 ns
      
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ca257195
    • Jesper Dangaard Brouer's avatar
      slab: implement bulk free in SLAB allocator · e6cdb58d
      Jesper Dangaard Brouer authored
      This patch implements the free side of bulk API for the SLAB allocator
      kmem_cache_free_bulk(), and concludes the implementation of optimized
      bulk API for SLAB allocator.
      
      Benchmarked[1] cost of alloc+free (obj size 256 bytes) on CPU i7-4790K @
      4.00GHz, with no debug options, no PREEMPT and CONFIG_MEMCG_KMEM=y but
      no active user of kmemcg.
      
      SLAB single alloc+free cost: 87 cycles(tsc) 21.814 ns with this
      optimized config.
      
      bulk- Current fallback          - optimized SLAB bulk
        1 - 102 cycles(tsc) 25.747 ns - 41 cycles(tsc) 10.490 ns - improved 59.8%
        2 -  94 cycles(tsc) 23.546 ns - 26 cycles(tsc)  6.567 ns - improved 72.3%
        3 -  92 cycles(tsc) 23.127 ns - 20 cycles(tsc)  5.244 ns - improved 78.3%
        4 -  90 cycles(tsc) 22.663 ns - 18 cycles(tsc)  4.588 ns - improved 80.0%
        8 -  88 cycles(tsc) 22.242 ns - 14 cycles(tsc)  3.656 ns - improved 84.1%
       16 -  88 cycles(tsc) 22.010 ns - 13 cycles(tsc)  3.480 ns - improved 85.2%
       30 -  89 cycles(tsc) 22.305 ns - 13 cycles(tsc)  3.303 ns - improved 85.4%
       32 -  89 cycles(tsc) 22.277 ns - 13 cycles(tsc)  3.309 ns - improved 85.4%
       34 -  88 cycles(tsc) 22.246 ns - 13 cycles(tsc)  3.294 ns - improved 85.2%
       48 -  88 cycles(tsc) 22.121 ns - 13 cycles(tsc)  3.492 ns - improved 85.2%
       64 -  88 cycles(tsc) 22.052 ns - 13 cycles(tsc)  3.411 ns - improved 85.2%
      128 -  89 cycles(tsc) 22.452 ns - 15 cycles(tsc)  3.841 ns - improved 83.1%
      158 -  89 cycles(tsc) 22.403 ns - 14 cycles(tsc)  3.746 ns - improved 84.3%
      250 -  91 cycles(tsc) 22.775 ns - 16 cycles(tsc)  4.111 ns - improved 82.4%
      
      Notice it is not recommended to do very large bulk operation with
      this bulk API, because local IRQs are disabled in this period.
      
      [1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/slab_bulk_test01.c
      
      
      
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e6cdb58d
    • Jesper Dangaard Brouer's avatar
      slab: avoid running debug SLAB code with IRQs disabled for alloc_bulk · 7b0501dd
      Jesper Dangaard Brouer authored
      
      
      Move the call to cache_alloc_debugcheck_after() outside the IRQ disabled
      section in kmem_cache_alloc_bulk().
      
      When CONFIG_DEBUG_SLAB is disabled the compiler should remove this code.
      
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7b0501dd
    • Jesper Dangaard Brouer's avatar
      slab: implement bulk alloc in SLAB allocator · 2a777eac
      Jesper Dangaard Brouer authored
      
      
      This patch implements the alloc side of bulk API for the SLAB allocator.
      
      Further optimization are still possible by changing the call to
      __do_cache_alloc() into something that can return multiple objects.
      This optimization is left for later, given end results already show in
      the area of 80% speedup.
      
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2a777eac
    • Jesper Dangaard Brouer's avatar
      slab: use slab_post_alloc_hook in SLAB allocator shared with SLUB · d5e3ed66
      Jesper Dangaard Brouer authored
      
      
      Reviewers notice that the order in slab_post_alloc_hook() of
      kmemcheck_slab_alloc() and kmemleak_alloc_recursive() gets swapped
      compared to slab.c / SLAB allocator.
      
      Also notice memset now occurs before calling kmemcheck_slab_alloc() and
      kmemleak_alloc_recursive().
      
      I assume this reordering of kmemcheck, kmemleak and memset is okay
      because this is the order they are used by the SLUB allocator.
      
      This patch completes the sharing of alloc_hook's between SLUB and SLAB.
      
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d5e3ed66
    • Jesper Dangaard Brouer's avatar
      slab: use slab_pre_alloc_hook in SLAB allocator shared with SLUB · 011eceaf
      Jesper Dangaard Brouer authored
      
      
      Deduplicate code in SLAB allocator functions slab_alloc() and
      slab_alloc_node() by using the slab_pre_alloc_hook() call, which is now
      shared between SLUB and SLAB.
      
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      011eceaf
    • Jesper Dangaard Brouer's avatar
      mm: fault-inject take over bootstrap kmem_cache check · fab9963a
      Jesper Dangaard Brouer authored
      
      
      Remove the SLAB specific function slab_should_failslab(), by moving the
      check against fault-injection for the bootstrap slab, into the shared
      function should_failslab() (used by both SLAB and SLUB).
      
      This is a step towards sharing alloc_hook's between SLUB and SLAB.
      
      This bootstrap slab "kmem_cache" is used for allocating struct
      kmem_cache objects to the allocator itself.
      
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      fab9963a
  2. 19 Feb, 2016 1 commit
    • Dmitry Safonov's avatar
      mm: slab: free kmem_cache_node after destroy sysfs file · 52b4b950
      Dmitry Safonov authored
      When slub_debug alloc_calls_show is enabled we will try to track
      location and user of slab object on each online node, kmem_cache_node
      structure and cpu_cache/cpu_slub shouldn't be freed till there is the
      last reference to sysfs file.
      
      This fixes the following panic:
      
         BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
         IP:  list_locations+0x169/0x4e0
         PGD 257304067 PUD 438456067 PMD 0
         Oops: 0000 [#1] SMP
         CPU: 3 PID: 973074 Comm: cat ve: 0 Not tainted 3.10.0-229.7.2.ovz.9.30-00007-japdoll-dirty #2 9.30
         Hardware name: DEPO Computers To Be Filled By O.E.M./H67DE3, BIOS L1.60c 07/14/2011
         task: ffff88042a5dc5b0 ti: ffff88037f8d8000 task.ti: ffff88037f8d8000
         RIP: list_locations+0x169/0x4e0
         Call Trace:
           alloc_calls_show+0x1d/0x30
           slab_attr_show+0x1b/0x30
           sysfs_read_file+0x9a/0x1a0
           vfs_read+0x9c/0x170
           SyS_read+0x58/0xb0
           system_call_fastpath+0x16/0x1b
         Code: 5e 07 12 00 b9 00 04 00 00 3d 00 04 00 00 0f 4f c1 3d 00 04 00 00 89 45 b0 0f 84 c3 00 00 00 48 63 45 b0 49 8b 9c c4 f8 00 00 00 <48> 8b 43 20 48 85 c0 74 b6 48 89 df e8 46 37 44 00 48 8b 53 10
         CR2: 0000000000000020
      
      Separated __kmem_cache_release from __kmem_cache_shutdown which now
      called on slab_kmem_cache_release (after the last reference to sysfs
      file object has dropped).
      
      Reintroduced locking in free_partial as sysfs file might access cache's
      partial list after shutdowning - partial revert of the commit
      69cb8e6b ("slub: free slabs without holding locks").  Zap
      __remove_partial and use remove_partial (w/o underscores) as
      free_partial now takes list_lock which s partial revert for commit
      1e4dd946
      
       ("slub: do not assert not having lock in removing freed
      partial")
      
      Signed-off-by: default avatarDmitry Safonov <dsafonov@virtuozzo.com>
      Suggested-by: default avatarVladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: default avatarVladimir Davydov <vdavydov@virtuozzo.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      52b4b950
  3. 15 Jan, 2016 3 commits
  4. 22 Nov, 2015 1 commit
    • Jesper Dangaard Brouer's avatar
      slab/slub: adjust kmem_cache_alloc_bulk API · 865762a8
      Jesper Dangaard Brouer authored
      
      
      Adjust kmem_cache_alloc_bulk API before we have any real users.
      
      Adjust API to return type 'int' instead of previously type 'bool'.  This
      is done to allow future extension of the bulk alloc API.
      
      A future extension could be to allow SLUB to stop at a page boundary, when
      specified by a flag, and then return the number of objects.
      
      The advantage of this approach, would make it easier to make bulk alloc
      run without local IRQs disabled.  With an approach of cmpxchg "stealing"
      the entire c->freelist or page->freelist.  To avoid overshooting we would
      stop processing at a slab-page boundary.  Else we always end up returning
      some objects at the cost of another cmpxchg.
      
      To keep compatible with future users of this API linking against an older
      kernel when using the new flag, we need to return the number of allocated
      objects with this API change.
      
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      865762a8
  5. 07 Nov, 2015 2 commits
    • Kirill A. Shutemov's avatar
      slab, slub: use page->rcu_head instead of page->lru plus cast · bc4f610d
      Kirill A. Shutemov authored
      
      
      We have properly typed page->rcu_head, no need to cast page->lru.
      
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reviewed-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      bc4f610d
    • Mel Gorman's avatar
      mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep... · d0164adc
      Mel Gorman authored
      
      mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd
      
      __GFP_WAIT has been used to identify atomic context in callers that hold
      spinlocks or are in interrupts.  They are expected to be high priority and
      have access one of two watermarks lower than "min" which can be referred
      to as the "atomic reserve".  __GFP_HIGH users get access to the first
      lower watermark and can be called the "high priority reserve".
      
      Over time, callers had a requirement to not block when fallback options
      were available.  Some have abused __GFP_WAIT leading to a situation where
      an optimisitic allocation with a fallback option can access atomic
      reserves.
      
      This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
      cannot sleep and have no alternative.  High priority users continue to use
      __GFP_HIGH.  __GFP_DIRECT_RECLAIM identifies callers that can sleep and
      are willing to enter direct reclaim.  __GFP_KSWAPD_RECLAIM to identify
      callers that want to wake kswapd for background reclaim.  __GFP_WAIT is
      redefined as a caller that is willing to enter direct reclaim and wake
      kswapd for background reclaim.
      
      This patch then converts a number of sites
      
      o __GFP_ATOMIC is used by callers that are high priority and have memory
        pools for those requests. GFP_ATOMIC uses this flag.
      
      o Callers that have a limited mempool to guarantee forward progress clear
        __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
        into this category where kswapd will still be woken but atomic reserves
        are not used as there is a one-entry mempool to guarantee progress.
      
      o Callers that are checking if they are non-blocking should use the
        helper gfpflags_allow_blocking() where possible. This is because
        checking for __GFP_WAIT as was done historically now can trigger false
        positives. Some exceptions like dm-crypt.c exist where the code intent
        is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
        flag manipulations.
      
      o Callers that built their own GFP flags instead of starting with GFP_KERNEL
        and friends now also need to specify __GFP_KSWAPD_RECLAIM.
      
      The first key hazard to watch out for is callers that removed __GFP_WAIT
      and was depending on access to atomic reserves for inconspicuous reasons.
      In some cases it may be appropriate for them to use __GFP_HIGH.
      
      The second key hazard is callers that assembled their own combination of
      GFP flags instead of starting with something like GFP_KERNEL.  They may
      now wish to specify __GFP_KSWAPD_RECLAIM.  It's almost certainly harmless
      if it's missed in most cases as other activity will wake kswapd.
      
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Vitaly Wool <vitalywool@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d0164adc
  6. 06 Nov, 2015 2 commits
    • Vladimir Davydov's avatar
      memcg: unify slab and other kmem pages charging · f3ccb2c4
      Vladimir Davydov authored
      
      
      We have memcg_kmem_charge and memcg_kmem_uncharge methods for charging and
      uncharging kmem pages to memcg, but currently they are not used for
      charging slab pages (i.e.  they are only used for charging pages allocated
      with alloc_kmem_pages).  The only reason why the slab subsystem uses
      special helpers, memcg_charge_slab and memcg_uncharge_slab, is that it
      needs to charge to the memcg of kmem cache while memcg_charge_kmem charges
      to the memcg that the current task belongs to.
      
      To remove this diversity, this patch adds an extra argument to
      __memcg_kmem_charge that can be a pointer to a memcg or NULL.  If it is
      not NULL, the function tries to charge to the memcg it points to,
      otherwise it charge to the current context.  Next, it makes the slab
      subsystem use this function to charge slab pages.
      
      Since memcg_charge_kmem and memcg_uncharge_kmem helpers are now used only
      in __memcg_kmem_charge and __memcg_kmem_uncharge, they are inlined.  Since
      __memcg_kmem_charge stores a pointer to the memcg in the page struct, we
      don't need memcg_uncharge_slab anymore and can use free_kmem_pages.
      Besides, one can now detect which memcg a slab page belongs to by reading
      /proc/kpagecgroup.
      
      Note, this patch switches slab to charge-after-alloc design.  Since this
      design is already used for all other memcg charges, it should not make any
      difference.
      
      [hannes@cmpxchg.org: better to have an outer function than a magic parameter for the memcg lookup]
      Signed-off-by: default avatarVladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f3ccb2c4
    • Catalin Marinas's avatar
      mm: slab: only move management objects off-slab for sizes larger than KMALLOC_MIN_SIZE · d4322d88
      Catalin Marinas authored
      On systems with a KMALLOC_MIN_SIZE of 128 (arm64, some mips and powerpc
      configurations defining ARCH_DMA_MINALIGN to 128), the first
      kmalloc_caches[] entry to be initialised after slab_early_init = 0 is
      "kmalloc-128" with index 7.  Depending on the debug kernel configuration,
      sizeof(struct kmem_cache) can be larger than 128 resulting in an
      INDEX_NODE of 8.
      
      Commit 8fc9cf42 ("slab: make more slab management structure off the
      slab") enables off-slab management objects for sizes starting with
      PAGE_SIZE >> 5 (128 bytes for a 4KB page configuration) and the creation
      of the "kmalloc-128" cache would try to place the management objects
      off-slab.  However, since KMALLOC_MIN_SIZE is already 128 and
      freelist_size == 32 in __kmem_cache_create(), kmalloc_slab(freelist_size)
      returns NULL (kmalloc_caches[7] not populated yet).  This triggers the
      following bug on arm64:
      
        kernel BUG at /work/Linux/linux-2.6-aarch64/mm/slab.c:2283!
        Internal error: Oops - BUG: 0 [#1] SMP
        Modules linked in:
        CPU: 0 PID: 0 Comm: swapper Not tainted 4.3.0-rc4+ #540
        Hardware name: Juno (DT)
        PC is at __kmem_cache_create+0x21c/0x280
        LR is at __kmem_cache_create+0x210/0x280
        [...]
        Call trace:
          __kmem_cache_create+0x21c/0x280
          create_boot_cache+0x48/0x80
          create_kmalloc_cache+0x50/0x88
          create_kmalloc_caches+0x4c/0xf4
          kmem_cache_init+0x100/0x118
          start_kernel+0x214/0x33c
      
      This patch introduces an OFF_SLAB_MIN_SIZE definition to avoid off-slab
      management objects for sizes equal to or smaller than KMALLOC_MIN_SIZE.
      
      Fixes: 8fc9cf42
      
       ("slab: make more slab management structure off the slab")
      Signed-off-by: Catalin Marinas's avatarCatalin Marinas <catalin.marinas@arm.com>
      Reported-by: default avatarGeert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: <stable@vger.kernel.org>	[3.15+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d4322d88
  7. 02 Oct, 2015 1 commit
    • Joonsoo Kim's avatar
      mm/slab: fix unexpected index mapping result of kmalloc_size(INDEX_NODE+1) · 03a2d2a3
      Joonsoo Kim authored
      Commit description is copied from the original post of this bug:
      
        http://comments.gmane.org/gmane.linux.kernel.mm/135349
      
      Kernels after v3.9 use kmalloc_size(INDEX_NODE + 1) to get the next
      larger cache size than the size index INDEX_NODE mapping.  In kernels
      3.9 and earlier we used malloc_sizes[INDEX_L3 + 1].cs_size.
      
      However, sometimes we can't get the right output we expected via
      kmalloc_size(INDEX_NODE + 1), causing a BUG().
      
      The mapping table in the latest kernel is like:
          index = {0,   1,  2 ,  3,  4,   5,   6,   n}
           size = {0,   96, 192, 8, 16,  32,  64,   2^n}
      The mapping table before 3.10 is like this:
          index = {0 , 1 , 2,   3,  4 ,  5 ,  6,   n}
          size  = {32, 64, 96, 128, 192, 256, 512, 2^(n+3)}
      
      The problem on my mips64 machine is as follows:
      
      (1) When configured DEBUG_SLAB && DEBUG_PAGEALLOC && DEBUG_LOCK_ALLOC
          && DEBUG_SPINLOCK, the sizeof(struct kmem_cache_node) will be "150",
          and the macro INDEX_NODE turns out to be "2": #define INDEX_NODE
          kmalloc_index(sizeof(struct kmem_cache_node))
      
      (2) Then the result of kmalloc_size(INDEX_NODE + 1) is 8.
      
      (3) Then "if(size >= kmalloc_size(INDEX_NODE + 1)" will lead to "size
          = PAGE_SIZE".
      
      (4) Then "if ((size >= (PAGE_SIZE >> 3))" test will be satisfied and
          "flags |= CFLGS_OFF_SLAB" will be covered.
      
      (5) if (flags & CFLGS_OFF_SLAB)" test will be satisfied and will go to
          "cachep->slabp_cache = kmalloc_slab(slab_size, 0u)", and the result
          here may be NULL while kernel bootup.
      
      (6) Finally,"BUG_ON(ZERO_OR_NULL_PTR(cachep->slabp_cache));" causes the
          BUG info as the following shows (may be only mips64 has this problem):
      
      This patch fixes the problem of kmalloc_size(INDEX_NODE + 1) and removes
      the BUG by adding 'size >= 256' check to guarantee that all necessary
      small sized slabs are initialized regardless sequence of slab size in
      mapping table.
      
      Fixes: e3366016
      
       ("slab: Use common kmalloc_index/kmalloc_size...")
      Signed-off-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Reported-by: default avatarLiuhailong <liu.hailong6@zte.com.cn>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      03a2d2a3
  8. 08 Sep, 2015 1 commit
    • Vlastimil Babka's avatar
      mm: rename alloc_pages_exact_node() to __alloc_pages_node() · 96db800f
      Vlastimil Babka authored
      alloc_pages_exact_node() was introduced in commit 6484eb3e ("page
      allocator: do not check NUMA node ID when the caller knows the node is
      valid") as an optimized variant of alloc_pages_node(), that doesn't
      fallback to current node for nid == NUMA_NO_NODE.  Unfortunately the
      name of the function can easily suggest that the allocation is
      restricted to the given node and fails otherwise.  In truth, the node is
      only preferred, unless __GFP_THISNODE is passed among the gfp flags.
      
      The misleading name has lead to mistakes in the past, see for example
      commits 5265047a ("mm, thp: really limit transparent hugepage
      allocation to local node") and b360edb4
      
       ("mm, mempolicy:
      migrate_to_node should only migrate to node").
      
      Another issue with the name is that there's a family of
      alloc_pages_exact*() functions where 'exact' means exact size (instead
      of page order), which leads to more confusion.
      
      To prevent further mistakes, this patch effectively renames
      alloc_pages_exact_node() to __alloc_pages_node() to better convey that
      it's an optimized variant of alloc_pages_node() not intended for general
      usage.  Both functions get described in comments.
      
      It has been also considered to really provide a convenience function for
      allocations restricted to a node, but the major opinion seems to be that
      __GFP_THISNODE already provides that functionality and we shouldn't
      duplicate the API needlessly.  The number of users would be small
      anyway.
      
      Existing callers of alloc_pages_exact_node() are simply converted to
      call __alloc_pages_node(), with the exception of sba_alloc_coherent()
      which open-codes the check for NUMA_NO_NODE, so it is converted to use
      alloc_pages_node() instead.  This means it no longer performs some
      VM_BUG_ON checks, and since the current check for nid in
      alloc_pages_node() uses a 'nid < 0' comparison (which includes
      NUMA_NO_NODE), it may hide wrong values which would be previously
      exposed.
      
      Both differences will be rectified by the next patch.
      
      To sum up, this patch makes no functional changes, except temporarily
      hiding potentially buggy callers.  Restricting the checks in
      alloc_pages_node() is left for the next patch which can in turn expose
      more existing buggy callers.
      
      Signed-off-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarRobin Holt <robinmholt@gmail.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      Acked-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Cliff Whickman <cpw@sgi.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      96db800f
  9. 04 Sep, 2015 1 commit
    • Christoph Lameter's avatar
      slab: infrastructure for bulk object allocation and freeing · 484748f0
      Christoph Lameter authored
      
      
      Add the basic infrastructure for alloc/free operations on pointer arrays.
      It includes a generic function in the common slab code that is used in
      this infrastructure patch to create the unoptimized functionality for slab
      bulk operations.
      
      Allocators can then provide optimized allocation functions for situations
      in which large numbers of objects are needed.  These optimization may
      avoid taking locks repeatedly and bypass metadata creation if all objects
      in slab pages can be used to provide the objects required.
      
      Allocators can extend the skeletons provided and add their own code to the
      bulk alloc and free functions.  They can keep the generic allocation and
      freeing and just fall back to those if optimizations would not work (like
      for example when debugging is on).
      
      Signed-off-by: default avatarChristoph Lameter <cl@linux.com>
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      484748f0
  10. 21 Aug, 2015 1 commit
    • Michal Hocko's avatar
      mm: make page pfmemalloc check more robust · 2f064f34
      Michal Hocko authored
      Commit c48a11c7 ("netvm: propagate page->pfmemalloc to skb") added
      checks for page->pfmemalloc to __skb_fill_page_desc():
      
              if (page->pfmemalloc && !page->mapping)
                      skb->pfmemalloc = true;
      
      It assumes page->mapping == NULL implies that page->pfmemalloc can be
      trusted.  However, __delete_from_page_cache() can set set page->mapping
      to NULL and leave page->index value alone.  Due to being in union, a
      non-zero page->index will be interpreted as true page->pfmemalloc.
      
      So the assumption is invalid if the networking code can see such a page.
      And it seems it can.  We have encountered this with a NFS over loopback
      setup when such a page is attached to a new skbuf.  There is no copying
      going on in this case so the page confuses __skb_fill_page_desc which
      interprets the index as pfmemalloc flag and the network stack drops
      packets that have been allocated using the reserves unless they are to
      be queued on sockets handling the swapping which is the case here and
      that leads to hangs when the nfs client waits for a response from the
      server which has been dropped and thus never arrive.
      
      The struct page is already heavily packed so rather than finding another
      hole to put it in, let's do a trick instead.  We can reuse the index
      again but define it to an impossible value (-1UL).  This is the page
      index so it should never see the value that large.  Replace all direct
      users of page->pfmemalloc by page_is_pfmemalloc which will hide this
      nastiness from unspoiled eyes.
      
      The information will get lost if somebody wants to use page->index
      obviously but that was the case before and the original code expected
      that the information should be persisted somewhere else if that is
      really needed (e.g.  what SLAB and SLUB do).
      
      [akpm@linux-foundation.org: fix blooper in slub]
      Fixes: c48a11c7
      
       ("netvm: propagate page->pfmemalloc to skb")
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Debugged-by: default avatarVlastimil Babka <vbabka@suse.com>
      Debugged-by: default avatarJiri Bohac <jbohac@suse.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David Miller <davem@davemloft.net>
      Acked-by: default avatarMel Gorman <mgorman@suse.de>
      Cc: <stable@vger.kernel.org>	[3.6+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2f064f34
  11. 25 Jun, 2015 1 commit
    • Daniel Sanders's avatar
      slab: correct size_index table before replacing the bootstrap kmem_cache_node · 34cc6990
      Daniel Sanders authored
      
      
      This patch moves the initialization of the size_index table slightly
      earlier so that the first few kmem_cache_node's can be safely allocated
      when KMALLOC_MIN_SIZE is large.
      
      There are currently two ways to generate indices into kmalloc_caches (via
      kmalloc_index() and via the size_index table in slab_common.c) and on some
      arches (possibly only MIPS) they potentially disagree with each other
      until create_kmalloc_caches() has been called.  It seems that the
      intention is that the size_index table is a fast equivalent to
      kmalloc_index() and that create_kmalloc_caches() patches the table to
      return the correct value for the cases where kmalloc_index()'s
      if-statements apply.
      
      The failing sequence was:
      * kmalloc_caches contains NULL elements
      * kmem_cache_init initialises the element that 'struct
        kmem_cache_node' will be allocated to. For 32-bit Mips, this is a
        56-byte struct and kmalloc_index returns KMALLOC_SHIFT_LOW (7).
      * init_list is called which calls kmalloc_node to allocate a 'struct
        kmem_cache_node'.
      * kmalloc_slab selects the kmem_caches element using
        size_index[size_index_elem(size)]. For MIPS, size is 56, and the
        expression returns 6.
      * This element of kmalloc_caches is NULL and allocation fails.
      * If it had not already failed, it would have called
        create_kmalloc_caches() at this point which would have changed
        size_index[size_index_elem(size)] to 7.
      
      I don't believe the bug to be LLVM specific but GCC doesn't normally
      encounter the problem.  I haven't been able to identify exactly what GCC
      is doing better (probably inlining) but it seems that GCC is managing to
      optimize to the point that it eliminates the problematic allocations.
      This theory is supported by the fact that GCC can be made to fail in the
      same way by changing inline, __inline, __inline__, and __always_inline in
      include/linux/compiler-gcc.h such that they don't actually inline things.
      
      Signed-off-by: default avatarDaniel Sanders <daniel.sanders@imgtec.com>
      Acked-by: default avatarPekka Enberg <penberg@kernel.org>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      34cc6990
  12. 14 Apr, 2015 1 commit
    • David Rientjes's avatar
      mm: remove GFP_THISNODE · 4167e9b2
      David Rientjes authored
      NOTE: this is not about __GFP_THISNODE, this is only about GFP_THISNODE.
      
      GFP_THISNODE is a secret combination of gfp bits that have different
      behavior than expected.  It is a combination of __GFP_THISNODE,
      __GFP_NORETRY, and __GFP_NOWARN and is special-cased in the page
      allocator slowpath to fail without trying reclaim even though it may be
      used in combination with __GFP_WAIT.
      
      An example of the problem this creates: commit e97ca8e5
      
       ("mm: fix
      GFP_THISNODE callers and clarify") fixed up many users of GFP_THISNODE
      that really just wanted __GFP_THISNODE.  The problem doesn't end there,
      however, because even it was a no-op for alloc_misplaced_dst_page(),
      which also sets __GFP_NORETRY and __GFP_NOWARN, and
      migrate_misplaced_transhuge_page(), where __GFP_NORETRY and __GFP_NOWAIT
      is set in GFP_TRANSHUGE.  Converting GFP_THISNODE to __GFP_THISNODE is a
      no-op in these cases since the page allocator special-cases
      __GFP_THISNODE && __GFP_NORETRY && __GFP_NOWARN.
      
      It's time to just remove GFP_THISNODE entirely.  We leave __GFP_THISNODE
      to restrict an allocation to a local node, but remove GFP_THISNODE and
      its obscurity.  Instead, we require that a caller clear __GFP_WAIT if it
      wants to avoid reclaim.
      
      This allows the aforementioned functions to actually reclaim as they
      should.  It also enables any future callers that want to do
      __GFP_THISNODE but also __GFP_NORETRY && __GFP_NOWARN to reclaim.  The
      rule is simple: if you don't want to reclaim, then don't set __GFP_WAIT.
      
      Aside: ovs_flow_stats_update() really wants to avoid reclaim as well, so
      it is unchanged.
      
      Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Acked-by: default avatarPekka Enberg <penberg@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Pravin Shelar <pshelar@nicira.com>
      Cc: Jarno Rajahalme <jrajahalme@nicira.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4167e9b2
  13. 13 Feb, 2015 1 commit
    • Vladimir Davydov's avatar
      slub: make dead caches discard free slabs immediately · d6e0b7fa
      Vladimir Davydov authored
      To speed up further allocations SLUB may store empty slabs in per cpu/node
      partial lists instead of freeing them immediately.  This prevents per
      memcg caches destruction, because kmem caches created for a memory cgroup
      are only destroyed after the last page charged to the cgroup is freed.
      
      To fix this issue, this patch resurrects approach first proposed in [1].
      It forbids SLUB to cache empty slabs after the memory cgroup that the
      cache belongs to was destroyed.  It is achieved by setting kmem_cache's
      cpu_partial and min_partial constants to 0 and tuning put_cpu_partial() so
      that it would drop frozen empty slabs immediately if cpu_partial = 0.
      
      The runtime overhead is minimal.  From all the hot functions, we only
      touch relatively cold put_cpu_partial(): we make it call
      unfreeze_partials() after freezing a slab that belongs to an offline
      memory cgroup.  Since slab freezing exists to avoid moving slabs from/to a
      partial list on free/alloc, and there can't be allocations from dead
      caches, it shouldn't cause any overhead.  We do have to disable preemption
      for put_cpu_partial() to achieve that though.
      
      The original patch was accepted well and even merged to the mm tree.
      However, I decided to withdraw it due to changes happening to the memcg
      core at that time.  I had an idea of introducing per-memcg shrinkers for
      kmem caches, but now, as memcg has finally settled down, I do not see it
      as an option, because SLUB shrinker would be too costly to call since SLUB
      does not keep free slabs on a separate list.  Besides, we currently do not
      even call per-memcg shrinkers for offline memcgs.  Overall, it would
      introduce much more complexity to both SLUB and memcg than this small
      patch.
      
      Regarding to SLAB, there's no problem with it, because it shrinks
      per-cpu/node caches periodically.  Thanks to list_lru reparenting, we no
      longer keep entries for offline cgroups in per-memcg arrays (such as
      memcg_cache_params->memcg_caches), so we do not have to bother if a
      per-memcg cache will be shrunk a bit later than it could be.
      
      [1] http://thread.gmane.org/gmane.linux.kernel.mm/118649/focus=118650
      
      
      
      Signed-off-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d6e0b7fa