    slub: initial bulk free implementation · fbd02630
    Author: Jesper Dangaard Brouer
    
    
    This implements the SLUB-specific kmem_cache_free_bulk().  The SLUB
    allocator now has both bulk alloc and bulk free implemented.
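
    For callers, the two bulk operations pair up as in this hypothetical
    sketch (my_cache and the batch size are made up for illustration, and
    the return convention of kmem_cache_alloc_bulk() should be checked
    against the tree in use):

        void *objs[16];

        /* Allocate a batch in one call, then free it in one call. */
        if (kmem_cache_alloc_bulk(my_cache, GFP_KERNEL, ARRAY_SIZE(objs), objs))
                kmem_cache_free_bulk(my_cache, ARRAY_SIZE(objs), objs);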
    
    We choose to re-enable local IRQs while calling the slowpath
    __slab_free().  In the worst case, where all objects hit the slowpath
    call, performance should still be faster than the fallback function
    __kmem_cache_free_bulk(), because local_irq_{disable,enable} is very
    fast (7 cycles), while the fallback invokes this_cpu_cmpxchg(), which
    is slightly slower (9 cycles).  Nitpicking: in this all-slowpath case
    the per-object saving is only 2 cycles, so the one-time entry cost of
    local_irq_{disable,enable} means this is only faster for N >= 4.
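
    For reference, the generic fallback being compared against is
    essentially a loop over the single-object free; a minimal sketch
    (the in-tree __kmem_cache_free_bulk() may differ in detail):

        #include <linux/slab.h>

        /*
         * Fallback sketch: each kmem_cache_free() goes through the
         * regular fastpath, paying this_cpu_cmpxchg() per object.
         */
        void __kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
        {
                size_t i;

                for (i = 0; i < size; i++)
                        kmem_cache_free(s, p[i]);
        }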
    
    Do notice that the save+restore variant, local_irq_{save,restore}, is
    very expensive; this is key to why this optimization works.
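
    In outline the approach looks like the sketch below.  This is not the
    verbatim patch: debug checks and tid handling are omitted, and
    set_freepointer(), __slab_free() and the kmem_cache_cpu fields are
    SLUB internals in mm/slub.c whose exact signatures may differ.  The
    point is where IRQs are re-enabled around the slowpath:

        /* Sketch; interrupts must be enabled when calling this. */
        void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
        {
                struct kmem_cache_cpu *c;
                size_t i;

                local_irq_disable();
                c = this_cpu_ptr(s->cpu_slab);

                for (i = 0; i < size; i++) {
                        void *object = p[i];
                        struct page *page = virt_to_head_page(object);

                        if (c->page == page) {
                                /* Fastpath: push onto the per-CPU freelist,
                                 * with IRQs still disabled. */
                                set_freepointer(s, object, c->freelist);
                                c->freelist = object;
                        } else {
                                /* Slowpath: re-enable IRQs around the locked
                                 * cmpxchg in __slab_free(), then re-fetch the
                                 * per-CPU pointer, since it may have changed
                                 * while IRQs were enabled. */
                                local_irq_enable();
                                __slab_free(s, page, object, _RET_IP_);
                                local_irq_disable();
                                c = this_cpu_ptr(s->cpu_slab);
                        }
                }
                local_irq_enable();
        }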
    
    CPU: i7-4790K @ 4.00GHz
     * local_irq_{disable,enable}:  7 cycles(tsc) - 1.821 ns
     * local_irq_{save,restore}  : 37 cycles(tsc) - 9.443 ns
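
    Such cycle counts come from an in-kernel timing loop; a minimal
    sketch of what such a measurement module could look like follows
    (illustrative only, assuming a TSC-backed get_cycles(); it is not
    the tool that produced the numbers above):

        #include <linux/module.h>
        #include <linux/irqflags.h>
        #include <linux/timex.h>        /* get_cycles() */

        static int __init irq_cost_init(void)
        {
                const unsigned int loops = 1000000;
                unsigned long flags;
                cycles_t start, stop;
                unsigned int i;

                start = get_cycles();
                for (i = 0; i < loops; i++) {
                        local_irq_disable();
                        local_irq_enable();
                }
                stop = get_cycles();
                pr_info("local_irq_{disable,enable}: %llu cycles/iter\n",
                        (unsigned long long)(stop - start) / loops);

                start = get_cycles();
                for (i = 0; i < loops; i++) {
                        local_irq_save(flags);
                        local_irq_restore(flags);
                }
                stop = get_cycles();
                pr_info("local_irq_{save,restore}: %llu cycles/iter\n",
                        (unsigned long long)(stop - start) / loops);

                return -EAGAIN; /* measurement done; refuse to stay loaded */
        }
        module_init(irq_cost_init);
        MODULE_LICENSE("GPL");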
    
    Measurements on CPU i7-4790K @ 4.00GHz
    Baseline normal fastpath (alloc+free cost): 43 cycles(tsc) 10.834 ns
    
    Bulk - fallback                  - this patch
      1 -  58 cycles(tsc) 14.542 ns  -  43 cycles(tsc) 10.811 ns  improved 25.9%
      2 -  50 cycles(tsc) 12.659 ns  -  27 cycles(tsc)  6.867 ns  improved 46.0%
      3 -  48 cycles(tsc) 12.168 ns  -  21 cycles(tsc)  5.496 ns  improved 56.2%
      4 -  47 cycles(tsc) 11.987 ns  -  24 cycles(tsc)  6.038 ns  improved 48.9%
      8 -  46 cycles(tsc) 11.518 ns  -  17 cycles(tsc)  4.280 ns  improved 63.0%
     16 -  45 cycles(tsc) 11.366 ns  -  17 cycles(tsc)  4.483 ns  improved 62.2%
     30 -  45 cycles(tsc) 11.433 ns  -  18 cycles(tsc)  4.531 ns  improved 60.0%
     32 -  75 cycles(tsc) 18.983 ns  -  58 cycles(tsc) 14.586 ns  improved 22.7%
     34 -  71 cycles(tsc) 17.940 ns  -  53 cycles(tsc) 13.391 ns  improved 25.4%
     48 -  80 cycles(tsc) 20.077 ns  -  65 cycles(tsc) 16.268 ns  improved 18.8%
     64 -  71 cycles(tsc) 17.799 ns  -  53 cycles(tsc) 13.440 ns  improved 25.4%
    128 -  91 cycles(tsc) 22.980 ns  -  79 cycles(tsc) 19.899 ns  improved 13.2%
    158 - 100 cycles(tsc) 25.241 ns  -  90 cycles(tsc) 22.732 ns  improved 10.0%
    250 - 102 cycles(tsc) 25.583 ns  -  95 cycles(tsc) 23.916 ns  improved  6.9%
    
    Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: Pekka Enberg <penberg@kernel.org>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>