    [PATCH] mm: slab: eliminate lock_cpu_hotplug from slab · 8f5be20b
    Ravikiran G Thirumalai authored
    Here's an attempt towards doing away with lock_cpu_hotplug in the slab
    subsystem.  This approach also fixes a bug which shows up when cpus are
    being offlined/onlined and slab caches are being tuned simultaneously.
    
    http://marc.theaimsgroup.com/?l=linux-kernel&m=116098888100481&w=2
    
    The patch has been stress tested overnight on a 2 socket 4 core AMD box with
    repeated cpu online and offline, while dbench and kernbench processes were
    running and slab caches were being tuned at the same time.
    There were no lockdep warnings either.  (This test was run on 2.6.18, as
    2.6.19-rc crashes at __drain_pages:
    http://marc.theaimsgroup.com/?l=linux-kernel&m=116172164217678&w=2 )
    
    The approach here is to hold cache_chain_mutex from CPU_UP_PREPARE until
    CPU_ONLINE (similar in approach to workqueue_mutex).  Slab code sensitive
    to cpu_online_map (kmem_cache_create, kmem_cache_destroy, slabinfo_write,
    __cache_shrink) is already serialized with cache_chain_mutex.  (This patch
    lengthens cache_chain_mutex hold time at kmem_cache_destroy to cover this).
    This patch also takes cache_chain_mutex at kmem_cache_shrink to keep
    cpu_online_map stable, as viewed by slab, in __cache_shrink
    (kmem_cache_shrink->__cache_shrink->drain_cpu_caches).  But, really,
    kmem_cache_shrink is used at just one place in the acpi subsystem!  Do we
    really need to keep kmem_cache_shrink at all?
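
    A minimal user-space sketch of the locking scheme described above, not the
    kernel code itself.  Assumptions: a pthread mutex stands in for the kernel
    mutex, cpuup_callback_model() and try_tune_caches() are hypothetical names,
    the lock is also dropped on CPU_UP_CANCELED, and the tuning path uses
    trylock only so that the effect is observable from a single thread (the
    real paths would block on the mutex instead).

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins for the kernel's hotplug events. */
enum cpu_event { CPU_UP_PREPARE, CPU_ONLINE, CPU_UP_CANCELED };

/* Models cache_chain_mutex; a pthread mutex replaces the kernel mutex. */
static pthread_mutex_t cache_chain_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Bookkeeping for this sketch only: did the hotplug path take the lock? */
static bool chain_locked_by_hotplug;

/* Models the slab hotplug callback: lock on prepare, unlock when done. */
static void cpuup_callback_model(enum cpu_event ev)
{
    switch (ev) {
    case CPU_UP_PREPARE:
        pthread_mutex_lock(&cache_chain_mutex);
        chain_locked_by_hotplug = true;
        /* Per-cpu cache setup for the incoming cpu would happen here. */
        break;
    case CPU_ONLINE:
    case CPU_UP_CANCELED:   /* assumption: cancel also drops the lock */
        /* Unlocking without a prior CPU_UP_PREPARE would be a bug. */
        assert(chain_locked_by_hotplug);
        chain_locked_by_hotplug = false;
        pthread_mutex_unlock(&cache_chain_mutex);
        break;
    }
}

/*
 * Models a path that depends on a stable cpu_online_map (cache tuning,
 * shrink, destroy).  Returns false if a hotplug operation holds the lock.
 */
static bool try_tune_caches(void)
{
    if (pthread_mutex_trylock(&cache_chain_mutex) != 0)
        return false;
    /* The set of online cpus cannot change while the lock is held. */
    pthread_mutex_unlock(&cache_chain_mutex);
    return true;
}
```

    In this model, try_tune_caches() fails between CPU_UP_PREPARE and
    CPU_ONLINE and succeeds again afterwards, which is the serialization the
    patch relies on.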
    
    Another note.  Looks like a cpu hotplug event can send CPU_UP_CANCELED to
    a registered subsystem even if the subsystem did not receive CPU_UP_PREPARE.
    This could be due to a subsystem registered for notification earlier than
    the current subsystem crapping out with NOTIFY_BAD.  Badness can occur within
    the CPU_UP_CANCELED code path at slab if this happens (the same would
    apply for workqueue.c as well).  To overcome this, we might have to either
    a) use a per subsystem flag and avoid handling of CPU_UP_CANCELED, or
    b) use special notifier events like LOCK_ACQUIRE/RELEASE as Gautham was
       using in his experiments, or
    c) not send CPU_UP_CANCELED to a subsystem which did not receive
       CPU_UP_PREPARE.
    
    I would prefer c).
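
    Option c) could look roughly like the following user-space model.  This is
    a sketch under assumptions, not the kernel's notifier implementation:
    call_chain(), cpu_up_model() and both example notifiers are hypothetical,
    and the notifier that returned NOTIFY_BAD is treated as having received
    CPU_UP_PREPARE, so it also sees the cancel.

```c
#include <assert.h>
#include <stddef.h>

enum { NOTIFY_OK, NOTIFY_BAD };
enum hp_event { CPU_UP_PREPARE, CPU_UP_CANCELED };

typedef int (*hp_notifier_fn)(enum hp_event ev);

/*
 * Call at most nr_to_call notifiers in registration order and stop at the
 * first NOTIFY_BAD.  Returns how many notifiers were actually called.
 */
static size_t call_chain(hp_notifier_fn *chain, size_t nr_to_call,
                         enum hp_event ev, int *result)
{
    size_t called = 0;

    *result = NOTIFY_OK;
    while (called < nr_to_call) {
        *result = chain[called](ev);
        called++;
        if (*result == NOTIFY_BAD)
            break;
    }
    return called;
}

/* Bring-up model: cancel only the notifiers that saw CPU_UP_PREPARE. */
static int cpu_up_model(hp_notifier_fn *chain, size_t n)
{
    int result, ignored;
    size_t nr_calls = call_chain(chain, n, CPU_UP_PREPARE, &result);

    assert(nr_calls <= n);
    if (result == NOTIFY_BAD) {
        call_chain(chain, nr_calls, CPU_UP_CANCELED, &ignored);
        return -1;
    }
    return 0;
}

/* Example notifiers; the counters record which events each one received. */
static int failing_prepares, failing_cancels, slab_prepares, slab_cancels;

/* Registered first; refuses the bring-up. */
static int failing_notifier(enum hp_event ev)
{
    if (ev == CPU_UP_PREPARE) {
        failing_prepares++;
        return NOTIFY_BAD;
    }
    failing_cancels++;
    return NOTIFY_OK;
}

/* Registered later; stands in for a subsystem such as slab. */
static int slab_notifier_model(enum hp_event ev)
{
    if (ev == CPU_UP_PREPARE)
        slab_prepares++;
    else
        slab_cancels++;
    return NOTIFY_OK;
}
```

    With the chain { failing_notifier, slab_notifier_model }, the later
    notifier receives neither event, so its cancel path never runs without a
    matching prepare.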
    
    Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
    Signed-off-by: Shai Fultheim <shai@scalex86.org>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>