    mm/vmalloc.c: fix percpu free VM area search criteria · 5336e52c
    Kuppuswamy Sathyanarayanan authored
    Recent changes to the vmalloc code by commit 68ad4a33
    ("mm/vmalloc.c: keep track of free blocks for vmap allocation") can
    cause spurious percpu allocation failures.  These, in turn, can result
    in panic()s in the slub code.  One such possible panic was reported by
    Dave Hansen at the following link: https://lkml.org/lkml/2019/6/19/939.
    Another related panic that was observed is:
    
     RIP: 0033:0x7f46f7441b9b
     Call Trace:
      dump_stack+0x61/0x80
      pcpu_alloc.cold.30+0x22/0x4f
      mem_cgroup_css_alloc+0x110/0x650
      cgroup_apply_control_enable+0x133/0x330
      cgroup_mkdir+0x41b/0x500
      kernfs_iop_mkdir+0x5a/0x90
      vfs_mkdir+0x102/0x1b0
      do_mkdirat+0x7d/0xf0
      do_syscall_64+0x5b/0x180
      entry_SYSCALL_64_after_hwframe+0x44/0xa9
    
    The VMALLOC memory manager divides the entire VMALLOC space (VMALLOC_START
    to VMALLOC_END) into multiple VM areas (struct vmap_area), and it mainly
    uses two lists (vmap_area_list & free_vmap_area_list) to track the used
    and free VM areas in VMALLOC space.  The pcpu_get_vm_areas(offsets[],
    sizes[], nr_vms, align) function is used to allocate congruent VM areas
    for the percpu memory allocator.  In order not to conflict with other
    VMALLOC users, pcpu_get_vm_areas() allocates VM areas near the end of the
    VMALLOC space.  So the search for a free vm_area for the given requirement
    starts near VMALLOC_END and moves upwards towards VMALLOC_START.
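
    For reference, the exported prototype and its main caller in the percpu
    allocator look roughly as follows (based on include/linux/vmalloc.h and
    pcpu_create_chunk() in mm/percpu-vm.c around this kernel version; treat
    the exact declarations and error handling as approximate):

      /* include/linux/vmalloc.h (approximate) */
      struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
                                           const size_t *sizes, int nr_vms,
                                           size_t align);

      /* mm/percpu-vm.c: pcpu_create_chunk() (approximate) */
      vms = pcpu_get_vm_areas(pcpu_group_offsets, pcpu_group_sizes,
                              pcpu_nr_groups, pcpu_atom_size);
      if (!vms)
              return NULL;    /* the percpu allocation then fails further up */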
    
    Prior to commit 68ad4a33, the search for a free vm_area in
    pcpu_get_vm_areas() involved the following two main steps; a simplified
    model of this search is sketched after the step list.
    
    Step 1:
        Find an aligned "base" address near VMALLOC_END.
        va = free vm area near VMALLOC_END
    Step 2:
        Loop over each of the requested vm_areas and check:
            Step 2.1:
               if (base < VMALLOC_START)
                  1. fail with error
            Step 2.2:
               // end is offsets[area] + sizes[area]
               if (base + end > va->vm_end)
                   1. Move the base downwards and repeat Step 2
            Step 2.3:
               if (base + start < va->vm_start)
                  1. Move to previous free vm_area node, find aligned
                     base address and repeat Step 2
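
    As a rough illustration, the user-space model below implements that
    original search.  It is only a sketch: VMALLOC_START/VMALLOC_END, the
    free_areas[] table and the helpers are simplified stand-ins for the real
    structures in mm/vmalloc.c, and the lookup of the free block enclosing
    an address is reduced to a linear scan.

      #include <stdio.h>

      /* Simplified stand-ins for the kernel's constants and free-area list. */
      #define VMALLOC_START 0x100000UL
      #define VMALLOC_END   0x800000UL

      struct free_area { unsigned long start, end; };  /* one free vmap area */

      /* Free areas sorted by address (models free_vmap_area_list). */
      static struct free_area free_areas[] = {
          { 0x100000, 0x500000 },  /* large free block, low in the space */
          { 0x700000, 0x708000 },  /* small free block near VMALLOC_END  */
      };
      #define NR_FREE 2

      static unsigned long align_down(unsigned long x, unsigned long a)
      {
          return x & ~(a - 1);
      }

      /* Index of the highest free area starting at or below addr, or -1. */
      static int enclosing(unsigned long addr)
      {
          for (int i = NR_FREE - 1; i >= 0; i--)
              if (free_areas[i].start <= addr)
                  return i;
          return -1;
      }

      /*
       * Model of the pre-68ad4a33 search: find an aligned "base" so that
       * every range [base + offsets[i], base + offsets[i] + sizes[i]) lies
       * inside a free area.  Returns 0 on failure.
       */
      static unsigned long
      find_base(const unsigned long *offsets, const unsigned long *sizes,
                int nr_vms, unsigned long align)
      {
          int area = nr_vms - 1, term = area;
          unsigned long start = offsets[area], end = start + sizes[area];
          int idx = enclosing(VMALLOC_END);                 /* Step 1 */
          if (idx < 0)
              return 0;
          unsigned long base = align_down(free_areas[idx].end - end, align);

          while (1) {
              struct free_area *va = &free_areas[idx];

              if (base < VMALLOC_START)                     /* Step 2.1 */
                  return 0;

              if (base + end > va->end) {                   /* Step 2.2 */
                  base = align_down(va->end - end, align);
                  term = area;
                  continue;
              }

              if (base + start < va->start) {               /* Step 2.3 */
                  if (--idx < 0)
                      return 0;
                  base = align_down(free_areas[idx].end - end, align);
                  term = area;
                  continue;
              }

              /* This area fits; move on to the previous requested area. */
              area = (area + nr_vms - 1) % nr_vms;
              if (area == term)
                  break;
              start = offsets[area];
              end = start + sizes[area];
              idx = enclosing(base + end);
              if (idx < 0)
                  return 0;
          }
          return base;
      }

      int main(void)
      {
          /* Two congruent areas, 0x1000 bytes each, spaced 0x200000 apart. */
          unsigned long offsets[] = { 0x0, 0x200000 };
          unsigned long sizes[]   = { 0x1000, 0x1000 };

          printf("base = 0x%lx\n", find_base(offsets, sizes, 2, 0x1000));
          return 0;
      }

    With Step 2.2 in place, this run settles on a base inside the large low
    free block (both requested areas end up in it), even though the initial
    guess near VMALLOC_END did not fit.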
    
    But commit 68ad4a33 removed Step 2.2 and modified Step 2.3 as below (the
    equivalent change in the simplified model is shown after the step):
    
            Step 2.3:
               if (base + start < va->vm_start || base + end > va->vm_end)
                  1. Move to previous free vm_area node, find aligned
                     base address and repeat Step 2
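
    In terms of the simplified model above, this corresponds to collapsing the
    two checks into one, so that overshooting the end of a free block is
    treated the same as undershooting its start:

      if (base + start < va->start || base + end > va->end) {
          /* always steps to the previous (lower) free area, even when only
           * the "base + end > va->end" half of the test failed */
          if (--idx < 0)
              return 0;
          base = align_down(free_areas[idx].end - end, align);
          term = area;
          continue;
      }

    With that single check, the example from the model fails (find_base()
    returns 0) even though the large low free block could have held both
    requested areas.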
    
    The above change is the root cause of the spurious percpu memory
    allocation failures.  For example, consider a case where a relatively
    large free vm_area (~30 TB) was ignored in the free vm_area search because
    it did not pass the base + end <= va->vm_end boundary check.  Ignoring
    such a large free vm_area leads to not finding any free vm_area within the
    boundary of VMALLOC_START to VMALLOC_END, which in turn leads to
    allocation failures.
    
    So modify the search algorithm to include Step 2.2.
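
    With the fix applied, the relevant part of the scan loop in
    pcpu_get_vm_areas() ends up roughly as below.  The helper and field names
    (pvm_determine_end_from_reverse(), node_to_va(), va_start/va_end) are the
    ones introduced by commit 68ad4a33; treat this excerpt as an approximation
    of the patched logic rather than the literal diff:

      /*
       * If the required width exceeds the current free VA block, move
       * base downwards within the same block and then recheck.
       */
      if (base + end > va->va_end) {
              base = pvm_determine_end_from_reverse(&va, align) - end;
              term_area = area;
              continue;
      }

      /*
       * If this VA does not fit, step to the previous free VA block
       * and recheck from there.
       */
      if (base + start < va->va_start) {
              va = node_to_va(rb_prev(&va->rb_node));
              base = pvm_determine_end_from_reverse(&va, align) - end;
              term_area = area;
              continue;
      }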
    
    Link: http://lkml.kernel.org/r/20190729232139.91131-1-sathyanarayanan.kuppuswamy@linux.intel.com
    Fixes: 68ad4a33 ("mm/vmalloc.c: keep track of free blocks for vmap allocation")
    Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
    Reported-by: Dave Hansen <dave.hansen@intel.com>
    Acked-by: Dennis Zhou <dennis@kernel.org>
    Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: sathyanarayanan kuppuswamy <sathyanarayanan.kuppuswamy@linux.intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>