Skip to content
  • Robert Richter's avatar
    EDAC/mc: Fix use-after-free and memleaks during device removal · 216aa145
    Robert Richter authored
    A test kernel with the options DEBUG_TEST_DRIVER_REMOVE, KASAN and
    DEBUG_KMEMLEAK set, revealed several issues when removing an mci device:
    
    1) Use-after-free:
    
    On 27.11.19 17:07:33, John Garry wrote:
    > [   22.104498] BUG: KASAN: use-after-free in
    > edac_remove_sysfs_mci_device+0x148/0x180
    
    The use-after-free is caused by the mci_for_each_dimm() macro called in
    edac_remove_sysfs_mci_device(). The iterator was introduced with
    
      c498afaf ("EDAC: Introduce an mci_for_each_dimm() iterator").
    
    The iterator loop calls device_unregister(&dimm->dev), which removes
    the sysfs entry of the device, but also frees the dimm struct in
    dimm_attr_release(). When incrementing the loop in mci_for_each_dimm(),
    the dimm struct is accessed again, after having been freed already.
    
    The fix is to free all the mci device's subsequent dimm and csrow
    objects at a later point, in _edac_mc_free(), when the mci device itself
    is being freed.
    
    This keeps the data structures intact and the mci device can be
    fully used until its removal. The change allows the safe usage of
    mci_for_each_dimm() to release dimm devices from sysfs.
    
    2) Memory leaks:
    
    Following memory leaks have been detected:
    
     # grep edac /sys/kernel/debug/kmemleak | sort | uniq -c
           1     [<000000003c0f58f9>] edac_mc_alloc+0x3bc/0x9d0      # mci->csrows
          16     [<00000000bb932dc0>] edac_mc_alloc+0x49c/0x9d0      # csr->channels
          16     [<00000000e2734dba>] edac_mc_alloc+0x518/0x9d0      # csr->channels[chn]
           1     [<00000000eb040168>] edac_mc_alloc+0x5c8/0x9d0      # mci->dimms
          34     [<00000000ef737c29>] ghes_edac_register+0x1c8/0x3f8 # see edac_mc_alloc()
    
    All leaks are from memory allocated by edac_mc_alloc().
    
    Note: The test above shows that edac_mc_alloc() was called here from
    ghes_edac_register(), thus both functions show up in the stack trace
    but the module causing the leaks is edac_mc. The comments with the data
    structures involved were made manually by analyzing the objdump.
    
    The data structures listed above and created by edac_mc_alloc() are
    not properly removed during device removal, which is done in
    edac_mc_free().
    
    There are two paths implemented to remove the device depending on device
    registration, _edac_mc_free() is called if the device is not registered
    and edac_unregister_sysfs() otherwise.
    
    The implemenations differ. For the sysfs case, the mci device removal
    lacks the removal of subsequent data structures (csrows, channels,
    dimms). This causes the memory leaks (see mci_attr_release()).
    
     [ bp: Massage commit message. ]
    
    Fixes: c498afaf ("EDAC: Introduce an mci_for_each_dimm() iterator")
    Fixes: faa2ad09 ("edac_mc: edac_mc_free() cannot assume mem_ctl_info is registered in sysfs.")
    Fixes: 7a623c03
    
     ("edac: rewrite the sysfs code to use struct device")
    Reported-by: default avatarJohn Garry <john.garry@huawei.com>
    Signed-off-by: default avatarRobert Richter <rrichter@marvell.com>
    Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
    Tested-by: default avatarJohn Garry <john.garry@huawei.com>
    Cc: <stable@vger.kernel.org>
    Link: https://lkml.kernel.org/r/20200212120340.4764-3-rrichter@marvell.com
    216aa145