Skip to content
  • Vishal Verma's avatar
    libnvdimm, btt: rework error clearing · d9b83c75
    Vishal Verma authored
    
    
    Clearing errors or badblocks during a BTT write requires sending an ACPI
    DSM, which means potentially sleeping. Since a BTT IO happens in atomic
    context (preemption disabled, spinlocks may be held), we cannot perform
    error clearing in the course of an IO. Due to this error clearing for
    BTT IOs has hitherto been disabled.
    
    In this patch we move error clearing out of the atomic section, and thus
    re-enable error clearing with BTTs. When we are about to add a block to
    the free list, we check if it was previously marked as an error, and if
    it was, we add it to the freelist, but also set a flag that says error
    clearing will be required. We then drop the lane (ending the atomic
    context), and send a zero buffer so that the error can be cleared. The
    error flag in the free list is protected by the nd 'lane', and is set
    only be a thread while it holds that lane. When the error is cleared,
    the flag is cleared, but while holding a mutex for that freelist index.
    
    When writing, we check for two things -
    1/ If the freelist mutex is held or if the error flag is set. If so,
    this is an error block that is being (or about to be) cleared.
    2/ If the block is a known badblock based on nsio->bb
    
    The second check is required because the BTT map error flag for a map
    entry only gets set when an error LBA is read. If we write to a new
    location that may not have the map error flag set, but still might be in
    the region's badblock list, we can trigger an EIO on the write, which is
    undesirable and completely avoidable.
    
    Cc: Jeff Moyer <jmoyer@redhat.com>
    Cc: Toshi Kani <toshi.kani@hpe.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Signed-off-by: default avatarVishal Verma <vishal.l.verma@intel.com>
    Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
    d9b83c75