Skip to content
  • Filipe Manana's avatar
    Btrfs: fix race between shrinking truncate and fiemap · 28553fa9
    Filipe Manana authored
    
    
    When there is a fiemap executing in parallel with a shrinking truncate
    we can end up in a situation where we have extent maps for which we no
    longer have corresponding file extent items. This is generally harmless
    and at the moment the only consequences are missing file extent items
    representing holes after we expand the file size again after the
    truncate operation removed the prealloc extent items, and stale
    information for future fiemap calls (reporting extents that no longer
    exist or may have been reallocated to other files for example).
    
    Consider the following example:
    
    1) Our inode has a size of 128KiB, one 128KiB extent at file offset 0
       and a 1MiB prealloc extent at file offset 128KiB;
    
    2) Task A starts doing a shrinking truncate of our inode to reduce it to
       a size of 64KiB. Before it searches the subvolume tree for file
       extent items to delete, it drops all the extent maps in the range
       from 64KiB to (u64)-1 by calling btrfs_drop_extent_cache();
    
    3) Task B starts doing a fiemap against our inode. When looking up for
       the inode's extent maps in the range from 128KiB to (u64)-1, it
       doesn't find any in the inode's extent map tree, since they were
       removed by task A.  Because it didn't find any in the extent map
       tree, it scans the inode's subvolume tree for file extent items, and
       it finds the 1MiB prealloc extent at file offset 128KiB, then it
       creates an extent map based on that file extent item and adds it to
       inode's extent map tree (this ends up being done by
       btrfs_get_extent() <- btrfs_get_extent_fiemap() <-
       get_extent_skip_holes());
    
    4) Task A then drops the prealloc extent at file offset 128KiB and
       shrinks the 128KiB extent file offset 0 to a length of 64KiB. The
       truncation operation finishes and we end up with an extent map
       representing a 1MiB prealloc extent at file offset 128KiB, despite we
       don't have any more that extent;
    
    After this the two types of problems we have are:
    
    1) Future calls to fiemap always report that a 1MiB prealloc extent
       exists at file offset 128KiB. This is stale information, no longer
       correct;
    
    2) If the size of the file is increased, by a truncate operation that
       increases the file size or by a write into a file offset > 64KiB for
       example, we end up not inserting file extent items to represent holes
       for any range between 128KiB and 128KiB + 1MiB, since the hole
       expansion function, btrfs_cont_expand() will skip hole insertion for
       any range for which an extent map exists that represents a prealloc
       extent. This causes fsck to complain about missing file extent items
       when not using the NO_HOLES feature.
    
    The second issue could be often triggered by test case generic/561 from
    fstests, which runs fsstress and duperemove in parallel, and duperemove
    does frequent fiemap calls.
    
    Essentially the problems happens because fiemap does not acquire the
    inode's lock while truncate does, and fiemap locks the file range in the
    inode's iotree while truncate does not. So fix the issue by making
    btrfs_truncate_inode_items() lock the file range from the new file size
    to (u64)-1, so that it serializes with fiemap.
    
    CC: stable@vger.kernel.org # 4.4+
    Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
    Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
    Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
    Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    28553fa9