Skip to content
  • Filipe Manana's avatar
    Btrfs: fix metadata inconsistencies after directory fsync · 2f2ff0ee
    Filipe Manana authored
    
    
    We can get into inconsistency between inodes and directory entries
    after fsyncing a directory. The issue is that while a directory gets
    the new dentries persisted in the fsync log and replayed at mount time,
    the link count of the inode that directory entries point to doesn't
    get updated, staying with an incorrect link count (smaller then the
    correct value). This later leads to stale file handle errors when
    accessing (including attempt to delete) some of the links if all the
    other ones are removed, which also implies impossibility to delete the
    parent directories, since the dentries can not be removed.
    
    Another issue is that (unlike ext3/4, xfs, f2fs, reiserfs, nilfs2),
    when fsyncing a directory, new files aren't logged (their metadata and
    dentries) nor any child directories. So this patch fixes this issue too,
    since it has the same resolution as the incorrect inode link count issue
    mentioned before.
    
    This is very easy to reproduce, and the following excerpt from my test
    case for xfstests shows how:
    
      _scratch_mkfs >> $seqres.full 2>&1
      _init_flakey
      _mount_flakey
    
      # Create our main test file and directory.
      $XFS_IO_PROG -f -c "pwrite -S 0xaa 0 8K" $SCRATCH_MNT/foo | _filter_xfs_io
      mkdir $SCRATCH_MNT/mydir
    
      # Make sure all metadata and data are durably persisted.
      sync
    
      # Add a hard link to 'foo' inside our test directory and fsync only the
      # directory. The btrfs fsync implementation had a bug that caused the new
      # directory entry to be visible after the fsync log replay but, the inode
      # of our file remained with a link count of 1.
      ln $SCRATCH_MNT/foo $SCRATCH_MNT/mydir/foo_2
    
      # Add a few more links and new files.
      # This is just to verify nothing breaks or gives incorrect results after the
      # fsync log is replayed.
      ln $SCRATCH_MNT/foo $SCRATCH_MNT/mydir/foo_3
      $XFS_IO_PROG -f -c "pwrite -S 0xff 0 64K" $SCRATCH_MNT/hello | _filter_xfs_io
      ln $SCRATCH_MNT/hello $SCRATCH_MNT/mydir/hello_2
    
      # Add some subdirectories and new files and links to them. This is to verify
      # that after fsyncing our top level directory 'mydir', all the subdirectories
      # and their files/links are registered in the fsync log and exist after the
      # fsync log is replayed.
      mkdir -p $SCRATCH_MNT/mydir/x/y/z
      ln $SCRATCH_MNT/foo $SCRATCH_MNT/mydir/x/y/foo_y_link
      ln $SCRATCH_MNT/foo $SCRATCH_MNT/mydir/x/y/z/foo_z_link
      touch $SCRATCH_MNT/mydir/x/y/z/qwerty
    
      # Now fsync only our top directory.
      $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/mydir
    
      # And fsync now our new file named 'hello', just to verify later that it has
      # the expected content and that the previous fsync on the directory 'mydir' had
      # no bad influence on this fsync.
      $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/hello
    
      # Simulate a crash/power loss.
      _load_flakey_table $FLAKEY_DROP_WRITES
      _unmount_flakey
    
      _load_flakey_table $FLAKEY_ALLOW_WRITES
      _mount_flakey
    
      # Verify the content of our file 'foo' remains the same as before, 8192 bytes,
      # all with the value 0xaa.
      echo "File 'foo' content after log replay:"
      od -t x1 $SCRATCH_MNT/foo
    
      # Remove the first name of our inode. Because of the directory fsync bug, the
      # inode's link count was 1 instead of 5, so removing the 'foo' name ended up
      # deleting the inode and the other names became stale directory entries (still
      # visible to applications). Attempting to remove or access the remaining
      # dentries pointing to that inode resulted in stale file handle errors and
      # made it impossible to remove the parent directories since it was impossible
      # for them to become empty.
      echo "file 'foo' link count after log replay: $(stat -c %h $SCRATCH_MNT/foo)"
      rm -f $SCRATCH_MNT/foo
    
      # Now verify that all files, links and directories created before fsyncing our
      # directory exist after the fsync log was replayed.
      [ -f $SCRATCH_MNT/mydir/foo_2 ] || echo "Link mydir/foo_2 is missing"
      [ -f $SCRATCH_MNT/mydir/foo_3 ] || echo "Link mydir/foo_3 is missing"
      [ -f $SCRATCH_MNT/hello ] || echo "File hello is missing"
      [ -f $SCRATCH_MNT/mydir/hello_2 ] || echo "Link mydir/hello_2 is missing"
      [ -f $SCRATCH_MNT/mydir/x/y/foo_y_link ] || \
          echo "Link mydir/x/y/foo_y_link is missing"
      [ -f $SCRATCH_MNT/mydir/x/y/z/foo_z_link ] || \
          echo "Link mydir/x/y/z/foo_z_link is missing"
      [ -f $SCRATCH_MNT/mydir/x/y/z/qwerty ] || \
          echo "File mydir/x/y/z/qwerty is missing"
    
      # We expect our file here to have a size of 64Kb and all the bytes having the
      # value 0xff.
      echo "file 'hello' content after log replay:"
      od -t x1 $SCRATCH_MNT/hello
    
      # Now remove all files/links, under our test directory 'mydir', and verify we
      # can remove all the directories.
      rm -f $SCRATCH_MNT/mydir/x/y/z/*
      rmdir $SCRATCH_MNT/mydir/x/y/z
      rm -f $SCRATCH_MNT/mydir/x/y/*
      rmdir $SCRATCH_MNT/mydir/x/y
      rmdir $SCRATCH_MNT/mydir/x
      rm -f $SCRATCH_MNT/mydir/*
      rmdir $SCRATCH_MNT/mydir
    
      # An fsck, run by the fstests framework everytime a test finishes, also detected
      # the inconsistency and printed the following error message:
      #
      # root 5 inode 257 errors 2001, no inode item, link count wrong
      #    unresolved ref dir 258 index 2 namelen 5 name foo_2 filetype 1 errors 4, no inode ref
      #    unresolved ref dir 258 index 3 namelen 5 name foo_3 filetype 1 errors 4, no inode ref
    
      status=0
      exit
    
    The expected golden output for the test is:
    
      wrote 8192/8192 bytes at offset 0
      XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
      wrote 65536/65536 bytes at offset 0
      XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
      File 'foo' content after log replay:
      0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
      *
      0020000
      file 'foo' link count after log replay: 5
      file 'hello' content after log replay:
      0000000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      *
      0200000
    
    Which is the output after this patch and when running the test against
    ext3/4, xfs, f2fs, reiserfs or nilfs2. Without this patch, the test's
    output is:
    
      wrote 8192/8192 bytes at offset 0
      XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
      wrote 65536/65536 bytes at offset 0
      XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
      File 'foo' content after log replay:
      0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
      *
      0020000
      file 'foo' link count after log replay: 1
      Link mydir/foo_2 is missing
      Link mydir/foo_3 is missing
      Link mydir/x/y/foo_y_link is missing
      Link mydir/x/y/z/foo_z_link is missing
      File mydir/x/y/z/qwerty is missing
      file 'hello' content after log replay:
      0000000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      *
      0200000
      rmdir: failed to remove '/home/fdmanana/btrfs-tests/scratch_1/mydir/x/y/z': No such file or directory
      rmdir: failed to remove '/home/fdmanana/btrfs-tests/scratch_1/mydir/x/y': No such file or directory
      rmdir: failed to remove '/home/fdmanana/btrfs-tests/scratch_1/mydir/x': No such file or directory
      rm: cannot remove '/home/fdmanana/btrfs-tests/scratch_1/mydir/foo_2': Stale file handle
      rm: cannot remove '/home/fdmanana/btrfs-tests/scratch_1/mydir/foo_3': Stale file handle
      rmdir: failed to remove '/home/fdmanana/btrfs-tests/scratch_1/mydir': Directory not empty
    
    Fsck, without this fix, also complains about the wrong link count:
    
      root 5 inode 257 errors 2001, no inode item, link count wrong
          unresolved ref dir 258 index 2 namelen 5 name foo_2 filetype 1 errors 4, no inode ref
          unresolved ref dir 258 index 3 namelen 5 name foo_3 filetype 1 errors 4, no inode ref
    
    So fix this by logging the inodes that the dentries point to when
    fsyncing a directory.
    
    A test case for xfstests follows.
    
    Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
    Signed-off-by: default avatarChris Mason <clm@fb.com>
    2f2ff0ee