1. 12 Sep, 2019 1 commit
    • Filipe Manana's avatar
      Btrfs: fix unwritten extent buffers and hangs on future writeback attempts · 18dfa711
      Filipe Manana authored
      The lock_extent_buffer_io() returns 1 to the caller to tell it everything
      went fine and the callers needs to start writeback for the extent buffer
      (submit a bio, etc), 0 to tell the caller everything went fine but it does
      not need to start writeback for the extent buffer, and a negative value if
      some error happened.
      When it's about to return 1 it tries to lock all pages, and if a try lock
      on a page fails, and we didn't flush any existing bio in our "epd", it
      calls flush_write_bio(epd) and overwrites the return value of 1 to 0 or
      an error. The page might have been locked elsewhere, not with the goal
      of starting writeback of the extent buffer, and even by some code other
      than btrfs, like page migration for example, so it does not mean the
      writeback of the extent buffer was already started by some other task,
      so returning a 0 tells the caller (btree_write_cache_pages()) to not
      start writeback for the extent buffer. Note that epd might currently have
      either no bio, so flush_write_bio() returns 0 (success) or it might have
      a bio for another extent buffer with a lower index (logical address).
      Since we return 0 with the EXTENT_BUFFER_WRITEBACK bit set on the
      extent buffer and writeback is never started for the extent buffer,
      future attempts to writeback the extent buffer will hang forever waiting
      on that bit to be cleared, since it can only be cleared after writeback
      completes. Such hang is reported with a trace like the following:
        [49887.347053] INFO: task btrfs-transacti:1752 blocked for more than 122 seconds.
        [49887.347059]       Not tainted 5.2.13-gentoo #2
        [49887.347060] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
        [49887.347062] btrfs-transacti D    0  1752      2 0x80004000
        [49887.347064] Call Trace:
        [49887.347069]  ? __schedule+0x265/0x830
        [49887.347071]  ? bit_wait+0x50/0x50
        [49887.347072]  ? bit_wait+0x50/0x50
        [49887.347074]  schedule+0x24/0x90
        [49887.347075]  io_schedule+0x3c/0x60
        [49887.347077]  bit_wait_io+0x8/0x50
        [49887.347079]  __wait_on_bit+0x6c/0x80
        [49887.347081]  ? __lock_release.isra.29+0x155/0x2d0
        [49887.347083]  out_of_line_wait_on_bit+0x7b/0x80
        [49887.347084]  ? var_wake_function+0x20/0x20
        [49887.347087]  lock_extent_buffer_for_io+0x28c/0x390
        [49887.347089]  btree_write_cache_pages+0x18e/0x340
        [49887.347091]  do_writepages+0x29/0xb0
        [49887.347093]  ? kmem_cache_free+0x132/0x160
        [49887.347095]  ? convert_extent_bit+0x544/0x680
        [49887.347097]  filemap_fdatawrite_range+0x70/0x90
        [49887.347099]  btrfs_write_marked_extents+0x53/0x120
        [49887.347100]  btrfs_write_and_wait_transaction.isra.4+0x38/0xa0
        [49887.347102]  btrfs_commit_transaction+0x6bb/0x990
        [49887.347103]  ? start_transaction+0x33e/0x500
        [49887.347105]  transaction_kthread+0x139/0x15c
      So fix this by not overwriting the return value (ret) with the result
      from flush_write_bio(). We also need to clear the EXTENT_BUFFER_WRITEBACK
      bit in case flush_write_bio() returns an error, otherwise it will hang
      any future attempts to writeback the extent buffer, and undo all work
      done before (set back EXTENT_BUFFER_DIRTY, etc).
      This is a regression introduced in the 5.2 kernel.
      Fixes: 2e3c2513 ("btrfs: extent_io: add proper error handling to lock_extent_buffer_for_io()")
      Fixes: f4340622
       ("btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up")
      Reported-by: default avatarZdenek Sojka <zsojka@seznam.cz>
      Link: https://lore.kernel.org/linux-btrfs/GpO.2yos.3WGDOLpx6t%7D.1TUDYM@seznam.cz/T/#u
      Reported-by: default avatarStefan Priebe - Profihost AG <s.priebe@profihost.ag>
      Link: https://lore.kernel.org/linux-btrfs/5c4688ac-10a7-fb07-70e8-c5d31a3fbb38@profihost.ag/T/#t
      Reported-by: default avatarDrazen Kacar <drazen.kacar@oradian.com>
      Link: https://lore.kernel.org/linux-btrfs/DB8PR03MB562876ECE2319B3E579590F799C80@DB8PR03MB5628.eurprd03.prod.outlook.com/
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204377
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
  2. 05 Jul, 2019 1 commit
  3. 04 Jul, 2019 1 commit
  4. 02 Jul, 2019 5 commits
  5. 01 Jul, 2019 5 commits
  6. 30 Apr, 2019 1 commit
  7. 29 Apr, 2019 26 commits