1. 13 Dec, 2019 1 commit
    • Btrfs: fix removal logic of the tree mod log that leads to use-after-free issues · 6609fee8
      Filipe Manana authored
      When a tree mod log user no longer needs to use the tree it calls
      btrfs_put_tree_mod_seq() to remove itself from the list of users and
      delete all no longer used elements of the tree's red black tree, which
      should be all elements with a sequence number less than or equal to
      the caller's sequence number. However, the logic is broken because it
      can delete and free elements from the red black tree that have a
      sequence number greater than the caller's sequence number:
      
      1) At a point in time we have sequence numbers 1, 2, 3 and 4 in the
         tree mod log;
      
      2) The task which got assigned the sequence number 1 calls
         btrfs_put_tree_mod_seq();
      
      3) Sequence number 1 is deleted from the list of sequence numbers;
      
      4) The current minimum sequence number is computed to be the sequence
         number 2;
      
      5) A task using sequence number 2 is at tree_mod_log_rewind() and gets
         a pointer to one of its elements from the red black tree through
         a call to tree_mod_log_search();
      
      6) The task with sequence number 1 iterates the red black tree of tree
         modification elements and deletes (and frees) all elements with a
         sequence number less than or equal to 2 (the computed minimum sequence
         number) - it ends up only leaving elements with sequence numbers of 3
         and 4;
      
      7) The task with sequence number 2 now uses the pointer to its element,
         already freed by the other task, at __tree_mod_log_rewind(), resulting
         in a use-after-free issue. When CONFIG_DEBUG_PAGEALLOC=y it produces
         a trace like the following:
      
        [16804.546854] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
        [16804.547451] CPU: 0 PID: 28257 Comm: pool Tainted: G        W         5.4.0-rc8-btrfs-next-51 #1
        [16804.548059] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
        [16804.548666] RIP: 0010:rb_next+0x16/0x50
        (...)
        [16804.550581] RSP: 0018:ffffb948418ef9b0 EFLAGS: 00010202
        [16804.551227] RAX: 6b6b6b6b6b6b6b6b RBX: ffff90e0247f6600 RCX: 6b6b6b6b6b6b6b6b
        [16804.551873] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff90e0247f6600
        [16804.552504] RBP: ffff90dffe0d4688 R08: 0000000000000001 R09: 0000000000000000
        [16804.553136] R10: ffff90dffa4a0040 R11: 0000000000000000 R12: 000000000000002e
        [16804.553768] R13: ffff90e0247f6600 R14: 0000000000001663 R15: ffff90dff77862b8
        [16804.554399] FS:  00007f4b197ae700(0000) GS:ffff90e036a00000(0000) knlGS:0000000000000000
        [16804.555039] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [16804.555683] CR2: 00007f4b10022000 CR3: 00000002060e2004 CR4: 00000000003606f0
        [16804.556336] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [16804.556968] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [16804.557583] Call Trace:
        [16804.558207]  __tree_mod_log_rewind+0xbf/0x280 [btrfs]
        [16804.558835]  btrfs_search_old_slot+0x105/0xd00 [btrfs]
        [16804.559468]  resolve_indirect_refs+0x1eb/0xc70 [btrfs]
        [16804.560087]  ? free_extent_buffer.part.19+0x5a/0xc0 [btrfs]
        [16804.560700]  find_parent_nodes+0x388/0x1120 [btrfs]
        [16804.561310]  btrfs_check_shared+0x115/0x1c0 [btrfs]
        [16804.561916]  ? extent_fiemap+0x59d/0x6d0 [btrfs]
        [16804.562518]  extent_fiemap+0x59d/0x6d0 [btrfs]
        [16804.563112]  ? __might_fault+0x11/0x90
        [16804.563706]  do_vfs_ioctl+0x45a/0x700
        [16804.564299]  ksys_ioctl+0x70/0x80
        [16804.564885]  ? trace_hardirqs_off_thunk+0x1a/0x20
        [16804.565461]  __x64_sys_ioctl+0x16/0x20
        [16804.566020]  do_syscall_64+0x5c/0x250
        [16804.566580]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [16804.567153] RIP: 0033:0x7f4b1ba2add7
        (...)
        [16804.568907] RSP: 002b:00007f4b197adc88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
        [16804.569513] RAX: ffffffffffffffda RBX: 00007f4b100210d8 RCX: 00007f4b1ba2add7
        [16804.570133] RDX: 00007f4b100210d8 RSI: 00000000c020660b RDI: 0000000000000003
        [16804.570726] RBP: 000055de05a6cfe0 R08: 0000000000000000 R09: 00007f4b197add44
        [16804.571314] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f4b197add48
        [16804.571905] R13: 00007f4b197add40 R14: 00007f4b100210d0 R15: 00007f4b197add50
        (...)
        [16804.575623] ---[ end trace 87317359aad4ba50 ]---
      
      Fix this by making btrfs_put_tree_mod_seq() skip deletion of elements that
      have a sequence number equal to the computed minimum sequence number, and
      not just elements with a sequence number greater than that minimum.
      
      Fixes: bd989ba3 ("Btrfs: add tree modification log functions")
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
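
The comparison at the heart of the scenario above can be sketched in plain C. This is only an illustration and not the kernel code: `struct mod_elem`, `prune_elems()` and the `inclusive` flag are hypothetical names, an array stands in for the red black tree, and a `freed` flag stands in for the actual free so the effect can be observed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a tree mod log element; not the kernel type. */
struct mod_elem {
	uint64_t seq;	/* sequence number the element was logged with */
	bool freed;	/* set instead of calling free(), to keep the demo observable */
};

/*
 * Prune elements after a user drops out. min_seq is the lowest sequence
 * number still held by a remaining user. With inclusive == true this
 * mimics the broken comparison (seq <= min_seq); with inclusive == false
 * it mimics the fixed one (seq < min_seq). Returns the number of
 * elements marked as freed.
 */
static size_t prune_elems(struct mod_elem *elems, size_t n,
			  uint64_t min_seq, bool inclusive)
{
	size_t count = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		bool drop = inclusive ? elems[i].seq <= min_seq
				      : elems[i].seq < min_seq;

		if (drop && !elems[i].freed) {
			elems[i].freed = true;
			count++;
		}
	}
	return count;
}
```

With elements 1 to 4 and a computed minimum of 2, the inclusive variant also marks element 2 as freed, which is the element the other task is still using; the exclusive variant leaves it alone.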
  2. 18 Nov, 2019 12 commits
  3. 09 Sep, 2019 8 commits
    • btrfs: Don't assign retval of btrfs_try_tree_write_lock/btrfs_tree_read_lock_atomic · 65e99c43
      Nikolay Borisov authored
      
      
      Those functions are simple boolean predicates, so there is no need to assign
      their return values to interim variables. Use them directly as
      predicates. No functional changes.
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: create structure to encode checksum type and length · af024ed2
      Johannes Thumshirn authored
      
      
      Create a structure to encode the type and length for the known on-disk
      checksums.  This makes it easier to add new checksums later.
      
      The structure and helpers are moved from ctree.h so they don't occupy
      space in all headers including ctree.h. This saves some space in the
      final object.
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
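
A table-driven layout like the one described above could be sketched as follows. All names here (`toy_csum_type`, `toy_csum_desc`, `toy_csums`, `toy_csum_size()`) are hypothetical and only one entry is filled in; the only fact relied on is that a CRC32 checksum is 4 bytes long.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical checksum type ids; only a CRC32 entry is sketched here. */
enum toy_csum_type {
	TOY_CSUM_CRC32 = 0,
	TOY_CSUM_NR,		/* number of known types */
};

/* One descriptor per known checksum type. */
struct toy_csum_desc {
	uint16_t size;		/* checksum length in bytes */
};

/* Table indexed by type; supporting a new checksum means adding one entry. */
static const struct toy_csum_desc toy_csums[TOY_CSUM_NR] = {
	[TOY_CSUM_CRC32] = { .size = 4 },
};

/* Returns the checksum length in bytes, or 0 for an unknown type. */
static uint16_t toy_csum_size(unsigned int type)
{
	if (type >= TOY_CSUM_NR)
		return 0;
	return toy_csums[type].size;
}
```

Keeping the table and its helper in a single source file, rather than in a widely included header, is the space saving the message refers to.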
    • btrfs: tie extent buffer and its token together · c82f823c
      David Sterba authored
      
      
      Further simplification of the get/set helpers is possible when the token
      is uniquely tied to an extent buffer. A condition and an assignment can
      be avoided.
      
      The initializations are moved closer to the first use when the extent
      buffer is valid. There's one exception in __push_leaf_left where the
      token is reused.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: move functions for tree compare to send.c · 18d0f5c6
      David Sterba authored
      
      
      Send is the only user of tree_compare, we can move it there along with
      the other helpers and definitions.
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: rename and export read_node_slot · 4b231ae4
      David Sterba authored
      
      
      Preparatory work for code that will be moved out of ctree and uses this
      function.
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • Btrfs: fix use-after-free when using the tree modification log · efad8a85
      Filipe Manana authored
      At ctree.c:get_old_root(), we are accessing a root's header owner field
      after we have freed the respective extent buffer. This results in a
      use-after-free that can lead to crashes, and when CONFIG_DEBUG_PAGEALLOC
      is set, results in a stack trace like the following:
      
        [ 3876.799331] stack segment: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
        [ 3876.799363] CPU: 0 PID: 15436 Comm: pool Not tainted 5.3.0-rc3-btrfs-next-54 #1
        [ 3876.799385] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
        [ 3876.799433] RIP: 0010:btrfs_search_old_slot+0x652/0xd80 [btrfs]
        (...)
        [ 3876.799502] RSP: 0018:ffff9f08c1a2f9f0 EFLAGS: 00010286
        [ 3876.799518] RAX: ffff8dd300000000 RBX: ffff8dd85a7a9348 RCX: 000000038da26000
        [ 3876.799538] RDX: 0000000000000000 RSI: ffffe522ce368980 RDI: 0000000000000246
        [ 3876.799559] RBP: dae1922adadad000 R08: 0000000008020000 R09: ffffe522c0000000
        [ 3876.799579] R10: ffff8dd57fd788c8 R11: 000000007511b030 R12: ffff8dd781ddc000
        [ 3876.799599] R13: ffff8dd9e6240578 R14: ffff8dd6896f7a88 R15: ffff8dd688cf90b8
        [ 3876.799620] FS:  00007f23ddd97700(0000) GS:ffff8dda20200000(0000) knlGS:0000000000000000
        [ 3876.799643] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [ 3876.799660] CR2: 00007f23d4024000 CR3: 0000000710bb0005 CR4: 00000000003606f0
        [ 3876.799682] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [ 3876.799703] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [ 3876.799723] Call Trace:
        [ 3876.799735]  ? do_raw_spin_unlock+0x49/0xc0
        [ 3876.799749]  ? _raw_spin_unlock+0x24/0x30
        [ 3876.799779]  resolve_indirect_refs+0x1eb/0xc80 [btrfs]
        [ 3876.799810]  find_parent_nodes+0x38d/0x1180 [btrfs]
        [ 3876.799841]  btrfs_check_shared+0x11a/0x1d0 [btrfs]
        [ 3876.799870]  ? extent_fiemap+0x598/0x6e0 [btrfs]
        [ 3876.799895]  extent_fiemap+0x598/0x6e0 [btrfs]
        [ 3876.799913]  do_vfs_ioctl+0x45a/0x700
        [ 3876.799926]  ksys_ioctl+0x70/0x80
        [ 3876.799938]  ? trace_hardirqs_off_thunk+0x1a/0x20
        [ 3876.799953]  __x64_sys_ioctl+0x16/0x20
        [ 3876.799965]  do_syscall_64+0x62/0x220
        [ 3876.799977]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [ 3876.799993] RIP: 0033:0x7f23e0013dd7
        (...)
        [ 3876.800056] RSP: 002b:00007f23ddd96ca8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
        [ 3876.800078] RAX: ffffffffffffffda RBX: 00007f23d80210f8 RCX: 00007f23e0013dd7
        [ 3876.800099] RDX: 00007f23d80210f8 RSI: 00000000c020660b RDI: 0000000000000003
        [ 3876.800626] RBP: 000055fa2a2a2440 R08: 0000000000000000 R09: 00007f23ddd96d7c
        [ 3876.801143] R10: 00007f23d8022000 R11: 0000000000000246 R12: 00007f23ddd96d80
        [ 3876.801662] R13: 00007f23ddd96d78 R14: 00007f23d80210f0 R15: 00007f23ddd96d80
        (...)
        [ 3876.805107] ---[ end trace e53161e179ef04f9 ]---
      
      Fix that by saving the root's header owner field into a local variable
      before freeing the root's extent buffer, and then use that local variable
      when needed.
      
      Fixes: 30b0463a ("Btrfs: fix accessing the root pointer in tree mod log functions")
      CC: stable@vger.kernel.org # 3.10+
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
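
The pattern of the fix, reading a field into a local variable before the buffer that holds it is released, can be sketched as below. `struct toy_buf` and `swap_root()` are hypothetical and use malloc()/free() in place of extent buffer reference counting; this is not the code of get_old_root().

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a buffer whose header carries an owner id. */
struct toy_buf {
	uint64_t owner;
};

/*
 * Replace *rootp with a freshly allocated buffer that keeps the same
 * owner. The owner is copied into a local variable while the old buffer
 * is still valid, so nothing reads from it after free(). On allocation
 * failure *rootp is set to NULL. Returns the saved owner.
 */
static uint64_t swap_root(struct toy_buf **rootp)
{
	struct toy_buf *old = *rootp;
	struct toy_buf *fresh;
	uint64_t owner = old->owner;	/* saved before the free below */

	free(old);

	fresh = malloc(sizeof(*fresh));
	if (fresh)
		fresh->owner = owner;	/* uses the local, not old->owner */
	*rootp = fresh;
	return owner;
}
```

Writing `fresh->owner = old->owner` after the free() call would be the userspace analogue of the bug described above.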
    • btrfs: make caching_thread use btrfs_find_next_key · 6a9fb468
      Josef Bacik authored
      
      
      extent-tree.c has a find_next_key that just walks up the path to find
      the next key, but it is used for both the caching stuff and the snapshot
      delete stuff.  The snapshot deletion stuff is special so it can't really
      use btrfs_find_next_key, but the caching thread stuff can.  We just need
      to fix btrfs_find_next_key to deal with ->skip_locking and then it works
      exactly the same as the private find_next_key helper.
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: assert tree mod log lock in __tree_mod_log_insert · 73e82fe4
      David Sterba authored
      
      
      The tree is going to be modified, so the lock must be held exclusively.
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
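
The idea of asserting exclusive ownership before modifying a shared structure can be illustrated in userspace as follows. The lock model (`struct toy_rwlock`), `toy_assert_write_held()` and `toy_log_insert()` are hypothetical; the kernel relies on its own lock debugging facilities rather than on assert().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reader/writer lock state; not a real locking primitive. */
struct toy_rwlock {
	int readers;	/* number of shared holders */
	bool writer;	/* true while held exclusively */
};

/* Abort if the lock is not held exclusively. */
static void toy_assert_write_held(const struct toy_rwlock *l)
{
	assert(l->writer && l->readers == 0);
}

/*
 * Insert into a structure protected by the lock. The structure is
 * reduced to an element counter here. Returns the new element count.
 */
static int toy_log_insert(const struct toy_rwlock *l, int *count)
{
	toy_assert_write_held(l);	/* caller must hold the lock exclusively */
	(*count)++;
	return *count;
}
```

Placing the check at the top of the insert helper documents the locking requirement and catches callers that hold the lock only in shared mode.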
  4. 29 Apr, 2019 19 commits