Skip to content
  • Filipe Manana's avatar
    Btrfs: fix race leading to fs corruption after transaction abort · cb2d3dad
    Filipe Manana authored
    When one transaction is finishing its commit, it is possible for another
    transaction to start and enter its initial commit phase as well. If the
    first ends up getting aborted, we have a small time window where the second
    transaction commit does not notice that the previous transaction aborted
    and ends up committing, writing a superblock that points to btrees that
    reference extent buffers (nodes and leafs) that were not persisted to disk.
    The consequence is that after mounting the filesystem again, we will be
    unable to load some btree nodes/leafs, either because the content on disk
    is either garbage (or just zeroes) or corresponds to the old content of a
    previouly COWed or deleted node/leaf, resulting in the well known error
    messages "parent transid verify failed on ...".
    The following sequence diagram illustrates how this can happen.
    
            CPU 1                                           CPU 2
    
     <at transaction N>
    
     btrfs_commit_transaction()
       (...)
       --> sets transaction state to
           TRANS_STATE_UNBLOCKED
       --> sets fs_info->running_transaction
           to NULL
    
                                                        (...)
                                                        btrfs_start_transaction()
                                                          start_transaction()
                                                            wait_current_trans()
                                                              --> returns immediately
                                                                  because
                                                                  fs_info->running_transaction
                                                                  is NULL
                                                            join_transaction()
                                                              --> creates transaction N + 1
                                                              --> sets
                                                                  fs_info->running_transaction
                                                                  to transaction N + 1
                                                              --> adds transaction N + 1 to
                                                                  the fs_info->trans_list list
                                                            --> returns transaction handle
                                                                pointing to the new
                                                                transaction N + 1
                                                        (...)
    
                                                        btrfs_sync_file()
                                                          btrfs_start_transaction()
                                                            --> returns handle to
                                                                transaction N + 1
                                                          (...)
    
       btrfs_write_and_wait_transaction()
         --> writeback of some extent
             buffer fails, returns an
    	 error
       btrfs_handle_fs_error()
         --> sets BTRFS_FS_STATE_ERROR in
             fs_info->fs_state
       --> jumps to label "scrub_continue"
       cleanup_transaction()
         btrfs_abort_transaction(N)
           --> sets BTRFS_FS_STATE_TRANS_ABORTED
               flag in fs_info->fs_state
           --> sets aborted field in the
               transaction and transaction
    	   handle structures, for
               transaction N only
         --> removes transaction from the
             list fs_info->trans_list
                                                          btrfs_commit_transaction(N + 1)
                                                            --> transaction N + 1 was not
    							    aborted, so it proceeds
                                                            (...)
                                                            --> sets the transaction's state
                                                                to TRANS_STATE_COMMIT_START
                                                            --> does not find the previous
                                                                transaction (N) in the
                                                                fs_info->trans_list, so it
                                                                doesn't know that transaction
                                                                was aborted, and the commit
                                                                of transaction N + 1 proceeds
                                                            (...)
                                                            --> sets transaction N + 1 state
                                                                to TRANS_STATE_UNBLOCKED
                                                            btrfs_write_and_wait_transaction()
                                                              --> succeeds writing all extent
                                                                  buffers created in the
                                                                  transaction N + 1
                                                            write_all_supers()
                                                               --> succeeds
                                                               --> we now have a superblock on
                                                                   disk that points to trees
                                                                   that refer to at least one
                                                                   extent buffer that was
                                                                   never persisted
    
    So fix this by updating the transaction commit path to check if the flag
    BTRFS_FS_STATE_TRANS_ABORTED is set on fs_info->fs_state if after setting
    the transaction to the TRANS_STATE_COMMIT_START we do not find any previous
    transaction in the fs_info->trans_list. If the flag is set, just fail the
    transaction commit with -EROFS, as we do in other places. The exact error
    code for the previous transaction abort was already logged and reported.
    
    Fixes: 49b25e05
    
     ("btrfs: enhance transaction abort infrastructure")
    CC: stable@vger.kernel.org # 4.4+
    Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
    Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
    Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
    Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    cb2d3dad