- 12 Sep, 2019 1 commit
-
-
Filipe Manana authored
The lock_extent_buffer_io() returns 1 to the caller to tell it everything went fine and the callers needs to start writeback for the extent buffer (submit a bio, etc), 0 to tell the caller everything went fine but it does not need to start writeback for the extent buffer, and a negative value if some error happened. When it's about to return 1 it tries to lock all pages, and if a try lock on a page fails, and we didn't flush any existing bio in our "epd", it calls flush_write_bio(epd) and overwrites the return value of 1 to 0 or an error. The page might have been locked elsewhere, not with the goal of starting writeback of the extent buffer, and even by some code other than btrfs, like page migration for example, so it does not mean the writeback of the extent buffer was already started by some other task, so returning a 0 tells the caller (btree_write_cache_pages()) to not start writeback for the extent buffer. Note that epd might currently have either no bio, so flush_write_bio() returns 0 (success) or it might have a bio for another extent buffer with a lower index (logical address). Since we return 0 with the EXTENT_BUFFER_WRITEBACK bit set on the extent buffer and writeback is never started for the extent buffer, future attempts to writeback the extent buffer will hang forever waiting on that bit to be cleared, since it can only be cleared after writeback completes. Such hang is reported with a trace like the following: [49887.347053] INFO: task btrfs-transacti:1752 blocked for more than 122 seconds. [49887.347059] Not tainted 5.2.13-gentoo #2 [49887.347060] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [49887.347062] btrfs-transacti D 0 1752 2 0x80004000 [49887.347064] Call Trace: [49887.347069] ? __schedule+0x265/0x830 [49887.347071] ? bit_wait+0x50/0x50 [49887.347072] ? bit_wait+0x50/0x50 [49887.347074] schedule+0x24/0x90 [49887.347075] io_schedule+0x3c/0x60 [49887.347077] bit_wait_io+0x8/0x50 [49887.347079] __wait_on_bit+0x6c/0x80 [49887.347081] ? __lock_release.isra.29+0x155/0x2d0 [49887.347083] out_of_line_wait_on_bit+0x7b/0x80 [49887.347084] ? var_wake_function+0x20/0x20 [49887.347087] lock_extent_buffer_for_io+0x28c/0x390 [49887.347089] btree_write_cache_pages+0x18e/0x340 [49887.347091] do_writepages+0x29/0xb0 [49887.347093] ? kmem_cache_free+0x132/0x160 [49887.347095] ? convert_extent_bit+0x544/0x680 [49887.347097] filemap_fdatawrite_range+0x70/0x90 [49887.347099] btrfs_write_marked_extents+0x53/0x120 [49887.347100] btrfs_write_and_wait_transaction.isra.4+0x38/0xa0 [49887.347102] btrfs_commit_transaction+0x6bb/0x990 [49887.347103] ? start_transaction+0x33e/0x500 [49887.347105] transaction_kthread+0x139/0x15c So fix this by not overwriting the return value (ret) with the result from flush_write_bio(). We also need to clear the EXTENT_BUFFER_WRITEBACK bit in case flush_write_bio() returns an error, otherwise it will hang any future attempts to writeback the extent buffer, and undo all work done before (set back EXTENT_BUFFER_DIRTY, etc). This is a regression introduced in the 5.2 kernel. Fixes: 2e3c2513 ("btrfs: extent_io: add proper error handling to lock_extent_buffer_for_io()") Fixes: f4340622 ("btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up") Reported-by:
Zdenek Sojka <zsojka@seznam.cz> Link: https://lore.kernel.org/linux-btrfs/GpO.2yos.3WGDOLpx6t%7D.1TUDYM@seznam.cz/T/#u Reported-by:
Stefan Priebe - Profihost AG <s.priebe@profihost.ag> Link: https://lore.kernel.org/linux-btrfs/5c4688ac-10a7-fb07-70e8-c5d31a3fbb38@profihost.ag/T/#t Reported-by:
Drazen Kacar <drazen.kacar@oradian.com> Link: https://lore.kernel.org/linux-btrfs/DB8PR03MB562876ECE2319B3E579590F799C80@DB8PR03MB5628.eurprd03.prod.outlook.com/ Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204377 Signed-off-by:
Filipe Manana <fdmanana@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
- 05 Jul, 2019 1 commit
-
-
Colin Ian King authored
Currently if the allocation of roots or tmp_ulist fails the error handling does not free up the allocation of path causing a memory leak. Fix this and other similar leaks by moving the call of btrfs_free_path from label out to label out_free_ulist. Kudos to David Sterba for spotting the issue in my original fix and suggesting the correct way to fix the leak and Anand Jain for spotting a double free issue. Addresses-Coverity: ("Resource leak") Fixes: 5911c8fe ("btrfs: fiemap: preallocate ulists for btrfs_check_shared") Reviewed-by:
Nikolay Borisov <nborisov@suse.com> Signed-off-by:
Colin Ian King <colin.king@canonical.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
- 04 Jul, 2019 1 commit
-
-
Goldwyn Rodrigues authored
Simplification. No point passing the tree variable when it can be evaluated from inode. The tests now use the io_tree from btrfs_inode as opposed to creating one. Signed-off-by:
Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
- 02 Jul, 2019 5 commits
-
-
David Sterba authored
The block device is passed around for the only purpose to set it in new bios. Move the assignment one level up. This is a preparatory patch for further bdev cleanups. Signed-off-by:
David Sterba <dsterba@suse.com>
-
David Sterba authored
Print the error messages using the helpers that also print the filesystem identification. Signed-off-by:
David Sterba <dsterba@suse.com>
-
David Sterba authored
The write_locks is either 0 or 1 and always updated under the lock, so we don't need the atomic_t semantics. Reviewed-by:
Nikolay Borisov <nborisov@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
David Sterba authored
The spinning_writers is either 0 or 1 and always updated under the lock, so we don't need the atomic_t semantics. Signed-off-by:
David Sterba <dsterba@suse.com>
-
David Sterba authored
The blocking_writers is either 0 or 1 and always updated under the lock, so we don't need the atomic_t semantics. Reviewed-by:
Nikolay Borisov <nborisov@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
- 01 Jul, 2019 5 commits
-
-
Nikolay Borisov authored
The function has a lot of return values and specific conventions making it cumbersome to understand what's returned. Have a go at documenting its parameters and return values. Reviewed-by:
Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by:
Qu Wenruo <wqu@suse.com> Signed-off-by:
Nikolay Borisov <nborisov@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
Currently find_first_clear_extent_bit always returns a range whose starting value is >= passed 'start'. This implicit trimming behavior is somewhat subtle and an implementation detail. Instead, this patch modifies the function such that now it always returns the range which contains passed 'start' and has the given bits unset. This range could either be due to presence of existing records which contains 'start' but have the bits unset or because there are no records that contain the given starting offset. This patch also adds test cases which cover find_first_clear_extent_bit since they were missing up until now. Reviewed-by:
Qu Wenruo <wqu@suse.com> Signed-off-by:
Nikolay Borisov <nborisov@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
There several functions which open code btrfs_lock_and_flush_ordered_range, just replace them with a call to the function. No functional changes. Reviewed-by:
Josef Bacik <josef@toxicpanda.com> Signed-off-by:
Nikolay Borisov <nborisov@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
At the context of btrfs_run_delalloc_range(), we haven't started/joined a transaction, thus even something went wrong, we can't and won't abort transaction, thus no way to make the fs RO. Reviewed-by:
Nikolay Borisov <nborisov@suse.com> Signed-off-by:
Qu Wenruo <wqu@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
David Sterba authored
btrfs_check_shared looks up parents of a given extent and uses ulists for that. These are allocated and freed repeatedly. Preallocation in the caller will avoid the overhead and also allow us to use the GFP_KERNEL as it is happens before the extent locks are taken. Reviewed-by:
Nikolay Borisov <nborisov@suse.com> Reviewed-by:
Filipe Manana <fdmanana@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
- 30 Apr, 2019 1 commit
-
-
Christoph Hellwig authored
We only have two callers that need the integer loop iterator, and they can easily maintain it themselves. Suggested-by:
Matthew Wilcox <willy@infradead.org> Reviewed-by:
Johannes Thumshirn <jthumshirn@suse.de> Acked-by:
David Sterba <dsterba@suse.com> Reviewed-by:
Hannes Reinecke <hare@suse.com> Acked-by:
Coly Li <colyli@suse.de> Reviewed-by:
Matthew Wilcox <willy@infradead.org> Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- 29 Apr, 2019 26 commits
-
-
David Sterba authored
Signed-off-by:
David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
None of the implementers of the submit_bio_hook use the bio_offset parameter, simply remove it. No functional changes. Reviewed-by:
Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by:
Nikolay Borisov <nborisov@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
This function always uses the btree inode's io_tree. Stop taking the tree as a function argument and instead access it internally from read_extent_buffer_pages. No functional changes. Reviewed-by:
Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by:
Nikolay Borisov <nborisov@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
This function is very similar to find_first_extent_bit except that it locates the first contiguous span of space which does not have bits set. It's intended use is in the freespace trimming code. Signed-off-by:
Nikolay Borisov <nborisov@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
It will be used in a future patch that will require modifying an extent_io_tree struct under a spinlock. Signed-off-by:
Nikolay Borisov <nborisov@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
This function is going to be used to clear out the device extent allocation information. Give it a more generic name and export it. This is in preparation to replacing the pending/pinned chunk lists with an extent tree. No functional changes. Signed-off-by:
Nikolay Borisov <nborisov@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
David Sterba authored
We can read fs_info from extent buffer and can drop it from the parameters. Signed-off-by:
David Sterba <dsterba@suse.com>
-
David Sterba authored
We can read fs_info from extent buffer and can drop it from the parameters. Signed-off-by:
David Sterba <dsterba@suse.com>
-
David Sterba authored
We can read fs_info from extent buffer and can drop it from the parameters. As all callsites are updated, add the btrfs_ prefix as the function is exported. Signed-off-by:
David Sterba <dsterba@suse.com>
-
David Sterba authored
We can read fs_info from extent buffer and can drop it from the parameters. Signed-off-by:
David Sterba <dsterba@suse.com>
-
Arnd Bergmann authored
BUG_ON(1) leads to bogus warnings from clang when CONFIG_PROFILE_ANNOTATED_BRANCHES is set: fs/btrfs/volumes.c:5041:3: error: variable 'max_chunk_size' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized] BUG_ON(1); ^~~~~~~~~ include/asm-generic/bug.h:61:36: note: expanded from macro 'BUG_ON' #define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0) ^~~~~~~~~~~~~~~~~~~ include/linux/compiler.h:48:23: note: expanded from macro 'unlikely' # define unlikely(x) (__branch_check__(x, 0, __builtin_constant_p(x))) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ fs/btrfs/volumes.c:5046:9: note: uninitialized use occurs here max_chunk_size); ^~~~~~~~~~~~~~ include/linux/kernel.h:860:36: note: expanded from macro 'min' #define min(x, y) __careful_cmp(x, y, <) ^ include/linux/kernel.h:853:17: note: expanded from macro '__careful_cmp' __cmp_once(x, y, __UNIQUE_ID(__x), __UNIQUE_ID(__y), op)) ^ include/linux/kernel.h:847:25: note: expanded from macro '__cmp_once' typeof(y) unique_y = (y); \ ^ fs/btrfs/volumes.c:5041:3: note: remove the 'if' if its condition is always true BUG_ON(1); ^ include/asm-generic/bug.h:61:32: note: expanded from macro 'BUG_ON' #define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0) ^ fs/btrfs/volumes.c:4993:20: note: initialize the variable 'max_chunk_size' to silence this warning u64 max_chunk_size; ^ = 0 Change it to BUG() so clang can see that this code path can never continue. Reviewed-by:
Nikolay Borisov <nborisov@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
Arnd Bergmann <arnd@arndb.de> Signed-off-by:
David Sterba <dsterba@suse.com>
-
David Sterba authored
Long time ago (2008), the extent buffers were organized in a LRU list and switched to rb-tree in 6af118ce ("Btrfs: Index extent buffers in an rbtree"). There was one stale macro definition left. Signed-off-by:
David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
We can only get <=0 from extent_write_cache_pages, add an ASSERT() for it just in case. Then instead of submitting the write bio even if we got some error, check the return value first. If we have already hit some error, just clean up the corrupted or half-baked bio, and return error. If there is no error so far, then call flush_write_bio() and return the result. Signed-off-by:
Qu Wenruo <wqu@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
This function needs some extra checks on locked pages and eb. For error handling we need to unlock locked pages and the eb. There is a rare >0 return value branch, where all pages get locked while write bio is not flushed. Thankfully it's handled by the only caller, btree_write_cache_pages(), as later write_one_eb() call will trigger submit_one_bio(). So there shouldn't be any problem. Signed-off-by:
Qu Wenruo <wqu@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
We can only get @ret <= 0. Add an ASSERT() for it just in case. Then, instead of submitting the write bio even we got some error, check the return value first. If we have already hit some error, just clean up the corrupted or half-baked bio, and return error. If there is no error so far, then call flush_write_bio() and return the result. Signed-off-by:
Qu Wenruo <wqu@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
Since __extent_writepage() will no longer return >0 value, (ret == AOP_WRITEPAGE_ACTIVATE) will never be true. Kill that dead branch. Reviewed-by:
Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by:
Qu Wenruo <wqu@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
In btree_write_cache_pages(), we can only get @ret <= 0. Add an ASSERT() for it just in case. Then instead of submitting the write bio even we got some error, check the return value first. If we have already hit some error, just clean up the corrupted or half-baked bio, and return error. If there is no error so far, then call flush_write_bio() and return the result. Signed-off-by:
Qu Wenruo <wqu@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
Since now flush_write_bio() could return error, kill the BUG_ON() first. Then don't call flush_write_bio() unconditionally, instead we check the return value from __extent_writepage() first. If __extent_writepage() fails, we do cleanup, and return error without submitting the possible corrupted or half-baked bio. If __extent_writepage() successes, then we call flush_write_bio() and return the result. Signed-off-by:
Qu Wenruo <wqu@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
We have a BUG_ON() in flush_write_bio() to handle the return value of submit_one_bio(). Move the BUG_ON() one level up to all its callers. This patch will introduce temporary variable, @flush_ret to keep code change minimal in this patch. That variable will be cleaned up when enhancing the error handling later. Reviewed-by:
Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by:
Qu Wenruo <wqu@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
In case we hit the error case for a metadata buffer in end_bio_extent_readpage then 'ret' won't really be checked before it's written again to. This means the -EIO in this case will never be checked, just remove it. Fixes-coverity-id: 1442513 Signed-off-by:
Nikolay Borisov <nborisov@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
Currently extent_readpages (called from btrfs_readpages) will always call __extent_readpages which tries to create contiguous range of pages and call __do_contiguous_readpages when such contiguous range is created. It turns out this is unnecessary due to the fact that generic MM code always calls filesystem's ->readpages callback (btrfs_readpages in this case) with already contiguous pages. Armed with this knowledge it's possible to simplify extent_readpages by eliminating the call to __extent_readpages and directly calling contiguous_readpages. The only edge case that needs to be handled is when add_to_page_cache_lru fails. This is easy as all that is needed is to submit whatever is the number of pages successfully added to the lru. This can happen when the page is already in the range, so it does not need to be read again, and we can't do anything else in case of other errors. Signed-off-by:
Nikolay Borisov <nborisov@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
David Sterba authored
The member is tracking simple status of the lock, we can use bool for that and make some room for further space reduction in the structure. Reviewed-by:
Nikolay Borisov <nborisov@suse.com> Reviewed-by:
Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by:
David Sterba <dsterba@suse.com>
-
David Sterba authored
Use the helpers where open coded. On non-debug builds, the warnings will not trigger and extent_buffer::write_locks become unused and can be moved to the appropriate section, saving a few bytes. Reviewed-by:
Nikolay Borisov <nborisov@suse.com> Reviewed-by:
Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by:
David Sterba <dsterba@suse.com>
-
David Sterba authored
Use the helpers where open coded. On non-debug builds, the warnings will not trigger and extent_buffer::read_locks become unused and can be moved to the appropriate section, saving a few bytes. Reviewed-by:
Nikolay Borisov <nborisov@suse.com> Reviewed-by:
Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by:
David Sterba <dsterba@suse.com>
-
David Sterba authored
Use the helpers where open coded. On non-debug builds, the warnings will not trigger and extent_buffer::spining_readers become unused and can be moved to the appropriate section, saving a few bytes. Reviewed-by:
Nikolay Borisov <nborisov@suse.com> Reviewed-by:
Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by:
David Sterba <dsterba@suse.com>
-
David Sterba authored
Use the helpers where open coded. On non-debug builds, the warnings will not trigger and extent_buffer::spining_writers become unused and can be moved to the appropriate section, saving a few bytes. Reviewed-by:
Nikolay Borisov <nborisov@suse.com> Reviewed-by:
Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by:
David Sterba <dsterba@suse.com>
-