* [PATCH AUTOSEL 5.11 01/12] btrfs: avoid checking for RO block group twice during nocow writeback
@ 2021-03-07 13:57 Sasha Levin
2021-03-07 13:57 ` [PATCH AUTOSEL 5.11 03/12] btrfs: subpage: fix the false data csum mismatch error Sasha Levin
0 siblings, 1 reply; 3+ messages in thread
From: Sasha Levin @ 2021-03-07 13:57 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Filipe Manana, Anand Jain, Josef Bacik, David Sterba, Sasha Levin,
linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
[ Upstream commit 20903032cd9f0260b99aeab92e6540f0350e4a23 ]
During the nocow writeback path, we currently iterate the rbtree of block
groups twice: once for checking if the target block group is RO with the
call to btrfs_extent_readonly()), and once again for getting a nocow
reference on the block group with a call to btrfs_inc_nocow_writers().
Since btrfs_inc_nocow_writers() already returns false when the target
block group is RO, remove the call to btrfs_extent_readonly(). Not only
we avoid searching the blocks group rbtree twice, it also helps reduce
contention on the lock that protects it (specially since it is a spin
lock and not a read-write lock). That may make a noticeable difference
on very large filesystems, with thousands of allocated block groups.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/btrfs/inode.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ad34c5a09bef..02c4bfa515fb 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1657,9 +1657,6 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
*/
btrfs_release_path(path);
- /* If extent is RO, we must COW it */
- if (btrfs_extent_readonly(fs_info, disk_bytenr))
- goto out_check;
ret = btrfs_cross_ref_exist(root, ino,
found_key.offset -
extent_offset, disk_bytenr, false);
@@ -1706,6 +1703,7 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
WARN_ON_ONCE(freespace_inode);
goto out_check;
}
+ /* If the extent's block group is RO, we must COW */
if (!btrfs_inc_nocow_writers(fs_info, disk_bytenr))
goto out_check;
nocow = true;
--
2.30.1
^ permalink raw reply related [flat|nested] 3+ messages in thread
* [PATCH AUTOSEL 5.11 03/12] btrfs: subpage: fix the false data csum mismatch error
2021-03-07 13:57 [PATCH AUTOSEL 5.11 01/12] btrfs: avoid checking for RO block group twice during nocow writeback Sasha Levin
@ 2021-03-07 13:57 ` Sasha Levin
2021-03-08 15:43 ` David Sterba
0 siblings, 1 reply; 3+ messages in thread
From: Sasha Levin @ 2021-03-07 13:57 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Qu Wenruo, David Sterba, Sasha Levin, linux-btrfs
From: Qu Wenruo <wqu@suse.com>
[ Upstream commit c28ea613fafad910d08f67efe76ae552b1434e44 ]
[BUG]
When running fstresss, we can hit strange data csum mismatch where the
on-disk data is in fact correct (passes scrub).
With some extra debug info added, we have the following traces:
0482us: btrfs_do_readpage: root=5 ino=284 offset=393216, submit force=0 pgoff=0 iosize=8192
0494us: btrfs_do_readpage: root=5 ino=284 offset=401408, submit force=0 pgoff=8192 iosize=4096
0498us: btrfs_submit_data_bio: root=5 ino=284 bio first bvec=393216 len=8192
0591us: btrfs_do_readpage: root=5 ino=284 offset=405504, submit force=0 pgoff=12288 iosize=36864
0594us: btrfs_submit_data_bio: root=5 ino=284 bio first bvec=401408 len=4096
0863us: btrfs_submit_data_bio: root=5 ino=284 bio first bvec=405504 len=36864
0933us: btrfs_verify_data_csum: root=5 ino=284 offset=393216 len=8192
0967us: btrfs_do_readpage: root=5 ino=284 offset=442368, skip beyond isize pgoff=49152 iosize=16384
1047us: btrfs_verify_data_csum: root=5 ino=284 offset=401408 len=4096
1163us: btrfs_verify_data_csum: root=5 ino=284 offset=405504 len=36864
1290us: check_data_csum: !!! root=5 ino=284 offset=438272 pg_off=45056 !!!
7387us: end_bio_extent_readpage: root=5 ino=284 before pending_read_bios=0
[CAUSE]
Normally we expect all submitted bio reads to only touch the range we
specified, and under subpage context, it means we should only touch the
range specified in each bvec.
But in data read path, inside end_bio_extent_readpage(), we have page
zeroing which only takes regular page size into consideration.
This means for subpage if we have an inode whose content looks like below:
0 16K 32K 48K 64K
|///////| |///////| |
|//| = data needs to be read from disk
| | = hole
And i_size is 64K initially.
Then the following race can happen:
T1 | T2
--------------------------------+--------------------------------
btrfs_do_readpage() |
|- isize = 64K; |
| At this time, the isize is |
| 64K |
| |
|- submit_extent_page() |
| submit previous assembled bio|
| assemble bio for [0, 16K) |
| |
|- submit_extent_page() |
submit read bio for [0, 16K) |
assemble read bio for |
[32K, 48K) |
|
| btrfs_setsize()
| |- i_size_write(, 16K);
| Now i_size is only 16K
end_io() for [0K, 16K) |
|- end_bio_extent_readpage() |
|- btrfs_verify_data_csum() |
| No csum error |
|- i_size = 16K; |
|- zero_user_segment(16K, |
PAGE_SIZE); |
!!! We zeroed range |
!!! [32K, 48K) |
| end_io for [32K, 48K)
| |- end_bio_extent_readpage()
| |- btrfs_verify_data_csum()
| ! CSUM MISMATCH !
| ! As the range is zeroed now !
[FIX]
To fix the problem, make end_bio_extent_readpage() to only zero the
range of bvec.
The bug only affects subpage read-write support, as for full read-only
mount we can't change i_size thus won't hit the race condition.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/btrfs/extent_io.c | 21 ++++++++++++++++-----
1 file changed, 16 insertions(+), 5 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c9cee458e001..ff1a0f97ba84 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2972,12 +2972,23 @@ static void end_bio_extent_readpage(struct bio *bio)
if (likely(uptodate)) {
loff_t i_size = i_size_read(inode);
pgoff_t end_index = i_size >> PAGE_SHIFT;
- unsigned off;
- /* Zero out the end if this page straddles i_size */
- off = offset_in_page(i_size);
- if (page->index == end_index && off)
- zero_user_segment(page, off, PAGE_SIZE);
+ /*
+ * Zero out the remaining part if this range straddles
+ * i_size.
+ *
+ * Here we should only zero the range inside the bvec,
+ * not touch anything else.
+ *
+ * NOTE: i_size is exclusive while end is inclusive.
+ */
+ if (page->index == end_index && i_size <= end) {
+ u32 zero_start = max(offset_in_page(i_size),
+ offset_in_page(end));
+
+ zero_user_segment(page, zero_start,
+ offset_in_page(end) + 1);
+ }
}
ASSERT(bio_offset + len > bio_offset);
bio_offset += len;
--
2.30.1
^ permalink raw reply related [flat|nested] 3+ messages in thread* Re: [PATCH AUTOSEL 5.11 03/12] btrfs: subpage: fix the false data csum mismatch error
2021-03-07 13:57 ` [PATCH AUTOSEL 5.11 03/12] btrfs: subpage: fix the false data csum mismatch error Sasha Levin
@ 2021-03-08 15:43 ` David Sterba
0 siblings, 0 replies; 3+ messages in thread
From: David Sterba @ 2021-03-08 15:43 UTC (permalink / raw)
To: Sasha Levin; +Cc: linux-kernel, stable, Qu Wenruo, David Sterba, linux-btrfs
On Sun, Mar 07, 2021 at 08:57:37AM -0500, Sasha Levin wrote:
> From: Qu Wenruo <wqu@suse.com>
>
> [ Upstream commit c28ea613fafad910d08f67efe76ae552b1434e44 ]
>
> [BUG]
> When running fstresss, we can hit strange data csum mismatch where the
> on-disk data is in fact correct (passes scrub).
>
> With some extra debug info added, we have the following traces:
>
> 0482us: btrfs_do_readpage: root=5 ino=284 offset=393216, submit force=0 pgoff=0 iosize=8192
> 0494us: btrfs_do_readpage: root=5 ino=284 offset=401408, submit force=0 pgoff=8192 iosize=4096
> 0498us: btrfs_submit_data_bio: root=5 ino=284 bio first bvec=393216 len=8192
> 0591us: btrfs_do_readpage: root=5 ino=284 offset=405504, submit force=0 pgoff=12288 iosize=36864
> 0594us: btrfs_submit_data_bio: root=5 ino=284 bio first bvec=401408 len=4096
> 0863us: btrfs_submit_data_bio: root=5 ino=284 bio first bvec=405504 len=36864
> 0933us: btrfs_verify_data_csum: root=5 ino=284 offset=393216 len=8192
> 0967us: btrfs_do_readpage: root=5 ino=284 offset=442368, skip beyond isize pgoff=49152 iosize=16384
> 1047us: btrfs_verify_data_csum: root=5 ino=284 offset=401408 len=4096
> 1163us: btrfs_verify_data_csum: root=5 ino=284 offset=405504 len=36864
> 1290us: check_data_csum: !!! root=5 ino=284 offset=438272 pg_off=45056 !!!
> 7387us: end_bio_extent_readpage: root=5 ino=284 before pending_read_bios=0
>
> [CAUSE]
> Normally we expect all submitted bio reads to only touch the range we
> specified, and under subpage context, it means we should only touch the
> range specified in each bvec.
>
> But in data read path, inside end_bio_extent_readpage(), we have page
> zeroing which only takes regular page size into consideration.
>
> This means for subpage if we have an inode whose content looks like below:
>
> 0 16K 32K 48K 64K
> |///////| |///////| |
>
> |//| = data needs to be read from disk
> | | = hole
>
> And i_size is 64K initially.
>
> Then the following race can happen:
>
> T1 | T2
> --------------------------------+--------------------------------
> btrfs_do_readpage() |
> |- isize = 64K; |
> | At this time, the isize is |
> | 64K |
> | |
> |- submit_extent_page() |
> | submit previous assembled bio|
> | assemble bio for [0, 16K) |
> | |
> |- submit_extent_page() |
> submit read bio for [0, 16K) |
> assemble read bio for |
> [32K, 48K) |
> |
> | btrfs_setsize()
> | |- i_size_write(, 16K);
> | Now i_size is only 16K
> end_io() for [0K, 16K) |
> |- end_bio_extent_readpage() |
> |- btrfs_verify_data_csum() |
> | No csum error |
> |- i_size = 16K; |
> |- zero_user_segment(16K, |
> PAGE_SIZE); |
> !!! We zeroed range |
> !!! [32K, 48K) |
> | end_io for [32K, 48K)
> | |- end_bio_extent_readpage()
> | |- btrfs_verify_data_csum()
> | ! CSUM MISMATCH !
> | ! As the range is zeroed now !
>
> [FIX]
> To fix the problem, make end_bio_extent_readpage() to only zero the
> range of bvec.
>
> The bug only affects subpage read-write support, as for full read-only
> mount we can't change i_size thus won't hit the race condition.
Please drop this patch from autosel because of the above, this is in a
feature that's in progress and does not affect regular filesystems.
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2021-03-08 15:46 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-03-07 13:57 [PATCH AUTOSEL 5.11 01/12] btrfs: avoid checking for RO block group twice during nocow writeback Sasha Levin
2021-03-07 13:57 ` [PATCH AUTOSEL 5.11 03/12] btrfs: subpage: fix the false data csum mismatch error Sasha Levin
2021-03-08 15:43 ` David Sterba
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox