* [PATCH] btrfs: reject zoned RW mount if sectorsize is smaller than page size
@ 2024-02-12 5:16 Qu Wenruo
2024-02-12 9:45 ` Johannes Thumshirn
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Qu Wenruo @ 2024-02-12 5:16 UTC (permalink / raw)
To: linux-btrfs; +Cc: HAN Yuwei, stable
[BUG]
There is a bug report that with zoned device and sectorsize is smaller
than page size (aka, subpage), btrfs would crash with a very basic
workload:
# getconfig PAGESIZE
16384
# mkfs.btrfs -f $dev -s 4k
# mount $dev $mnt
# $fsstress -w -n 8 -s 1707820327 -v -d $mnt
# umount $mnt
The crash would look like this (with CONFIG_BTRFS_ASSERT enabled):
assertion failed: block_start != EXTENT_MAP_HOLE, in fs/btrfs/extent_io.c:1384
------------[ cut here ]------------
kernel BUG at fs/btrfs/extent_io.c:1384!
CPU: 0 PID: 872 Comm: kworker/u9:2 Tainted: G OE 6.8.0-rc3-custom+ #7
Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20231122-12.fc39 11/22/2023
Workqueue: writeback wb_workfn (flush-btrfs-8)
pc : __extent_writepage_io+0x404/0x460 [btrfs]
lr : __extent_writepage_io+0x404/0x460 [btrfs]
Call trace:
__extent_writepage_io+0x404/0x460 [btrfs]
extent_write_locked_range+0x16c/0x460 [btrfs]
run_delalloc_cow+0x88/0x118 [btrfs]
btrfs_run_delalloc_range+0x128/0x228 [btrfs]
writepage_delalloc+0xb8/0x178 [btrfs]
__extent_writepage+0xc8/0x3a0 [btrfs]
extent_write_cache_pages+0x1cc/0x460 [btrfs]
extent_writepages+0x8c/0x120 [btrfs]
btrfs_writepages+0x18/0x30 [btrfs]
do_writepages+0x94/0x1f8
__writeback_single_inode+0x4c/0x388
writeback_sb_inodes+0x208/0x4b0
wb_writeback+0x118/0x3c0
wb_do_writeback+0xbc/0x388
wb_workfn+0x80/0x240
process_one_work+0x154/0x3c8
worker_thread+0x2bc/0x3e0
kthread+0xf4/0x108
ret_from_fork+0x10/0x20
Code: 9102c021 90000be0 91378000 9402bf53 (d4210000)
---[ end trace 0000000000000000 ]---
[CAUSE]
There are several factors causing the problem:
1. __extent_writepage_io() requires all dirty ranges to have delalloc
executed
This can be solved by adding @start and @len parameter to only submit
IO for a subset of the page, and update several involved helpers to
do subpage checks.
So this is not a big deal.
2. Subpage only accepts for full page aligned ranges for
extent_write_locked_range()
For zoned device, regular COW is switched to utilize
extent_write_locked_range() to submit the IO.
But the caller, run_delalloc_cow() can be called to run on a subpage
range, e.g.
0 4K 8K 12K 16K
|/////| |/////|
Where |///| is the dirtied range.
In that case, btrfs_run_delalloc_range() would call run_delalloc_cow(),
which would call extent_write_locked_range() for [0, 4K), and unlock
the whole [0, 16K) page.
But btrfs_run_delalloc_range() would again be called for range [8K,
12K), as there are still dirty range left.
In that case, since the whole page is already unlocked by previous
iteration, and would cause different ASSERT()s inside
extent_write_locked_range().
That's also why compression for subpage cases require fully page
aligned range.
[WORKAROUND]
A proper fix requires some big changes to delalloc workload, to allow
extent_write_locked_range() to handle multiple different entries with
the same @locked_page.
So for now, disable read-write support for subpage zoned btrfs.
The problem can only be solved if subpage btrfs can handle subpage
compression, which need quite some work on the delalloc procedure for
the @locked_page handling.
Reported-by: HAN Yuwei <hrx@bupt.moe>
Link: https://lore.kernel.org/all/1ACD2E3643008A17+da260584-2c7f-432a-9e22-9d390aae84cc@bupt.moe/
CC: stable@vger.kernel.org # 5.10+
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/disk-io.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index c3ab268533ca..85cd23aebdd6 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3193,7 +3193,8 @@ int btrfs_check_features(struct btrfs_fs_info *fs_info, bool is_rw_mount)
* part of @locked_page.
* That's also why compression for subpage only work for page aligned ranges.
*/
- if (fs_info->sectorsize < PAGE_SIZE && btrfs_is_zoned(fs_info) && is_rw_mount) {
+ if (fs_info->sectorsize < PAGE_SIZE &&
+ btrfs_fs_incompat(fs_info, ZONED) && is_rw_mount) {
btrfs_warn(fs_info,
"no zoned read-write support for page size %lu with sectorsize %u",
PAGE_SIZE, fs_info->sectorsize);
--
2.43.0
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [PATCH] btrfs: reject zoned RW mount if sectorsize is smaller than page size
2024-02-12 5:16 [PATCH] btrfs: reject zoned RW mount if sectorsize is smaller than page size Qu Wenruo
@ 2024-02-12 9:45 ` Johannes Thumshirn
2024-02-12 9:50 ` Qu Wenruo
2024-02-12 11:14 ` David Sterba
2024-02-14 7:28 ` David Sterba
2 siblings, 1 reply; 9+ messages in thread
From: Johannes Thumshirn @ 2024-02-12 9:45 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs@vger.kernel.org; +Cc: HAN Yuwei, stable@vger.kernel.org
On 12.02.24 06:16, Qu Wenruo wrote:
> Reported-by: HAN Yuwei <hrx@bupt.moe>
> Link: https://lore.kernel.org/all/1ACD2E3643008A17+da260584-2c7f-432a-9e22-9d390aae84cc@bupt.moe/
> CC: stable@vger.kernel.org # 5.10+
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> fs/btrfs/disk-io.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index c3ab268533ca..85cd23aebdd6 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3193,7 +3193,8 @@ int btrfs_check_features(struct btrfs_fs_info *fs_info, bool is_rw_mount)
> * part of @locked_page.
> * That's also why compression for subpage only work for page aligned ranges.
> */
> - if (fs_info->sectorsize < PAGE_SIZE && btrfs_is_zoned(fs_info) && is_rw_mount) {
> + if (fs_info->sectorsize < PAGE_SIZE &&
> + btrfs_fs_incompat(fs_info, ZONED) && is_rw_mount) {
> btrfs_warn(fs_info,
> "no zoned read-write support for page size %lu with sectorsize %u",
> PAGE_SIZE, fs_info->sectorsize);
Please keep btrfs_is_zoned(fs_info) instead of using
btrfs_fs_incompat(fs_info, ZONED).
Otherwise,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH] btrfs: reject zoned RW mount if sectorsize is smaller than page size
2024-02-12 9:45 ` Johannes Thumshirn
@ 2024-02-12 9:50 ` Qu Wenruo
2024-02-12 10:25 ` Johannes Thumshirn
2024-02-14 7:29 ` David Sterba
0 siblings, 2 replies; 9+ messages in thread
From: Qu Wenruo @ 2024-02-12 9:50 UTC (permalink / raw)
To: Johannes Thumshirn, Qu Wenruo, linux-btrfs@vger.kernel.org
Cc: HAN Yuwei, stable@vger.kernel.org
On 2024/2/12 20:15, Johannes Thumshirn wrote:
> On 12.02.24 06:16, Qu Wenruo wrote:
>> Reported-by: HAN Yuwei <hrx@bupt.moe>
>> Link: https://lore.kernel.org/all/1ACD2E3643008A17+da260584-2c7f-432a-9e22-9d390aae84cc@bupt.moe/
>> CC: stable@vger.kernel.org # 5.10+
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>> fs/btrfs/disk-io.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index c3ab268533ca..85cd23aebdd6 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -3193,7 +3193,8 @@ int btrfs_check_features(struct btrfs_fs_info *fs_info, bool is_rw_mount)
>> * part of @locked_page.
>> * That's also why compression for subpage only work for page aligned ranges.
>> */
>> - if (fs_info->sectorsize < PAGE_SIZE && btrfs_is_zoned(fs_info) && is_rw_mount) {
>> + if (fs_info->sectorsize < PAGE_SIZE &&
>> + btrfs_fs_incompat(fs_info, ZONED) && is_rw_mount) {
>> btrfs_warn(fs_info,
>> "no zoned read-write support for page size %lu with sectorsize %u",
>> PAGE_SIZE, fs_info->sectorsize);
>
> Please keep btrfs_is_zoned(fs_info) instead of using
> btrfs_fs_incompat(fs_info, ZONED).
At the time of calling, we haven't yet populate fs_info->zone_size, thus
we have to use super flags to verify if it's zoned.
If needed, I can add a comment for it.
Thanks,
Qu
>
> Otherwise,
> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH] btrfs: reject zoned RW mount if sectorsize is smaller than page size
2024-02-12 9:50 ` Qu Wenruo
@ 2024-02-12 10:25 ` Johannes Thumshirn
2024-02-14 7:29 ` David Sterba
1 sibling, 0 replies; 9+ messages in thread
From: Johannes Thumshirn @ 2024-02-12 10:25 UTC (permalink / raw)
To: Qu Wenruo, Qu Wenruo, linux-btrfs@vger.kernel.org
Cc: HAN Yuwei, stable@vger.kernel.org
On 12.02.24 10:50, Qu Wenruo wrote:
>
>
> On 2024/2/12 20:15, Johannes Thumshirn wrote:
>> On 12.02.24 06:16, Qu Wenruo wrote:
>>> Reported-by: HAN Yuwei <hrx@bupt.moe>
>>> Link: https://lore.kernel.org/all/1ACD2E3643008A17+da260584-2c7f-432a-9e22-9d390aae84cc@bupt.moe/
>>> CC: stable@vger.kernel.org # 5.10+
>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>> ---
>>> fs/btrfs/disk-io.c | 3 ++-
>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>> index c3ab268533ca..85cd23aebdd6 100644
>>> --- a/fs/btrfs/disk-io.c
>>> +++ b/fs/btrfs/disk-io.c
>>> @@ -3193,7 +3193,8 @@ int btrfs_check_features(struct btrfs_fs_info *fs_info, bool is_rw_mount)
>>> * part of @locked_page.
>>> * That's also why compression for subpage only work for page aligned ranges.
>>> */
>>> - if (fs_info->sectorsize < PAGE_SIZE && btrfs_is_zoned(fs_info) && is_rw_mount) {
>>> + if (fs_info->sectorsize < PAGE_SIZE &&
>>> + btrfs_fs_incompat(fs_info, ZONED) && is_rw_mount) {
>>> btrfs_warn(fs_info,
>>> "no zoned read-write support for page size %lu with sectorsize %u",
>>> PAGE_SIZE, fs_info->sectorsize);
>>
>> Please keep btrfs_is_zoned(fs_info) instead of using
>> btrfs_fs_incompat(fs_info, ZONED).
>
> At the time of calling, we haven't yet populate fs_info->zone_size, thus
> we have to use super flags to verify if it's zoned.
>
> If needed, I can add a comment for it.
All good, I didn't have enough coffee.
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH] btrfs: reject zoned RW mount if sectorsize is smaller than page size
2024-02-12 9:50 ` Qu Wenruo
2024-02-12 10:25 ` Johannes Thumshirn
@ 2024-02-14 7:29 ` David Sterba
2024-02-17 0:07 ` Qu Wenruo
1 sibling, 1 reply; 9+ messages in thread
From: David Sterba @ 2024-02-14 7:29 UTC (permalink / raw)
To: Qu Wenruo
Cc: Johannes Thumshirn, Qu Wenruo, linux-btrfs@vger.kernel.org,
HAN Yuwei, stable@vger.kernel.org
On Mon, Feb 12, 2024 at 08:20:21PM +1030, Qu Wenruo wrote:
>
>
> On 2024/2/12 20:15, Johannes Thumshirn wrote:
> > On 12.02.24 06:16, Qu Wenruo wrote:
> >> Reported-by: HAN Yuwei <hrx@bupt.moe>
> >> Link: https://lore.kernel.org/all/1ACD2E3643008A17+da260584-2c7f-432a-9e22-9d390aae84cc@bupt.moe/
> >> CC: stable@vger.kernel.org # 5.10+
> >> Signed-off-by: Qu Wenruo <wqu@suse.com>
> >> ---
> >> fs/btrfs/disk-io.c | 3 ++-
> >> 1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> >> index c3ab268533ca..85cd23aebdd6 100644
> >> --- a/fs/btrfs/disk-io.c
> >> +++ b/fs/btrfs/disk-io.c
> >> @@ -3193,7 +3193,8 @@ int btrfs_check_features(struct btrfs_fs_info *fs_info, bool is_rw_mount)
> >> * part of @locked_page.
> >> * That's also why compression for subpage only work for page aligned ranges.
> >> */
> >> - if (fs_info->sectorsize < PAGE_SIZE && btrfs_is_zoned(fs_info) && is_rw_mount) {
> >> + if (fs_info->sectorsize < PAGE_SIZE &&
> >> + btrfs_fs_incompat(fs_info, ZONED) && is_rw_mount) {
> >> btrfs_warn(fs_info,
> >> "no zoned read-write support for page size %lu with sectorsize %u",
> >> PAGE_SIZE, fs_info->sectorsize);
> >
> > Please keep btrfs_is_zoned(fs_info) instead of using
> > btrfs_fs_incompat(fs_info, ZONED).
>
> At the time of calling, we haven't yet populate fs_info->zone_size, thus
> we have to use super flags to verify if it's zoned.
>
> If needed, I can add a comment for it.
Yes please add a comment the difference is quite subtle.
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH] btrfs: reject zoned RW mount if sectorsize is smaller than page size
2024-02-14 7:29 ` David Sterba
@ 2024-02-17 0:07 ` Qu Wenruo
0 siblings, 0 replies; 9+ messages in thread
From: Qu Wenruo @ 2024-02-17 0:07 UTC (permalink / raw)
To: dsterba
Cc: Johannes Thumshirn, Qu Wenruo, linux-btrfs@vger.kernel.org,
HAN Yuwei, stable@vger.kernel.org
在 2024/2/14 17:59, David Sterba 写道:
> On Mon, Feb 12, 2024 at 08:20:21PM +1030, Qu Wenruo wrote:
>>
>>
>> On 2024/2/12 20:15, Johannes Thumshirn wrote:
>>> On 12.02.24 06:16, Qu Wenruo wrote:
>>>> Reported-by: HAN Yuwei <hrx@bupt.moe>
>>>> Link: https://lore.kernel.org/all/1ACD2E3643008A17+da260584-2c7f-432a-9e22-9d390aae84cc@bupt.moe/
>>>> CC: stable@vger.kernel.org # 5.10+
>>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>>> ---
>>>> fs/btrfs/disk-io.c | 3 ++-
>>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>>> index c3ab268533ca..85cd23aebdd6 100644
>>>> --- a/fs/btrfs/disk-io.c
>>>> +++ b/fs/btrfs/disk-io.c
>>>> @@ -3193,7 +3193,8 @@ int btrfs_check_features(struct btrfs_fs_info *fs_info, bool is_rw_mount)
>>>> * part of @locked_page.
>>>> * That's also why compression for subpage only work for page aligned ranges.
>>>> */
>>>> - if (fs_info->sectorsize < PAGE_SIZE && btrfs_is_zoned(fs_info) && is_rw_mount) {
>>>> + if (fs_info->sectorsize < PAGE_SIZE &&
>>>> + btrfs_fs_incompat(fs_info, ZONED) && is_rw_mount) {
>>>> btrfs_warn(fs_info,
>>>> "no zoned read-write support for page size %lu with sectorsize %u",
>>>> PAGE_SIZE, fs_info->sectorsize);
>>>
>>> Please keep btrfs_is_zoned(fs_info) instead of using
>>> btrfs_fs_incompat(fs_info, ZONED).
>>
>> At the time of calling, we haven't yet populate fs_info->zone_size, thus
>> we have to use super flags to verify if it's zoned.
>>
>> If needed, I can add a comment for it.
>
> Yes please add a comment the difference is quite subtle.
>
I'd say this patch can be dropped.
The reason is:
- I'm already working on the proper subpage handling for the
@locked_page of a delalloc range
The patchset is under testing now, the results looks fine for
both regular and subpage cases.
Will queue extra testing for subpage zoned.
- The rejection would cause future problems for detecting whether we
have proper subpage + zoned support.
Either we do not detect, or introduce a complex mechanism only for
this one edge case.
Thus I prefer not to detect.
- Subage + zoned is too niche for now
The most common subpage usage would be aarch64 (especially for Apple
M1/2 chips).
For those Apple based ones, they have no ability to expand, thus won't
hit any real zoned devices.
For other aarch64 servers, they should have the ability to choose a 4K
page size kernel if they really want to go zoned devices.
This bug is only exposed with a helpful reporter using 16K page sized
loongson board with SATA zoned disk.
Due to above reasons, I really prefer to keep the current situation, and
focus on the proper fix instead.
Thanks,
Qu
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] btrfs: reject zoned RW mount if sectorsize is smaller than page size
2024-02-12 5:16 [PATCH] btrfs: reject zoned RW mount if sectorsize is smaller than page size Qu Wenruo
2024-02-12 9:45 ` Johannes Thumshirn
@ 2024-02-12 11:14 ` David Sterba
2024-02-12 22:54 ` Qu Wenruo
2024-02-14 7:28 ` David Sterba
2 siblings, 1 reply; 9+ messages in thread
From: David Sterba @ 2024-02-12 11:14 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs, HAN Yuwei, stable
On Mon, Feb 12, 2024 at 03:46:15PM +1030, Qu Wenruo wrote:
> [BUG]
> There is a bug report that with zoned device and sectorsize is smaller
> than page size (aka, subpage), btrfs would crash with a very basic
> workload:
>
> # getconfig PAGESIZE
> 16384
> # mkfs.btrfs -f $dev -s 4k
> # mount $dev $mnt
> # $fsstress -w -n 8 -s 1707820327 -v -d $mnt
> # umount $mnt
>
> The crash would look like this (with CONFIG_BTRFS_ASSERT enabled):
>
> assertion failed: block_start != EXTENT_MAP_HOLE, in fs/btrfs/extent_io.c:1384
This is the same as what Josef fixed in
https://github.com/btrfs/linux/commit/400bb013912dac637c7d6826407be580ea8ef9cc
"btrfs: don't drop extent_map for free space inode on write error"
but not on zoned+subpage, is it really a different error you're fixing?
[...]
> [WORKAROUND]
> A proper fix requires some big changes to delalloc workload, to allow
> extent_write_locked_range() to handle multiple different entries with
> the same @locked_page.
>
> So for now, disable read-write support for subpage zoned btrfs.
>
> The problem can only be solved if subpage btrfs can handle subpage
> compression, which need quite some work on the delalloc procedure for
> the @locked_page handling.
Ok, this on itself is a valid reason to disable the support.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] btrfs: reject zoned RW mount if sectorsize is smaller than page size
2024-02-12 11:14 ` David Sterba
@ 2024-02-12 22:54 ` Qu Wenruo
0 siblings, 0 replies; 9+ messages in thread
From: Qu Wenruo @ 2024-02-12 22:54 UTC (permalink / raw)
To: dsterba; +Cc: linux-btrfs, HAN Yuwei, stable
[-- Attachment #1.1.1: Type: text/plain, Size: 2379 bytes --]
On 2024/2/12 21:44, David Sterba wrote:
> On Mon, Feb 12, 2024 at 03:46:15PM +1030, Qu Wenruo wrote:
>> [BUG]
>> There is a bug report that with zoned device and sectorsize is smaller
>> than page size (aka, subpage), btrfs would crash with a very basic
>> workload:
>>
>> # getconfig PAGESIZE
>> 16384
>> # mkfs.btrfs -f $dev -s 4k
>> # mount $dev $mnt
>> # $fsstress -w -n 8 -s 1707820327 -v -d $mnt
>> # umount $mnt
>>
>> The crash would look like this (with CONFIG_BTRFS_ASSERT enabled):
>>
>> assertion failed: block_start != EXTENT_MAP_HOLE, in fs/btrfs/extent_io.c:1384
>
> This is the same as what Josef fixed in
> https://github.com/btrfs/linux/commit/400bb013912dac637c7d6826407be580ea8ef9cc
> "btrfs: don't drop extent_map for free space inode on write error"
> but not on zoned+subpage, is it really a different error you're fixing?
Yes, it's really a different bug.
Although the root cause is a little more complex, it has to be explained
through the whole CAUSE section.
>
> [...]
>> [WORKAROUND]
>> A proper fix requires some big changes to delalloc workload, to allow
>> extent_write_locked_range() to handle multiple different entries with
>> the same @locked_page.
>>
>> So for now, disable read-write support for subpage zoned btrfs.
>>
>> The problem can only be solved if subpage btrfs can handle subpage
>> compression, which need quite some work on the delalloc procedure for
>> the @locked_page handling.
>
> Ok, this on itself is a valid reason to disable the support.
But this reminds me that, I have to work on the proper delalloc support
for subpage.
My current plan is:
1. Introduce subpage LOCK bitmap
Not just only an atomic counter.
2. Make find_lock_delalloc_range() to populate that lock bitmap
3. Make find_lock_delalloc_range() to find all delalloc range of
a page and properly lock them
This would not change the behavior for page sized sectorsize.
4. Make find_lock_delalloc_range() to skip the page if it's already
being locked before.
This is also a subpage specific behavior
With above planned work, btrfs should be able to handle subpage unlock,
so that we won't need weird @locked_page handling.
Hopefully this would solve both the subpage+zoned and full sector
aligned subpage compression support.
Thanks,
Qu
[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 7027 bytes --]
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] btrfs: reject zoned RW mount if sectorsize is smaller than page size
2024-02-12 5:16 [PATCH] btrfs: reject zoned RW mount if sectorsize is smaller than page size Qu Wenruo
2024-02-12 9:45 ` Johannes Thumshirn
2024-02-12 11:14 ` David Sterba
@ 2024-02-14 7:28 ` David Sterba
2 siblings, 0 replies; 9+ messages in thread
From: David Sterba @ 2024-02-14 7:28 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs, HAN Yuwei, stable
On Mon, Feb 12, 2024 at 03:46:15PM +1030, Qu Wenruo wrote:
> [BUG]
> There is a bug report that with zoned device and sectorsize is smaller
> than page size (aka, subpage), btrfs would crash with a very basic
> workload:
>
> # getconfig PAGESIZE
> 16384
> # mkfs.btrfs -f $dev -s 4k
> # mount $dev $mnt
> # $fsstress -w -n 8 -s 1707820327 -v -d $mnt
> # umount $mnt
>
> The crash would look like this (with CONFIG_BTRFS_ASSERT enabled):
>
> assertion failed: block_start != EXTENT_MAP_HOLE, in fs/btrfs/extent_io.c:1384
> ------------[ cut here ]------------
> kernel BUG at fs/btrfs/extent_io.c:1384!
> CPU: 0 PID: 872 Comm: kworker/u9:2 Tainted: G OE 6.8.0-rc3-custom+ #7
> Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20231122-12.fc39 11/22/2023
> Workqueue: writeback wb_workfn (flush-btrfs-8)
> pc : __extent_writepage_io+0x404/0x460 [btrfs]
> lr : __extent_writepage_io+0x404/0x460 [btrfs]
> Call trace:
> __extent_writepage_io+0x404/0x460 [btrfs]
> extent_write_locked_range+0x16c/0x460 [btrfs]
> run_delalloc_cow+0x88/0x118 [btrfs]
> btrfs_run_delalloc_range+0x128/0x228 [btrfs]
> writepage_delalloc+0xb8/0x178 [btrfs]
> __extent_writepage+0xc8/0x3a0 [btrfs]
> extent_write_cache_pages+0x1cc/0x460 [btrfs]
> extent_writepages+0x8c/0x120 [btrfs]
> btrfs_writepages+0x18/0x30 [btrfs]
> do_writepages+0x94/0x1f8
> __writeback_single_inode+0x4c/0x388
> writeback_sb_inodes+0x208/0x4b0
> wb_writeback+0x118/0x3c0
> wb_do_writeback+0xbc/0x388
> wb_workfn+0x80/0x240
> process_one_work+0x154/0x3c8
> worker_thread+0x2bc/0x3e0
> kthread+0xf4/0x108
> ret_from_fork+0x10/0x20
> Code: 9102c021 90000be0 91378000 9402bf53 (d4210000)
> ---[ end trace 0000000000000000 ]---
>
> [CAUSE]
> There are several factors causing the problem:
>
> 1. __extent_writepage_io() requires all dirty ranges to have delalloc
> executed
> This can be solved by adding @start and @len parameter to only submit
> IO for a subset of the page, and update several involved helpers to
> do subpage checks.
>
> So this is not a big deal.
>
> 2. Subpage only accepts for full page aligned ranges for
> extent_write_locked_range()
> For zoned device, regular COW is switched to utilize
> extent_write_locked_range() to submit the IO.
>
> But the caller, run_delalloc_cow() can be called to run on a subpage
> range, e.g.
>
> 0 4K 8K 12K 16K
> |/////| |/////|
>
> Where |///| is the dirtied range.
>
> In that case, btrfs_run_delalloc_range() would call run_delalloc_cow(),
> which would call extent_write_locked_range() for [0, 4K), and unlock
> the whole [0, 16K) page.
>
> But btrfs_run_delalloc_range() would again be called for range [8K,
> 12K), as there are still dirty range left.
> In that case, since the whole page is already unlocked by previous
> iteration, and would cause different ASSERT()s inside
> extent_write_locked_range().
>
> That's also why compression for subpage cases require fully page
> aligned range.
>
> [WORKAROUND]
> A proper fix requires some big changes to delalloc workload, to allow
> extent_write_locked_range() to handle multiple different entries with
> the same @locked_page.
>
> So for now, disable read-write support for subpage zoned btrfs.
>
> The problem can only be solved if subpage btrfs can handle subpage
> compression, which need quite some work on the delalloc procedure for
> the @locked_page handling.
>
> Reported-by: HAN Yuwei <hrx@bupt.moe>
> Link: https://lore.kernel.org/all/1ACD2E3643008A17+da260584-2c7f-432a-9e22-9d390aae84cc@bupt.moe/
> CC: stable@vger.kernel.org # 5.10+
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> fs/btrfs/disk-io.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index c3ab268533ca..85cd23aebdd6 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3193,7 +3193,8 @@ int btrfs_check_features(struct btrfs_fs_info *fs_info, bool is_rw_mount)
> * part of @locked_page.
> * That's also why compression for subpage only work for page aligned ranges.
> */
> - if (fs_info->sectorsize < PAGE_SIZE && btrfs_is_zoned(fs_info) && is_rw_mount) {
> + if (fs_info->sectorsize < PAGE_SIZE &&
> + btrfs_fs_incompat(fs_info, ZONED) && is_rw_mount) {
> btrfs_warn(fs_info,
> "no zoned read-write support for page size %lu with sectorsize %u",
> PAGE_SIZE, fs_info->sectorsize);
This does not match any code I see in usual branches and it also does
not apply so it can't be picked as a fix.
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2024-02-17 0:07 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-12 5:16 [PATCH] btrfs: reject zoned RW mount if sectorsize is smaller than page size Qu Wenruo
2024-02-12 9:45 ` Johannes Thumshirn
2024-02-12 9:50 ` Qu Wenruo
2024-02-12 10:25 ` Johannes Thumshirn
2024-02-14 7:29 ` David Sterba
2024-02-17 0:07 ` Qu Wenruo
2024-02-12 11:14 ` David Sterba
2024-02-12 22:54 ` Qu Wenruo
2024-02-14 7:28 ` David Sterba
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox