* [PATCH 0/3] btrfs: support 2k block size for debug builds
From: Qu Wenruo @ 2025-02-26 4:10 UTC
To: linux-btrfs
[REPO]
This series depends on the existing subpage related patches to pass most
fstests, so please fetch it from the following repo:
https://github.com/adam900710/linux/tree/2k_blocksize
Of course, one can still apply those patches on top of the for-next
branch, but running btrfs with a 2K block size there will hit most,
if not all, of the bugs fixed in the subpage branch.
Since day 1, btrfs has only supported block sizes as small as 4K, which
means that on the most common architecture, x86_64, there is no way to
test subpage block size support.
That's why most of my tests are done on aarch64 nowadays, but such
limited availability is not good for test coverage.
The situation would improve with large data folio support, but that is
another huge feature, and we're not sure how far away we really are.
So here we go with a much simpler solution: lowering the minimum
block size to 2K for debug builds.
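To see why 2K helps, consider the ratio of page size to block size. With x86_64's 4K pages and a 4K minimum block size, every page holds exactly one block, so the subpage paths (which track per-block state inside a page) are never exercised; a 2K block size puts two blocks in each page. A trivial sketch, where blocks_per_page() and needs_subpage() are hypothetical helpers, not btrfs functions:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: how many blocks share one page. */
static unsigned int blocks_per_page(unsigned int page_size, unsigned int blocksize)
{
	/* Block sizes handled here never exceed the page size. */
	assert(blocksize != 0 && page_size % blocksize == 0);
	return page_size / blocksize;
}

/* Hypothetical helper: is per-block state inside a page needed at all? */
static bool needs_subpage(unsigned int page_size, unsigned int blocksize)
{
	/* More than one block per page means per-block tracking. */
	return blocks_per_page(page_size, blocksize) > 1;
}
```

For example, needs_subpage(4096, 4096) is false, while needs_subpage(4096, 2048) is true, which is the configuration this series makes reachable on x86_64.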
The support has quite a few limitations, but that should not be a big
deal because we're not exposing this support to end users:
- No 2K node size support
  This is a limitation of mkfs, not of the kernel.
  But it's still a problem, as it means we cannot test the metadata
  subpage routines.
- No mixed block groups support
  Because mkfs.btrfs has no 2K node size support.
- Very limited inline data extents support
  No inline extent can go beyond 2K; this affects regular files as
  well as symlinks/xattrs.
  Quite a few inline-related tests will fail due to this.
This allows x86_64 to exercise the subpage block size routines, and it
has in fact already exposed a bug that is not reproducible on aarch64
(I believe it's related to page reclaim behavior).
The first patch fixes a deadlock that is only reproducible on x86_64.
The second one fixes btrfs-check errors where non-compressed, block-sized
inline extents are reported as errors.
The final one enables 2K block size support for DEBUG builds.
For now there are around a dozen failing test cases, mostly related to
inline and mkfs limitations, but this is good enough as a starting point
for subpage testing on x86_64.
Qu Wenruo (3):
btrfs: subpage: do not hold subpage spin lock when clearing folio
writeback
btrfs: properly limit inline data extent according to block size
btrfs: allow debug builds to accept 2K block size
fs/btrfs/disk-io.c | 12 +++++++++---
fs/btrfs/fs.h | 12 ++++++++++++
fs/btrfs/inode.c | 14 +++++++++++++-
fs/btrfs/subpage.c | 10 ++++++++--
fs/btrfs/subpage.h | 2 +-
fs/btrfs/sysfs.c | 3 ++-
6 files changed, 45 insertions(+), 8 deletions(-)
--
2.48.1
* [PATCH 1/3] btrfs: subpage: do not hold subpage spin lock when clearing folio writeback
From: Qu Wenruo @ 2025-02-26 4:10 UTC
To: linux-btrfs; +Cc: stable
[BUG]
When testing btrfs with a subpage block size (block size < page size),
I hit the following spin lock hang on x86_64 with the experimental 2K
block size support:
<TASK>
_raw_spin_lock_irq+0x2f/0x40
wait_subpage_spinlock+0x69/0x80 [btrfs]
btrfs_release_folio+0x46/0x70 [btrfs]
folio_unmap_invalidate+0xcb/0x250
folio_end_writeback+0x127/0x1b0
btrfs_subpage_clear_writeback+0xef/0x140 [btrfs]
end_bbio_data_write+0x13a/0x3c0 [btrfs]
btrfs_bio_end_io+0x6f/0xc0 [btrfs]
process_one_work+0x156/0x310
worker_thread+0x252/0x390
? __pfx_worker_thread+0x10/0x10
kthread+0xef/0x250
? finish_task_switch.isra.0+0x8a/0x250
? __pfx_kthread+0x10/0x10
ret_from_fork+0x34/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
[CAUSE]
It's a self-deadlock with the following call sequence:
btrfs_subpage_clear_writeback()
|- spin_lock_irqsave(&subpage->lock);
|- folio_end_writeback()
|- folio_end_dropbehind_write()
|- folio_unmap_invalidate()
|- btrfs_release_folio()
|- wait_subpage_spinlock()
|- spin_lock_irq(&subpage->lock);
!! DEADLOCK !!
We're trying to acquire the same spin lock that we already hold.
This has never been reproduced on aarch64; it may be related to some
x86_64-specific folio reclaim behavior.
[FIX]
Move the folio_end_writeback() call out of the spin lock critical
section.
Since the bitmap update and the clearing of the folio writeback flag no
longer both happen inside the critical section, we need an extra check
to make sure that only the caller whose bitmap_clear() takes the
writeback bitmap from non-empty to empty clears the folio writeback flag.
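A minimal userspace model of the locking pattern, assuming a plain bool can stand in for subpage->lock (no real concurrency). The struct and functions (fake_subpage, clear_writeback_old(), clear_writeback_new()) are hypothetical; only the ordering of "decide under the lock, act after unlocking" mirrors the patch. Since a real spinlock re-acquired by its owner would spin forever, the model counts that case instead of hanging.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the subpage writeback state. */
struct fake_subpage {
	bool locked;          /* models subpage->lock being held */
	unsigned long bitmap; /* models the per-block writeback bits */
	bool folio_writeback; /* models the folio writeback flag */
	int end_calls;        /* how many times writeback was ended */
	int deadlocks;        /* times the release path found the lock held */
};

/* Models wait_subpage_spinlock() reached via btrfs_release_folio(). */
static void fake_release_folio(struct fake_subpage *sp)
{
	if (sp->locked) {
		sp->deadlocks++; /* the kernel would self-deadlock here */
		return;
	}
	sp->locked = true;
	sp->locked = false;
}

/* Models folio_end_writeback() calling back into the release path. */
static void fake_end_writeback(struct fake_subpage *sp)
{
	assert(sp->folio_writeback); /* like ASSERT(folio_test_writeback()) */
	sp->folio_writeback = false;
	sp->end_calls++;
	fake_release_folio(sp);
}

/* Old pattern: end folio writeback while still holding the lock. */
static void clear_writeback_old(struct fake_subpage *sp, unsigned long bits)
{
	sp->locked = true;
	sp->bitmap &= ~bits;
	if (sp->bitmap == 0)
		fake_end_writeback(sp);
	sp->locked = false;
}

/* Fixed pattern: decide under the lock, end writeback after unlocking. */
static void clear_writeback_new(struct fake_subpage *sp, unsigned long bits)
{
	bool was_writeback;
	bool last = false;

	sp->locked = true;
	was_writeback = sp->bitmap != 0;
	sp->bitmap &= ~bits;
	/* Only the non-empty -> empty transition ends folio writeback. */
	if (sp->bitmap == 0 && was_writeback)
		last = true;
	sp->locked = false;
	if (last)
		fake_end_writeback(sp);
}
```

In this model, clearing the last bit with clear_writeback_old() records a deadlock, while clear_writeback_new() ends writeback exactly once, even if a later caller clears bits that are already clear.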
Fixes: 3470da3b7d87 ("btrfs: subpage: introduce helpers for writeback status")
Cc: stable@vger.kernel.org # 5.15+
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/subpage.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index 4d1bf1124ba0..3ce3d7093ddb 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -466,15 +466,21 @@ void btrfs_subpage_clear_writeback(const struct btrfs_fs_info *fs_info,
struct btrfs_subpage *subpage = folio_get_private(folio);
unsigned int start_bit = subpage_calc_start_bit(fs_info, folio,
writeback, start, len);
+ bool was_writeback;
+ bool last = false;
unsigned long flags;
spin_lock_irqsave(&subpage->lock, flags);
+ was_writeback = !subpage_test_bitmap_all_zero(fs_info, folio, writeback);
bitmap_clear(subpage->bitmaps, start_bit, len >> fs_info->sectorsize_bits);
- if (subpage_test_bitmap_all_zero(fs_info, folio, writeback)) {
+ if (subpage_test_bitmap_all_zero(fs_info, folio, writeback) &&
+ was_writeback) {
ASSERT(folio_test_writeback(folio));
- folio_end_writeback(folio);
+ last = true;
}
spin_unlock_irqrestore(&subpage->lock, flags);
+ if (last)
+ folio_end_writeback(folio);
}
void btrfs_subpage_set_ordered(const struct btrfs_fs_info *fs_info,
--
2.48.1
* [PATCH 2/3] btrfs: properly limit inline data extent according to block size
From: Qu Wenruo @ 2025-02-26 4:10 UTC
To: linux-btrfs
Btrfs uses inline data extents in the following cases:
- Regular small files
- Symlink files
And "btrfs check" reports any file extent that is too large as an
error.
This is not a problem for the 4K block size, but for the upcoming
smaller block size (2K), the current limits are wrong and cause
problems:
- Non-compressed inline data extents
  We do not allow a non-compressed inline data extent to be as large as
  the block size, but the kernel has no explicit check for it.
- Symlinks
  Currently the only real limit on symlinks is 4K, which can be larger
  than the 2K block size.
Both cases will cause btrfs-check to report file extents as too large.
Fix this by adding proper size checks for the above cases.
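The two length checks can be modelled in isolation. This is a sketch, not the kernel code: uncompressed_inline_len_ok() and symlink_target_len_check() are hypothetical names, and max_inline_data is a parameter standing in for BTRFS_MAX_INLINE_DATA_SIZE(fs_info), whose value depends on the node size and is not computed here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper mirroring the length checks on uncompressed inline data. */
static bool uncompressed_inline_len_ok(size_t data_len, size_t sectorsize,
				       size_t max_inline_data)
{
	/* Block sizes are powers of two. */
	assert(sectorsize != 0 && (sectorsize & (sectorsize - 1)) == 0);

	/* New check: must be strictly smaller than one block. */
	if (data_len >= sectorsize)
		return false;
	/* Existing check: must fit the maximum inline data size. */
	if (data_len > max_inline_data)
		return false;
	return true;
}

/* Hypothetical model of the symlink target length check. */
static int symlink_target_len_check(size_t name_len, size_t sectorsize,
				    size_t max_inline_data)
{
	if (!uncompressed_inline_len_ok(name_len, sectorsize, max_inline_data))
		return -ENAMETOOLONG;
	return 0;
}
```

With a 2K block size, a 2047-byte symlink target is still accepted, while a 2048-byte one is rejected with -ENAMETOOLONG; with a 4K block size the same 2048-byte target passes.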
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/inode.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4629706485dc..386e700515ca 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -570,6 +570,13 @@ static bool can_cow_file_range_inline(struct btrfs_inode *inode,
if (size > fs_info->sectorsize)
return false;
+ /*
+ * We do not allow a non-compressed extent to be as large
+ * as block size.
+ */
+ if (data_len >= fs_info->sectorsize)
+ return false;
+
/* We cannot exceed the maximum inline data size. */
if (data_len > BTRFS_MAX_INLINE_DATA_SIZE(fs_info))
return false;
@@ -8673,7 +8680,12 @@ static int btrfs_symlink(struct mnt_idmap *idmap, struct inode *dir,
struct extent_buffer *leaf;
name_len = strlen(symname);
- if (name_len > BTRFS_MAX_INLINE_DATA_SIZE(fs_info))
+ /*
+ * Symlinks use uncompressed inline data extents, which must be
+ * smaller than the block size.
+ */
+ if (name_len > BTRFS_MAX_INLINE_DATA_SIZE(fs_info) ||
+ name_len >= fs_info->sectorsize)
return -ENAMETOOLONG;
inode = new_inode(dir->i_sb);
--
2.48.1
* [PATCH 3/3] btrfs: allow debug builds to accept 2K block size
From: Qu Wenruo @ 2025-02-26 4:10 UTC
To: linux-btrfs
Currently we only support two block sizes, 4K and PAGE_SIZE.
This means that on the most common architecture, x86_64, we have no way
to test subpage block sizes.
That's exactly why I have an aarch64 machine dedicated to subpage
tests.
But this is still a hurdle for many btrfs developers, so to improve test
coverage, mostly on x86_64, enable debug builds to accept a 2K block
size.
This involves:
- Introduce a dedicated minimum block size macro
  BTRFS_MIN_BLOCKSIZE, whose value depends on whether
  CONFIG_BTRFS_DEBUG is set: 2K if it is, otherwise 4K as usual.
- Allow 4K, PAGE_SIZE and BTRFS_MIN_BLOCKSIZE as block sizes
- Update the subpage block size checks to be based on BTRFS_MIN_BLOCKSIZE
- Export the newly supported block size through the sysfs interface
As the subpage support is already fairly mature, no extra work is
needed to support the additional 2K block size.
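The combined effect of the validation and sysfs changes can be explored outside the kernel. This sketch passes page_size and min_blocksize as parameters in place of PAGE_SIZE and BTRFS_MIN_BLOCKSIZE so several configurations can be tried; sectorsize_ok() and show_supported() are hypothetical names, and snprintf() replaces sysfs_emit_at().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define SZ_2K 2048UL
#define SZ_4K 4096UL
#define MAX_METADATA_BLOCKSIZE 65536UL /* BTRFS_MAX_METADATA_BLOCKSIZE */

static bool is_pow2(unsigned long v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/* Mirrors the two sectorsize checks in btrfs_validate_super(). */
static bool sectorsize_ok(unsigned long sectorsize, unsigned long page_size,
			  unsigned long min_blocksize)
{
	if (!is_pow2(sectorsize) || sectorsize < min_blocksize ||
	    sectorsize > MAX_METADATA_BLOCKSIZE)
		return false;
	if (sectorsize > page_size)
		return false;
	return sectorsize == SZ_4K || sectorsize == page_size ||
	       sectorsize == min_blocksize;
}

/*
 * Mirrors supported_sectorsizes_show(). buf must be large enough for
 * the whole list (64 bytes is plenty); returns characters written.
 */
static int show_supported(char *buf, size_t size, unsigned long page_size,
			  unsigned long min_blocksize)
{
	int n = 0;

	if (min_blocksize != SZ_4K && min_blocksize != page_size)
		n += snprintf(buf + n, size - n, "%lu ", min_blocksize);
	if (page_size > SZ_4K)
		n += snprintf(buf + n, size - n, "%lu ", SZ_4K);
	n += snprintf(buf + n, size - n, "%lu\n", page_size);
	return n;
}
```

On a 4K page system with a debug build (min_blocksize = 2048), 2K and 4K validate while 8K does not, and the sysfs list reads "2048 4096"; with a non-debug build both values collapse to 4K and only "4096" is listed.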
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/disk-io.c | 12 +++++++++---
fs/btrfs/fs.h | 12 ++++++++++++
fs/btrfs/subpage.h | 2 +-
fs/btrfs/sysfs.c | 3 ++-
4 files changed, 24 insertions(+), 5 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index c0b40dedceb5..6a8368421fbc 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2446,21 +2446,27 @@ int btrfs_validate_super(const struct btrfs_fs_info *fs_info,
* Check sectorsize and nodesize first, other check will need it.
* Check all possible sectorsize(4K, 8K, 16K, 32K, 64K) here.
*/
- if (!is_power_of_2(sectorsize) || sectorsize < 4096 ||
+ if (!is_power_of_2(sectorsize) || sectorsize < BTRFS_MIN_BLOCKSIZE ||
sectorsize > BTRFS_MAX_METADATA_BLOCKSIZE) {
btrfs_err(fs_info, "invalid sectorsize %llu", sectorsize);
ret = -EINVAL;
}
/*
- * We only support at most two sectorsizes: 4K and PAGE_SIZE.
+ * We only support at most 3 sectorsizes: 4K, PAGE_SIZE, MIN_BLOCKSIZE.
+ *
+ * For 4K page sized systems with non-debug builds, all 3 match (4K).
+ * For 4K page sized systems with debug builds, there are two block sizes
+ * supported. (4K and 2K)
*
* We can support 16K sectorsize with 64K page size without problem,
* but such sectorsize/pagesize combination doesn't make much sense.
* 4K will be our future standard, PAGE_SIZE is supported from the very
* beginning.
*/
- if (sectorsize > PAGE_SIZE || (sectorsize != SZ_4K && sectorsize != PAGE_SIZE)) {
+ if (sectorsize > PAGE_SIZE || (sectorsize != SZ_4K &&
+ sectorsize != PAGE_SIZE &&
+ sectorsize != BTRFS_MIN_BLOCKSIZE)) {
btrfs_err(fs_info,
"sectorsize %llu not yet supported for page size %lu",
sectorsize, PAGE_SIZE);
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index 8e8ac7db1355..b8c2e59ffc43 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -47,6 +47,18 @@ struct btrfs_subpage_info;
struct btrfs_stripe_hash_table;
struct btrfs_space_info;
+/*
+ * Minimal data and metadata block size.
+ *
+ * Normally it's 4K, but for testing subpage block size on 4K page systems,
+ * we allow DEBUG builds to accept a 2K block size.
+ */
+#ifdef CONFIG_BTRFS_DEBUG
+#define BTRFS_MIN_BLOCKSIZE (SZ_2K)
+#else
+#define BTRFS_MIN_BLOCKSIZE (SZ_4K)
+#endif
+
#define BTRFS_MAX_EXTENT_SIZE SZ_128M
#define BTRFS_OLDEST_GENERATION 0ULL
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index f8d1efa1a227..5327093cf466 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -70,7 +70,7 @@ enum btrfs_subpage_type {
BTRFS_SUBPAGE_DATA,
};
-#if PAGE_SIZE > SZ_4K
+#if PAGE_SIZE > BTRFS_MIN_BLOCKSIZE
/*
* Subpage support for metadata is more complex, as we can have dummy extent
* buffers, where folios have no mapping to determine the owning inode.
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 53b846d99ece..4be612cab10f 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -411,7 +411,8 @@ static ssize_t supported_sectorsizes_show(struct kobject *kobj,
{
ssize_t ret = 0;
- /* An artificial limit to only support 4K and PAGE_SIZE */
+ if (BTRFS_MIN_BLOCKSIZE != SZ_4K && BTRFS_MIN_BLOCKSIZE != PAGE_SIZE)
+ ret += sysfs_emit_at(buf, ret, "%u ", BTRFS_MIN_BLOCKSIZE);
if (PAGE_SIZE > SZ_4K)
ret += sysfs_emit_at(buf, ret, "%u ", SZ_4K);
ret += sysfs_emit_at(buf, ret, "%lu\n", PAGE_SIZE);
--
2.48.1
* Re: [PATCH 0/3] btrfs: support 2k block size for debug builds
From: David Sterba @ 2025-02-28 14:19 UTC
To: Qu Wenruo; +Cc: linux-btrfs
On Wed, Feb 26, 2025 at 02:40:19PM +1030, Qu Wenruo wrote:
> Qu Wenruo (3):
> btrfs: subpage: do not hold subpage spin lock when clearing folio
> writeback
> btrfs: properly limit inline data extent according to block size
> btrfs: allow debug builds to accept 2K block size
Reviewed-by: David Sterba <dsterba@suse.com>