* [PATCH 1/4] btrfs: make __extent_writepage_io() to write specified range only
2024-02-19 6:08 [PATCH 0/4] btrfs: initial subpage support for zoned devices Qu Wenruo
@ 2024-02-19 6:08 ` Qu Wenruo
2024-02-19 6:08 ` [PATCH 2/4] btrfs: lock subpage ranges in one go for writepage_delalloc() Qu Wenruo
` (3 subsequent siblings)
4 siblings, 0 replies; 14+ messages in thread
From: Qu Wenruo @ 2024-02-19 6:08 UTC (permalink / raw)
To: linux-btrfs
Function __extent_writepage_io() is designed to find all dirty ranges of
a page, and add those dirty ranges into the bio_ctrl for submission.
It requires all the dirtied ranges to be covered by a delalloc range.
It gets called from two locations, but one call site is not subpage aware:
- __extent_writepage()
It gets called when writepage_delalloc() returns 0, which means
writepage_delalloc() has handled the delalloc ranges of all subpage
sectors inside the page.
So this call site is fine.
- extent_write_locked_range()
This call site is utilized by the zoned support, and in this case we may
only run the delalloc range for a subset of the page.
In that case, __extent_writepage_io() can hit a range which is not yet
covered by any delalloc range, and trip various ASSERT()s.
Fix this problem by:
- Introducing range parameters (@start and @len) to specify the range
that __extent_writepage_io() should check and write.
For the first call site, we just pass the whole page, and the behavior
is not touched.
For the second call site, we would avoid touching anything beyond the
range, thus avoid the dirty range which is not yet covered by any
delalloc range.
- Making btrfs_folio_assert_not_dirty() subpage aware
The only caller is inside __extent_writepage_io(), and since that
caller now accepts a subpage range, we should check just the subpage
range rather than the whole page.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/extent_io.c | 18 +++++++++++-------
fs/btrfs/subpage.c | 22 ++++++++++++++++------
fs/btrfs/subpage.h | 3 ++-
3 files changed, 29 insertions(+), 14 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 197b9f50e75c..556ec44fdf8e 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1311,20 +1311,23 @@ static void find_next_dirty_byte(struct btrfs_fs_info *fs_info,
* < 0 if there were errors (page still locked)
*/
static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
- struct page *page,
+ struct page *page, u64 start, u32 len,
struct btrfs_bio_ctrl *bio_ctrl,
loff_t i_size,
int *nr_ret)
{
struct btrfs_fs_info *fs_info = inode->root->fs_info;
- u64 cur = page_offset(page);
- u64 end = cur + PAGE_SIZE - 1;
+ u64 cur = start;
+ u64 end = start + len - 1;
u64 extent_offset;
u64 block_start;
struct extent_map *em;
int ret = 0;
int nr = 0;
+ ASSERT(start >= page_offset(page) &&
+ start + len <= page_offset(page) + PAGE_SIZE);
+
ret = btrfs_writepage_cow_fixup(page);
if (ret) {
/* Fixup worker will requeue */
@@ -1413,7 +1416,7 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
nr++;
}
- btrfs_folio_assert_not_dirty(fs_info, page_folio(page));
+ btrfs_folio_assert_not_dirty(fs_info, page_folio(page), start, len);
*nr_ret = nr;
return 0;
@@ -1471,7 +1474,8 @@ static int __extent_writepage(struct page *page, struct btrfs_bio_ctrl *bio_ctrl
if (ret)
goto done;
- ret = __extent_writepage_io(BTRFS_I(inode), page, bio_ctrl, i_size, &nr);
+ ret = __extent_writepage_io(BTRFS_I(inode), page, page_offset(page),
+ PAGE_SIZE, bio_ctrl, i_size, &nr);
if (ret == 1)
return 0;
@@ -2223,8 +2227,8 @@ void extent_write_locked_range(struct inode *inode, struct page *locked_page,
clear_page_dirty_for_io(page);
}
- ret = __extent_writepage_io(BTRFS_I(inode), page, &bio_ctrl,
- i_size, &nr);
+ ret = __extent_writepage_io(BTRFS_I(inode), page, cur, cur_len,
+ &bio_ctrl, i_size, &nr);
if (ret == 1)
goto next_page;
diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index 3b3ef8bffe05..b33e43de7f93 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -706,19 +706,29 @@ IMPLEMENT_BTRFS_PAGE_OPS(checked, folio_set_checked, folio_clear_checked,
* Make sure not only the page dirty bit is cleared, but also subpage dirty bit
* is cleared.
*/
-void btrfs_folio_assert_not_dirty(const struct btrfs_fs_info *fs_info, struct folio *folio)
+void btrfs_folio_assert_not_dirty(const struct btrfs_fs_info *fs_info,
+ struct folio *folio, u64 start, u32 len)
{
- struct btrfs_subpage *subpage = folio_get_private(folio);
+ struct btrfs_subpage *subpage;
+ int start_bit;
+ int nbits;
+ unsigned long flags;
if (!IS_ENABLED(CONFIG_BTRFS_ASSERT))
return;
- ASSERT(!folio_test_dirty(folio));
- if (!btrfs_is_subpage(fs_info, folio->mapping))
+ if (!btrfs_is_subpage(fs_info, folio->mapping)) {
+ ASSERT(!folio_test_dirty(folio));
return;
+ }
- ASSERT(folio_test_private(folio) && folio_get_private(folio));
- ASSERT(subpage_test_bitmap_all_zero(fs_info, subpage, dirty));
+ start_bit = subpage_calc_start_bit(fs_info, folio, dirty, start, len);
+ nbits = len >> fs_info->sectorsize_bits;
+ subpage = folio_get_private(folio);
+ ASSERT(subpage);
+ spin_lock_irqsave(&subpage->lock, flags);
+ ASSERT(bitmap_test_range_all_zero(subpage->bitmaps, start_bit, nbits));
+ spin_unlock_irqrestore(&subpage->lock, flags);
}
/*
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index b6dc013b0fdc..4b363d9453af 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -156,7 +156,8 @@ DECLARE_BTRFS_SUBPAGE_OPS(checked);
bool btrfs_subpage_clear_and_test_dirty(const struct btrfs_fs_info *fs_info,
struct folio *folio, u64 start, u32 len);
-void btrfs_folio_assert_not_dirty(const struct btrfs_fs_info *fs_info, struct folio *folio);
+void btrfs_folio_assert_not_dirty(const struct btrfs_fs_info *fs_info,
+ struct folio *folio, u64 start, u32 len);
void btrfs_folio_unlock_writer(struct btrfs_fs_info *fs_info,
struct folio *folio, u64 start, u32 len);
void __cold btrfs_subpage_dump_bitmap(const struct btrfs_fs_info *fs_info,
--
2.43.2
* [PATCH 2/4] btrfs: lock subpage ranges in one go for writepage_delalloc()
2024-02-19 6:08 [PATCH 0/4] btrfs: initial subpage support for zoned devices Qu Wenruo
2024-02-19 6:08 ` [PATCH 1/4] btrfs: make __extent_writepage_io() to write specified range only Qu Wenruo
@ 2024-02-19 6:08 ` Qu Wenruo
2024-02-19 6:08 ` [PATCH 3/4] btrfs: subpage: introduce helpers to handle subpage delalloc locking Qu Wenruo
` (2 subsequent siblings)
4 siblings, 0 replies; 14+ messages in thread
From: Qu Wenruo @ 2024-02-19 6:08 UTC (permalink / raw)
To: linux-btrfs
If we have a subpage range like this for a 16K page with 4K sectorsize:

0     4K    8K    12K   16K
|/////|     |/////|     |

|/////| = dirty range
Currently writepage_delalloc() would go through the following steps:
- lock range [0, 4K)
- run delalloc range for [0, 4K)
- lock range [8K, 12K)
- run delalloc range for [8K, 12K)
So far it's fine for subpage, as btrfs_run_delalloc_range() would only
go through cow_file_range()/run_delalloc_cow(), or
run_delalloc_compressed() for the whole page.
But there is a special pitfall for zoned subpage, where we will go
through run_delalloc_cow(), which would unlock the whole page.
This patch does not yet address the run_delalloc_cow() problem, but
prepares for the proper fix by changing the workflow to the
following one:
- lock range [0, 4K)
- lock range [8K, 12K)
- run delalloc range for [0, 4K)
- run delalloc range for [8K, 12K)
So later, for subpage cases, we can do extra work to ensure
run_delalloc_cow() only unlocks the full page once the last lock
user is finished.
To save the previously locked ranges, we utilize a structure called
locked_delalloc_list, which caches the last hit delalloc range, so
for non-subpage cases only that cached value is used.
For subpage cases, since we can hit multiple delalloc ranges inside a
page, a list is utilized and we allocate memory for the extra entries.
This introduces a new memory allocation and thus extra error paths, but
the method is only temporary; we will later use the subpage bitmap to
avoid the memory allocation.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/extent_io.c | 140 +++++++++++++++++++++++++++++++++++++++++--
1 file changed, 136 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 556ec44fdf8e..e79676422c16 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1185,6 +1185,101 @@ static inline void contiguous_readpages(struct page *pages[], int nr_pages,
}
}
+struct locked_delalloc_range {
+ struct list_head list;
+ u64 delalloc_start;
+ u32 delalloc_len;
+};
+
+/*
+ * Save the locked delalloc range.
+ *
+ * This is for subpage only, as for regular sectorsize, there will be at most
+ * one locked delalloc range for a page.
+ */
+struct locked_delalloc_list {
+ u64 last_delalloc_start;
+ u32 last_delalloc_len;
+ struct list_head head;
+};
+
+static void init_locked_delalloc_list(struct locked_delalloc_list *locked_list)
+{
+ INIT_LIST_HEAD(&locked_list->head);
+ locked_list->last_delalloc_start = 0;
+ locked_list->last_delalloc_len = 0;
+}
+
+static void release_locked_delalloc_list(struct locked_delalloc_list *locked_list)
+{
+ while (!list_empty(&locked_list->head)) {
+ struct locked_delalloc_range *entry;
+
+ entry = list_entry(locked_list->head.next,
+ struct locked_delalloc_range, list);
+
+ list_del_init(&entry->list);
+ kfree(entry);
+ }
+}
+
+static int add_locked_delalloc_range(struct btrfs_fs_info *fs_info,
+ struct locked_delalloc_list *locked_list,
+ u64 start, u32 len)
+{
+ struct locked_delalloc_range *entry;
+
+ if (locked_list->last_delalloc_len == 0) {
+ locked_list->last_delalloc_start = start;
+ locked_list->last_delalloc_len = len;
+ return 0;
+ }
+
+ entry = kmalloc(sizeof(*entry), GFP_NOFS);
+ if (!entry)
+ return -ENOMEM;
+ /* The new entry must be beyond the current one. */
+ ASSERT(start >= locked_list->last_delalloc_start +
+ locked_list->last_delalloc_len);
+
+ /* Only subpage case can have more than one delalloc ranges inside a page. */
+ ASSERT(fs_info->sectorsize < PAGE_SIZE);
+
+ entry->delalloc_start = locked_list->last_delalloc_start;
+ entry->delalloc_len = locked_list->last_delalloc_len;
+ locked_list->last_delalloc_start = start;
+ locked_list->last_delalloc_len = len;
+ list_add_tail(&entry->list, &locked_list->head);
+ return 0;
+}
+
+static void __cold unlock_one_locked_delalloc_range(struct btrfs_inode *binode,
+ struct page *locked_page,
+ u64 start, u32 len)
+{
+ u64 delalloc_end = start + len - 1;
+
+ unlock_extent(&binode->io_tree, start, delalloc_end, NULL);
+ __unlock_for_delalloc(&binode->vfs_inode, locked_page, start,
+ delalloc_end);
+}
+
+static void unlock_locked_delalloc_list(struct btrfs_inode *binode,
+ struct page *locked_page,
+ struct locked_delalloc_list *locked_list)
+{
+ struct locked_delalloc_range *entry;
+
+ list_for_each_entry(entry, &locked_list->head, list)
+ unlock_one_locked_delalloc_range(binode, locked_page,
+ entry->delalloc_start, entry->delalloc_len);
+ if (locked_list->last_delalloc_len) {
+ unlock_one_locked_delalloc_range(binode, locked_page,
+ locked_list->last_delalloc_start,
+ locked_list->last_delalloc_len);
+ }
+}
+
/*
* helper for __extent_writepage, doing all of the delayed allocation setup.
*
@@ -1198,13 +1293,17 @@ static inline void contiguous_readpages(struct page *pages[], int nr_pages,
static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
struct page *page, struct writeback_control *wbc)
{
+ struct btrfs_fs_info *fs_info = inode_to_fs_info(&inode->vfs_inode);
const u64 page_start = page_offset(page);
const u64 page_end = page_start + PAGE_SIZE - 1;
+ struct locked_delalloc_list locked_list;
+ struct locked_delalloc_range *entry;
u64 delalloc_start = page_start;
u64 delalloc_end = page_end;
u64 delalloc_to_write = 0;
int ret = 0;
+ init_locked_delalloc_list(&locked_list);
while (delalloc_start < page_end) {
delalloc_end = page_end;
if (!find_lock_delalloc_range(&inode->vfs_inode, page,
@@ -1212,14 +1311,47 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
delalloc_start = delalloc_end + 1;
continue;
}
-
- ret = btrfs_run_delalloc_range(inode, page, delalloc_start,
- delalloc_end, wbc);
- if (ret < 0)
+ ret = add_locked_delalloc_range(fs_info, &locked_list,
+ delalloc_start, delalloc_end + 1 - delalloc_start);
+ if (ret < 0) {
+ unlock_locked_delalloc_list(inode, page, &locked_list);
+ release_locked_delalloc_list(&locked_list);
return ret;
+ }
delalloc_start = delalloc_end + 1;
}
+ list_for_each_entry(entry, &locked_list.head, list) {
+ delalloc_end = entry->delalloc_start + entry->delalloc_len - 1;
+
+ /*
+ * Hit error in the previous run, cleanup the locked
+ * extents/pages.
+ */
+ if (ret < 0) {
+ unlock_one_locked_delalloc_range(inode, page,
+ entry->delalloc_start, entry->delalloc_len);
+ continue;
+ }
+ ret = btrfs_run_delalloc_range(inode, page, entry->delalloc_start,
+ delalloc_end, wbc);
+ }
+ if (locked_list.last_delalloc_len) {
+ delalloc_end = locked_list.last_delalloc_start +
+ locked_list.last_delalloc_len - 1;
+
+ if (ret < 0)
+ unlock_one_locked_delalloc_range(inode, page,
+ locked_list.last_delalloc_start,
+ locked_list.last_delalloc_len);
+ else
+ ret = btrfs_run_delalloc_range(inode, page,
+ locked_list.last_delalloc_start,
+ delalloc_end, wbc);
+ }
+ release_locked_delalloc_list(&locked_list);
+ if (ret < 0)
+ return ret;
/*
* delalloc_end is already one less than the total length, so
--
2.43.2
* [PATCH 3/4] btrfs: subpage: introduce helpers to handle subpage delalloc locking
2024-02-19 6:08 [PATCH 0/4] btrfs: initial subpage support for zoned devices Qu Wenruo
2024-02-19 6:08 ` [PATCH 1/4] btrfs: make __extent_writepage_io() to write specified range only Qu Wenruo
2024-02-19 6:08 ` [PATCH 2/4] btrfs: lock subpage ranges in one go for writepage_delalloc() Qu Wenruo
@ 2024-02-19 6:08 ` Qu Wenruo
2024-02-20 0:52 ` kernel test robot
2024-02-19 6:08 ` [PATCH 4/4] btrfs: migrate writepage_delalloc() to use subpage helpers Qu Wenruo
2024-03-04 3:13 ` [PATCH 0/4] btrfs: initial subpage support for zoned devices Qu Wenruo
4 siblings, 1 reply; 14+ messages in thread
From: Qu Wenruo @ 2024-02-19 6:08 UTC (permalink / raw)
To: linux-btrfs
Three new helpers are introduced for the incoming subpage delalloc locking
change:
- btrfs_folio_set_writer_lock()
This is to mark the specified range with the subpage specific writer lock.
After calling this, the subpage range can be properly unlocked by
btrfs_folio_end_writer_lock().
- btrfs_subpage_find_writer_locked()
This is to find the writer locked subpage range in a page.
With the help of btrfs_folio_set_writer_lock(), it allows us to
record and find previously locked subpage ranges without extra memory
allocation.
- btrfs_folio_end_all_writers()
This is for the locked_page of __extent_writepage(), as there may be
multiple subpage delalloc ranges locked.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/subpage.c | 116 +++++++++++++++++++++++++++++++++++++++++++++
fs/btrfs/subpage.h | 7 +++
2 files changed, 123 insertions(+)
diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index b33e43de7f93..162a10eee3fd 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -778,6 +778,122 @@ void btrfs_folio_unlock_writer(struct btrfs_fs_info *fs_info,
btrfs_folio_end_writer_lock(fs_info, folio, start, len);
}
+/*
+ * This is for folio already locked by plain lock_page()/folio_lock(), which
+ * doesn't have any subpage awareness.
+ *
+ * This would populate the involved subpage ranges so that subpage helpers can
+ * properly unlock them.
+ */
+void btrfs_folio_set_writer_lock(const struct btrfs_fs_info *fs_info,
+ struct folio *folio, u64 start, u32 len)
+{
+ struct btrfs_subpage *subpage;
+ unsigned long flags;
+ int start_bit;
+ int nbits;
+ int ret;
+
+ ASSERT(folio_test_locked(folio));
+ if (unlikely(!fs_info) || !btrfs_is_subpage(fs_info, folio->mapping))
+ return;
+
+ subpage = folio_get_private(folio);
+ start_bit = subpage_calc_start_bit(fs_info, folio, locked, start, len);
+ nbits = len >> fs_info->sectorsize_bits;
+ spin_lock_irqsave(&subpage->lock, flags);
+ /* Target range should not yet be locked. */
+ ASSERT(bitmap_test_range_all_zero(subpage->bitmaps, start_bit, nbits));
+ bitmap_set(subpage->bitmaps, start_bit, nbits);
+ ret = atomic_add_return(nbits, &subpage->writers);
+ ASSERT(ret <= fs_info->subpage_info->bitmap_nr_bits);
+ spin_unlock_irqrestore(&subpage->lock, flags);
+}
+
+/*
+ * Find any subpage writer locked range inside @folio, starting at file offset
+ * @search_start.
+ * The caller should ensure the folio is locked.
+ *
+ * Return true and update @found_start_ret and @found_len_ret to the first
+ * writer locked range.
+ * Return false if there is no writer locked range.
+ */
+bool btrfs_subpage_find_writer_locked(const struct btrfs_fs_info *fs_info,
+ struct folio *folio, u64 search_start,
+ u64 *found_start_ret, u32 *found_len_ret)
+{
+ struct btrfs_subpage_info *subpage_info = fs_info->subpage_info;
+ struct btrfs_subpage *subpage = folio_get_private(folio);
+ const int len = PAGE_SIZE - offset_in_page(search_start);
+ const int start_bit = subpage_calc_start_bit(fs_info, folio, locked,
+ search_start, len);
+ const int locked_bitmap_start = subpage_info->locked_offset;
+ const int locked_bitmap_end = locked_bitmap_start +
+ subpage_info->bitmap_nr_bits;
+ unsigned long flags;
+ int first_zero;
+ int first_set;
+ bool found = false;
+
+ ASSERT(folio_test_locked(folio));
+ spin_lock_irqsave(&subpage->lock, flags);
+ first_set = find_next_bit(subpage->bitmaps, locked_bitmap_end,
+ start_bit);
+ if (first_set >= locked_bitmap_end)
+ goto out;
+
+ found = true;
+ *found_start_ret = folio_pos(folio) +
+ ((first_set - locked_bitmap_start) << fs_info->sectorsize_bits);
+
+ first_zero = find_next_zero_bit(subpage->bitmaps,
+ locked_bitmap_end, first_set);
+ *found_len_ret = (first_zero - first_set) << fs_info->sectorsize_bits;
+out:
+ spin_unlock_irqrestore(&subpage->lock, flags);
+ return found;
+}
+
+/*
+ * Unlike btrfs_folio_end_writer_lock() which unlock a specified subpage range,
+ * this would end all writer locked ranges of a page.
+ *
+ * This is for the locked page of __extent_writepage(), as the locked page
+ * can contain several locked subpage ranges.
+ */
+void btrfs_folio_end_all_writers(const struct btrfs_fs_info *fs_info,
+ struct folio *folio)
+{
+ u64 folio_start = folio_pos(folio);
+ u64 cur = folio_start;
+
+ ASSERT(folio_test_locked(folio));
+ if (!btrfs_is_subpage(fs_info, folio->mapping)) {
+ folio_unlock(folio);
+ return;
+ }
+
+ while (cur < folio_start + PAGE_SIZE) {
+ u64 found_start;
+ u32 found_len;
+ bool found;
+ bool last;
+
+ found = btrfs_subpage_find_writer_locked(fs_info, folio, cur,
+ &found_start, &found_len);
+ if (!found)
+ break;
+ last = btrfs_subpage_end_and_test_writer(fs_info, folio,
+ found_start, found_len);
+ if (last) {
+ folio_unlock(folio);
+ break;
+ }
+ cur = found_start + found_len;
+ }
+}
+
#define GET_SUBPAGE_BITMAP(subpage, subpage_info, name, dst) \
bitmap_cut(dst, subpage->bitmaps, 0, \
subpage_info->name##_offset, subpage_info->bitmap_nr_bits)
diff --git a/fs/btrfs/subpage.h b/fs/btrfs/subpage.h
index 4b363d9453af..9f19850d59f2 100644
--- a/fs/btrfs/subpage.h
+++ b/fs/btrfs/subpage.h
@@ -112,6 +112,13 @@ int btrfs_folio_start_writer_lock(const struct btrfs_fs_info *fs_info,
struct folio *folio, u64 start, u32 len);
void btrfs_folio_end_writer_lock(const struct btrfs_fs_info *fs_info,
struct folio *folio, u64 start, u32 len);
+void btrfs_folio_set_writer_lock(const struct btrfs_fs_info *fs_info,
+ struct folio *folio, u64 start, u32 len);
+bool btrfs_subpage_find_writer_locked(const struct btrfs_fs_info *fs_info,
+ struct folio *folio, u64 search_start,
+ u64 *found_start_ret, u32 *found_len_ret);
+void btrfs_folio_end_all_writers(const struct btrfs_fs_info *fs_info,
+ struct folio *folio);
/*
* Template for subpage related operations.
--
2.43.2
* Re: [PATCH 3/4] btrfs: subpage: introduce helpers to handle subpage delalloc locking
2024-02-19 6:08 ` [PATCH 3/4] btrfs: subpage: introduce helpers to handle subpage delalloc locking Qu Wenruo
@ 2024-02-20 0:52 ` kernel test robot
2024-02-20 1:16 ` Qu Wenruo
0 siblings, 1 reply; 14+ messages in thread
From: kernel test robot @ 2024-02-20 0:52 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs; +Cc: oe-kbuild-all
Hi Qu,
kernel test robot noticed the following build errors:
[auto build test ERROR on kdave/for-next]
[also build test ERROR on linus/master v6.8-rc5 next-20240219]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Qu-Wenruo/btrfs-make-__extent_writepage_io-to-write-specified-range-only/20240219-141053
base: https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
patch link: https://lore.kernel.org/r/02f5ad17b6415670bea37433c8ca332a06253315.1708322044.git.wqu%40suse.com
patch subject: [PATCH 3/4] btrfs: subpage: introduce helpers to handle subpage delalloc locking
config: x86_64-rhel-8.3 (https://download.01.org/0day-ci/archive/20240220/202402200823.Su3xBnia-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240220/202402200823.Su3xBnia-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202402200823.Su3xBnia-lkp@intel.com/
All errors (new ones prefixed by >>):
fs/btrfs/subpage.c: In function 'btrfs_folio_set_writer_lock':
>> fs/btrfs/subpage.c:758:60: error: 'struct btrfs_subpage_info' has no member named 'locked_offset'; did you mean 'checked_offset'?
758 | start_bit = subpage_calc_start_bit(fs_info, folio, locked, start, len);
| ^~~~~~
fs/btrfs/subpage.c:374:45: note: in definition of macro 'subpage_calc_start_bit'
374 | start_bit += fs_info->subpage_info->name##_offset; \
| ^~~~
fs/btrfs/subpage.c: In function 'btrfs_subpage_find_writer_locked':
fs/btrfs/subpage.c:785:70: error: 'struct btrfs_subpage_info' has no member named 'locked_offset'; did you mean 'checked_offset'?
785 | const int start_bit = subpage_calc_start_bit(fs_info, folio, locked,
| ^~~~~~
fs/btrfs/subpage.c:374:45: note: in definition of macro 'subpage_calc_start_bit'
374 | start_bit += fs_info->subpage_info->name##_offset; \
| ^~~~
fs/btrfs/subpage.c:787:55: error: 'struct btrfs_subpage_info' has no member named 'locked_offset'; did you mean 'checked_offset'?
787 | const int locked_bitmap_start = subpage_info->locked_offset;
| ^~~~~~~~~~~~~
| checked_offset
vim +758 fs/btrfs/subpage.c
736
737 /*
738 * This is for folio already locked by plain lock_page()/folio_lock(), which
739 * doesn't have any subpage awareness.
740 *
741 * This would populate the involved subpage ranges so that subpage helpers can
742 * properly unlock them.
743 */
744 void btrfs_folio_set_writer_lock(const struct btrfs_fs_info *fs_info,
745 struct folio *folio, u64 start, u32 len)
746 {
747 struct btrfs_subpage *subpage;
748 unsigned long flags;
749 int start_bit;
750 int nbits;
751 int ret;
752
753 ASSERT(folio_test_locked(folio));
754 if (unlikely(!fs_info) || !btrfs_is_subpage(fs_info, folio->mapping))
755 return;
756
757 subpage = folio_get_private(folio);
> 758 start_bit = subpage_calc_start_bit(fs_info, folio, locked, start, len);
759 nbits = len >> fs_info->sectorsize_bits;
760 spin_lock_irqsave(&subpage->lock, flags);
761 /* Target range should not yet be locked. */
762 ASSERT(bitmap_test_range_all_zero(subpage->bitmaps, start_bit, nbits));
763 bitmap_set(subpage->bitmaps, start_bit, nbits);
764 ret = atomic_add_return(nbits, &subpage->writers);
765 ASSERT(ret <= fs_info->subpage_info->bitmap_nr_bits);
766 spin_unlock_irqrestore(&subpage->lock, flags);
767 }
768
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH 3/4] btrfs: subpage: introduce helpers to handle subpage delalloc locking
2024-02-20 0:52 ` kernel test robot
@ 2024-02-20 1:16 ` Qu Wenruo
2024-02-20 7:58 ` Yujie Liu
0 siblings, 1 reply; 14+ messages in thread
From: Qu Wenruo @ 2024-02-20 1:16 UTC (permalink / raw)
To: kernel test robot, Qu Wenruo, linux-btrfs; +Cc: oe-kbuild-all
On 2024/2/20 11:22, kernel test robot wrote:
> Hi Qu,
>
> kernel test robot noticed the following build errors:
>
> [auto build test ERROR on kdave/for-next]
> [also build test ERROR on linus/master v6.8-rc5 next-20240219]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
This is applied without the previous preparation patches.
Is it possible to teach LKP to fetch from a certain branch rather than
applying the patches directly to for-next?
Do I need a certain keyword in the cover letter?
Thanks,
Qu
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Qu-Wenruo/btrfs-make-__extent_writepage_io-to-write-specified-range-only/20240219-141053
> base: https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
> patch link: https://lore.kernel.org/r/02f5ad17b6415670bea37433c8ca332a06253315.1708322044.git.wqu%40suse.com
> patch subject: [PATCH 3/4] btrfs: subpage: introduce helpers to handle subpage delalloc locking
> config: x86_64-rhel-8.3 (https://download.01.org/0day-ci/archive/20240220/202402200823.Su3xBnia-lkp@intel.com/config)
> compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240220/202402200823.Su3xBnia-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202402200823.Su3xBnia-lkp@intel.com/
>
> All errors (new ones prefixed by >>):
>
> fs/btrfs/subpage.c: In function 'btrfs_folio_set_writer_lock':
>>> fs/btrfs/subpage.c:758:60: error: 'struct btrfs_subpage_info' has no member named 'locked_offset'; did you mean 'checked_offset'?
> 758 | start_bit = subpage_calc_start_bit(fs_info, folio, locked, start, len);
> | ^~~~~~
> fs/btrfs/subpage.c:374:45: note: in definition of macro 'subpage_calc_start_bit'
> 374 | start_bit += fs_info->subpage_info->name##_offset; \
> | ^~~~
> fs/btrfs/subpage.c: In function 'btrfs_subpage_find_writer_locked':
> fs/btrfs/subpage.c:785:70: error: 'struct btrfs_subpage_info' has no member named 'locked_offset'; did you mean 'checked_offset'?
> 785 | const int start_bit = subpage_calc_start_bit(fs_info, folio, locked,
> | ^~~~~~
> fs/btrfs/subpage.c:374:45: note: in definition of macro 'subpage_calc_start_bit'
> 374 | start_bit += fs_info->subpage_info->name##_offset; \
> | ^~~~
> fs/btrfs/subpage.c:787:55: error: 'struct btrfs_subpage_info' has no member named 'locked_offset'; did you mean 'checked_offset'?
> 787 | const int locked_bitmap_start = subpage_info->locked_offset;
> | ^~~~~~~~~~~~~
> | checked_offset
>
>
> vim +758 fs/btrfs/subpage.c
>
> 736
> 737 /*
> 738 * This is for folio already locked by plain lock_page()/folio_lock(), which
> 739 * doesn't have any subpage awareness.
> 740 *
> 741 * This would populate the involved subpage ranges so that subpage helpers can
> 742 * properly unlock them.
> 743 */
> 744 void btrfs_folio_set_writer_lock(const struct btrfs_fs_info *fs_info,
> 745 struct folio *folio, u64 start, u32 len)
> 746 {
> 747 struct btrfs_subpage *subpage;
> 748 unsigned long flags;
> 749 int start_bit;
> 750 int nbits;
> 751 int ret;
> 752
> 753 ASSERT(folio_test_locked(folio));
> 754 if (unlikely(!fs_info) || !btrfs_is_subpage(fs_info, folio->mapping))
> 755 return;
> 756
> 757 subpage = folio_get_private(folio);
> > 758 start_bit = subpage_calc_start_bit(fs_info, folio, locked, start, len);
> 759 nbits = len >> fs_info->sectorsize_bits;
> 760 spin_lock_irqsave(&subpage->lock, flags);
> 761 /* Target range should not yet be locked. */
> 762 ASSERT(bitmap_test_range_all_zero(subpage->bitmaps, start_bit, nbits));
> 763 bitmap_set(subpage->bitmaps, start_bit, nbits);
> 764 ret = atomic_add_return(nbits, &subpage->writers);
> 765 ASSERT(ret <= fs_info->subpage_info->bitmap_nr_bits);
> 766 spin_unlock_irqrestore(&subpage->lock, flags);
> 767 }
> 768
>
* Re: [PATCH 3/4] btrfs: subpage: introduce helpers to handle subpage delalloc locking
2024-02-20 1:16 ` Qu Wenruo
@ 2024-02-20 7:58 ` Yujie Liu
2024-02-20 8:26 ` Qu Wenruo
0 siblings, 1 reply; 14+ messages in thread
From: Yujie Liu @ 2024-02-20 7:58 UTC (permalink / raw)
To: Qu Wenruo; +Cc: kernel test robot, Qu Wenruo, linux-btrfs, oe-kbuild-all
Hi Qu,
On Tue, Feb 20, 2024 at 11:46:17AM +1030, Qu Wenruo wrote:
> On 2024/2/20 11:22, kernel test robot wrote:
> > Hi Qu,
> >
> > kernel test robot noticed the following build errors:
> >
> > [auto build test ERROR on kdave/for-next]
> > [also build test ERROR on linus/master v6.8-rc5 next-20240219]
> > [If your patch is applied to the wrong git tree, kindly drop us a note.
>
> This is applied without the previous preparation patches.
>
> Is it possible to teach LKP to fetch from certain branch other than
> applying them directly to for-next?
>
> Do I need certain keyword in the cover letter?
Sorry for applying this patchset on an incorrect base. If the cover
letter mentions that the patches have already been pushed to a certain
branch, usually we will skip the patchset and directly test that branch,
but it seems the bot didn't recognize that there is a github link in the
cover letter. We will investigate this and fix it ASAP.
Best Regards,
Yujie
* Re: [PATCH 3/4] btrfs: subpage: introduce helpers to handle subpage delalloc locking
2024-02-20 7:58 ` Yujie Liu
@ 2024-02-20 8:26 ` Qu Wenruo
2024-02-20 9:23 ` Yujie Liu
0 siblings, 1 reply; 14+ messages in thread
From: Qu Wenruo @ 2024-02-20 8:26 UTC (permalink / raw)
To: Yujie Liu, Qu Wenruo; +Cc: kernel test robot, linux-btrfs, oe-kbuild-all
On 2024/2/20 18:28, Yujie Liu wrote:
> Hi Qu,
>
> On Tue, Feb 20, 2024 at 11:46:17AM +1030, Qu Wenruo wrote:
>> On 2024/2/20 11:22, kernel test robot wrote:
>>> Hi Qu,
>>>
>>> kernel test robot noticed the following build errors:
>>>
>>> [auto build test ERROR on kdave/for-next]
>>> [also build test ERROR on linus/master v6.8-rc5 next-20240219]
>>> [If your patch is applied to the wrong git tree, kindly drop us a note.
>>
>> This is applied without the previous preparation patches.
>>
>> Is it possible to teach LKP to fetch from certain branch other than
>> applying them directly to for-next?
>>
>> Do I need certain keyword in the cover letter?
>
> Sorry for applying this patchset on an incorrect base. If the cover
> letter mentions that the patches has already been pushed to a certain
> branch, usually we will skip the patchset and directly test that branch,
> but seems the bot didn't recognize that there is a github link in the
> cover letter. We will investigate this and fix it ASAP.
My guess is that I'm using github's branch URL in the cover letter:
https://github.com/adam900710/linux/tree/subpage_delalloc
I guess I should go with something more traditional, like:
https://github.com/adam900710/linux.git subpage_delalloc
Thanks,
Qu
>
> Best Regards,
> Yujie
* Re: [PATCH 3/4] btrfs: subpage: introduce helpers to handle subpage delalloc locking
2024-02-20 8:26 ` Qu Wenruo
@ 2024-02-20 9:23 ` Yujie Liu
0 siblings, 0 replies; 14+ messages in thread
From: Yujie Liu @ 2024-02-20 9:23 UTC (permalink / raw)
To: Qu Wenruo; +Cc: Qu Wenruo, kernel test robot, linux-btrfs, oe-kbuild-all
On Tue, Feb 20, 2024 at 06:56:33PM +1030, Qu Wenruo wrote:
>
>
> On 2024/2/20 18:28, Yujie Liu wrote:
> > Hi Qu,
> >
> > On Tue, Feb 20, 2024 at 11:46:17AM +1030, Qu Wenruo wrote:
> > On 2024/2/20 11:22, kernel test robot wrote:
> > > > Hi Qu,
> > > >
> > > > kernel test robot noticed the following build errors:
> > > >
> > > > [auto build test ERROR on kdave/for-next]
> > > > [also build test ERROR on linus/master v6.8-rc5 next-20240219]
> > > > [If your patch is applied to the wrong git tree, kindly drop us a note.
> > >
> > > This is applied without the previous preparation patches.
> > >
> > > Is it possible to teach LKP to fetch from a certain branch instead of
> > > applying the patches directly to for-next?
> > >
> > > Do I need certain keyword in the cover letter?
> >
> > Sorry for applying this patchset on an incorrect base. If the cover
> > letter mentions that the patches have already been pushed to a certain
> > branch, usually we will skip the patchset and directly test that branch,
> > but it seems the bot didn't recognize that there is a github link in the
> > cover letter. We will investigate this and fix it ASAP.
>
> My guess is it's because I'm using github's branch URL in the cover letter:
>
> https://github.com/adam900710/linux/tree/subpage_delalloc
>
> I guess I should go with something more traditional, like:
>
> https://github.com/adam900710/linux.git subpage_delalloc
Thanks. We've fixed the bot to recognize the github links in both of
the above styles.
Best Regards,
Yujie
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 4/4] btrfs: migrate writepage_delalloc() to use subpage helpers
2024-02-19 6:08 [PATCH 0/4] btrfs: initial subpage support for zoned devices Qu Wenruo
` (2 preceding siblings ...)
2024-02-19 6:08 ` [PATCH 3/4] btrfs: subpage: introduce helpers to handle subpage delalloc locking Qu Wenruo
@ 2024-02-19 6:08 ` Qu Wenruo
2024-03-04 3:13 ` [PATCH 0/4] btrfs: initial subpage support for zoned devices Qu Wenruo
4 siblings, 0 replies; 14+ messages in thread
From: Qu Wenruo @ 2024-02-19 6:08 UTC (permalink / raw)
To: linux-btrfs
Currently writepage_delalloc() uses a list-based solution to save
locked subpage ranges, but that introduces extra memory allocation and
thus extra error paths.
On the other hand, we already have a subpage locked bitmap and helpers
to set and find a subpage locked range, so we can use those helpers to
record locked subpage ranges without allocating new memory.
However, there are still several small pitfalls:
- We still need to record the last delalloc range end
This is because the subpage bitmap can only record ranges inside the
page, while the last delalloc range end can go beyond the page boundary.
- We still need to handle errors from previous iterations
Since we can have multiple locked delalloc ranges, we have to call
btrfs_run_delalloc_range() multiple times.
If we hit an error halfway through, we still need to unlock the
remaining ranges.
- We can no longer touch the page after running the last delalloc
range.
This is for zoned subpage support; for non-zoned subpage, async
submission can only happen for full-page-aligned ranges.
For zoned subpage, running the last delalloc range unlocks the full
page, and if we continued the search,
btrfs_subpage_find_writer_locked() would trigger an ASSERT() as the
page is no longer locked.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/extent_io.c | 195 ++++++++++++++-----------------------------
fs/btrfs/subpage.c | 6 ++
2 files changed, 69 insertions(+), 132 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index e79676422c16..522bfa9670b3 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1185,101 +1185,6 @@ static inline void contiguous_readpages(struct page *pages[], int nr_pages,
}
}
-struct locked_delalloc_range {
- struct list_head list;
- u64 delalloc_start;
- u32 delalloc_len;
-};
-
-/*
- * Save the locked delalloc range.
- *
- * This is for subpage only, as for regular sectorsize, there will be at most
- * one locked delalloc range for a page.
- */
-struct locked_delalloc_list {
- u64 last_delalloc_start;
- u32 last_delalloc_len;
- struct list_head head;
-};
-
-static void init_locked_delalloc_list(struct locked_delalloc_list *locked_list)
-{
- INIT_LIST_HEAD(&locked_list->head);
- locked_list->last_delalloc_start = 0;
- locked_list->last_delalloc_len = 0;
-}
-
-static void release_locked_delalloc_list(struct locked_delalloc_list *locked_list)
-{
- while (!list_empty(&locked_list->head)) {
- struct locked_delalloc_range *entry;
-
- entry = list_entry(locked_list->head.next,
- struct locked_delalloc_range, list);
-
- list_del_init(&entry->list);
- kfree(entry);
- }
-}
-
-static int add_locked_delalloc_range(struct btrfs_fs_info *fs_info,
- struct locked_delalloc_list *locked_list,
- u64 start, u32 len)
-{
- struct locked_delalloc_range *entry;
-
- entry = kmalloc(sizeof(*entry), GFP_NOFS);
- if (!entry)
- return -ENOMEM;
-
- if (locked_list->last_delalloc_len == 0) {
- locked_list->last_delalloc_start = start;
- locked_list->last_delalloc_len = len;
- return 0;
- }
- /* The new entry must be beyond the current one. */
- ASSERT(start >= locked_list->last_delalloc_start +
- locked_list->last_delalloc_len);
-
- /* Only subpage case can have more than one delalloc ranges inside a page. */
- ASSERT(fs_info->sectorsize < PAGE_SIZE);
-
- entry->delalloc_start = locked_list->last_delalloc_start;
- entry->delalloc_len = locked_list->last_delalloc_len;
- locked_list->last_delalloc_start = start;
- locked_list->last_delalloc_len = len;
- list_add_tail(&entry->list, &locked_list->head);
- return 0;
-}
-
-static void __cold unlock_one_locked_delalloc_range(struct btrfs_inode *binode,
- struct page *locked_page,
- u64 start, u32 len)
-{
- u64 delalloc_end = start + len - 1;
-
- unlock_extent(&binode->io_tree, start, delalloc_end, NULL);
- __unlock_for_delalloc(&binode->vfs_inode, locked_page, start,
- delalloc_end);
-}
-
-static void unlock_locked_delalloc_list(struct btrfs_inode *binode,
- struct page *locked_page,
- struct locked_delalloc_list *locked_list)
-{
- struct locked_delalloc_range *entry;
-
- list_for_each_entry(entry, &locked_list->head, list)
- unlock_one_locked_delalloc_range(binode, locked_page,
- entry->delalloc_start, entry->delalloc_len);
- if (locked_list->last_delalloc_len) {
- unlock_one_locked_delalloc_range(binode, locked_page,
- locked_list->last_delalloc_start,
- locked_list->last_delalloc_len);
- }
-}
-
/*
* helper for __extent_writepage, doing all of the delayed allocation setup.
*
@@ -1294,16 +1199,21 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
struct page *page, struct writeback_control *wbc)
{
struct btrfs_fs_info *fs_info = inode_to_fs_info(&inode->vfs_inode);
+ struct folio *folio = page_folio(page);
const u64 page_start = page_offset(page);
const u64 page_end = page_start + PAGE_SIZE - 1;
- struct locked_delalloc_list locked_list;
- struct locked_delalloc_range *entry;
+ /*
+ * Save the last found delalloc end. As the delalloc end can go beyond
+ * the page boundary, we can not rely on the subpage bitmap to locate
+ * the last delalloc end.
+ */
+ u64 last_delalloc_end = 0;
u64 delalloc_start = page_start;
u64 delalloc_end = page_end;
u64 delalloc_to_write = 0;
int ret = 0;
- init_locked_delalloc_list(&locked_list);
+ /* Lock all (subpage) delalloc ranges inside the page first. */
while (delalloc_start < page_end) {
delalloc_end = page_end;
if (!find_lock_delalloc_range(&inode->vfs_inode, page,
@@ -1311,48 +1221,68 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
delalloc_start = delalloc_end + 1;
continue;
}
- ret = add_locked_delalloc_range(fs_info, &locked_list,
- delalloc_start, delalloc_end + 1 - delalloc_start);
- if (ret < 0) {
- unlock_locked_delalloc_list(inode, page, &locked_list);
- release_locked_delalloc_list(&locked_list);
- return ret;
- }
-
+ btrfs_folio_set_writer_lock(fs_info, folio, delalloc_start,
+ min(delalloc_end, page_end) + 1 -
+ delalloc_start);
+ last_delalloc_end = delalloc_end;
delalloc_start = delalloc_end + 1;
}
- list_for_each_entry(entry, &locked_list.head, list) {
- delalloc_end = entry->delalloc_start + entry->delalloc_len - 1;
+ delalloc_start = page_start;
+ /* Run the delalloc ranges for above locked ranges. */
+ while (last_delalloc_end && delalloc_start < page_end) {
+ u64 found_start;
+ u32 found_len;
+ bool found;
- /*
- * Hit error in the previous run, cleanup the locked
- * extents/pages.
- */
- if (ret < 0) {
- unlock_one_locked_delalloc_range(inode, page,
- entry->delalloc_start, entry->delalloc_len);
- continue;
+ if (!btrfs_is_subpage(fs_info, page->mapping)) {
+ /*
+ * For non-subpage case, the found delalloc range must
+ * cover this page and there must be only one locked
+ * delalloc range.
+ */
+ found_start = page_start;
+ found_len = last_delalloc_end + 1 - found_start;
+ found = true;
+ } else {
+ found = btrfs_subpage_find_writer_locked(fs_info, folio,
+ delalloc_start, &found_start, &found_len);
}
- ret = btrfs_run_delalloc_range(inode, page, entry->delalloc_start,
- delalloc_end, wbc);
- }
- if (locked_list.last_delalloc_len) {
- delalloc_end = locked_list.last_delalloc_start +
- locked_list.last_delalloc_len - 1;
+ if (!found)
+ break;
+ /*
+ * If the subpage range covers the last sector, the delalloc range may
+ * end beyond the page boundary; use the saved last_delalloc_end
+ * instead.
+ */
+ if (found_start + found_len >= page_end)
+ found_len = last_delalloc_end + 1 - found_start;
- if (ret < 0)
- unlock_one_locked_delalloc_range(inode, page,
- locked_list.last_delalloc_start,
- locked_list.last_delalloc_len);
- else
- ret = btrfs_run_delalloc_range(inode, page,
- locked_list.last_delalloc_start,
- delalloc_end, wbc);
+ if (ret < 0) {
+ /* Cleanup the remaining locked ranges. */
+ unlock_extent(&inode->io_tree, found_start,
+ found_start + found_len - 1, NULL);
+ __unlock_for_delalloc(&inode->vfs_inode, page, found_start,
+ found_start + found_len - 1);
+ } else {
+ ret = btrfs_run_delalloc_range(inode, page, found_start,
+ found_start + found_len - 1, wbc);
+ }
+ /*
+ * The above btrfs_run_delalloc_range() may have unlocked the page,
+ * thus for the last range we can not touch the page anymore.
+ */
+ if (found_start + found_len >= last_delalloc_end + 1)
+ break;
+
+ delalloc_start = found_start + found_len;
}
- release_locked_delalloc_list(&locked_list);
if (ret < 0)
return ret;
+ if (last_delalloc_end)
+ delalloc_end = last_delalloc_end;
+ else
+ delalloc_end = page_end;
/*
* delalloc_end is already one less than the total length, so
* we don't subtract one from PAGE_SIZE
@@ -1624,7 +1554,8 @@ static int __extent_writepage(struct page *page, struct btrfs_bio_ctrl *bio_ctrl
PAGE_SIZE, !ret);
mapping_set_error(page->mapping, ret);
}
- unlock_page(page);
+
+ btrfs_folio_end_all_writers(inode_to_fs_info(inode), folio);
ASSERT(ret <= 0);
return ret;
}
diff --git a/fs/btrfs/subpage.c b/fs/btrfs/subpage.c
index 162a10eee3fd..8793c6f6edc1 100644
--- a/fs/btrfs/subpage.c
+++ b/fs/btrfs/subpage.c
@@ -865,6 +865,7 @@ bool btrfs_subpage_find_writer_locked(const struct btrfs_fs_info *fs_info,
void btrfs_folio_end_all_writers(const struct btrfs_fs_info *fs_info,
struct folio *folio)
{
+ struct btrfs_subpage *subpage = folio_get_private(folio);
u64 folio_start = folio_pos(folio);
u64 cur = folio_start;
@@ -874,6 +875,11 @@ void btrfs_folio_end_all_writers(const struct btrfs_fs_info *fs_info,
return;
}
+ /* The page has no new delalloc range locked on it. Just plain unlock. */
+ if (atomic_read(&subpage->writers) == 0) {
+ folio_unlock(folio);
+ return;
+ }
while (cur < folio_start + PAGE_SIZE) {
u64 found_start;
u32 found_len;
--
2.43.2
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 0/4] btrfs: initial subpage support for zoned devices
2024-02-19 6:08 [PATCH 0/4] btrfs: initial subpage support for zoned devices Qu Wenruo
` (3 preceding siblings ...)
2024-02-19 6:08 ` [PATCH 4/4] btrfs: migrate writepage_delalloc() to use subpage helpers Qu Wenruo
@ 2024-03-04 3:13 ` Qu Wenruo
2024-03-04 5:10 ` Neal Gompa
4 siblings, 1 reply; 14+ messages in thread
From: Qu Wenruo @ 2024-03-04 3:13 UTC (permalink / raw)
To: linux-btrfs
Ping?
I know this is a very niche scenario (subpage + zoned), and the change
itself looks very scary, but it should be safe for the non-subpage
routine (the new behavior of locking all delalloc ranges covering the
page would still lock at most one delalloc range for normal cases).
Furthermore, without this series there seems to be no proper way to
support subpage + zoned, unless we do a much larger change to merge
extent_writepage_io() into run_delalloc_range() (which I believe is
still needed, but can be done in the future).
Thanks,
Qu
On 2024/2/19 16:38, Qu Wenruo wrote:
> [REPO]
> https://github.com/adam900710/linux/tree/subpage_delalloc
>
> Please fetch the whole series for testing, as it relies on 3 submitted
> patches.
>
> [BUG]
> When running subpage btrfs (sectorsize < PAGE_SIZE) with zoned device,
> btrfs can easily crash (with CONFIG_BTRFS_ASSERT enabled) with an
> ASSERT():
>
> assertion failed: block_start != EXTENT_MAP_HOLE, in fs/btrfs/extent_io.c:1384
> ------------[ cut here ]------------
> kernel BUG at fs/btrfs/extent_io.c:1384!
> CPU: 2 PID: 1711 Comm: fsstress Tainted: G OE 6.8.0-rc4-custom+ #9
> Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20231122-12.fc39 11/22/2023
> pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : __extent_writepage_io+0x404/0x460 [btrfs]
> lr : __extent_writepage_io+0x404/0x460 [btrfs]
> Call trace:
> __extent_writepage_io+0x404/0x460 [btrfs]
> extent_write_locked_range+0x16c/0x460 [btrfs]
> run_delalloc_cow+0x88/0x118 [btrfs]
> btrfs_run_delalloc_range+0x128/0x228 [btrfs]
> writepage_delalloc+0xb8/0x178 [btrfs]
> __extent_writepage+0xc8/0x3a0 [btrfs]
> extent_write_cache_pages+0x1cc/0x460 [btrfs]
> extent_writepages+0x8c/0x120 [btrfs]
> btrfs_writepages+0x18/0x30 [btrfs]
> do_writepages+0x94/0x1f8
> filemap_fdatawrite_wbc+0x88/0xc8
> __filemap_fdatawrite_range+0x6c/0xa8
> filemap_flush+0x24/0x38
> btrfs_remap_file_range_prep+0x100/0x1a8 [btrfs]
> btrfs_remap_file_range+0x2ec/0x448 [btrfs]
> vfs_copy_file_range+0x4cc/0x520
> __do_sys_copy_file_range+0xc4/0x2e8
> __arm64_sys_copy_file_range+0x30/0xd0
> invoke_syscall+0x78/0x100
> el0_svc_common.constprop.0+0x48/0xf0
> do_el0_svc+0x24/0x38
> el0_svc+0x3c/0x138
> el0t_64_sync_handler+0x120/0x130
> el0t_64_sync+0x194/0x198
> Code: 9108c021 90000be0 913d8000 9402bfad (d4210000)
> ---[ end trace 0000000000000000 ]---
>
> [CAUSE]
> There are several problems involved in this case:
>
> - __extent_writepage_io() would always try to write back the whole page
> - extent_write_locked_range() would only lock the first delalloc range
>
> These two limits combined result in the following page cache layout,
> which causes the problem:
>
> 0 4K 8K 12K 16K
> |/////| |/////|
>
> - btrfs_run_delalloc_range() called on the above page
> - run_delalloc_cow() ran for range [0, 4K)
> - __extent_writepage_io() called for the whole page
> - __extent_writepage_io() submitted bio for [0, 4K)
> - __extent_writepage_io() tried to submit IO for [8K, 12K)
> But this range is not covered by any OE, the existing extent map is
> a hole, and this triggered the ASSERT().
>
> Even if we ignore the ASSERT(), we would still hit other problems.
>
> - run_delalloc_cow() would unlock the full page.
> - btrfs_run_delalloc_range() called again on the page
> The page is no longer locked, triggering another ASSERT() on
> PageLocked.
>
> [FIX]
> This series would try to fix the problem by:
>
> - making __extent_writepage_io() write only the specified range
> - making writepage_delalloc() lock all delalloc ranges first,
> then call btrfs_run_delalloc_range() for each locked range.
>
> This patch is a temporary solution until the needed subpage interfaces
> are introduced, allowing me to do extra testing to make sure the
> lock-in-one-go behavior is safe.
>
> This is a preparation for allowing subpage delalloc async submission.
>
> - adding subpage interfaces to go through all subpage locked ranges
>
> - using new subpage interfaces to make sure the full page is only
> unlocked when the last writer lock owner is releasing the lock
>
> But please note, this fix is only for the above-mentioned problems.
> There can still be other problems related to zoned+subpage, but at least
> we won't trigger ASSERT()s and crash.
>
> Qu Wenruo (4):
> btrfs: make __extent_writepage_io() to write specified range only
> btrfs: lock subpage ranges in one go for writepage_delalloc()
> btrfs: subpage: introduce helpers to handle subpage delalloc locking
> btrfs: migrate writepage_delalloc() to use subpage helpers
>
> fs/btrfs/extent_io.c | 95 +++++++++++++++++++++++-----
> fs/btrfs/subpage.c | 144 +++++++++++++++++++++++++++++++++++++++++--
> fs/btrfs/subpage.h | 10 ++-
> 3 files changed, 228 insertions(+), 21 deletions(-)
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/4] btrfs: initial subpage support for zoned devices
2024-03-04 3:13 ` [PATCH 0/4] btrfs: initial subpage support for zoned devices Qu Wenruo
@ 2024-03-04 5:10 ` Neal Gompa
2024-03-04 7:32 ` Qu Wenruo
0 siblings, 1 reply; 14+ messages in thread
From: Neal Gompa @ 2024-03-04 5:10 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Sun, Mar 3, 2024 at 10:13 PM Qu Wenruo <wqu@suse.com> wrote:
>
> Ping?
>
> I know this is a very niche scenario (subpage + zoned), and the change
> itself looks very scary, but the change should be safe for non-subpage
> routine (as the new lock all delalloc ranges covers the page would still
> at most lock one delalloc range for normal cases).
>
> Furthermore without this series, there seems be no proper way to support
> subpage + zoned, unless we do a much larger change to merge
> extent_writepage_io() into run_delalloc_range() (which I believe it's
> still needed, but can be done in the future).
>
On the contrary, I don't think this is a niche scenario at all. Quite
the opposite: I expect this to be a *very* common scenario, because we
will see AArch64 systems increasingly rely on subpage as 16K-page
AArch64 Linux is used on two very popular platforms: Apple Silicon
Macs (Fedora Asahi Remix) and Raspberry Pi 5 (Raspbian/RPi OS).
We *need* this series, but I do not have the hardware to stress this
patch set, unfortunately.
The code otherwise looks reasonable to me, though.
Acked-by: Neal Gompa <neal@gompa.dev>
--
真実はいつも一つ!/ Always, there's only one truth!
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/4] btrfs: initial subpage support for zoned devices
2024-03-04 5:10 ` Neal Gompa
@ 2024-03-04 7:32 ` Qu Wenruo
2024-03-04 10:51 ` Neal Gompa
0 siblings, 1 reply; 14+ messages in thread
From: Qu Wenruo @ 2024-03-04 7:32 UTC (permalink / raw)
To: Neal Gompa, Qu Wenruo; +Cc: linux-btrfs
On 2024/3/4 15:40, Neal Gompa wrote:
> On Sun, Mar 3, 2024 at 10:13 PM Qu Wenruo <wqu@suse.com> wrote:
>>
>> Ping?
>>
>> I know this is a very niche scenario (subpage + zoned), and the change
>> itself looks very scary, but it should be safe for the non-subpage
>> routine (the new behavior of locking all delalloc ranges covering the
>> page would still lock at most one delalloc range for normal cases).
>>
>> Furthermore, without this series there seems to be no proper way to
>> support subpage + zoned, unless we do a much larger change to merge
>> extent_writepage_io() into run_delalloc_range() (which I believe is
>> still needed, but can be done in the future).
>>
>
> On the contrary, I don't think this is a niche scenario at all. Quite
> the opposite: I expect this to be a *very* common scenario, because we
> will see AArch64 systems increasingly rely on subpage as 16K-page
> AArch64 Linux is used on two very popular platforms: Apple Silicon
> Macs (Fedora Asahi Remix) and Raspberry Pi 5 (Raspbian/RPi OS).
In fact, the above two platforms further prove this is still a very niche
combination, at least for consumer hardware.
For Apple Silicon, you know it's the usual anti-repair and anti-customer
Apple: there is no way to add a zoned device natively, and if one goes
through a converter/dongle, I strongly doubt a SATA/SAS-to-USB converter
supports the APPEND operation correctly.
It's possible to go Thunderbolt -> PCIe -> U.2/NVMe to attach a ZNS
device, but I strongly doubt any Asahi Linux user has such hardware in
the first place.
It's the same for the RPi 5; I really appreciate the performance
improvement since the RPi 4, but the IO situation is even worse than
Apple's, and it doesn't really get any better on the RPi 5.
My environment is indeed aarch64, but on a better board with a lot more
IO (RK3588, 4x PCIe 3.0 + 1x PCIe 2.0 + 1x PCIe 2.0); it's still not
ideal, and I have to use tcmu-runner to emulate a zoned HDD.
Unfortunately, we're still in the wild west of the subpage world.
Thanks,
Qu
>
> We *need* this series, but I do not have the hardware to stress this
> patch set, unfortunately.
>
> The code otherwise looks reasonable to me, though.
>
> Acked-by: Neal Gompa <neal@gompa.dev>
>
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/4] btrfs: initial subpage support for zoned devices
2024-03-04 7:32 ` Qu Wenruo
@ 2024-03-04 10:51 ` Neal Gompa
0 siblings, 0 replies; 14+ messages in thread
From: Neal Gompa @ 2024-03-04 10:51 UTC (permalink / raw)
To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs
On Mon, Mar 4, 2024 at 2:32 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2024/3/4 15:40, Neal Gompa wrote:
> > On Sun, Mar 3, 2024 at 10:13 PM Qu Wenruo <wqu@suse.com> wrote:
> >>
> >> Ping?
> >>
> >> I know this is a very niche scenario (subpage + zoned), and the change
> >> itself looks very scary, but it should be safe for the non-subpage
> >> routine (the new behavior of locking all delalloc ranges covering the
> >> page would still lock at most one delalloc range for normal cases).
> >>
> >> Furthermore, without this series there seems to be no proper way to
> >> support subpage + zoned, unless we do a much larger change to merge
> >> extent_writepage_io() into run_delalloc_range() (which I believe is
> >> still needed, but can be done in the future).
> >>
> >
> > On the contrary, I don't think this is a niche scenario at all. Quite
> > the opposite: I expect this to be a *very* common scenario, because we
> > will see AArch64 systems increasingly rely on subpage as 16K-page
> > AArch64 Linux is used on two very popular platforms: Apple Silicon
> > Macs (Fedora Asahi Remix) and Raspberry Pi 5 (Raspbian/RPi OS).
>
> In fact, the above two platforms further prove this is still a very niche
> combination, at least for consumer hardware.
>
> For Apple Silicon, you know it's the usual anti-repair and anti-customer
> Apple: there is no way to add a zoned device natively, and if one goes
> through a converter/dongle, I strongly doubt a SATA/SAS-to-USB converter
> supports the APPEND operation correctly.
>
> It's possible to go Thunderbolt -> PCIe -> U.2/NVMe to attach a ZNS
> device, but I strongly doubt any Asahi Linux user has such hardware in
> the first place.
>
> It's the same for the RPi 5; I really appreciate the performance
> improvement since the RPi 4, but the IO situation is even worse than
> Apple's, and it doesn't really get any better on the RPi 5.
>
> My environment is indeed aarch64, but on a better board with a lot more
> IO (RK3588, 4x PCIe 3.0 + 1x PCIe 2.0 + 1x PCIe 2.0); it's still not
> ideal, and I have to use tcmu-runner to emulate a zoned HDD.
>
> Unfortunately, we're still in the wild west of the subpage world.
>
My understanding is that some server and cloud platforms are planning
to move to AArch64 16K pages precisely because of those two platforms.
So yes, it will become more common, not less.
Even Android is working on moving to 16K pages now.
--
真実はいつも一つ!/ Always, there's only one truth!
^ permalink raw reply [flat|nested] 14+ messages in thread