public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/6] btrfs: preparation patches for the incoming metadata folio conversion
@ 2023-07-12  6:37 Qu Wenruo
  2023-07-12  6:37 ` [PATCH v2 1/6] btrfs: tests: enhance extent buffer bitmap tests Qu Wenruo
                   ` (8 more replies)
  0 siblings, 9 replies; 29+ messages in thread
From: Qu Wenruo @ 2023-07-12  6:37 UTC (permalink / raw)
  To: linux-btrfs

[CHANGELOG]
v2:
- Define write_extent_buffer_fsid/chunk_tree_uuid() as inline helpers

[BACKGROUND]

Recently I'm checking on the feasibility on converting metadata handling
to go a folio based solution.

The best part of using a single folio for metadata is, we can get rid of
the complexity of cross-page handling, everything would be just a single
memory operation on a continuous memory range.

[PITFALLS]

One of the biggest problem for metadata folio conversion is, we still
need the current page based solution (or folios with order 0) as a
fallback solution when we can not get a high order folio.

In that case, there would be a hell to handle the four different
combinations (folio/folio, folio/page, page/folio, page/page) for extent
buffer helpers involving two extent buffers.

Although there are some new ideas on how to handle metadata memory (e.g.
go full vmallocated memory), reducing the open-coded memory handling for
metadata should always be a good start point.

[OBJECTIVE]

So this patchset is the preparation to reduce direct page operations for
metadata.

The patchset would do this mostly by concentrating the operations to use
the common helper, write_extent_buffer() and read_extent_buffer().

For bitmap operations it's much complex, thus this patchset refactor it
completely to go a 3 part solution:

- Handle the first byte
- Handle the byte aligned ranges
- Handle the last byte

This needs more complex testing (which I failed several times during
development) to prevent regression.

Finally there is only one function which can not be properly migrated,
memmove_extent_buffer(), which has to use memmove() calls, thus must go
per-page mapping handling.

Thankfully if we go folio in the end, the folio based handling would
just be a single memmove(), thus it won't be too much burden.


Qu Wenruo (6):
  btrfs: tests: enhance extent buffer bitmap tests
  btrfs: refactor extent buffer bitmaps operations
  btrfs: use write_extent_buffer() to implement
    write_extent_buffer_*id()
  btrfs: refactor memcpy_extent_buffer()
  btrfs: refactor copy_extent_buffer_full()
  btrfs: call copy_extent_buffer_full() inside
    btrfs_clone_extent_buffer()

 fs/btrfs/extent_io.c             | 224 +++++++++++++------------------
 fs/btrfs/extent_io.h             |  19 ++-
 fs/btrfs/tests/extent-io-tests.c | 161 ++++++++++++++--------
 3 files changed, 215 insertions(+), 189 deletions(-)

-- 
2.41.0


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v2 1/6] btrfs: tests: enhance extent buffer bitmap tests
  2023-07-12  6:37 [PATCH v2 0/6] btrfs: preparation patches for the incoming metadata folio conversion Qu Wenruo
@ 2023-07-12  6:37 ` Qu Wenruo
  2023-07-12  6:37 ` [PATCH v2 2/6] btrfs: refactor extent buffer bitmaps operations Qu Wenruo
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 29+ messages in thread
From: Qu Wenruo @ 2023-07-12  6:37 UTC (permalink / raw)
  To: linux-btrfs

Enhance extent bitmap tests for the following aspects:

- Remove unnecessary @len from __test_eb_bitmaps()
  We can fetch the length from extent buffer

- Explicitly distinguish bit and byte length
  Now every start/len inside bitmap tests would have either "byte_" or
  "bit_" prefix to make it more explicit.

- Better error reporting

  If we have mismatch bits, the error report would dump the following
  contents:

  * start bytenr
  * bit number
  * the full byte from bitmap
  * the full byte from the extent

  This is to save developers time so obvious problem can be found
  immediately

- Extract bitmap set/clear and check operation into two helpers
  This is to save some code lines, as we will have more tests to do.

- Add new tests

  The following tests are added, mostly for the incoming extent bitmap
  accessor refactor:

  * Set bits inside the same byte
  * Clear bits inside the same byte
  * Cross byte boundary set
  * Cross byte boundary clear
  * Cross multi-byte boundary set
  * Cross multi-byte boundary clear

  Those new tests have already saved my backend for the incoming extent
  buffer bitmap refactor.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/tests/extent-io-tests.c | 161 +++++++++++++++++++++----------
 1 file changed, 108 insertions(+), 53 deletions(-)

diff --git a/fs/btrfs/tests/extent-io-tests.c b/fs/btrfs/tests/extent-io-tests.c
index f6bc6d738555..f97f344e28ab 100644
--- a/fs/btrfs/tests/extent-io-tests.c
+++ b/fs/btrfs/tests/extent-io-tests.c
@@ -319,86 +319,138 @@ static int test_find_delalloc(u32 sectorsize)
 	return ret;
 }
 
-static int check_eb_bitmap(unsigned long *bitmap, struct extent_buffer *eb,
-			   unsigned long len)
+static int check_eb_bitmap(unsigned long *bitmap, struct extent_buffer *eb)
 {
 	unsigned long i;
 
-	for (i = 0; i < len * BITS_PER_BYTE; i++) {
+	for (i = 0; i < eb->len * BITS_PER_BYTE; i++) {
 		int bit, bit1;
 
 		bit = !!test_bit(i, bitmap);
 		bit1 = !!extent_buffer_test_bit(eb, 0, i);
 		if (bit1 != bit) {
-			test_err("bits do not match");
+			u8 has;
+			u8 expect;
+
+			read_extent_buffer(eb, &has, i / BITS_PER_BYTE, 1);
+			expect = bitmap_get_value8(bitmap, ALIGN(i, BITS_PER_BYTE));
+
+			test_err("bits not match, start byte 0 bit %lu, byte %lu has 0x%02x expect 0x%02x",
+				 i, i / BITS_PER_BYTE, has, expect);
 			return -EINVAL;
 		}
 
 		bit1 = !!extent_buffer_test_bit(eb, i / BITS_PER_BYTE,
 						i % BITS_PER_BYTE);
 		if (bit1 != bit) {
-			test_err("offset bits do not match");
+			u8 has;
+			u8 expect;
+
+			read_extent_buffer(eb, &has, i / BITS_PER_BYTE, 1);
+			expect = bitmap_get_value8(bitmap, ALIGN(i, BITS_PER_BYTE));
+
+			test_err("bits not match, start byte %lu bit %lu, byte %lu has 0x%02x expect 0x%02x",
+				 i / BITS_PER_BYTE, i % BITS_PER_BYTE,
+				 i / BITS_PER_BYTE, has, expect);
 			return -EINVAL;
 		}
 	}
 	return 0;
 }
 
-static int __test_eb_bitmaps(unsigned long *bitmap, struct extent_buffer *eb,
-			     unsigned long len)
+static int test_bitmap_set(const char *name, unsigned long *bitmap,
+			   struct extent_buffer *eb,
+			   unsigned long byte_start, unsigned long bit_start,
+			   unsigned long bit_len)
+{
+	int ret;
+
+	bitmap_set(bitmap, byte_start * BITS_PER_BYTE + bit_start, bit_len);
+	extent_buffer_bitmap_set(eb, byte_start, bit_start, bit_len);
+	ret = check_eb_bitmap(bitmap, eb);
+	if (ret < 0)
+		test_err("%s test failed", name);
+	return ret;
+}
+
+static int test_bitmap_clear(const char *name, unsigned long *bitmap,
+			     struct extent_buffer *eb,
+			     unsigned long byte_start, unsigned long bit_start,
+			     unsigned long bit_len)
+{
+	int ret;
+
+	bitmap_clear(bitmap, byte_start * BITS_PER_BYTE + bit_start, bit_len);
+	extent_buffer_bitmap_clear(eb, byte_start, bit_start, bit_len);
+	ret = check_eb_bitmap(bitmap, eb);
+	if (ret < 0)
+		test_err("%s test failed", name);
+	return ret;
+}
+static int __test_eb_bitmaps(unsigned long *bitmap, struct extent_buffer *eb)
 {
 	unsigned long i, j;
+	unsigned long byte_len = eb->len;
 	u32 x;
 	int ret;
 
-	memset(bitmap, 0, len);
-	memzero_extent_buffer(eb, 0, len);
-	if (memcmp_extent_buffer(eb, bitmap, 0, len) != 0) {
-		test_err("bitmap was not zeroed");
-		return -EINVAL;
-	}
-
-	bitmap_set(bitmap, 0, len * BITS_PER_BYTE);
-	extent_buffer_bitmap_set(eb, 0, 0, len * BITS_PER_BYTE);
-	ret = check_eb_bitmap(bitmap, eb, len);
-	if (ret) {
-		test_err("setting all bits failed");
+	ret = test_bitmap_clear("clear all run 1", bitmap, eb, 0, 0,
+				byte_len * BITS_PER_BYTE);
+	if (ret < 0)
 		return ret;
-	}
 
-	bitmap_clear(bitmap, 0, len * BITS_PER_BYTE);
-	extent_buffer_bitmap_clear(eb, 0, 0, len * BITS_PER_BYTE);
-	ret = check_eb_bitmap(bitmap, eb, len);
-	if (ret) {
-		test_err("clearing all bits failed");
+	ret = test_bitmap_set("set all", bitmap, eb, 0, 0,
+			      byte_len * BITS_PER_BYTE);
+	if (ret < 0)
+		return ret;
+
+	ret = test_bitmap_clear("clear all run 2", bitmap, eb, 0, 0,
+				byte_len * BITS_PER_BYTE);
+	if (ret < 0)
+		return ret;
+
+	ret = test_bitmap_set("same byte set", bitmap, eb, 0, 2, 4);
+	if (ret < 0)
+		return ret;
+
+	ret = test_bitmap_clear("same byte partial clear", bitmap, eb, 0, 4, 1);
+	if (ret < 0)
+		return ret;
+
+	ret = test_bitmap_set("cross byte set", bitmap, eb, 2, 4, 8);
+	if (ret < 0)
+		return ret;
+
+	ret = test_bitmap_set("cross multi byte set", bitmap, eb, 4, 4, 24);
+	if (ret < 0)
+		return ret;
+
+	ret = test_bitmap_clear("cross byte clear", bitmap, eb, 2, 6, 4);
+	if (ret < 0)
+		return ret;
+
+	ret = test_bitmap_clear("cross multi byte clear", bitmap, eb, 4, 6, 20);
+	if (ret < 0)
 		return ret;
-	}
 
 	/* Straddling pages test */
-	if (len > PAGE_SIZE) {
-		bitmap_set(bitmap,
-			(PAGE_SIZE - sizeof(long) / 2) * BITS_PER_BYTE,
-			sizeof(long) * BITS_PER_BYTE);
-		extent_buffer_bitmap_set(eb, PAGE_SIZE - sizeof(long) / 2, 0,
-					sizeof(long) * BITS_PER_BYTE);
-		ret = check_eb_bitmap(bitmap, eb, len);
-		if (ret) {
-			test_err("setting straddling pages failed");
+	if (byte_len > PAGE_SIZE) {
+		ret = test_bitmap_set("cross page set", bitmap, eb,
+				      PAGE_SIZE - sizeof(long) / 2, 0,
+				      sizeof(long) * BITS_PER_BYTE);
+		if (ret < 0)
 			return ret;
-		}
 
-		bitmap_set(bitmap, 0, len * BITS_PER_BYTE);
-		bitmap_clear(bitmap,
-			(PAGE_SIZE - sizeof(long) / 2) * BITS_PER_BYTE,
-			sizeof(long) * BITS_PER_BYTE);
-		extent_buffer_bitmap_set(eb, 0, 0, len * BITS_PER_BYTE);
-		extent_buffer_bitmap_clear(eb, PAGE_SIZE - sizeof(long) / 2, 0,
-					sizeof(long) * BITS_PER_BYTE);
-		ret = check_eb_bitmap(bitmap, eb, len);
-		if (ret) {
-			test_err("clearing straddling pages failed");
+		ret = test_bitmap_set("cross page set all", bitmap, eb, 0, 0,
+				      byte_len * BITS_PER_BYTE);
+		if (ret < 0)
+			return ret;
+
+		ret = test_bitmap_clear("cross page clear", bitmap, eb,
+				PAGE_SIZE - sizeof(long) / 2, 0,
+				sizeof(long) * BITS_PER_BYTE);
+		if (ret < 0)
 			return ret;
-		}
 	}
 
 	/*
@@ -406,9 +458,12 @@ static int __test_eb_bitmaps(unsigned long *bitmap, struct extent_buffer *eb,
 	 * something repetitive that could miss some hypothetical off-by-n bug.
 	 */
 	x = 0;
-	bitmap_clear(bitmap, 0, len * BITS_PER_BYTE);
-	extent_buffer_bitmap_clear(eb, 0, 0, len * BITS_PER_BYTE);
-	for (i = 0; i < len * BITS_PER_BYTE / 32; i++) {
+	ret = test_bitmap_clear("clear all run 3", bitmap, eb, 0, 0,
+				byte_len * BITS_PER_BYTE);
+	if (ret < 0)
+		return ret;
+
+	for (i = 0; i < byte_len * BITS_PER_BYTE / 32; i++) {
 		x = (0x19660dULL * (u64)x + 0x3c6ef35fULL) & 0xffffffffU;
 		for (j = 0; j < 32; j++) {
 			if (x & (1U << j)) {
@@ -418,7 +473,7 @@ static int __test_eb_bitmaps(unsigned long *bitmap, struct extent_buffer *eb,
 		}
 	}
 
-	ret = check_eb_bitmap(bitmap, eb, len);
+	ret = check_eb_bitmap(bitmap, eb);
 	if (ret) {
 		test_err("random bit pattern failed");
 		return ret;
@@ -456,7 +511,7 @@ static int test_eb_bitmaps(u32 sectorsize, u32 nodesize)
 		goto out;
 	}
 
-	ret = __test_eb_bitmaps(bitmap, eb, nodesize);
+	ret = __test_eb_bitmaps(bitmap, eb);
 	if (ret)
 		goto out;
 
@@ -473,7 +528,7 @@ static int test_eb_bitmaps(u32 sectorsize, u32 nodesize)
 		goto out;
 	}
 
-	ret = __test_eb_bitmaps(bitmap, eb, nodesize);
+	ret = __test_eb_bitmaps(bitmap, eb);
 out:
 	free_extent_buffer(eb);
 	kfree(bitmap);
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 2/6] btrfs: refactor extent buffer bitmaps operations
  2023-07-12  6:37 [PATCH v2 0/6] btrfs: preparation patches for the incoming metadata folio conversion Qu Wenruo
  2023-07-12  6:37 ` [PATCH v2 1/6] btrfs: tests: enhance extent buffer bitmap tests Qu Wenruo
@ 2023-07-12  6:37 ` Qu Wenruo
  2023-07-13 11:53   ` David Sterba
  2023-07-12  6:37 ` [PATCH v2 3/6] btrfs: use write_extent_buffer() to implement write_extent_buffer_*id() Qu Wenruo
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 29+ messages in thread
From: Qu Wenruo @ 2023-07-12  6:37 UTC (permalink / raw)
  To: linux-btrfs

[BACKGROUND]
Currently we handle extent bitmaps manually in
extent_buffer_bitmap_set() and extent_buffer_bitmap_clear().

Although with various helper like eb_bitmap_offset() it's still a little
messy to read.
The code seems to be a copy of bitmap_set(), but with all the cross-page
handling embedded into the code.

[ENHANCEMENT]
This patch would enhance the readability by introducing two helpers:

- memset_extent_buffer()
  To handle the byte aligned range, thus all the cross-page handling is
  done there.

- extent_buffer_get_byte()
  This for the first and the last byte operation, which only needs to
  grab one byte, thus no need for any cross-page handling.

So we can split both extent_buffer_bitmap_set() and
extent_buffer_bitmap_clear() into 3 parts:

- Handle the first byte
  If the range fits inside the first byte, we can exit early.

- Handle the byte aligned part
  This is the part which can have cross-page operations, and it would
  be handled by memset_extent_buffer().

- Handle the last byte

This refactor does not only make the code a little easier to read, but
also make later folio/page switch much easier, as the switch only needs
to be done inside memset_extent_buffer() and extent_buffer_get_byte().

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 141 +++++++++++++++++++++----------------------
 1 file changed, 68 insertions(+), 73 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index a845a90d46f7..6a7abcbe6bec 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4229,32 +4229,30 @@ void write_extent_buffer(const struct extent_buffer *eb, const void *srcv,
 	}
 }
 
+static void memset_extent_buffer(const struct extent_buffer *eb, int c,
+				 unsigned long start, unsigned long len)
+{
+	unsigned long cur = start;
+
+	while (cur < start + len) {
+		int index = get_eb_page_index(cur);
+		int offset = get_eb_offset_in_page(eb, cur);
+		int cur_len = min(start + len - cur, PAGE_SIZE - offset);
+		struct page *page = eb->pages[index];
+
+		assert_eb_page_uptodate(eb, page);
+		memset(page_address(page) + offset, c, cur_len);
+
+		cur += cur_len;
+	}
+}
+
 void memzero_extent_buffer(const struct extent_buffer *eb, unsigned long start,
 		unsigned long len)
 {
-	size_t cur;
-	size_t offset;
-	struct page *page;
-	char *kaddr;
-	unsigned long i = get_eb_page_index(start);
-
 	if (check_eb_range(eb, start, len))
 		return;
-
-	offset = get_eb_offset_in_page(eb, start);
-
-	while (len > 0) {
-		page = eb->pages[i];
-		assert_eb_page_uptodate(eb, page);
-
-		cur = min(len, PAGE_SIZE - offset);
-		kaddr = page_address(page);
-		memset(kaddr + offset, 0, cur);
-
-		len -= cur;
-		offset = 0;
-		i++;
-	}
+	return memset_extent_buffer(eb, 0, start, len);
 }
 
 void copy_extent_buffer_full(const struct extent_buffer *dst,
@@ -4371,6 +4369,15 @@ int extent_buffer_test_bit(const struct extent_buffer *eb, unsigned long start,
 	return 1U & (kaddr[offset] >> (nr & (BITS_PER_BYTE - 1)));
 }
 
+static u8 *extent_buffer_get_byte(const struct extent_buffer *eb, unsigned long bytenr)
+{
+	unsigned long index = get_eb_page_index(bytenr);
+
+	if (check_eb_range(eb, bytenr, 1))
+		return NULL;
+	return page_address(eb->pages[index]) + get_eb_offset_in_page(eb, bytenr);
+}
+
 /*
  * Set an area of a bitmap to 1.
  *
@@ -4382,35 +4389,29 @@ int extent_buffer_test_bit(const struct extent_buffer *eb, unsigned long start,
 void extent_buffer_bitmap_set(const struct extent_buffer *eb, unsigned long start,
 			      unsigned long pos, unsigned long len)
 {
+	unsigned int first_byte = start + BIT_BYTE(pos);
+	unsigned int last_byte = start + BIT_BYTE(pos + len - 1);
+	bool same_byte = (first_byte == last_byte);
+	u8 mask = BITMAP_FIRST_BYTE_MASK(pos);
 	u8 *kaddr;
-	struct page *page;
-	unsigned long i;
-	size_t offset;
-	const unsigned int size = pos + len;
-	int bits_to_set = BITS_PER_BYTE - (pos % BITS_PER_BYTE);
-	u8 mask_to_set = BITMAP_FIRST_BYTE_MASK(pos);
 
-	eb_bitmap_offset(eb, start, pos, &i, &offset);
-	page = eb->pages[i];
-	assert_eb_page_uptodate(eb, page);
-	kaddr = page_address(page);
+	if (same_byte)
+		mask &= BITMAP_LAST_BYTE_MASK(pos + len);
 
-	while (len >= bits_to_set) {
-		kaddr[offset] |= mask_to_set;
-		len -= bits_to_set;
-		bits_to_set = BITS_PER_BYTE;
-		mask_to_set = ~0;
-		if (++offset >= PAGE_SIZE && len > 0) {
-			offset = 0;
-			page = eb->pages[++i];
-			assert_eb_page_uptodate(eb, page);
-			kaddr = page_address(page);
-		}
-	}
-	if (len) {
-		mask_to_set &= BITMAP_LAST_BYTE_MASK(size);
-		kaddr[offset] |= mask_to_set;
-	}
+	/* Handle the first byte. */
+	kaddr = extent_buffer_get_byte(eb, first_byte);
+	*kaddr |= mask;
+	if (same_byte)
+		return;
+
+	/* Handle the byte aligned part. */
+	ASSERT(first_byte + 1 <= last_byte);
+	memset_extent_buffer(eb, 0xff, first_byte + 1,
+			     last_byte - first_byte - 1);
+
+	/* Handle the last byte. */
+	kaddr = extent_buffer_get_byte(eb, last_byte);
+	*kaddr |= BITMAP_LAST_BYTE_MASK(pos + len);
 }
 
 
@@ -4426,35 +4427,29 @@ void extent_buffer_bitmap_clear(const struct extent_buffer *eb,
 				unsigned long start, unsigned long pos,
 				unsigned long len)
 {
+	int first_byte = start + BIT_BYTE(pos);
+	int last_byte = start + BIT_BYTE(pos + len - 1);
+	bool same_byte = (first_byte == last_byte);
+	u8 mask = BITMAP_FIRST_BYTE_MASK(pos);
 	u8 *kaddr;
-	struct page *page;
-	unsigned long i;
-	size_t offset;
-	const unsigned int size = pos + len;
-	int bits_to_clear = BITS_PER_BYTE - (pos % BITS_PER_BYTE);
-	u8 mask_to_clear = BITMAP_FIRST_BYTE_MASK(pos);
 
-	eb_bitmap_offset(eb, start, pos, &i, &offset);
-	page = eb->pages[i];
-	assert_eb_page_uptodate(eb, page);
-	kaddr = page_address(page);
+	if (same_byte)
+		mask &= BITMAP_LAST_BYTE_MASK(pos + len);
 
-	while (len >= bits_to_clear) {
-		kaddr[offset] &= ~mask_to_clear;
-		len -= bits_to_clear;
-		bits_to_clear = BITS_PER_BYTE;
-		mask_to_clear = ~0;
-		if (++offset >= PAGE_SIZE && len > 0) {
-			offset = 0;
-			page = eb->pages[++i];
-			assert_eb_page_uptodate(eb, page);
-			kaddr = page_address(page);
-		}
-	}
-	if (len) {
-		mask_to_clear &= BITMAP_LAST_BYTE_MASK(size);
-		kaddr[offset] &= ~mask_to_clear;
-	}
+	/* Handle the first byte. */
+	kaddr = extent_buffer_get_byte(eb, first_byte);
+	*kaddr &= ~mask;
+	if (same_byte)
+		return;
+
+	/* Handle the byte aligned part. */
+	ASSERT(first_byte + 1 <= last_byte);
+	memset_extent_buffer(eb, 0, first_byte + 1,
+			     last_byte - first_byte - 1);
+
+	/* Handle the last byte. */
+	kaddr = extent_buffer_get_byte(eb, last_byte);
+	*kaddr &= ~BITMAP_LAST_BYTE_MASK(pos + len);
 }
 
 static inline bool areas_overlap(unsigned long src, unsigned long dst, unsigned long len)
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 3/6] btrfs: use write_extent_buffer() to implement write_extent_buffer_*id()
  2023-07-12  6:37 [PATCH v2 0/6] btrfs: preparation patches for the incoming metadata folio conversion Qu Wenruo
  2023-07-12  6:37 ` [PATCH v2 1/6] btrfs: tests: enhance extent buffer bitmap tests Qu Wenruo
  2023-07-12  6:37 ` [PATCH v2 2/6] btrfs: refactor extent buffer bitmaps operations Qu Wenruo
@ 2023-07-12  6:37 ` Qu Wenruo
  2023-07-12  6:37 ` [PATCH v2 4/6] btrfs: refactor memcpy_extent_buffer() Qu Wenruo
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 29+ messages in thread
From: Qu Wenruo @ 2023-07-12  6:37 UTC (permalink / raw)
  To: linux-btrfs

For helpers write_extent_buffer_chunk_tree_uuid() and
write_extent_buffer_fsid(), they can be implemented by
write_extent_buffer().

And those two helpers are not that hot, they only get called during
initialization of a new tree block.

There is not much need for those slightly optimized versions.
And since they can be easily converted to one write_extent_buffer() call,
define them as inline helpers.

This would make later page/folio switch much easier, as all change only
need to happen in write_extent_buffer().

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 22 ----------------------
 fs/btrfs/extent_io.h | 19 ++++++++++++++++---
 2 files changed, 16 insertions(+), 25 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 6a7abcbe6bec..4e252fd7b78a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4175,28 +4175,6 @@ static void assert_eb_page_uptodate(const struct extent_buffer *eb,
 	}
 }
 
-void write_extent_buffer_chunk_tree_uuid(const struct extent_buffer *eb,
-		const void *srcv)
-{
-	char *kaddr;
-
-	assert_eb_page_uptodate(eb, eb->pages[0]);
-	kaddr = page_address(eb->pages[0]) +
-		get_eb_offset_in_page(eb, offsetof(struct btrfs_header,
-						   chunk_tree_uuid));
-	memcpy(kaddr, srcv, BTRFS_FSID_SIZE);
-}
-
-void write_extent_buffer_fsid(const struct extent_buffer *eb, const void *srcv)
-{
-	char *kaddr;
-
-	assert_eb_page_uptodate(eb, eb->pages[0]);
-	kaddr = page_address(eb->pages[0]) +
-		get_eb_offset_in_page(eb, offsetof(struct btrfs_header, fsid));
-	memcpy(kaddr, srcv, BTRFS_FSID_SIZE);
-}
-
 void write_extent_buffer(const struct extent_buffer *eb, const void *srcv,
 			 unsigned long start, unsigned long len)
 {
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index c5fae3a7d911..dd46811cf2f5 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -236,11 +236,24 @@ void read_extent_buffer(const struct extent_buffer *eb, void *dst,
 int read_extent_buffer_to_user_nofault(const struct extent_buffer *eb,
 				       void __user *dst, unsigned long start,
 				       unsigned long len);
-void write_extent_buffer_fsid(const struct extent_buffer *eb, const void *src);
-void write_extent_buffer_chunk_tree_uuid(const struct extent_buffer *eb,
-		const void *src);
 void write_extent_buffer(const struct extent_buffer *eb, const void *src,
 			 unsigned long start, unsigned long len);
+
+static inline void write_extent_buffer_chunk_tree_uuid(
+		const struct extent_buffer *eb, const void *chunk_tree_uuid)
+{
+	write_extent_buffer(eb, chunk_tree_uuid,
+			    offsetof(struct btrfs_header, chunk_tree_uuid),
+			    BTRFS_FSID_SIZE);
+}
+
+static inline void write_extent_buffer_fsid(const struct extent_buffer *eb,
+		const void *fsid)
+{
+	write_extent_buffer(eb, fsid, offsetof(struct btrfs_header, fsid),
+			    BTRFS_FSID_SIZE);
+}
+
 void copy_extent_buffer_full(const struct extent_buffer *dst,
 			     const struct extent_buffer *src);
 void copy_extent_buffer(const struct extent_buffer *dst,
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 4/6] btrfs: refactor memcpy_extent_buffer()
  2023-07-12  6:37 [PATCH v2 0/6] btrfs: preparation patches for the incoming metadata folio conversion Qu Wenruo
                   ` (2 preceding siblings ...)
  2023-07-12  6:37 ` [PATCH v2 3/6] btrfs: use write_extent_buffer() to implement write_extent_buffer_*id() Qu Wenruo
@ 2023-07-12  6:37 ` Qu Wenruo
  2023-07-13 11:51   ` David Sterba
  2023-07-12  6:37 ` [PATCH v2 5/6] btrfs: refactor copy_extent_buffer_full() Qu Wenruo
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 29+ messages in thread
From: Qu Wenruo @ 2023-07-12  6:37 UTC (permalink / raw)
  To: linux-btrfs

[BACKGROUND]
Currently memcpy_extent_buffer() goes a loop where it would stop at
any page boundary inside [dst_offset, dst_offset + len) or [src_offset,
src_offset + len).

This is mostly allowing us to go copy_pages(), but if we're going folio
we will need to handle multi-page (the old behavior) or single folio
(the new optimization).

The current code would be a burden for future changes.

[ENHANCEMENT]
Instead of sticking with copy_pages(), here we utilize
write_extent_buffer() to handle writing into the dst range.

Now we only need to handle the page boundaries inside the source range,
making later switch to folio much easier.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 36 +++++++++++++-----------------------
 1 file changed, 13 insertions(+), 23 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 4e252fd7b78a..4db90ede8219 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4183,6 +4183,8 @@ void write_extent_buffer(const struct extent_buffer *eb, const void *srcv,
 	struct page *page;
 	char *kaddr;
 	char *src = (char *)srcv;
+	/* For unmapped (dummy) ebs, no need to check their uptodate status. */
+	bool check_uptodate = !test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags);
 	unsigned long i = get_eb_page_index(start);
 
 	WARN_ON(test_bit(EXTENT_BUFFER_NO_CHECK, &eb->bflags));
@@ -4194,7 +4196,8 @@ void write_extent_buffer(const struct extent_buffer *eb, const void *srcv,
 
 	while (len > 0) {
 		page = eb->pages[i];
-		assert_eb_page_uptodate(eb, page);
+		if (check_uptodate)
+			assert_eb_page_uptodate(eb, page);
 
 		cur = min(len, PAGE_SIZE - offset);
 		kaddr = page_address(page);
@@ -4462,34 +4465,21 @@ void memcpy_extent_buffer(const struct extent_buffer *dst,
 			  unsigned long dst_offset, unsigned long src_offset,
 			  unsigned long len)
 {
-	size_t cur;
-	size_t dst_off_in_page;
-	size_t src_off_in_page;
-	unsigned long dst_i;
-	unsigned long src_i;
+	unsigned long cur = src_offset;
 
 	if (check_eb_range(dst, dst_offset, len) ||
 	    check_eb_range(dst, src_offset, len))
 		return;
 
-	while (len > 0) {
-		dst_off_in_page = get_eb_offset_in_page(dst, dst_offset);
-		src_off_in_page = get_eb_offset_in_page(dst, src_offset);
+	while (cur < src_offset + len) {
+		int index = get_eb_page_index(cur);
+		unsigned long offset = get_eb_offset_in_page(dst, cur);
+		unsigned long cur_len = min(src_offset + len - cur, PAGE_SIZE - offset);
+		unsigned long offset_to_start = cur - src_offset;
+		void *src_addr = page_address(dst->pages[index]) + offset;
 
-		dst_i = get_eb_page_index(dst_offset);
-		src_i = get_eb_page_index(src_offset);
-
-		cur = min(len, (unsigned long)(PAGE_SIZE -
-					       src_off_in_page));
-		cur = min_t(unsigned long, cur,
-			(unsigned long)(PAGE_SIZE - dst_off_in_page));
-
-		copy_pages(dst->pages[dst_i], dst->pages[src_i],
-			   dst_off_in_page, src_off_in_page, cur);
-
-		src_offset += cur;
-		dst_offset += cur;
-		len -= cur;
+		write_extent_buffer(dst, src_addr, dst_offset + offset_to_start, cur_len);
+		cur += cur_len;
 	}
 }
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 5/6] btrfs: refactor copy_extent_buffer_full()
  2023-07-12  6:37 [PATCH v2 0/6] btrfs: preparation patches for the incoming metadata folio conversion Qu Wenruo
                   ` (3 preceding siblings ...)
  2023-07-12  6:37 ` [PATCH v2 4/6] btrfs: refactor memcpy_extent_buffer() Qu Wenruo
@ 2023-07-12  6:37 ` Qu Wenruo
  2023-07-12  6:37 ` [PATCH v2 6/6] btrfs: call copy_extent_buffer_full() inside btrfs_clone_extent_buffer() Qu Wenruo
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 29+ messages in thread
From: Qu Wenruo @ 2023-07-12  6:37 UTC (permalink / raw)
  To: linux-btrfs

[BACKGROUND]
copy_extent_buffer_full() currently does different handling for regular
and subpage cases, for regular cases it does a page by page copying.
For subpage cases, it just copy the content.

This is fine for the page based extent buffer code, but for the incoming
folio conversion, it can be a burden to add a new branch just to handle
all the different combinations (subpage vs regular, one single folio vs
multi pages).

[ENHANCE]
Instead of handling the different combinations, just go one single
handling for all cases, utilizing write_extent_buffer() to do the
copying.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 23 +++++++++--------------
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 4db90ede8219..c84e64181acb 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4239,24 +4239,19 @@ void memzero_extent_buffer(const struct extent_buffer *eb, unsigned long start,
 void copy_extent_buffer_full(const struct extent_buffer *dst,
 			     const struct extent_buffer *src)
 {
-	int i;
-	int num_pages;
+	unsigned long cur = 0;
 
 	ASSERT(dst->len == src->len);
 
-	if (dst->fs_info->nodesize >= PAGE_SIZE) {
-		num_pages = num_extent_pages(dst);
-		for (i = 0; i < num_pages; i++)
-			copy_page(page_address(dst->pages[i]),
-				  page_address(src->pages[i]));
-	} else {
-		size_t src_offset = get_eb_offset_in_page(src, 0);
-		size_t dst_offset = get_eb_offset_in_page(dst, 0);
+	while (cur < src->len) {
+		int index = get_eb_page_index(cur);
+		unsigned long offset = get_eb_offset_in_page(src, cur);
+		unsigned long cur_len = min(src->len, PAGE_SIZE - offset);
+		void *addr = page_address(src->pages[index]) + offset;
 
-		ASSERT(src->fs_info->nodesize < PAGE_SIZE);
-		memcpy(page_address(dst->pages[0]) + dst_offset,
-		       page_address(src->pages[0]) + src_offset,
-		       src->len);
+		write_extent_buffer(dst, addr, cur, cur_len);
+
+		cur += cur_len;
 	}
 }
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 6/6] btrfs: call copy_extent_buffer_full() inside btrfs_clone_extent_buffer()
  2023-07-12  6:37 [PATCH v2 0/6] btrfs: preparation patches for the incoming metadata folio conversion Qu Wenruo
                   ` (4 preceding siblings ...)
  2023-07-12  6:37 ` [PATCH v2 5/6] btrfs: refactor copy_extent_buffer_full() Qu Wenruo
@ 2023-07-12  6:37 ` Qu Wenruo
  2023-07-13 11:55   ` David Sterba
  2023-07-12  6:44 ` [PATCH v2 0/6] btrfs: preparation patches for the incoming metadata folio conversion Sweet Tea Dorminy
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 29+ messages in thread
From: Qu Wenruo @ 2023-07-12  6:37 UTC (permalink / raw)
  To: linux-btrfs

Function btrfs_clone_extent_buffer() is calling of copy_page() directly.

To make later migration for folio easier, just call
copy_extent_buffer_full() instead.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c84e64181acb..7f0a532de645 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3285,8 +3285,8 @@ struct extent_buffer *btrfs_clone_extent_buffer(const struct extent_buffer *src)
 			return NULL;
 		}
 		WARN_ON(PageDirty(p));
-		copy_page(page_address(p), page_address(src->pages[i]));
 	}
+	copy_extent_buffer_full(new, src);
 	set_extent_buffer_uptodate(new);
 
 	return new;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 0/6] btrfs: preparation patches for the incoming metadata folio conversion
  2023-07-12  6:37 [PATCH v2 0/6] btrfs: preparation patches for the incoming metadata folio conversion Qu Wenruo
                   ` (5 preceding siblings ...)
  2023-07-12  6:37 ` [PATCH v2 6/6] btrfs: call copy_extent_buffer_full() inside btrfs_clone_extent_buffer() Qu Wenruo
@ 2023-07-12  6:44 ` Sweet Tea Dorminy
  2023-07-12 16:41 ` Christoph Hellwig
  2023-07-13 12:09 ` David Sterba
  8 siblings, 0 replies; 29+ messages in thread
From: Sweet Tea Dorminy @ 2023-07-12  6:44 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 7/12/23 02:37, Qu Wenruo wrote:
> [CHANGELOG]
> v2:
> - Define write_extent_buffer_fsid/chunk_tree_uuid() as inline helpers
> 
> [BACKGROUND]
> 
> Recently I'm checking on the feasibility on converting metadata handling
> to go a folio based solution.
> 
> The best part of using a single folio for metadata is, we can get rid of
> the complexity of cross-page handling, everything would be just a single
> memory operation on a continuous memory range.
> 
> [PITFALLS]
> 
> One of the biggest problem for metadata folio conversion is, we still
> need the current page based solution (or folios with order 0) as a
> fallback solution when we can not get a high order folio.
> 
> In that case, there would be a hell to handle the four different
> combinations (folio/folio, folio/page, page/folio, page/page) for extent
> buffer helpers involving two extent buffers.
> 
> Although there are some new ideas on how to handle metadata memory (e.g.
> go full vmallocated memory), reducing the open-coded memory handling for
> metadata should always be a good start point.
> 
> [OBJECTIVE]
> 
> So this patchset is the preparation to reduce direct page operations for
> metadata.
> 
> The patchset would do this mostly by concentrating the operations to use
> the common helper, write_extent_buffer() and read_extent_buffer().
> 
> For bitmap operations it's much complex, thus this patchset refactor it
> completely to go a 3 part solution:
> 
> - Handle the first byte
> - Handle the byte aligned ranges
> - Handle the last byte
> 
> This needs more complex testing (which I failed several times during
> development) to prevent regression.
> 
> Finally there is only one function which can not be properly migrated,
> memmove_extent_buffer(), which has to use memmove() calls, thus must go
> per-page mapping handling.
> 
> Thankfully if we go folio in the end, the folio based handling would
> just be a single memmove(), thus it won't be too much burden.
> 
> 
> Qu Wenruo (6):
>    btrfs: tests: enhance extent buffer bitmap tests
>    btrfs: refactor extent buffer bitmaps operations
>    btrfs: use write_extent_buffer() to implement
>      write_extent_buffer_*id()
>    btrfs: refactor memcpy_extent_buffer()
>    btrfs: refactor copy_extent_buffer_full()
>    btrfs: call copy_extent_buffer_full() inside
>      btrfs_clone_extent_buffer()
> 
>   fs/btrfs/extent_io.c             | 224 +++++++++++++------------------
>   fs/btrfs/extent_io.h             |  19 ++-
>   fs/btrfs/tests/extent-io-tests.c | 161 ++++++++++++++--------
>   3 files changed, 215 insertions(+), 189 deletions(-)
> 

For the series:
Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 0/6] btrfs: preparation patches for the incoming metadata folio conversion
  2023-07-12  6:37 [PATCH v2 0/6] btrfs: preparation patches for the incoming metadata folio conversion Qu Wenruo
                   ` (6 preceding siblings ...)
  2023-07-12  6:44 ` [PATCH v2 0/6] btrfs: preparation patches for the incoming metadata folio conversion Sweet Tea Dorminy
@ 2023-07-12 16:41 ` Christoph Hellwig
  2023-07-12 23:58   ` Qu Wenruo
  2023-07-13 12:09 ` David Sterba
  8 siblings, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2023-07-12 16:41 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, willy, linux-mm

On Wed, Jul 12, 2023 at 02:37:40PM +0800, Qu Wenruo wrote:
> One of the biggest problem for metadata folio conversion is, we still
> need the current page based solution (or folios with order 0) as a
> fallback solution when we can not get a high order folio.

Do we?  btrfs by default uses a 16k nodesize (order 2 on x86), with
a maximum of 64k (order 4).  IIRC we should be able to get them pretty
reliably.

If not the best thning is to just a virtually contigous allocation as
fallback, i.e. use vm_map_ram.  That's what XFS uses in it's buffer
cache, and it already did so before it stopped to use page cache to
back it's buffer cache, something I plan to do for the btrfs buffer
cache as well, as the page cache algorithms tend to not work very
well for buffer based metadata, never mind that there is an incredible
amount of complex code just working around the interactions.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 0/6] btrfs: preparation patches for the incoming metadata folio conversion
  2023-07-12 16:41 ` Christoph Hellwig
@ 2023-07-12 23:58   ` Qu Wenruo
  2023-07-13 11:16     ` Christoph Hellwig
  2023-07-13 11:26     ` David Sterba
  0 siblings, 2 replies; 29+ messages in thread
From: Qu Wenruo @ 2023-07-12 23:58 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-btrfs, willy, linux-mm



On 2023/7/13 00:41, Christoph Hellwig wrote:
> On Wed, Jul 12, 2023 at 02:37:40PM +0800, Qu Wenruo wrote:
>> One of the biggest problem for metadata folio conversion is, we still
>> need the current page based solution (or folios with order 0) as a
>> fallback solution when we can not get a high order folio.
> 
> Do we?  btrfs by default uses a 16k nodesize (order 2 on x86), with
> a maximum of 64k (order 4).  IIRC we should be able to get them pretty
> reliably.

If it can be done as reliable as order 0 with NOFAIL, I'm totally fine 
with that.

> 
> If not the best thning is to just a virtually contigous allocation as
> fallback, i.e. use vm_map_ram.

That's also what Sweet Tea Dorminy mentioned, and I believe it's the 
correct way to go (as the fallback)

Although my concern is my lack of experience on MM code, and if those 
pages can still be attached to address space (with PagePrivate set).

>  That's what XFS uses in it's buffer
> cache, and it already did so before it stopped to use page cache to
> back it's buffer cache, something I plan to do for the btrfs buffer
> cache as well, as the page cache algorithms tend to not work very
> well for buffer based metadata, never mind that there is an incredible
> amount of complex code just working around the interactions.
Thus we have the preparation patchset as the first step.
It should help no matter what the next step we go.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 0/6] btrfs: preparation patches for the incoming metadata folio conversion
  2023-07-12 23:58   ` Qu Wenruo
@ 2023-07-13 11:16     ` Christoph Hellwig
  2023-07-13 11:26     ` David Sterba
  1 sibling, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2023-07-13 11:16 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Christoph Hellwig, linux-btrfs, willy, linux-mm

On Thu, Jul 13, 2023 at 07:58:17AM +0800, Qu Wenruo wrote:
> > Do we?  btrfs by default uses a 16k nodesize (order 2 on x86), with
> > a maximum of 64k (order 4).  IIRC we should be able to get them pretty
> > reliably.
> 
> If it can be done as reliable as order 0 with NOFAIL, I'm totally fine with
> that.

I think that is the aim.  I'm not entirely sure if we are entirely there
yes, thus the Ccs.

> > If not the best thning is to just a virtually contigous allocation as
> > fallback, i.e. use vm_map_ram.
> 
> That's also what Sweet Tea Dorminy mentioned, and I believe it's the correct
> way to go (as the fallback)
> 
> Although my concern is my lack of experience on MM code, and if those pages
> can still be attached to address space (with PagePrivate set).

At least they could back in the day when XFS did exactly that.  In fact
that was the use case why I added vmap originally back in 2002..


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 0/6] btrfs: preparation patches for the incoming metadata folio conversion
  2023-07-12 23:58   ` Qu Wenruo
  2023-07-13 11:16     ` Christoph Hellwig
@ 2023-07-13 11:26     ` David Sterba
  2023-07-13 11:41       ` Qu Wenruo
  1 sibling, 1 reply; 29+ messages in thread
From: David Sterba @ 2023-07-13 11:26 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Christoph Hellwig, linux-btrfs, willy, linux-mm

On Thu, Jul 13, 2023 at 07:58:17AM +0800, Qu Wenruo wrote:
> On 2023/7/13 00:41, Christoph Hellwig wrote:
> > On Wed, Jul 12, 2023 at 02:37:40PM +0800, Qu Wenruo wrote:
> >> One of the biggest problem for metadata folio conversion is, we still
> >> need the current page based solution (or folios with order 0) as a
> >> fallback solution when we can not get a high order folio.
> > 
> > Do we?  btrfs by default uses a 16k nodesize (order 2 on x86), with
> > a maximum of 64k (order 4).  IIRC we should be able to get them pretty
> > reliably.
> 
> If it can be done as reliable as order 0 with NOFAIL, I'm totally fine 
> with that.

I have mentioned my concerns about the allocation problems with higher
order than 0 in the past. Allocator gives some guarantees about not
failing for certain levels, now it's 1 (mm/fail_page_alloc.c
fail_page_alloc.min_oder = 1).

Per comment in page_alloc.c:rmqueue()

2814         /*
2815          * We most definitely don't want callers attempting to
2816          * allocate greater than order-1 page units with __GFP_NOFAIL.
2817          */
2818         WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));

For allocations with higher order, eg. 4 to match the default 16K nodes,
this increases pressure and can trigger compaction, logic around
PAGE_ALLOC_COSTLY_ORDER which is 3.

> > If not the best thning is to just a virtually contigous allocation as
> > fallback, i.e. use vm_map_ram.

So we can allocate 0-order pages and then map them to virtual addresses,
which needs manipulation of PTE (page table entries), and requires
additional memory. This is what xfs does,
fs/xfs_buf.c:_xfs_buf_map_pages(), needs some care with aliasing memory,
so vm_unmap_aliases() is required and brings some overhead, and at the
end vm_unmap_ram() needs to be called, another overhead but probably
bearable.

With all that in place there would be a contiguous memory range
representing the metadata, so a simple memcpy() can be done. Sure,
with higher overhead and decreased reliability due to potentially
failing memory allocations - for metadata operations.

Compare that to what we have:

Pages are allocated as order 0, so there's much higher chance to get
them under pressure and not increasing the pressure otherwise.  We don't
need any virtual mappings. The cost is that we have to iterate the pages
and do the partial copying ourselves, but this is hidden in helpers.

We have different usage pattern of the metadata buffers than xfs, so
that it does something with vmapped contiguous buffers may not be easily
transferable to btrfs and bring us new problems.

The conversion to folios will happen eventually, though I don't want to
sacrifice reliability just for API use convenience. First the conversion
should be done 1:1 with pages and folios both order 0 before switching
to some higher order allocations hidden behind API calls.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 0/6] btrfs: preparation patches for the incoming metadata folio conversion
  2023-07-13 11:26     ` David Sterba
@ 2023-07-13 11:41       ` Qu Wenruo
  2023-07-13 11:49         ` David Sterba
  0 siblings, 1 reply; 29+ messages in thread
From: Qu Wenruo @ 2023-07-13 11:41 UTC (permalink / raw)
  To: dsterba, Qu Wenruo; +Cc: Christoph Hellwig, linux-btrfs, willy, linux-mm



On 2023/7/13 19:26, David Sterba wrote:
> On Thu, Jul 13, 2023 at 07:58:17AM +0800, Qu Wenruo wrote:
>> On 2023/7/13 00:41, Christoph Hellwig wrote:
>>> On Wed, Jul 12, 2023 at 02:37:40PM +0800, Qu Wenruo wrote:
>>>> One of the biggest problem for metadata folio conversion is, we still
>>>> need the current page based solution (or folios with order 0) as a
>>>> fallback solution when we can not get a high order folio.
>>>
>>> Do we?  btrfs by default uses a 16k nodesize (order 2 on x86), with
>>> a maximum of 64k (order 4).  IIRC we should be able to get them pretty
>>> reliably.
>>
>> If it can be done as reliable as order 0 with NOFAIL, I'm totally fine
>> with that.
>
> I have mentioned my concerns about the allocation problems with higher
> order than 0 in the past. Allocator gives some guarantees about not
> failing for certain levels, now it's 1 (mm/fail_page_alloc.c
> fail_page_alloc.min_oder = 1).
>
> Per comment in page_alloc.c:rmqueue()
>
> 2814         /*
> 2815          * We most definitely don't want callers attempting to
> 2816          * allocate greater than order-1 page units with __GFP_NOFAIL.
> 2817          */
> 2818         WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
>
> For allocations with higher order, eg. 4 to match the default 16K nodes,
> this increases pressure and can trigger compaction, logic around
> PAGE_ALLOC_COSTLY_ORDER which is 3.
>
>>> If not the best thning is to just a virtually contigous allocation as
>>> fallback, i.e. use vm_map_ram.
>
> So we can allocate 0-order pages and then map them to virtual addresses,
> which needs manipulation of PTE (page table entries), and requires
> additional memory. This is what xfs does,
> fs/xfs_buf.c:_xfs_buf_map_pages(), needs some care with aliasing memory,
> so vm_unmap_aliases() is required and brings some overhead, and at the
> end vm_unmap_ram() needs to be called, another overhead but probably
> bearable.
>
> With all that in place there would be a contiguous memory range
> representing the metadata, so a simple memcpy() can be done. Sure,
> with higher overhead and decreased reliability due to potentially
> failing memory allocations - for metadata operations.
>
> Compare that to what we have:
>
> Pages are allocated as order 0, so there's much higher chance to get
> them under pressure and not increasing the pressure otherwise.  We don't
> need any virtual mappings. The cost is that we have to iterate the pages
> and do the partial copying ourselves, but this is hidden in helpers.
>
> We have different usage pattern of the metadata buffers than xfs, so
> that it does something with vmapped contiguous buffers may not be easily
> transferable to btrfs and bring us new problems.
>
> The conversion to folios will happen eventually, though I don't want to
> sacrifice reliability just for API use convenience. First the conversion
> should be done 1:1 with pages and folios both order 0 before switching
> to some higher order allocations hidden behind API calls.

In fact, I have another solution as a middle ground before adding folio
into the situation.

   Check if the pages are already physically continuous.
   If so, everything can go without any cross-page handling.

   If not, we can either keep the current cross-page handling, or migrate
   to the virtually continuous mapped pages.

Currently we already have around 50~66% of eb pages are already
allocated physically continuous.

If we can just reduce the cross page handling for more than half of the
ebs, it's already a win.

For the vmapped pages, I'm not sure about the overhead, but I can try to
go that path and check the result.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 0/6] btrfs: preparation patches for the incoming metadata folio conversion
  2023-07-13 11:41       ` Qu Wenruo
@ 2023-07-13 11:49         ` David Sterba
  0 siblings, 0 replies; 29+ messages in thread
From: David Sterba @ 2023-07-13 11:49 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: dsterba, Qu Wenruo, Christoph Hellwig, linux-btrfs, willy,
	linux-mm

On Thu, Jul 13, 2023 at 07:41:53PM +0800, Qu Wenruo wrote:
> On 2023/7/13 19:26, David Sterba wrote:
> > On Thu, Jul 13, 2023 at 07:58:17AM +0800, Qu Wenruo wrote:
> >> On 2023/7/13 00:41, Christoph Hellwig wrote:
> >>> On Wed, Jul 12, 2023 at 02:37:40PM +0800, Qu Wenruo wrote:
> >>>> One of the biggest problem for metadata folio conversion is, we still
> >>>> need the current page based solution (or folios with order 0) as a
> >>>> fallback solution when we can not get a high order folio.
> >>>
> >>> Do we?  btrfs by default uses a 16k nodesize (order 2 on x86), with
> >>> a maximum of 64k (order 4).  IIRC we should be able to get them pretty
> >>> reliably.
> >>
> >> If it can be done as reliable as order 0 with NOFAIL, I'm totally fine
> >> with that.
> >
> > I have mentioned my concerns about the allocation problems with higher
> > order than 0 in the past. Allocator gives some guarantees about not
> > failing for certain levels, now it's 1 (mm/fail_page_alloc.c
> > fail_page_alloc.min_oder = 1).
> >
> > Per comment in page_alloc.c:rmqueue()
> >
> > 2814         /*
> > 2815          * We most definitely don't want callers attempting to
> > 2816          * allocate greater than order-1 page units with __GFP_NOFAIL.
> > 2817          */
> > 2818         WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
> >
> > For allocations with higher order, eg. 4 to match the default 16K nodes,
> > this increases pressure and can trigger compaction, logic around
> > PAGE_ALLOC_COSTLY_ORDER which is 3.
> >
> >>> If not the best thning is to just a virtually contigous allocation as
> >>> fallback, i.e. use vm_map_ram.
> >
> > So we can allocate 0-order pages and then map them to virtual addresses,
> > which needs manipulation of PTE (page table entries), and requires
> > additional memory. This is what xfs does,
> > fs/xfs_buf.c:_xfs_buf_map_pages(), needs some care with aliasing memory,
> > so vm_unmap_aliases() is required and brings some overhead, and at the
> > end vm_unmap_ram() needs to be called, another overhead but probably
> > bearable.
> >
> > With all that in place there would be a contiguous memory range
> > representing the metadata, so a simple memcpy() can be done. Sure,
> > with higher overhead and decreased reliability due to potentially
> > failing memory allocations - for metadata operations.
> >
> > Compare that to what we have:
> >
> > Pages are allocated as order 0, so there's much higher chance to get
> > them under pressure and not increasing the pressure otherwise.  We don't
> > need any virtual mappings. The cost is that we have to iterate the pages
> > and do the partial copying ourselves, but this is hidden in helpers.
> >
> > We have different usage pattern of the metadata buffers than xfs, so
> > that it does something with vmapped contiguous buffers may not be easily
> > transferable to btrfs and bring us new problems.
> >
> > The conversion to folios will happen eventually, though I don't want to
> > sacrifice reliability just for API use convenience. First the conversion
> > should be done 1:1 with pages and folios both order 0 before switching
> > to some higher order allocations hidden behind API calls.
> 
> In fact, I have another solution as a middle ground before adding folio
> into the situation.
> 
>    Check if the pages are already physically continuous.
>    If so, everything can go without any cross-page handling.
> 
>    If not, we can either keep the current cross-page handling, or migrate
>    to the virtually continuous mapped pages.
> 
> Currently we already have around 50~66% of eb pages are already
> allocated physically continuous.

Memory fragmentation becomes problem over time on systems running for
weeks/months, then the contiguous ranges will became scarce. So if you
measure that on a system with a lot of memory and for a short time then
of course this will reach high rate of contiguous pages.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 4/6] btrfs: refactor memcpy_extent_buffer()
  2023-07-12  6:37 ` [PATCH v2 4/6] btrfs: refactor memcpy_extent_buffer() Qu Wenruo
@ 2023-07-13 11:51   ` David Sterba
  0 siblings, 0 replies; 29+ messages in thread
From: David Sterba @ 2023-07-13 11:51 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

For refactoring patches, please try to describe what exactly or how is
being refactored. In this case it's the main loop, so I've updated the
subjects in the patches.

On Wed, Jul 12, 2023 at 02:37:44PM +0800, Qu Wenruo wrote:
> [BACKGROUND]
> Currently memcpy_extent_buffer() goes a loop where it would stop at
> any page boundary inside [dst_offset, dst_offset + len) or [src_offset,
> src_offset + len).
> 
> This is mostly allowing us to go copy_pages(), but if we're going folio
> we will need to handle multi-page (the old behavior) or single folio
> (the new optimization).
> 
> The current code would be a burden for future changes.
> 
> [ENHANCEMENT]
> Instead of sticking with copy_pages(), here we utilize
> write_extent_buffer() to handle writing into the dst range.
> 
> Now we only need to handle the page boundaries inside the source range,
> making later switch to folio much easier.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent_io.c | 36 +++++++++++++-----------------------
>  1 file changed, 13 insertions(+), 23 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 4e252fd7b78a..4db90ede8219 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -4183,6 +4183,8 @@ void write_extent_buffer(const struct extent_buffer *eb, const void *srcv,
>  	struct page *page;
>  	char *kaddr;
>  	char *src = (char *)srcv;
> +	/* For unmapped (dummy) ebs, no need to check their uptodate status. */
> +	bool check_uptodate = !test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags);

const bool

>  	unsigned long i = get_eb_page_index(start);
>  
>  	WARN_ON(test_bit(EXTENT_BUFFER_NO_CHECK, &eb->bflags));
> @@ -4194,7 +4196,8 @@ void write_extent_buffer(const struct extent_buffer *eb, const void *srcv,
>  
>  	while (len > 0) {
>  		page = eb->pages[i];
> -		assert_eb_page_uptodate(eb, page);
> +		if (check_uptodate)
> +			assert_eb_page_uptodate(eb, page);
>  
>  		cur = min(len, PAGE_SIZE - offset);
>  		kaddr = page_address(page);
> @@ -4462,34 +4465,21 @@ void memcpy_extent_buffer(const struct extent_buffer *dst,
>  			  unsigned long dst_offset, unsigned long src_offset,
>  			  unsigned long len)
>  {
> -	size_t cur;
> -	size_t dst_off_in_page;
> -	size_t src_off_in_page;
> -	unsigned long dst_i;
> -	unsigned long src_i;
> +	unsigned long cur = src_offset;
>  
>  	if (check_eb_range(dst, dst_offset, len) ||
>  	    check_eb_range(dst, src_offset, len))
>  		return;
>  
> -	while (len > 0) {
> -		dst_off_in_page = get_eb_offset_in_page(dst, dst_offset);
> -		src_off_in_page = get_eb_offset_in_page(dst, src_offset);
> +	while (cur < src_offset + len) {
> +		int index = get_eb_page_index(cur);

get_eb_page_index() returns unsigned long, so this should be unified.
Fixed in the commit.

> +		unsigned long offset = get_eb_offset_in_page(dst, cur);
> +		unsigned long cur_len = min(src_offset + len - cur, PAGE_SIZE - offset);
> +		unsigned long offset_to_start = cur - src_offset;
> +		void *src_addr = page_address(dst->pages[index]) + offset;

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 2/6] btrfs: refactor extent buffer bitmaps operations
  2023-07-12  6:37 ` [PATCH v2 2/6] btrfs: refactor extent buffer bitmaps operations Qu Wenruo
@ 2023-07-13 11:53   ` David Sterba
  0 siblings, 0 replies; 29+ messages in thread
From: David Sterba @ 2023-07-13 11:53 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Wed, Jul 12, 2023 at 02:37:42PM +0800, Qu Wenruo wrote:
> [BACKGROUND]
> Currently we handle extent bitmaps manually in
> extent_buffer_bitmap_set() and extent_buffer_bitmap_clear().
> 
> Although with various helper like eb_bitmap_offset() it's still a little
> messy to read.
> The code seems to be a copy of bitmap_set(), but with all the cross-page
> handling embedded into the code.
> 
> [ENHANCEMENT]
> This patch would enhance the readability by introducing two helpers:
> 
> - memset_extent_buffer()
>   To handle the byte aligned range, thus all the cross-page handling is
>   done there.
> 
> - extent_buffer_get_byte()
>   This for the first and the last byte operation, which only needs to
>   grab one byte, thus no need for any cross-page handling.
> 
> So we can split both extent_buffer_bitmap_set() and
> extent_buffer_bitmap_clear() into 3 parts:
> 
> - Handle the first byte
>   If the range fits inside the first byte, we can exit early.
> 
> - Handle the byte aligned part
>   This is the part which can have cross-page operations, and it would
>   be handled by memset_extent_buffer().
> 
> - Handle the last byte
> 
> This refactor does not only make the code a little easier to read, but
> also make later folio/page switch much easier, as the switch only needs
> to be done inside memset_extent_buffer() and extent_buffer_get_byte().
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent_io.c | 141 +++++++++++++++++++++----------------------
>  1 file changed, 68 insertions(+), 73 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index a845a90d46f7..6a7abcbe6bec 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -4229,32 +4229,30 @@ void write_extent_buffer(const struct extent_buffer *eb, const void *srcv,
>  	}
>  }
>  
> +static void memset_extent_buffer(const struct extent_buffer *eb, int c,
> +				 unsigned long start, unsigned long len)
> +{
> +	unsigned long cur = start;
> +
> +	while (cur < start + len) {
> +		int index = get_eb_page_index(cur);
> +		int offset = get_eb_offset_in_page(eb, cur);
> +		int cur_len = min(start + len - cur, PAGE_SIZE - offset);

All three should be unsigned long. Fixed in the commit.

> +		struct page *page = eb->pages[index];
> +
> +		assert_eb_page_uptodate(eb, page);
> +		memset(page_address(page) + offset, c, cur_len);
> +
> +		cur += cur_len;
> +	}
> +}
> +
>  void memzero_extent_buffer(const struct extent_buffer *eb, unsigned long start,
>  		unsigned long len)
>  {
> -	size_t cur;
> -	size_t offset;
> -	struct page *page;
> -	char *kaddr;
> -	unsigned long i = get_eb_page_index(start);
> -
>  	if (check_eb_range(eb, start, len))
>  		return;
> -
> -	offset = get_eb_offset_in_page(eb, start);
> -
> -	while (len > 0) {
> -		page = eb->pages[i];
> -		assert_eb_page_uptodate(eb, page);
> -
> -		cur = min(len, PAGE_SIZE - offset);
> -		kaddr = page_address(page);
> -		memset(kaddr + offset, 0, cur);
> -
> -		len -= cur;
> -		offset = 0;
> -		i++;
> -	}
> +	return memset_extent_buffer(eb, 0, start, len);
>  }
>  
>  void copy_extent_buffer_full(const struct extent_buffer *dst,
> @@ -4371,6 +4369,15 @@ int extent_buffer_test_bit(const struct extent_buffer *eb, unsigned long start,
>  	return 1U & (kaddr[offset] >> (nr & (BITS_PER_BYTE - 1)));
>  }
>  
> +static u8 *extent_buffer_get_byte(const struct extent_buffer *eb, unsigned long bytenr)
> +{
> +	unsigned long index = get_eb_page_index(bytenr);
> +
> +	if (check_eb_range(eb, bytenr, 1))
> +		return NULL;
> +	return page_address(eb->pages[index]) + get_eb_offset_in_page(eb, bytenr);
> +}
> +
>  /*
>   * Set an area of a bitmap to 1.
>   *
> @@ -4382,35 +4389,29 @@ int extent_buffer_test_bit(const struct extent_buffer *eb, unsigned long start,
>  void extent_buffer_bitmap_set(const struct extent_buffer *eb, unsigned long start,
>  			      unsigned long pos, unsigned long len)
>  {
> +	unsigned int first_byte = start + BIT_BYTE(pos);
> +	unsigned int last_byte = start + BIT_BYTE(pos + len - 1);
> +	bool same_byte = (first_byte == last_byte);

const bool

> +	u8 mask = BITMAP_FIRST_BYTE_MASK(pos);
>  	u8 *kaddr;
> -	struct page *page;
> -	unsigned long i;
> -	size_t offset;
> -	const unsigned int size = pos + len;
> -	int bits_to_set = BITS_PER_BYTE - (pos % BITS_PER_BYTE);
> -	u8 mask_to_set = BITMAP_FIRST_BYTE_MASK(pos);
>  
> -	eb_bitmap_offset(eb, start, pos, &i, &offset);
> -	page = eb->pages[i];
> -	assert_eb_page_uptodate(eb, page);
> -	kaddr = page_address(page);
> +	if (same_byte)
> +		mask &= BITMAP_LAST_BYTE_MASK(pos + len);
>  
> -	while (len >= bits_to_set) {
> -		kaddr[offset] |= mask_to_set;
> -		len -= bits_to_set;
> -		bits_to_set = BITS_PER_BYTE;
> -		mask_to_set = ~0;
> -		if (++offset >= PAGE_SIZE && len > 0) {
> -			offset = 0;
> -			page = eb->pages[++i];
> -			assert_eb_page_uptodate(eb, page);
> -			kaddr = page_address(page);
> -		}
> -	}
> -	if (len) {
> -		mask_to_set &= BITMAP_LAST_BYTE_MASK(size);
> -		kaddr[offset] |= mask_to_set;
> -	}
> +	/* Handle the first byte. */
> +	kaddr = extent_buffer_get_byte(eb, first_byte);
> +	*kaddr |= mask;
> +	if (same_byte)
> +		return;
> +
> +	/* Handle the byte aligned part. */
> +	ASSERT(first_byte + 1 <= last_byte);
> +	memset_extent_buffer(eb, 0xff, first_byte + 1,
> +			     last_byte - first_byte - 1);
> +
> +	/* Handle the last byte. */
> +	kaddr = extent_buffer_get_byte(eb, last_byte);
> +	*kaddr |= BITMAP_LAST_BYTE_MASK(pos + len);
>  }
>  
>  
> @@ -4426,35 +4427,29 @@ void extent_buffer_bitmap_clear(const struct extent_buffer *eb,
>  				unsigned long start, unsigned long pos,
>  				unsigned long len)
>  {
> +	int first_byte = start + BIT_BYTE(pos);
> +	int last_byte = start + BIT_BYTE(pos + len - 1);
> +	bool same_byte = (first_byte == last_byte);

const bool

> +	u8 mask = BITMAP_FIRST_BYTE_MASK(pos);
>  	u8 *kaddr;
> -	struct page *page;
> -	unsigned long i;
> -	size_t offset;
> -	const unsigned int size = pos + len;
> -	int bits_to_clear = BITS_PER_BYTE - (pos % BITS_PER_BYTE);
> -	u8 mask_to_clear = BITMAP_FIRST_BYTE_MASK(pos);

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 6/6] btrfs: call copy_extent_buffer_full() inside btrfs_clone_extent_buffer()
  2023-07-12  6:37 ` [PATCH v2 6/6] btrfs: call copy_extent_buffer_full() inside btrfs_clone_extent_buffer() Qu Wenruo
@ 2023-07-13 11:55   ` David Sterba
  0 siblings, 0 replies; 29+ messages in thread
From: David Sterba @ 2023-07-13 11:55 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

For the subject, please try to describe it in more human readable way,
not just saying what the code does. Sometimes the function names explain
it best but here it's moving page copying out of the loop and does it in
one go, so I've changed the changelog like that.

On Wed, Jul 12, 2023 at 02:37:46PM +0800, Qu Wenruo wrote:
> Function btrfs_clone_extent_buffer() is calling of copy_page() directly.
> 
> To make later migration for folio easier, just call
> copy_extent_buffer_full() instead.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent_io.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index c84e64181acb..7f0a532de645 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -3285,8 +3285,8 @@ struct extent_buffer *btrfs_clone_extent_buffer(const struct extent_buffer *src)
>  			return NULL;
>  		}
>  		WARN_ON(PageDirty(p));
> -		copy_page(page_address(p), page_address(src->pages[i]));
>  	}
> +	copy_extent_buffer_full(new, src);
>  	set_extent_buffer_uptodate(new);
>  
>  	return new;
> -- 
> 2.41.0

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 0/6] btrfs: preparation patches for the incoming metadata folio conversion
  2023-07-12  6:37 [PATCH v2 0/6] btrfs: preparation patches for the incoming metadata folio conversion Qu Wenruo
                   ` (7 preceding siblings ...)
  2023-07-12 16:41 ` Christoph Hellwig
@ 2023-07-13 12:09 ` David Sterba
  2023-07-13 16:39   ` David Sterba
  8 siblings, 1 reply; 29+ messages in thread
From: David Sterba @ 2023-07-13 12:09 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Wed, Jul 12, 2023 at 02:37:40PM +0800, Qu Wenruo wrote:
> [CHANGELOG]
> v2:
> - Define write_extent_buffer_fsid/chunk_tree_uuid() as inline helpers
> 
> [BACKGROUND]
> 
> Recently I'm checking on the feasibility on converting metadata handling
> to go a folio based solution.
> 
> The best part of using a single folio for metadata is, we can get rid of
> the complexity of cross-page handling, everything would be just a single
> memory operation on a continuous memory range.
> 
> [PITFALLS]
> 
> One of the biggest problem for metadata folio conversion is, we still
> need the current page based solution (or folios with order 0) as a
> fallback solution when we can not get a high order folio.
> 
> In that case, there would be a hell to handle the four different
> combinations (folio/folio, folio/page, page/folio, page/page) for extent
> buffer helpers involving two extent buffers.
> 
> Although there are some new ideas on how to handle metadata memory (e.g.
> go full vmallocated memory), reducing the open-coded memory handling for
> metadata should always be a good start point.
> 
> [OBJECTIVE]
> 
> So this patchset is the preparation to reduce direct page operations for
> metadata.
> 
> The patchset would do this mostly by concentrating the operations to use
> the common helper, write_extent_buffer() and read_extent_buffer().
> 
> For bitmap operations it's much complex, thus this patchset refactor it
> completely to go a 3 part solution:
> 
> - Handle the first byte
> - Handle the byte aligned ranges
> - Handle the last byte
> 
> This needs more complex testing (which I failed several times during
> development) to prevent regression.
> 
> Finally there is only one function which can not be properly migrated,
> memmove_extent_buffer(), which has to use memmove() calls, thus must go
> per-page mapping handling.
> 
> Thankfully if we go folio in the end, the folio based handling would
> just be a single memmove(), thus it won't be too much burden.
> 
> 
> Qu Wenruo (6):
>   btrfs: tests: enhance extent buffer bitmap tests
>   btrfs: refactor extent buffer bitmaps operations
>   btrfs: use write_extent_buffer() to implement
>     write_extent_buffer_*id()
>   btrfs: refactor memcpy_extent_buffer()
>   btrfs: refactor copy_extent_buffer_full()
>   btrfs: call copy_extent_buffer_full() inside
>     btrfs_clone_extent_buffer()

Added to misc-next, with some fixups, thanks. How far we'll get with the
folio conversions or other page contiguous improvements depends but this
patchset is fairly independent.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 0/6] btrfs: preparation patches for the incoming metadata folio conversion
  2023-07-13 12:09 ` David Sterba
@ 2023-07-13 16:39   ` David Sterba
  2023-07-13 21:30     ` Qu Wenruo
  0 siblings, 1 reply; 29+ messages in thread
From: David Sterba @ 2023-07-13 16:39 UTC (permalink / raw)
  To: David Sterba; +Cc: Qu Wenruo, linux-btrfs

On Thu, Jul 13, 2023 at 02:09:35PM +0200, David Sterba wrote:
> On Wed, Jul 12, 2023 at 02:37:40PM +0800, Qu Wenruo wrote:
> Added to misc-next

And removed again, it explodes right before the first test:

BTRFS: device fsid 4e9cf0f7-cdc4-4e38-9e59-de4d88122ee9 devid 1 transid 6 /dev/vdb scanned by mkfs.btrfs (13714)
BTRFS info (device vdb): using crc32c (crc32c-generic) checksum algorithm
BTRFS info (device vdb): using free space tree
BTRFS info (device vdb): auto enabling async discard
BTRFS info (device vdb): checking UUID tree
------------[ cut here ]------------
WARNING: CPU: 3 PID: 13739 at fs/btrfs/extent-tree.c:3026 __btrfs_free_extent+0x9ac/0x1280 [btrfs]
Modules linked in: btrfs blake2b_generic libcrc32c xor lzo_compress lzo_decompress raid6_pq zstd_decompress zstd_compress xxhash zstd_common loop
CPU: 3 PID: 13739 Comm: umount Not tainted 6.5.0-rc1-default+ #2126
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014
RIP: 0010:__btrfs_free_extent+0x9ac/0x1280 [btrfs]
RSP: 0018:ffff8880031c78a8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88802ec71708 RCX: ffffffffc065e9ba
RDX: dffffc0000000000 RSI: ffffffffc063f610 RDI: ffff888026511130
RBP: ffff888002734000 R08: 0000000000000000 R09: ffffed1000638eff
R10: ffff8880031c77ff R11: 0000000000000001 R12: 0000000000000001
R13: ffff8880058522b8 R14: ffff8880265110e0 R15: 0000000001d24000
FS:  00007fb5c22e9800(0000) GS:ffff88806d200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2b194fcfc4 CR3: 000000002f32a000 CR4: 00000000000006a0
Call Trace:
 <TASK>
 ? __warn+0xa1/0x200
 ? __btrfs_free_extent+0x9ac/0x1280 [btrfs]
 ? report_bug+0x207/0x270
 ? handle_bug+0x65/0x90
 ? exc_invalid_op+0x13/0x40
 ? asm_exc_invalid_op+0x16/0x20
 ? __btrfs_free_extent+0x39a/0x1280 [btrfs]
 ? unlock_up+0x160/0x370 [btrfs]
 ? __btrfs_free_extent+0x9ac/0x1280 [btrfs]
 ? __btrfs_free_extent+0x39a/0x1280 [btrfs]
 ? lookup_extent_backref+0xd0/0xd0 [btrfs]
 ? __lock_release.isra.0+0x14e/0x510
 ? reacquire_held_locks+0x280/0x280
 run_delayed_tree_ref+0x10b/0x2d0 [btrfs]
 btrfs_run_delayed_refs_for_head+0x630/0x960 [btrfs]
 __btrfs_run_delayed_refs+0xce/0x160 [btrfs]
 btrfs_run_delayed_refs+0xe7/0x2a0 [btrfs]
 commit_cowonly_roots+0x3f1/0x4c0 [btrfs]
 ? trace_btrfs_transaction_commit+0xd0/0xd0 [btrfs]
 ? btrfs_commit_transaction+0xbbe/0x17e0 [btrfs]
 btrfs_commit_transaction+0xc13/0x17e0 [btrfs]
 ? cleanup_transaction+0x640/0x640 [btrfs]
 ? btrfs_attach_transaction_barrier+0x1e/0x50 [btrfs]
 sync_filesystem+0xd3/0x100
 generic_shutdown_super+0x44/0x1f0
 kill_anon_super+0x1e/0x40
 btrfs_kill_super+0x25/0x30 [btrfs]
 deactivate_locked_super+0x4c/0xc0
 cleanup_mnt+0x13a/0x1f0
 task_work_run+0xf2/0x170
 ? task_work_cancel+0x20/0x20
 ? mark_held_locks+0x1a/0x80
 exit_to_user_mode_prepare+0x16c/0x170
 syscall_exit_to_user_mode+0x19/0x50
 do_syscall_64+0x49/0x90
 entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7fb5c250f4bb
RSP: 002b:00007ffeee578518 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
RAX: 0000000000000000 RBX: 000055bf227429f0 RCX: 00007fb5c250f4bb
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055bf22742c20
RBP: 000055bf22742b08 R08: 0000000000000073 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000055bf22742c20 R14: 0000000000000000 R15: 00007ffeee57b084
 </TASK>
irq event stamp: 11109
hardirqs last  enabled at (11119): [<ffffffff841678a2>] __up_console_sem+0x52/0x60
hardirqs last disabled at (11130): [<ffffffff84167887>] __up_console_sem+0x37/0x60
softirqs last  enabled at (11084): [<ffffffff84cc910b>] __do_softirq+0x31b/0x5ae
softirqs last disabled at (11079): [<ffffffff840b5b09>] irq_exit_rcu+0xa9/0x100
---[ end trace 0000000000000000 ]---
------------[ cut here ]------------
BTRFS: Transaction aborted (error -117)
WARNING: CPU: 3 PID: 13739 at fs/btrfs/extent-tree.c:3027 __btrfs_free_extent+0x10ff/0x1280 [btrfs]
Modules linked in: btrfs blake2b_generic libcrc32c xor lzo_compress lzo_decompress raid6_pq zstd_decompress zstd_compress xxhash zstd_common loop
CPU: 3 PID: 13739 Comm: umount Tainted: G        W          6.5.0-rc1-default+ #2126
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014
RIP: 0010:__btrfs_free_extent+0x10ff/0x1280 [btrfs]
RSP: 0018:ffff8880031c78a8 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff88802ec71708 RCX: 0000000000000000
RDX: 0000000000000002 RSI: ffffffff841007a8 RDI: ffffffff87c9e0e0
RBP: ffff888002734000 R08: 0000000000000001 R09: ffffed1000638eba
R10: ffff8880031c75d7 R11: 0000000000000001 R12: 0000000000000001
R13: ffff8880058522b8 R14: ffff8880265110e0 R15: 0000000001d24000
FS:  00007fb5c22e9800(0000) GS:ffff88806d200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2b194fcfc4 CR3: 000000002f32a000 CR4: 00000000000006a0
Call Trace:
 <TASK>
 ? __warn+0xa1/0x200
 ? __btrfs_free_extent+0x10ff/0x1280 [btrfs]
 ? report_bug+0x207/0x270
 ? handle_bug+0x65/0x90
 ? exc_invalid_op+0x13/0x40
 ? asm_exc_invalid_op+0x16/0x20
 ? preempt_count_sub+0x18/0xc0
 ? __btrfs_free_extent+0x10ff/0x1280 [btrfs]
 ? __btrfs_free_extent+0x10ff/0x1280 [btrfs]
 ? lookup_extent_backref+0xd0/0xd0 [btrfs]
 ? __lock_release.isra.0+0x14e/0x510
 ? reacquire_held_locks+0x280/0x280
 run_delayed_tree_ref+0x10b/0x2d0 [btrfs]
 btrfs_run_delayed_refs_for_head+0x630/0x960 [btrfs]
 __btrfs_run_delayed_refs+0xce/0x160 [btrfs]
 btrfs_run_delayed_refs+0xe7/0x2a0 [btrfs]
 commit_cowonly_roots+0x3f1/0x4c0 [btrfs]
 ? trace_btrfs_transaction_commit+0xd0/0xd0 [btrfs]
 ? btrfs_commit_transaction+0xbbe/0x17e0 [btrfs]
 btrfs_commit_transaction+0xc13/0x17e0 [btrfs]
 ? cleanup_transaction+0x640/0x640 [btrfs]
 ? btrfs_attach_transaction_barrier+0x1e/0x50 [btrfs]
 sync_filesystem+0xd3/0x100
 generic_shutdown_super+0x44/0x1f0
 kill_anon_super+0x1e/0x40
 btrfs_kill_super+0x25/0x30 [btrfs]
 deactivate_locked_super+0x4c/0xc0
 cleanup_mnt+0x13a/0x1f0
 task_work_run+0xf2/0x170
 ? task_work_cancel+0x20/0x20
 ? mark_held_locks+0x1a/0x80
 exit_to_user_mode_prepare+0x16c/0x170
 syscall_exit_to_user_mode+0x19/0x50
 do_syscall_64+0x49/0x90
 entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7fb5c250f4bb
RSP: 002b:00007ffeee578518 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
RAX: 0000000000000000 RBX: 000055bf227429f0 RCX: 00007fb5c250f4bb
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055bf22742c20
RBP: 000055bf22742b08 R08: 0000000000000073 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000055bf22742c20 R14: 0000000000000000 R15: 00007ffeee57b084
 </TASK>
irq event stamp: 11925
hardirqs last  enabled at (11935): [<ffffffff841678a2>] __up_console_sem+0x52/0x60
hardirqs last disabled at (11946): [<ffffffff84167887>] __up_console_sem+0x37/0x60
softirqs last  enabled at (11084): [<ffffffff84cc910b>] __do_softirq+0x31b/0x5ae
softirqs last disabled at (11079): [<ffffffff840b5b09>] irq_exit_rcu+0xa9/0x100
---[ end trace 0000000000000000 ]---
BTRFS: error (device vdb: state A) in __btrfs_free_extent:3027: errno=-117 Filesystem corrupted
BTRFS info (device vdb: state EA): forced readonly
BTRFS info (device vdb: state EA): leaf 30474240 gen 7 total ptrs 16 free space 15382 owner 2
BTRFS info (device vdb: state EA): refs 3 lock_owner 13739 current 13739
	item 0 key (13631488 192 8388608) itemoff 16259 itemsize 24
		block group used 0 chunk_objectid 256 flags 1
	item 1 key (22020096 192 8388608) itemoff 16235 itemsize 24
		block group used 16384 chunk_objectid 256 flags 34
	item 2 key (22036480 169 0) itemoff 16202 itemsize 33
		extent refs 1 gen 6 flags 2
		ref#0: tree block backref root 3
	item 3 key (30408704 169 0) itemoff 16169 itemsize 33
		extent refs 1 gen 6 flags 2
		ref#0: tree block backref root 2
	item 4 key (30408704 192 268435456) itemoff 16145 itemsize 24
		block group used 131072 chunk_objectid 256 flags 36
	item 5 key (30425088 169 0) itemoff 16112 itemsize 33
		extent refs 1 gen 5 flags 2
		ref#0: tree block backref root 5
	item 6 key (30441472 169 0) itemoff 16079 itemsize 33
		extent refs 1 gen 7 flags 2
		ref#0: tree block backref root 1
	item 7 key (30457856 169 0) itemoff 16046 itemsize 33
		extent refs 1 gen 7 flags 2
		ref#0: tree block backref root 4
	item 8 key (30474240 169 0) itemoff 16013 itemsize 33
		extent refs 1 gen 7 flags 2
		ref#0: tree block backref root 2
	item 9 key (30490624 169 0) itemoff 15980 itemsize 33
		extent refs 1 gen 5 flags 2
		ref#0: tree block backref root 7
	item 10 key (30507008 169 0) itemoff 15947 itemsize 33
		extent refs 1 gen 7 flags 2
		ref#0: tree block backref root 10
	item 11 key (30523392 169 0) itemoff 15914 itemsize 33
		extent refs 1 gen 5 flags 2
		ref#0: tree block backref root 7
	item 12 key (30539776 169 0) itemoff 15881 itemsize 33
		extent refs 1 gen 5 flags 2
		ref#0: tree block backref root 7
	item 13 key (30556160 169 0) itemoff 15848 itemsize 33
		extent refs 1 gen 5 flags 2
		ref#0: tree block backref root 7
	item 14 key (30572544 169 0) itemoff 15815 itemsize 33
		extent refs 1 gen 5 flags 2
		ref#0: tree block backref root 7
	item 15 key (30588928 169 0) itemoff 15782 itemsize 33
		extent refs 1 gen 5 flags 2
		ref#0: tree block backref root 7
BTRFS critical (device vdb: state EA): unable to find ref byte nr 30556160 parent 0 root 4 owner 0 offset 0 slot 14
BTRFS error (device vdb: state EA): failed to run delayed ref for logical 30556160 num_bytes 16384 type 176 action 2 ref_mod 1: -2
BTRFS: error (device vdb: state EA) in btrfs_run_delayed_refs:2102: errno=-2 No such entry
BTRFS warning (device vdb: state EA): Skipping commit of aborted transaction.
BTRFS: error (device vdb: state EA) in cleanup_transaction:1977: errno=-2 No such entry

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 0/6] btrfs: preparation patches for the incoming metadata folio conversion
  2023-07-13 16:39   ` David Sterba
@ 2023-07-13 21:30     ` Qu Wenruo
  2023-07-13 22:03       ` David Sterba
  0 siblings, 1 reply; 29+ messages in thread
From: Qu Wenruo @ 2023-07-13 21:30 UTC (permalink / raw)
  To: dsterba; +Cc: Qu Wenruo, linux-btrfs



On 2023/7/14 00:39, David Sterba wrote:
> On Thu, Jul 13, 2023 at 02:09:35PM +0200, David Sterba wrote:
>> On Wed, Jul 12, 2023 at 02:37:40PM +0800, Qu Wenruo wrote:
>> Added to misc-next
>
> And removed again, it explodes right before the first test:

Weird, it passed my local btrfs/* tests.


>
> BTRFS: device fsid 4e9cf0f7-cdc4-4e38-9e59-de4d88122ee9 devid 1 transid 6 /dev/vdb scanned by mkfs.btrfs (13714)
> BTRFS info (device vdb): using crc32c (crc32c-generic) checksum algorithm
> BTRFS info (device vdb): using free space tree
> BTRFS info (device vdb): auto enabling async discard
> BTRFS info (device vdb): checking UUID tree
> ------------[ cut here ]------------
> WARNING: CPU: 3 PID: 13739 at fs/btrfs/extent-tree.c:3026 __btrfs_free_extent+0x9ac/0x1280 [btrfs]
> Modules linked in: btrfs blake2b_generic libcrc32c xor lzo_compress lzo_decompress raid6_pq zstd_decompress zstd_compress xxhash zstd_common loop
> CPU: 3 PID: 13739 Comm: umount Not tainted 6.5.0-rc1-default+ #2126
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014
> RIP: 0010:__btrfs_free_extent+0x9ac/0x1280 [btrfs]
> RSP: 0018:ffff8880031c78a8 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffff88802ec71708 RCX: ffffffffc065e9ba
> RDX: dffffc0000000000 RSI: ffffffffc063f610 RDI: ffff888026511130
> RBP: ffff888002734000 R08: 0000000000000000 R09: ffffed1000638eff
> R10: ffff8880031c77ff R11: 0000000000000001 R12: 0000000000000001
> R13: ffff8880058522b8 R14: ffff8880265110e0 R15: 0000000001d24000
> FS:  00007fb5c22e9800(0000) GS:ffff88806d200000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f2b194fcfc4 CR3: 000000002f32a000 CR4: 00000000000006a0
> Call Trace:
>   <TASK>
>   ? __warn+0xa1/0x200
>   ? __btrfs_free_extent+0x9ac/0x1280 [btrfs]
>   ? report_bug+0x207/0x270
>   ? handle_bug+0x65/0x90
>   ? exc_invalid_op+0x13/0x40
>   ? asm_exc_invalid_op+0x16/0x20
>   ? __btrfs_free_extent+0x39a/0x1280 [btrfs]
>   ? unlock_up+0x160/0x370 [btrfs]
>   ? __btrfs_free_extent+0x9ac/0x1280 [btrfs]
>   ? __btrfs_free_extent+0x39a/0x1280 [btrfs]
>   ? lookup_extent_backref+0xd0/0xd0 [btrfs]
>   ? __lock_release.isra.0+0x14e/0x510
>   ? reacquire_held_locks+0x280/0x280
>   run_delayed_tree_ref+0x10b/0x2d0 [btrfs]
>   btrfs_run_delayed_refs_for_head+0x630/0x960 [btrfs]
>   __btrfs_run_delayed_refs+0xce/0x160 [btrfs]
>   btrfs_run_delayed_refs+0xe7/0x2a0 [btrfs]
>   commit_cowonly_roots+0x3f1/0x4c0 [btrfs]
>   ? trace_btrfs_transaction_commit+0xd0/0xd0 [btrfs]
>   ? btrfs_commit_transaction+0xbbe/0x17e0 [btrfs]
>   btrfs_commit_transaction+0xc13/0x17e0 [btrfs]
>   ? cleanup_transaction+0x640/0x640 [btrfs]
>   ? btrfs_attach_transaction_barrier+0x1e/0x50 [btrfs]
>   sync_filesystem+0xd3/0x100
>   generic_shutdown_super+0x44/0x1f0
>   kill_anon_super+0x1e/0x40
>   btrfs_kill_super+0x25/0x30 [btrfs]
>   deactivate_locked_super+0x4c/0xc0
>   cleanup_mnt+0x13a/0x1f0
>   task_work_run+0xf2/0x170
>   ? task_work_cancel+0x20/0x20
>   ? mark_held_locks+0x1a/0x80
>   exit_to_user_mode_prepare+0x16c/0x170
>   syscall_exit_to_user_mode+0x19/0x50
>   do_syscall_64+0x49/0x90
>   entry_SYSCALL_64_after_hwframe+0x46/0xb0
> RIP: 0033:0x7fb5c250f4bb
> RSP: 002b:00007ffeee578518 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
> RAX: 0000000000000000 RBX: 000055bf227429f0 RCX: 00007fb5c250f4bb
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055bf22742c20
> RBP: 000055bf22742b08 R08: 0000000000000073 R09: 0000000000000001
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 000055bf22742c20 R14: 0000000000000000 R15: 00007ffeee57b084
>   </TASK>
> irq event stamp: 11109
> hardirqs last  enabled at (11119): [<ffffffff841678a2>] __up_console_sem+0x52/0x60
> hardirqs last disabled at (11130): [<ffffffff84167887>] __up_console_sem+0x37/0x60
> softirqs last  enabled at (11084): [<ffffffff84cc910b>] __do_softirq+0x31b/0x5ae
> softirqs last disabled at (11079): [<ffffffff840b5b09>] irq_exit_rcu+0xa9/0x100
> ---[ end trace 0000000000000000 ]---
> ------------[ cut here ]------------
> BTRFS: Transaction aborted (error -117)
> WARNING: CPU: 3 PID: 13739 at fs/btrfs/extent-tree.c:3027 __btrfs_free_extent+0x10ff/0x1280 [btrfs]
> Modules linked in: btrfs blake2b_generic libcrc32c xor lzo_compress lzo_decompress raid6_pq zstd_decompress zstd_compress xxhash zstd_common loop
> CPU: 3 PID: 13739 Comm: umount Tainted: G        W          6.5.0-rc1-default+ #2126
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014
> RIP: 0010:__btrfs_free_extent+0x10ff/0x1280 [btrfs]
> RSP: 0018:ffff8880031c78a8 EFLAGS: 00010282
> RAX: 0000000000000000 RBX: ffff88802ec71708 RCX: 0000000000000000
> RDX: 0000000000000002 RSI: ffffffff841007a8 RDI: ffffffff87c9e0e0
> RBP: ffff888002734000 R08: 0000000000000001 R09: ffffed1000638eba
> R10: ffff8880031c75d7 R11: 0000000000000001 R12: 0000000000000001
> R13: ffff8880058522b8 R14: ffff8880265110e0 R15: 0000000001d24000
> FS:  00007fb5c22e9800(0000) GS:ffff88806d200000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f2b194fcfc4 CR3: 000000002f32a000 CR4: 00000000000006a0
> Call Trace:
>   <TASK>
>   ? __warn+0xa1/0x200
>   ? __btrfs_free_extent+0x10ff/0x1280 [btrfs]
>   ? report_bug+0x207/0x270
>   ? handle_bug+0x65/0x90
>   ? exc_invalid_op+0x13/0x40
>   ? asm_exc_invalid_op+0x16/0x20
>   ? preempt_count_sub+0x18/0xc0
>   ? __btrfs_free_extent+0x10ff/0x1280 [btrfs]
>   ? __btrfs_free_extent+0x10ff/0x1280 [btrfs]
>   ? lookup_extent_backref+0xd0/0xd0 [btrfs]
>   ? __lock_release.isra.0+0x14e/0x510
>   ? reacquire_held_locks+0x280/0x280
>   run_delayed_tree_ref+0x10b/0x2d0 [btrfs]
>   btrfs_run_delayed_refs_for_head+0x630/0x960 [btrfs]
>   __btrfs_run_delayed_refs+0xce/0x160 [btrfs]
>   btrfs_run_delayed_refs+0xe7/0x2a0 [btrfs]
>   commit_cowonly_roots+0x3f1/0x4c0 [btrfs]
>   ? trace_btrfs_transaction_commit+0xd0/0xd0 [btrfs]
>   ? btrfs_commit_transaction+0xbbe/0x17e0 [btrfs]
>   btrfs_commit_transaction+0xc13/0x17e0 [btrfs]
>   ? cleanup_transaction+0x640/0x640 [btrfs]
>   ? btrfs_attach_transaction_barrier+0x1e/0x50 [btrfs]
>   sync_filesystem+0xd3/0x100
>   generic_shutdown_super+0x44/0x1f0
>   kill_anon_super+0x1e/0x40
>   btrfs_kill_super+0x25/0x30 [btrfs]
>   deactivate_locked_super+0x4c/0xc0
>   cleanup_mnt+0x13a/0x1f0
>   task_work_run+0xf2/0x170
>   ? task_work_cancel+0x20/0x20
>   ? mark_held_locks+0x1a/0x80
>   exit_to_user_mode_prepare+0x16c/0x170
>   syscall_exit_to_user_mode+0x19/0x50
>   do_syscall_64+0x49/0x90
>   entry_SYSCALL_64_after_hwframe+0x46/0xb0
> RIP: 0033:0x7fb5c250f4bb
> RSP: 002b:00007ffeee578518 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
> RAX: 0000000000000000 RBX: 000055bf227429f0 RCX: 00007fb5c250f4bb
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055bf22742c20
> RBP: 000055bf22742b08 R08: 0000000000000073 R09: 0000000000000001
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 000055bf22742c20 R14: 0000000000000000 R15: 00007ffeee57b084
>   </TASK>
> irq event stamp: 11925
> hardirqs last  enabled at (11935): [<ffffffff841678a2>] __up_console_sem+0x52/0x60
> hardirqs last disabled at (11946): [<ffffffff84167887>] __up_console_sem+0x37/0x60
> softirqs last  enabled at (11084): [<ffffffff84cc910b>] __do_softirq+0x31b/0x5ae
> softirqs last disabled at (11079): [<ffffffff840b5b09>] irq_exit_rcu+0xa9/0x100
> ---[ end trace 0000000000000000 ]---
> BTRFS: error (device vdb: state A) in __btrfs_free_extent:3027: errno=-117 Filesystem corrupted
> BTRFS info (device vdb: state EA): forced readonly
> BTRFS info (device vdb: state EA): leaf 30474240 gen 7 total ptrs 16 free space 15382 owner 2
> BTRFS info (device vdb: state EA): refs 3 lock_owner 13739 current 13739
> 	item 0 key (13631488 192 8388608) itemoff 16259 itemsize 24
> 		block group used 0 chunk_objectid 256 flags 1
> 	item 1 key (22020096 192 8388608) itemoff 16235 itemsize 24
> 		block group used 16384 chunk_objectid 256 flags 34
> 	item 2 key (22036480 169 0) itemoff 16202 itemsize 33
> 		extent refs 1 gen 6 flags 2
> 		ref#0: tree block backref root 3
> 	item 3 key (30408704 169 0) itemoff 16169 itemsize 33
> 		extent refs 1 gen 6 flags 2
> 		ref#0: tree block backref root 2
> 	item 4 key (30408704 192 268435456) itemoff 16145 itemsize 24
> 		block group used 131072 chunk_objectid 256 flags 36
> 	item 5 key (30425088 169 0) itemoff 16112 itemsize 33
> 		extent refs 1 gen 5 flags 2
> 		ref#0: tree block backref root 5
> 	item 6 key (30441472 169 0) itemoff 16079 itemsize 33
> 		extent refs 1 gen 7 flags 2
> 		ref#0: tree block backref root 1
> 	item 7 key (30457856 169 0) itemoff 16046 itemsize 33
> 		extent refs 1 gen 7 flags 2
> 		ref#0: tree block backref root 4
> 	item 8 key (30474240 169 0) itemoff 16013 itemsize 33
> 		extent refs 1 gen 7 flags 2
> 		ref#0: tree block backref root 2
> 	item 9 key (30490624 169 0) itemoff 15980 itemsize 33
> 		extent refs 1 gen 5 flags 2
> 		ref#0: tree block backref root 7
> 	item 10 key (30507008 169 0) itemoff 15947 itemsize 33
> 		extent refs 1 gen 7 flags 2
> 		ref#0: tree block backref root 10
> 	item 11 key (30523392 169 0) itemoff 15914 itemsize 33
> 		extent refs 1 gen 5 flags 2
> 		ref#0: tree block backref root 7
> 	item 12 key (30539776 169 0) itemoff 15881 itemsize 33
> 		extent refs 1 gen 5 flags 2
> 		ref#0: tree block backref root 7
> 	item 13 key (30556160 169 0) itemoff 15848 itemsize 33
> 		extent refs 1 gen 5 flags 2
> 		ref#0: tree block backref root 7
> 	item 14 key (30572544 169 0) itemoff 15815 itemsize 33
> 		extent refs 1 gen 5 flags 2
> 		ref#0: tree block backref root 7
> 	item 15 key (30588928 169 0) itemoff 15782 itemsize 33
> 		extent refs 1 gen 5 flags 2
> 		ref#0: tree block backref root 7

This looks like an error in memmove_extent_buffer() which I
intentionally didn't touch.

Anyway I'll try rebase and more tests.

Can you put your modified commits in an external branch so I can inherit
all your modifications?

Thanks,
Qu

> BTRFS critical (device vdb: state EA): unable to find ref byte nr 30556160 parent 0 root 4 owner 0 offset 0 slot 14
> BTRFS error (device vdb: state EA): failed to run delayed ref for logical 30556160 num_bytes 16384 type 176 action 2 ref_mod 1: -2
> BTRFS: error (device vdb: state EA) in btrfs_run_delayed_refs:2102: errno=-2 No such entry
> BTRFS warning (device vdb: state EA): Skipping commit of aborted transaction.
> BTRFS: error (device vdb: state EA) in cleanup_transaction:1977: errno=-2 No such entry

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 0/6] btrfs: preparation patches for the incoming metadata folio conversion
  2023-07-13 21:30     ` Qu Wenruo
@ 2023-07-13 22:03       ` David Sterba
  2023-07-14  0:09         ` Qu Wenruo
  0 siblings, 1 reply; 29+ messages in thread
From: David Sterba @ 2023-07-13 22:03 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs

On Fri, Jul 14, 2023 at 05:30:33AM +0800, Qu Wenruo wrote:
> On 2023/7/14 00:39, David Sterba wrote:
> > 		ref#0: tree block backref root 7
> > 	item 14 key (30572544 169 0) itemoff 15815 itemsize 33
> > 		extent refs 1 gen 5 flags 2
> > 		ref#0: tree block backref root 7
> > 	item 15 key (30588928 169 0) itemoff 15782 itemsize 33
> > 		extent refs 1 gen 5 flags 2
> > 		ref#0: tree block backref root 7
> 
> This looks like an error in memmove_extent_buffer() which I
> intentionally didn't touch.
> 
> Anyway I'll try rebase and more tests.
> 
> Can you put your modified commits in an external branch so I can inherit
> all your modifications?

First I saw the crashes with the modified patches but the report is from
what you sent to the mailinglist so I can eliminate error on my side.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 0/6] btrfs: preparation patches for the incoming metadata folio conversion
  2023-07-13 22:03       ` David Sterba
@ 2023-07-14  0:09         ` Qu Wenruo
  2023-07-14  0:26           ` David Sterba
  0 siblings, 1 reply; 29+ messages in thread
From: Qu Wenruo @ 2023-07-14  0:09 UTC (permalink / raw)
  To: dsterba; +Cc: Qu Wenruo, linux-btrfs



On 2023/7/14 06:03, David Sterba wrote:
> On Fri, Jul 14, 2023 at 05:30:33AM +0800, Qu Wenruo wrote:
>> On 2023/7/14 00:39, David Sterba wrote:
>>> 		ref#0: tree block backref root 7
>>> 	item 14 key (30572544 169 0) itemoff 15815 itemsize 33
>>> 		extent refs 1 gen 5 flags 2
>>> 		ref#0: tree block backref root 7
>>> 	item 15 key (30588928 169 0) itemoff 15782 itemsize 33
>>> 		extent refs 1 gen 5 flags 2
>>> 		ref#0: tree block backref root 7
>>
>> This looks like an error in memmove_extent_buffer() which I
>> intentionally didn't touch.
>>
>> Anyway I'll try rebase and more tests.
>>
>> Can you put your modified commits in an external branch so I can inherit
>> all your modifications?
>
> First I saw the crashes with the modified patches but the report is from
> what you sent to the mailinglist so I can eliminate error on my side.

Still a branch would help a lot, as you won't want to re-do the usual
modification (like grammar, comments etc).

Thanks,
Qu

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 0/6] btrfs: preparation patches for the incoming metadata folio conversion
  2023-07-14  0:09         ` Qu Wenruo
@ 2023-07-14  0:26           ` David Sterba
  2023-07-14  1:58             ` Qu Wenruo
  0 siblings, 1 reply; 29+ messages in thread
From: David Sterba @ 2023-07-14  0:26 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs

On Fri, Jul 14, 2023 at 08:09:16AM +0800, Qu Wenruo wrote:
> 
> 
> On 2023/7/14 06:03, David Sterba wrote:
> > On Fri, Jul 14, 2023 at 05:30:33AM +0800, Qu Wenruo wrote:
> >> On 2023/7/14 00:39, David Sterba wrote:
> >>> 		ref#0: tree block backref root 7
> >>> 	item 14 key (30572544 169 0) itemoff 15815 itemsize 33
> >>> 		extent refs 1 gen 5 flags 2
> >>> 		ref#0: tree block backref root 7
> >>> 	item 15 key (30588928 169 0) itemoff 15782 itemsize 33
> >>> 		extent refs 1 gen 5 flags 2
> >>> 		ref#0: tree block backref root 7
> >>
> >> This looks like an error in memmove_extent_buffer() which I
> >> intentionally didn't touch.
> >>
> >> Anyway I'll try rebase and more tests.
> >>
> >> Can you put your modified commits in an external branch so I can inherit
> >> all your modifications?
> >
> > First I saw the crashes with the modified patches but the report is from
> > what you sent to the mailinglist so I can eliminate error on my side.
> 
> Still a branch would help a lot, as you won't want to re-do the usual
> modification (like grammar, comments etc).

Branch ext/qu-eb-page-clanups-updated-broken at github.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 0/6] btrfs: preparation patches for the incoming metadata folio conversion
  2023-07-14  0:26           ` David Sterba
@ 2023-07-14  1:58             ` Qu Wenruo
  2023-07-14 10:03               ` David Sterba
  0 siblings, 1 reply; 29+ messages in thread
From: Qu Wenruo @ 2023-07-14  1:58 UTC (permalink / raw)
  To: dsterba; +Cc: Qu Wenruo, linux-btrfs



On 2023/7/14 08:26, David Sterba wrote:
> On Fri, Jul 14, 2023 at 08:09:16AM +0800, Qu Wenruo wrote:
>>
>>
>> On 2023/7/14 06:03, David Sterba wrote:
>>> On Fri, Jul 14, 2023 at 05:30:33AM +0800, Qu Wenruo wrote:
>>>> On 2023/7/14 00:39, David Sterba wrote:
>>>>> 		ref#0: tree block backref root 7
>>>>> 	item 14 key (30572544 169 0) itemoff 15815 itemsize 33
>>>>> 		extent refs 1 gen 5 flags 2
>>>>> 		ref#0: tree block backref root 7
>>>>> 	item 15 key (30588928 169 0) itemoff 15782 itemsize 33
>>>>> 		extent refs 1 gen 5 flags 2
>>>>> 		ref#0: tree block backref root 7
>>>>
>>>> This looks like an error in memmove_extent_buffer() which I
>>>> intentionally didn't touch.
>>>>
>>>> Anyway I'll try rebase and more tests.
>>>>
>>>> Can you put your modified commits in an external branch so I can inherit
>>>> all your modifications?
>>>
>>> First I saw the crashes with the modified patches but the report is from
>>> what you sent to the mailinglist so I can eliminate error on my side.
>>
>> Still a branch would help a lot, as you won't want to re-do the usual
>> modification (like grammar, comments etc).
>
> Branch ext/qu-eb-page-clanups-updated-broken at github.

Already running the auto group with that branch, and no explosion so far
(btrfs/004 failed to mount with -o atime though).

Any extra setup needed to trigger the failure?

Thanks,
Qu

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 0/6] btrfs: preparation patches for the incoming metadata folio conversion
  2023-07-14  1:58             ` Qu Wenruo
@ 2023-07-14 10:03               ` David Sterba
  2023-07-14 10:32                 ` Qu Wenruo
  0 siblings, 1 reply; 29+ messages in thread
From: David Sterba @ 2023-07-14 10:03 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs

On Fri, Jul 14, 2023 at 09:58:00AM +0800, Qu Wenruo wrote:
> 
> 
> On 2023/7/14 08:26, David Sterba wrote:
> > On Fri, Jul 14, 2023 at 08:09:16AM +0800, Qu Wenruo wrote:
> >>
> >>
> >> On 2023/7/14 06:03, David Sterba wrote:
> >>> On Fri, Jul 14, 2023 at 05:30:33AM +0800, Qu Wenruo wrote:
> >>>> On 2023/7/14 00:39, David Sterba wrote:
> >>>>> 		ref#0: tree block backref root 7
> >>>>> 	item 14 key (30572544 169 0) itemoff 15815 itemsize 33
> >>>>> 		extent refs 1 gen 5 flags 2
> >>>>> 		ref#0: tree block backref root 7
> >>>>> 	item 15 key (30588928 169 0) itemoff 15782 itemsize 33
> >>>>> 		extent refs 1 gen 5 flags 2
> >>>>> 		ref#0: tree block backref root 7
> >>>>
> >>>> This looks like an error in memmove_extent_buffer() which I
> >>>> intentionally didn't touch.
> >>>>
> >>>> Anyway I'll try rebase and more tests.
> >>>>
> >>>> Can you put your modified commits in an external branch so I can inherit
> >>>> all your modifications?
> >>>
> >>> First I saw the crashes with the modified patches but the report is from
> >>> what you sent to the mailinglist so I can eliminate error on my side.
> >>
> >> Still a branch would help a lot, as you won't want to re-do the usual
> >> modification (like grammar, comments etc).
> >
> > Branch ext/qu-eb-page-clanups-updated-broken at github.
> 
> Already running the auto group with that branch, and no explosion so far
> (btrfs/004 failed to mount with -o atime though).
> 
> Any extra setup needed to trigger the failure?

I'm not aware of anything different than usual. Patches applied to git,
built, updated VM and started. I had another branch built and tested and
it finished the fstests. I can at least bisect which patch does it.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 0/6] btrfs: preparation patches for the incoming metadata folio conversion
  2023-07-14 10:03               ` David Sterba
@ 2023-07-14 10:32                 ` Qu Wenruo
  2023-07-14 10:41                   ` David Sterba
  0 siblings, 1 reply; 29+ messages in thread
From: Qu Wenruo @ 2023-07-14 10:32 UTC (permalink / raw)
  To: dsterba; +Cc: Qu Wenruo, linux-btrfs



On 2023/7/14 18:03, David Sterba wrote:
> On Fri, Jul 14, 2023 at 09:58:00AM +0800, Qu Wenruo wrote:
>>
>>
>> On 2023/7/14 08:26, David Sterba wrote:
>>> On Fri, Jul 14, 2023 at 08:09:16AM +0800, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2023/7/14 06:03, David Sterba wrote:
>>>>> On Fri, Jul 14, 2023 at 05:30:33AM +0800, Qu Wenruo wrote:
>>>>>> On 2023/7/14 00:39, David Sterba wrote:
>>>>>>> 		ref#0: tree block backref root 7
>>>>>>> 	item 14 key (30572544 169 0) itemoff 15815 itemsize 33
>>>>>>> 		extent refs 1 gen 5 flags 2
>>>>>>> 		ref#0: tree block backref root 7
>>>>>>> 	item 15 key (30588928 169 0) itemoff 15782 itemsize 33
>>>>>>> 		extent refs 1 gen 5 flags 2
>>>>>>> 		ref#0: tree block backref root 7
>>>>>>
>>>>>> This looks like an error in memmove_extent_buffer() which I
>>>>>> intentionally didn't touch.
>>>>>>
>>>>>> Anyway I'll try rebase and more tests.
>>>>>>
>>>>>> Can you put your modified commits in an external branch so I can inherit
>>>>>> all your modifications?
>>>>>
>>>>> First I saw the crashes with the modified patches but the report is from
>>>>> what you sent to the mailinglist so I can eliminate error on my side.
>>>>
>>>> Still a branch would help a lot, as you won't want to re-do the usual
>>>> modification (like grammar, comments etc).
>>>
>>> Branch ext/qu-eb-page-clanups-updated-broken at github.
>>
>> Already running the auto group with that branch, and no explosion so far
>> (btrfs/004 failed to mount with -o atime though).
>>
>> Any extra setup needed to trigger the failure?
>
> I'm not aware of anything different than usual. Patches applied to git,
> built, updated VM and started. I had another branch built and tested and
> it finished the fstests. I can at least bisect which patch does it.

A bisection would be very appreciated.

Although I guess it should be the memcpy_extent_buffer() patch, I didn't
see something obvious right now...

Thanks,
Qu

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 0/6] btrfs: preparation patches for the incoming metadata folio conversion
  2023-07-14 10:32                 ` Qu Wenruo
@ 2023-07-14 10:41                   ` David Sterba
  2023-07-15  0:39                     ` Qu Wenruo
  0 siblings, 1 reply; 29+ messages in thread
From: David Sterba @ 2023-07-14 10:41 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs

On Fri, Jul 14, 2023 at 06:32:27PM +0800, Qu Wenruo wrote:
> >> Already running the auto group with that branch, and no explosion so far
> >> (btrfs/004 failed to mount with -o atime though).
> >>
> >> Any extra setup needed to trigger the failure?
> >
> > I'm not aware of anything different than usual. Patches applied to git,
> > built, updated VM and started. I had another branch built and tested and
> > it finished the fstests. I can at least bisect which patch does it.
> 
> A bisection would be very appreciated.
> 
> Although I guess it should be the memcpy_extent_buffer() patch, I didn't
> see something obvious right now...

5ebf7593abb81ec1993f31e90a7573b75aff4db4 is the first bad commit
btrfs: refactor main loop in memcpy_extent_buffer()

$ git bisect log
# bad: [5c6c140622dd7107acb13da404f0c682f1f954a6] btrfs: copy all pages at once at the end of btrfs_clone_extent_buffer()
# good: [72c15cf7e64769ca9273a825fff8495d99975c9c] btrfs: deprecate integrity checker feature
git bisect start 'ext/qu-eb-page-clanups-updated-broken' '72c15cf7e64769ca9273a825fff8495d99975c9c'
# good: [85ab525a6a63c477b92099835d6b05eaebd4ad4b] btrfs: use write_extent_buffer() to implement write_extent_buffer_*id()
git bisect good 85ab525a6a63c477b92099835d6b05eaebd4ad4b
# bad: [cd6668ef43a224b3f8130b78f4e3b922a7175a05] btrfs: refactor main loop in copy_extent_buffer_full()
git bisect bad cd6668ef43a224b3f8130b78f4e3b922a7175a05
# bad: [5ebf7593abb81ec1993f31e90a7573b75aff4db4] btrfs: refactor main loop in memcpy_extent_buffer()
git bisect bad 5ebf7593abb81ec1993f31e90a7573b75aff4db4
# first bad commit: [5ebf7593abb81ec1993f31e90a7573b75aff4db4] btrfs: refactor main loop in memcpy_extent_buffer()

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 0/6] btrfs: preparation patches for the incoming metadata folio conversion
  2023-07-14 10:41                   ` David Sterba
@ 2023-07-15  0:39                     ` Qu Wenruo
  2023-07-15  1:02                       ` Qu Wenruo
  0 siblings, 1 reply; 29+ messages in thread
From: Qu Wenruo @ 2023-07-15  0:39 UTC (permalink / raw)
  To: dsterba; +Cc: Qu Wenruo, linux-btrfs



On 2023/7/14 18:41, David Sterba wrote:
> On Fri, Jul 14, 2023 at 06:32:27PM +0800, Qu Wenruo wrote:
>>>> Already running the auto group with that branch, and no explosion so far
>>>> (btrfs/004 failed to mount with -o atime though).
>>>>
>>>> Any extra setup needed to trigger the failure?
>>>
>>> I'm not aware of anything different than usual. Patches applied to git,
>>> built, updated VM and started. I had another branch built and tested and
>>> it finished the fstests. I can at least bisect which patch does it.
>>
>> A bisection would be very appreciated.
>>
>> Although I guess it should be the memcpy_extent_buffer() patch, I didn't
>> see something obvious right now...
>
> 5ebf7593abb81ec1993f31e90a7573b75aff4db4 is the first bad commit
> btrfs: refactor main loop in memcpy_extent_buffer()

Anything special on the system that you can reproduce the bug?

I checked the overall code, it's a little different than the original
behavior.

The original behavior has double limits on the cross-page case, while
the new code only handles the cross-page on the source, and let
write_extent_buffer() to handle the cross-page situation on the destination.

Considering memcpy() is called for memmove() case, it can explain the
corrupted tree block we see in your report.

Although I can not see the obvious problem, I guess there may be some
hidden corner cases that would be finally exposed if we move to
folio/vmallocated memory eventually.

If I can reproduce it locally the turnover time can be reduced greatly.

Thanks,
Qu
>
> $ git bisect log
> # bad: [5c6c140622dd7107acb13da404f0c682f1f954a6] btrfs: copy all pages at once at the end of btrfs_clone_extent_buffer()
> # good: [72c15cf7e64769ca9273a825fff8495d99975c9c] btrfs: deprecate integrity checker feature
> git bisect start 'ext/qu-eb-page-clanups-updated-broken' '72c15cf7e64769ca9273a825fff8495d99975c9c'
> # good: [85ab525a6a63c477b92099835d6b05eaebd4ad4b] btrfs: use write_extent_buffer() to implement write_extent_buffer_*id()
> git bisect good 85ab525a6a63c477b92099835d6b05eaebd4ad4b
> # bad: [cd6668ef43a224b3f8130b78f4e3b922a7175a05] btrfs: refactor main loop in copy_extent_buffer_full()
> git bisect bad cd6668ef43a224b3f8130b78f4e3b922a7175a05
> # bad: [5ebf7593abb81ec1993f31e90a7573b75aff4db4] btrfs: refactor main loop in memcpy_extent_buffer()
> git bisect bad 5ebf7593abb81ec1993f31e90a7573b75aff4db4
> # first bad commit: [5ebf7593abb81ec1993f31e90a7573b75aff4db4] btrfs: refactor main loop in memcpy_extent_buffer()

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 0/6] btrfs: preparation patches for the incoming metadata folio conversion
  2023-07-15  0:39                     ` Qu Wenruo
@ 2023-07-15  1:02                       ` Qu Wenruo
  0 siblings, 0 replies; 29+ messages in thread
From: Qu Wenruo @ 2023-07-15  1:02 UTC (permalink / raw)
  To: Qu Wenruo, dsterba; +Cc: linux-btrfs



On 2023/7/15 08:39, Qu Wenruo wrote:
> 
> 
> On 2023/7/14 18:41, David Sterba wrote:
>> On Fri, Jul 14, 2023 at 06:32:27PM +0800, Qu Wenruo wrote:
>>>>> Already running the auto group with that branch, and no explosion 
>>>>> so far
>>>>> (btrfs/004 failed to mount with -o atime though).
>>>>>
>>>>> Any extra setup needed to trigger the failure?
>>>>
>>>> I'm not aware of anything different than usual. Patches applied to git,
>>>> built, updated VM and started. I had another branch built and tested 
>>>> and
>>>> it finished the fstests. I can at least bisect which patch does it.
>>>
>>> A bisection would be very appreciated.
>>>
>>> Although I guess it should be the memcpy_extent_buffer() patch, I didn't
>>> see something obvious right now...
>>
>> 5ebf7593abb81ec1993f31e90a7573b75aff4db4 is the first bad commit
>> btrfs: refactor main loop in memcpy_extent_buffer()
> 
> Anything special on the system that you can reproduce the bug?
> 
> I checked the overall code, it's a little different than the original
> behavior.
> 
> The original behavior has double limits on the cross-page case, while
> the new code only handles the cross-page on the source, and let
> write_extent_buffer() to handle the cross-page situation on the 
> destination.

OK, I got the cause.

It's indeed the memcpy_extent_buffer() rework.

memcpy() itself is not safe if the range is overlapping, and the old 
code is doing proper overlap checks for both memcpy and memmove through 
copy_pages() helper.

And unfortunately I didn't go that copy_pages() helper and triggered the 
problem.

Let me find a better solution for this case.

Thanks,
Qu
> 
> Considering memcpy() is called for memmove() case, it can explain the
> corrupted tree block we see in your report.
> 
> Although I can not see the obvious problem, I guess there may be some
> hidden corner cases that would be finally exposed if we move to
> folio/vmallocated memory eventually.
> 
> If I can reproduce it locally the turnover time can be reduced greatly.
> 
> Thanks,
> Qu
>>
>> $ git bisect log
>> # bad: [5c6c140622dd7107acb13da404f0c682f1f954a6] btrfs: copy all 
>> pages at once at the end of btrfs_clone_extent_buffer()
>> # good: [72c15cf7e64769ca9273a825fff8495d99975c9c] btrfs: deprecate 
>> integrity checker feature
>> git bisect start 'ext/qu-eb-page-clanups-updated-broken' 
>> '72c15cf7e64769ca9273a825fff8495d99975c9c'
>> # good: [85ab525a6a63c477b92099835d6b05eaebd4ad4b] btrfs: use 
>> write_extent_buffer() to implement write_extent_buffer_*id()
>> git bisect good 85ab525a6a63c477b92099835d6b05eaebd4ad4b
>> # bad: [cd6668ef43a224b3f8130b78f4e3b922a7175a05] btrfs: refactor main 
>> loop in copy_extent_buffer_full()
>> git bisect bad cd6668ef43a224b3f8130b78f4e3b922a7175a05
>> # bad: [5ebf7593abb81ec1993f31e90a7573b75aff4db4] btrfs: refactor main 
>> loop in memcpy_extent_buffer()
>> git bisect bad 5ebf7593abb81ec1993f31e90a7573b75aff4db4
>> # first bad commit: [5ebf7593abb81ec1993f31e90a7573b75aff4db4] btrfs: 
>> refactor main loop in memcpy_extent_buffer()

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2023-07-15  1:02 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-07-12  6:37 [PATCH v2 0/6] btrfs: preparation patches for the incoming metadata folio conversion Qu Wenruo
2023-07-12  6:37 ` [PATCH v2 1/6] btrfs: tests: enhance extent buffer bitmap tests Qu Wenruo
2023-07-12  6:37 ` [PATCH v2 2/6] btrfs: refactor extent buffer bitmaps operations Qu Wenruo
2023-07-13 11:53   ` David Sterba
2023-07-12  6:37 ` [PATCH v2 3/6] btrfs: use write_extent_buffer() to implement write_extent_buffer_*id() Qu Wenruo
2023-07-12  6:37 ` [PATCH v2 4/6] btrfs: refactor memcpy_extent_buffer() Qu Wenruo
2023-07-13 11:51   ` David Sterba
2023-07-12  6:37 ` [PATCH v2 5/6] btrfs: refactor copy_extent_buffer_full() Qu Wenruo
2023-07-12  6:37 ` [PATCH v2 6/6] btrfs: call copy_extent_buffer_full() inside btrfs_clone_extent_buffer() Qu Wenruo
2023-07-13 11:55   ` David Sterba
2023-07-12  6:44 ` [PATCH v2 0/6] btrfs: preparation patches for the incoming metadata folio conversion Sweet Tea Dorminy
2023-07-12 16:41 ` Christoph Hellwig
2023-07-12 23:58   ` Qu Wenruo
2023-07-13 11:16     ` Christoph Hellwig
2023-07-13 11:26     ` David Sterba
2023-07-13 11:41       ` Qu Wenruo
2023-07-13 11:49         ` David Sterba
2023-07-13 12:09 ` David Sterba
2023-07-13 16:39   ` David Sterba
2023-07-13 21:30     ` Qu Wenruo
2023-07-13 22:03       ` David Sterba
2023-07-14  0:09         ` Qu Wenruo
2023-07-14  0:26           ` David Sterba
2023-07-14  1:58             ` Qu Wenruo
2023-07-14 10:03               ` David Sterba
2023-07-14 10:32                 ` Qu Wenruo
2023-07-14 10:41                   ` David Sterba
2023-07-15  0:39                     ` Qu Wenruo
2023-07-15  1:02                       ` Qu Wenruo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox