* [PATCH v2 0/3] btrfs-progs: avoid repeated data write for metadata and a small cleanup
@ 2022-08-02  7:52 Qu Wenruo
  2022-08-02  7:52 ` [PATCH v2 1/3] btrfs-progs: avoid repeated data write for metadata Qu Wenruo
                   ` (5 more replies)
  0 siblings, 6 replies; 8+ messages in thread
From: Qu Wenruo @ 2022-08-02  7:52 UTC (permalink / raw)
  To: linux-btrfs

[CHANGELOG]
v2:
- Separate the fixes from the initial patch
- Fix a bug in BUG_ON() condition which causes mkfs test failure

There is a bug report from Shinichiro that for zoned device mkfs -m DUP
(using RST) doesn't work.

It turns out to be a bug in commit 2a93728391a1 ("btrfs-progs: use
write_data_to_disk() to replace write_extent_to_disk()"), in which I
wrongly assumed that write_data_to_disk() would only write the data to
one mirror.

In fact, write_data_to_disk() writes data to all mirrors, thus the
@mirror argument is completely unnecessary.

The first patch fixes the problem and removes the unnecessary
argument to avoid confusion.

Then the 2nd patch will fix a BUG_ON() condition.

Finally, the last patch cleans up write_and_map_eb() to rely completely
on write_data_to_disk(), without manually handling RAID56.


Qu Wenruo (3):
  btrfs-progs: avoid repeated data write for metadata
  btrfs-progs: fix a BUG_ON() condition for write_data_to_disk()
  btrfs-progs: use write_data_to_disk() to handle RAID56 in
    write_and_map_eb()

 image/main.c              |  2 +-
 kernel-shared/disk-io.c   | 39 +++------------------------------------
 kernel-shared/extent_io.c | 12 +++++++++---
 kernel-shared/extent_io.h |  2 +-
 4 files changed, 14 insertions(+), 41 deletions(-)

-- 
2.37.0



* [PATCH v2 1/3] btrfs-progs: avoid repeated data write for metadata
  2022-08-02  7:52 [PATCH v2 0/3] btrfs-progs: avoid repeated data write for metadata and a small cleanup Qu Wenruo
@ 2022-08-02  7:52 ` Qu Wenruo
  2022-08-02  7:52 ` [PATCH v2 2/3] btrfs-progs: fix a BUG_ON() condition for write_data_to_disk() Qu Wenruo
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2022-08-02  7:52 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Shinichiro Kawasaki

[BUG]
Shinichiro reported that "mkfs.btrfs -m DUP" writes the same data to
the device repeatedly.
For non-zoned devices this is not a big deal, but for zoned devices it
is critical, as zoned devices don't support overwrites at all.

[CAUSE]
The problem is related to the write_and_map_eb() call: since commit
2a93728391a1 ("btrfs-progs: use write_data_to_disk() to replace
write_extent_to_disk()"), we call write_data_to_disk() for metadata
writeback.

But the problem is, write_data_to_disk() will call btrfs_map_block()
with rw = WRITE.

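With that, btrfs_map_block() will always return all stripes, while in
write_data_to_disk() we also iterate through each mirror of the range.
Since write_and_map_eb() additionally calls write_data_to_disk() once
per mirror, every mirror gets written more than once.

This results in the repeated writeback described above.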

[FIX]
Fix this problem by completely removing the @mirror argument
from write_data_to_disk(), and add comments to explicitly show that
the function writes to all mirrors.
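
The repeated write can be illustrated with a small standalone model. This
is only a sketch, not btrfs-progs code: the model_* names, MAX_STRIPES and
the per-stripe counters are hypothetical, and the sketch assumes only what
is stated above, namely that a WRITE mapping returns every stripe no
matter which mirror number is passed.

```c
#include <assert.h>

#define MAX_STRIPES 4	/* hypothetical upper bound for this model */

/*
 * Hypothetical stand-in for btrfs_map_block() with rw = WRITE:
 * it hands back every stripe and ignores the mirror number.
 */
static int model_map_block_write(int num_copies, int mirror,
				 int stripes[MAX_STRIPES])
{
	int i;

	(void)mirror;	/* unused for WRITE mappings in this model */
	assert(num_copies > 0 && num_copies <= MAX_STRIPES);
	for (i = 0; i < num_copies; i++)
		stripes[i] = i;
	return num_copies;
}

/* Model of write_data_to_disk(): one write per returned stripe. */
static void model_write_data_to_disk(int num_copies, int mirror,
				     int writes[MAX_STRIPES])
{
	int stripes[MAX_STRIPES];
	int nr = model_map_block_write(num_copies, mirror, stripes);
	int i;

	for (i = 0; i < nr; i++)
		writes[stripes[i]]++;
}

/* Old caller: one call per mirror, so each stripe is hit num_copies times. */
static void model_old_caller(int num_copies, int writes[MAX_STRIPES])
{
	int mirror;

	for (mirror = 1; mirror <= num_copies; mirror++)
		model_write_data_to_disk(num_copies, mirror, writes);
}

/* New caller: a single call already covers all mirrors. */
static void model_new_caller(int num_copies, int writes[MAX_STRIPES])
{
	model_write_data_to_disk(num_copies, 0, writes);
}
```

With two copies (as for DUP), the old caller in this model writes each
stripe twice, while the new caller writes each stripe exactly once.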

Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Fixes: 2a93728391a1 ("btrfs-progs: use write_data_to_disk() to replace write_extent_to_disk()")
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 image/main.c              |  2 +-
 kernel-shared/disk-io.c   | 24 ++++++++++--------------
 kernel-shared/extent_io.c | 10 ++++++++--
 kernel-shared/extent_io.h |  2 +-
 4 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/image/main.c b/image/main.c
index 5bcd10f021d7..6793fe4b9076 100644
--- a/image/main.c
+++ b/image/main.c
@@ -1495,7 +1495,7 @@ static int restore_one_work(struct mdrestore_struct *mdres,
 			}
 		} else if (async->start != BTRFS_SUPER_INFO_OFFSET) {
 			ret = write_data_to_disk(mdres->info, buffer,
-						 async->start, out_len, 0);
+						 async->start, out_len);
 			if (ret) {
 				error("failed to write data");
 				exit(1);
diff --git a/kernel-shared/disk-io.c b/kernel-shared/disk-io.c
index 26b1c9aa192a..a6e66aee7bf7 100644
--- a/kernel-shared/disk-io.c
+++ b/kernel-shared/disk-io.c
@@ -452,8 +452,6 @@ struct extent_buffer* read_tree_block(struct btrfs_fs_info *fs_info, u64 bytenr,
 int write_and_map_eb(struct btrfs_fs_info *fs_info, struct extent_buffer *eb)
 {
 	int ret;
-	int mirror_num;
-	int max_mirror;
 	u64 length;
 	u64 *raid_map = NULL;
 	struct btrfs_multi_bio *multi = NULL;
@@ -483,18 +481,16 @@ int write_and_map_eb(struct btrfs_fs_info *fs_info, struct extent_buffer *eb)
 		goto out;
 	}
 
-	/* For non-RAID56, we just writeback data to each mirror */
-	max_mirror = btrfs_num_copies(fs_info, eb->start, eb->len);
-	for (mirror_num = 1; mirror_num <= max_mirror; mirror_num++) {
-		ret = write_data_to_disk(fs_info, eb->data, eb->start, eb->len,
-				         mirror_num);
-		if (ret < 0) {
-			errno = -ret;
-			error(
-		"failed to write bytenr %llu length %u to mirror %d: %m",
-				eb->start, eb->len, mirror_num);
-			goto out;
-		}
+	/*
+	 * For non-RAID56, we just writeback data to all mirrors using
+	 * write_data_to_disk().
+	 */
+	ret = write_data_to_disk(fs_info, eb->data, eb->start, eb->len);
+	if (ret < 0) {
+		errno = -ret;
+		error("failed to write bytenr %llu length %u: %m",
+			eb->start, eb->len);
+		goto out;
 	}
 
 out:
diff --git a/kernel-shared/extent_io.c b/kernel-shared/extent_io.c
index d6326ab2dc52..a05249815bb1 100644
--- a/kernel-shared/extent_io.c
+++ b/kernel-shared/extent_io.c
@@ -938,8 +938,14 @@ int read_data_from_disk(struct btrfs_fs_info *info, void *buf, u64 logical,
 	return 0;
 }
 
+/*
+ * Write the data in @buf to logical bytenr @offset.
+ *
+ * Such data will be written to all mirrors and RAID56 P/Q will also be
+ * properly handled.
+ */
 int write_data_to_disk(struct btrfs_fs_info *info, void *buf, u64 offset,
-		      u64 bytes, int mirror)
+		       u64 bytes)
 {
 	struct btrfs_multi_bio *multi = NULL;
 	struct btrfs_device *device;
@@ -956,7 +962,7 @@ int write_data_to_disk(struct btrfs_fs_info *info, void *buf, u64 offset,
 		dev_nr = 0;
 
 		ret = btrfs_map_block(info, WRITE, offset, &this_len, &multi,
-				      mirror, &raid_map);
+				      0, &raid_map);
 		if (ret) {
 			fprintf(stderr, "Couldn't map the block %llu\n",
 				offset);
diff --git a/kernel-shared/extent_io.h b/kernel-shared/extent_io.h
index fc2e4cc455d6..2148a8112428 100644
--- a/kernel-shared/extent_io.h
+++ b/kernel-shared/extent_io.h
@@ -162,7 +162,7 @@ int clear_extent_buffer_dirty(struct extent_buffer *eb);
 int read_data_from_disk(struct btrfs_fs_info *info, void *buf, u64 logical,
 			u64 *len, int mirror);
 int write_data_to_disk(struct btrfs_fs_info *info, void *buf, u64 offset,
-		       u64 bytes, int mirror);
+		       u64 bytes);
 void extent_buffer_bitmap_clear(struct extent_buffer *eb, unsigned long start,
                                 unsigned long pos, unsigned long len);
 void extent_buffer_bitmap_set(struct extent_buffer *eb, unsigned long start,
-- 
2.37.0



* [PATCH v2 2/3] btrfs-progs: fix a BUG_ON() condition for write_data_to_disk()
  2022-08-02  7:52 [PATCH v2 0/3] btrfs-progs: avoid repeated data write for metadata and a small cleanup Qu Wenruo
  2022-08-02  7:52 ` [PATCH v2 1/3] btrfs-progs: avoid repeated data write for metadata Qu Wenruo
@ 2022-08-02  7:52 ` Qu Wenruo
  2022-08-02  7:52 ` [PATCH v2 3/3] btrfs-progs: use write_data_to_disk() to handle RAID56 in write_and_map_eb() Qu Wenruo
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2022-08-02  7:52 UTC (permalink / raw)
  To: linux-btrfs

The BUG_ON() condition in write_data_to_disk() is no longer correct.

write_raid56_with_parity() now returns the number of bytes written for
the last stripe.

Thus a successful writeback can trigger the BUG_ON(ret).

Fix the condition to (ret < 0).
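
The difference between the two conditions can be sketched in isolation.
This is a minimal model, not the real function: model_write_raid56() and
the two *_check_trips() helpers are hypothetical, and the sketch assumes
only the return convention described above (a positive byte count on
success, a negative errno on failure).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for write_raid56_with_parity(): returns the
 * number of bytes written for the last stripe, or a negative errno.
 */
static int model_write_raid56(int stripe_len, bool fail)
{
	assert(stripe_len > 0);
	return fail ? -EIO : stripe_len;
}

/* Old condition, BUG_ON(ret): trips on any non-zero value. */
static bool old_check_trips(int ret)
{
	return ret != 0;
}

/* New condition, BUG_ON(ret < 0): trips only on errors. */
static bool new_check_trips(int ret)
{
	return ret < 0;
}
```

In this model a successful write returns a positive length, so the old
check trips on success while the new one trips only on a real error.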

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 kernel-shared/extent_io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel-shared/extent_io.c b/kernel-shared/extent_io.c
index a05249815bb1..48bcf2cf2f96 100644
--- a/kernel-shared/extent_io.c
+++ b/kernel-shared/extent_io.c
@@ -990,7 +990,7 @@ int write_data_to_disk(struct btrfs_fs_info *info, void *buf, u64 offset,
 			memcpy(eb->data, buf + total_write, this_len);
 			ret = write_raid56_with_parity(info, eb, multi,
 						       stripe_len, raid_map);
-			BUG_ON(ret);
+			BUG_ON(ret < 0);
 
 			free(eb);
 			kfree(raid_map);
-- 
2.37.0



* [PATCH v2 3/3] btrfs-progs: use write_data_to_disk() to handle RAID56 in write_and_map_eb()
  2022-08-02  7:52 [PATCH v2 0/3] btrfs-progs: avoid repeated data write for metadata and a small cleanup Qu Wenruo
  2022-08-02  7:52 ` [PATCH v2 1/3] btrfs-progs: avoid repeated data write for metadata Qu Wenruo
  2022-08-02  7:52 ` [PATCH v2 2/3] btrfs-progs: fix a BUG_ON() condition for write_data_to_disk() Qu Wenruo
@ 2022-08-02  7:52 ` Qu Wenruo
  2022-08-02  8:04 ` [PATCH v2 0/3] btrfs-progs: avoid repeated data write for metadata and a small cleanup Johannes Thumshirn
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2022-08-02  7:52 UTC (permalink / raw)
  To: linux-btrfs

Function write_data_to_disk() can handle RAID56 writes without any
problem.

So just call write_data_to_disk() inside write_and_map_eb() instead of
manually doing the RAID56 write.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 kernel-shared/disk-io.c | 31 +------------------------------
 1 file changed, 1 insertion(+), 30 deletions(-)

diff --git a/kernel-shared/disk-io.c b/kernel-shared/disk-io.c
index a6e66aee7bf7..d276e52df060 100644
--- a/kernel-shared/disk-io.c
+++ b/kernel-shared/disk-io.c
@@ -452,39 +452,10 @@ struct extent_buffer* read_tree_block(struct btrfs_fs_info *fs_info, u64 bytenr,
 int write_and_map_eb(struct btrfs_fs_info *fs_info, struct extent_buffer *eb)
 {
 	int ret;
-	u64 length;
 	u64 *raid_map = NULL;
 	struct btrfs_multi_bio *multi = NULL;
 
-	length = eb->len;
-	ret = btrfs_map_block(fs_info, WRITE, eb->start, &length,
-			      &multi, 0, &raid_map);
-	if (ret < 0) {
-		errno = -ret;
-		error("failed to map bytenr %llu length %u: %m",
-			eb->start, eb->len);
-		goto out;
-	}
-
-	/* RAID56 write back need RMW */
-	if (raid_map) {
-		ret = write_raid56_with_parity(fs_info, eb, multi,
-					       length, raid_map);
-		if (ret < 0) {
-			errno = -ret;
-			error(
-		"failed to write raid56 stripe for bytenr %llu length %llu: %m",
-				eb->start, length);
-		} else {
-			ret = 0;
-		}
-		goto out;
-	}
-
-	/*
-	 * For non-RAID56, we just writeback data to all mirrors using
-	 * write_data_to_disk().
-	 */
+	/* write_data_to_disk() will handle all mirrors and RAID56. */
 	ret = write_data_to_disk(fs_info, eb->data, eb->start, eb->len);
 	if (ret < 0) {
 		errno = -ret;
-- 
2.37.0



* Re: [PATCH v2 0/3] btrfs-progs: avoid repeated data write for metadata and a small cleanup
  2022-08-02  7:52 [PATCH v2 0/3] btrfs-progs: avoid repeated data write for metadata and a small cleanup Qu Wenruo
                   ` (2 preceding siblings ...)
  2022-08-02  7:52 ` [PATCH v2 3/3] btrfs-progs: use write_data_to_disk() to handle RAID56 in write_and_map_eb() Qu Wenruo
@ 2022-08-02  8:04 ` Johannes Thumshirn
  2022-08-02  8:06   ` Qu Wenruo
  2022-08-02  8:34 ` Shinichiro Kawasaki
  2022-08-03 19:25 ` David Sterba
  5 siblings, 1 reply; 8+ messages in thread
From: Johannes Thumshirn @ 2022-08-02  8:04 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs@vger.kernel.org

On 02.08.22 09:53, Qu Wenruo wrote:
> [CHANGELOG]
> v2:
> - Separate the fixes from the initial patch
> - Fix a bug in BUG_ON() condition which causes mkfs test failure
> 
> There is a bug report from Shinichiro that for zoned device mkfs -m DUP
> (using RST) doesn't work.

Nit, it's without RST.

Anyways, for the series:
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>


* Re: [PATCH v2 0/3] btrfs-progs: avoid repeated data write for metadata and a small cleanup
  2022-08-02  8:04 ` [PATCH v2 0/3] btrfs-progs: avoid repeated data write for metadata and a small cleanup Johannes Thumshirn
@ 2022-08-02  8:06   ` Qu Wenruo
  0 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2022-08-02  8:06 UTC (permalink / raw)
  To: Johannes Thumshirn, Qu Wenruo, linux-btrfs@vger.kernel.org



On 2022/8/2 16:04, Johannes Thumshirn wrote:
> On 02.08.22 09:53, Qu Wenruo wrote:
>> [CHANGELOG]
>> v2:
>> - Separate the fixes from the initial patch
>> - Fix a bug in BUG_ON() condition which causes mkfs test failure
>>
>> There is a bug report from Shinichiro that for zoned device mkfs -m DUP
>> (using RST) doesn't work.
>
> Nit, it's without RST.

My bad, with RST it would be -d DUP.

Thanks,
Qu
>
> Anyways, for the series:
> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>


* Re: [PATCH v2 0/3] btrfs-progs: avoid repeated data write for metadata and a small cleanup
  2022-08-02  7:52 [PATCH v2 0/3] btrfs-progs: avoid repeated data write for metadata and a small cleanup Qu Wenruo
                   ` (3 preceding siblings ...)
  2022-08-02  8:04 ` [PATCH v2 0/3] btrfs-progs: avoid repeated data write for metadata and a small cleanup Johannes Thumshirn
@ 2022-08-02  8:34 ` Shinichiro Kawasaki
  2022-08-03 19:25 ` David Sterba
  5 siblings, 0 replies; 8+ messages in thread
From: Shinichiro Kawasaki @ 2022-08-02  8:34 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs@vger.kernel.org

On Aug 02, 2022 / 15:52, Qu Wenruo wrote:
> [CHANGELOG]
> v2:
> - Separate the fixes from the initial patch
> - Fix a bug in BUG_ON() condition which causes mkfs test failure
> 
> There is a bug report from Shinichiro that for zoned device mkfs -m DUP
> (using RST) doesn't work.
> 
> It turns out to be a bug in commit 2a93728391a1 ("btrfs-progs: use
> write_data_to_disk() to replace write_extent_to_disk()"), which I
> wrongly assumed that write_data_to_disk() will only write the data to
> one mirror.
> 
> In fact, write_data_to_disk() writes data to all mirrors, thus the
> @mirror argument is completely unnecessary.
> 
> The first patch will fix the problem and cleanup the unnecessary
> argument to avoid confusion.
> 
> Then the 2nd patch will fix a BUG_ON() condition.
> 
> Finally the last patch will cleanup write_and_map_eb() to completely
> rely on write_data_to_disk(), without manually handling RAID56. 

I reconfirmed that this v2 series avoids the mkfs.btrfs failure. It
also avoids the duplicate write. For the series:

Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>

-- 
Shin'ichiro Kawasaki


* Re: [PATCH v2 0/3] btrfs-progs: avoid repeated data write for metadata and a small cleanup
  2022-08-02  7:52 [PATCH v2 0/3] btrfs-progs: avoid repeated data write for metadata and a small cleanup Qu Wenruo
                   ` (4 preceding siblings ...)
  2022-08-02  8:34 ` Shinichiro Kawasaki
@ 2022-08-03 19:25 ` David Sterba
  5 siblings, 0 replies; 8+ messages in thread
From: David Sterba @ 2022-08-03 19:25 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Aug 02, 2022 at 03:52:40PM +0800, Qu Wenruo wrote:
> [CHANGELOG]
> v2:
> - Separate the fixes from the initial patch
> - Fix a bug in BUG_ON() condition which causes mkfs test failure
> 
> There is a bug report from Shinichiro that for zoned device mkfs -m DUP
> (using RST) doesn't work.
> 
> It turns out to be a bug in commit 2a93728391a1 ("btrfs-progs: use
> write_data_to_disk() to replace write_extent_to_disk()"), which I
> wrongly assumed that write_data_to_disk() will only write the data to
> one mirror.
> 
> In fact, write_data_to_disk() writes data to all mirrors, thus the
> @mirror argument is completely unnecessary.
> 
> The first patch will fix the problem and cleanup the unnecessary
> argument to avoid confusion.
> 
> Then the 2nd patch will fix a BUG_ON() condition.
> 
> Finally the last patch will cleanup write_and_map_eb() to completely
> rely on write_data_to_disk(), without manually handling RAID56. 
> 
> 
> Qu Wenruo (3):
>   btrfs-progs: avoid repeated data write for metadata
>   btrfs-progs: fix a BUG_ON() condition for write_data_to_disk()
>   btrfs-progs: use write_data_to_disk() to handle RAID56 in
>     write_and_map_eb()

Added to devel, thanks.
