* [PATCH 1/8] btrfs: raid56: extract the vertical stripe recovery code into recover_vertical()
2022-10-26 5:06 [PATCH 0/8] btrfs: raid56: use submit-and-wait method to handle raid56 writes Qu Wenruo
@ 2022-10-26 5:06 ` Qu Wenruo
2022-10-26 5:06 ` [PATCH 2/8] btrfs: raid56: extract the pq generation code into a helper Qu Wenruo
` (7 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2022-10-26 5:06 UTC
To: linux-btrfs
This refactor includes the following behavior change first:
- Don't error out if only P/Q is corrupted
The old code will directly error out if only P/Q is corrupted.
Although it is a logical error if we go into the rebuild path with
only P/Q corrupted, there is no need to error out.
Just skip the rebuild and return the already good data.
Then come the following refactors, which shouldn't cause behavior
changes:
- Introduce a helper to do vertical stripe recovery
This not only reduces one indent level, but also paves the road for
later data checksum verification in RMW cycles.
- Sort rbio->faila/b before recovery
So we don't need to do the same swap for every vertical stripe.
- Replace a BUG_ON() with ASSERT()
Or checkpatch won't let me pass.
- Mark recovered sectors uptodate after the recover loop
- Do the cleanup for pointers unconditionally
We only need to initialize @pointers and @unmap_array to NULL, so
we can safely free them unconditionally.
- Mark the repaired sector uptodate in recover_vertical()
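For reference, a "vertical stripe" here means one sector from each
stripe of the full stripe, all at the same offset inside their stripes.
A sketch for an illustrative 2-data-stripes RAID6 layout (for
illustration only, not tied to any real chunk layout):

	           Data 0    Data 1    P        Q
	sector 0  [D0-S0]   [D1-S0]   [P-S0]   [Q-S0]   <- vertical stripe 0
	sector 1  [D0-S1]   [D1-S1]   [P-S1]   [Q-S1]   <- vertical stripe 1
	   ...

recover_vertical() repairs one such vertical stripe at a time.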
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/raid56.c | 285 ++++++++++++++++++++++++----------------------
1 file changed, 149 insertions(+), 136 deletions(-)
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index c009c0a2081e..085c549a09a9 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1885,6 +1885,144 @@ void raid56_parity_write(struct bio *bio, struct btrfs_io_context *bioc)
bio_endio(bio);
}
+/*
+ * Recover a vertical stripe specified by @sector_nr.
+ * @pointers and @unmap_array are pre-allocated by the caller, so we don't
+ * need to allocate/free them again and again.
+ */
+static void recover_vertical(struct btrfs_raid_bio *rbio, int sector_nr,
+ void **pointers, void **unmap_array)
+{
+ struct btrfs_fs_info *fs_info = rbio->bioc->fs_info;
+ struct sector_ptr *sector;
+ const u32 sectorsize = fs_info->sectorsize;
+ const int faila = rbio->faila;
+ const int failb = rbio->failb;
+ int stripe_nr;
+
+ /*
+ * Now we just use bitmap to mark the horizontal stripes in
+ * which we have data when doing parity scrub.
+ */
+ if (rbio->operation == BTRFS_RBIO_PARITY_SCRUB &&
+ !test_bit(sector_nr, &rbio->dbitmap))
+ return;
+
+ /*
+ * Setup our array of pointers with sectors from each stripe
+ *
+ * NOTE: store a duplicate array of pointers to preserve the
+ * pointer order.
+ */
+ for (stripe_nr = 0; stripe_nr < rbio->real_stripes; stripe_nr++) {
+ /*
+ * If we're rebuilding a read, we have to use
+ * pages from the bio list
+ */
+ if ((rbio->operation == BTRFS_RBIO_READ_REBUILD ||
+ rbio->operation == BTRFS_RBIO_REBUILD_MISSING) &&
+ (stripe_nr == faila || stripe_nr == failb)) {
+ sector = sector_in_rbio(rbio, stripe_nr, sector_nr, 0);
+ } else {
+ sector = rbio_stripe_sector(rbio, stripe_nr, sector_nr);
+ }
+ ASSERT(sector->page);
+ pointers[stripe_nr] = kmap_local_page(sector->page) +
+ sector->pgoff;
+ unmap_array[stripe_nr] = pointers[stripe_nr];
+ }
+
+ /* All raid6 handling here */
+ if (rbio->bioc->map_type & BTRFS_BLOCK_GROUP_RAID6) {
+ /* Single failure, rebuild from parity raid5 style */
+ if (failb < 0) {
+ if (faila == rbio->nr_data)
+ /*
+ * Just the P stripe has failed, without
+ * a bad data or Q stripe.
+ * We have nothing to do, just skip the
+ * recovery for this stripe.
+ */
+ goto cleanup;
+ /*
+ * a single failure in raid6 is rebuilt
+ * in the pstripe code below
+ */
+ goto pstripe;
+ }
+
+ /*
+ * If the q stripe is failed, do a pstripe reconstruction from
+ * the xors.
+ * If both the q stripe and the P stripe are failed, we're
+ * here due to a crc mismatch and we can't give them the
+ * data they want.
+ */
+ if (rbio->bioc->raid_map[failb] == RAID6_Q_STRIPE) {
+ if (rbio->bioc->raid_map[faila] ==
+ RAID5_P_STRIPE)
+ /*
+ * Only P and Q are corrupted.
+ * We only care about data stripes recovery,
+ * can skip this vertical stripe.
+ */
+ goto cleanup;
+ /*
+ * Otherwise we have one bad data stripe and
+ * a good P stripe. raid5!
+ */
+ goto pstripe;
+ }
+
+ if (rbio->bioc->raid_map[failb] == RAID5_P_STRIPE) {
+ raid6_datap_recov(rbio->real_stripes, sectorsize,
+ faila, pointers);
+ } else {
+ raid6_2data_recov(rbio->real_stripes, sectorsize,
+ faila, failb, pointers);
+ }
+ } else {
+ void *p;
+
+ /* Rebuild from P stripe here (raid5 or raid6). */
+ ASSERT(failb == -1);
+pstripe:
+ /* Copy parity block into failed block to start with */
+ memcpy(pointers[faila], pointers[rbio->nr_data], sectorsize);
+
+ /* Rearrange the pointer array */
+ p = pointers[faila];
+ for (stripe_nr = faila; stripe_nr < rbio->nr_data - 1;
+ stripe_nr++)
+ pointers[stripe_nr] = pointers[stripe_nr + 1];
+ pointers[rbio->nr_data - 1] = p;
+
+ /* Xor in the rest */
+ run_xor(pointers, rbio->nr_data - 1, sectorsize);
+
+ }
+
+ /*
+ * No matter if this is a RMW or recovery, we should have all
+ * failed sectors repaired in the vertical stripe, thus they are now
+ * uptodate.
+ * Especially if we determine to cache the rbio, we need to
+ * have at least all data sectors uptodate.
+ */
+ if (rbio->faila >= 0) {
+ sector = rbio_stripe_sector(rbio, rbio->faila, sector_nr);
+ sector->uptodate = 1;
+ }
+ if (rbio->failb >= 0) {
+ sector = rbio_stripe_sector(rbio, rbio->failb, sector_nr);
+ sector->uptodate = 1;
+ }
+
+cleanup:
+ for (stripe_nr = rbio->real_stripes - 1; stripe_nr >= 0; stripe_nr--)
+ kunmap_local(unmap_array[stripe_nr]);
+}
+
/*
* all parity reconstruction happens here. We've read in everything
* we can find from the drives and this does the heavy lifting of
@@ -1892,13 +2030,10 @@ void raid56_parity_write(struct bio *bio, struct btrfs_io_context *bioc)
*/
static void __raid_recover_end_io(struct btrfs_raid_bio *rbio)
{
- const u32 sectorsize = rbio->bioc->fs_info->sectorsize;
- int sectornr, stripe;
- void **pointers;
- void **unmap_array;
- int faila = -1, failb = -1;
+ int sectornr;
+ void **pointers = NULL;
+ void **unmap_array = NULL;
blk_status_t err;
- int i;
/*
* This array stores the pointer for each sector, thus it has the extra
@@ -1907,7 +2042,7 @@ static void __raid_recover_end_io(struct btrfs_raid_bio *rbio)
pointers = kcalloc(rbio->real_stripes, sizeof(void *), GFP_NOFS);
if (!pointers) {
err = BLK_STS_RESOURCE;
- goto cleanup_io;
+ goto cleanup;
}
/*
@@ -1917,11 +2052,12 @@ static void __raid_recover_end_io(struct btrfs_raid_bio *rbio)
unmap_array = kcalloc(rbio->real_stripes, sizeof(void *), GFP_NOFS);
if (!unmap_array) {
err = BLK_STS_RESOURCE;
- goto cleanup_pointers;
+ goto cleanup;
}
- faila = rbio->faila;
- failb = rbio->failb;
+ /* Make sure faila and failb are in order. */
+ if (rbio->faila >= 0 && rbio->failb >= 0 && rbio->faila > rbio->failb)
+ swap(rbio->faila, rbio->failb);
if (rbio->operation == BTRFS_RBIO_READ_REBUILD ||
rbio->operation == BTRFS_RBIO_REBUILD_MISSING) {
@@ -1932,138 +2068,15 @@ static void __raid_recover_end_io(struct btrfs_raid_bio *rbio)
index_rbio_pages(rbio);
- for (sectornr = 0; sectornr < rbio->stripe_nsectors; sectornr++) {
- struct sector_ptr *sector;
-
- /*
- * Now we just use bitmap to mark the horizontal stripes in
- * which we have data when doing parity scrub.
- */
- if (rbio->operation == BTRFS_RBIO_PARITY_SCRUB &&
- !test_bit(sectornr, &rbio->dbitmap))
- continue;
-
- /*
- * Setup our array of pointers with sectors from each stripe
- *
- * NOTE: store a duplicate array of pointers to preserve the
- * pointer order
- */
- for (stripe = 0; stripe < rbio->real_stripes; stripe++) {
- /*
- * If we're rebuilding a read, we have to use
- * pages from the bio list
- */
- if ((rbio->operation == BTRFS_RBIO_READ_REBUILD ||
- rbio->operation == BTRFS_RBIO_REBUILD_MISSING) &&
- (stripe == faila || stripe == failb)) {
- sector = sector_in_rbio(rbio, stripe, sectornr, 0);
- } else {
- sector = rbio_stripe_sector(rbio, stripe, sectornr);
- }
- ASSERT(sector->page);
- pointers[stripe] = kmap_local_page(sector->page) +
- sector->pgoff;
- unmap_array[stripe] = pointers[stripe];
- }
-
- /* All raid6 handling here */
- if (rbio->bioc->map_type & BTRFS_BLOCK_GROUP_RAID6) {
- /* Single failure, rebuild from parity raid5 style */
- if (failb < 0) {
- if (faila == rbio->nr_data) {
- /*
- * Just the P stripe has failed, without
- * a bad data or Q stripe.
- * TODO, we should redo the xor here.
- */
- err = BLK_STS_IOERR;
- goto cleanup;
- }
- /*
- * a single failure in raid6 is rebuilt
- * in the pstripe code below
- */
- goto pstripe;
- }
-
- /* make sure our ps and qs are in order */
- if (faila > failb)
- swap(faila, failb);
-
- /* if the q stripe is failed, do a pstripe reconstruction
- * from the xors.
- * If both the q stripe and the P stripe are failed, we're
- * here due to a crc mismatch and we can't give them the
- * data they want
- */
- if (rbio->bioc->raid_map[failb] == RAID6_Q_STRIPE) {
- if (rbio->bioc->raid_map[faila] ==
- RAID5_P_STRIPE) {
- err = BLK_STS_IOERR;
- goto cleanup;
- }
- /*
- * otherwise we have one bad data stripe and
- * a good P stripe. raid5!
- */
- goto pstripe;
- }
-
- if (rbio->bioc->raid_map[failb] == RAID5_P_STRIPE) {
- raid6_datap_recov(rbio->real_stripes,
- sectorsize, faila, pointers);
- } else {
- raid6_2data_recov(rbio->real_stripes,
- sectorsize, faila, failb,
- pointers);
- }
- } else {
- void *p;
-
- /* rebuild from P stripe here (raid5 or raid6) */
- BUG_ON(failb != -1);
-pstripe:
- /* Copy parity block into failed block to start with */
- memcpy(pointers[faila], pointers[rbio->nr_data], sectorsize);
-
- /* rearrange the pointer array */
- p = pointers[faila];
- for (stripe = faila; stripe < rbio->nr_data - 1; stripe++)
- pointers[stripe] = pointers[stripe + 1];
- pointers[rbio->nr_data - 1] = p;
-
- /* xor in the rest */
- run_xor(pointers, rbio->nr_data - 1, sectorsize);
- }
-
- /*
- * No matter if this is a RMW or recovery, we should have all
- * failed sectors repaired, thus they are now uptodate.
- * Especially if we determine to cache the rbio, we need to
- * have at least all data sectors uptodate.
- */
- for (i = 0; i < rbio->stripe_nsectors; i++) {
- if (faila != -1) {
- sector = rbio_stripe_sector(rbio, faila, i);
- sector->uptodate = 1;
- }
- if (failb != -1) {
- sector = rbio_stripe_sector(rbio, failb, i);
- sector->uptodate = 1;
- }
- }
- for (stripe = rbio->real_stripes - 1; stripe >= 0; stripe--)
- kunmap_local(unmap_array[stripe]);
- }
+ for (sectornr = 0; sectornr < rbio->stripe_nsectors; sectornr++)
+ recover_vertical(rbio, sectornr, pointers, unmap_array);
err = BLK_STS_OK;
+
cleanup:
kfree(unmap_array);
-cleanup_pointers:
kfree(pointers);
-cleanup_io:
/*
* Similar to READ_REBUILD, REBUILD_MISSING at this point also has a
* valid rbio which is consistent with ondisk content, thus such a
--
2.38.1
* [PATCH 2/8] btrfs: raid56: extract the pq generation code into a helper
2022-10-26 5:06 [PATCH 0/8] btrfs: raid56: use submit-and-wait method to handle raid56 writes Qu Wenruo
2022-10-26 5:06 ` [PATCH 1/8] btrfs: raid56: extract the vertical stripe recovery code into recover_vertical() Qu Wenruo
@ 2022-10-26 5:06 ` Qu Wenruo
2022-10-26 5:06 ` [PATCH 3/8] btrfs: raid56: introduce a new framework for RAID56 writes Qu Wenruo
` (6 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2022-10-26 5:06 UTC
To: linux-btrfs
Currently finish_rmw() will update the P/Q stripes before submitting
the writes.
It's done inside a for loop, which is a little congested indent-wise, so
extract the code into a helper called generate_pq_vertical().
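For reference, the parity math the helper applies to each vertical
stripe, as a sketch (D0..D(n-1) denote the data sectors of one vertical
stripe):

	RAID5:  P = D0 ^ D1 ^ ... ^ D(n-1)       (memcpy() + run_xor())
	RAID6:  P and Q both filled by raid6_call.gen_syndrome()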
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/raid56.c | 90 +++++++++++++++++++++++------------------------
1 file changed, 44 insertions(+), 46 deletions(-)
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 085c549a09a9..acf36fcaa9f2 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1192,6 +1192,48 @@ static void bio_get_trace_info(struct btrfs_raid_bio *rbio, struct bio *bio,
trace_info->stripe_nr = -1;
}
+/* Generate PQ for one vertical stripe. */
+static void generate_pq_vertical(struct btrfs_raid_bio *rbio, int sectornr)
+{
+ void **pointers = rbio->finish_pointers;
+ const u32 sectorsize = rbio->bioc->fs_info->sectorsize;
+ struct sector_ptr *sector;
+ int stripe;
+ const bool has_qstripe = rbio->bioc->map_type & BTRFS_BLOCK_GROUP_RAID6;
+
+ /* First collect one sector from each data stripe */
+ for (stripe = 0; stripe < rbio->nr_data; stripe++) {
+ sector = sector_in_rbio(rbio, stripe, sectornr, 0);
+ pointers[stripe] = kmap_local_page(sector->page) +
+ sector->pgoff;
+ }
+
+ /* Then add the parity stripe */
+ sector = rbio_pstripe_sector(rbio, sectornr);
+ sector->uptodate = 1;
+ pointers[stripe++] = kmap_local_page(sector->page) + sector->pgoff;
+
+ if (has_qstripe) {
+ /*
+ * RAID6, add the qstripe and call the library function
+ * to fill in our p/q
+ */
+ sector = rbio_qstripe_sector(rbio, sectornr);
+ sector->uptodate = 1;
+ pointers[stripe++] = kmap_local_page(sector->page) +
+ sector->pgoff;
+
+ raid6_call.gen_syndrome(rbio->real_stripes, sectorsize,
+ pointers);
+ } else {
+ /* raid5 */
+ memcpy(pointers[rbio->nr_data], pointers[0], sectorsize);
+ run_xor(pointers + 1, rbio->nr_data - 1, sectorsize);
+ }
+ for (stripe = stripe - 1; stripe >= 0; stripe--)
+ kunmap_local(pointers[stripe]);
+}
+
/*
* this is called from one of two situations. We either
* have a full stripe from the higher layers, or we've read all
@@ -1203,28 +1245,17 @@ static void bio_get_trace_info(struct btrfs_raid_bio *rbio, struct bio *bio,
static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
{
struct btrfs_io_context *bioc = rbio->bioc;
- const u32 sectorsize = bioc->fs_info->sectorsize;
- void **pointers = rbio->finish_pointers;
- int nr_data = rbio->nr_data;
/* The total sector number inside the full stripe. */
int total_sector_nr;
int stripe;
/* Sector number inside a stripe. */
int sectornr;
- bool has_qstripe;
struct bio_list bio_list;
struct bio *bio;
int ret;
bio_list_init(&bio_list);
- if (rbio->real_stripes - rbio->nr_data == 1)
- has_qstripe = false;
- else if (rbio->real_stripes - rbio->nr_data == 2)
- has_qstripe = true;
- else
- BUG();
-
/* We should have at least one data sector. */
ASSERT(bitmap_weight(&rbio->dbitmap, rbio->stripe_nsectors));
@@ -1257,41 +1288,8 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
else
clear_bit(RBIO_CACHE_READY_BIT, &rbio->flags);
- for (sectornr = 0; sectornr < rbio->stripe_nsectors; sectornr++) {
- struct sector_ptr *sector;
-
- /* First collect one sector from each data stripe */
- for (stripe = 0; stripe < nr_data; stripe++) {
- sector = sector_in_rbio(rbio, stripe, sectornr, 0);
- pointers[stripe] = kmap_local_page(sector->page) +
- sector->pgoff;
- }
-
- /* Then add the parity stripe */
- sector = rbio_pstripe_sector(rbio, sectornr);
- sector->uptodate = 1;
- pointers[stripe++] = kmap_local_page(sector->page) + sector->pgoff;
-
- if (has_qstripe) {
- /*
- * RAID6, add the qstripe and call the library function
- * to fill in our p/q
- */
- sector = rbio_qstripe_sector(rbio, sectornr);
- sector->uptodate = 1;
- pointers[stripe++] = kmap_local_page(sector->page) +
- sector->pgoff;
-
- raid6_call.gen_syndrome(rbio->real_stripes, sectorsize,
- pointers);
- } else {
- /* raid5 */
- memcpy(pointers[nr_data], pointers[0], sectorsize);
- run_xor(pointers + 1, nr_data - 1, sectorsize);
- }
- for (stripe = stripe - 1; stripe >= 0; stripe--)
- kunmap_local(pointers[stripe]);
- }
+ for (sectornr = 0; sectornr < rbio->stripe_nsectors; sectornr++)
+ generate_pq_vertical(rbio, sectornr);
/*
* Start writing. Make bios for everything from the higher layers (the
--
2.38.1
* [PATCH 3/8] btrfs: raid56: introduce a new framework for RAID56 writes
2022-10-26 5:06 [PATCH 0/8] btrfs: raid56: use submit-and-wait method to handle raid56 writes Qu Wenruo
2022-10-26 5:06 ` [PATCH 1/8] btrfs: raid56: extract the vertical stripe recovery code into recover_vertical() Qu Wenruo
2022-10-26 5:06 ` [PATCH 2/8] btrfs: raid56: extract the pq generation code into a helper Qu Wenruo
@ 2022-10-26 5:06 ` Qu Wenruo
2022-10-26 5:06 ` [PATCH 4/8] btrfs: raid56: implement the read part for run_one_write_rbio() Qu Wenruo
` (5 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2022-10-26 5:06 UTC
To: linux-btrfs
Currently the raid56 code makes a strong distinction between full-stripe
and sub-stripe writes, and uses end_io functions to jump to the next
step of the RMW cycle.
This has several disadvantages:
- Very hard to follow workflow
One has to jump several times to follow the workflow.
Just like OS context switch, it's also expensive for human to do
context switch.
- Not much shared code for raid56 write path
In fact, there are 3 types of writes for raid56:
* Sub-stripe writes without cached rbio
We need to do a full RMW cycle.
* Sub-stripe writes with cached rbio
We have all the data needed and can submit writes directly
* Full-stripe writes
Just the same as sub-stripe writes with a cache hit.
As one can see, full-stripe writes are not much different from
sub-stripe writes, especially when a sub-stripe write has a cache hit.
It's more reasonable to handle all the writes in a single function.
So this patch will introduce a skeleton function called
run_one_write_rbio(), to do all the write operations.
Unlike the existing code, it will follow the submit-and-wait idea, so
that it should be much easier to follow the workflow, and it will handle
all sub-stripe/full-stripe writes using the same function.
Currently no real read/write path is implemented, just a skeleton to
show the idea.
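The submit-and-wait pattern at the core of the new framework, in
miniature (a sketch only, the real skeleton is in the diff below):

	/* Submitter: queue all bios, then sleep until all complete. */
	atomic_set(&rbio->stripes_pending, bios_to_submit);
	while ((bio = bio_list_pop(&bio_list)))
		submit_bio(bio);
	wait_event(rbio->io_wait, atomic_read(&rbio->stripes_pending) == 0);

Each endio handler then only decrements rbio->stripes_pending and calls
wake_up(&rbio->io_wait), instead of jumping to the next RMW stage.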
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/raid56.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++
fs/btrfs/raid56.h | 4 +++
2 files changed, 71 insertions(+)
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index acf36fcaa9f2..c3b33fb8c033 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -988,6 +988,7 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
}
bio_list_init(&rbio->bio_list);
+ init_waitqueue_head(&rbio->io_wait);
INIT_LIST_HEAD(&rbio->plug_list);
spin_lock_init(&rbio->bio_list_lock);
INIT_LIST_HEAD(&rbio->stripe_cache);
@@ -1039,6 +1040,19 @@ static int alloc_rbio_parity_pages(struct btrfs_raid_bio *rbio)
return 0;
}
+static int alloc_rbio_data_pages(struct btrfs_raid_bio *rbio)
+{
+ const int data_pages = rbio->nr_data * rbio->stripe_npages;
+ int ret;
+
+ ret = btrfs_alloc_page_array(data_pages, rbio->stripe_pages);
+ if (ret < 0)
+ return ret;
+
+ index_stripe_sectors(rbio);
+ return 0;
+}
+
/*
* Add a single sector @sector into our list of bios for IO.
*
@@ -2803,3 +2817,56 @@ void raid56_submit_missing_rbio(struct btrfs_raid_bio *rbio)
if (!lock_stripe_add(rbio))
start_async_work(rbio, read_rebuild_work);
}
+
+/*
+ * This is the main entry to run a write rbio, which will do read-modify-write
+ * cycle.
+ *
+ * Caller should ensure the rbio is holding the full stripe lock.
+ */
+void run_one_write_rbio(struct btrfs_raid_bio *rbio)
+{
+ int ret = 0;
+
+ /*
+ * Allocate the pages for parity first, as P/Q pages will always be
+ * needed for both full-stripe and sub-stripe writes.
+ */
+ ret = alloc_rbio_parity_pages(rbio);
+ if (ret < 0)
+ goto out;
+
+ /* Full stripe write, can write the full stripe right now. */
+ if (rbio_is_full(rbio))
+ goto write;
+
+ /*
+ * Now we're doing sub-stripe write, need the extra stripe_pages to do
+ * the full RMW.
+ */
+ ret = alloc_rbio_data_pages(rbio);
+ if (ret < 0)
+ goto out;
+
+ /* Place holder for reading the missing sectors. */
+
+ /*
+ * We may or may not have submitted any reads, but it doesn't matter.
+ * Just wait until stripes_pending is zero.
+ */
+ wait_event(rbio->io_wait, atomic_read(&rbio->stripes_pending) == 0);
+
+ /* Place holder for extra verification for above reads and data csum. */
+write:
+ /* Place holder for real write code. */
+ wait_event(rbio->io_wait, atomic_read(&rbio->stripes_pending) == 0);
+ if (atomic_read(&rbio->error) > rbio->bioc->max_errors)
+ ret = -EIO;
+
+out:
+ /*
+ * This function needs extra work, as unlock_stripe() will still queue
+ * the next rbio using the old function entrance.
+ */
+ rbio_orig_end_io(rbio, errno_to_blk_status(ret));
+}
diff --git a/fs/btrfs/raid56.h b/fs/btrfs/raid56.h
index 91d5c0adad15..8657cafd32c0 100644
--- a/fs/btrfs/raid56.h
+++ b/fs/btrfs/raid56.h
@@ -95,6 +95,8 @@ struct btrfs_raid_bio {
atomic_t error;
+ wait_queue_head_t io_wait;
+
struct work_struct end_io_work;
/* Bitmap to record which horizontal stripe has data */
@@ -183,4 +185,6 @@ void raid56_submit_missing_rbio(struct btrfs_raid_bio *rbio);
int btrfs_alloc_stripe_hash_table(struct btrfs_fs_info *info);
void btrfs_free_stripe_hash_table(struct btrfs_fs_info *info);
+void run_one_write_rbio(struct btrfs_raid_bio *rbio);
+
#endif
--
2.38.1
* [PATCH 4/8] btrfs: raid56: implement the read part for run_one_write_rbio()
2022-10-26 5:06 [PATCH 0/8] btrfs: raid56: use submit-and-wait method to handle raid56 writes Qu Wenruo
` (2 preceding siblings ...)
2022-10-26 5:06 ` [PATCH 3/8] btrfs: raid56: introduce a new framework for RAID56 writes Qu Wenruo
@ 2022-10-26 5:06 ` Qu Wenruo
2022-10-26 5:06 ` [PATCH 5/8] btrfs: raid56: implement the degraded write " Qu Wenruo
` (4 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2022-10-26 5:06 UTC
To: linux-btrfs
The read part is mostly identical to raid56_rmw_stripe(); the
differences are:
- The endio function
To cooperate with the new submit-and-wait idea.
- The error handling
The error handling for the original bios will be properly done in the
caller, thus the function only needs to handle its own bio list.
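So the error path of the new helper reduces to cleaning up its own bio
list (condensed from the diff below):

	error:
		while ((bio = bio_list_pop(&bio_list)))
			bio_put(bio);
		return -EIO;

The original bios are then failed by the caller, run_one_write_rbio(),
via rbio_orig_end_io().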
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/raid56.c | 88 ++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 87 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index c3b33fb8c033..ffcb9bb226be 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -2818,6 +2818,89 @@ void raid56_submit_missing_rbio(struct btrfs_raid_bio *rbio)
start_async_work(rbio, read_rebuild_work);
}
+static void raid_wait_read_end_io(struct bio *bio)
+{
+ struct btrfs_raid_bio *rbio = bio->bi_private;
+
+ if (bio->bi_status)
+ fail_bio_stripe(rbio, bio);
+ else
+ set_bio_pages_uptodate(rbio, bio);
+
+ bio_put(bio);
+ atomic_dec(&rbio->stripes_pending);
+ wake_up(&rbio->io_wait);
+}
+
+static int rmw_submit_reads(struct btrfs_raid_bio *rbio)
+{
+ int bios_to_read = 0;
+ struct bio_list bio_list;
+ const int nr_data_sectors = rbio->stripe_nsectors * rbio->nr_data;
+ int ret;
+ int total_sector_nr;
+ struct bio *bio;
+
+ bio_list_init(&bio_list);
+ atomic_set(&rbio->error, 0);
+
+ /* Build a list of bios to read all the missing data sectors. */
+ for (total_sector_nr = 0; total_sector_nr < nr_data_sectors;
+ total_sector_nr++) {
+ struct sector_ptr *sector;
+ int stripe = total_sector_nr / rbio->stripe_nsectors;
+ int sectornr = total_sector_nr % rbio->stripe_nsectors;
+
+ /*
+ * We want to find all the sectors missing from the rbio and
+ * read them from the disk. If sector_in_rbio() finds a page
+ * in the bio list we don't need to read it off the stripe.
+ */
+ sector = sector_in_rbio(rbio, stripe, sectornr, 1);
+ if (sector)
+ continue;
+
+ sector = rbio_stripe_sector(rbio, stripe, sectornr);
+ /*
+ * The rbio cache may have handed us an uptodate page. If so,
+ * use it.
+ */
+ if (sector->uptodate)
+ continue;
+
+ ret = rbio_add_io_sector(rbio, &bio_list, sector,
+ stripe, sectornr, REQ_OP_READ);
+ if (ret)
+ goto error;
+ }
+
+ bios_to_read = bio_list_size(&bio_list);
+ /* This can happen if we have a cached bio. */
+ if (!bios_to_read)
+ return 0;
+
+ atomic_set(&rbio->stripes_pending, bios_to_read);
+ while ((bio = bio_list_pop(&bio_list))) {
+ bio->bi_end_io = raid_wait_read_end_io;
+
+ if (trace_raid56_read_partial_enabled()) {
+ struct raid56_bio_trace_info trace_info = { 0 };
+
+ bio_get_trace_info(rbio, bio, &trace_info);
+ trace_raid56_read_partial(rbio, bio, &trace_info);
+ }
+ submit_bio(bio);
+ }
+ return 0;
+
+error:
+ while ((bio = bio_list_pop(&bio_list)))
+ bio_put(bio);
+
+ return -EIO;
+}
+
/*
* This is the main entry to run a write rbio, which will do read-modify-write
* cycle.
@@ -2848,7 +2931,10 @@ void run_one_write_rbio(struct btrfs_raid_bio *rbio)
if (ret < 0)
goto out;
- /* Place holder for reading the missing sectors. */
+ /* Read the missing sectors. */
+ ret = rmw_submit_reads(rbio);
+ if (ret < 0)
+ goto out;
/*
* We may or may not have submitted any reads, but it doesn't matter.
--
2.38.1
* [PATCH 5/8] btrfs: raid56: implement the degraded write for run_one_write_rbio()
2022-10-26 5:06 [PATCH 0/8] btrfs: raid56: use submit-and-wait method to handle raid56 writes Qu Wenruo
` (3 preceding siblings ...)
2022-10-26 5:06 ` [PATCH 4/8] btrfs: raid56: implement the read part for run_one_write_rbio() Qu Wenruo
@ 2022-10-26 5:06 ` Qu Wenruo
2022-10-26 5:06 ` [PATCH 6/8] btrfs: raid56: implement the write submission part for run_one_write_rbio() Qu Wenruo
` (3 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2022-10-26 5:06 UTC
To: linux-btrfs
For a degraded mount (with a missing device), before doing the RMW we
should read out all the good sectors and recover the data of the
missing device.
This patch will implement the following new functions:
- recover_submit_reads()
This is different from rmw_submit_reads() by:
* won't trust any cache
* will read P/Q stripes
- recover_one_rbio()
Mostly the recovery part of __raid_recover_end_io() but follows the
submit-and-wait idea.
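With this, the degraded part of run_one_write_rbio() becomes roughly
(a sketch of where the series ends up; rmw_submit_writes() only appears
in the next patch):

	rmw_submit_reads(rbio);         /* read the missing data sectors */
	wait_event(rbio->io_wait, atomic_read(&rbio->stripes_pending) == 0);
	if (rbio->faila >= 0 || rbio->failb >= 0)
		recover_one_rbio(rbio); /* rebuild the missing stripes */
	rmw_submit_writes(rbio);        /* write data and the new P/Q */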
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/raid56.c | 130 +++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 129 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index ffcb9bb226be..592286783e95 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -2901,6 +2901,125 @@ static int rmw_submit_reads(struct btrfs_raid_bio *rbio)
return -EIO;
}
+static int recover_submit_reads(struct btrfs_raid_bio *rbio)
+{
+ int bios_to_read = 0;
+ struct bio_list bio_list;
+ int ret;
+ int total_sector_nr;
+ struct bio *bio;
+
+ ASSERT(rbio->faila >= 0 || rbio->failb >= 0);
+ bio_list_init(&bio_list);
+ atomic_set(&rbio->error, 0);
+
+ /* Read out all the sectors that didn't fail. */
+ for (total_sector_nr = 0; total_sector_nr < rbio->nr_sectors;
+ total_sector_nr++) {
+ int stripe = total_sector_nr / rbio->stripe_nsectors;
+ int sectornr = total_sector_nr % rbio->stripe_nsectors;
+ struct sector_ptr *sector;
+
+ if (rbio->faila == stripe || rbio->failb == stripe) {
+ atomic_inc(&rbio->error);
+ /* Skip the current stripe. */
+ ASSERT(sectornr == 0);
+ total_sector_nr += rbio->stripe_nsectors - 1;
+ continue;
+ }
+ /* We don't trust any cache this time. */
+ sector = rbio_stripe_sector(rbio, stripe, sectornr);
+ ret = rbio_add_io_sector(rbio, &bio_list, sector, stripe,
+ sectornr, REQ_OP_READ);
+ if (ret < 0)
+ goto error;
+ }
+
+ bios_to_read = bio_list_size(&bio_list);
+ /*
+ * We should always need to read some stripes, as we don't use
+ * any cache.
+ */
+ ASSERT(bios_to_read);
+
+ atomic_set(&rbio->stripes_pending, bios_to_read);
+ while ((bio = bio_list_pop(&bio_list))) {
+ bio->bi_end_io = raid_wait_read_end_io;
+
+ if (trace_raid56_read_partial_enabled()) {
+ struct raid56_bio_trace_info trace_info = { 0 };
+
+ bio_get_trace_info(rbio, bio, &trace_info);
+ trace_raid56_read_partial(rbio, bio, &trace_info);
+ }
+ submit_bio(bio);
+ }
+ return 0;
+
+error:
+ while ((bio = bio_list_pop(&bio_list)))
+ bio_put(bio);
+
+ return -EIO;
+}
+
+static int recover_one_rbio(struct btrfs_raid_bio *rbio)
+{
+ void **pointers = NULL;
+ void **unmap_array = NULL;
+ int ret = 0;
+ int i;
+
+ /*
+ * @pointers array stores the pointer for each sector, thus it has the
+ * extra pgoff value added from each sector
+ *
+ * @unmap_array is a copy of pointers that does not get reordered during
+ * reconstruction so that kunmap_local works.
+ * This is to keep the order of kunmap_local().
+ */
+ pointers = kcalloc(rbio->real_stripes, sizeof(void *), GFP_NOFS);
+ unmap_array = kcalloc(rbio->real_stripes, sizeof(void *), GFP_NOFS);
+ if (!pointers || !unmap_array) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ /*
+ * First read out all the sectors that didn't fail. This time we won't
+ * trust any cache.
+ */
+ ret = recover_submit_reads(rbio);
+ if (ret < 0)
+ goto out;
+ wait_event(rbio->io_wait, atomic_read(&rbio->stripes_pending) == 0);
+ if (atomic_read(&rbio->error) > rbio->bioc->max_errors) {
+ ret = -EIO;
+ goto out;
+ }
+
+ /* Make sure faila and failb are in order. */
+ if (rbio->faila >= 0 && rbio->failb >= 0 && rbio->faila > rbio->failb)
+ swap(rbio->faila, rbio->failb);
+ if (rbio->operation == BTRFS_RBIO_READ_REBUILD ||
+ rbio->operation == BTRFS_RBIO_REBUILD_MISSING) {
+ spin_lock_irq(&rbio->bio_list_lock);
+ set_bit(RBIO_RMW_LOCKED_BIT, &rbio->flags);
+ spin_unlock_irq(&rbio->bio_list_lock);
+ }
+
+ /* Now recover the full stripe. */
+ for (i = 0; i < rbio->stripe_nsectors; i++)
+ recover_vertical(rbio, i, pointers, unmap_array);
+
+ index_rbio_pages(rbio);
+
+out:
+ kfree(pointers);
+ kfree(unmap_array);
+ return ret;
+}
+
/*
* This is the main entry to run a write rbio, which will do read-modify-write
* cycle.
@@ -2942,7 +3061,16 @@ void run_one_write_rbio(struct btrfs_raid_bio *rbio)
*/
wait_event(rbio->io_wait, atomic_read(&rbio->stripes_pending) == 0);
- /* Place holder for extra verification for above reads and data csum. */
+ /*
+ * We have some sectors on the missing device(s), and need to recover
+ * the full stripe before writing.
+ */
+ if (rbio->faila >= 0 || rbio->failb >= 0) {
+ ret = recover_one_rbio(rbio);
+ if (ret < 0)
+ goto out;
+ }
+
write:
/* Place holder for real write code. */
wait_event(rbio->io_wait, atomic_read(&rbio->stripes_pending) == 0);
--
2.38.1
* [PATCH 6/8] btrfs: raid56: implement the write submission part for run_one_write_rbio()
2022-10-26 5:06 [PATCH 0/8] btrfs: raid56: use submit-and-wait method to handle raid56 writes Qu Wenruo
` (4 preceding siblings ...)
2022-10-26 5:06 ` [PATCH 5/8] btrfs: raid56: implement the degraded write " Qu Wenruo
@ 2022-10-26 5:06 ` Qu Wenruo
2022-10-26 5:06 ` [PATCH 7/8] btrfs: raid56: implement raid56_parity_write_v2() Qu Wenruo
` (2 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2022-10-26 5:06 UTC
To: linux-btrfs
This is mostly the same as finish_rmw(); the differences are:
- Use a dedicated endio
Which follows the submit-and-wait idea.
- Error handling
No need to call rbio_orig_end_io().
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/raid56.c | 160 +++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 159 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 592286783e95..4f648720b97a 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -3020,6 +3020,162 @@ static int recover_one_rbio(struct btrfs_raid_bio *rbio)
return ret;
}
+static void raid_wait_write_end_io(struct bio *bio)
+{
+ struct btrfs_raid_bio *rbio = bio->bi_private;
+ blk_status_t err = bio->bi_status;
+
+ if (err)
+ fail_bio_stripe(rbio, bio);
+ bio_put(bio);
+ atomic_dec(&rbio->stripes_pending);
+ wake_up(&rbio->io_wait);
+}
+
+static int rmw_submit_writes(struct btrfs_raid_bio *rbio)
+{
+ struct btrfs_io_context *bioc = rbio->bioc;
+ /* The total sector number inside the full stripe. */
+ int total_sector_nr;
+ int stripe;
+ /* Sector number inside a stripe. */
+ int sectornr;
+ struct bio_list bio_list;
+ struct bio *bio;
+ int ret = 0;
+
+ bio_list_init(&bio_list);
+
+ /* We should have at least one data sector. */
+ ASSERT(bitmap_weight(&rbio->dbitmap, rbio->stripe_nsectors));
+
+ /*
+ * At this point we either have a full stripe,
+ * or we've read the full stripe from the drive.
+ * Recalculate the parity and write the new results.
+ *
+ * We're not allowed to add any new bios to the
+ * bio list here, anyone else that wants to
+ * change this stripe needs to do their own rmw.
+ */
+ spin_lock_irq(&rbio->bio_list_lock);
+ set_bit(RBIO_RMW_LOCKED_BIT, &rbio->flags);
+ spin_unlock_irq(&rbio->bio_list_lock);
+
+ atomic_set(&rbio->error, 0);
+ rbio->faila = -1;
+ rbio->failb = -1;
+
+ /*
+ * Now that we've set rmw_locked, run through the
+ * bio list one last time and map the page pointers
+ *
+ * We don't cache full rbios because we're assuming
+ * the higher layers are unlikely to use this area of
+ * the disk again soon. If they do use it again,
+ * hopefully they will send another full bio.
+ */
+ index_rbio_pages(rbio);
+ if (!rbio_is_full(rbio))
+ cache_rbio_pages(rbio);
+ else
+ clear_bit(RBIO_CACHE_READY_BIT, &rbio->flags);
+
+ for (sectornr = 0; sectornr < rbio->stripe_nsectors; sectornr++)
+ generate_pq_vertical(rbio, sectornr);
+
+ /*
+ * Assemble the write bios.
+ * For data bios always use the bio from higher layer.
+ */
+ for (total_sector_nr = 0; total_sector_nr < rbio->nr_sectors;
+ total_sector_nr++) {
+ struct sector_ptr *sector;
+
+ stripe = total_sector_nr / rbio->stripe_nsectors;
+ sectornr = total_sector_nr % rbio->stripe_nsectors;
+
+ /* This vertical stripe has no data, skip it. */
+ if (!test_bit(sectornr, &rbio->dbitmap))
+ continue;
+
+ if (stripe < rbio->nr_data) {
+ sector = sector_in_rbio(rbio, stripe, sectornr, 1);
+ if (!sector)
+ continue;
+ } else {
+ sector = rbio_stripe_sector(rbio, stripe, sectornr);
+ }
+
+ ret = rbio_add_io_sector(rbio, &bio_list, sector, stripe,
+ sectornr, REQ_OP_WRITE);
+ if (ret)
+ goto cleanup;
+ }
+
+ if (likely(!bioc->num_tgtdevs))
+ goto write_data;
+
+ /* Assemble write bios for dev-replace target. */
+ for (total_sector_nr = 0; total_sector_nr < rbio->nr_sectors;
+ total_sector_nr++) {
+ struct sector_ptr *sector;
+
+ stripe = total_sector_nr / rbio->stripe_nsectors;
+ sectornr = total_sector_nr % rbio->stripe_nsectors;
+
+ if (!bioc->tgtdev_map[stripe]) {
+ /*
+ * We can skip the whole stripe completely, note
+ * total_sector_nr will be increased by one anyway.
+ */
+ ASSERT(sectornr == 0);
+ total_sector_nr += rbio->stripe_nsectors - 1;
+ continue;
+ }
+
+ /* This vertical stripe has no data, skip it. */
+ if (!test_bit(sectornr, &rbio->dbitmap))
+ continue;
+
+ if (stripe < rbio->nr_data) {
+ sector = sector_in_rbio(rbio, stripe, sectornr, 1);
+ if (!sector)
+ continue;
+ } else {
+ sector = rbio_stripe_sector(rbio, stripe, sectornr);
+ }
+
+ ret = rbio_add_io_sector(rbio, &bio_list, sector,
+ rbio->bioc->tgtdev_map[stripe],
+ sectornr, REQ_OP_WRITE);
+ if (ret)
+ goto cleanup;
+ }
+
+write_data:
+ ASSERT(bio_list_size(&bio_list));
+ atomic_set(&rbio->stripes_pending, bio_list_size(&bio_list));
+
+ while ((bio = bio_list_pop(&bio_list))) {
+ bio->bi_end_io = raid_wait_write_end_io;
+
+ if (trace_raid56_write_stripe_enabled()) {
+ struct raid56_bio_trace_info trace_info = { 0 };
+
+ bio_get_trace_info(rbio, bio, &trace_info);
+ trace_raid56_write_stripe(rbio, bio, &trace_info);
+ }
+ submit_bio(bio);
+ }
+ return 0;
+
+cleanup:
+ while ((bio = bio_list_pop(&bio_list)))
+ bio_put(bio);
+ return ret;
+}
+
/*
* This is the main entry to run a write rbio, which will do read-modify-write
* cycle.
@@ -3072,7 +3228,9 @@ void run_one_write_rbio(struct btrfs_raid_bio *rbio)
}
write:
- /* Place holder for real write code. */
+ ret = rmw_submit_writes(rbio);
+ if (ret < 0)
+ goto out;
wait_event(rbio->io_wait, atomic_read(&rbio->stripes_pending) == 0);
if (atomic_read(&rbio->error) > rbio->bioc->max_errors)
ret = -EIO;
--
2.38.1
* [PATCH 7/8] btrfs: raid56: implement raid56_parity_write_v2()
2022-10-26 5:06 [PATCH 0/8] btrfs: raid56: use submit-and-wait method to handle raid56 writes Qu Wenruo
` (5 preceding siblings ...)
2022-10-26 5:06 ` [PATCH 6/8] btrfs: raid56: implement the write submission part for run_one_write_rbio() Qu Wenruo
@ 2022-10-26 5:06 ` Qu Wenruo
2022-10-26 5:06 ` [PATCH 8/8] btrfs: raid56: switch to the new run_one_write_rbio() Qu Wenruo
2022-10-31 7:39 ` [PATCH 0/8] btrfs: raid56: use submit-and-wait method to handle raid56 writes Qu Wenruo
8 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2022-10-26 5:06 UTC
To: linux-btrfs
This is the main entrance for raid56 writes.
The differences between this and the old one are:
- Unified queuing
Now, no matter whether we're doing a full-stripe or a sub-stripe write,
we always queue the work into rmw_workers, and in that context we grab
the full stripe lock and call run_one_write_rbio().
- Simplified plug
Since no work is run in the caller's context, there is no need to
defer the unplug work.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/raid56.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++
fs/btrfs/raid56.h | 1 +
2 files changed, 95 insertions(+)
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 4f648720b97a..96be2764433e 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -3242,3 +3242,97 @@ void run_one_write_rbio(struct btrfs_raid_bio *rbio)
*/
rbio_orig_end_io(rbio, errno_to_blk_status(ret));
}
+
+static void run_one_write_rbio_work(struct work_struct *work)
+{
+ struct btrfs_raid_bio *rbio =
+ container_of(work, struct btrfs_raid_bio, work);
+ int ret;
+
+ ret = lock_stripe_add(rbio);
+ if (ret == 0)
+ run_one_write_rbio(rbio);
+}
+
+static void queue_write_rbio(struct btrfs_raid_bio *rbio)
+{
+ INIT_WORK(&rbio->work, run_one_write_rbio_work);
+ queue_work(rbio->bioc->fs_info->rmw_workers, &rbio->work);
+}
+
+static void raid_unplug(struct blk_plug_cb *cb, bool from_schedule)
+{
+ struct btrfs_plug_cb *plug = container_of(cb, struct btrfs_plug_cb, cb);
+ struct btrfs_raid_bio *cur;
+ struct btrfs_raid_bio *last = NULL;
+
+ list_sort(NULL, &plug->rbio_list, plug_cmp);
+
+ while (!list_empty(&plug->rbio_list)) {
+ cur = list_entry(plug->rbio_list.next,
+ struct btrfs_raid_bio, plug_list);
+ list_del_init(&cur->plug_list);
+
+ if (rbio_is_full(cur)) {
+ /* We have a full stripe, queue it down. */
+ queue_write_rbio(cur);
+ continue;
+ }
+ if (last) {
+ if (rbio_can_merge(last, cur)) {
+ merge_rbio(last, cur);
+ free_raid_bio(cur);
+ continue;
+ }
+ queue_write_rbio(cur);
+ }
+ last = cur;
+ }
+ if (last)
+ queue_write_rbio(cur);
+ kfree(plug);
+}
+
+void raid56_parity_write_v2(struct bio *bio, struct btrfs_io_context *bioc)
+{
+ struct btrfs_fs_info *fs_info = bioc->fs_info;
+ struct btrfs_raid_bio *rbio;
+ struct btrfs_plug_cb *plug = NULL;
+ struct blk_plug_cb *cb;
+ int ret = 0;
+
+ rbio = alloc_rbio(fs_info, bioc);
+ if (IS_ERR(rbio)) {
+ ret = PTR_ERR(rbio);
+ goto fail;
+ }
+ rbio->operation = BTRFS_RBIO_WRITE;
+ rbio_add_bio(rbio, bio);
+
+ /* No need to plug on full rbios, just queue this rbio immediately. */
+ if (rbio_is_full(rbio))
+ goto queue_rbio;
+
+ /* Plug to try merge with other writes into the same full stripe. */
+ cb = blk_check_plugged(raid_unplug, fs_info, sizeof(*plug));
+ if (cb) {
+ plug = container_of(cb, struct btrfs_plug_cb, cb);
+ if (!plug->info) {
+ plug->info = fs_info;
+ INIT_LIST_HEAD(&plug->rbio_list);
+ }
+ list_add_tail(&rbio->plug_list, &plug->rbio_list);
+ return;
+ }
+queue_rbio:
+ /*
+ * Either we don't have any existing plug, or we're doing a full stripe.
+ * Just queue the rbio.
+ */
+ queue_write_rbio(rbio);
+
+ return;
+fail:
+ bio->bi_status = errno_to_blk_status(ret);
+ bio_endio(bio);
+}
diff --git a/fs/btrfs/raid56.h b/fs/btrfs/raid56.h
index 8657cafd32c0..9ae9e89190e4 100644
--- a/fs/btrfs/raid56.h
+++ b/fs/btrfs/raid56.h
@@ -168,6 +168,7 @@ struct btrfs_device;
void raid56_parity_recover(struct bio *bio, struct btrfs_io_context *bioc,
int mirror_num);
void raid56_parity_write(struct bio *bio, struct btrfs_io_context *bioc);
+void raid56_parity_write_v2(struct bio *bio, struct btrfs_io_context *bioc);
void raid56_add_scrub_pages(struct btrfs_raid_bio *rbio, struct page *page,
unsigned int pgoff, u64 logical);
--
2.38.1
* [PATCH 8/8] btrfs: raid56: switch to the new run_one_write_rbio()
2022-10-26 5:06 [PATCH 0/8] btrfs: raid56: use submit-and-wait method to handle raid56 writes Qu Wenruo
` (6 preceding siblings ...)
2022-10-26 5:06 ` [PATCH 7/8] btrfs: raid56: implement raid56_parity_write_v2() Qu Wenruo
@ 2022-10-26 5:06 ` Qu Wenruo
2022-10-31 1:07 ` Qu Wenruo
2022-10-31 7:39 ` [PATCH 0/8] btrfs: raid56: use submit-and-wait method to handle raid56 writes Qu Wenruo
8 siblings, 1 reply; 13+ messages in thread
From: Qu Wenruo @ 2022-10-26 5:06 UTC
To: linux-btrfs
This includes:
- Remove the functions only utilized by the old interface
- Make unlock_stripe() queue run_one_write_rbio_work_locked()
At unlock_stripe(), the next rbio is the one already holding the full
stripe lock, thus it can not use the existing
run_one_write_rbio_work(), or the rbio may never be executed.
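The two work wrappers then differ only in the locking step (condensed
from the diff below):

	run_one_write_rbio_work():        call lock_stripe_add() first, and
	                                  only run if we got the full stripe
	                                  lock right away
	run_one_write_rbio_work_locked(): run directly, the lock is already
	                                  held when it is queued from
	                                  unlock_stripe()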
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/raid56.c | 351 ++++------------------------------------------
fs/btrfs/raid56.h | 1 -
2 files changed, 27 insertions(+), 325 deletions(-)
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 96be2764433e..0f0e03904cb1 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -65,7 +65,6 @@ struct sector_ptr {
static int __raid56_parity_recover(struct btrfs_raid_bio *rbio);
static noinline void finish_rmw(struct btrfs_raid_bio *rbio);
-static void rmw_work(struct work_struct *work);
static void read_rebuild_work(struct work_struct *work);
static int fail_bio_stripe(struct btrfs_raid_bio *rbio, struct bio *bio);
static int fail_rbio_index(struct btrfs_raid_bio *rbio, int failed);
@@ -751,6 +750,29 @@ static noinline int lock_stripe_add(struct btrfs_raid_bio *rbio)
return ret;
}
+static void run_one_write_rbio_work(struct work_struct *work)
+{
+ struct btrfs_raid_bio *rbio =
+ container_of(work, struct btrfs_raid_bio, work);
+ int ret;
+
+ ret = lock_stripe_add(rbio);
+ if (ret == 0)
+ run_one_write_rbio(rbio);
+}
+
+/*
+ * This is the special version for unlock_stripe(), where the rbio
+ * is already holding the full stripe lock.
+ */
+static void run_one_write_rbio_work_locked(struct work_struct *work)
+{
+ struct btrfs_raid_bio *rbio =
+ container_of(work, struct btrfs_raid_bio, work);
+
+ run_one_write_rbio(rbio);
+}
+
/*
* called as rmw or parity rebuild is completed. If the plug list has more
* rbios waiting for this stripe, the next one on the list will be started
@@ -814,7 +836,7 @@ static noinline void unlock_stripe(struct btrfs_raid_bio *rbio)
start_async_work(next, read_rebuild_work);
} else if (next->operation == BTRFS_RBIO_WRITE) {
steal_rbio(rbio, next);
- start_async_work(next, rmw_work);
+ start_async_work(next, run_one_write_rbio_work_locked);
} else if (next->operation == BTRFS_RBIO_PARITY_SCRUB) {
steal_rbio(rbio, next);
start_async_work(next, scrub_parity_work);
@@ -1119,23 +1141,6 @@ static int rbio_add_io_sector(struct btrfs_raid_bio *rbio,
return 0;
}
-/*
- * while we're doing the read/modify/write cycle, we could
- * have errors in reading pages off the disk. This checks
- * for errors and if we're not able to read the page it'll
- * trigger parity reconstruction. The rmw will be finished
- * after we've reconstructed the failed stripes
- */
-static void validate_rbio_for_rmw(struct btrfs_raid_bio *rbio)
-{
- if (rbio->faila >= 0 || rbio->failb >= 0) {
- BUG_ON(rbio->faila == rbio->real_stripes - 1);
- __raid56_parity_recover(rbio);
- } else {
- finish_rmw(rbio);
- }
-}
-
static void index_one_bio(struct btrfs_raid_bio *rbio, struct bio *bio)
{
const u32 sectorsize = rbio->bioc->fs_info->sectorsize;
@@ -1548,174 +1553,11 @@ static void raid56_bio_end_io(struct bio *bio)
&rbio->end_io_work);
}
-/*
- * End io handler for the read phase of the RMW cycle. All the bios here are
- * physical stripe bios we've read from the disk so we can recalculate the
- * parity of the stripe.
- *
- * This will usually kick off finish_rmw once all the bios are read in, but it
- * may trigger parity reconstruction if we had any errors along the way
- */
-static void raid56_rmw_end_io_work(struct work_struct *work)
-{
- struct btrfs_raid_bio *rbio =
- container_of(work, struct btrfs_raid_bio, end_io_work);
-
- if (atomic_read(&rbio->error) > rbio->bioc->max_errors) {
- rbio_orig_end_io(rbio, BLK_STS_IOERR);
- return;
- }
-
- /*
- * This will normally call finish_rmw to start our write but if there
- * are any failed stripes we'll reconstruct from parity first.
- */
- validate_rbio_for_rmw(rbio);
-}
-
-/*
- * the stripe must be locked by the caller. It will
- * unlock after all the writes are done
- */
-static int raid56_rmw_stripe(struct btrfs_raid_bio *rbio)
-{
- int bios_to_read = 0;
- struct bio_list bio_list;
- const int nr_data_sectors = rbio->stripe_nsectors * rbio->nr_data;
- int ret;
- int total_sector_nr;
- struct bio *bio;
-
- bio_list_init(&bio_list);
-
- ret = alloc_rbio_pages(rbio);
- if (ret)
- goto cleanup;
-
- index_rbio_pages(rbio);
-
- atomic_set(&rbio->error, 0);
- /* Build a list of bios to read all the missing data sectors. */
- for (total_sector_nr = 0; total_sector_nr < nr_data_sectors;
- total_sector_nr++) {
- struct sector_ptr *sector;
- int stripe = total_sector_nr / rbio->stripe_nsectors;
- int sectornr = total_sector_nr % rbio->stripe_nsectors;
-
- /*
- * We want to find all the sectors missing from the rbio and
- * read them from the disk. If sector_in_rbio() finds a page
- * in the bio list we don't need to read it off the stripe.
- */
- sector = sector_in_rbio(rbio, stripe, sectornr, 1);
- if (sector)
- continue;
-
- sector = rbio_stripe_sector(rbio, stripe, sectornr);
- /*
- * The bio cache may have handed us an uptodate page. If so,
- * use it.
- */
- if (sector->uptodate)
- continue;
-
- ret = rbio_add_io_sector(rbio, &bio_list, sector,
- stripe, sectornr, REQ_OP_READ);
- if (ret)
- goto cleanup;
- }
-
- bios_to_read = bio_list_size(&bio_list);
- if (!bios_to_read) {
- /*
- * this can happen if others have merged with
- * us, it means there is nothing left to read.
- * But if there are missing devices it may not be
- * safe to do the full stripe write yet.
- */
- goto finish;
- }
-
- /*
- * The bioc may be freed once we submit the last bio. Make sure not to
- * touch it after that.
- */
- atomic_set(&rbio->stripes_pending, bios_to_read);
- INIT_WORK(&rbio->end_io_work, raid56_rmw_end_io_work);
- while ((bio = bio_list_pop(&bio_list))) {
- bio->bi_end_io = raid56_bio_end_io;
-
- if (trace_raid56_read_partial_enabled()) {
- struct raid56_bio_trace_info trace_info = { 0 };
-
- bio_get_trace_info(rbio, bio, &trace_info);
- trace_raid56_read_partial(rbio, bio, &trace_info);
- }
- submit_bio(bio);
- }
- /* the actual write will happen once the reads are done */
- return 0;
-
-cleanup:
- rbio_orig_end_io(rbio, BLK_STS_IOERR);
-
- while ((bio = bio_list_pop(&bio_list)))
- bio_put(bio);
-
- return -EIO;
-
-finish:
- validate_rbio_for_rmw(rbio);
- return 0;
-}
-
-/*
- * if the upper layers pass in a full stripe, we thank them by only allocating
- * enough pages to hold the parity, and sending it all down quickly.
- */
-static int full_stripe_write(struct btrfs_raid_bio *rbio)
-{
- int ret;
-
- ret = alloc_rbio_parity_pages(rbio);
- if (ret)
- return ret;
-
- ret = lock_stripe_add(rbio);
- if (ret == 0)
- finish_rmw(rbio);
- return 0;
-}
-
/*
* partial stripe writes get handed over to async helpers.
* We're really hoping to merge a few more writes into this
* rbio before calculating new parity
*/
-static int partial_stripe_write(struct btrfs_raid_bio *rbio)
-{
- int ret;
-
- ret = lock_stripe_add(rbio);
- if (ret == 0)
- start_async_work(rbio, rmw_work);
- return 0;
-}
-
-/*
- * sometimes while we were reading from the drive to
- * recalculate parity, enough new bios come into create
- * a full stripe. So we do a check here to see if we can
- * go directly to finish_rmw
- */
-static int __raid56_parity_write(struct btrfs_raid_bio *rbio)
-{
- /* head off into rmw land if we don't have a full stripe */
- if (!rbio_is_full(rbio))
- return partial_stripe_write(rbio);
- return full_stripe_write(rbio);
-}
-
/*
* We use plugging call backs to collect full stripes.
* Any time we get a partial stripe write while plugged
@@ -1750,71 +1592,6 @@ static int plug_cmp(void *priv, const struct list_head *a,
return 0;
}
-static void run_plug(struct btrfs_plug_cb *plug)
-{
- struct btrfs_raid_bio *cur;
- struct btrfs_raid_bio *last = NULL;
-
- /*
- * sort our plug list then try to merge
- * everything we can in hopes of creating full
- * stripes.
- */
- list_sort(NULL, &plug->rbio_list, plug_cmp);
- while (!list_empty(&plug->rbio_list)) {
- cur = list_entry(plug->rbio_list.next,
- struct btrfs_raid_bio, plug_list);
- list_del_init(&cur->plug_list);
-
- if (rbio_is_full(cur)) {
- int ret;
-
- /* we have a full stripe, send it down */
- ret = full_stripe_write(cur);
- BUG_ON(ret);
- continue;
- }
- if (last) {
- if (rbio_can_merge(last, cur)) {
- merge_rbio(last, cur);
- free_raid_bio(cur);
- continue;
-
- }
- __raid56_parity_write(last);
- }
- last = cur;
- }
- if (last) {
- __raid56_parity_write(last);
- }
- kfree(plug);
-}
-
-/*
- * if the unplug comes from schedule, we have to push the
- * work off to a helper thread
- */
-static void unplug_work(struct work_struct *work)
-{
- struct btrfs_plug_cb *plug;
- plug = container_of(work, struct btrfs_plug_cb, work);
- run_plug(plug);
-}
-
-static void btrfs_raid_unplug(struct blk_plug_cb *cb, bool from_schedule)
-{
- struct btrfs_plug_cb *plug;
- plug = container_of(cb, struct btrfs_plug_cb, cb);
-
- if (from_schedule) {
- INIT_WORK(&plug->work, unplug_work);
- queue_work(plug->info->rmw_workers, &plug->work);
- return;
- }
- run_plug(plug);
-}
-
/* Add the original bio into rbio->bio_list, and update rbio::dbitmap. */
static void rbio_add_bio(struct btrfs_raid_bio *rbio, struct bio *orig_bio)
{
@@ -1842,61 +1619,6 @@ static void rbio_add_bio(struct btrfs_raid_bio *rbio, struct bio *orig_bio)
}
}
-/*
- * our main entry point for writes from the rest of the FS.
- */
-void raid56_parity_write(struct bio *bio, struct btrfs_io_context *bioc)
-{
- struct btrfs_fs_info *fs_info = bioc->fs_info;
- struct btrfs_raid_bio *rbio;
- struct btrfs_plug_cb *plug = NULL;
- struct blk_plug_cb *cb;
- int ret = 0;
-
- rbio = alloc_rbio(fs_info, bioc);
- if (IS_ERR(rbio)) {
- ret = PTR_ERR(rbio);
- goto fail;
- }
- rbio->operation = BTRFS_RBIO_WRITE;
- rbio_add_bio(rbio, bio);
-
- /*
- * don't plug on full rbios, just get them out the door
- * as quickly as we can
- */
- if (rbio_is_full(rbio)) {
- ret = full_stripe_write(rbio);
- if (ret) {
- free_raid_bio(rbio);
- goto fail;
- }
- return;
- }
-
- cb = blk_check_plugged(btrfs_raid_unplug, fs_info, sizeof(*plug));
- if (cb) {
- plug = container_of(cb, struct btrfs_plug_cb, cb);
- if (!plug->info) {
- plug->info = fs_info;
- INIT_LIST_HEAD(&plug->rbio_list);
- }
- list_add_tail(&rbio->plug_list, &plug->rbio_list);
- } else {
- ret = __raid56_parity_write(rbio);
- if (ret) {
- free_raid_bio(rbio);
- goto fail;
- }
- }
-
- return;
-
-fail:
- bio->bi_status = errno_to_blk_status(ret);
- bio_endio(bio);
-}
-
/*
* Recover a vertical stripe specified by @sector_nr.
* @*pointers are the pre-allocated pointers by the caller, so we don't
@@ -2308,14 +2030,6 @@ void raid56_parity_recover(struct bio *bio, struct btrfs_io_context *bioc,
bio_endio(bio);
}
-static void rmw_work(struct work_struct *work)
-{
- struct btrfs_raid_bio *rbio;
-
- rbio = container_of(work, struct btrfs_raid_bio, work);
- raid56_rmw_stripe(rbio);
-}
-
static void read_rebuild_work(struct work_struct *work)
{
struct btrfs_raid_bio *rbio;
@@ -3243,17 +2957,6 @@ void run_one_write_rbio(struct btrfs_raid_bio *rbio)
rbio_orig_end_io(rbio, errno_to_blk_status(ret));
}
-static void run_one_write_rbio_work(struct work_struct *work)
-{
- struct btrfs_raid_bio *rbio =
- container_of(work, struct btrfs_raid_bio, work);
- int ret;
-
- ret = lock_stripe_add(rbio);
- if (ret == 0)
- run_one_write_rbio(rbio);
-}
-
static void queue_write_rbio(struct btrfs_raid_bio *rbio)
{
INIT_WORK(&rbio->work, run_one_write_rbio_work);
@@ -3284,16 +2987,16 @@ static void raid_unplug(struct blk_plug_cb *cb, bool from_schedule)
free_raid_bio(cur);
continue;
}
- queue_write_rbio(cur);
+ queue_write_rbio(last);
}
last = cur;
}
if (last)
- queue_write_rbio(cur);
+ queue_write_rbio(last);
kfree(plug);
}
-void raid56_parity_write_v2(struct bio *bio, struct btrfs_io_context *bioc)
+void raid56_parity_write(struct bio *bio, struct btrfs_io_context *bioc)
{
struct btrfs_fs_info *fs_info = bioc->fs_info;
struct btrfs_raid_bio *rbio;
diff --git a/fs/btrfs/raid56.h b/fs/btrfs/raid56.h
index 9ae9e89190e4..8657cafd32c0 100644
--- a/fs/btrfs/raid56.h
+++ b/fs/btrfs/raid56.h
@@ -168,7 +168,6 @@ struct btrfs_device;
void raid56_parity_recover(struct bio *bio, struct btrfs_io_context *bioc,
int mirror_num);
void raid56_parity_write(struct bio *bio, struct btrfs_io_context *bioc);
-void raid56_parity_write_v2(struct bio *bio, struct btrfs_io_context *bioc);
void raid56_add_scrub_pages(struct btrfs_raid_bio *rbio, struct page *page,
unsigned int pgoff, u64 logical);
--
2.38.1
* Re: [PATCH 8/8] btrfs: raid56: switch to the new run_one_write_rbio()
2022-10-26 5:06 ` [PATCH 8/8] btrfs: raid56: switch to the new run_one_write_rbio() Qu Wenruo
@ 2022-10-31 1:07 ` Qu Wenruo
2022-10-31 16:48 ` David Sterba
0 siblings, 1 reply; 13+ messages in thread
From: Qu Wenruo @ 2022-10-31 1:07 UTC
To: linux-btrfs
On 2022/10/26 13:06, Qu Wenruo wrote:
> This includes:
>
> - Remove the functions only utilized by the old interface
>
> - Make unlock_stripe() queue run_one_write_rbio_work_locked()
> At unlock_stripe(), the next rbio is the one already holding the full
> stripe lock, thus it can not use the existing
> run_one_write_rbio_work(), or the rbio may never be executed.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> fs/btrfs/raid56.c | 351 ++++------------------------------------------
> fs/btrfs/raid56.h | 1 -
> 2 files changed, 27 insertions(+), 325 deletions(-)
>
[...]
> @@ -3284,16 +2987,16 @@ static void raid_unplug(struct blk_plug_cb *cb, bool from_schedule)
> free_raid_bio(cur);
> continue;
> }
> - queue_write_rbio(cur);
> + queue_write_rbio(last);
> }
> last = cur;
> }
> if (last)
> - queue_write_rbio(cur);
> + queue_write_rbio(last);
> kfree(plug);
> }
This part is in fact a bug fix which should go into the previous patch,
or it can break bisection.
This is already fixed in my github repo; I will update the series with
some extra polish, like removing the raid56_parity_write_v2()
definition, and making the recovery path follow the same idea.
Thanks,
Qu
* Re: [PATCH 8/8] btrfs: raid56: switch to the new run_one_write_rbio()
2022-10-31 1:07 ` Qu Wenruo
@ 2022-10-31 16:48 ` David Sterba
2022-10-31 23:25 ` Qu Wenruo
0 siblings, 1 reply; 13+ messages in thread
From: David Sterba @ 2022-10-31 16:48 UTC
To: Qu Wenruo; +Cc: linux-btrfs
On Mon, Oct 31, 2022 at 09:07:03AM +0800, Qu Wenruo wrote:
>
>
> On 2022/10/26 13:06, Qu Wenruo wrote:
> > This includes:
> >
> > - Remove the functions only utilized by the old interface
> >
> > - Make unlock_stripe() queue run_one_write_rbio_work_locked()
> > At unlock_stripe(), the next rbio is the one already holding the full
> > stripe lock, thus it can not use the existing
> > run_one_write_rbio_work(), or the rbio may never be executed.
> >
> > Signed-off-by: Qu Wenruo <wqu@suse.com>
> > ---
> > fs/btrfs/raid56.c | 351 ++++------------------------------------------
> > fs/btrfs/raid56.h | 1 -
> > 2 files changed, 27 insertions(+), 325 deletions(-)
> >
> [...]
> > @@ -3284,16 +2987,16 @@ static void raid_unplug(struct blk_plug_cb *cb, bool from_schedule)
> > free_raid_bio(cur);
> > continue;
> > }
> > - queue_write_rbio(cur);
> > + queue_write_rbio(last);
> > }
> > last = cur;
> > }
> > if (last)
> > - queue_write_rbio(cur);
> > + queue_write_rbio(last);
> > kfree(plug);
> > }
>
> This part is in fact a bug fix which should go into the previous patch,
> or it can break bisection.
>
> This is already fixed in my github repo; I will update the series with
> some extra polish, like removing the raid56_parity_write_v2()
> definition, and making the recovery path follow the same idea.
Ok, I'll use the github branch for for-next if needed.
* Re: [PATCH 8/8] btrfs: raid56: switch to the new run_one_write_rbio()
2022-10-31 16:48 ` David Sterba
@ 2022-10-31 23:25 ` Qu Wenruo
0 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2022-10-31 23:25 UTC
To: dsterba, Qu Wenruo; +Cc: linux-btrfs
On 2022/11/1 00:48, David Sterba wrote:
> On Mon, Oct 31, 2022 at 09:07:03AM +0800, Qu Wenruo wrote:
>>
>>
>> On 2022/10/26 13:06, Qu Wenruo wrote:
>>> This includes:
>>>
>>> - Remove the functions only utilized by the old interface
>>>
>>> - Make unlock_stripe() queue run_one_write_rbio_work_locked()
>>> At unlock_stripe(), the next rbio is the one already holding the full
>>> stripe lock, thus it can not use the existing
>>> run_one_write_rbio_work(), or the rbio may never be executed.
>>>
>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>> ---
>>> fs/btrfs/raid56.c | 351 ++++------------------------------------------
>>> fs/btrfs/raid56.h | 1 -
>>> 2 files changed, 27 insertions(+), 325 deletions(-)
>>>
>> [...]
>>> @@ -3284,16 +2987,16 @@ static void raid_unplug(struct blk_plug_cb *cb, bool from_schedule)
>>> free_raid_bio(cur);
>>> continue;
>>> }
>>> - queue_write_rbio(cur);
>>> + queue_write_rbio(last);
>>> }
>>> last = cur;
>>> }
>>> if (last)
>>> - queue_write_rbio(cur);
>>> + queue_write_rbio(last);
>>> kfree(plug);
>>> }
>>
>> This part is in fact a bug fix which should go into the previous patch,
>> or it can break bisection.
>>
>> This is already fixed in my github repo; I will update the series with
>> some extra polish, like removing the raid56_parity_write_v2()
>> definition, and making the recovery path follow the same idea.
>
> Ok, I'll use the github branch for for-next if needed.
Please don't, the github branch is going to undergo big changes to
cover recovery and scrub.
Especially for recovery, it turns out it would be much easier if we
start the conversion with recovery first, as the code is shared between
recovery and degraded mounts.
I'll send out a v2 to cover all raid56 entrance soon.
Thanks,
Qu
* Re: [PATCH 0/8] btrfs: raid56: use submit-and-wait method to handle raid56 writes
2022-10-26 5:06 [PATCH 0/8] btrfs: raid56: use submit-and-wait method to handle raid56 writes Qu Wenruo
` (7 preceding siblings ...)
2022-10-26 5:06 ` [PATCH 8/8] btrfs: raid56: switch to the new run_one_write_rbio() Qu Wenruo
@ 2022-10-31 7:39 ` Qu Wenruo
8 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2022-10-31 7:39 UTC
To: linux-btrfs
On 2022/10/26 13:06, Qu Wenruo wrote:
> Currently btrfs raid56 has many and very chaotic function entrances:
>
> - full_stripe_write()
> For full stripe write, will try to lock the full stripe and then do
> the write.
>
> - finish_rmw()
> For rbio which holds the full stripe lock, only do the writes, for
> either full stripe write, or sub-stripe write with cached rbio.
>
> - raid56_rmw_stripe()
> For a sub-stripe write which owns the full stripe lock.
>
> Furthermore we are using endio functions to go the next stage of the
> work, it's really hard to properly follow the workflow.
The patchset can be used for testing, but there will be some updates
mostly related to cosmetic changes (function names, comments) and
further expansion to cover recovery and scrub.
Thus please don't merge it as is for now.
Thanks,
Qu
>
>
> The truth is, a full-stripe write is just a subset of a full RMW cycle;
> there is really not that much reason to treat full-stripe writes
> differently (except skipping the plug).
>
>
> This patchset will rework the raid56 write path (recover and scrub path
> is not touched yet) by:
>
> - Introduce a main function for raid56 writes
> The main function will be called run_one_write_rbio(), and it always
> executed in rmw_worker workqueue.
>
> - Unified handling for all writes (full/sub-stripe, cached/non-cached,
> degraded or not)
> For full stripe write, it skips the read, and go into write part
> directly.
>
> For sub-stripe write, we will try to read the missing sectors first,
> and wait for it (we may not read anything if it's cached).
>
> Then check if we have some missing devices for the above read.
> If so, do recovery first.
>
> Finally we have everything needed, can submit the write bios, and wait
> for the write to finish.
>
> - No more need for end_io_work
> Since we don't rely on endio functions to jump to the next step.
>
> Unfortunately rbio::end_io_work can only be removed when recovery
> and scrub path are also migrated to the new single main thread way.
>
> By this, we have a unified entrance for all raid56 writes, and no extra
> jumping/workqueue mess to interrupt the workflow.
>
> This would make later destructive RMW fix much easier to add, as the
> timing of each step in RMW cycle should be very easy to grasp.
>
> Thus I hope this series can be merged before the previous RFC series
> for the destructive RMW fix.
>
>
> Qu Wenruo (8):
> btrfs: raid56: extract the vertical stripe recovery code into
> recover_vertical()
> btrfs: raid56: extract the pq generation code into a helper
> btrfs: raid56: introduce a new framework for RAID56 writes
> btrfs: raid56: implement the read part for run_one_write_rbio()
> btrfs: raid56: implement the degraded write for run_one_write_rbio()
> btrfs: raid56: implement the write submission part for
> > run_one_write_rbio()
> btrfs: raid56: implement raid56_parity_write_v2()
> btrfs: raid56: switch to the new run_one_write_rbio()
>
> fs/btrfs/raid56.c | 1199 +++++++++++++++++++++++++++------------------
> fs/btrfs/raid56.h | 4 +
> 2 files changed, 727 insertions(+), 476 deletions(-)
>
^ permalink raw reply [flat|nested] 13+ messages in thread