From: Dan Williams <dan.j.williams@intel.com>
To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org
Cc: neilb@suse.de, akpm@linux-foundation.org, davem@davemloft.net,
christopher.leech@intel.com, shannon.nelson@intel.com,
herbert@gondor.apana.org.au, jeff@garzik.org
Subject: [md-accel PATCH 09/19] md: handle_stripe5 - add request/completion logic for async write ops
Date: Tue, 26 Jun 2007 18:51:20 -0700
Message-ID: <20070627015120.18962.89311.stgit@dwillia2-linux.ch.intel.com>
In-Reply-To: <20070627014823.18962.96398.stgit@dwillia2-linux.ch.intel.com>

After handle_stripe5 decides whether it wants to perform a
read-modify-write or a reconstruct write, it calls
handle_write_operations5. A read-modify-write operation performs an
xor subtraction of the blocks marked with the R5_Wantprexor flag from the
parity block (prexor), copies the new data into the stripe (biodrain), and
performs a postxor operation across all up-to-date blocks to generate the
new parity. A reconstruct write is run when all blocks are already
up-to-date in the cache, so all that is needed is a biodrain and a postxor.

On the completion path, STRIPE_OP_PREXOR will be set if the operation was a
read-modify-write. The STRIPE_OP_BIODRAIN flag is used in the completion
path to distinguish write-initiated postxor operations from
expansion-initiated postxor operations. Completion of a write triggers I/O
to the drives.
Changelog:
* make the 'rcw' parameter to handle_write_operations5 a simple flag, Neil Brown
* remove test_and_set/test_and_clear BUG_ONs, Neil Brown
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-By: NeilBrown <neilb@suse.de>
---
drivers/md/raid5.c | 161 +++++++++++++++++++++++++++++++++++++++++++++-------
1 files changed, 138 insertions(+), 23 deletions(-)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 7c688f6..b2e88fe 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -1815,7 +1815,79 @@ static void compute_block_2(struct stripe_head *sh, int dd_idx1, int dd_idx2)
}
}
+static int
+handle_write_operations5(struct stripe_head *sh, int rcw, int expand)
+{
+ int i, pd_idx = sh->pd_idx, disks = sh->disks;
+ int locked = 0;
+
+ if (rcw) {
+ /* if we are not expanding this is a proper write request, and
+ * there will be bios with new data to be drained into the
+ * stripe cache
+ */
+ if (!expand) {
+ set_bit(STRIPE_OP_BIODRAIN, &sh->ops.pending);
+ sh->ops.count++;
+ }
+
+ set_bit(STRIPE_OP_POSTXOR, &sh->ops.pending);
+ sh->ops.count++;
+
+ for (i = disks; i--; ) {
+ struct r5dev *dev = &sh->dev[i];
+
+ if (dev->towrite) {
+ set_bit(R5_LOCKED, &dev->flags);
+ if (!expand)
+ clear_bit(R5_UPTODATE, &dev->flags);
+ locked++;
+ }
+ }
+ } else {
+ BUG_ON(!(test_bit(R5_UPTODATE, &sh->dev[pd_idx].flags) ||
+ test_bit(R5_Wantcompute, &sh->dev[pd_idx].flags)));
+
+ set_bit(STRIPE_OP_PREXOR, &sh->ops.pending);
+ set_bit(STRIPE_OP_BIODRAIN, &sh->ops.pending);
+ set_bit(STRIPE_OP_POSTXOR, &sh->ops.pending);
+
+ sh->ops.count += 3;
+
+ for (i = disks; i--; ) {
+ struct r5dev *dev = &sh->dev[i];
+ if (i == pd_idx)
+ continue;
+
+ /* For a read-modify write there may be blocks that are
+ * locked for reading while others are ready to be
+ * written so we distinguish these blocks by the
+ * R5_Wantprexor bit
+ */
+ if (dev->towrite &&
+ (test_bit(R5_UPTODATE, &dev->flags) ||
+ test_bit(R5_Wantcompute, &dev->flags))) {
+ set_bit(R5_Wantprexor, &dev->flags);
+ set_bit(R5_LOCKED, &dev->flags);
+ clear_bit(R5_UPTODATE, &dev->flags);
+ locked++;
+ }
+ }
+ }
+
+ /* keep the parity disk locked while asynchronous operations
+ * are in flight
+ */
+ set_bit(R5_LOCKED, &sh->dev[pd_idx].flags);
+ clear_bit(R5_UPTODATE, &sh->dev[pd_idx].flags);
+ locked++;
+ pr_debug("%s: stripe %llu locked: %d pending: %lx\n",
+ __FUNCTION__, (unsigned long long)sh->sector,
+ locked, sh->ops.pending);
+
+ return locked;
+}
/*
* Each stripe/dev can have one or more bion attached.
@@ -2210,27 +2282,8 @@ static void handle_issuing_new_write_requests5(raid5_conf_t *conf,
* we can start a write request
*/
if (s->locked == 0 && (rcw == 0 || rmw == 0) &&
- !test_bit(STRIPE_BIT_DELAY, &sh->state)) {
- pr_debug("Computing parity...\n");
- compute_parity5(sh, rcw == 0 ?
- RECONSTRUCT_WRITE : READ_MODIFY_WRITE);
- /* now every locked buffer is ready to be written */
- for (i = disks; i--; )
- if (test_bit(R5_LOCKED, &sh->dev[i].flags)) {
- pr_debug("Writing block %d\n", i);
- s->locked++;
- set_bit(R5_Wantwrite, &sh->dev[i].flags);
- if (!test_bit(R5_Insync, &sh->dev[i].flags)
- || (i == sh->pd_idx && s->failed == 0))
- set_bit(STRIPE_INSYNC, &sh->state);
- }
- if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) {
- atomic_dec(&conf->preread_active_stripes);
- if (atomic_read(&conf->preread_active_stripes) <
- IO_THRESHOLD)
- md_wakeup_thread(conf->mddev->thread);
- }
- }
+ !test_bit(STRIPE_BIT_DELAY, &sh->state))
+ s->locked += handle_write_operations5(sh, rcw == 0, 0);
}
static void handle_issuing_new_write_requests6(raid5_conf_t *conf,
@@ -2649,8 +2702,70 @@ static void handle_stripe5(struct stripe_head *sh)
(s.syncing && (s.uptodate < disks)) || s.expanding)
handle_issuing_new_read_requests5(sh, &s, disks);
- /* now to consider writing and what else, if anything should be read */
- if (s.to_write)
+ /* Now we check to see if any write operations have recently
+ * completed
+ */
+
+ /* leave prexor set until postxor is done, allows us to distinguish
+ * a rmw from a rcw during biodrain
+ */
+ if (test_bit(STRIPE_OP_PREXOR, &sh->ops.complete) &&
+ test_bit(STRIPE_OP_POSTXOR, &sh->ops.complete)) {
+
+ clear_bit(STRIPE_OP_PREXOR, &sh->ops.complete);
+ clear_bit(STRIPE_OP_PREXOR, &sh->ops.ack);
+ clear_bit(STRIPE_OP_PREXOR, &sh->ops.pending);
+
+ for (i = disks; i--; )
+ clear_bit(R5_Wantprexor, &sh->dev[i].flags);
+ }
+
+ /* if only POSTXOR is set then this is an 'expand' postxor */
+ if (test_bit(STRIPE_OP_BIODRAIN, &sh->ops.complete) &&
+ test_bit(STRIPE_OP_POSTXOR, &sh->ops.complete)) {
+
+ clear_bit(STRIPE_OP_BIODRAIN, &sh->ops.complete);
+ clear_bit(STRIPE_OP_BIODRAIN, &sh->ops.ack);
+ clear_bit(STRIPE_OP_BIODRAIN, &sh->ops.pending);
+
+ clear_bit(STRIPE_OP_POSTXOR, &sh->ops.complete);
+ clear_bit(STRIPE_OP_POSTXOR, &sh->ops.ack);
+ clear_bit(STRIPE_OP_POSTXOR, &sh->ops.pending);
+
+ /* All the 'written' buffers and the parity block are ready to
+ * be written back to disk
+ */
+ BUG_ON(!test_bit(R5_UPTODATE, &sh->dev[sh->pd_idx].flags));
+ for (i = disks; i--; ) {
+ dev = &sh->dev[i];
+ if (test_bit(R5_LOCKED, &dev->flags) &&
+ (i == sh->pd_idx || dev->written)) {
+ pr_debug("Writing block %d\n", i);
+ set_bit(R5_Wantwrite, &dev->flags);
+ if (!test_and_set_bit(
+ STRIPE_OP_IO, &sh->ops.pending))
+ sh->ops.count++;
+ if (!test_bit(R5_Insync, &dev->flags) ||
+ (i == sh->pd_idx && s.failed == 0))
+ set_bit(STRIPE_INSYNC, &sh->state);
+ }
+ }
+ if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) {
+ atomic_dec(&conf->preread_active_stripes);
+ if (atomic_read(&conf->preread_active_stripes) <
+ IO_THRESHOLD)
+ md_wakeup_thread(conf->mddev->thread);
+ }
+ }
+
+ /* Now to consider new write requests and what else, if anything
+ * should be read. We do not handle new writes when:
+ * 1/ A 'write' operation (copy+xor) is already in flight.
+ * 2/ A 'check' operation is in flight, as it may clobber the parity
+ * block.
+ */
+ if (s.to_write && !test_bit(STRIPE_OP_POSTXOR, &sh->ops.pending) &&
+ !test_bit(STRIPE_OP_CHECK, &sh->ops.pending))
handle_issuing_new_write_requests5(conf, sh, &s, disks);
/* maybe we need to check and possibly fix the parity for this stripe