* [md PATCH 0/4] fix ddf asynchronous raid6 recovery
@ 2009-10-20 7:11 Dan Williams
2009-10-20 7:11 ` [md PATCH 1/4] md/raid6: kill a gcc-4.0.1 'uninitialized variable' warning Dan Williams
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: Dan Williams @ 2009-10-20 7:11 UTC (permalink / raw)
To: neilb; +Cc: linux-raid
Hi Neil,
The layout test from mdadm.git caught two cases where invalid operations
were being passed to offload drivers, fixed in patch4. The other
patches are just some minor cleanups on top of your for-next branch.
They are also available on my 'for-neil' branch.
git://git.kernel.org/pub/scm/linux/kernel/git/djbw/md.git for-neil
Thanks,
Dan
---
Dan Williams (4):
md/raid6: kill a gcc-4.0.1 'uninitialized variable' warning
async_pq: kill a stray dma_map() call and other cleanups
async_pq: rename scribble page
async_tx: fix asynchronous raid6 recovery for ddf layouts
 crypto/async_tx/async_pq.c          |   30 ++++++-----
 crypto/async_tx/async_raid6_recov.c |   98 ++++++++++++++++++++++++++++-------
 crypto/async_tx/raid6test.c         |    6 +-
 drivers/md/raid5.c                  |   12 +++-
 include/linux/async_tx.h            |    6 +-
 5 files changed, 108 insertions(+), 44 deletions(-)

* [md PATCH 1/4] md/raid6: kill a gcc-4.0.1 'uninitialized variable' warning
  2009-10-20  7:11 [md PATCH 0/4] fix ddf asynchronous raid6 recovery Dan Williams
@ 2009-10-20  7:11 ` Dan Williams
  2009-10-20  7:11 ` [md PATCH 2/4] async_pq: kill a stray dma_map() call and other cleanups Dan Williams
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Dan Williams @ 2009-10-20  7:11 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/md/raid5.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index dcd9e65..81abefc 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -156,16 +156,16 @@ static inline int raid6_next_disk(int disk, int raid_disks)
 static int raid6_idx_to_slot(int idx, struct stripe_head *sh,
 			     int *count, int syndrome_disks)
 {
-	int slot;
+	int slot = *count;
 
 	if (sh->ddf_layout)
-		slot = (*count)++;
+		(*count)++;
 	if (idx == sh->pd_idx)
 		return syndrome_disks;
 	if (idx == sh->qd_idx)
 		return syndrome_disks + 1;
 	if (!sh->ddf_layout)
-		slot = (*count)++;
+		(*count)++;
 	return slot;
 }
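
The effect of this mapping is easiest to see written out. Below is a minimal
user-space sketch, a hypothetical standalone copy of the fixed function rather
than the kernel code itself; the stripe geometry (pd_idx/qd_idx) is invented,
and it mimics raid5.c passing syndrome_disks == disks for the ddf layout and
disks - 2 for the native layout:

#include <stdio.h>
#include <stdbool.h>

/* standalone copy of the fixed helper: 'slot' now starts at *count, so it
 * is initialized on every path; *count only advances past slots that
 * consume a position in the blocks[] array
 */
static int idx_to_slot(int idx, bool ddf_layout, int pd_idx, int qd_idx,
		       int *count, int syndrome_disks)
{
	int slot = *count;

	if (ddf_layout)
		(*count)++;	/* ddf: P and Q occupy in-line slots */
	if (idx == pd_idx)
		return syndrome_disks;
	if (idx == qd_idx)
		return syndrome_disks + 1;
	if (!ddf_layout)
		(*count)++;	/* native: pack data slots contiguously */
	return slot;
}

int main(void)
{
	int disks = 4, pd_idx = 1, qd_idx = 2;	/* invented stripe geometry */
	int layout;

	for (layout = 0; layout <= 1; layout++) {
		int syndrome_disks = layout ? disks : disks - 2;
		int count = 0, idx;

		printf("%s layout:\n", layout ? "ddf" : "native");
		for (idx = 0; idx < disks; idx++)
			printf("  disk %d -> slot %d\n", idx,
			       idx_to_slot(idx, layout, pd_idx, qd_idx,
					   &count, syndrome_disks));
	}
	return 0;
}

For the native layout the two data disks land in slots 0 and 1 with P and Q
at 2 and 3; for ddf the data disks keep their in-line positions (0 and 3),
leaving slots 1 and 2 unfilled. Those unfilled slots become the NULL holes
that patch 4/4 has to cope with.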
* [md PATCH 2/4] async_pq: kill a stray dma_map() call and other cleanups
  2009-10-20  7:11 [md PATCH 0/4] fix ddf asynchronous raid6 recovery Dan Williams
  2009-10-20  7:11 ` [md PATCH 1/4] md/raid6: kill a gcc-4.0.1 'uninitialized variable' warning Dan Williams
@ 2009-10-20  7:11 ` Dan Williams
  2009-10-20  7:11 ` [md PATCH 3/4] async_pq: rename scribble page Dan Williams
  2009-10-20  7:11 ` [md PATCH 4/4] async_tx: fix asynchronous raid6 recovery for ddf layouts Dan Williams
  3 siblings, 0 replies; 9+ messages in thread
From: Dan Williams @ 2009-10-20  7:11 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid

- update the kernel doc for async_syndrome to indicate what NULL in the
  source list means
- whitespace fixups

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 crypto/async_tx/async_pq.c |   15 ++++++++-------
 1 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/crypto/async_tx/async_pq.c b/crypto/async_tx/async_pq.c
index 43b1436..6047656 100644
--- a/crypto/async_tx/async_pq.c
+++ b/crypto/async_tx/async_pq.c
@@ -181,10 +181,14 @@ do_sync_gen_syndrome(struct page **blocks, unsigned int offset, int disks,
  * blocks[disks-1] to NULL.  When P or Q is omitted 'len' must be <=
  * PAGE_SIZE as a temporary buffer of this size is used in the
  * synchronous path.  'disks' always accounts for both destination
- * buffers.
+ * buffers.  If any source buffers (blocks[i] where i < disks - 2) are
+ * set to NULL those buffers will be replaced with the raid6_zero_page
+ * in the synchronous path and omitted in the hardware-asynchronous
+ * path.
  *
  * 'blocks' note: if submit->scribble is NULL then the contents of
- * 'blocks' may be overridden
+ * 'blocks' may be overwritten to perform address conversions
+ * (dma_map_page() or page_address()).
  */
 struct dma_async_tx_descriptor *
 async_gen_syndrome(struct page **blocks, unsigned int offset, int disks,
@@ -283,13 +287,13 @@ async_syndrome_val(struct page **blocks, unsigned int offset, int disks,
 		if (!P(blocks, disks))
 			dma_flags |= DMA_PREP_PQ_DISABLE_P;
 		else
-			pq[0] = dma_map_page(dev, P(blocks,disks),
+			pq[0] = dma_map_page(dev, P(blocks, disks),
 					     offset, len,
 					     DMA_TO_DEVICE);
 		if (!Q(blocks, disks))
 			dma_flags |= DMA_PREP_PQ_DISABLE_Q;
 		else
-			pq[1] = dma_map_page(dev, Q(blocks,disks),
+			pq[1] = dma_map_page(dev, Q(blocks, disks),
 					     offset, len,
 					     DMA_TO_DEVICE);
 
@@ -303,9 +307,6 @@ async_syndrome_val(struct page **blocks, unsigned int offset, int disks,
 			coefs[src_cnt] = raid6_gfexp[i];
 			src_cnt++;
 		}
-		pq[1] = dma_map_page(dev, Q(blocks,disks),
-				     offset, len,
-				     DMA_TO_DEVICE);
 
 		for (;;) {
 			tx = device->device_prep_dma_pq_val(chan, pq, dma_src,
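
From a caller's perspective, the NULL convention the new kernel doc spells
out looks like the following hypothetical sketch; the helper and its layout
are invented for illustration, only the async_gen_syndrome() call is the
real API:

#include <linux/async_tx.h>

/* hypothetical caller: generate P/Q over a 6-slot blocks[] array in which
 * one source slot is NULL; per the updated doc, the synchronous path
 * substitutes the zero page for the hole and the hardware path simply
 * omits it from the source list
 */
static struct dma_async_tx_descriptor *
gen_syndrome_with_hole(struct page **blocks, struct async_submit_ctl *submit)
{
	/* blocks[0..3]: sources, blocks[2] == NULL is allowed;
	 * blocks[4]: P destination, blocks[5]: Q destination
	 */
	return async_gen_syndrome(blocks, 0, 6, PAGE_SIZE, submit);
}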
* [md PATCH 3/4] async_pq: rename scribble page
  2009-10-20  7:11 [md PATCH 0/4] fix ddf asynchronous raid6 recovery Dan Williams
  2009-10-20  7:11 ` [md PATCH 1/4] md/raid6: kill a gcc-4.0.1 'uninitialized variable' warning Dan Williams
  2009-10-20  7:11 ` [md PATCH 2/4] async_pq: kill a stray dma_map() call and other cleanups Dan Williams
@ 2009-10-20  7:11 ` Dan Williams
  2009-10-20  7:11 ` [md PATCH 4/4] async_tx: fix asynchronous raid6 recovery for ddf layouts Dan Williams
  3 siblings, 0 replies; 9+ messages in thread
From: Dan Williams @ 2009-10-20  7:11 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid

The global scribble page is used as a temporary destination buffer when
disabling the P or Q result is requested.  The local scribble buffer
contains memory for performing address conversions.  Rename the global
variable to avoid confusion.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 crypto/async_tx/async_pq.c |   15 ++++++++-------
 1 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/crypto/async_tx/async_pq.c b/crypto/async_tx/async_pq.c
index 6047656..6b5cc4f 100644
--- a/crypto/async_tx/async_pq.c
+++ b/crypto/async_tx/async_pq.c
@@ -26,9 +26,10 @@
 #include <linux/async_tx.h>
 
 /**
- * scribble - space to hold throwaway P buffer for synchronous gen_syndrome
+ * pq_scribble_page - space to hold throwaway P or Q buffer for
+ * synchronous gen_syndrome
  */
-static struct page *scribble;
+static struct page *pq_scribble_page;
 
 /* the struct page *blocks[] parameter passed to async_gen_syndrome()
  * and async_syndrome_val() contains the 'P' destination address at
@@ -226,11 +227,11 @@ async_gen_syndrome(struct page **blocks, unsigned int offset, int disks,
 		async_tx_quiesce(&submit->depend_tx);
 
 		if (!P(blocks, disks)) {
-			P(blocks, disks) = scribble;
+			P(blocks, disks) = pq_scribble_page;
 			BUG_ON(len + offset > PAGE_SIZE);
 		}
 		if (!Q(blocks, disks)) {
-			Q(blocks, disks) = scribble;
+			Q(blocks, disks) = pq_scribble_page;
 			BUG_ON(len + offset > PAGE_SIZE);
 		}
 		do_sync_gen_syndrome(blocks, offset, disks, len, submit);
@@ -384,9 +385,9 @@ EXPORT_SYMBOL_GPL(async_syndrome_val);
 
 static int __init async_pq_init(void)
 {
-	scribble = alloc_page(GFP_KERNEL);
+	pq_scribble_page = alloc_page(GFP_KERNEL);
 
-	if (scribble)
+	if (pq_scribble_page)
 		return 0;
 
 	pr_err("%s: failed to allocate required spare page\n", __func__);
@@ -396,7 +397,7 @@ static int __init async_pq_init(void)
 
 static void __exit async_pq_exit(void)
 {
-	put_page(scribble);
+	put_page(pq_scribble_page);
 }
 
 module_init(async_pq_init);
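
A hypothetical caller sketch (not from the patch) shows both scribbles at
once: the addr_conv array becomes submit->scribble for address conversions,
while leaving the Q slot NULL is what routes the unwanted result at the
global pq_scribble_page on the synchronous path:

#include <linux/async_tx.h>

static struct dma_async_tx_descriptor *
compute_p_only(struct page **blocks, int disks, addr_conv_t *addr_conv)
{
	struct async_submit_ctl submit;

	/* local scribble: per-call space for dma_map_page()/page_address()
	 * conversions of the blocks[] entries
	 */
	init_async_submit(&submit, 0, NULL, NULL, NULL, addr_conv);

	/* a NULL Q destination tells the synchronous path to dump the
	 * unwanted Q result into the global pq_scribble_page
	 */
	blocks[disks-1] = NULL;

	return async_gen_syndrome(blocks, 0, disks, PAGE_SIZE, &submit);
}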
* [md PATCH 4/4] async_tx: fix asynchronous raid6 recovery for ddf layouts
  2009-10-20  7:11 [md PATCH 0/4] fix ddf asynchronous raid6 recovery Dan Williams
                   ` (2 preceding siblings ...)
  2009-10-20  7:11 ` [md PATCH 3/4] async_pq: rename scribble page Dan Williams
@ 2009-10-20  7:11 ` Dan Williams
  2009-10-21 23:26   ` Neil Brown
  3 siblings, 1 reply; 9+ messages in thread
From: Dan Williams @ 2009-10-20  7:11 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid

The raid6 recovery code currently requires special handling of the
4-disk and 5-disk recovery scenarios for the native layout.  Quoting
from commit 0a82a623:

    In these situations the default N-disk algorithm will present
    0-source or 1-source operations to dma devices.  To cover for
    dma devices where the minimum source count is 2 we implement
    4-disk and 5-disk handling in the recovery code.

Recovery in the ddf layout case needs explicit handling of the 6-disk
and 7-disk recovery cases which pose the same problems as the ones
mentioned above.  Note that N-disks refers to N syndrome-disks, not the
width of the array.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 crypto/async_tx/async_raid6_recov.c |   98 ++++++++++++++++++++++++++++-------
 crypto/async_tx/raid6test.c         |    6 +-
 drivers/md/raid5.c                  |    6 +-
 include/linux/async_tx.h            |    6 +-
 4 files changed, 89 insertions(+), 27 deletions(-)

diff --git a/crypto/async_tx/async_raid6_recov.c b/crypto/async_tx/async_raid6_recov.c
index 8e30b6e..54b39d2 100644
--- a/crypto/async_tx/async_raid6_recov.c
+++ b/crypto/async_tx/async_raid6_recov.c
@@ -132,7 +132,7 @@ async_mult(struct page *dest, struct page *src, u8 coef, size_t len,
 
 static struct dma_async_tx_descriptor *
 __2data_recov_4(size_t bytes, int faila, int failb, struct page **blocks,
-		struct async_submit_ctl *submit)
+		bool is_ddf, struct async_submit_ctl *submit)
 {
 	struct dma_async_tx_descriptor *tx = NULL;
 	struct page *p, *q, *a, *b;
@@ -143,8 +143,13 @@ __2data_recov_4(size_t bytes, int faila, int failb, struct page **blocks,
 	void *cb_param = submit->cb_param;
 	void *scribble = submit->scribble;
 
-	p = blocks[4-2];
-	q = blocks[4-1];
+	if (is_ddf) {
+		p = blocks[6-2];
+		q = blocks[6-1];
+	} else {
+		p = blocks[4-2];
+		q = blocks[4-1];
+	}
 
 	a = blocks[faila];
 	b = blocks[failb];
@@ -171,7 +176,7 @@ __2data_recov_4(size_t bytes, int faila, int failb, struct page **blocks,
 
 static struct dma_async_tx_descriptor *
 __2data_recov_5(size_t bytes, int faila, int failb, struct page **blocks,
-		struct async_submit_ctl *submit)
+		bool is_ddf, struct async_submit_ctl *submit)
 {
 	struct dma_async_tx_descriptor *tx = NULL;
 	struct page *p, *q, *g, *dp, *dq;
@@ -181,21 +186,34 @@ __2data_recov_5(size_t bytes, int faila, int failb, struct page **blocks,
 	dma_async_tx_callback cb_fn = submit->cb_fn;
 	void *cb_param = submit->cb_param;
 	void *scribble = submit->scribble;
-	int uninitialized_var(good);
+	int good = -1;
 	int i;
 
-	for (i = 0; i < 3; i++) {
-		if (i == faila || i == failb)
-			continue;
-		else {
+	if (is_ddf) {
+		/* a 7-disk ddf operation devolves to the 5-disk native
+		 * layout case modulo these fixups
+		 */
+		for (i = 0; i < 7-2; i++) {
+			if (blocks[i] == NULL)
+				continue;
+			if (i == faila || i == failb)
+				continue;
+			BUG_ON(good != -1);
 			good = i;
-			break;
 		}
+		p = blocks[7-2];
+		q = blocks[7-1];
+	} else {
+		for (i = 0; i < 5-2; i++) {
+			if (i == faila || i == failb)
+				continue;
+			BUG_ON(good != -1);
+			good = i;
+		}
+		p = blocks[5-2];
+		q = blocks[5-1];
 	}
-	BUG_ON(i >= 3);
-
-	p = blocks[5-2];
-	q = blocks[5-1];
+	BUG_ON(good == -1);
 	g = blocks[good];
 
 	/* Compute syndrome with zero for the missing data pages
@@ -317,11 +335,13 @@ __2data_recov_n(int disks, size_t bytes, int faila, int failb,
  * @faila: first failed drive index
  * @failb: second failed drive index
  * @blocks: array of source pointers where the last two entries are p and q
+ * @is_ddf: flag to indicate whether 'blocks' is in the ddf layout
  * @submit: submission/completion modifiers
 */
 struct dma_async_tx_descriptor *
 async_raid6_2data_recov(int disks, size_t bytes, int faila, int failb,
-			struct page **blocks, struct async_submit_ctl *submit)
+			struct page **blocks, bool is_ddf,
+			struct async_submit_ctl *submit)
 {
 	BUG_ON(faila == failb);
 	if (failb < faila)
@@ -356,13 +376,23 @@ async_raid6_2data_recov(int disks, size_t bytes, int faila, int failb,
 		 * operation (in contrast to the synchronous case), so
 		 * explicitly handle the 4 disk special case
 		 */
-		return __2data_recov_4(bytes, faila, failb, blocks, submit);
+		return __2data_recov_4(bytes, faila, failb, blocks, false, submit);
 	case 5:
 		/* dma devices do not uniformly understand a single
 		 * source pq operation (in contrast to the synchronous
 		 * case), so explicitly handle the 5 disk special case
 		 */
-		return __2data_recov_5(bytes, faila, failb, blocks, submit);
+		return __2data_recov_5(bytes, faila, failb, blocks, false, submit);
+	case 6:
+		if (is_ddf)
+			return __2data_recov_4(bytes, faila, failb, blocks,
+					       true, submit);
+		/* fall through */
+	case 7:
+		if (is_ddf)
+			return __2data_recov_5(bytes, faila, failb, blocks,
+					       true, submit);
+		/* fall through */
 	default:
 		return __2data_recov_n(disks, bytes, faila, failb, blocks, submit);
 	}
@@ -375,11 +405,13 @@ EXPORT_SYMBOL_GPL(async_raid6_2data_recov);
 * @bytes: block size
 * @faila: failed drive index
 * @blocks: array of source pointers where the last two entries are p and q
+ * @is_ddf: flag to indicate whether 'blocks' is in the ddf layout
 * @submit: submission/completion modifiers
 */
 struct dma_async_tx_descriptor *
 async_raid6_datap_recov(int disks, size_t bytes, int faila,
-			struct page **blocks, struct async_submit_ctl *submit)
+			struct page **blocks, bool is_ddf,
+			struct async_submit_ctl *submit)
 {
 	struct dma_async_tx_descriptor *tx = NULL;
 	struct page *p, *q, *dq;
@@ -423,10 +455,11 @@ async_raid6_datap_recov(int disks, size_t bytes, int faila,
 	blocks[faila] = NULL;
 	blocks[disks-1] = dq;
 
-	/* in the 4 disk case we only need to perform a single source
-	 * multiplication
+	/* in the 4-disk (or 6-disk ddf layout) case we only need to
+	 * perform a single source multiplication with the one good data
+	 * block.
 	 */
-	if (disks == 4) {
+	if (disks == 4 && !is_ddf) {
 		int good = faila == 0 ? 1 : 0;
 		struct page *g = blocks[good];
 
@@ -437,6 +470,29 @@ async_raid6_datap_recov(int disks, size_t bytes, int faila,
 		init_async_submit(submit, ASYNC_TX_FENCE, tx, NULL, NULL,
 				  scribble);
 		tx = async_mult(dq, g, raid6_gfexp[good], bytes, submit);
+	} else if (disks == 6 && is_ddf) {
+		struct page *g;
+		int good = -1;
+		int i;
+
+		for (i = 0; i < 6-2; i++) {
+			if (blocks[i] == NULL)
+				continue;
+			if (i == faila)
+				continue;
+			BUG_ON(good != -1);
+			good = i;
+		}
+		BUG_ON(good == -1);
+		g = blocks[good];
+
+		init_async_submit(submit, ASYNC_TX_FENCE, tx, NULL, NULL,
+				  scribble);
+		tx = async_memcpy(p, g, 0, 0, bytes, submit);
+
+		init_async_submit(submit, ASYNC_TX_FENCE, tx, NULL, NULL,
+				  scribble);
+		tx = async_mult(dq, g, raid6_gfexp[good], bytes, submit);
 	} else {
 		init_async_submit(submit, ASYNC_TX_FENCE, tx, NULL, NULL,
 				  scribble);
diff --git a/crypto/async_tx/raid6test.c b/crypto/async_tx/raid6test.c
index 3ec27c7..9d7b3e1 100644
--- a/crypto/async_tx/raid6test.c
+++ b/crypto/async_tx/raid6test.c
@@ -108,11 +108,13 @@ static void raid6_dual_recov(int disks, size_t bytes, int faila, int failb, stru
 		if (failb == disks-2) {
 			/* data+P failure. */
 			init_async_submit(&submit, 0, NULL, NULL, NULL, addr_conv);
-			tx = async_raid6_datap_recov(disks, bytes, faila, ptrs, &submit);
+			tx = async_raid6_datap_recov(disks, bytes, faila, ptrs,
+						     false, &submit);
 		} else {
 			/* data+data failure. */
 			init_async_submit(&submit, 0, NULL, NULL, NULL, addr_conv);
-			tx = async_raid6_2data_recov(disks, bytes, faila, failb, ptrs, &submit);
+			tx = async_raid6_2data_recov(disks, bytes, faila, failb,
+						     ptrs, false, &submit);
 		}
 	}
 	init_completion(&cmp);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 81abefc..18d6ed4 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -888,12 +888,14 @@ ops_run_compute6_2(struct stripe_head *sh, struct raid5_percpu *percpu)
 			/* We're missing D+P. */
 			return async_raid6_datap_recov(syndrome_disks+2,
 						       STRIPE_SIZE, faila,
-						       blocks, &submit);
+						       blocks, sh->ddf_layout,
+						       &submit);
 		} else {
 			/* We're missing D+D. */
 			return async_raid6_2data_recov(syndrome_disks+2,
 						       STRIPE_SIZE, faila, failb,
-						       blocks, &submit);
+						       blocks, sh->ddf_layout,
+						       &submit);
 		}
 	}
 }
diff --git a/include/linux/async_tx.h b/include/linux/async_tx.h
index a1c486a..7520204 100644
--- a/include/linux/async_tx.h
+++ b/include/linux/async_tx.h
@@ -199,11 +199,13 @@ async_syndrome_val(struct page **blocks, unsigned int offset, int src_cnt,
 
 struct dma_async_tx_descriptor *
 async_raid6_2data_recov(int src_num, size_t bytes, int faila, int failb,
-			struct page **ptrs, struct async_submit_ctl *submit);
+			struct page **ptrs, bool is_ddf,
+			struct async_submit_ctl *submit);
 
 struct dma_async_tx_descriptor *
 async_raid6_datap_recov(int src_num, size_t bytes, int faila,
-			struct page **ptrs, struct async_submit_ctl *submit);
+			struct page **ptrs, bool is_ddf,
+			struct async_submit_ctl *submit);
 
 void async_tx_quiesce(struct dma_async_tx_descriptor **tx);
 #endif /* _ASYNC_TX_H_ */
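
The degenerate case the commit message describes can be seen with a small
user-space sketch, using hypothetical stand-in values rather than kernel
code: in a 4-drive ddf stripe P and Q sit in-line, so the recovery code is
handed a blocks[] array with six entries and NULL holes, and with both data
drives failed a dma engine would be offered zero usable sources:

#include <stdio.h>

/* count the sources a dma engine would actually be offered */
static int good_sources(void *blocks[], int disks, int faila, int failb)
{
	int i, good = 0;

	for (i = 0; i < disks - 2; i++)
		if (blocks[i] && i != faila && i != failb)
			good++;
	return good;
}

int main(void)
{
	char d0, d3, p, q;	/* stand-ins for struct page */

	/* 4-drive ddf stripe: P/Q occupy in-line slots 1 and 2, so the
	 * recovery code sees disks == 6 with two NULL holes
	 */
	void *ddf[6] = { &d0, NULL, NULL, &d3, &p, &q };

	/* drives 0 and 3 failed: zero good sources remain, the same
	 * 0-source degenerate case as a fully degraded native 4-disk array
	 */
	printf("good sources: %d\n", good_sources(ddf, 6, 0, 3));
	return 0;
}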
* Re: [md PATCH 4/4] async_tx: fix asynchronous raid6 recovery for ddf layouts
  2009-10-20  7:11 ` [md PATCH 4/4] async_tx: fix asynchronous raid6 recovery for ddf layouts Dan Williams
@ 2009-10-21 23:26   ` Neil Brown
  2009-10-22  0:42     ` Dan Williams
  0 siblings, 1 reply; 9+ messages in thread
From: Neil Brown @ 2009-10-21 23:26 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-raid

Thanks for these Dan.  I have pulled them and will ask Linus to pull
them shortly.

I'm a little concerned about the following patch though.  While it
looks like it is probably correct, it seems to be creating some rather
complex code.

How hard would it be to simply scan the src_ptrs list for non-NULL
entries and if there are 0 or 1 of them, handle them with special case
code, rather than carrying around the 'is_ddf' flag and so forth??

Thanks,
NeilBrown

On Tuesday October 20, dan.j.williams@intel.com wrote:
> The raid6 recovery code currently requires special handling of the
> 4-disk and 5-disk recovery scenarios for the native layout.  Quoting
> from commit 0a82a623:
> 
>     In these situations the default N-disk algorithm will present
>     0-source or 1-source operations to dma devices.  To cover for
>     dma devices where the minimum source count is 2 we implement
>     4-disk and 5-disk handling in the recovery code.
> 
> Recovery in the ddf layout case needs explicit handling of the 6-disk
> and 7-disk recovery cases which pose the same problems as the ones
> mentioned above.  Note that N-disks refers to N syndrome-disks, not the
> width of the array.
> 
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  crypto/async_tx/async_raid6_recov.c |   98 ++++++++++++++++++++++++++++-------
>  crypto/async_tx/raid6test.c         |    6 +-
>  drivers/md/raid5.c                  |    6 +-
>  include/linux/async_tx.h            |    6 +-
>  4 files changed, 89 insertions(+), 27 deletions(-)
> 
> diff --git a/crypto/async_tx/async_raid6_recov.c b/crypto/async_tx/async_raid6_recov.c
> index 8e30b6e..54b39d2 100644
> --- a/crypto/async_tx/async_raid6_recov.c
> +++ b/crypto/async_tx/async_raid6_recov.c
> @@ -132,7 +132,7 @@ async_mult(struct page *dest, struct page *src, u8 coef, size_t len,
>  
>  static struct dma_async_tx_descriptor *
>  __2data_recov_4(size_t bytes, int faila, int failb, struct page **blocks,
> -		struct async_submit_ctl *submit)
> +		bool is_ddf, struct async_submit_ctl *submit)
>  {
>  	struct dma_async_tx_descriptor *tx = NULL;
>  	struct page *p, *q, *a, *b;
> @@ -143,8 +143,13 @@ __2data_recov_4(size_t bytes, int faila, int failb, struct page **blocks,
>  	void *cb_param = submit->cb_param;
>  	void *scribble = submit->scribble;
>  
> -	p = blocks[4-2];
> -	q = blocks[4-1];
> +	if (is_ddf) {
> +		p = blocks[6-2];
> +		q = blocks[6-1];
> +	} else {
> +		p = blocks[4-2];
> +		q = blocks[4-1];
> +	}
>  
>  	a = blocks[faila];
>  	b = blocks[failb];
> @@ -171,7 +176,7 @@ __2data_recov_4(size_t bytes, int faila, int failb, struct page **blocks,
>  
>  static struct dma_async_tx_descriptor *
>  __2data_recov_5(size_t bytes, int faila, int failb, struct page **blocks,
> -		struct async_submit_ctl *submit)
> +		bool is_ddf, struct async_submit_ctl *submit)
>  {
>  	struct dma_async_tx_descriptor *tx = NULL;
>  	struct page *p, *q, *g, *dp, *dq;
> @@ -181,21 +186,34 @@ __2data_recov_5(size_t bytes, int faila, int failb, struct page **blocks,
>  	dma_async_tx_callback cb_fn = submit->cb_fn;
>  	void *cb_param = submit->cb_param;
>  	void *scribble = submit->scribble;
> -	int uninitialized_var(good);
> +	int good = -1;
>  	int i;
>  
> -	for (i = 0; i < 3; i++) {
> -		if (i == faila || i == failb)
> -			continue;
> -		else {
> +	if (is_ddf) {
> +		/* a 7-disk ddf operation devolves to the 5-disk native
> +		 * layout case modulo these fixups
> +		 */
> +		for (i = 0; i < 7-2; i++) {
> +			if (blocks[i] == NULL)
> +				continue;
> +			if (i == faila || i == failb)
> +				continue;
> +			BUG_ON(good != -1);
>  			good = i;
> -			break;
>  		}
> +		p = blocks[7-2];
> +		q = blocks[7-1];
> +	} else {
> +		for (i = 0; i < 5-2; i++) {
> +			if (i == faila || i == failb)
> +				continue;
> +			BUG_ON(good != -1);
> +			good = i;
> +		}
> +		p = blocks[5-2];
> +		q = blocks[5-1];
>  	}
> -	BUG_ON(i >= 3);
> -
> -	p = blocks[5-2];
> -	q = blocks[5-1];
> +	BUG_ON(good == -1);
>  	g = blocks[good];
>  
>  	/* Compute syndrome with zero for the missing data pages
> @@ -317,11 +335,13 @@ __2data_recov_n(int disks, size_t bytes, int faila, int failb,
>   * @faila: first failed drive index
>   * @failb: second failed drive index
>   * @blocks: array of source pointers where the last two entries are p and q
> + * @is_ddf: flag to indicate whether 'blocks' is in the ddf layout
>   * @submit: submission/completion modifiers
>   */
>  struct dma_async_tx_descriptor *
>  async_raid6_2data_recov(int disks, size_t bytes, int faila, int failb,
> -			struct page **blocks, struct async_submit_ctl *submit)
> +			struct page **blocks, bool is_ddf,
> +			struct async_submit_ctl *submit)
>  {
>  	BUG_ON(faila == failb);
>  	if (failb < faila)
> @@ -356,13 +376,23 @@ async_raid6_2data_recov(int disks, size_t bytes, int faila, int failb,
>  		 * operation (in contrast to the synchronous case), so
>  		 * explicitly handle the 4 disk special case
>  		 */
> -		return __2data_recov_4(bytes, faila, failb, blocks, submit);
> +		return __2data_recov_4(bytes, faila, failb, blocks, false, submit);
>  	case 5:
>  		/* dma devices do not uniformly understand a single
>  		 * source pq operation (in contrast to the synchronous
>  		 * case), so explicitly handle the 5 disk special case
>  		 */
> -		return __2data_recov_5(bytes, faila, failb, blocks, submit);
> +		return __2data_recov_5(bytes, faila, failb, blocks, false, submit);
> +	case 6:
> +		if (is_ddf)
> +			return __2data_recov_4(bytes, faila, failb, blocks,
> +					       true, submit);
> +		/* fall through */
> +	case 7:
> +		if (is_ddf)
> +			return __2data_recov_5(bytes, faila, failb, blocks,
> +					       true, submit);
> +		/* fall through */
>  	default:
>  		return __2data_recov_n(disks, bytes, faila, failb, blocks, submit);
>  	}
> @@ -375,11 +405,13 @@ EXPORT_SYMBOL_GPL(async_raid6_2data_recov);
>   * @bytes: block size
>   * @faila: failed drive index
>   * @blocks: array of source pointers where the last two entries are p and q
> + * @is_ddf: flag to indicate whether 'blocks' is in the ddf layout
>   * @submit: submission/completion modifiers
>   */
>  struct dma_async_tx_descriptor *
>  async_raid6_datap_recov(int disks, size_t bytes, int faila,
> -			struct page **blocks, struct async_submit_ctl *submit)
> +			struct page **blocks, bool is_ddf,
> +			struct async_submit_ctl *submit)
>  {
>  	struct dma_async_tx_descriptor *tx = NULL;
>  	struct page *p, *q, *dq;
> @@ -423,10 +455,11 @@ async_raid6_datap_recov(int disks, size_t bytes, int faila,
>  	blocks[faila] = NULL;
>  	blocks[disks-1] = dq;
>  
> -	/* in the 4 disk case we only need to perform a single source
> -	 * multiplication
> +	/* in the 4-disk (or 6-disk ddf layout) case we only need to
> +	 * perform a single source multiplication with the one good data
> +	 * block.
>  	 */
> -	if (disks == 4) {
> +	if (disks == 4 && !is_ddf) {
>  		int good = faila == 0 ? 1 : 0;
>  		struct page *g = blocks[good];
>  
> @@ -437,6 +470,29 @@ async_raid6_datap_recov(int disks, size_t bytes, int faila,
>  		init_async_submit(submit, ASYNC_TX_FENCE, tx, NULL, NULL,
>  				  scribble);
>  		tx = async_mult(dq, g, raid6_gfexp[good], bytes, submit);
> +	} else if (disks == 6 && is_ddf) {
> +		struct page *g;
> +		int good = -1;
> +		int i;
> +
> +		for (i = 0; i < 6-2; i++) {
> +			if (blocks[i] == NULL)
> +				continue;
> +			if (i == faila)
> +				continue;
> +			BUG_ON(good != -1);
> +			good = i;
> +		}
> +		BUG_ON(good == -1);
> +		g = blocks[good];
> +
> +		init_async_submit(submit, ASYNC_TX_FENCE, tx, NULL, NULL,
> +				  scribble);
> +		tx = async_memcpy(p, g, 0, 0, bytes, submit);
> +
> +		init_async_submit(submit, ASYNC_TX_FENCE, tx, NULL, NULL,
> +				  scribble);
> +		tx = async_mult(dq, g, raid6_gfexp[good], bytes, submit);
>  	} else {
>  		init_async_submit(submit, ASYNC_TX_FENCE, tx, NULL, NULL,
>  				  scribble);
> diff --git a/crypto/async_tx/raid6test.c b/crypto/async_tx/raid6test.c
> index 3ec27c7..9d7b3e1 100644
> --- a/crypto/async_tx/raid6test.c
> +++ b/crypto/async_tx/raid6test.c
> @@ -108,11 +108,13 @@ static void raid6_dual_recov(int disks, size_t bytes, int faila, int failb, stru
>  		if (failb == disks-2) {
>  			/* data+P failure. */
>  			init_async_submit(&submit, 0, NULL, NULL, NULL, addr_conv);
> -			tx = async_raid6_datap_recov(disks, bytes, faila, ptrs, &submit);
> +			tx = async_raid6_datap_recov(disks, bytes, faila, ptrs,
> +						     false, &submit);
>  		} else {
>  			/* data+data failure. */
>  			init_async_submit(&submit, 0, NULL, NULL, NULL, addr_conv);
> -			tx = async_raid6_2data_recov(disks, bytes, faila, failb, ptrs, &submit);
> +			tx = async_raid6_2data_recov(disks, bytes, faila, failb,
> +						     ptrs, false, &submit);
>  		}
>  	}
>  	init_completion(&cmp);
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 81abefc..18d6ed4 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -888,12 +888,14 @@ ops_run_compute6_2(struct stripe_head *sh, struct raid5_percpu *percpu)
>  			/* We're missing D+P. */
>  			return async_raid6_datap_recov(syndrome_disks+2,
>  						       STRIPE_SIZE, faila,
> -						       blocks, &submit);
> +						       blocks, sh->ddf_layout,
> +						       &submit);
>  		} else {
>  			/* We're missing D+D. */
>  			return async_raid6_2data_recov(syndrome_disks+2,
>  						       STRIPE_SIZE, faila, failb,
> -						       blocks, &submit);
> +						       blocks, sh->ddf_layout,
> +						       &submit);
>  		}
>  	}
>  }
> diff --git a/include/linux/async_tx.h b/include/linux/async_tx.h
> index a1c486a..7520204 100644
> --- a/include/linux/async_tx.h
> +++ b/include/linux/async_tx.h
> @@ -199,11 +199,13 @@ async_syndrome_val(struct page **blocks, unsigned int offset, int src_cnt,
>  
>  struct dma_async_tx_descriptor *
>  async_raid6_2data_recov(int src_num, size_t bytes, int faila, int failb,
> -			struct page **ptrs, struct async_submit_ctl *submit);
> +			struct page **ptrs, bool is_ddf,
> +			struct async_submit_ctl *submit);
>  
>  struct dma_async_tx_descriptor *
>  async_raid6_datap_recov(int src_num, size_t bytes, int faila,
> -			struct page **ptrs, struct async_submit_ctl *submit);
> +			struct page **ptrs, bool is_ddf,
> +			struct async_submit_ctl *submit);
>  
>  void async_tx_quiesce(struct dma_async_tx_descriptor **tx);
>  #endif /* _ASYNC_TX_H_ */
* Re: [md PATCH 4/4] async_tx: fix asynchronous raid6 recovery for ddf layouts
  2009-10-21 23:26   ` Neil Brown
@ 2009-10-22  0:42     ` Dan Williams
  2009-10-22  2:46       ` Neil Brown
  0 siblings, 1 reply; 9+ messages in thread
From: Dan Williams @ 2009-10-22 0:42 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid@vger.kernel.org

On Wed, 2009-10-21 at 16:26 -0700, Neil Brown wrote:
> Thanks for these Dan.  I have pulled them and will ask Linus to pull
> them shortly.
> 
> I'm a little concerned about the following patch though.  While it
> looks like it is probably correct, it seems to be creating some rather
> complex code.
> 
> How hard would it be to simply scan the src_ptrs list for non-NULL
> entries and if there are 0 or 1 of them, handle them with special case
> code, rather than carrying around the 'is_ddf' flag and so forth??
> 

To be honest I did not really like the 'is_ddf' flag either.  How about
the following totally untested cleanup?

 crypto/async_tx/async_raid6_recov.c |  142 +++++++++++++++--------------------
 crypto/async_tx/raid6test.c         |    6 -
 drivers/md/raid5.c                  |    6 -
 include/linux/async_tx.h            |    6 -
 4 files changed, 67 insertions(+), 93 deletions(-)

diff --git a/crypto/async_tx/async_raid6_recov.c b/crypto/async_tx/async_raid6_recov.c
index 54b39d2..9fdd824 100644
--- a/crypto/async_tx/async_raid6_recov.c
+++ b/crypto/async_tx/async_raid6_recov.c
@@ -131,8 +131,8 @@ async_mult(struct page *dest, struct page *src, u8 coef, size_t len,
 }
 
 static struct dma_async_tx_descriptor *
-__2data_recov_4(size_t bytes, int faila, int failb, struct page **blocks,
-		bool is_ddf, struct async_submit_ctl *submit)
+__2data_recov_4(int disks, size_t bytes, int faila, int failb,
+		struct page **blocks, struct async_submit_ctl *submit)
 {
 	struct dma_async_tx_descriptor *tx = NULL;
 	struct page *p, *q, *a, *b;
@@ -143,13 +143,8 @@ __2data_recov_4(size_t bytes, int faila, int failb, struct page **blocks,
 	void *cb_param = submit->cb_param;
 	void *scribble = submit->scribble;
 
-	if (is_ddf) {
-		p = blocks[6-2];
-		q = blocks[6-1];
-	} else {
-		p = blocks[4-2];
-		q = blocks[4-1];
-	}
+	p = blocks[disks-2];
+	q = blocks[disks-1];
 
 	a = blocks[faila];
 	b = blocks[failb];
@@ -175,8 +170,8 @@ __2data_recov_4(size_t bytes, int faila, int failb, struct page **blocks,
 }
 
 static struct dma_async_tx_descriptor *
-__2data_recov_5(size_t bytes, int faila, int failb, struct page **blocks,
-		bool is_ddf, struct async_submit_ctl *submit)
+__2data_recov_5(int disks, size_t bytes, int faila, int failb,
+		struct page **blocks, struct async_submit_ctl *submit)
 {
 	struct dma_async_tx_descriptor *tx = NULL;
 	struct page *p, *q, *g, *dp, *dq;
@@ -186,34 +181,22 @@ __2data_recov_5(size_t bytes, int faila, int failb, struct page **blocks,
 	dma_async_tx_callback cb_fn = submit->cb_fn;
 	void *cb_param = submit->cb_param;
 	void *scribble = submit->scribble;
-	int good = -1;
-	int i;
-
-	if (is_ddf) {
-		/* a 7-disk ddf operation devolves to the 5-disk native
-		 * layout case modulo these fixups
-		 */
-		for (i = 0; i < 7-2; i++) {
-			if (blocks[i] == NULL)
-				continue;
-			if (i == faila || i == failb)
-				continue;
-			BUG_ON(good != -1);
-			good = i;
-		}
-		p = blocks[7-2];
-		q = blocks[7-1];
-	} else {
-		for (i = 0; i < 5-2; i++) {
-			if (i == faila || i == failb)
-				continue;
-			BUG_ON(good != -1);
-			good = i;
-		}
-		p = blocks[5-2];
-		q = blocks[5-1];
+	int good_srcs, good, i;
+
+	good_srcs = 0;
+	good = -1;
+	for (i = 0; i < disks-2; i++) {
+		if (blocks[i] == NULL)
+			continue;
+		if (i == faila || i == failb)
+			continue;
+		good = i;
+		good_srcs++;
 	}
-	BUG_ON(good == -1);
+	BUG_ON(good_srcs > 1);
+
+	p = blocks[disks-2];
+	q = blocks[disks-1];
 	g = blocks[good];
 
 	/* Compute syndrome with zero for the missing data pages
@@ -335,14 +318,14 @@ __2data_recov_n(int disks, size_t bytes, int faila, int failb,
  * @faila: first failed drive index
  * @failb: second failed drive index
  * @blocks: array of source pointers where the last two entries are p and q
- * @is_ddf: flag to indicate whether 'blocks' is in the ddf layout
  * @submit: submission/completion modifiers
  */
 struct dma_async_tx_descriptor *
 async_raid6_2data_recov(int disks, size_t bytes, int faila, int failb,
-			struct page **blocks, bool is_ddf,
-			struct async_submit_ctl *submit)
+			struct page **blocks, struct async_submit_ctl *submit)
 {
+	int non_zero_srcs, i;
+
 	BUG_ON(faila == failb);
 	if (failb < faila)
 		swap(faila, failb);
@@ -376,26 +359,35 @@ async_raid6_2data_recov(int disks, size_t bytes, int faila, int failb,
 		 * operation (in contrast to the synchronous case), so
 		 * explicitly handle the 4 disk special case
 		 */
-		return __2data_recov_4(bytes, faila, failb, blocks, false, submit);
+		return __2data_recov_4(disks, bytes, faila, failb, blocks, submit);
 	case 5:
 		/* dma devices do not uniformly understand a single
 		 * source pq operation (in contrast to the synchronous
 		 * case), so explicitly handle the 5 disk special case
 		 */
-		return __2data_recov_5(bytes, faila, failb, blocks, false, submit);
+		return __2data_recov_5(disks, bytes, faila, failb, blocks, submit);
 	case 6:
-		if (is_ddf)
-			return __2data_recov_4(bytes, faila, failb, blocks,
-					       true, submit);
-		/* fall through */
+		non_zero_srcs = 0;
+		for (i = 0; i < disks-2; i++)
+			if (blocks[i])
+				non_zero_srcs++;
+		if (non_zero_srcs > 4)
+			break;
+		/* catch the ddf 6-disk special case */
+		return __2data_recov_4(disks, bytes, faila, failb, blocks, submit);
 	case 7:
-		if (is_ddf)
-			return __2data_recov_5(bytes, faila, failb, blocks,
-					       true, submit);
-		/* fall through */
+		non_zero_srcs = 0;
+		for (i = 0; i < disks-2; i++)
+			if (blocks[i])
+				non_zero_srcs++;
+		if (non_zero_srcs > 5)
+			break;
+		/* catch the ddf 7-disk special case */
+		return __2data_recov_5(disks, bytes, faila, failb, blocks, submit);
 	default:
-		return __2data_recov_n(disks, bytes, faila, failb, blocks, submit);
+		break;
 	}
+	return __2data_recov_n(disks, bytes, faila, failb, blocks, submit);
 }
 EXPORT_SYMBOL_GPL(async_raid6_2data_recov);
 
@@ -405,13 +397,11 @@ EXPORT_SYMBOL_GPL(async_raid6_2data_recov);
  * @bytes: block size
  * @faila: failed drive index
  * @blocks: array of source pointers where the last two entries are p and q
- * @is_ddf: flag to indicate whether 'blocks' is in the ddf layout
  * @submit: submission/completion modifiers
  */
 struct dma_async_tx_descriptor *
 async_raid6_datap_recov(int disks, size_t bytes, int faila,
-			struct page **blocks, bool is_ddf,
-			struct async_submit_ctl *submit)
+			struct page **blocks, struct async_submit_ctl *submit)
 {
 	struct dma_async_tx_descriptor *tx = NULL;
 	struct page *p, *q, *dq;
@@ -420,6 +410,7 @@ async_raid6_datap_recov(int disks, size_t bytes, int faila,
 	dma_async_tx_callback cb_fn = submit->cb_fn;
 	void *cb_param = submit->cb_param;
 	void *scribble = submit->scribble;
+	int good_srcs, good, i;
 	struct page *srcs[2];
 
 	pr_debug("%s: disks: %d len: %zu\n", __func__, disks, bytes);
@@ -429,7 +420,6 @@ async_raid6_datap_recov(int disks, size_t bytes, int faila,
 	 */
 	if (!scribble) {
 		void **ptrs = (void **) blocks;
-		int i;
 
 		async_tx_quiesce(&submit->depend_tx);
 		for (i = 0; i < disks; i++)
@@ -445,6 +435,20 @@ async_raid6_datap_recov(int disks, size_t bytes, int faila,
 		return NULL;
 	}
 
+	good_srcs = 0;
+	good = -1;
+	for (i = 0; i < disks-2; i++) {
+		if (i == faila)
+			continue;
+		if (blocks[i]) {
+			good = i;
+			good_srcs++;
+			if (good_srcs > 1)
+				break;
+		}
+	}
+	BUG_ON(good_srcs == 0);
+
 	p = blocks[disks-2];
 	q = blocks[disks-1];
 
@@ -459,8 +463,7 @@ async_raid6_datap_recov(int disks, size_t bytes, int faila,
 	 * perform a single source multiplication with the one good data
 	 * block.
 	 */
-	if (disks == 4 && !is_ddf) {
-		int good = faila == 0 ? 1 : 0;
+	if (good_srcs == 1) {
 		struct page *g = blocks[good];
 
 		init_async_submit(submit, ASYNC_TX_FENCE, tx, NULL, NULL,
@@ -470,29 +473,6 @@ async_raid6_datap_recov(int disks, size_t bytes, int faila,
 		init_async_submit(submit, ASYNC_TX_FENCE, tx, NULL, NULL,
 				  scribble);
 		tx = async_mult(dq, g, raid6_gfexp[good], bytes, submit);
-	} else if (disks == 6 && is_ddf) {
-		struct page *g;
-		int good = -1;
-		int i;
-
-		for (i = 0; i < 6-2; i++) {
-			if (blocks[i] == NULL)
-				continue;
-			if (i == faila)
-				continue;
-			BUG_ON(good != -1);
-			good = i;
-		}
-		BUG_ON(good == -1);
-		g = blocks[good];
-
-		init_async_submit(submit, ASYNC_TX_FENCE, tx, NULL, NULL,
-				  scribble);
-		tx = async_memcpy(p, g, 0, 0, bytes, submit);
-
-		init_async_submit(submit, ASYNC_TX_FENCE, tx, NULL, NULL,
-				  scribble);
-		tx = async_mult(dq, g, raid6_gfexp[good], bytes, submit);
 	} else {
 		init_async_submit(submit, ASYNC_TX_FENCE, tx, NULL, NULL,
 				  scribble);
diff --git a/crypto/async_tx/raid6test.c b/crypto/async_tx/raid6test.c
index 9d7b3e1..3ec27c7 100644
--- a/crypto/async_tx/raid6test.c
+++ b/crypto/async_tx/raid6test.c
@@ -108,13 +108,11 @@ static void raid6_dual_recov(int disks, size_t bytes, int faila, int failb, stru
 		if (failb == disks-2) {
 			/* data+P failure. */
 			init_async_submit(&submit, 0, NULL, NULL, NULL, addr_conv);
-			tx = async_raid6_datap_recov(disks, bytes, faila, ptrs,
-						     false, &submit);
+			tx = async_raid6_datap_recov(disks, bytes, faila, ptrs, &submit);
 		} else {
 			/* data+data failure. */
 			init_async_submit(&submit, 0, NULL, NULL, NULL, addr_conv);
-			tx = async_raid6_2data_recov(disks, bytes, faila, failb,
-						     ptrs, false, &submit);
+			tx = async_raid6_2data_recov(disks, bytes, faila, failb, ptrs, &submit);
 		}
 	}
 	init_completion(&cmp);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 18d6ed4..81abefc 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -888,14 +888,12 @@ ops_run_compute6_2(struct stripe_head *sh, struct raid5_percpu *percpu)
 			/* We're missing D+P. */
 			return async_raid6_datap_recov(syndrome_disks+2,
 						       STRIPE_SIZE, faila,
-						       blocks, sh->ddf_layout,
-						       &submit);
+						       blocks, &submit);
 		} else {
 			/* We're missing D+D. */
 			return async_raid6_2data_recov(syndrome_disks+2,
 						       STRIPE_SIZE, faila, failb,
-						       blocks, sh->ddf_layout,
-						       &submit);
+						       blocks, &submit);
 		}
 	}
 }
diff --git a/include/linux/async_tx.h b/include/linux/async_tx.h
index 7520204..a1c486a 100644
--- a/include/linux/async_tx.h
+++ b/include/linux/async_tx.h
@@ -199,11 +199,13 @@ async_syndrome_val(struct page **blocks, unsigned int offset, int src_cnt,
 
 struct dma_async_tx_descriptor *
 async_raid6_2data_recov(int src_num, size_t bytes, int faila, int failb,
-			struct page **ptrs, bool is_ddf,
-			struct async_submit_ctl *submit);
+			struct page **ptrs, struct async_submit_ctl *submit);
 
 struct dma_async_tx_descriptor *
 async_raid6_datap_recov(int src_num, size_t bytes, int faila,
-			struct page **ptrs, bool is_ddf,
-			struct async_submit_ctl *submit);
+			struct page **ptrs, struct async_submit_ctl *submit);
 
 void async_tx_quiesce(struct dma_async_tx_descriptor **tx);
 #endif /* _ASYNC_TX_H_ */
* Re: [md PATCH 4/4] async_tx: fix asynchronous raid6 recovery for ddf layouts
  2009-10-22  0:42     ` Dan Williams
@ 2009-10-22  2:46       ` Neil Brown
  2009-10-22 21:17         ` Dan Williams
  0 siblings, 1 reply; 9+ messages in thread
From: Neil Brown @ 2009-10-22 2:46 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-raid@vger.kernel.org

On Wednesday October 21, dan.j.williams@intel.com wrote:
> On Wed, 2009-10-21 at 16:26 -0700, Neil Brown wrote:
> > Thanks for these Dan.  I have pulled them and will ask Linus to pull
> > them shortly.
> > 
> > I'm a little concerned about the following patch though.  While it
> > looks like it is probably correct, it seems to be creating some rather
> > complex code.
> > 
> > How hard would it be to simply scan the src_ptrs list for non-NULL
> > entries and if there are 0 or 1 of them, handle them with special case
> > code, rather than carrying around the 'is_ddf' flag and so forth??
> > 
> 
> To be honest I did not really like the 'is_ddf' flag either.  How about
> the following totally untested cleanup?

Looks like a real improvement, but I think you need to be comparing
non_zero_srcs to '2' or '3', not '4', or '5'??

How about this on top of what you had?  Equally untested.

NeilBrown

diff --git a/crypto/async_tx/async_raid6_recov.c b/crypto/async_tx/async_raid6_recov.c
index 9fdd824..55ca712 100644
--- a/crypto/async_tx/async_raid6_recov.c
+++ b/crypto/async_tx/async_raid6_recov.c
@@ -353,41 +353,33 @@ async_raid6_2data_recov(int disks, size_t bytes, int faila, int failb,
 		return NULL;
 	}
 
-	switch (disks) {
-	case 4:
+	non_zero_srcs = 0;
+	for (i = 0; i < disks-2 && non_zero_srcs < 4; i++)
+		if (blocks[i])
+			non_zero_srcs++;
+	switch (non_zero_srcs) {
+	case 0:
+	case 1:
+		/* There must be at least 2 sources - the failed devices. */
+		BUG();
+
+	case 2:
 		/* dma devices do not uniformly understand a zero source pq
 		 * operation (in contrast to the synchronous case), so
-		 * explicitly handle the 4 disk special case
+		 * explicitly handle the special case of a 4 disk array with
+		 * both data disks missing.
 		 */
 		return __2data_recov_4(disks, bytes, faila, failb, blocks, submit);
-	case 5:
+	case 3:
 		/* dma devices do not uniformly understand a single
 		 * source pq operation (in contrast to the synchronous
-		 * case), so explicitly handle the 5 disk special case
+		 * case), so explicitly handle the special case of a 5 disk
+		 * array with 2 of 3 data disks missing.
 		 */
 		return __2data_recov_5(disks, bytes, faila, failb, blocks, submit);
-	case 6:
-		non_zero_srcs = 0;
-		for (i = 0; i < disks-2; i++)
-			if (blocks[i])
-				non_zero_srcs++;
-		if (non_zero_srcs > 4)
-			break;
-		/* catch the ddf 6-disk special case */
-		return __2data_recov_4(disks, bytes, faila, failb, blocks, submit);
-	case 7:
-		non_zero_srcs = 0;
-		for (i = 0; i < disks-2; i++)
-			if (blocks[i])
-				non_zero_srcs++;
-		if (non_zero_srcs > 5)
-			break;
-		/* catch the ddf 7-disk special case */
-		return __2data_recov_5(disks, bytes, faila, failb, blocks, submit);
 	default:
-		break;
+		return __2data_recov_n(disks, bytes, faila, failb, blocks, submit);
 	}
-	return __2data_recov_n(disks, bytes, faila, failb, blocks, submit);
 }
 EXPORT_SYMBOL_GPL(async_raid6_2data_recov);

> 
>  crypto/async_tx/async_raid6_recov.c |  142 +++++++++++++++--------------------
>  crypto/async_tx/raid6test.c         |    6 -
>  drivers/md/raid5.c                  |    6 -
>  include/linux/async_tx.h            |    6 -
>  4 files changed, 67 insertions(+), 93 deletions(-)
> 
> diff --git a/crypto/async_tx/async_raid6_recov.c b/crypto/async_tx/async_raid6_recov.c
> index 54b39d2..9fdd824 100644
> --- a/crypto/async_tx/async_raid6_recov.c
> +++ b/crypto/async_tx/async_raid6_recov.c
> @@ -131,8 +131,8 @@ async_mult(struct page *dest, struct page *src, u8 coef, size_t len,
>  }
>  
>  static struct dma_async_tx_descriptor *
> -__2data_recov_4(size_t bytes, int faila, int failb, struct page **blocks,
> -		bool is_ddf, struct async_submit_ctl *submit)
> +__2data_recov_4(int disks, size_t bytes, int faila, int failb,
> +		struct page **blocks, struct async_submit_ctl *submit)
>  {
>  	struct dma_async_tx_descriptor *tx = NULL;
>  	struct page *p, *q, *a, *b;
> @@ -143,13 +143,8 @@ __2data_recov_4(size_t bytes, int faila, int failb, struct page **blocks,
>  	void *cb_param = submit->cb_param;
>  	void *scribble = submit->scribble;
>  
> -	if (is_ddf) {
> -		p = blocks[6-2];
> -		q = blocks[6-1];
> -	} else {
> -		p = blocks[4-2];
> -		q = blocks[4-1];
> -	}
> +	p = blocks[disks-2];
> +	q = blocks[disks-1];
>  
>  	a = blocks[faila];
>  	b = blocks[failb];
> @@ -175,8 +170,8 @@ __2data_recov_4(size_t bytes, int faila, int failb, struct page **blocks,
>  }
>  
>  static struct dma_async_tx_descriptor *
> -__2data_recov_5(size_t bytes, int faila, int failb, struct page **blocks,
> -		bool is_ddf, struct async_submit_ctl *submit)
> +__2data_recov_5(int disks, size_t bytes, int faila, int failb,
> +		struct page **blocks, struct async_submit_ctl *submit)
>  {
>  	struct dma_async_tx_descriptor *tx = NULL;
>  	struct page *p, *q, *g, *dp, *dq;
> @@ -186,34 +181,22 @@ __2data_recov_5(size_t bytes, int faila, int failb, struct page **blocks,
>  	dma_async_tx_callback cb_fn = submit->cb_fn;
>  	void *cb_param = submit->cb_param;
>  	void *scribble = submit->scribble;
> -	int good = -1;
> -	int i;
> -
> -	if (is_ddf) {
> -		/* a 7-disk ddf operation devolves to the 5-disk native
> -		 * layout case modulo these fixups
> -		 */
> -		for (i = 0; i < 7-2; i++) {
> -			if (blocks[i] == NULL)
> -				continue;
> -			if (i == faila || i == failb)
> -				continue;
> -			BUG_ON(good != -1);
> -			good = i;
> -		}
> -		p = blocks[7-2];
> -		q = blocks[7-1];
> -	} else {
> -		for (i = 0; i < 5-2; i++) {
> -			if (i == faila || i == failb)
> -				continue;
> -			BUG_ON(good != -1);
> -			good = i;
> -		}
> -		p = blocks[5-2];
> -		q = blocks[5-1];
> +	int good_srcs, good, i;
> +
> +	good_srcs = 0;
> +	good = -1;
> +	for (i = 0; i < disks-2; i++) {
> +		if (blocks[i] == NULL)
> +			continue;
> +		if (i == faila || i == failb)
> +			continue;
> +		good = i;
> +		good_srcs++;
>  	}
> -	BUG_ON(good == -1);
> +	BUG_ON(good_srcs > 1);
> +
> +	p = blocks[disks-2];
> +	q = blocks[disks-1];
>  	g = blocks[good];
>  
>  	/* Compute syndrome with zero for the missing data pages
> @@ -335,14 +318,14 @@ __2data_recov_n(int disks, size_t bytes, int faila, int failb,
>   * @faila: first failed drive index
>   * @failb: second failed drive index
>   * @blocks: array of source pointers where the last two entries are p and q
> - * @is_ddf: flag to indicate whether 'blocks' is in the ddf layout
>   * @submit: submission/completion modifiers
>   */
>  struct dma_async_tx_descriptor *
>  async_raid6_2data_recov(int disks, size_t bytes, int faila, int failb,
> -			struct page **blocks, bool is_ddf,
> -			struct async_submit_ctl *submit)
> +			struct page **blocks, struct async_submit_ctl *submit)
>  {
> +	int non_zero_srcs, i;
> +
>  	BUG_ON(faila == failb);
>  	if (failb < faila)
>  		swap(faila, failb);
> @@ -376,26 +359,35 @@ async_raid6_2data_recov(int disks, size_t bytes, int faila, int failb,
>  		 * operation (in contrast to the synchronous case), so
>  		 * explicitly handle the 4 disk special case
>  		 */
> -		return __2data_recov_4(bytes, faila, failb, blocks, false, submit);
> +		return __2data_recov_4(disks, bytes, faila, failb, blocks, submit);
>  	case 5:
>  		/* dma devices do not uniformly understand a single
>  		 * source pq operation (in contrast to the synchronous
>  		 * case), so explicitly handle the 5 disk special case
>  		 */
> -		return __2data_recov_5(bytes, faila, failb, blocks, false, submit);
> +		return __2data_recov_5(disks, bytes, faila, failb, blocks, submit);
>  	case 6:
> -		if (is_ddf)
> -			return __2data_recov_4(bytes, faila, failb, blocks,
> -					       true, submit);
> -		/* fall through */
> +		non_zero_srcs = 0;
> +		for (i = 0; i < disks-2; i++)
> +			if (blocks[i])
> +				non_zero_srcs++;
> +		if (non_zero_srcs > 4)
> +			break;
> +		/* catch the ddf 6-disk special case */
> +		return __2data_recov_4(disks, bytes, faila, failb, blocks, submit);
>  	case 7:
> -		if (is_ddf)
> -			return __2data_recov_5(bytes, faila, failb, blocks,
> -					       true, submit);
> -		/* fall through */
> +		non_zero_srcs = 0;
> +		for (i = 0; i < disks-2; i++)
> +			if (blocks[i])
> +				non_zero_srcs++;
> +		if (non_zero_srcs > 5)
> +			break;
> +		/* catch the ddf 7-disk special case */
> +		return __2data_recov_5(disks, bytes, faila, failb, blocks, submit);
>  	default:
> -		return __2data_recov_n(disks, bytes, faila, failb, blocks, submit);
> +		break;
>  	}
> +	return __2data_recov_n(disks, bytes, faila, failb, blocks, submit);
>  }
>  EXPORT_SYMBOL_GPL(async_raid6_2data_recov);
>  
> @@ -405,13 +397,11 @@ EXPORT_SYMBOL_GPL(async_raid6_2data_recov);
>   * @bytes: block size
>   * @faila: failed drive index
>   * @blocks: array of source pointers where the last two entries are p and q
> - * @is_ddf: flag to indicate whether 'blocks' is in the ddf layout
>   * @submit: submission/completion modifiers
>   */
>  struct dma_async_tx_descriptor *
>  async_raid6_datap_recov(int disks, size_t bytes, int faila,
> -			struct page **blocks, bool is_ddf,
> -			struct async_submit_ctl *submit)
> +			struct page **blocks, struct async_submit_ctl *submit)
>  {
>  	struct dma_async_tx_descriptor *tx = NULL;
>  	struct page *p, *q, *dq;
> @@ -420,6 +410,7 @@ async_raid6_datap_recov(int disks, size_t bytes, int faila,
>  	dma_async_tx_callback cb_fn = submit->cb_fn;
>  	void *cb_param = submit->cb_param;
>  	void *scribble = submit->scribble;
> +	int good_srcs, good, i;
>  	struct page *srcs[2];
>  
>  	pr_debug("%s: disks: %d len: %zu\n", __func__, disks, bytes);
> @@ -429,7 +420,6 @@ async_raid6_datap_recov(int disks, size_t bytes, int faila,
>  	 */
>  	if (!scribble) {
>  		void **ptrs = (void **) blocks;
> -		int i;
>  
>  		async_tx_quiesce(&submit->depend_tx);
>  		for (i = 0; i < disks; i++)
> @@ -445,6 +435,20 @@ async_raid6_datap_recov(int disks, size_t bytes, int faila,
>  		return NULL;
>  	}
>  
> +	good_srcs = 0;
> +	good = -1;
> +	for (i = 0; i < disks-2; i++) {
> +		if (i == faila)
> +			continue;
> +		if (blocks[i]) {
> +			good = i;
> +			good_srcs++;
> +			if (good_srcs > 1)
> +				break;
> +		}
> +	}
> +	BUG_ON(good_srcs == 0);
> +
>  	p = blocks[disks-2];
>  	q = blocks[disks-1];
>  
> @@ -459,8 +463,7 @@ async_raid6_datap_recov(int disks, size_t bytes, int faila,
>  	 * perform a single source multiplication with the one good data
>  	 * block.
>  	 */
> -	if (disks == 4 && !is_ddf) {
> -		int good = faila == 0 ? 1 : 0;
> +	if (good_srcs == 1) {
>  		struct page *g = blocks[good];
>  
>  		init_async_submit(submit, ASYNC_TX_FENCE, tx, NULL, NULL,
> @@ -470,29 +473,6 @@ async_raid6_datap_recov(int disks, size_t bytes, int faila,
>  		init_async_submit(submit, ASYNC_TX_FENCE, tx, NULL, NULL,
>  				  scribble);
>  		tx = async_mult(dq, g, raid6_gfexp[good], bytes, submit);
> -	} else if (disks == 6 && is_ddf) {
> -		struct page *g;
> -		int good = -1;
> -		int i;
> -
> -		for (i = 0; i < 6-2; i++) {
> -			if (blocks[i] == NULL)
> -				continue;
> -			if (i == faila)
> -				continue;
> -			BUG_ON(good != -1);
> -			good = i;
> -		}
> -		BUG_ON(good == -1);
> -		g = blocks[good];
> -
> -		init_async_submit(submit, ASYNC_TX_FENCE, tx, NULL, NULL,
> -				  scribble);
> -		tx = async_memcpy(p, g, 0, 0, bytes, submit);
> -
> -		init_async_submit(submit, ASYNC_TX_FENCE, tx, NULL, NULL,
> -				  scribble);
> -		tx = async_mult(dq, g, raid6_gfexp[good], bytes, submit);
>  	} else {
>  		init_async_submit(submit, ASYNC_TX_FENCE, tx, NULL, NULL,
>  				  scribble);
> diff --git a/crypto/async_tx/raid6test.c b/crypto/async_tx/raid6test.c
> index 9d7b3e1..3ec27c7 100644
> --- a/crypto/async_tx/raid6test.c
> +++ b/crypto/async_tx/raid6test.c
> @@ -108,13 +108,11 @@ static void raid6_dual_recov(int disks, size_t bytes, int faila, int failb, stru
>  		if (failb == disks-2) {
>  			/* data+P failure. */
>  			init_async_submit(&submit, 0, NULL, NULL, NULL, addr_conv);
> -			tx = async_raid6_datap_recov(disks, bytes, faila, ptrs,
> -						     false, &submit);
> +			tx = async_raid6_datap_recov(disks, bytes, faila, ptrs, &submit);
>  		} else {
>  			/* data+data failure. */
>  			init_async_submit(&submit, 0, NULL, NULL, NULL, addr_conv);
> -			tx = async_raid6_2data_recov(disks, bytes, faila, failb,
> -						     ptrs, false, &submit);
> +			tx = async_raid6_2data_recov(disks, bytes, faila, failb, ptrs, &submit);
>  		}
>  	}
>  	init_completion(&cmp);
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 18d6ed4..81abefc 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -888,14 +888,12 @@ ops_run_compute6_2(struct stripe_head *sh, struct raid5_percpu *percpu)
>  			/* We're missing D+P. */
>  			return async_raid6_datap_recov(syndrome_disks+2,
>  						       STRIPE_SIZE, faila,
> -						       blocks, sh->ddf_layout,
> -						       &submit);
> +						       blocks, &submit);
>  		} else {
>  			/* We're missing D+D. */
>  			return async_raid6_2data_recov(syndrome_disks+2,
>  						       STRIPE_SIZE, faila, failb,
> -						       blocks, sh->ddf_layout,
> -						       &submit);
> +						       blocks, &submit);
>  		}
>  	}
>  }
> diff --git a/include/linux/async_tx.h b/include/linux/async_tx.h
> index 7520204..a1c486a 100644
> --- a/include/linux/async_tx.h
> +++ b/include/linux/async_tx.h
> @@ -199,11 +199,13 @@ async_syndrome_val(struct page **blocks, unsigned int offset, int src_cnt,
>  
>  struct dma_async_tx_descriptor *
>  async_raid6_2data_recov(int src_num, size_t bytes, int faila, int failb,
> -			struct page **ptrs, bool is_ddf,
> -			struct async_submit_ctl *submit);
> +			struct page **ptrs, struct async_submit_ctl *submit);
>  
>  struct dma_async_tx_descriptor *
>  async_raid6_datap_recov(int src_num, size_t bytes, int faila,
> -			struct page **ptrs, bool is_ddf,
> -			struct async_submit_ctl *submit);
> +			struct page **ptrs, struct async_submit_ctl *submit);
>  
>  void async_tx_quiesce(struct dma_async_tx_descriptor **tx);
>  #endif /* _ASYNC_TX_H_ */
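
A small worked example (hypothetical, not part of the thread's patches)
shows why '2' and '3' are the right thresholds: the two failed data blocks
are destination pages for the recovered data, so they are still non-NULL
and get included in the count:

#include <assert.h>
#include <stddef.h>

static int non_zero_srcs(void *blocks[], int disks)
{
	int i, n = 0;

	for (i = 0; i < disks - 2; i++)
		if (blocks[i])
			n++;
	return n;
}

int main(void)
{
	char a, b, c, p, q;	/* stand-ins for struct page */

	/* native 4-disk array, both data disks failed (still non-NULL) */
	void *native4[4] = { &a, &b, &p, &q };
	assert(non_zero_srcs(native4, 4) == 2);	/* -> __2data_recov_4 */

	/* 4-drive ddf stripe: NULL holes where P/Q sat in-line */
	void *ddf6[6] = { &a, NULL, NULL, &b, &p, &q };
	assert(non_zero_srcs(ddf6, 6) == 2);	/* same special case */

	/* native 5-disk array, two of its three data disks failed */
	void *native5[5] = { &a, &b, &c, &p, &q };
	assert(non_zero_srcs(native5, 5) == 3);	/* -> __2data_recov_5 */

	return 0;
}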
* Re: [md PATCH 4/4] async_tx: fix asynchronous raid6 recovery for ddf layouts
  2009-10-22  2:46       ` Neil Brown
@ 2009-10-22 21:17         ` Dan Williams
  0 siblings, 0 replies; 9+ messages in thread
From: Dan Williams @ 2009-10-22 21:17 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid@vger.kernel.org

On Wed, 2009-10-21 at 19:46 -0700, Neil Brown wrote:
> Looks like a real improvement, but I think you need to be comparing
> non_zero_srcs to '2' or '3', not '4', or '5'??
> 
> How about this on top of what you had?  Equally untested.
> 

Nice.  Now tested with the ioatdma and iop-adma offload drivers via
raid6test, 01raid6integ, and 07layouts.  Pushed back out to:

  git://git.kernel.org/pub/scm/linux/kernel/git/djbw/md.git for-neil

You will need to throw away the last pull as I replaced the commit in
question with the patch below (for-neil is now at commit da17bf43).

Thanks,
Dan

---->

async_tx: fix asynchronous raid6 recovery for ddf layouts

From: Dan Williams <dan.j.williams@intel.com>

The raid6 recovery code currently requires special handling of the
4-disk and 5-disk recovery scenarios for the native layout.  Quoting
from commit 0a82a623:

    In these situations the default N-disk algorithm will present
    0-source or 1-source operations to dma devices.  To cover for
    dma devices where the minimum source count is 2 we implement
    4-disk and 5-disk handling in the recovery code.

The ddf layout presents disks=6 and disks=7 to the recovery code in
these situations.  Instead of looking at the number of disks, count the
number of non-zero sources in the list and call the special case code
when the number of non-failed sources is 0 or 1.

[neilb@suse.de: replace 'ddf' flag with counting good sources]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 crypto/async_tx/async_raid6_recov.c |   86 +++++++++++++++++++++++------------
 1 files changed, 56 insertions(+), 30 deletions(-)

diff --git a/crypto/async_tx/async_raid6_recov.c b/crypto/async_tx/async_raid6_recov.c
index 8e30b6e..943f2ab 100644
--- a/crypto/async_tx/async_raid6_recov.c
+++ b/crypto/async_tx/async_raid6_recov.c
@@ -131,8 +131,8 @@ async_mult(struct page *dest, struct page *src, u8 coef, size_t len,
 }
 
 static struct dma_async_tx_descriptor *
-__2data_recov_4(size_t bytes, int faila, int failb, struct page **blocks,
-		struct async_submit_ctl *submit)
+__2data_recov_4(int disks, size_t bytes, int faila, int failb,
+		struct page **blocks, struct async_submit_ctl *submit)
 {
 	struct dma_async_tx_descriptor *tx = NULL;
 	struct page *p, *q, *a, *b;
@@ -143,8 +143,8 @@ __2data_recov_4(size_t bytes, int faila, int failb, struct page **blocks,
 	void *cb_param = submit->cb_param;
 	void *scribble = submit->scribble;
 
-	p = blocks[4-2];
-	q = blocks[4-1];
+	p = blocks[disks-2];
+	q = blocks[disks-1];
 
 	a = blocks[faila];
 	b = blocks[failb];
@@ -170,8 +170,8 @@ __2data_recov_4(size_t bytes, int faila, int failb, struct page **blocks,
 }
 
 static struct dma_async_tx_descriptor *
-__2data_recov_5(size_t bytes, int faila, int failb, struct page **blocks,
-		struct async_submit_ctl *submit)
+__2data_recov_5(int disks, size_t bytes, int faila, int failb,
+		struct page **blocks, struct async_submit_ctl *submit)
 {
 	struct dma_async_tx_descriptor *tx = NULL;
 	struct page *p, *q, *g, *dp, *dq;
@@ -181,21 +181,22 @@ __2data_recov_5(size_t bytes, int faila, int failb, struct page **blocks,
 	dma_async_tx_callback cb_fn = submit->cb_fn;
 	void *cb_param = submit->cb_param;
 	void *scribble = submit->scribble;
-	int uninitialized_var(good);
-	int i;
+	int good_srcs, good, i;
 
-	for (i = 0; i < 3; i++) {
+	good_srcs = 0;
+	good = -1;
+	for (i = 0; i < disks-2; i++) {
+		if (blocks[i] == NULL)
+			continue;
 		if (i == faila || i == failb)
 			continue;
-		else {
-			good = i;
-			break;
-		}
+		good = i;
+		good_srcs++;
 	}
-	BUG_ON(i >= 3);
+	BUG_ON(good_srcs > 1);
 
-	p = blocks[5-2];
-	q = blocks[5-1];
+	p = blocks[disks-2];
+	q = blocks[disks-1];
 	g = blocks[good];
 
 	/* Compute syndrome with zero for the missing data pages
@@ -323,6 +324,8 @@ struct dma_async_tx_descriptor *
 async_raid6_2data_recov(int disks, size_t bytes, int faila, int failb,
 			struct page **blocks, struct async_submit_ctl *submit)
 {
+	int non_zero_srcs, i;
+
 	BUG_ON(faila == failb);
 	if (failb < faila)
 		swap(faila, failb);
@@ -334,12 +337,11 @@ async_raid6_2data_recov(int disks, size_t bytes, int faila, int failb,
 	 */
 	if (!submit->scribble) {
 		void **ptrs = (void **) blocks;
-		int i;
 
 		async_tx_quiesce(&submit->depend_tx);
 		for (i = 0; i < disks; i++)
 			if (blocks[i] == NULL)
-				ptrs[i] = (void*)raid6_empty_zero_page;
+				ptrs[i] = (void *) raid6_empty_zero_page;
 			else
 				ptrs[i] = page_address(blocks[i]);
 
@@ -350,19 +352,30 @@ async_raid6_2data_recov(int disks, size_t bytes, int faila, int failb,
 		return NULL;
 	}
 
-	switch (disks) {
-	case 4:
+	non_zero_srcs = 0;
+	for (i = 0; i < disks-2 && non_zero_srcs < 4; i++)
+		if (blocks[i])
+			non_zero_srcs++;
+	switch (non_zero_srcs) {
+	case 0:
+	case 1:
+		/* There must be at least 2 sources - the failed devices. */
+		BUG();
+
+	case 2:
 		/* dma devices do not uniformly understand a zero source pq
 		 * operation (in contrast to the synchronous case), so
-		 * explicitly handle the 4 disk special case
+		 * explicitly handle the special case of a 4 disk array with
+		 * both data disks missing.
 		 */
-		return __2data_recov_4(bytes, faila, failb, blocks, submit);
-	case 5:
+		return __2data_recov_4(disks, bytes, faila, failb, blocks, submit);
+	case 3:
 		/* dma devices do not uniformly understand a single
 		 * source pq operation (in contrast to the synchronous
-		 * case), so explicitly handle the 5 disk special case
+		 * case), so explicitly handle the special case of a 5 disk
+		 * array with 2 of 3 data disks missing.
 		 */
-		return __2data_recov_5(bytes, faila, failb, blocks, submit);
+		return __2data_recov_5(disks, bytes, faila, failb, blocks, submit);
 	default:
 		return __2data_recov_n(disks, bytes, faila, failb, blocks, submit);
 	}
@@ -388,6 +401,7 @@ async_raid6_datap_recov(int disks, size_t bytes, int faila,
 	dma_async_tx_callback cb_fn = submit->cb_fn;
 	void *cb_param = submit->cb_param;
 	void *scribble = submit->scribble;
+	int good_srcs, good, i;
 	struct page *srcs[2];
 
 	pr_debug("%s: disks: %d len: %zu\n", __func__, disks, bytes);
@@ -397,7 +411,6 @@ async_raid6_datap_recov(int disks, size_t bytes, int faila,
 	 */
 	if (!scribble) {
 		void **ptrs = (void **) blocks;
-		int i;
 
 		async_tx_quiesce(&submit->depend_tx);
 		for (i = 0; i < disks; i++)
@@ -413,6 +426,20 @@ async_raid6_datap_recov(int disks, size_t bytes, int faila,
 		return NULL;
 	}
 
+	good_srcs = 0;
+	good = -1;
+	for (i = 0; i < disks-2; i++) {
+		if (i == faila)
+			continue;
+		if (blocks[i]) {
+			good = i;
+			good_srcs++;
+			if (good_srcs > 1)
+				break;
+		}
+	}
+	BUG_ON(good_srcs == 0);
+
 	p = blocks[disks-2];
 	q = blocks[disks-1];
 
@@ -423,11 +450,10 @@ async_raid6_datap_recov(int disks, size_t bytes, int faila,
 	blocks[faila] = NULL;
 	blocks[disks-1] = dq;
 
-	/* in the 4 disk case we only need to perform a single source
-	 * multiplication
+	/* in the 4-disk case we only need to perform a single source
+	 * multiplication with the one good data block.
 	 */
-	if (disks == 4) {
-		int good = faila == 0 ? 1 : 0;
+	if (good_srcs == 1) {
 		struct page *g = blocks[good];
 
 		init_async_submit(submit, ASYNC_TX_FENCE, tx, NULL, NULL,