* md raid acceleration and the async_tx api

From: Yuri Tikhonov @ 2007-08-27  8:49 UTC
To: dan.j.williams; +Cc: linux-raid, Wolfgang Denk, dzu

 Hello,

 I tested the h/w-accelerated RAID-5 code with a kernel whose PAGE_SIZE is set
to 64KB and found that the bonnie++ application hangs up during the
"Re-writing" test. After some investigation I discovered that the hang occurs
because one of the mpage_end_io_read() calls is missing (these are the
callbacks initiated from the ops_complete_biofill() function).

 My low-level ADMA driver (the ppc440spe one) successfully initiated the
ops_complete_biofill() callback, but ops_complete_biofill() itself skipped
calling the bi_end_io() handler of the completed bio (the current dev->read)
because, while this bio was being processed, another request had arrived for
the same sh (the current dev_q->toread). Thus ops_complete_biofill() scheduled
another biofill operation which, as a result, overwrote the unacknowledged bio
(dev->read in ops_run_biofill()), and so we lost the previous dev->read bio
completely.

 Here is a patch that solves this problem. Perhaps this could be implemented
in a more elegant and effective way. What are your thoughts regarding this?

 Regards, Yuri

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 08b4893..7abc96b 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -838,11 +838,24 @@ static void ops_complete_biofill(void *stripe_head_ref)
                 /* acknowledge completion of a biofill operation */
                 /* and check if we need to reply to a read request
                  */
-                if (test_bit(R5_Wantfill, &dev_q->flags) && !dev_q->toread) {
+                if (test_bit(R5_Wantfill, &dev_q->flags)) {
                         struct bio *rbi, *rbi2;
                         struct r5dev *dev = &sh->dev[i];
 
-                        clear_bit(R5_Wantfill, &dev_q->flags);
+                        /* There is a chance that another fill operation
+                         * had been scheduled for this dev while we
+                         * processed sh. In this case do one of the following
+                         * alternatives:
+                         * - if there is no active completed biofill for the dev
+                         *   then go to the next dev leaving Wantfill set;
+                         * - if there is active completed biofill for the dev
+                         *   then ack it but leave Wantfill set.
+                         */
+                        if (dev_q->toread && !dev->read)
+                                continue;
+
+                        if (!dev_q->toread)
+                                clear_bit(R5_Wantfill, &dev_q->flags);
 
                         /* The access to dev->read is outside of the
                          * spin_lock_irq(&conf->device_lock), but is protected
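
To make the interleaving concrete, the lost-completion scenario described above
can be replayed with a small stand-alone user-space model. The structure and
function names below are invented for illustration only; they merely mimic
dev->read, dev_q->toread and the Wantfill flag and are not the actual raid5.c
code:

/* Simplified, self-contained model of the completion race described above. */
#include <stdio.h>
#include <stdbool.h>

struct fake_dev {
        const char *read;    /* bio whose fill just completed (needs bi_end_io) */
        const char *toread;  /* new bio queued while the fill was in flight     */
        bool wantfill;       /* models the R5_Wantfill flag                     */
};

/* Models the original ops_complete_biofill(): it only acknowledges the
 * completed bio when no new request is pending, otherwise it skips the ack
 * and leaves dev->read in place.
 */
static void complete_biofill(struct fake_dev *dev)
{
        if (dev->wantfill && !dev->toread) {
                printf("bi_end_io(%s) called\n", dev->read);
                dev->read = NULL;
                dev->wantfill = false;
        } else {
                printf("completion of %s skipped (%s pending)\n",
                       dev->read, dev->toread);
        }
}

/* Models ops_run_biofill(): starts the next fill and *overwrites* dev->read. */
static void run_biofill(struct fake_dev *dev)
{
        dev->read = dev->toread;   /* the old, unacknowledged bio is lost here */
        dev->toread = NULL;
}

int main(void)
{
        struct fake_dev dev = { .read = "bio A", .toread = NULL, .wantfill = true };

        dev.toread = "bio B";      /* a new read arrives before the ack        */
        complete_biofill(&dev);    /* bio A is skipped ...                      */
        run_biofill(&dev);         /* ... and then overwritten by bio B         */
        complete_biofill(&dev);    /* only bio B ever sees bi_end_io            */
        return 0;
}

Run sequentially, the model shows that once a second request arrives before the
acknowledgment, "bio A" never receives its bi_end_io() call, which corresponds
to the missing mpage_end_io_read() observed in the bonnie++ hang.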
* RE: md raid acceleration and the async_tx api

From: Williams, Dan J @ 2007-08-27 19:12 UTC
To: Yuri Tikhonov; +Cc: linux-raid, Wolfgang Denk, dzu

> From: Yuri Tikhonov [mailto:yur@emcraft.com]
>
>  Hello,
>
>  I tested the h/w-accelerated RAID-5 code with a kernel whose PAGE_SIZE is
> set to 64KB and found that the bonnie++ application hangs up during the
> "Re-writing" test. After some investigation I discovered that the hang
> occurs because one of the mpage_end_io_read() calls is missing (these are
> the callbacks initiated from the ops_complete_biofill() function).
>
>  My low-level ADMA driver (the ppc440spe one) successfully initiated the
> ops_complete_biofill() callback, but ops_complete_biofill() itself skipped
> calling the bi_end_io() handler of the completed bio (the current dev->read)
> because, while this bio was being processed, another request had arrived for
> the same sh (the current dev_q->toread). Thus ops_complete_biofill()
> scheduled another biofill operation which, as a result, overwrote the
> unacknowledged bio (dev->read in ops_run_biofill()), and so we lost the
> previous dev->read bio completely.
>
>  Here is a patch that solves this problem. Perhaps this could be implemented
> in a more elegant and effective way. What are your thoughts regarding this?
>
>  Regards, Yuri
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 08b4893..7abc96b 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -838,11 +838,24 @@ static void ops_complete_biofill(void *stripe_head_ref)
>                  /* acknowledge completion of a biofill operation */
>                  /* and check if we need to reply to a read request
>                   */
> -                if (test_bit(R5_Wantfill, &dev_q->flags) && !dev_q->toread) {
> +                if (test_bit(R5_Wantfill, &dev_q->flags)) {
>                          struct bio *rbi, *rbi2;
>                          struct r5dev *dev = &sh->dev[i];
>
> -                        clear_bit(R5_Wantfill, &dev_q->flags);
> +                        /* There is a chance that another fill operation
> +                         * had been scheduled for this dev while we
> +                         * processed sh. In this case do one of the following
> +                         * alternatives:
> +                         * - if there is no active completed biofill for the dev
> +                         *   then go to the next dev leaving Wantfill set;
> +                         * - if there is active completed biofill for the dev
> +                         *   then ack it but leave Wantfill set.
> +                         */
> +                        if (dev_q->toread && !dev->read)
> +                                continue;
> +
> +                        if (!dev_q->toread)
> +                                clear_bit(R5_Wantfill, &dev_q->flags);
>
>                          /* The access to dev->read is outside of the
>                           * spin_lock_irq(&conf->device_lock), but is protected

This still looks racy... I think the complete fix is to make the
R5_Wantfill and dev_q->toread accesses atomic.  Please test the following
patch (also attached) and let me know if it fixes what you are seeing.

Applies on top of git://lost.foo-projects.org/~dwillia2/git/iop md-for-linus

---

raid5: fix ops_complete_biofill race in the asynchronous offload case

Protect against dev_q->toread toggling while testing and clearing the
R5_Wantfill bit.  This change prevents all asynchronous completions
(tasklets) from running during handle_stripe.
---

 drivers/md/raid5.c |   16 +++++++++-------
 1 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 2f9022d..91c14c6 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -824,6 +824,7 @@ static void ops_complete_biofill(void *stripe_head_ref)
                 (unsigned long long)sh->sector);
 
         /* clear completed biofills */
+        spin_lock(&sq->lock);
         for (i = sh->disks; i--; ) {
                 struct r5dev *dev = &sh->dev[i];
                 struct r5_queue_dev *dev_q = &sq->dev[i];
@@ -861,6 +862,7 @@ static void ops_complete_biofill(void *stripe_head_ref)
         }
         clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
         clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);
+        spin_unlock(&sq->lock);
 
         return_io(return_bi);
 
@@ -2279,7 +2281,7 @@ static int add_queue_bio(struct stripe_queue *sq, struct bio *bi, int dd_idx,
                 (unsigned long long)bi->bi_sector,
                 (unsigned long long)sq->sector);
 
-        spin_lock(&sq->lock);
+        spin_lock_bh(&sq->lock);
         spin_lock_irq(&conf->device_lock);
         sh = sq->sh;
         if (forwrite) {
@@ -2306,7 +2308,7 @@ static int add_queue_bio(struct stripe_queue *sq, struct bio *bi, int dd_idx,
         *bip = bi;
         bi->bi_phys_segments ++;
         spin_unlock_irq(&conf->device_lock);
-        spin_unlock(&sq->lock);
+        spin_unlock_bh(&sq->lock);
 
         pr_debug("added bi b#%llu to stripe s#%llu, disk %d.\n",
                 (unsigned long long)bi->bi_sector,
@@ -2339,7 +2341,7 @@ static int add_queue_bio(struct stripe_queue *sq, struct bio *bi, int dd_idx,
  overlap:
         set_bit(R5_Overlap, &sh->dev[dd_idx].flags);
         spin_unlock_irq(&conf->device_lock);
-        spin_unlock(&sq->lock);
+        spin_unlock_bh(&sq->lock);
         return 0;
 }
 
@@ -3127,7 +3129,7 @@ static void handle_stripe5(struct stripe_head *sh)
                 atomic_read(&sh->count), sq->pd_idx, sh->ops.pending,
                 sh->ops.ack, sh->ops.complete);
 
-        spin_lock(&sq->lock);
+        spin_lock_bh(&sq->lock);
         clear_bit(STRIPE_HANDLE, &sh->state);
 
         s.syncing = test_bit(STRIPE_SYNCING, &sh->state);
@@ -3370,7 +3372,7 @@ static void handle_stripe5(struct stripe_head *sh)
         if (sh->ops.count)
                 pending = get_stripe_work(sh);
 
-        spin_unlock(&sq->lock);
+        spin_unlock_bh(&sq->lock);
 
         if (pending)
                 raid5_run_ops(sh, pending);
@@ -3397,7 +3399,7 @@ static void handle_stripe6(struct stripe_head *sh, struct page *tmp_page)
                 atomic_read(&sh->count), pd_idx, r6s.qd_idx);
         memset(&s, 0, sizeof(s));
 
-        spin_lock(&sq->lock);
+        spin_lock_bh(&sq->lock);
         clear_bit(STRIPE_HANDLE, &sh->state);
 
         s.syncing = test_bit(STRIPE_SYNCING, &sh->state);
@@ -3576,7 +3578,7 @@ static void handle_stripe6(struct stripe_head *sh, struct page *tmp_page)
         if (s.expanding && s.locked == 0)
                 handle_stripe_expansion(conf, sh, &r6s);
 
-        spin_unlock(&sq->lock);
+        spin_unlock_bh(&sq->lock);
 
         return_io(return_bi);

[-- Attachment #2: md-accel-fix-ops-complete-biofill-race.patch --]

raid5: fix ops_complete_biofill race in the asynchronous offload case

From: Dan Williams <dan.j.williams@intel.com>

Protect against dev_q->toread toggling while testing and clearing the
R5_Wantfill bit.  This change prevents all asynchronous completions
(tasklets) from running during handle_stripe.
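
The locking change above relies on the async_tx completion callback running in
tasklet (softirq) context. A minimal sketch of that general kernel locking
pattern follows; the names are invented, the tasklet-based completion is an
assumption for illustration, and this is not code from the md driver:

#include <linux/module.h>
#include <linux/spinlock.h>
#include <linux/interrupt.h>

static DEFINE_SPINLOCK(example_lock);
static struct tasklet_struct example_tasklet;

/* Runs in softirq context, like an offload-engine completion callback. */
static void example_completion(unsigned long data)
{
        spin_lock(&example_lock);      /* plain lock: already in BH context */
        /* ... acknowledge the completed operation ... */
        spin_unlock(&example_lock);
}

/* Runs in process context, like add_queue_bio() / handle_stripe(). */
static void example_submit(void)
{
        spin_lock_bh(&example_lock);   /* keep the tasklet off this CPU ...  */
        /* ... queue new work; the completion cannot run concurrently here  */
        spin_unlock_bh(&example_lock); /* ... until the lock is dropped      */

        tasklet_schedule(&example_tasklet);
}

static int __init example_init(void)
{
        tasklet_init(&example_tasklet, example_completion, 0);
        example_submit();
        return 0;
}

static void __exit example_exit(void)
{
        tasklet_kill(&example_tasklet);
}

module_init(example_init);
module_exit(example_exit);
MODULE_LICENSE("GPL");

Taking the lock with plain spin_lock() on the process-context side would let
the tasklet fire on the same CPU while the lock is held and spin forever; that
deadlock is what the spin_lock_bh() conversions in the patch above avoid.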
* Re: md raid acceleration and the async_tx api

From: Yuri Tikhonov @ 2007-08-30 14:57 UTC
To: Williams, Dan J; +Cc: linux-raid, Wolfgang Denk, dzu

 Hi Dan,

On Monday 27 August 2007 23:12, you wrote:
> This still looks racy... I think the complete fix is to make the
> R5_Wantfill and dev_q->toread accesses atomic.  Please test the
> following patch (also attached) and let me know if it fixes what you are
> seeing:

 Your approach doesn't help: the Bonnie++ utility hangs up during the
"Re-writing" stage.

 Note that before applying your patch I rolled back my fix in the
ops_complete_biofill() function. Do I understand correctly that your patch
should be used *instead of* mine rather than *together with* it?

 Regards, Yuri
* Re: md raid acceleration and the async_tx api

From: Dan Williams @ 2007-08-30 19:34 UTC
To: Yuri Tikhonov; +Cc: linux-raid, Wolfgang Denk, dzu

On 8/30/07, Yuri Tikhonov <yur@emcraft.com> wrote:
>
>  Hi Dan,
>
> On Monday 27 August 2007 23:12, you wrote:
> > This still looks racy... I think the complete fix is to make the
> > R5_Wantfill and dev_q->toread accesses atomic.  Please test the
> > following patch (also attached) and let me know if it fixes what you are
> > seeing:
>
>  Your approach doesn't help: the Bonnie++ utility hangs up during the
> "Re-writing" stage.
>
Looking at it again I see that what I added would not affect the failure you
are seeing.  However, I noticed that you are using a broken version of the
stripe-queue and cache_arbiter patches.  In the current revisions the
dev_q->flags field has been moved back to dev->flags, which fixes a data
corruption issue and could potentially address the hang you are seeing.
The latest revisions are:

raid5: add the stripe_queue object for tracking raid io requests (rev2)
raid5: use stripe_queues to prioritize the "most deserving" requests (rev6)

>  Note that before applying your patch I rolled back my fix in the
> ops_complete_biofill() function. Do I understand correctly that your patch
> should be used *instead of* mine rather than *together with* it?
>
You understood correctly.  The attached patch integrates your change to keep
R5_Wantfill set while also protecting the 'more_to_read' case.  Please try it
on top of the latest stripe-queue changes [1] (instead of the other proposed
patches).

> Regards, Yuri

Thanks,
Dan

[1] git fetch -f git://lost.foo-projects.org/~dwillia2/git/iop md-for-linus:refs/heads/md-for-linus

[-- Attachment #2: fix-ops-complete-biofill.patch --]

raid5: fix the 'more_to_read' case in ops_complete_biofill

From: Dan Williams <dan.j.williams@intel.com>

Prevent ops_complete_biofill from running concurrently with add_queue_bio
---

 drivers/md/raid5.c |   33 +++++++++++++++++++--------------
 1 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 2f9022d..1c591d3 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -828,22 +828,19 @@ static void ops_complete_biofill(void *stripe_head_ref)
                 struct r5dev *dev = &sh->dev[i];
                 struct r5_queue_dev *dev_q = &sq->dev[i];
 
-                /* check if this stripe has new incoming reads */
+                /* 1/ acknowledge completion of a biofill operation
+                 * 2/ check if we need to reply to a read request.
+                 * 3/ check if we need to reschedule handle_stripe
+                 */
                 if (dev_q->toread)
                         more_to_read++;
 
-                /* acknowledge completion of a biofill operation */
-                /* and check if we need to reply to a read request
-                 */
-                if (test_bit(R5_Wantfill, &dev->flags) && !dev_q->toread) {
+                if (test_bit(R5_Wantfill, &dev->flags)) {
                         struct bio *rbi, *rbi2;
 
-                        clear_bit(R5_Wantfill, &dev->flags);
-                        /* The access to dev->read is outside of the
-                         * spin_lock_irq(&conf->device_lock), but is protected
-                         * by the STRIPE_OP_BIOFILL pending bit
-                         */
-                        BUG_ON(!dev->read);
+                        if (!dev_q->toread)
+                                clear_bit(R5_Wantfill, &dev->flags);
+
                         rbi = dev->read;
                         dev->read = NULL;
                         while (rbi && rbi->bi_sector <
@@ -899,8 +896,15 @@ static void ops_run_biofill(struct stripe_head *sh)
         }
 
         atomic_inc(&sh->count);
+
+        /* spin_lock prevents ops_complete_biofill from running concurrently
+         * with add_queue_bio in the synchronous case
+         */
+        spin_lock(&sq->lock);
         async_trigger_callback(ASYNC_TX_DEP_ACK | ASYNC_TX_ACK, tx,
                 ops_complete_biofill, sh);
+        spin_unlock(&sq->lock);
+
 }
 
 static void ops_complete_compute5(void *stripe_head_ref)
@@ -2279,7 +2283,8 @@ static int add_queue_bio(struct stripe_queue *sq, struct bio *bi, int dd_idx,
                 (unsigned long long)bi->bi_sector,
                 (unsigned long long)sq->sector);
 
-        spin_lock(&sq->lock);
+        /* prevent asynchronous completions from running */
+        spin_lock_bh(&sq->lock);
         spin_lock_irq(&conf->device_lock);
         sh = sq->sh;
         if (forwrite) {
@@ -2306,7 +2311,7 @@ static int add_queue_bio(struct stripe_queue *sq, struct bio *bi, int dd_idx,
         *bip = bi;
         bi->bi_phys_segments ++;
         spin_unlock_irq(&conf->device_lock);
-        spin_unlock(&sq->lock);
+        spin_unlock_bh(&sq->lock);
 
         pr_debug("added bi b#%llu to stripe s#%llu, disk %d.\n",
                 (unsigned long long)bi->bi_sector,
@@ -2339,7 +2344,7 @@ static int add_queue_bio(struct stripe_queue *sq, struct bio *bi, int dd_idx,
  overlap:
         set_bit(R5_Overlap, &sh->dev[dd_idx].flags);
         spin_unlock_irq(&conf->device_lock);
-        spin_unlock(&sq->lock);
+        spin_unlock_bh(&sq->lock);
         return 0;
 }
* Re: md raid acceleration and the async_tx api

From: Yuri Tikhonov @ 2007-09-13  9:38 UTC
To: Williams, Dan J; +Cc: Wolfgang Denk, linux-raid

 Hi Dan,

On Friday 07 September 2007 20:02, you wrote:
> You need to fetch from the 'md-for-linus' tree.  But I have attached
> them as well.
>
> git fetch git://lost.foo-projects.org/~dwillia2/git/iop
> md-for-linus:md-for-linus

 Thanks.

 An unrelated question: comparing the drivers/md/raid5.c file in Linus's
2.6.23-rc6 tree and in your md-for-linus one, I found the following
difference in the expand-related part of the handle_stripe5() function:

-                s.locked += handle_write_operations5(sh, 1, 1);
+                s.locked += handle_write_operations5(sh, 0, 1);

 That is, in your case we pass rcw=0, whereas in Linus's case
handle_write_operations5() is called with rcw=1. Which code is correct?

 Regards, Yuri
* Re: md raid acceleration and the async_tx api

From: Dan Williams @ 2007-09-13 16:52 UTC
To: Yuri Tikhonov; +Cc: Wolfgang Denk, linux-raid, Andrew Morton, Neil Brown

On 9/13/07, Yuri Tikhonov <yur@emcraft.com> wrote:
>
>  Hi Dan,
>
> On Friday 07 September 2007 20:02, you wrote:
> > You need to fetch from the 'md-for-linus' tree.  But I have attached
> > them as well.
> >
> > git fetch git://lost.foo-projects.org/~dwillia2/git/iop
> > md-for-linus:md-for-linus
>
>  Thanks.
>
>  An unrelated question: comparing the drivers/md/raid5.c file in Linus's
> 2.6.23-rc6 tree and in your md-for-linus one, I found the following
> difference in the expand-related part of the handle_stripe5() function:
>
> -                s.locked += handle_write_operations5(sh, 1, 1);
> +                s.locked += handle_write_operations5(sh, 0, 1);
>
>  That is, in your case we pass rcw=0, whereas in Linus's case
> handle_write_operations5() is called with rcw=1. Which code is correct?
>
There was a recent bug discovered in my changes to the expansion code.
The fix has now gone into Linus's tree through Andrew's tree.  I kept
the fix out of my 'md-for-linus' tree to prevent it getting dropped
from -mm due to automatic git-tree merge-detection.  I have now
rebased my git tree so everything is in sync.

However, after talking with Neil at LCE we came to the conclusion that
it would be best if I just sent patches since git tree updates tend to
not get enough review, and because the patch sets will be more
manageable now that the big pieces of the acceleration infrastructure
have been merged.

>  Regards, Yuri

Thanks,
Dan
* Re: md raid acceleration and the async_tx api

From: Mr. James W. Laferriere @ 2007-09-13 21:14 UTC
To: Dan Williams; +Cc: linux-raid maillist

        Hello Dan,

On Thu, 13 Sep 2007, Dan Williams wrote:
> On 9/13/07, Yuri Tikhonov <yur@emcraft.com> wrote:
>>  Hi Dan,
>>  On Friday 07 September 2007 20:02, you wrote:
>>> You need to fetch from the 'md-for-linus' tree.  But I have attached
>>> them as well.
>>>
>>> git fetch git://lost.foo-projects.org/~dwillia2/git/iop
>>> md-for-linus:md-for-linus
>>
>>  Thanks.
>>
>>  An unrelated question: comparing the drivers/md/raid5.c file in Linus's
>> 2.6.23-rc6 tree and in your md-for-linus one, I found the following
>> difference in the expand-related part of the handle_stripe5() function:
>>
>> -                s.locked += handle_write_operations5(sh, 1, 1);
>> +                s.locked += handle_write_operations5(sh, 0, 1);
>>
>>  That is, in your case we pass rcw=0, whereas in Linus's case
>> handle_write_operations5() is called with rcw=1. Which code is correct?
>>
> There was a recent bug discovered in my changes to the expansion code.
> The fix has now gone into Linus's tree through Andrew's tree.  I kept
> the fix out of my 'md-for-linus' tree to prevent it getting dropped
> from -mm due to automatic git-tree merge-detection.  I have now
> rebased my git tree so everything is in sync.
>
> However, after talking with Neil at LCE we came to the conclusion that
> it would be best if I just sent patches since git tree updates tend to
> not get enough review, and because the patch sets will be more
> manageable now that the big pieces of the acceleration infrastructure
> have been merged.
>
>>  Regards, Yuri
>
> Thanks,
> Dan

        Does this discussion of patches include any changes to cure the 'BUG'
instance I reported?

        ie: raid5:md3: kernel BUG , followed by , Silent halt .

                Tia, JimL
--
+-------------------------------------------------------------------+
| James W. Laferriere    | System  Techniques   | Give me VMS       |
| Network Engineer       | 663 Beaumont Blvd    | Give me Linux     |
| babydr@baby-dragons.com | Pacifica, CA. 94044 | only on AXP       |
+-------------------------------------------------------------------+
* RE: md raid acceleration and the async_tx api

From: Williams, Dan J @ 2007-09-13 21:30 UTC
To: Mr. James W. Laferriere; +Cc: linux-raid maillist

> From: Mr. James W. Laferriere [mailto:babydr@baby-dragons.com]
>
>         Hello Dan,
>
> On Thu, 13 Sep 2007, Dan Williams wrote:
> > On 9/13/07, Yuri Tikhonov <yur@emcraft.com> wrote:
> >>  Hi Dan,
> >>  On Friday 07 September 2007 20:02, you wrote:
> >>> You need to fetch from the 'md-for-linus' tree.  But I have attached
> >>> them as well.
> >>>
> >>> git fetch git://lost.foo-projects.org/~dwillia2/git/iop
> >>> md-for-linus:md-for-linus
> >>
> >>  Thanks.
> >>
> >>  An unrelated question: comparing the drivers/md/raid5.c file in Linus's
> >> 2.6.23-rc6 tree and in your md-for-linus one, I found the following
> >> difference in the expand-related part of the handle_stripe5() function:
> >>
> >> -                s.locked += handle_write_operations5(sh, 1, 1);
> >> +                s.locked += handle_write_operations5(sh, 0, 1);
> >>
> >>  That is, in your case we pass rcw=0, whereas in Linus's case
> >> handle_write_operations5() is called with rcw=1. Which code is correct?
> >>
> > There was a recent bug discovered in my changes to the expansion code.
> > The fix has now gone into Linus's tree through Andrew's tree.  I kept
> > the fix out of my 'md-for-linus' tree to prevent it getting dropped
> > from -mm due to automatic git-tree merge-detection.  I have now
> > rebased my git tree so everything is in sync.
> >
> > However, after talking with Neil at LCE we came to the conclusion that
> > it would be best if I just sent patches since git tree updates tend to
> > not get enough review, and because the patch sets will be more
> > manageable now that the big pieces of the acceleration infrastructure
> > have been merged.
> >
> >>  Regards, Yuri
> >
> > Thanks,
> > Dan
>
>         Does this discussion of patches include any changes to cure the 'BUG'
> instance I reported?
>
>         ie: raid5:md3: kernel BUG , followed by , Silent halt .

No, this is referring to:
http://marc.info/?l=linux-raid&m=118845398229443&w=2

The BUG you reported currently looks to be caused by interactions with the
bitmap support code... still investigating.

>                 Tia, JimL

Thanks,
Dan