linux-raid.vger.kernel.org archive mirror
* md raid acceleration and the async_tx api
@ 2007-08-27  8:49 Yuri Tikhonov
  2007-08-27 19:12 ` Williams, Dan J
  0 siblings, 1 reply; 8+ messages in thread
From: Yuri Tikhonov @ 2007-08-27  8:49 UTC (permalink / raw)
  To: dan.j.williams; +Cc: linux-raid, Wolfgang Denk, dzu


 Hello,

 I tested h/w accelerated RAID-5 on a kernel with PAGE_SIZE set to 64KB and
found that the bonnie++ application hangs during the "Re-writing" test. I
investigated and discovered that the hang occurs because one of the
mpage_end_io_read() calls is missing (these are the callbacks invoked from
the ops_complete_biofill() function).

 My low-level ADMA driver (the ppc440spe one) successfully invoked the
ops_complete_biofill() callback, but ops_complete_biofill() itself skipped
calling the bi_end_io() handler of the completed bio (the current dev->read)
because, while this bio was being processed, another request had arrived for
the sh (the current dev_q->toread). ops_complete_biofill() therefore
scheduled another biofill operation, which overwrote the unacknowledged bio
(dev->read, in ops_run_biofill()), and so the previous dev->read bio was lost
completely.
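
 The sequence described above can be reproduced in a small user-space model.
This is only a sketch under stated assumptions: struct model_bio, struct
model_dev and the model_* functions are hypothetical stand-ins rather than
kernel code, and locking and the real bio chain walk are left out. It shows
how the pre-patch check (Wantfill set and no pending toread) lets the first
bio go unacknowledged.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bio: records whether bi_end_io() ran. */
struct model_bio {
	bool ended;
};

/* Hypothetical stand-in for the per-device state used by the biofill path. */
struct model_dev {
	struct model_bio *read;		/* models dev->read (fill in flight) */
	struct model_bio *toread;	/* models dev_q->toread (newly queued) */
	bool wantfill;			/* models the R5_Wantfill flag */
};

/* Simplified model of ops_run_biofill(): toread becomes the active read. */
static void model_run_biofill(struct model_dev *d)
{
	if (!d->wantfill)
		return;
	d->read = d->toread;	/* overwrites any unacknowledged read */
	d->toread = NULL;
}

/* Simplified model of the pre-patch ops_complete_biofill() condition. */
static void model_complete_biofill_old(struct model_dev *d)
{
	if (d->wantfill && !d->toread) {
		d->wantfill = false;
		if (d->read) {
			d->read->ended = true;	/* models calling bi_end_io() */
			d->read = NULL;
		}
	}
}

/* Returns 1 when the first bio is never acknowledged but the second is. */
static int model_lost_bio_demo(void)
{
	struct model_bio a = { false }, b = { false };
	struct model_dev d = { NULL, &a, true };

	model_run_biofill(&d);		/* read = a, toread = NULL */
	d.toread = &b;			/* new request arrives mid-fill */
	model_complete_biofill_old(&d);	/* skipped because toread is set */
	model_run_biofill(&d);		/* read = b, a is overwritten */
	model_complete_biofill_old(&d);	/* acknowledges b only */

	return !a.ended && b.ended;
}
```

 In this model, model_lost_bio_demo() reports that bio "a" never has its
completion handler called, which corresponds to the missing
mpage_end_io_read() call.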

 Here is a patch that solves this problem. Perhaps it could be implemented in
a more elegant and efficient way. What are your thoughts on this?

 Regards, Yuri

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 08b4893..7abc96b 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -838,11 +838,24 @@ static void ops_complete_biofill(void *stripe_head_ref)
 		/* acknowledge completion of a biofill operation */
 		/* and check if we need to reply to a read request
 		 */
-		if (test_bit(R5_Wantfill, &dev_q->flags) && !dev_q->toread) {
+		if (test_bit(R5_Wantfill, &dev_q->flags)) {
 			struct bio *rbi, *rbi2;
 			struct r5dev *dev = &sh->dev[i];
 
-			clear_bit(R5_Wantfill, &dev_q->flags);
+			/* There is a chance that another fill operation
+			 * had been scheduled for this dev while we
+			 * processed sh. In this case do one of the following
+			 * alternatives:
+			 * - if there is no active completed biofill for the dev
+			 *   then go to the next dev leaving Wantfill set;
+			 * - if there is active completed biofill for the dev
+			 *   then ack it but leave Wantfill set.
+			 */
+			if (dev_q->toread && !dev->read)
+				continue;
+
+			if (!dev_q->toread)
+				clear_bit(R5_Wantfill, &dev_q->flags);
 
 			/* The access to dev->read is outside of the
 			 * spin_lock_irq(&conf->device_lock), but is protected

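The branch logic introduced by the hunk above can be summarised as a small
decision function. This is a hedged user-space sketch, not the kernel code:
enum fill_action and model_patched_decision() are hypothetical names, and the
three booleans stand for test_bit(R5_Wantfill, &dev_q->flags),
dev_q->toread != NULL and dev->read != NULL respectively.

```c
#include <stdbool.h>

/* Hypothetical labels for the outcomes of the patched condition. */
enum fill_action {
	FILL_NOT_REQUESTED,	/* R5_Wantfill not set: block is not entered */
	FILL_SKIP,		/* continue to next dev, Wantfill stays set */
	FILL_ACK_KEEP,		/* ack dev->read, Wantfill stays set */
	FILL_ACK_CLEAR		/* clear Wantfill, then ack dev->read if any */
};

/* Mirrors the order of the checks in the patched ops_complete_biofill(). */
static enum fill_action model_patched_decision(bool wantfill,
					       bool has_toread,
					       bool has_read)
{
	if (!wantfill)
		return FILL_NOT_REQUESTED;

	/* New request queued, but nothing completed yet for this dev. */
	if (has_toread && !has_read)
		return FILL_SKIP;

	/* No new request pending: the flag can be cleared. */
	if (!has_toread)
		return FILL_ACK_CLEAR;

	/* A completed read and a new request: ack one, keep the flag. */
	return FILL_ACK_KEEP;
}
```

The case that used to lose a bio (Wantfill set, a pending toread and an
unacknowledged dev->read) now maps to FILL_ACK_KEEP, so the completed bio is
acknowledged before the next biofill replaces dev->read.
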
[parent not found: <200709071444.34911.yur@emcraft.com>]

end of thread, other threads:[~2007-09-13 21:30 UTC | newest]

Thread overview: 8+ messages
2007-08-27  8:49 md raid acceleration and the async_tx api Yuri Tikhonov
2007-08-27 19:12 ` Williams, Dan J
2007-08-30 14:57   ` Yuri Tikhonov
2007-08-30 19:34     ` Dan Williams
     [not found] <200709071444.34911.yur@emcraft.com>
     [not found] ` <0C7297FA1D2D244A9C7F6959C0BF1E520268732A@azsmsx413.amr.corp.intel.com>
2007-09-13  9:38   ` Yuri Tikhonov
2007-09-13 16:52     ` Dan Williams
2007-09-13 21:14       ` Mr. James W. Laferriere
2007-09-13 21:30         ` Williams, Dan J
