All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [RFC][PATCH] ataraid_end_request hides errors (all? 2.4 kernels)
@ 2004-07-26 23:36 Martins Krikis
  2004-07-26 23:57 ` Marcelo Tosatti
  0 siblings, 1 reply; 5+ messages in thread
From: Martins Krikis @ 2004-07-26 23:36 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: linux-kernel, arjanv, mkrikis

[-- Attachment #1: Type: text/plain, Size: 3481 bytes --]

Marcelo,

I had already written the subject off as "nobody cares"; I'm very 
glad you responded. Some more comments are below and I'm attaching
a new patch.

> The correct thing seems to pass IO error ("!uptodate") to the upper 
> ->b_end_io callback only in case both components of an IO operation
> have 
> failed. As long as at least one of the mirrors have successfully 
> finished the IO, we should continue operation, providing reliability,
> AFAICS. 

I think you are talking about the RAID-1 read case. In this case
the ataraid subdrivers just pick one disk among all the possible
mirrors and submit the request to that. ataraid_end_request() is
not involved. The fact that most ataraid subdrivers won't resubmit
the IO to a different mirror if it fails is a very small problem
compared to what I'm picking on here.

And that is the following. For RAID1 writes, the IOs have to be
submitted to all the mirrors. If the IO for _any_ of them fails,
the upper levels should be notified of the error, because the
data integrity is broken across mirrors. (With the exception of
those few subdrivers that are capable of keeping working with the
volume in "degraded" mode.) Ataraid_end_request() is widely used
for the RAID1 write case and thus carries the danger of never
notifying the upper levels that anything went wrong. 

Furthermore, for RAID0 reads and writes, the ataraid subdrivers
widely employ the ataraid_split_request() function in order to 
get the IO size to fit in just one strip and thus be applicable
to just one disk. The ataraid_split_request() just splits the 
request in halves and may get invoked again on each part if they
still don't fit. Each pair of new IOs that are produced this way 
are tied together through the use of ataraid_end_request(). Thus, 
in theory there could be a whole binary tree of requests formed 
from the "original request". And again, the trouble from using
ataraid_end_request() is that most of these IOs in the tree could
go wrong, but as long as the last one succeeds, a chain of
ataraid_end_request()-s will happily report a successful IO
completion to each other and eventually back to the upper levels.
The userland application may never learn about the massive data
corruption that has just occured. Again, it is my strong belief
that if _any_ "component IO" fails, the whole "original IO" should
be failed. 

Therefore, the patch keeps track of cumulative IO status by using
an otherwise unused bit from the b_state field in each IO's "parent
buffer_head". Other solutions exist and I mentioned it before, but I
think this is the least intrusive and does not require extra memory.

> About error reporting, indeed the current scheme is very bad and
> doesnt
> correctly report errors. We should at least print some warning on 
> ataraid_end_request() informing of IO errors. 

I added a printk to the patch to follow your suggestion,
but I think the main thing is to let the upper levels know.

> Its not OK to mess with BH_PrivateStart, no, jbd/xfs use it
> for their own purposes.

That's very unfortunate, because the comments in fs.h for 2.4 kernels
and in buffer_head.h for 2.6 kernels make it look like all the bits
starting with BH_PrivateStart are fair game. Any chance of replacing
the inviting comment with a big warning, or better yet, officially
incorporating all the bits used in jbd.h into the b_state?

Best regards,

  Martins Krikis

P.S. The patch is untested. I don't have a setup to do this easily.




[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: ataraid.patch --]
[-- Type: text/x-patch, Size: 1248 bytes --]

--- linux/drivers/ide/raid/ataraid.c.orig	2004-07-13 23:45:21.000000000 -0400
+++ linux/drivers/ide/raid/ataraid.c	2004-07-26 18:52:21.000000000 -0400
@@ -34,7 +34,8 @@
 
 #include "ataraid.h"
 
-                                        
+#define BH_IOFailure (BH_PrivateStart + 5) /* jbd.h uses +0 to +4 */
+
 static int ataraid_hardsect_size[256];
 static int ataraid_blksize_size[256];
 
@@ -153,7 +154,17 @@ void ataraid_end_request(struct buffer_h
 	if (private==NULL)
 		BUG();
 
+	if (!uptodate) { 
+		set_bit(BH_IOFailure, &private->parent->b_state);
+		printk(KERN_ERR "ataraid: IO error on major %d minor %d\n",
+		       MAJOR(bh->b_rdev), MINOR(bh->b_rdev));
+	}		       
+	
 	if (atomic_dec_and_test(&private->count)) {
+		if (test_bit(BH_IOFailure, &private->parent->b_state)) {
+			uptodate = 0; /* fail the completed original I/O */
+			clear_bit(BH_IOFailure, &private->parent->b_state);
+		}
 		private->parent->b_end_io(private->parent,uptodate);
 		private->parent = NULL;
 		kfree(private);
@@ -194,6 +205,8 @@ static void ataraid_split_request(reques
 
 	bh2->b_data +=  bh->b_size/2;
 
+	clear_bit(BH_IOFailure, &bh->b_state); /* this bit tracks success */
+
 	generic_make_request(rw,bh1);
 	generic_make_request(rw,bh2);
 }

^ permalink raw reply	[flat|nested] 5+ messages in thread
* [RFC][PATCH] ataraid_end_request hides errors (all? 2.4 kernels)
@ 2004-07-14  4:11 Martins Krikis
  2004-07-25 22:38 ` Marcelo Tosatti
  0 siblings, 1 reply; 5+ messages in thread
From: Martins Krikis @ 2004-07-14  4:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: mkrikis

[-- Attachment #1: Type: text/plain, Size: 1592 bytes --]

I know that interest in 2.4 kernels and ataraid at this point
is probably minimal, and I myself don't use ataraid_end_request.

However, I would appreciate if somebody could tell me whether a
patch like the one attached is acceptable or whether the use of
the BH_PrivateStart flag (style issues aside) can introduce any
new problems. In particular, I've noticed that ext3 and xfs also
use this flag.

The bug that the patch attempts to solve is the following.
Ataraid_end_request() uses the success/failure of the last
one of component I/Os as the success/failure of the complete
I/O (to a RAID volume). Thus, even if all the first components
fail, as long as the last one succeeds, it will report the
complete I/O as a success, and the user will not even get
an indication of any errors. The code is used by all ataraid
subdrivers except iswraid, as far as I know.

If using the BH_PrivateStart is not appropriate in this module,
a different free bit from b_state can be chosen. Protection
against component-I/O failure can also be achieved by introducing
a new field in ataraid_bh_private. Ideally the subdrivers would
also clear the flag/field, although for a truly unused flag
in b_state this can be skipped, so a flag-based solution can
be both quicker and need no extra space, I think.

Anyway, I'm very interested to hear whether anybody cares
and whether it is OK to use BH_PrivateStart in block device
drivers.

Thanks,

  Martins Krikis



		
__________________________________
Do you Yahoo!?
Take Yahoo! Mail with you! Get it on your mobile phone.
http://mobile.yahoo.com/maildemo 

[-- Attachment #2: ataraid.patch --]
[-- Type: application/octet-stream, Size: 949 bytes --]

--- linux/drivers/ide/raid/ataraid.c.orig	2004-07-13 23:45:21.000000000 -0400
+++ linux/drivers/ide/raid/ataraid.c	2004-07-13 23:45:32.000000000 -0400
@@ -153,7 +153,14 @@ void ataraid_end_request(struct buffer_h
 	if (private==NULL)
 		BUG();
 
+	if (!uptodate) /* record failures of components of the original I/O */
+		set_bit(BH_PrivateStart, &private->parent->b_state);
+	
 	if (atomic_dec_and_test(&private->count)) {
+		if (test_bit(BH_PrivateStart, &private->parent->b_state)) {
+			uptodate = 0; /* fail the completed original I/O */
+			clear_bit(BH_PrivateStart, &private->parent->b_state);
+		}
 		private->parent->b_end_io(private->parent,uptodate);
 		private->parent = NULL;
 		kfree(private);
@@ -194,6 +201,8 @@ static void ataraid_split_request(reques
 
 	bh2->b_data +=  bh->b_size/2;
 
+	clear_bit(BH_PrivateStart, &bh->b_state); /* this bit tracks success */
+
 	generic_make_request(rw,bh1);
 	generic_make_request(rw,bh2);
 }

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2004-07-27  5:16 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-07-26 23:36 [RFC][PATCH] ataraid_end_request hides errors (all? 2.4 kernels) Martins Krikis
2004-07-26 23:57 ` Marcelo Tosatti
2004-07-27  5:15   ` Martins Krikis
  -- strict thread matches above, loose matches on Subject: below --
2004-07-14  4:11 Martins Krikis
2004-07-25 22:38 ` Marcelo Tosatti

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.