The Linux Kernel Mailing List
 help / color / mirror / Atom feed
From: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
To: Martins Krikis <mkrikis@yahoo.com>, arjanv@redhat.com
Cc: linux-kernel@vger.kernel.org
Subject: Re: [RFC][PATCH] ataraid_end_request hides errors (all? 2.4 kernels)
Date: Mon, 26 Jul 2004 20:57:00 -0300	[thread overview]
Message-ID: <20040726235659.GA17761@logos.cnet> (raw)
In-Reply-To: <87d62i9w9q.fsf@yahoo.com>

On Mon, Jul 26, 2004 at 07:36:01PM -0400, Martins Krikis wrote:
> Marcelo,
> 
> I had already written the subject off as "nobody cares"; I'm very 
> glad you responded. Some more comments are below and I'm attaching
> a new patch.

:)

> > The correct thing seems to pass IO error ("!uptodate") to the upper 
> > ->b_end_io callback only in case both components of an IO operation
> > have 
> > failed. As long as at least one of the mirrors have successfully 
> > finished the IO, we should continue operation, providing reliability,
> > AFAICS. 

Note, I'm really ignorant about ataraid, so I might miss something, 
as you just pointed out that I have :) 

Anyway, lets keep going...

> I think you are talking about the RAID-1 read case. In this case
> the ataraid subdrivers just pick one disk among all the possible
> mirrors and submit the request to that. ataraid_end_request() is
> not involved. The fact that most ataraid subdrivers won't resubmit
> the IO to a different mirror if it fails is a very small problem
> compared to what I'm picking on here.
> 
> And that is the following. For RAID1 writes, the IOs have to be
> submitted to all the mirrors. If the IO for _any_ of them fails,
> the upper levels should be notified of the error, because the
> data integrity is broken across mirrors. (With the exception of
> those few subdrivers that are capable of keeping working with the
> volume in "degraded" mode.) 

I supposed all of them would be able to work in degraded mode? No?

> Ataraid_end_request() is widely used
> for the RAID1 write case and thus carries the danger of never
> notifying the upper levels that anything went wrong. 
> 
> Furthermore, for RAID0 reads and writes, the ataraid subdrivers
> widely employ the ataraid_split_request() function in order to 
> get the IO size to fit in just one strip and thus be applicable
> to just one disk. The ataraid_split_request() just splits the 
> request in halves and may get invoked again on each part if they
> still don't fit. Each pair of new IOs that are produced this way 
> are tied together through the use of ataraid_end_request(). Thus, 
> in theory there could be a whole binary tree of requests formed 
> from the "original request". And again, the trouble from using
> ataraid_end_request() is that most of these IOs in the tree could
> go wrong, but as long as the last one succeeds, a chain of
> ataraid_end_request()-s will happily report a successful IO
> completion to each other and eventually back to the upper levels.
> The userland application may never learn about the massive data
> corruption that has just occured. Again, it is my strong belief
> that if _any_ "component IO" fails, the whole "original IO" should
> be failed. 
> 
> Therefore, the patch keeps track of cumulative IO status by using
> an otherwise unused bit from the b_state field in each IO's "parent
> buffer_head". Other solutions exist and I mentioned it before, but I
> think this is the least intrusive and does not require extra memory.
> 
> > About error reporting, indeed the current scheme is very bad and
> > doesnt
> > correctly report errors. We should at least print some warning on 
> > ataraid_end_request() informing of IO errors. 
> 
> I added a printk to the patch to follow your suggestion,
> but I think the main thing is to let the upper levels know.
> 
> > Its not OK to mess with BH_PrivateStart, no, jbd/xfs use it
> > for their own purposes.
> 
> That's very unfortunate, because the comments in fs.h for 2.4 kernels
> and in buffer_head.h for 2.6 kernels make it look like all the bits
> starting with BH_PrivateStart are fair game. Any chance of replacing
> the inviting comment with a big warning, or better yet, officially
> incorporating all the bits used in jbd.h into the b_state?

Well, the comment says

        BH_PrivateStart,/* not a state bit, but the first bit available
                         * for private allocation by other entities
                         */

And "other entities" are some of the journalling filesystems.

When I read the comment, I can't figure out that using it 
freely from anywhere is "fair game". But you seem to understand
otherwise.

Yes, we could stick a stronger indication of "journalling fs'es for instance 
use this". Incorporating the bits in jbd.h into b_state doesnt seem right
because BH_PrivateStart is a generic thing, and not a JBD only thing. 
Does that make sense?

> Best regards,
> 
>   Martins Krikis
> 
> P.S. The patch is untested. I don't have a setup to do this easily.


The "#define BH_IOFailure (BH_PrivateStart + 5) /* jbd.h uses +0 to +4 */" 
is not the cleanest thing in the world, but...

Lets ask the guy who wrote this. Arjan, what you think of this patch?

>From my POV, better to report IO errors than not at all.

> --- linux/drivers/ide/raid/ataraid.c.orig	2004-07-13 23:45:21.000000000 -0400
> +++ linux/drivers/ide/raid/ataraid.c	2004-07-26 18:52:21.000000000 -0400
> @@ -34,7 +34,8 @@
>  
>  #include "ataraid.h"
>  
> -                                        
> +#define BH_IOFailure (BH_PrivateStart + 5) /* jbd.h uses +0 to +4 */
> +
>  static int ataraid_hardsect_size[256];
>  static int ataraid_blksize_size[256];
>  
> @@ -153,7 +154,17 @@ void ataraid_end_request(struct buffer_h
>  	if (private==NULL)
>  		BUG();
>  
> +	if (!uptodate) { 
> +		set_bit(BH_IOFailure, &private->parent->b_state);
> +		printk(KERN_ERR "ataraid: IO error on major %d minor %d\n",
> +		       MAJOR(bh->b_rdev), MINOR(bh->b_rdev));
> +	}		       
> +	
>  	if (atomic_dec_and_test(&private->count)) {
> +		if (test_bit(BH_IOFailure, &private->parent->b_state)) {
> +			uptodate = 0; /* fail the completed original I/O */
> +			clear_bit(BH_IOFailure, &private->parent->b_state);
> +		}
>  		private->parent->b_end_io(private->parent,uptodate);
>  		private->parent = NULL;
>  		kfree(private);
> @@ -194,6 +205,8 @@ static void ataraid_split_request(reques
>  
>  	bh2->b_data +=  bh->b_size/2;
>  
> +	clear_bit(BH_IOFailure, &bh->b_state); /* this bit tracks success */
> +
>  	generic_make_request(rw,bh1);
>  	generic_make_request(rw,bh2);
>  }


  reply	other threads:[~2004-07-27  1:21 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-07-26 23:36 [RFC][PATCH] ataraid_end_request hides errors (all? 2.4 kernels) Martins Krikis
2004-07-26 23:57 ` Marcelo Tosatti [this message]
2004-07-27  5:15   ` Martins Krikis
  -- strict thread matches above, loose matches on Subject: below --
2004-07-14  4:11 Martins Krikis
2004-07-25 22:38 ` Marcelo Tosatti

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20040726235659.GA17761@logos.cnet \
    --to=marcelo.tosatti@cyclades.com \
    --cc=arjanv@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mkrikis@yahoo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox