* RAID1 robust read and read/write correct patch
From: J. David Beutel @ 2005-02-23 11:01 UTC (permalink / raw)
To: linux-raid
I'd like to try this patch
http://marc.theaimsgroup.com/?l=linux-raid&m=110704868115609&w=2 with
EVMS BBR.
Has anyone tried it on 2.6.10 (with FC2 1.9 and EVMS patches)? Has
anyone tried the rewrite part at all? I don't know md or the kernel or
this patch, but the following lines of the patch seem suspicious to me.
Should it set R1BIO_ReadRewrite instead? That bit gets tested later,
whereas it seems like R1BIO_ReadRetry is never tested and
R1BIO_ReadRewrite is never set.
+#ifdef DO_ADD_READ_WRITE_CORRECT
+ else /* tell next time we're here that we're a retry */
+ set_bit(R1BIO_ReadRetry, &r1_bio->state);
+#endif /* DO_ADD_READ_WRITE_CORRECT */
Cheers,
11011011
* Re: RAID1 robust read and read/write correct patch
From: Peter T. Breuer @ 2005-02-23 11:50 UTC (permalink / raw)
To: linux-raid
J. David Beutel <jdb@getsu.com> wrote:
> I'd like to try this patch
> http://marc.theaimsgroup.com/?l=linux-raid&m=110704868115609&w=2 with
> EVMS BBR.
>
> Has anyone tried it on 2.6.10 (with FC2 1.9 and EVMS patches)? Has
> anyone tried the rewrite part at all? I don't know md or the kernel or
> this patch, but the following lines of the patch seem suspicious to me.
> Should it set R1BIO_ReadRewrite instead? That bit gets tested later,
> whereas it seems like R1BIO_ReadRetry is never tested and
> R1BIO_ReadRewrite is never set.
>
> +#ifdef DO_ADD_READ_WRITE_CORRECT
> + else /* tell next time we're here that we're a retry */
> + set_bit(R1BIO_ReadRetry, &r1_bio->state);
> +#endif /* DO_ADD_READ_WRITE_CORRECT */
Quite possibly - I never tested the rewrite part of the patch, just
wrote it to indicate how it should go and stuck it in to encourage
others to go on from there. It's disabled by default. You almost
certainly don't want to enable it unless you are a developer (or a
guinea pig :).
As I recall I set a new flag (don't remember offhand what it is called)
to help the code understand that it is dealing with a retry, which in
turn helps the patch be more modular and not introduce extra program
state.
Let me look ...
Yes, the only flag added for the write correction stuff was this in raid1.h:
+#ifdef DO_ADD_READ_WRITE_CORRECT
+#define R1BIO_ReadRewrite 7
+#endif /* DO_ADD_READ_WRITE_CORRECT */
So hmmmmmmm. Let me look at what the "hidden" (disabled, whatever) part
of the patch does. It's a patch entirely to raid1.c (plus one bit
definition in raid1.h).
The first part of the patch is the couple of lines you noted above,
which simply mark the master bio to say that we have had a failure
on read here that would normally cause a disk expulsion, and we haven't
done it. Those lines are harmless in themselves (out of context). They
are in raid1_end_read_request() and the (mirrored) request in question
has just failed to read on its target device.
The second part of the patch is lower down in the same function and
will not be activated in the situation that the previous hunk was
activated. The hunk says:
#ifdef DO_ADD_READ_WRITE_CORRECT
if (uptodate && test_bit(R1BIO_ReadRewrite, &r1_bio->state)) {
/* Success at last - rewrite failed reads */
set_bit(R1BIO_IsSync, &r1_bio->state);
reschedule_retry(r1_bio);
} else
#endif /* DO_ADD_READ_WRITE_CORRECT */
(the "uptodate" guarantees that we're not in the same invocation as
before if we get here, because the previous hunk was active only in the
!uptodate case). I believe therefore that a previous failed
read retried on another target and we are now in the retry, which has
succeeded. I therefore believe that we should be testing the same bit
as we set before. The R1BIO_ReadRetry bit is not tested anywhere else -
I believe it should be tested now. That will tell us that this is a
retry and we want to try and start a write based on our successful
read, hoping that the write will mend whatever has gone wrong on the
disk.
So yes, the patch in the second hunk should read
#ifdef DO_ADD_READ_WRITE_CORRECT
if (uptodate && test_bit(R1BIO_ReadRetry, &r1_bio->state)) {
/* Success at last - rewrite failed reads */
set_bit(R1BIO_IsSync, &r1_bio->state);
reschedule_retry(r1_bio);
} else
#endif /* DO_ADD_READ_WRITE_CORRECT */
I clearly wasn't concentrating on the name of the bit. I was more
worried about the mechanism that might trigger a rewrite attempt
(sufficiently worried not to test it myself!). The idea is that
we launch reschedule_retry() which should put the request on a queue to
be dealt with by the raid1d daemon thread.
This thread normally handles resyncs, but it'll do its read-then-write
trick given half a chance. By setting the IsSync bit we make it think
that the read has already been done (well, it has!), and that it's time
to send out a write based on it to all mirrors. I expect that the
completed read will have signalled all the interested kernel buffers
that the data is available. I am unsure if I need to increment a
reference counter on those buffers to stop them vanishing while we are
doing a write from them in raid1d.
That appears to be the whole patch!
Peter
* Re: RAID1 robust read and read/write correct patch
From: J. David Beutel @ 2005-02-23 20:48 UTC (permalink / raw)
To: Peter T. Breuer; +Cc: linux-raid
Peter T. Breuer wrote, on 2005-Feb-23 1:50 AM:
> Quite possibly - I never tested the rewrite part of the patch, just
> wrote it to indicate how it should go and stuck it in to encourage
> others to go on from there. It's disabled by default. You almost
> certainly don't want to enable it unless you are a developer (or a
> guinea pig :).
Thanks for taking a look at it! Unfortunately, I'm not a kernel
developer. I haven't even been using C for the last 8 years. But I'd
really like to have that rewrite functionality, and I can dedicate my
system as a guinea pig for at least a little while, if there's a way I
can test it in a finite amount of time and build some confidence in it
before I start to really use that system.
I'd like to start with an md unit test suite. Is there one? I don't
know if the architecture would allow for this, but naively I'm thinking
that the test suite would use a mock disk driver (e.g., in memory only)
to simulate various kinds of hardware failures and confirm that md
responds as expected to both the layer above (the kernel?) and below
(the disk driver?). Unit tests are also good for simulating unlikely
and hard to reproduce race conditions, although stress tests are better
at discovering new ones. But, should the test suite play the role of
the kernel by calling md functions directly in a user space sandbox
(mock kernel, threads, etc)? Or, should it play the role of a user
process by calling the real kernel to test the real md (broadening the
scope of the test)? I'd appreciate opinions or advice from kernel or md
developers.
Also, does anyone have advice on how I should do system and stress tests
on this?
Cheers,
11011011
* Re: RAID1 robust read and read/write correct patch
From: Peter T. Breuer @ 2005-02-23 21:15 UTC (permalink / raw)
To: linux-raid
J. David Beutel <jdb@getsu.com> wrote:
> Peter T. Breuer wrote, on 2005-Feb-23 1:50 AM:
>
> > Quite possibly - I never tested the rewrite part of the patch, just
> > wrote it to indicate how it should go and stuck it in to encourage
> > others to go on from there. It's disabled by default. You almost
> > certainly don't want to enable it unless you are a developer (or a
> > guinea pig :).
> Thanks for taking a look at it! Unfortunately, I'm not a kernel
> developer. I haven't even been using C for the last 8 years. But I'd
> really like to have that rewrite functionality, and I can dedicate my
> system as a guinea pig for at least a little while, if there's a way I
> can test it in a finite amount of time and build some confidence in it
> before I start to really use that system.
I'd say that there's about a 50/50 chance that it will work as it is
without crashing the kernel. But it's impossible to say until somebody
tries it unless more people offer their kibbitzing thought-experiments
first!
I can run the 2.4 UML kernel safely for tests, but it's not as good as
running a real kernel because you don't get an OOPS when things go bad.
You just don't have to suffer the psych pain of rebooting! So you can
do more debugging, although the debugging is not as good. And you still
have to set up the test again each time.
I'm particularly unclear in the present patch if end_io is run on the
original read request after it's been retried and used as the first
half of a read-write resync pair. I simply can't see from the code, and
running is the only way of finding out. There are also possible race
conditions against the resync thread proper under some conditions, but
that won't be a problem in testing.
> I'd like to start with an md unit test suite. Is there one? I don't
!! I always simply do
dd if=/dev/zero of=/tmp/core0 bs=4k count=1k
dd if=/dev/zero of=/tmp/core1 bs=4k count=1k
losetup /dev/loop0 /tmp/core0
losetup /dev/loop1 /tmp/core1
mdadm -C -l 1 -n 2 -x 0 --force /dev/md0 /dev/loop[01]
or something very like that.
> know if the architecture would allow for this, but naively I'm thinking
> that the test suite would use a mock disk driver (e.g., in memory only)
> to simulate various kinds of hardware failures and confirm that md
Uh, one can do that via the devicemapper (dmsetup?) but I've never
bothered - it's much simpler to add a line or two to the code along the
lines of "if (bio->b_sector == 1024) return -1;" in order to simulate an
error.
One could add ioctls to make that configurable, but by then we're in dm
territory.
> responds as expected to both the layer above (the kernel?) and below
> (the disk driver?). Unit tests are also good for simulating unlikely
> and hard to reproduce race conditions, although stress tests are better
Well, if you could make a dm-based testrig, yes please!
> at discovering new ones. But, should the test suite play the role of
> the kernel by calling md functions directly in a user space sandbox
Never mind that for now. The actual user space reads or writes
can be in a makefile. The difficulty is engineering the "devices"
to have the intended failures.
> (mock kernel, threads, etc)? Or, should it play the role of a user
> process by calling the real kernel to test the real md (broadening the
No - nothing like that. The testsuite will be run under a kernel. It's
not your business to know if it's a real kernel or a uml kernel or some
other kind of sandbox.
> scope of the test)? I'd appreciate opinions or advice from kernel or md
> developers.
>
> Also, does anyone have advice on how I should do system and stress tests
> on this?
Well, setting up is the major problem. After that running the tests is
just standard scripting.
Peter