Q: Is this how 'check' works (on raid10 in particular)?

linux-raid.vger.kernel.org archive mirror

* Q: Is this how 'check' works (on raid10 in particular)?
From: Jon Nelson @ 2008-08-03 12:32 UTC
  To: LinuxRaid

After digging through the code (admittedly, way too late at night), I
think I have a basic understanding of how the resync code works, and
why it appears to be suboptimal (speed-wise) for raid10.

It would appear that, upon receipt of a 'check' (other resync methods
sometimes take different paths), md.c basically says, "start at the
first sector, or the first sector after the checkpoint, and proceed
logically through to the end (unless told to stop)", and md.c schedules
this check with the relevant sync_request method. For raid10, this
finds the first device holding a copy of that logical sector and then
compares the data there to the data in all of the other copies on the
other disks. For raid10 in f2 format (and to a lesser extent with the
offset format) this is going to result in a great deal of thrashing.
I'm guessing this is why a 'check' operation on raid10,f2 takes 2x as
long as on raid5 (same disks). One way to improve the efficiency here
would be to perform a loop like this:

for device in devices:
  for each chunk on device that is not a mirror copy:
    read chunk
    compare chunk to its mirror chunks on other devices

If I'm not wrong, this should result in near-streaming speeds from each
device with a minimum of seeking. However, making this change looks
more invasive than just changing raid10.c. One way might be to abstract
the sync code a bit more, so that md.c could ask each RAID personality
to provide a function that does the driving (the 4 lines above) while
md.c keeps doing all of the common error checking, interrupt checking,
etc. Does this seem like crazy talk? If I can get some help I might
give it a stab.

-- 
Jon


* Re: Q: Is this how 'check' works (on raid10 in particular)?
From: Keld Jørn Simonsen @ 2008-08-03 12:54 UTC
  To: Jon Nelson; +Cc: LinuxRaid

On Sun, Aug 03, 2008 at 07:32:00AM -0500, Jon Nelson wrote:
> [...] For raid10 in f2 format (and to a lesser extent with the
> offset format) this is going to result in a great deal of thrashing.
> [...]

My idea is to do the checks in bigger blocks; that would minimize the
thrashing by minimizing the number of times the heads need to move.
And this would not need much change in the code. I have done a
patch to do this, but I have not yet tested it.

Best regards
keld


* Re: Q: Is this how 'check' works (on raid10 in particular)?
From: Keld Jørn Simonsen @ 2008-08-03 13:28 UTC
  To: Jon Nelson; +Cc: LinuxRaid

[-- Attachment #1: Type: text/plain, Size: 2243 bytes --]

On Sun, Aug 03, 2008 at 02:54:13PM +0200, Keld Jørn Simonsen wrote:
> My idea is to do the checks in bigger blocks; that would minimize the
> thrashing by minimizing the number of times the heads need to move.
> And this would not need much change in the code. I have done a
> patch to do this, but I have not yet tested it.

Maybe you could test the patch? It is enclosed.

Best regards
keld

[-- Attachment #2: raid10.resync.patch --]
[-- Type: text/plain, Size: 681 bytes --]

--- raid10.c~	2008-07-03 05:46:47.000000000 +0200
+++ raid10.c	2008-07-12 18:28:59.438235317 +0200
@@ -80,7 +80,7 @@
 //#define RESYNC_BLOCK_SIZE PAGE_SIZE
 #define RESYNC_SECTORS (RESYNC_BLOCK_SIZE >> 9)
 #define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE)
-#define RESYNC_WINDOW (2048*1024)
+#define RESYNC_WINDOW (2048*1024*16)
 
 /*
  * When performing a resync, we need to read and compare, so
@@ -686,7 +686,7 @@
  *    there is no normal IO happeing.  It must arrange to call
  *    lower_barrier when the particular background IO completes.
  */
-#define RESYNC_DEPTH 32
+#define RESYNC_DEPTH 32*16
 
 static void raise_barrier(conf_t *conf, int force)
 {


* Re: Q: Is this how 'check' works (on raid10 in particular)?
From: Jon Nelson @ 2008-08-05  1:36 UTC
  To: Keld Jørn Simonsen; +Cc: LinuxRaid

The patch Keld sent me makes a significant difference:

Before:
(Stock 2.6.25.11 x86-64 openSUSE 11.0 kernel raid10 module):   4.5h to 5.5h

After:
Same kernel, with the raid10.c changes applied and the module recompiled:  1h 56m.

::

Aug  4 18:37:09 turnip kernel: md: data-check of RAID array md0
Aug  4 20:33:24 turnip kernel: md: md0: data-check done.


> My idea is to do the checks in bigger blocks; that would minimize the
> thrashing by minimizing the number of times the heads need to move.
> And this would not need much change in the code. I have done a
> patch to do this, but I have not yet tested it.


-- 
Jon


* Re: Q: Is this how 'check' works (on raid10 in particular)?
From: Keld Jørn Simonsen @ 2008-08-05 10:17 UTC
  To: Jon Nelson; +Cc: LinuxRaid

On Mon, Aug 04, 2008 at 08:36:33PM -0500, Jon Nelson wrote:
> The patch Keld sent me makes a significant difference:
> 
> Before:
> (Stock 2.6.25.11 x86-64 openSUSE 11.0 kernel raid10 module):   4.5h to 5.5h
> 
> After:
> Same kernel, minor changes to raid10.c and compiled:  1h 56m.

Thanks for testing it. The result looks good and is what I expected.
I think the patch is clean, and I have sent it to the list and to Neil
for inclusion in the tree.

Best regards
keld


