linux-raid.vger.kernel.org archive mirror
From: NeilBrown <neilb@suse.de>
To: patrik@dsl.sk
Cc: David Brown <david.brown@hesbynett.no>, linux-raid@vger.kernel.org
Subject: Re: Hot-replace for RAID5
Date: Wed, 16 May 2012 08:47:30 +1000
Message-ID: <20120516084730.0b30fe31@notabene.brown>
In-Reply-To: <CAAOsTSn1+jtViRE-9f7YpMduZ9avBfWMC72vHF_NFC0kjF-hRg@mail.gmail.com>


On Tue, 15 May 2012 21:39:10 +0200 Patrik Horník <patrik@dsl.sk> wrote:

> BTW, thank you very much for the fix for layout=preserve. As soon as
> the current reshape finishes, I am going to move on to the other arrays.
> 
> Are the regressions in 2.3.4 serious, and if so, to which version should
> I apply the patch? Or, from your look at the code, should
> layout=left-symmetric-6 work in 2.3.2?

The regression isn't dangerous, just inconvenient (--add often doesn't work).
--layout=left-symmetric-6 will work on 2.3.2, provided the current layout
of the array is "left-symmetric", which I think is the default, but you should
check.

NeilBrown
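
One way to do the suggested check is to read the "Layout" line that `mdadm --detail` prints for a RAID5 array. The sketch below only parses text that has already been captured; the sample text is illustrative and not taken from this thread, and `parse_layout` is a hypothetical helper name.

```python
import re

def parse_layout(detail_text):
    """Return the value of the 'Layout :' line from `mdadm --detail` output,
    or None if no such line is present."""
    match = re.search(r"^\s*Layout\s*:\s*(\S+)\s*$", detail_text, re.MULTILINE)
    return match.group(1) if match else None

# Illustrative sample only (assumed values, not output from the array discussed).
SAMPLE = """\
     Raid Level : raid5
         Layout : left-symmetric
     Chunk Size : 64K
"""

layout = parse_layout(SAMPLE)
if layout == "left-symmetric":
    print("current layout is left-symmetric; left-symmetric-6 should apply")
else:
    print("current layout is", layout, "- check before reshaping")
```

In practice the text would come from running `mdadm --detail` on the array in question.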

> 
> Regarding reshaping speed, an estimate for doing things much more
> sequentially gives much higher speeds. Let's say a 48 MB backup and 6 drives
> with 80 MB/s sequential speed. If you do the reshape like this:
> - Read 8 MB sequentially from each drive in parallel, 0.1 s
> - Then write it to the backup, 48/80 = 0.6 s
> - Calculate Q for something like 48 MB (guessing 0.05 s) and write
> it back to the different drives in parallel in 0.1 s. Because it is in the
> cache and you are only writing in this phase (?), there is no back
> and forth seeking and rotational latency applies only a couple of times
> altogether, let's say 0.02 s.
> - Update the superblock and move the head back, two worst-case seeks, 0.03 s (I
> don't know how often you update superblocks?)
> 
> you process 8 MB per drive in about 0.9 s, so the speed in this scenario
> should be about 9 MB/s.
> 
> I guess the main real difference when you are logically doing it in
> stripes may be that, when you wait for the chunk writes to complete
> (are you waiting for real completion of the writes?), the gap
> between the first and last drive is often long enough that you need to
> wait one or more rotations before writing another stripe. If that is the
> case, you need to add about 128 * let's say 1.5 * 0.005 s = 0.96 s, and so
> we are down to about 4.3 MB/s theoretically.
> 
> Patrik
> 
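
The estimate above can be re-run as a short calculation. This is a sketch using only the figures quoted in the message (6 drives, 80 MB/s, 8 MB per drive, the guessed latencies); the function names are hypothetical. Note that 128 * 1.5 * 0.005 evaluates to 0.96 s, which is the value consistent with the roughly 4.3 MB/s result.

```python
DRIVES = 6            # drives in the example
SEQ_MB_S = 80.0       # sequential speed per drive, MB/s
PER_DRIVE_MB = 8.0    # data read from each drive per cycle

def cycle_time(extra_s=0.0):
    """Seconds for one read / backup / reshape / metadata cycle."""
    read = PER_DRIVE_MB / SEQ_MB_S               # parallel read, 0.1 s
    backup = DRIVES * PER_DRIVE_MB / SEQ_MB_S    # 48 MB to the backup, 0.6 s
    calc_q = 0.05                                # guessed Q calculation time
    write_back = PER_DRIVE_MB / SEQ_MB_S         # parallel write, 0.1 s
    latency = 0.02                               # guessed rotational latency
    metadata = 0.03                              # two worst-case seeks
    return read + backup + calc_q + write_back + latency + metadata + extra_s

def per_drive_speed(extra_s=0.0):
    """Per-drive reshape speed in MB/s for the modelled cycle."""
    return PER_DRIVE_MB / cycle_time(extra_s)

# Extra wait if each of 128 stripes costs about 1.5 rotations of 5 ms.
rotation_penalty = 128 * 1.5 * 0.005

print(round(cycle_time(), 2), "s per cycle")
print(round(per_drive_speed(), 1), "MB/s without rotation waits")
print(round(per_drive_speed(rotation_penalty), 1), "MB/s with rotation waits")
```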
> On Tue, May 15, 2012 at 2:13 PM, NeilBrown <neilb@suse.de> wrote:
> > On Tue, 15 May 2012 13:56:58 +0200 Patrik Horník <patrik@dsl.sk> wrote:
> >
> >> Anyway, increasing it to 5K did not help, and the drives don't seem to be
> >> fully utilized.
> >>
> >> Does the reshape work something like this:
> >> - Read about X = (50M / N - 1 / stripe size) stripes from drives and
> >> write them to the backup-file
> >> - Reshape the X stripes one after another, sequentially
> >> - Reshape each stripe by reading chunks from all drives, calculating Q,
> >> writing all chunks back, and starting I/O for the next stripe only after
> >> finishing the previous one?
> >>
> >> So after increasing stripe_cache_size, the cache should hold the stripes
> >> after they are backed up, and so the reshape should not need to read them
> >> from the drives again?
> >>
> >> Can't the slow speed be caused by some synchronization issue? How are
> >> the stripes read for writing them to the backup-file? Is it done one by
> >> one, so that I/Os for the next stripe are issued only after the previous
> >> stripe has been read completely? Or are they issued in the most parallel
> >> way possible?
> >
> > There is as much parallelism as I could manage.
> > The backup file is divided into 2 sections.
> > Write to one, then the other, then invalidate the first and write to it, etc.
> > So while one half is being written, the data in the other half is being
> > reshaped in the array.
> > Also the stripe reads are scheduled asynchronously and as soon as a stripe is
> > fully available, the Q is calculated and they are scheduled for write.
> >
> > The slowness is due to continually having to seek back a little way to
> > overwrite what has just been read, and also having to update the metadata
> > each time to record where we are up to.
> >
> > NeilBrown
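
The two-section backup scheme described above can be pictured as a simple alternating schedule. The sketch below is only an illustration of that alternation, not mdadm's actual code; `backup_schedule` and the notion of discrete "steps" and "batches" are assumptions made for the example.

```python
def backup_schedule(num_batches):
    """Return, per time step, which batch is being backed up and which is
    being reshaped, each paired with the backup-file half (0 or 1) it uses."""
    schedule = []
    for step in range(num_batches + 1):
        # Batch `step` is copied into the backup file, if any batches remain.
        backing_up = (step, step % 2) if step < num_batches else None
        # The batch saved in the previous step can now be reshaped in the array.
        reshaping = (step - 1, (step - 1) % 2) if step >= 1 else None
        schedule.append((backing_up, reshaping))
    return schedule

for backing_up, reshaping in backup_schedule(3):
    print("backup (batch, half):", backing_up, "| reshape (batch, half):", reshaping)
```

At every step where both activities run, they use different halves of the backup file, so a half is only reused once the batch it protected has been reshaped.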
> >
> >
> >>
> >> Patrik
> >>
> >>
> >> On Tue, May 15, 2012 at 1:28 PM, NeilBrown <neilb@suse.de> wrote:
> >> > On Tue, 15 May 2012 13:16:42 +0200 Patrik Horník <patrik@dsl.sk> wrote:
> >> >
> >> >> Can I increase it during reshape by echo N >
> >> >> /sys/block/mdX/md/stripe_cache_size?
> >> >
> >> > Yes.
> >> >
> >> >
> >> >>
> >> >> How is the size determined? I have only 1027 despite having 8 GB of system memory...
> >> >
> >> > Not very well.
> >> >
> >> > It is set to 256, or the minimum size needed to allow the reshape to proceed
> >> > (which means about 4 chunks worth).  I should probably add some auto-sizing
> >> > but that sort of stuff is hard :-(
> >> >
> >> > NeilBrown
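
To put those stripe_cache_size values in terms of memory: the kernel's md documentation gives the cost as page size times number of disks times stripe_cache_size. The sketch below applies that formula assuming 4 KiB pages and the 6 drives used in the estimate elsewhere in this thread; both are assumptions about the system, and `md0` is a hypothetical device name.

```python
def stripe_cache_bytes(stripe_cache_size, nr_disks, page_size=4096):
    """Approximate memory used by the stripe cache, in bytes."""
    return stripe_cache_size * nr_disks * page_size

def stripe_cache_path(md_name):
    """sysfs file that holds the setting for a given md device."""
    return f"/sys/block/{md_name}/md/stripe_cache_size"

for size in (256, 1027):
    mib = stripe_cache_bytes(size, nr_disks=6) / (1024 * 1024)
    print(f"stripe_cache_size={size}: about {mib:.1f} MiB")

print("setting lives in", stripe_cache_path("md0"))
```

Under those assumptions, even the 1027 figure amounts to only a few tens of MiB on an 8 GB machine.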
> >> >
> >



