Re: 1 week to rebuid 4x 3TB raid10 is a long time!

From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: 1 week to rebuid 4x 3TB raid10 is a long time!
Date: Tue, 22 Jul 2014 02:51:13 +0000 (UTC)
Message-ID: <pan$6f72e$34416d71$688a2a2$980c8acd@cox.net>
In-Reply-To: CAN05THTSG9czwpM7AYEPWg5hpZuJ=vjJrjz8yn5V0rXe5oguxA@mail.gmail.com

ronnie sahlberg posted on Mon, 21 Jul 2014 09:46:07 -0700 as excerpted:

> On Sun, Jul 20, 2014 at 7:48 PM, Duncan <1i5t5.duncan@cox.net> wrote:
>> ashford posted on Sun, 20 Jul 2014 12:59:21 -0700 as excerpted:
>>
>>> If you assume a 12ms average seek time (normal for 7200RPM SATA
>>> drives), an 8.3ms rotational latency (half a rotation), an average
>>> 64kb write and a 100MB/S streaming write speed, each write comes in
>>> at ~21ms, which gives us ~47 IOPS.  With the 64KB write size, this
>>> comes out to ~3MB/S, DISK LIMITED.
>>
>>> The 5MB/S that TM is seeing is fine, considering the small files he
>>> says he has.
>>
> That is actually nonsense.
> Raid rebuild operates on the block/stripe layer and not on the
> filesystem layer.

If we were talking about a normal raid, yes.  But we're talking about
btrFS (note the FS, for filesystem), so this indeed *IS* the filesystem
layer.  Now this particular "filesystem" /does/ happen to have raid
properties as well, but it's definitely filesystem level...

> It does not matter at all what the average file size is.

... and the filesize /does/ matter.
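
To put numbers on why write size matters, here is a minimal sketch of
the per-write cost model ashford quoted above.  The function name and
the decimal units (64 KB = 64,000 bytes, 100 MB/s = 100,000,000 bytes
per second) are assumptions for illustration; the seek, rotational
latency and streaming figures are the ones quoted, taken as given.

```python
def random_write_throughput(write_bytes, seek_ms=12.0, rotational_ms=8.3,
                            stream_bytes_per_s=100e6):
    """Return (ms per write, IOPS, bytes/s) for seek-bound writes.

    The defaults are the figures quoted in the thread, taken as given.
    """
    # Time spent actually transferring the data at streaming speed.
    transfer_ms = write_bytes / stream_bytes_per_s * 1000.0
    # Every write pays the seek and rotational delay before transferring.
    per_write_ms = seek_ms + rotational_ms + transfer_ms
    iops = 1000.0 / per_write_ms
    return per_write_ms, iops, iops * write_bytes


if __name__ == "__main__":
    for size in (4_000, 64_000, 1_000_000):
        ms, iops, bps = random_write_throughput(size)
        print(f"{size:>9} B writes: {ms:6.2f} ms each, "
              f"{iops:5.1f} IOPS, {bps / 1e6:6.2f} MB/s")
```

With 64 KB writes this lands on roughly the 21 ms, 47 IOPS and 3 MB/s
quoted above.  Smaller writes pay the same seek cost for less data, so
throughput drops further.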

> Raid rebuild is really only limited by disk i/o speed when performing a
> linear read of the whole spindle using huge i/o sizes,
> or, if you have multiple spindles on the same bus, the bus saturation
> speed.

That makes sense if you're dealing with the raid level, as with dmraid
or mdraid.  Both are also much more mature and optimized, so 50 MiB/sec
per spindle, in parallel, would indeed be a reasonable expectation for
them.

But (barring bugs, which will and do happen at this stage of development)
btrfs both makes far better data validity guarantees and does far more
complex processing, what with COW, snapshotting, etc, in addition to
the normal filesystem-level stuff AND the raid-level stuff it does.

> Thus it is perfectly reasonable to expect ~50MByte/second, per spindle,
> when doing a raid rebuild.

... And perfectly reasonable, at least at this point, to expect ~5 MiB/sec
total throughput, one spindle at a time, for btrfs.
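
As a rough sanity check on what those two rates mean for the rebuild in
the subject line, the sketch below converts a sustained rate into
wall-clock time.  It assumes a single 3 TB (3e12 byte) device rewritten
in full at a constant rate, which is a simplification; the function
name is hypothetical.

```python
MIB = 1024 * 1024


def rebuild_hours(device_bytes, rate_mib_per_s):
    """Hours needed to write device_bytes at a constant rate in MiB/s."""
    return device_bytes / (rate_mib_per_s * MIB) / 3600.0


if __name__ == "__main__":
    device = 3e12  # assumption: one 3 TB drive, decimal terabytes
    for rate in (50, 5):
        hours = rebuild_hours(device, rate)
        print(f"{rate:>2} MiB/s: {hours:6.1f} hours ({hours / 24:.1f} days)")
```

At 5 MiB/sec that comes to a bit under a week, in line with the
thread's subject; at 50 MiB/sec it is well under a day.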

> That is for the naive rebuild that rebuilds every single stripe. A
> smarter rebuild that knows which stripes are unused can skip the unused
> stripes and thus become even faster than that.
> 
> 
> Now, that the rebuild is off by an order of magnitude is by design but
> should be fixed at some stage, but with the current state of btrfs it is
> probably better to focus on other more urgent areas first.

Because of all the extra work it does, btrfs may never get to full
streaming speed across all spindles at once.  But it can and certainly
will get much better than it is, once the focus moves to optimization.

*AND*, because btrfs /does/ know which areas of the device are actually
in use, it only has to rebuild those.  Raid-layer-only technologies do
/not/ know which areas are unused, so they must rebuild the entire
device.  With the typically 20-60% unused filesystems most people run,
it's quite likely that once btrfs is optimized, its rebuild times will
match or beat theirs despite the slower raw speed.
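
The break-even point in that argument can be sketched as below.  It
assumes the simplest possible model: a raid-layer rebuild copies the
whole device at one constant rate, and a filesystem-aware rebuild
copies only the used fraction at another constant rate.  The function
name and the 50 MiB/sec baseline (the figure quoted earlier) are
illustrative assumptions.

```python
def break_even_rate(raid_rate_mib_s, unused_fraction):
    """Rate a used-space-only rebuild needs to tie a whole-device rebuild.

    Whole-device time is size / raid_rate; used-only time is
    size * (1 - unused_fraction) / fs_rate.  Setting the two equal gives
    fs_rate = raid_rate * (1 - unused_fraction).
    """
    if not 0.0 <= unused_fraction < 1.0:
        raise ValueError("unused_fraction must be in [0, 1)")
    return raid_rate_mib_s * (1.0 - unused_fraction)


if __name__ == "__main__":
    for unused in (0.2, 0.4, 0.6):
        rate = break_even_rate(50.0, unused)
        print(f"{unused:.0%} unused: tie at {rate:.0f} MiB/s")
```

Under this model, a filesystem that is 20-60% unused needs somewhere
between 40 and 20 MiB/sec to tie a 50 MiB/sec whole-device rebuild, so
~5 MiB/sec would have to improve by roughly four to eight times.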

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

