linux-btrfs.vger.kernel.org archive mirror
From: Stan Hoeppner <stan@hardwarefreak.com>
To: russell@coker.com.au
Cc: Linux RAID Mailing List <linux-raid@vger.kernel.org>,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: Triple parity and beyond
Date: Sun, 24 Nov 2013 15:44:35 -0600
Message-ID: <52927343.9000203@hardwarefreak.com>
In-Reply-To: <201311241619.08668.russell@coker.com.au>

On 11/23/2013 11:19 PM, Russell Coker wrote:
> On Sun, 24 Nov 2013, Stan Hoeppner <stan@hardwarefreak.com> wrote:
>> I have always surmised that the culprit is rotational latency, because
>> we're not able to get a real sector-by-sector streaming read from each
>> drive.  If even only one disk in the array has to wait for the platter
>> to come round again, the entire stripe read is slowed down by an
>> additional few milliseconds.  For example, in an 8 drive array let's say
>> each stripe read is slowed 5ms by only one of the 7 drives due to
>> rotational latency, maybe acoustical management, or some other firmware
>> hiccup in the drive.  This slows down the entire stripe read because we
>> can't do parity reconstruction until all chunks are in.  An 8x 2TB array
>> with 512KB chunk has 4 million stripes of 4MB each.  Reading 4M stripes,
>> that extra 5ms per stripe read costs us
>>
>> (4,000,000 * 0.005)/3600 = 5.56 hours
> 
> If that is the problem then the solution would be to just enable read-ahead.  
> Don't we already have that in both the OS and the disk hardware?  The hard-
> drive read-ahead buffer should at least cover the case where a seek completes 
> but the desired sector isn't under the heads.

I'm not sure read-ahead would solve such a problem, if indeed this is a
possible problem.  AFAIK the RAID5/6 drivers process stripes serially,
not asynchronously, so I'd think the rebuild may still stall for
milliseconds at a time in such a situation.
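
As a rough check of the arithmetic in the quoted estimate, here is a
minimal Python sketch.  The function names are made up for illustration.
It assumes 2TB means 2 * 10^12 bytes, 512KB means 512 * 1024 bytes, and
that every stripe read stalls by the same 5ms with nothing overlapping,
which is the premise of the estimate rather than a measured behaviour.

```python
def stripes_per_drive(drive_bytes: int, chunk_bytes: int) -> int:
    """Number of chunk-sized rows on one drive, i.e. stripes in the array."""
    return drive_bytes // chunk_bytes


def extra_rebuild_hours(stripes: int, delay_s: float) -> float:
    """Added wall-clock hours if every stripe read stalls by delay_s seconds."""
    return stripes * delay_s / 3600


# Exact stripe count under the assumed units: a bit under 4 million.
exact = stripes_per_drive(2 * 10**12, 512 * 1024)
print(exact, round(extra_rebuild_hours(exact, 0.005), 2))

# The rounded 4 million stripes used in the quoted estimate.
print(round(extra_rebuild_hours(4_000_000, 0.005), 2))
```

With the exact stripe count the added time comes to roughly 5.3 hours,
and with the rounded 4 million stripes it is the 5.56 hours quoted.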

> RAM size is steadily increasing, it seems that the smallest that you can get 
> nowadays is 1G in a phone and for a server the smallest is probably 4G.
> 
> On the smallest system that might have an 8 disk array you should be able to 
> use 512M for buffers which allows a read-ahead of 128 chunks.
> 
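
For reference, the buffer figure above can be checked with a short
Python sketch.  The helper name is hypothetical, and it assumes the same
8 drive, 512KB chunk geometry as the earlier estimate, with 512M taken
as 512 * 1024 * 1024 bytes.

```python
KIB = 1024
MIB = 1024 * KIB


def readahead_capacity(buffer_bytes: int, chunk_bytes: int, drives: int):
    """Return (chunks, full stripes) that fit in a read-ahead buffer."""
    stripe_bytes = chunk_bytes * drives  # 8 x 512KB = 4MB per stripe
    return buffer_bytes // chunk_bytes, buffer_bytes // stripe_bytes


chunks, stripes = readahead_capacity(512 * MIB, 512 * KIB, 8)
print(chunks, stripes)
```

Under these assumptions 512M holds 1024 individual chunks or 128 full
stripes, so the 128 figure appears to count 4MB stripes rather than
512KB chunks.
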
>> Now consider that arrays typically have a few years on them before the
>> first drive failure.  During our rebuild it's likely that some drives
>> will take a few rotations to return a sector that's marginal.
> 
> Are you suggesting that it would be a common case that people just write data 
> to an array and never read it or do an array scrub?  I hope that it will 
> become standard practice to have a cron job scrubbing all filesystems.

Given how regularly we see RAID5 double drive failure "save me!" help
requests here, it seems pretty clear this is exactly what many users
do.

>> So this
>> might slow down a stripe read by dozens of milliseconds, maybe a full
>> second.  If this happens to multiple drives many times throughout the
>> rebuild it will add even more elapsed time, possibly additional hours.
> 
> Have you observed such 1 second reads in practice?

We seem to have regular reports from DIY hardware users intentionally
using mismatched consumer drives, as many believe this gives them
additional protection against a firmware bug in a given drive model.
But then they often see multi-second timeouts causing drives to be
kicked, or performance to be slow, because of the mismatched drives.

In my time on this list, it seems pretty clear that the vast majority of
posters use DIY hardware, not matched, packaged, tested solutions from
the likes of Dell, HP, IBM, etc.  Some of the things I've speculated
about in my last few posts could very well occur with, and indeed be
caused by, ad hoc component selection and system assembly.  Obviously
not in all DIY cases, but probably many.

-- 
Stan


> One thing I've considered doing is placing a cheap disk on a speaker cone to 
> test vibration induced performance problems.  Then I can use a PC to control 
> the level of vibration in a reasonably repeatable manner.  I'd like to see 
> what the limits are for retries.
> 
> Some years ago a company I worked for had some vibration problems which 
> dropped the contiguous read speed from about 100MB/s to about 40MB/s on some 
> parts of the disk (other parts gave full performance).  That was a serious and 
> unusual problem and it only about halved the overall speed.
