linux-raid.vger.kernel.org archive mirror
From: NeilBrown <neilb@suse.de>
To: Alireza Haghdoost <alireza@cs.umn.edu>
Cc: Markus Stockhausen <stockhausen@collogia.de>,
	Roman Mamedov <rm@romanrm.net>,
	"linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>
Subject: Re: RAID6 write I/O amplification?
Date: Thu, 26 Feb 2015 11:55:31 +1100
Message-ID: <20150226115531.0df57e08@notabene.brown>
In-Reply-To: <CAB-428mPydqGoku-RnhZUDVyHVnzb73Yz=5bL7DOd+G88siDtg@mail.gmail.com>


On Wed, 25 Feb 2015 18:40:46 -0600 Alireza Haghdoost <alireza@cs.umn.edu>
wrote:

> On Tue, Feb 24, 2015 at 12:29 AM, Markus Stockhausen
> <stockhausen@collogia.de> wrote:
> >> From: linux-raid-owner@vger.kernel.org [linux-raid-owner@vger.kernel.org] on behalf of Roman Mamedov [rm@romanrm.net]
> >> Sent: Tuesday, 24 February 2015 00:58
> >> To: linux-raid@vger.kernel.org
> >> Subject: RAID6 write I/O amplification?
> >>
> >> Hello,
> >>
> >> Got a bit of a "how does it actually work" question...
> >>
> >> Suppose I have an MD RAID6 of 8 drives, with 64KB chunk size.
> >>
> >> I am rewriting a 4KB filesystem sector somewhere on that RAID (not crossing
> >> the stripe boundary).
> >>
> >> What's the amount of disk I/O in total this will result in?
> >>
> >> I assume the RAID will need to read data from all drives, recompute parity,
> >> then write to the data stripe where the updated piece happened to be, and also
> >> write to two parity stripes.
> >>
> >> Is this done at a stripe granularity, so 6x64KB reads, 3x64KB writes?
> >> Or down to individual sectors (pages), i.e. 6x4KB reads, 3x4KB writes?
> >> Or am I describing this algorithm correctly at all?
> >
> > The implementation works at an "internal" stripe granularity, which is 4KB.
> > So your case will be 6x4KB reads + 3x4KB writes.
> 
> Having said that, does it mean that the following description of "chunk
> size" is wrong:
> '[chunk size] is the smallest "atomic" mass of data that can be
> written to the devices'
> since in this case the chunk size is 64KB but 4KB is written atomically (?).
> I found it on the kernel.org wiki page [1]

I think that when it says "atomic" it means in space, not time.

i.e. one (properly aligned) chunk of data will not be split up and
written to different devices, it will all be written to one device.
If you write more than a chunk, it will be split up and parts of it written
to different devices.
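
As a rough illustration of that splitting, here is a minimal Python sketch.
It is not the md code. The function name split_into_chunks is hypothetical,
the defaults (64KB chunks, 6 data chunks per stripe for an 8-drive RAID6)
come from the example in this thread, and parity rotation is deliberately
ignored, so "data_slot" is only a position among the data chunks of a
stripe, not a physical drive number.

```python
def split_into_chunks(offset, length, chunk_size=64 * 1024, data_disks=6):
    """Split a logical byte range into per-chunk pieces.

    Returns a list of (data_slot, stripe, offset_in_chunk, piece_length).
    Assumption: parity rotation is ignored, so data_slot is the position
    of the chunk among the data chunks of its stripe, not a real drive.
    """
    pieces = []
    end = offset + length
    while offset < end:
        chunk_index = offset // chunk_size      # which chunk of the array
        within = offset % chunk_size            # where inside that chunk
        # A piece never crosses a chunk boundary.
        piece = min(chunk_size - within, end - offset)
        pieces.append((chunk_index % data_disks,
                       chunk_index // data_disks,
                       within,
                       piece))
        offset += piece
    return pieces


if __name__ == "__main__":
    # One aligned chunk stays in a single slot.
    print(split_into_chunks(0, 64 * 1024))
    # A write larger than a chunk spills into the next slot.
    print(split_into_chunks(0, 64 * 1024 + 4096))
```

An aligned 64KB write yields a single piece, while a 68KB write starting at
offset 0 yields two pieces in two different slots.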

You can still write less than a chunk.
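
To put numbers on a sub-chunk write, the sketch below just does the
arithmetic for the two cases asked about earlier in the thread. The block
counts (6 reads, 3 writes) are taken as given from the quoted messages and
are not derived from the md source; the helper name write_cost is
hypothetical.

```python
def write_cost(block_size, blocks_read, blocks_written, logical_write):
    """Return (bytes_read, bytes_written, amplification).

    amplification is total device I/O divided by the logical write size.
    The block counts are inputs, not something this sketch computes.
    """
    bytes_read = block_size * blocks_read
    bytes_written = block_size * blocks_written
    amplification = (bytes_read + bytes_written) / logical_write
    return bytes_read, bytes_written, amplification


if __name__ == "__main__":
    KB = 1024
    # 4KB internal stripe granularity, as stated in the thread.
    print(write_cost(4 * KB, 6, 3, 4 * KB))
    # Hypothetical whole-chunk granularity, for comparison only.
    print(write_cost(64 * KB, 6, 3, 4 * KB))
```

With the thread's figures, the 4KB case comes to 24KB read plus 12KB
written (9 times the logical write), whereas operating on whole 64KB chunks
would have meant 384KB read plus 192KB written (144 times).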

So the intent is correct I think, but the word "atomic" doesn't really convey
the right meaning.  Probably it should be re-written to avoid that term and
just spell out what is happening.

NeilBrown



Thread overview: 4+ messages
2015-02-23 23:58 RAID6 write I/O amplification? Roman Mamedov
2015-02-24  6:29 ` AW: " Markus Stockhausen
2015-02-26  0:40   ` Alireza Haghdoost
2015-02-26  0:55     ` NeilBrown [this message]
