From: Stan Hoeppner <stan@hardwarefreak.com>
To: joystick <joystick@shiftmail.org>, Phillip Susi <psusi@ubuntu.com>
Cc: linux-raid <linux-raid@vger.kernel.org>
Subject: Re: The chunk size paradox
Date: Thu, 02 Jan 2014 09:41:27 -0600
Message-ID: <52C588A7.6010207@hardwarefreak.com>
In-Reply-To: <52C57C7B.80400@shiftmail.org>
On 1/2/2014 8:49 AM, joystick wrote:
> For a 4k write in raid5, two 4k sectors are read, then
> two 4k sectors are written, and this is completely independent from
> chunk size.
First, there is no such thing as a 4K sector in Linux: the block layer
counts in 512-byte sectors. Filesystem blocks and memory pages are 4KB.
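A quick illustration of the unit arithmetic, as a minimal Python sketch. The constant and function names are hypothetical, and the 4096-byte block size is assumed as the common default rather than a universal one.

```python
SECTOR_SIZE = 512     # Linux block-layer sector unit, in bytes
FS_BLOCK_SIZE = 4096  # common filesystem block / x86 page size (assumption)

def sectors_per_block(block_size: int = FS_BLOCK_SIZE,
                      sector_size: int = SECTOR_SIZE) -> int:
    """Return how many 512-byte sectors make up one filesystem block."""
    if block_size % sector_size:
        raise ValueError("block size must be a multiple of the sector size")
    return block_size // sector_size

print(sectors_per_block())  # 8
```

So a 4KB filesystem block spans eight 512-byte sectors.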
I'm no expert WRT raid5.c/raid6.c, but I'm pretty sure it doesn't work
as you state. As I understand it, it works like this:
Redundancy is maintained at the chunk level, not the filesystem block
or page level. When a single filesystem block is modified, md will
read the data chunk of the stripe in which the eight sectors of the 4KB
block reside, write back the chunk incorporating the changes to those
eight sectors, read the parity chunk, recalculate the parity chunk based
on the new data chunk, and then write back the parity chunk.
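The steps above can be sketched as a toy Python model with XOR parity on a three-device stripe (two data chunks plus one parity chunk). This is an illustration of the chunk-level RMW as described in this message, under that assumption, not code taken from raid5.c; the names xor_bytes and rmw_chunk are hypothetical. It relies on the XOR identity P_new = P_old ^ D_old ^ D_new, so the other data chunk never has to be read.

```python
import os

CHUNK = 512 * 1024  # 512KB chunk, the default discussed in this message
BLOCK = 4096        # one 4KB filesystem block

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length buffers."""
    if len(a) != len(b):
        raise ValueError("buffers must be the same length")
    n = int.from_bytes(a, "little") ^ int.from_bytes(b, "little")
    return n.to_bytes(len(a), "little")

def rmw_chunk(data_chunk: bytes, parity_chunk: bytes, offset: int, new_block: bytes):
    """Toy model of a chunk-granular read-modify-write.

    Returns (new_data_chunk, new_parity_chunk, bytes_read, bytes_written).
    """
    if offset + len(new_block) > len(data_chunk):
        raise ValueError("block does not fit inside the chunk")
    # Steps 1 and 3: read the whole data chunk and the whole parity chunk.
    bytes_read = len(data_chunk) + len(parity_chunk)
    # Step 2: splice the modified 4KB block into the data chunk.
    new_data = data_chunk[:offset] + new_block + data_chunk[offset + len(new_block):]
    # Step 4: recompute parity from old parity, old data and new data.
    new_parity = xor_bytes(xor_bytes(parity_chunk, data_chunk), new_data)
    # Steps 2 and 5: write back the whole data chunk and the whole parity chunk.
    bytes_written = len(new_data) + len(new_parity)
    return new_data, new_parity, bytes_read, bytes_written

# Usage: two random data chunks and their XOR parity, then modify one block.
d0, d1 = os.urandom(CHUNK), os.urandom(CHUNK)
p = xor_bytes(d0, d1)
new_d1, new_p, r, w = rmw_chunk(d1, p, offset=8 * BLOCK, new_block=os.urandom(BLOCK))
assert new_p == xor_bytes(d0, new_d1)  # parity is still consistent
print(r // 1024, w // 1024)  # 1024 1024  (KB read, KB written)
```

In this model the cost is set by the chunk size, not the 4KB write: every call moves two whole chunks each way.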
That chunk-level read-modify-write is precisely why many folks,
including myself, consider the current 512KB chunk default to be way
too high. Modifying a single 4KB filesystem block requires reading 1MB
from disk (the 512KB data chunk plus the 512KB parity chunk) and writing
1MB, a total of 2MB of IO just to modify a single 4KB block. And AFAIK
this is the best-case scenario. IIRC, according to past posts by Neil,
the current RAID5/6 code may read more than just two chunks during RMW,
depending on certain factors. With RAID6 you have at least one extra
chunk write (the second parity chunk), and possibly an extra chunk read
as well, so your IO is at least 2.5MB for a single 4KB write with RAID6.
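The arithmetic in the previous paragraph, as a hedged Python sketch under the same chunk-granular model. The helper rmw_io_bytes and its extra_q_read flag are hypothetical; they simply count whole-chunk reads and writes, and ignore the extra reads the current code may perform in some cases.

```python
KB = 1024
MB = 1024 * KB

def rmw_io_bytes(chunk_size: int, level: int = 5, extra_q_read: bool = False) -> int:
    """Total bytes read plus written to modify one 4KB block, assuming
    whole chunks are read and written (best case for that model)."""
    if level not in (5, 6):
        raise ValueError("only RAID5 and RAID6 are modelled")
    # RAID5: read data + parity chunks, write data + parity chunks.
    reads, writes = 2, 2
    if level == 6:
        writes += 1        # the second parity chunk must always be written
        if extra_q_read:
            reads += 1     # and possibly read first as well
    return (reads + writes) * chunk_size

print(rmw_io_bytes(512 * KB) / MB)                               # 2.0
print(rmw_io_bytes(512 * KB, level=6) / MB)                      # 2.5
print(rmw_io_bytes(512 * KB, level=6, extra_q_read=True) / MB)   # 3.0
print(rmw_io_bytes(64 * KB) / KB)                                # 256.0
```

Under the same model a 64KB chunk gives 256KB of IO per 4KB RAID5 write instead of 2MB, which is the arithmetic behind the argument for a smaller chunk.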
--
Stan