From: torn5 <torn5@shiftmail.org>
To: Theodore Tso <tytso@MIT.EDU>
Cc: linux-ext4@vger.kernel.org
Subject: Re: What to put for unknown stripe-width?
Date: Tue, 20 Sep 2011 17:29:34 +0200
Message-ID: <4E78B15E.9060702@shiftmail.org>
In-Reply-To: <9D3B900A-8FCF-41B1-852A-FADD953FBDBD@mit.edu>
On 09/20/11 14:47, Theodore Tso wrote:
> But that's OK, because I don't know of any RAID array that supports
> this kind of radical surgery in parameters in the first case. :-)
Ted, thanks for your reply,
Linux MD RAID supports this; it's called reshape. Most parameter
changes are supported; in particular, adding a new disk and
restriping a raid5 are supported *live*. It's not very stable though...
But apart from the MD live reshape/restripe, what I am more likely to do
is move such a filesystem *live* across the various RAIDs I have,
using LVM's "pvmove". Almost all of these RAIDs have a 1MB stride, but
different numbers of elements, hence different stripe-widths.
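To make the stride vs. stripe-width relationship concrete, here is a minimal sketch of how the two ext4 hints are usually derived from the RAID geometry. It assumes 4k filesystem blocks and single-parity arrays; the disk counts are hypothetical examples, not the actual arrays discussed here.

```python
def ext4_raid_hints(chunk_bytes, total_disks, parity_disks=1, block_bytes=4096):
    """Return (stride, stripe_width) in filesystem blocks.

    stride       = chunk size expressed in filesystem blocks
    stripe_width = stride * number of data-bearing disks
    Assumption: block_bytes=4096 and parity_disks=1 (RAID5-style).
    """
    if chunk_bytes % block_bytes:
        raise ValueError("chunk size must be a multiple of the block size")
    data_disks = total_disks - parity_disks
    if data_disks < 1:
        raise ValueError("need at least one data disk")
    stride = chunk_bytes // block_bytes
    return stride, stride * data_disks


if __name__ == "__main__":
    # Same 1MB chunk everywhere, different (hypothetical) disk counts:
    # the stride stays constant while the stripe-width changes.
    for disks in (4, 6, 8):
        stride, width = ext4_raid_hints(1024 * 1024, disks)
        print(f"{disks} disks: stride={stride} stripe_width={width}")
```

This is why a pvmove between arrays with the same chunk size keeps the stride valid but can invalidate the stripe-width.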
> The other thing to consider is small writes. If you are doing small writes, a large stripe size is a disaster, because a 32k random write by a program like MySQL will turn into a 3MB read + 3MB write request.
No, this is not correct, for MD at least.
MD uses strips to compute parity, which are always 4k wide for each
device. In your example there would be a 32k read from each of two
devices, followed by a 32k write to each of two devices. I am testing
this now with iostat, using a 4k dd write, to confirm what I'm saying: I
see various spurious reads and writes (probably due to MD and LVM
accounting, dirty flags, etc.) which sum to about 108k read and 18k
written (aggregated across all drives) for a single 4k write to the MD
device. That's definitely not as large as even a single chunk, which is
1MB.
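The arithmetic behind the small-write claim can be sketched as a read-modify-write parity update. This is an illustration under assumptions, not MD's actual code: it assumes a single-parity (RAID5-style) array and the read-modify-write path; MD can also choose to reconstruct parity from the other data strips when that needs fewer reads. The function name and the three-strip layout are hypothetical.

```python
import os


def rmw_small_write(old_data, old_parity, new_data):
    """Read-modify-write update of one data strip and its parity strip.

    new_parity = old_parity XOR old_data XOR new_data
    Returns (new_parity, bytes_read, bytes_written).
    """
    if not (len(old_data) == len(old_parity) == len(new_data)):
        raise ValueError("all strips must have the same length")
    new_parity = bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))
    bytes_read = len(old_data) + len(old_parity)      # two devices are read
    bytes_written = len(new_data) + len(new_parity)   # two devices are written
    return new_parity, bytes_read, bytes_written


def xor_all(strips):
    """Full-stripe parity: XOR of all data strips, byte by byte."""
    out = bytearray(len(strips[0]))
    for strip in strips:
        for i, b in enumerate(strip):
            out[i] ^= b
    return bytes(out)


if __name__ == "__main__":
    size = 32 * 1024  # the 32k write from the example
    data = [os.urandom(size) for _ in range(3)]  # hypothetical 3 data strips
    parity = xor_all(data)

    new_strip = os.urandom(size)
    new_parity, rd, wr = rmw_small_write(data[1], parity, new_strip)
    data[1] = new_strip

    # The incremental update matches a full recomputation of parity.
    assert new_parity == xor_all(data)
    print(f"read {rd // 1024}k, wrote {wr // 1024}k")
```

The I/O volume depends only on the size of the write, not on the chunk size.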
What the chunk size does is regulate how much data is laid out before
the placement of parity changes (i.e. your ascii-art picture was
correct). A large chunk size like the one I use means that reads smaller
than 1MB hopefully come from one spindle only. This is useful for us.
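As an illustration of how the chunk size decides which spindle serves a read, here is a small sketch that maps a logical byte offset to a data disk and a parity disk. The rotation is modeled on MD's left-symmetric RAID5 layout as I understand it; the 4-disk array is hypothetical, and a real array may use a different layout, so treat the exact disk numbers as an assumption.

```python
def locate(offset, chunk_bytes, total_disks):
    """Map a logical byte offset to (stripe, data_disk, parity_disk).

    Assumes single parity with a left-symmetric style rotation:
    parity moves one disk to the left on each successive stripe, and
    data chunks start on the disk right after the parity disk.
    """
    data_disks = total_disks - 1
    chunk_no = offset // chunk_bytes      # logical chunk index
    stripe = chunk_no // data_disks       # which stripe (row) it lives in
    idx = chunk_no % data_disks           # position among the row's data chunks
    parity_disk = data_disks - (stripe % total_disks)
    data_disk = (parity_disk + 1 + idx) % total_disks
    return stripe, data_disk, parity_disk


def disks_touched(offset, length, chunk_bytes, total_disks):
    """Set of data disks a read of `length` bytes at `offset` has to visit."""
    first = offset // chunk_bytes
    last = (offset + length - 1) // chunk_bytes
    return {locate(c * chunk_bytes, chunk_bytes, total_disks)[1]
            for c in range(first, last + 1)}


if __name__ == "__main__":
    MB = 1024 * 1024
    # An aligned 512k read stays inside one 1MB chunk: one spindle.
    print(disks_touched(0, 512 * 1024, MB, 4))
    # The same read straddling a chunk boundary needs two spindles.
    print(disks_touched(768 * 1024, 512 * 1024, MB, 4))
```

A read smaller than the chunk size touches a single spindle unless it crosses a chunk boundary, which is why "hopefully" applies.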
So, regarding my original problem: is the way ext4 uses the stride size
that every new file begins at the start of a stripe?
When growing an existing file, what do you do? Do you continue writing
from where it ended, without holes, or do you leave a hole, select a new
location at the start of a new stripe and continue from there?
And what happens to multiple very small files written out together by
pdflush? Are they packed together on the same stripe without holes, or
does each one go to a different stripe?
Is changing the stripe-width with tune2fs supported on a live, mounted
fs? (I mean maybe with a mount -o remount, but without an umount)
Thanks for your help,
T.