From: Heinz Mauelshagen <heinzm@redhat.com>
To: Neil Brown <neilb@suse.de>,
Brassow Jonathan <jbrassow@redhat.com>,
device-mapper development <dm-devel@redhat.com>
Subject: Re: Queuing of dm-raid1 resyncs to the same underlying block devices
Date: Thu, 8 Oct 2015 13:50:02 +0200
Message-ID: <5616586A.4000200@redhat.com>
In-Reply-To: <87fv1m8ied.fsf@notabene.neil.brown.name>
On 10/07/2015 11:42 PM, Neil Brown wrote:
> Heinz Mauelshagen <heinzm@redhat.com> writes:
>
>> On 10/01/2015 12:20 AM, Neil Brown wrote:
>>> Heinz Mauelshagen <heinzm@redhat.com> writes:
>>>> BTW:
>>>> When you create raid1/4/5/6/10 LVs _and_ never read what you have not
>>>> written,
>>>> "--nosync" can be used anyway in order to avoid the initial
>>>> resynchronization load
>>>> on the devices. Any data written in that case will update all
>>>> mirrors/raid redundancy data.
>>>>
>>> While this is true for RAID1 and RAID10, and (I think) for the current
>>> implementation of RAID6, it is definitely not true for RAID4/5.
>> Thanks for the clarification.
>>
>> I find that to be a really bad situation.
>>
>>
>>> For RAID4/5 a single-block write will be handled by reading
>>> old-data/parity, subtracting the old data from the parity and adding the
>>> new data, then writing out new data/parity.
>> Obviously for optimization reasons.
>>
>>> So if the parity was wrong before, it will be wrong afterwards.
>> So even overwriting complete stripes in raid4/5/(6)
>> would not ensure correct parity, thus always requiring
>> initial sync.
> No, over-writing complete stripes will result in correct parity.
> Even writing more than half of the data in a stripe will result in
> correct parity.
Useless, as you say, because we can never be sure that
any filesystem/dbms/... upstack will guarantee >= half-stripe
writes initially; even more so with many devices and large chunk sizes.
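To make the parity argument above concrete, here is a minimal sketch in Python. It is an illustration only, not the md/dm-raid code: small integers stand in for chunks, only single XOR parity (RAID4/5) is modelled, and the function names (parity, rmw_write, reconstruct_write, rebuild_chunk) are hypothetical. It compares a read-modify-write of one chunk with a write that covers more than half of the data chunks, both starting from a stripe whose parity was never synced.

```python
from functools import reduce
from operator import xor


def parity(chunks):
    """XOR parity over the data chunks of one stripe (ints stand in for blocks)."""
    return reduce(xor, chunks, 0)


def rmw_write(data, p, idx, new):
    """Read-modify-write: XOR the old data out of the parity and the new data in."""
    new_p = p ^ data[idx] ^ new
    new_data = data[:idx] + [new] + data[idx + 1:]
    return new_data, new_p


def reconstruct_write(data, updates):
    """Apply updates {index: value}, then recompute parity from all data chunks."""
    new_data = [updates.get(i, d) for i, d in enumerate(data)]
    return new_data, parity(new_data)


def rebuild_chunk(data, p, lost):
    """Recover the chunk at index `lost` from the surviving chunks plus parity."""
    survivors = [d for i, d in enumerate(data) if i != lost]
    return parity(survivors) ^ p


# A stripe that was never synced: the parity on disk is unrelated to the data.
data, stale_p = [0x11, 0x22, 0x33], 0x5A

# Single-chunk write via read-modify-write: the parity stays wrong.
rmw_data, rmw_p = rmw_write(data, stale_p, 1, 0x77)
print(parity(rmw_data) == rmw_p)                  # scrub-style parity check
print(rebuild_chunk(rmw_data, rmw_p, 1) == 0x77)  # rebuild of the written chunk

# Writing two of three chunks (more than half): parity is recomputed, so it is correct.
rcw_data, rcw_p = reconstruct_write(data, {0: 0x44, 1: 0x77})
print(parity(rcw_data) == rcw_p)
print(rebuild_chunk(rcw_data, rcw_p, 1) == 0x77)
```

With these values the script prints False, False, True, True: after the read-modify-write the stale parity error is carried forward and the freshly written chunk cannot be rebuilt, while the wider write leaves the stripe consistent.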
>
> So if you have a filesystem which only ever writes full stripes, then
> there is no need to sync at the start. But I don't know any filesystems
> which promise that.
>
> If you don't sync at creation time, then you may be perfectly safe when
> a device fails, but I can't promise that. And without guarantees, RAID
> is fairly pointless.
Indeed.
>
>> We should think about a solution to avoid it in light
>> of growing disk/array sizes.
> With spinning-rust devices you need to read the entire array ("scrub")
> every few weeks just to make sure the media isn't degrading. When you
> do that it is useful to check that the parity is still correct - as a
> potential warning sign of problems.
> If you don't sync first, then checking the parity doesn't tell you
> anything.
Yes, aware of this.
My point was to avoid superfluous mass I/O whenever possible.
E.g. keep track of the 'new' state of the array and initialize
parity/syndrome on first access to any given stripe, using
the read-modify-write optimization described above thereafter.
The metadata kept for this housekeeping could be organized in a b-tree
(e.g. via dm-persistent-data), initially storing just one node
that defines the whole array as 'new', splitting the tree up
as we go, with a size threshold that keeps the metadata
from growing too big.
Heinz
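The bookkeeping proposed above could be sketched roughly as follows. This is a hypothetical illustration in Python, not dm-raid or dm-persistent-data code: a sorted list of half-open stripe ranges stands in for the b-tree, the class name NewRangeMap and the max_ranges parameter are invented for the example, and the policy used when the threshold is hit (hand the smallest remaining 'new' range back to the caller for immediate sync) is an assumption; the mail only says that a size threshold should exist.

```python
import bisect


class NewRangeMap:
    """Tracks half-open stripe ranges [start, end) that are still 'new'."""

    def __init__(self, nr_stripes, max_ranges=4):
        self.ranges = [(0, nr_stripes)]  # one entry marks the whole array 'new'
        self.max_ranges = max_ranges     # assumed bound on the metadata size

    def _find(self, stripe):
        """Index of the range containing `stripe`, or -1 if it is initialized."""
        i = bisect.bisect_right(self.ranges, (stripe, float("inf"))) - 1
        if i >= 0 and self.ranges[i][0] <= stripe < self.ranges[i][1]:
            return i
        return -1

    def is_new(self, stripe):
        return self._find(stripe) >= 0

    def mark_initialized(self, stripe):
        """Drop `stripe` from the 'new' set after its parity has been written.

        Returns the ranges the caller must sync right away so that the
        number of tracked ranges stays within max_ranges.
        """
        i = self._find(stripe)
        if i < 0:
            return []
        start, end = self.ranges.pop(i)
        pieces = [(s, e) for s, e in ((start, stripe), (stripe + 1, end)) if s < e]
        self.ranges[i:i] = pieces  # split the range around the stripe
        forced = []
        while len(self.ranges) > self.max_ranges:
            smallest = min(self.ranges, key=lambda r: r[1] - r[0])
            self.ranges.remove(smallest)
            forced.append(smallest)
        return forced
```

A write path would call is_new() for the target stripe, do a full parity computation if it returns True, then call mark_initialized() and sync any ranges that come back. For example, with NewRangeMap(100, max_ranges=2), initializing stripe 50 and then stripe 10 returns [(0, 10)] from the second call, because a third range would exceed the threshold.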
> And as you have to process the entire array occasionally anyway, you
> may as well do it at creation time.
>
> NeilBrown
>
>
>>
>> Heinz
>>
>>
>>> If the device that new data was written to then fails, the data on it is
>>> lost.
>>>
>>> So do this for RAID1/10 if you like, but not for other levels.
>>>
>>> NeilBrown