Re: Queuing of dm-raid1 resyncs to the same underlying block devices

From: Heinz Mauelshagen <heinzm@redhat.com>
To: Neil Brown <neilb@suse.de>,
	Brassow Jonathan <jbrassow@redhat.com>,
	device-mapper development <dm-devel@redhat.com>
Subject: Re: Queuing of dm-raid1 resyncs to the same underlying block devices
Date: Thu, 8 Oct 2015 13:50:02 +0200
Message-ID: <5616586A.4000200@redhat.com>
In-Reply-To: <87fv1m8ied.fsf@notabene.neil.brown.name>



On 10/07/2015 11:42 PM, Neil Brown wrote:
> Heinz Mauelshagen <heinzm@redhat.com> writes:
>
>> On 10/01/2015 12:20 AM, Neil Brown wrote:
>>> Heinz Mauelshagen <heinzm@redhat.com> writes:
>>>> BTW:
>>>> When you create raid1/4/5/6/10 LVs _and_ never read what you have not
>>>> written, "--nosync" can be used to avoid the initial resynchronization
>>>> load on the devices. Any data written in that case will update all
>>>> mirrors/raid redundancy data.
>>>>
>>> While this is true for RAID1 and RAID10, and (I think) for the current
>>> implementation of RAID6, it is definitely not true for RAID4/5.
>> Thanks for the clarification.
>>
>> I find that to be a really bad situation.
>>
>>
>>> For RAID4/5 a single-block write will be handled by reading
>>> old-data/parity, subtracting the old data from the parity and adding the
>>> new data, then writing out new data/parity.
>> Obviously for optimization reasons.
>>
>>> So if the parity was wrong before, it will be wrong afterwards.
>> So even overwriting complete stripes in raid4/5/(6)
>> would not ensure correct parity, thus always requiring
>> initial sync.
> No, over-writing complete stripes will result in correct parity.
> Even writing more than half of the data in a stripe will result in
> correct parity.
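
The difference between the two write paths discussed above can be sketched
in a few lines of Python. This is a toy model only: one stripe with three
data chunks and XOR parity, with hypothetical helper names (rmw_write,
reconstruct_write, rebuild) that are not taken from md or dm-raid. It
assumes the array was created without an initial sync, so the stored
parity is arbitrary.

```python
from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def parity_of(chunks):
    """Correct RAID4/5 parity: XOR of all data chunks in the stripe."""
    return reduce(xor, chunks)

def rmw_write(data, parity, idx, new):
    """Read-modify-write: subtract the old data from the parity, add the new."""
    new_parity = xor(xor(parity, data[idx]), new)
    return data[:idx] + [new] + data[idx + 1:], new_parity

def reconstruct_write(data, updates):
    """Reconstruct-write: recompute parity from every data chunk in the stripe."""
    data = [updates.get(i, chunk) for i, chunk in enumerate(data)]
    return data, parity_of(data)

def rebuild(data, parity, failed):
    """Recover the chunk of a failed device from the parity and the survivors."""
    survivors = [chunk for i, chunk in enumerate(data) if i != failed]
    return reduce(xor, survivors, parity)

# Assumption: no initial sync was done, so the on-disk parity is garbage.
data = [b"\x11" * 4, b"\x22" * 4, b"\x33" * 4]
stale_parity = b"\xff" * 4
new = b"\xaa" * 4

# Single-chunk write through the read-modify-write path.
d1, p1 = rmw_write(data, stale_parity, 0, new)
rmw_parity_ok = p1 == parity_of(d1)
rmw_rebuild_ok = rebuild(d1, p1, 0) == new

# Write covering more than half of the stripe through reconstruct-write.
d2, p2 = reconstruct_write(data, {0: new, 1: b"\xbb" * 4})
rcw_parity_ok = p2 == parity_of(d2)
rcw_rebuild_ok = rebuild(d2, p2, 0) == new

print(rmw_parity_ok, rmw_rebuild_ok, rcw_parity_ok, rcw_rebuild_ok)
```

In this model the read-modify-write path carries the original parity error
forward, so rebuilding the freshly written chunk after a device failure
does not return the new data, while the reconstruct-write path leaves the
stripe consistent regardless of what the parity held before.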


Useless, as you say, because we can never be sure that
any filesystem/dbms/... upstack will guarantee >= half-stripe
writes initially; even more so with many devices and large chunk sizes...

>
> So if you have a filesystem which only ever writes full stripes, then
> there is no need to sync at the start.  But I don't know any filesystems
> which promise that.
>
> If you don't sync at creation time, then you may be perfectly safe when
> a device fails, but I can't promise that.  And without guarantees, RAID
> is fairly pointless.

Indeed.

>
>> We should think about a solution to avoid it in light
>> of growing disk/array sizes.
> With spinning-rust devices you need to read the entire array ("scrub")
> every few weeks just to make sure the media isn't degrading.  When you
> do that it is useful to check that the parity is still correct - as a
> potential warning sign of problems.
> If you don't sync first, then checking the parity doesn't tell you
> anything.
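
Why a parity check is only informative after a sync can be illustrated
with a small simulation. This is a sketch under assumptions: the stripe
layout, the chunk size, and the scrub() and sync() helpers are made up for
the example and do not mirror the md implementation.

```python
import random
from functools import reduce

CHUNK = 8  # bytes per chunk, tiny for illustration

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def scrub(stripes):
    """Count stripes whose stored parity differs from the XOR of their data."""
    return sum(1 for s in stripes if reduce(xor, s["data"]) != s["parity"])

def sync(stripes):
    """Initial sync: make every stripe's parity consistent with its data."""
    for s in stripes:
        s["parity"] = reduce(xor, s["data"])

rng = random.Random(0)

def rand_chunk() -> bytes:
    return bytes(rng.randrange(256) for _ in range(CHUNK))

# Freshly created array without sync: data and parity are whatever was on disk.
stripes = [
    {"data": [rand_chunk() for _ in range(3)], "parity": rand_chunk()}
    for _ in range(8)
]

mismatches_unsynced = scrub(stripes)  # noise, says nothing about the media

sync(stripes)
mismatches_synced = scrub(stripes)    # clean baseline

# Simulate media degradation: flip one bit in one data chunk.
chunk = stripes[5]["data"][1]
stripes[5]["data"][1] = bytes([chunk[0] ^ 0x01]) + chunk[1:]
mismatches_after_damage = scrub(stripes)

print(mismatches_unsynced, mismatches_synced, mismatches_after_damage)
```

Only against the synced baseline does a nonzero mismatch count point at a
real problem; on the unsynced array the count is dominated by stripes that
were never consistent in the first place.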

Yes, aware of this.

My point was to avoid superfluous mass I/O whenever possible.

E.g. keep track of the 'new' state of the array and initialize
parity/syndrome on first access to any given stripe, with the
read-modify-write optimization applying thereafter.

The metadata needed to keep track of this could be organized in a
b-tree (e.g. via dm-persistent-data), initially storing just one node
that defines the whole array as 'new' and splitting the tree up
as we go, with a size threshold so that such metadata is not
allowed to grow too big.
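
A rough sketch of that bookkeeping, in Python for brevity. Everything here
is hypothetical: the class name NewStripeMap, its methods, and the use of
a sorted extent list in place of a real b-tree are illustration only, and
nothing below reflects the dm-persistent-data API. In particular, the way
the size threshold is enforced (initializing the shorter side of an extent
eagerly instead of splitting it) is an assumption, just one of several
possible policies.

```python
import bisect

class NewStripeMap:
    """Tracks which stripes are still 'new', i.e. have uninitialized parity.

    Stand-in for on-disk metadata: a sorted list of disjoint, half-open
    (start, end) extents. It starts as one extent covering the whole array.
    """

    def __init__(self, n_stripes: int, max_extents: int = 4):
        self.extents = [(0, n_stripes)]
        self.max_extents = max_extents  # assumed cap on metadata size

    def _find(self, stripe: int):
        # Index of the last extent whose start is <= stripe, if it contains it.
        i = bisect.bisect_right(self.extents, (stripe, float("inf"))) - 1
        if i >= 0 and self.extents[i][0] <= stripe < self.extents[i][1]:
            return i
        return None

    def is_new(self, stripe: int) -> bool:
        return self._find(stripe) is not None

    def all_initialized(self) -> bool:
        return not self.extents

    def first_access(self, stripe: int) -> range:
        """Return the stripes whose parity must be initialized now."""
        i = self._find(stripe)
        if i is None:
            return range(0)  # already initialized, nothing to do
        start, end = self.extents[i]
        left, right = (start, stripe), (stripe + 1, end)
        pieces = [e for e in (left, right) if e[0] < e[1]]
        todo = range(stripe, stripe + 1)
        if len(self.extents) - 1 + len(pieces) > self.max_extents:
            # Splitting would exceed the cap: initialize the shorter side
            # as well, so the extent shrinks instead of splitting.
            if stripe - start <= end - (stripe + 1):
                todo, pieces = range(start, stripe + 1), [right]
            else:
                todo, pieces = range(stripe, end), [left]
        self.extents[i:i + 1] = pieces
        return todo

m = NewStripeMap(n_stripes=100, max_extents=2)
first = list(m.first_access(50))   # splits the single extent in two
again = list(m.first_access(50))   # stripe 50 is no longer 'new'
capped = m.first_access(10)        # cap reached: stripes 0..10 get initialized
print(first, again, capped, m.extents)
```

A caller would initialize parity for every stripe in the returned range
(for instance with a full reconstruct-write) before using the faster write
path on it. Once all_initialized() returns True, the array is fully synced
and the metadata can be dropped.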

Heinz

> And as you have to process the entire array occasionally anyway, you
> may as well do it at creation time.
>
> NeilBrown
>
>
>>
>> Heinz
>>
>>
>>> If the device that new data was written to then fails, the data on it is
>>> lost.
>>>
>>> So do this for RAID1/10 if you like, but not for other levels.
>>>
>>> NeilBrown

