Re: RAID creation resync behaviors

From: David Brown <david.brown@hesbynett.no>
To: NeilBrown <neilb@suse.com>, Shaohua Li <shli@kernel.org>
Cc: linux-raid@vger.kernel.org, jes.sorensen@gmail.com, neilb@suse.de
Subject: Re: RAID creation resync behaviors
Date: Fri, 05 May 2017 08:46:51 +0200
Message-ID: <590C1FDB.5090904@hesbynett.no>
In-Reply-To: <877f1wp9o3.fsf@notabene.neil.brown.name>

On 04/05/17 23:57, NeilBrown wrote:
> On Thu, May 04 2017, David Brown wrote:
> 
>>
>> I have another couple of questions that might be relevant, but I am
>> really not sure about the correct answers.
>>
>> First, if you have a stripe that you know is unused - it has not been
>> written to since the array was created - could the raid layer safely
>> return all zeros if an attempt was made to read the stripe?
> 
> "know is unused" and "it has not been written to since the array was
> created" are not necessarily the same thing.
> 
> If I have some devices which used to have a RAID5 array but for which
> the metadata got destroyed, I might carefully "create" a RAID5 over the
> devices and then have access to my data.  This has been done more than
> once - it is not just theoretical.

That is true, of course - anything like this would have to be optional
(command line switches in mdadm, for example).

There is also the opposite situation - when you /have/ had something
written to the array, but now you know it is unused (due to a trim).
Knowing the stripe is unused might make a later partial write a little
faster (the new parity can be computed without first reading the old
blocks), and it would certainly speed up a scrub or other consistency
check, since unused stripes can be skipped.
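
To make the idea concrete, here is a rough sketch of per-stripe "in use"
tracking.  It is only an illustration: the class and method names are
made up, the dict stands in for the member disks, and none of this is
how md actually stores its state.

```python
class StripeMap:
    """Hypothetical per-stripe 'in use' tracking; not md's real data structure."""

    def __init__(self, n_stripes, stripe_size):
        self.stripe_size = stripe_size
        self.used = [False] * n_stripes
        self.data = {}  # stripe index -> bytes, stands in for the member disks

    def write(self, idx, payload):
        assert len(payload) == self.stripe_size
        self.used[idx] = True
        self.data[idx] = bytes(payload)

    def trim(self, idx):
        # After a trim the stripe is known to be unused again.
        self.used[idx] = False
        self.data.pop(idx, None)

    def read(self, idx):
        # Unused stripes read back as zeros without touching the disks.
        if not self.used[idx]:
            return bytes(self.stripe_size)
        return self.data[idx]

    def stripes_to_scrub(self):
        # A scrub only needs to visit stripes that hold data.
        return [i for i, in_use in enumerate(self.used) if in_use]
```

With something like this, a scrub walks only `stripes_to_scrub()`, and a
read of any other stripe is answered with zeros straight away.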

> 
> But if you really "know" it is unused, then returning zeros should be fine.
> 
>>
>> Second, when syncing an unused stripe (such as during creation), rather
>> than reading the old data and copying it or generating parities, could
>> we simply write all zeros to all the blocks in the stripes?  For many
>> SSDs, this is very efficient.
> 
> If you were happy to destroy whatever was there before (see above
> recovery example for when you wouldn't), then it might be possible to
> make this work.

As above, this would have to be option-controlled.  (I have had occasion
to pull disks from one dead server to recover them on another machine -
it's nerve-racking enough at the best of times, without fearing that you
will zero out your remaining good disks!)
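
The reason zero-filling gives a consistent stripe is simply that the
XOR of zeros is zero.  A small sketch for the RAID5 case, assuming plain
byte-wise XOR parity (the block size and function name are arbitrary
choices for the example):

```python
from functools import reduce


def xor_parity(blocks):
    """RAID5-style parity: byte-wise XOR of equal-length data blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


BLOCK = 16  # hypothetical block size in bytes
zero = bytes(BLOCK)

# A stripe of all-zero data blocks has all-zero parity, so writing zeros
# to every member leaves the stripe consistent with no reads at all.
assert xor_parity([zero, zero, zero]) == zero

# A later write to one block of a known-zero stripe: the new parity is
# just the new data, with no need to read the old data or old parity.
new = bytes(range(BLOCK))
assert xor_parity([new, zero, zero]) == new
```

The RAID6 Q syndrome is also linear in the data, so all-zero data gives
an all-zero Q as well.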

> You would need to be careful not to write zeros over a region that the
> filesystem has already used.

Yes, but that should not be a difficult problem - the array is created
before the filesystem.

> That means you either disable all writes until the initialization
> completes (waste of time), or you add complexity to track which strips
> have been written and which haven't, and only initialise strips that have
> not been written.  This complexity would only be used once in the entire
> life of the RAID.  That might not be best use of resources.
> 

I am not sure I see how this would be a problem.  But it is something
that would need to be considered carefully when looking at details of
implementing these ideas (if anyone thinks they would be worth
implementing).
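
For reference, the bookkeeping Neil describes might look roughly like
the sketch below.  It is a simplification with invented names: it
assumes user writes are full-stripe writes (a partial write to an
uninitialised stripe would need the rest of the stripe zeroed first),
and the "zeroed" list is only a stand-in for actually writing zeros to
the disks.

```python
import threading


class LazyZeroInit:
    """Hypothetical tracker: zero only stripes not yet written by the user."""

    def __init__(self, n_stripes):
        self.lock = threading.Lock()
        self.done = [False] * n_stripes  # initialised or already written
        self.zeroed = []                 # stripes the init pass zeroed

    def user_write(self, idx):
        # A full-stripe write makes the stripe consistent by itself,
        # so the init pass must leave it alone afterwards.
        with self.lock:
            self.done[idx] = True

    def init_pass(self):
        for idx in range(len(self.done)):
            with self.lock:
                if self.done[idx]:
                    continue
                self.zeroed.append(idx)  # stand-in for writing zeros to disk
                self.done[idx] = True
```

Once `init_pass()` has finished, the tracking state is never needed
again, which is Neil's point about it being used once in the life of
the array.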

mvh.,

David

