From: Wols Lists <antlists@youngman.org.uk>
To: Shaohua Li <shli@kernel.org>, David Brown <david.brown@hesbynett.no>
Cc: linux-raid@vger.kernel.org, jes.sorensen@gmail.com, neilb@suse.de
Subject: Re: RAID creation resync behaviors
Date: Thu, 4 May 2017 16:50:38 +0100
Message-ID: <590B4DCE.3070801@youngman.org.uk>
In-Reply-To: <20170504015454.d4obiuume6e3yrdv@kernel.org>
On 04/05/17 02:54, Shaohua Li wrote:
> On Wed, May 03, 2017 at 11:06:01PM +0200, David Brown wrote:
>> On 03/05/17 22:27, Shaohua Li wrote:
>>> Hi,
>>>
>>> Currently we have different resync behaviors in array creation.
>>>
>>> - raid1: copy data from disk 0 to disk 1 (overwrite)
>>> - raid10: read both disks, compare and write if there is difference (compare-write)
>>> - raid4/5: read first n-1 disks, calculate parity and then write parity to the last disk (overwrite)
>>> - raid6: read all disks, calculate parity and compare, and write if there is difference (compare-write)
>>>
>>> Writing the whole disk is very unfriendly to SSDs, because it reduces their
>>> lifetime. And if the user has already trimmed the disks before creation, the
>>> unnecessary writes could make the SSD slower in the future. Could we prefer
>>> compare-write to overwrite if mdadm detects that the disks are SSDs? Admittedly,
>>> compare-write is sometimes slower than overwrite, so maybe add a new option to
>>> mdadm. An option to let mdadm trim the SSDs before creation sounds reasonable too.
>>>
>>
>> When doing the first sync, md tracks how far its sync has got, keeping a
>> record in the metadata in case it has to be restarted (such as due to a
>> reboot while syncing). Why not simply /not/ sync stripes until you first
>> write to them? It may be that a counter of synced stripes is not enough,
>> and you need a bitmap (like the write intent bitmap), but it would reduce
>> the creation sync time to 0 and avoid any writes at all.
>
> For raid4/5/6, this means we must always do a full-stripe write for any normal
> write that hits a range that has not been synced. This would harm the
> performance of normal writes. For raid1/10, this sounds more appealing. But each
> bit in the bitmap stands for a range, so if normal IO writes only part of a
> range, we have two choices. Sync the range immediately and clear the bit; this
> sync will impact normal IO. Or don't do the sync immediately; but since the bit
> is still set (meaning the range isn't synced), read IO can only access the
> first disk, which is harmful too.
>
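The following minimal Python sketch makes the raid4/5 overwrite vs compare-write difference concrete. It is a toy model, not md code. Disks are lists of fixed-size blocks. Parity is the byte-wise XOR of the data blocks and lives on the last disk (raid4-style, with no rotation). The block size and all function names are hypothetical. The model also assumes that trimmed SSDs read back as zeros, which not every device guarantees. Under that assumption, compare-write issues no writes at all, while overwrite rewrites every parity block.

```python
from functools import reduce

BLOCK_SIZE = 4  # bytes per block; hypothetical toy value


def xor_parity(blocks):
    """Byte-wise XOR of equal-length data blocks (raid4/5 parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def resync_overwrite(disks):
    """Read the n-1 data disks, compute parity, always write it out."""
    writes = 0
    for stripe in range(len(disks[0])):
        disks[-1][stripe] = xor_parity([d[stripe] for d in disks[:-1]])
        writes += 1
    return writes


def resync_compare_write(disks):
    """Read all n disks; write parity only where the stored copy differs."""
    writes = 0
    for stripe in range(len(disks[0])):
        parity = xor_parity([d[stripe] for d in disks[:-1]])
        if disks[-1][stripe] != parity:
            disks[-1][stripe] = parity
            writes += 1
    return writes


def blank_disks(n_disks, n_stripes):
    """Disks that read back as zeros everywhere (assumed behaviour after a trim)."""
    return [[bytes(BLOCK_SIZE) for _ in range(n_stripes)] for _ in range(n_disks)]


if __name__ == "__main__":
    print(resync_overwrite(blank_disks(4, 8)))      # 8: every parity block rewritten
    print(resync_compare_write(blank_disks(4, 8)))  # 0: zero parity already correct
    dirty = blank_disks(4, 8)
    dirty[0][3] = b"\x01\x02\x03\x04"              # one stripe holds stale data
    print(resync_compare_write(dirty))              # 1: only that stripe is written
```

The sketch also shows the cost Shaohua mentions. Compare-write must read the parity disk as well (n reads per stripe instead of n-1), so it can be slower, but it writes nothing where the array is already consistent.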
We're creating the array, right? So the user is sitting in front of
mdadm looking at its output, right?
So we just print a message saying "the disks aren't sync'd. If you don't
want a performance hit in normal use, fire up a sync now and take the
hit up front".
The question isn't "how do we avoid a performance hit?", it's "we're
going to take a hit, do we take it up-front on creation or defer it
until we're using the array?".
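The "defer it" option, with Shaohua's two choices for a partially written range, can be sketched for a two-disk raid1. This is a hypothetical Python model, not md's write-intent bitmap code. The class name, `range_size`, and the `sync_on_partial` switch are all made up for illustration. Each flag records whether a range is still unsynced. A write that covers a whole range syncs it for free. A partial write either pays for copying the rest of the range immediately (`sync_on_partial=True`) or leaves the range unsynced, in which case reads there can only be served by disk 0.

```python
class UnsyncedRanges:
    """Toy raid1 (2 disks) map with one flag per range of sectors.

    True means the range has not been synced yet, so only disk 0 holds
    trustworthy data there. Hypothetical model for illustration only.
    """

    def __init__(self, n_sectors, range_size, sync_on_partial):
        self.n_sectors = n_sectors
        self.range_size = range_size
        self.sync_on_partial = sync_on_partial
        n_ranges = (n_sectors + range_size - 1) // range_size
        self.unsynced = [True] * n_ranges  # nothing is synced at creation
        self.copied_sectors = 0            # resync I/O charged to the write path

    def write(self, start, length):
        """A normal write of `length` sectors at `start`, sent to both disks."""
        end = start + length
        first = start // self.range_size
        last = (end - 1) // self.range_size
        for r in range(first, last + 1):
            if not self.unsynced[r]:
                continue
            r_start = r * self.range_size
            r_end = min(r_start + self.range_size, self.n_sectors)
            overlap = min(end, r_end) - max(start, r_start)
            if overlap == r_end - r_start:
                # The write covers the whole range: both disks now match.
                self.unsynced[r] = False
            elif self.sync_on_partial:
                # Choice 1: count the untouched part of the range as copied
                # from disk 0 to disk 1 now (extra I/O), then clear the flag.
                self.copied_sectors += (r_end - r_start) - overlap
                self.unsynced[r] = False
            # Choice 2 (neither branch taken): leave the flag set, defer the sync.

    def read_disks(self, sector):
        """Disks that a read of `sector` may be served from."""
        if self.unsynced[sector // self.range_size]:
            return [0]  # unsynced: only the first disk is trusted
        return [0, 1]   # synced: reads can be balanced across both disks


if __name__ == "__main__":
    deferred = UnsyncedRanges(1024, 128, sync_on_partial=False)
    deferred.write(0, 128)             # full range: synced for free
    deferred.write(200, 8)             # partial range: stays unsynced
    print(deferred.read_disks(10))     # [0, 1]
    print(deferred.read_disks(200))    # [0]

    eager = UnsyncedRanges(1024, 128, sync_on_partial=True)
    eager.write(200, 8)
    print(eager.copied_sectors)        # 120
    print(eager.read_disks(200))       # [0, 1]
```

Either way the hit is taken somewhere. It appears as copy I/O in the write path, or as reads that cannot be balanced across the mirrors until a later sync clears the flag.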
Cheers,
Wol