From: Wols Lists <antlists@youngman.org.uk>
To: Shaohua Li <shli@kernel.org>, David Brown <david.brown@hesbynett.no>
Cc: linux-raid@vger.kernel.org, jes.sorensen@gmail.com, neilb@suse.de
Subject: Re: RAID creation resync behaviors
Date: Thu, 4 May 2017 16:50:38 +0100
Message-ID: <590B4DCE.3070801@youngman.org.uk>
In-Reply-To: <20170504015454.d4obiuume6e3yrdv@kernel.org>

On 04/05/17 02:54, Shaohua Li wrote:
> On Wed, May 03, 2017 at 11:06:01PM +0200, David Brown wrote:
>> On 03/05/17 22:27, Shaohua Li wrote:
>>> Hi,
>>>
>>> Currently we have different resync behaviors in array creation.
>>>
>>> - raid1: copy data from disk 0 to disk 1 (overwrite)
>>> - raid10: read both disks, compare and write if there is difference (compare-write)
>>> - raid4/5: read first n-1 disks, calculate parity and then write parity to the last disk (overwrite)
>>> - raid6: read all disks, calculate parity and compare, and write if there is difference (compare-write)
>>>
>>> Writing the whole disk is very unfriendly to SSDs, because it reduces their lifetime. And
>>> if the user has already done a trim before creation, the unnecessary writes could make
>>> the SSD slower in the future. Could we prefer compare-write to overwrite if mdadm
>>> detects that the disks are SSDs? Sometimes compare-write is surely slower than
>>> overwrite, so maybe add a new option to mdadm. An option to let mdadm trim the SSDs
>>> before creation sounds reasonable too.
>>>
>>
>> When doing the first sync, md tracks how far its sync has got, keeping a
>> record in the metadata in case it has to be restarted (such as due to a
>> reboot while syncing). Why not simply /not/ sync stripes until you first
>> write to them? It may be that a counter of synced stripes is not enough,
>> and you need a bitmap (like the write intent bitmap), but it would reduce
>> the creation sync time to 0 and avoid any writes at all.
>
> For raid 4/5/6, this means we must always do a full stripe write for any normal
> write that hits a range not yet synced. This would harm the performance of
> normal writes. For raid1/10, this sounds more appealing. But each bit in
> the bitmap stands for a range. If only part of the range is written by
> normal IO, we have two choices. Sync the range immediately and clear the bit;
> this sync will impact normal IO. Or don't sync immediately; but since the
> bit is still set (which means the range isn't synced), read IO can only access the
> first disk, which is harmful too.
>
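
For illustration only, the two raid1 choices described above can be modelled
in a few lines of Python. This is a toy sketch, not md code: the class
LazyMirror, its sync_on_partial_write flag and the block contents are all
made up for the example, and real md bitmaps and resync are far more involved.

```python
class LazyMirror:
    """Toy two-disk mirror that skips the creation resync (hypothetical model)."""

    def __init__(self, nblocks, range_size, sync_on_partial_write):
        self.nblocks = nblocks
        self.range_size = range_size          # blocks covered by one bitmap bit
        self.sync_on_partial_write = sync_on_partial_write
        # Different contents stand in for whatever was on the disks before creation.
        self.disk0 = ["old0"] * nblocks
        self.disk1 = ["old1"] * nblocks
        nranges = -(-nblocks // range_size)   # ceiling division
        self.unsynced = [True] * nranges      # True = range not yet synced
        self.sync_copies = 0                  # blocks copied by deferred syncs

    def _sync_range(self, r):
        # Copy the whole range from disk 0 to disk 1, then clear the bit.
        start = r * self.range_size
        end = min(start + self.range_size, self.nblocks)
        for b in range(start, end):
            self.disk1[b] = self.disk0[b]
            self.sync_copies += 1
        self.unsynced[r] = False

    def write(self, block, data):
        # A normal write always goes to both disks.
        self.disk0[block] = data
        self.disk1[block] = data
        r = block // self.range_size
        if self.unsynced[r] and self.sync_on_partial_write:
            self._sync_range(r)               # choice 1: pay for the sync now

    def read(self, block, prefer_disk=1):
        # Returns (data, disk used). An unsynced range can only be read from
        # disk 0, even for blocks that happen to match on both disks.
        r = block // self.range_size
        if self.unsynced[r]:
            return self.disk0[block], 0       # choice 2: reads pinned to disk 0
        disk = self.disk1 if prefer_disk == 1 else self.disk0
        return disk[block], prefer_disk


eager = LazyMirror(nblocks=8, range_size=4, sync_on_partial_write=True)
eager.write(1, "new")
print(eager.sync_copies, eager.read(2))

lazy = LazyMirror(nblocks=8, range_size=4, sync_on_partial_write=False)
lazy.write(1, "new")
print(lazy.sync_copies, lazy.read(1))
```

In the eager case a single one-block write triggers a copy of the whole
range; in the lazy case no extra copying happens, but every read in that
range is still served by disk 0 only.
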

We're creating the array, right? So the user is sitting in front of
mdadm looking at its output, right?

So we just print a message saying "the disks aren't sync'd. If you don't
want a performance hit in normal use, fire up a sync now and take the
hit up front".

The question isn't "how do we avoid a performance hit?", it's "we're
going to take a hit, do we take it up-front on creation or defer it
until we're using the array?".

Cheers,
Wol