From: Shaohua Li <shli@kernel.org>
To: Andreas Klauer <Andreas.Klauer@metamorpher.de>
Cc: linux-raid@vger.kernel.org, jes.sorensen@gmail.com, neilb@suse.de
Subject: Re: RAID creation resync behaviors
Date: Wed, 3 May 2017 19:22:58 -0700
Message-ID: <20170504022258.eov6xh2zwtbfvcch@kernel.org>
In-Reply-To: <20170503235856.GA9698@metamorpher.de>
On Thu, May 04, 2017 at 01:58:56AM +0200, Andreas Klauer wrote:
> On Wed, May 03, 2017 at 01:27:48PM -0700, Shaohua Li wrote:
> > Write whole disk is very unfriendly for SSD, because it reduces lifetime.
> > And if user already does a trim before creation, the unnecessary write
> > could make SSD slower in the future.
>
> I'm not a kernel developer so maybe I shouldn't reply. Feel free to ignore.
>
> I don't see this as a big issue, whoever uses SSD will likely also fstrim,
> so all SSD will know about free blocks regardless how the drive was added
> to the RAID.
>
> You don't resync everyday and once populated with data you just can't help
> but have many writes when adding / replacing drives. No way around it.

I agree fstrim makes the issue smaller. But there are still extra writes, and
if we can avoid them in an easy way, we should.

> > An option to let mdadm trim SSD before creation sounds reasonable too.
>
> This is my personal opinion but - there is way too much trim in Linux.
>
> On HDD if you did a botched mkfs on the wrong device you still had a chance
> to recover data, with SSD it's all gone in an eyeblink, because mkfs.ext4
> and other programs unfortunately do trim without asking. Lots of people
> come to this list only after already playing with mdadm --create and if
> mdadm simply started trimming SSDs too, then all would be lost.
> LVM has these nice metadata backups but they're rendered useless
> if lvm.conf has issue_discards set to 1. Etc...

I totally understand the concerns. I think this requires a new option, and it
should not be the default.

> And it's entirely superfluous, there was a big hullabaloo when SSD were
> new, everyone was concerned about how quickly they'd die when written to,
> but tests show their endurance is considerably greater than advertized.
> A single RAID resync won't put a dent in even a consumer's SSD lifetime.

In my experience, when the filesystem on an SSD is nearly full, the system
becomes unstable with at least one type of SSD. Fully writing an SSD not only
reduces its lifetime, it also gives the SSD firmware a higher chance of failing.

> At the same time you have two utilities blkdiscard and fstrim so anyone
> who desires to trim can already easily do so with little effort. For SSD
> that return zero after TRIM you can already create like this:
>
> blkdiscard device1
> blkdiscard device2
> blkdiscard device3
> echo 3 > /proc/sys/vm/drop_caches # optional: Linux caches trimmed data
> mdadm --create --assume-clean /dev/md ... device1 device2 device3

Unfortunately, not all SSDs return zeros after trim.

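Since the blkdiscard + --assume-clean recipe quoted above only works when the
trimmed devices really read back as zeros, one could check that before creating
the array. The following is only a rough sketch: the function name
reads_all_zeros and its parameters are made up for illustration, and a single
all-zero read does not prove that the drive guarantees zeros after every trim.

```python
def reads_all_zeros(path, chunk_size=1 << 20, max_bytes=None):
    """Return True if every byte read from `path` is zero.

    `path` may be a block device or a regular file. `max_bytes` (hypothetical
    convenience parameter) limits the check to the first N bytes; None means
    read until end of device.
    """
    zero_chunk = bytes(chunk_size)  # reference buffer of zeros
    remaining = max_bytes
    with open(path, "rb", buffering=0) as dev:
        while remaining is None or remaining > 0:
            want = chunk_size if remaining is None else min(chunk_size, remaining)
            data = dev.read(want)
            if not data:  # end of device or file
                break
            if data != zero_chunk[:len(data)]:
                return False  # found a non-zero byte
            if remaining is not None:
                remaining -= len(data)
    return True
```

Run against each member device after blkdiscard (and after dropping caches), a
False result means --assume-clean is not safe for that device and a normal
resync is needed.
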
> If you wanted mdadm to do that directly, how about a mdadm --create --trim
> which implies assume-clean? But in my opinion it should not happen unasked.
> If it was up to me I'd even add a prompt asking to confirm dataloss...
>
> As for overwrite vs. compare-write, I don't know if it's possible or
> how painful it would be to implement but could you start out comparing,
> continue while the data actually matches, but switch to presumably much
> faster overwrite mode once there are sufficient mismatches? Perhaps with a
> fallback option so it can go back to compare later if data starts to match.
>
> So kind of a smart-compare-overwrite mode which would go something like:
>
> Compare. Match.
> Compare. Match.
> Compare. Mismatch. Overwrite.
> Compare. Mismatch. Overwrite x2.
> Compare. Mismatch. Overwrite x4.
> Compare. Match.
> Compare. Mismatch. Overwrite x8.
> Compare. Mismatch. Overwrite x16.
>
> Perhaps cap the overwrite multiplier at a certain point...
>
> Maybe a silly idea, I don't know.

This certainly is an interesting idea. I'm not sure we should put complex
heuristics on the kernel side, though. If there is an easy approach on the
mdadm side, it will definitely be preferred.

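
To make the proposed heuristic concrete, here is a small simulation of the
compare/overwrite trace quoted above. It is only a sketch under assumptions:
plan_resync, chunk_matches and max_run are hypothetical names, nothing like
this exists in md or mdadm, and, as in the quoted trace, a match does not
reset the overwrite multiplier.

```python
def plan_resync(chunk_matches, max_run=16):
    """Simulate the smart compare-overwrite idea on a list of chunks.

    chunk_matches[i] is True if chunk i is already in sync.
    Returns one action per chunk:
      "match" - compared, already in sync, nothing written
      "write" - compared, mismatched, overwritten
      "blind" - overwritten without comparing (part of an overwrite run)
    """
    actions = []
    run = 1  # chunks to overwrite on the next mismatch
    i = 0
    n = len(chunk_matches)
    while i < n:
        if chunk_matches[i]:
            actions.append("match")
            i += 1
            continue
        # Mismatch: overwrite this chunk plus the rest of the current run.
        end = min(i + run, n)
        actions.append("write")
        actions.extend(["blind"] * (end - i - 1))
        i = end
        # Double the run for the next mismatch, up to the cap.
        run = min(run * 2, max_run)
    return actions
```

The "blind" entries are the cost of the heuristic: chunks that may already
have matched get written anyway. Halving the run length after a match would
be one way to implement the suggested fallback to compare mode.
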
Thanks,
Shaohua