From: Tim Small <tim@seoss.co.uk>
To: James Bottomley <James.Bottomley@HansenPartnership.com>,
Tejun Heo <tj@kernel.org>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
linux-ide@vger.kernel.org
Subject: Re: [PATCH] libata: Whitelist SSDs that are known to properly return zeroes after TRIM
Date: Wed, 10 Dec 2014 15:43:03 +0000
Message-ID: <54886A07.5080307@seoss.co.uk>
In-Reply-To: <1418184578.2121.3.camel@HansenPartnership.com>
On 10/12/14 04:09, James Bottomley wrote:
> RAID requires that the redundancy verification of the array
> components matches otherwise a verify of the RAID fails. This means
> we have to have a guarantee what the verify read of a trimmed block
> of a RAID component will return. So for RAID-1, we just need both
> trimmed components to return the same data (we don't actually care
> what it is, just that it be mirrored);
FWIW, unless I'm out of date, md RAID-1 and RAID-10 currently DON'T
guarantee that the data is always mirrored. The md(4) man page from
http://git.neil.brown.name/?p=mdadm.git;a=blob;f=md.4 says:
> However on RAID1 and RAID10 it is possible for software issues to
> cause a mismatch to be reported. This does not necessarily mean
> that the data on the array is corrupted. It could simply be that
> the system does not care what is stored on that part of the array -
> it is unused space.
>
> The most likely cause for an unexpected mismatch on RAID1 or RAID10
> occurs if a swap partition or swap file is stored on the array.
>
> When the swap subsystem wants to write a page of memory out, it
> flags the page as 'clean' in the memory manager and requests the
> swap device to write it out. It is quite possible that the memory
> will be changed while the write-out is happening. In that case the
> 'clean' flag will be found to be clear when the write completes and
> so the swap subsystem will simply forget that the swapout had been
> attempted, and will possibly choose a different page to write out.
>
> If the swap device was on RAID1 (or RAID10), then the data is sent
> from memory to a device twice (or more depending on the number of
> devices in the array). Thus it is possible that the memory gets
> changed between the times it is sent, so different data can be
> written to the different devices in the array. This will be
> detected by check as a mismatch. However it does not reflect any
> corruption as the block where this mismatch occurs is being treated
> by the swap system as being empty, and the data will never be read
> from that block.
>
> It is conceivable for a similar situation to occur on non-swap
> files, though it is less likely.
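
A toy model of the race described in the quoted md(4) text may make it
easier to picture. This is only a sketch: the mirrors are plain dicts,
and mirrored_write(), check() and the mutate_between hook are names made
up for illustration, not anything from md itself.

```python
def mirrored_write(page, mirrors, block, mutate_between=None):
    """Copy `page` to the same block on every mirror, one after another.

    `mutate_between` is a hypothetical hook standing in for another task
    modifying the page in memory while the write-out is still in flight.
    """
    for i, mirror in enumerate(mirrors):
        mirror[block] = bytes(page)       # snapshot of memory at this moment
        if mutate_between is not None and i == 0:
            mutate_between(page)          # page changes before the next copy


def check(mirrors):
    """Return the block numbers whose contents differ between mirrors."""
    blocks = set().union(*(m.keys() for m in mirrors))
    return sorted(b for b in blocks if len({m.get(b) for m in mirrors}) > 1)


def dirty(page):
    """Simulate the page being modified during write-out."""
    page[0] = ord("B")


mirrors = [{}, {}]
page = bytearray(b"A" * 8)

mirrored_write(page, mirrors, block=0)                        # no race
mirrored_write(page, mirrors, block=1, mutate_between=dirty)  # raced write

mismatches = check(mirrors)
print(mismatches)
```

Block 1 ends up different on the two mirrors even though nothing was
corrupted, which is the kind of mismatch a check run would count.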
In my experience these inconsistencies are very common in data written
by certain applications (e.g. DBMS) on md RAID1 and RAID10 devices, and
they effectively make md verify useless for them. This is a shame,
particularly with things like FTLs where (at least anecdotally) data
scrambling seems to be a reasonably frequent occurrence.

So unreliable read-zero-on-TRIM isn't going to break the md RAID-1 /
RAID-10 use-case significantly, because verify hasn't worked on those
for a long time anyway.

Of course, a non-deterministic return-zero-on-trim will still cause
problems with md RAID5/6, and with other users such as dm RAID too, I'd
guess.
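
To illustrate why parity RAID is more sensitive here, a minimal sketch
of a three-member RAID-5 stripe follows. It assumes a simple XOR parity
model with toy 8-byte blocks; stripe_consistent() and the "unreliable"
drive behaviour are hypothetical, and this is not how md implements its
check.

```python
from functools import reduce

BLOCK = 8  # toy block size in bytes


def xor_blocks(a, b):
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def stripe_consistent(members):
    """A stripe (data blocks plus parity) verifies if the XOR of all members is zero."""
    return not any(reduce(xor_blocks, members))


# Stripe before TRIM: two data blocks and their parity.
d0, d1 = b"\x11" * BLOCK, b"\x22" * BLOCK
before = [d0, d1, xor_blocks(d0, d1)]

# After TRIM, every member reliably reads back zeroes.
zeroing = [bytes(BLOCK)] * 3

# After TRIM, one member returns stale data instead
# (hypothetical misbehaving drive).
unreliable = [bytes(BLOCK), d1, bytes(BLOCK)]

results = {
    "before": stripe_consistent(before),
    "zeroing": stripe_consistent(zeroing),
    "unreliable": stripe_consistent(unreliable),
}
print(results)
```

With all members returning zeroes the parity still holds, but a single
member returning anything else leaves the stripe inconsistent, and a
later rebuild from that stripe would reconstruct wrong data.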
Tim.