From: Ric Wheeler <rwheeler@redhat.com>
To: Neil Brown <neilb@suse.de>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>,
	Greg Freemyer <greg.freemyer@norcrossgroup.com>,
	Ric Wheeler <rwheeler@redhat.com>,
	linux-raid <linux-raid@vger.kernel.org>,
	Dongjun Shin <djshin90@gmail.com>,
	IDE/ATA development list <linux-ide@vger.kernel.org>
Subject: Re: SSD data reliable vs. unreliable [Was: Re: Data Recovery from SSDs - Impact of trim?]
Date: Tue, 27 Jan 2009 06:23:46 -0500
Message-ID: <497EEEC2.1040907@redhat.com>
In-Reply-To: <18814.39074.194781.490676@notabene.brown>

Neil Brown wrote:
> On Monday January 26, James.Bottomley@HansenPartnership.com wrote:
>   
>> On Mon, 2009-01-26 at 12:34 -0500, Greg Freemyer wrote:
>>     
>>> Adding mdraid list:
>>>
>>> Top post as a recap for the mdraid list (repeated at the end of the
>>> email in case anyone wants to respond to any of it):
>>>
>>> == Start RECAP
>>> With proposed spec changes for both T10 and T13, a new "unmap" (T10)
>>> or "trim" (T13) command is proposed.  The Linux kernel is
>>> implementing this as a sector discard, which various filesystems
>>> will issue as they delete data files.  Ext4 will be one of the first
>>> to support this (at least via out-of-kernel patches).
>>>
>>> SCSI - see http://www.t10.org/cgi-bin/ac.pl?t=d&f=08-356r5.pdf
>>> ATA - see T13/e08137r2 draft
>>>
>>> Per the proposed spec changes, the underlying SSD device can
>>> optionally modify the unmapped data.  SCSI T10 at least restricts the
>>> way the modification happens, but data modification of unmapped data
>>> is still definitely allowed for both classes of SSD.
>>>
>>> Thus if a filesystem "discards" a sector, the contents of the sector
>>> can change and thus parity values are no longer meaningful for the
>>> stripe.
>>>       
>> This isn't correct.  The implementation is via bio and request discard
>> flags.  linux raid as a bio->bio mapping entity can choose to drop or
>> implement the discard flag (by default it will be dropped unless the
>> raid layer is modified).
>>     
>
> That's good.  I would be worried if they could slip through without
> md/raid noticing.
>
>   
>>> i.e., if the unmapped blocks don't exactly correlate with the RAID-5/6
>>> striping, then the integrity of a stripe containing both mapped and
>>> unmapped data is lost.
>>>
>>> Thus it seems that either the filesystem will have to understand the
>>> RAID-5/6 striping/chunking setup and ensure it never issues a
>>> discard command unless an entire stripe is being discarded, or
>>> the RAID implementation must snoop the discard commands and take
>>> appropriate actions.
>>>       
>> No.  It only works if the discard is supported all the way through the
>> stack to the controller and device ... any point in the stack can drop
>> the discard.  It's also theoretically possible that any layer could
>> accumulate them as well (i.e. up to stripe size for raid).
>>     
>
> Accumulating them in the raid level would probably be awkward.
>
> It was my understanding that filesystems would (try to) send the
> largest possible 'discard' covering any surrounding blocks that had
> already been discarded.  Then e.g. raid5 could just round down any
> discard request to an aligned number of complete stripes and just
> discard those.  i.e. have all the accumulation done in the filesystem.
>
> To be able to safely discard stripes, raid5 would need to remember
> which stripes were discarded so that it could be sure to write out the
> whole stripe when updating any block on it, thus ensuring that parity
> will be correct again and will remain correct.
>
> Probably the only practical data structure for this would be a bitmap
> similar to the current write-intent bitmap.
>
> Is it really worth supporting this in raid5?  Are the sorts of
> devices that will benefit from 'discard' requests likely to be used
> inside an md/raid5 array, I wonder?
>
> raid1 and raid10 are much easier to handle, so supporting 'discard'
> there certainly makes sense.
>
> NeilBrown
> --
>   

The benefit is seen both by SSD devices (T13) and by high-end arrays
(T10).  High-end arrays almost universally implement RAID
internally.

I suppose that people might build RAID5 devices out of SSDs locally, but
it is probably not an immediate priority...

ric
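To make the parity concern from Greg's recap concrete, here is a minimal Python sketch (not md code) of one RAID-5 stripe with three data chunks and XOR parity. It assumes, purely for illustration, that a discarded chunk reads back as zeros afterwards and that md never sees the discard, so parity is not updated; the chunk contents and the 4-byte chunk size are made up.

```python
from functools import reduce


def xor_parity(chunks: list[bytes]) -> bytes:
    """RAID-5 style parity: byte-wise XOR of all data chunks in a stripe."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def rebuild(chunks: list[bytes], parity: bytes, lost: int) -> bytes:
    """Reconstruct chunk `lost` by XOR-ing parity with every surviving chunk."""
    survivors = [c for i, c in enumerate(chunks) if i != lost]
    return xor_parity(survivors + [parity])


# A 3+1 stripe with tiny, hypothetical 4-byte chunks.
stripe = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(stripe)

# The filesystem discards chunk 1; the device may then return different data
# (assumed here to be zeros) while the parity chunk still reflects "BBBB".
stripe[1] = bytes(4)

# Later the disk holding chunk 0 fails and chunk 0 is rebuilt from the rest.
rebuilt = rebuild(stripe, parity, lost=0)
print(rebuilt == b"AAAA")  # False: chunk 0 comes back corrupted
```

Chunk 0 was never discarded, yet its reconstruction is wrong, because the parity no longer matches the discarded neighbour.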

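Neil's suggestion that raid5 simply round a discard request down to an aligned run of complete stripes can be sketched as follows. This is a hypothetical helper written for illustration, not md's implementation; the function name, the use of sectors in the array's data address space, and the example geometry (4 data chunks of 128 sectors each, i.e. 512 sectors per stripe) are all assumptions.

```python
from typing import Optional


def full_stripe_range(start: int, length: int,
                      stripe_sectors: int) -> Optional[tuple[int, int]]:
    """Shrink the discard [start, start + length) to the whole stripes it covers.

    Returns (first_sector, sector_count), or None if no complete stripe is covered.
    """
    end = start + length
    aligned_start = -(-start // stripe_sectors) * stripe_sectors  # round start up
    aligned_end = (end // stripe_sectors) * stripe_sectors         # round end down
    if aligned_end <= aligned_start:
        return None  # only partial stripes: drop the discard entirely
    return aligned_start, aligned_end - aligned_start


# Hypothetical geometry: 4 data chunks of 128 sectors (64 KiB) per stripe.
STRIPE_SECTORS = 4 * 128

print(full_stripe_range(100, 1500, STRIPE_SECTORS))  # (512, 1024)
print(full_stripe_range(0, 300, STRIPE_SECTORS))     # None
```

Rounding the start up and the end down means a request that does not fully cover any stripe is dropped, which fits the idea that the accumulation of adjacent discards is left to the filesystem.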

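The bookkeeping Neil describes (remember which stripes were discarded, and write out the whole stripe the next time any block on one of them is updated) might look roughly like this. The class and function names are hypothetical, the bitmap is only loosely inspired by the write-intent bitmap rather than its real format, and filling the rest of a discarded stripe with zeros is an assumption made so that parity can be computed from known data.

```python
from functools import reduce


class DiscardedStripeMap:
    """One bit per stripe, set while the whole stripe is known to be discarded.

    Hypothetical sketch; only loosely inspired by md's write-intent bitmap.
    """

    def __init__(self, nr_stripes: int) -> None:
        self.bits = bytearray((nr_stripes + 7) // 8)

    @staticmethod
    def _locate(stripe: int) -> tuple[int, int]:
        # Byte offset and bit mask for this stripe's bit.
        return stripe // 8, 1 << (stripe % 8)

    def mark_discarded(self, stripe: int) -> None:
        byte, mask = self._locate(stripe)
        self.bits[byte] |= mask

    def is_discarded(self, stripe: int) -> bool:
        byte, mask = self._locate(stripe)
        return bool(self.bits[byte] & mask)

    def clear(self, stripe: int) -> None:
        byte, mask = self._locate(stripe)
        self.bits[byte] &= ~mask & 0xFF


def plan_write(bitmap: DiscardedStripeMap, stripe: int, chunk_index: int,
               new_data: bytes, nr_data: int,
               chunk_size: int) -> tuple[str, list[bytes]]:
    """Choose how to write one data chunk of `stripe`.

    A discarded stripe holds undefined data, so parity cannot be updated by
    read-modify-write; instead the whole stripe is written from known contents
    (zeros elsewhere, an assumption) together with freshly computed parity.
    """
    if bitmap.is_discarded(stripe):
        chunks = [bytes(chunk_size)] * nr_data
        chunks[chunk_index] = new_data
        parity = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))
        # A real implementation would presumably clear the bit only after the
        # full-stripe write has reached stable storage.
        bitmap.clear(stripe)
        return "full-stripe write", chunks + [parity]
    # Normal path: data and parity are consistent, so only the changed chunk
    # (plus updated parity, not modelled here) needs to be written.
    return "read-modify-write", [new_data]


bm = DiscardedStripeMap(nr_stripes=1024)
bm.mark_discarded(5)
mode, chunks = plan_write(bm, stripe=5, chunk_index=2, new_data=b"DDDD",
                          nr_data=4, chunk_size=4)
print(mode, len(chunks))                       # full-stripe write 5
print(plan_write(bm, 5, 2, b"EEEE", 4, 4)[0])  # read-modify-write
```

After the first full-stripe write the bit is cleared, so parity is correct again and later updates to that stripe can take the normal path.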