linux-fsdevel.vger.kernel.org archive mirror
From: Ric Wheeler <rwheeler@redhat.com>
To: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: David Woodhouse <dwmw2@infradead.org>,
	"Martin K. Petersen" <mkp@mkp.net>,
	Matthew Wilcox <matthew@wil.cx>, Jeff Garzik <jeff@garzik.org>,
	linux-scsi@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	IDE/ATA development list <linux-ide@vger.kernel.org>,
	Eyal Shani <Eyal.Shani@sandisk.com>
Subject: Re: TRIM vs UNMAP vs WRITE SAME and thin devices
Date: Sat, 07 Feb 2009 11:14:08 -0500
Message-ID: <498DB350.8000905@redhat.com>
In-Reply-To: <1234019372.4658.9.camel@localhost.localdomain>

James Bottomley wrote:
> On Sat, 2009-02-07 at 09:53 -0500, Ric Wheeler wrote:
>   
>> I have been poked at by some vendors about the status of our support for 
>> the virtually/thinly provisioned luns since they are getting close to 
>> being able to test with real devices.
>>     
>
> With my LSF hat on, a certain array vendor might be sponsoring to get
> the opportunity to raise this issue more fully.  The impression (mostly
> correct) is that we're thinking about trim/unmap purely from the SSD FTL
> point of view and perhaps not being as useful as we might to virtually
> provisioned LUNs ... so you could mention to the other vendors that they
> might have an interest in coming (and even possibly sponsoring).
>   

That is probably worth bringing up - I don't see this as a large project, 
and it should be reasonably quick to complete given all the work that 
David and others have already put into it. If you (with your LF hat on 
:-)) have a standard form or offer process, you might want to poke at 
NetApp, EMC, Hitachi, IBM, HP and Dell. We both know the names of some 
people in storage at a few of those companies; at the others I have 
fewer contacts.

On the other hand, this might also be an opportunity to get them and 
their engineers on the array side more directly and personally involved.
>   
>> My quick summary is that most of the work so far has been done 
>> without any real hardware to play with - in 2.6.29-rc3, I don't see any 
>> low level ATA or SCSI bits that turn requests tagged with REQ_DISCARD 
>> into the specific ATA or SCSI commands. Did I miss something & if not, 
>> do we have plans to push anything upstream soonish?
>>     
>
> With no devices it's a bit hard.  Also we need at least three pieces for
> SSDs: Devices supporting trim, the T13 implementation of TRIM and the
> SAT for UNMAP.  We can get the latter two out of the proposals, but it's
> still a bit of a moving target.
>   

I think that it has settled a bit - do we have a good sense of the 
status of the various proposals in T13 and T10?
>   
>> One note on the SCSI devices, there was a T10 proposal to add an "UNMAP" 
>> bit to the "WRITE SAME" command for SCSI. The details of the proposed 
>> interface are at:
>>
>> http://www.t11.org/t10/document.08/08-356r4.pdf
>>
>> The up side of using WRITE SAME with unmap is that there are no fuzzy 
>> semantics about what the unmapped sectors will be - they will all be 
>> whatever the WRITE SAME command would have set (usually zeroes I assume).
>>
>> The summary of write same is that you send down one sector (say 512 
>> bytes of zeroes) and a count so you can do a zeroing of the target 
>> without having to send all of the data over the wire. Very useful for 
>> initializing members of a RAID device for example to a known pattern.
>>
>> The down side would be that if we incorrectly send down a WRITE SAME 
>> command to a non-thin device, I think that we would kick off a potentially 
>> extremely long IO. For example, imagine doing a write same of a full TB 
>> - that could take an hour which might be an issue :-)  Of course, we 
>> should not be doing that if we get the code right.
>>     
>
> As I read it, non thin provisioned devices can be identified (and may
> not even accept WRITE SAME).
>   

I agree that the intersection of write same and thin devices is not 
going to be 100%. We might end up needing both for SCSI in the worst 
case I suppose.
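
As a rough sketch of the WRITE SAME trade-off described above (one block 
on the wire, but a long media write on a non-thin device), the arithmetic 
can be written out as below. The 280 MB/s write rate is a hypothetical 
figure picked only for illustration, and the function names are made up 
for this example; neither comes from the T10 proposal.

```python
SECTOR_SIZE = 512            # bytes per logical block, as in the example above
ASSUMED_RATE = 280 * 10**6   # bytes/second the device can write; hypothetical figure

def write_same_wire_bytes(num_blocks, sector_size=SECTOR_SIZE):
    """WRITE SAME sends a single block of data plus a count."""
    return sector_size

def plain_write_wire_bytes(num_blocks, sector_size=SECTOR_SIZE):
    """Ordinary writes must transfer every block over the wire."""
    return num_blocks * sector_size

def media_write_seconds(num_blocks, rate=ASSUMED_RATE, sector_size=SECTOR_SIZE):
    """Time for a non-thin device to physically write the whole range."""
    return num_blocks * sector_size / rate

blocks_in_1tb = 10**12 // SECTOR_SIZE
print(write_same_wire_bytes(blocks_in_1tb), "bytes sent with WRITE SAME")
print(plain_write_wire_bytes(blocks_in_1tb), "bytes sent with plain writes")
print(round(media_write_seconds(blocks_in_1tb) / 60), "minutes to write the media")
```

Under that assumed rate the full-TB case lands at roughly an hour, which 
is why sending WRITE SAME to a device that cannot simply unmap the range 
is worth avoiding.
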
>   
>> I don't see the other advantages that the PDF claims for file systems 
>> as being all that useful.
>>
>> With either the write same and its proposed unmap bit or with the 
>> original T10 unmap, do we have a short list of infrastructure that needs 
>> to be fleshed out? Anything we can do to help get people's patches to 
>> test with their non-GA thin enabled devices?
>>     
>
> Yes, REQ_DISCARD simply isn't broad enough to cope with all the
> potential uses of WRITE SAME.  If it's just a mechanism to get known
> data into a discard sector, fine, we can set that at the lower level.
> However, WRITE SAME has uses beyond TRIM in that it can be used as an
> engine for data deduplication.  If vendors are thinking of doing this,
> then REQ_DISCARD isn't flexible enough.
>   

I am more interested personally in the sparse support. On the dedup 
side, I think that most implementations do not rely on write same. They 
tend to compute hashes on the various blocks and so on.
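
To illustrate what "compute hashes on the various blocks" means here, 
the following is a minimal sketch of hash-based dedup, not any vendor's 
implementation. The 4096-byte block size, the choice of SHA-256 and the 
function names are all assumptions made for the example.

```python
import hashlib

def dedup_blocks(data, block_size=4096):
    """Split data into fixed-size blocks and store each distinct block once."""
    store = {}   # digest -> block contents, kept once per distinct block
    refs = []    # digests in original block order
    for off in range(0, len(data), block_size):
        block = data[off:off + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)
        refs.append(digest)
    return store, refs

def rebuild(store, refs):
    """Reassemble the original data from the block references."""
    return b"".join(store[d] for d in refs)

data = b"A" * 4096 * 3 + b"B" * 4096
store, refs = dedup_blocks(data)
print(len(refs), "blocks referenced,", len(store), "blocks stored")
```

Nothing in this scheme needs WRITE SAME: duplicates are found by 
comparing digests, wherever the identical blocks happen to live.
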
>   
>> Is there a similar short list of things to be done for T13 devices with 
>> TRIM? Anyone have a chance to test on real hardware yet?
>>     
>
> Not that I know of yet.  It's all sort of on hold until actual devices
> become available.
>
> James
>
>
>   
The vendors certainly have things that they could try in their labs if 
we can get bits and pieces together for them to test with. We will need 
to avoid the chicken and egg scenario where they wait for us and we wait 
for them :-)

Ric
 
