linux-fsdevel.vger.kernel.org archive mirror
* TRIM vs UNMAP vs WRITE SAME and thin devices
       [not found]   ` <1232721777.4430.7.camel@macbook.infradead.org>
@ 2009-02-07 14:53     ` Ric Wheeler
  2009-02-07 15:09       ` James Bottomley
                         ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Ric Wheeler @ 2009-02-07 14:53 UTC (permalink / raw)
  To: David Woodhouse, James Bottomley, Martin K. Petersen
  Cc: Matthew Wilcox, Jeff Garzik, linux-scsi, linux-fsdevel,
	IDE/ATA development list


I have been poked at by some vendors about the status of our support for 
the virtually/thinly provisioned luns since they are getting close to 
being able to test with real devices.

My quick summary is that most of the work so far has been done 
without any real hardware to play with - in 2.6.29-rc3, I don't see any 
low level ATA or SCSI bits that turn requests tagged with REQ_DISCARD 
into the specific ATA or SCSI commands. Did I miss something & if not, 
do we have plans to push anything upstream soonish?

One note on the SCSI devices, there was a T10 proposal to add an "UNMAP" 
bit to the "WRITE SAME" command for SCSI. The details of the proposed 
interface are at:

http://www.t11.org/t10/document.08/08-356r4.pdf

The up side of using WRITE SAME with unmap is that there are no fuzzy 
semantics about what the unmapped sectors will be - they will all be 
whatever the WRITE SAME command would have set (usually zeroes I assume).

The summary of write same is that you send down one sector (say 512 
bytes of zeroes) and a count so you can do a zeroing of the target 
without having to send all of the data over the wire. Very useful for 
initializing members of a RAID device for example to a known pattern.
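
To make the "one sector plus a count" idea concrete, here is a minimal sketch of how a WRITE SAME(16) command block could be assembled. It assumes the CDB layout of the later published SBC-3 (opcode 0x93, UNMAP as bit 3 of byte 1); the proposal under discussion may place the bit elsewhere, and the function name is purely illustrative.

```python
import struct

WRITE_SAME_16 = 0x93   # opcode in the published SBC-3 (assumption for this sketch)
UNMAP_BIT = 0x08       # byte 1, bit 3 in the published SBC-3 (assumption)

def build_write_same16(lba, num_blocks, unmap=False):
    """Return a 16-byte WRITE SAME(16) CDB (hypothetical helper)."""
    flags = UNMAP_BIT if unmap else 0
    # Big-endian fields: opcode, flags, 64-bit LBA, 32-bit block count,
    # group number, control byte.
    return struct.pack(">BBQIBB", WRITE_SAME_16, flags, lba, num_blocks, 0, 0)

# The data-out buffer is a single logical block no matter how large the count is.
payload = bytes(512)
cdb = build_write_same16(lba=2048, num_blocks=65536, unmap=True)
```

Only the 512-byte payload crosses the wire, while the count field tells the target how many blocks to fill with it.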

The down side would be that if we incorrectly send down a WRITE SAME 
command to a non-thin device, I think that we would kick off a potentially 
extremely long IO. For example, imagine doing a write same of a full TB 
- that could take an hour which might be an issue :-)  Of course, we 
should not be doing that if we get the code right.
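
As a rough back-of-the-envelope check on the "could take an hour" figure, the sketch below computes how long a device would need to physically write a full terabyte. The throughput values are assumptions chosen for illustration, not measurements from any device.

```python
def write_same_seconds(capacity_bytes, bytes_per_second):
    """Time to write the whole range at a sustained rate (illustrative helper)."""
    return capacity_bytes / bytes_per_second

TB = 10**12  # decimal terabyte

for mb_per_s in (100, 300):  # assumed sustained write rates
    secs = write_same_seconds(TB, mb_per_s * 10**6)
    print(f"{mb_per_s} MB/s: {secs / 60:.0f} minutes")
```

At an assumed 300 MB/s the full write lands just under an hour; at 100 MB/s it is closer to three.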

I don't see another of the PDF's claims of advantages for file systems 
to be really all that useful.

With either the write same and its proposed unmap bit or with the 
original T10 unmap, do we have a short list of infrastructure that needs 
fleshed out? Anything we can do to help get people's patches to test with 
their non-GA thin enabled devices?

Is there a similar short list of things to be done for T13 devices with 
TRIM? Anyone have a chance to test on real hardware yet?

Thanks!

Ric


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: TRIM vs UNMAP vs WRITE SAME and thin devices
  2009-02-07 14:53     ` TRIM vs UNMAP vs WRITE SAME and thin devices Ric Wheeler
@ 2009-02-07 15:09       ` James Bottomley
  2009-02-07 16:14         ` Ric Wheeler
  2009-02-07 22:50         ` Matthew Wilcox
  2009-02-07 22:47       ` Matthew Wilcox
  2009-02-08 20:06       ` Greg Freemyer
  2 siblings, 2 replies; 18+ messages in thread
From: James Bottomley @ 2009-02-07 15:09 UTC (permalink / raw)
  To: Ric Wheeler
  Cc: David Woodhouse, Martin K. Petersen, Matthew Wilcox, Jeff Garzik,
	linux-scsi, linux-fsdevel, IDE/ATA development list

On Sat, 2009-02-07 at 09:53 -0500, Ric Wheeler wrote:
> I have been poked at by some vendors about the status of our support for 
> the virtually/thinly provisioned luns since they are getting close to 
> being able to test with real devices.

With my LSF hat on, a certain array vendor might be sponsoring to get
the opportunity to raise this issue more fully.  The impression (mostly
correct) is that we're thinking about trim/unmap purely from the SSD FTL
point of view and perhaps not being as useful as we might to virtually
provisioned LUNs ... so you could mention to the other vendors that they
might have an interest in coming (and even possibly sponsoring).

> My quick summary is that most of the work so far has been done 
> without any real hardware to play with - in 2.6.29-rc3, I don't see any 
> low level ATA or SCSI bits that turn requests tagged with REQ_DISCARD 
> into the specific ATA or SCSI commands. Did I miss something & if not, 
> do we have plans to push anything upstream soonish?

With no devices it's a bit hard.  Also we need at least three pieces for
SSDs: Devices supporting trim, the T13 implementation of TRIM and the
SAT for UNMAP.  We can get the latter two out of the proposals, but it's
still a bit of a moving target.

> One note on the SCSI devices, there was a T10 proposal to add an "UNMAP" 
> bit to the "WRITE SAME" command for SCSI. The details of the proposed 
> interface are at:
> 
> http://www.t11.org/t10/document.08/08-356r4.pdf
> 
> The up side of using WRITE SAME with unmap is that there are no fuzzy 
> semantics about what the unmapped sectors will be - they will all be 
> whatever the WRITE SAME command would have set (usually zeroes I assume).
> 
> The summary of write same is that you send down one sector (say 512 
> bytes of zeroes) and a count so you can do a zeroing of the target 
> without having to send all of the data over the wire. Very useful for 
> initializing members of a RAID device for example to a known pattern.
> 
> The down side would be that if we incorrectly send down a WRITE SAME 
> command to a non-thin device, I think that we would kick off a potentially 
> extremely long IO. For example, imagine doing a write same of a full TB 
> - that could take an hour which might be an issue :-)  Of course, we 
> should not be doing that if we get the code right.

As I read it, non thin provisioned devices can be identified (and may
not even accept WRITE SAME).

> I don't see another of the PDF's claims of advantages for file systems 
> to be really all that useful.
> 
> With either the write same and its proposed unmap bit or with the 
> original T10 unmap, do we have a short list of infrastructure that needs 
> fleshed out? Anything we can do to help get people's patches to test with 
> their non-GA thin enabled devices?

Yes, REQ_DISCARD simply isn't broad enough to cope with all the
potential uses of WRITE SAME.  If it's just a mechanism to get known
data into a discard sector, fine, we can set that at the lower level.
However, WRITE SAME has uses beyond TRIM in that it can be used as an
engine for data deduplication.  If vendors are thinking of doing this,
then REQ_DISCARD isn't flexible enough.

> Is there a similar short list of things to be done for T13 devices with 
> TRIM? Anyone have a chance to test on real hardware yet?

Not that I know of yet.  It's all sort of on hold until actual devices
become available.

James




* Re: TRIM vs UNMAP vs WRITE SAME and thin devices
  2009-02-07 15:09       ` James Bottomley
@ 2009-02-07 16:14         ` Ric Wheeler
  2009-02-12 13:51           ` Eyal Shani
  2009-02-07 22:50         ` Matthew Wilcox
  1 sibling, 1 reply; 18+ messages in thread
From: Ric Wheeler @ 2009-02-07 16:14 UTC (permalink / raw)
  To: James Bottomley
  Cc: David Woodhouse, Martin K. Petersen, Matthew Wilcox, Jeff Garzik,
	linux-scsi, linux-fsdevel, IDE/ATA development list, Eyal Shani

James Bottomley wrote:
> On Sat, 2009-02-07 at 09:53 -0500, Ric Wheeler wrote:
>   
>> I have been poked at by some vendors about the status of our support for 
>> the virtually/thinly provisioned luns since they are getting close to 
>> being able to test with real devices.
>>     
>
> With my LSF hat on, a certain array vendor might be sponsoring to get
> the opportunity to raise this issue more fully.  The impression (mostly
> correct) is that we're thinking about trim/unmap purely from the SSD FTL
> point of view and perhaps not being as useful as we might to virtually
> provisioned LUNs ... so you could mention to the other vendors that they
> might have an interest in coming (and even possibly sponsoring).
>   

That is probably worth bringing up - I don't see this as a large project 
and it should be reasonably quick to get completed given all the work that 
David and others have already put into it. If you (with your LF hat on 
:-)) have a standard form or offer process, you might want to poke at 
NetApp, EMC, Hitachi, IBM, HP and Dell. We both know the names of some 
people in storage in a few of those companies; at others I have fewer 
contacts.

On the other hand, this might also be an opportunity to get them and 
their engineers on the array side more directly and personally involved.
>   
>> My quick summary is that most of the work so far has been done 
>> without any real hardware to play with - in 2.6.29-rc3, I don't see any 
>> low level ATA or SCSI bits that turn requests tagged with REQ_DISCARD 
>> into the specific ATA or SCSI commands. Did I miss something & if not, 
>> do we have plans to push anything upstream soonish?
>>     
>
> With no devices it's a bit hard.  Also we need at least three pieces for
> SSDs: Devices supporting trim, the T13 implementation of TRIM and the
> SAT for UNMAP.  We can get the latter two out of the proposals, but it's
> still a bit of a moving target.
>   

I think that it has settled a bit - do we have a good sense of the 
status of the various proposals in T13 and T10?
>   
>> One note on the SCSI devices, there was a T10 proposal to add an "UNMAP" 
>> bit to the "WRITE SAME" command for SCSI. The details of the proposed 
>> interface are at:
>>
>> http://www.t11.org/t10/document.08/08-356r4.pdf
>>
>> The up side of using WRITE SAME with unmap is that there are no fuzzy 
>> semantics about what the unmapped sectors will be - they will all be 
>> whatever the WRITE SAME command would have set (usually zeroes I assume).
>>
>> The summary of write same is that you send down one sector (say 512 
>> bytes of zeroes) and a count so you can do a zeroing of the target 
>> without having to send all of the data over the wire. Very useful for 
>> initializing members of a RAID device for example to a known pattern.
>>
>> The down side would be that if we incorrectly send down a WRITE SAME 
>> command to a non-thin device, I think that we would kick off a potentially 
>> extremely long IO. For example, imagine doing a write same of a full TB 
>> - that could take an hour which might be an issue :-)  Of course, we 
>> should not be doing that if we get the code right.
>>     
>
> As I read it, non thin provisioned devices can be identified (and may
> not even accept WRITE SAME).
>   

I agree that the intersection of write same and thin devices is not 
going to be 100%. We might end up needing both for SCSI in the worst 
case I suppose.
>   
>> I don't see another of the PDF's claims of advantages for file systems 
>> to be really all that useful.
>>
>> With either the write same and its proposed unmap bit or with the 
>> original T10 unmap, do we have a short list of infrastructure that needs 
>> fleshed out? Anything we can do to help get people's patches to test with 
>> their non-GA thin enabled devices?
>>     
>
> Yes, REQ_DISCARD simply isn't broad enough to cope with all the
> potential uses of WRITE SAME.  If it's just a mechanism to get known
> data into a discard sector, fine, we can set that at the lower level.
> However, WRITE SAME has uses beyond TRIM in that it can be used as an
> engine for data deduplication.  If vendors are thinking of doing this,
> then REQ_DISCARD isn't flexible enough.
>   

I am more interested personally in the sparse support. On the dedup 
side, I think that most implementations do not rely on write same. They 
tend to compute hashes on the various blocks and so on.
>   
>> Is there a similar short list of things to be done for T13 devices with 
>> TRIM? Anyone have a chance to test on real hardware yet?
>>     
>
> Not that I know of yet.  It's all sort of on hold until actual devices
> become available.
>
> James
>
>
>   
The vendors certainly have things that they could try in their labs if 
we can get bits and pieces together for them to test with. We will need 
to avoid the chicken and egg scenario where they wait for us and we wait 
for them :-)

Ric
 


* Re: TRIM vs UNMAP vs WRITE SAME and thin devices
  2009-02-07 14:53     ` TRIM vs UNMAP vs WRITE SAME and thin devices Ric Wheeler
  2009-02-07 15:09       ` James Bottomley
@ 2009-02-07 22:47       ` Matthew Wilcox
  2009-02-07 23:36         ` David Woodhouse
  2009-02-07 23:46         ` Jeff Garzik
  2009-02-08 20:06       ` Greg Freemyer
  2 siblings, 2 replies; 18+ messages in thread
From: Matthew Wilcox @ 2009-02-07 22:47 UTC (permalink / raw)
  To: Ric Wheeler
  Cc: David Woodhouse, James Bottomley, Martin K. Petersen, Jeff Garzik,
	linux-scsi, linux-fsdevel, IDE/ATA development list

On Sat, Feb 07, 2009 at 09:53:06AM -0500, Ric Wheeler wrote:
> I have been poked at by some vendors about the status of our support for 
> the virtually/thinly provisioned luns since they are getting close to 
> being able to test with real devices.
> 
> My quick summary is that most of the work so far has been done 
> without any real hardware to play with - in 2.6.29-rc3, I don't see any 
> low level ATA or SCSI bits that turn requests tagged with REQ_DISCARD 
> into the specific ATA or SCSI commands. Did I miss something & if not, 
> do we have plans to push anything upstream soonish?

Bearing in mind that I'm now three weeks behind on email, you might want
to look at
http://git.kernel.org/?p=linux/kernel/git/willy/ssd.git;a=shortlog;h=trim-20081231
which has at least one known bug (fixed by Dave Woodhouse and Ben
Herrenschmidt).  I'll be able to give a more coherent answer in a few
days.  Or maybe Dave will beat me to it ;-)

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."


* Re: TRIM vs UNMAP vs WRITE SAME and thin devices
  2009-02-07 15:09       ` James Bottomley
  2009-02-07 16:14         ` Ric Wheeler
@ 2009-02-07 22:50         ` Matthew Wilcox
  2009-02-07 23:03           ` James Bottomley
  2009-02-08 16:47           ` Ric Wheeler
  1 sibling, 2 replies; 18+ messages in thread
From: Matthew Wilcox @ 2009-02-07 22:50 UTC (permalink / raw)
  To: James Bottomley
  Cc: Ric Wheeler, David Woodhouse, Martin K. Petersen, Jeff Garzik,
	linux-scsi, linux-fsdevel, IDE/ATA development list

On Sat, Feb 07, 2009 at 09:09:32AM -0600, James Bottomley wrote:
> On Sat, 2009-02-07 at 09:53 -0500, Ric Wheeler wrote:
> > I have been poked at by some vendors about the status of our support for 
> > the virtually/thinly provisioned luns since they are getting close to 
> > being able to test with real devices.
> 
> With my LSF hat on, a certain array vendor might be sponsoring to get
> the opportunity to raise this issue more fully.  The impression (mostly
> correct) is that we're thinking about trim/unmap purely from the SSD FTL
> point of view and perhaps not being as useful as we might to virtually
> provisioned LUNs ... so you could mention to the other vendors that they
> might have an interest in coming (and even possibly sponsoring).

I thought we had agreed on a plan which satisfied the SSD and insane
array vendors.  That is that we would do no tracking of allocation units
in the filesystem, but instead extend each trim out to cover the maximum
possible size.  I've confirmed with Intel's SSD people that this would
cause them no harm at all (trimming already trimmed sectors won't even
cause a slowdown).  Whether the filesystem people have taken note of
this, I have no idea.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."


* Re: TRIM vs UNMAP vs WRITE SAME and thin devices
  2009-02-07 22:50         ` Matthew Wilcox
@ 2009-02-07 23:03           ` James Bottomley
  2009-02-08 16:47           ` Ric Wheeler
  1 sibling, 0 replies; 18+ messages in thread
From: James Bottomley @ 2009-02-07 23:03 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Ric Wheeler, David Woodhouse, Martin K. Petersen, Jeff Garzik,
	linux-scsi, linux-fsdevel, IDE/ATA development list

On Sat, 2009-02-07 at 15:50 -0700, Matthew Wilcox wrote:
> On Sat, Feb 07, 2009 at 09:09:32AM -0600, James Bottomley wrote:
> > On Sat, 2009-02-07 at 09:53 -0500, Ric Wheeler wrote:
> > > I have been poked at by some vendors about the status of our support for 
> > > the virtually/thinly provisioned luns since they are getting close to 
> > > being able to test with real devices.
> > 
> > With my LSF hat on, a certain array vendor might be sponsoring to get
> > the opportunity to raise this issue more fully.  The impression (mostly
> > correct) is that we're thinking about trim/unmap purely from the SSD FTL
> > point of view and perhaps not being as useful as we might to virtually
> > provisioned LUNs ... so you could mention to the other vendors that they
> > might have an interest in coming (and even possibly sponsoring).
> 
> I thought we had agreed on a plan which satisfied the SSD and insane
> array vendors.

I don't think we got any input from array vendors, so it's rather hard
to claim this.  So part of this idea would be gathering the necessary
inputs.

>   That is that we would do no tracking of allocation units
> in the filesystem, but instead extend each trim out to cover the maximum
> possible size.  I've confirmed with Intel's SSD people that this would
> cause them no harm at all (trimming already trimmed sectors won't even
> cause a slowdown).  Whether the filesystem people have taken note of
> this, I have no idea.

It's one idea, but absent requirements from array vendors, we don't
really know if it's the right one.

James




* Re: TRIM vs UNMAP vs WRITE SAME and thin devices
  2009-02-07 22:47       ` Matthew Wilcox
@ 2009-02-07 23:36         ` David Woodhouse
  2009-02-07 23:46         ` Jeff Garzik
  1 sibling, 0 replies; 18+ messages in thread
From: David Woodhouse @ 2009-02-07 23:36 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Ric Wheeler, David Woodhouse, James Bottomley, Martin K. Petersen,
	Jeff Garzik, linux-scsi, linux-fsdevel, IDE/ATA development list

> On Sat, Feb 07, 2009 at 09:53:06AM -0500, Ric Wheeler wrote:
>> I have been poked at by some vendors about the status of our support for
>> the virtually/thinly provisioned luns since they are getting close to
>> being able to test with real devices.
>>
>> My quick summary is that most of the work so far has been done
>> without any real hardware to play with - in 2.6.29-rc3, I don't see any
>> low level ATA or SCSI bits that turn requests tagged with REQ_DISCARD
>> into the specific ATA or SCSI commands. Did I miss something & if not,
>> do we have plans to push anything upstream soonish?
>
> Bearing in mind that I'm now three weeks behind on email, you might want
> to look at
> http://git.kernel.org/?p=linux/kernel/git/willy/ssd.git;a=shortlog;h=trim-20081231
> which has at least one known bug (fixed by Dave Woodhouse and Ben
> Herrenschmidt).  I'll be able to give a more coherent answer in a few
> days.  Or maybe Dave will beat me to it ;-)

Ben's suggestion was that the IDE core wouldn't be sending the payload of
the command because it looks at the R/W bit... which is clear (read) in
our discard requests ATM. Making them appear to be writes is simple enough
though. I gave an updated test kernel to the Sandisk folks but haven't got
results back from them yet.
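
For context on why the payload matters: a TRIM request carries its list of ranges in a data-out buffer, so the request has to look like a write for that buffer to reach the device. The sketch below shows one way such a buffer could be packed, assuming the range-entry format of the later published ACS-2 DATA SET MANAGEMENT command (8-byte little-endian entries, 48-bit LBA plus 16-bit count, padded to 512 bytes). The T13 proposal current at the time may differ, and the helper names are hypothetical.

```python
import struct

def trim_range_entry(lba, count):
    """Pack one range entry: 48-bit LBA in the low bits, 16-bit count on top."""
    if not 0 <= lba < (1 << 48):
        raise ValueError("LBA does not fit in 48 bits")
    if not 0 < count <= 0xFFFF:
        raise ValueError("count must be between 1 and 65535 blocks")
    return struct.pack("<Q", (count << 48) | lba)

def trim_payload(ranges):
    """Concatenate entries and zero-pad to a multiple of 512 bytes."""
    data = b"".join(trim_range_entry(lba, count) for lba, count in ranges)
    return data + bytes(-len(data) % 512)

payload = trim_payload([(0x1000, 8)])  # trim 8 blocks starting at LBA 0x1000
```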



* Re: TRIM vs UNMAP vs WRITE SAME and thin devices
  2009-02-07 22:47       ` Matthew Wilcox
  2009-02-07 23:36         ` David Woodhouse
@ 2009-02-07 23:46         ` Jeff Garzik
  2009-02-08  0:24           ` Matthew Wilcox
  1 sibling, 1 reply; 18+ messages in thread
From: Jeff Garzik @ 2009-02-07 23:46 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Ric Wheeler, David Woodhouse, James Bottomley, Martin K. Petersen,
	linux-scsi, linux-fsdevel, IDE/ATA development list

Matthew Wilcox wrote:
> On Sat, Feb 07, 2009 at 09:53:06AM -0500, Ric Wheeler wrote:
>> I have been poked at by some vendors about the status of our support for 
>> the virtually/thinly provisioned luns since they are getting close to 
>> being able to test with real devices.
>>
>> My quick summary is that most of the work so far has been done 
>> without any real hardware to play with - in 2.6.29-rc3, I don't see any 
>> low level ATA or SCSI bits that turn requests tagged with REQ_DISCARD 
>> into the specific ATA or SCSI commands. Did I miss something & if not, 
>> do we have plans to push anything upstream soonish?
> 
> Bearing in mind that I'm now three weeks behind on email, you might want
> to look at
> http://git.kernel.org/?p=linux/kernel/git/willy/ssd.git;a=shortlog;h=trim-20081231
> which has at least one known bug (fixed by Dave Woodhouse and Ben
> Herrenschmidt).  I'll be able to give a more coherent answer in a few
> days.  Or maybe Dave will beat me to it ;-)

BTW when will somebody send me the 4k sector patches?  :)

	Jeff





* Re: TRIM vs UNMAP vs WRITE SAME and thin devices
  2009-02-07 23:46         ` Jeff Garzik
@ 2009-02-08  0:24           ` Matthew Wilcox
  0 siblings, 0 replies; 18+ messages in thread
From: Matthew Wilcox @ 2009-02-08  0:24 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Ric Wheeler, David Woodhouse, James Bottomley, Martin K. Petersen,
	linux-scsi, linux-fsdevel, IDE/ATA development list

On Sat, Feb 07, 2009 at 06:46:42PM -0500, Jeff Garzik wrote:
> BTW when will somebody send me the 4k sector patches?  :)

I'll get to that on Monday; just arrived back from holiday today.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."


* Re: TRIM vs UNMAP vs WRITE SAME and thin devices
  2009-02-07 22:50         ` Matthew Wilcox
  2009-02-07 23:03           ` James Bottomley
@ 2009-02-08 16:47           ` Ric Wheeler
  2009-02-08 20:50             ` Matthew Wilcox
  1 sibling, 1 reply; 18+ messages in thread
From: Ric Wheeler @ 2009-02-08 16:47 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: James Bottomley, David Woodhouse, Martin K. Petersen, Jeff Garzik,
	linux-scsi, linux-fsdevel, IDE/ATA development list

Matthew Wilcox wrote:
> On Sat, Feb 07, 2009 at 09:09:32AM -0600, James Bottomley wrote:
>   
>> On Sat, 2009-02-07 at 09:53 -0500, Ric Wheeler wrote:
>>     
>>> I have been poked at by some vendors about the status of our support for 
>>> the virtually/thinly provisioned luns since they are getting close to 
>>> being able to test with real devices.
>>>       
>> With my LSF hat on, a certain array vendor might be sponsoring to get
>> the opportunity to raise this issue more fully.  The impression (mostly
>> correct) is that we're thinking about trim/unmap purely from the SSD FTL
>> point of view and perhaps not being as useful as we might to virtually
>> provisioned LUNs ... so you could mention to the other vendors that they
>> might have an interest in coming (and even possibly sponsoring).
>>     
>
> I thought we had agreed on a plan which satisfied the SSD and insane
> array vendors.  That is that we would do no tracking of allocation units
> in the filesystem, but instead extend each trim out to cover the maximum
> possible size.  I've confirmed with Intel's SSD people that this would
> cause them no harm at all (trimming already trimmed sectors won't even
> cause a slowdown).  Whether the filesystem people have taken note of
> this, I have no idea.
>
>   
That should be helpful for the array people, but for some of them with 
really large delete chunk sizes, they will still miss a lot since their 
size is larger than the average file size :-)  I guess that we could do 
something to resync - Ted mentioned some ideas for ext4.
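
A small sketch of the concern, under the assumption that an array can only reclaim whole, aligned chunks: a discard frees nothing unless it fully covers at least one chunk. The 1 MiB chunk size and the function name are hypothetical.

```python
MiB = 1 << 20

def whole_chunks(start, length, chunk_size):
    """Count aligned chunks lying entirely inside [start, start + length)."""
    first = -(-start // chunk_size)        # round the start up to a boundary
    last = (start + length) // chunk_size  # round the end down to a boundary
    return max(0, last - first)

# Discarding a 4 KiB file in the middle of a chunk reclaims nothing.
small = whole_chunks(5 * MiB + 4096, 4096, MiB)
# A 2 MiB discard starting half-way into a chunk reclaims only one chunk.
partial = whole_chunks(MiB // 2, 2 * MiB, MiB)
```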

On another note, they are pondering either using write same with the 
unmap bit set or the unmap command. It would seem that for thin 
provisioning alone, either would work.

ric



* Re: TRIM vs UNMAP vs WRITE SAME and thin devices
  2009-02-07 14:53     ` TRIM vs UNMAP vs WRITE SAME and thin devices Ric Wheeler
  2009-02-07 15:09       ` James Bottomley
  2009-02-07 22:47       ` Matthew Wilcox
@ 2009-02-08 20:06       ` Greg Freemyer
  2009-02-08 20:44         ` Matthew Wilcox
  2 siblings, 1 reply; 18+ messages in thread
From: Greg Freemyer @ 2009-02-08 20:06 UTC (permalink / raw)
  To: Ric Wheeler
  Cc: David Woodhouse, James Bottomley, Martin K. Petersen,
	Matthew Wilcox, Jeff Garzik, linux-scsi, linux-fsdevel,
	IDE/ATA development list

On Sat, Feb 7, 2009 at 9:53 AM, Ric Wheeler <rwheeler@redhat.com> wrote:
>
> I have been poked at by some vendors about the status of our support for the
> virtually/thinly provisioned luns since they are getting close to being able
> to test with real devices.

I found a list of T10 activities just since Dec. 1, 2008 and it
is a bit overwhelming.  (i.e. 08-356r4 is but one of many recent
reports)

http://www.t10.org/new_a.htm

For those of us that don't live and breathe the SCSI spec, is there an
overview site describing what is going on?

Maybe:
09-059r0 	T10 Project Summary - January 2009 	John Lohmeyer 	PDF (34729)	2009/01/22
http://www.t10.org/cgi-bin/ac.pl?t=d&f=09-059r0.pdf

I have not read any of the Post Dec. 1 stuff including the above
project summary, but based on the names these seem potentially
relevant:

09-055r0 	T13 Liaison Report January 09 	Dan Colegrove 	PDF (4770)	2009/01/15

08-356r4 	SBC-3: WRITE SAME unmap bit 	David L. Black 	PDF (56608)	2008/12/10

08-356r5 	SBC-3 Thin Provisioning Commands 	Fred Knight, David L. Black 	PDF (387549)	2009/01/15

08-149r7 	SBC - Thin Provisioning 	Frederick Knight 	PDF (281001)	2008/12/08

08-149r8 	SBC - Thin Provisioning 	Frederick Knight 	PDF (281387)	2009/01/09

09-011r1 	SBC-3 Thin Provisioning Threshold Notification 	Frederick Knight 	PDF (32757)	2009/01/09

08-149r9 	SBC - Thin Provisioning 	Frederick Knight 	PDF (353888)	2009/01/15

09-012r0 	Minutes: CAP - Thin Provisioning 12/4 con-call 	Frederick Knight 	PDF (38063)	2008/12/08

09-011r0 	SBC-3 Thin Provisioning Threshold Notification 	Frederick Knight 	PDF (48523)	2008/12/08

08-396r3 	SPC-4: Reporting support for all DIF types 	George Penokie 	PDF (85358)	2009/01/14

09-058r0 	Agenda for T10 Meeting #90 March 2009 	John Lohmeyer 	PDF (61437)	2009/01/19

09-020r0 	T11 Liaison Report, December 2008 	Robert Snively 	PDF (13117)	2008/12/19

09-032r0 	Minutes of T10 Plenary Meeting #89 - January 15, 2009 	Weber & Lohmeyer 	HTM (141593)	2009/01/23

09-032r0 	Minutes of T10 Plenary Meeting #89 - January 15, 2009 	Weber & Lohmeyer 	PDF (344891)	2009/01/23

Greg
-- 
Greg Freemyer
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
First 99 Days Litigation White Paper -
http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com


* Re: TRIM vs UNMAP vs WRITE SAME and thin devices
  2009-02-08 20:06       ` Greg Freemyer
@ 2009-02-08 20:44         ` Matthew Wilcox
  2009-02-09  0:01           ` Ric Wheeler
  0 siblings, 1 reply; 18+ messages in thread
From: Matthew Wilcox @ 2009-02-08 20:44 UTC (permalink / raw)
  To: Greg Freemyer
  Cc: Ric Wheeler, David Woodhouse, James Bottomley, Martin K. Petersen,
	Jeff Garzik, linux-scsi, linux-fsdevel, IDE/ATA development list

On Sun, Feb 08, 2009 at 03:06:44PM -0500, Greg Freemyer wrote:
> I found a list of T10 activities just since Dec. 1, 2008 and it
> is a bit overwhelming.  (i.e. 08-356r4 is but one of many recent
> reports)
> 
> http://www.t10.org/new_a.htm
> 
> For those of us that don't live and breathe the SCSI spec, is there an
> overview site describing what is going on?

I've been working off 08-149r7.pdf.  I'm sure that's been superseded by
now.  

> 08-356r4 	SBC-3: WRITE SAME unmap bit 	David L. Black 	PDF (56608)	2008/12/10

Probably interesting.  Haven't read it myself.

> 08-356r5 	SBC-3 Thin Provisioning Commands 	Fred Knight, David L.
> Black 	PDF (387549)	2009/01/15

Fred Knight seems to be the main coordinator of this effort, so yes.

> 08-149r7 	SBC - Thin Provisioning 	Frederick Knight 	PDF (281001)	2008/12/08

That's the one I'm working from.

> 08-149r8 	SBC - Thin Provisioning 	Frederick Knight 	PDF (281387)	2009/01/09

A newer version ... thought so.

> 09-011r1 	SBC-3 Thin Provisioning Threshold Notification 	Frederick
> Knight 	PDF (32757)	2009/01/09

Clearly related.

> 08-149r9 	SBC - Thin Provisioning 	Frederick Knight 	PDF (353888)	2009/01/15

Even newer version of what I've been working from.

> 09-012r0 	Minutes: CAP - Thin Provisioning 12/4 con-call 	Frederick
> Knight 	PDF (38063)	2008/12/08

Probably tedious.

> 08-396r3 	SPC-4: Reporting support for all DIF types 	George Penokie
> 	PDF (85358)	2009/01/14

Unrelated, I would think.

I'd go with 08-149r9 to get a good overview.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."


* Re: TRIM vs UNMAP vs WRITE SAME and thin devices
  2009-02-08 16:47           ` Ric Wheeler
@ 2009-02-08 20:50             ` Matthew Wilcox
  2009-02-08 23:58               ` Ric Wheeler
  0 siblings, 1 reply; 18+ messages in thread
From: Matthew Wilcox @ 2009-02-08 20:50 UTC (permalink / raw)
  To: Ric Wheeler
  Cc: James Bottomley, David Woodhouse, Martin K. Petersen, Jeff Garzik,
	linux-scsi, linux-fsdevel, IDE/ATA development list

On Sun, Feb 08, 2009 at 11:47:25AM -0500, Ric Wheeler wrote:
> Matthew Wilcox wrote:
> >I thought we had agreed on a plan which satisfied the SSD and insane
> >array vendors.  That is that we would do no tracking of allocation units
> >in the filesystem, but instead extend each trim out to cover the maximum
> >possible size.  I've confirmed with Intel's SSD people that this would
> >cause them no harm at all (trimming already trimmed sectors won't even
> >cause a slowdown).  Whether the filesystem people have taken note of
> >this, I have no idea.
>
> That should be helpful for the array people, but for some of them with 
> really large delete chunk sizes, they will still miss a lot since their 
> size is larger than the average file size :-)  I guess that we could do 
> something to resync - Ted mentioned some ideas for ext4.

I'm not sure I communicated the plan effectively.

Let's consider deleting a 4k file.

The DISCARD that the filesystem sends down does not just cover the 4k
of data.  It covers all adjacent free space to that 4k of data, so it
might end up sending a DISCARD of several megabytes or even gigabytes,
assuming there's that much contiguous free space.
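
The extension step can be sketched roughly as follows, assuming the filesystem keeps a list of already-coalesced free extents as (start, length) pairs in bytes. The data structure and names are hypothetical and do not correspond to any particular filesystem's allocator.

```python
def extend_discard(free_extents, start, length):
    """Grow the freed range to include free extents that directly adjoin it.

    free_extents holds the (start, length) ranges that were already free
    before the delete; they are assumed non-overlapping and coalesced.
    """
    end = start + length
    lo, hi = start, end
    for ext_start, ext_len in free_extents:
        ext_end = ext_start + ext_len
        if ext_end == start:      # free space ends where the freed range begins
            lo = ext_start
        elif ext_start == end:    # free space begins where the freed range ends
            hi = ext_end
    return lo, hi - lo

MiB = 1 << 20
# 1 MiB free before the file, 8 MiB free right after it.
free = [(0, MiB), (MiB + 4096, 8 * MiB)]
discard = extend_discard(free, MiB, 4096)   # delete a 4 KiB file at offset 1 MiB
```

In this example the 4 KiB delete turns into a single discard of a little over 9 MiB starting at offset 0.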

Now, filesystems which fragment their free space will not do well on
thin provisioned devices, but then they won't do well on any devices --
keeping your free space compacted is an essential part of any filesystem's
job, even on SSDs.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."


* Re: TRIM vs UNMAP vs WRITE SAME and thin devices
  2009-02-08 20:50             ` Matthew Wilcox
@ 2009-02-08 23:58               ` Ric Wheeler
  0 siblings, 0 replies; 18+ messages in thread
From: Ric Wheeler @ 2009-02-08 23:58 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: James Bottomley, David Woodhouse, Martin K. Petersen, Jeff Garzik,
	linux-scsi, linux-fsdevel, IDE/ATA development list

Matthew Wilcox wrote:
> On Sun, Feb 08, 2009 at 11:47:25AM -0500, Ric Wheeler wrote:
>   
>> Matthew Wilcox wrote:
>>     
>>> I thought we had agreed on a plan which satisfied the SSD and insane
>>> array vendors.  That is that we would do no tracking of allocation units
>>> in the filesystem, but instead extend each trim out to cover the maximum
>>> possible size.  I've confirmed with Intel's SSD people that this would
>>> cause them no harm at all (trimming already trimmed sectors won't even
>>> cause a slowdown).  Whether the filesystem people have taken note of
>>> this, I have no idea.
>>>       
>> That should be helpful for the array people, but for some of them with 
>> really large delete chunk sizes, they will still miss a lot since their 
>> size is larger than the average file size :-)  I guess that we could do 
>> something to resync - Ted mentioned some ideas for ext4.
>>     
>
> I'm not sure I communicated the plan effectively.
>
> Let's consider deleting a 4k file.
>
> The DISCARD that the filesystem sends down does not just cover the 4k
> of data.  It covers all adjacent free space to that 4k of data, so it
> might end up sending a DISCARD of several megabytes or even gigabytes,
> assuming there's that much contiguous free space.
>
> Now, filesystems which fragment their free space will not do well on
> thin provisioned devices, but then they won't do well on any devices --
> keeping your free space compacted is an essential part of any filesystem's
> job, even on SSDs.
>
>   
Thanks - that does sound like it will in fact help clean up.  I suppose 
the worst case would be deleting lots of non-contiguous small files from 
a full file system (say every other 4KB or something obscure like that). 

I will see what the vendors I know have come up with. I think that this 
should give them something interesting to play with.

Ric




* Re: TRIM vs UNMAP vs WRITE SAME and thin devices
  2009-02-08 20:44         ` Matthew Wilcox
@ 2009-02-09  0:01           ` Ric Wheeler
  0 siblings, 0 replies; 18+ messages in thread
From: Ric Wheeler @ 2009-02-09  0:01 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Greg Freemyer, David Woodhouse, James Bottomley,
	Martin K. Petersen, Jeff Garzik, linux-scsi, linux-fsdevel,
	IDE/ATA development list

Matthew Wilcox wrote:
> On Sun, Feb 08, 2009 at 03:06:44PM -0500, Greg Freemyer wrote:
>   
>> I found a list of T10 activities just since Dec. 1, 2008 and it
>> is a bit overwhelming.  (i.e. 08-356r4 is but one of many recent
>> reports)
>>
>> http://www.t10.org/new_a.htm
>>
>> For those of us that don't live and breathe the SCSI spec, is there an
>> overview site describing what is going on?
>>     
>
> I've been working off 08-149r7.pdf.  I'm sure that's been superseded by
> now.  
>
>   
>> 08-356r4 	SBC-3: WRITE SAME unmap bit 	David L. Black 	PDF (56608)	2008/12/10
>>     
>
> Probably interesting.  Haven't read it myself.
>   

This is only a four-page proposal. Basically, we would use the WRITE 
SAME command with a special unmap bit set to tell the target that it may 
(at its option) unmap the blocks. If it does not unmap them, it would in 
fact have to set the data to the pattern indicated in the command, which 
I presume would be all zeros in the normal case.
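
A toy model of that behaviour, purely as a sketch: the class `ToyThinTarget`, its methods, and the 512-byte block size are invented for illustration and are not taken from the proposal. The sketch also assumes that unmapped blocks read back as zeros, so the model only unmaps when the pattern is all zeros; otherwise it writes the pattern, as a non-thin target would have to.

```python
BLOCK_SIZE = 512  # bytes per logical block (assumed for this sketch)


class ToyThinTarget:
    """Hypothetical target; blocks missing from self.mapped are unmapped."""

    def __init__(self, can_unmap):
        self.can_unmap = can_unmap
        self.mapped = {}  # lba -> bytes

    def write_same(self, lba, count, pattern, unmap=False):
        if len(pattern) != BLOCK_SIZE:
            raise ValueError("pattern must be exactly one block")
        pattern_is_zero = pattern == bytes(BLOCK_SIZE)
        for block in range(lba, lba + count):
            if unmap and self.can_unmap and pattern_is_zero:
                # Target chooses to unmap: release the backing storage.
                self.mapped.pop(block, None)
            else:
                # Otherwise it must really write the pattern to every block.
                self.mapped[block] = pattern

    def read(self, lba):
        # Assumption of this sketch: unmapped blocks read back as zeros.
        return self.mapped.get(lba, bytes(BLOCK_SIZE))


zeros = bytes(BLOCK_SIZE)
thin, thick = ToyThinTarget(can_unmap=True), ToyThinTarget(can_unmap=False)
for target in (thin, thick):
    target.write_same(0, 8, zeros, unmap=True)

# Both targets read back the same data; only the thin one avoided the writes.
print(len(thin.mapped), len(thick.mapped))
```

Either way the initiator sees the same data afterwards, which is the point of the proposal: the result is deterministic, and the cost is that a target which does not unmap has to perform the full write.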
>   
>> 08-356r5 	SBC-3 Thin Provisioning Commands 	Fred Knight, David L.
>> Black 	PDF (387549)	2009/01/15
>>     
>
> Fred Knight seems to be the main coordinator of this effort, so yes.
>   

Fred and David Black both have been quite active.
>   
>> 08-149r7 	SBC - Thin Provisioning 	Frederick Knight 	PDF (281001)	2008/12/08
>>     
>
> That's the one I'm working from.
>
>   
>> 08-149r8 	SBC - Thin Provisioning 	Frederick Knight 	PDF (281387)	2009/01/09
>>     
>
> A newer version ... thought so.
>
>   
>> 09-011r1 	SBC-3 Thin Provisioning Threshold Notification 	Frederick
>> Knight 	PDF (32757)	2009/01/09
>>     
>
> Clearly related.
>
>   
>> 08-149r9 	SBC - Thin Provisioning 	Frederick Knight 	PDF (353888)	2009/01/15
>>     
>
> Even newer version of what I've been working from.
>
>   
>> 09-012r0 	Minutes: CAP - Thin Provisioning 12/4 con-call 	Frederick
>> Knight 	PDF (38063)	2008/12/08
>>     
>
> Probably tedious.
>
>   
>> 08-396r3 	SPC-4: Reporting support for all DIF types 	George Penokie
>> 	PDF (85358)	2009/01/14
>>     
>
> Unrelated, I would think.
>
> I'd go with 08-149r9 to get a good overview.
>
>   



* RE: TRIM vs UNMAP vs WRITE SAME and thin devices
  2009-02-07 16:14         ` Ric Wheeler
@ 2009-02-12 13:51           ` Eyal Shani
  2009-03-23 19:05             ` Greg Freemyer
  0 siblings, 1 reply; 18+ messages in thread
From: Eyal Shani @ 2009-02-12 13:51 UTC (permalink / raw)
  To: Ric Wheeler, James Bottomley
  Cc: David Woodhouse, Martin K. Petersen, Matthew Wilcox, Jeff Garzik,
	linux-scsi@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	IDE/ATA development list, Eyal Shani

Adding my 5 cents.

T13 added Trim to the latest ATA8 proposal.
http://www.t13.org/Documents/UploadedDocuments/docs2009/d2015r1-ATAATAPI_Command_Set_-_2_ACS-2.pdf

This is after the changes made to the definition, with 'Deterministic Read after Trim'.
This is not STANDARDIZED, but is pretty much accepted by all sides.

I was hoping that would settle the differences between T10/T13 on this - little did I know...

We are working with David W. on his implementation of the Trim feature, and hope to get to the bottom of the debug process soon.
Hope to update soon...


Regards,
Eyal Shani.


* Re: TRIM vs UNMAP vs WRITE SAME and thin devices
  2009-02-12 13:51           ` Eyal Shani
@ 2009-03-23 19:05             ` Greg Freemyer
  2009-03-23 19:23               ` Mark Lord
  0 siblings, 1 reply; 18+ messages in thread
From: Greg Freemyer @ 2009-03-23 19:05 UTC (permalink / raw)
  To: Eyal Shani
  Cc: Ric Wheeler, James Bottomley, David Woodhouse, Martin K. Petersen,
	Matthew Wilcox, Jeff Garzik, linux-scsi@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, IDE/ATA development list,
	Theodore Tso

On Thu, Feb 12, 2009 at 9:51 AM, Eyal Shani <Eyal.Shani@sandisk.com> wrote:
> Adding my 5 cents.
>
> T13 added Trim to the latest ATA8 proposal.
> http://www.t13.org/Documents/UploadedDocuments/docs2009/d2015r1-ATAATAPI_Command_Set_-_2_ACS-2.pdf
>
> This is after the changes made to the definition, with 'Deterministic Read after Trim'.
> This is not STANDARDIZED, but is pretty much accepted by all sides.
>
> I was hoping that would settle the differences between T10/T13 on this - little did I know...
>
> We are working with David W. on his implementation of the Trim feature, and hope to get to the bottom of the debug process soon.
> Hope to update soon...
>
>
> Regards,
> Eyal Shani.

FYI:

Several of you remember I've been concerned about the lack of
"audit-ability" associated with the new Trim feature as it relates to
the T13 spec.

I finally found a contact that is on the T13 committee and have
expressed my concern.  He said the issue was raised at a recent
meeting of the committee and that a sub-group was tasked with making a
recommendation.  He said that he understands my concern and said he
would push to ensure that some sort of "reliable data" flag be in the
eventual spec.

Obviously he is just one person, so no guarantees, but I am happy to
have finally connected with someone on the committee.

Greg
-- 
Greg Freemyer
Head of EDD Tape Extraction and Processing team
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
First 99 Days Litigation White Paper -
http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com


* Re: TRIM vs UNMAP vs WRITE SAME and thin devices
  2009-03-23 19:05             ` Greg Freemyer
@ 2009-03-23 19:23               ` Mark Lord
  0 siblings, 0 replies; 18+ messages in thread
From: Mark Lord @ 2009-03-23 19:23 UTC (permalink / raw)
  To: Greg Freemyer
  Cc: Eyal Shani, Ric Wheeler, James Bottomley, David Woodhouse,
	Martin K. Petersen, Matthew Wilcox, Jeff Garzik,
	linux-scsi@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	IDE/ATA development list, Theodore Tso

..
> On Thu, Feb 12, 2009 at 9:51 AM, Eyal Shani <Eyal.Shani@sandisk.com> wrote:
>> Adding my 5 cents.
>>
>> T13 added Trim to the latest ATA8 proposal.
>> http://www.t13.org/Documents/UploadedDocuments/docs2009/d2015r1-ATAATAPI_Command_Set_-_2_ACS-2.pdf
..

Note that there is also a Rev.1a edition, same link as above
except change the d2015r1 to d2015r1a:

http://www.t13.org/Documents/UploadedDocuments/docs2009/d2015r1a-ATAATAPI_Command_Set_-_2_ACS-2.pdf


end of thread, 2009-03-23 19:23 UTC

Thread overview: 18+ messages
     [not found] <20090123041558.GC24652@parisc-linux.org>
     [not found] ` <4979AF62.7070409@redhat.com>
     [not found]   ` <1232721777.4430.7.camel@macbook.infradead.org>
2009-02-07 14:53     ` TRIM vs UNMAP vs WRITE SAME and thin devices Ric Wheeler
2009-02-07 15:09       ` James Bottomley
2009-02-07 16:14         ` Ric Wheeler
2009-02-12 13:51           ` Eyal Shani
2009-03-23 19:05             ` Greg Freemyer
2009-03-23 19:23               ` Mark Lord
2009-02-07 22:50         ` Matthew Wilcox
2009-02-07 23:03           ` James Bottomley
2009-02-08 16:47           ` Ric Wheeler
2009-02-08 20:50             ` Matthew Wilcox
2009-02-08 23:58               ` Ric Wheeler
2009-02-07 22:47       ` Matthew Wilcox
2009-02-07 23:36         ` David Woodhouse
2009-02-07 23:46         ` Jeff Garzik
2009-02-08  0:24           ` Matthew Wilcox
2009-02-08 20:06       ` Greg Freemyer
2009-02-08 20:44         ` Matthew Wilcox
2009-02-09  0:01           ` Ric Wheeler
