From: "Martin K. Petersen" <martin.petersen@oracle.com>
To: karn@ka9q.net
Cc: xfs@oss.sgi.com
Subject: Re: TRIM details
Date: Thu, 06 Jan 2011 23:35:57 -0500
Message-ID: <yq1mxndnxea.fsf@sermon.lab.mkp.net>
In-Reply-To: <4D2686ED.7000304@philkarn.net> (Phil Karn's message of "Thu, 06 Jan 2011 19:22:21 -0800")

>>>>> "Phil" == Phil Karn <karn@philkarn.net> writes:

Phil> I'd like to know exactly how the drives implement TRIM but I've
Phil> only found bits and pieces. Can anyone suggest a current and
Phil> complete reference for the complete SATA command set that includes
Phil> all the TRIM related stuff?

You kind of have to be a T13 member to get it. But try googling ATA
ACS-2...


Phil> As I understand it, there's a SATA (and SCSI?) command that will
Phil> repeatedly write a fixed block of data to some number of
Phil> consecutive LBAs (WRITE SAME), and an "unmap" bit in the write
Phil> command can be set to indicate that instead of actually writing
Phil> the blocks, they can be marked for erasure and placed in the free
Phil> pool.

There are several commands and variations...

For ATA there's the DSM TRIM command which allows you to indicate ranges
of blocks to discard. The ranges are stored in the data blocks and not
the command itself. A device can indicate how many blocks of payload it
supports. Many don't. Some of those that do blow up if you actually send
more than one block.
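
To illustrate the payload layout: in ACS-2 each range entry is an 8-byte
little-endian value with the starting LBA in the low 48 bits and the
block count in the high 16 bits, so one 512-byte payload block holds 64
entries. The sketch below packs ranges that way. The name
pack_trim_ranges and the decision to refuse more than one payload block
are illustrative assumptions, not kernel code.

```python
import struct

MAX_RANGE_BLOCKS = 0xFFFF       # range length is a 16-bit field
ENTRIES_PER_BLOCK = 512 // 8    # 64 eight-byte entries per payload block


def pack_trim_ranges(ranges):
    """Pack (lba, nblocks) pairs into one 512-byte DSM TRIM payload block."""
    entries = []
    for lba, nblocks in ranges:
        if lba + nblocks > 1 << 48:
            raise ValueError("range exceeds the 48-bit LBA space")
        # A single entry covers at most 65535 blocks, so split longer ranges.
        while nblocks > 0:
            chunk = min(nblocks, MAX_RANGE_BLOCKS)
            entries.append(struct.pack("<Q", (chunk << 48) | lba))
            lba += chunk
            nblocks -= chunk
    if len(entries) > ENTRIES_PER_BLOCK:
        raise ValueError("too many ranges for a single payload block")
    # Unused entries stay zero; a zero-length entry is ignored by the device.
    return b"".join(entries).ljust(512, b"\x00")
```

Keeping everything in a single payload block is the conservative choice
given that some devices misbehave when sent more than one.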

In SCSI there are three ways:

1. WRITE SAME with a zeroed payload
2. WRITE SAME with the UNMAP bit set
3. UNMAP command

UNMAP, like ATA DSM, takes a set of ranges in the data payload. Just to
make things more interesting they are not the same format and don't have
a 1:1 mapping with the ATA ranges.
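
For comparison, a sketch of the SCSI UNMAP parameter list as laid out in
SBC-3: an 8-byte header followed by 16-byte block descriptors, all
big-endian, each descriptor carrying a 64-bit LBA and a 32-bit block
count. The function name pack_unmap_list is an assumption made for this
example.

```python
import struct


def pack_unmap_list(ranges):
    """Pack (lba, nblocks) pairs into a SCSI UNMAP parameter list."""
    # Descriptor: 8-byte LBA, 4-byte block count, 4 reserved bytes.
    descs = b"".join(
        struct.pack(">QI4x", lba, nblocks) for lba, nblocks in ranges
    )
    # Header: UNMAP DATA LENGTH (bytes following that field, i.e. total - 2),
    # UNMAP BLOCK DESCRIPTOR DATA LENGTH, then 4 reserved bytes.
    header = struct.pack(">HH4x", 6 + len(descs), len(descs))
    return header + descs
```

The 32-bit count here versus the 16-bit count in an ATA range entry is
one reason a single UNMAP descriptor can need several ATA entries.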

There is no official support for (1) at the protocol level. You have to
know via means outside the standard whether the device supports logical
block provisioning with zero detection. There are a few storage arrays
out there that do.

Whether the device supports (2) or (3) is indicated in a set of VPD
pages that also indicate preferred granularity, alignment, etc. That
wasn't always the case, so for a while you just had to guess. We have
some heuristics in place that pick the right command depending on the
device.

Furthermore, in Linux, ATA sits underneath SCSI. So we translate WRITE
SAME(16) with the UNMAP bit set to DSM TRIM in our SCSI-ATA Translation
Layer.

Finally, there is a set of bits in both ATA and SCSI that indicate
whether a read after a discard will return zeroes or garbage. Some
devices report that they return zeroes but don't in all cases.

The kernel goes through a lot of blah to make sure we're doing the right
thing. I really don't think that's a headache that's worth repeating.

Thankfully, at the top of the stack we have a generic block device ioctl
that hides all the complexity from the user. If you want to tinker
that's a much better place to start.
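
A minimal sketch of calling that ioctl, assuming it is BLKDISCARD from
<linux/fs.h>, which takes a pair of 64-bit values (byte offset, byte
length). The request number below is the x86/ARM encoding of
_IO(0x12, 119) and differs on a few architectures; the helper names are
made up for this example. Discarding destroys the data in the range, so
only point this at a scratch device.

```python
import fcntl
import os
import struct

BLKDISCARD = 0x1277  # _IO(0x12, 119) as encoded on x86/ARM (assumption)


def discard_arg(offset, length):
    """Build the uint64_t[2] argument: byte offset and byte length."""
    return struct.pack("=QQ", offset, length)


def discard(path, offset, length):
    """Ask the kernel to discard a byte range of the block device at path."""
    fd = os.open(path, os.O_WRONLY)
    try:
        # The kernel picks TRIM, UNMAP or WRITE SAME as appropriate.
        fcntl.ioctl(fd, BLKDISCARD, discard_arg(offset, length))
    finally:
        os.close(fd)
```

Offset and length are expected to be multiples of the logical block
size; the call fails with an OSError if the device does not support
discard.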

If you check the archives you'll also see that the filesystem-specific
FITRIM ioctl is being worked on. Plus some filesystems have the option
of doing discards in realtime.
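
A sketch of how FITRIM can be invoked, assuming the interface as found
in mainline <linux/fs.h>: struct fstrim_range with three 64-bit fields
(start, len, minlen) and request _IOWR('X', 121, struct fstrim_range).
The numeric value is the x86/ARM encoding, and the helper names are
hypothetical.

```python
import fcntl
import os
import struct

FITRIM = 0xC0185879  # _IOWR('X', 121, struct fstrim_range) on x86/ARM


def fstrim_arg(start=0, length=2**64 - 1, minlen=0):
    """Build a mutable struct fstrim_range: start, len, minlen in bytes."""
    return bytearray(struct.pack("=QQQ", start, length, minlen))


def fstrim(mountpoint, start=0, length=2**64 - 1, minlen=0):
    """Trim free space of the mounted filesystem; return bytes trimmed."""
    buf = fstrim_arg(start, length, minlen)
    fd = os.open(mountpoint, os.O_RDONLY)
    try:
        # The buffer is mutable so the kernel can write the result back.
        fcntl.ioctl(fd, FITRIM, buf)
    finally:
        os.close(fd)
    # On return the len field holds the number of bytes actually trimmed.
    return struct.unpack("=QQQ", buf)[1]
```

Unlike the block device ioctl, this leaves it to the filesystem to work
out which ranges are free, so it is safe to run on a mounted filesystem.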


Phil> Just have the drive interpret an ordinary write of all 0's to any
Phil> LBA as an implicit "unmap" indication for that LBA. As long as the
Phil> drive returns all 0's when an unmapped LBA is read (and I believe
Phil> this is already a requirement) then were an application to write a
Phil> block of real data that just happens to contain all 0's, it would
Phil> still get back what it wrote.

See above.


Phil> Then you could manually trim a drive with something like

Phil> dd if=/dev/zero of=foobar bs=1024k count=10240k
Phil> rm foobar

But if the device does not detect zeroes then you'll end up:

 - transferring a bunch of useless data across the bus which will slow
   things to a grinding halt

and

 - if it's an SSD, wear out a lot of flash cells for no reason

-- 
Martin K. Petersen	Oracle Linux Engineering
