From: Christoph Hellwig <hch@infradead.org>
To: James Bottomley <James.Bottomley@suse.de>
Cc: Christoph Hellwig <hch@infradead.org>,
Matthew Wilcox <matthew@wil.cx>,
linux-scsi@vger.kernel.org, linux-ide@vger.kernel.org,
linux-kernel@vger.kernel.org, liml@rtr.ca, jens.axboe@oracle.com,
dwmw2@infradead.org
Subject: Re: [PATCH 0/7] discard support revisited
Date: Sun, 30 Aug 2009 18:48:29 -0400
Message-ID: <20090830224829.GA22682@infradead.org>
In-Reply-To: <1251663439.10135.159.camel@mulgrave.site>
On Sun, Aug 30, 2009 at 03:17:19PM -0500, James Bottomley wrote:
> > Good question. Latest I had heard was that at least one array vendor
> > prefers the WRITE SAME. To me it looks like the much saner interface
> > for the OS, so unless there are arrays that strongly prefer UNMAP or
> > we need to make use of the multiple extends feature in it I'd go with
> > WRITE SAME as first choice.
>
> So, since their respective names are on the proposals, it's no real
> secret that EMC are pushing WRITE_SAME and Netapp UNMAP, but they are
> both working together on this. I've already communicated to T10 via
> intermediaries that we'd like only a single implementation for this,
> please. However, failing that, the current situation where we know from
> an inquiry that the array supports thin provisioning, but don't know
> whether it supports WRITE_SAME or UNMAP until we get a command failure
> is unacceptable.
>
> If we could get some good solid implementation evidence that WRITE_SAME
> is much easier for an OS than UNMAP, that might help with the T10
> deliberations.
As I've recently worked on all sides of the discard battle (filesystem
support, initiator support, and target support) here are my notes:
- WRITE_SAME is extremely nice to implement for both the initiator and
  target.  It has the LBA and len in exactly the same place as normal
  16 byte commands, and the payload length is fixed to one block, which
  we can allocate once and zero so that we don't even need any memory
  allocations for this command in the initiator.
- UNMAP is a pain to implement in both initiator and target.  Not
  actually having the LBA/len information in the CDB but in the payload
  is at least a minor inconvenience in the initiator, and quite annoying
  in the target as we now need to process payload data in the fast path,
  which we otherwise only do for slow path CDBs.  This will be
  especially bad for split kernel/user target implementations.
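
To make the difference in the two notes above concrete, here is a rough
userspace sketch of how each command is laid out.  It is not code from
the patch series: the helper names and struct discard_range are made up
for illustration, and the field offsets are as I read them from the
SBC-3 drafts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WRITE_SAME_16		0x93	/* opcode */
#define WRITE_SAME_UNMAP_BIT	0x08	/* byte 1, bit 3 */
#define UNMAP_10		0x42	/* opcode */

/* Hypothetical helper type: one LBA range to discard. */
struct discard_range {
	uint64_t lba;
	uint32_t nr_blocks;
};

static void put_be16(uint8_t *p, uint16_t v)
{
	p[0] = v >> 8;
	p[1] = v & 0xff;
}

static void put_be32(uint8_t *p, uint32_t v)
{
	put_be16(p, v >> 16);
	put_be16(p + 2, v & 0xffff);
}

static void put_be64(uint8_t *p, uint64_t v)
{
	put_be32(p, v >> 32);
	put_be32(p + 4, v & 0xffffffff);
}

/*
 * WRITE SAME (16) with the unmap bit: LBA and length sit in the CDB,
 * at the same offsets as in READ (16) / WRITE (16).  The one-block
 * zeroed data-out buffer is not built here.
 */
static void build_write_same16_unmap(uint8_t cdb[16], uint64_t lba,
				     uint32_t nr_blocks)
{
	assert(cdb != NULL);
	memset(cdb, 0, 16);
	cdb[0] = WRITE_SAME_16;
	cdb[1] = WRITE_SAME_UNMAP_BIT;
	put_be64(&cdb[2], lba);
	put_be32(&cdb[10], nr_blocks);
}

/*
 * UNMAP: the CDB only carries the parameter list length.  The ranges
 * live in the data-out payload: an 8-byte header followed by one
 * 16-byte descriptor per range.  Returns the payload length, or 0 if
 * the ranges do not fit.
 */
static size_t build_unmap(uint8_t cdb[10], uint8_t *buf, size_t buf_len,
			  const struct discard_range *r, size_t nr)
{
	size_t desc_len = 16 * nr;
	size_t len = 8 + desc_len;

	assert(cdb != NULL && buf != NULL && r != NULL);
	if (nr == 0 || len > buf_len || len > 0xffff)
		return 0;

	memset(cdb, 0, 10);
	cdb[0] = UNMAP_10;
	put_be16(&cdb[7], len);			/* parameter list length */

	memset(buf, 0, len);
	put_be16(&buf[0], len - 2);		/* unmap data length */
	put_be16(&buf[2], desc_len);		/* block descriptor data length */
	for (size_t i = 0; i < nr; i++) {
		uint8_t *d = &buf[8 + 16 * i];

		put_be64(&d[0], r[i].lba);
		put_be32(&d[8], r[i].nr_blocks);
	}
	return len;
}
```

A target can act on the first command after looking at the CDB alone.
For the second it has to receive and walk the payload before it knows
which blocks are affected.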
Now the weird design of UNMAP of course has a reason (besides some
apparent pissing contest at NetApp about who can come up with the worst
possible protocol specifications, whose results can be seen in NFSv4
and iSER), and that is that it allows discarding of multiple
discontiguous ranges.  Doing so is really bad for the filesystem as
it requires it to track multiple outstanding discard requests, which
requires locking and bookkeeping to make sure we do not re-use these
blocks before they are discarded.
And at least for my target design it does not provide any measurable
benefits at all: the discard operations are mapped to a hole punch
ioctl on a filesystem, which has a constant basic overhead for each
region punched (synchronous transaction commit) and a small linear
cost per extent removed.  The only benefit of the multiple range unmap
would be a saving of protocol roundtrips.
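
As an illustration of why batching ranges buys nothing on the target
side, here is a minimal sketch of such a backend.  The message does not
name the ioctl used, so this substitutes fallocate(2) with
FALLOC_FL_PUNCH_HOLE, the generic Linux call for punching holes; that
substitution, struct byte_range and punch_ranges() are all assumptions
made for the example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>

/* Hypothetical helper type: one byte range in the backing file. */
struct byte_range {
	off_t start;
	off_t len;
};

/*
 * Punch every range out of the backing file.  Each range needs its own
 * call, so the per-region cost is paid nr times no matter whether the
 * ranges arrived in one UNMAP or in nr separate WRITE SAME commands.
 * Returns 0 on success or the negative errno of the first failure.
 */
static int punch_ranges(int fd, const struct byte_range *r, size_t nr)
{
	assert(r != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			      r[i].start, r[i].len) < 0)
			return -errno;
	}
	return 0;
}
```

Whether the backing filesystem supports hole punching at all depends on
the filesystem, so a real target would have to handle EOPNOTSUPP here.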
Interestingly, that is actually a downside, at least for my still
rather dumb target implementation with a typical Linux filesystem
workload on the initiator side.  If we actually do a lot of different
unmap operations in a single unmap command it can start to take
significant amounts of time, and because Linux frequently waits for
queue drains due to the barrier implementation we will end up waiting
for the unmap command.