From: "Alan D. Brunelle" <Alan.Brunelle@hp.com>
To: Matthew Wilcox <matthew@wil.cx>
Cc: "Knight, Frederick" <Frederick.Knight@netapp.com>,
David Woodhouse <dwmw2@infradead.org>,
ricwheeler@gmail.com, linux-fsdevel@vger.kernel.org,
Christoph Hellwig <hch@infradead.org>
Subject: Re: Thin device provisioning
Date: Wed, 13 Aug 2008 12:50:32 -0400
Message-ID: <48A310D8.4030607@hp.com>
In-Reply-To: <20080812232137.GB8618@parisc-linux.org>
Matthew Wilcox wrote:
> On Tue, Aug 12, 2008 at 04:38:48PM -0400, Knight, Frederick wrote:
>> I don't see how it doesn't match the T13 TRIM command. Both can do single
>> ranges. In both cases, you can have 1 LBA and 1 length. There is
>> nothing requiring > 1 range to be sent via the SCSI proposal. In both
>> cases, you pass the same values to the H/W driver. In one H/W driver it
>> will load a bunch of values (including the LBA/length) into a set of
>> registers (PATA) or a memory structure (SATA). In the other H/W driver,
>> it will load a bunch of values into memory structures (CDB/buffer), and
>> then tweak the H/W to send the memory structures.
>
> If you consider a SATL implemented in an array device, it can receive a
> PUNCH command with multiple ranges. It must then send multiple TRIM
> commands, one for each range.
>
> The proposal is also suboptimal if the common case is just one range. The SCSI
> driver has to allocate a 20-byte block and do a DATA OUT command.
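To make the SATL case concrete, here is a minimal C sketch, assuming a single-range TRIM command as described in this thread. The structure and function names (`struct punch_range`, `struct ata_trim_cmd`, `satl_punch_to_trims`) are hypothetical; neither the PUNCH proposal nor the T13 draft defines them. It shows one multi-range PUNCH fanning out into one TRIM per non-empty range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one (LBA, length) range carried by a multi-range PUNCH. */
struct punch_range {
    uint64_t lba;
    uint32_t nr_blocks;
};

/* Hypothetical single-range ATA TRIM: exactly one LBA and one length. */
struct ata_trim_cmd {
    uint64_t lba;
    uint32_t nr_blocks;
};

/* Translate one PUNCH into as many TRIM commands as it has non-empty
 * ranges. Returns the number of TRIMs written to 'out', or -1 if 'out'
 * has too few slots. */
int satl_punch_to_trims(const struct punch_range *ranges, size_t nr_ranges,
                        struct ata_trim_cmd *out, size_t out_cap)
{
    assert(ranges != NULL || nr_ranges == 0);
    assert(out != NULL || out_cap == 0);

    size_t n = 0;
    for (size_t i = 0; i < nr_ranges; i++) {
        if (ranges[i].nr_blocks == 0)
            continue;               /* nothing to deallocate */
        if (n == out_cap)
            return -1;              /* caller must supply more slots */
        out[n].lba = ranges[i].lba;
        out[n].nr_blocks = ranges[i].nr_blocks;
        n++;                        /* one TRIM per PUNCH range */
    }
    return (int)n;
}
```

The point of the sketch is the cost Matthew describes: the number of commands on the ATA side grows with the number of ranges, whereas the single-range case maps one-to-one.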
>
>> Most SCSI drivers I've seen that have tagged queuing enabled turn off
>> their elevator algorithms (since the drive itself is doing its own
>> optimizations)
>
> In Linux, we try not to have elevators in the device drivers themselves
> (though I believe there are still a few which have their own). Instead we
> have an elevator in the block layer where typically we have much more
> information about which IOs can be merged and which IOs cannot pass
> each other, which OS process submitted the IO (and hence can do fair
> scheduling between different users) and so on.
>
> Each request queue (~= SCSI LUN) can choose which elevator controls its
> behaviour, so if it works out better to have the drive do the scheduling,
> it can be disabled by switching to the noop elevator.
This is not completely true: the generic elevator code still attempts some
merges, and the NOOP I/O scheduler also performs a primitive sort.
Recent kernels have the "nomerges" tunable added under
/sys/block/*/queue which can turn off the more complicated merge
attempts (for any scheduler).
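A minimal C sketch of setting both knobs from a program, assuming a kernel of this era where the no-op scheduler is named "noop" (current blk-mq kernels call it "none") and where writing "1" to nomerges turns off the more complicated merge attempts, as described above. The helper names are hypothetical; the paths are the /sys/block/<dev>/queue/ attributes. Writing them normally requires root.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "/sys/block/<dev>/queue/<attr>" into buf.
 * Returns 0 on success, -1 if buf is too small. */
int queue_attr_path(char *buf, size_t len, const char *dev, const char *attr)
{
    assert(buf != NULL && dev != NULL && attr != NULL);
    int n = snprintf(buf, len, "/sys/block/%s/queue/%s", dev, attr);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a value to a sysfs attribute. Returns 0 on success, -1 on error
 * (attribute missing, no permission, or value rejected by the kernel). */
int write_attr(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fputs(value, f) >= 0;
    /* stdio buffers the write, so a rejected value surfaces at fclose(). */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Leave scheduling to the device: select the noop elevator, then limit
 * the block layer to simple merges (nomerges = 1). */
int prefer_device_scheduling(const char *dev)
{
    char path[256];
    if (queue_attr_path(path, sizeof path, dev, "scheduler") != 0 ||
        write_attr(path, "noop") != 0)
        return -1;
    if (queue_attr_path(path, sizeof path, dev, "nomerges") != 0 ||
        write_attr(path, "1") != 0)
        return -1;
    return 0;
}
```

From a shell the same effect is simply echoing "noop" and "1" into those two files; the C version only makes the error handling explicit.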
>
>> There is no difference at the filesystem de-allocator level. The only
>> difference is how the H/W sends the values to the other end of the wire,
>> and there will always be differences at that layer.
>
> I think Dave's point is that batching all the discards together into one
> list isn't a natural interface for a filesystem; they prefer an
> interface which is a single extent.
Is it expected that the file system code would emit PUNCH directives in
"specially marked" struct bios through the block I/O storage system?
Then the I/O schedulers would be responsible for discriminating between
PUNCH bios and "normal" read/write bios when merging (and sorting?).
In either case, would the block I/O layer then build "specially marked"
PUNCH requests to the underlying physical drivers?
Alan