From: Jeff Garzik <jeff@garzik.org>
To: Boaz Harrosh <bharrosh@panasas.com>
Cc: Matthew Wilcox <matthew@wil.cx>, Hugh Dickins <hugh@veritas.com>,
Matthew Wilcox <willy@linux.intel.com>,
linux-ide@vger.kernel.org, linux-kernel@vger.kernel.org,
Jeff Garzik <jgarzik@redhat.com>,
linux-scsi@vger.kernel.org, Jens Axboe <jens.axboe@oracle.com>,
Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>,
Mark Lord <lkml@rtr.ca>
Subject: Re: New TRIM/UNMAP tree published (2009-05-02)
Date: Sun, 03 May 2009 14:34:35 -0400
Message-ID: <49FDE3BB.505@garzik.org>
In-Reply-To: <49FDC786.6070309@panasas.com>
Boaz Harrosh wrote:
> On 05/03/2009 06:42 PM, Matthew Wilcox wrote:
>> On Sun, May 03, 2009 at 06:02:51PM +0300, Boaz Harrosh wrote:
>>> I agree with Hugh. The allocation is done too low in the food chain.
>>> (And that free of buffer at upper layer allocated by lower layer).
>>>
>>> I think you need to separate the: "does lld need buffer, what size"
>>> from the "here is buffer prepare", so upper layer that can sleep does
>>> sleep.
>> So you want two function pointers in the request queue relating to discard?
>>
>
> OK I don't know what I want, I guess. ;-)
>
> I'm not a block-device expert but from the small osdblk device I maintain
> it looks like osdblk_prepare_flush which is set into:
> blk_queue_ordered(q, QUEUE_ORDERED_DRAIN_FLUSH, osdblk_prepare_flush);
>
> does some internal structure setup, but the actual flush command is only executed
> later in the global osdblk_rq_fn which is set into:
> blk_init_queue(osdblk_rq_fn, &osdev->lock);
>
> But I'm not even sure that prepare_flush is called in a better context than
> queue_fn, and what it means to let block devices take care of another
> new command type at queue_fn.
>
> I guess it comes back to Jeff Garzik's comment about not having a central
> place to ask the request what we need to do.
>
> But I do hate that allocation is done by driver and free by mid-layer,
> so yes two vectors, request_queue is allocated once per device it's not
> that bad. And later when Jeff's comment is addressed it can be removed.

May I presume you are referring to the following osdblk.c comment?

	/* deduce our operation (read, write, flush) */

	/* I wish the block layer simplified
	 * cmd_type/cmd_flags/cmd[]
	 * into a clearly defined set of RPC commands:
	 * read, write, flush, scsi command, power mgmt req,
	 * driver-specific, etc.
	 */

Yes, the task of figuring out -what to do- in the queue's request
function is quite complex, and discard makes it even more so.

The API makes life difficult -- you have to pass temporary info to
yourself in ->prepare_flush_fn() and ->prepare_discard_fn(), and the
overall sum is a bewildering collection of opcodes, flags, and internal
driver notes to itself.

Add to this yet another prep function, ->prep_rq_fn().

It definitely sucks, especially with regards to failed atomic
allocations... but I think fixing this is quite a bit more than Matthew
is probably willing to tackle ;-)

My ideal block layer interface would be a lot more opcode-based, e.g.

(1) create REQ_TYPE_DISCARD

(2) determine at init if queue (a) supports explicit DISCARD and/or (b)
    supports DISCARD flag passed with READ or WRITE

(3) when creating a discard request, use block helpers w/ queue-specific
    knowledge to create either
    (a) one request, REQ_TYPE_FS, with discard flag or
    (b) two requests, REQ_TYPE_FS followed by REQ_TYPE_DISCARD

(4) blkdev_issue_discard() would function like an empty barrier, and
    unconditionally create REQ_TYPE_DISCARD.
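
To make steps (2) through (4) concrete, here is a rough userspace sketch in C. It is not kernel code and not an existing API: the capability bits, REQ_FLAG_DISCARD, struct queue, and both helper names are made up for illustration, and REQ_TYPE_DISCARD is the proposed type rather than one the block layer has today. Preferring the flag form when a queue advertises both capabilities, and falling back to a plain request when it advertises neither, are assumptions of the sketch.

```c
#include <stddef.h>

/* Hypothetical request types; REQ_TYPE_DISCARD is the proposed addition. */
enum req_type { REQ_TYPE_FS, REQ_TYPE_DISCARD };

/* Hypothetical flag: the discard rides along with a READ or WRITE. */
#define REQ_FLAG_DISCARD        0x1u

/* Hypothetical capabilities a queue would declare once, at init (step 2). */
#define QUEUE_CAP_DISCARD_CMD   0x1u  /* (a) explicit DISCARD supported */
#define QUEUE_CAP_DISCARD_FLAG  0x2u  /* (b) DISCARD flag on READ/WRITE */

struct request {
	enum req_type type;
	unsigned int flags;
	unsigned long long sector;
	unsigned int nr_sectors;
};

struct queue {
	unsigned int caps;
};

/*
 * Step 3: the helper, not the driver, decides between one flagged
 * REQ_TYPE_FS request and an FS request followed by REQ_TYPE_DISCARD.
 * Returns the number of entries filled in out[].
 */
size_t build_fs_with_discard(const struct queue *q, unsigned long long sector,
			     unsigned int nr_sectors, struct request out[2])
{
	out[0] = (struct request){ .type = REQ_TYPE_FS, .flags = 0,
				   .sector = sector, .nr_sectors = nr_sectors };

	if (q->caps & QUEUE_CAP_DISCARD_FLAG) {
		out[0].flags |= REQ_FLAG_DISCARD;	/* case (a): one request */
		return 1;
	}
	if (q->caps & QUEUE_CAP_DISCARD_CMD) {
		out[1] = (struct request){ .type = REQ_TYPE_DISCARD, .flags = 0,
					   .sector = sector,
					   .nr_sectors = nr_sectors };
		return 2;				/* case (b): two requests */
	}
	return 1;	/* assumption: no discard support, plain FS request */
}

/* Step 4: a standalone discard is always an explicit REQ_TYPE_DISCARD. */
struct request build_discard(unsigned long long sector, unsigned int nr_sectors)
{
	return (struct request){ .type = REQ_TYPE_DISCARD, .flags = 0,
				 .sector = sector, .nr_sectors = nr_sectors };
}
```

The point of the sketch is that the queue's capabilities are consulted when the request is built, so nothing has to be allocated or stashed later on the driver's behalf.
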

This type of setup would require NO prepare_discard command, as all
knowledge would be passed directly to ->prep_rq_fn() and ->request_fn().

And to tangent a bit... I feel barriers should be handled in exactly
the same way. Create REQ_TYPE_FLUSH, which would be issued for above
examples #2a and #4, if the queue is set up that way.

All this MINIMIZES the amount of information a driver must "pass to
itself", by utilizing existing ->prep_rq_fn() and ->request_fn() pathways.

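
As a rough illustration of what the driver side might then look like, here is a small userspace sketch in C. It is only a sketch under assumptions: REQ_TYPE_FLUSH and REQ_TYPE_DISCARD are the proposed types, while enum dev_cmd and toy_request_fn() are hypothetical names standing in for whatever a driver's ->request_fn() would build and issue.

```c
/* Hypothetical request types; FLUSH and DISCARD are the proposed additions. */
enum req_type { REQ_TYPE_FS, REQ_TYPE_FLUSH, REQ_TYPE_DISCARD };

/* Hypothetical device commands a toy driver could issue. */
enum dev_cmd { DEV_CMD_NONE, DEV_CMD_RW, DEV_CMD_FLUSH_CACHE, DEV_CMD_TRIM };

struct request {
	enum req_type type;
	unsigned long long sector;
	unsigned int nr_sectors;
};

/*
 * Stand-in for a driver's request function: the request type alone says
 * what to do, so there is no state left behind by a prepare_flush or
 * prepare_discard hook to look up. A real driver would build and issue
 * the command here instead of returning a code.
 */
enum dev_cmd toy_request_fn(const struct request *rq)
{
	switch (rq->type) {
	case REQ_TYPE_FS:
		return DEV_CMD_RW;		/* ordinary read or write */
	case REQ_TYPE_FLUSH:
		return DEV_CMD_FLUSH_CACHE;	/* barrier became a plain opcode */
	case REQ_TYPE_DISCARD:
		return DEV_CMD_TRIM;		/* TRIM/UNMAP for the range */
	}
	return DEV_CMD_NONE;			/* unknown type: issue nothing */
}
```
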
Jeff