From: Jeff Garzik <jeff@garzik.org>
To: Boaz Harrosh <bharrosh@panasas.com>
Cc: Matthew Wilcox <matthew@wil.cx>, Hugh Dickins <hugh@veritas.com>,
Matthew Wilcox <willy@linux.intel.com>,
linux-ide@vger.kernel.org, linux-kernel@vger.kernel.org,
Jeff Garzik <jgarzik@redhat.com>,
linux-scsi@vger.kernel.org, Jens Axboe <jens.axboe@oracle.com>,
Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>,
Mark Lord <lkml@rtr.ca>
Subject: Re: New TRIM/UNMAP tree published (2009-05-02)
Date: Sun, 03 May 2009 14:34:35 -0400
Message-ID: <49FDE3BB.505@garzik.org>
In-Reply-To: <49FDC786.6070309@panasas.com>
Boaz Harrosh wrote:
> On 05/03/2009 06:42 PM, Matthew Wilcox wrote:
>> On Sun, May 03, 2009 at 06:02:51PM +0300, Boaz Harrosh wrote:
>>> I agree with Hugh. The allocation is done too low in the food chain.
>>> (And the buffer allocated by the lower layer is freed at the upper layer).
>>>
>>> I think you need to separate the: "does lld need buffer, what size"
>>> from the "here is buffer prepare", so upper layer that can sleep does
>>> sleep.
>> So you want two function pointers in the request queue relating to discard?
>>
>
> OK I don't know what I want, I guess. ;-)
>
> I'm not a block-device expert but from the small osdblk device I maintain
> it looks like osdblk_prepare_flush which is set into:
> blk_queue_ordered(q, QUEUE_ORDERED_DRAIN_FLUSH, osdblk_prepare_flush);
>
> does some internal structure setup, but the actual flush command is only executed
> later in the global osdblk_rq_fn which is set into:
> blk_init_queue(osdblk_rq_fn, &osdev->lock);
>
> But I'm not even sure that prepare_flush is called in a better context than
> queue_fn, or what it means to let block devices take care of another
> new command type at queue_fn.
>
> I guess it comes back to Jeff Garzik's comment about not having a central
> place to ask the request what we need to do.
>
> But I do hate that allocation is done by the driver and the free by the
> mid-layer, so yes, two vectors; request_queue is allocated once per device,
> so it's not that bad. And later when Jeff's comment is addressed it can be removed.
May I presume you are referring to the following osdblk.c comment?
	/* deduce our operation (read, write, flush) */
	/* I wish the block layer simplified
	 * cmd_type/cmd_flags/cmd[]
	 * into a clearly defined set of RPC commands:
	 * read, write, flush, scsi command, power mgmt req,
	 * driver-specific, etc.
	 */
Yes, the task of figuring out -what to do- in the queue's request
function is quite complex, and discard makes it even more so.
The API makes life difficult -- you have to pass temporary info to
yourself in ->prepare_flush_fn() and ->prepare_discard_fn(), and the
overall sum is a bewildering collection of opcodes, flags, and internal
driver notes to itself.
Add to this yet another prep function, ->prep_rq_fn().
It definitely sucks, especially with regard to failed atomic
allocations... but I think fixing this is quite a bit more than Matthew
is probably willing to tackle ;-)
My ideal block layer interface would be a lot more opcode-based, e.g.
(1) create REQ_TYPE_DISCARD
(2) determine at init if queue (a) supports explicit DISCARD and/or (b)
    supports DISCARD flag passed with READ or WRITE
(3) when creating a discard request, use block helpers w/ queue-specific
    knowledge to create either
	(a) one request, REQ_TYPE_FS, with discard flag or
	(b) two requests, REQ_TYPE_FS followed by REQ_TYPE_DISCARD
(4) blkdev_issue_discard() would function like an empty barrier, and
    unconditionally create REQ_TYPE_DISCARD.
This type of setup would require NO prepare_discard command, as all
knowledge would be passed directly to ->prep_rq_fn() and ->request_fn().
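To make steps (2) through (4) a bit more concrete, here is a rough userspace sketch in C. Every name in it (sketch_queue, sketch_request, sketch_build_discard(), the SKETCH_* constants) is hypothetical and made up for illustration; none of it is existing block layer API. It also assumes two things the list above does not specify: that the flag form is preferred when a queue supports both forms, and that the discard hint is silently dropped when a queue supports neither.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes: stand-ins for REQ_TYPE_FS and the proposed REQ_TYPE_DISCARD. */
enum sketch_req_type { SKETCH_REQ_FS, SKETCH_REQ_DISCARD };

/* Hypothetical request flag: "discard rides along with this READ/WRITE". */
#define SKETCH_RQ_DISCARD_FLAG    0x1u

/* Step 2: capabilities determined once, at queue init. */
#define SKETCH_Q_EXPLICIT_DISCARD 0x1u  /* (a) queue takes a separate DISCARD request */
#define SKETCH_Q_DISCARD_FLAG     0x2u  /* (b) queue takes a flag on READ/WRITE */

struct sketch_queue {
	unsigned int discard_caps;
};

struct sketch_request {
	enum sketch_req_type type;
	unsigned int flags;
	unsigned long long sector;
	unsigned int nr_sectors;
};

/*
 * Step 3: helper with queue-specific knowledge.  Fills out[0] (and possibly
 * out[1]) and returns how many requests were created.
 * Assumption: prefer the flag form if both forms are supported.
 * Assumption: if neither is supported, emit a plain FS request and drop the hint.
 */
size_t sketch_build_discard(const struct sketch_queue *q,
			    unsigned long long sector, unsigned int nr_sectors,
			    struct sketch_request out[2])
{
	assert(q != NULL && out != NULL);

	out[0] = (struct sketch_request){
		.type = SKETCH_REQ_FS, .flags = 0,
		.sector = sector, .nr_sectors = nr_sectors,
	};

	if (q->discard_caps & SKETCH_Q_DISCARD_FLAG) {
		out[0].flags |= SKETCH_RQ_DISCARD_FLAG;	/* case (3a) */
		return 1;
	}
	if (q->discard_caps & SKETCH_Q_EXPLICIT_DISCARD) {
		out[1] = out[0];			/* case (3b) */
		out[1].type = SKETCH_REQ_DISCARD;
		return 2;
	}
	return 1;
}

/* Step 4: the blkdev_issue_discard() analogue always creates a DISCARD request. */
struct sketch_request sketch_issue_discard(unsigned long long sector,
					   unsigned int nr_sectors)
{
	return (struct sketch_request){
		.type = SKETCH_REQ_DISCARD, .flags = 0,
		.sector = sector, .nr_sectors = nr_sectors,
	};
}
```

The point of the sketch is only that the decision is made once, from queue capabilities, at request creation time, so the driver never has to stash notes for itself in a separate prepare hook.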
And to tangent a bit... I feel barriers should be handled in exactly
the same way. Create REQ_TYPE_FLUSH, which would be issued for above
examples #2a and #4, if the queue is set up that way.
All this MINIMIZES the amount of information a driver must "pass to
itself", by utilizing existing ->prep_rq_fn() and ->request_fn() pathways.
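On the driver side, the payoff would look roughly like the following. Again this is only a userspace sketch with hypothetical names (sketch_op, sketch_rq, sketch_dev, sketch_request_fn()); it is not osdblk code and not the real ->request_fn() signature. The counters stand in for whatever command the hardware would actually be sent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "clearly defined set of commands" a request could carry. */
enum sketch_op {
	SKETCH_OP_READ,
	SKETCH_OP_WRITE,
	SKETCH_OP_FLUSH,
	SKETCH_OP_DISCARD,
};

struct sketch_rq {
	enum sketch_op op;
	unsigned long long sector;
	unsigned int nr_sectors;
};

/* Hypothetical device: counters stand in for issuing real hardware commands. */
struct sketch_dev {
	unsigned int reads, writes, flushes, discards;
};

/*
 * Request function analogue: everything needed to decide what to do is in the
 * request itself, so there is no state left over from a prepare_flush or
 * prepare_discard step.  Returns 0 on success, -1 for an unknown opcode.
 */
int sketch_request_fn(struct sketch_dev *dev, const struct sketch_rq *rq)
{
	assert(dev != NULL && rq != NULL);

	switch (rq->op) {
	case SKETCH_OP_READ:
		dev->reads++;
		return 0;
	case SKETCH_OP_WRITE:
		dev->writes++;
		return 0;
	case SKETCH_OP_FLUSH:
		dev->flushes++;
		return 0;
	case SKETCH_OP_DISCARD:
		dev->discards++;
		return 0;
	}
	return -1;	/* opcode this driver does not understand */
}
```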
Jeff