All of lore.kernel.org
 help / color / mirror / Atom feed
From: Boaz Harrosh <bharrosh@panasas.com>
To: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>,
	jens.axboe@oracle.com, hch@infradead.org
Cc: James.Bottomley@SteelEye.com, tomof@acm.org, akpm@osdl.org,
	michaelc@cs.wisc.edu, linux-scsi@vger.kernel.org,
	bhalevy@panasas.com
Subject: Re: [PATCH 4/4] bidi support: bidirectional request
Date: Thu, 03 May 2007 20:42:44 +0300	[thread overview]
Message-ID: <463A1F14.9080309@panasas.com> (raw)
In-Reply-To: <20070502043754N.fujita.tomonori@lab.ntt.co.jp>

FUJITA Tomonori wrote:
> From: Jens Axboe <jens.axboe@oracle.com>
> Subject: Re: [PATCH 4/4] bidi support: bidirectional request
> Date: Mon, 30 Apr 2007 13:11:57 +0200
> 
>> On Sun, Apr 29 2007, James Bottomley wrote:
>>> On Sun, 2007-04-29 at 18:48 +0300, Boaz Harrosh wrote:
>>>> FUJITA Tomonori wrote:
>>>>> From: Boaz Harrosh <bharrosh@panasas.com>
>>>>> Subject: [PATCH 4/4] bidi support: bidirectional request
>>>>> Date: Sun, 15 Apr 2007 20:33:28 +0300
>>>>>
>>>>>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>>>>>> index 645d24b..16a02ee 100644
>>>>>> --- a/include/linux/blkdev.h
>>>>>> +++ b/include/linux/blkdev.h
>>>>>> @@ -322,6 +322,7 @@ struct request {
>>>>>>      void *end_io_data;
>>>>>>
>>>>>>      struct request_io_part uni;
>>>>>> +    struct request_io_part bidi_read;
>>>>>>  };
>>>>> Would be more straightforward to have:
>>>>>
>>>>> struct request_io_part in;
>>>>> struct request_io_part out;
>>>>>
>>>> Yes I wish I could do that. For bidi supporting drivers this is the most logical.
>>>> But for the 99.9% of uni-directional drivers, calling rq_uni(), and being some what on
>>>> the hotish paths, this means we will need a pointer to a uni request_io_part.
>>>> This is bad because:
>>>> 1st- There is no defined stage in a request life where to definitely set that pointer,
>>>>      specially in the preparation stages.
>>>> 2nd- hacks like scsi_error.c/scsi_send_eh_cmnd() will not work at all. Now this is a
>>>>      very bad spot already, and I have a short term fix for it in the SCSI-bidi patches
>>>>      (not sent yet) but a more long term solution is needed. Once such hacks are
>>>>      cleaned up we can do what you say. This is exactly why I use the access functions
>>>>      rq_uni/rq_io/rq_in/rq_out and not open code access.
>>> I'm still not really convinced about this approach.  The primary job of
>>> the block layer is to manage and merge READ and WRITE requests.  It
>>> serves a beautiful secondary function of queueing for arbitrary requests
>>> it doesn't understand (REQ_TYPE_BLOCK_PC or REQ_TYPE_SPECIAL ... or
>>> indeed any non REQ_TYPE_FS).
>>>
>>> bidirectional requests fall into the latter category (there's nothing
>>> really we can do to merge them ... they're just transported by the block
>>> layer).  The only unusual feature is that they carry two bios.  I think
>>> the drivers that actually support bidirectional will be a rarity, so it
>>> might even be advisable to add it to the queue capability (refuse
>>> bidirectional requests at the top rather than perturbing all the drivers
>>> to process them).
>>>
>>> So, what about REQ_TYPE_BIDIRECTIONAL rather than REQ_BIDI?  That will
>>> remove it from the standard path and put it on the special command type
>>> path where we can process it specially.  Additionally, if you take this
>>> approach, you can probably simply chain the second bio through
>>> req->special as an additional request in the stream.  The only thing
>>> that would then need modification would be the dequeue of the block
>>> driver (it would have to dequeue both requests and prepare them) and
>>> that needs to be done only for drivers handling bidirectional requests.
>> I agree, I'm really not crazy about shuffling the entire request setup
>> around just for something as exotic as bidirection commands. How about
>> just keeping it simple - have a second request linked off the first one
>> for the second data phase? So keep it completely seperate, not just
>> overload ->special for 2nd bio list.
>>
>> So basically just add a struct request pointer, so you can do rq =
>> rq->next_rq or something for the next data phase. I bet this would be a
>> LOT less invasive as well, and we can get by with a few helpers to
>> support it.
> 
> This patch tried this approach. It's just for seeing how it works.
> 
> I added bidi support to open-iscsi and bsg and tested this patch
> lightly. I've attached only a patch for the block layer and scsl-ml.
> You can get all the patches are:
> 
> http://www.kernel.org/pub/linux/kernel/people/tomo/bidi
> 
> If we go with this approach, we need just minor changes to the block
> layer. The overloading rq->special approach needs more but it's
> reasonable too.
> 
> I need to add the proper error handling code, which might be a bit
> tricky, but I think that it will not be so complicated.
> 
> 
>> And it should definitely be a request type.
> 
> I'm not sure about this. I think that bidi can't be a request type to
> trace bidi pc requests (we have bidi special requests like SMP). I use
> REQ_BIDI though I've not implemented bidi trace code.
> 
> 
> From 7d278323ff8aad86fb82c823538f7ddfb6ded11c Mon Sep 17 00:00:00 2001
> From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
> Date: Wed, 2 May 2007 03:55:56 +0900
> Subject: [PATCH] add bidi support
> 
> Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
> ---
>  block/ll_rw_blk.c        |    1 +
>  drivers/scsi/scsi_lib.c  |   72 +++++++++++++++++++++++++++++++++++++++-------
>  include/linux/blkdev.h   |    7 ++++
>  include/scsi/scsi_cmnd.h |    9 ++++++
>  4 files changed, 78 insertions(+), 11 deletions(-)
> 

Well what can I say guys. Just that I told you so.
After 6 month you all agree on my original solution. Back than I
wouldn't even dream of touching block layer and struct request.
The 2 requests approach seemed perfectly fine. But people laughed
that it is maybe good for out-of-tree debugging but I must do
a proper support at the request layer. Even then, I say "ok but
can we do it very ugly but safe on the side?, not to touch any
legacy code." "No!" people say "do one for in one for out".
So what should I understand for next time? a "Proper bidi support
at the block layer" means we'll let you have a bad named pointer
so you don't have to overload req->end_io_data at the scsi-ml level.
Thanks I could use req->data for that.

Tomo's code is a perfectly valid approach, but it is certainly not
finished. I have some comments and questions below.
If anyone cares I put up a patcheset that I've tested to be stable.
it is here:
http://www.bhalevy.com/open-osd/download/double_request_bidi/

1. I'm not very experienced with blk_map_user but there is some
   asymmetry at SCSI-ml between handling of the first WRITE request
   and second READ request. At scsi_end_request() there is the
   end_that_request_chunk() calls. Now from my experience
   with kernel buffers (I am using sglists through scsi_req_map_sg())
   not calling end_that_request_chunk() on the second request will leak
   the BIOs. Maybe blk_{,un}map_user() does not care I'm not sure. In my
   SCSI-ml patch I have hacked up a scsi_end_request_block() to free
   the BIOs unconditionally, since it is what I want to do. But calling
   end_that_request_chunk() with the right size should also do the trick.
   Also at scsi_release_buffers() you better NULL the pointers and check
   for not NULL as scsi_release_buffers() can be called multiple times now,
   and on half prepared commands, in the error handling paths.
   (See my patch: 0002-scsi-midlayer-bidirectional-commands.patch)

2. At struct request the blk_bidi_rq() should just be:
   #define blk_bidi_rq(rq)		((rq)->next_rq != NULL)
   The flag gives us nothing but an extra BUG_ON. In fact we don't
   need the flag at all.
   And the name next_rq? are we intending to have a chain of these?
   What's bad with plain bidi_read?

3. On the other hand we need a new flag REQ_IO_ONLY, to stick on
   the Second READ request. So block layer can avoid doing any
   extra work on IO_ONLY requests. For now it does not matter
   since BIDI devices like OSD do not get any regular FS traffic.
   But when they will, the Second READ request might confuse the
   elevator handling.
   (See: 0003-out-of-tree-block-bidi-debugging.patch for what I mean)

Right now I'm not sure how to proceed. You can see my current patchset.
if any of it you like it's there. If any of it you don't like I can
change them. Any questions I will answer But I think I'll go crawl
under a rock for a while before I make a fool of myself...
oops is that to late?;)

Boaz


  reply	other threads:[~2007-05-03 18:43 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-04-15 17:17 [PATCH 0/4] bidi support: block layer bidirectional io Boaz Harrosh
2007-04-15 17:25 ` [PATCH 1/4] bidi support: request dma_data_direction Boaz Harrosh
2007-04-15 17:31 ` [PATCH 2/4] bidi support: fix req->cmd == INT cases Boaz Harrosh
2007-04-15 17:32 ` [PATCH 3/4] bidi support: request_io_part Boaz Harrosh
2007-04-29 15:49   ` Boaz Harrosh
2007-04-15 17:33 ` [PATCH 4/4] bidi support: bidirectional request Boaz Harrosh
2007-04-28 19:48   ` FUJITA Tomonori
2007-04-29 15:48     ` Boaz Harrosh
2007-04-29 18:49       ` James Bottomley
2007-04-30 11:11         ` Jens Axboe
2007-04-30 11:53           ` Benny Halevy
2007-04-30 11:59             ` Jens Axboe
2007-04-30 14:52               ` Douglas Gilbert
2007-04-30 14:51                 ` Jens Axboe
2007-04-30 15:12                   ` Benny Halevy
2007-05-01 18:22                   ` Boaz Harrosh
2007-05-01 18:57                     ` Jens Axboe
2007-05-01 19:01                       ` FUJITA Tomonori
2007-04-30 13:05           ` Mark Lord
2007-04-30 13:07             ` Jens Axboe
2007-05-01 19:50           ` FUJITA Tomonori
2007-05-03 17:42             ` Boaz Harrosh [this message]
2007-05-04 13:55               ` FUJITA Tomonori
2007-04-16 18:03 ` [PATCH 0/4] bidi support: block layer bidirectional io Douglas Gilbert
2007-04-17  9:32   ` Boaz Harrosh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=463A1F14.9080309@panasas.com \
    --to=bharrosh@panasas.com \
    --cc=James.Bottomley@SteelEye.com \
    --cc=akpm@osdl.org \
    --cc=bhalevy@panasas.com \
    --cc=fujita.tomonori@lab.ntt.co.jp \
    --cc=hch@infradead.org \
    --cc=jens.axboe@oracle.com \
    --cc=linux-scsi@vger.kernel.org \
    --cc=michaelc@cs.wisc.edu \
    --cc=tomof@acm.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.