All of lore.kernel.org
 help / color / mirror / Atom feed
From: Paolo Bonzini <pbonzini@redhat.com>
To: Boaz Harrosh <bharrosh@panasas.com>
Cc: Wang Sen <senwang@linux.vnet.ibm.com>,
	linux-scsi@vger.kernel.org, JBottomley@parallels.com,
	stefanha@linux.vnet.ibm.com, mc@linux.vnet.ibm.com,
	linux-kernel@vger.kernel.org,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	Rusty Russell <rusty@rustcorp.com.au>
Subject: virtio(-scsi) vs. chained sg_lists (was Re: [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list)
Date: Wed, 25 Jul 2012 16:17:56 +0200	[thread overview]
Message-ID: <50100014.2010109@redhat.com> (raw)
In-Reply-To: <500FF412.3090600@panasas.com>

Il 25/07/2012 15:26, Boaz Harrosh ha scritto:
>>> In SCSI land most LLDs should support chaining just by virtu of using the
>>> for_each_sg macro. That all it takes. Your code above does support it.
>>
>> Yes, it supports it but still has to undo them before passing to virtio.
>>
>> What my LLD does is add a request descriptor in front of the scatterlist
>> that the LLD receives.  I would like to do this with a 2-item
>> scatterlist: one for the request descriptor, and one which is a chain to
>> the original scatterlist.  
> 
> I hate that plan. Why yet override the scatter element yet again with a third
> union of a "request descriptor"

I'm not overriding (or did you mean overloading?) anything, and I think
you're reading too much in my words.

What I am saying is (for a WRITE command):

1) what I get is a scsi_cmnd which contains an N-element scatterlist.

2) virtio-scsi has to build the "packet" that is passed to the hardware
(it does not matter that the hardware is virtual).  This packet (per
virtio-scsi spec) has an N+1-element scatterlist, where the first
element is a request descriptor (struct virtio_scsi_cmd_req), and the
others describe the written data.

3) virtio takes care of converting the "packet" from a scatterlist
(which currently must be a flat one) to the hardware representation.
Here a walk is inevitable, so we don't care about this walk.

4) What I'm doing now: copying (and flattening) the N-element
scatterlist onto the last elements of an N+1 array that I pass to virtio.

          _ _ _ _ _ _
         |_|_|_|_|_|_|  scsi_cmnd scatterlist

         vvv COPY vvv
        _ _ _ _ _ _ _
       |_|_|_|_|_|_|_|  scatterlist passed to virtio
        |
    virtio_scsi_cmd_req

Then I hand off the scatterlist to virtio.  virtio walks it and converts
it to hardware format.

5) What I want to do: create a 2-element scatterlist, the first being
the request descriptor and the second chaining to scsi_cmnd's N-element
scatterlist.

             _ _ _ _ _ _
            |_|_|_|_|_|_|  scsi_cmnd scatterlist
        _ _/
       |_|C|               scatterlist passed to virtio
        |
    virtio_scsi_cmd_req

Then I hand off the scatterlist to virtio.  virtio will still walk the
scatterlist chain, and convert it to N+1 elements for the hardware to
consume.  Still, removing one walk largely reduces the length of my
critical sections.  I also save some pointer-chasing because the
2-element scatterlist are short-lived and can reside on the stack.


Other details (you can probably skip these):

There is also a response descriptor.  In the case of writes this is the
only element that the hardware will write to, so in the case of writes
the "written by hardware" scatterlist has 1 element only and does not
need chaining.

Reads are entirely symmetric.  The hardware will read the request
descriptor from a 1-element scatterlist, and will write response+data
into an N+1-element scatterlist (the response descriptor precedes the
data that was read).  It can be treated in exactly the same way.  The
N+1-element scatterlist could also become a 2-element scatterlist with
chaining.

>> Except that if I call sg_chain and my
>> architecture does not define ARCH_HAS_SG_CHAIN, it will BUG out.  So I
>> need to keep all the scatterlist allocation and copying crap that I have
>> now within #ifdef, and it will bitrot.
> 
> except that with the correct design you don't call sg_chain you just do:
> 	request_descriptor->sg_list = sg;

By the above it should be clear, that request_descriptor is not a
driver-private extension of the scsi_cmnd.  It is something passed to
the hardware.

>>> Well that can be done as well, (If done carefully) It should be easy to add
>>> chained fragments to both the end of a given chain just as at beginning.
>>> It only means that the last element of the appended-to chain moves to
>>> the next fragment and it's place is replaced by a link.
>>
>> But you cannot do that in constant time, can you?  And that means you do
>> not enjoy any benefit in terms of cache misses etc.
> 
> I did not understand "constant time" it is O(0) if that what you meant.

In the worst case it is a linked list, no?  So in the worst case
_finding_ the last element of the appended-to chain is O(n).

Actually, appending to the end is not a problem for virtio-scsi.  But
for example the virtio-blk spec places the response descriptor _after_
the input data.  I think this was a mistake, and I didn't repeat it for
virtio-scsi, but I cited it because in the past Rusty complained that
the long sglist implementation was "SCSI-centric".

>> Also, this assumes that I can modify the appended-to chain.  I'm not
>> sure I can do this?
> 
> Each case it's own. If the appended-to chain is const, yes you'll have
> to reallocate it and copy. Is that your case?

It will be virtio-blk's case, but we can ignore it.

Paolo

  parent reply	other threads:[~2012-07-25 14:18 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-07-25  8:29 [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list Wang Sen
2012-07-25  8:44 ` Paolo Bonzini
2012-07-25  9:22   ` Boaz Harrosh
2012-07-25  9:22     ` Boaz Harrosh
2012-07-25  9:41     ` Paolo Bonzini
2012-07-25 12:34       ` Boaz Harrosh
2012-07-25 12:34         ` Boaz Harrosh
2012-07-25 12:49         ` Paolo Bonzini
2012-07-25 13:26           ` Boaz Harrosh
2012-07-25 13:26             ` Boaz Harrosh
2012-07-25 13:36             ` Paolo Bonzini
2012-07-25 14:36               ` Boaz Harrosh
2012-07-25 14:36                 ` Boaz Harrosh
2012-07-25 15:09                 ` performance improvements for the sglist API (Re: [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list) Paolo Bonzini
2012-07-25 15:16                   ` Paolo Bonzini
2012-07-25 14:17             ` Paolo Bonzini [this message]
2012-07-25 15:28               ` virtio(-scsi) vs. chained sg_lists (was " Boaz Harrosh
2012-07-25 15:28                 ` Boaz Harrosh
2012-07-25 17:43                 ` Paolo Bonzini
2012-07-25 19:16                   ` Boaz Harrosh
2012-07-25 19:16                     ` Boaz Harrosh
2012-07-25 20:06                     ` Paolo Bonzini
2012-07-25 21:04                       ` Boaz Harrosh
2012-07-25 21:04                         ` Boaz Harrosh
2012-07-26  7:23                         ` Paolo Bonzini
2012-07-26  7:56                           ` Boaz Harrosh
2012-07-26  7:56                             ` Boaz Harrosh
2012-07-26  7:58                             ` Paolo Bonzini
2012-07-26 13:05                               ` Paolo Bonzini
2012-07-27  6:27                                 ` Rusty Russell
2012-07-27  6:27                                   ` Rusty Russell
2012-07-27  8:11                                   ` Paolo Bonzini
2012-07-29 23:50                                     ` Rusty Russell
2012-07-29 23:50                                       ` Rusty Russell
2012-07-30  7:12                                       ` Paolo Bonzini
2012-07-30  8:56                                         ` Boaz Harrosh
2012-07-30  8:56                                           ` Boaz Harrosh
2012-07-25 10:41   ` [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list Stefan Hajnoczi
2012-07-25 11:48     ` Sen Wang
2012-07-25 11:44   ` Sen Wang
2012-07-25 12:40     ` Boaz Harrosh
2012-07-25 12:40       ` Boaz Harrosh
2012-07-27  3:12       ` Wang Sen
2012-07-27  6:50         ` Paolo Bonzini
2012-07-25 10:04 ` Rolf Eike Beer
2012-07-25 10:04   ` Rolf Eike Beer
2012-07-25 11:46   ` Sen Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=50100014.2010109@redhat.com \
    --to=pbonzini@redhat.com \
    --cc=JBottomley@parallels.com \
    --cc=bharrosh@panasas.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=mc@linux.vnet.ibm.com \
    --cc=rusty@rustcorp.com.au \
    --cc=senwang@linux.vnet.ibm.com \
    --cc=stefanha@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.