From: "Venkateswararao Jujjuri (JV)" <jvrao@linux.vnet.ibm.com>
To: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Latchesar Ionkov <lucho@ionkov.net>,
	linux-fsdevel@vger.kernel.org,
	v9fs-developer@lists.sourceforge.net,
	Badari Pulavarty <pbadari@us.ibm.com>
Subject: Re: [V9fs-developer] [PATCH 3/5] [net/9p] Add support for placing page addresses directly on the sg list.
Date: Thu, 19 Aug 2010 16:35:28 -0700	[thread overview]
Message-ID: <4C6DBFC0.3020906@linux.vnet.ibm.com> (raw)
In-Reply-To: <AANLkTinDw=hBTsv9Q+HJDkMNY06sL7T0eZudsycp1H9F@mail.gmail.com>

Eric Van Hensbergen wrote:
> Yeah, I haven't done a thorough critique of the code yet as I'm
> overloaded with other things, but my overall impression is similar to
> Lucho's.  Last time I looked at doing this, my approach was to keep
> everything zero copy in the fs and client layers; if I needed to
> allocate/copy because a transport didn't support scatter/gather, I did
> that in the transport.
> 
> Of course, this touches more than just the virtio transport, so I
> can understand the trepidation about broader changes, but it seems like
> we are doing backflips to try and avoid them, and the result
> is somewhat awkward due to the overloaded nature of many of the fields --
> in the end it is likely to lead to more problems.

I understand that it is not very pretty.. :) but the main goal was a modular,
limited change.
Unless we separate the header from the payload completely at the protocol level,
I think it will be difficult to achieve zero copy cleanly.

I will be more than happy to take your suggestions on how to get this right cleanly.
Increasing msize in the current model doesn't really solve the issue, as it forces
us into bigger kmalloc()s even for simple transactions like TLCREATE/RLCREATE.
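
To make that concrete, the tag allocation today looks roughly like this
(a simplified sketch of p9_tag_alloc(), not the exact code):

	/*
	 * Sketch: every request carries two msize-sized buffers, one for
	 * the T-message and one for the R-message, so raising msize
	 * inflates every allocation, even for a tiny TLCREATE/RLCREATE.
	 */
	req->tc = kmalloc(sizeof(struct p9_fcall) + c->msize, GFP_NOFS);
	req->rc = kmalloc(sizeof(struct p9_fcall) + c->msize, GFP_NOFS);
	req->tc->sdata = (char *) req->tc + sizeof(struct p9_fcall);
	req->tc->capacity = c->msize;
	req->rc->sdata = (char *) req->rc + sizeof(struct p9_fcall);
	req->rc->capacity = c->msize;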

Another area we need to look at: the same structure, p9_fcall, is used for
both transmission and reception. Most of the time only one direction needs
the larger buffer, not both.
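
One (purely hypothetical) direction would be to size each fcall for its own
direction; p9_fcall_alloc(), t_size and r_size below are made-up names, just
to sketch the idea:

	/*
	 * Hypothetical sketch only: size the T and R buffers independently
	 * instead of using msize for both.  For a TREAD the T-buffer is
	 * header-sized and the R-buffer large; for a TWRITE it is the
	 * other way around.
	 */
	req->tc = p9_fcall_alloc(t_size);	/* small for TREAD  */
	req->rc = p9_fcall_alloc(r_size);	/* large for RREAD  */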


> 
> On Thu, Aug 19, 2010 at 4:07 PM, Latchesar Ionkov <lucho@ionkov.net> wrote:
>> Frankly, I don't like the way the zero copy is implemented much.
>>
>> 1. In case of Twrite, the size[4] field of the 9P message contains the
>> size of the message without the data field.
>>
>> 2. In case of Tread message, the pages to receive the data are
>> contained in req->tc instead of req->rc.

We had some internal debate on this before choosing req->tc. :)
The argument was: whatever the client sends, it puts on TC, and the server
responds back on RC.
But I do understand your point too.
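
For reference, in the patch quoted below the receive list for a user read is
built from req->tc, with 11 bytes of the reply buffer (rdata) reserved for the
Rread header, i.e. size[4] + type[1] + tag[2] + count[4]:

	/* Condensed from the virtio patch quoted further down. */
	if (req->tc->pdata_read_len) {
		inp = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, rdata, 11);
		in = pack_sg_list_p(chan->sg, out + inp, VIRTQUEUE_NUM,
					req->tc, req->tc->pdata_read_len);
		in += inp;
	}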

Eric/Lucho,

Please let me know if you have any suggestions to get this right.

Thanks,
JV

>>
>> What is the point of having the page data in p9_fcall if only the one
>> in req->tc is ever going to be used? I am trying to implement a
>> transport that uses your patches, and I have to fill my code with
>> explanations of why I am doing crazy things.
>>
>> Thanks,
>>    Lucho
>>
>> On Thu, Aug 19, 2010 at 2:47 PM, Venkateswararao Jujjuri (JV)
>> <jvrao@linux.vnet.ibm.com> wrote:
>>> Latchesar Ionkov wrote:
>>>> It is kind of strange that part of the fcall packet (everything other
>>>> than the Rread/Twrite data) is located in a buffer pointed to by sdata,
>>>> and the rest is in pages pointed to by pdata. A scatterlist would make
>>>> it tidier and easier to figure out what is where.
>>> A separate sg list would further increase the size of the PDU. Initially I had
>>> an sg list sitting in the fcall structure. But when we use the page addresses
>>> directly, we do not make full use of sdata ... only the initial portion is used
>>> for the header.
>>> To make efficient use of kernel memory, we came up with this plan of overloading
>>> sdata with page pointers during user-initiated Rread/Twrite calls.
>>>
>>> The major advantage we saw with this method is that the changes are very modular.
>>> Other parts of the code, and other transports, work without any change.
>>>
>>> If needed, we can easily add a separate sg list vector to the fcall, but it would
>>> not use kernel memory efficiently, as the whole sdata is allocated but mostly
>>> unused.
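>>>
>>> (A condensed view of the resulting struct p9_fcall from patch 3/5, to show
>>> what gets overloaded; the comments are interpretive:
>>>
>>> 	struct p9_fcall {
>>> 		...
>>> 		size_t offset;
>>> 		size_t capacity;
>>> 		struct page **pdata;		/* user pages for zero copy  */
>>> 		uint32_t pdata_mapped_pages;	/* number of pages in pdata  */
>>> 		uint32_t pdata_off;		/* offset into pdata[0]      */
>>> 		uint32_t pdata_write_len;	/* Twrite payload bytes      */
>>> 		uint32_t pdata_read_len;	/* expected Rread data bytes */
>>> 		uint8_t *sdata;			/* holds only the header in
>>> 						   the zero-copy case       */
>>> 	};
>>> )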
>>>
>>> Thanks,
>>> JV
>>>
>>>> Thanks,
>>>>     Lucho
>>>>
>>>> On Thu, Aug 19, 2010 at 12:28 PM, Venkateswararao Jujjuri (JV)
>>>> <jvrao@linux.vnet.ibm.com> wrote:
>>>>> Latchesar Ionkov wrote:
>>>>>> Is there any particular reason for p9_fcall to have pointers to pages
>>>>>> instead of a scatterlist?
>>>>> Given that page sizes are constant, all we need is the offset into the first
>>>>> page. The IO size determines the last page. So we decided there is no need
>>>>> to put an sg list in p9_fcall.
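>>>>>
>>>>> (Illustrative arithmetic, with hypothetical variable names:
>>>>>
>>>>> 	first_off = (unsigned long)uaddr & (PAGE_SIZE - 1);
>>>>> 	nr_pages  = (first_off + count + PAGE_SIZE - 1) >> PAGE_SHIFT;
>>>>>
>>>>> i.e. the offset into the first page plus the byte count pin down the
>>>>> whole page span, so no per-page lengths are needed.)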
>>>>>
>>>>> Thanks,
>>>>> JV
>>>>>
>>>>>> Thanks,
>>>>>>     Lucho
>>>>>>
>>>>>> On Tue, Aug 17, 2010 at 11:27 AM, Venkateswararao Jujjuri (JV)
>>>>>> <jvrao@linux.vnet.ibm.com> wrote:
>>>>>>> This patch adds necessary infrastructure for placing page addresses
>>>>>>> directly on the sg list for the server to consume.
>>>>>>>
>>>>>>> The newly added routine pack_sg_list_p() is just like pack_sg_list()
>>>>>>> except that it takes page array as an input and directly places them on
>>>>>>> the sg list after taking care of the first page offset.
>>>>>>>
>>>>>>> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
>>>>>>> Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
>>>>>>> ---
>>>>>>>  include/net/9p/9p.h   |    6 ++++-
>>>>>>>  net/9p/client.c       |    4 +++
>>>>>>>  net/9p/trans_virtio.c |   59 ++++++++++++++++++++++++++++++++++++++++++++++--
>>>>>>>  3 files changed, 65 insertions(+), 4 deletions(-)
>>>>>>>
>>>>>>> diff --git a/include/net/9p/9p.h b/include/net/9p/9p.h
>>>>>>> index a8de812..382ef22 100644
>>>>>>> --- a/include/net/9p/9p.h
>>>>>>> +++ b/include/net/9p/9p.h
>>>>>>> @@ -651,7 +651,11 @@ struct p9_fcall {
>>>>>>>
>>>>>>>        size_t offset;
>>>>>>>        size_t capacity;
>>>>>>> -
>>>>>>> +       struct page **pdata;
>>>>>>> +       uint32_t pdata_mapped_pages;
>>>>>>> +       uint32_t pdata_off;
>>>>>>> +       uint32_t pdata_write_len;
>>>>>>> +       uint32_t pdata_read_len;
>>>>>>>        uint8_t *sdata;
>>>>>>>  };
>>>>>>>
>>>>>>> diff --git a/net/9p/client.c b/net/9p/client.c
>>>>>>> index 29bbbbd..5487896 100644
>>>>>>> --- a/net/9p/client.c
>>>>>>> +++ b/net/9p/client.c
>>>>>>> @@ -244,8 +244,12 @@ static struct p9_req_t *p9_tag_alloc(struct p9_client *c, u16 tag)
>>>>>>>                }
>>>>>>>                req->tc->sdata = (char *) req->tc + sizeof(struct p9_fcall);
>>>>>>>                req->tc->capacity = c->msize;
>>>>>>> +               req->tc->pdata_write_len = 0;
>>>>>>> +               req->tc->pdata_read_len = 0;
>>>>>>>                req->rc->sdata = (char *) req->rc + sizeof(struct p9_fcall);
>>>>>>>                req->rc->capacity = c->msize;
>>>>>>> +               req->rc->pdata_write_len = 0;
>>>>>>> +               req->rc->pdata_read_len = 0;
>>>>>>>        }
>>>>>>>
>>>>>>>        p9pdu_reset(req->tc);
>>>>>>> diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
>>>>>>> index 762c19f..8f86cb5 100644
>>>>>>> --- a/net/9p/trans_virtio.c
>>>>>>> +++ b/net/9p/trans_virtio.c
>>>>>>> @@ -180,6 +180,44 @@ pack_sg_list(struct scatterlist *sg, int start, int limit, char *data,
>>>>>>>        return index-start;
>>>>>>>  }
>>>>>>>
>>>>>>> +/**
>>>>>>> + * pack_sg_list_p - Pack a scatter gather list from an array of pages.
>>>>>>> + * @sg: scatter/gather list to pack into
>>>>>>> + * @start: which segment of the sg_list to start at
>>>>>>> + * @limit: maximum segment to pack data to
>>>>>>> + * @pdu: pdu prepared to put on the wire.
>>>>>>> + * @count: amount of data to pack into the scatter/gather list
>>>>>>> + *
>>>>>>> + * This is just like pack_sg_list() except that it takes page array
>>>>>>> + * as an input and directly places them on the sg list after taking
>>>>>>> + * care of the first page offset.
>>>>>>> + */
>>>>>>> +
>>>>>>> +static int
>>>>>>> +pack_sg_list_p(struct scatterlist *sg, int start, int limit,
>>>>>>> +               struct p9_fcall *pdu, int count)
>>>>>>> +{
>>>>>>> +       int s;
>>>>>>> +       int i = 0;
>>>>>>> +       int index = start;
>>>>>>> +
>>>>>>> +       if (pdu->pdata_off) {
>>>>>>> +               s = min((int)(PAGE_SIZE - pdu->pdata_off), count);
>>>>>>> +               sg_set_page(&sg[index++], pdu->pdata[i++], s, pdu->pdata_off);
>>>>>>> +               count -= s;
>>>>>>> +       }
>>>>>>> +
>>>>>>> +       while (count) {
>>>>>>> +               BUG_ON(index > limit);
>>>>>>> +               s = min((int)PAGE_SIZE, count);
>>>>>>> +               sg_set_page(&sg[index++], pdu->pdata[i++], s, 0);
>>>>>>> +               count -= s;
>>>>>>> +       }
>>>>>>> +
>>>>>>> +       return index-start;
>>>>>>> +}
>>>>>>> +
>>>>>>> +
>>>>>>>  /* We don't currently allow canceling of virtio requests */
>>>>>>>  static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req)
>>>>>>>  {
>>>>>>> @@ -196,16 +234,31 @@ static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req)
>>>>>>>  static int
>>>>>>>  p9_virtio_request(struct p9_client *client, struct p9_req_t *req)
>>>>>>>  {
>>>>>>> -       int in, out;
>>>>>>> +       int in, out, outp, inp;
>>>>>>>        struct virtio_chan *chan = client->trans;
>>>>>>>        char *rdata = (char *)req->rc+sizeof(struct p9_fcall);
>>>>>>>
>>>>>>>        P9_DPRINTK(P9_DEBUG_TRANS, "9p debug: virtio request\n");
>>>>>>>
>>>>>>>        out = pack_sg_list(chan->sg, 0, VIRTQUEUE_NUM, req->tc->sdata,
>>>>>>> -                                                               req->tc->size);
>>>>>>> -       in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM-out, rdata,
>>>>>>> +                       req->tc->size);
>>>>>>> +
>>>>>>> +       BUG_ON(req->tc->pdata_write_len && req->tc->pdata_read_len);
>>>>>>> +
>>>>>>> +       if (req->tc->pdata_write_len) {
>>>>>>> +               outp = pack_sg_list_p(chan->sg, out, VIRTQUEUE_NUM,
>>>>>>> +                                       req->tc, req->tc->pdata_write_len);
>>>>>>> +               out += outp;
>>>>>>> +       }
>>>>>>> +       if (req->tc->pdata_read_len) {
>>>>>>> +               inp = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, rdata, 11);
>>>>>>> +               in = pack_sg_list_p(chan->sg, out+inp, VIRTQUEUE_NUM,
>>>>>>> +                                       req->tc, req->tc->pdata_read_len);
>>>>>>> +               in += inp;
>>>>>>> +       } else {
>>>>>>> +               in = pack_sg_list(chan->sg, out, VIRTQUEUE_NUM, rdata,
>>>>>>>                                                                client->msize);
>>>>>>> +       }
>>>>>>>
>>>>>>>        req->status = REQ_STATUS_SENT;
>>>>>>>
>>>>>>> --
>>>>>>> 1.6.5.2
>>>>>>>
>>>>>
>>>
>>>
>>



Thread overview: 22+ messages
2010-08-17 17:27 [00/05] Add zero copy capability to virtio transport Venkateswararao Jujjuri (JV)
2010-08-17 17:27 ` [PATCH 1/5] [net/9p] Add capability() to p9_trans_module Venkateswararao Jujjuri (JV)
2010-08-17 20:43   ` [V9fs-developer] " Eric Van Hensbergen
2010-08-17 20:46     ` Latchesar Ionkov
2010-08-17 23:31       ` Venkateswararao Jujjuri (JV)
2010-08-18 15:16         ` Eric Van Hensbergen
2010-08-18 16:56           ` Venkateswararao Jujjuri (JV)
2010-08-18 18:26             ` Eric Van Hensbergen
2010-08-17 17:27 ` [PATCH 2/5] [net/9p] Pass p9_client structure to pdu perpartion routines Venkateswararao Jujjuri (JV)
2010-08-17 17:27 ` [PATCH 3/5] [net/9p] Add support for placing page addresses directly on the sg list Venkateswararao Jujjuri (JV)
2010-08-18 20:50   ` [V9fs-developer] " Latchesar Ionkov
2010-08-19 18:28     ` Venkateswararao Jujjuri (JV)
2010-08-19 18:49       ` Latchesar Ionkov
2010-08-19 20:47         ` Venkateswararao Jujjuri (JV)
2010-08-19 21:07           ` Latchesar Ionkov
2010-08-19 21:26             ` Eric Van Hensbergen
2010-08-19 23:35               ` Venkateswararao Jujjuri (JV) [this message]
2010-08-20  0:27                 ` Eric Van Hensbergen
2010-08-17 17:27 ` [PATCH 4/5] [net/9p] Achieve zero copy on write path Venkateswararao Jujjuri (JV)
2010-08-19 19:30   ` [V9fs-developer] " Latchesar Ionkov
2010-08-19 20:55     ` Venkateswararao Jujjuri (JV)
2010-08-17 17:27 ` [PATCH 5/5] [net/9p] Achieve zero copy on read path Venkateswararao Jujjuri (JV)
