From: Greg Banks <gnb@sgi.com>
To: Tom Tucker <tom@opengridcomputing.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>,
Neil Brown <neilb@suse.de>, Tom Talpey <Thomas.Talpey@netapp.com>,
Linux NFS Mailing List <nfs@lists.sourceforge.net>,
Peter Leckie <pleckie@melbourne.sgi.com>
Subject: Re: [RFC,PATCH 11/15] knfsd: RDMA transport core
Date: Thu, 24 May 2007 00:09:01 +1000 [thread overview]
Message-ID: <20070523140901.GG14076@sgi.com> (raw)
In-Reply-To: <1179516988.23385.171.camel@trinity.ogc.int>
On Fri, May 18, 2007 at 02:36:28PM -0500, Tom Tucker wrote:
> On Fri, 2007-05-18 at 15:24 -0400, J. Bruce Fields wrote:
> > On Fri, May 18, 2007 at 12:45:52PM -0500, Tom Tucker wrote:
> >
> > > +/*
> > > + * Setup the reply buffer for the svc_process function to write the
> > > + * RPC into.
> > > + */
> > > +static int svc_rdma_prep_reply_buf(struct svc_rqst *rqstp)
> > > +{
> > > + struct kvec *resv = &rqstp->rq_res.head[0];
> > > +
> > > + /* setup response xdr_buf.
> > > + * Initially it has just one page
> > > + */
> > > + rqstp->rq_resused = 1;
> > > + resv->iov_base = page_address(rqstp->rq_respages[0]);
> > > + resv->iov_len = 0;
> > > + rqstp->rq_res.pages = rqstp->rq_respages+1;
> > > + rqstp->rq_res.len = 0;
> > > + rqstp->rq_res.page_base = 0;
> > > + rqstp->rq_res.page_len = 0;
> > > + rqstp->rq_res.buflen = PAGE_SIZE;
> > > + rqstp->rq_res.tail[0].iov_base = NULL;
> > > + rqstp->rq_res.tail[0].iov_len = 0;
> > > +
> > > + return 0;
> > > +}
> >
> > I think this is a repeat of a question from yesterday--I'm not sure why
> > we bothered to pull this out into separate function when the
> > implementations all end up being identical, except for the one line for
> > the record length in the tcp case?
> >
>
> Yeah, you're right. Maybe it shouldn't be prepare_buffer, but rather
> "prepare_header". Then you'd keep the buffer code common and just have
> the one line function for the TCP record len.
Ok, I'll change my patch to push all those initialisation lines back
into svc_process() where they came from, make svc_tcp_prepare_reply()
do just the 1 line it needs to, and have no method for UDP.
Tom, just curious but in the November there was actually a difference
between the sockets and RDMA versions, the latter didn't have this
line:
rqstp->rq_res.buflen = PAGE_SIZE;
> > > +static struct cache_deferred_req *
> > > +svc_rdma_defer(struct cache_req *req)
> > > +{
> > > + struct svc_rqst *rqstp = container_of(req, struct svc_rqst, rq_chandle);
> > > + struct svcxprt_rdma *xprt;
> > > + struct svc_rdma_deferred_req *dr;
> > > +
> > > + dprintk("svcrdma: deferring request on \n"
> > > + " rqstp=%p\n"
> > > + " rqstp->rq_arg.len=%d\n",
> > > + rqstp,
> > > + rqstp->rq_arg.len);
> > > +
> > > + /* if more than a page, give up FIXME */
> > > + if (rqstp->rq_arg.page_len)
> > > + return NULL;
> > > + BUG_ON(rqstp->rq_deferred);
> > > + xprt = (struct svcxprt_rdma*)rqstp->rq_sock;
> > > + retry:
> > > + dr = kmalloc(sizeof(struct svc_rdma_deferred_req), GFP_KERNEL);
> > > + if (!dr) {
> > > + printk(KERN_INFO "svcrdma: sleeping waiting for memory\n");
> > > + schedule_timeout_uninterruptible(msecs_to_jiffies(1000));
> > > + goto retry;
> > > + }
> >
> > Why not return NULL, as svc_defer() does?
> >
>
> I think you're right. I'll fix this.
Why have svc_rdma_defer() at all? The generic code should be able
to do call deferral and revisiting. After all, it just needs to
shuffle bits aside from the xdr_buf and put them back later; it's not
a performance path; and the logic is convoluted and a pain to test.
To illustrate how difficult this logic is, note that svc_rdma_revisit()
leaks a page when called with too_many=1.
Like handling SK_CLOSE, it's logic that should be in the generic
code rather than replicated in three svc_foo_recvfrom() routines.
Looking at the code, it seems to me that the reason for doing your
own defer/revisit logic is to preserve the RDMA chunking header
which lives before the RPC call header. This header is skipped
by svc_rdma_xdr_decode_req(), but a pointer to it is magically
reconstructed by svc_rdma_sendto() so it can traverse the write chunks
or reply chunks therein. Am I correct?
So the thing that RDMA is doing differently from the other transports
is that it needs to go back and look at its transport-specific
on-the-wire header again after having earlier skipped over it, and that
this header needs to be preserved across a defer & revisit. Right?
So perhaps a better approach would be to slightly tweak the existing
generic defer & revisit logic to allow a transport to specify how
many bytes of transport header need to be preserved. Then the
transport-specific code needs very little code to support defer &
revisit, and doesn't duplicate multiple complex and subtle functions
which mess with svc_sock internals.
I have 4 untested patches which attempt to do this, I'll post them
in a moment.
Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
Apparently, I'm Bedevere. Which MPHG character are you?
I don't speak for SGI.
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
next prev parent reply other threads:[~2007-05-23 14:09 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-05-18 17:45 [RFC,PATCH 11/15] knfsd: RDMA transport core Tom Tucker
2007-05-18 19:07 ` Trond Myklebust
2007-05-18 20:07 ` Tom Tucker
2007-05-18 21:17 ` Trond Myklebust
2007-05-19 4:32 ` Tom Tucker
2007-05-21 7:16 ` Neil Brown
2007-05-21 16:02 ` Tom Tucker
2007-05-22 5:36 ` Neil Brown
2007-05-22 15:23 ` Tom Tucker
2007-05-18 19:24 ` J. Bruce Fields
2007-05-18 19:36 ` Tom Tucker
2007-05-18 19:42 ` J. Bruce Fields
2007-05-23 14:09 ` Greg Banks [this message]
2007-05-23 14:43 ` Tom Tucker
2007-05-23 14:55 ` Greg Banks
2007-05-23 15:03 ` Trond Myklebust
2007-05-23 15:12 ` Tom Tucker
2007-05-23 15:37 ` Trond Myklebust
2007-05-23 16:02 ` Tom Tucker
2007-05-23 16:35 ` Greg Banks
2007-05-23 16:29 ` Greg Banks
2007-05-23 18:07 ` Trond Myklebust
2007-05-23 18:19 ` Talpey, Thomas
2007-05-23 18:37 ` Trond Myklebust
2007-05-23 18:59 ` Talpey, Thomas
2007-05-23 20:01 ` Trond Myklebust
2007-05-23 21:00 ` Talpey, Thomas
2007-05-24 8:35 ` Greg Banks
2007-05-24 13:45 ` Talpey, Thomas
2007-05-23 15:03 ` Tom Tucker
2007-05-21 7:11 ` Neil Brown
2007-05-21 10:02 ` Greg Banks
2007-05-21 15:58 ` Tom Tucker
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20070523140901.GG14076@sgi.com \
--to=gnb@sgi.com \
--cc=Thomas.Talpey@netapp.com \
--cc=bfields@fieldses.org \
--cc=neilb@suse.de \
--cc=nfs@lists.sourceforge.net \
--cc=pleckie@melbourne.sgi.com \
--cc=tom@opengridcomputing.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.