Linux NFS development
From: Olaf Kirch <olaf.kirch@oracle.com>
To: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: nfs@lists.sourceforge.net
Subject: Re: cel's patches under development
Date: Wed, 11 Apr 2007 08:41:20 +0200
Message-ID: <200704110841.21291.olaf.kirch@oracle.com>
In-Reply-To: <1176233539.309.27.camel@heimdal.trondhjem.org>

On Tuesday 10 April 2007 21:32, Trond Myklebust wrote:
> I'm a bit wary of this. skbs are usually allocated from the ATOMIC pool
> which is a very limited resource: their lifetime really wants to be as
> short as possible. Won't this basically end up aggravating an already
> nasty situation w.r.t. our behaviour when under memory pressure?

I don't think so. We're talking about an additional delay caused by
moving the memcpy from the BH to some process (user or rpciod).
I agree that it would be a bad idea to keep these skbs around longer
than needed.

On the server side, the impact of this delay will probably be even
lower - skbs usually spend some time on the TCP socket's receive
queue anyway, and nfsd is pulling over the received packet
using recvmsg.

> Also, what about stuff like RDMA, which doesn't need this sort of
> mechanism in order to get things right?

But RDMA may benefit from the proposed interface for transport
specific receive buffers (rpc_data objects). How that buffer works is
entirely up to the transport. For TCP and UDP it's skb_lists, but for
RDMA it would probably be something very different.

Here's the mode of operation - XDR functions that expect to receive
data to a pagevec, such as READ, READLINK etc, call
xprt_rcvbuf_alloc(xprt, pages, pgbase, pglen) to allocate a
transport specific buffer object. Transports such as TCP or UDP
ignore the page vector, but the RDMA transport could use this
to set up its buffers. From the implementation point of view,
it's probably not much different from extracting the pagevec
information from the xdr_buf receive buffer, but it looks cleaner
to me.
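
A minimal user-space sketch of that call path, for illustration only.
Only the name xprt_rcvbuf_alloc and its arguments come from the
description above; the rpc_data layout, the ops table, and the two
per-transport allocators are assumptions made up for this sketch, not
existing kernel code.

```c
#include <assert.h>
#include <stdlib.h>

struct page;		/* opaque stand-in for the kernel's struct page */
struct rpc_xprt;

/* Hypothetical transport-specific receive buffer object. */
struct rpc_data {
	int uses_pagevec;	/* 0: skb_list style, 1: RDMA style */
	struct page **pages;	/* NULL when the transport ignores the pagevec */
	unsigned int pgbase;
	unsigned int pglen;
};

/* Hypothetical per-transport operations table. */
struct rpc_xprt_ops {
	struct rpc_data *(*rcvbuf_alloc)(struct rpc_xprt *xprt,
					 struct page **pages,
					 unsigned int pgbase,
					 unsigned int pglen);
};

struct rpc_xprt {
	const struct rpc_xprt_ops *ops;
};

/* TCP/UDP: the page vector is ignored; data would stay in an skb_list. */
static struct rpc_data *sock_rcvbuf_alloc(struct rpc_xprt *xprt,
					  struct page **pages,
					  unsigned int pgbase,
					  unsigned int pglen)
{
	struct rpc_data *rd = calloc(1, sizeof(*rd));

	(void)xprt; (void)pages; (void)pgbase; (void)pglen;
	return rd;
}

/* RDMA: remember the page vector so the transport can set up its buffers. */
static struct rpc_data *rdma_rcvbuf_alloc(struct rpc_xprt *xprt,
					  struct page **pages,
					  unsigned int pgbase,
					  unsigned int pglen)
{
	struct rpc_data *rd = calloc(1, sizeof(*rd));

	(void)xprt;
	if (rd) {
		rd->uses_pagevec = 1;
		rd->pages = pages;
		rd->pgbase = pgbase;
		rd->pglen = pglen;
	}
	return rd;
}

const struct rpc_xprt_ops sock_xprt_ops = { sock_rcvbuf_alloc };
const struct rpc_xprt_ops rdma_xprt_ops = { rdma_rcvbuf_alloc };

/* What an XDR function such as READ would call; the transport decides. */
struct rpc_data *xprt_rcvbuf_alloc(struct rpc_xprt *xprt, struct page **pages,
				   unsigned int pgbase, unsigned int pglen)
{
	assert(xprt && xprt->ops && xprt->ops->rcvbuf_alloc);
	return xprt->ops->rcvbuf_alloc(xprt, pages, pgbase, pglen);
}
```

The XDR caller looks the same for every transport; whether the page
vector is used at all is decided behind the ops pointer.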

Likewise, it is possible to create transport-specific rpc_data
objects for sending data (and e.g. have RDMA do a DMA Send
for these). This would allow us to get rid of the pagevec inside the
xdr_buf altogether.

> Finally, will we need to keep writing these very complex handlers for
> every new protocol that we want to add (e.g. IPv6, IPoIB, ...)?

No. TCP and UDP share the same skb_list handling code,
regardless of address family and link layer protocol. Adding
a transport such as DCCP will not need a new handler
either.

I believe the net result of this proposed restructuring will be
less complexity. Right now we have half a dozen or so
functions that walk through an xdr_buf (head, pagevec, tail)...
whereas with the proposed changes, you have the complexity
confined to one place.
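
To illustrate what "one place" could mean, here is a rough user-space
sketch of a single walker over the three parts of an xdr_buf, with the
per-use work passed in as a callback. All names here (xdr_buf_sketch,
xdr_buf_walk, count_bytes, SKETCH_PAGE_SIZE) are hypothetical and the
struct is a simplified model, not the kernel's struct xdr_buf.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for this model */

struct kvec_sketch {
	void *base;
	size_t len;
};

/* Simplified model of an xdr_buf: head, page vector, tail. */
struct xdr_buf_sketch {
	struct kvec_sketch head;
	unsigned char **pages;	/* each entry is SKETCH_PAGE_SIZE bytes */
	size_t page_base;	/* byte offset of the data in the page vector */
	size_t page_len;	/* number of data bytes in the page vector */
	struct kvec_sketch tail;
};

/* Called once per contiguous segment; nonzero return aborts the walk. */
typedef int (*xdr_seg_fn)(void *ctx, void *seg, size_t len);

/* The only function that knows about the head/pagevec/tail layout. */
int xdr_buf_walk(const struct xdr_buf_sketch *buf, xdr_seg_fn fn, void *ctx)
{
	size_t off, left;
	int err;

	assert(buf && fn);

	if (buf->head.len) {
		err = fn(ctx, buf->head.base, buf->head.len);
		if (err)
			return err;
	}

	off = buf->page_base;
	left = buf->page_len;
	while (left) {
		size_t in_page = off % SKETCH_PAGE_SIZE;
		size_t chunk = SKETCH_PAGE_SIZE - in_page;

		if (chunk > left)
			chunk = left;
		err = fn(ctx, buf->pages[off / SKETCH_PAGE_SIZE] + in_page, chunk);
		if (err)
			return err;
		off += chunk;
		left -= chunk;
	}

	if (buf->tail.len) {
		err = fn(ctx, buf->tail.base, buf->tail.len);
		if (err)
			return err;
	}
	return 0;
}

/* Example callback: add up the number of bytes seen. */
int count_bytes(void *ctx, void *seg, size_t len)
{
	(void)seg;
	*(size_t *)ctx += len;
	return 0;
}
```

Copying, checksumming or counting then become small callbacks, and the
layout knowledge lives in xdr_buf_walk alone.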

Olaf
-- 
Olaf Kirch  |  --- o --- Nous sommes du soleil we love when we play
okir@lst.de |    / | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax

