* cel's patches under development @ 2007-03-26 23:09 Chuck Lever 2007-04-10 15:22 ` Olaf Kirch 0 siblings, 1 reply; 7+ messages in thread From: Chuck Lever @ 2007-03-26 23:09 UTC (permalink / raw) To: nfs; +Cc: Olaf Kirch [-- Attachment #1: Type: text/plain, Size: 654 bytes --] This week's code drop is now available on linux-nfs.org in my git repo (cel-2.6.git) under the branch called drop-20070326. This branch contains the remaining RPC transport switch patches against 2.6.21-rc5, IPv6 patches (such as they are), BKL elimination patches, an immature set of in-kernel mount option parsing patches, and a long series of patches that rework the NFS client XDR routines. The NFS XDR rewrite is partially finished. The v2 and v3 routines are tested and working. The NFSv4 encode routines are compile tested only, and the decode routines haven't been changed yet. Plans include updates for NLM, rpcbind, and NFSROOT MNT. [-- Attachment #2: chuck.lever.vcf --] [-- Type: text/x-vcard, Size: 317 bytes --] begin:vcard fn:Chuck Lever n:Lever;Chuck org:Oracle Corporation;Corporate Architecture -- Linux Projects Group adr:;;1015 Granger Avenue;Ann Arbor;MI;48104;USA email;internet:chuck dot lever at nospam oracle dot com title:Principal Member of Staff tel;work:+1 248 614 5091 x-mozilla-html:FALSE version:2.1 end:vcard [-- Attachment #3: Type: text/plain, Size: 345 bytes --] ------------------------------------------------------------------------- Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT & business topics through brief surveys-and earn cash http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV [-- Attachment #4: Type: text/plain, Size: 140 bytes --] _______________________________________________ NFS maillist - NFS@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nfs ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: cel's patches under development 2007-03-26 23:09 cel's patches under development Chuck Lever @ 2007-04-10 15:22 ` Olaf Kirch 2007-04-10 19:32 ` Trond Myklebust 0 siblings, 1 reply; 7+ messages in thread From: Olaf Kirch @ 2007-04-10 15:22 UTC (permalink / raw) To: nfs, chuck.lever Hi, I've been working a bit on some changes to clean up the xdr_buf handling. Patches can be found here: http://oss.oracle.com/~okir/xdr-buf-rewrite/ The patch set aims to - factor out common send/recv code in the client/server side, and put it into socklib.c - get rid of some of the messy and duplicated code in xdr.c - move the data copies out of the data_ready callbacks in xprtsock.c. (In the long run, one may want to think about per-CPU rpciod threads, so that data remains inside the cache of the CPU that processes it, rather than bouncing around) The current code tries to guess the correct alignment of the returned data in READ, READLINK and READDIR replies, and inlines a vector of pages "just so" into the XDR so that the data ends up in the right place. If it doesn't, we jump through 15 hoops to realign it. The approach I'm trying is to leave the data in skb's for as long as it's practical, and copy data to the page cache in the XDR decode routines. In order to hide the skb mechanics, there's an rpc_data object that encapsulates this. It looks a little awkward, but Chuck complained about layering violations so I tried this instead. The patch set is based on Chuck's tree (drop-20070409). It's far from complete; in particular, I want to get rid of any code that uses xdr_inline_pages, and I want to fully convert svcsock.c to use the new generic recv functions as well. Olaf -- Olaf Kirch | --- o --- Nous sommes du soleil we love when we play okir@lst.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax ------------------------------------------------------------------------- Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT & business topics through brief surveys-and earn cash http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV _______________________________________________ NFS maillist - NFS@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nfs ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: cel's patches under development 2007-04-10 15:22 ` Olaf Kirch @ 2007-04-10 19:32 ` Trond Myklebust 2007-04-11 6:41 ` Olaf Kirch 0 siblings, 1 reply; 7+ messages in thread From: Trond Myklebust @ 2007-04-10 19:32 UTC (permalink / raw) To: Olaf Kirch; +Cc: nfs On Tue, 2007-04-10 at 17:22 +0200, Olaf Kirch wrote: > Hi, > The approach I'm trying is to leave the data in skb's for as long as it's > practical, and copy data to the page cache in the XDR decode routines. > In order to hide the skb mechanics, there's an rpc_data object that > encapsulates this. It looks a little awkward, but Chuck complained > about layering violations so I tried this instead. I'm a bit wary of this. skbs are usually allocated from the ATOMIC pool which is a very limited resource: their lifetime really wants to be as short as possible. Won't this basically end up aggravating an already nasty situation w.r.t. our behaviour when under memory pressure? Also, what about stuff like RDMA, which doesn't need this sort of mechanism in order to get things right? Finally, will we need to keep writing these very complex handlers for every new protocol that we want to add (e.g. IPv6, IPoIB, ...)? Cheers Trond ------------------------------------------------------------------------- Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT & business topics through brief surveys-and earn cash http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV _______________________________________________ NFS maillist - NFS@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nfs ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: cel's patches under development 2007-04-10 19:32 ` Trond Myklebust @ 2007-04-11 6:41 ` Olaf Kirch 2007-04-11 11:39 ` Talpey, Thomas 0 siblings, 1 reply; 7+ messages in thread From: Olaf Kirch @ 2007-04-11 6:41 UTC (permalink / raw) To: Trond Myklebust; +Cc: nfs On Tuesday 10 April 2007 21:32, Trond Myklebust wrote: > I'm a bit wary of this. skbs are usually allocated from the ATOMIC pool > which is a very limited resource: their lifetime really wants to be as > short as possible. Won't this basically end up aggravating an already > nasty situation w.r.t. our behaviour when under memory pressure? I don't think so. We're talking about an additional delay caused by moving the memcpy from the BH to some process (user or rpciod). I agree that it would be a bad idea to keep these skbs around longer than needed. On the server side, the impact of this delay will probably be even lower - skbs usually spend some time on the TCP socket's receive queue anyway, and nfsd is pulling over the received packet using recvmsg. > Also, what about stuff like RDMA, which doesn't need this sort of > mechanism in order to get things right? But RDMA may benefit from the proposed interface for transport specific receive buffers (rpc_data objects). How that buffer works is entirely up to the transport. For TCP and UDP it's skb_lists, but for RDMA it would probably be something very different. Here's the mode of operation - XDR functions that expect to receive data to a pagevec, such as READ, READLINK etc, call xprt_rcvbuf_alloc(xprt, pages, pgbase, pglen) to allocate a transport specific buffer object. Transports such as TCP or UDP ignore the page vector, but the RDMA transport could use this to do set up its buffers. From the implementation point of view, it's probably not much different from extracting the pagevec information from the xdr_buf receive buffer, but it looks cleaner to me. Likewise, it is possible to create transport-specific rpc_data objects for sending data (and eg have RDMA do a DMA Send for these). This would allow to get rid of the pagevec inside the xdr_buf altogether. > Finally, will we need to keep writing these very complex handlers for > every new protocol that we want to add (e.g. IPv6, IPoIB, ...)? No. TCP and UDP share the same skb_list handling code, regardless of address family and link layer protocol. Adding a transport such as DCCP will not need a new handler either. I believe the net result of this proposed restructuring will be less complexity. Right now we have half a dozen or so functions that walk through an xdr_buf (head, pagevec, tail)... whereas with the proposed changes, you have the complexity confined to one place. Olaf -- Olaf Kirch | --- o --- Nous sommes du soleil we love when we play okir@lst.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax ------------------------------------------------------------------------- Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT & business topics through brief surveys-and earn cash http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV _______________________________________________ NFS maillist - NFS@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nfs ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: cel's patches under development 2007-04-11 6:41 ` Olaf Kirch @ 2007-04-11 11:39 ` Talpey, Thomas 2007-04-11 12:02 ` Olaf Kirch 0 siblings, 1 reply; 7+ messages in thread From: Talpey, Thomas @ 2007-04-11 11:39 UTC (permalink / raw) To: Olaf Kirch; +Cc: nfs, Trond Myklebust At 02:41 AM 4/11/2007, Olaf Kirch wrote: >On Tuesday 10 April 2007 21:32, Trond Myklebust wrote: >> Also, what about stuff like RDMA, which doesn't need this sort of >> mechanism in order to get things right? > >But RDMA may benefit from the proposed interface for transport >specific receive buffers (rpc_data objects). How that buffer works is >entirely up to the transport. For TCP and UDP it's skb_lists, but for >RDMA it would probably be something very different. FWIW, I'm still looking at this from the RDMA perspective. I like the abstraction of the data representation, but I don't like the protocol specific switch statements that are in the data movement. These are cumbersome and don't lend themselves to dynamic loading. Also, RDMA isn't a new protocol family, so it doesn't lend itself to an IPv4/IPv6 and UDP/TCP kind of switch. >Here's the mode of operation - XDR functions that expect to receive >data to a pagevec, such as READ, READLINK etc, call >xprt_rcvbuf_alloc(xprt, pages, pgbase, pglen) to allocate a >transport specific buffer object. Transports such as TCP or UDP >ignore the page vector, but the RDMA transport could use this >to do set up its buffers. From the implementation point of view, The RDMA code turns out to not need this, because the xdr iov's have a clear encoding for pagelists (head->pages[]->tail). The only ambiguity comes from the upper layer - NFS READDIRs for example send down a pagelist, but they don't expect to cache the result directly like for READ. We use a different RDMA operation for them, and there's a guess made at the lower layer. However, moving this function to XDR doesn't fix it. So I'm semi-neutral on the proposed change from the RDMA perspective. >I believe the net result of this proposed restructuring will be >less complexity. Right now we have half a dozen or so >functions that walk through an xdr_buf (head, pagevec, tail)... >whereas with the proposed changes, you have the complexity >confined to one place. This is good, but I'd like to see a stronger benefit, personally. More comments after I get a better look. Tom. ------------------------------------------------------------------------- Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT & business topics through brief surveys-and earn cash http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV _______________________________________________ NFS maillist - NFS@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nfs ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: cel's patches under development 2007-04-11 11:39 ` Talpey, Thomas @ 2007-04-11 12:02 ` Olaf Kirch 2007-04-11 12:32 ` Talpey, Thomas 0 siblings, 1 reply; 7+ messages in thread From: Olaf Kirch @ 2007-04-11 12:02 UTC (permalink / raw) To: Talpey, Thomas; +Cc: nfs, Trond Myklebust On Wednesday 11 April 2007 13:39, Talpey, Thomas wrote: > FWIW, I'm still looking at this from the RDMA perspective. I like the > abstraction of the data representation, but I don't like the protocol > specific switch statements that are in the data movement. These Right now, the proposed rpc_data thing has two function pointers, so no switch needed. > are cumbersome and don't lend themselves to dynamic loading. Also, > RDMA isn't a new protocol family, so it doesn't lend itself to an > IPv4/IPv6 and UDP/TCP kind of switch. How is it integrated on the client side? Wouldn't this be another transport in terms of Chuck's transport switch? If so, it would just provide a function for creating an RDMA specific rpc_data object that does the right thing. > The RDMA code turns out to not need this, because the xdr iov's > have a clear encoding for pagelists (head->pages[]->tail). But that means yet another non-obvious user of xdr_buf that makes it even harder to rewrite that whole code :-) Wouldn't it be better to make it explicit "here is a bunch of pages that I want you to put the bulk data into"? Olaf -- Olaf Kirch | --- o --- Nous sommes du soleil we love when we play okir@lst.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax ------------------------------------------------------------------------- Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT & business topics through brief surveys-and earn cash http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV _______________________________________________ NFS maillist - NFS@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nfs ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: cel's patches under development 2007-04-11 12:02 ` Olaf Kirch @ 2007-04-11 12:32 ` Talpey, Thomas 0 siblings, 0 replies; 7+ messages in thread From: Talpey, Thomas @ 2007-04-11 12:32 UTC (permalink / raw) To: Olaf Kirch; +Cc: nfs At 08:02 AM 4/11/2007, Olaf Kirch wrote: >> are cumbersome and don't lend themselves to dynamic loading. Also, >> RDMA isn't a new protocol family, so it doesn't lend itself to an >> IPv4/IPv6 and UDP/TCP kind of switch. > >How is it integrated on the client side? Wouldn't this be another >transport in terms of Chuck's transport switch? If so, it would just >provide a function for creating an RDMA specific rpc_data object >that does the right thing. Sure, it's switched, though frankly it's imperfect since the switch selects based on address family / protocol naming just like others. But once we hack^H^H^H^Hwork around that it just jumps to the rdma transport setup. >> The RDMA code turns out to not need this, because the xdr iov's >> have a clear encoding for pagelists (head->pages[]->tail). > >But that means yet another non-obvious user of xdr_buf that makes >it even harder to rewrite that whole code :-) Wouldn't it be better >to make it explicit "here is a bunch of pages that I want you to >put the bulk data into"? Yep, and I agree. Not sure about the "non-obvious" thing though, it seems like a direct application of the structure to me. What I really want is some kind of upper layer hint that follows the xdr buffer, perhaps as explicit as "this is a read or write", or at least "this is bulk data". Basically, something from xdr_inline_pages/xdr_encode_pages that indicates the purpose. How it's accounted for and represented in iovecs isn't as much of a concern. I think a second part of your proposal is that there'd be a new callout to set up these buffers for i/o prior to send marshalling. For RDMA we basically do that in the transport switch send routine, where we insert the per-transport headers (RDMA chunk headers, TCP just inserts its record marker). I'm stil thinking about whether there's some advantage to a separate callout. Tom. ------------------------------------------------------------------------- Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT & business topics through brief surveys-and earn cash http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV _______________________________________________ NFS maillist - NFS@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nfs ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2007-04-11 12:32 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-03-26 23:09 cel's patches under development Chuck Lever 2007-04-10 15:22 ` Olaf Kirch 2007-04-10 19:32 ` Trond Myklebust 2007-04-11 6:41 ` Olaf Kirch 2007-04-11 11:39 ` Talpey, Thomas 2007-04-11 12:02 ` Olaf Kirch 2007-04-11 12:32 ` Talpey, Thomas
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.