* Reading NFS file without copying to user-space?
From: Ben Greear @ 2009-09-04 19:48 UTC
To: linux-nfs

I'm trying to optimize a tool that should do NFS reads as fast as possible
from a server in order to stress test the server.

Currently, I open the file as normal and read into a pre-allocated buffer.

This causes a copy of the data to user-space.

Is there any way to cause the nfs client logic to still request the file-read,
but not actually copy anything to user-space?

Maybe some trick with mmap would do this?

Thanks,
Ben

--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

* Re: Reading NFS file without copying to user-space?
From: Trond Myklebust @ 2009-09-04 20:35 UTC
To: Ben Greear; +Cc: linux-nfs

On Fri, 2009-09-04 at 12:48 -0700, Ben Greear wrote:
> I'm trying to optimize a tool that should do NFS reads as fast as possible
> from a server in order to stress test the server.
>
> Currently, I open the file as normal and read into a pre-allocated buffer.
>
> This causes a copy of the data to user-space.
>
> Is there any way to cause the nfs client logic to still request the file-read,
> but not actually copy anything to user-space?
>
> Maybe some trick with mmap would do this?

How about using O_DIRECT? That just copies the data directly into user
pages and avoids all the overhead of using the page cache?

Note that you can combine O_DIRECT with aio in order to further increase
the speeds.

Cheers
  Trond
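
A minimal sketch of how O_DIRECT might be combined with aio, as suggested above. The thread does not say which aio interface is meant; this sketch assumes Linux native AIO driven through raw system calls, so that no extra library is needed. The function name `aio_read_batch`, the queue depth `NREQ`, and the 4096-byte buffer alignment are illustrative assumptions, not anything from the original messages.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#define NREQ 4  /* hypothetical number of reads kept in flight */

/*
 * Submit NREQ reads of `chunk` bytes each, at consecutive offsets starting
 * at `base`, then wait for all of them to complete.
 * Returns the total number of bytes read, or -1 on any error.
 * The fd would normally be opened with O_RDONLY | O_DIRECT by the caller.
 */
int64_t aio_read_batch(int fd, int64_t base, size_t chunk)
{
    aio_context_t ctx = 0;
    struct iocb cbs[NREQ];
    struct iocb *ptrs[NREQ];
    struct io_event events[NREQ];
    void *bufs[NREQ] = {0};
    int64_t total = -1;
    int allocated = 0;
    int i;

    assert(chunk > 0);

    if (syscall(SYS_io_setup, (long)NREQ, &ctx) < 0)
        return -1;

    for (i = 0; i < NREQ; i++) {
        /* O_DIRECT generally wants aligned buffers; 4096 is an assumption. */
        if (posix_memalign(&bufs[i], 4096, chunk) != 0)
            goto out;
        allocated++;

        memset(&cbs[i], 0, sizeof(cbs[i]));
        cbs[i].aio_lio_opcode = IOCB_CMD_PREAD;
        cbs[i].aio_fildes = (uint32_t)fd;
        cbs[i].aio_buf = (uint64_t)(uintptr_t)bufs[i];
        cbs[i].aio_nbytes = chunk;
        cbs[i].aio_offset = base + (int64_t)i * (int64_t)chunk;
        ptrs[i] = &cbs[i];
    }

    if (syscall(SYS_io_submit, ctx, (long)NREQ, ptrs) != NREQ)
        goto out;

    /* Block until all NREQ completions have arrived. */
    if (syscall(SYS_io_getevents, ctx, (long)NREQ, (long)NREQ, events, NULL) != NREQ)
        goto out;

    total = 0;
    for (i = 0; i < NREQ; i++) {
        if (events[i].res < 0) {  /* negative errno for a failed request */
            total = -1;
            break;
        }
        total += events[i].res;
    }

out:
    /* Destroy the context first so no request can still touch the buffers. */
    syscall(SYS_io_destroy, ctx);
    for (i = 0; i < allocated; i++)
        free(bufs[i]);
    return total;
}
```

A caller would loop, advancing `base` by `NREQ * chunk` after each batch until the function returns a short count. A real stress tool would more likely refill the queue as individual completions arrive rather than waiting for a whole batch.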

* Re: Reading NFS file without copying to user-space?
From: Ben Greear @ 2009-09-04 20:49 UTC
To: Trond Myklebust; +Cc: linux-nfs

On 09/04/2009 01:35 PM, Trond Myklebust wrote:
> On Fri, 2009-09-04 at 12:48 -0700, Ben Greear wrote:
>> I'm trying to optimize a tool that should do NFS reads as fast as possible
>> from a server in order to stress test the server.
>>
>> Currently, I open the file as normal and read into a pre-allocated buffer.
>>
>> This causes a copy of the data to user-space.
>>
>> Is there any way to cause the nfs client logic to still request the file-read,
>> but not actually copy anything to user-space?
>>
>> Maybe some trick with mmap would do this?
>
> How about using O_DIRECT? That just copies the data directly into user
> pages and avoids all the overhead of using the page cache?
>
> Note that you can combine O_DIRECT with aio in order to further increase
> the speeds.

I'm using O_DIRECT (so that the server is continually stressed even if
the file would have otherwise been cached locally on the client).

This still causes a copy of the contents to user-space when I do a
read() call though, as far as I can tell. Since I'm normally not looking
at this data at all, the memory copy from kernel to user is wasted
effort in my case.

I haven't looked into aio yet..will go do some googling...

Thanks,
Ben

>
> Cheers
>    Trond

--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

* Re: Reading NFS file without copying to user-space?
From: Trond Myklebust @ 2009-09-04 20:58 UTC
To: Ben Greear; +Cc: linux-nfs@vger.kernel.org

On Sep 4, 2009, at 16:49, Ben Greear <greearb@candelatech.com> wrote:

> I'm using O_DIRECT (so that the server is continually stressed even if
> the file would have otherwise been cached locally on the client).
>
> This still causes a copy of the contents to user-space when I do a
> read() call though, as far as I can tell. Since I'm normally not looking
> at this data at all, the memory copy from kernel to user is wasted
> effort in my case.

You're missing the point. O_DIRECT does not copy data from the kernel
into userspace. The data is placed directly into the user buffer from
the socket.

The only faster alternative would be to directly discard the data in the
socket, and we offer no option to do that.

Trond

* Re: Reading NFS file without copying to user-space?
From: Ben Greear @ 2009-09-04 21:12 UTC
To: Trond Myklebust; +Cc: linux-nfs@vger.kernel.org

On 09/04/2009 01:58 PM, Trond Myklebust wrote:
> On Sep 4, 2009, at 16:49, Ben Greear <greearb@candelatech.com> wrote:
>
>> I'm using O_DIRECT (so that the server is continually stressed even if
>> the file would have otherwise been cached locally on the client).
>>
>> This still causes a copy of the contents to user-space when I do a
>> read() call though, as far as I can tell. Since I'm normally not looking
>> at this data at all, the memory copy from kernel to user is wasted
>> effort in my case.
>
> You're missing the point. O_DIRECT does not copy data from the kernel
> into userspace. The data is placed directly into the user buffer from
> the socket.

I may be going about things all wrong...

>
> The only faster alternative would be to directly discard the data in the
> socket, and we offer no option to do that.

I'm opening an fd like this:

    uint32 flgs = O_RDONLY | O_DIRECT | O_LARGEFILE;
    fd = open(fname, flgs);

Then I read from the fd:

    int retval = read(fd, rcv_buffer_ptr, my_read_len);

rcv_buffer_ptr is just a 1MB (or so) array of bytes.

Maybe I need to use aio_read with O_DIRECT to get the benefits you speak of?

Thanks,
Ben

--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

* Re: Reading NFS file without copying to user-space?
From: Trond Myklebust @ 2009-09-04 22:00 UTC
To: Ben Greear; +Cc: linux-nfs@vger.kernel.org

On Sep 4, 2009, at 17:12, Ben Greear <greearb@candelatech.com> wrote:

> On 09/04/2009 01:58 PM, Trond Myklebust wrote:
>> On Sep 4, 2009, at 16:49, Ben Greear <greearb@candelatech.com> wrote:
>>
>>> I'm using O_DIRECT (so that the server is continually stressed even if
>>> the file would have otherwise been cached locally on the client).
>>>
>>> This still causes a copy of the contents to user-space when I do a
>>> read() call though, as far as I can tell. Since I'm normally not looking
>>> at this data at all, the memory copy from kernel to user is wasted
>>> effort in my case.
>>
>> You're missing the point. O_DIRECT does not copy data from the kernel
>> into userspace. The data is placed directly into the user buffer from
>> the socket.
>
> I may be going about things all wrong...
>
>>
>> The only faster alternative would be to directly discard the data in the
>> socket, and we offer no option to do that.
>
> I'm opening an fd like this:
>
>     uint32 flgs = O_RDONLY | O_DIRECT | O_LARGEFILE;
>     fd = open(fname, flgs);
>
> Then I read from the fd:
>
>     int retval = read(fd, rcv_buffer_ptr, my_read_len);
>
> rcv_buffer_ptr is just a 1MB (or so) array of bytes.
>

Use a (much) larger buffer. Linux clients are capable of reading 2MB in
a single RPC, so you won't be doing much in the way of parallel reads
with 1MB.

I'd also suggest bumping up the number of tcp slots (see in
/proc/sys/fs/nfs/). This should be done before you mount the NFS
partition.
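
A minimal sketch of the read-and-discard loop under discussion, reworked to take the buffer size as a parameter so that a much larger buffer can be tried. The names `drain_fd` and `drain_file`, the 4096-byte alignment, and the 16 MiB figure in the usage note are illustrative assumptions; the thread does not specify an exact size or alignment.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Read fd to end-of-file into one reusable, aligned scratch buffer and
 * throw the data away. Returns the number of bytes read, or -1 on error.
 * `align` must be a power of two and a multiple of sizeof(void *).
 */
int64_t drain_fd(int fd, size_t buf_len, size_t align)
{
    void *buf = NULL;
    int64_t total = 0;

    assert(buf_len > 0);

    /* O_DIRECT generally wants an aligned buffer, so do not use malloc(). */
    if (posix_memalign(&buf, align, buf_len) != 0)
        return -1;

    for (;;) {
        ssize_t n = read(fd, buf, buf_len);
        if (n > 0) {
            total += n;       /* data is never inspected, only counted */
            continue;
        }
        if (n == 0)
            break;            /* end of file */
        if (errno == EINTR)
            continue;         /* interrupted by a signal, retry */
        total = -1;
        break;
    }

    free(buf);
    return total;
}

/*
 * Open `path` read-only with any extra open(2) flags the caller wants
 * (for example O_DIRECT | O_LARGEFILE) and drain it.
 */
int64_t drain_file(const char *path, int extra_flags, size_t buf_len)
{
    int fd = open(path, O_RDONLY | extra_flags);
    int64_t total;

    if (fd < 0)
        return -1;
    total = drain_fd(fd, buf_len, 4096);  /* 4096: assumed alignment */
    close(fd);
    return total;
}
```

For instance, `drain_file(fname, O_DIRECT | O_LARGEFILE, 16u << 20)` would issue each read() with a 16 MiB buffer instead of 1MB. Whether that actually helps depends on the mount's rsize and on how many RPC slots are available.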

* Re: Reading NFS file without copying to user-space?
From: Ben Greear @ 2009-09-04 21:57 UTC
To: Trond Myklebust; +Cc: linux-nfs@vger.kernel.org

On 09/04/2009 01:58 PM, Trond Myklebust wrote:

> You're missing the point. O_DIRECT does not copy data from the kernel
> into userspace. The data is placed directly into the user buffer from
> the socket.
>
> The only faster alternative would be to directly discard the data in the
> socket, and we offer no option to do that.

I was thinking I might be clever and use sendfile to send an nfs
file to /dev/zero, but unfortunately it seems sendfile can only send
to a destination that is a socket....

Ben

--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

* Re: Reading NFS file without copying to user-space?
From: Trond Myklebust @ 2009-09-04 22:15 UTC
To: Ben Greear; +Cc: linux-nfs@vger.kernel.org

On Fri, 2009-09-04 at 14:57 -0700, Ben Greear wrote:
> On 09/04/2009 01:58 PM, Trond Myklebust wrote:
>
> > You're missing the point. O_DIRECT does not copy data from the kernel
> > into userspace. The data is placed directly into the user buffer from
> > the socket.
> >
> > The only faster alternative would be to directly discard the data in the
> > socket, and we offer no option to do that.
>
> I was thinking I might be clever and use sendfile to send an nfs
> file to /dev/zero, but unfortunately it seems sendfile can only send
> to a destination that is a socket....

Why do you think that would be any faster than standard O_DIRECT? It
should be slower, since it involves an extra copy.

Trond

* Re: Reading NFS file without copying to user-space?
From: Ben Greear @ 2009-09-04 22:30 UTC
To: Trond Myklebust; +Cc: linux-nfs@vger.kernel.org

On 09/04/2009 03:15 PM, Trond Myklebust wrote:
> On Fri, 2009-09-04 at 14:57 -0700, Ben Greear wrote:
>> On 09/04/2009 01:58 PM, Trond Myklebust wrote:
>>
>>> You're missing the point. O_DIRECT does not copy data from the kernel
>>> into userspace. The data is placed directly into the user buffer from
>>> the socket.
>>>
>>> The only faster alternative would be to directly discard the data in the
>>> socket, and we offer no option to do that.
>>
>> I was thinking I might be clever and use sendfile to send an nfs
>> file to /dev/zero, but unfortunately it seems sendfile can only send
>> to a destination that is a socket....
>
> Why do you think that would be any faster than standard O_DIRECT? It
> should be slower, since it involves an extra copy.

I was thinking that the kernel might take the data received in the skb's from
the file-server and send it to /dev/null, i.e. basically just immediately
discard the received data. If it could do that, it would be a zero-copy
read: The only copying would be the NIC DMA'ing the packet into the skb.

It would also seem to me that if one allowed sendfile to copy between
files, it could do the same trick saving to a real file and save user-space
having to read the file in and then write it out again to disk.

Truth is, I don't know much about the low level of file-io, so I may
be completely confused about things :)

I'll try using much larger buffers for the read() call, and will also make
sure the networking buffer pools are big enough.

Out of curiosity, anyone have any benchmarks for NFS on 10G hardware?

I have two 2.6.31-rc8 Linux systems that for a short time will serve & sink
about 9Gbps of file-io (serving from 2GB tmpfs, discarding as soon as we
read). Something goes weird after a minute or two and bandwidth drops down
and bounces between 4Gbps-8Gbps.

Based on testing against another vendor's nfs server, it seems that the client
is losing packets (the server shows tcp retransmits).

Thanks,
Ben

--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

* Re: Reading NFS file without copying to user-space?
From: Trond Myklebust @ 2009-09-04 22:49 UTC
To: Ben Greear; +Cc: linux-nfs@vger.kernel.org

On Fri, 2009-09-04 at 15:30 -0700, Ben Greear wrote:
> On 09/04/2009 03:15 PM, Trond Myklebust wrote:
> > On Fri, 2009-09-04 at 14:57 -0700, Ben Greear wrote:
> >> On 09/04/2009 01:58 PM, Trond Myklebust wrote:
> >>
> >>> You're missing the point. O_DIRECT does not copy data from the kernel
> >>> into userspace. The data is placed directly into the user buffer from
> >>> the socket.
> >>>
> >>> The only faster alternative would be to directly discard the data in the
> >>> socket, and we offer no option to do that.
> >>
> >> I was thinking I might be clever and use sendfile to send an nfs
> >> file to /dev/zero, but unfortunately it seems sendfile can only send
> >> to a destination that is a socket....
> >
> > Why do you think that would be any faster than standard O_DIRECT? It
> > should be slower, since it involves an extra copy.
>
> I was thinking that the kernel might take the data received in the skb's from
> the file-server and send it to /dev/null, i.e. basically just immediately
> discard the received data. If it could do that, it would be a zero-copy
> read: The only copying would be the NIC DMA'ing the packet into the skb.

No... The RPC layer will always copy the data from the socket into a
buffer. If you are using O_DIRECT reads, then that buffer will be the
same one that you supplied in userland (the kernel just uses page table
trickery to map those pages into the kernel address space). If you are
using any other type of read (even if it is being piped using sendfile()
or splice()) then it will copy that data into the NFS filesystem's page
cache.

> It would also seem to me that if one allowed sendfile to copy between
> files, it could do the same trick saving to a real file and save user-space
> having to read the file in and then write it out again to disk.

As I said above, sendfile and splice don't work that way. They both use
the page cache as the source, so the filesystem needs to fill the page
cache first.

> Out of curiosity, anyone have any benchmarks for NFS on 10G hardware?

I'm not aware of any public figures. I'd be interested to hear how you
max out.

> Based on testing against another vendor's nfs server, it seems that the client
> is losing packets (the server shows tcp retransmits).

Is the data being lost at the client, the switch or the server? Assuming
that you are using a managed switch, then a look at its statistics
should be able to answer that question.

* Re: Reading NFS file without copying to user-space?
From: Ben Greear @ 2009-09-04 23:03 UTC
To: Trond Myklebust; +Cc: linux-nfs@vger.kernel.org

On 09/04/2009 03:49 PM, Trond Myklebust wrote:
> On Fri, 2009-09-04 at 15:30 -0700, Ben Greear wrote:

>> I was thinking that the kernel might take the data received in the skb's from
>> the file-server and send it to /dev/null, i.e. basically just immediately
>> discard the received data. If it could do that, it would be a zero-copy
>> read: The only copying would be the NIC DMA'ing the packet into the skb.
>
> No... The RPC layer will always copy the data from the socket into a
> buffer. If you are using O_DIRECT reads, then that buffer will be the
> same one that you supplied in userland (the kernel just uses page table
> trickery to map those pages into the kernel address space). If you are
> using any other type of read (even if it is being piped using sendfile()
> or splice()) then it will copy that data into the NFS filesystem's page
> cache.

Ok, I think I understand that better now. Seems like one could have RPC
use a list of skbs as data store instead of copying the data, but perhaps
that would be optimizing for something no one would ever really want in
the real world.

>> Out of curiosity, anyone have any benchmarks for NFS on 10G hardware?
>
> I'm not aware of any public figures. I'd be interested to hear how you
> max out.
>
>> Based on testing against another vendor's nfs server, it seems that the client
>> is losing packets (the server shows tcp retransmits).
>
> Is the data being lost at the client, the switch or the server? Assuming
> that you are using a managed switch, then a look at its statistics
> should be able to answer that question.

At least for my local linux - linux tests, I'm using just fibre optic cable
to connect them, so definitely not a switch problem here.

No obvious errors reported by either NIC, and pktgen tests show that
they can easily sustain 9Gbps. I need to do more detailed looking at
the netstat counters and such.

I suspect I may have too-small network buffers. I last set up their defaults
when a 1GB RAM system was 'high end', and now I'm using 12GB systems :P

Thanks,
Ben

--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com