* nfs client and io_uring zero copy receive
@ 2025-07-22 18:10 Anton Gavriliuk
  2025-07-22 18:43 ` Trond Myklebust
  0 siblings, 1 reply; 4+ messages in thread
From: Anton Gavriliuk @ 2025-07-22 18:10 UTC (permalink / raw)
  To: linux-nfs

Hi

I am trying to exceed 20 GB/s doing a sequential read from a single file
on the NFS client.

perf top shows excessive memcpy usage:

Samples: 237K of event 'cycles:P', 4000 Hz, Event count (approx.): 120872739112 lost: 0/0 drop: 0/0
Overhead  Shared Object  Symbol
  20,54%  [kernel]       [k] memcpy
   6,52%  [nfs]          [k] nfs_generic_pg_test
   5,12%  [nfs]          [k] nfs_page_group_lock
   4,92%  [kernel]       [k] _copy_to_iter
   4,79%  [kernel]       [k] gro_list_prepare
   2,77%  [nfs]          [k] nfs_clear_request
   2,10%  [nfs]          [k] __nfs_pageio_add_request
   2,07%  [kernel]       [k] check_heap_object
   2,00%  [kernel]       [k] __slab_free

Can the NFS client be adapted to use zero copy, for example by using
io_uring zero copy rx?

Anton

^ permalink raw reply	[flat|nested] 4+ messages in thread
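The benchmark tool is not named above. A sequential read of this kind
can be driven with fio, for example; the file name and option values
below are purely illustrative:

fio --name=seqread --filename=/mnt/testfile --rw=read --bs=1M \
    --direct=1 --ioengine=libaio --iodepth=32 --numjobs=8 \
    --group_reporting

Several jobs with a deep queue keep enough reads in flight to saturate
the link rather than stalling on one outstanding request at a time.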
* Re: nfs client and io_uring zero copy receive
  2025-07-22 18:10 nfs client and io_uring zero copy receive Anton Gavriliuk
@ 2025-07-22 18:43 ` Trond Myklebust
  2025-07-22 19:01   ` Anton Gavriliuk
  0 siblings, 1 reply; 4+ messages in thread
From: Trond Myklebust @ 2025-07-22 18:43 UTC (permalink / raw)
  To: Anton Gavriliuk, linux-nfs

On Tue, 2025-07-22 at 21:10 +0300, Anton Gavriliuk wrote:
> Hi
>
> I am trying to exceed 20 GB/s doing a sequential read from a single
> file on the NFS client.
>
> perf top shows excessive memcpy usage:
>
> [...]
>
> Can the NFS client be adapted to use zero copy, for example by using
> io_uring zero copy rx?

The client has no idea in which order the server will return replies to
the RPC calls it sends. So no, it can't queue up those reply buffers in
advance.

The only way you can avoid memory copies here is to use RDMA to allow
the server to write its replies directly into the correct client read
buffers.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@kernel.org, trond.myklebust@hammerspace.com

^ permalink raw reply	[flat|nested] 4+ messages in thread
* Re: nfs client and io_uring zero copy receive
  2025-07-22 18:43 ` Trond Myklebust
@ 2025-07-22 19:01   ` Anton Gavriliuk
  2025-07-22 19:40     ` Trond Myklebust
  0 siblings, 1 reply; 4+ messages in thread
From: Anton Gavriliuk @ 2025-07-22 19:01 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

> The only way you can avoid memory copies here is to use RDMA to allow
> the server to write its replies directly into the correct client read
> buffers.

I remounted with RDMA:

[root@23-127-77-6 ~]# mount -t nfs -o proto=rdma,nconnect=16,rsize=4194304,wsize=4194304 192.168.0.7:/mnt /mnt
[root@23-127-77-6 ~]# mount -v | grep -i rdma
192.168.0.7:/mnt on /mnt type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,fatal_neterrors=none,proto=rdma,nconnect=16,port=20049,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.8,local_lock=none,addr=192.168.0.7)
[root@23-127-77-6 ~]#

and repeated the sequential read.

According to perf top, memcpy is gone:

Samples: 64K of event 'cycles:P', 4000 Hz, Event count (approx.): 22510217633 lost: 0/0 drop: 0/0
Overhead  Shared Object  Symbol
  13,12%  [nfs]          [k] nfs_generic_pg_test
  11,32%  [nfs]          [k] nfs_page_group_lock
  10,42%  [nfs]          [k] nfs_clear_request
   5,41%  [kernel]       [k] gup_fast_pte_range
   4,11%  [nfs]          [k] nfs_page_group_sync_on_bit
   3,36%  [nfs]          [k] nfs_page_create
   3,13%  [nfs]          [k] __nfs_pageio_add_request
   2,10%  [nfs]          [k] __nfs_find_lock_context

but it didn't improve read bandwidth at all; it is even slightly worse
than with proto=tcp.

Anton

On Tue, 22 Jul 2025 at 21:43, Trond Myklebust <trondmy@kernel.org> wrote:
>
> On Tue, 2025-07-22 at 21:10 +0300, Anton Gavriliuk wrote:
> > [...]
>
> The client has no idea in which order the server will return replies
> to the RPC calls it sends. So no, it can't queue up those reply
> buffers in advance.
>
> The only way you can avoid memory copies here is to use RDMA to allow
> the server to write its replies directly into the correct client read
> buffers.
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trondmy@kernel.org, trond.myklebust@hammerspace.com

^ permalink raw reply	[flat|nested] 4+ messages in thread
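One detail worth flagging in the mount output above: rsize=4194304 was
requested, but the negotiated value is rsize=1048576, most likely
because a stock Linux NFS server caps the transfer size at 1 MB. The
effective per-mount options can be confirmed on the client with:

nfsstat -m

so any bandwidth math here should assume 1 MB of data per READ RPC.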
* Re: nfs client and io_uring zero copy receive
  2025-07-22 19:01 ` Anton Gavriliuk
@ 2025-07-22 19:40   ` Trond Myklebust
  0 siblings, 0 replies; 4+ messages in thread
From: Trond Myklebust @ 2025-07-22 19:40 UTC (permalink / raw)
  To: Anton Gavriliuk; +Cc: linux-nfs

On Tue, 2025-07-22 at 22:01 +0300, Anton Gavriliuk wrote:
> > The only way you can avoid memory copies here is to use RDMA to
> > allow the server to write its replies directly into the correct
> > client read buffers.
>
> I remounted with RDMA:
>
> [root@23-127-77-6 ~]# mount -t nfs -o proto=rdma,nconnect=16,rsize=4194304,wsize=4194304 192.168.0.7:/mnt /mnt
> [root@23-127-77-6 ~]# mount -v | grep -i rdma
> 192.168.0.7:/mnt on /mnt type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,fatal_neterrors=none,proto=rdma,nconnect=16,port=20049,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.8,local_lock=none,addr=192.168.0.7)
> [root@23-127-77-6 ~]#
>
> and repeated the sequential read.
>
> According to perf top, memcpy is gone:
>
> [...]
>
> but it didn't improve read bandwidth at all; it is even slightly
> worse than with proto=tcp.

So that more or less proves that those memcpys were never the root
cause of your performance problem.

I suspect you'll want to look at the server performance. Maybe also
look at the client tunables that limit concurrency, such as the
sunrpc.rdma_slot_table_entries sysctl, or the nfs.max_session_slots
module parameter, etc.

^ permalink raw reply	[flat|nested] 4+ messages in thread
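A sketch of how those two tunables can be inspected and raised on the
client; the paths are the standard ones, but the file name and values
are illustrative, and both settings must be in effect before the mount
is established:

# current values (the xprtrdma and nfs modules must be loaded)
cat /proc/sys/sunrpc/rdma_slot_table_entries
cat /sys/module/nfs/parameters/max_session_slots

# raise the RDMA slot table for subsequent mounts
sysctl -w sunrpc.rdma_slot_table_entries=256

# raise the NFSv4 session slot limit; takes effect when the
# nfs module is next (re)loaded
echo "options nfs max_session_slots=180" > /etc/modprobe.d/nfs-client.conf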
end of thread, other threads:[~2025-07-22 19:40 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --
2025-07-22 18:10 nfs client and io_uring zero copy receive Anton Gavriliuk
2025-07-22 18:43 ` Trond Myklebust
2025-07-22 19:01   ` Anton Gavriliuk
2025-07-22 19:40     ` Trond Myklebust