* nfs client and io_uring zero copy receive
@ 2025-07-22 18:10 Anton Gavriliuk
From: Anton Gavriliuk @ 2025-07-22 18:10 UTC
  To: linux-nfs

Hi

I am trying to exceed 20 GB/s doing sequential reads from a single
file on the NFS client.
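
For context, a workload of this shape can be driven with an fio
invocation along these lines (an illustrative sketch, not the exact
command behind the numbers below):

fio --name=seqread --filename=/mnt/testfile --rw=read --bs=1M \
    --direct=1 --ioengine=io_uring --iodepth=64 --numjobs=8 \
    --size=32G --group_reporting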

perf top shows excessive memcpy usage:

Samples: 237K of event 'cycles:P', 4000 Hz, Event count (approx.):
120872739112 lost: 0/0 drop: 0/0
Overhead  Shared Object                      Symbol
  20,54%  [kernel]                           [k] memcpy
   6,52%  [nfs]                              [k] nfs_generic_pg_test
   5,12%  [nfs]                              [k] nfs_page_group_lock
   4,92%  [kernel]                           [k] _copy_to_iter
   4,79%  [kernel]                           [k] gro_list_prepare
   2,77%  [nfs]                              [k] nfs_clear_request
   2,10%  [nfs]                              [k] __nfs_pageio_add_request
   2,07%  [kernel]                           [k] check_heap_object
   2,00%  [kernel]                           [k] __slab_free

Can the NFS client be adapted to use zero copy, for example by using
io_uring zero-copy receive?

Anton


* Re: nfs client and io_uring zero copy receive
From: Trond Myklebust @ 2025-07-22 18:43 UTC
  To: Anton Gavriliuk, linux-nfs

On Tue, 2025-07-22 at 21:10 +0300, Anton Gavriliuk wrote:
> Hi
> 
> I am trying to exceed 20 GB/s doing sequential reads from a single
> file on the NFS client.
> 
> perf top shows excessive memcpy usage:
> 
> Samples: 237K of event 'cycles:P', 4000 Hz, Event count (approx.):
> 120872739112 lost: 0/0 drop: 0/0
> Overhead  Shared Object                      Symbol
>   20,54%  [kernel]                           [k] memcpy
>    6,52%  [nfs]                              [k] nfs_generic_pg_test
>    5,12%  [nfs]                              [k] nfs_page_group_lock
>    4,92%  [kernel]                           [k] _copy_to_iter
>    4,79%  [kernel]                           [k] gro_list_prepare
>    2,77%  [nfs]                              [k] nfs_clear_request
>    2,10%  [nfs]                              [k] __nfs_pageio_add_request
>    2,07%  [kernel]                           [k] check_heap_object
>    2,00%  [kernel]                           [k] __slab_free
> 
> Can the NFS client be adapted to use zero copy, for example by using
> io_uring zero-copy receive?
> 

The client has no idea in which order the server will return replies to
the RPC calls it sends; a reply can be matched to its request (and hence
to the right read buffer) only after the XID in the RPC header has been
parsed. So no, it can't queue up those reply buffers in advance.

The only way you can avoid memory copies here is to use RDMA to allow
the server to write its replies directly into the correct client read
buffers.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@kernel.org, trond.myklebust@hammerspace.com


* Re: nfs client and io_uring zero copy receive
From: Anton Gavriliuk @ 2025-07-22 19:01 UTC
  To: Trond Myklebust; +Cc: linux-nfs

> The only way you can avoid memory copies here is to use RDMA to allow
> the server to write its replies directly into the correct client read
> buffers.

I remounted with rdma

[root@23-127-77-6 ~]# mount -t nfs -o
proto=rdma,nconnect=16,rsize=4194304,wsize=4194304 192.168.0.7:/mnt
/mnt
[root@23-127-77-6 ~]# mount -v|grep -i rdma
192.168.0.7:/mnt on /mnt type nfs4
(rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,fatal_neterrors=none,proto=rdma,nconnect=16,port=20049,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.8,local_lock=none,addr=192.168.0.7)
[root@23-127-77-6 ~]#

and repeated the sequential read.

According to perf top, the memcpy is gone:

Samples: 64K of event 'cycles:P', 4000 Hz, Event count (approx.):
22510217633 lost: 0/0 drop: 0/0
Overhead  Shared Object                      Symbol
  13,12%  [nfs]                              [k] nfs_generic_pg_test
  11,32%  [nfs]                              [k] nfs_page_group_lock
  10,42%  [nfs]                              [k] nfs_clear_request
   5,41%  [kernel]                           [k] gup_fast_pte_range
   4,11%  [nfs]                              [k] nfs_page_group_sync_on_bit
   3,36%  [nfs]                              [k] nfs_page_create
   3,13%  [nfs]                              [k] __nfs_pageio_add_request
   2,10%  [nfs]                              [k] __nfs_find_lock_context

but it didn't improve read bandwidth at all; it was even slightly worse
than with proto=tcp.

Anton


* Re: nfs client and io_uring zero copy receive
From: Trond Myklebust @ 2025-07-22 19:40 UTC
  To: Anton Gavriliuk; +Cc: linux-nfs

On Tue, 2025-07-22 at 22:01 +0300, Anton Gavriliuk wrote:
> > The only way you can avoid memory copies here is to use RDMA to allow
> > the server to write its replies directly into the correct client read
> > buffers.
> 
> I remounted with rdma
> 
> [root@23-127-77-6 ~]# mount -t nfs -o
> proto=rdma,nconnect=16,rsize=4194304,wsize=4194304 192.168.0.7:/mnt
> /mnt
> [root@23-127-77-6 ~]# mount -v|grep -i rdma
> 192.168.0.7:/mnt on /mnt type nfs4
> (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,fatal_neterrors=none,proto=rdma,nconnect=16,port=20049,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.8,local_lock=none,addr=192.168.0.7)
> [root@23-127-77-6 ~]#
> 
> and repeated the sequential read.
> 
> According to perf top, the memcpy is gone:
> 
> Samples: 64K of event 'cycles:P', 4000 Hz, Event count (approx.):
> 22510217633 lost: 0/0 drop: 0/0
> Overhead  Shared Object                      Symbol
>   13,12%  [nfs]                              [k] nfs_generic_pg_test
>   11,32%  [nfs]                              [k] nfs_page_group_lock
>   10,42%  [nfs]                              [k] nfs_clear_request
>    5,41%  [kernel]                           [k] gup_fast_pte_range
>    4,11%  [nfs]                              [k] nfs_page_group_sync_on_bit
>    3,36%  [nfs]                              [k] nfs_page_create
>    3,13%  [nfs]                              [k] __nfs_pageio_add_request
>    2,10%  [nfs]                              [k] __nfs_find_lock_context
> 
> but it didn't improve read bandwidth at all; it was even slightly
> worse than with proto=tcp.

So that more or less proves that those memcpys were never the root
cause of your performance problem.

I suspect you'll want to look at the server performance. Maybe also
look at the client tunables that limit concurrency, such as the
sunrpc.rdma_slot_table_entries sysctl, or the nfs.max_session_slots
module parameter, etc.
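
A sketch of how one might inspect and raise those limits; the values
are illustrative rather than recommendations, and the modprobe.d file
name is arbitrary. The slot table setting only applies to transports
created after the change (so remount), and the module parameter takes
effect when the nfs module is next loaded:

# current values
sysctl sunrpc.rdma_slot_table_entries
cat /sys/module/nfs/parameters/max_session_slots

# example: raise both, then remount
sysctl -w sunrpc.rdma_slot_table_entries=256
echo "options nfs max_session_slots=256" > /etc/modprobe.d/nfs-client.conf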

