public inbox for linux-nfs@vger.kernel.org
From: Trond Myklebust <trondmy@kernel.org>
To: Anton Gavriliuk <antosha20xx@gmail.com>
Cc: linux-nfs@vger.kernel.org
Subject: Re: nfs client and io_uring zero copy receive
Date: Tue, 22 Jul 2025 15:40:06 -0400	[thread overview]
Message-ID: <60b1e1be9ce67496e8774ccb64e9ff637ab2a75d.camel@kernel.org> (raw)
In-Reply-To: <CAAiJnjrmeZUexNkJJmvuUDKvTqvuQhahWY2uFhOgBOmoLrLbLw@mail.gmail.com>

On Tue, 2025-07-22 at 22:01 +0300, Anton Gavriliuk wrote:
> > The only way you can avoid memory copies here is to use RDMA to allow
> > the server to write its replies directly into the correct client read
> > buffers.
> 
> I remounted with rdma
> 
> [root@23-127-77-6 ~]# mount -t nfs -o proto=rdma,nconnect=16,rsize=4194304,wsize=4194304 192.168.0.7:/mnt /mnt
> [root@23-127-77-6 ~]# mount -v|grep -i rdma
> 192.168.0.7:/mnt on /mnt type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,fatal_neterrors=none,proto=rdma,nconnect=16,port=20049,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.8,local_lock=none,addr=192.168.0.7)
> [root@23-127-77-6 ~]#
> 
> and repeated the sequential read.
> 
> According to perf top, memcpy is gone,
> 
> Samples: 64K of event 'cycles:P', 4000 Hz, Event count (approx.): 22510217633 lost: 0/0 drop: 0/0
> Overhead  Shared Object                      Symbol
>   13,12%  [nfs]                              [k] nfs_generic_pg_test
>   11,32%  [nfs]                              [k] nfs_page_group_lock
>   10,42%  [nfs]                              [k] nfs_clear_request
>    5,41%  [kernel]                           [k] gup_fast_pte_range
>    4,11%  [nfs]                              [k] nfs_page_group_sync_on_bit
>    3,36%  [nfs]                              [k] nfs_page_create
>    3,13%  [nfs]                              [k] __nfs_pageio_add_request
>    2,10%  [nfs]                              [k] __nfs_find_lock_context
> 
> but it didn't improve read bandwidth at all; it was even slightly
> worse than with proto=tcp.

So that more or less proves that those memcpys were never the root
cause of your performance problem.

I suspect you'll want to look at the server performance. Maybe also
look at the client tunables that limit concurrency, such as the
sunrpc.rdma_slot_table_entries sysctl, or the nfs.max_session_slots
module parameter, etc.
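For reference, those limits can be read from the usual procfs/sysfs locations, and the product of connections × RPC slots × rsize gives a rough upper bound on how much read data the client can keep in flight. A sketch (the 128-slot RDMA default is an assumption; nconnect and the negotiated rsize are taken from the mount output above, where the requested 4 MiB rsize was capped at 1 MiB):

```shell
# Inspect the client-side concurrency limits (paths on a typical kernel):
#   cat /proc/sys/sunrpc/rdma_slot_table_entries
#   cat /sys/module/nfs/parameters/max_session_slots
#
# Rough upper bound on outstanding READ data:
#   connections x RPC slots per connection x bytes per READ (rsize)
nconnect=16        # from the mount command above
slots=128          # assumed rdma_slot_table_entries default
rsize=1048576      # negotiated rsize from the mount output
echo $(( nconnect * slots * rsize ))   # bytes potentially in flight at once
```

If that bound already far exceeds the link's bandwidth-delay product, raising the slot counts alone is unlikely to help, which again points at the server side.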

> 
> Anton
> 
> вт, 22 июл. 2025 г. в 21:43, Trond Myklebust <trondmy@kernel.org>:
> > 
> > On Tue, 2025-07-22 at 21:10 +0300, Anton Gavriliuk wrote:
> > > Hi
> > > 
> > > I am trying to exceed 20 GB/s doing sequential read from a single
> > > file on the nfs client.
> > > 
> > > perf top shows excessive memcpy usage:
> > > 
> > > Samples: 237K of event 'cycles:P', 4000 Hz, Event count (approx.): 120872739112 lost: 0/0 drop: 0/0
> > > Overhead  Shared Object                      Symbol
> > >   20,54%  [kernel]                           [k] memcpy
> > >    6,52%  [nfs]                              [k] nfs_generic_pg_test
> > >    5,12%  [nfs]                              [k] nfs_page_group_lock
> > >    4,92%  [kernel]                           [k] _copy_to_iter
> > >    4,79%  [kernel]                           [k] gro_list_prepare
> > >    2,77%  [nfs]                              [k] nfs_clear_request
> > >    2,10%  [nfs]                              [k] __nfs_pageio_add_request
> > >    2,07%  [kernel]                           [k] check_heap_object
> > >    2,00%  [kernel]                           [k] __slab_free
> > > 
> > > Can the NFS client be adapted to use zero copy, for example by
> > > using io_uring zero-copy rx?
> > > 
> > 
> > The client has no idea in which order the server will return replies
> > to the RPC calls it sends. So no, it can't queue up those reply
> > buffers in advance.
> > 
> > The only way you can avoid memory copies here is to use RDMA to allow
> > the server to write its replies directly into the correct client read
> > buffers.
> > 
> > --
> > Trond Myklebust
> > Linux NFS client maintainer, Hammerspace
> > trondmy@kernel.org, trond.myklebust@hammerspace.com


Thread overview: 4+ messages
2025-07-22 18:10 nfs client and io_uring zero copy receive Anton Gavriliuk
2025-07-22 18:43 ` Trond Myklebust
2025-07-22 19:01   ` Anton Gavriliuk
2025-07-22 19:40     ` Trond Myklebust [this message]
