From: Trond Myklebust <trondmy@kernel.org>
To: Anton Gavriliuk <antosha20xx@gmail.com>
Cc: linux-nfs@vger.kernel.org
Subject: Re: nfs client and io_uring zero copy receive
Date: Tue, 22 Jul 2025 15:40:06 -0400
Message-ID: <60b1e1be9ce67496e8774ccb64e9ff637ab2a75d.camel@kernel.org>
In-Reply-To: <CAAiJnjrmeZUexNkJJmvuUDKvTqvuQhahWY2uFhOgBOmoLrLbLw@mail.gmail.com>

On Tue, 2025-07-22 at 22:01 +0300, Anton Gavriliuk wrote:
> > The only way you can avoid memory copies here is to use RDMA to allow
> > the server to write its replies directly into the correct client read
> > buffers.
>
> I remounted with rdma
>
> [root@23-127-77-6 ~]# mount -t nfs -o proto=rdma,nconnect=16,rsize=4194304,wsize=4194304 192.168.0.7:/mnt /mnt
> [root@23-127-77-6 ~]# mount -v | grep -i rdma
> 192.168.0.7:/mnt on /mnt type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,fatal_neterrors=none,proto=rdma,nconnect=16,port=20049,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.8,local_lock=none,addr=192.168.0.7)
> [root@23-127-77-6 ~]#
>
> and repeated the sequential read.
>
> According to perf top, memcpy is gone,
>
> Samples: 64K of event 'cycles:P', 4000 Hz, Event count (approx.): 22510217633 lost: 0/0 drop: 0/0
> Overhead  Shared Object  Symbol
>   13,12%  [nfs]          [k] nfs_generic_pg_test
>   11,32%  [nfs]          [k] nfs_page_group_lock
>   10,42%  [nfs]          [k] nfs_clear_request
>    5,41%  [kernel]       [k] gup_fast_pte_range
>    4,11%  [nfs]          [k] nfs_page_group_sync_on_bit
>    3,36%  [nfs]          [k] nfs_page_create
>    3,13%  [nfs]          [k] __nfs_pageio_add_request
>    2,10%  [nfs]          [k] __nfs_find_lock_context
>
> but it didn't improve read bandwidth at all. It was even slightly
> worse than with proto=tcp.
So that more or less proves that those memcpys were never the root
cause of your performance problem.
I suspect you'll want to look at the server performance. Maybe also
look at the client tunables that limit concurrency, such as the
sunrpc.rdma_slot_table_entries sysctl, or the nfs.max_session_slots
module parameter, etc.
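
For reference, one way to inspect and adjust those limits from the
client. The knob names are the ones mentioned above; the file paths
and the value 180 are illustrative assumptions, not tuned
recommendations:

```shell
# Sketch only: inspect and raise client-side concurrency limits.

# Current RDMA credit limit (xprtrdma sysctl), if present on this host:
sysctl sunrpc.rdma_slot_table_entries 2>/dev/null || true

# Current NFSv4 session slot limit, if the nfs module is loaded:
cat /sys/module/nfs/parameters/max_session_slots 2>/dev/null || true

# Make a larger session slot count persistent via modprobe.d
# (takes effect on module reload or reboot):
conf="options nfs max_session_slots=180"
echo "$conf"   # append to e.g. /etc/modprobe.d/nfs-client.conf
```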
>
> Anton
>
> On Tue, Jul 22, 2025 at 21:43 Trond Myklebust <trondmy@kernel.org> wrote:
> >
> > On Tue, 2025-07-22 at 21:10 +0300, Anton Gavriliuk wrote:
> > > Hi
> > >
> > > I am trying to exceed 20 GB/s doing sequential read from a single
> > > file
> > > on the nfs client.
> > >
> > > perf top shows excessive memcpy usage:
> > >
> > > Samples: 237K of event 'cycles:P', 4000 Hz, Event count (approx.): 120872739112 lost: 0/0 drop: 0/0
> > > Overhead  Shared Object  Symbol
> > >   20,54%  [kernel]       [k] memcpy
> > >    6,52%  [nfs]          [k] nfs_generic_pg_test
> > >    5,12%  [nfs]          [k] nfs_page_group_lock
> > >    4,92%  [kernel]       [k] _copy_to_iter
> > >    4,79%  [kernel]       [k] gro_list_prepare
> > >    2,77%  [nfs]          [k] nfs_clear_request
> > >    2,10%  [nfs]          [k] __nfs_pageio_add_request
> > >    2,07%  [kernel]       [k] check_heap_object
> > >    2,00%  [kernel]       [k] __slab_free
> > >
> > > Can the NFS client be adapted to use zero copy, for example by
> > > using io_uring zero-copy receive?
> > >
> >
> > The client has no idea in which order the server will return replies
> > to the RPC calls it sends. So no, it can't queue up those reply
> > buffers in advance.
> >
> > The only way you can avoid memory copies here is to use RDMA to allow
> > the server to write its replies directly into the correct client read
> > buffers.
> >
> > --
> > Trond Myklebust
> > Linux NFS client maintainer, Hammerspace
> > trondmy@kernel.org, trond.myklebust@hammerspace.com
Thread overview: 4+ messages
2025-07-22 18:10 nfs client and io_uring zero copy receive Anton Gavriliuk
2025-07-22 18:43 ` Trond Myklebust
2025-07-22 19:01 ` Anton Gavriliuk
2025-07-22 19:40 ` Trond Myklebust [this message]