From: Stanislav Fomichev <stfomichev@gmail.com>
To: Mina Almasry <almasrymina@google.com>
Cc: Dragos Tatulea <dtatulea@nvidia.com>,
	Tariq Toukan <tariqt@nvidia.com>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Eric Dumazet <edumazet@google.com>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	Saeed Mahameed <saeedm@nvidia.com>,
	Leon Romanovsky <leon@kernel.org>,
	Richard Cochran <richardcochran@gmail.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Jesper Dangaard Brouer <hawk@kernel.org>,
	John Fastabend <john.fastabend@gmail.com>,
	netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-kernel@vger.kernel.org, bpf@vger.kernel.org,
	Moshe Shemesh <moshe@nvidia.com>, Mark Bloch <mbloch@nvidia.com>,
	Gal Pressman <gal@nvidia.com>, Cosmin Ratiu <cratiu@nvidia.com>
Subject: Re: [PATCH net-next V2 00/11] net/mlx5e: Add support for devmem and io_uring TCP zero-copy
Date: Wed, 28 May 2025 16:04:18 -0700
Message-ID: <aDeWcntZgm7Je8TZ@mini-arch>
In-Reply-To: <CAHS8izMhCm1+UzmWK2Ju+hbA5U-7OYUcHpdd8yEuQEux3QZ74A@mail.gmail.com>

On 05/28, Mina Almasry wrote:
> On Wed, May 28, 2025 at 8:45 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
> >
> > On 05/28, Dragos Tatulea wrote:
> > > On Tue, May 27, 2025 at 09:05:49AM -0700, Stanislav Fomichev wrote:
> > > > On 05/23, Tariq Toukan wrote:
> > > > > This series from the team adds support for zerocopy rx TCP with devmem
> > > > > and io_uring for ConnectX7 NICs and above. For performance reasons and
> > > > > simplicity HW-GRO will also be turned on when header-data split mode is
> > > > > on.
> > > > >
> > > > > Find more details below.
> > > > >
> > > > > Regards,
> > > > > Tariq
> > > > >
> > > > > Performance
> > > > > ===========
> > > > >
> > > > > Test setup:
> > > > >
> > > > > * CPU: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (single NUMA)
> > > > > * NIC: ConnectX7
> > > > > * Benchmarking tool: kperf [1]
> > > > > * Single TCP flow
> > > > > * Test duration: 60s
> > > > >
> > > > > With application thread and interrupts pinned to the *same* core:
> > > > >
> > > > > |------+-----------+----------|
> > > > > | MTU  | epoll     | io_uring |
> > > > > |------+-----------+----------|
> > > > > | 1500 | 61.6 Gbps | 114 Gbps |
> > > > > | 4096 | 69.3 Gbps | 151 Gbps |
> > > > > | 9000 | 67.8 Gbps | 187 Gbps |
> > > > > |------+-----------+----------|
> > > > >
> > > > > The CPU usage for io_uring is 95%.
> > > > >
> > > > > Reproduction steps for io_uring:
> > > > >
> > > > > server --no-daemon -a 2001:db8::1 --no-memcmp --iou --iou_sendzc \
> > > > >         --iou_zcrx --iou_dev_name eth2 --iou_zcrx_queue_id 2
> > > > >
> > > > > server --no-daemon -a 2001:db8::2 --no-memcmp --iou --iou_sendzc
> > > > >
> > > > > client --src 2001:db8::2 --dst 2001:db8::1 \
> > > > >         --msg-zerocopy -t 60 --cpu-min=2 --cpu-max=2
> > > > >
> > > > > Patch overview:
> > > > > ================
> > > > >
> > > > > First, a netmem API for skb_can_coalesce is added to the core to be able
> > > > > to do skb fragment coalescing on netmems.
> > > > >
> > > > > The next patches introduce some cleanups in the internal SHAMPO code and
> > > > > improvements to hw gro capability checks in FW.
> > > > >
> > > > > A separate page_pool is introduced for headers. Ethtool stats are added
> > > > > as well.
> > > > >
> > > > > Then the driver is converted to use the netmem API and to allow support
> > > > > for unreadable netmem page pool.
> > > > >
> > > > > The queue management ops are implemented.
> > > > >
> > > > > Finally, the tcp-data-split ring parameter is exposed.
> > > > >
> > > > > Changelog
> > > > > =========
> > > > >
> > > > > Changes from v1 [0]:
> > > > > - Added support for skb_can_coalesce_netmem().
> > > > > - Avoid netmem_to_page() casts in the driver.
> > > > > - Fixed code to abide 80 char limit with some exceptions to avoid
> > > > > code churn.
> > > >
> > > > Since there is gonna be 2-3 weeks of closed net-next, can you
> > > > also add a patch for the tx side? It should be trivial (skip dma unmap
> > > > for niovs in tx completions plus netdev->netmem_tx=1).
> > > >
> > > Seems indeed trivial. We will add it.
> > >
> > > > And, btw, what about the issue that Cosmin raised in [0]? Is it addressed
> > > > in this series?
> > > >
> > > > 0: https://lore.kernel.org/netdev/9322c3c4826ed1072ddc9a2103cc641060665864.camel@nvidia.com/
> > > We wanted to fix this afterwards as it needs to change a more subtle
> > > part in the code that replenishes pages. This needs more thinking and
> > > testing.
> >
> > Thanks! For my understanding: does the issue occur only during initial
> > queue refill? Or the same problem will happen any time there is a burst
> > of traffic that might exhaust all rx descriptors?
> >
> 
> Minor: a burst in traffic likely won't reproduce this case, I'm sure
> mlx5 can drive the hardware to line rate consistently. It's more if
> the machine is under extreme memory pressure, I think,
> page_pool_alloc_pages and friends may return ENOMEM, which reproduces
> the same edge case as the dma-buf being extremely small which also
> makes page_pool_alloc_netmems return -ENOMEM.

What I want to understand is whether the kernel/driver will oops when the
dmabuf runs out of buffers after initial setup. The cause doesn't matter:
a traffic burst, userspace being slow on refill, or both.
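
To make the scenario concrete, here is a toy userspace model of it. This is
only a sketch and not mlx5 or page_pool code: every name in it (toy_pool,
toy_rx_ring, toy_rx_refill and so on) is hypothetical, and the pool is just a
counter standing in for a small dmabuf. It shows the behaviour being asked
about: when the allocator fails in the middle of a refill, the ring records
the failure, stays partially filled, and recovers on a later refill once
buffers are returned.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a small dmabuf-backed pool: a free-buffer count. */
struct toy_pool {
	size_t free_bufs;
};

/* Hypothetical RX ring: tracks how many descriptors have a buffer posted. */
struct toy_rx_ring {
	size_t size;           /* total descriptors in the ring */
	size_t posted;         /* descriptors that currently hold a buffer */
	size_t alloc_failures; /* refill attempts that hit an empty pool */
};

/* Take one buffer; returning false models the allocator failing. */
static bool toy_pool_alloc(struct toy_pool *pool)
{
	if (pool->free_bufs == 0)
		return false;
	pool->free_bufs--;
	return true;
}

/* Models userspace handing n buffers back to the pool. */
static void toy_pool_release(struct toy_pool *pool, size_t n)
{
	pool->free_bufs += n;
}

/*
 * Post buffers until the ring is full or the pool is empty.
 * On allocation failure, count it and stop; the caller retries later.
 * Returns the number of buffers posted by this call.
 */
static size_t toy_rx_refill(struct toy_rx_ring *ring, struct toy_pool *pool)
{
	size_t added = 0;

	while (ring->posted < ring->size) {
		if (!toy_pool_alloc(pool)) {
			ring->alloc_failures++;
			break;
		}
		ring->posted++;
		added++;
	}
	return added;
}

/*
 * Models n packets arriving. Only posted descriptors can receive data,
 * so anything beyond ring->posted would be dropped, not written anywhere.
 * Returns the number of packets actually received.
 */
static size_t toy_rx_consume(struct toy_rx_ring *ring, size_t n)
{
	size_t used = n < ring->posted ? n : ring->posted;

	ring->posted -= used;
	return used;
}
```

In this model an exhausted pool only costs dropped packets and a retry. The
question for the real driver is whether every path that replenishes
descriptors tolerates the same partial refill.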
