From: Siddharth Vadapalli <s-vadapalli@ti.com>
To: <peter.ujfalusi@gmail.com>, <vkoul@kernel.org>,
<Frank.Li@kernel.org>, <andrew+netdev@lunn.ch>,
<davem@davemloft.net>, <edumazet@google.com>, <kuba@kernel.org>,
<pabeni@redhat.com>, <nm@ti.com>, <ssantosh@kernel.org>,
<horms@kernel.org>, <c-vankar@ti.com>, <mwalle@kernel.org>
Cc: <dmaengine@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
<netdev@vger.kernel.org>, <linux-arm-kernel@lists.infradead.org>,
<danishanwar@ti.com>, <srk@ti.com>, <s-vadapalli@ti.com>
Subject: [RFC PATCH 0/6] Descriptor Recycling and Batch processing for CPSW
Date: Wed, 25 Mar 2026 18:08:36 +0530
Message-ID: <20260325123850.638748-1-s-vadapalli@ti.com>
Hello,
NOTE for MAINTAINERS:
Patches in this series span three subsystems. I have posted them as a
single RFC series to make it easy for reviewers to understand the
complete implementation. I will eventually split the series and post
the parts sequentially to the respective subsystem mailing lists:
1. SoC
2. DMAEngine
3. Netdev
The series is based on commit
d1e59a469737 ("tcp: add cwnd_event_tx_start to tcp_congestion_ops")
on the main branch of the net-next tree. When I split the series in the
future, I shall base the patches for SoC and DMAEngine on linux-next
and the patches for Netdev on net-next.
This series enables batch processing in the am65-cpsw-nuss.c driver
on the transmit path (ndo_start_xmit and ndo_xdp_xmit) and the transmit
completion path. It also recycles descriptors instead of releasing them
to the pool and reallocating them. The resulting difference in memory
footprint is hardly noticeable (under 1 MB).
Feedback on the implementation w.r.t. correctness, ease of use /
maintenance, and configurability (a sysfs-based option for changing the
batch size) is appreciated.
The series has been tested in the following combinations to cover edge
cases:
1. Single-Port (CPSW2G on J784S4-EVM)
2. Multi-Port (CPSW3G on AM625-SK)
3. Bidirectional TCP Iperf followed by interfaces being brought down
with traffic in flight (and TX / RX DMA Channel Teardown) followed
by interfaces being brought up and ensuring that Iperf traffic
resumes.
The primary motivation for this series is to improve performance in
terms of lowering the CPU load and achieving higher throughput for
gigabit and multi-gigabit operation.
The upcoming features that I plan to implement are:
1. Batch processing on RX.
2. Batch processing on ICSSG, similar to CPSW (since batch processing
   increases latency, it might not be desirable there and could be
   skipped).
The following sections capture the improvements brought about by this
series.
[1] AM625-SK with CPSW3G (multi-port / two netdevs) and single A53
processor (remaining CPUs are disabled) with each MAC Port operating
at 1 Gbps Full-Duplex.
===========================================================================
Baseline for [1]
===========================================================================
Dual TX Iperf UDP traffic at 100% CPU Load averaged over 30 seconds:
403 Mbps + 408 Mbps = 811 Mbps
Dual RX Iperf TCP traffic at 100% CPU Load averaged over 30 seconds:
336 Mbps + 331 Mbps = 667 Mbps
===========================================================================
With this series for [1]
===========================================================================
Dual TX Iperf UDP traffic at 100% CPU Load averaged over 30 seconds:
428 Mbps + 437 Mbps = 865 Mbps
Dual RX Iperf TCP traffic at 100% CPU Load averaged over 30 seconds:
332 Mbps + 337 Mbps = 669 Mbps
[2] J784S4-EVM with CPSW2G (single-port) and single A72 processor
(remaining CPUs are disabled) with the MAC Port operating at 1 Gbps Full-
Duplex.
===========================================================================
Baseline for [2]
===========================================================================
TX Iperf UDP traffic at 84% CPU Load averaged over 30 seconds:
956 Mbps
RX Iperf TCP traffic at 100% CPU Load averaged over 30 seconds:
941 Mbps
===========================================================================
With this series for [2]
===========================================================================
TX Iperf UDP traffic at 80% CPU Load averaged over 30 seconds:
956 Mbps
RX Iperf TCP traffic at 100% CPU Load averaged over 30 seconds:
941 Mbps
[3] J784S4-EVM with CPSW9G (multi-port) and single A72 processor
(remaining CPUs are disabled) with one MAC Port operating at 5 Gbps
Full-Duplex.
===========================================================================
Baseline for [3]
===========================================================================
TX Iperf UDP traffic at 100% CPU Load averaged over 30 seconds:
1.26 Gbps
RX Iperf TCP traffic at 75% CPU Load averaged over 30 seconds:
1.73 Gbps
===========================================================================
With this series for [3]
===========================================================================
TX Iperf UDP traffic at 100% CPU Load averaged over 30 seconds:
1.28 Gbps
RX Iperf TCP traffic at 75% CPU Load averaged over 30 seconds:
1.75 Gbps
Regards,
Siddharth.
Siddharth Vadapalli (6):
soc: ti: k3-ringacc: Add helper to get realtime count of free elements
soc: ti: k3-ringacc: Add helpers for batch push and pop operations
dmaengine: ti: k3-udma-glue: Add helpers for batch operations on TX/RX
DMA
net: ethernet: ti: am65-cpsw-nuss: Do not set buf_type for SKB
fragments
net: ethernet: ti: am65-cpsw-nuss: Recycle TX and RX CPPI Descriptors
net: ethernet: ti: am65-cpsw-nuss: Enable batch processing for TX / TX
CMPL
drivers/dma/ti/k3-udma-glue.c | 55 +++
drivers/net/ethernet/ti/am65-cpsw-nuss.c | 441 +++++++++++++++++++----
drivers/net/ethernet/ti/am65-cpsw-nuss.h | 31 ++
drivers/soc/ti/k3-ringacc.c | 99 +++++
include/linux/dma/k3-udma-glue.h | 12 +
include/linux/soc/ti/k3-ringacc.h | 35 ++
6 files changed, 612 insertions(+), 61 deletions(-)
--
2.51.1