dev.dpdk.org archive mirror
* Re: [URGENT] Critical Issue: Chelsio NIC (cxgbe) fails to send Zero-Copy packets and provides incomplete HW offload in DPDK
From: Stephen Hemminger @ 2025-10-06 12:47 UTC
  To: bloodyevil; +Cc: dev

Please put this in a bugzilla report.


On Mon, Oct 6, 2025, 12:00 bloodyevil <bloodyevil@163.com> wrote:

> *Dear DPDK Development Team and Chelsio Support Team,*
>
> We are writing to report two severe, fundamental issues we've encountered
> while using Chelsio T5/T6 series NICs (with the cxgbe PMD) in our
> high-performance real-time audio streaming application. These problems
> prevent us from leveraging core DPDK features and require your urgent
> attention.
>
> *Issue #1: Complete Failure to Transmit Zero-Copy (Multi-Segment) Packets*
>
> To achieve the lowest latency, we use the standard DPDK zero-copy
> mechanism: an external shared-memory buffer (from an rte_memzone) holding
> our audio payload is attached to an mbuf with rte_pktmbuf_attach_extbuf,
> and that mbuf is chained after the mbuf carrying the packet headers. The
> result is a *multi-segment mbuf* (nb_segs > 1).
>
> However, we have found that the *cxgbe driver is completely unable to
> transmit these multi-segment mbufs*. Every attempt to send such a packet
> with rte_eth_tx_burst fails (it returns 0, or the packet is silently
> dropped and never appears on the wire), regardless of whether hardware
> offloads are enabled or disabled. Consistent with this, the cxgbe PMD's
> capability report (tx_offload_capa) indeed *does not include* the
> RTE_ETH_TX_OFFLOAD_MULTI_SEGS flag.
>
> This means that for the cxgbe driver, the standard zero-copy path in DPDK
> is entirely non-functional.
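The conservative way for an application to handle a port like this is to compare what it wants against what the port advertises, and to pass chained mbufs (nb_segs > 1) to rte_eth_tx_burst() only when RTE_ETH_TX_OFFLOAD_MULTI_SEGS is both advertised in tx_offload_capa (from rte_eth_dev_info_get()) and enabled in rte_eth_conf.txmode.offloads. The plain-C sketch below models only that decision; the enum and function names are hypothetical, and the flag values are supplied by the caller rather than hard-coded.

```c
#include <stdint.h>

/* Hypothetical result type: how a packet should be handed to the PMD. */
enum tx_path {
    TX_AS_IS,            /* pass the mbuf chain to rte_eth_tx_burst() unchanged */
    TX_LINEARIZE_FIRST   /* make the packet contiguous before sending */
};

/* Requested Tx offload bits that the port does not advertise.
 * In DPDK, `capa` would be dev_info.tx_offload_capa from
 * rte_eth_dev_info_get() and `wanted` an OR of RTE_ETH_TX_OFFLOAD_* flags. */
static uint64_t missing_tx_offloads(uint64_t capa, uint64_t wanted)
{
    return wanted & ~capa;
}

/* `enabled` is the offload mask actually configured on the port
 * (rte_eth_conf.txmode.offloads); `multi_segs_flag` stands in for
 * RTE_ETH_TX_OFFLOAD_MULTI_SEGS. Single-segment packets always go out
 * as-is; chained packets only when the multi-segment offload is enabled. */
static enum tx_path choose_tx_path(uint64_t enabled, uint64_t multi_segs_flag,
                                   uint16_t nb_segs)
{
    if (nb_segs <= 1 || (enabled & multi_segs_flag) != 0)
        return TX_AS_IS;
    return TX_LINEARIZE_FIRST;
}
```

In a DPDK application the TX_LINEARIZE_FIRST branch would typically call rte_pktmbuf_linearize() or copy into a single-segment mbuf, which is the "single-copy" path described next.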
>
> *Issue #2: Incomplete Hardware Checksum Offload*
>
> Forced to abandon zero-copy, we implemented a "single-copy" workaround by
> using rte_memcpy to create a contiguous, single-segment mbuf. While this
> allows packets to be transmitted, we discovered a second critical issue: *the
> hardware checksum offload functionality is incomplete*.
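The single-copy workaround amounts to gathering every segment of the chain into one contiguous buffer. The sketch below shows that gather in plain C; struct seg is a hypothetical, simplified stand-in for an mbuf chain (not struct rte_mbuf), where each segment points at data it does not own, such as a header buffer and a payload region inside a shared memzone. DPDK's own rte_pktmbuf_linearize() performs a similar gather in place when the first segment has enough tailroom.

```c
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical model of a segment chain (not struct rte_mbuf). */
struct seg {
    const void *data;        /* bytes owned elsewhere, e.g. a memzone */
    size_t len;              /* number of bytes in this segment */
    const struct seg *next;  /* next segment, or NULL at the end */
};

/* Copy every segment back to back into `out`, as the rte_memcpy-based
 * workaround does. Returns the number of bytes written, or 0 if the frame
 * does not fit in `cap` bytes (or the chain is empty). */
static size_t linearize(const struct seg *head, unsigned char *out, size_t cap)
{
    size_t total = 0;
    for (const struct seg *s = head; s != NULL; s = s->next)
        total += s->len;
    if (total > cap)
        return 0;

    size_t off = 0;
    for (const struct seg *s = head; s != NULL; s = s->next) {
        memcpy(out + off, s->data, s->len);
        off += s->len;
    }
    return off;
}
```

Every payload byte is touched once more than on the zero-copy path, which is the CPU cost the report objects to.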
>
> Specifically:
>
>    1. We set the full offload flags on the mbuf:
>       m->ol_flags |= RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM |
>       RTE_MBUF_F_TX_UDP_CKSUM;
>    2. We zero out both the IP and UDP checksum fields in their respective
>       headers before transmission.
>    3. Packet captures reveal that only the *UDP checksum* is correctly
>       calculated and filled in by the hardware. The *IP checksum* field
>       remains zeroed, causing the packet to be treated as invalid and
>       dropped by the network.
>    4. For the packet to be transmitted successfully and accepted by the
>       receiver, we are forced to *manually calculate the IP checksum in
>       software* (ip_h->hdr_checksum = rte_ipv4_cksum(ip_h);) while keeping
>       the UDP checksum field zeroed for hardware offload.
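Steps 3 and 4 both revolve around the IPv4 header checksum, the RFC 1071 one's-complement sum over the header. The stand-alone sketch below computes it on raw header bytes, which is what the software fallback with rte_ipv4_cksum() does, and also checks a captured header, since a valid header sums to zero when the stored checksum is included. The function names are hypothetical and the code works on network-order bytes rather than on struct rte_ipv4_hdr.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over `len` bytes, reading 16-bit words in
 * network (big-endian) order. The result is meant to be stored most
 * significant byte first. */
static uint16_t inet_cksum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)p[i] << 8 | p[i + 1];
    if (len & 1)
        sum += (uint32_t)p[len - 1] << 8;     /* pad an odd byte with zero */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);   /* fold carries back in */
    return (uint16_t)~sum;
}

/* Software fallback: zero bytes 10-11 of a raw IPv4 header, then store the
 * checksum computed over the IHL-sized header. */
static void ipv4_set_cksum(uint8_t *ip)
{
    size_t ihl = (size_t)(ip[0] & 0x0f) * 4;
    ip[10] = ip[11] = 0;
    uint16_t c = inet_cksum(ip, ihl);
    ip[10] = (uint8_t)(c >> 8);
    ip[11] = (uint8_t)(c & 0xff);
}

/* Capture check: a header is valid when summing it, including the stored
 * checksum, yields zero. A zeroed checksum field fails this test. */
static int ipv4_cksum_ok(const uint8_t *ip)
{
    size_t ihl = (size_t)(ip[0] & 0x0f) * 4;
    return inet_cksum(ip, ihl) == 0;
}
```

When the hardware path is used instead, DPDK's mbuf documentation for RTE_MBUF_F_TX_IP_CKSUM also asks the application to fill in m->l2_len and m->l3_len along with zeroing the IP checksum field, since PMDs rely on those lengths to locate the headers.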
>
> This proves that although the cxgbe PMD reports support for
> RTE_ETH_TX_OFFLOAD_IPV4_CKSUM, it *does not actually perform IP checksum
> offloading* in practice.
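The other half of the observation, that the UDP checksum written by the NIC is correct, can be confirmed on captured frames as well. The UDP checksum over IPv4 also covers a pseudo-header (source address, destination address, a zero byte, protocol 17, UDP length). The sketch below computes and checks it on raw bytes; the function names are hypothetical, and it assumes the buffer holds the whole datagram. For comparison, DPDK's generic mbuf documentation for RTE_MBUF_F_TX_UDP_CKSUM has the application seed the UDP checksum field with the pseudo-header checksum (rte_ipv4_phdr_cksum()), whereas the report zeroes the field, which evidently works on this NIC.

```c
#include <stddef.h>
#include <stdint.h>

/* Add 16-bit big-endian words of `p` to `sum` without folding. */
static uint32_t sum16(const uint8_t *p, size_t len, uint32_t sum)
{
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)p[i] << 8 | p[i + 1];
    if (len & 1)
        sum += (uint32_t)p[len - 1] << 8;   /* pad an odd byte with zero */
    return sum;
}

/* Checksum of a UDP-over-IPv4 datagram including the pseudo-header.
 * `ip` is the raw IPv4 header, `udp` the UDP header followed by payload;
 * the UDP length is read from the UDP header. With the checksum field
 * zeroed this returns the value to store; over a datagram whose field is
 * already filled in, a correct checksum makes it return 0. */
static uint16_t udp4_cksum(const uint8_t *ip, const uint8_t *udp)
{
    uint16_t udp_len = (uint16_t)(udp[4] << 8 | udp[5]);
    uint32_t sum = 0;
    sum = sum16(ip + 12, 8, sum);     /* source + destination address */
    sum += 17;                        /* zero byte + protocol (UDP) */
    sum += udp_len;                   /* UDP length */
    sum = sum16(udp, udp_len, sum);   /* UDP header + payload */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Capture check. A zero field means the sender computed no checksum
 * (permitted for UDP over IPv4), so it is reported as not verified. */
static int udp4_cksum_ok(const uint8_t *ip, const uint8_t *udp)
{
    if (udp[6] == 0 && udp[7] == 0)
        return 0;
    return udp4_cksum(ip, udp) == 0;
}
```

A sender that computes a checksum of 0 transmits 0xFFFF instead, which this check also accepts.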
>
> For contrast, we must emphasize that both of the core functions described
> above, zero-copy (multi-segment mbuf) transmission and full (IP+UDP)
> hardware checksum offload, work perfectly on the same testbed with Intel
> NICs (igc/i40e) and the onboard Realtek (RTL8125) controller. This strongly
> suggests that the issues are specific to the Chelsio cxgbe PMD.
>
> *Our Dilemma*
>
> These two issues leave us in an untenable position:
>
>    - The ideal *zero-copy path is completely broken*, preventing us from
>      realizing a primary performance benefit of DPDK.
>    - The fallback *single-copy path is highly inefficient*: it incurs not
>      only the CPU cost of a memcpy but also the overhead of computing the
>      IP checksum in software, largely defeating the purpose of hardware
>      offloads.
>
> *Our Questions*
>
> We urgently need your help to clarify the following:
>
>    - Is this behavior of the cxgbe PMD (offloading only the UDP checksum)
>      consistent with the design expectations for a PMD in DPDK?
>    - Does the DPDK framework provide any debugging mechanism to trace why
>      an explicitly set offload flag (RTE_MBUF_F_TX_IP_CKSUM) would be
>      ignored by a PMD without reporting an error?
>
> Resolving these issues is critical to the success of our project. Any
> information or guidance you can provide would be greatly appreciated.
>
> *Our Environment*
>
>    - *DPDK Version:* 25.07
>    - *NIC Model:* Chelsio T520-CR
>    - *OS & Kernel:* Tinycore64 16.0, kernel 6.6.63
>
> Thank you for your time and attention to this urgent matter. We look
> forward to your response.
>
> Best regards,
