From: Olivier Matz <olivier.matz@6wind.com>
To: Gregory Etelson <getelson@nvidia.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
Ajit Khaparde <ajit.khaparde@broadcom.com>,
Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>,
Ferruh Yigit <ferruh.yigit@intel.com>,
NBU-Contact-Thomas Monjalon <thomas@monjalon.net>,
"stable@dpdk.org" <stable@dpdk.org>,
Xiaoyun Li <xiaoyun.li@intel.com>
Subject: Re: [dpdk-dev] [PATCH v2] app/testpmd: fix TX checksum calculation for tunnel
Date: Thu, 29 Jul 2021 10:25:30 +0200
Message-ID: <YQJl+tbclXiKDNEA@platinum>
In-Reply-To: <BY5PR12MB4834F4D65732319889E55180A5EA9@BY5PR12MB4834.namprd12.prod.outlook.com>
On Wed, Jul 28, 2021 at 04:07:51PM +0000, Gregory Etelson wrote:
> Hello Oliver,
>
> Please see my comments below
>
> > On Tue, Jul 27, 2021 at 04:07:57PM +0300, Gregory Etelson wrote:
> > > TX checksum of a tunnelled packet can be calculated for outer headers
> > > only or for both outer and inner parts. The calculation method is
> > > determined by application.
> > > If TX checksum calculation can be offloaded, hardware ignores existing
> > > checksum value and replaces it with an updated result.
> >
> > This is not always true. Actually, the checksum value is optionally set by
> > software to the value that is expected by the hardware to offload the
> > checksum correctly. This is done through rte_eth_tx_prepare(), which is called
> > in csumonly test engine.
> >
> > For instance, on an ixgbe NIC, it does:
> >
> > rte_eth_tx_prepare()
> > eth_dev->tx_pkt_prepare()
> > ixgbe_prep_pkts()
> > rte_net_intel_cksum_flags_prepare()
> > if packet is IP, set IP checksum to 0
> > if packet is TCP or UDP, set L4 checksum to the phdr csum
> >
> > This driver-specific rte_eth_tx_prepare() can indeed do nothing and let the
> > hardware ignore the checksum in the packet.
> >
>
> You are right. I'll update the patch comment in v3.
>
> > > If TX checksum is calculated by a software, existing value must be
> > > zeroed first.
> > > The testpmd checksum forwarding engine always zeroed inner checksums.
> > > If inner checksum calculation was offloaded, that header was left with
> > > 0 checksum value.
> > > Following outer software checksum calculation produced wrong value.
> > > The patch zeroes inner IPv4 checksum only before software calculation.
> >
> > Sorry, I think I don't understand the issue. Are you trying to compute the inner
> > checksum by hardware and the outer checksum by software?
> >
>
> Correct. Inner checksum is offloaded and outer computed in software.

I don't think this approach is sound: the value of the outer checksum
depends on the inner checksum, so the outer one has to be calculated
after the inner one. There is a comment in the code about this:

	/* Then process outer headers if any. Note that the software
	 * checksum will be wrong if one of the inner checksums is
	 * processed in hardware. */
	if (info.is_tunnel == 1) {
		tx_ol_flags |= process_outer_cksums(outer_l3_hdr, &info,
				tx_offloads,
				!!(tx_ol_flags & PKT_TX_TCP_SEG));
	}

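To illustrate the dependency, here is a minimal, self-contained sketch
(not testpmd code). It uses a toy 8-byte buffer in which bytes 2..3 stand
in for an inner checksum field. The helper names inet_cksum() and
outer_cksum_depends_on_inner() are hypothetical, and the byte values are
made up. The software sum over the buffer changes once the field is
filled in afterwards, which is what happens when hardware writes the
inner checksum after the outer one was computed in software:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 style Internet checksum over len bytes, read as
 * big-endian 16-bit words. Returns the complemented, folded sum. */
static uint16_t
inet_cksum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(buf != NULL);
	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)((buf[i] << 8) | buf[i + 1]);
	if (len & 1)
		sum += (uint32_t)(buf[len - 1] << 8);
	/* fold the carries back into the low 16 bits */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Toy "outer payload": bytes 2..3 play the role of an inner checksum
 * field that is zero when the outer checksum is computed in software
 * and only gets its final value later (as with a hardware offload).
 * Returns 1 if the two outer checksums differ. */
static int
outer_cksum_depends_on_inner(void)
{
	uint8_t payload[8] = { 0x45, 0x00, 0x00, 0x00,
			       0x12, 0x34, 0x56, 0x78 };
	uint16_t with_zero, with_final;

	with_zero = inet_cksum(payload, sizeof(payload));
	/* made-up final value of the inner checksum */
	payload[2] = 0xab;
	payload[3] = 0xcd;
	with_final = inet_cksum(payload, sizeof(payload));
	return with_zero != with_final;
}
```

With these values the two sums are 0x5253 and 0xa685, so the helper
returns 1.
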
> Consider this example:
> Tunneled packet arrived at port A and being forwarded through port B.
> The packet arrived at port A with correct inner checksums - L3 and L4.
> Port B TX offloads inner L3 only.
>
> process_inner_cksums() sets "ipv4_hdr->hdr_checksum = 0;" unconditionally.
> Inner L3 checksum value will be restored by port B TX checksum offload, but when
> process_outer_cksums() runs software calculation on outer L4 it will use 0 and produce wrong result.
>
> Therefore, the patch zeros inner checksum values only before actual software calculations.

I understand your use case better now, thanks.

However, with your patch, if the inner L4 checksum is wrong when it
arrives on port A, I think it will result in a packet with a wrong outer
L4 checksum and a correct inner L4 checksum. Is that what you expect?

I don't argue against the patch itself. What you suggest better matches
the offload API than what we have today. Can you please send another
version that better explains the use case?

One more suggestion, maybe for later. Currently, the csumonly engine can
be configured to do the checksum in sw or in hw. Maybe we could add a
"dont-touch" option, to keep the value in the packet. Would it help for
your use case?

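A rough sketch of what I mean, purely for illustration. Nothing below
exists in testpmd: the enum, the flag, the toy header and
apply_csum_mode() are all hypothetical names, and the "checksum" is a
toy function rather than a real protocol checksum. The point is only
that a third mode would neither rewrite the field nor request an
offload:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-checksum mode. Only "sw" and "hw" exist today. */
enum csum_mode {
	CSUM_MODE_SW,
	CSUM_MODE_HW,
	CSUM_MODE_DONT_TOUCH,
};

/* Hypothetical stand-in for an offload flag such as PKT_TX_IP_CKSUM. */
#define TOY_TX_CKSUM_FLAG (1ULL << 0)

/* Toy header: one covered field and one checksum field. */
struct toy_hdr {
	uint16_t data;
	uint16_t cksum;
};

/* Toy software checksum: complement of the covered field. */
static uint16_t
toy_sw_cksum(const struct toy_hdr *h)
{
	return (uint16_t)~h->data;
}

/* Apply the mode to the header and return the offload flags to set. */
static uint64_t
apply_csum_mode(enum csum_mode mode, struct toy_hdr *h)
{
	assert(h != NULL);
	switch (mode) {
	case CSUM_MODE_HW:
		/* request the offload and leave the field alone, the
		 * driver's tx prepare step may still adjust it */
		return TOY_TX_CKSUM_FLAG;
	case CSUM_MODE_SW:
		/* zero the field only right before computing in software */
		h->cksum = 0;
		h->cksum = toy_sw_cksum(h);
		return 0;
	case CSUM_MODE_DONT_TOUCH:
	default:
		/* keep whatever value arrived in the packet */
		return 0;
	}
}
```

In "dont-touch" mode a checksum that was already wrong on reception
stays wrong, which may or may not be what a given test wants.
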
>
> > > Fixes: 51f694dd40f5 ("app/testpmd: rework checksum forward engine")
> >
> > I'm not sure the problem origin is this commit (however, I may have
> > misunderstood your issue).
> >
> > At the time this commit was done, it was required to set the TCP/UDP
> > checksum to the pseudo header checksum to offload an L4 checksum. See:
> > https://git.dpdk.org/dpdk/tree/lib/librte_mbuf/rte_mbuf.h?id=51f694dd40f5
> > #n107
> >
> > The introduction of rte_eth_tx_prepare() API removed this need, see:
> > https://git.dpdk.org/dpdk/commit/?id=6b520d54ebfe

Just a reminder for this one.

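For reference, the pseudo-header checksum mentioned in my earlier
comment can be sketched as below. This is a standalone illustration, not
the DPDK implementation: toy_ipv4_phdr_cksum() is a hypothetical name,
and I am assuming the hardware wants the folded, non-complemented sum of
the IPv4 pseudo-header, which is my understanding of what
rte_ipv4_phdr_cksum() provides. The addresses in the usage note are
made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Folded, non-complemented one's complement sum of an IPv4
 * pseudo-header: source address, destination address, a zero byte,
 * the protocol number and the L4 length. Addresses are passed as
 * 4 bytes in network order. The result is a host-order value.
 */
static uint16_t
toy_ipv4_phdr_cksum(const uint8_t src[4], const uint8_t dst[4],
		uint8_t proto, uint16_t l4_len)
{
	uint32_t sum = 0;

	assert(src != NULL && dst != NULL);
	sum += (uint32_t)((src[0] << 8) | src[1]);
	sum += (uint32_t)((src[2] << 8) | src[3]);
	sum += (uint32_t)((dst[0] << 8) | dst[1]);
	sum += (uint32_t)((dst[2] << 8) | dst[3]);
	sum += proto;	/* zero byte followed by the protocol number */
	sum += l4_len;	/* L4 header plus payload, in bytes */
	/* fold the carries back into the low 16 bits */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}
```

For 192.168.0.1 -> 192.168.0.2, UDP (17) and an L4 length of 8, this
gives 0x816d.
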
Thanks,
Olivier
> > Thanks,
> > Olivier
> >
> > > Cc: stable@dpdk.org
> > >
> > > Signed-off-by: Gregory Etelson <getelson@nvidia.com>
> > > ---
> > > v2:
> > > remove blank line between Fixes and Cc
> > > explicitly compare with 0 value in `if ()`
> > > ---
> > > app/test-pmd/csumonly.c | 23 ++++++++++++-----------
> > > 1 file changed, 12 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
> > > index 0161f72175..bd5ad64a57 100644
> > > --- a/app/test-pmd/csumonly.c
> > > +++ b/app/test-pmd/csumonly.c
> > > @@ -480,17 +480,18 @@ process_inner_cksums(void *l3_hdr, const struct testpmd_offload_info *info,
> > >
> > > if (info->ethertype == _htons(RTE_ETHER_TYPE_IPV4)) {
> > > ipv4_hdr = l3_hdr;
> > > - ipv4_hdr->hdr_checksum = 0;
> > >
> > > ol_flags |= PKT_TX_IPV4;
> > > if (info->l4_proto == IPPROTO_TCP && tso_segsz) {
> > > ol_flags |= PKT_TX_IP_CKSUM;
> > > } else {
> > > - if (tx_offloads & DEV_TX_OFFLOAD_IPV4_CKSUM)
> > > + if (tx_offloads & DEV_TX_OFFLOAD_IPV4_CKSUM) {
> > > ol_flags |= PKT_TX_IP_CKSUM;
> > > - else
> > > + } else if (ipv4_hdr->hdr_checksum != 0) {
> > > + ipv4_hdr->hdr_checksum = 0;
> > > ipv4_hdr->hdr_checksum =
> > > rte_ipv4_cksum(ipv4_hdr);
> > > + }
> > > }
> > > } else if (info->ethertype == _htons(RTE_ETHER_TYPE_IPV6))
> > > ol_flags |= PKT_TX_IPV6;
> > > @@ -501,10 +502,10 @@ process_inner_cksums(void *l3_hdr, const struct testpmd_offload_info *info,
> > > udp_hdr = (struct rte_udp_hdr *)((char *)l3_hdr + info->l3_len);
> > > /* do not recalculate udp cksum if it was 0 */
> > > if (udp_hdr->dgram_cksum != 0) {
> > > - udp_hdr->dgram_cksum = 0;
> > > - if (tx_offloads & DEV_TX_OFFLOAD_UDP_CKSUM)
> > > + if (tx_offloads & DEV_TX_OFFLOAD_UDP_CKSUM) {
> > > ol_flags |= PKT_TX_UDP_CKSUM;
> > > - else {
> > > + } else {
> > > + udp_hdr->dgram_cksum = 0;
> > > udp_hdr->dgram_cksum =
> > > get_udptcp_checksum(l3_hdr, udp_hdr,
> > > info->ethertype);
> > > @@ -514,12 +515,12 @@ process_inner_cksums(void *l3_hdr, const struct testpmd_offload_info *info,
> > > ol_flags |= PKT_TX_UDP_SEG;
> > > } else if (info->l4_proto == IPPROTO_TCP) {
> > > tcp_hdr = (struct rte_tcp_hdr *)((char *)l3_hdr + info->l3_len);
> > > - tcp_hdr->cksum = 0;
> > > if (tso_segsz)
> > > ol_flags |= PKT_TX_TCP_SEG;
> > > - else if (tx_offloads & DEV_TX_OFFLOAD_TCP_CKSUM)
> > > + else if (tx_offloads & DEV_TX_OFFLOAD_TCP_CKSUM) {
> > > ol_flags |= PKT_TX_TCP_CKSUM;
> > > - else {
> > > + } else if (tcp_hdr->cksum != 0) {
> > > + tcp_hdr->cksum = 0;
> > > tcp_hdr->cksum =
> > > get_udptcp_checksum(l3_hdr, tcp_hdr,
> > > info->ethertype);
> > > @@ -529,13 +530,13 @@ process_inner_cksums(void *l3_hdr, const struct testpmd_offload_info *info,
> > > } else if (info->l4_proto == IPPROTO_SCTP) {
> > > sctp_hdr = (struct rte_sctp_hdr *)
> > > ((char *)l3_hdr + info->l3_len);
> > > - sctp_hdr->cksum = 0;
> > > /* sctp payload must be a multiple of 4 to be
> > > * offloaded */
> > > if ((tx_offloads & DEV_TX_OFFLOAD_SCTP_CKSUM) &&
> > > ((ipv4_hdr->total_length & 0x3) == 0)) {
> > > ol_flags |= PKT_TX_SCTP_CKSUM;
> > > - } else {
> > > + } else if (sctp_hdr->cksum != 0) {
> > > + sctp_hdr->cksum = 0;
> > > /* XXX implement CRC32c, example available in
> > > * RFC3309 */
> > > }
> > > --
> > > 2.32.0
> > >