From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Neal Cardwell <ncardwell@google.com>,
Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>,
Willem de Bruijn <willemb@google.com>,
netdev@vger.kernel.org
Subject: Re: [TEST] tcp_zerocopy_maxfrags.pkt fails
Date: Tue, 25 Nov 2025 14:49:00 -0500
Message-ID: <willemdebruijn.kernel.39fa9d8834471@gmail.com>
In-Reply-To: <CADVnQym7Whnbc9xf_dew-ey1fGFBY1dSf6RJ=9qLNP=u+NYOEw@mail.gmail.com>
Neal Cardwell wrote:
> On Mon, Nov 24, 2025 at 11:33 AM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > Jakub Kicinski wrote:
> > > Hi Willem!
> > >
> > > I migrated netdev CI to our own infra now, and the slightly faster,
> > > Fedora-based system is failing tcp_zerocopy_maxfrags.pkt:
> > >
> > > # tcp_zerocopy_maxfrags.pkt:56: error handling packet: incorrect outbound data payload
> > > # script packet: 1.000237 P. 36:37(1) ack 1
> > > # actual packet: 1.000235 P. 36:37(1) ack 1 win 1050
> > > # not ok 1 ipv4
> > > # tcp_zerocopy_maxfrags.pkt:56: error handling packet: incorrect outbound data payload
> > > # script packet: 1.000209 P. 36:37(1) ack 1
> > > # actual packet: 1.000208 P. 36:37(1) ack 1 win 1050
> > > # not ok 2 ipv6
> > > # # Totals: pass:0 fail:2 xfail:0 xpass:0 skip:0 error:0
> > >
> > > https://netdev-ctrl.bots.linux.dev/logs/vmksft/packetdrill/results/399942/13-tcp-zerocopy-maxfrags-pkt/stdout
> > >
> > > This happens on both debug and non-debug kernel (tho on the former
> > > the failure is masked due to MACHINE_SLOW).
> >
> > That's an odd error.
> >
> > The test sends a msg_iov of 18 1-byte fragments, and verifies that
> > only 17 fit in one packet, followed by a single 1-byte packet. The
> > test does not explicitly initialize the payload, but trusts
> > packetdrill to handle that. Relevant snippet below.
> >
> > Packetdrill complains about payload contents. That error is only
> > generated by the below check in run_packet.c. Pretty straightforward.
> >
> > Packetdrill agrees that the packet is one byte long. The win argument
> > is optional on outgoing packets, not relevant to the failure.
> >
> > So somehow the data in that frag got overwritten in the short window
> > between when it was injected into the kernel and when it was observed?
> > Seems so unlikely.
> >
> > Sorry, I'm a bit at a loss at least initially as to the cause.
>
> I agree this is odd. It looks like either a very concerning kernel
> bug, or very concerning packetdrill bug. :-)
>
> Could someone please run the test with tcpdump in the background to
> capture the full packet contents, to verify that indeed the packet has
> the wrong contents?
>
> This would help make sure that this is a kernel bug and not a
> packetdrill bug. :-)

I'm not able to reproduce this on my own machine with the latest nn,
but I could reproduce it on the netdev machine.

I assume all payload is supposed to be zeroed. And indeed the packet
seen has a non-zero single byte of payload: 0x60.

Is there any chance that this happens only on some kernel with
unsubmitted patches, and not on netdev-nn/main on this machine?

----
tcp_zerocopy_maxfrags.pkt:56: error handling packet: incorrect
outbound data payload
script packet: 1.000169 P. 36:37(1) ack 1
actual packet: 1.000167 P. 36:37(1) ack 1 win 1050
14:42:01.330694 tun0 Out IP6 fd3d:a0b:17d6::1.webcache >
fd3d:fa7b:d17d::1.50901: Flags [P.], seq 19:36, ack 1, win 1050,
length 17: HTTP
0x0000: 6000 842c 0025 0640 fd3d 0a0b 17d6 0000
0x0010: 0000 0000 0000 0001 fd3d fa7b d17d 0000
0x0020: 0000 0000 0000 0001 1f90 c6d5 f7fe 05e9
0x0030: 0000 0001 5018 041a e883 0000 0000 0000
0x0040: 0000 0000 0000 0000 0000 0000 00
14:42:01.330723 tun0 In IP6 fd3d:fa7b:d17d::1.50901 >
fd3d:a0b:17d6::1.webcache: Flags [.], ack 36, win 257, length 0
0x0000: 6000 0000 0014 06ff fd3d fa7b d17d 0000
0x0010: 0000 0000 0000 0001 fd3d 0a0b 17d6 0000
0x0020: 0000 0000 0000 0001 c6d5 1f90 0000 0001
0x0030: f7fe 05fa 5010 0101 e21b 0000
14:42:01.330727 tun0 Out IP6 fd3d:a0b:17d6::1.webcache >
fd3d:fa7b:d17d::1.50901: Flags [P.], seq 36:37, ack 1, win 1050,
length 1: HTTP
0x0000: 6000 842c 0015 0640 fd3d 0a0b 17d6 0000
0x0010: 0000 0000 0000 0001 fd3d fa7b d17d 0000
0x0020: 0000 0000 0000 0001 1f90 c6d5 f7fe 05fa
0x0030: 0000 0001 5018 041a e873 0000 60
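
To double check the reading of the last hex dump, here is a minimal Python
sketch that extracts the TCP payload from those bytes. It assumes the dump
starts at the IPv6 header and that there are no extension headers; the
helper name tcp_payload_from_ipv6 is made up for illustration and is not
part of packetdrill or tcpdump.

```python
import struct

# Hex dump of the final packet (seq 36:37), copied from the tcpdump
# output above with the offset column removed.
HEX_DUMP = """
6000 842c 0015 0640 fd3d 0a0b 17d6 0000
0000 0000 0000 0001 fd3d fa7b d17d 0000
0000 0000 0000 0001 1f90 c6d5 f7fe 05fa
0000 0001 5018 041a e873 0000 60
"""

IPV6_HEADER_LEN = 40  # fixed IPv6 header size
IPPROTO_TCP = 6


def tcp_payload_from_ipv6(raw: bytes) -> bytes:
    """Return the TCP payload of a raw IPv6 packet (hypothetical helper).

    Assumes TCP follows the fixed IPv6 header directly.
    """
    # Bytes 4-5: payload length, byte 6: next header.
    payload_len, next_header = struct.unpack_from("!HB", raw, 4)
    if raw[0] >> 4 != 6 or next_header != IPPROTO_TCP:
        raise ValueError("not a plain IPv6/TCP packet")
    tcp = raw[IPV6_HEADER_LEN:IPV6_HEADER_LEN + payload_len]
    # Upper nibble of TCP byte 12 is the data offset in 32-bit words.
    data_offset = (tcp[12] >> 4) * 4
    return tcp[data_offset:]


raw = bytes.fromhex("".join(HEX_DUMP.split()))
payload = tcp_payload_from_ipv6(raw)
print(payload.hex())  # prints 60
```

The IPv6 payload length is 0x0015 (21 bytes) and the TCP data offset is 5
words (20 bytes), which leaves exactly one byte of data: 0x60.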