[TEST] tcp_zerocopy_maxfrags.pkt fails

* [TEST] tcp_zerocopy_maxfrags.pkt fails
@ 2025-11-24 15:18 Jakub Kicinski
  2025-11-24 16:29 ` Willem de Bruijn
  0 siblings, 1 reply; 6+ messages in thread
From: Jakub Kicinski @ 2025-11-24 15:18 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: netdev

Hi Willem!

I migrated netdev CI to our own infra now, and the slightly faster,
Fedora-based system is failing tcp_zerocopy_maxfrags.pkt:

# tcp_zerocopy_maxfrags.pkt:56: error handling packet: incorrect outbound data payload
# script packet:  1.000237 P. 36:37(1) ack 1 
# actual packet:  1.000235 P. 36:37(1) ack 1 win 1050 
# not ok 1 ipv4
# tcp_zerocopy_maxfrags.pkt:56: error handling packet: incorrect outbound data payload
# script packet:  1.000209 P. 36:37(1) ack 1 
# actual packet:  1.000208 P. 36:37(1) ack 1 win 1050 
# not ok 2 ipv6
# # Totals: pass:0 fail:2 xfail:0 xpass:0 skip:0 error:0

https://netdev-ctrl.bots.linux.dev/logs/vmksft/packetdrill/results/399942/13-tcp-zerocopy-maxfrags-pkt/stdout

This happens on both debug and non-debug kernel (tho on the former 
the failure is masked due to MACHINE_SLOW).


* Re: [TEST] tcp_zerocopy_maxfrags.pkt fails
  2025-11-24 15:18 [TEST] tcp_zerocopy_maxfrags.pkt fails Jakub Kicinski
@ 2025-11-24 16:29 ` Willem de Bruijn
  2025-11-24 16:38   ` Neal Cardwell
  0 siblings, 1 reply; 6+ messages in thread
From: Willem de Bruijn @ 2025-11-24 16:29 UTC (permalink / raw)
  To: Jakub Kicinski, Willem de Bruijn; +Cc: netdev

That's an odd error.

The test sends an msg_iov of 18 one-byte fragments and verifies that
only 17 fit in one packet, followed by a single one-byte packet. The
test does not explicitly initialize the payload, but trusts packetdrill
to handle that. Relevant snippet below.

Packetdrill complains about payload contents. That error is only
generated by the below check in run_packet.c. Pretty straightforward.

Packetdrill agrees that the packet is one byte long. The win argument
is optional on outgoing packets, not relevant to the failure.

So somehow the data in that frag got overwritten in the short window
between when it was injected into the kernel and when it was observed?
Seems so unlikely.

Sorry, I'm a bit at a loss at least initially as to the cause.

----

   // send a zerocopy iov of 18 elements:
   +1 sendmsg(4, {msg_name(...)=...,
                  msg_iov(18)=[{..., 1}, {..., 1}, {..., 1}, {..., 1},
                               {..., 1}, {..., 1}, {..., 1}, {..., 1},
                               {..., 1}, {..., 1}, {..., 1}, {..., 1},
                               {..., 1}, {..., 1}, {..., 1}, {..., 1},
                               {..., 1}, {..., 1}],
                  msg_flags=0}, MSG_ZEROCOPY) = 18

   // verify that it is split in one skb of 17 frags + 1 of 1 frag
   // verify that both have the PSH bit set
   +0 > P. 19:36(17) ack 1
   +0 < . 1:1(0) ack 36 win 257

   +0 > P. 36:37(1) ack 1
   +0 < . 1:1(0) ack 37 win 257

----

/* Verify TCP/UDP payload matches expected value. */
static int verify_outbound_live_payload(
        struct packet *actual_packet,
        struct packet *script_packet, char **error)
{
        /* Diff the TCP/UDP data payloads. We've already implicitly
         * checked their length by checking the IP and TCP/UDP headers.
         */
        assert(packet_payload_len(actual_packet) ==
               packet_payload_len(script_packet));
        if (memcmp(packet_payload(script_packet),
                   packet_payload(actual_packet),
                   packet_payload_len(script_packet)) != 0) {
                asprintf(error, "incorrect outbound data payload");
                return STATUS_ERR;
        }
        return STATUS_OK;
}
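
As a worked example of the split that the script lines above expect, the
sequence ranges can be computed by hand. This is only a sketch: the limit
of 17 frags per skb is taken from the script comment, and the names
SKETCH_MAX_FRAGS, struct seg and split_one_byte_frags are hypothetical,
not kernel or packetdrill identifiers.

```c
/* Assumption for illustration: at most 17 page frags per skb, matching
 * the "17 frags + 1 frag" split described in the test script. */
#define SKETCH_MAX_FRAGS 17

struct seg {
        unsigned int seq_start; /* first sequence number in the segment */
        unsigned int seq_end;   /* one past the last sequence number */
};

/* Split nr_iov one-byte iov elements into segments that each hold at
 * most SKETCH_MAX_FRAGS bytes. Returns the number of segments written
 * to out (never more than max_out).
 */
static int split_one_byte_frags(unsigned int first_seq, int nr_iov,
                                struct seg *out, int max_out)
{
        int n = 0;

        while (nr_iov > 0 && n < max_out) {
                int take = nr_iov < SKETCH_MAX_FRAGS ? nr_iov
                                                     : SKETCH_MAX_FRAGS;

                out[n].seq_start = first_seq;
                out[n].seq_end = first_seq + take;
                first_seq += take;
                nr_iov -= take;
                n++;
        }
        return n;
}
```

Calling split_one_byte_frags(19, 18, out, 4) yields two segments, 19:36
and 36:37, which are the two outbound packets in the script.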



* Re: [TEST] tcp_zerocopy_maxfrags.pkt fails
  2025-11-24 16:29 ` Willem de Bruijn
@ 2025-11-24 16:38   ` Neal Cardwell
  2025-11-25 19:49     ` Willem de Bruijn
  0 siblings, 1 reply; 6+ messages in thread
From: Neal Cardwell @ 2025-11-24 16:38 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Jakub Kicinski, Willem de Bruijn, netdev

I agree this is odd. It looks like either a very concerning kernel
bug, or very concerning packetdrill bug. :-)

Could someone please run the test with tcpdump in the background to
capture the full packet contents, to verify that the packet indeed has
the wrong contents?

This would help make sure that this is a kernel bug and not a
packetdrill bug. :-)

thanks,
neal


* Re: [TEST] tcp_zerocopy_maxfrags.pkt fails
  2025-11-24 16:38   ` Neal Cardwell
@ 2025-11-25 19:49     ` Willem de Bruijn
  2025-11-25 20:31       ` Neal Cardwell
  0 siblings, 1 reply; 6+ messages in thread
From: Willem de Bruijn @ 2025-11-25 19:49 UTC (permalink / raw)
  To: Neal Cardwell, Willem de Bruijn; +Cc: Jakub Kicinski, Willem de Bruijn, netdev

I'm not able to reproduce this on my own machine with the latest net-next,
but I could reproduce it on the netdev machine.

I assume all payload is supposed to be zeroed. And indeed the packet
seen has a non-zero single byte of payload: 0x60.

Is there any chance that this happens only on a kernel with
unsubmitted patches, and not on netdev-nn/main on this machine?

----

tcp_zerocopy_maxfrags.pkt:56: error handling packet: incorrect outbound data payload
script packet:  1.000169 P. 36:37(1) ack 1
actual packet:  1.000167 P. 36:37(1) ack 1 win 1050

14:42:01.330694 tun0  Out IP6 fd3d:a0b:17d6::1.webcache > fd3d:fa7b:d17d::1.50901: Flags [P.], seq 19:36, ack 1, win 1050, length 17: HTTP
        0x0000:  6000 842c 0025 0640 fd3d 0a0b 17d6 0000
        0x0010:  0000 0000 0000 0001 fd3d fa7b d17d 0000
        0x0020:  0000 0000 0000 0001 1f90 c6d5 f7fe 05e9
        0x0030:  0000 0001 5018 041a e883 0000 0000 0000
        0x0040:  0000 0000 0000 0000 0000 0000 00
14:42:01.330723 tun0  In  IP6 fd3d:fa7b:d17d::1.50901 > fd3d:a0b:17d6::1.webcache: Flags [.], ack 36, win 257, length 0
        0x0000:  6000 0000 0014 06ff fd3d fa7b d17d 0000
        0x0010:  0000 0000 0000 0001 fd3d 0a0b 17d6 0000
        0x0020:  0000 0000 0000 0001 c6d5 1f90 0000 0001
        0x0030:  f7fe 05fa 5010 0101 e21b 0000
14:42:01.330727 tun0  Out IP6 fd3d:a0b:17d6::1.webcache > fd3d:fa7b:d17d::1.50901: Flags [P.], seq 36:37, ack 1, win 1050, length 1: HTTP
        0x0000:  6000 842c 0015 0640 fd3d 0a0b 17d6 0000
        0x0010:  0000 0000 0000 0001 fd3d fa7b d17d 0000
        0x0020:  0000 0000 0000 0001 1f90 c6d5 f7fe 05fa
        0x0030:  0000 0001 5018 041a e873 0000 60
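
To show where the 0x60 byte sits in that last packet, the hex dump can
be decoded by hand. The sketch below copies the bytes of the final
outbound packet from the dump above and locates the TCP payload. It
assumes a plain 40-byte IPv6 header with no extension headers; the names
last_pkt, struct tcp_view and locate_tcp_payload are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes of the final outbound packet, copied from the tcpdump output. */
static const uint8_t last_pkt[] = {
        0x60, 0x00, 0x84, 0x2c, 0x00, 0x15, 0x06, 0x40,
        0xfd, 0x3d, 0x0a, 0x0b, 0x17, 0xd6, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01,
        0xfd, 0x3d, 0xfa, 0x7b, 0xd1, 0x7d, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01,
        0x1f, 0x90, 0xc6, 0xd5, 0xf7, 0xfe, 0x05, 0xfa,
        0x00, 0x00, 0x00, 0x01, 0x50, 0x18, 0x04, 0x1a,
        0xe8, 0x73, 0x00, 0x00, 0x60,
};

struct tcp_view {
        size_t payload_off;  /* offset of the TCP payload in the packet */
        size_t payload_len;  /* TCP payload length in bytes */
        uint16_t window;     /* raw TCP window field */
};

/* Locate the TCP payload of an IPv6/TCP packet.
 * Assumption: no IPv6 extension headers (next header must be TCP).
 * Returns 0 on success, -1 if the packet does not match.
 */
static int locate_tcp_payload(const uint8_t *pkt, size_t len,
                              struct tcp_view *v)
{
        const size_t ip6_hdr_len = 40;
        const uint8_t *tcp;
        size_t ip6_payload, tcp_hdr_len;

        if (len < ip6_hdr_len + 20)
                return -1;
        if ((pkt[0] >> 4) != 6 || pkt[6] != 6) /* IP version 6, TCP */
                return -1;
        ip6_payload = ((size_t)pkt[4] << 8) | pkt[5];
        if (ip6_hdr_len + ip6_payload > len)
                return -1;
        tcp = pkt + ip6_hdr_len;
        tcp_hdr_len = (size_t)(tcp[12] >> 4) * 4; /* data offset in words */
        if (tcp_hdr_len < 20 || tcp_hdr_len > ip6_payload)
                return -1;
        v->payload_off = ip6_hdr_len + tcp_hdr_len;
        v->payload_len = ip6_payload - tcp_hdr_len;
        v->window = (uint16_t)((tcp[14] << 8) | tcp[15]);
        return 0;
}
```

For last_pkt the IPv6 payload length is 0x0015 = 21 bytes, the TCP
header is 20 bytes, so the payload is the single byte at offset 60,
which is 0x60. That is also the value of the first byte of each IPv6
header in the capture (version 6 in the upper nibble).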






* Re: [TEST] tcp_zerocopy_maxfrags.pkt fails
  2025-11-25 19:49     ` Willem de Bruijn
@ 2025-11-25 20:31       ` Neal Cardwell
  2025-11-25 20:44         ` Willem de Bruijn
  0 siblings, 1 reply; 6+ messages in thread
From: Neal Cardwell @ 2025-11-25 20:31 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Jakub Kicinski, Willem de Bruijn, netdev

Looking at the tests in tools/testing/selftests/net/packetdrill/, I
don't see anything that sets the --send_omit_free packetdrill flag.
That flag is needed for TCP zero copy tests, to ensure that
packetdrill doesn't free the send() buffer after the send() call.

Because the test didn't use the --send_omit_free flag, packetdrill
freed the buffer. And the memory probably got reused before the
transmit. Perhaps for an IPv6 packet, whose first byte is 0x60, and
thus what was transmitted was the garbage 0x60.

Does that sound plausible, Willem? If you agree, do you have cycles to
cook a commit of some kind to fix this?

One option is to put the --send_omit_free flag near the top of the
tools/testing/selftests/net/packetdrill/tcp_zerocopy_maxfrags.pkt
script.

Thanks!

neal


* Re: [TEST] tcp_zerocopy_maxfrags.pkt fails
  2025-11-25 20:31       ` Neal Cardwell
@ 2025-11-25 20:44         ` Willem de Bruijn
  0 siblings, 0 replies; 6+ messages in thread
From: Willem de Bruijn @ 2025-11-25 20:44 UTC (permalink / raw)
  To: Neal Cardwell, Willem de Bruijn; +Cc: Jakub Kicinski, Willem de Bruijn, netdev

Thanks Neal!

I verified that adding the flag fixes the failure, and that our original
Google-internal runner passes that flag on the command line, only for
these zerocopy tests.

I can send a fix.

Only, the ipv4 test appears to be failing with a different error.
Equally surprising. It times out just waiting for the SYNACK.

    ./ksft_runner.sh tcp_zerocopy_maxfrags.pkt
    TAP version 13
    1..2
    tcp_zerocopy_maxfrags.pkt:25: error handling packet: Timed out waiting for packet

Which corresponds with the last line in this snippet.

    0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
   +0 setsockopt(3, SOL_SOCKET, SO_ZEROCOPY, [1], 4) = 0

   // Each pinned zerocopy page is fully accounted to skb->truesize.
   // This test generates a worst case packet with each frag storing
   // one byte, but increasing truesize with a page (64KB on PPC).
   +0 setsockopt(3, SOL_SOCKET, SO_SNDBUF, [2000000], 4) = 0

   +0 bind(3, ..., ...) = 0
   +0 listen(3, 1) = 0

   +0 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
   +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
