Re: [TEST] txtimestamp.sh pains after netdev foundation migration

public inbox for netdev@vger.kernel.org

From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
	 Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>,
	 "netdev@vger.kernel.org" <netdev@vger.kernel.org>
Subject: Re: [TEST] txtimestamp.sh pains after netdev foundation migration
Date: Sun, 11 Jan 2026 22:28:39 -0500
Message-ID: <willemdebruijn.kernel.311e0b9ad88f0@gmail.com>
In-Reply-To: <willemdebruijn.kernel.555dd45f2e96@gmail.com>

Willem de Bruijn wrote:
> Willem de Bruijn wrote:
> > Jakub Kicinski wrote:
> > > On Thu, 08 Jan 2026 14:02:15 -0500 Willem de Bruijn wrote:
> > > > Increasing tolerance should work.
> > > > 
> > > > The current values are a pragmatic compromise: low enough to
> > > > minimize total test runtime, but high enough to avoid flakes. Well..
> > > > 
> > > > If increasing tolerance, we also need to increase the time the test
> > > > waits for all notifications to arrive, cfg_sleep_usec.
> > > 
> > > To be clear, the theory is that we got scheduled out between taking the
> > > USR timestamp and sending the packet. But once the packet is in the
> > > kernel it seems to flow, so AFAIU cfg_sleep_usec can remain untouched.
> > > 
> > > Thinking about it more - maybe what blocks us is the print? Maybe under
> > > vng there's a non-trivial chance that a print to stderr ends up
> > > blocking on serial and schedules us out? I mean maybe we should:
> > > 
> > > diff --git a/tools/testing/selftests/net/txtimestamp.c b/tools/testing/selftests/net/txtimestamp.c
> > > index abcec47ec2e6..e2273fdff495 100644
> > > --- a/tools/testing/selftests/net/txtimestamp.c
> > > +++ b/tools/testing/selftests/net/txtimestamp.c
> > > @@ -207,12 +207,10 @@ static void __print_timestamp(const char *name, struct timespec *cur,
> > >         fprintf(stderr, "\n");
> > >  }
> > >  
> > > -static void print_timestamp_usr(void)
> > > +static void record_timestamp_usr(void)
> > >  {
> > >         if (clock_gettime(CLOCK_REALTIME, &ts_usr))
> > >                 error(1, errno, "clock_gettime");
> > > -
> > > -       __print_timestamp("  USR", &ts_usr, 0, 0);
> > >  }
> > >  
> > >  static void check_timestamp_usr(void)
> > > @@ -636,8 +634,6 @@ static void do_test(int family, unsigned int report_opt)
> > >                         fill_header_udp(buf + off, family == PF_INET);
> > >                 }
> > >  
> > > -               print_timestamp_usr();
> > > -
> > >                 iov.iov_base = buf;
> > >                 iov.iov_len = total_len;
> > >  
> > > @@ -692,10 +688,14 @@ static void do_test(int family, unsigned int report_opt)
> > >  
> > >                 }
> > >  
> > > +               record_timestamp_usr();
> > >                 val = sendmsg(fd, &msg, 0);
> > >                 if (val != total_len)
> > >                         error(1, errno, "send");
> > >  
> > > +               /* Avoid I/O between taking ts_usr and sendmsg() */
> > > +               __print_timestamp("  USR", &ts_usr, 0, 0);
> > > +
> > >                 check_timestamp_usr();
> > >  
> > >                 /* wait for all errors to be queued, else ACKs arrive OOO */
> > 
> > Definitely worth including.
> > 
> > Could it help to run at SCHED_RR or SCHED_FIFO priority? That depends
> > on the reason for descheduling, and it only affects priority within the VM.
> > 
> > I'm having trouble reproducing it in vng both locally and on 
> > netdev-virt.
> > 
> > At this point, sending an initial, obviously correct patch and
> > observing how much it mitigates the issue is likely the fastest way forward.
> 
> Instead of increasing tolerance, how about optionally allowing one
> moderate timing error:
> 
> @@ -166,8 +167,15 @@ static void validate_timestamp(struct timespec *cur, int min_delay)
>         if (cur64 < start64 + min_delay || cur64 > start64 + max_delay) {
>                 fprintf(stderr, "ERROR: %" PRId64 " us expected between %d and %d\n",
>                                 cur64 - start64, min_delay, max_delay);
> -               if (!getenv("KSFT_MACHINE_SLOW"))
> -                       test_failed = true;
> +               if (!getenv("KSFT_MACHINE_SLOW")) {
> +                       if (cfg_num_max_timing_failures &&
> +                           (cur64 <= start64 + (max_delay * 2))) {
> +                               cfg_num_max_timing_failures--;
> +                               fprintf(stderr, "CONTINUE: ignore 1 timing failure\n");
> +                       } else {
> +                               test_failed = true;
> +                       }
> +               }
>         }
>  }
> 
> @@ -746,6 +755,10 @@ static void parse_opt(int argc, char **argv)
>                 case 'E':
>                         cfg_use_epoll = true;
>                         cfg_epollet = true;
> +                       break;
> +               case 'f':
> +                       cfg_num_max_timing_failures = strtoul(optarg, NULL, 10);
> +                       break;
> 
> --- a/tools/testing/selftests/net/txtimestamp.sh
> +++ b/tools/testing/selftests/net/txtimestamp.sh
> @@ -30,8 +30,8 @@ run_test_v4v6() {
>         # wait for ACK to be queued
>         local -r args="$@ -v 10000 -V 60000 -t 8000 -S 80000"
>  
> -       ./txtimestamp ${args} -4 -L 127.0.0.1
> -       ./txtimestamp ${args} -6 -L ::1
> +       ./txtimestamp ${args} -f 1 -4 -L 127.0.0.1
> +       ./txtimestamp ${args} -f 1 -6 -L ::1
>  }
> 
> and some boilerplate.
> 
> Can fold in the record_timestamp_usr() change too.
> 
> I can send this, your alternative with Suggested-by, or let me know if
> you prefer to send that.
> 
> It's tricky to reproduce, but evidently this occurs on some platforms,
> so it is not unreasonable to give some leeway. A single UDP test runs 12
> timing validations: 4 packets * {SND, ENQ, ENQ + SND} setups. A single
> TCP test runs additional {ACK, SND + ACK, ENQ + SND + ACK} cases. If
> we consider tolerating 1 failure in 12 too lenient, we could increase
> the packet count.

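That should say 16 validations, not 12: ENQ + SND validates both
timestamps, so each packet gets 1 + 1 + 2 = 4 validations. One tolerated
failure is then 1 in 16, not 1 in 12.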
 
> txtimestamp.sh runs 3 * 7 * 2 = 42 test variants. Alternatively, we
> could tolerate 1 failure across all of those in the script, rather
> than in each individual test.
> 
> Any of these approaches should significantly reduce the flake rate
> reported on netdev.bots.

Thread overview: 11+ messages
2026-01-07 19:05 [TEST] txtimestamp.sh pains after netdev foundation migration Jakub Kicinski
2026-01-08  0:19 ` Willem de Bruijn
2026-01-08  3:25   ` Jakub Kicinski
2026-01-08 16:06     ` Jakub Kicinski
2026-01-08 19:02       ` Willem de Bruijn
2026-01-08 20:38         ` Jakub Kicinski
2026-01-08 21:19           ` Willem de Bruijn
2026-01-12  3:24             ` Willem de Bruijn
2026-01-12  3:28               ` Willem de Bruijn [this message]
2026-01-12 14:29                 ` Jakub Kicinski
2026-01-12 16:38                   ` Willem de Bruijn
