public inbox for ltp@lists.linux.it
From: Li Wang <wangli.ahau@gmail.com>
To: John Stultz <jstultz@google.com>
Cc: Soma Das <somadas1@linux.ibm.com>, ltp@lists.linux.it
Subject: Re: [LTP] [PATCH] sched_football: fix false failures on many-CPU systems
Date: Tue, 21 Apr 2026 11:31:37 +0800
Message-ID: <aebuYvezjPziv5r0@gmail.com>
In-Reply-To: <CANDhNCrZRiqyO32fsL=yb+NL0xD+LtYAYNYhbZDn1eCSfwMcyw@mail.gmail.com>

On Mon, Apr 20, 2026 at 03:49:06PM -0700, John Stultz wrote:
> On Wed, Apr 15, 2026 at 8:23 PM Li Wang <wangli.ahau@gmail.com> wrote:
> > John Stultz <jstultz@google.com> wrote:
> > > > > > > 1. RT throttling freezes all SCHED_FIFO threads simultaneously. On
> > > > > > > release, the kernel does not always reschedule the highest-priority
> > > > > > > thread first on every CPU, so offense briefly runs and increments
> > > > > > > the_ball before defense is rescheduled. Fix by saving and disabling
> > > > > > > sched_rt_runtime_us in setup and restoring it in a new cleanup
> > > > > > > callback.
> > > > >
> > > > > > Makes sense, and as the AI reviewer points out, LTP provides a
> > > > > > save_restore option that does this automatically.
> > >
> > > Throttling shouldn't break the test. The fact that SCHED_NORMAL tasks
> > > ran shouldn't change the ordering when we go back to running RT tasks.
> > > This is likely a kernel bug.
> >
> > Theoretically speaking, your point is correct. While instant global
> > priority ordering upon unthrottling is ideal, it ignores the physical realities
> > of SMP architecture.
> >
> > If we look at do_sched_rt_period_timer(), CPUs are unthrottled sequentially
> > via a for_each_cpu loop. When an early CPU in the loop is unlocked, it
> > immediately schedules its local RT task 'offense'.
> > See:
> >   https://elixir.bootlin.com/linux/v7.0/source/kernel/sched/rt.c#L797
> >
> > Meanwhile, subsequent CPUs in the loop (which may hold the higher-priority
> > 'defense' task) are literally still throttled.
> >
> > Expecting atomic, zero-latency global unthrottling is physically unrealistic for
> > multi-core systems.
> >
> > That's why I tend to believe disabling throttling for this specific test is the
> > wise and practical approach.
> 
> Apologies, I'm still not sure I see it.
> 
> If, before throttling happens, there are NR_CPU high-priority (same
> priority) defenders, and they are distributed across CPUs, thereby
> preventing NR_CPU lower-priority offensive tasks from running, throttling
> should not change this distribution of the high priority tasks
> (because all the other CPUs already have their own high priority
> defender to run, so they wouldn't pull an equivalent priority task
> over). So when unthrottling, I don't see how the lower-priority
> offensive task would be chosen.
> 
> It may very well be the case that lower priority tasks do run, but I'd
> contend that suggests there is a bug.

Yes, you are probably right here; I may have overlooked something about
throttling earlier. I've now re-examined the details of this part and
found that the smallest unit of operation is the runqueue, so
low-priority tasks won't occupy the CPU, even for a short period of time.

Whether we pin or not, we shouldn't touch the RT throttling.

> > > > > > > 2. Offense and defense threads were unpinned, allowing the scheduler
> > > > > > > to migrate them freely. An offense thread could land on a CPU with
> > > > > > > no defense thread present and run unchecked. Fix by passing a CPU
> > > > > > > index as the thread arg and calling sched_setaffinity() at thread
> > > > > > > start. Pairs are distributed round-robin (i % ncpus) so each
> > > > > > > offense thread shares its CPU with a defense thread.
> > > > >
> > > > > This is a good thought, as for SCHED_FIFO it manages the corresponding
> > > > > runqueue for each CPU and simply picks the higher priority task to run.
> > > > > So pinning the threads to each CPU makes sense, but maybe we could
> > > > > only pin the defense because:
> > > > >
> > > > > With N defense threads pinned one per CPU, every CPU has a defense
> > > > > thread at priority 30 permanently runnable. The offense threads at priority
> > > > > 15, regardless of which CPU the scheduler places them on, will always find
> > > > > a higher-priority defense thread on the same CPU's runqueue. Since
> > > > > SCHED_FIFO strictly favors the higher-priority runnable task, offense can
> > > > > never be picked.
> > > > >
> > > > > Pinning offense as well would be redundant, it doesn't matter where offense
> > > > > lands, because defense already covers every CPU. This also has the advantage
> > > > > of letting the scheduler freely migrate offense threads without
> > > > > affecting the test
> > > > > outcome, which avoids interfering with the kernel's load balancing logic during
> > > > > the test.
> > > > >
> > > > > And, I'd suggest using tst_ncpus_available() instead of get_numcpus()
> > > > > when distributing defense threads across CPUs, in case some CPUs are
> > > > > offline. Pinning a defense thread to an offline CPU would leave that
> > > > > CPU uncovered and allow offense to run unchecked. See:
> > >
> > > I didn't see the original patch here, but the whole point of
> > > sched_football is to ensure the top <num cpu> (unaffined) priority
> > > tasks are always run and no lower priority rt tasks are run instead.
> > >
> > > So none of the tasks should be pinned to any cpus. The scheduler is
> > > supposed to ensure the RT invariant holds.
> > > There are some known bugs at the moment that will cause sched_football
> > > to fail (the RT_PUSH_IPI feature, for instance). That's a problem with
> > > the kernel, not the test.
> >
> > Apart from the known bug of RT_PUSH_IPI feature, it still does not
> > guarantee 100% success in real scenarios.
> >
> > After a deep look into the RT scheduler principles, I found that the
> > RT_PUSH_IPI mechanism is designed as a "best-effort" optimization
> > rather than a guaranteed operation. Because the kernel's scheduling
> > state is highly dynamic and asynchronous, a push attempt will
> > deliberately abort if the environment changes between the time the
> > IPI is sent and when it is actually processed.
> 
> So again, RT_PUSH_IPI is known to break the RT invariant. So I'm not
> sure pointing to it to justify the behavior is very compelling.
> 
> 
> > It fails by design to prevent instability, primarily due to state expiration,
> > CPU affinity restrictions, sudden priority inversions, or the lack of an
> > eligible target CPU.
> >
> > See push_rt_task() in kernel/sched/rt.c:
> >   https://elixir.bootlin.com/linux/v7.0/source/kernel/sched/rt.c#L1939
> >
> > Hence, if we explicitly pin the defense thread to each CPU, it will join in
> > the corresponding runqueues, which completely match the reasonable
> > situation: the kernel's RT scheduler guarantees per-CPU priority ordering,
> > not global placement. The RT load balancer is asynchronous and doesn't
> > guarantee that all high-priority threads are placed before any low-priority
> > thread runs.
> >
> > On the other side, if the test expects a global guarantee, it's testing
> > something the kernel doesn't claim to provide.
> 
> Prior to RT_PUSH_IPI, I believe it did provide this functionality for
> RT scheduling and I recall a fair amount of effort went into the RT
> scheduler to try to enforce the RT priority invariant.
> Last I tried, disabling RT_PUSH_IPI resolved the failures I saw on SMP
> systems, but that was a bit back so new issues may have cropped up.
> 
> Mostly I just want to make sure we're not papering over the test to
> have it stop reporting a real correctness bug.
> 
> I'll grant that this real bug doesn't seem like something anyone is
> prioritizing to resolve, so maybe it's not all that important. And I
> agree there is a cost to having the sched_football test be strict and
> always failing, since we potentially miss other variations of bugs and
> problems that could be introduced. If you want to add a --per-cpu
> argument or something to enable the pinning and re-scope the test, I'd
> probably not object to that.
> 
> But I really would like to make sure that the *specific thing* the
> test was written to check isn't lost just so the results can stop
> seeing failures.

Thank you, I agree. Using a `--per-cpu` parameter can make the test pass,
but it immediately narrows the test scope: it turns sched_football into a
per-CPU test that targets each CPU's real-time runqueue.
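
To make the difference in scope concrete, here is a minimal sketch of the
two checks over a snapshot of runnable SCHED_FIFO tasks. It is only an
illustration: struct rt_task and both helpers are hypothetical names, not
part of sched_football or the kernel, and the snapshot model assumes each
CPU that holds RT tasks is running exactly one of them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of one runnable SCHED_FIFO task. */
struct rt_task {
	int prio;	/* RT priority, higher value wins */
	int cpu;	/* CPU whose runqueue holds the task */
	bool running;	/* true if the task currently owns that CPU */
};

/*
 * Per-CPU ordering (what a pinned test would check): on each runqueue,
 * no queued task has a higher priority than the task running there.
 */
static bool per_cpu_order_holds(const struct rt_task *t, size_t n)
{
	assert(t != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!t[i].running)
			continue;
		for (size_t j = 0; j < n; j++) {
			if (!t[j].running && t[j].cpu == t[i].cpu &&
			    t[j].prio > t[i].prio)
				return false;
		}
	}
	return true;
}

/*
 * Global invariant (what the unpinned test checks): no queued task on
 * any runqueue has a higher priority than any running task.
 */
static bool global_invariant_holds(const struct rt_task *t, size_t n)
{
	assert(t != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!t[i].running)
			continue;
		for (size_t j = 0; j < n; j++) {
			if (!t[j].running && t[j].prio > t[i].prio)
				return false;
		}
	}
	return true;
}
```

Take a snapshot with two priority-30 defenders on CPU 0 (one running, one
queued) and a priority-15 offense running on CPU 1. The per-CPU check
accepts it, while the global check rejects it, and that second case is
exactly what the original test was written to catch.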

Hi @Soma, @Jan, what are the test results on your SMP systems nowadays
with RT_PUSH_IPI disabled?

  # echo NO_RT_PUSH_IPI > /sys/kernel/debug/sched/features

If this eliminates the false positives, I would prefer to use this
method first rather than binding the defense tasks to CPUs.
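
When comparing runs, it may help to confirm that the feature really is off
before starting the test. A minimal sketch, assuming the contents of
/sys/kernel/debug/sched/features have already been read into a string (the
file read itself is omitted, and sched_feat_enabled() is a hypothetical
helper, not an existing LTP API). A disabled feature is listed with a
"NO_" prefix, so only an exact token match counts as enabled:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if `name` appears as an enabled feature in `features`,
 * the whitespace-separated contents of the sched features file.
 * "NO_<name>" is a different token, so it does not match.
 */
static bool sched_feat_enabled(const char *features, const char *name)
{
	const char *p;
	size_t len;

	assert(features != NULL && name != NULL);

	len = strlen(name);
	p = features;

	while (*p) {
		size_t tok;

		p += strspn(p, " \t\n");	/* skip separators */
		tok = strcspn(p, " \t\n");	/* length of next token */

		if (tok > 0 && tok == len && strncmp(p, name, len) == 0)
			return true;

		p += tok;
	}

	return false;
}
```

A run would then only be counted as a NO_RT_PUSH_IPI result when
sched_feat_enabled(buf, "RT_PUSH_IPI") returns false for the buffer read
from the features file.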

> > As for the remaining changes, the current test has several non-kernel issues
> > (RT throttling interference, uninitialized game_over on reruns, silent
> > sched_setscheduler failures) that produce false failures. These mask
> > the real kernel bugs you want to detect.
> 
> Again, I wasn't sent the patch that started this thread, so I don't
> have any strong objections.

Those are just minor code issues, not the key reason for the failure.

Regards,
Li Wang

-- 
Mailing list info: https://lists.linux.it/listinfo/ltp
