* [LTP] [PATCH v1 1/1] sched_football: harden kickoff synchronization on high-CPU systems
From: Jan Polensky @ 2026-02-24 10:45 UTC
To: chrubis, pvorel, jstultz; +Cc: Linux Test Project
The sched_football test has been intermittently failing, most noticeably
on systems with many CPUs and/or under load, due to a startup ordering
hole around kickoff.
At game start, the referee can proceed to kickoff before all defense
threads have reliably reached their blocking path. This opens a window in
which offense threads may run and increment the_ball.
Fix the startup protocol:
- Introduce a dedicated defense_ready_barrier to rendezvous all defense
threads with the referee before kickoff.
- Replace the fixed 200 ms pre-kickoff delay and the post-kickoff delay
(RT: 20 ms, non-RT: 2 s) with a single pre-kickoff settle time (RT: 100 ms,
non-RT: 2.5 s) to account for large CPU counts and loaded systems.
- Add a compiler barrier in the defense busy-loop to prevent it from
being optimized away.
- Add an additional barrier step after the initial start phase so all
threads are positioned deterministically before the game begins.
Signed-off-by: Jan Polensky <japo@linux.ibm.com>
---
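A minimal standalone sketch of the two-barrier kickoff protocol described in the commit message, assuming plain C11 atomics and POSIX barriers in place of LTP's tst_atomic_* helpers. Only defense threads and the referee take part (the real test also runs offense and fan threads), and defense_positioned and referee_kickoff() are hypothetical names used for illustration. The empty asm statement is the same GCC/Clang compiler barrier the patch adds.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Names mirror the patch; this is an illustration, not the LTP test. */
static pthread_barrier_t start_barrier;
static pthread_barrier_t defense_ready_barrier;
static atomic_int kickoff_flag;
static atomic_int game_over;
static atomic_int defense_positioned; /* hypothetical counter, not in the patch */

static void *thread_defense(void *arg)
{
	(void)arg;
	pthread_barrier_wait(&start_barrier);
	/* Record that this thread got past the start phase. */
	atomic_fetch_add(&defense_positioned, 1);
	pthread_barrier_wait(&defense_ready_barrier);

	while (!atomic_load(&kickoff_flag))
		;
	while (!atomic_load(&game_over))
		__asm__ __volatile__("" ::: "memory"); /* compiler barrier */
	return NULL;
}

/* Hypothetical referee: returns how many defense threads were positioned at kickoff. */
static int referee_kickoff(int players)
{
	pthread_t *tids;
	int positioned;

	assert(players > 0);
	tids = calloc((size_t)players, sizeof(*tids));
	if (!tids)
		return -1;

	atomic_store(&kickoff_flag, 0);
	atomic_store(&game_over, 0);
	atomic_store(&defense_positioned, 0);
	/* Both barriers count every defense thread plus the referee. */
	pthread_barrier_init(&start_barrier, NULL, (unsigned)players + 1);
	pthread_barrier_init(&defense_ready_barrier, NULL, (unsigned)players + 1);

	for (int i = 0; i < players; i++) {
		/* A missing thread would leave both barriers waiting forever. */
		if (pthread_create(&tids[i], NULL, thread_defense, NULL) != 0)
			abort();
	}

	pthread_barrier_wait(&start_barrier);
	/* Returns only once every defense thread has arrived here as well. */
	pthread_barrier_wait(&defense_ready_barrier);
	positioned = atomic_load(&defense_positioned);

	atomic_store(&kickoff_flag, 1);
	atomic_store(&game_over, 1); /* end the sketch's "game" immediately */

	for (int i = 0; i < players; i++)
		pthread_join(tids[i], NULL);
	pthread_barrier_destroy(&start_barrier);
	pthread_barrier_destroy(&defense_ready_barrier);
	free(tids);
	return positioned;
}
```

Because each defense thread bumps the counter before it reaches defense_ready_barrier, the referee's second wait cannot return before the count equals players, so referee_kickoff(4) yields 4. The barrier only guarantees that the threads reached the barrier, not that they are already spinning; that remaining gap is what the settle delay is meant to cover.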
 testcases/realtime/func/sched_football/sched_football.c | 23 ++++++++++++++-----
1 file changed, 17 insertions(+), 6 deletions(-)
diff --git a/testcases/realtime/func/sched_football/sched_football.c b/testcases/realtime/func/sched_football/sched_football.c
index 2cb85322d782..08cdc2fd8b4e 100644
--- a/testcases/realtime/func/sched_football/sched_football.c
+++ b/testcases/realtime/func/sched_football/sched_football.c
@@ -50,6 +50,7 @@ static tst_atomic_t game_over;
 static char *str_game_length;
 static char *str_players_per_team;
 static pthread_barrier_t start_barrier;
+static pthread_barrier_t defense_ready_barrier;
 /* These are fans running across the field. They're trying to interrupt/distract everyone */
 void *thread_fan(void *arg LTP_ATTRIBUTE_UNUSED)
@@ -81,11 +82,13 @@ void *thread_defense(void *arg LTP_ATTRIBUTE_UNUSED)
 {
 	prctl(PR_SET_NAME, "defense", 0, 0, 0);
 	pthread_barrier_wait(&start_barrier);
+	pthread_barrier_wait(&defense_ready_barrier);
 	while (!tst_atomic_load(&kickoff_flag))
 		;
-	/*keep the ball from being moved */
+	/* Keep the ball from being moved using a compiler barrier */
 	while (!tst_atomic_load(&game_over)) {
+		__asm__ __volatile__("" ::: "memory");
 	}
 	return NULL;
@@ -124,14 +127,18 @@ void referee(int game_length)
 	/* Start the game! */
 	atrace_marker_write("sched_football", "Game_started!");
 	pthread_barrier_wait(&start_barrier);
-	usleep(200000);
+
+	/* Wait for defense to be ready before starting the game */
+	pthread_barrier_wait(&defense_ready_barrier);
+
+	/* Give defense threads time to establish */
+	if (tst_check_preempt_rt())
+		usleep(100000);
+	else
+		usleep(2500000);
 	tst_atomic_store(0, &the_ball);
 	tst_atomic_store(1, &kickoff_flag);
-	if (tst_check_preempt_rt())
-		usleep(20000);
-	else
-		usleep(2000000);
 	/* Watch the game */
 	while ((now.tv_sec - start.tv_sec) < game_length) {
@@ -170,6 +177,9 @@ static void do_test(void)
 	sched_setscheduler(0, SCHED_FIFO, &param);
 	tst_atomic_store(0, &kickoff_flag);
+	/* Defense ready barrier: defense threads + referee */
+	pthread_barrier_init(&defense_ready_barrier, NULL, players_per_team + 1);
+
 	/*
 	 * Start the offense
 	 * They are lower priority than defense, so they must be started first.
@@ -197,6 +207,7 @@ static void do_test(void)
 	referee(game_length);
 	pthread_barrier_destroy(&start_barrier);
+	pthread_barrier_destroy(&defense_ready_barrier);
 }
 static void do_setup(void)
--
2.53.0
--
Mailing list info: https://lists.linux.it/listinfo/ltp
* Re: [LTP] [PATCH v1 1/1] sched_football: harden kickoff synchronization on high-CPU systems
From: Andrea Cervesato via ltp @ 2026-02-24 12:44 UTC
To: Jan Polensky, chrubis, pvorel, jstultz; +Cc: Linux Test Project
Hi!
It looks good to me, but I'm wondering if we could use
TST_CHECKPOINT_WAIT2() here to synchronize multiple threads instead of
pthread barriers, which are a bit more complex.
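As far as I understand the checkpoint API, the waking side (TST_CHECKPOINT_WAKE2(id, nr_wake)) retries until the requested number of sleepers has actually been woken or a timeout expires, which also makes it usable as a rendezvous. The sketch below imitates that idea with a raw Linux futex so it can run outside LTP; it is not LTP's implementation, and checkpoint_word, checkpoint_waiter() and checkpoint_wake_n() are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical futex word standing in for one checkpoint slot. */
static uint32_t checkpoint_word;

static long futex_op(uint32_t *uaddr, int op, uint32_t val)
{
	return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/* Waiter ("defense"): sleep on the futex until the referee wakes it. */
static void *checkpoint_waiter(void *arg)
{
	(void)arg;
	futex_op(&checkpoint_word, FUTEX_WAIT, 0);
	return NULL;
}

/*
 * Hypothetical waker ("referee"): keep issuing FUTEX_WAKE until exactly
 * nr_wake sleepers have been woken, polling every 1 ms for at most
 * max_tries attempts. Success implies every waiter reached its wait point.
 */
static int checkpoint_wake_n(uint32_t *word, int nr_wake, int max_tries)
{
	int woken = 0;

	assert(nr_wake > 0);
	for (int i = 0; i < max_tries && woken < nr_wake; i++) {
		/* Never wake more than the number still outstanding. */
		long ret = futex_op(word, FUTEX_WAKE, (uint32_t)(nr_wake - woken));

		if (ret > 0)
			woken += (int)ret;
		else
			usleep(1000); /* nobody parked yet, try again */
	}
	return woken == nr_wake ? 0 : -1;
}
```

A zero return means all nr_wake waiters were parked on the futex when they were woken, which is the property the kickoff needs. The trade-off is that the waker polls, and a spurious wakeup in a waiter would make the count fall short, so the sketch reports failure after max_tries instead of hanging.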
Kind regards,
--
Andrea Cervesato
SUSE QE Automation Engineer Linux
andrea.cervesato@suse.com
* Re: [LTP] [PATCH v1 1/1] sched_football: harden kickoff synchronization on high-CPU systems
From: John Stultz via ltp @ 2026-02-24 21:03 UTC
To: Jan Polensky; +Cc: Steven Rostedt, Linux Test Project
On Tue, Feb 24, 2026 at 2:45 AM Jan Polensky <japo@linux.ibm.com> wrote:
>
> The sched_football test has been intermittently failing, most noticeably
> on systems with many CPUs and/or under load, due to a startup ordering
> hole around kickoff.
>
I've not had time to closely review your suggestion here, but it
sounds reasonable.
That said, I want to warn you and ensure you are aware: the
RT_PUSH_IPI feature in the scheduler breaks the RT invariant
sched_football is testing.
I see this as a bug in that feature, but the scalability RT_PUSH_IPI
allows for seems more important to folks doing RT work than
the misbehavior. Steven and I talked a while back about some ideas on
how we might do the pull more efficiently while
still keeping the invariant true, and I have a bug to track it, but
it has not been high enough priority to get bandwidth yet.
So you might want to make sure you disable that feature before testing via:
# echo NO_RT_PUSH_IPI > /sys/kernel/debug/sched/features
thanks
-john
* Re: [LTP] [PATCH v1 1/1] sched_football: harden kickoff synchronization on high-CPU systems
From: Andrea Cervesato via ltp @ 2026-02-25 9:23 UTC
To: John Stultz, Jan Polensky; +Cc: Linux Test Project, Steven Rostedt
On Tue Feb 24, 2026 at 10:03 PM CET, John Stultz via ltp wrote:
> On Tue, Feb 24, 2026 at 2:45 AM Jan Polensky <japo@linux.ibm.com> wrote:
> >
> > The sched_football test has been intermittently failing, most noticeably
> > on systems with many CPUs and/or under load, due to a startup ordering
> > hole around kickoff.
> >
>
> I've not had time to closely review your suggestion here, but it
> sounds reasonable.
>
> That said, I want to warn you and ensure you are aware: the
> RT_PUSH_IPI feature in the scheduler breaks the RT invariant
> sched_football is testing.
>
> I see this as a bug in that feature, but the scalability RT_PUSH_IPI
> allows for seems more important to folks doing RT work than
> the misbehavior. Steven and I talked a while back about some ideas on
> how we might do the pull more efficiently while
> still keeping the invariant true, and I have a bug to track it, but
> it has not been high enough priority to get bandwidth yet.
>
> So you might want to make sure you disable that feature before testing via:
> # echo NO_RT_PUSH_IPI > /sys/kernel/debug/sched/features
>
> thanks
> -john
Thanks for your deep analysis of the possible issue. I'm not an RT expert,
but I trust your expertise here :-) I will leave this patch review to
someone more skilled than me in this topic.
I have a suggestion, though.
About NO_RT_PUSH_IPI, we are lucky: LTP provides a safe mechanism to
set sysfs/procfs configuration values and restore them to their defaults
after the test. You can find it in `struct tst_test`, where it is called
`.save_restore` [1]
I think it's worth forcing this option depending on the underlying
variant (and documenting it properly in the code with a comment).
WDYT?
[1] https://linux-test-project.readthedocs.io/en/latest/developers/api_c_tests.html#tst-test-definition
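A sketch of the disable-and-restore step, assuming root and a mounted debugfs at the path John gives. Reading the features file lists every flag on one line, but as far as I know a write accepts a single flag name, so writing the whole saved line back (as a generic save/restore would) may be rejected; the sketch therefore toggles only RT_PUSH_IPI and restores it only if it was on. feature_state(), write_feature(), disable_rt_push_ipi() and restore_rt_push_ipi() are hypothetical helpers, not LTP API.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Path quoted in the thread; needs root and a mounted debugfs. */
#define SCHED_FEATURES "/sys/kernel/debug/sched/features"

/*
 * Hypothetical helper: look for `name` in a whitespace-separated list.
 * Returns 1 if "name" is listed (on), 0 if "NO_name" is listed (off),
 * and -1 if neither appears (e.g. the kernel lacks the feature).
 */
static int feature_state(const char *features, const char *name)
{
	size_t len = strlen(name);
	const char *p = features;

	while (*p) {
		while (*p && isspace((unsigned char)*p))
			p++;
		const char *tok = p;
		while (*p && !isspace((unsigned char)*p))
			p++;
		size_t tlen = (size_t)(p - tok);

		if (tlen == len && strncmp(tok, name, len) == 0)
			return 1;
		if (tlen == len + 3 && strncmp(tok, "NO_", 3) == 0 &&
		    strncmp(tok + 3, name, len) == 0)
			return 0;
	}
	return -1;
}

/* Write one feature token; buffered write errors surface at fclose(). */
static int write_feature(const char *path, const char *token)
{
	FILE *f = fopen(path, "w");
	int ret;

	if (!f)
		return -1;
	ret = fputs(token, f) < 0 ? -1 : 0;
	if (fclose(f) != 0)
		ret = -1;
	return ret;
}

/*
 * Setup step: turn RT_PUSH_IPI off if it is on. Returns the previous
 * state as feature_state() does, or -2 on an I/O error.
 */
static int disable_rt_push_ipi(const char *path)
{
	char buf[4096] = "";
	FILE *f = fopen(path, "r");
	int state;

	if (!f)
		return -2;
	if (!fgets(buf, sizeof(buf), f))
		buf[0] = '\0';
	fclose(f);

	state = feature_state(buf, "RT_PUSH_IPI");
	if (state == 1 && write_feature(path, "NO_RT_PUSH_IPI") != 0)
		return -2;
	return state;
}

/* Cleanup step: re-enable the feature only if setup switched it off. */
static int restore_rt_push_ipi(const char *path, int prev_state)
{
	return prev_state == 1 ? write_feature(path, "RT_PUSH_IPI") : 0;
}
```

In a test, disable_rt_push_ipi(SCHED_FEATURES) would run in setup and restore_rt_push_ipi(SCHED_FEATURES, prev) in cleanup, with a comment marking it as a workaround; a return of -1 means the kernel does not list the feature at all.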
--
Andrea Cervesato
SUSE QE Automation Engineer Linux
andrea.cervesato@suse.com
* Re: [LTP] [PATCH v1 1/1] sched_football: harden kickoff synchronization on high-CPU systems
From: John Stultz via ltp @ 2026-02-25 19:11 UTC
To: Andrea Cervesato; +Cc: Linux Test Project, Steven Rostedt
On Wed, Feb 25, 2026 at 1:23 AM Andrea Cervesato
<andrea.cervesato@suse.com> wrote:
> Thanks for your deep analysis of the possible issue. I'm not an RT expert,
> but I trust your expertise here :-) I will leave this patch review to
> someone more skilled than me in this topic.
>
> I have a suggestion, though.
>
> About NO_RT_PUSH_IPI, we are lucky: LTP provides a safe mechanism to
> set sysfs/procfs configuration values and restore them to their defaults
> after the test. You can find it in `struct tst_test`, where it is called
> `.save_restore` [1]
>
> I think it's worth forcing this option depending on the underlying
> variant (and documenting it properly in the code with a comment).
>
> WDYT?
That seems reasonable, as long as it's clearly labeled as a workaround
and hopefully can be dropped (or kernel version limited) when the
issue is finally addressed.
thanks
-john
* Re: [LTP] [PATCH v1 1/1] sched_football: harden kickoff synchronization on high-CPU systems
From: Andrea Cervesato via ltp @ 2026-02-26 9:43 UTC
To: John Stultz; +Cc: Linux Test Project, Steven Rostedt
Hi John,
> That seems reasonable, as long as it's clearly labeled as a workaround
> and hopefully can be dropped (or kernel version limited) when the
> issue is finally addressed.
Sure, that can eventually be handled with `tst_kvercmp()`, or with
`tst_kvercmp2()` to be even more specific.
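For the kernel-version limit, a plain C stand-in for the comparison tst_kvercmp() performs (running kernel against a.b.c, with a negative, zero or positive result) might look like this. kver_cmp() and running_kver_cmp() are hypothetical names; the sketch only looks at the leading three numeric fields and, as I understand it, leaves out the distribution-specific handling that tst_kvercmp2() adds.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/utsname.h>

/*
 * Hypothetical helper: compare a release string such as "6.8.0-rc3" or
 * "5.14.0-427.el9" against a.b.c, returning <0, 0 or >0 like strcmp().
 * Only the leading numeric fields are parsed; missing fields count as 0.
 */
static int kver_cmp(const char *release, int a, int b, int c)
{
	int v[3] = { 0, 0, 0 };
	int want[3] = { a, b, c };

	(void)sscanf(release, "%d.%d.%d", &v[0], &v[1], &v[2]);
	for (int i = 0; i < 3; i++) {
		if (v[i] != want[i])
			return v[i] < want[i] ? -1 : 1;
	}
	return 0;
}

/* Hypothetical wrapper for the running kernel; returns -1 if uname() fails. */
static int running_kver_cmp(int a, int b, int c, int *res)
{
	struct utsname u;

	if (uname(&u) != 0)
		return -1;
	*res = kver_cmp(u.release, a, b, c);
	return 0;
}
```

The workaround would then apply only while the running kernel compares older than the first release carrying the fix, which is not known yet, so no concrete version is filled in here.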
Kind regards,
--
Andrea Cervesato
SUSE QE Automation Engineer Linux
andrea.cervesato@suse.com
* Re: [LTP] [PATCH v1 1/1] sched_football: harden kickoff synchronization on high-CPU systems
From: Jan Polensky @ 2026-03-02 8:56 UTC
To: John Stultz, Andrea Cervesato; +Cc: Linux Test Project, Steven Rostedt
On Wed, Feb 25, 2026 at 11:11:53AM -0800, John Stultz wrote:
> That seems reasonable, as long as it's clearly labeled as a workaround
> and hopefully can be dropped (or kernel version limited) when the
> issue is finally addressed.
>
> thanks
> -john
Hi Andrea, hi John,
thank you for the thorough review and the helpful remarks.
After going through the feedback, I think it makes sense to step back and
rework the patch. The main objective is to drive the failure rate down as
much as possible, and the current version still shows weaknesses,
especially with respect to steal time. On heavily loaded systems I also
still observe frequent TBROK results, so the timing clearly needs further
tuning.
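To put a number on the steal-time concern, one option is to sample the steal column of /proc/stat around the settle delay. The sketch assumes the documented layout of the aggregate "cpu" line (user, nice, system, idle, iowait, irq, softirq, steal, ...), where steal is the eighth field in USER_HZ ticks; parse_steal() and read_steal() are hypothetical helpers, and how a test should react to a nonzero delta is left open.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: parse the aggregate "cpu" line of /proc/stat and
 * return its steal field (8th value, USER_HZ ticks), or -1 on mismatch.
 */
static long long parse_steal(const char *line)
{
	unsigned long long v[8];

	/* The aggregate line is "cpu " followed by spaces; skip "cpu0", ... */
	if (strncmp(line, "cpu ", 4) != 0)
		return -1;
	if (sscanf(line + 4, "%llu %llu %llu %llu %llu %llu %llu %llu",
		   &v[0], &v[1], &v[2], &v[3], &v[4], &v[5], &v[6], &v[7]) != 8)
		return -1;
	return (long long)v[7];
}

/* Hypothetical helper: current system-wide steal counter, or -1 on failure. */
static long long read_steal(void)
{
	char line[512];
	FILE *f = fopen("/proc/stat", "r");
	char *got;

	if (!f)
		return -1;
	got = fgets(line, sizeof(line), f); /* first line is the aggregate */
	fclose(f);
	return got ? parse_steal(line) : -1;
}
```

Calling read_steal() before and after the pre-kickoff usleep() and comparing the two values would show whether the hypervisor took CPU time away during the window; which threshold should count as too much is still an open question.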
I will take some time to revisit the design and incorporate these aspects
before posting a revised version.
Thanks again for your comments and suggestions.
Thanks & Greetings
Jan