public inbox for linux-kernel@vger.kernel.org
From: Andrea Righi <arighi@nvidia.com>
To: gmonaco@redhat.com
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Valentin Schneider <vschneid@redhat.com>,
	Tejun Heo <tj@kernel.org>, Joel Fernandes <joelagnelf@nvidia.com>,
	David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>,
	Daniel Hodges <hodgesd@meta.com>,
	sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] sched/deadline: Reset dl_server execution state on stop
Date: Thu, 29 Jan 2026 18:32:58 +0100
Message-ID: <aXuZysNQCVrfFzx7@gpd4>
In-Reply-To: <b7ffc7e6121320d29cedcff0e2b68ad76c8e2775.camel@redhat.com>

Hi Gabriele,

On Thu, Jan 29, 2026 at 12:48:35PM +0100, gmonaco@redhat.com wrote:
> On Wed, 2026-01-28 at 14:41 +0100, Andrea Righi wrote:
> > Just to make sure we're testing the same thing, I'm currently using
> > https://git.kernel.org/pub/scm/linux/kernel/git/arighi/linux.git,
> > branch
> > scx-dl-server.
> > 
> > I'm running this test inside virtme-ng:
> >   $ vng -vb --config tools/testing/selftests/sched_ext/config
> >   $ vng -v -- tools/testing/selftests/sched_ext/runner -t rt_stall
> 
> Well, that's a fun one, I could reproduce the same failure you
> described in vng on another x86 box.
> 
> The arm box (bare metal) I used initially still passes just fine all 4
> iterations of the test.
> 
> 
> On the x86 box (vng) I tried different orders of iterations (where the
> original is fair-ext-fair-ext) with and without the ext server active.
> 
> No ext-server: the ext iteration fails and breaks also fair (unlike the
> arm64 box where the fair was intact)
> ext-server active: a sequence fair-ext breaks both (like you observe).
> 
> I don't have time to look further into this right now, but it looks
> like an interesting pattern.

Thanks for checking and reproducing it.

The fact that these issues around DL server stop/start transitions can be
triggered just by introducing an additional DL server (EXT) makes me wonder
whether this could become even more problematic as we add more DL servers
(hierarchical DL servers?).

Considering that unconditionally clearing dl_defer_running in
dl_server_stop() seems to re-establish a clear state-machine workflow,
I think we should go with that fix for now, so we can unblock the EXT DL
server patch set. With that change in place, all the server combinations
and sequences I've tested seem to behave consistently.
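
To illustrate the idea, here is a minimal userspace sketch of the stop/start
transition. It is a toy model, not the kernel code: the struct, the toy_*
helpers and the reset_defer_running switch are all hypothetical names, and the
flags only loosely mirror dl_server_active, dl_defer_armed, dl_defer_running
and dl_throttled. It assumes a start path that does not touch the defer state,
which is the situation where a stale flag could survive a restart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the few sched_dl_entity flags discussed in this thread. */
struct toy_dl_server {
	bool active;		/* loosely models dl_server_active */
	bool defer_armed;	/* loosely models dl_defer_armed */
	bool defer_running;	/* loosely models dl_defer_running */
	bool throttled;		/* loosely models dl_throttled */
};

/* Hypothetical start path: only marks the server active (an assumption). */
static void toy_server_start(struct toy_dl_server *se)
{
	assert(se != NULL);
	if (se->active)
		return;
	se->active = true;
}

/*
 * Hypothetical stop path. With reset_defer_running == true the execution
 * state is cleared unconditionally; with false, defer_running is left as
 * it was when the server stopped.
 */
static void toy_server_stop(struct toy_dl_server *se, bool reset_defer_running)
{
	assert(se != NULL);
	if (!se->active)
		return;
	se->defer_armed = false;
	se->throttled = false;
	if (reset_defer_running)
		se->defer_running = false;
	se->active = false;
}

/*
 * Run one stop/start cycle on a server that was stopped while
 * defer_running was set, and report the flag seen after the restart.
 */
static bool toy_defer_running_after_restart(bool reset_defer_running)
{
	struct toy_dl_server se = { .active = true, .defer_running = true };

	toy_server_stop(&se, reset_defer_running);
	toy_server_start(&se);
	return se.defer_running;
}
```

In this model, toy_defer_running_after_restart(false) reports a restarted
server that still carries defer_running from its previous activation, while
toy_defer_running_after_restart(true) always restarts from a clean state.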

We can always revisit preserving the short-sleep optimization later if we
find a way to do it with stronger guarantees (and I'll keep investigating
this), but for now the unconditional reset seems like the most robust fix
to me.

Opinions? Peter / Juri?

Thanks,
-Andrea

Thread overview: 23+ messages
2026-01-23 16:16 [PATCH v2] sched/deadline: Reset dl_server execution state on stop Andrea Righi
2026-01-23 16:22 ` Juri Lelli
2026-01-26 14:20 ` Gabriele Monaco
2026-01-26 16:30   ` Andrea Righi
2026-01-26 16:56     ` Gabriele Monaco
2026-01-26 21:26       ` Andrea Righi
2026-01-27  8:52         ` Gabriele Monaco
2026-01-27 14:18           ` Andrea Righi
2026-01-27 16:00             ` Gabriele Monaco
2026-01-27 18:54               ` Andrea Righi
2026-01-28  9:50                 ` Gabriele Monaco
2026-01-28 13:41                   ` Andrea Righi
2026-01-29 11:48                     ` gmonaco
2026-01-29 17:32                       ` Andrea Righi [this message]
2026-01-30  7:30                         ` Juri Lelli
2026-01-30 12:24                     ` Peter Zijlstra
2026-01-30 12:26                       ` Peter Zijlstra
2026-01-30 12:41                         ` Peter Zijlstra
2026-01-30 15:52                           ` Juri Lelli
2026-01-30 16:25                           ` Andrea Righi
2026-01-30 16:40                             ` Peter Zijlstra
2026-01-30 16:46                               ` Andrea Righi
2026-01-30 22:12                           ` [tip: sched/urgent] sched/deadline: Fix 'stuck' dl_server tip-bot2 for Peter Zijlstra
