From: Andrea Righi <arighi@nvidia.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Valentin Schneider <vschneid@redhat.com>,
	K Prateek Nayak <kprateek.nayak@amd.com>,
	linux-kernel@vger.kernel.org, Sashiko <sashiko-bot@kernel.org>
Subject: Re: [PATCH] sched/deadline: Reject debugfs dl_server writes for offline CPUs
Date: Fri, 29 May 2026 09:09:30 +0200
Message-ID: <ahk7qv_NZemAVABY@gpd4>
In-Reply-To: <ahWNGdRTDb4mT6oS@jlelli-thinkpadt14gen4.remote.csb>

Hi Peter,

On Tue, May 26, 2026 at 02:07:53PM +0200, Juri Lelli wrote:
> Hi Andrea,
> 
> On 26/05/26 12:05, Andrea Righi wrote:
> > Writing runtime or period via the per-CPU dl_server debugfs files
> > (/sys/kernel/debug/sched/{fair,ext}_server/cpu*/{runtime,period}) on an
> > offline CPU can trigger two distinct kernel issues:
> > 
> > 1) Divide-by-zero in dl_server_apply_params():
> > 
> >   Oops: divide error: 0000 [#1] SMP NOPTI
> >   RIP: 0010:dl_server_apply_params+0x239/0x3a0
> >   Call Trace:
> >    sched_server_write_common.isra.0+0x21a/0x3c0
> >    full_proxy_write+0x78/0xd0
> >    vfs_write+0xe7/0x6e0
> > 
> >   Both __dl_sub() and __dl_add() divide by cpus internally, which can be
> >   0 once the CPU has been removed from any active root-domain span (this
> >   has been latent since the debugfs interface was introduced).
> > 
> > 2) WARN_ON_ONCE in dl_server_start():
> > 
> >   WARNING: kernel/sched/deadline.c:1805 at dl_server_start+0x232/0x270
> > 
> >   Commit ee6e44dfe6e5 ("sched/deadline: Stop dl_server before CPU goes
> >   offline") added this check to catch enqueueing the server on an
> >   offline rq.
> > 
> > There's no meaningful semantics for re-configuring the per-CPU dl_server
> > bandwidth while the CPU is offline, so simply reject the write with
> > -EBUSY so userspace gets a clear error.
> > 
> > Reported-by: Sashiko <sashiko-bot@kernel.org>
> > Closes: https://lore.kernel.org/all/20260526092228.3B6891F00A3A@smtp.kernel.org/
> > Fixes: d741f297bcea ("sched/fair: Fair server interface")
> > Signed-off-by: Andrea Righi <arighi@nvidia.com>
> > ---
> >  kernel/sched/debug.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
> > index ed3a0d65da0ca..e57ad8c78a60e 100644
> > --- a/kernel/sched/debug.c
> > +++ b/kernel/sched/debug.c
> > @@ -415,6 +415,9 @@ static ssize_t sched_server_write_common(struct file *filp, const char __user *u
> >  			return  -EINVAL;
> >  		}
> >  
> > +		if (!cpu_online(cpu_of(rq)))
> > +			return -EBUSY;
> > +
> >  		update_rq_clock(rq);
> >  		dl_server_stop(dl_se);
> >  		retval = dl_server_apply_params(dl_se, runtime, period, 0);
> 
> I was looking at the Sashiko findings and wondered what to do about this as
> well. I think what you are proposing should be fine, unless for some
> reason one wants to tweak dl-server parameters before switching a CPU
> on. But since hotplug is already a disruptive operation, I would say
> requiring such a change to be made after the CPU is online should be OK
> (and simpler to get right from a bandwidth accounting pov).
> 
> Reviewed-by: Juri Lelli <juri.lelli@redhat.com>

If this makes sense to you, could you add it to your queue:sched/core?

Otherwise it's possible to trigger the issues above by changing dl_server
bandwidth for offline CPUs.
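
For illustration only, here is a minimal userspace sketch of what the check
buys us. It is not kernel code: mock_rq, mock_server_write() and the field
names are hypothetical stand-ins, and the division only models the
"divide by cpus" step described for __dl_sub()/__dl_add() above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the bits of per-CPU runqueue state that matter. */
struct mock_rq {
	bool    online;   /* models cpu_online(cpu_of(rq)) */
	int     rd_cpus;  /* models the CPU count of the active root-domain span */
	int64_t share;    /* models the per-CPU bandwidth share after a write */
};

/*
 * Models the debugfs write path with the proposed guard in place.
 * Returns 0 on success, -EBUSY if the CPU is offline.
 */
static int mock_server_write(struct mock_rq *rq, int64_t new_bw)
{
	assert(rq != NULL);

	/* The guard: bail out before any bandwidth accounting is touched. */
	if (!rq->online)
		return -EBUSY;

	/*
	 * Without the guard, an offline CPU could reach this point with
	 * rd_cpus == 0 and trap on the division.
	 */
	rq->share = new_bw / rq->rd_cpus;
	return 0;
}
```

With the guard, a write against an offline mock_rq (rd_cpus == 0) returns
-EBUSY and leaves share untouched, while an online one with rd_cpus == 4 and
new_bw == 400 ends up with share == 100.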

Thanks,
-Andrea

