public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Joel Fernandes <joel@joelfernandes.org>
To: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Valentin Schneider <vschneid@redhat.com>,
	linux-kernel@vger.kernel.org,
	Luca Abeni <luca.abeni@santannapisa.it>,
	Tommaso Cucinotta <tommaso.cucinotta@santannapisa.it>,
	Thomas Gleixner <tglx@linutronix.de>,
	Vineeth Pillai <vineeth@bitbyteword.org>,
	Shuah Khan <skhan@linuxfoundation.org>,
	Phil Auld <pauld@redhat.com>,
	suleiman@google.com
Subject: Re: [PATCH v5 7/7] sched/fair: Fair server interface
Date: Fri, 19 Jan 2024 01:55:13 +0000	[thread overview]
Message-ID: <20240119015513.GA2187855@google.com> (raw)
In-Reply-To: <26adad2378c8b15533e4f6216c2863341e587f57.1699095159.git.bristot@kernel.org>

On Sat, Nov 04, 2023 at 11:59:24AM +0100, Daniel Bristot de Oliveira wrote:
> Add an interface for fair server setup on debugfs.
> 
> Each rq have three files under /sys/kernel/debug/sched/rq/CPU{ID}:
> 
>  - fair_server_runtime: set runtime in ns
>  - fair_server_period: set period in ns
>  - fair_server_defer: on/off for the defer mechanism
> 
> Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>

Hi Daniel, Peter,
I am writing on behalf of the ChromeOS scheduler team.

We had to revert the last 3 patches in this series because of a syzkaller
reported bug, this happens on the sched/more branch in Peter's tree:

 WARNING: CPU: 0 PID: 2404 at kernel/sched/fair.c:5220
 place_entity+0x240/0x290 kernel/sched/fair.c:5147
 Call Trace:
 <TASK>
  enqueue_entity+0xdf/0x1130 kernel/sched/fair.c:5283
  enqueue_task_fair+0x241/0xbd0 kernel/sched/fair.c:6717
  enqueue_task+0x199/0x2f0 kernel/sched/core.c:2117
  activate_task+0x60/0xc0 kernel/sched/core.c:2147
  ttwu_do_activate+0x18d/0x6b0 kernel/sched/core.c:3794
  ttwu_queue kernel/sched/core.c:4047 [inline]
  try_to_wake_up+0x805/0x12f0 kernel/sched/core.c:4368
  kick_pool+0x2e7/0x3b0 kernel/workqueue.c:1142
  __queue_work+0xcf8/0xfe0 kernel/workqueue.c:1800
  queue_delayed_work_on+0x15a/0x260 kernel/workqueue.c:1986
  queue_delayed_work include/linux/workqueue.h:577 [inline]
  srcu_funnel_gp_start kernel/rcu/srcutree.c:1068 [inline]

which is basically this warning in place_entity:
		if (WARN_ON_ONCE(!load))
			load = 1;

Full log (scroll to the bottom as there is console/lockdep side effects which
are likely not relevant to this issue): https://paste.debian.net/1304579/

Side note, we are also looking into a KASAN nullptr deref but this happens
only on our backport of the patches to a 5.15 kernel, as far as we know.

KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 0 PID: 1592 Comm: syz-executor.0 Not tainted [...]
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
 RIP: 0010:____rb_erase_color lib/rbtree.c:354 [inline] 
 RIP: 0010:rb_erase+0x664/0xe1e lib/rbtree.c:445
 [...]
Call Trace:
 <TASK>
  set_next_entity+0x6e/0x576 kernel/sched/fair.c:4728
  set_next_task_fair+0x1bb/0x355 kernel/sched/fair.c:11943
  set_next_task kernel/sched/sched.h:2241 [inline] 
  pick_next_task kernel/sched/core.c:6014 [inline] 
  __schedule+0x36fb/0x402d kernel/sched/core.c:6378
  preempt_schedule_common+0x74/0xc0 kernel/sched/core.c:6590
  preempt_schedule+0xd6/0xdd kernel/sched/core.c:6615

Full splat: https://paste.debian.net/1304573/

Investigation is on going but could you also please take a look at these? It
is hard to reproduce and only syzbot has luck reproducing these.

Also I had a comment below:

> +int dl_server_apply_params(struct sched_dl_entity *dl_se, u64 runtime, u64 period, bool init)
> +{
> +	u64 old_bw = init ? 0 : to_ratio(dl_se->dl_period, dl_se->dl_runtime);
> +	u64 new_bw = to_ratio(period, runtime);
> +	struct rq *rq = dl_se->rq;
> +	int cpu = cpu_of(rq);
> +	struct dl_bw *dl_b;
> +	unsigned long cap;
> +	int retval = 0;
> +	int cpus;
> +
> +	dl_b = dl_bw_of(cpu);
> +	raw_spin_lock(&dl_b->lock);
> +	cpus = dl_bw_cpus(cpu);
> +	cap = dl_bw_capacity(cpu);
> +
> +	if (__dl_overflow(dl_b, cap, old_bw, new_bw)) {

The dl_overflow() call here seems introducing an issue with our conceptual
understanding of how the dl server is supposed to work.

Suppose we have a 4 CPU system. Also suppose RT throttling is disabled.
Suppose the DL server params are 50ms runtime in 100ms period (basically we
want to dedicate 50% of the bandwidth of each CPU to CFS).

In such a situation, __dl_overflow() will return an error right? Because
total bandwidth will exceed 100% (4 times 50% is 200%).

Further, this complicates the setting of the parameters since it means we
have to check the number of CPUs in advance and then set the parameters to
prevent dl_overflow(). As an example of this, 30% (runtime / period) for each
CPU will work fine if we have 2 CPUs. But if we have 4 CPUs, it will not work
because __dl_overflow() will fail.

How do you suggest we remedy this? Can we make the dlserver calculate how
much bandwidth is allowed on a per-CPU basis? My understanding is the fake
dl_se are all pinned to their respective CPUs, so we don't have the same
requirement as real DL tasks which may migrate freely within the root domain.

thanks,

 - Joel


  parent reply	other threads:[~2024-01-19  1:55 UTC|newest]

Thread overview: 76+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-04 10:59 [PATCH v5 0/7] SCHED_DEADLINE server infrastructure Daniel Bristot de Oliveira
2023-11-04 10:59 ` [PATCH v5 1/7] sched: Unify runtime accounting across classes Daniel Bristot de Oliveira
2023-11-15  9:04   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2023-11-04 10:59 ` [PATCH v5 2/7] sched/deadline: Collect sched_dl_entity initialization Daniel Bristot de Oliveira
2023-11-15  9:04   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2023-11-04 10:59 ` [PATCH v5 3/7] sched/deadline: Move bandwidth accounting into {en,de}queue_dl_entity Daniel Bristot de Oliveira
2023-11-15  9:04   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2023-11-04 10:59 ` [PATCH v5 4/7] sched/deadline: Introduce deadline servers Daniel Bristot de Oliveira
2023-11-15  9:04   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2023-11-04 10:59 ` [PATCH v5 5/7] sched/fair: Add trivial fair server Daniel Bristot de Oliveira
2023-11-06 14:24   ` Peter Zijlstra
2023-11-06 14:26     ` Daniel Bristot de Oliveira
2023-11-04 10:59 ` [PATCH v5 6/7] sched/deadline: Deferrable dl server Daniel Bristot de Oliveira
2023-11-06 14:55   ` Peter Zijlstra
2023-11-06 17:05     ` Daniel Bristot de Oliveira
2023-11-06 19:32   ` Joel Fernandes
2023-11-06 21:32     ` Joel Fernandes
2023-11-06 21:37       ` Joel Fernandes
2023-11-07 11:58         ` Daniel Bristot de Oliveira
2023-11-08  2:42           ` Joel Fernandes
2023-11-07 16:47         ` Steven Rostedt
2023-11-07 17:35           ` Steven Rostedt
2023-11-07 17:46             ` Steven Rostedt
2023-11-07 17:54             ` Steven Rostedt
2023-11-07 19:32               ` Steven Rostedt
2023-11-07 20:07                 ` Steven Rostedt
2023-11-07 17:37           ` Daniel Bristot de Oliveira
2023-11-07 18:50             ` Daniel Bristot de Oliveira
2023-11-08  3:20               ` Joel Fernandes
2023-11-08  8:01                 ` Daniel Bristot de Oliveira
2023-11-08 18:25                   ` Joel Fernandes
2023-11-08 12:44               ` Peter Zijlstra
2023-11-08 12:50                 ` Peter Zijlstra
2023-11-08 14:52                   ` Daniel Bristot de Oliveira
2023-11-08 13:46                 ` Daniel Bristot de Oliveira
2023-11-08 13:58                 ` Daniel Bristot de Oliveira
2023-11-08 15:14                 ` Juri Lelli
2023-11-08 16:57                   ` Peter Zijlstra
2023-11-08  2:37           ` Joel Fernandes
2023-11-07  7:30     ` Daniel Bristot de Oliveira
2023-11-07 16:37   ` Steven Rostedt
2023-11-13 15:05   ` kernel test robot
2024-03-20  0:03   ` Joel Fernandes
2024-03-20 19:24     ` Daniel Bristot de Oliveira
2024-03-21 16:15       ` Joel Fernandes
2024-03-23 14:37         ` Joel Fernandes
2024-04-05 14:35         ` Daniel Bristot de Oliveira
2024-04-08 17:11           ` Steven Rostedt
2023-11-04 10:59 ` [PATCH v5 7/7] sched/fair: Fair server interface Daniel Bristot de Oliveira
2023-11-04 15:18   ` kernel test robot
2023-11-05  0:55   ` kernel test robot
2023-11-06 15:40   ` Peter Zijlstra
2023-11-06 16:29     ` Daniel Bristot de Oliveira
2023-11-07  8:16       ` Peter Zijlstra
2023-11-07 14:06         ` Daniel Bristot de Oliveira
2023-11-07 14:44       ` Peter Zijlstra
2023-11-07 12:38   ` Peter Zijlstra
2023-11-07 13:24     ` Daniel Bristot de Oliveira
2024-01-19  1:49   ` Joel Fernandes
2024-01-19  1:55   ` Joel Fernandes [this message]
2024-01-22 14:14     ` Daniel Bristot de Oliveira
2024-01-23 15:39       ` Joel Fernandes
2024-01-23 15:44       ` Joel Fernandes
2024-02-13  2:13   ` Joel Fernandes
2024-02-13  2:21     ` Joel Fernandes
2024-02-14 14:23     ` Daniel Bristot de Oliveira
2024-02-15 13:57       ` Joel Fernandes
2024-02-15 17:27         ` Daniel Bristot de Oliveira
2024-02-15 17:41           ` Joel Fernandes
2024-04-04 17:43             ` Daniel Bristot de Oliveira
2023-12-08 21:47 ` [PATCH v5 0/7] SCHED_DEADLINE server infrastructure Joel Fernandes
2024-02-19  7:33 ` Huang, Ying
2024-02-19 10:23   ` Daniel Bristot de Oliveira
2024-02-20  3:28     ` Huang, Ying
2024-02-20  8:31       ` Daniel Bristot de Oliveira
2024-02-20  8:41         ` Huang, Ying

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240119015513.GA2187855@google.com \
    --to=joel@joelfernandes.org \
    --cc=bristot@kernel.org \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luca.abeni@santannapisa.it \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=pauld@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=skhan@linuxfoundation.org \
    --cc=suleiman@google.com \
    --cc=tglx@linutronix.de \
    --cc=tommaso.cucinotta@santannapisa.it \
    --cc=vincent.guittot@linaro.org \
    --cc=vineeth@bitbyteword.org \
    --cc=vschneid@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox