All of lore.kernel.org
 help / color / mirror / Atom feed
From: Joel Fernandes <joel@joelfernandes.org>
To: Juri Lelli <juri.lelli@redhat.com>
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Valentin Schneider <vschneid@redhat.com>,
	Suleiman Souhlal <suleiman@google.com>,
	Aashish Sharma <shraash@google.com>,
	Shin Kawamura <kawasin@google.com>,
	Vineeth Remanan Pillai <vineeth@bitbyteword.org>
Subject: Re: [PATCH] dl_server: Reset DL server params when rd changes
Date: Wed, 30 Oct 2024 19:50:17 +0000	[thread overview]
Message-ID: <20241030195017.GA4171541@google.com> (raw)
In-Reply-To: <ZyJC9MkbPeF9_rdP@jlelli-thinkpadt14gen4.remote.csb>

On Wed, Oct 30, 2024 at 03:30:12PM +0100, Juri Lelli wrote:
> Hi Joel,

Hi Juri!

> On 29/10/24 22:51, Joel Fernandes (Google) wrote:
> > During boot initialization, DL server parameters are initialized using the
> > default root domain before the proper scheduler domains and root domains
> > are built. This results in DL server parameters being tied to the default
> > root domain's bandwidth accounting instead of the actual root domain
> > assigned to the CPU after scheduler topology initialization.
> > 
> > When secondary CPUs are brought up, the dl_bw_cpus() accounting doesn't
> > properly track CPUs being added since the DL server was started too early
> > with the default root domain. Specifically, dl_bw_cpus() is called before
> > set_cpu_active() during secondary CPU bringup, causing it to not account
> > for the CPU being brought up in its capacity calculations. This causes
> > subsequent sysfs parameter updates to fail with -EBUSY due to bandwidth
> > accounting using the wrong root domain with zeroed total_bw.
> > 
> > This issue also causes under-utilization of system capacity. With the fix,
> > we see proper capacity initialization and scaling as CPUs come online - the
> > total system capacity increases from CPU 0 to CPU 1 and continues scaling
> > up as more CPUs are added (from cap=1024 initially to cap=8192 with 8
> > CPUs). Without the fix, the capacity initialization was incomplete since
> > dl_bw_cpus() runs before the CPU is marked active in set_cpu_active(),
> > leading to CPUs not being properly accounted for in the capacity
> > calculations.
> > 
> > Fix this by tracking the last root domain used for the DL server and
> > resetting the server parameters when the root domain changes. This ensures
> > bandwidth accounting uses the correct, fully initialized root domain after
> > the scheduler topology is built.
> 
> So, I'm trying to reproduce this issue, but currenlty not really seeing
> it, sorry.
> 
> I'm on a 40 CPUs box and, even if I fiddle with hotplug, the numbers I
> see from debug (bw, total_bw) seem sane and consistent with the fair
> server settings.
> 
> Could you please provide additional info about how you reproduce the
> issue? Maybe you have a test script around you could share?

With some prints [1] in the kernel, we can see on boot:

$ dmesg|grep appl
[    0.930337] dl_server_apply_params: cpu=0, runtime=50000000, period=1000000000, cpus=1, cap=1024, init=1
[    0.949025] dl_server_apply_params: cpu=1, runtime=50000000, period=1000000000, cpus=1, cap=1024, init=1
[    0.953026] dl_server_apply_params: cpu=2, runtime=50000000, period=1000000000, cpus=2, cap=2048, init=1
[    0.957024] dl_server_apply_params: cpu=3, runtime=50000000, period=1000000000, cpus=3, cap=3072, init=1
[    0.961023] dl_server_apply_params: cpu=4, runtime=50000000, period=1000000000, cpus=4, cap=4096, init=1
[    0.965030] dl_server_apply_params: cpu=5, runtime=50000000, period=1000000000, cpus=5, cap=5120, init=1
[    0.969024] dl_server_apply_params: cpu=6, runtime=50000000, period=1000000000, cpus=6, cap=6144, init=1
[    0.973024] dl_server_apply_params: cpu=7, runtime=50000000, period=1000000000, cpus=7, cap=7168, init=1

For the 8th apply_params, the 8th CPU is not considered. This is because
set_cpu_active() for the 8th CPU has not yet happened as mentioned in commit
message.

With the patch:

$ dmesg|grep appl
[    0.961169] dl_server_apply_params: cpu=0, runtime=50000000, period=1000000000, cpus=1, cap=1024, init=1
[    0.981936] dl_server_apply_params: cpu=1, runtime=50000000, period=1000000000, cpus=1, cap=1024, init=1
[    0.985836] dl_server_apply_params: cpu=2, runtime=50000000, period=1000000000, cpus=2, cap=2048, init=1
[    0.989835] dl_server_apply_params: cpu=3, runtime=50000000, period=1000000000, cpus=3, cap=3072, init=1
[    0.993840] dl_server_apply_params: cpu=4, runtime=50000000, period=1000000000, cpus=4, cap=4096, init=1
[    0.997835] dl_server_apply_params: cpu=5, runtime=50000000, period=1000000000, cpus=5, cap=5120, init=1
[    1.001838] dl_server_apply_params: cpu=6, runtime=50000000, period=1000000000, cpus=6, cap=6144, init=1
[    1.005834] dl_server_apply_params: cpu=7, runtime=50000000, period=1000000000, cpus=7, cap=7168, init=1

   [ ... here somewhere rd changes as topology init finishes, then all the
   params are replied, this time with the correct rd. ]

[    1.009903] dl_server_apply_params: cpu=0, runtime=50000000, period=1000000000, cpus=8, cap=8192, init=1
[    1.012409] dl_server_apply_params: cpu=1, runtime=50000000, period=1000000000, cpus=8, cap=8192, init=1
[    1.014269] dl_server_apply_params: cpu=2, runtime=50000000, period=1000000000, cpus=8, cap=8192, init=1
[    1.019865] dl_server_apply_params: cpu=3, runtime=50000000, period=1000000000, cpus=8, cap=8192, init=1
[    1.054908] dl_server_apply_params: cpu=4, runtime=50000000, period=1000000000, cpus=8, cap=8192, init=1
[    1.081865] dl_server_apply_params: cpu=5, runtime=50000000, period=1000000000, cpus=8, cap=8192, init=1
[    1.108861] dl_server_apply_params: cpu=6, runtime=50000000, period=1000000000, cpus=8, cap=8192, init=1
[    1.136944] dl_server_apply_params: cpu=7, runtime=50000000, period=1000000000, cpus=8, cap=8192, init=1

The -EBUSY happens for our 5.15 backport. I see dl_b->total_bw to be 0
without my patch. Even if the -EBUSY doesn't happen for you (perhaps due to
compiler or other differences), shouldn't we use the correct rd for
apply_params? The dl_bw is tied to the rd via  cpu_rq(cpu)->rd->dl_bw;

So if rd changes during boot initialization, the correct dl_bw has to be
updated AFAICS. Also if cpusets are used, the rd for a CPU may change.

Let me know if I missed something?

[1]
----------8<-----------
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index d9d5a702f1a6..249f99e88b98 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -238,6 +238,12 @@ void __dl_add(struct dl_bw *dl_b, u64 tsk_bw, int cpus)
 static inline bool
 __dl_overflow(struct dl_bw *dl_b, unsigned long cap, u64 old_bw, u64 new_bw)
 {
+	printk("dl_b->bw: %llu\n", dl_b->bw);
+	printk("cap: %lu, cap_scale(dl_b->bw, cap): %llu\n", cap, cap_scale(dl_b->bw, cap));
+	printk("dl_b->total_bw: %llu\n", dl_b->total_bw);
+	printk("old_bw: %llu\n", old_bw);
+	printk("new_bw: %llu\n", new_bw);
+
 	return dl_b->bw != -1 &&
 	       cap_scale(dl_b->bw, cap) < dl_b->total_bw - old_bw + new_bw;
 }
@@ -1704,6 +1710,10 @@ int dl_server_apply_params(struct sched_dl_entity *dl_se, u64 runtime, u64 perio
 	cpus = dl_bw_cpus(cpu);
 	cap = dl_bw_capacity(cpu);
 
+	printk("dl_server_apply_params: cpu=%d, runtime=%llu, period=%llu, cpus=%d, cap=%lu, init=%d\n",
+		cpu, runtime, period, cpus, cap, init);
+	printk("initial dl_b->total_bw=%llu, dl_b->bw=%llu\n", dl_b->total_bw, dl_b->bw);
+
 	if (__dl_overflow(dl_b, cap, old_bw, new_bw))
 		return -EBUSY;
 
-- 
2.47.0.163.g1226f6d8fa-goog


  reply	other threads:[~2024-10-30 19:50 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-29 22:51 [PATCH] dl_server: Reset DL server params when rd changes Joel Fernandes (Google)
2024-10-30 14:30 ` Juri Lelli
2024-10-30 19:50   ` Joel Fernandes [this message]
2024-11-04 10:54     ` Juri Lelli
2024-11-04 17:41       ` Joel Fernandes
2024-11-06 16:08         ` Juri Lelli
2024-11-06 18:05           ` Waiman Long
2024-11-08  4:40             ` Waiman Long
2024-11-08 10:40               ` Juri Lelli
2024-11-09  3:30               ` Waiman Long
2024-11-09 18:18                 ` Waiman Long
2024-11-11  9:37                   ` Juri Lelli
2024-11-11 12:24                     ` Juri Lelli
2024-10-30 15:54 ` kernel test robot
2024-10-30 17:06 ` kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20241030195017.GA4171541@google.com \
    --to=joel@joelfernandes.org \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=juri.lelli@redhat.com \
    --cc=kawasin@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=shraash@google.com \
    --cc=suleiman@google.com \
    --cc=vincent.guittot@linaro.org \
    --cc=vineeth@bitbyteword.org \
    --cc=vschneid@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.