public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] sched/smt: fix unbalance sched_smt_present dec/inc
@ 2023-09-02  7:46 Yang Yingliang
  2023-09-04  2:26 ` Chen Yu
  0 siblings, 1 reply; 4+ messages in thread
From: Yang Yingliang @ 2023-09-02  7:46 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, vschneid, tglx, yangyingliang

I got the following warn report while doing stress test:

jump label: negative count!
WARNING: CPU: 3 PID: 38 at kernel/jump_label.c:263 static_key_slow_try_dec+0x9d/0xb0
Call Trace:
 <TASK>
 __static_key_slow_dec_cpuslocked+0x16/0x70
 sched_cpu_deactivate+0x26e/0x2a0
 cpuhp_invoke_callback+0x3ad/0x10d0
 cpuhp_thread_fun+0x3f5/0x680
 smpboot_thread_fn+0x56d/0x8d0
 kthread+0x309/0x400
 ret_from_fork+0x41/0x70
 ret_from_fork_asm+0x1b/0x30
 </TASK>

Becaus when cpuset_cpu_inactive() fails in sched_cpu_deactivate(),
the cpu offline failed, but sched_smt_present is decreased before
calling sched_cpu_deactivate, it leads unbalance dec/inc, so fix
it by increasing sched_smt_present in the error path.

Fixes: c5511d03ec09 ("sched/smt: Make sched_smt_present track topology")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 kernel/sched/core.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2299a5cfbfb9..b7ef2df36b75 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9745,6 +9745,10 @@ int sched_cpu_deactivate(unsigned int cpu)
 	sched_update_numa(cpu, false);
 	ret = cpuset_cpu_inactive(cpu);
 	if (ret) {
+#ifdef CONFIG_SCHED_SMT
+		if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
+			static_branch_inc_cpuslocked(&sched_smt_present);
+#endif
 		balance_push_set(cpu, false);
 		set_cpu_active(cpu, true);
 		sched_update_numa(cpu, true);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] sched/smt: fix unbalance sched_smt_present dec/inc
  2023-09-02  7:46 [PATCH] sched/smt: fix unbalance sched_smt_present dec/inc Yang Yingliang
@ 2023-09-04  2:26 ` Chen Yu
  2023-09-05  3:59   ` Chen Yu
  2023-09-07 22:11   ` Tim Chen
  0 siblings, 2 replies; 4+ messages in thread
From: Chen Yu @ 2023-09-04  2:26 UTC (permalink / raw)
  To: Yang Yingliang
  Cc: linux-kernel, mingo, peterz, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
	tglx

Hi Yingliang,

On 2023-09-02 at 15:46:09 +0800, Yang Yingliang wrote:
> I got the following warn report while doing stress test:
> 

May I know if the test is to run many deadline tasks while offline the CPUs,
so as to trigger the failing case that removing one CPU gets us below the total
allocated bandwidth?

> jump label: negative count!
> WARNING: CPU: 3 PID: 38 at kernel/jump_label.c:263 static_key_slow_try_dec+0x9d/0xb0
> Call Trace:
>  <TASK>
>  __static_key_slow_dec_cpuslocked+0x16/0x70
>  sched_cpu_deactivate+0x26e/0x2a0
>  cpuhp_invoke_callback+0x3ad/0x10d0
>  cpuhp_thread_fun+0x3f5/0x680
>  smpboot_thread_fn+0x56d/0x8d0
>  kthread+0x309/0x400
>  ret_from_fork+0x41/0x70
>  ret_from_fork_asm+0x1b/0x30
>  </TASK>
> 
> Becaus when cpuset_cpu_inactive() fails in sched_cpu_deactivate(),

s/Becaus/Because/

> the cpu offline failed, but sched_smt_present is decreased before
> calling sched_cpu_deactivate, it leads unbalance dec/inc, so fix

s/calling sched_cpu_deactivate/calling cpuset_cpu_inactive() ?

> it by increasing sched_smt_present in the error path.
> 
> Fixes: c5511d03ec09 ("sched/smt: Make sched_smt_present track topology")
> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
> ---
>  kernel/sched/core.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 2299a5cfbfb9..b7ef2df36b75 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -9745,6 +9745,10 @@ int sched_cpu_deactivate(unsigned int cpu)
>  	sched_update_numa(cpu, false);
>  	ret = cpuset_cpu_inactive(cpu);
>  	if (ret) {
> +#ifdef CONFIG_SCHED_SMT
> +		if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
> +			static_branch_inc_cpuslocked(&sched_smt_present);
> +#endif

While checking the code, it seems that the core scheduling also missed
the error path, maybe we should also invoke sched_core_cpu_starting() to
restore the context. I'll have a check.

Other than above typo, it looks good to me,

Reviewed-by: Chen Yu <yu.c.chen@intel.com>

thanks,
Chenyu

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] sched/smt: fix unbalance sched_smt_present dec/inc
  2023-09-04  2:26 ` Chen Yu
@ 2023-09-05  3:59   ` Chen Yu
  2023-09-07 22:11   ` Tim Chen
  1 sibling, 0 replies; 4+ messages in thread
From: Chen Yu @ 2023-09-05  3:59 UTC (permalink / raw)
  To: Yang Yingliang
  Cc: linux-kernel, mingo, peterz, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
	tglx

On 2023-09-04 at 10:26:20 +0800, Chen Yu wrote:
> Hi Yingliang,
> 
> On 2023-09-02 at 15:46:09 +0800, Yang Yingliang wrote:
> 
> While checking the code, it seems that the core scheduling also missed
> the error path, maybe we should also invoke sched_core_cpu_starting() to
> restore the context. I'll have a check.

After checking the core scheduling code, there is no need to restore
the context 'broken' by sched_core_cpu_deactivate(). It just tries to
find a new leader if the offline CPU is the old leader, and it should
not have any impact to switch to a new leader if the cpu hotplug offline
fails.

thanks,
Chenyu 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] sched/smt: fix unbalance sched_smt_present dec/inc
  2023-09-04  2:26 ` Chen Yu
  2023-09-05  3:59   ` Chen Yu
@ 2023-09-07 22:11   ` Tim Chen
  1 sibling, 0 replies; 4+ messages in thread
From: Tim Chen @ 2023-09-07 22:11 UTC (permalink / raw)
  To: Chen Yu, Yang Yingliang
  Cc: linux-kernel, mingo, peterz, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
	tglx

On Mon, 2023-09-04 at 10:26 +0800, Chen Yu wrote:
> Hi Yingliang,
> 
> On 2023-09-02 at 15:46:09 +0800, Yang Yingliang wrote:
> > I got the following warn report while doing stress test:
> > 
> 
> May I know if the test is to run many deadline tasks while offline the CPUs,
> so as to trigger the failing case that removing one CPU gets us below the total
> allocated bandwidth?
> 
> > jump label: negative count!
> > WARNING: CPU: 3 PID: 38 at kernel/jump_label.c:263 static_key_slow_try_dec+0x9d/0xb0
> > Call Trace:
> >  <TASK>
> >  __static_key_slow_dec_cpuslocked+0x16/0x70
> >  sched_cpu_deactivate+0x26e/0x2a0
> >  cpuhp_invoke_callback+0x3ad/0x10d0
> >  cpuhp_thread_fun+0x3f5/0x680
> >  smpboot_thread_fn+0x56d/0x8d0
> >  kthread+0x309/0x400
> >  ret_from_fork+0x41/0x70
> >  ret_from_fork_asm+0x1b/0x30
> >  </TASK>
> > 
> > Becaus when cpuset_cpu_inactive() fails in sched_cpu_deactivate(),
> 
> s/Becaus/Because/
> 
> > the cpu offline failed, but sched_smt_present is decreased before

s/decreased/decremented
to be precise

> > calling sched_cpu_deactivate, it leads unbalance dec/inc, so fix

s/leads unbalance dec/inc / leads to unbalanced dec/inc
> 
> s/calling sched_cpu_deactivate/calling cpuset_cpu_inactive() ?
> > it by increasing sched_smt_present in the error path.

s/increasing/incrementing

> > 
> > Fixes: c5511d03ec09 ("sched/smt: Make sched_smt_present track topology")
> > Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
> > ---
> >  kernel/sched/core.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 2299a5cfbfb9..b7ef2df36b75 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -9745,6 +9745,10 @@ int sched_cpu_deactivate(unsigned int cpu)
> >  	sched_update_numa(cpu, false);
> >  	ret = cpuset_cpu_inactive(cpu);
> >  	if (ret) {
> > +#ifdef CONFIG_SCHED_SMT
> > +		if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
> > +			static_branch_inc_cpuslocked(&sched_smt_present);
> > +#endif
> 
> While checking the code, it seems that the core scheduling also missed
> the error path, maybe we should also invoke sched_core_cpu_starting() to
> restore the context. I'll have a check.
> 
> Other than above typo, it looks good to me,
> 
> Reviewed-by: Chen Yu <yu.c.chen@intel.com>
> 

Other than some minor nits to commit log wording,

Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-09-07 22:11 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-02  7:46 [PATCH] sched/smt: fix unbalance sched_smt_present dec/inc Yang Yingliang
2023-09-04  2:26 ` Chen Yu
2023-09-05  3:59   ` Chen Yu
2023-09-07 22:11   ` Tim Chen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox