public inbox for linux-s390@vger.kernel.org
* Re: Linux 5.14
       [not found] ` <20210830201225.GA2671970@roeck-us.net>
@ 2021-08-30 20:15   ` Linus Torvalds
  2021-08-30 21:28     ` Peter Zijlstra
  2021-08-30 20:32   ` Thomas Gleixner
  1 sibling, 1 reply; 5+ messages in thread
From: Linus Torvalds @ 2021-08-30 20:15 UTC
  To: Guenter Roeck, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger
  Cc: Linux Kernel Mailing List, Peter Zijlstra, linux-s390

On Mon, Aug 30, 2021 at 1:12 PM Guenter Roeck <linux@roeck-us.net> wrote:
>
> So far so good, but there is a brand new runtime warning, seen when booting
> s390 images.
>
> [    3.218816] ------------[ cut here ]------------
> [    3.219010] WARNING: CPU: 1 PID: 0 at kernel/sched/core.c:5779 sched_core_cpu_starting+0x172/0x180
> [    3.222845] Call Trace:
> [    3.222992]  [<0000000000186e86>] sched_core_cpu_starting+0x176/0x180
> [    3.223114] ([<0000000000186dc4>] sched_core_cpu_starting+0xb4/0x180)
> [    3.223182]  [<00000000001963e4>] sched_cpu_starting+0x2c/0x68
> [    3.223243]  [<000000000014f288>] cpuhp_invoke_callback+0x318/0x970
> [    3.223304]  [<000000000014f970>] cpuhp_invoke_callback_range+0x90/0x108
> [    3.223364]  [<000000000015123c>] notify_cpu_starting+0x84/0xa8
> [    3.223426]  [<0000000000117bca>] smp_init_secondary+0x72/0xf0
> [    3.223492]  [<0000000000117846>] smp_start_secondary+0x86/0x90
>
> Commit 3c474b3239f12 ("sched: Fix Core-wide rq->lock for uninitialized
> CPUs") seems to be the culprit. Indeed, the warning is gone after reverting
> this commit.

Ouch, not great timing.

Adding the s390 people to the cc too, just to make sure everybody
involved is aware.

           Linus


* Re: Linux 5.14
       [not found] ` <20210830201225.GA2671970@roeck-us.net>
  2021-08-30 20:15   ` Linux 5.14 Linus Torvalds
@ 2021-08-30 20:32   ` Thomas Gleixner
  2021-08-30 23:57     ` Thomas Gleixner
  1 sibling, 1 reply; 5+ messages in thread
From: Thomas Gleixner @ 2021-08-30 20:32 UTC
  To: Guenter Roeck, Linus Torvalds
  Cc: Linux Kernel Mailing List, Peter Zijlstra, linux-s390,
	Heiko Carstens

On Mon, Aug 30 2021 at 13:12, Guenter Roeck wrote:
> On Sun, Aug 29, 2021 at 03:19:23PM -0700, Linus Torvalds wrote:
> So far so good, but there is a brand new runtime warning, seen when booting
> s390 images.
>
> [    3.218816] ------------[ cut here ]------------
> [    3.219010] WARNING: CPU: 1 PID: 0 at kernel/sched/core.c:5779 sched_core_cpu_starting+0x172/0x180
> [    3.222992]  [<0000000000186e86>] sched_core_cpu_starting+0x176/0x180
> [    3.223114] ([<0000000000186dc4>] sched_core_cpu_starting+0xb4/0x180)
> [    3.223182]  [<00000000001963e4>] sched_cpu_starting+0x2c/0x68
> [    3.223243]  [<000000000014f288>] cpuhp_invoke_callback+0x318/0x970
> [    3.223304]  [<000000000014f970>] cpuhp_invoke_callback_range+0x90/0x108
> [    3.223364]  [<000000000015123c>] notify_cpu_starting+0x84/0xa8
> [    3.223426]  [<0000000000117bca>] smp_init_secondary+0x72/0xf0
> [    3.223492]  [<0000000000117846>] smp_start_secondary+0x86/0x90
>
> Commit 3c474b3239f12 ("sched: Fix Core-wide rq->lock for uninitialized
> CPUs") seems to be the culprit. Indeed, the warning is gone after reverting
> this commit.

The warning is gone, but the underlying S390 problem persists:

S390 invokes notify_cpu_starting() _before_ updating the topology masks.

Thanks,

        tglx


* Re: Linux 5.14
  2021-08-30 20:15   ` Linux 5.14 Linus Torvalds
@ 2021-08-30 21:28     ` Peter Zijlstra
  2021-08-31 11:04       ` Heiko Carstens
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2021-08-30 21:28 UTC
  To: Linus Torvalds
  Cc: Guenter Roeck, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Linux Kernel Mailing List, linux-s390,
	Sven Schnelle

On Mon, Aug 30, 2021 at 01:15:37PM -0700, Linus Torvalds wrote:
> On Mon, Aug 30, 2021 at 1:12 PM Guenter Roeck <linux@roeck-us.net> wrote:
> >
> > So far so good, but there is a brand new runtime warning, seen when booting
> > s390 images.
> >
> > [    3.218816] ------------[ cut here ]------------
> > [    3.219010] WARNING: CPU: 1 PID: 0 at kernel/sched/core.c:5779 sched_core_cpu_starting+0x172/0x180
> > [    3.222845] Call Trace:
> > [    3.222992]  [<0000000000186e86>] sched_core_cpu_starting+0x176/0x180
> > [    3.223114] ([<0000000000186dc4>] sched_core_cpu_starting+0xb4/0x180)
> > [    3.223182]  [<00000000001963e4>] sched_cpu_starting+0x2c/0x68
> > [    3.223243]  [<000000000014f288>] cpuhp_invoke_callback+0x318/0x970
> > [    3.223304]  [<000000000014f970>] cpuhp_invoke_callback_range+0x90/0x108
> > [    3.223364]  [<000000000015123c>] notify_cpu_starting+0x84/0xa8
> > [    3.223426]  [<0000000000117bca>] smp_init_secondary+0x72/0xf0
> > [    3.223492]  [<0000000000117846>] smp_start_secondary+0x86/0x90
> >
> > Commit 3c474b3239f12 ("sched: Fix Core-wide rq->lock for uninitialized
> > CPUs") seems to be the culprit. Indeed, the warning is gone after reverting
> > this commit.
> 
> Ouch, not great timing.
> 
> Adding the s390 people to the cc too, just to make sure everybody
> involved is aware.

'Funny' thing, Sven actually tested that on s390. I had already committed
the patch which is why his tag isn't on the commit:

  https://lkml.kernel.org/r/yt9dy28o8q0o.fsf@linux.ibm.com

Anyway, looks like Thomas found something fishy in their topology code.
Lemme go catch up.


* Re: Linux 5.14
  2021-08-30 20:32   ` Thomas Gleixner
@ 2021-08-30 23:57     ` Thomas Gleixner
  0 siblings, 0 replies; 5+ messages in thread
From: Thomas Gleixner @ 2021-08-30 23:57 UTC
  To: Guenter Roeck, Linus Torvalds
  Cc: Linux Kernel Mailing List, Peter Zijlstra, linux-s390,
	Heiko Carstens, Sven Schnelle

On Mon, Aug 30 2021 at 22:32, Thomas Gleixner wrote:

> On Mon, Aug 30 2021 at 13:12, Guenter Roeck wrote:
>> On Sun, Aug 29, 2021 at 03:19:23PM -0700, Linus Torvalds wrote:
>> So far so good, but there is a brand new runtime warning, seen when booting
>> s390 images.
>>
>> [    3.218816] ------------[ cut here ]------------
>> [    3.219010] WARNING: CPU: 1 PID: 0 at kernel/sched/core.c:5779 sched_core_cpu_starting+0x172/0x180
>> [    3.222992]  [<0000000000186e86>] sched_core_cpu_starting+0x176/0x180
>> [    3.223114] ([<0000000000186dc4>] sched_core_cpu_starting+0xb4/0x180)
>> [    3.223182]  [<00000000001963e4>] sched_cpu_starting+0x2c/0x68
>> [    3.223243]  [<000000000014f288>] cpuhp_invoke_callback+0x318/0x970
>> [    3.223304]  [<000000000014f970>] cpuhp_invoke_callback_range+0x90/0x108
>> [    3.223364]  [<000000000015123c>] notify_cpu_starting+0x84/0xa8
>> [    3.223426]  [<0000000000117bca>] smp_init_secondary+0x72/0xf0
>> [    3.223492]  [<0000000000117846>] smp_start_secondary+0x86/0x90
>>
>> Commit 3c474b3239f12 ("sched: Fix Core-wide rq->lock for uninitialized
>> CPUs") seems to be the culprit. Indeed, the warning is gone after reverting
>> this commit.
>
> The warning is gone, but the underlying S390 problem persists:
>
> S390 invokes notify_cpu_starting() _before_ updating the topology masks.

And interestingly enough that very commit was tested on S390:

  https://lore.kernel.org/r/yt9dy28o8q0o.fsf@linux.ibm.com

Thanks,

        tglx


* Re: Linux 5.14
  2021-08-30 21:28     ` Peter Zijlstra
@ 2021-08-31 11:04       ` Heiko Carstens
  0 siblings, 0 replies; 5+ messages in thread
From: Heiko Carstens @ 2021-08-31 11:04 UTC
  To: Peter Zijlstra
  Cc: Linus Torvalds, Guenter Roeck, Vasily Gorbik,
	Christian Borntraeger, Linux Kernel Mailing List, linux-s390,
	Sven Schnelle, Thomas Gleixner

On Mon, Aug 30, 2021 at 11:28:54PM +0200, Peter Zijlstra wrote:
> On Mon, Aug 30, 2021 at 01:15:37PM -0700, Linus Torvalds wrote:
> > On Mon, Aug 30, 2021 at 1:12 PM Guenter Roeck <linux@roeck-us.net> wrote:
> > >
> > > So far so good, but there is a brand new runtime warning, seen when booting
> > > s390 images.
> > >
> > > [    3.218816] ------------[ cut here ]------------
> > > [    3.219010] WARNING: CPU: 1 PID: 0 at kernel/sched/core.c:5779 sched_core_cpu_starting+0x172/0x180
> > > [    3.222845] Call Trace:
> > > [    3.222992]  [<0000000000186e86>] sched_core_cpu_starting+0x176/0x180
> > > [    3.223114] ([<0000000000186dc4>] sched_core_cpu_starting+0xb4/0x180)
> > > [    3.223182]  [<00000000001963e4>] sched_cpu_starting+0x2c/0x68
> > > [    3.223243]  [<000000000014f288>] cpuhp_invoke_callback+0x318/0x970
> > > [    3.223304]  [<000000000014f970>] cpuhp_invoke_callback_range+0x90/0x108
> > > [    3.223364]  [<000000000015123c>] notify_cpu_starting+0x84/0xa8
> > > [    3.223426]  [<0000000000117bca>] smp_init_secondary+0x72/0xf0
> > > [    3.223492]  [<0000000000117846>] smp_start_secondary+0x86/0x90
> > >
> > > Commit 3c474b3239f12 ("sched: Fix Core-wide rq->lock for uninitialized
> > > CPUs") seems to be the culprit. Indeed, the warning is gone after reverting
> > > this commit.
> > 
> > Ouch, not great timing.
> > 
> > Adding the s390 people to the cc too, just to make sure everybody
> > involved is aware.
> 
> 'Funny' thing, Sven actually tested that on s390. I had already committed
> the patch which is why his tag isn't on the commit:
> 
>   https://lkml.kernel.org/r/yt9dy28o8q0o.fsf@linux.ibm.com
> 
> Anyway, looks like Thomas found something fishy in their topology code.
> Lemme go catch up.

Sven provided the patch below, which should fix the topology problem.
If it fixes everything it will go upstream with a stable tag, but it
first needs to go through our CI to make sure it doesn't introduce
new regressions.

From: Sven Schnelle <svens@linux.ibm.com>
Subject: [PATCH] s390: fix topology information when calling cpu hotplug notifiers

The cpu hotplug notifiers are called without updating the core/thread
masks when a new CPU is added. This causes problems with code setting
up data structures in a cpu hotplug notifier, and relying on that later
in normal code.

This caused a crash in the new core scheduling code (SCHED_CORE),
where rq->core was set up in a notifier depending on cpu masks.

To fix this, add a cpu_setup_mask which is used in update_cpu_masks()
instead of the cpu_online_mask to determine whether the cpu masks should
be set for a certain cpu. Also move update_cpu_masks() to update the
masks before calling notify_cpu_starting() so that the notifiers are
seeing the updated masks.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
---
 arch/s390/include/asm/smp.h |  1 +
 arch/s390/kernel/smp.c      |  9 +++++++--
 arch/s390/kernel/topology.c | 10 +++++-----
 3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/arch/s390/include/asm/smp.h b/arch/s390/include/asm/smp.h
index e317fd4866c1..f16f4d054ae2 100644
--- a/arch/s390/include/asm/smp.h
+++ b/arch/s390/include/asm/smp.h
@@ -18,6 +18,7 @@ extern struct mutex smp_cpu_state_mutex;
 extern unsigned int smp_cpu_mt_shift;
 extern unsigned int smp_cpu_mtid;
 extern __vector128 __initdata boot_cpu_vector_save_area[__NUM_VXRS];
+extern cpumask_t cpu_setup_mask;
 
 extern int __cpu_up(unsigned int cpu, struct task_struct *tidle);
 
diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
index 2a991e43ead3..1a04e5bdf655 100644
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -95,6 +95,7 @@ __vector128 __initdata boot_cpu_vector_save_area[__NUM_VXRS];
 #endif
 
 static unsigned int smp_max_threads __initdata = -1U;
+cpumask_t cpu_setup_mask;
 
 static int __init early_nosmt(char *s)
 {
@@ -902,13 +903,14 @@ static void smp_start_secondary(void *cpuvoid)
 	vtime_init();
 	vdso_getcpu_init();
 	pfault_init();
+	cpumask_set_cpu(cpu, &cpu_setup_mask);
+	update_cpu_masks();
 	notify_cpu_starting(cpu);
 	if (topology_cpu_dedicated(cpu))
 		set_cpu_flag(CIF_DEDICATED_CPU);
 	else
 		clear_cpu_flag(CIF_DEDICATED_CPU);
 	set_cpu_online(cpu, true);
-	update_cpu_masks();
 	inc_irq_stat(CPU_RST);
 	local_irq_enable();
 	cpu_startup_entry(CPUHP_AP_ONLINE_IDLE);
@@ -950,10 +952,13 @@ early_param("possible_cpus", _setup_possible_cpus);
 int __cpu_disable(void)
 {
 	unsigned long cregs[16];
+	int cpu;
 
 	/* Handle possible pending IPIs */
 	smp_handle_ext_call();
-	set_cpu_online(smp_processor_id(), false);
+	cpu = smp_processor_id();
+	set_cpu_online(cpu, false);
+	cpumask_clear_cpu(cpu, &cpu_setup_mask);
 	update_cpu_masks();
 	/* Disable pseudo page faults on this cpu. */
 	pfault_fini();
diff --git a/arch/s390/kernel/topology.c b/arch/s390/kernel/topology.c
index d2458a29618f..5cc7aeae4610 100644
--- a/arch/s390/kernel/topology.c
+++ b/arch/s390/kernel/topology.c
@@ -67,9 +67,8 @@ static void cpu_group_map(cpumask_t *dst, struct mask_info *info, unsigned int c
 	static cpumask_t mask;
 
 	cpumask_clear(&mask);
-	if (!cpu_online(cpu))
+	if (!cpumask_test_cpu(cpu, &cpu_setup_mask))
 		goto out;
-	cpumask_set_cpu(cpu, &mask);
 	switch (topology_mode) {
 	case TOPOLOGY_MODE_HW:
 		while (info) {
@@ -89,6 +88,7 @@ static void cpu_group_map(cpumask_t *dst, struct mask_info *info, unsigned int c
 		break;
 	}
 	cpumask_and(&mask, &mask, cpu_online_mask);
+	cpumask_set_cpu(cpu, &mask);
 out:
 	cpumask_copy(dst, &mask);
 }
@@ -99,16 +99,15 @@ static void cpu_thread_map(cpumask_t *dst, unsigned int cpu)
 	int i;
 
 	cpumask_clear(&mask);
-	if (!cpu_online(cpu))
+	if (!cpumask_test_cpu(cpu, &cpu_setup_mask))
 		goto out;
 	cpumask_set_cpu(cpu, &mask);
 	if (topology_mode != TOPOLOGY_MODE_HW)
 		goto out;
 	cpu -= cpu % (smp_cpu_mtid + 1);
 	for (i = 0; i <= smp_cpu_mtid; i++)
-		if (cpu_present(cpu + i))
+		if (cpu_online(cpu + i))
 			cpumask_set_cpu(cpu + i, &mask);
-	cpumask_and(&mask, &mask, cpu_online_mask);
 out:
 	cpumask_copy(dst, &mask);
 }
@@ -569,6 +568,7 @@ void __init topology_init_early(void)
 	alloc_masks(info, &book_info, 2);
 	alloc_masks(info, &drawer_info, 3);
 out:
+	cpumask_set_cpu(0, &cpu_setup_mask);
 	__arch_update_cpu_topology();
 	__arch_update_dedicated_flag(NULL);
 }
-- 
2.25.1
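
The ordering problem described in the patch can be illustrated with a small
user-space toy model. This is only a sketch under stated assumptions, not
kernel code: every name below (toy_state, toy_update_masks, toy_bring_up and
so on) is hypothetical, masks are plain 32-bit integers, and a core is
assumed to have two hardware threads. The "notifier" simply records the
sibling mask it sees at the moment it is called.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_CPUS 8u
#define TOY_THREADS_PER_CORE 2u

/* Toy stand-ins; these are NOT the kernel's real data structures. */
struct toy_state {
	uint32_t online_mask;                   /* models cpu_online_mask */
	uint32_t setup_mask;                    /* models the new cpu_setup_mask */
	uint32_t thread_mask[TOY_NR_CPUS];      /* per-CPU sibling mask */
	uint32_t seen_by_notifier[TOY_NR_CPUS]; /* mask observed by the notifier */
};

static uint32_t toy_bit(unsigned int cpu)
{
	assert(cpu < TOY_NR_CPUS);
	return UINT32_C(1) << cpu;
}

/* Bits of all hardware threads that share a core with @cpu. */
static uint32_t toy_core_bits(unsigned int cpu)
{
	unsigned int first = cpu - cpu % TOY_THREADS_PER_CORE;
	uint32_t bits = 0;

	for (unsigned int i = 0; i < TOY_THREADS_PER_CORE; i++)
		bits |= toy_bit(first + i);
	return bits;
}

/*
 * Recompute every sibling mask. A CPU gets a non-empty mask only if it is
 * in the gating mask: the online mask (old behaviour) or the setup mask
 * (patched behaviour). The CPU itself is always included once gated in.
 */
void toy_update_masks(struct toy_state *s, bool gate_on_setup)
{
	uint32_t gate = gate_on_setup ? s->setup_mask : s->online_mask;

	for (unsigned int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		if (!(gate & toy_bit(cpu))) {
			s->thread_mask[cpu] = 0;
			continue;
		}
		s->thread_mask[cpu] =
			toy_bit(cpu) | (s->online_mask & toy_core_bits(cpu));
	}
}

/* Models a hotplug notifier that depends on the sibling mask. */
void toy_notify_starting(struct toy_state *s, unsigned int cpu)
{
	s->seen_by_notifier[cpu] = s->thread_mask[cpu];
}

/* Boot state: only CPU 0 is set up and online. */
void toy_init(struct toy_state *s, bool patched)
{
	*s = (struct toy_state){ .online_mask = 1, .setup_mask = 1 };
	toy_update_masks(s, patched);
}

/* Bring up a secondary CPU using the old or the patched ordering. */
void toy_bring_up(struct toy_state *s, unsigned int cpu, bool patched)
{
	if (patched) {
		s->setup_mask |= toy_bit(cpu);
		toy_update_masks(s, true);   /* masks first ... */
		toy_notify_starting(s, cpu); /* ... then the notifier */
		s->online_mask |= toy_bit(cpu);
	} else {
		toy_notify_starting(s, cpu); /* notifier sees stale masks */
		s->online_mask |= toy_bit(cpu);
		toy_update_masks(s, false);
	}
}
```

Calling toy_init() and then toy_bring_up() for CPU 1 with patched set to
false leaves seen_by_notifier[1] empty, because the masks are only refreshed
after the notifier has run. With patched set to true the notifier records a
mask containing CPU 1 and its already-online sibling CPU 0.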
