Re: scheduler crash on Power - Dietmar Eggemann


From: Dietmar Eggemann <dietmar.eggemann@arm.com>
To: Michael Ellerman <mpe@ellerman.id.au>,
	Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: "bruno@wolff.to" <bruno@wolff.to>,
	Michael Ellerman <michaele@au1.ibm.com>,
	"jwboyer@redhat.com" <jwboyer@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"peterz@infrdead.org" <peterz@infrdead.org>,
	"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>
Subject: Re: scheduler crash on Power
Date: Mon, 04 Aug 2014 12:31:08 +0100
Message-ID: <53DF6EFC.30705@arm.com>
In-Reply-To: <1407122432.2286.0.camel@concordia>

On 04/08/14 04:20, Michael Ellerman wrote:
> On Fri, 2014-08-01 at 14:24 -0700, Sukadev Bhattiprolu wrote:
>> Dietmar Eggemann [dietmar.eggemann@arm.com] wrote:
>> | > ltcbrazos2-lp07 login: [  181.915974] ------------[ cut here ]------------
>> | > [  181.915991] WARNING: at ../kernel/sched/core.c:5881
>> |
>> | This warning indicates the problem. One of the struct sched_domains does
>> | not have its groups member set.
>> |
>> | And it's happening during a rebuild of the sched domain hierarchy, not
>> | during the initial build.
>> |
>> | You could run your system with the following patch-let (on top of
>> | https://lkml.org/lkml/2014/7/17/288)  w/ and w/o the perf related
>> | patches (w/ CONFIG_SCHED_DEBUG enabled).
>> |
>> | @@ -5882,6 +5882,9 @@ static void init_sched_groups_capacity(int cpu,
>> | struct sched_domain *sd)
>> |  {
>> |         struct sched_group *sg = sd->groups;
>> |
>> | +#ifdef CONFIG_SCHED_DEBUG
>> | +       printk("sd name: %s span: %pc\n", sd->name, sd->span);
>> | +#endif
>> |         WARN_ON(!sg);
>> |
>> |         do {
>> |
>> | This will show if the rebuild of the sched domain hierarchy happens on
>> | both systems and hopefully indicate for which sched_domain the
>> | sd->groups is not set.
>>
>> Thanks for the patch. It appears that the NUMA sched domain does not
>> have the sd->groups set - snippet of the error (with your patch and
>> Peter's patch)
>>
>> [  181.914494] build_sched_groups: got group c000000006da0000 with cpus:
>> [  181.914498] build_sched_groups: got group c0000000dd830000 with cpus:
>> [  181.915234] sd name: SMT span: 8-15
>> [  181.915239] sd name: DIE span: 0-7
>> [  181.915242] sd name: NUMA span: 0-15
>> [  181.915250] ------------[ cut here ]------------
>> [  181.915253] WARNING: at ../kernel/sched/core.c:5891
>>
>> Patched code:
>>
>> 	5884 static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
>> 	5885 {
>> 	5886         struct sched_group *sg = sd->groups;
>> 	5887
>> 	5888 #ifdef CONFIG_SCHED_DEBUG
>> 	5889         printk("sd name: %s span: %pc\n", sd->name, sd->span);
>> 	5890 #endif
>> 	5891         WARN_ON(!sg);
>>
>> Complete log below.
>>
>> I was able to bisect it down to this patch in the 24x7 patchset
>>
>> 	https://lkml.org/lkml/2014/5/27/804
>>
>> I replaced the kfree(page) calls in the patch with
>> kmem_cache_free(hv_page_cache, page).
>>
>> The problem seems to disappear if the call to create_events_from_catalog()
>> in hv_24x7_init() is skipped. I am continuing to debug the 24x7 patch.
>
> Is that patch just clobbering memory it doesn't own and corrupting the
> scheduler data structures?

Quite likely. When the system comes up initially, it has the SMT and DIE
sched domain levels:

...
[    0.033832] build_sched_groups: got group c0000000e7d50000 with cpus:
[    0.033835] build_sched_groups: got group c0000000e7d80000 with cpus:
[    0.033844] sd name: SMT span: 8-15
[    0.033847] sd name: DIE span: 0-15  <-- !!!
[    0.033850] sd name: SMT span: 8-15
[    0.033853] sd name: DIE span: 0-15
...

and the cpu mask of DIE spans all CPUs '0-15'.

Then during the rebuild of the sched domain hierarchy, this looks very
different:

...
[  181.914494] build_sched_groups: got group c000000006da0000 with cpus:
[  181.914498] build_sched_groups: got group c0000000dd830000 with cpus:
[  181.915234] sd name: SMT span: 8-15
[  181.915239] sd name: DIE span: 0-7   <-- !!!
[  181.915242] sd name: NUMA span: 0-15
...

The cpu mask of the DIE level is all of a sudden '0-7', which is clearly
wrong.

So I suspect that the sched_domain_mask_f function for the DIE level,
'cpu_cpu_mask()', returns a wrong value during this rebuild.

This could be checked with this little patch-let:

@@ -6467,6 +6467,12 @@ struct sched_domain *build_sched_domain(struct
sched_domain_topology_level *tl,
        if (!sd)
                return child;

+       printk("%s: cpu: %d level: %s cpu_map: %pc tl->mask: %pc\n",
+                       __func__,
+                       cpu, tl->name,
+                       cpu_map,
+                       tl->mask(cpu));
+
        cpumask_and(sched_domain_span(sd), cpu_map, tl->mask(cpu));
        if (child) {
                sd->level = child->level + 1;


It should give you output similar to:

...
build_sched_domain: cpu: 0 level: GMC cpu_map: 0-4 tl->mask: 0
build_sched_domain: cpu: 0 level: MC cpu_map: 0-4 tl->mask: 0-1
build_sched_domain: cpu: 0 level: DIE cpu_map: 0-4 tl->mask: 0-4
build_sched_domain: cpu: 1 level: GMC cpu_map: 0-4 tl->mask: 1
build_sched_domain: cpu: 1 level: MC cpu_map: 0-4 tl->mask: 0-1
build_sched_domain: cpu: 1 level: DIE cpu_map: 0-4 tl->mask: 0-4
...
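
To spell out why I point at the mask function: the span of a domain is
just cpu_map ANDed with tl->mask(cpu). The NUMA level still spans 0-15,
so cpu_map must contain CPUs 0-15, and a DIE span of 0-7 can then only
come from tl->mask(cpu) itself returning 0-7. Here is a minimal
user-space sketch of that arithmetic. It assumes a toy 32-bit mask in
place of struct cpumask; toy_cpumask_t, toy_mask_range() and
toy_domain_span() are made-up names for illustration, not kernel APIs.

```c
#include <stdint.h>
#include <stdio.h>

/* Toy stand-in for struct cpumask: bit n set means CPU n is in the mask.
 * Hypothetical type, only good for up to 32 CPUs. */
typedef uint32_t toy_cpumask_t;

/* Build a mask covering CPUs first..last inclusive (last must be < 32). */
toy_cpumask_t toy_mask_range(unsigned int first, unsigned int last)
{
	toy_cpumask_t m = 0;
	unsigned int cpu;

	for (cpu = first; cpu <= last; cpu++)
		m |= (toy_cpumask_t)1 << cpu;
	return m;
}

/* Same operation as
 *   cpumask_and(sched_domain_span(sd), cpu_map, tl->mask(cpu));
 * i.e. the span is the intersection of cpu_map and the level's mask. */
toy_cpumask_t toy_domain_span(toy_cpumask_t cpu_map, toy_cpumask_t level_mask)
{
	return cpu_map & level_mask;
}
```

With cpu_map = toy_mask_range(0, 15), a level mask of 0-15 yields a span
of 0-15 (the initial build), while a level mask of 0-7 yields 0-7 (what
the rebuild shows for DIE).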

>
> cheers
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

