Re: scheduler crash on Power - Dietmar Eggemann

linuxppc-dev.lists.ozlabs.org archive mirror

From: Dietmar Eggemann <dietmar.eggemann@arm.com>
To: Michael Ellerman <mpe@ellerman.id.au>,
	Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: "bruno@wolff.to" <bruno@wolff.to>,
	Michael Ellerman <michaele@au1.ibm.com>,
	"jwboyer@redhat.com" <jwboyer@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"peterz@infrdead.org" <peterz@infrdead.org>,
	"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>
Subject: Re: scheduler crash on Power
Date: Mon, 04 Aug 2014 12:31:08 +0100
Message-ID: <53DF6EFC.30705@arm.com>
In-Reply-To: <1407122432.2286.0.camel@concordia>

On 04/08/14 04:20, Michael Ellerman wrote:
> On Fri, 2014-08-01 at 14:24 -0700, Sukadev Bhattiprolu wrote:
>> Dietmar Eggemann [dietmar.eggemann@arm.com] wrote:
>> | > ltcbrazos2-lp07 login: [  181.915974] ------------[ cut here ]------------
>> | > [  181.915991] WARNING: at ../kernel/sched/core.c:5881
>> |
>> | This warning indicates the problem. One of the struct sched_domains does
>> | not have its groups member set.
>> |
>> | And it's happening during a rebuild of the sched domain hierarchy, not
>> | during the initial build.
>> |
>> | You could run your system with the following patch-let (on top of
>> | https://lkml.org/lkml/2014/7/17/288)  w/ and w/o the perf related
>> | patches (w/ CONFIG_SCHED_DEBUG enabled).
>> |
>> | @@ -5882,6 +5882,9 @@ static void init_sched_groups_capacity(int cpu,
>> | struct sched_domain *sd)
>> |  {
>> |         struct sched_group *sg = sd->groups;
>> |
>> | +#ifdef CONFIG_SCHED_DEBUG
>> | +       printk("sd name: %s span: %pc\n", sd->name, sd->span);
>> | +#endif
>> |         WARN_ON(!sg);
>> |
>> |         do {
>> |
>> | This will show if the rebuild of the sched domain hierarchy happens on
>> | both systems and hopefully indicate for which sched_domain the
>> | sd->groups is not set.
>>
>> Thanks for the patch. It appears that the NUMA sched domain does not
>> have the sd->groups set - snippet of the error (with your patch and
>> Peter's patch)
>>
>> [  181.914494] build_sched_groups: got group c000000006da0000 with cpus:
>> [  181.914498] build_sched_groups: got group c0000000dd830000 with cpus:
>> [  181.915234] sd name: SMT span: 8-15
>> [  181.915239] sd name: DIE span: 0-7
>> [  181.915242] sd name: NUMA span: 0-15
>> [  181.915250] ------------[ cut here ]------------
>> [  181.915253] WARNING: at ../kernel/sched/core.c:5891
>>
>> Patched code:
>>
>> 	5884 static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
>> 	5885 {
>> 	5886         struct sched_group *sg = sd->groups;
>> 	5887
>> 	5888 #ifdef CONFIG_SCHED_DEBUG
>> 	5889         printk("sd name: %s span: %pc\n", sd->name, sd->span);
>> 	5890 #endif
>> 	5891         WARN_ON(!sg);
>>
>> Complete log below.
>>
>> I was able to bisect it down to this patch in the 24x7 patchset
>>
>> 	https://lkml.org/lkml/2014/5/27/804
>>
>> I replaced the kfree(page) calls in the patch with
>> kmem_cache_free(hv_page_cache, page).
>>
>> The problem seems to disappear if the call to create_events_from_catalog()
>> in hv_24x7_init() is skipped. I am continuing to debug the 24x7 patch.
>
> Is that patch just clobbering memory it doesn't own and corrupting the
> scheduler data structures?

Quite likely. When the system comes up initially, it has the SMT and DIE
sched domain levels:

...
[    0.033832] build_sched_groups: got group c0000000e7d50000 with cpus:
[    0.033835] build_sched_groups: got group c0000000e7d80000 with cpus:
[    0.033844] sd name: SMT span: 8-15
[    0.033847] sd name: DIE span: 0-15  <-- !!!
[    0.033850] sd name: SMT span: 8-15
[    0.033853] sd name: DIE span: 0-15
...

and the cpu mask of DIE spans all CPUs '0-15'.

Then during the rebuild of the sched domain hierarchy, this looks very
different:

...
[  181.914494] build_sched_groups: got group c000000006da0000 with cpus:
[  181.914498] build_sched_groups: got group c0000000dd830000 with cpus:
[  181.915234] sd name: SMT span: 8-15
[  181.915239] sd name: DIE span: 0-7   <-- !!!
[  181.915242] sd name: NUMA span: 0-15
...

The cpu mask of the DIE level is all of a sudden '0-7', which is clearly
wrong.

So I suspect that the sched_domain_mask_f function for the DIE level,
cpu_cpu_mask(), returns a wrong value during this rebuild.
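
To illustrate why a wrong tl->mask(cpu) would produce exactly this
symptom, here is a minimal userspace sketch (not kernel code). It
models a 16-CPU mask as a uint16_t and mimics the cpumask_and() step in
build_sched_domain(); range_mask() and domain_span() are names made up
for this illustration only.

```c
#include <stdint.h>

/* Hypothetical helper: bitmask with CPUs first..last set (0 <= first <= last <= 15). */
static uint16_t range_mask(int first, int last)
{
	uint16_t m = 0;

	for (int cpu = first; cpu <= last; cpu++)
		m |= (uint16_t)(1u << cpu);
	return m;
}

/*
 * Models cpumask_and(sched_domain_span(sd), cpu_map, tl->mask(cpu)):
 * the domain span is the intersection of the CPUs being built and
 * whatever the topology level's mask function returns.
 */
static uint16_t domain_span(uint16_t cpu_map, uint16_t level_mask)
{
	return cpu_map & level_mask;
}
```

With cpu_map = 0-15, a DIE mask of 0-15 gives a span of 0-15 (the boot
case), while a DIE mask of 0-7 gives a span of 0-7, which is what the
rebuild log shows.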

This could be checked with this little patch-let:

@@ -6467,6 +6467,12 @@ struct sched_domain *build_sched_domain(struct
sched_domain_topology_level *tl,
        if (!sd)
                return child;

+       printk("%s: cpu: %d level: %s cpu_map: %pc tl->mask: %pc\n",
+                       __func__,
+                       cpu, tl->name,
+                       cpu_map,
+                       tl->mask(cpu));
+
        cpumask_and(sched_domain_span(sd), cpu_map, tl->mask(cpu));
        if (child) {
                sd->level = child->level + 1;


It should give you something like:

...
build_sched_domain: cpu: 0 level: GMC cpu_map: 0-4 tl->mask: 0
build_sched_domain: cpu: 0 level: MC cpu_map: 0-4 tl->mask: 0-1
build_sched_domain: cpu: 0 level: DIE cpu_map: 0-4 tl->mask: 0-4
build_sched_domain: cpu: 1 level: GMC cpu_map: 0-4 tl->mask: 1
build_sched_domain: cpu: 1 level: MC cpu_map: 0-4 tl->mask: 0-1
build_sched_domain: cpu: 1 level: DIE cpu_map: 0-4 tl->mask: 0-4
...
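
If the output gets long, the printed cpu lists can also be checked
mechanically. The following is only a userspace sketch under my own
assumptions: parse_cpulist() and mask_contains_cpu() are hypothetical
helpers, not kernel functions, and they assume the "0-7" / "0-3,8" list
format seen in the output above with at most 64 CPUs.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a CPU list such as "0-7" or "0-3,8" into a
 * bitmask (bit n set means CPU n is present). Returns 0 on malformed
 * input or on CPU numbers above 63.
 */
static uint64_t parse_cpulist(const char *s)
{
	uint64_t mask = 0;

	while (*s) {
		char *end;
		long first = strtol(s, &end, 10);
		long last = first;

		if (end == s)
			return 0;	/* no digits where a CPU number was expected */
		if (*end == '-') {
			s = end + 1;
			last = strtol(s, &end, 10);
			if (end == s)
				return 0;
		}
		if (first < 0 || last < first || last > 63)
			return 0;
		for (long cpu = first; cpu <= last; cpu++)
			mask |= (uint64_t)1 << cpu;
		if (*end == ',')
			end++;
		else if (*end != '\0')
			return 0;
		s = end;
	}
	return mask;
}

/* A level looks suspicious if its mask does not even contain the CPU itself. */
static int mask_contains_cpu(uint64_t mask, int cpu)
{
	return cpu >= 0 && cpu <= 63 && ((mask >> cpu) & 1);
}
```

For example, a tl->mask of "0-7" printed for cpu 8 would fail
mask_contains_cpu(), which would point at the mask function of that
level.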

>
> cheers
