linuxppc-dev.lists.ozlabs.org archive mirror
From: Jan Stancek <jstancek@redhat.com>
To: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org, vdavydov@parallels.com,
	benh@kernel.crashing.org, paulus@samba.org, mpe@ellerman.id.au,
	anton@samba.org, nacc@linux.vnet.ibm.com,
	gkurz@linux.vnet.ibm.com, grant likely <grant.likely@linaro.org>,
	nikunj@linux.vnet.ibm.com, Steve Best <sbest@redhat.com>,
	Gustavo Duarte <gduarte@redhat.com>,
	Thomas Huth <thuth@redhat.com>
Subject: Re: [BUG] PowerNV crash with 4.4.0-rc8 at sched_init_numa (related to commit c118baf80256)
Date: Fri, 15 Jan 2016 09:18:01 -0500 (EST)
Message-ID: <487148274.8430516.1452867481771.JavaMail.zimbra@redhat.com>
In-Reply-To: <20160115134307.GA28330@linux.vnet.ibm.com>



----- Original Message -----
> From: "Raghavendra K T" <raghavendra.kt@linux.vnet.ibm.com>
> To: "Jan Stancek" <jstancek@redhat.com>
> Cc: linuxppc-dev@lists.ozlabs.org, "raghavendra kt" <raghavendra.kt@linux.vnet.ibm.com>, vdavydov@parallels.com,
> benh@kernel.crashing.org, paulus@samba.org, mpe@ellerman.id.au, anton@samba.org, nacc@linux.vnet.ibm.com,
> gkurz@linux.vnet.ibm.com, "grant likely" <grant.likely@linaro.org>, nikunj@linux.vnet.ibm.com, "Steve Best"
> <sbest@redhat.com>, "Gustavo Duarte" <gduarte@redhat.com>, "Thomas Huth" <thuth@redhat.com>
> Sent: Friday, 15 January, 2016 2:43:07 PM
> Subject: Re: [BUG] PowerNV crash with 4.4.0-rc8 at sched_init_numa (related to commit c118baf80256)
> 
> * Jan Stancek <jstancek@redhat.com> [2016-01-09 18:03:55]:
> 
> > Hi,
> > 
> > I'm seeing bare metal ppc64le system crashing early during boot
> > with latest upstream kernel (4.4.0-rc8):
> > 
> > # git describe
> > v4.4-rc8-96-g751e5f5
> > 
> > [    0.625451] Unable to handle kernel paging request for data at address 0x00000000
> > [    0.625586] Faulting instruction address: 0xc0000000004ae000
> > [    0.625698] Oops: Kernel access of bad area, sig: 11 [#1]
> > [    0.625789] SMP NR_CPUS=2048 NUMA PowerNV
> > [    0.625879] Modules linked in:
> > [    0.625973] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.4.0-rc8+ #6
> > [    0.626087] task: c000002ff4300000 ti: c000002ff6084000 task.ti: c000002ff6084000
> > [    0.626224] NIP: c0000000004ae000 LR: c00000000090b9e4 CTR: 0000000000000003
> > [    0.626361] REGS: c000002ff6087930 TRAP: 0300   Not tainted (4.4.0-rc8+)
> > [    0.626475] MSR: 9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 48002044  XER: 20000000
> > [    0.626808] CFAR: c000000000008468 DAR: 0000000000000000 DSISR: 40000000 SOFTE: 1
> > GPR00: c00000000090b9ac c000002ff6087bb0 c000000001700900 c000003ff229e080
> > GPR04: c000003ff229e080 0000000000000000 0000000000000003 0000000000000001
> > GPR08: 0000000000000000 0000000000000000 0000000000000010 9000000100001003
> > GPR12: 0000000000002200 c00000000fb40000 c00000000000bd68 0000000000000002
> > GPR16: 0000000000000028 c000000000b25940 c00000000173ffa4 0000000000000000
> > GPR20: c000000000b259d8 c000000000b259e0 c000000000b259e8 0000000000000000
> > GPR24: c000003ff229e080 0000000000000000 c00000000189b180 0000000000000000
> > GPR28: 0000000000000000 c000000001740a94 0000000000000002 0000000000000002
> > [    0.627925] NIP [c0000000004ae000] __bitmap_or+0x30/0x50
> > [    0.627973] LR [c00000000090b9e4] sched_init_numa+0x440/0x7c8
> > [    0.628030] Call Trace:
> > [    0.628054] [c000002ff6087bb0] [c00000000090b9ac] sched_init_numa+0x408/0x7c8 (unreliable)
> > [    0.628136] [c000002ff6087ca0] [c000000000c60718] sched_init_smp+0x60/0x238
> > [    0.628206] [c000002ff6087d00] [c000000000c44294] kernel_init_freeable+0x1fc/0x3b4
> > [    0.628286] [c000002ff6087dc0] [c00000000000bd84] kernel_init+0x24/0x140
> > [    0.628356] [c000002ff6087e30] [c000000000009544] ret_from_kernel_thread+0x5c/0x98
> > [    0.628435] Instruction dump:
> > [    0.628470] 38c6003f 78c9d183 4d820020 38c9ffff 39200000 78c60020 38c60001 7cc903a6
> > [    0.628587] 60000000 60000000 60000000 60420000 <7d05482a> 7d44482a 7d0a5378 7d43492a
> > [    0.628711] ---[ end trace b423f3e02b333fbf ]---
> > [    0.628757]
> > [    2.628822] Kernel panic - not syncing: Fatal exception
> > [    2.628969] Rebooting in 10 seconds..
> > [    0.000000] OPAL V3 detected !
> > 
> ....
> > The crash goes away if I revert the following commit:
> >   commit c118baf802562688d46e6002f2b5fe66b947da21
> >   Author: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
> >   Date:   Thu Nov 5 18:46:29 2015 -0800
> >     arch/powerpc/mm/numa.c: do not allocate bootmem memory for non existing nodes
> >
> 
> Something like the patch below should fix it. I'll send it in a separate
> email, copying Peter and Ingo. Basically, the for_each_node conversion
> targeted only slow paths and functions that are used once.
> But it seems there was a cpumask_or in sched_init_numa that used
> an unallocated node.
> 
> Sorry for the late reply. I was being overcautious, checking x86 and
> powerpc with and without DEBUG_PER_CPU_MAPS.

Hi,

I ran your patch on my setup (same config as before) on top of v4.4-5966-g7d1fc01.
The system now boots OK and dmesg looks clean.

Regards,
Jan

> 
> ---8<-----
> From 6680994a5a8dde7eccfbd2bffde341fdff2aed63 Mon Sep 17 00:00:00 2001
> From: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
> Date: Fri, 15 Jan 2016 18:19:56 +0530
> Subject: [PATCH] Fix: PowerNV crash with 4.4.0-rc8 at sched_init_numa
> 
> Commit c118baf80256 ("arch/powerpc/mm/numa.c: do not allocate bootmem
> memory for non existing nodes") avoided bootmem memory allocation for
> non-existent nodes.
> 
> With DEBUG_PER_CPU_MAPS enabled, a PowerNV system failed to boot because
> sched_init_numa performed a cpumask_or operation on unallocated nodes.
> Fix that by performing the cpumask_or operation only on existing nodes.
> 
> [ Tested with and w/o DEBUG_PER_CPU_MAPS on x86 and powerpc ]
> 
> Reported-by: Jan Stancek <jstancek@redhat.com>
> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
> ---
>  kernel/sched/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 44253ad..474658b 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6840,7 +6840,7 @@ static void sched_init_numa(void)
>  
>  			sched_domains_numa_masks[i][j] = mask;
>  
> -			for (k = 0; k < nr_node_ids; k++) {
> +			for_each_node(k) {
>  				if (node_distance(j, k) > sched_domains_numa_distance[i])
>  					continue;
>  
> --
> 1.7.11.7
> 
> 

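For readers following the thread, here is a minimal user-space sketch of why
the loop change in the quoted patch matters. It is not kernel code. It assumes
that per-node CPU masks are held as pointers and that entries for nodes which
do not exist are left NULL, which is consistent with the DAR of 0 in the oops
above. MAX_NODES, possible_nodes, node_to_cpumask[] and the helper functions
are hypothetical names made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES 8 /* hypothetical stand-in for nr_node_ids */

/* Bit n set means node n exists; a toy model of the "possible" node map. */
static const uint32_t possible_nodes = (1u << 0) | (1u << 1) | (1u << 4);

/* Toy per-node CPU masks. Only existing nodes get storage. */
static uint64_t node0_cpus = 0x00f, node1_cpus = 0x0f0, node4_cpus = 0xf00;

/* Entries for non-existent nodes stay NULL (assumption, see lead-in). */
static uint64_t *node_to_cpumask[MAX_NODES] = {
    [0] = &node0_cpus, [1] = &node1_cpus, [4] = &node4_cpus,
};

static int node_is_possible(int node)
{
    return (possible_nodes >> node) & 1u;
}

/*
 * Shape of the fixed loop: visit only nodes that exist, so every
 * dereferenced mask pointer is known to be allocated.
 */
static uint64_t union_of_possible_nodes(void)
{
    uint64_t mask = 0;

    for (int k = 0; k < MAX_NODES; k++) {
        if (!node_is_possible(k))
            continue;
        assert(node_to_cpumask[k] != NULL);
        mask |= *node_to_cpumask[k];
    }
    return mask;
}

/*
 * Count the node IDs that a plain 0..MAX_NODES-1 loop would have
 * dereferenced even though no mask was allocated for them.
 */
static int count_unallocated_ids(void)
{
    int n = 0;

    for (int k = 0; k < MAX_NODES; k++)
        if (node_to_cpumask[k] == NULL)
            n++;
    return n;
}
```

In this toy layout, iterating over every ID would touch five NULL entries,
while iterating over existing nodes only never dereferences an unallocated
mask.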
