From: Xishi Qiu <qiuxishi@huawei.com>
To: Dave Young <dyoung@redhat.com>
Cc: <x86@kernel.org>, <linux-kernel@vger.kernel.org>,
<tglx@linutronix.de>, <bhe@redhat.com>,
<akpm@linux-foundation.org>, <isimatu.yasuaki@jp.fujitsu.com>,
<mingo@redhat.com>, <hpa@zytor.com>
Subject: Re: [PATCH V2] x86/numa: kernel stack corruption fix
Date: Wed, 8 Apr 2015 09:36:47 +0800
Message-ID: <5524862F.6010709@huawei.com>
In-Reply-To: <20150407134132.GA23522@dhcp-16-198.nay.redhat.com>
On 2015/4/7 21:41, Dave Young wrote:
> I got below kernel panic during kdump test on Thinkpad T420 laptop:
>
> [ 0.000000] No NUMA configuration found
> [ 0.000000] Faking a node at [mem 0x0000000000000000-0x0000000037ba4fff]
> [ 0.000000] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff81d21910
> [ 0.000000]
> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.0.0-rc6+ #44
> [ 0.000000] Hardware name: LENOVO 4236NUC/4236NUC, BIOS 83ET76WW (1.46 ) 07/05/2013
> [ 0.000000] 0000000000000000 c70296ddd809e4f6 ffffffff81b67ce8 ffffffff817c2a26
> [ 0.000000] 0000000000000000 ffffffff81a61c90 ffffffff81b67d68 ffffffff817bc8d2
> [ 0.000000] 0000000000000010 ffffffff81b67d78 ffffffff81b67d18 c70296ddd809e4f6
> [ 0.000000] Call Trace:
> [ 0.000000] [<ffffffff817c2a26>] dump_stack+0x45/0x57
> [ 0.000000] [<ffffffff817bc8d2>] panic+0xd0/0x204
> [ 0.000000] [<ffffffff81d21910>] ? numa_clear_kernel_node_hotplug+0xe6/0xf2
> [ 0.000000] [<ffffffff8107741b>] __stack_chk_fail+0x1b/0x20
> [ 0.000000] [<ffffffff81d21910>] numa_clear_kernel_node_hotplug+0xe6/0xf2
> [ 0.000000] [<ffffffff81d21e5d>] numa_init+0x1a5/0x520
> [ 0.000000] [<ffffffff81d222b1>] x86_numa_init+0x19/0x3d
> [ 0.000000] [<ffffffff81d22460>] initmem_init+0x9/0xb
> [ 0.000000] [<ffffffff81d0d00c>] setup_arch+0x94f/0xc82
> [ 0.000000] [<ffffffff81d05120>] ? early_idt_handlers+0x120/0x120
> [ 0.000000] [<ffffffff817bd0bb>] ? printk+0x55/0x6b
> [ 0.000000] [<ffffffff81d05120>] ? early_idt_handlers+0x120/0x120
> [ 0.000000] [<ffffffff81d05d9b>] start_kernel+0xe8/0x4d6
> [ 0.000000] [<ffffffff81d05120>] ? early_idt_handlers+0x120/0x120
> [ 0.000000] [<ffffffff81d05120>] ? early_idt_handlers+0x120/0x120
> [ 0.000000] [<ffffffff81d055ee>] x86_64_start_reservations+0x2a/0x2c
> [ 0.000000] [<ffffffff81d05751>] x86_64_start_kernel+0x161/0x184
> [ 0.000000] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff81d21910
> [ 0.000000]
> PANIC: early exception 0d rip 10:ffffffff8105d2a6 error 7eb cr2 ffff8800371dd000
> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.0.0-rc6+ #44
> [ 0.000000] Hardware name: LENOVO 4236NUC/4236NUC, BIOS 83ET76WW (1.46 ) 07/05/2013
> [ 0.000000] 0000000000000000 c70296ddd809e4f6 ffffffff81b67c60 ffffffff817c2a26
> [ 0.000000] 0000000000000096 ffffffff81a61c90 ffffffff81b67d68 fffffff000000084 0000000000000a0d 0000000000000a00
> [ 0.000000] Call Trace:
> [ 0.000000] [<ffffffff817c2a26>] dump_stack+0x45/0x57
> [ 0.000000] [<ffffffff81d051b0>] early_idt_handler+0x90/0xb7
> [ 0.000000] [<ffffffff8105d2a6>] ? native_irq_enable+0x6/0x10
> [ 0.000000] [<ffffffff817bc9c5>] ? panic+0x1c3/0x204
> [ 0.000000] [<ffffffff81d21910>] ? numa_clear_kernel_node_hotplug+0xe6/0xf2
> [ 0.000000] [<ffffffff8107741b>] __stack_chk_fail+0x1b/0x20
> [ 0.000000] [<ffffffff81d21910>] numa_clear_kernel_node_hotplug+0xe6/0xf2
> [ 0.000000] [<ffffffff81d21e5d>] numa_init+0x1a5/0x520
> [ 0.000000] [<ffffffff81d222b1>] x86_numa_init+0x19/0x3d
> [ 0.000000] [<ffffffff81d22460>] initmem_init+0x9/0xb
> [ 0.000000] [<ffffffff81d0d00c>] setup_arch+0x94f/0xc82
> [ 0.000000] [<ffffffff81d05120>] ? early_idt_handlers+0x120/0x120
> [ 0.000000] [<ffffffff817bd0bb>] ? printk+0x55/0x6b
> [ 0.000000] [<ffffffff81d05120>] ? early_idt_handlers+0x120/0x120
> [ 0.000000] [<ffffffff81d05d9b>] start_kernel+0xe8/0x4d6
> [ 0.000000] [<ffffffff81d05120>] ? early_idt_handlers+0x120/0x120
> [ 0.000000] [<ffffffff81d05120>] ? early_idt_handlers+0x120/0x120
> [ 0.000000] [<ffffffff81d055ee>] x86_64_start_reservations+0x2a/0x2c
> [ 0.000000] [<ffffffff81d05751>] x86_64_start_kernel+0x161/0x184
> [ 0.000000] RIP 0x46
>
> This is caused by writing past the end of the NUMA node mask bitmap.
>
> numa_clear_kernel_node_hotplug() sets the node ID of each reserved region in a
> mask bitmap. It iterates over all reserved regions and assumes that every
> region has a valid nid. That is not true, because there is an exception for
> some graphics memory quirks.
> See the function trim_snb_memory() in arch/x86/kernel/setup.c.
>
> The bug is easy to reproduce in a kdump kernel, because the kdump kernel uses
> pre-reserved memory instead of the whole memory, while kexec passes the other
> reserved memory ranges to the second kernel as well. For example, in my test:
>
> kdump kernel RAM: 0x2d000000 - 0x37bfffff
> One of the reserved regions: 0x40000000 - 0x40100000, which includes 0x40004000,
> a page excluded in trim_snb_memory(). For this memblock reserved region the nid
> is not set; it still has the default value MAX_NUMNODES. Later, node_set() sets
> bit MAX_NUMNODES, which is past the end of the mask, so the stack is corrupted.
>
> This also happened in my tests when booting with mem= on the kernel command line.
>
> Fix it by adding a check: do not call node_set() if nid is MAX_NUMNODES.
>
> Signed-off-by: Dave Young <dyoung@redhat.com>
> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> ---
> arch/x86/mm/numa.c | 11 +++++++++--
> 1 file changed, 9 insertions(+), 2 deletions(-)
>
> --- linux.orig/arch/x86/mm/numa.c
> +++ linux/arch/x86/mm/numa.c
> @@ -482,9 +482,16 @@ static void __init numa_clear_kernel_nod
> &memblock.reserved, mb->nid);
> }
>
> - /* Mark all kernel nodes. */
> + /*
> + * Mark all kernel nodes.
> + *
> + * In case booting with mem=nn[kMG] or in kdump kernel, numa_meminfo

Hi Dave,

Both mem=xx and numa=off have to be set, and only then may numa_meminfo not
include all of the memblock.reserved memory, right?

Thanks,
Xishi Qiu

> + * may not include all the memblock.reserved memory ranges because
> + * trim_snb_memory() reserves specific pages for Sandy Bridge graphics.
> + */
> for_each_memblock(reserved, r)
> - node_set(r->nid, numa_kernel_nodes);
> + if (r->nid != MAX_NUMNODES)
> + node_set(r->nid, numa_kernel_nodes);
>
> /* Clear MEMBLOCK_HOTPLUG flag for memory in kernel nodes. */
> for (i = 0; i < numa_meminfo.nr_blks; i++) {
>
> .
>
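
For readers following the thread, here is a minimal user-space sketch of the overflow described in the patch and of the guard it adds. It is not the kernel code: struct nodemask, struct region, node_set_model(), node_isset_model() and mark_kernel_nodes() are hypothetical stand-ins for nodemask_t, the memblock reserved regions and node_set(), and MAX_NUMNODES is assumed to be 64 here (the kernel derives it from the build configuration). The second region uses the address range from the report; the first one is made up.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Assumed value for this sketch only. */
#define MAX_NUMNODES  64
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define MASK_LONGS    ((MAX_NUMNODES + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Hypothetical stand-in for nodemask_t: one bit per node, 0..MAX_NUMNODES-1. */
struct nodemask {
	unsigned long bits[MASK_LONGS];
};

/* Hypothetical stand-in for a memblock reserved region. */
struct region {
	unsigned long long base;
	unsigned long long size;
	int nid;	/* MAX_NUMNODES means "no node assigned" */
};

/*
 * Model of node_set(). The assert makes the precondition explicit: with
 * nid == MAX_NUMNODES the word index is MAX_NUMNODES / BITS_PER_LONG, which
 * here equals MASK_LONGS, i.e. one element past the end of bits[].
 */
static void node_set_model(int nid, struct nodemask *mask)
{
	assert(nid >= 0 && nid < MAX_NUMNODES);
	mask->bits[nid / BITS_PER_LONG] |= 1UL << (nid % BITS_PER_LONG);
}

static int node_isset_model(int nid, const struct nodemask *mask)
{
	return (int)((mask->bits[nid / BITS_PER_LONG] >> (nid % BITS_PER_LONG)) & 1UL);
}

/*
 * Guarded loop, mirroring the check the patch adds: regions without a node
 * ID are skipped instead of being passed to node_set_model().
 * Returns the number of skipped regions.
 */
static int mark_kernel_nodes(const struct region *r, size_t n,
			     struct nodemask *mask)
{
	int skipped = 0;

	for (size_t i = 0; i < n; i++) {
		if (r[i].nid == MAX_NUMNODES) {
			skipped++;
			continue;
		}
		node_set_model(r[i].nid, mask);
	}
	return skipped;
}
```

Without the nid check in mark_kernel_nodes(), the second region in an array such as { { 0x2d000000, 0x1000, 0 }, { 0x40000000, 0x100000, MAX_NUMNODES } } would trip the assert in this model; in the kernel there is no such assert, so the write lands next to the on-stack mask instead.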