From: Borislav Petkov <bp@alien8.de>
To: "H. Peter Anvin" <hpa@zytor.com>
Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>,
Ingo Molnar <mingo@kernel.org>,
Andy Lutomirski <luto@amacapital.net>,
Andy Lutomirski <luto@kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
X86 ML <x86@kernel.org>
Subject: Re: [PATCH V1] x86, espfix: postpone the initialization of espfix stack for AP
Date: Wed, 17 Jun 2015 23:04:11 +0200
Message-ID: <20150617210411.GD16999@pd.tnic>
In-Reply-To: <55812149.1040804@zytor.com>
On Wed, Jun 17, 2015 at 12:27:05AM -0700, H. Peter Anvin wrote:
> On 06/04/2015 02:45 AM, Gu Zheng wrote:
> > The following lockdep warning occurs when running the latest kernel:
> > [ 3.178000] ------------[ cut here ]------------
> > [ 3.183000] WARNING: CPU: 128 PID: 0 at kernel/locking/lockdep.c:2755 lockdep_trace_alloc+0xdd/0xe0()
> > [ 3.193000] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))
> > [ 3.199000] Modules linked in:
> >
> > [ 3.203000] CPU: 128 PID: 0 Comm: swapper/128 Not tainted 4.1.0-rc3 #70
> > [ 3.221000] 0000000000000000 2d6601fb3e6d4e4c ffff88086fd5fc38 ffffffff81773f0a
> > [ 3.230000] 0000000000000000 ffff88086fd5fc90 ffff88086fd5fc78 ffffffff8108c85a
> > [ 3.238000] ffff88086fd60000 0000000000000092 ffff88086fd60000 00000000000000d0
> > [ 3.246000] Call Trace:
> > [ 3.249000] [<ffffffff81773f0a>] dump_stack+0x4c/0x65
> > [ 3.255000] [<ffffffff8108c85a>] warn_slowpath_common+0x8a/0xc0
> > [ 3.261000] [<ffffffff8108c8e5>] warn_slowpath_fmt+0x55/0x70
> > [ 3.268000] [<ffffffff810ee24d>] lockdep_trace_alloc+0xdd/0xe0
> > [ 3.274000] [<ffffffff811cda0d>] __alloc_pages_nodemask+0xad/0xca0
> > [ 3.281000] [<ffffffff810ec7ad>] ? __lock_acquire+0xf6d/0x1560
> > [ 3.288000] [<ffffffff81219c8a>] alloc_page_interleave+0x3a/0x90
> > [ 3.295000] [<ffffffff8121b32d>] alloc_pages_current+0x17d/0x1a0
> > [ 3.301000] [<ffffffff811c869e>] ? __get_free_pages+0xe/0x50
> > [ 3.308000] [<ffffffff811c869e>] __get_free_pages+0xe/0x50
> > [ 3.314000] [<ffffffff8102640b>] init_espfix_ap+0x17b/0x320
> > [ 3.320000] [<ffffffff8105c691>] start_secondary+0xf1/0x1f0
> > [ 3.327000] ---[ end trace 1b3327d9d6a1d62c ]---
> >
> > We allocate pages with GFP_KERNEL in init_espfix_ap(), which is called
> > before local IRQs are enabled. The lockdep subsystem treats this as a
> > GFP_FS allocation with local IRQs disabled and triggers the warning
> > shown above.
> >
> > We could allocate them on the boot CPU side and hand them over to
> > the secondary CPU, but that seems a bit wasteful if some CPUs are
> > offline. As there is no need for these pages (the espfix stack) until
> > we try to run user code, we postpone the initialization of the espfix
> > stack and let the boot-up routine initialize it for the target CPU
> > after it has booted, which avoids the noise.
> >
>
> It isn't *at all* obvious to me at least that if the GFP_KERNEL
> allocation fails we may not get rescheduled on another CPU and/or get stuck.
>
> I'm starting to think that the right thing to do is to allocate these on
> the CPU that is bringing up the other CPU, at the same time we allocate
> the percpu area. This won't affect offline CPUs.

Btw, as part of experimenting with something else, I was able to trigger
this even on a guest here. It is an insane guest, though: 16 NUMA nodes
with 8 cores each:

[ 0.032000] ------------[ cut here ]------------
[ 0.032000] WARNING: CPU: 64 PID: 0 at kernel/locking/lockdep.c:2755 lockdep_trace_alloc+0x10c/0x120()
[ 0.032000] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))
[ 0.032000] Modules linked in:
[ 0.032000] CPU: 64 PID: 0 Comm: swapper/64 Not tainted 4.1.0-rc3+ #4
[ 0.032000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 0.032000] ffffffff818dd1a1 ffff880006a1fca8 ffffffff816a9685 0000000000000000
[ 0.032000] ffff880006a1fcf8 ffff880006a1fce8 ffffffff81058585 00000000001d74c0
[ 0.032000] 0000000000000080 0000000000000046 ffff880047ffcd00 ffff88003e804058
[ 0.032000] Call Trace:
[ 0.032000] [<ffffffff816a9685>] dump_stack+0x4f/0x7b
[ 0.032000] [<ffffffff81058585>] warn_slowpath_common+0x95/0xe0
[ 0.032000] [<ffffffff81058616>] warn_slowpath_fmt+0x46/0x50
[ 0.032000] [<ffffffff810a662c>] lockdep_trace_alloc+0x10c/0x120
[ 0.032000] [<ffffffff8113d6ed>] __alloc_pages_nodemask+0xad/0xab0
[ 0.032000] [<ffffffff813442d7>] ? debug_smp_processor_id+0x17/0x20
[ 0.032000] [<ffffffff810a370e>] ? put_lock_stats.isra.19+0xe/0x30
[ 0.032000] [<ffffffff816ae288>] ? mutex_lock_nested+0x2e8/0x420
[ 0.032000] [<ffffffff8117e0cc>] alloc_page_interleave+0x3c/0x90
[ 0.032000] [<ffffffff8117e995>] alloc_pages_current+0xc5/0xd0
[ 0.032000] [<ffffffff81138734>] __get_free_pages+0x14/0x50
[ 0.032000] [<ffffffff8100a484>] init_espfix_ap.part.5+0x164/0x270
[ 0.032000] [<ffffffff8100a5b1>] init_espfix_ap+0x21/0x30
[ 0.032000] [<ffffffff8103cd28>] start_secondary+0xe8/0x180
[ 0.032000] ---[ end trace 6a7abdb28fbb7667 ]---

Now I can test the future fix too. :)
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.