From: Jiang Liu <jiang.liu@linux.intel.com>
To: Andrew Morton <akpm@linux-foundation.org>,
	Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	David Rientjes <rientjes@google.com>,
	Ingo Molnar <mingo@redhat.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>,
	"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
	Tony Luck <tony.luck@intel.com>,
	linux-kernel@vger.kernel.org
Subject: [Bugfix v2] sched: fix possible invalid memory access caused by CPU hot-addition
Date: Mon, 28 Apr 2014 10:48:13 +0800
Message-ID: <1398653293-12483-1-git-send-email-jiang.liu@linux.intel.com>
Intel platforms with Nehalem/Westmere/IvyBridge CPUs may support socket
hotplug/online at runtime. The CPU hot-addition flow is:
1) handle the CPU hot-addition event
	1.a) gather platform-specific information
	1.b) associate the hot-added CPU with a NUMA node
	1.c) create the CPU device
2) online the hot-added CPU through sysfs:
	2.a) cpu_up()
	2.b)	->try_online_node()
	2.c)		->hotadd_new_pgdat()
	2.d)		->node_set_online()
Between 1.b and 2.c, hot-added CPUs are already associated with NUMA
nodes, but those nodes may still be offline. So we should check
node_online(nid) before calling kmalloc_node(nid) and friends;
otherwise an invalid memory access like the one below may occur.
[ 3663.324476] BUG: unable to handle kernel paging request at 0000000000001f08
[ 3663.332348] IP: [<ffffffff81172219>] __alloc_pages_nodemask+0xb9/0x2d0
[ 3663.339719] PGD 82fe10067 PUD 82ebef067 PMD 0
[ 3663.344773] Oops: 0000 [#1] SMP
[ 3663.348455] Modules linked in: shpchp gpio_ich x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd microcode joydev sb_edac edac_core lpc_ich ipmi_si tpm_tis ipmi_msghandler ioatdma wmi acpi_pad mac_hid lp parport ixgbe isci mpt2sas dca ahci ptp libsas libahci raid_class pps_core scsi_transport_sas mdio hid_generic usbhid hid
[ 3663.394393] CPU: 61 PID: 2416 Comm: cron Tainted: G W 3.14.0-rc5+ #21
[ 3663.402643] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRIVTIN1.86B.0047.F03.1403031049 03/03/2014
[ 3663.414299] task: ffff88082fe54b00 ti: ffff880845fba000 task.ti: ffff880845fba000
[ 3663.422741] RIP: 0010:[<ffffffff81172219>] [<ffffffff81172219>] __alloc_pages_nodemask+0xb9/0x2d0
[ 3663.432857] RSP: 0018:ffff880845fbbcd0 EFLAGS: 00010246
[ 3663.439265] RAX: 0000000000001f00 RBX: 0000000000000000 RCX: 0000000000000000
[ 3663.447291] RDX: 0000000000000000 RSI: 0000000000000a8d RDI: ffffffff81a8d950
[ 3663.455318] RBP: ffff880845fbbd58 R08: ffff880823293400 R09: 0000000000000001
[ 3663.463345] R10: 0000000000000001 R11: 0000000000000000 R12: 00000000002052d0
[ 3663.471363] R13: ffff880854c07600 R14: 0000000000000002 R15: 0000000000000000
[ 3663.479389] FS: 00007f2e8b99e800(0000) GS:ffff88105a400000(0000) knlGS:0000000000000000
[ 3663.488514] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3663.495018] CR2: 0000000000001f08 CR3: 00000008237b1000 CR4: 00000000001407e0
[ 3663.503476] Stack:
[ 3663.505757] ffffffff811bd74d ffff880854c01d98 ffff880854c01df0 ffff880854c01dd0
[ 3663.514167] 00000003208ca420 000000075a5d84d0 ffff88082fe54b00 ffffffff811bb35f
[ 3663.522567] ffff880854c07600 0000000000000003 0000000000001f00 ffff880845fbbd48
[ 3663.530976] Call Trace:
[ 3663.533753] [<ffffffff811bd74d>] ? deactivate_slab+0x41d/0x4f0
[ 3663.540421] [<ffffffff811bb35f>] ? new_slab+0x3f/0x2d0
[ 3663.546307] [<ffffffff811bb3c5>] new_slab+0xa5/0x2d0
[ 3663.552001] [<ffffffff81768c97>] __slab_alloc+0x35d/0x54a
[ 3663.558185] [<ffffffff810a4845>] ? local_clock+0x25/0x30
[ 3663.564686] [<ffffffff8177a34c>] ? __do_page_fault+0x4ec/0x5e0
[ 3663.571356] [<ffffffff810b0054>] ? alloc_fair_sched_group+0xc4/0x190
[ 3663.578609] [<ffffffff810c77f1>] ? __raw_spin_lock_init+0x21/0x60
[ 3663.585570] [<ffffffff811be476>] kmem_cache_alloc_node_trace+0xa6/0x1d0
[ 3663.593112] [<ffffffff810b0054>] ? alloc_fair_sched_group+0xc4/0x190
[ 3663.600363] [<ffffffff810b0054>] alloc_fair_sched_group+0xc4/0x190
[ 3663.607423] [<ffffffff810a359f>] sched_create_group+0x3f/0x80
[ 3663.613994] [<ffffffff810b611f>] sched_autogroup_create_attach+0x3f/0x1b0
[ 3663.621732] [<ffffffff8108258a>] sys_setsid+0xea/0x110
[ 3663.628020] [<ffffffff8177f42d>] system_call_fastpath+0x1a/0x1f
[ 3663.634780] Code: 00 44 89 e7 e8 b9 f8 f4 ff 41 f6 c4 10 74 18 31 d2 be 8d 0a 00 00 48 c7 c7 50 d9 a8 81 e8 70 6a f2 ff e8 db dd 5f 00 48 8b 45 c8 <48> 83 78 08 00 0f 84 b5 01 00 00 48 83 c0 08 44 89 75 c0 4d 89
[ 3663.657032] RIP [<ffffffff81172219>] __alloc_pages_nodemask+0xb9/0x2d0
[ 3663.664491] RSP <ffff880845fbbcd0>
[ 3663.668429] CR2: 0000000000001f08
[ 3663.672659] ---[ end trace df13f08ed9de18ad ]---
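For illustration only, the node-selection rule applied by this patch can
be sketched as a small userspace C mock. The *_mock helpers, MAX_NODES,
and alloc_node_for() are hypothetical names that do not exist in the
kernel; only the NUMA_NO_NODE fallback logic mirrors the change.

```c
#include <assert.h>
#include <stdbool.h>

#define NUMA_NO_NODE (-1)  /* same value the kernel uses */
#define MAX_NODES    8     /* hypothetical limit for this mock */

/* Mock state: true once a node has been brought online. */
static bool mock_node_online[MAX_NODES];

/* Hypothetical stand-in for the kernel's node_online(). */
static bool node_online_mock(int nid)
{
	assert(nid >= 0 && nid < MAX_NODES);
	return mock_node_online[nid];
}

/* Hypothetical stand-in for the kernel's node_set_online(). */
static void node_set_online_mock(int nid)
{
	assert(nid >= 0 && nid < MAX_NODES);
	mock_node_online[nid] = true;
}

/*
 * Pick the node id to hand to a node-aware allocator: keep nid if the
 * node is online, otherwise fall back to NUMA_NO_NODE so the allocator
 * is free to choose any node.
 */
static int alloc_node_for(int nid)
{
	if (nid != NUMA_NO_NODE && !node_online_mock(nid))
		return NUMA_NO_NODE;
	return nid;
}
```

Under these assumptions, alloc_node_for(2) yields NUMA_NO_NODE until
node_set_online_mock(2) has run, which models the window described
above in which a CPU is already bound to a node that is not yet online.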
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
Hi all,
We have improved the log messages according to Peter's suggestion;
there are no code changes.
Thanks!
Gerry
---
kernel/sched/fair.c | 12 +++++++-----
kernel/sched/rt.c | 11 +++++++----
2 files changed, 14 insertions(+), 9 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7570dd969c28..71be1b96662e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7487,7 +7487,7 @@ int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent)
 {
 	struct cfs_rq *cfs_rq;
 	struct sched_entity *se;
-	int i;
+	int i, nid;
 
 	tg->cfs_rq = kzalloc(sizeof(cfs_rq) * nr_cpu_ids, GFP_KERNEL);
 	if (!tg->cfs_rq)
@@ -7501,13 +7501,15 @@ int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent)
 	init_cfs_bandwidth(tg_cfs_bandwidth(tg));
 
 	for_each_possible_cpu(i) {
-		cfs_rq = kzalloc_node(sizeof(struct cfs_rq),
-				      GFP_KERNEL, cpu_to_node(i));
+		nid = cpu_to_node(i);
+		if (nid != NUMA_NO_NODE && !node_online(nid))
+			nid = NUMA_NO_NODE;
+
+		cfs_rq = kzalloc_node(sizeof(struct cfs_rq), GFP_KERNEL, nid);
 		if (!cfs_rq)
 			goto err;
 
-		se = kzalloc_node(sizeof(struct sched_entity),
-				  GFP_KERNEL, cpu_to_node(i));
+		se = kzalloc_node(sizeof(struct sched_entity), GFP_KERNEL, nid);
 		if (!se)
 			goto err_free_rq;
 
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index bd2267ad404f..cdabbd85e22f 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -161,7 +161,7 @@ int alloc_rt_sched_group(struct task_group *tg, struct task_group *parent)
 {
 	struct rt_rq *rt_rq;
 	struct sched_rt_entity *rt_se;
-	int i;
+	int i, nid;
 
 	tg->rt_rq = kzalloc(sizeof(rt_rq) * nr_cpu_ids, GFP_KERNEL);
 	if (!tg->rt_rq)
@@ -174,13 +174,16 @@ int alloc_rt_sched_group(struct task_group *tg, struct task_group *parent)
 			ktime_to_ns(def_rt_bandwidth.rt_period), 0);
 
 	for_each_possible_cpu(i) {
-		rt_rq = kzalloc_node(sizeof(struct rt_rq),
-				     GFP_KERNEL, cpu_to_node(i));
+		nid = cpu_to_node(i);
+		if (nid != NUMA_NO_NODE && !node_online(nid))
+			nid = NUMA_NO_NODE;
+
+		rt_rq = kzalloc_node(sizeof(struct rt_rq), GFP_KERNEL, nid);
 		if (!rt_rq)
 			goto err;
 
 		rt_se = kzalloc_node(sizeof(struct sched_rt_entity),
-				     GFP_KERNEL, cpu_to_node(i));
+				     GFP_KERNEL, nid);
 		if (!rt_se)
 			goto err_free_rq;
 
--
1.7.10.4