From: lizf@kernel.org
To: stable@vger.kernel.org
Cc: linux-kernel@vger.kernel.org,
Wanpeng Li <wanpeng.li@linux.intel.com>,
David Rientjes <rientjes@google.com>,
Prarit Bhargava <prarit@redhat.com>,
Steven Rostedt <srostedt@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@kernel.org>, Zefan Li <lizefan@huawei.com>
Subject: [PATCH 3.4 62/91] sched: Fix unreleased llc_shared_mask bit during CPU hotplug
Date: Thu, 27 Nov 2014 16:42:45 +0800 [thread overview]
Message-ID: <1417077794-9299-62-git-send-email-lizf@kernel.org> (raw)
In-Reply-To: <1417077368-9217-1-git-send-email-lizf@kernel.org>
From: Wanpeng Li <wanpeng.li@linux.intel.com>
3.4.105-rc1 review patch. If anyone has any objections, please let me know.
------------------
commit 03bd4e1f7265548832a76e7919a81f3137c44fd1 upstream.
The following bug can be triggered by hot adding and removing a large number of
xen domain0's vcpus repeatedly:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 IP: [..] find_busiest_group
PGD 5a9d5067 PUD 13067 PMD 0
Oops: 0000 [#3] SMP
[...]
Call Trace:
load_balance
? _raw_spin_unlock_irqrestore
idle_balance
__schedule
schedule
schedule_timeout
? lock_timer_base
schedule_timeout_uninterruptible
msleep
lock_device_hotplug_sysfs
online_store
dev_attr_store
sysfs_write_file
vfs_write
SyS_write
system_call_fastpath
Last level cache shared mask is built during CPU up and the
build_sched_domain() routine takes advantage of it to setup
the sched domain CPU topology.
However, llc_shared_mask is not released during CPU disable,
which leads to an invalid sched domainCPU topology.
This patch fix it by releasing the llc_shared_mask correctly
during CPU disable.
Yasuaki also reported that this can happen on real hardware:
https://lkml.org/lkml/2014/7/22/1018
His case is here:
==
Here is an example on my system.
My system has 4 sockets and each socket has 15 cores and HT is
enabled. In this case, each core of sockes is numbered as
follows:
| CPU#
Socket#0 | 0-14 , 60-74
Socket#1 | 15-29, 75-89
Socket#2 | 30-44, 90-104
Socket#3 | 45-59, 105-119
Then llc_shared_mask of CPU#30 has 0x3fff80000001fffc0000000.
It means that last level cache of Socket#2 is shared with
CPU#30-44 and 90-104.
When hot-removing socket#2 and #3, each core of sockets is
numbered as follows:
| CPU#
Socket#0 | 0-14 , 60-74
Socket#1 | 15-29, 75-89
But llc_shared_mask is not cleared. So llc_shared_mask of CPU#30
remains having 0x3fff80000001fffc0000000.
After that, when hot-adding socket#2 and #3, each core of
sockets is numbered as follows:
| CPU#
Socket#0 | 0-14 , 60-74
Socket#1 | 15-29, 75-89
Socket#2 | 30-59
Socket#3 | 90-119
Then llc_shared_mask of CPU#30 becomes
0x3fff8000fffffffc0000000. It means that last level cache of
Socket#2 is shared with CPU#30-59 and 90-104. So the mask has
the wrong value.
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Tested-by: Linn Crosetto <linn@hp.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Toshi Kani <toshi.kani@hp.com>
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1411547885-48165-1-git-send-email-wanpeng.li@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Zefan Li <lizefan@huawei.com>
---
arch/x86/kernel/smpboot.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index d28c595..c7dbf02 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1248,6 +1248,9 @@ static void remove_siblinginfo(int cpu)
for_each_cpu(sibling, cpu_sibling_mask(cpu))
cpumask_clear_cpu(cpu, cpu_sibling_mask(sibling));
+ for_each_cpu(sibling, cpu_llc_shared_mask(cpu))
+ cpumask_clear_cpu(cpu, cpu_llc_shared_mask(sibling));
+ cpumask_clear(cpu_llc_shared_mask(cpu));
cpumask_clear(cpu_sibling_mask(cpu));
cpumask_clear(cpu_core_mask(cpu));
c->phys_proc_id = 0;
--
1.9.1
next prev parent reply other threads:[~2014-11-27 8:42 UTC|newest]
Thread overview: 98+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-11-27 8:36 [PATCH 3.4 00/91] 3.4.105-rc1 review lizf
2014-11-27 8:41 ` [PATCH 3.4 01/91] KVM: s390: Fix user triggerable bug in dead code lizf
2014-11-27 8:41 ` [PATCH 3.4 02/91] regmap: Fix handling of volatile registers for format_write() chips lizf
2014-11-27 8:41 ` [PATCH 3.4 03/91] drm/i915: Remove bogus __init annotation from DMI callbacks lizf
2014-11-27 8:41 ` [PATCH 3.4 04/91] get rid of propagate_umount() mistakenly treating slaves as busy lizf
2014-11-27 8:41 ` [PATCH 3.4 05/91] drm/vmwgfx: Fix a potential infinite spin waiting for fifo idle lizf
2014-11-27 8:41 ` [PATCH 3.4 06/91] ALSA: hda - Fix COEF setups for ALC1150 codec lizf
2014-11-27 8:41 ` [PATCH 3.4 07/91] ACPI / cpuidle: fix deadlock between cpuidle_lock and cpu_hotplug.lock lizf
2014-11-27 8:41 ` [PATCH 3.4 08/91] regulatory: add NUL to alpha2 lizf
2014-11-27 8:41 ` [PATCH 3.4 09/91] percpu: fix pcpu_alloc_pages() failure path lizf
2014-11-27 8:41 ` [PATCH 3.4 10/91] percpu: perform tlb flush after pcpu_map_pages() failure lizf
2014-11-27 8:41 ` [PATCH 3.4 11/91] percpu: free percpu allocation info for uniprocessor system lizf
2014-11-27 8:41 ` [PATCH 3.4 12/91] cgroup: reject cgroup names with ' ' lizf
2014-11-27 8:41 ` [PATCH 3.4 13/91] rtlwifi: rtl8192cu: Add new ID lizf
2014-11-27 8:41 ` [PATCH 3.4 14/91] ahci: Add Device IDs for Intel 9 Series PCH lizf
2014-11-27 8:41 ` [PATCH 3.4 15/91] ata_piix: " lizf
2014-11-27 8:41 ` [PATCH 3.4 16/91] USB: ftdi_sio: add support for NOVITUS Bono E thermal printer lizf
2014-11-27 8:42 ` [PATCH 3.4 17/91] USB: sierra: avoid CDC class functions on "68A3" devices lizf
2014-11-27 8:42 ` [PATCH 3.4 18/91] USB: sierra: add 1199:68AA device ID lizf
2014-11-27 8:42 ` [PATCH 3.4 19/91] xen/manage: Always freeze/thaw processes when suspend/resuming lizf
2014-11-27 8:42 ` [PATCH 3.4 20/91] block: Fix dev_t minor allocation lifetime lizf
2014-11-27 8:42 ` [PATCH 3.4 21/91] usb: dwc3: core: fix order of PM runtime calls lizf
2014-11-27 8:42 ` [PATCH 3.4 22/91] ahci: add pcid for Marvel 0x9182 controller lizf
2014-11-27 8:42 ` [PATCH 3.4 23/91] drm/radeon: add connector quirk for fujitsu board lizf
2014-11-27 8:42 ` [PATCH 3.4 24/91] usb: host: xhci: fix compliance mode workaround lizf
2014-11-27 8:42 ` [PATCH 3.4 25/91] Input: elantech - fix detection of touchpad on ASUS s301l lizf
2014-11-27 8:42 ` [PATCH 3.4 26/91] USB: ftdi_sio: Add support for GE Healthcare Nemo Tracker device lizf
2014-11-27 8:42 ` [PATCH 3.4 27/91] uwb: init beacon cache entry before registering uwb device lizf
2014-11-27 8:42 ` [PATCH 3.4 28/91] Input: synaptics - add support for ForcePads lizf
2014-11-27 8:42 ` [PATCH 3.4 29/91] libceph: gracefully handle large reply messages from the mon lizf
2014-11-27 8:42 ` [PATCH 3.4 30/91] libceph: add process_one_ticket() helper lizf
2014-11-27 8:42 ` [PATCH 3.4 31/91] libceph: do not hard code max auth ticket len lizf
2014-11-27 8:42 ` [PATCH 3.4 32/91] Input: serport - add compat handling for SPIOCSTYPE ioctl lizf
2014-11-27 8:42 ` [PATCH 3.4 33/91] usb: hub: take hub->hdev reference when processing from eventlist lizf
2014-11-27 8:42 ` [PATCH 3.4 34/91] storage: Add single-LUN quirk for Jaz USB Adapter lizf
2014-11-27 8:42 ` [PATCH 3.4 35/91] xhci: Fix null pointer dereference if xhci initialization fails lizf
2014-11-27 8:42 ` [PATCH 3.4 36/91] futex: Unlock hb->lock in futex_wait_requeue_pi() error path lizf
2014-11-27 8:42 ` [PATCH 3.4 37/91] alarmtimer: Return relative times in timer_gettime lizf
2014-11-27 8:42 ` [PATCH 3.4 38/91] alarmtimer: Do not signal SIGEV_NONE timers lizf
2014-11-27 8:42 ` [PATCH 3.4 39/91] alarmtimer: Lock k_itimer during timer callback lizf
2014-11-27 8:42 ` [PATCH 3.4 40/91] don't bugger nd->seq on set_root_rcu() from follow_dotdot_rcu() lizf
2014-11-27 8:42 ` [PATCH 3.4 41/91] jiffies: Fix timeval conversion to jiffies lizf
2014-11-27 8:42 ` [PATCH 3.4 42/91] MIPS: ZBOOT: add missing <linux/string.h> include lizf
2014-11-27 8:42 ` [PATCH 3.4 43/91] perf: Fix a race condition in perf_remove_from_context() lizf
2014-12-01 18:43 ` Sukadev Bhattiprolu
2014-12-02 1:22 ` Zefan Li
2014-11-27 8:42 ` [PATCH 3.4 44/91] ASoC: samsung-i2s: Check secondary DAI exists before referencing lizf
2014-11-27 8:42 ` [PATCH 3.4 45/91] Input: i8042 - add Fujitsu U574 to no_timeout dmi table lizf
2014-11-27 8:42 ` [PATCH 3.4 46/91] Input: i8042 - add nomux quirk for Avatar AVIU-145A6 lizf
2014-11-27 8:42 ` [PATCH 3.4 47/91] iscsi-target: Fix memory corruption in iscsit_logout_post_handler_diffcid lizf
2014-11-27 8:42 ` [PATCH 3.4 48/91] iscsi-target: avoid NULL pointer in iscsi_copy_param_list failure lizf
2014-11-27 8:42 ` [PATCH 3.4 49/91] NFSv4: Fix another bug in the close/open_downgrade code lizf
2014-11-27 8:42 ` [PATCH 3.4 50/91] libiscsi: fix potential buffer overrun in __iscsi_conn_send_pdu lizf
2014-11-27 8:42 ` [PATCH 3.4 51/91] USB: storage: Add quirk for Adaptec USBConnect 2000 USB-to-SCSI Adapter lizf
2014-11-27 8:42 ` [PATCH 3.4 52/91] USB: storage: Add quirk for Ariston Technologies iConnect USB to SCSI adapter lizf
2014-11-27 8:42 ` [PATCH 3.4 53/91] USB: storage: Add quirks for Entrega/Xircom USB to SCSI converters lizf
2014-11-27 8:42 ` [PATCH 3.4 54/91] can: flexcan: mark TX mailbox as TX_INACTIVE lizf
2014-11-27 8:42 ` [PATCH 3.4 55/91] can: flexcan: correctly initialize mailboxes lizf
2014-11-27 8:42 ` [PATCH 3.4 56/91] can: flexcan: implement workaround for errata ERR005829 lizf
2014-11-27 8:42 ` [PATCH 3.4 57/91] can: flexcan: put TX mailbox into TX_INACTIVE mode after tx-complete lizf
2014-11-27 8:42 ` [PATCH 3.4 58/91] can: at91_can: add missing prepare and unprepare of the clock lizf
2014-11-27 8:42 ` [PATCH 3.4 59/91] ALSA: pcm: fix fifo_size frame calculation lizf
2014-11-27 8:42 ` [PATCH 3.4 60/91] Fix nasty 32-bit overflow bug in buffer i/o code lizf
2014-11-27 8:42 ` [PATCH 3.4 61/91] parisc: Only use -mfast-indirect-calls option for 32-bit kernel builds lizf
2014-11-27 8:42 ` lizf [this message]
2014-11-27 8:42 ` [PATCH 3.4 63/91] sched: add macros to define bitops for task atomic flags lizf
2014-11-27 8:42 ` [PATCH 3.4 64/91] cpuset: PF_SPREAD_PAGE and PF_SPREAD_SLAB should be " lizf
2014-11-27 8:42 ` [PATCH 3.4 65/91] MIPS: mcount: Adjust stack pointer for static trace in MIPS32 lizf
2014-11-27 8:42 ` [PATCH 3.4 66/91] nilfs2: fix data loss with mmap() lizf
2014-11-27 8:42 ` [PATCH 3.4 67/91] ocfs2/dlm: do not get resource spinlock if lockres is new lizf
2014-11-27 8:42 ` [PATCH 3.4 68/91] shmem: fix nlink for rename overwrite directory lizf
2014-11-27 8:42 ` [PATCH 3.4 69/91] ARM: 8165/1: alignment: don't break misaligned NEON load/store lizf
2014-11-27 8:42 ` [PATCH 3.4 70/91] ASoC: core: fix possible ZERO_SIZE_PTR pointer dereferencing error lizf
2014-11-27 8:42 ` [PATCH 3.4 71/91] mm: migrate: Close race between migration completion and mprotect lizf
2014-11-27 8:42 ` [PATCH 3.4 72/91] perf: fix perf bug in fork() lizf
2014-11-27 8:42 ` [PATCH 3.4 73/91] init/Kconfig: Hide printk log config if CONFIG_PRINTK=n lizf
2014-11-27 8:42 ` [PATCH 3.4 74/91] genhd: fix leftover might_sleep() in blk_free_devt() lizf
2014-11-27 8:42 ` [PATCH 3.4 75/91] nl80211: clear skb cb before passing to netlink lizf
2014-11-27 8:42 ` [PATCH 3.4 76/91] ext4: propagate errors up to ext4_find_entry()'s callers lizf
2014-11-27 8:43 ` [PATCH 3.4 77/91] ext4: avoid trying to kfree an ERR_PTR pointer lizf
2014-11-27 8:43 ` [PATCH 3.4 78/91] NFS: fix stable regression lizf
2014-11-27 8:43 ` [PATCH 3.4 79/91] perf: Handle compat ioctl lizf
2014-11-27 8:43 ` [PATCH 3.4 80/91] bluetooth: hci_ldisc: fix deadlock condition lizf
2014-11-27 8:43 ` [PATCH 3.4 81/91] mnt: Only change user settable mount flags in remount lizf
2014-11-27 8:43 ` [PATCH 3.4 82/91] dm crypt: fix access beyond the end of allocated space lizf
2014-11-27 8:43 ` [PATCH 3.4 83/91] Fix spurious request sense in error handling lizf
2014-11-27 8:43 ` [PATCH 3.4 84/91] ipv4: move route garbage collector to work queue lizf
2014-11-27 8:43 ` [PATCH 3.4 85/91] ipv4: avoid parallel route cache gc executions lizf
2014-11-27 8:43 ` [PATCH 3.4 86/91] ipv4: disable bh while doing route gc lizf
2014-11-27 8:43 ` [PATCH 3.4 87/91] rtl8192ce: Fix null dereference in watchdog lizf
2014-11-27 8:43 ` [PATCH 3.4 88/91] ipv6: reuse ip6_frag_id from ip6_ufo_append_data lizf
2014-11-27 8:43 ` [PATCH 3.4 89/91] net: Do not enable tx-nocache-copy by default lizf
2014-11-27 8:43 ` [PATCH 3.4 90/91] ixgbevf: Prevent RX/TX statistics getting reset to zero lizf
2014-11-27 8:43 ` [PATCH 3.4 91/91] l2tp: fix race while getting PMTU on PPP pseudo-wire lizf
2014-11-27 9:07 ` [PATCH 3.4 00/91] 3.4.105-rc1 review Zefan Li
2014-11-27 20:15 ` Guenter Roeck
2014-11-29 5:54 ` Zefan Li
-- strict thread matches above, loose matches on Subject: below --
2015-01-28 4:07 [PATCH 3.4 000/177] 3.4.106-rc1 review lizf
2015-01-28 4:08 ` [PATCH 3.4 62/91] sched: Fix unreleased llc_shared_mask bit during CPU hotplug lizf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1417077794-9299-62-git-send-email-lizf@kernel.org \
--to=lizf@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=lizefan@huawei.com \
--cc=mingo@kernel.org \
--cc=peterz@infradead.org \
--cc=prarit@redhat.com \
--cc=rientjes@google.com \
--cc=srostedt@redhat.com \
--cc=stable@vger.kernel.org \
--cc=wanpeng.li@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).