From: Wanpeng Li <kernellwp@gmail.com>
To: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Wanpeng Li <wanpeng.li@linux.intel.com>,
Ingo Molnar <mingo@redhat.com>,
hpa@zytor.com, Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>,
x86@kernel.org, Borislav Petkov <bp@alien8.de>,
Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>,
David Rientjes <rientjes@google.com>,
Prarit Bhargava <prarit@redhat.com>,
Steven Rostedt <srostedt@redhat.com>,
Toshi Kani <toshi.kani@hp.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5] x86, cpu-hotplug: fix llc shared map unreleased during cpu hotplug
Date: Tue, 23 Sep 2014 14:36:07 +0800
Message-ID: <542114D7.3030605@gmail.com>
In-Reply-To: <5420FB25.8050102@jp.fujitsu.com>
Hi Kamezawa,
On 2014-09-23 12:46 PM, Kamezawa Hiroyuki wrote:
> (2014/09/17 16:17), Wanpeng Li wrote:
>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
>> IP: [..] find_busiest_group
>> PGD 5a9d5067 PUD 13067 PMD 0
>> Oops: 0000 [#3] SMP
>> [...]
>> Call Trace:
>> load_balance
>> ? _raw_spin_unlock_irqrestore
>> idle_balance
>> __schedule
>> schedule
>> schedule_timeout
>> ? lock_timer_base
>> schedule_timeout_uninterruptible
>> msleep
>> lock_device_hotplug_sysfs
>> online_store
>> dev_attr_store
>> sysfs_write_file
>> vfs_write
>> SyS_write
>> system_call_fastpath
>>
>> This bug can be triggered by repeatedly hot-adding and hot-removing a
>> large number of Xen domain0 vCPUs.
>>
>> The last level cache (LLC) shared map is built during CPU up, and the
>> sched domain build routine uses it to set up the sched domain CPU
>> topology. However, the LLC shared map is not released during CPU
>> disable, which leads to an invalid sched domain CPU topology. This patch
>> fixes it by correctly releasing the LLC shared map during CPU disable.
>>
>> Reviewed-by: Toshi Kani <toshi.kani@hp.com>
>> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
>> Tested-by: Linn Crosetto <linn@hp.com>
>> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
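A minimal userspace model of the bookkeeping described in the patch may make the failure mode concrete. It assumes a hypothetical per-CPU `llc_shared` bitmap array and a hypothetical `llc_id[]` that identifies which last-level cache a CPU sits behind; the function names are illustrative and this is not the actual code in arch/x86/kernel/smpboot.c. On CPU up, the new CPU and every online CPU with the same `llc_id` are linked in both directions; on CPU disable, the CPU is removed from each sibling's mask and its own mask is cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model, not kernel code: up to 128 CPUs, one 128-bit LLC mask
 * per CPU stored as two 64-bit words. llc_id[] is a hypothetical stand-in
 * for whatever identifies the shared last-level cache (e.g. the socket). */
#define NCPUS 128

typedef struct { uint64_t w[2]; } mask128;

static mask128 llc_shared[NCPUS];
static int llc_id[NCPUS];
static bool online[NCPUS];

static void set_bit128(mask128 *m, int cpu)
{
    m->w[cpu / 64] |= UINT64_C(1) << (cpu % 64);
}

static void clear_bit128(mask128 *m, int cpu)
{
    m->w[cpu / 64] &= ~(UINT64_C(1) << (cpu % 64));
}

static bool test_bit128(const mask128 *m, int cpu)
{
    return (m->w[cpu / 64] >> (cpu % 64)) & 1;
}

/* CPU up: link the new CPU with every online CPU that shares its LLC
 * (including itself). */
static void cpu_up_model(int cpu, int id)
{
    assert(cpu >= 0 && cpu < NCPUS && !online[cpu]);
    llc_id[cpu] = id;
    online[cpu] = true;
    for (int s = 0; s < NCPUS; s++) {
        if (online[s] && llc_id[s] == id) {
            set_bit128(&llc_shared[cpu], s);
            set_bit128(&llc_shared[s], cpu);
        }
    }
}

/* CPU disable: the step the patch description says was missing. Remove the
 * CPU from every sibling's mask, then clear the CPU's own mask. */
static void cpu_disable_model(int cpu)
{
    assert(cpu >= 0 && cpu < NCPUS && online[cpu]);
    for (int s = 0; s < NCPUS; s++)
        if (test_bit128(&llc_shared[cpu], s))
            clear_bit128(&llc_shared[s], cpu);
    llc_shared[cpu].w[0] = 0;
    llc_shared[cpu].w[1] = 0;
    online[cpu] = false;
}
```

Without the clearing step in `cpu_disable_model()`, re-onlining a CPU number behind a different cache would leave it linked to its old siblings, which is the stale-topology condition the patch targets.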
> Yasuaki reported that this can happen on our real hardware.
> https://lkml.org/lkml/2014/7/22/1018
>
> Our case is here.
> ==
> Here is an example on my system.
> My system has 4 sockets, each with 15 cores, and HT is enabled.
> In this case, the CPUs of each socket are numbered as follows:
>
> | CPU#
> Socket#0 | 0-14 , 60-74
> Socket#1 | 15-29, 75-89
> Socket#2 | 30-44, 90-104
> Socket#3 | 45-59, 105-119
> Then llc_shared_mask of CPU#30 is 0x3fff80000001fffc0000000.
> This means that the last level cache of Socket#2 is shared with
> CPU#30-44 and 90-104.
> After hot-removing sockets #2 and #3, the CPUs are numbered
> as follows:
>
> | CPU#
> Socket#0 | 0-14 , 60-74
> Socket#1 | 15-29, 75-89
> However, llc_shared_mask is not cleared, so the llc_shared_mask of
> CPU#30 still holds 0x3fff80000001fffc0000000.
> When sockets #2 and #3 are then hot-added again, the CPUs are
> numbered as follows:
>
> | CPU#
> Socket#0 | 0-14 , 60-74
> Socket#1 | 15-29, 75-89
> Socket#2 | 30-59
> Socket#3 | 90-119
> Then llc_shared_mask of CPU#30 becomes 0x3fff8000fffffffc0000000.
> This means the mask reports the last level cache of Socket#2 as shared
> with CPU#30-59 and 90-104, which is wrong.
> At first, I cleared the hot-removed CPU's bit from llc_shared_map
> when hot-removing the CPU. But Borislav suggested that the problem
> would disappear if a re-added CPU were assigned the same CPU number,
> and that llc_shared_map should not be changed.
> ==
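The mask arithmetic in the example above can be checked with a small sketch. It models a 128-CPU mask as two 64-bit words (a hypothetical `cpu_mask_t`, not the kernel's `struct cpumask`) and assumes, as the example implies, that re-adding socket #2 only ORs the new sibling bits into whatever CPU#30's mask already holds. Computed from the CPU ranges as listed, the masks are 0x1fffc00000000001fffc0000000 before removal and 0x1fffc0000000fffffffc0000000 after re-adding without clearing. These do not match the quoted hex strings digit for digit (their upper digits correspond to CPUs 75-89), so it is the CPU ranges, not the exact hex values, that carry the argument.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical 128-CPU bitmap (two 64-bit words); a userspace model only. */
typedef struct { uint64_t w[2]; } cpu_mask_t;

/* Set CPUs first..last (inclusive). */
static void mask_set_range(cpu_mask_t *m, int first, int last)
{
    assert(0 <= first && first <= last && last < 128);
    for (int cpu = first; cpu <= last; cpu++)
        m->w[cpu / 64] |= UINT64_C(1) << (cpu % 64);
}

/* Print the mask as a single hex number without leading zeros. */
static void mask_to_hex(const cpu_mask_t *m, char *out, size_t len)
{
    char tmp[33];
    snprintf(tmp, sizeof tmp, "%016llx%016llx",
             (unsigned long long)m->w[1], (unsigned long long)m->w[0]);
    const char *p = tmp;
    while (*p == '0' && p[1] != '\0')
        p++;
    snprintf(out, len, "0x%s", p);
}

/* Rebuild CPU#30's LLC mask after socket #2 comes back as CPUs 30-59.
 * If 'cleared' is 0, the mask still holds the bits from the old numbering
 * (30-44 and 90-104), which is the situation described in the example. */
static cpu_mask_t cpu30_llc_after_readd(int cleared)
{
    cpu_mask_t m = {{0, 0}};
    if (!cleared) {
        mask_set_range(&m, 30, 44);   /* stale: old socket #2, first threads */
        mask_set_range(&m, 90, 104);  /* stale: old socket #2, HT siblings */
    }
    mask_set_range(&m, 30, 59);       /* new numbering of socket #2 */
    return m;
}
```

With clearing on hot-remove, the re-added socket yields only CPUs 30-59 (0xfffffffc0000000); without it, the old HT sibling range 90-104 survives, and those numbers now belong to socket #3.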
>
> So, please.
As I mentioned before, we still observe the call trace with Yasuaki's
patch applied.
https://lkml.org/lkml/2014/7/29/40
Actually, I would prefer to merge both patches: one to fix the LLC
shared map not being released during hotplug, and the other to assign
the same CPU number to a re-added CPU.
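The second approach, giving a re-added CPU the same logical number it had before, can be sketched as a persistent map from physical APIC ID to logical CPU number. This is a hypothetical model (`cpu_hot_add()`, `cpu_hot_remove()` and the table sizes are invented for illustration), not how the kernel actually allocates CPU numbers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: remember which logical CPU number each physical
 * (APIC) ID received and reuse it on hot-add. Not kernel code. */
#define MAX_CPUS 128
#define MAX_APIC 256

static int apic_to_cpu_plus1[MAX_APIC]; /* 0 = never seen, else CPU + 1 */
static bool cpu_used[MAX_CPUS];         /* number ever handed out */
static bool cpu_present[MAX_CPUS];      /* CPU currently plugged in */

/* Hot-add: reuse the remembered number, otherwise take the lowest number
 * that has never been used. Returns -1 if no number is left. */
static int cpu_hot_add(int apicid)
{
    assert(apicid >= 0 && apicid < MAX_APIC);
    int cpu = apic_to_cpu_plus1[apicid] - 1;
    if (cpu < 0) {
        for (cpu = 0; cpu < MAX_CPUS && cpu_used[cpu]; cpu++)
            ;
        if (cpu == MAX_CPUS)
            return -1;
        cpu_used[cpu] = true;
        apic_to_cpu_plus1[apicid] = cpu + 1;
    }
    assert(!cpu_present[cpu]);
    cpu_present[cpu] = true;
    return cpu;
}

/* Hot-remove: the CPU goes away, but its number stays reserved for the
 * same APIC ID. */
static void cpu_hot_remove(int apicid)
{
    assert(apicid >= 0 && apicid < MAX_APIC);
    int cpu = apic_to_cpu_plus1[apicid] - 1;
    assert(cpu >= 0 && cpu_present[cpu]);
    cpu_present[cpu] = false;
}
```

Re-adding APIC IDs in a different order still returns their old numbers, so socket #2 would come back as CPUs 30-44 and 90-104 in the example above. The trade-off is that a number stays reserved while its CPU is absent, and stable numbering by itself never touches the LLC masks, which is why merging both fixes is argued for above.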
Regards,
Wanpeng Li
> Thanks,
> -Kame