From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S966833AbdACXwl (ORCPT ); Tue, 3 Jan 2017 18:52:41 -0500
Received: from mx1.redhat.com ([209.132.183.28]:56056 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S965138AbdACXwc (ORCPT ); Tue, 3 Jan 2017 18:52:32 -0500
Message-ID: <586C374A.80401@redhat.com>
Date: Tue, 03 Jan 2017 18:44:10 -0500
From: Prarit Bhargava
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0
MIME-Version: 1.0
To: linux-kernel@vger.kernel.org
CC: Thomas Gleixner , Ingo Molnar , "H. Peter Anvin" , x86@kernel.org, Peter Zijlstra , Kan Liang , Borislav Petkov , Harish Chegondi
Subject: Re: [PATCH] perf/x86/intel/uncore: Initialize with correct logical package ID
References: <1483471471-14450-1-git-send-email-prarit@redhat.com>
In-Reply-To: <1483471471-14450-1-git-send-email-prarit@redhat.com>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.39]); Tue, 03 Jan 2017 23:44:13 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

On 01/03/2017 02:24 PM, Prarit Bhargava wrote:
> On multi-socket Intel v3 processor systems (aka Haswell) kdump can fail with:
>
> BUG: unable to handle kernel paging request at 00000000006563a1
> IP: [] hswep_uncore_cpu_init+0x52/0xa0
> PGD 0 [ 2.313897]
> Oops: 0000 [#1] SMP
> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.9.0 #1
> Hardware name: NEC Express5800/T120f [N8100-2285Y]/GA-7WESV-NJ, BIOS 5.0.4009 08/01/2016
> task: ffff88002bdb8000 task.stack: ffffc90000014000
> RIP: 0010:[] [] hswep_uncore_cpu_init+0x52/0xa0
> RSP: 0000:ffffc90000017db8 EFLAGS: 00010206
> RAX: 0000000000656369 RBX: 0000000000000000 RCX: 0000000000001e03
> RDX: ffff88002b224780 RSI: 0000000000000000 RDI: 0000000000000000
> RBP: ffffc90000017dc8 R08: 000000000001c880 R09: ffffffff813667e1
> R10: ffff880030c1c880 R11: 0000000000000000 R12: 0000000000000000
> R13: ffffffff81c1c090 R14: afafafafafafafaf R15: afafafafafafafaf
> FS: 0000000000000000(0000) GS:ffff880030c00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000006563a1 CR3: 000000002fc07000 CR4: 00000000001406b0
> Stack:
> ffffc90000017dc8 00000000352bd002 ffffc90000017e00 ffffffff81da17f8
> 0000000000000000 ffffffff81da16f9 00000000000000f0 afafafafafafafaf
> afafafafafafafaf ffffc90000017e78 ffffffff81002190 ffffc90000017e00
> Call Trace:
> [] intel_uncore_init+0xff/0x2e6
> [] ? uncore_type_init+0x158/0x158
> [] do_one_initcall+0x50/0x190
> [] ? parse_args+0x27b/0x460
> [] kernel_init_freeable+0x1a5/0x249
> [] ? set_debug_rodata+0x12/0x12
> [] ? rest_init+0x80/0x80
> [] kernel_init+0xe/0x110
> [] ret_from_fork+0x25/0x30
> Code: 1a d5 00 39 15 cc 1c c0 00 7e 06 89 15 c4 1c c0 00 48 98 48 8b 15 d7 c3 f7 00 48 8d 04 40 48 8d 04 c2 48 8b 40 10 48 85 c0 74 1b <8b> 70 38 48 8b 78 10 48 8d 4d f4 ba 94 00 00 00 e8 b9 db 38 00
> RIP [] hswep_uncore_cpu_init+0x52/0xa0
>
> This is now occurring because 9d85eb9119f4 ("x86/smpboot: Make logical package
> management more robust") corrected the physical ID to logical ID mapping of the
> threads. hswep_uncore_cpu_init() is hard-coded for physical socket 0, and if
> the system is kdump'ing on any other socket the logical package value will be
> incorrect. The code should not use 0 as the physical ID, and should use
> the boot CPU's physical package ID in this calculation.
>
> Signed-off-by: Prarit Bhargava
> Cc: Thomas Gleixner
> Cc: Ingo Molnar
> Cc: "H. Peter Anvin"
> Cc: x86@kernel.org
> Cc: Peter Zijlstra
> Cc: Kan Liang
> Cc: Borislav Petkov
> Cc: Harish Chegondi
> ---
>  arch/x86/events/intel/uncore_snbep.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c
> index e6832be714bc..b5fbb59fdc64 100644
> --- a/arch/x86/events/intel/uncore_snbep.c
> +++ b/arch/x86/events/intel/uncore_snbep.c
> @@ -2686,7 +2686,7 @@ static int hswep_pcu_hw_config(struct intel_uncore_box *box, struct perf_event *
>
>  void hswep_uncore_cpu_init(void)
>  {
> -	int pkg = topology_phys_to_logical_pkg(0);
> +	int pkg = topology_phys_to_logical_pkg(boot_cpu_data.phys_proc_id);

One thing occurred to me while I was looking at other code: boot_cpu_data
has logical_proc_id, so it may be better to use that instead of the lookup
function. I'm not sure about the intended usage of physical_to_logical_pkg[]
versus logical_proc_id.

Unless tglx or someone else already knows of a reason not to use
logical_proc_id, I can certainly change the patch.

P.

>
>  	if (hswep_uncore_cbox.num_boxes > boot_cpu_data.x86_max_cores)
>  		hswep_uncore_cbox.num_boxes = boot_cpu_data.x86_max_cores;
>