From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753963AbcJDM1W (ORCPT ); Tue, 4 Oct 2016 08:27:22 -0400
Received: from mx1.redhat.com ([209.132.183.28]:49516 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751701AbcJDM1V (ORCPT ); Tue, 4 Oct 2016 08:27:21 -0400
Date: Tue, 4 Oct 2016 14:27:17 +0200
From: Jiri Olsa
To: Thomas Gleixner
Cc: Prarit Bhargava, linux-kernel@vger.kernel.org, Ingo Molnar,
	"H. Peter Anvin", x86@kernel.org, Peter Zijlstra, Len Brown,
	Borislav Petkov, Andi Kleen, Juergen Gross, dyoung@redhat.com,
	Eric Biederman, kexec@lists.infradead.org
Subject: Re: [PATCH] arch/x86: Fix kdump on x86 with physically hotadded CPUs
Message-ID: <20161004122717.GA4998@krava>
References: <1475514432-27682-1-git-send-email-prarit@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.7.0 (2016-08-17)
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
	(mx1.redhat.com [10.5.110.31]); Tue, 04 Oct 2016 12:27:20 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Oct 04, 2016 at 12:58:04PM +0200, Thomas Gleixner wrote:
> On Mon, 3 Oct 2016, Prarit Bhargava wrote:
> > BUG: unable to handle kernel paging request at 0000000000841f1f
> > IP: [] uncore_change_context+0xd4/0x180
> ...
> > [] ? uncore_cpu_starting+0x130/0x130
> > [] uncore_event_cpu_online+0x6c/0x80
> > [] cpuhp_invoke_callback+0x49/0x100
> > [] cpuhp_thread_fun+0x41/0x100
> > [] smpboot_thread_fn+0x10f/0x160
> > [] ? sort_range+0x30/0x30
> > [] kthread+0xd8/0xf0
> > [] ret_from_fork+0x1f/0x40
> > [] ? kthread_park+0x60/0x60
>
> > arch/x86/events/intel/uncore.c:
> > 1137 static void uncore_change_type_ctx(struct intel_uncore_type *type, int old_cpu,
> > 1138                                    int new_cpu)
> > 1139 {
> > 1140         struct intel_uncore_pmu *pmu = type->pmus;
> > 1141         struct intel_uncore_box *box;
> > 1142         int i, pkg;
> > 1143
> > 1144         pkg = topology_logical_package_id(old_cpu < 0 ? new_cpu : old_cpu);
> > 1145         for (i = 0; i < type->num_boxes; i++, pmu++) {
> > 1146                 box = pmu->boxes[pkg];
> >
> > pmu->boxes[pkg] is garbage because pkg was returned as 0xffff.
>
> And that's what needs to be fixed in the first place.

right, I'll check on that.. but I think we need this fix as well

>
> > This patch adds the missing generic_processor_info() to
> > prefill_possible_map() to ensure the initialization of the boot cpu is
> > correct.
>
> > This results in smp_init_package_map() having correct data and
> > properly setting the package map for the hotplugged boot cpu, which in
> > turn resolves the kdump kernel panic on physically hotplugged cpus.
>
> While it is the right thing to initialize the package map in that case, it
> still papers over a robustness issue in the uncore code, which needs to be
> fixed first.
>
> > [2] prefill_possible_map() is called before smp_store_boot_cpu_info().
> > The comment beside the call to smp_store_boot_cpu_info() states that the
> > completed call results in "Final full version of the data".
>
> I'm not sure what that [2] here means and I cannot figure out the meaning
> of this sentence either.
>
> This changelog is incomprehensible in general and more a "oh look how I
> decoded this problem" report than something which clearly describes the
> problem at hand, the root cause and the fix. The latter wants a
> understandable explanation why prefill_possible_map() is the right place to
> do this.

I was wondering if acpi_boot_init was a better place for that,
but then Prarit suggested in our discussion that the prefill_possible_map()
call seems to be a hotplug cleanup.. so it seemed to fit

however it's difficult to say with complex code like this,
so any ideas are welcome ;-)

thanks,
jirka
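
To illustrate the robustness issue raised above (a per-package array indexed with an id of 0xffff), here is a minimal userspace sketch of a range check before the lookup. It is not kernel code and not the fix that went into the uncore driver; the sketch_* names, SKETCH_MAX_PACKAGES and the struct layouts are hypothetical stand-ins chosen only for this example.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
#define SKETCH_MAX_PACKAGES 4u	/* assumed size of the per-package array */

struct sketch_box {
	int cpu;		/* CPU currently collecting events for this box */
};

struct sketch_pmu {
	struct sketch_box *boxes[SKETCH_MAX_PACKAGES];
};

/*
 * Return the box for a package, or NULL when the package id is out of
 * range (for example the 0xffff seen in the report) or has no box.
 */
static inline struct sketch_box *
sketch_box_for_pkg(struct sketch_pmu *pmu, unsigned int pkg)
{
	if (pkg >= SKETCH_MAX_PACKAGES)
		return NULL;
	return pmu->boxes[pkg];
}

/*
 * Move every PMU's box on package pkg over to new_cpu.
 * Returns the number of boxes updated; 0 means nothing was touched.
 */
static inline size_t
sketch_change_ctx(struct sketch_pmu *pmus, size_t num_pmus,
		  unsigned int pkg, int new_cpu)
{
	size_t i, updated = 0;

	for (i = 0; i < num_pmus; i++) {
		struct sketch_box *box = sketch_box_for_pkg(&pmus[i], pkg);

		if (!box)
			continue;	/* invalid package id or no box: skip */
		box->cpu = new_cpu;
		updated++;
	}
	return updated;
}
```

With a check like this, a bogus package id makes the context change a no-op instead of dereferencing whatever lies past the end of the array. Whether the real code should skip silently, warn, or fail the hotplug callback is a separate design decision that the sketch does not settle.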