From mboxrd@z Thu Jan 1 00:00:00 1970
From: Konrad Rzeszutek Wilk
Subject: Re: Dom0 crash with old style AMD NUMA detection
Date: Tue, 18 Sep 2012 10:55:24 -0400
Message-ID: <20120918145524.GA19790@phenom.dumpdata.com>
References: <501BC20F.3040205@amd.com> <20120803123628.GB10670@andromeda.dapyr.net> <20120817142237.GA8467@phenom.dumpdata.com> <20120914185822.GA7495@phenom.dumpdata.com> <5056D152.2090708@amd.com> <20120917191432.GA18552@phenom.dumpdata.com> <5058458D.7030603@amd.com> <20120918134457.GE12053@phenom.dumpdata.com> <5058A646.5060909@amd.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path:
Content-Disposition: inline
In-Reply-To: <5058A646.5060909@amd.com>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Andre Przywara
Cc: Konrad Rzeszutek Wilk, Jeremy Fitzhardinge, xen-devel, Dario Faggioli, Konrad Rzeszutek Wilk
List-Id: xen-devel@lists.xenproject.org

On Tue, Sep 18, 2012 at 06:50:14PM +0200, Andre Przywara wrote:
> On 09/18/2012 03:44 PM, Konrad Rzeszutek Wilk wrote:
> >On Tue, Sep 18, 2012 at 11:57:33AM +0200, Andre Przywara wrote:
> >>On 09/17/2012 09:14 PM, Konrad Rzeszutek Wilk wrote:
> >>>On Mon, Sep 17, 2012 at 09:29:22AM +0200, Andre Przywara wrote:
> >>>>On 09/14/2012 08:58 PM, Konrad Rzeszutek Wilk wrote:
> >>>>>>>>[ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
> >>>>>>>>(XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>The obvious solution would be to explicitly deny northbridge scanning
> >>>>>>>>when running as Dom0, though I am not sure how to implement this without
> >>>>>>>>upsetting the other kernel folks about "that crappy Xen thing" again ;-)
> >>>>>>>
> >>>>>>>Heh.
> >>>>>>>Is there a numa=0 option that could be used to override it to turn it
> >>>>>>>off?
> >>>>>>
> >>>>>>Not compile tested.. but was thinking something like this:
> >>>>>
> >>>>>ping?
> >>>>
> >>>>That looks good to me - at least for the time being.
> >>>
> >>>OK, can I've your Tested-by/Acked-by on it pls?
> >>>
> >>>>I just want to check how this interacts with upcoming Dom0 NUMA
> >>>>support. It wouldn't be too clever if we deliberately disable NUMA
> >>>
> >>>We can always revert this patch in future versions of Linux.
> >>
> >>I don't like this idea. Then we have Linux kernel up to 3.5 working
> >>and say from 3.8 on again, but 3.6 and 3.7 cannot use NUMA. That
> >>would be pretty unfortunate.
> >
> >Huh? v3.5 working? But it never worked? I would say turn off the NUMA
> >detection (keep in mind it still will set up the dummy NUMA stuff)
> >until there are some PV NUMA capability and then we can revert it.
>
> I was under the impression that somehow the Dom0 NUMA would be made
> compatible, using some of the existing discovery mechanisms. So we
> would enable the hypervisor, and Dom0 would just magically start
> working. I am probably rooted too much in the HVM world ;-)
>
> >>
> >>I haven't checked back with Dario, but I'd suspect that we use ACPI
> >>for injecting NUMA topology into Dom0. Even if not, a general
> >>"numa=off" for Dom0 is too much of a sledgehammer for me.
> >
> >How would you inject it in Dom0? It's a PV guest so the hypervisor would
> >have to tweak the SRAT/SLIT tables. That is not going to happen
> >in the very short term.. And I don't recall seeing any patches, so
> >the dom0 NUMA support is right now non-existent?
>
> Right, I just don't wanted to slam the door deliberately. Thinking
> more about this, we probably need some kind of PV enablement in
> Dom0, even if we could somehow use the ACPI tables (and thus the
> ACPI parsing code).
> If this is the case, we could at the same time remove this "force
> numa off" patch.
>
> I am almost convinced by now.
> Just waiting for Dario's opinion for a few more hours and will send
> my final opinion later today. If you cannot wait, tell me.

Couple of days is OK with me. My deadline is Friday as I would like to send
a git pull to Linus and include this patch if it makes sense.