From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andre Przywara
Subject: Re: Dom0 crash with old style AMD NUMA detection
Date: Tue, 18 Sep 2012 18:50:14 +0200
Message-ID: <5058A646.5060909@amd.com>
References: <501BC20F.3040205@amd.com>
 <20120803123628.GB10670@andromeda.dapyr.net>
 <20120817142237.GA8467@phenom.dumpdata.com>
 <20120914185822.GA7495@phenom.dumpdata.com>
 <5056D152.2090708@amd.com>
 <20120917191432.GA18552@phenom.dumpdata.com>
 <5058458D.7030603@amd.com>
 <20120918134457.GE12053@phenom.dumpdata.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
In-Reply-To: <20120918134457.GE12053@phenom.dumpdata.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Konrad Rzeszutek Wilk
Cc: Konrad Rzeszutek Wilk, Jeremy Fitzhardinge, xen-devel, Dario Faggioli, Konrad Rzeszutek Wilk
List-Id: xen-devel@lists.xenproject.org

On 09/18/2012 03:44 PM, Konrad Rzeszutek Wilk wrote:
> On Tue, Sep 18, 2012 at 11:57:33AM +0200, Andre Przywara wrote:
>> On 09/17/2012 09:14 PM, Konrad Rzeszutek Wilk wrote:
>>> On Mon, Sep 17, 2012 at 09:29:22AM +0200, Andre Przywara wrote:
>>>> On 09/14/2012 08:58 PM, Konrad Rzeszutek Wilk wrote:
>>>>>>>> [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
>>>>>>>> (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
>>>>>>>>
>>>>>>>> The obvious solution would be to explicitly deny northbridge scanning
>>>>>>>> when running as Dom0, though I am not sure how to implement this without
>>>>>>>> upsetting the other kernel folks about "that crappy Xen thing" again ;-)
>>>>>>>
>>>>>>> Heh.
>>>>>>> Is there a numa=0 option that could be used to override it to turn it
>>>>>>> off?
>>>>>>
>>>>>> Not compile tested.. but was thinking something like this:
>>>>>
>>>>> ping?
>>>>
>>>> That looks good to me - at least for the time being.
>>>
>>> OK, can I have your Tested-by/Acked-by on it pls?
>>>
>>>> I just want to check how this interacts with upcoming Dom0 NUMA
>>>> support. It wouldn't be too clever if we deliberately disable NUMA
>>>
>>> We can always revert this patch in future versions of Linux.
>>
>> I don't like this idea. Then we have Linux kernel up to 3.5 working
>> and say from 3.8 on again, but 3.6 and 3.7 cannot use NUMA. That
>> would be pretty unfortunate.
>
> Huh? v3.5 working? But it never worked? I would say turn off the NUMA
> detection (keep in mind it still will set up the dummy NUMA stuff)
> until there is some PV NUMA capability and then we can revert it.

I was under the impression that somehow the Dom0 NUMA would be made
compatible, using some of the existing discovery mechanisms. So we would
enable the hypervisor, and Dom0 would just magically start working.
I am probably rooted too much in the HVM world ;-)

>>
>> I haven't checked back with Dario, but I'd suspect that we use ACPI
>> for injecting NUMA topology into Dom0. Even if not, a general
>> "numa=off" for Dom0 is too much of a sledgehammer for me.
>
> How would you inject it in Dom0? It's a PV guest so the hypervisor would
> have to tweak the SRAT/SLIT tables. That is not going to happen
> in the very short term.. And I don't recall seeing any patches, so
> the dom0 NUMA support is right now non-existent?

Right, I just didn't want to slam the door deliberately.
Thinking more about this, we probably need some kind of PV enablement in
Dom0, even if we could somehow use the ACPI tables (and thus the ACPI
parsing code). If this is the case, we could at the same time remove
this "force numa off" patch.

I am almost convinced by now. I am just waiting a few more hours for
Dario's opinion and will send my final opinion later today. If you
cannot wait, tell me.

Andre.