From: Andre Przywara <andre.przywara@amd.com>
To: Konrad Rzeszutek Wilk <konrad@kernel.org>
Cc: Konrad Rzeszutek Wilk <konrad@darnok.org>,
Jeremy Fitzhardinge <jeremy@goop.org>,
xen-devel <xen-devel@lists.xen.org>,
Dario Faggioli <raistlin@linux.it>,
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Subject: Re: Dom0 crash with old style AMD NUMA detection
Date: Tue, 18 Sep 2012 11:57:33 +0200
Message-ID: <5058458D.7030603@amd.com>
In-Reply-To: <20120917191432.GA18552@phenom.dumpdata.com>
On 09/17/2012 09:14 PM, Konrad Rzeszutek Wilk wrote:
> On Mon, Sep 17, 2012 at 09:29:22AM +0200, Andre Przywara wrote:
>> On 09/14/2012 08:58 PM, Konrad Rzeszutek Wilk wrote:
>>>>>> [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
>>>>>> (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
>>>>>>
>>>>>>
>>>>>>
>>>>>> The obvious solution would be to explicitly deny northbridge scanning
>>>>>> when running as Dom0, though I am not sure how to implement this without
>>>>>> upsetting the other kernel folks about "that crappy Xen thing" again ;-)
>>>>>
>>>>> Heh.
>>>>> Is there a numa=0 option that could be used to override it to turn it
>>>>> off?
>>>>
>>>> Not compile tested.. but was thinking something like this:
>>>
>>> ping?
>>
>> That looks good to me - at least for the time being.
>
> OK, can I've your Tested-by/Acked-by on it pls?
>
>> I just want to check how this interacts with upcoming Dom0 NUMA
>> support. It wouldn't be too clever if we deliberately disable NUMA
>
> We can always revert this patch in future versions of Linux.
I don't like this idea. Then we would have Linux kernels up to 3.5
working, and again from, say, 3.8 on, but 3.6 and 3.7 could not use NUMA.
That would be pretty unfortunate.
I haven't checked back with Dario, but I'd suspect that we use ACPI for
injecting NUMA topology into Dom0. Even if not, a general "numa=off" for
Dom0 is too much of a sledgehammer for me.
>> and future Xen version will allow us to use it. So let me check if I
>> can confine this turn-off to the fallback K8 northbridge reading.
>
> This potentially could work, but I would prefer to not do it for 3.6.
Mmh, I don't get the idea of your patch below. One can always read the
NUMA topology from the AMD northbridge, but this is deprecated in favor
of ACPI. The amdtopology.c stuff was only there to enable NUMA for very
early Opterons, where BIOSes didn't provide (sane) SRAT tables.
Though we disallow ACPI for NUMA on Dom0, this northbridge scanning
unfortunately "shines through" the virtualization, actually revealing
the system's NUMA topology, which is usually quite different from Dom0's.
So instead I would rather do something like this:
diff --git a/arch/x86/include/asm/numa.h b/arch/x86/include/asm/numa.h
index bfacd2c..7811c0d 100644
--- a/arch/x86/include/asm/numa.h
+++ b/arch/x86/include/asm/numa.h
@@ -20,6 +20,8 @@
extern int numa_off;
+extern bool deny_amd_nb_numa_scan;
+
/*
* __apicid_to_node[] stores the raw mapping between physical apicid and
* node and is used to initialize cpu_to_node mapping.
diff --git a/arch/x86/mm/amdtopology.c b/arch/x86/mm/amdtopology.c
index 5247d01..f223a67 100644
--- a/arch/x86/mm/amdtopology.c
+++ b/arch/x86/mm/amdtopology.c
@@ -29,6 +29,8 @@
static unsigned char __initdata nodeids[8];
+bool deny_amd_nb_numa_scan = false;
+
static __init int find_northbridge(void)
{
int num;
@@ -78,6 +80,9 @@ int __init amd_numa_init(void)
u32 nodeid, reg;
unsigned int bits, cores, apicid_base;
+ if (deny_amd_nb_numa_scan)
+ return -ENOENT;
+
if (!early_pci_allowed())
return -EINVAL;
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index d11ca11..6db63c0 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -532,6 +532,8 @@ void __init xen_arch_setup(void)
}
#endif
+ deny_amd_nb_numa_scan = true;
+
memcpy(boot_command_line, xen_start_info->cmd_line,
MAX_GUEST_CMDLINE > COMMAND_LINE_SIZE ?
COMMAND_LINE_SIZE : MAX_GUEST_CMDLINE);
This would just turn off this one kind of NUMA discovery for Dom0.
The patch is admittedly a bit rough (not sure about the proper placement
into #ifdef's, for instance) and not well tested yet.
Also one could think about using a more general variable name to cover
other hardware things in the future that Dom0 shouldn't use.
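To illustrate the intended effect, here is a minimal user-space sketch (not kernel code, and not tested against a real tree) of the fallback order as I understand it: ACPI first, then the northbridge scan, then a flat single-node setup. Only the name deny_amd_nb_numa_scan comes from the patch above; struct boot_env, the two probe helpers and pick_numa_source() are hypothetical names made up for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Where the NUMA topology ends up coming from. */
enum numa_source { NUMA_FROM_ACPI, NUMA_FROM_AMD_NB, NUMA_FLAT };

/* Hypothetical description of what the booting kernel can see. */
struct boot_env {
	bool acpi_srat_usable;      /* ACPI NUMA allowed and SRAT is sane */
	bool nb_visible;            /* a K8-style northbridge answers on PCI */
	bool deny_amd_nb_numa_scan; /* the flag introduced by the patch */
};

/* Stand-in for the ACPI based detection. */
static int acpi_numa_probe(const struct boot_env *env)
{
	return env->acpi_srat_usable ? 0 : -ENOENT;
}

/* Stand-in for amd_numa_init(): the flag is checked before any scanning. */
static int amd_nb_numa_probe(const struct boot_env *env)
{
	if (env->deny_amd_nb_numa_scan)
		return -ENOENT;
	return env->nb_visible ? 0 : -ENOENT;
}

/* Try each method in order; fall back to a flat topology if all fail. */
enum numa_source pick_numa_source(const struct boot_env *env)
{
	assert(env != NULL);

	if (acpi_numa_probe(env) == 0)
		return NUMA_FROM_ACPI;
	if (amd_nb_numa_probe(env) == 0)
		return NUMA_FROM_AMD_NB;
	return NUMA_FLAT;
}
```

With ACPI NUMA unavailable and the northbridge visible, as is the case for Dom0 today, the sketch picks the northbridge scan unless the flag is set, in which case it falls through to the flat setup. A later ACPI based Dom0 NUMA topology would not be affected by the flag.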
So this is no longer something for 3.6, and probably not even for 3.7.
How about we drop the patch for this problem from 3.6 altogether and
recommend "numa=off" as a workaround? This is much less sticky than a
kernel patch and could be documented in the Xen wiki, for instance.
After all, this isn't a strict regression (it appears with every 3.x
kernel, AFAICT).
Most of the time the northbridge scanning will yield bogus results, so
the kernel eventually discards it, but sometimes it seems to slip
through and causes trouble.
Also, it does not trigger on newer (Bulldozer-class) CPUs, since we
deliberately avoided adding the new northbridge PCI-ID for this routine.
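As a rough illustration of why newer CPUs are not affected, here is a small stand-alone sketch of the ID match. The vendor/device header layout and the device IDs 0x1100, 0x1200 and 0x1300 are taken from the quoted patch below; the helper names are hypothetical, and 0x1022 as the value of PCI_VENDOR_ID_AMD is stated from memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_VENDOR_ID_AMD 0x1022u

/* Config space dword 0: vendor ID in the low half, device ID in the high half. */
static uint32_t pci_id_header(uint16_t vendor, uint16_t device)
{
	return (uint32_t)vendor | ((uint32_t)device << 16);
}

/* True if the header carries the AMD vendor ID and one of the listed device IDs. */
static bool header_matches_any(uint32_t header, const uint16_t *device_ids,
			       size_t count)
{
	assert(device_ids != NULL || count == 0);

	for (size_t i = 0; i < count; i++)
		if (header == pci_id_header(PCI_VENDOR_ID_AMD, device_ids[i]))
			return true;
	return false;
}

/* Function 0 device IDs that the old style scan knows about. */
bool legacy_scan_recognizes(uint32_t header)
{
	static const uint16_t legacy_func0_ids[] = { 0x1100, 0x1200, 0x1300 };

	return header_matches_any(header, legacy_func0_ids,
				  sizeof(legacy_func0_ids) /
				  sizeof(legacy_func0_ids[0]));
}
```

Any header whose device ID is not in the table, such as one with 0x1600 (assumed here to be a newer northbridge ID), simply fails the comparison, so the scan finds no northbridge and gives up.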
Regards,
Andre.
>
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index a4790bf..b4edce4 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -17,6 +17,7 @@
> #include <asm/e820.h>
> #include <asm/setup.h>
> #include <asm/acpi.h>
> +#include <asm/numa.h>
> #include <asm/xen/hypervisor.h>
> #include <asm/xen/hypercall.h>
>
> @@ -483,7 +484,32 @@ void __cpuinit xen_enable_sysenter(void)
> if(ret != 0)
> setup_clear_cpu_cap(sysenter_feature);
> }
> +#ifdef CONFIG_AMD_NUMA
> +int __cpuinit xen_amd_k8(void)
> +{
> + int num;
> +
> + if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
> + return -ENOENT;
> +
> + for (num = 0; num < 32; num++) {
> + u32 header;
> +
> + header = read_pci_config(0, num, 0, 0x00);
> + if (header != (PCI_VENDOR_ID_AMD | (0x1100<<16)) &&
> + header != (PCI_VENDOR_ID_AMD | (0x1200<<16)) &&
> + header != (PCI_VENDOR_ID_AMD | (0x1300<<16)))
> + continue;
>
> + header = read_pci_config(0, num, 1, 0x00);
> + if (header != (PCI_VENDOR_ID_AMD | (0x1101<<16)) &&
> + header != (PCI_VENDOR_ID_AMD | (0x1201<<16)) &&
> + header != (PCI_VENDOR_ID_AMD | (0x1301<<16)))
> + continue;
> + return num;
> + }
> + return -ENOENT;
> +#endif
> void __cpuinit xen_enable_syscall(void)
> {
> #ifdef CONFIG_X86_64
> @@ -542,4 +568,8 @@ void __init xen_arch_setup(void)
> disable_cpufreq();
> WARN_ON(set_pm_idle_to_default());
> fiddle_vdso();
> +#ifdef CONFIG_AMD_NUMA
> + if (xen_amd_k8() >= 0)
> + numa_off=1;
> +#endif
> }
>
--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany