Date: Sun, 10 Nov 2013 12:36:29 +0200
From: "Michael S. Tsirkin"
Message-ID: <20131110103629.GD3241@redhat.com>
References: <1383828119-2181-1-git-send-email-vasilis.liaskovitis@profitbricks.com> <20131107130342.GA2212@redhat.com> <20131108102212.GA2790@shadowkeep> <20131108183312.7b5cde34@thinkpad>
In-Reply-To: <20131108183312.7b5cde34@thinkpad>
Subject: Re: [Qemu-devel] [RFC PATCH] i386: Add _PXM method to ACPI CPU objects
To: Igor Mammedov
Cc: Vasilis Liaskovitis, thilo.fromm@profitbricks.com, kevin@koconnor.net, seabios@seabios.org, qemu-devel@nongnu.org

On Fri, Nov 08, 2013 at 06:33:12PM +0100, Igor Mammedov wrote:
> On Fri, 8 Nov 2013 12:22:12 +0200
> Vasilis Liaskovitis wrote:
>
> > Hi,
> >
> > On Thu, Nov 07, 2013 at 03:03:42PM +0200, Michael S. Tsirkin wrote:
> > > On Thu, Nov 07, 2013 at 01:41:59PM +0100, Vasilis Liaskovitis wrote:
> > > > This patch adds a _PXM method to ACPI CPU objects for the pc machine. The _PXM
> > > > value is derived from the passed in guest info, same way as CPU SRAT entries.
> > > >
> > > > The motivation for this patch is a CPU hot-unplug/hot-plug bug observed when
> > > > using a 3.11 linux guest kernel on a multi-NUMA node qemu/kvm VM. The linux
> > > > guest kernel parses the SRAT CPU entries at boot time and stores them in the
> > > > array __apicid_to_node. When a CPU is hot-removed, the linux guest kernel
> > > > resets the removed CPU's __apicid_to_node entry to NO_NUMA_NODE (kernel commit
> > > > c4c60524). When the removed cpu is hot-added again, the linux kernel looks up
> > > > the hot-added cpu object's _PXM method instead of somehow re-discovering the
> > > > SRAT entry info. With current qemu/seabios, the _PXM method is not found, and
> > > > the CPU is thus hot-plugged in the default NUMA node 0. (The problem does not
> > > > show up on initial hotplug of a cpu; the _PXM method is still not found in this
> > > > case, but the kernel still has the correct proximity value from the CPU's SRAT
> > > > entry stored in __apicid_to_node)
> > > >
> > > > ACPI spec mentions that the _PXM method is the correct way to determine
> > > > proximity information at hot-add time.
> > >
> > > Where does it say this?
> > > I found this:
> > > If the Local APIC ID / Local SAPIC ID / Local x2APIC ID of a dynamically
> > > added processor is not present in the System Resource Affinity Table
> > > (SRAT), a _PXM object must exist for the processor’s device or one of
> > > its ancestors in the ACPI Namespace.
> > >
> > > Does this mean that linux is buggy, and should be fixed up to look up
> > > the apic ID in SRAT?
> >
> > The quote above suggests that if SRAT is absent, _PXM should be present.
> > Seabios/qemu provide SRAT entries, and no _PXM. The fact that the kernel
> > resets the parsed SRAT info at hot-remove time looks like a kernel problem.
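To make the failure sequence described above concrete, here is a minimal C sketch of the lookup order as the thread describes it. It is not kernel code: `apicid_to_node`, `boot_parse_srat`, `cpu_hot_remove`, `cpu_hot_add`, `MAX_CPUS` and `NO_PXM` are hypothetical stand-ins, and the value chosen for NO_NUMA_NODE is only illustrative.

```c
#include <assert.h>

#define MAX_CPUS     4
#define NO_NUMA_NODE (-1)   /* name taken from the thread; value is illustrative */
#define NO_PXM       (-1)   /* hypothetical marker: the CPU object has no _PXM */

/* Hypothetical stand-in for the guest's __apicid_to_node array. */
static int apicid_to_node[MAX_CPUS];

/* Boot time: copy the SRAT CPU affinity entries into the table. */
static void boot_parse_srat(const int srat_node[MAX_CPUS])
{
    for (int i = 0; i < MAX_CPUS; i++) {
        apicid_to_node[i] = srat_node[i];
    }
}

/* Hot-remove: the entry of the removed CPU is reset. */
static void cpu_hot_remove(int apicid)
{
    assert(apicid >= 0 && apicid < MAX_CPUS);
    apicid_to_node[apicid] = NO_NUMA_NODE;
}

/*
 * Hot-add: use the _PXM value if the firmware provides one; otherwise keep
 * whatever is still stored from SRAT; if that was reset, fall back to node 0.
 */
static int cpu_hot_add(int apicid, int pxm)
{
    assert(apicid >= 0 && apicid < MAX_CPUS);
    if (pxm != NO_PXM) {
        apicid_to_node[apicid] = pxm;
    } else if (apicid_to_node[apicid] == NO_NUMA_NODE) {
        apicid_to_node[apicid] = 0;
    }
    return apicid_to_node[apicid];
}
```

With SRAT nodes {0, 0, 1, 1}, an initial hot-add of CPU 3 without _PXM still lands in node 1. After a hot-remove, the same hot-add lands in node 0 unless a _PXM value is supplied.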
> >
> > But as Toshi Kani mentioned in the original thread, here is a quote from ACPI
> > 5.0, stating _PXM and only _PXM should be used at hot-plug time:
> >
> > ===
> > 17.2.1 System Resource Affinity Table Definition
> >
> > This optional System Resource Affinity Table (SRAT) provides the boot
> > time description of the processor and memory ranges belonging to a
> > system locality. OSPM will consume the SRAT only at boot time. OSPM
> > should use _PXM for any devices that are hot-added into the system after
> > boot up.
> > ====
> >
> > So in this sense, the kernel is correct (the kernel only uses _PXM at hot-plug
> > time), and qemu/Seabios should have _PXM methods for hot operations.
>
> In RFC terms, SHOULD doesn't mean MUST, and my interpretation of the above is
> that SRAT is parsed once, but that doesn't mean the OS should forget the data from it.

Well it says "OSPM will consume the SRAT only at boot time".
How do you interpret that?

> Anyway we surely can have both in QEMU.
> >
> > >
> > > > So far, qemu/seabios do not provide this
> > > > method for CPUs. So regardless of kernel behaviour, it is a good idea to add
> > > > this _PXM method. Since ACPI table generation has recently been moved from
> > > > seabios to qemu, we do this in qemu.
> > > >
> > > > Note that the above hot-remove/hot-add scenario has been tested on an older
> > > > qemu + non-upstreamed patches for cpu hot-removal support, and not on qemu
> > > > master (since cpu-del support is still not on master). The only testing done
> > > > with qemu/seabios master and this patch is successful boots of multi-node
> > > > linux and windows8 guests.
> > > >
> > > > For the initial discussion on seabios and linux-acpi lists see
> > > > http://www.spinics.net/lists/linux-acpi/msg47058.html
> > > >
> > > > Signed-off-by: Vasilis Liaskovitis
> > > > Reviewed-by: Thilo Fromm
> > >
> > > Even if this is a linux bug, I have no issue with working around
> > > it in qemu.
> > >
> > > But I think proper testing needs to be done with rebased support for cpu-del.
> >
> > Ok, I can try to rebase cpu-del support for testing. If there are cpu-del bits
> > already somewhere (Igor?) and not merged yet, please point me to them.
> >
> > >
> > > > ---
> > > > hw/i386/acpi-build.c | 2 ++
> > > > hw/i386/ssdt-proc.dsl | 2 ++
> > > > 2 files changed, 4 insertions(+)
> > > >
> > > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > > > index 6cfa044..9373f5e 100644
> > > > --- a/hw/i386/acpi-build.c
> > > > +++ b/hw/i386/acpi-build.c
> > > > @@ -603,6 +603,7 @@ static inline char acpi_get_hex(uint32_t val)
> > > >  #define ACPI_PROC_OFFSET_CPUHEX (*ssdt_proc_name - *ssdt_proc_start + 2)
> > > >  #define ACPI_PROC_OFFSET_CPUID1 (*ssdt_proc_name - *ssdt_proc_start + 4)
> > > >  #define ACPI_PROC_OFFSET_CPUID2 (*ssdt_proc_id - *ssdt_proc_start)
> > > > +#define ACPI_PROC_OFFSET_CPUPXM (*ssdt_proc_pxm - *ssdt_proc_start)
> > > >  #define ACPI_PROC_SIZEOF (*ssdt_proc_end - *ssdt_proc_start)
> > > >  #define ACPI_PROC_AML (ssdp_proc_aml + *ssdt_proc_start)
> > > >
> > > > @@ -724,6 +725,7 @@ build_ssdt(GArray *table_data, GArray *linker,
> > > >  proc[ACPI_PROC_OFFSET_CPUHEX+1] = acpi_get_hex(i);
> > > >  proc[ACPI_PROC_OFFSET_CPUID1] = i;
> > > >  proc[ACPI_PROC_OFFSET_CPUID2] = i;
> > > > + proc[ACPI_PROC_OFFSET_CPUPXM] = guest_info->node_cpu[i];
> > > >  }
> > > >
> > > >  /* build this code:
> > > > diff --git a/hw/i386/ssdt-proc.dsl b/hw/i386/ssdt-proc.dsl
> > > > index 8229bfd..7eef8b2 100644
> > > > --- a/hw/i386/ssdt-proc.dsl
> > > > +++ b/hw/i386/ssdt-proc.dsl
> > > > @@ -47,6
+47,8 @@ DefinitionBlock ("ssdt-proc.aml", "SSDT", 0x01, "BXPC", "BXSSDT", 0x1)
> > > >  * also updating the C code.
> > > >  */
> > > >  Name(_HID, "ACPI0007")
> > > > + ACPI_EXTRACT_NAME_BYTE_CONST ssdt_proc_pxm
> > > > + Name(_PXM, 0xAA)
> > >
> > > The ACPI spec says this should be a DWORD value:
> > >
> > > Return Value:
> > > An Integer (DWORD) containing a proximity domain identifier.
> >
> > ok, I'll change this.
> >
> > thanks,
> >
> > - Vasilis
>
>
> --
> Regards,
> Igor
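
For the DWORD remark above, a rough C sketch of what patching a 32-bit _PXM placeholder could look like. It assumes the template encodes the value as an AML DWordConst (prefix byte 0x0C followed by four little-endian data bytes) and that the extracted offset points at the first data byte. `patch_dword_le`, `read_dword_le`, `pxm_template` and `PXM_DATA_OFFSET` are hypothetical names for illustration, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AML_NAME_OP      0x08   /* NameOp */
#define AML_DWORD_PREFIX 0x0C   /* DWordPrefix: four data bytes follow */

/* Hypothetical template: Name(_PXM, 0xAABBCCDD) as raw AML bytes. */
static uint8_t pxm_template[] = {
    AML_NAME_OP, '_', 'P', 'X', 'M',
    AML_DWORD_PREFIX, 0xDD, 0xCC, 0xBB, 0xAA,
};

/* Offset of the first data byte, right after the DWordPrefix (assumed). */
#define PXM_DATA_OFFSET 6

/* Store a 32-bit value at the given offset in little-endian byte order. */
static void patch_dword_le(uint8_t *aml, size_t offset, uint32_t value)
{
    aml[offset + 0] = (uint8_t)(value & 0xff);
    aml[offset + 1] = (uint8_t)((value >> 8) & 0xff);
    aml[offset + 2] = (uint8_t)((value >> 16) & 0xff);
    aml[offset + 3] = (uint8_t)((value >> 24) & 0xff);
}

/* Read the 32-bit little-endian value back, e.g. for a sanity check. */
static uint32_t read_dword_le(const uint8_t *aml, size_t offset)
{
    return (uint32_t)aml[offset + 0]
         | ((uint32_t)aml[offset + 1] << 8)
         | ((uint32_t)aml[offset + 2] << 16)
         | ((uint32_t)aml[offset + 3] << 24);
}
```

A single-byte store in the style of proc[offset] = node would overwrite only the low byte of such a placeholder and leave the other three template bytes in place, so all four bytes need to be written.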