From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1LCacM-0001bx-0u
	for qemu-devel@nongnu.org; Tue, 16 Dec 2008 09:09:46 -0500
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1LCacK-0001bl-W7
	for qemu-devel@nongnu.org; Tue, 16 Dec 2008 09:09:45 -0500
Received: from [199.232.76.173] (port=44517 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1LCacK-0001bi-Ty
	for qemu-devel@nongnu.org; Tue, 16 Dec 2008 09:09:44 -0500
Received: from outbound-va3.frontbridge.com ([216.32.180.16]:43414 helo=VA3EHSOBE005.bigfish.com)
	by monty-python.gnu.org with esmtps (TLS-1.0:RSA_ARCFOUR_MD5:16) (Exim 4.60)
	(envelope-from )
	id 1LCacK-0004eS-Jr
	for qemu-devel@nongnu.org; Tue, 16 Dec 2008 09:09:44 -0500
Message-ID: <4947B6BC.4070108@amd.com>
Date: Tue, 16 Dec 2008 15:10:04 +0100
From: Andre Przywara
MIME-Version: 1.0
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Content-Transfer-Encoding: 7bit
Subject: [Qemu-devel] [PATCH 0/8] v2: add NUMA support to QEMU
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
To: Anthony Liguori
Cc: qemu-devel@nongnu.org, Avi Kivity

Hi,

this is a reworked and repackaged version of the first NUMA try from last week. I integrated Anthony's comments. Documentation will follow.

I refrained from splitting the host pinning and guest NUMA topology description on the command line, since it will not make the code easier and I don't see the point of intentionally restricting the cmd line interface. If you want to separate it, write "-numa 2 -numa pin:1;2" (the second argument is missing the leading node number).
-------
The following patches add support for NUMA (Non-Uniform Memory Access) guests in QEMU. Since QEMU lacks real SMP support, this is mostly for debugging or research, but will improve performance in KVM (with additional KVM-only patches).
1/8: The user specifies a NUMA topology on the command line.
The command line syntax is:
-numa <nodes>[,mem:size1[;size2..]][,cpu:cpu1[;cpu2..]]
Besides the number of nodes, all other arguments are optional, so possible command lines are:
-numa 2
/* inject two NUMA nodes into the guest, distribute guest CPUs and memory equally over the two nodes, don't pin the memory to host nodes */
-smp 4 -numa 3,mem:1536M;768M;768M,cpu:0-1;2;3
/* inject three nodes, distribute the memory and cpu as described:
node0: 1536M, CPUs 0,1; node1: 768M, CPU 2; node2: 768M, CPU 3 */
Please note that ; and * must be escaped on the shell.

2/8: push NUMA topology info to the BIOS using the QEMU firmware configuration interface. Defines three additional "channels"(?) to transport the information.

3/8: add 'info numa' monitor command to show the chosen NUMA topology. The output is based on Linux' numactl --hardware.

4/8: extend parser to parse host affinity option.
This adds the "[,pin:node1[;node2]]" option to the numa command line:
-numa 2,pin:2;*
/* inject two nodes, allocate the memory for the first node from the host node 2, the second node has no affinity (all host nodes) */
For now it is not recommended to use host pinning in pure QEMU.

5/8: check for existence of libnuma in configure.
libnuma is a Linux library wrapping around the NUMA kernel interface. It is usually contained within the numactl package. Support for this is optional; it only affects the host pinning code (next two patches).

6/8: if stated on the command line, set the affinity of the guest memory to certain host nodes. Requires libnuma.

7/8: add "-numa pin" monitor command. This allows changing the host affinity of guest nodes during the guest's runtime. Requires libnuma.
The syntax is the same as on the command line, but the leading number is omitted:
> numa pin:0;2
(no number means unchanged, * means all nodes, like: numa pin:;*)
Note that this only affects future guest memory touches; if the guest has already used every page, no change will occur.

8/8: For the sake of completeness, a patch to build the ACPI SRAT table in the BIOS. This contains some code that is already upstream in BOCHS-cvs (from Gleb Natapov); I hope that Anthony will sync the BIOS soon. I will send a reworked patch then. If not, I can port over Gleb's changes first and base my patch on top of it.

Regards,
Andre.

Signed-off-by: Andre Przywara

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 277-84917
----to satisfy European Law for business letters:
AMD Saxony Limited Liability Company & Co. KG,
Wilschdorfer Landstr. 101, 01109 Dresden, Germany
Register Court Dresden: HRA 4896, General Partner authorized
to represent: AMD Saxony LLC (Wilmington, Delaware, US)
General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
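
For readers who want to experiment with the option format described in 1/8 and 4/8, here is a minimal Python sketch of how such an option string could be split into its parts. It is not the parser from the patches (those are C code inside QEMU); the function names parse_cpu_set and parse_numa_option are hypothetical. It also assumes that memory sizes are kept as plain strings and that only the command-line form with a leading node count is handled, not the monitor form.

```python
def parse_cpu_set(spec):
    """Expand a CPU spec such as "0-1" or "2" into a list of CPU numbers."""
    if "-" in spec:
        first, last = spec.split("-", 1)
        return list(range(int(first), int(last) + 1))
    return [int(spec)]


def parse_numa_option(arg):
    """Split '<nodes>[,mem:..][,cpu:..][,pin:..]' into a dict.

    Illustrative only: fields that are absent stay None, because the
    cover letter says they are optional.
    """
    fields = arg.split(",")
    topo = {"nodes": int(fields[0]), "mem": None, "cpu": None, "pin": None}
    for field in fields[1:]:
        key, _, value = field.partition(":")
        parts = value.split(";")  # per-node values are separated by ';'
        if key == "mem":
            topo["mem"] = parts  # kept as strings, e.g. "1536M"
        elif key == "cpu":
            topo["cpu"] = [parse_cpu_set(p) for p in parts]
        elif key == "pin":
            # "*" means no affinity (all host nodes); anything else is a host node number
            topo["pin"] = [p if p == "*" else int(p) for p in parts]
        else:
            raise ValueError(f"unknown -numa field: {key!r}")
    return topo


if __name__ == "__main__":
    print(parse_numa_option("3,mem:1536M;768M;768M,cpu:0-1;2;3"))
    print(parse_numa_option("2,pin:2;*"))
```

The sketch only separates the fields. Distributing CPUs and memory equally when mem or cpu is omitted, as described for "-numa 2", is left out.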