Message-ID: <52286A28.3070807@suse.de>
Date: Thu, 05 Sep 2013 13:25:28 +0200
From: Andreas Färber
References: <1375366359-11553-1-git-send-email-jjherne@us.ibm.com> <52272B78.3010804@suse.de> <52285F8F.5020705@de.ibm.com>
In-Reply-To: <52285F8F.5020705@de.ibm.com>
Subject: Re: [Qemu-devel] [PATCH 0/8] [PATCH RFC v3] s390 cpu hotplug
To: Christian Borntraeger
Cc: agraf@suse.de, ehabkost@redhat.com, qemu-devel@nongnu.org, "Jason J. Herne", jfrei@linux.vnet.ibm.com, Anthony Liguori, imammedo@redhat.com, Luiz Capitulino, Einar Lueck

On 05.09.2013 12:40, Christian Borntraeger wrote:
> On 04/09/13 14:45, Andreas Färber wrote:
>> Hello,
>>
>> On 01.08.2013 16:12, Jason J. Herne wrote:
>>> From: "Jason J. Herne"
>>>
>>> Latest code for cpu Hotplug on S390 architecture. This one is vastly
>>> simpler than v2 as we have decided to avoid the command line
>>> specification of -device s390-cpu.
>>>
>>> The last version can be found here:
>>> http://lists.gnu.org/archive/html/qemu-devel/2013-06/msg01183.html
>>>
>>> There is also a patch in this series to add cpu-add to the QEMU
>>> monitor interface.
>>>
>>> Hotplugged cpus are created in the configured state and can be used
>>> by the guest after the guest onlines the cpu by:
>>> "echo 1 > /sys/bus/cpu/devices/cpuN/online"
>>>
>>> Hot unplugging is currently not implemented by this code.
>>
>> We have been having several off-list discussions since then that I'll
>> try to briefly summarize here; please correct or extend as needed:
>>
>> 1) CPU topology for QOM
>>
>> Physically a System z machine may have an MCM with, e.g., 6 CPUs with 6
>> cores each. But unlike x86, there is PR/SM, LPAR and possibly z/VM in
>> between Linux and hardware, so we do actually want to be able to
>> hot-plug in quantities of 1 and not by 6 on s390x for the foreseeable
>> future. We seem willing to set a QOM ABI in stone based on that
>> assumption.

> Just stepping in, Jason is on vacation this week.

Everyone is welcome to comment. :)

> To summarize my understanding:
> You were considering whether the CPU model needs topology
> (e.g. -device mcm,id=m1, -device cpu,mcm=m1), and s390 was the only
> arch left that you were not sure about whether topology is needed?
> All other platforms don't need topology for cpu hotplug?

No, on the contrary: I don't want s390x to blindly copy x86 cpu-add,
because for x86 we know that what we have is a hack to make it work
today, but there we know we want to do device_add Xeon-X42-4242 instead,
which then hot-plugs the 6 cores x 2 threads at once that a physical
hot-plug would do, and does not hot-add individual threads. So the
question of topology is not about what is below KVM but about what is
inside QEMU, since x86 emulates i440fx/q35 based hardware. The
understanding I reached on IRC is that s390x (similar to sPAPR) tries to
emulate the LPAR / z/VM layer rather than the hardware below them, thus
there is no applicable concept of "real" hardware, and arbitrary
quantities are fine.
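The granularity difference above (thread-level cpu-add versus hot-plugging a whole socket via device_add) can be sketched with a toy model. All class and method names below are purely illustrative; this is not QEMU code:

```python
# Toy model of the two hot-plug granularities discussed above.
# Illustrative only -- not QEMU's actual implementation.

class Machine:
    def __init__(self, max_cpus):
        self.max_cpus = max_cpus  # the -smp maxcpus limit
        self.present = 0          # CPUs currently plugged

    def cpu_add(self, count=1):
        """x86-style cpu-add: plugs individual threads, one at a time."""
        if self.present + count > self.max_cpus:
            raise ValueError("would exceed maxcpus")
        self.present += count

    def device_add_socket(self, cores=6, threads=2):
        """Socket-level device_add: plugs a whole package at once,
        matching what a physical hot-plug would do."""
        self.cpu_add(cores * threads)

m = Machine(max_cpus=24)
m.cpu_add()              # s390x-style: one CPU at a time  -> 1 present
m.device_add_socket()    # x86 physical-socket semantics   -> +12 present
print(m.present)         # 13
```

The point of the sketch is only that the unit of hot-plug differs: s390x wants increments of one, while a socket-level device on x86 would add cores x threads CPUs in a single operation.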
> Yes, we want to be able to hotplug single cores (not chips, not MCMs).
> It is pretty hard to pin the vCPUs to a given real topology for KVM.
> You need to pin on LPAR and KVM. Libvirt could do some pinning of guest
> vCPUs to host CPUs, and LPAR can have dedicated CPUs. But pinning a
> full chip (6 cores) would only make sense in very rare cases.

Last time I looked into this, the post-add hook was solely for overall
ccw initialization. So we can use device_add s390-cpu today, can't we?

The question that I still need to investigate is how the
always-incrementing CPU address interacts with maxcpus. Consider
maxcpus=6 and smp_cpus=2. 4x device_add should work. Now if we did 1x
device_del, then 1x device_add should work again IMO. cpu-add checks the
user-supplied id against maxcpus though, IIRC.

That is why I have been saying in multiple contexts that we should get
the QEMU and KVM CPU count checks into the CPU realizefn, so that we get
the checks irrespective of the call site, with nice error reporting.

>> => s390-cpu (or future subtypes) to be used with device_add.
>> => Flat /machine/cpu[n] list in composition tree a possibility.
>>
>> 1a) CPU topology for guests
>>
>> STSI instruction topology support not implemented yet.

> Right, not implemented yet, but we certainly want to be able to define
> the guest-visible topology at some point in time (grouping of cores,
> basically). But I guess this does not mean that we have to go away from
> the flat list of CPUs.

So STSI would show what real LPAR/CPU we are running on? But QEMU would
have /machine/cpu[0]? Or do we need /machine/cpugroup[0]/cpu[0]? The
latter is my concern here, to decide about child<> vs. link<> properties.
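The child<> vs. link<> question above can be illustrated with a toy composition tree: a child<> property parents (owns) an object and determines its canonical path, while a link<> property merely references an object owned elsewhere. This is an illustrative Python sketch, not QEMU's actual QOM API:

```python
# Toy model of a QOM-like composition tree.
# child<> edges own objects and define canonical paths;
# link<> edges only reference objects parented elsewhere.

class Obj:
    def __init__(self, name):
        self.name = name
        self.children = {}  # child<>: this object owns the target
        self.links = {}     # link<>: reference, target owned elsewhere

    def add_child(self, prop, obj):
        self.children[prop] = obj

    def add_link(self, prop, obj):
        self.links[prop] = obj

    def path_of(self, obj, prefix):
        """Canonical path follows child<> edges only."""
        for prop, child in self.children.items():
            path = f"{prefix}/{prop}"
            if child is obj:
                return path
            found = child.path_of(obj, path)
            if found:
                return found
        return None

machine = Obj("machine")
anon = Obj("peripheral-anon")
machine.add_child("peripheral-anon", anon)

cpu = Obj("s390-cpu")
anon.add_child("device[0]", cpu)  # device_add parents the CPU here
machine.add_link("cpu[0]", cpu)   # link<> exposes it as /machine/cpu[0]

print(machine.path_of(cpu, "/machine"))  # /machine/peripheral-anon/device[0]
```

In this model the hot-plugged CPU keeps its canonical path under peripheral-anon, and the machine-level cpu[0] property is just a reference to it, which is the behavior being debated for s390-cpu above.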
To cope with device_add s390-cpu adding the device to
/machine/peripheral/ or /machine/peripheral-anon/device[0], I *think*
we'll need link<>, which would then translate back to the ipi_states
array as backend, and the remaining question would be where to expose
those properties in the composition tree - i.e., /machine/cpu[n] or
/machine/ipi/cpu[n] or something - please suggest. Similarly, if those
become link<> properties, then the CPUs created by the machine via
smp_cpus need a canonical path as well; quite obviously both cannot be
the same.

Background is that long-term Anthony would like x86 CPU hot-plug to
become setting/unsetting some /machine/cpu-socket[n] link<> property of
the machine, and the ipi_states array seems a close equivalent on s390x.

>> => Guest unaware of any emulated topology today.

> An additional problem is that for the normal case (Linux scheduler, no
> pinning, also no gang scheduling) the topology would change too fast.
> The guest would be busy rebuilding the scheduler domains all the time.

[snip]

Regards,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg