Message-ID: <52286A28.3070807@suse.de>
Date: Thu, 05 Sep 2013 13:25:28 +0200
From: Andreas Färber
References: <1375366359-11553-1-git-send-email-jjherne@us.ibm.com> <52272B78.3010804@suse.de> <52285F8F.5020705@de.ibm.com>
In-Reply-To: <52285F8F.5020705@de.ibm.com>
Subject: Re: [Qemu-devel] [PATCH 0/8] [PATCH RFC v3] s390 cpu hotplug
To: Christian Borntraeger
Cc: agraf@suse.de, ehabkost@redhat.com, qemu-devel@nongnu.org, "Jason J. Herne", jfrei@linux.vnet.ibm.com, Anthony Liguori, imammedo@redhat.com, Luiz Capitulino, Einar Lueck

On 05.09.2013 12:40, Christian Borntraeger wrote:
> On 04/09/13 14:45, Andreas Färber wrote:
>> Hello,
>>
>> On 01.08.2013 16:12, Jason J. Herne wrote:
>>> From: "Jason J. Herne"
>>>
>>> Latest code for cpu Hotplug on S390 architecture. This one is vastly
>>> simpler than v2 as we have decided to avoid the command line
>>> specification of -device s390-cpu.
>>>
>>> The last version can be found here:
>>> http://lists.gnu.org/archive/html/qemu-devel/2013-06/msg01183.html
>>>
>>> There is also a patch in this series to add cpu-add to the QEMU
>>> monitor interface.
>>>
>>> Hotplugged cpus are created in the configured state and can be used
>>> by the guest after the guest onlines the cpu by:
>>> "echo 1 > /sys/bus/cpu/devices/cpuN/online"
>>>
>>> Hot unplugging is currently not implemented by this code.
>>
>> We have been having several off-list discussions since then that I'll
>> try to briefly summarize here; please correct or extend as needed:
>>
>> 1) CPU topology for QOM
>>
>> Physically a System z machine may have an MCM with, e.g., 6 CPUs with 6
>> cores each. But unlike x86, there is PR/SM, LPAR and possibly z/VM in
>> between Linux and hardware, so we do actually want to be able to
>> hot-plug in quantities of 1 and not by 6 on s390x for the foreseeable
>> future. We seem willing to set a QOM ABI in stone based on that
>> assumption.

> Just stepping in, Jason is on vacation this week.

Everyone is welcome to comment. :)

> To summarize my understanding:
> You were considering whether the CPU model needs topology
> (e.g. -device mcm,id=m1, -device cpu,mcm=m1), and s390 was the only
> arch left that you were not sure about whether topology is needed?
> All other platforms don't need topology for cpu hotplug?

No, on the contrary: I don't want s390x to blindly copy x86 cpu-add,
because for x86 we know that what we have is a hack to make it work
today, but there we know we want to do device_add Xeon-X42-4242 instead,
which then hot-plugs the 6 cores x 2 threads at once that a physical
hot-plug would do, and does not hot-add individual threads. So the
question of topology is not about what is below KVM but about what is
inside QEMU, since x86 emulates i440fx/q35 based hardware. The
understanding I reached on IRC is that s390x (similar to sPAPR) tries to
emulate the LPAR / z/VM layer rather than the hardware below them, thus
there is no applicable concept of "real" hardware, and arbitrary
quantities are fine.
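The granularity difference above (thread-level cpu-add versus hot-plugging a whole socket via device_add) can be sketched with a toy model. All class and method names below are purely illustrative; this is not QEMU code:

```python
# Toy model of the two hot-plug granularities discussed above.
# Illustrative only -- not QEMU's actual implementation.

class Machine:
    def __init__(self, max_cpus):
        self.max_cpus = max_cpus  # the -smp maxcpus limit
        self.present = 0          # CPUs currently plugged

    def cpu_add(self, count=1):
        """x86-style cpu-add: plugs individual threads, one at a time."""
        if self.present + count > self.max_cpus:
            raise ValueError("would exceed maxcpus")
        self.present += count

    def device_add_socket(self, cores=6, threads=2):
        """Socket-level device_add: plugs a whole package at once,
        matching what a physical hot-plug would do."""
        self.cpu_add(cores * threads)

m = Machine(max_cpus=24)
m.cpu_add()              # s390x-style: one CPU at a time  -> 1 present
m.device_add_socket()    # x86 physical-socket semantics   -> +12 present
print(m.present)         # 13
```

The point of the sketch is only that the unit of hot-plug differs: s390x wants increments of one, while a socket-level device on x86 would add cores x threads CPUs in a single operation.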
> Yes, we want to be able to hotplug single cores (not chips, not MCMs).
> It is pretty hard to pin the vCPUs to a given real topology for KVM.
> You need to pin on LPAR and KVM. Libvirt could do some pinning of guest
> vCPUs to host CPUs, and LPAR can have dedicated CPUs. But pinning a
> full chip (6 cores) would only make sense in very rare cases.

Last time I looked into this, the post-add hook was solely for overall
ccw initialization. So we can use device_add s390-cpu today, can't we?

The question that I still need to investigate is how the
always-incrementing CPU address interacts with maxcpus. Consider
maxcpus=6 and smp_cpus=2. 4x device_add should work. Now if we did 1x
device_del, then 1x device_add should work again IMO. cpu-add checks the
user-supplied id against maxcpus though, IIRC.

That is why I have been saying in multiple contexts that we should get
the QEMU and KVM CPU count checks into the CPU realizefn, so that we get
the checks irrespective of the call site, with nice error reporting.

>> => s390-cpu (or future subtypes) to be used with device_add.
>> => Flat /machine/cpu[n] list in composition tree a possibility.
>>
>> 1a) CPU topology for guests
>>
>> STSI instruction topology support not implemented yet.

> Right, not implemented yet, but we certainly want to be able to define
> the guest-visible topology at some point in time (grouping of cores,
> basically). But I guess this does not mean that we have to go away from
> the flat list of CPUs.

So STSI would show what real LPAR/CPU we are running on? But QEMU would
have /machine/cpu[0]? Or do we need /machine/cpugroup[0]/cpu[0]? The
latter is my concern here, to decide about child<> vs. link<> properties.
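The child<> vs. link<> question above can be illustrated with a toy composition tree: a child<> property parents (owns) an object and determines its canonical path, while a link<> property merely references an object owned elsewhere. This is an illustrative Python sketch, not QEMU's actual QOM API:

```python
# Toy model of a QOM-like composition tree.
# child<> edges own objects and define canonical paths;
# link<> edges only reference objects parented elsewhere.

class Obj:
    def __init__(self, name):
        self.name = name
        self.children = {}  # child<>: this object owns the target
        self.links = {}     # link<>: reference, target owned elsewhere

    def add_child(self, prop, obj):
        self.children[prop] = obj

    def add_link(self, prop, obj):
        self.links[prop] = obj

    def path_of(self, obj, prefix):
        """Canonical path follows child<> edges only."""
        for prop, child in self.children.items():
            path = f"{prefix}/{prop}"
            if child is obj:
                return path
            found = child.path_of(obj, path)
            if found:
                return found
        return None

machine = Obj("machine")
anon = Obj("peripheral-anon")
machine.add_child("peripheral-anon", anon)

cpu = Obj("s390-cpu")
anon.add_child("device[0]", cpu)  # device_add parents the CPU here
machine.add_link("cpu[0]", cpu)   # link<> exposes it as /machine/cpu[0]

print(machine.path_of(cpu, "/machine"))  # /machine/peripheral-anon/device[0]
```

In this model the hot-plugged CPU keeps its canonical path under peripheral-anon, and the machine-level cpu[0] property is just a reference to it, which is the behavior being debated for s390-cpu above.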
To cope with device_add s390-cpu adding the device to
/machine/peripheral/ or /machine/peripheral-anon/device[0], I *think*
we'll need link<>, which would then translate back to the ipi_states
array as backend, and the remaining question would be where to expose
those properties in the composition tree - i.e., /machine/cpu[n] or
/machine/ipi/cpu[n] or something - please suggest. Similarly, if those
become link<> properties, then the CPUs created by the machine via
smp_cpus need a canonical path as well; quite obviously both cannot be
the same.

Background is that long-term Anthony would like x86 CPU hot-plug to
become setting/unsetting some /machine/cpu-socket[n] link<> property of
the machine, and the ipi_states array seems a close equivalent on s390x.

>> => Guest unaware of any emulated topology today.

> An additional problem is that for the normal case (Linux scheduler, no
> pinning, also no gang scheduling) the topology would change too fast.
> The guest would be busy rebuilding the scheduler domains all the time.

[snip]

Regards,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg