All of lore.kernel.org
 help / color / mirror / Atom feed
From: Salil Mehta <salil.mehta@huawei.com>
To: Igor Mammedov <imammedo@redhat.com>
Cc: Andrew Jones <drjones@redhat.com>,
	"gshan@redhat.com" <gshan@redhat.com>,
	"mst@redhat.com" <mst@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"eric.auger@redhat.com" <eric.auger@redhat.com>,
	"qemu-arm@nongnu.org" <qemu-arm@nongnu.org>,
	pbonzini <pbonzini@redhat.com>
Subject: RE: [Question] Regarding containers "unattached/peripheral/anonymous" - their relation with hot(un)plug of devices
Date: Wed, 3 Jun 2020 15:13:26 +0000	[thread overview]
Message-ID: <ab94a6dec7b9435cbb36ca9d488a0daa@huawei.com> (raw)
In-Reply-To: <20200127160352.54f95875@redhat.com>

Hi Igor,
My sincere Apologies, I just realized that I missed to reply this mail.
I was distracted to something else in  the month of the February and
had only resumed working on hotplug in march. But will still reply to
this mail. Also, I have incorporated most of the below points as in the
vcpu hotplug patches as per your suggestions.


> From: Qemu-arm [mailto:qemu-arm-bounces+salil.mehta=huawei.com@nongnu.org]
> On Behalf Of Igor Mammedov
> Sent: Monday, January 27, 2020 3:04 PM
> To: Salil Mehta <salil.mehta@huawei.com>
> 
> On Fri, 24 Jan 2020 18:44:16 +0000
> Salil Mehta <salil.mehta@huawei.com> wrote:
> 
> > > From: Igor Mammedov [mailto:imammedo@redhat.com]
> > > Sent: Friday, January 24, 2020 4:07 PM
> > >
> > > On Fri, 24 Jan 2020 15:02:10 +0000
> > > Salil Mehta <salil.mehta@huawei.com> wrote:
> > >
> > > > > From: Igor Mammedov [mailto:imammedo@redhat.com]
> > > > > Sent: Friday, January 24, 2020 1:54 PM
> > > > > To: Salil Mehta <salil.mehta@huawei.com>
> > > > >
> > > > > On Fri, 24 Jan 2020 11:20:15 +0000
> > > > > Salil Mehta <salil.mehta@huawei.com> wrote:
> > > > >
> > > > > > Hello,
> > > > > > I am working on vCPU Hotplug feature for ARM64 and I am in mid of understanding
> > > > > > some aspect of device_add/device_del interface of the QEMU.
> > > > > >
> > > > > > Observations:
> > > > > > 1. Any object initialised by qmp_device_add() gets into /machine/unattached
> > > > > > container. I traced the flow to code leg inside  device_set_realized()
> > > > > > 2. I could see the reverse qmp_device_del() expects the device to be in
> > > > > > /machine/peripheral container.
> > > > > > 3. I could see any object initially added to unattached container did not had
> > > > > > their parents until object_add_property_child() was called further in the leg.
> > > > > > which effectively meant a new property was created and property table
> > > > > > populated and child was parented.
> > > > > > 4. Generally, container  /machine/peripheral was being used wherever
> > > > > > DEVICE(dev)->id was present and non-null.
> > > > > >
> > > > > > Question:
> > > > > > 1. Wanted to confirm my understanding about the use of having separate
> > > > > > containers like unattached, peripheral and anonymous.
> > > > >
> > > > > > 2. At init time all the vcpus goes under *unattached* container. Now,
> > > > > > qmp_device_del() cannot be used to unplug them. I am wondering
> > > > >
> > > > > device is put into 'unattached' in case it wasn't assigned a parent.
> > > > > Usually it happens when board creates device directly.
> > > >
> > > >
> > > > Sure, but if we decide that certain number(N) of vcpus are hotplugabble
> > > > and certain subset of N (say 'n' < 'N') should be allowed to be present
> > > > or cold-plugged at the init time then I wonder which of the following
> > > > is correct approach:
> > > >
> > > > 1. Bring all of N vcpus at boot time under "peripheral" container
> > > >                                    OR
> > > > 2. Just bring subset 'n' of 'N' under "peripheral" container and rest
> > > >     under "unattached" container? And later as and when rest of the
> > > >     vcpus are hotplugged they should be transferred from "unattached"
> > > >     container to "peripheral" container?
> > >
> > > issue with that is that to put device into "peripheral" container,
> > > 'the user' must provide 'id'. (currently QEMU isn't able to do it on its own
> > > [1])
> > >
> > > But it doesn't mean that cold-plugged CPUs can't be unpluged.
> > > What current users could do is start QEMU like this (simplified version):
> > >
> > >  $QEMU -smp 1,maxcpus=N -device foo-cpu-type,id=CPU00 -device
> > > foo-cpu-type,id=CPU01 ...
> > >
> > > i.e. 1st CPU is not manageable due to lack if 'id' and is created by board code,
> > > the rest have 'id' and could be managed.
> >
> >
> > I understand, that we can somehow assign ids from the QMP interface but
> > above will not push vcpus into "peripheral" container. They will appear
> > in "unattached" container but with specified names and as-far-as I can
> > see in the code 'device_del' can only delete objects/devices from the
> > 'peripheral' container?
> 
> qemu-system-x86_64 -monitor stdio \
>     -smp 1,maxcpus=3 \
>     -device qemu64-x86_64-cpu,id=foo,socket-id=1,core-id=0,thread-id=0 \
>     -device qemu64-x86_64-cpu,socket-id=2,core-id=0,thread-id=0
> 
> (qemu) info hotpluggable-cpus
> Hotpluggable CPUs:
>   type: "qemu64-x86_64-cpu"
>   vcpus_count: "1"
>   qom_path: "/machine/peripheral-anon/device[0]"
>                       ^^^^^^^^^^^^^^^
>   CPUInstance Properties:
>     socket-id: "2"
>     core-id: "0"
>     thread-id: "0"
>   type: "qemu64-x86_64-cpu"
>   vcpus_count: "1"
>   qom_path: "/machine/peripheral/foo"
>                       ^^^^^^^^^^
> 
> in gist, if device is created with any variant of device_add,
> it goes to "peripheral[-anon]" including cold-plugged one.


I am not sure why you said "including cold-plugged one"? I hope by
cold-plug'ging you mean here are the CPUs which already exist at
the boot time of the Guest VM and plugged using -device?


>   CPUInstance Properties:
>     socket-id: "1"
>     core-id: "0"
>     thread-id: "0"
>   type: "qemu64-x86_64-cpu"
>   vcpus_count: "1"
>   qom_path: "/machine/unattached/device[0]"
                          ^^^^^^^^^^^^^

Unless you have pasted wrongly here, above output also shows qom path
as 'unattached' for cold-plugged CPUs. Am I missing something? :)


>   CPUInstance Properties:
>     socket-id: "0"
>     core-id: "0"
>     thread-id: "0"



> > Plus, having those many ids specified for over large number of vcpus
> > does not looks very practical solution. We need interface like auto
> number of IDs is not a problem since it's usually handled by management
> software just fine (ex: libvirt does it)
> 
> > Generation of ids even at the boot time. I could see from the link you
> > have shared that it is already being used by ID_BLOCK subsystem. Can we
> > create a new subsystem for cpus under this and do the auto Generation
> > of vcpu ids as well?
> 
> I'm not sure that auto ids was actually merged.
> (I thought it wasn't)


Well if you were referring to below then it has been part of qemu for quite
long now:

Patch: util - add automated ID generation utility
File: https://github.com/qemu/qemu/blob/master/util/id.c
Commit-id: https://github.com/qemu/qemu/commit/a0f1913637e6


> Anyway auto IDs are not directly related to enabling CPU hotplug for ARM,
> if you feel they should be generated you can try to propose patches.

Sure. 


> > Plus, there is another unique requirement specifically for realizing
> > vcpu hotplug for ARM64.
> >
> > Summary:
> > Right now ARM architecture does not allows reinitializing the GIC
> > after VM has booted. Therefore, we are planning to pre-size the GIC
> > interfaces at init time by initializing all of the possible vcpus
> > and keep them in 'realized' but 'unavailable' state to the Guest
> > VM. They shall be made available as-and-when vcpus are hot-plugged
> > later-on. Therefore, current efforts are to be able to plug and
> > unplug from the qemu QOM without destroying the existing state of
> > the devices/objects representing vcpus. These all possible vcpus
> > shall be created once at the boot time of the VM. The vcpus which
> > are not available to the Guest VM can be Parked.
> >
> > Once the vcpus are hot-(un)plug'ged only the (pre-)plug/unplug(-request)
> > interfaces are used to convey this even information to the Guest
> > VM.
> >
> > I have tested this solution and it works but I wanted to make sure
> > that I am not doing anything which breaks any of the existing Qemu
> > QOM interfaces and basic fundamental idea behind being able to attach
> > and detach from the Qemu QOM is okay?
> >
> > Any suggestion are welcome in this regard?
> 
> From discussion with Drew [CCed], I got that kvm/arm isn't designed
> to support vCPU hotplug (and it would require heavy refactoring to
> separate GIC and VCPUs, which probably won't be welcomed by maintainers).

Agreed. I have pre-sized GIC with possible vcpus. I think Marc did
mention about this earlier.

> 
> But that's only KVM side of the equation. Assuming that we don't
> touch KVM much, the only QEMU side is left.
> 
> Further lets call
>  * vCPU - a kvm's part of CPU
>  * CPU - QEMU object which is linked to vCPU via file descriptor.
> 
> In QEMU we have CPU devices which optionally might create vCPUs
> during device realize time (if QEMU runs with KVM accelerator).
> 
> So from design point of view, I'd suggest to dynamically
> create/remove CPU devices on demand using existing
>  -device/device_add/device_del interface
> like we do for other architectures.
> 
> But in case of running with KVM accelerator, to accommodate
> current non dynamic ARM/KVM, I'd move vCPU creation to "kvm_init()"
> time or board init time, so it would pre-create vCPUs in
> KVM early in parked state and put them in 'kvm_parked_vcpus' list
> but won't create CPU devices for them.


Something similar. But I am doing it from virt machine init.


> Then later when management adds CPU device either with
> '-device' or 'device_add', a new CPU device will pick up
> pre-created parked vCPU file descriptor and re-use it.
> 
> Parked vCPU infrastructure is already exists in QEMU as we use it
> for hot-unplugged CPUs for the same reasons (it needs too much
> refactoring on KVM side to really remove vCPU).

Agreed with respect to KVM change. I have added some new in qemu/kvm
layer but some re-factoring might be needed later to extract common
code. Also, ARM arch specific functions had to be introduced.

> 
> So when CPU is hot-unplugged, we put linked vCPU file descriptor
> into kvm_parked_vcpus (see: kvm_destroy_vcpu) and completely delete
> CPU device on QEMU side. Then when the same CPU is hot-plugged again,
> we reuse previously parked vCPU file descriptor (see: kvm_get_vcpu).


Yes, I have done exactly this and perhaps this approach was discussed
in earlier mail-chains as well involving you. So taken your suggestion
and many thanks for this :)


> > NOTE: I plan to share the patches with the community which includes
> > both the changes of the Linux Kernel and the QEMU in near future.


Plan to share a RFC within a week for sure. 


> > > 1) here is what I could find on IDs topic
> > >    https://lists.gnu.org/archive/html/qemu-block/2015-09/msg00011.html
> > >
> > > > > >    if all the hotplug devices need to go under the *peripheral* container while
> > > > > > they are hotplugged and during object init time as well?
> > > > >
> > > > > theoretically device_del may use QOM path (the later users can get with
> > > > > query-hotpluggable-cpus),
> > > > > but I think it's mostly debugging feature.
> > > >
> > > >
> > > > Sure.
> > > >
> > > >
> > > > > users are supposed to specify 'id' during -device/device_add if they are going
> > > > > to manage that device.
> > > > > afterwards (like unplugging it). Then they could use that 'id' in other commands
> > > > > (including device_del)
> > > > >
> > > > > So 'id'-ed devices end up in 'peripheral' container.
> > > >
> > > >
> > > > Sure, what if hotplugged device is removed and then added again? It looks
> > > > qmp_device_add() interface will again end up calling the device_set_realized()
> > > > which eventually would put hotplugged devices under "unattached" container?
> > >
> > > it won't, see call chain:
> > >
> > >   qmp_device_add()
> > >       -> qdev_device_add()
> > >           -> qdev_set_id()
> >
> > Ok, sure. I did see the qdev_set_id() interface. Infact, earlier I was actually
> > trying to play with it by making it more generic and adding even the 'unattached'
> > container handling to it(which is missing right now) and calling it inside the
> > device_set_realized()  instead of below code:
> >
> >         if (!obj->parent) {
> >             gchar *name = g_strdup_printf("device[%d]", unattached_count++);
> >
> >             object_property_add_child(container_get(qdev_get_machine(),
> >                                                     "/unattached"),
> >                                       name, obj, &error_abort);
> >             unattached_parent = true;
> >             g_free(name);
> >         }
> >
> > Idea of above dabbling was to have common interface for 'unattached' container
> > and call it from virt.c from machvirt_init() where possible vcpus are being
> > created. Force them to be located either inside 'unttached' or 'peripheral'
> > container akin to the example you had given.
> >
> > If we look at qdev_device_add() function, after setting dev->id using
> > qdev_set_id() (which would also result in parenting of an object under
> > 'peripheral' container), it calls the function to 'realize' the device/object
> > which would end up in hitting above shared code excerpt and now because
> > it will have the parent already set, the object won't go into 'unattached'
> > container.
> >
> > Currently, later cannot be controlled for the cold-lugged vcpus. Therefore,
> > before this discussion my initial thought process was to either make
> > qdev_set_id() universal (i.e. include handling for all type of containers
> > unattached/peripheral/anonymous in qdev_set_id()). Then call it from
> > machvirt_init() just before the vcpus have been 'realized' so that
> > cold-plugged cpus could get into 'peripheral' container. This would help
> > in hot-unplugging using the standard 'device_del' interface.
> 
> I think we understand cold-plugged vcpus differently.
> From QEMU point of view there are 2 kinds of cold-plugged CPU devices.
> 
> Ones that are currently created by board directly following pattern
> 
>  object_new()
>  set properties
>  relalize
> 
> these are created (lets call them builtin) in amount specified by -smp X
> and are not manageable by external applications.
> 
> The second kind of cold-plugged CPUs are the ones created on command
> line with help of:
> 
>  -device cpu-type-foo
> 
> these are created by management applications and could be hot-removed
> if management supplied 'id'. For example libvirt starts qemu with 1
> built-in cpu in paused mode (-s) QEMU, then it hotplugs via QMP N cpus
> and lets VM run (it's essentially the same as using -device on CLI).
> This way it can remove all CPUs except of 1st one which is good enough
> in almost all the cases.
> 

Sure, I can see your point. I have tried to incorporate part of it.
Maybe you would like to check the code which will be sent soon.


Many thanks
Salil


  reply	other threads:[~2020-06-03 15:13 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-24 11:20 [Question] Regarding containers "unattached/peripheral/anonymous" - their relation with hot(un)plug of devices Salil Mehta
2020-01-24 13:54 ` Igor Mammedov
2020-01-24 15:02   ` Salil Mehta
2020-01-24 16:06     ` Igor Mammedov
2020-01-24 18:44       ` Salil Mehta
2020-01-27 15:03         ` Igor Mammedov
2020-06-03 15:13           ` Salil Mehta [this message]
2020-06-04  9:54             ` Igor Mammedov
2020-06-04 11:08               ` Salil Mehta
2020-02-03 10:40   ` Michael S. Tsirkin
2020-02-03 15:48     ` Igor Mammedov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ab94a6dec7b9435cbb36ca9d488a0daa@huawei.com \
    --to=salil.mehta@huawei.com \
    --cc=drjones@redhat.com \
    --cc=eric.auger@redhat.com \
    --cc=gshan@redhat.com \
    --cc=imammedo@redhat.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-arm@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.