Re: [Qemu-devel] [PATCH 2/3] VCPU hotplug support

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Jan Kiszka <jan.kiszka@siemens.com>
To: Igor Mammedov <imammedo@redhat.com>
Cc: "pingfank@linux.vnet.ibm.com" <pingfank@linux.vnet.ibm.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"gleb@redhat.com" <gleb@redhat.com>,
	Juan Quintela <quintela@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 2/3] VCPU hotplug support
Date: Mon, 23 Jan 2012 18:55:18 +0100	[thread overview]
Message-ID: <4F1D9F06.1040404@siemens.com> (raw)
In-Reply-To: <4F1D8ADD.4000306@redhat.com>

On 2012-01-23 17:29, Igor Mammedov wrote:
> On 01/17/2012 03:17 PM, Jan Kiszka wrote:
>> On 2012-01-17 14:17, Igor Mammedov wrote:
>>> Rebase of the missing bits from qemu-kvm for vcpu hotplug
>>
>> Description, please. Please try to split up, at least into PIIX4
>> preparations and "the rest". Maybe also a patch for a generic CPU
>> hotplug infrastructure.
>>
>>>
>>> Signed-off-by: Igor Mammedov<imammedo@redhat.com>
>>> ---
>>>   Makefile.objs     |    2 +-
>>>   Makefile.target   |    2 +-
>>>   hw/acpi_piix4.c   |   83 ++++++++++++++++++++++++++++++++++++++++++++++++++--
>>>   hw/pc.c           |   21 +++++++++++++-
>>>   hw/pc_piix.c      |    4 ++
>>>   sysemu.h          |    2 +
>>>   target-i386/cpu.h |    1 +
>>>   7 files changed, 108 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/Makefile.objs b/Makefile.objs
>>> index 45df666..2a8ccc1 100644
>>> --- a/Makefile.objs
>>> +++ b/Makefile.objs
>>> @@ -212,7 +212,7 @@ hw-obj-$(CONFIG_USB_UHCI) += usb-uhci.o
>>>   hw-obj-$(CONFIG_USB_OHCI) += usb-ohci.o
>>>   hw-obj-$(CONFIG_USB_EHCI) += usb-ehci.o
>>>   hw-obj-$(CONFIG_FDC) += fdc.o
>>> -hw-obj-$(CONFIG_ACPI) += acpi.o acpi_piix4.o icc_bus.o
>>> +hw-obj-$(CONFIG_ACPI) += acpi.o icc_bus.o
>>>   hw-obj-$(CONFIG_APM) += pm_smbus.o apm.o
>>>   hw-obj-$(CONFIG_DMA) += dma.o
>>>   hw-obj-$(CONFIG_HPET) += hpet.o
>>> diff --git a/Makefile.target b/Makefile.target
>>> index 06d79b8..4865dde 100644
>>> --- a/Makefile.target
>>> +++ b/Makefile.target
>>> @@ -226,7 +226,7 @@ obj-y += device-hotplug.o
>>>   # Hardware support
>>>   obj-i386-y += vga.o
>>>   obj-i386-y += mc146818rtc.o pc.o
>>> -obj-i386-y += cirrus_vga.o sga.o apic.o ioapic.o piix_pci.o
>>> +obj-i386-y += cirrus_vga.o sga.o apic.o ioapic.o piix_pci.o acpi_piix4.o
>>
>> That's a qemu-kvm hack, see below for the discussion. No-go for upstream
>> - likely.
>>
>>>   obj-i386-y += vmport.o
>>>   obj-i386-y += pci-hotplug.o smbios.o wdt_ib700.o
>>>   obj-i386-y += debugcon.o multiboot.o
>>> diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
>>> index bdc55a1..f1cdcc1 100644
>>> --- a/hw/acpi_piix4.c
>>> +++ b/hw/acpi_piix4.c
>>> @@ -39,13 +39,19 @@
>>>   #define ACPI_DBG_IO_ADDR  0xb044
>>>
>>>   #define GPE_BASE 0xafe0
>>> +#define PROC_BASE 0xaf00
>>>   #define GPE_LEN 4
>>>   #define PCI_BASE 0xae00
>>>   #define PCI_EJ_BASE 0xae08
>>>   #define PCI_RMV_BASE 0xae0c
>>>
>>> +#define PIIX4_CPU_HOTPLUG_STATUS 4
>>>   #define PIIX4_PCI_HOTPLUG_STATUS 2
>>>
>>> +struct gpe_regs {
>>> +    uint8_t cpus_sts[32];
>>> +};
>>> +
>>>   struct pci_status {
>>>       uint32_t up;
>>>       uint32_t down;
>>> @@ -71,6 +77,7 @@ typedef struct PIIX4PMState {
>>>
>>>       /* for pci hotplug */
>>>       ACPIGPE gpe;
>>> +    struct gpe_regs gpe_cpu;
>>>       struct pci_status pci0_status;
>>>       uint32_t pci0_hotplug_enable;
>>>   } PIIX4PMState;
>>> @@ -90,9 +97,11 @@ static void pm_update_sci(PIIX4PMState *s)
>>>                      ACPI_BITMASK_POWER_BUTTON_ENABLE |
>>>                      ACPI_BITMASK_GLOBAL_LOCK_ENABLE |
>>>                      ACPI_BITMASK_TIMER_ENABLE)) != 0) ||
>>> -        (((s->gpe.sts[0]&  s->gpe.en[0])&  PIIX4_PCI_HOTPLUG_STATUS) != 0);
>>> +        (((s->gpe.sts[0]&  s->gpe.en[0])&
>>> +         (PIIX4_PCI_HOTPLUG_STATUS | PIIX4_CPU_HOTPLUG_STATUS)) != 0);
>>>
>>>       qemu_set_irq(s->irq, sci_level);
>>> +
>>>       /* schedule a timer interruption if needed */
>>>       acpi_pm_tmr_update(&s->tmr, (s->pm1a.en&  ACPI_BITMASK_TIMER_ENABLE)&&
>>>                          !(pmsts&  ACPI_BITMASK_TIMER_STATUS));
>>> @@ -219,10 +228,9 @@ static int vmstate_acpi_post_load(void *opaque, int version_id)
>>>    {                                                                   \
>>>        .name       = (stringify(_field)),                              \
>>>        .version_id = 0,                                                \
>>> -     .num        = GPE_LEN,                                          \
>>>        .info       =&vmstate_info_uint16,                             \
>>>        .size       = sizeof(uint16_t),                                 \
>>> -     .flags      = VMS_ARRAY | VMS_POINTER,                          \
>>> +     .flags      = VMS_SINGLE | VMS_POINTER,                         \
>>
>> Does this make the vmstate incompatible to the current version?
> I suspect it makes it incompatible, but not sure what to do about it.
> According to b2e4a396f7 in qemu-kvm it fixes migration bug. It probably
> should be an independed patch not related to hotplug since qemu.git has
> commit 23910d3f6 that introduced the code there and it applies to gpe
> struct in general.

So this hunks converts an array back to a single uint16_t entry - unless
I'm now totally confused about VMSTATE_GPE_ARRAY. CC'ing Juan in the
hope he can parse it for us. That might be a bug in upstream then. But
it should definitely be address in a separate patch.

> 
>>
>>>        .offset     = vmstate_offset_pointer(_state, _field, uint8_t),  \
>>>    }
>>>
>>> @@ -329,11 +337,16 @@ static void piix4_pm_machine_ready(Notifier *n, void *opaque)
>>>
>>>   }
>>>
>>> +static PIIX4PMState *global_piix4_pm_state; /* cpu hotadd */
>>> +
>>>   static int piix4_pm_initfn(PCIDevice *dev)
>>>   {
>>>       PIIX4PMState *s = DO_UPCAST(PIIX4PMState, dev, dev);
>>>       uint8_t *pci_conf;
>>>
>>> +    /* for cpu hotadd */
>>> +    global_piix4_pm_state = s;
>>> +
>>>       pci_conf = s->dev.config;
>>>       pci_conf[0x06] = 0x80;
>>>       pci_conf[0x07] = 0x02;
>>> @@ -425,7 +438,16 @@ device_init(piix4_pm_register);
>>>   static uint32_t gpe_readb(void *opaque, uint32_t addr)
>>>   {
>>>       PIIX4PMState *s = opaque;
>>> -    uint32_t val = acpi_gpe_ioport_readb(&s->gpe, addr);
>>> +    uint32_t val = 0;
>>> +    struct gpe_regs *g =&s->gpe_cpu;
>>> +
>>> +    switch (addr) {
>>> +    case PROC_BASE ... PROC_BASE+31:
>>> +        val = g->cpus_sts[addr - PROC_BASE];
>>> +        break;
>>> +    default:
>>> +        val = acpi_gpe_ioport_readb(&s->gpe, addr);
>>> +    }
>>>
>>>       PIIX4_DPRINTF("gpe read %x == %x\n", addr, val);
>>>       return val;
>>> @@ -519,11 +541,20 @@ static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
>>>   static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s)
>>>   {
>>>       struct pci_status *pci0_status =&s->pci0_status;
>>> +    int i = 0, cpus = smp_cpus;
>>> +
>>> +    while (cpus>  0) {
>>> +        s->gpe_cpu.cpus_sts[i++] = (cpus<  8) ? (1<<  cpus) - 1 : 0xff;
>>> +        cpus -= 8;
>>> +    }
>>>
>>>       register_ioport_write(GPE_BASE, GPE_LEN, 1, gpe_writeb, s);
>>>       register_ioport_read(GPE_BASE, GPE_LEN, 1,  gpe_readb, s);
>>>       acpi_gpe_blk(&s->gpe, GPE_BASE);
>>>
>>> +    register_ioport_write(PROC_BASE, 32, 1, gpe_writeb, s);
>>> +    register_ioport_read(PROC_BASE, 32, 1,  gpe_readb, s);
>>> +
>>>       register_ioport_write(PCI_BASE, 8, 4, pcihotplug_write, pci0_status);
>>>       register_ioport_read(PCI_BASE, 8, 4,  pcihotplug_read, pci0_status);
>>>
>>> @@ -536,6 +567,50 @@ static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s)
>>>       pci_bus_hotplug(bus, piix4_device_hotplug,&s->dev.qdev);
>>>   }
>>>
>>> +extern const char *global_cpu_model;
>>> +
>>> +#ifdef TARGET_I386
>>
>> Why only TARGET_I386? If the PIIX4 is used on other targets, the
> 
> May be there is no sence in spending time on abstacting PIIX4 that
> will be used only by x86 target. Is there an even remote chance that PIIX4
> could/will be used with other targets?

It's used on MIPS. At least it is built for that target.

> 
> 
>> infrastructure should be prepared to enable hotplugging for them as well
>> (at a later point). pc_new_cpu is a bad name then. And APIC fiddling
>> should be pushed into the arch-specific new-cpu handler.
> 
> I've started to implement 'new-cpu' as you suggested but could you clarify what
> you've meant under "APIC fiddling"?

I meant cpuid_apic_id initialization.

> Is there an arch specific place for pc like hw to put this in (pc_piix.c perhaps)?

Generic PC hardware bits should go to pc.c.

> 
>> Also note that target dependencies are a no-go for hwlib compilations
>> which is clearly preferrable today.
> I guess that target specific deps are here because of the code below
> needed access to CPUState of specific target, with proper abstraction
> this deps could be avoided.

Yep, that would be good.

> 
>>
>>> +static void enable_processor(PIIX4PMState *s, int cpu)
>>> +{
>>> +    struct gpe_regs *g =&s->gpe_cpu;
>>> +    ACPIGPE *gpe =&s->gpe;
>>> +
>>> +    *gpe->sts = *gpe->sts | PIIX4_CPU_HOTPLUG_STATUS;
>>> +    g->cpus_sts[cpu/8] |= (1<<  (cpu%8));
>>
>> "cpu / 8", "cpu % 8", please. Here and below. checkpatch doesn't complain?
> 
> checkpatch was happy, I'll fix it.
> 
>>
>>> +}
>>> +
>>> +static void disable_processor(PIIX4PMState *s, int cpu)
>>> +{
>>> +    struct gpe_regs *g =&s->gpe_cpu;
>>> +    ACPIGPE *gpe =&s->gpe;
>>> +
>>> +    *gpe->sts = *gpe->sts | PIIX4_CPU_HOTPLUG_STATUS;
>>> +    g->cpus_sts[cpu/8]&= ~(1<<  (cpu%8));
>>> +}
>>> +
>>> +void qemu_system_cpu_hot_add(int cpu, int state)
>>> +{
>>> +    CPUState *env;
>>> +    PIIX4PMState *s = global_piix4_pm_state;
>>> +
>>> +    if (state&&  !qemu_get_cpu(cpu)) {
>>> +        env = pc_new_cpu(global_cpu_model);
>>> +        if (!env) {
>>> +            fprintf(stderr, "cpu %d creation failed\n", cpu);
>>> +            return;
>>> +        }
>>> +        env->cpuid_apic_id = cpu;
>>
>> See comment above about proper target abstraction.
>>
> 
> Let suppose that we have on command line option "-smp 1,maxcpus=3"
> and then call
>    qemu_system_cpu_hot_add( 2, 1)
>    qemu_system_cpu_hot_add( 2, 1)
> and we end up with
>    [cpu_index = 1, cpuid_apic_id = 2],
>    [cpu_index = 2, cpuid_apic_id = 2]
> qemu_get_cpu(cpu) uses cpu_index field to lookup vcpu and
> cpu_init -> cpu_x86_init -> cpu_exec_init doesn't have an ability
> to create vcpu with specific cpu_index and numbers them according
> to position in vcpu list. So we end up with 2 new vcpus with the
> same cpuid_apic_id. That for sure might confuse guest since cpuid
> for last 2 cpus will return the same value. And I'm not sure what
> will happen if we call
>    qemu_system_cpu_hot_add( 2, 1)
>    qemu_system_cpu_hot_add( 1, 1)
> and end up with
>    [cpu_index = 1, cpuid_apic_id = 2],
>    [cpu_index = 2, cpuid_apic_id = 1]
> I don't know what kind of trouble this could cause.
> 
> It seams that "env->cpuid_apic_id = cpu" is pointless especcialy
> taking in account that in cpu_x86_init cpuid_apic_id is initialized
> by cpu_index.
> What we gain in having cpuid_apic_id that actually duplicate cpu_index?
> May be there is sence to get rid of cpuid_apic_id?

cpu_index is for internal counting (I think to remember that,
cpuid_apic_id is the ID exposed to the guest. During CPU hotplug, you
can control this by virtually plugging the CPU in a specific slot. So we
need to pass this ID down the init chain - just not set it in generic code.

> 
> Another question is about how hot-plug/unplug should be designed:
> 1st approach:
>     With the current code we can't create vcpu with specific index.

Forget about index, the apic_id is important to control. And that could
become something like -cpu XXX,apid_id=N. Of course, collisions need to
be detected and rejected.

>     But we can implement xen like approach, where hot-plug command says
>     which amount of active vcpus guest should have. This way we can
>     leave current cpu_init -> cpu_x86_init -> cpu_exec_init call
>     chain without change and plug/unplug next/last vcpu.
> 
> 2nd approach:
>     Ability to plug/unplug individual vcpus based on their cpu_index.
>     to do this we need add cpu_index argument to cpu_init ->
>     cpu_x86_init -> cpu_exec_init call chain. It will look more
>     like the real hardware cpu hot-plug, but do virtual guests really
>     need it. And what for if this way is more preferrable than the 1st.
> 
>>> +    }
>>> +
>>> +    if (state) {
>>> +        enable_processor(s, cpu);
>>> +    } else {
>>> +        disable_processor(s, cpu);
>>> +    }
>>> +    pm_update_sci(s);
>>> +}
>>> +#endif
>>> +
>>>   static void enable_device(PIIX4PMState *s, int slot)
>>>   {
>>>       s->gpe.sts[0] |= PIIX4_PCI_HOTPLUG_STATUS;
>>> diff --git a/hw/pc.c b/hw/pc.c
>>> index 33d8090..c7342d8 100644
>>> --- a/hw/pc.c
>>> +++ b/hw/pc.c
>>> @@ -44,6 +44,8 @@
>>>   #include "ui/qemu-spice.h"
>>>   #include "memory.h"
>>>   #include "exec-memory.h"
>>> +#include "cpus.h"
>>> +#include "kvm.h"
>>
>> kvm? Likely not what you want.
> cpu_synchronize_post_init is declared in kvm.h
> Any suggestions are welcome.

Indeed. Ugly. Guess we should move prototypes to cpus.h, probably
uninlining the generic helpers. But that's a separate story. For now
your version is OK then.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

next prev parent reply	other threads:[~2012-01-23 17:56 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-01-17 13:17 [Qemu-devel] [PATCH 0/3] Make vcpu hotplug work for qemu Igor Mammedov
2012-01-17 13:17 ` [Qemu-devel] [PATCH 1/3] Introduce a new bus "ICC" to connect APIC Igor Mammedov
2012-01-17 13:57   ` Jan Kiszka
2012-01-17 13:17 ` [Qemu-devel] [PATCH 2/3] VCPU hotplug support Igor Mammedov
2012-01-17 14:17   ` Jan Kiszka
2012-01-17 14:22     ` Jan Kiszka
2012-01-23 16:29     ` Igor Mammedov
2012-01-23 17:55       ` Jan Kiszka [this message]
2012-01-24 16:24         ` Igor Mammedov
2012-01-24 16:37           ` Jan Kiszka
2012-01-17 13:17 ` [Qemu-devel] [PATCH 3/3] add cpu_set qmp command Igor Mammedov
2012-01-17 14:18   ` Jan Kiszka
2012-01-19  9:38     ` Igor Mammedov
2012-01-19 10:24       ` Jan Kiszka
2012-01-19 11:04         ` Igor Mammedov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4F1D9F06.1040404@siemens.com \
    --to=jan.kiszka@siemens.com \
    --cc=gleb@redhat.com \
    --cc=imammedo@redhat.com \
    --cc=pingfank@linux.vnet.ibm.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.