Re: [Qemu-devel] [PATCH 2/2] pc: hack for migration compatibility from QEMU 2.0

From: Paolo Bonzini <pbonzini@redhat.com>
To: Igor Mammedov <imammedo@redhat.com>
Cc: peter.maydell@linaro.org, mst@redhat.com, qemu-devel@nongnu.org,
	dgilbert@redhat.com, amit.shah@redhat.com, lersek@redhat.com
Subject: Re: [Qemu-devel] [PATCH 2/2] pc: hack for migration compatibility from QEMU 2.0
Date: Thu, 24 Jul 2014 16:28:45 +0200
Message-ID: <53D1181D.6090102@redhat.com>
In-Reply-To: <20140724105918.07e92cd7@nial.usersys.redhat.com>

On 24/07/2014 10:59, Igor Mammedov wrote:
> On Wed, 23 Jul 2014 18:37:46 +0200
> Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
>> Changing the ACPI table size causes migration to break, and the memory
>> hotplug work opened our eyes to how horribly we were breaking things in
>> 2.0 already.
>>
>> The ACPI table size is rounded to the next 4k, which one would think
>> gives some headroom.  In practice this is not the case, because the user
>> can control the ACPI table size (each CPU adds 105 bytes) and so some
>> "-smp" values will break the 4k boundary and fail to migrate.  Similarly,
>> PCI bridges add ~1870 bytes to the SSDT.
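To make the boundary problem concrete, here is a minimal C sketch, not QEMU code: round_up_pow2() and sizes_mismatch() are hypothetical helpers, and the base lengths are made-up stand-ins for everything that does not depend on -smp. It models two QEMU versions whose tables differ by a fixed number of bytes and reports whether, for a given CPU count, their 4k-rounded sizes disagree, which is what breaks migration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Round len up to a multiple of align; align must be a power of two. */
static size_t round_up_pow2(size_t len, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (len + align - 1) & ~(align - 1);
}

/*
 * Hypothetical model: src_base/dst_base are the table lengths that do not
 * depend on -smp in the source and destination QEMU, and every CPU adds
 * per_cpu bytes to both.  Migration breaks when the 4k-rounded blob sizes
 * differ for the same CPU count.
 */
static bool sizes_mismatch(size_t src_base, size_t dst_base,
                           size_t per_cpu, unsigned ncpus)
{
    size_t src = round_up_pow2(src_base + per_cpu * ncpus, 0x1000);
    size_t dst = round_up_pow2(dst_base + per_cpu * ncpus, 0x1000);
    return src != dst;
}
```

For example, with src_base = 4000, dst_base = 4100 and 105 bytes per CPU, zero CPUs gives 4096 vs 8192 bytes (a mismatch), while one CPU gives 8192 bytes on both sides.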
>>
>> To fix this, hard-code 64k as the maximum ACPI table size, which
>> (despite being an order of magnitude smaller than 640k) should be enough
>> for everyone.
>>
>> To fix migration from QEMU 2.0, compute the payload size of QEMU 2.0
>> and always use that one.  The previous patch shrank the ACPI tables
>> enough that the QEMU 2.0 size should always suffice.
>>
>> Non-AML tables can change depending on the configuration (especially
>> MADT, SRAT, HPET) but they remain the same between QEMU 2.0 and 2.1,
>> so we only compute our padding based on the sizes of the SSDT and DSDT.
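A minimal sketch of that padding rule, assuming the caller already knows the total table length, the DSDT+SSDT (AML) part of it, and the AML length QEMU 2.0 would have generated. legacy_table_size(), LEGACY_ALIGN and the 0-on-overflow convention are hypothetical, not the patch's code. Because the non-AML tables are identical in 2.0 and 2.1, only the AML part is swapped before rounding up to 4k.

```c
#include <assert.h>
#include <stddef.h>

#define LEGACY_ALIGN 0x1000  /* QEMU 2.0 rounded the blob up to 4k */

static size_t round_up_pow2(size_t len, size_t align)
{
    return (len + align - 1) & ~(align - 1);
}

/*
 * table_len:      total length of the tables built now
 * aml_len:        DSDT + SSDT part of table_len
 * legacy_aml_len: DSDT + SSDT length QEMU 2.0 would have produced
 * Returns the size 2.0 exposed to the guest, or 0 if the current tables
 * do not fit in it (the caller must then fail).
 */
static size_t legacy_table_size(size_t table_len, size_t aml_len,
                                size_t legacy_aml_len)
{
    assert(aml_len <= table_len);
    size_t fixed_len = table_len - aml_len;  /* MADT, SRAT, HPET, ... */
    size_t legacy = round_up_pow2(fixed_len + legacy_aml_len, LEGACY_ALIGN);
    return table_len <= legacy ? legacy : 0;
}
```

The 0 return covers the case where the assumption stated above (2.1 tables are never larger than 2.0's) does not hold for some configuration.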
>>
>> Migration from QEMU 1.7 should work for guests that have a number of CPUs
>> other than 12, 13, 14, 54, 55, 56, 97, 98, 139, 140, and that have no
>> PCI bridges.  It was already broken from QEMU 1.7 to QEMU 2.0 in the
>> same way, though.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  hw/i386/acpi-build.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++++----
>>  hw/i386/pc_piix.c    | 20 +++++++++++++++++
>>  hw/i386/pc_q35.c     |  5 +++++
>>  include/hw/i386/pc.h |  1 +
>>  4 files changed, 83 insertions(+), 4 deletions(-)
>>
>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>> index ebc5f03..7373d93 100644
>> --- a/hw/i386/acpi-build.c
>> +++ b/hw/i386/acpi-build.c
>> @@ -25,7 +25,9 @@
>>  #include <glib.h>
>>  #include "qemu-common.h"
>>  #include "qemu/bitmap.h"
>> +#include "qemu/osdep.h"
>>  #include "qemu/range.h"
>> +#include "qemu/error-report.h"
>>  #include "hw/pci/pci.h"
>>  #include "qom/cpu.h"
>>  #include "hw/i386/pc.h"
>> @@ -87,6 +89,8 @@ typedef struct AcpiBuildPciBusHotplugState {
>>      struct AcpiBuildPciBusHotplugState *parent;
>>  } AcpiBuildPciBusHotplugState;
>>  
>> +unsigned bsel_alloc;
>> +
>>  static void acpi_get_dsdt(AcpiMiscInfo *info)
>>  {
>>      uint16_t *applesmc_sta;
>> @@ -759,8 +763,8 @@ static void *acpi_set_bsel(PCIBus *bus, void *opaque)
>>  static void acpi_set_pci_info(void)
>>  {
>>      PCIBus *bus = find_i440fx(); /* TODO: Q35 support */
>> -    unsigned bsel_alloc = 0;
>>  
>> +    assert(bsel_alloc == 0);
>>      if (bus) {
>>          /* Scan all PCI buses. Set property to enable acpi based hotplug. */
>>          pci_for_each_bus_depth_first(bus, acpi_set_bsel, NULL, &bsel_alloc);
>> @@ -1440,13 +1444,14 @@ static
>>  void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables)
>>  {
>>      GArray *table_offsets;
>> -    unsigned facs, dsdt, rsdt;
>> +    unsigned facs, ssdt, dsdt, rsdt;
>>      AcpiCpuInfo cpu;
>>      AcpiPmInfo pm;
>>      AcpiMiscInfo misc;
>>      AcpiMcfgInfo mcfg;
>>      PcPciInfo pci;
>>      uint8_t *u;
>> +    size_t aml_len = 0;
>>  
>>      acpi_get_cpu_info(&cpu);
>>      acpi_get_pm_info(&pm);
>> @@ -1474,13 +1479,20 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables)
>>      dsdt = tables->table_data->len;
>>      build_dsdt(tables->table_data, tables->linker, &misc);
>>  
>> +    /* Count the size of the DSDT and SSDT; we will need it for legacy
>> +     * sizing of ACPI tables.
>> +     */
>> +    aml_len += tables->table_data->len - dsdt;
>> +
>>      /* ACPI tables pointed to by RSDT */
>>      acpi_add_table(table_offsets, tables->table_data);
>>      build_fadt(tables->table_data, tables->linker, &pm, facs, dsdt);
>>  
>> +    ssdt = tables->table_data->len;
>>      acpi_add_table(table_offsets, tables->table_data);
>>      build_ssdt(tables->table_data, tables->linker, &cpu, &pm, &misc, &pci,
>>                 guest_info);
>> +    aml_len += tables->table_data->len - ssdt;
>>  
>>      acpi_add_table(table_offsets, tables->table_data);
>>      build_madt(tables->table_data, tables->linker, &cpu, guest_info);
>> @@ -1513,12 +1525,53 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables)
>>      /* RSDP is in FSEG memory, so allocate it separately */
>>      build_rsdp(tables->rsdp, tables->linker, rsdt);
>>  
>> -    /* We'll expose it all to Guest so align size to reduce
>> +    /* We'll expose it all to Guest so we want to reduce
>>       * chance of size changes.
>>       * RSDP is small so it's easy to keep it immutable, no need to
>>       * bother with alignment.
>> +     *
>> +     * We used to align the tables to 4k, but of course this would be
>> +     * too simple to be enough.  4k turned out to be too small an
>> +     * alignment very soon, and in fact it is almost impossible to
>> +     * keep the table size stable for all (max_cpus, max_memory_slots)
>> +     * combinations.  So the table size is always 64k for pc-2.1 and
>> +     * we give an error if the table grows beyond that limit.
>> +     *
>> +     * We still have the problem of migrating from "-M pc-2.0".  For that,
>> +     * we exploit the fact that QEMU 2.1 generates _smaller_ tables than 2.0
>> +     * and we can always pad the smaller tables with zeros.  We can then use
>> +     * the exact size of the 2.0 tables.
>> +     *
>> +     * All this is for PIIX4, since QEMU 2.0 didn't support Q35 migration.
>>       */
>> -    acpi_align_size(tables->table_data, 0x1000);
>> +    if (guest_info->legacy_acpi_table_size) {
>> +        /* Subtracting aml_len gives the size of fixed tables.  Then add the
>> +         * size of the PIIX4 DSDT/SSDT in QEMU 2.0.
>> +         */
>> +        int legacy_aml_len =
>> +            guest_info->legacy_acpi_table_size +
>> +            97 * max_cpus +
> Commit message says it's 105 and not 97, so one of them should be fixed.
> Also please replace magic numbers (above and below) with defines so that
> it will be clear what they mean in the future.

Right, it's 97 in the SSDT and 8 in the MADT.
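Following the request for defines, a minimal sketch of how the legacy AML estimate could read with named constants. The macro names and legacy_aml_len() are hypothetical; the values 97, 8 and 1875 are the ones from this thread, and base stands for guest_info->legacy_acpi_table_size, the CPU- and bridge-independent part whose value is set in pc_piix.c and is not shown here. The static assertion records that 97 + 8 matches the 105 bytes per CPU quoted in the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; values as discussed in the review. */
#define LEGACY_CPU_SSDT_SIZE      97  /* SSDT bytes per CPU in QEMU 2.0 */
#define LEGACY_CPU_MADT_SIZE       8  /* MADT bytes per CPU (not AML) */
#define LEGACY_BRIDGE_SSDT_SIZE 1875  /* SSDT bytes per PCI bridge in 2.0 */

_Static_assert(LEGACY_CPU_SSDT_SIZE + LEGACY_CPU_MADT_SIZE == 105,
               "per-CPU cost quoted in the commit message");

/*
 * base:       2.0 DSDT + SSDT size with no CPUs and no bridges
 * bsel_alloc: number of buses given a hotplug selector (root included)
 */
static size_t legacy_aml_len(size_t base, unsigned max_cpus,
                             unsigned bsel_alloc)
{
    unsigned bridges = (bsel_alloc > 1 ? bsel_alloc : 1) - 1;
    return base + (size_t)LEGACY_CPU_SSDT_SIZE * max_cpus +
           (size_t)LEGACY_BRIDGE_SSDT_SIZE * bridges;
}
```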

>> +            1875 * (MAX(bsel_alloc, 1) - 1);
>> +        int legacy_table_size =
>> +            ROUND_UP(tables->table_data->len - aml_len + legacy_aml_len, 0x1000);
> line over 80 characters
> 
>> +        if (tables->table_data->len > legacy_table_size) {
>> +            /* -M pc-2.0 doesn't support memory hotplug, so this should never
>> +             * happen.
> It supports hotplug on PCI bridges, which could lead to this branch;
> just dropping this comment is fine.

Hotplug on PCI bridges is accounted, see the 1875 above.
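How the bridge term falls out of bsel_alloc can be seen with a toy model. This is a hypothetical sketch, not QEMU's PCIBus API: it assumes, as acpi_set_bsel() appears to do, that a pre-order depth-first walk hands out one selector per hotplug-capable bus, root bus first. Every bus other than the root sits behind a bridge, so MAX(bsel_alloc, 1) - 1 is the bridge count, and the MAX keeps the unsigned result at 0 when no bus was walked at all (presumably the Q35 case, where find_i440fx() finds nothing). Making bsel_alloc file-scope, with the assert that it starts at 0, is what lets acpi_build() read the count after acpi_set_pci_info() has run.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a PCI bus hierarchy (not QEMU's PCIBus). */
struct toy_bus {
    struct toy_bus **children;  /* secondary buses behind bridges */
    size_t nchildren;
    unsigned bsel;              /* hotplug selector assigned by the walk */
};

/* Pre-order depth-first walk assigning consecutive selectors, loosely
 * modelled on pci_for_each_bus_depth_first() + acpi_set_bsel(). */
static void assign_bsel(struct toy_bus *bus, unsigned *bsel_alloc)
{
    assert(bus != NULL);
    bus->bsel = (*bsel_alloc)++;
    for (size_t i = 0; i < bus->nchildren; i++) {
        assign_bsel(bus->children[i], bsel_alloc);
    }
}

/* Same arithmetic as MAX(bsel_alloc, 1) - 1 in the patch. */
static unsigned bridge_count(unsigned bsel_alloc)
{
    return (bsel_alloc > 1 ? bsel_alloc : 1) - 1;
}
```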

> Looking to the future: if we expand the number of supported VCPUs to 1024,
> the SSDT will quickly grow to 100K, so perhaps 128K or 256K would be better?

This memory is allocated by the BIOS (including all the unused space at
the end), so I'd rather not have exaggerated padding.
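Some rough arithmetic behind this trade-off, as a C sketch that assumes the 97-byte per-CPU SSDT figure from this thread also holds for larger guests (it is a QEMU 2.0 number, so treat it as an estimate). The helper names are hypothetical. At 1024 VCPUs the per-CPU AML alone is 99328 bytes, already past a 64k cap; conversely, whatever fixed size is chosen is paid in full by every guest, since the BIOS allocates the whole blob.

```c
#include <assert.h>
#include <stddef.h>

#define SSDT_BYTES_PER_CPU 97  /* per-CPU SSDT cost quoted in the thread */

/* Estimated SSDT bytes contributed by ncpus processor objects. */
static size_t cpu_aml_bytes(unsigned ncpus)
{
    return (size_t)SSDT_BYTES_PER_CPU * ncpus;
}

/* Bytes of a fixed-size blob that the BIOS allocates but that carry
 * only zero padding. */
static size_t padding_waste(size_t fixed_size, size_t used)
{
    assert(used <= fixed_size);
    return fixed_size - used;
}
```

By this estimate a 64k cap leaves room for about 675 CPUs' worth of SSDT before anything else is counted, while a 256k cap would waste about 236 KiB on a guest whose tables use 20000 bytes.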

> 
>> +            /* As of QEMU 2.1, this fires with 160 VCPUs and 255 memory slots.  */
> Isn't the VCPU maximum 256 for 2.1, or even for 2.0?

Yeah, this is just an example.  The limit is really just what the kernel
reports.

> line over 80 characters
> 
>> +            error_report("Too many maximum CPUs, NUMA nodes or memory slots.");
> Add PCI bridges here, since they affect the size greatly: even if the user
> removes all CPUs and turns off memory hotplug, they will still get this
> error if the bridge devices present at startup exceed the above limit.

Ok.

>> +            error_report("Please decrease one of these parameters.");
>> +            exit(1);
>> +        }
>> +        g_array_set_size(tables->table_data, 0x10000);
> Maybe use a define for the size here and above?

Oops, of course. :)
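A minimal sketch of the fixed-size path with the constant named. ACPI_BUILD_TABLE_SIZE and pad_acpi_tables() are hypothetical, a plain malloc'd buffer stands in for the GArray, and instead of calling exit(1) the function returns NULL so the caller decides how to fail. The error text also mentions PCI bridges, per the review comment above. The tail is zeroed explicitly because g_array_set_size() only clears new elements when the array was created with clearing enabled.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ACPI_BUILD_TABLE_SIZE 0x10000  /* hypothetical name: 64k cap for pc-2.1 */

/*
 * Grow the len-byte table blob in buf to exactly ACPI_BUILD_TABLE_SIZE
 * bytes, zero-filling the tail.  On success buf must no longer be used;
 * on failure NULL is returned and buf is still owned by the caller.
 */
static unsigned char *pad_acpi_tables(unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    if (len > ACPI_BUILD_TABLE_SIZE) {
        fprintf(stderr, "ACPI tables are %zu bytes, limit is %d.\n",
                len, ACPI_BUILD_TABLE_SIZE);
        fprintf(stderr, "Too many maximum CPUs, NUMA nodes, memory slots "
                        "or PCI bridges; please decrease one of these.\n");
        return NULL;
    }
    unsigned char *p = realloc(buf, ACPI_BUILD_TABLE_SIZE);
    if (p == NULL) {
        return NULL;
    }
    memset(p + len, 0, ACPI_BUILD_TABLE_SIZE - len);
    return p;
}
```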

Paolo

Thread overview: 9+ messages
2014-07-23 16:37 [Qemu-devel] [PATCH 0/2] pc: fix /etc/acpi/tables size in fw_cfg for -M pc-2.0 Paolo Bonzini
2014-07-23 16:37 ` [Qemu-devel] [PATCH 1/2] acpi-dsdt: procedurally generate _PRT Paolo Bonzini
2014-07-23 19:27   ` Laszlo Ersek
2014-07-24  8:22   ` Igor Mammedov
2014-07-23 16:37 ` [Qemu-devel] [PATCH 2/2] pc: hack for migration compatibility from QEMU 2.0 Paolo Bonzini
2014-07-23 19:34   ` Laszlo Ersek
2014-07-24  8:59   ` Igor Mammedov
2014-07-24 14:28     ` Paolo Bonzini [this message]
2014-07-24 15:22 ` [Qemu-devel] [PATCH 0/2] pc: fix /etc/acpi/tables size in fw_cfg for -M pc-2.0 Igor Mammedov
