* [Qemu-devel] [PATCH 0/6] pc: bring ACPI table size below to 2.0 levels, try fixing -initrd for good @ 2014-09-18 16:17 Paolo Bonzini 2014-09-18 16:17 ` [Qemu-devel] [PATCH 1/6] pc: initialize fw_cfg earlier Paolo Bonzini ` (7 more replies) 0 siblings, 8 replies; 20+ messages in thread From: Paolo Bonzini @ 2014-09-18 16:17 UTC (permalink / raw) To: qemu-devel; +Cc: jsnow, mst In the emergency last-minute patches of QEMU 2.1 we did two things: - fixed migration problems from 1.7 or 2.0 to 2.1 due to changes in ACPI table sizes - ensured that future versions will not break migration compatibility with 2.2 for reasonable configurations (with ACPI tables smaller than a hundred kilobytes, roughly) However, this came at the cost of wasting 128 KB unconditionally on even the smaller configuration, and we didn't provide a mechanism to ensure compatibility with larger configurations. This series provides this mechanism. As mentioned early, the design is to consider the SSDT immutable and versioned (together with other non-AML tables such as HPET, TPMA and MADT, SRAT, MCFG, DMAR). The DSDT instead can change more or less arbitrarily. To do this, we add padding after the DSDT to allow for future growth. Once we do this, the size of the ACPI table fw_cfg "file" is constant given a machine type and a command-line, so we do not need anymore the larger 128KB padding. This is done in patches 4-6. However, there is another problem. As the ACPI tables grow, we need to move the address at which linuxboot.bin loads the initrd. This address is placed close to the end of memory, but it is QEMU that tells linuxboot.bin where exactly the initrd is to be loaded. And QEMU cannot really know how much high memory SeaBIOS will use, because QEMU does not know the final e820 memory map. The solution would be to let linuxboot.bin parse the memory map and ignore the suggested initrd base address, but that's tedious. In the meanwhile, we can just assume that most of the need comes from the ACPI tables (which is in fact true: patch 3 adds a fixed 32k extra just in case) and dynamically resize the padding. This is what patches 1-3 do. The nice part is that they also remove some differences between Xen and other accelerators. I would appreciate Xen testing from interested people. Thanks, Paolo Paolo Bonzini (6): pc: initialize fw_cfg earlier pc: load the kernel after ACPI tables are built pc: redo sizing of reserved high memory area for -kernel/-initrd pc: introduce new ACPI table sizing algorithm pc: go back to smaller ACPI tables pc: clean up pre-2.1 compatibility code hw/i386/acpi-build.c | 23 +++++++++------- hw/i386/pc.c | 72 +++++++++++++++++---------------------------------- hw/i386/pc_piix.c | 32 ++++++++++++++-------- hw/i386/pc_q35.c | 7 ++-- include/hw/i386/pc.h | 4 ++ 5 files changed, 66 insertions(+), 72 deletions(-) ^ permalink raw reply [flat|nested] 20+ messages in thread
* [Qemu-devel] [PATCH 1/6] pc: initialize fw_cfg earlier 2014-09-18 16:17 [Qemu-devel] [PATCH 0/6] pc: bring ACPI table size below to 2.0 levels, try fixing -initrd for good Paolo Bonzini @ 2014-09-18 16:17 ` Paolo Bonzini 2014-09-18 16:17 ` [Qemu-devel] [PATCH 2/6] pc: load the kernel after ACPI tables are built Paolo Bonzini ` (6 subsequent siblings) 7 siblings, 0 replies; 20+ messages in thread From: Paolo Bonzini @ 2014-09-18 16:17 UTC (permalink / raw) To: qemu-devel; +Cc: jsnow, mst Notifiers are called in LIFO order. In our case, we want fw_cfg to use the output of the machine-specific notifier: we will process the -kernel command-line option there, and this will modify the boot order which fw_cfg's notifier processes. Because fw_cfg's notifier has to run last, we have to create the device early, before pc_guest_info_init registers its own notifier. It turns out that pc_guest_info_init is itself a good place to create fw_cfg. This also makes Xen more similar to native machine types. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- hw/i386/pc.c | 20 ++++++++------------ 1 file changed, 8 insertions(+), 12 deletions(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index b6c9b61..b95ac18 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -104,6 +104,8 @@ static struct e820_entry *e820_table; static unsigned e820_entries; struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX}; +static FWCfgState *bochs_bios_init(void); + void gsi_handler(void *opaque, int n, int level) { GSIState *s = opaque; @@ -1093,6 +1095,9 @@ PcGuestInfo *pc_guest_info_init(ram_addr_t below_4g_mem_size, PcGuestInfo *guest_info = &guest_info_state->info; int i, j; + guest_info->fw_cfg = bochs_bios_init(); + rom_set_fw(guest_info->fw_cfg); + guest_info->ram_size_below_4g = below_4g_mem_size; guest_info->ram_size = below_4g_mem_size + above_4g_mem_size; guest_info->apic_id_limit = pc_apic_id_limit(max_cpus); @@ -1173,22 +1178,17 @@ FWCfgState *xen_load_linux(const char *kernel_filename, PcGuestInfo *guest_info) { int i; - FWCfgState *fw_cfg; assert(kernel_filename != NULL); - fw_cfg = fw_cfg_init(BIOS_CFG_IOPORT, BIOS_CFG_IOPORT + 1, 0, 0); - rom_set_fw(fw_cfg); - - load_linux(fw_cfg, kernel_filename, initrd_filename, + load_linux(guest_info->fw_cfg, kernel_filename, initrd_filename, kernel_cmdline, below_4g_mem_size); for (i = 0; i < nb_option_roms; i++) { assert(!strcmp(option_rom[i].name, "linuxboot.bin") || !strcmp(option_rom[i].name, "multiboot.bin")); rom_add_option(option_rom[i].name, option_rom[i].bootindex); } - guest_info->fw_cfg = fw_cfg; - return fw_cfg; + return guest_info->fw_cfg; } FWCfgState *pc_memory_init(MachineState *machine, @@ -1202,7 +1202,7 @@ FWCfgState *pc_memory_init(MachineState *machine, int linux_boot, i; MemoryRegion *ram, *option_rom_mr; MemoryRegion *ram_below_4g, *ram_above_4g; - FWCfgState *fw_cfg; + FWCfgState *fw_cfg = guest_info->fw_cfg; PCMachineState *pcms = PC_MACHINE(machine); assert(machine->ram_size == below_4g_mem_size + above_4g_mem_size); @@ -1280,9 +1280,6 @@ FWCfgState *pc_memory_init(MachineState *machine, option_rom_mr, 1); - fw_cfg = bochs_bios_init(); - rom_set_fw(fw_cfg); - if (guest_info->has_reserved_memory && pcms->hotplug_memory_base) { uint64_t *val = g_malloc(sizeof(*val)); *val = cpu_to_le64(ROUND_UP(pcms->hotplug_memory_base, 0x1ULL << 30)); @@ -1297,7 +1294,6 @@ FWCfgState *pc_memory_init(MachineState *machine, for (i = 0; i < nb_option_roms; i++) { rom_add_option(option_rom[i].name, option_rom[i].bootindex); } - guest_info->fw_cfg = fw_cfg; return fw_cfg; } -- 2.1.0 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* [Qemu-devel] [PATCH 2/6] pc: load the kernel after ACPI tables are built 2014-09-18 16:17 [Qemu-devel] [PATCH 0/6] pc: bring ACPI table size below to 2.0 levels, try fixing -initrd for good Paolo Bonzini 2014-09-18 16:17 ` [Qemu-devel] [PATCH 1/6] pc: initialize fw_cfg earlier Paolo Bonzini @ 2014-09-18 16:17 ` Paolo Bonzini 2014-09-18 16:17 ` [Qemu-devel] [PATCH 3/6] pc: redo sizing of reserved high memory area for -kernel/-initrd Paolo Bonzini ` (5 subsequent siblings) 7 siblings, 0 replies; 20+ messages in thread From: Paolo Bonzini @ 2014-09-18 16:17 UTC (permalink / raw) To: qemu-devel; +Cc: jsnow, mst The initrd is placed at the top of RAM, just below some space left for the ACPI tables. In order to allow for large ACPI tables, we need to delay loading of the initrd until after the ACPI tables are built. We can use the existing pc_guest_info_machine_done notifier, and also subsume xen_load_linux into this function. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- hw/i386/pc.c | 46 ++++++++++++++-------------------------------- hw/i386/pc_piix.c | 7 ------- 2 files changed, 14 insertions(+), 39 deletions(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index b95ac18..863a40e 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -1082,10 +1082,22 @@ typedef struct PcGuestInfoState { static void pc_guest_info_machine_done(Notifier *notifier, void *data) { + int i; PcGuestInfoState *guest_info_state = container_of(notifier, PcGuestInfoState, machine_done); - acpi_setup(&guest_info_state->info); + PcGuestInfo *guest_info = &guest_info_state->info; + acpi_setup(guest_info); + + if (current_machine->kernel_filename != NULL) { + load_linux(guest_info->fw_cfg, + current_machine->kernel_filename, current_machine->initrd_filename, + current_machine->kernel_cmdline, guest_info->ram_size_below_4g); + } + + for (i = 0; i < nb_option_roms; i++) { + rom_add_option(option_rom[i].name, option_rom[i].bootindex); + } } PcGuestInfo *pc_guest_info_init(ram_addr_t below_4g_mem_size, @@ -1171,26 +1183,6 @@ void pc_acpi_init(const char *default_dsdt) } } -FWCfgState *xen_load_linux(const char *kernel_filename, - const char *kernel_cmdline, - const char *initrd_filename, - ram_addr_t below_4g_mem_size, - PcGuestInfo *guest_info) -{ - int i; - - assert(kernel_filename != NULL); - - load_linux(guest_info->fw_cfg, kernel_filename, initrd_filename, - kernel_cmdline, below_4g_mem_size); - for (i = 0; i < nb_option_roms; i++) { - assert(!strcmp(option_rom[i].name, "linuxboot.bin") || - !strcmp(option_rom[i].name, "multiboot.bin")); - rom_add_option(option_rom[i].name, option_rom[i].bootindex); - } - return guest_info->fw_cfg; -} - FWCfgState *pc_memory_init(MachineState *machine, MemoryRegion *system_memory, ram_addr_t below_4g_mem_size, @@ -1199,7 +1191,7 @@ FWCfgState *pc_memory_init(MachineState *machine, MemoryRegion **ram_memory, PcGuestInfo *guest_info) { - int linux_boot, i; + int i; MemoryRegion *ram, *option_rom_mr; MemoryRegion *ram_below_4g, *ram_above_4g; FWCfgState *fw_cfg = guest_info->fw_cfg; @@ -1207,8 +1199,6 @@ FWCfgState *pc_memory_init(MachineState *machine, assert(machine->ram_size == below_4g_mem_size + above_4g_mem_size); - linux_boot = (machine->kernel_filename != NULL); - /* Allocate RAM. We allocate it as a single memory region and use * aliases to address portions of it, mostly for backwards compatibility * with older qemus that used qemu_ram_alloc(). @@ -1286,14 +1276,6 @@ FWCfgState *pc_memory_init(MachineState *machine, fw_cfg_add_file(fw_cfg, "etc/reserved-memory-end", val, sizeof(*val)); } - if (linux_boot) { - load_linux(fw_cfg, machine->kernel_filename, machine->initrd_filename, - machine->kernel_cmdline, below_4g_mem_size); - } - - for (i = 0; i < nb_option_roms; i++) { - rom_add_option(option_rom[i].name, option_rom[i].bootindex); - } return fw_cfg; } diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index 103d756..2aa2b6d 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -180,13 +180,6 @@ static void pc_init1(MachineState *machine, fw_cfg = pc_memory_init(machine, system_memory, below_4g_mem_size, above_4g_mem_size, rom_memory, &ram_memory, guest_info); - } else if (machine->kernel_filename != NULL) { - /* For xen HVM direct kernel boot, load linux here */ - fw_cfg = xen_load_linux(machine->kernel_filename, - machine->kernel_cmdline, - machine->initrd_filename, - below_4g_mem_size, - guest_info); } gsi_state = g_malloc0(sizeof(*gsi_state)); -- 2.1.0 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* [Qemu-devel] [PATCH 3/6] pc: redo sizing of reserved high memory area for -kernel/-initrd 2014-09-18 16:17 [Qemu-devel] [PATCH 0/6] pc: bring ACPI table size below to 2.0 levels, try fixing -initrd for good Paolo Bonzini 2014-09-18 16:17 ` [Qemu-devel] [PATCH 1/6] pc: initialize fw_cfg earlier Paolo Bonzini 2014-09-18 16:17 ` [Qemu-devel] [PATCH 2/6] pc: load the kernel after ACPI tables are built Paolo Bonzini @ 2014-09-18 16:17 ` Paolo Bonzini 2014-09-18 16:17 ` [Qemu-devel] [PATCH 4/6] pc: introduce new ACPI table sizing algorithm Paolo Bonzini ` (4 subsequent siblings) 7 siblings, 0 replies; 20+ messages in thread From: Paolo Bonzini @ 2014-09-18 16:17 UTC (permalink / raw) To: qemu-devel; +Cc: jsnow, mst The initrd is loaded close to the top of memory, with some extra padding on top to account for memory that the BIOS could reserve in the memory map (for ACPI tables and the like). This reserved high memory area can get quite big. We currently force a 128k size for 2.1+ machine types, but even that might not be enough: if the BIOS allocates more memory from the high area than just the ACPI tables, loading the initrd can stumble on that memory and break. So, we want to do two things. First, increase the size of the padding to account for BIOS data structures. 32k will do, since in practice the problems were fixed with just 4k. Second, to avoid wasting memory, size the padding based on the actual size of the ACPI tables. The table builder can pass the right size, padded as above and with a minimum of 64k. For older machine types, the padding has basically no consequence; simply, the acpi_data_size will be set at machine_ready time instead of machine init, which is made possible by calling load_linux after the ACPI tables are built. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- hw/i386/acpi-build.c | 1 + hw/i386/pc.c | 6 +++--- hw/i386/pc_piix.c | 1 - hw/i386/pc_q35.c | 1 - include/hw/i386/pc.h | 2 +- 5 files changed, 5 insertions(+), 6 deletions(-) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index a313321..694ee91 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -1691,6 +1691,7 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables) acpi_align_size(tables->table_data, ACPI_BUILD_TABLE_SIZE); } + pc_set_acpi_data_size(tables->table_data->len); acpi_align_size(tables->linker, ACPI_BUILD_ALIGN_SIZE); /* Cleanup memory that's no longer used. */ diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 863a40e..b4158f2 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -73,10 +73,10 @@ #endif /* Leave a chunk of memory at the top of RAM for the BIOS ACPI tables. */ -unsigned acpi_data_size = 0x28000; -void pc_set_legacy_acpi_data_size(void) +unsigned acpi_data_size = 0x10000; +void pc_set_acpi_data_size(unsigned size) { - acpi_data_size = 0x10000; + acpi_data_size = MAX(size + 0x8000, 0x10000); } #define BIOS_CFG_IOPORT 0x510 diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index 2aa2b6d..ca9f07c 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -316,7 +316,6 @@ static void pc_compat_2_0(MachineState *machine) legacy_acpi_table_size = 6652; smbios_legacy_mode = true; has_reserved_memory = false; - pc_set_legacy_acpi_data_size(); } static void pc_compat_1_7(MachineState *machine) diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c index d4a907c..4b5a274 100644 --- a/hw/i386/pc_q35.c +++ b/hw/i386/pc_q35.c @@ -280,7 +280,6 @@ static void pc_compat_2_0(MachineState *machine) { smbios_legacy_mode = true; has_reserved_memory = false; - pc_set_legacy_acpi_data_size(); } static void pc_compat_1_7(MachineState *machine) diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h index 77316d5..853829b 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -176,7 +176,7 @@ void pc_acpi_init(const char *default_dsdt); PcGuestInfo *pc_guest_info_init(ram_addr_t below_4g_mem_size, ram_addr_t above_4g_mem_size); -void pc_set_legacy_acpi_data_size(void); +void pc_set_acpi_data_size(unsigned); #define PCI_HOST_PROP_PCI_HOLE_START "pci-hole-start" #define PCI_HOST_PROP_PCI_HOLE_END "pci-hole-end" -- 2.1.0 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* [Qemu-devel] [PATCH 4/6] pc: introduce new ACPI table sizing algorithm 2014-09-18 16:17 [Qemu-devel] [PATCH 0/6] pc: bring ACPI table size below to 2.0 levels, try fixing -initrd for good Paolo Bonzini ` (2 preceding siblings ...) 2014-09-18 16:17 ` [Qemu-devel] [PATCH 3/6] pc: redo sizing of reserved high memory area for -kernel/-initrd Paolo Bonzini @ 2014-09-18 16:17 ` Paolo Bonzini 2014-09-18 16:17 ` [Qemu-devel] [PATCH 5/6] pc: go back to smaller ACPI tables Paolo Bonzini ` (3 subsequent siblings) 7 siblings, 0 replies; 20+ messages in thread From: Paolo Bonzini @ 2014-09-18 16:17 UTC (permalink / raw) To: qemu-devel; +Cc: jsnow, mst Add padding after the DSDT. Tables that vary depending on the command-line arguments will have to be byte-equivalent across QEMU versions >= 2.2, while fixed tables (including the DSDT) can be changed freely. This new algorithm will let us present smaller ACPI blobs to the guest, which avoids bugs with -kernel/-initrd and 32-bit RHEL5 guests. However, this patch does not change the size of the blobs yet; for now, the values of the parameters are tuned to have 2.1-compatible sizes. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- hw/i386/acpi-build.c | 11 +++++++---- hw/i386/pc_piix.c | 5 +++++ include/hw/i386/pc.h | 2 ++ 3 files changed, 14 insertions(+), 4 deletions(-) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index 694ee91..48e7f2e 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -65,8 +65,6 @@ #define ACPI_BUILD_LEGACY_CPU_AML_SIZE 97 #define ACPI_BUILD_ALIGN_SIZE 0x1000 -#define ACPI_BUILD_TABLE_SIZE 0x20000 - typedef struct AcpiCpuInfo { DECLARE_BITMAP(found_cpus, ACPI_CPU_HOTPLUG_ID_LIMIT); } AcpiCpuInfo; @@ -1597,6 +1595,10 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables) acpi_add_table(table_offsets, tables->table_data); build_fadt(tables->table_data, tables->linker, &pm, facs, dsdt); + if (guest_info->fixed_table_align) { + acpi_align_size(tables->table_data, guest_info->fixed_table_align); + } + ssdt = tables->table_data->len; acpi_add_table(table_offsets, tables->table_data); build_ssdt(tables->table_data, tables->linker, &cpu, &pm, &misc, &pci, @@ -1681,14 +1683,15 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables) g_array_set_size(tables->table_data, legacy_table_size); } else { /* Make sure we have a buffer in case we need to resize the tables. */ - if (tables->table_data->len > ACPI_BUILD_TABLE_SIZE / 2) { + if (!guest_info->fixed_table_align && + tables->table_data->len > guest_info->acpi_table_align / 2) { /* As of QEMU 2.1, this fires with 160 VCPUs and 255 memory slots. */ error_report("Warning: ACPI tables are larger than 64k."); error_report("Warning: migration may not work."); error_report("Warning: please remove CPUs, NUMA nodes, " "memory slots or PCI bridges."); } - acpi_align_size(tables->table_data, ACPI_BUILD_TABLE_SIZE); + acpi_align_size(tables->table_data, guest_info->acpi_table_align); } pc_set_acpi_data_size(tables->table_data->len); diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index ca9f07c..1ecb281 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -61,6 +61,8 @@ static const int ide_irq[MAX_IDE_BUS] = { 14, 15 }; static bool has_acpi_build = true; static int legacy_acpi_table_size; +static int fixed_table_align = 0; +static int acpi_table_align = 131072; static bool smbios_defaults = true; static bool smbios_legacy_mode; /* Make sure that guest addresses aligned at 1Gbyte boundaries get mapped to @@ -164,6 +166,8 @@ static void pc_init1(MachineState *machine, guest_info->has_acpi_build = has_acpi_build; guest_info->legacy_acpi_table_size = legacy_acpi_table_size; + guest_info->fixed_table_align = fixed_table_align; + guest_info->acpi_table_align = acpi_table_align; guest_info->isapc_ram_fw = !pci_enabled; guest_info->has_reserved_memory = has_reserved_memory; @@ -314,6 +318,7 @@ static void pc_compat_2_0(MachineState *machine) * QEMU 1.7 it is 6414. For RHEL/CentOS 7.0 it is 6418. */ legacy_acpi_table_size = 6652; + acpi_table_align = 4096; smbios_legacy_mode = true; has_reserved_memory = false; } diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h index 853829b..8b65b47 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -94,6 +94,8 @@ struct PcGuestInfo { uint64_t *node_cpu; FWCfgState *fw_cfg; int legacy_acpi_table_size; + int fixed_table_align; + int acpi_table_align; bool has_acpi_build; bool has_reserved_memory; }; -- 2.1.0 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* [Qemu-devel] [PATCH 5/6] pc: go back to smaller ACPI tables 2014-09-18 16:17 [Qemu-devel] [PATCH 0/6] pc: bring ACPI table size below to 2.0 levels, try fixing -initrd for good Paolo Bonzini ` (3 preceding siblings ...) 2014-09-18 16:17 ` [Qemu-devel] [PATCH 4/6] pc: introduce new ACPI table sizing algorithm Paolo Bonzini @ 2014-09-18 16:17 ` Paolo Bonzini 2014-09-18 16:17 ` [Qemu-devel] [PATCH 6/6] pc: clean up pre-2.1 compatibility code Paolo Bonzini ` (2 subsequent siblings) 7 siblings, 0 replies; 20+ messages in thread From: Paolo Bonzini @ 2014-09-18 16:17 UTC (permalink / raw) To: qemu-devel; +Cc: jsnow, mst The new algorithm introduced by the previous patch lets us make tables smaller and avoid bugs due to large tables. Use it for 2.2+ machine types by tweaking the default fixed_table_align and acpi_table_align values. At the same time, preserve backwards-compatible logic for pc-i440fx-2.1. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- hw/i386/pc_piix.c | 19 ++++++++++++++++--- hw/i386/pc_q35.c | 6 ++++-- 2 files changed, 20 insertions(+), 5 deletions(-) diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index 1ecb281..1c3469f 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -61,8 +61,8 @@ static const int ide_irq[MAX_IDE_BUS] = { 14, 15 }; static bool has_acpi_build = true; static int legacy_acpi_table_size; -static int fixed_table_align = 0; -static int acpi_table_align = 131072; +static int fixed_table_align = 16384; +static int acpi_table_align = 4096; static bool smbios_defaults = true; static bool smbios_legacy_mode; /* Make sure that guest addresses aligned at 1Gbyte boundaries get mapped to @@ -299,6 +299,12 @@ static void pc_init_pci(MachineState *machine) pc_init1(machine, 1, 1); } +static void pc_compat_2_1(MachineState *machine) +{ + fixed_table_align = 0; + acpi_table_align = 131072; +} + static void pc_compat_2_0(MachineState *machine) { /* This value depends on the actual DSDT and SSDT compiled into @@ -317,6 +323,7 @@ static void pc_compat_2_0(MachineState *machine) * 6652 is valid for QEMU 2.0, the right value for pc-i440fx-1.7 on * QEMU 1.7 it is 6414. For RHEL/CentOS 7.0 it is 6418. */ + pc_compat_2_1(machine); legacy_acpi_table_size = 6652; acpi_table_align = 4096; smbios_legacy_mode = true; @@ -365,6 +372,12 @@ static void pc_compat_1_2(MachineState *machine) x86_cpu_compat_disable_kvm_features(FEAT_KVM, KVM_FEATURE_PV_EOI); } +static void pc_init_pci_2_1(MachineState *machine) +{ + pc_compat_2_1(machine); + pc_init_pci(machine); +} + static void pc_init_pci_2_0(MachineState *machine) { pc_compat_2_0(machine); @@ -468,7 +481,7 @@ static QEMUMachine pc_i440fx_machine_v2_2 = { static QEMUMachine pc_i440fx_machine_v2_1 = { PC_I440FX_2_1_MACHINE_OPTIONS, .name = "pc-i440fx-2.1", - .init = pc_init_pci, + .init = pc_init_pci_2_1, .compat_props = (GlobalProperty[]) { PC_COMPAT_2_1, { /* end of list */ } diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c index 4b5a274..283956f 100644 --- a/hw/i386/pc_q35.c +++ b/hw/i386/pc_q35.c @@ -153,10 +153,12 @@ static void pc_q35_init(MachineState *machine) guest_info->has_acpi_build = has_acpi_build; guest_info->has_reserved_memory = has_reserved_memory; - /* Migration was not supported in 2.0 for Q35, so do not bother - * with this hack (see hw/i386/acpi-build.c). + /* Migration was not supported in 2.0 for Q35, so do not bother with + * hacks around the ACPI table size (see hw/i386/acpi-build.c). */ guest_info->legacy_acpi_table_size = 0; + guest_info->fixed_table_align = 16384; + guest_info->acpi_table_align = 4096; if (smbios_defaults) { MachineClass *mc = MACHINE_GET_CLASS(machine); -- 2.1.0 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* [Qemu-devel] [PATCH 6/6] pc: clean up pre-2.1 compatibility code 2014-09-18 16:17 [Qemu-devel] [PATCH 0/6] pc: bring ACPI table size below to 2.0 levels, try fixing -initrd for good Paolo Bonzini ` (4 preceding siblings ...) 2014-09-18 16:17 ` [Qemu-devel] [PATCH 5/6] pc: go back to smaller ACPI tables Paolo Bonzini @ 2014-09-18 16:17 ` Paolo Bonzini 2014-09-19 7:36 ` [Qemu-devel] [PATCH 0/6] pc: bring ACPI table size below to 2.0 levels, try fixing -initrd for good Gerd Hoffmann 2014-10-02 12:11 ` Michael S. Tsirkin 7 siblings, 0 replies; 20+ messages in thread From: Paolo Bonzini @ 2014-09-18 16:17 UTC (permalink / raw) To: qemu-devel; +Cc: jsnow, mst Now that the alignment is parameterized, we can share the call to acpi_align_size between all three (1.7-2.0/2.1/2.2+) sizing algorithms. Also, with the new rule that SSDT cannot change except with machine-type compat code, the magic 97 constant for a CPU's AML size is not anymore "legacy", so rename it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- hw/i386/acpi-build.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index 48e7f2e..7809154 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -62,8 +62,8 @@ * a little bit, there should be plenty of free space since the DSDT * shrunk by ~1.5k between QEMU 2.0 and QEMU 2.1. */ -#define ACPI_BUILD_LEGACY_CPU_AML_SIZE 97 -#define ACPI_BUILD_ALIGN_SIZE 0x1000 +#define ACPI_BUILD_CPU_AML_SIZE 97 +#define ACPI_BUILD_ALIGN_SIZE 0x1000 typedef struct AcpiCpuInfo { DECLARE_BITMAP(found_cpus, ACPI_CPU_HOTPLUG_ID_LIMIT); @@ -1672,10 +1672,9 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables) */ int legacy_aml_len = guest_info->legacy_acpi_table_size + - ACPI_BUILD_LEGACY_CPU_AML_SIZE * max_cpus; + ACPI_BUILD_CPU_AML_SIZE * max_cpus; int legacy_table_size = - ROUND_UP(tables->table_data->len - aml_len + legacy_aml_len, - ACPI_BUILD_ALIGN_SIZE); + tables->table_data->len - aml_len + legacy_aml_len; if (tables->table_data->len > legacy_table_size) { /* Should happen only with PCI bridges and -M pc-i440fx-2.0. */ error_report("Warning: migration may not work."); @@ -1691,8 +1690,8 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables) error_report("Warning: please remove CPUs, NUMA nodes, " "memory slots or PCI bridges."); } - acpi_align_size(tables->table_data, guest_info->acpi_table_align); } + acpi_align_size(tables->table_data, guest_info->acpi_table_align); pc_set_acpi_data_size(tables->table_data->len); acpi_align_size(tables->linker, ACPI_BUILD_ALIGN_SIZE); -- 2.1.0 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [Qemu-devel] [PATCH 0/6] pc: bring ACPI table size below to 2.0 levels, try fixing -initrd for good 2014-09-18 16:17 [Qemu-devel] [PATCH 0/6] pc: bring ACPI table size below to 2.0 levels, try fixing -initrd for good Paolo Bonzini ` (5 preceding siblings ...) 2014-09-18 16:17 ` [Qemu-devel] [PATCH 6/6] pc: clean up pre-2.1 compatibility code Paolo Bonzini @ 2014-09-19 7:36 ` Gerd Hoffmann 2014-09-19 8:06 ` Paolo Bonzini 2014-09-19 13:09 ` Paolo Bonzini 2014-10-02 12:11 ` Michael S. Tsirkin 7 siblings, 2 replies; 20+ messages in thread From: Gerd Hoffmann @ 2014-09-19 7:36 UTC (permalink / raw) To: Paolo Bonzini; +Cc: jsnow, qemu-devel, mst Hi, > However, there is another problem. As the ACPI tables grow, we need > to move the address at which linuxboot.bin loads the initrd. This > address is placed close to the end of memory, but it is QEMU that > tells linuxboot.bin where exactly the initrd is to be loaded. And > QEMU cannot really know how much high memory SeaBIOS will use, because > QEMU does not know the final e820 memory map. > > The solution would be to let linuxboot.bin parse the memory map and > ignore the suggested initrd base address, but that's tedious. In the > meanwhile, we can just assume that most of the need comes from the ACPI > tables (which is in fact true: patch 3 adds a fixed 32k extra just in > case) and dynamically resize the padding. Hmm. That assumes we are running seabios, where we know how much memory we actually need. IMHO we should either really parse the memory map, or reserve more space. IIRC it doesn't matter that much where we load the initrd. It should not be just after the kernel, because the kernel needs some space to unpack itself and for early allocations such as initial page tables. This is where the common practice to load the initrd high comes from. But whenever we leave 128k or 16m between initrd and top-of-memory doesn't make much of a difference. cheers, Gerd ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Qemu-devel] [PATCH 0/6] pc: bring ACPI table size below to 2.0 levels, try fixing -initrd for good 2014-09-19 7:36 ` [Qemu-devel] [PATCH 0/6] pc: bring ACPI table size below to 2.0 levels, try fixing -initrd for good Gerd Hoffmann @ 2014-09-19 8:06 ` Paolo Bonzini 2014-09-19 13:09 ` Paolo Bonzini 1 sibling, 0 replies; 20+ messages in thread From: Paolo Bonzini @ 2014-09-19 8:06 UTC (permalink / raw) To: Gerd Hoffmann; +Cc: jsnow, qemu-devel, mst Il 19/09/2014 09:36, Gerd Hoffmann ha scritto: > Hmm. That assumes we are running seabios, where we know how much memory > we actually need. Right. However, note that this only affects one patch in the series (patch 3). Patches 1-2 are useful to unify Xen-specific behavior with other hypervisors, and patches 4-6 are useful to future-proof ACPI table sizes for migration. > IMHO we should either really parse the memory map, or reserve more > space. I agree. However, we need to cater for a fixed initrd loading address until this is done. And even after linuxboot.bin is improved, the older version will be used when migrating for older QEMU machine types, so it is important to have a decent fallback. Paolo ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Qemu-devel] [PATCH 0/6] pc: bring ACPI table size below to 2.0 levels, try fixing -initrd for good 2014-09-19 7:36 ` [Qemu-devel] [PATCH 0/6] pc: bring ACPI table size below to 2.0 levels, try fixing -initrd for good Gerd Hoffmann 2014-09-19 8:06 ` Paolo Bonzini @ 2014-09-19 13:09 ` Paolo Bonzini 1 sibling, 0 replies; 20+ messages in thread From: Paolo Bonzini @ 2014-09-19 13:09 UTC (permalink / raw) To: Gerd Hoffmann; +Cc: jsnow, qemu-devel, mst Il 19/09/2014 09:36, Gerd Hoffmann ha scritto: > Hi, > >> However, there is another problem. As the ACPI tables grow, we need >> to move the address at which linuxboot.bin loads the initrd. This >> address is placed close to the end of memory, but it is QEMU that >> tells linuxboot.bin where exactly the initrd is to be loaded. And >> QEMU cannot really know how much high memory SeaBIOS will use, because >> QEMU does not know the final e820 memory map. >> >> The solution would be to let linuxboot.bin parse the memory map and >> ignore the suggested initrd base address, but that's tedious. In the >> meanwhile, we can just assume that most of the need comes from the ACPI >> tables (which is in fact true: patch 3 adds a fixed 32k extra just in >> case) and dynamically resize the padding. > > Hmm. That assumes we are running seabios, where we know how much memory > we actually need. > > IMHO we should either really parse the memory map, or reserve more > space. > > IIRC it doesn't matter that much where we load the initrd. It should > not be just after the kernel, because the kernel needs some space to > unpack itself and for early allocations such as initial page tables. > This is where the common practice to load the initrd high comes from. > But whenever we leave 128k or 16m between initrd and top-of-memory > doesn't make much of a difference. Ok, I wrote the e820 scanning code, and it works with KVM but it hits a TCG bug. The rep/movsb in SeaBIOS's e820 routine just doesn't write to es:di. The TCG ops seem sane: set_label $0x1 ext16u_i64 tmp2,rsi ld_i64 tmp3,env,$0x108 // load ds base add_i64 tmp2,tmp2,tmp3 ext32u_i64 tmp2,tmp2 qemu_ld_i64 tmp0,tmp2,ub,$0x2 // load into tmp0 ext16u_i64 tmp2,rdi ld_i64 tmp3,env,$0xc0 // load es base add_i64 tmp2,tmp2,tmp3 ext32u_i64 tmp2,tmp2 qemu_st_i64 tmp0,tmp2,ub,$0x2 // store from tmp0 ld32s_i64 tmp0,env,$0xac // increase rsi/rdi add_i64 tmp3,rsi,tmp0 deposit_i64 rsi,rsi,tmp3,$0x0,$0x10 add_i64 tmp3,rdi,tmp0 deposit_i64 rdi,rdi,tmp3,$0x0,$0x10 movi_i64 tmp13,$0xffffffffffffffff // decrement rcx add_i64 tmp3,rcx,tmp13 deposit_i64 rcx,rcx,tmp3,$0x0,$0x10 goto_tb $0x0 movi_i64 tmp3,$0xf7b4 st_i64 tmp3,env,$0x80 exit_tb $0x7fe8a2c167a0 set_label $0x0 exit_tb $0x7fe8a2c167a3 For now I'm giving up, here is the patch just in case. It also fails with 2.1.1. There is some debugging output that goes to the serial port. With KVM it prints 1/2/2/1/2/2, while with TCG it prints 0/0/0/0/0 (it should print 1/2/2/1/2 instead). diff --git a/pc-bios/optionrom/linuxboot.S b/pc-bios/optionrom/linuxboot.S index 748c831..e6f1be1 100644 --- a/pc-bios/optionrom/linuxboot.S +++ b/pc-bios/optionrom/linuxboot.S @@ -76,6 +76,96 @@ boot_kernel: copy_kernel: + push %ds + pop %es + + /* Compute initrd address */ + mov $0xe801, %ax + xor %cx, %cx + xor %dx, %dx + int $0x15 + + /* Output could be in AX/BX or CX/DX */ + or %cx, %cx + jnz 1f + or %dx, %dx + jnz 1f + mov %ax, %cx + mov %bx, %dx +1: + + or %dx, %dx + jnz 2f + addw $1024, %cx /* add 1 MB */ + movzwl %cx, %ebp + shll $10, %ebp /* convert to bytes */ + jmp mmap_loop_start + +2: + addw $16777216 >> 16, %dx /* add 16 MB */ + movzwl %dx, %ebp + shll $16, %ebp /* convert to bytes */ + + /* EBP (end of memory) is a hint to the loop below, that computes the + final location using the e820 memory map. O(n^2) loop, but e820 + is small anyway. */ + +mmap_loop_start: + movl %ebp, %esi /* ESI = end of memory */ + + read_fw FW_CFG_INITRD_SIZE + subl %eax, %ebp /* EBP = start of initrd */ + andl $-4096, %ebp + + xor %ebx, %ebx + + /* now move it further down according to the indications of the e820 + memory map... */ +mmap_loop: + mov $0xe820, %ax + mov $0x534D4150, %edx + mov $24, %ecx + mov $e820, %edi + int $0x15 + jc mmap_done /* if at end of list, we're done */ + cmp $0x534D4150, %eax /* if BIOS broken, exit */ + jnz mmap_done + or %ebx, %ebx /* another check for end of list */ + jz mmap_done + +mov 16(%di), %al +mov $0x3f8, %dx +add $0x30, %al +out %al, %dx +mov $0xd, %al +out %al, %dx +mov $0xa, %al +out %al, %dx + + jcxz mmap_loop /* ignore empty entries */ + cmpb $1, 16(%di) /* only process reserved regions */ + je mmap_loop + cmpl $0, 4(%di) /* only process low memory */ + jne mmap_loop + cmpl %esi, 0(%di) + jae mmap_loop + + movl 8(%di), %ecx /* ECX = region size */ + jecxz mmap_loop /* ignore empty regions */ + + /* Valid low memory region. Check if it overlaps EBP..ESI */ + + addl 0(%di), %ecx /* ECX = end of region */ + cmp %ebp, %ecx /* not if end <= initrd_start */ + jbe mmap_loop + + /* Cannot put initrd here, try lowering the top of memory */ + + movl 0(%di), %ebp + jmp mmap_loop_start + +mmap_done: + mov %ebp, %edi /* EDI = start of initrd */ /* We need to load the kernel into memory we can't access in 16 bit mode, so let's get into 32 bit mode, write the kernel and jump @@ -108,10 +198,18 @@ copy_kernel: /* We're now running in 16-bit CS, but 32-bit ES! */ /* Load kernel and initrd */ + pushl %edi + read_fw_blob_addr32_edi(FW_CFG_INITRD) read_fw_blob_addr32(FW_CFG_KERNEL) - read_fw_blob_addr32(FW_CFG_INITRD) read_fw_blob_addr32(FW_CFG_CMDLINE) - read_fw_blob_addr32(FW_CFG_SETUP) + + read_fw FW_CFG_SETUP_ADDR + mov %eax, %edi + mov %eax, %ebx + read_fw_blob_addr32_edi(FW_CFG_SETUP) + + /* Update the header with the initrd address we chose above */ + popl %es:0x218(%ebx) /* And now jump into Linux! */ mov $0, %eax @@ -136,4 +234,9 @@ gdt: /* 0x10: data segment (base=0, limit=0xfffff, type=32bit data read/write, DPL=0, 4k) */ .byte 0xff, 0xff, 0x00, 0x00, 0x00, 0x92, 0xcf, 0x00 +e820: +.byte 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 +.byte 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 +.byte 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 + BOOT_ROM_END diff --git a/pc-bios/optionrom/optionrom.h b/pc-bios/optionrom/optionrom.h index ce43608..f1a9021 100644 --- a/pc-bios/optionrom/optionrom.h +++ b/pc-bios/optionrom/optionrom.h @@ -51,8 +51,6 @@ .endm #define read_fw_blob_pre(var) \ - read_fw var ## _ADDR; \ - mov %eax, %edi; \ read_fw var ## _SIZE; \ mov %eax, %ecx; \ mov $var ## _DATA, %ax; \ @@ -68,6 +66,8 @@ * Clobbers: %eax, %edx, %es, %ecx, %edi */ #define read_fw_blob(var) \ + read_fw var ## _ADDR; \ + mov %eax, %edi; \ read_fw_blob_pre(var); \ /* old as(1) doesn't like this insn so emit the bytes instead: \ rep insb (%dx), %es:(%edi); \ @@ -80,7 +80,22 @@ * * Clobbers: %eax, %edx, %es, %ecx, %edi */ -#define read_fw_blob_addr32(var) \ +#define read_fw_blob_addr32(var) \ + read_fw var ## _ADDR; \ + mov %eax, %edi; \ + read_fw_blob_pre(var); \ + /* old as(1) doesn't like this insn so emit the bytes instead: \ + addr32 rep insb (%dx), %es:(%edi); \ + */ \ + .dc.b 0x67,0xf3,0x6c + +/* + * Read a blob from the fw_cfg device in forced addr32 mode, address is in %edi. + * Requires _SIZE and _DATA values for the parameter. + * + * Clobbers: %eax, %edx, %edi, %es, %ecx + */ +#define read_fw_blob_addr32_edi(var) \ read_fw_blob_pre(var); \ /* old as(1) doesn't like this insn so emit the bytes instead: \ addr32 rep insb (%dx), %es:(%edi); \ ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [Qemu-devel] [PATCH 0/6] pc: bring ACPI table size below to 2.0 levels, try fixing -initrd for good 2014-09-18 16:17 [Qemu-devel] [PATCH 0/6] pc: bring ACPI table size below to 2.0 levels, try fixing -initrd for good Paolo Bonzini ` (6 preceding siblings ...) 2014-09-19 7:36 ` [Qemu-devel] [PATCH 0/6] pc: bring ACPI table size below to 2.0 levels, try fixing -initrd for good Gerd Hoffmann @ 2014-10-02 12:11 ` Michael S. Tsirkin 2014-10-02 12:44 ` Paolo Bonzini 2014-10-02 13:30 ` Paolo Bonzini 7 siblings, 2 replies; 20+ messages in thread From: Michael S. Tsirkin @ 2014-10-02 12:11 UTC (permalink / raw) To: Paolo Bonzini; +Cc: jsnow, qemu-devel On Thu, Sep 18, 2014 at 06:17:48PM +0200, Paolo Bonzini wrote: > In the emergency last-minute patches of QEMU 2.1 we did two things: > > - fixed migration problems from 1.7 or 2.0 to 2.1 due to changes in > ACPI table sizes > > - ensured that future versions will not break migration compatibility > with 2.2 for reasonable configurations (with ACPI tables smaller > than a hundred kilobytes, roughly) > > However, this came at the cost of wasting 128 KB unconditionally on > even the smaller configuration, and we didn't provide a mechanism to > ensure compatibility with larger configurations. > > This series provides this mechanism. As mentioned early, the design > is to consider the SSDT immutable and versioned (together with other > non-AML tables such as HPET, TPMA and MADT, SRAT, MCFG, DMAR). > The DSDT instead can change more or less arbitrarily. To do this, > we add padding after the DSDT to allow for future growth. I'm not sure I got this one. Why is this padding more robust than the one we have now? > Once we do this, the size of the ACPI table fw_cfg "file" is constant > given a machine type and a command-line, so we do not need anymore the > larger 128KB padding. > > This is done in patches 4-6. > > However, there is another problem. As the ACPI tables grow, we need > to move the address at which linuxboot.bin loads the initrd. This > address is placed close to the end of memory, but it is QEMU that > tells linuxboot.bin where exactly the initrd is to be loaded. And > QEMU cannot really know how much high memory SeaBIOS will use, because > QEMU does not know the final e820 memory map. > > The solution would be to let linuxboot.bin parse the memory map and > ignore the suggested initrd base address, but that's tedious. In the > meanwhile, we can just assume that most of the need comes from the ACPI > tables (which is in fact true: patch 3 adds a fixed 32k extra just in > case) and dynamically resize the padding. > > This is what patches 1-3 do. The nice part is that they also remove > some differences between Xen and other accelerators. I would appreciate > Xen testing from interested people. > > Thanks, > > Paolo I'm not inclined to apply this for 2.2. Summarizing what you say, there are two issues around ACPI tables: - linuxboot uses FW CFG to for memory allocations, seabios ignores that, so they might conflict. Let's fix either linuxboot or seabios (or both!) and forget about it. - table size changes cause cross version migration issues this is really due to the fact we are using RAM to migrate ACPI tables. IMHO a more robust fix would be to allow RAM size to change during migration, or to avoid using RAM, switch to another type of object. So both issues have other solutions, and I think it's a good idea to focus on them for now. Also, I really would like to avoid having ACPI sizing-related issues for this release. The memory of 2.1.X pain is too fresh :) I'm not NACKing this patchset, but let's make some progress on the bigger issues listed above, then come back and address sizing as appropriate. Thanks! > > Paolo Bonzini (6): > pc: initialize fw_cfg earlier > pc: load the kernel after ACPI tables are built > pc: redo sizing of reserved high memory area for -kernel/-initrd > pc: introduce new ACPI table sizing algorithm > pc: go back to smaller ACPI tables > pc: clean up pre-2.1 compatibility code > > hw/i386/acpi-build.c | 23 +++++++++------- > hw/i386/pc.c | 72 +++++++++++++++++---------------------------------- > hw/i386/pc_piix.c | 32 ++++++++++++++-------- > hw/i386/pc_q35.c | 7 ++-- > include/hw/i386/pc.h | 4 ++ > 5 files changed, 66 insertions(+), 72 deletions(-) ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Qemu-devel] [PATCH 0/6] pc: bring ACPI table size below to 2.0 levels, try fixing -initrd for good 2014-10-02 12:11 ` Michael S. Tsirkin @ 2014-10-02 12:44 ` Paolo Bonzini 2014-10-02 13:30 ` Paolo Bonzini 1 sibling, 0 replies; 20+ messages in thread From: Paolo Bonzini @ 2014-10-02 12:44 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: jsnow, qemu-devel Il 02/10/2014 14:11, Michael S. Tsirkin ha scritto: > On Thu, Sep 18, 2014 at 06:17:48PM +0200, Paolo Bonzini wrote: >> In the emergency last-minute patches of QEMU 2.1 we did two things: >> >> - fixed migration problems from 1.7 or 2.0 to 2.1 due to changes in >> ACPI table sizes >> >> - ensured that future versions will not break migration compatibility >> with 2.2 for reasonable configurations (with ACPI tables smaller >> than a hundred kilobytes, roughly) >> >> However, this came at the cost of wasting 128 KB unconditionally on >> even the smaller configuration, and we didn't provide a mechanism to >> ensure compatibility with larger configurations. >> >> This series provides this mechanism. As mentioned early, the design >> is to consider the SSDT immutable and versioned (together with other >> non-AML tables such as HPET, TPMA and MADT, SRAT, MCFG, DMAR). >> The DSDT instead can change more or less arbitrarily. To do this, >> we add padding after the DSDT to allow for future growth. > > I'm not sure I got this one. > Why is this padding more robust than the one we have now? Because the current one applies to all ACPI tables. After this patch, the padding only applies to the "fixed" tables (FACS, FADT, DSDT). With the old algorithm, a small change in the DSDT, combined with the "right" size of the MADT/SSDT, could cause a change in the rounded size. With the new algorithm, any change in the DSDT will either break migration or will not, independent of the content of the SSDT. Example with the old algorithm (up to 2.0): old version of QEMU new version of QEMU fixed table size 4000 4100 variable table size 4100 4100 total size 8100 8100 4k-rounded size 8192 12288 In 2.1 we changed the rounding to 128k, but the fundamental problem remains and that is why we warn for ACPI tables whose size is above 64k. The same example with new algorithm: old version of QEMU new version of QEMU fixed table size 4000 4100 padded to 16384 16384 variable table size 4100 4100 total size 17484 17484 4k-rounded size 20480 20480 Paolo >> Once we do this, the size of the ACPI table fw_cfg "file" is constant >> given a machine type and a command-line, so we do not need anymore the >> larger 128KB padding. >> >> This is done in patches 4-6. >> >> However, there is another problem. As the ACPI tables grow, we need >> to move the address at which linuxboot.bin loads the initrd. This >> address is placed close to the end of memory, but it is QEMU that >> tells linuxboot.bin where exactly the initrd is to be loaded. And >> QEMU cannot really know how much high memory SeaBIOS will use, because >> QEMU does not know the final e820 memory map. >> >> The solution would be to let linuxboot.bin parse the memory map and >> ignore the suggested initrd base address, but that's tedious. In the >> meanwhile, we can just assume that most of the need comes from the ACPI >> tables (which is in fact true: patch 3 adds a fixed 32k extra just in >> case) and dynamically resize the padding. >> >> This is what patches 1-3 do. The nice part is that they also remove >> some differences between Xen and other accelerators. I would appreciate >> Xen testing from interested people. >> >> Thanks, >> >> Paolo > > > I'm not inclined to apply this for 2.2. > > Summarizing what you say, there are two issues around ACPI tables: > - linuxboot uses FW CFG to for memory allocations, > seabios ignores that, so they might conflict. > Let's fix either linuxboot or seabios (or both!) > and forget about it. > > - table size changes cause cross version migration issues > this is really due to the fact we are using RAM > to migrate ACPI tables. > IMHO a more robust fix would be to allow RAM size to change > during migration, or to avoid using RAM, switch to another type of > object. > > So both issues have other solutions, and I think it's a good > idea to focus on them for now. > Also, I really would like to avoid having ACPI sizing-related > issues for this release. The memory of 2.1.X pain is too fresh :) > I'm not NACKing this patchset, but let's > make some progress on the bigger issues listed above, then come > back and address sizing as appropriate. > > Thanks! > >> >> Paolo Bonzini (6): >> pc: initialize fw_cfg earlier >> pc: load the kernel after ACPI tables are built >> pc: redo sizing of reserved high memory area for -kernel/-initrd >> pc: introduce new ACPI table sizing algorithm >> pc: go back to smaller ACPI tables >> pc: clean up pre-2.1 compatibility code >> >> hw/i386/acpi-build.c | 23 +++++++++------- >> hw/i386/pc.c | 72 +++++++++++++++++---------------------------------- >> hw/i386/pc_piix.c | 32 ++++++++++++++-------- >> hw/i386/pc_q35.c | 7 ++-- >> include/hw/i386/pc.h | 4 ++ >> 5 files changed, 66 insertions(+), 72 deletions(-) > > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Qemu-devel] [PATCH 0/6] pc: bring ACPI table size below to 2.0 levels, try fixing -initrd for good 2014-10-02 12:11 ` Michael S. Tsirkin 2014-10-02 12:44 ` Paolo Bonzini @ 2014-10-02 13:30 ` Paolo Bonzini 2014-10-02 13:41 ` Michael S. Tsirkin 1 sibling, 1 reply; 20+ messages in thread From: Paolo Bonzini @ 2014-10-02 13:30 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: jsnow, qemu-devel Il 02/10/2014 14:11, Michael S. Tsirkin ha scritto: > Summarizing what you say, there are two issues around ACPI tables: > - linuxboot uses FW CFG to for memory allocations, > seabios ignores that, so they might conflict. > Let's fix either linuxboot or seabios (or both!) > and forget about it. We can fix linuxboot, it is easy. These patches do fix John's scenario, but that is not the main issue. They are not an _attempt_ to fix it, they just do so more or less by chance. Their real purpose is fixing the second issue: > - table size changes cause cross version migration issues > this is really due to the fact we are using RAM > to migrate ACPI tables. > IMHO a more robust fix would be to allow RAM size to change > during migration, or to avoid using RAM, switch to another type of > object. Allowing fw_cfg size to change during migration (does not matter if it is stored in RAM or otherwise) is a huge can of worms because the host might have loaded the size and stored it somewhere, way before migration. Extreme example: the guest could expect the size to remain the same at boot time and S3 resume time. So I think the fw_cfg size is guest ABI and cannot change across migration anyway. > So both issues have other solutions, and I think it's a good > idea to focus on them for now. > Also, I really would like to avoid having ACPI sizing-related > issues for this release. The memory of 2.1.X pain is too fresh :) Yeah, I understand that. But I think the scary part of this series is actually the first two patches, rather than the ACPI sizing algorithm. Paolo > I'm not NACKing this patchset, but let's > make some progress on the bigger issues listed above, then come > back and address sizing as appropriate. > > Thanks! > >> >> Paolo Bonzini (6): >> pc: initialize fw_cfg earlier >> pc: load the kernel after ACPI tables are built >> pc: redo sizing of reserved high memory area for -kernel/-initrd >> pc: introduce new ACPI table sizing algorithm >> pc: go back to smaller ACPI tables >> pc: clean up pre-2.1 compatibility code >> >> hw/i386/acpi-build.c | 23 +++++++++------- >> hw/i386/pc.c | 72 +++++++++++++++++---------------------------------- >> hw/i386/pc_piix.c | 32 ++++++++++++++-------- >> hw/i386/pc_q35.c | 7 ++-- >> include/hw/i386/pc.h | 4 ++ >> 5 files changed, 66 insertions(+), 72 deletions(-) ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Qemu-devel] [PATCH 0/6] pc: bring ACPI table size below to 2.0 levels, try fixing -initrd for good 2014-10-02 13:30 ` Paolo Bonzini @ 2014-10-02 13:41 ` Michael S. Tsirkin 2014-10-02 13:43 ` Paolo Bonzini 0 siblings, 1 reply; 20+ messages in thread From: Michael S. Tsirkin @ 2014-10-02 13:41 UTC (permalink / raw) To: Paolo Bonzini; +Cc: jsnow, qemu-devel On Thu, Oct 02, 2014 at 03:30:57PM +0200, Paolo Bonzini wrote: > Il 02/10/2014 14:11, Michael S. Tsirkin ha scritto: > > Summarizing what you say, there are two issues around ACPI tables: > > - linuxboot uses FW CFG to for memory allocations, > > seabios ignores that, so they might conflict. > > Let's fix either linuxboot or seabios (or both!) > > and forget about it. > > We can fix linuxboot, it is easy. > > These patches do fix John's scenario, but that is not the main issue. > They are not an _attempt_ to fix it, they just do so more or less by > chance. Their real purpose is fixing the second issue: > > > - table size changes cause cross version migration issues > > this is really due to the fact we are using RAM > > to migrate ACPI tables. > > IMHO a more robust fix would be to allow RAM size to change > > during migration, or to avoid using RAM, switch to another type of > > object. > > Allowing fw_cfg size to change during migration (does not matter if it > is stored in RAM or otherwise) is a huge can of worms because the host > might have loaded the size and stored it somewhere, way before migration. Right. I'm not suggesting it. I suggest migrating fw cfg size instead. > Extreme example: the guest could expect the size to remain the same at > boot time and S3 resume time. AFAIK ACPI tables aren't re-read from QEMU at S3 resume. So guest will always see the same tables that are currently in RAM. > So I think the fw_cfg size is guest ABI and cannot change across > migration anyway. But this is already the case, this is not the issue. The issue is that incoming migration might have a different fw_cfg size from what we have. I think migrating this value will solve the issue in a cleaner way. > > > So both issues have other solutions, and I think it's a good > > idea to focus on them for now. > > Also, I really would like to avoid having ACPI sizing-related > > issues for this release. The memory of 2.1.X pain is too fresh :) > > Yeah, I understand that. But I think the scary part of this series is > actually the first two patches, rather than the ACPI sizing algorithm. > > Paolo > > > I'm not NACKing this patchset, but let's > > make some progress on the bigger issues listed above, then come > > back and address sizing as appropriate. > > > > Thanks! > > > >> > >> Paolo Bonzini (6): > >> pc: initialize fw_cfg earlier > >> pc: load the kernel after ACPI tables are built > >> pc: redo sizing of reserved high memory area for -kernel/-initrd > >> pc: introduce new ACPI table sizing algorithm > >> pc: go back to smaller ACPI tables > >> pc: clean up pre-2.1 compatibility code > >> > >> hw/i386/acpi-build.c | 23 +++++++++------- > >> hw/i386/pc.c | 72 +++++++++++++++++---------------------------------- > >> hw/i386/pc_piix.c | 32 ++++++++++++++-------- > >> hw/i386/pc_q35.c | 7 ++-- > >> include/hw/i386/pc.h | 4 ++ > >> 5 files changed, 66 insertions(+), 72 deletions(-) ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Qemu-devel] [PATCH 0/6] pc: bring ACPI table size below to 2.0 levels, try fixing -initrd for good 2014-10-02 13:41 ` Michael S. Tsirkin @ 2014-10-02 13:43 ` Paolo Bonzini 2014-10-02 13:49 ` Michael S. Tsirkin 0 siblings, 1 reply; 20+ messages in thread From: Paolo Bonzini @ 2014-10-02 13:43 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: jsnow, qemu-devel Il 02/10/2014 15:41, Michael S. Tsirkin ha scritto: > On Thu, Oct 02, 2014 at 03:30:57PM +0200, Paolo Bonzini wrote: >> These patches do fix John's scenario, but that is not the main issue. >> They are not an _attempt_ to fix it, they just do so more or less by >> chance. Their real purpose is fixing the second issue: >> >>> - table size changes cause cross version migration issues >>> this is really due to the fact we are using RAM >>> to migrate ACPI tables. >>> IMHO a more robust fix would be to allow RAM size to change >>> during migration, or to avoid using RAM, switch to another type of >>> object. >> >> Allowing fw_cfg size to change during migration (does not matter if it >> is stored in RAM or otherwise) is a huge can of worms because the host >> might have loaded the size and stored it somewhere, way before migration. > > Right. I'm not suggesting it. I suggest migrating fw cfg size instead. > > The issue is that incoming migration might have a different > fw_cfg size from what we have. Understood now. > I think migrating this value will solve the issue in a cleaner way. Perhaps. The question is whether it would complicate the forwards-migration code beyond what is sane. I think we are practically speaking stuck with RAM. Paolo ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Qemu-devel] [PATCH 0/6] pc: bring ACPI table size below to 2.0 levels, try fixing -initrd for good 2014-10-02 13:43 ` Paolo Bonzini @ 2014-10-02 13:49 ` Michael S. Tsirkin 2014-10-06 13:42 ` Paolo Bonzini 0 siblings, 1 reply; 20+ messages in thread From: Michael S. Tsirkin @ 2014-10-02 13:49 UTC (permalink / raw) To: Paolo Bonzini; +Cc: jsnow, qemu-devel On Thu, Oct 02, 2014 at 03:43:35PM +0200, Paolo Bonzini wrote: > Il 02/10/2014 15:41, Michael S. Tsirkin ha scritto: > > On Thu, Oct 02, 2014 at 03:30:57PM +0200, Paolo Bonzini wrote: > >> These patches do fix John's scenario, but that is not the main issue. > >> They are not an _attempt_ to fix it, they just do so more or less by > >> chance. Their real purpose is fixing the second issue: > >> > >>> - table size changes cause cross version migration issues > >>> this is really due to the fact we are using RAM > >>> to migrate ACPI tables. > >>> IMHO a more robust fix would be to allow RAM size to change > >>> during migration, or to avoid using RAM, switch to another type of > >>> object. > >> > >> Allowing fw_cfg size to change during migration (does not matter if it > >> is stored in RAM or otherwise) is a huge can of worms because the host > >> might have loaded the size and stored it somewhere, way before migration. > > > > Right. I'm not suggesting it. I suggest migrating fw cfg size instead. > > > > The issue is that incoming migration might have a different > > fw_cfg size from what we have. > > Understood now. > > > I think migrating this value will solve the issue in a cleaner way. > > Perhaps. The question is whether it would complicate the > forwards-migration code beyond what is sane. I think we are practically > speaking stuck with RAM. > > Paolo Migrating RAM size is actually useful too, I think someone asked for it. -- MST ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Qemu-devel] [PATCH 0/6] pc: bring ACPI table size below to 2.0 levels, try fixing -initrd for good 2014-10-02 13:49 ` Michael S. Tsirkin @ 2014-10-06 13:42 ` Paolo Bonzini 2014-10-06 13:52 ` Michael S. Tsirkin 0 siblings, 1 reply; 20+ messages in thread From: Paolo Bonzini @ 2014-10-06 13:42 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: jsnow, qemu-devel Il 02/10/2014 15:49, Michael S. Tsirkin ha scritto: >>> The issue is that incoming migration might have a different >>> fw_cfg size from what we have. >> >> Understood now. >> >>> I think migrating this value will solve the issue in a cleaner way. >> >> Perhaps. The question is whether it would complicate the >> forwards-migration code beyond what is sane. I think we are practically >> speaking stuck with RAM. > > Migrating RAM size is actually useful too, I think someone asked for it. Migrating RAM size was discussed for BIOS and option ROMs, in order to support migration from old versions of QEMU. It was floated around for some time, but ultimately we ended up shipping two copies of affected firmware (128k/256k BIOS, and non-EFI/EFI option ROMs). For BIOS it wouldn't be enough, because the BIOS size affects the memory map. Of course ACPI tables aren't mapped anywhere, but I'd be wary of adding code to migration that is half-broken and almost never used. Also, RAM blocks that have different size would be yet another thing that makes machine types "almost compatible" with the QEMU version they're supposed to represent. In a scenario similar to John's, with mutable RAM sizes, would have likely broken all machine types, because we would not have bothered doing full backwards-compatibility. I'm not an advocate of backwards bug-compatibility, but I think RAM block sizes are way beyond the line of what we should be allowed to modify between machine types. Paolo ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Qemu-devel] [PATCH 0/6] pc: bring ACPI table size below to 2.0 levels, try fixing -initrd for good 2014-10-06 13:42 ` Paolo Bonzini @ 2014-10-06 13:52 ` Michael S. Tsirkin 2014-10-06 13:55 ` Paolo Bonzini 0 siblings, 1 reply; 20+ messages in thread From: Michael S. Tsirkin @ 2014-10-06 13:52 UTC (permalink / raw) To: Paolo Bonzini; +Cc: jsnow, qemu-devel On Mon, Oct 06, 2014 at 03:42:01PM +0200, Paolo Bonzini wrote: > Il 02/10/2014 15:49, Michael S. Tsirkin ha scritto: > >>> The issue is that incoming migration might have a different > >>> fw_cfg size from what we have. > >> > >> Understood now. > >> > >>> I think migrating this value will solve the issue in a cleaner way. > >> > >> Perhaps. The question is whether it would complicate the > >> forwards-migration code beyond what is sane. I think we are practically > >> speaking stuck with RAM. > > > > Migrating RAM size is actually useful too, I think someone asked for it. > > Migrating RAM size was discussed for BIOS and option ROMs, in order to > support migration from old versions of QEMU. It was floated around for > some time, but ultimately we ended up shipping two copies of affected > firmware (128k/256k BIOS, and non-EFI/EFI option ROMs). > > For BIOS it wouldn't be enough, because the BIOS size affects the memory > map. Of course ACPI tables aren't mapped anywhere, but I'd be wary of > adding code to migration that is half-broken and almost never used. > > Also, RAM blocks that have different size would be yet another thing > that makes machine types "almost compatible" with the QEMU version > they're supposed to represent. In a scenario similar to John's, with > mutable RAM sizes, would have likely broken all machine types, because > we would not have bothered doing full backwards-compatibility. > > I'm not an advocate of backwards bug-compatibility, but I think RAM > block sizes are way beyond the line of what we should be allowed to > modify between machine types. > > Paolo Maybe we should just modify ACPI and rom files in general to use something else, not RAM? It looked like a good fit initially so we went ahead with it, but these things are fairly small, so it's not a problem to migrate them as part of the device state. -- MST ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Qemu-devel] [PATCH 0/6] pc: bring ACPI table size below to 2.0 levels, try fixing -initrd for good 2014-10-06 13:52 ` Michael S. Tsirkin @ 2014-10-06 13:55 ` Paolo Bonzini 2014-10-06 14:12 ` Michael S. Tsirkin 0 siblings, 1 reply; 20+ messages in thread From: Paolo Bonzini @ 2014-10-06 13:55 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: jsnow, qemu-devel Il 06/10/2014 15:52, Michael S. Tsirkin ha scritto: > Maybe we should just modify ACPI and rom files in general to use > something else, not RAM? > It looked like a good fit initially so we went ahead with it, > but these things are fairly small, so it's not a problem to > migrate them as part of the device state. Yes, that would have been a good design too, but I'm not sure it's worth changing it now. So far, it has certainly helped us; it both revealed bugs (though a bit too late) and kept us honest. These patches would be easy to revert from now till 2.2 release, they're just an incremental improvement on top of 2.1.2. I'll post a v2 that includes the linuxboot changes to look at the memory map, and keep the 160k padding for the sake of old linuxboot option ROMs being migrated to 2.2. Paolo ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Qemu-devel] [PATCH 0/6] pc: bring ACPI table size below to 2.0 levels, try fixing -initrd for good 2014-10-06 13:55 ` Paolo Bonzini @ 2014-10-06 14:12 ` Michael S. Tsirkin 0 siblings, 0 replies; 20+ messages in thread From: Michael S. Tsirkin @ 2014-10-06 14:12 UTC (permalink / raw) To: Paolo Bonzini; +Cc: jsnow, qemu-devel On Mon, Oct 06, 2014 at 03:55:49PM +0200, Paolo Bonzini wrote: > Il 06/10/2014 15:52, Michael S. Tsirkin ha scritto: > > Maybe we should just modify ACPI and rom files in general to use > > something else, not RAM? > > It looked like a good fit initially so we went ahead with it, > > but these things are fairly small, so it's not a problem to > > migrate them as part of the device state. > > Yes, that would have been a good design too, but I'm not sure it's worth > changing it now. So far, it has certainly helped us; it both revealed > bugs (though a bit too late) and kept us honest. Bugs that surface too late is exactly what worries me. I'll try to look at how hard to implement the above change is. > These patches would be easy to revert from now till 2.2 release, they're > just an incremental improvement on top of 2.1.2. > > I'll post a v2 that includes the linuxboot changes to look at the memory > map, Pls keep linuxboot things a separate patchset. > and keep the 160k padding for the sake of old linuxboot option ROMs > being migrated to 2.2. > > Paolo I think we can drop this for 2.2 machine type. We can also drop the fw cfg file for new machine types. -- MST ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2014-10-06 14:09 UTC | newest] Thread overview: 20+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-09-18 16:17 [Qemu-devel] [PATCH 0/6] pc: bring ACPI table size below to 2.0 levels, try fixing -initrd for good Paolo Bonzini 2014-09-18 16:17 ` [Qemu-devel] [PATCH 1/6] pc: initialize fw_cfg earlier Paolo Bonzini 2014-09-18 16:17 ` [Qemu-devel] [PATCH 2/6] pc: load the kernel after ACPI tables are built Paolo Bonzini 2014-09-18 16:17 ` [Qemu-devel] [PATCH 3/6] pc: redo sizing of reserved high memory area for -kernel/-initrd Paolo Bonzini 2014-09-18 16:17 ` [Qemu-devel] [PATCH 4/6] pc: introduce new ACPI table sizing algorithm Paolo Bonzini 2014-09-18 16:17 ` [Qemu-devel] [PATCH 5/6] pc: go back to smaller ACPI tables Paolo Bonzini 2014-09-18 16:17 ` [Qemu-devel] [PATCH 6/6] pc: clean up pre-2.1 compatibility code Paolo Bonzini 2014-09-19 7:36 ` [Qemu-devel] [PATCH 0/6] pc: bring ACPI table size below to 2.0 levels, try fixing -initrd for good Gerd Hoffmann 2014-09-19 8:06 ` Paolo Bonzini 2014-09-19 13:09 ` Paolo Bonzini 2014-10-02 12:11 ` Michael S. Tsirkin 2014-10-02 12:44 ` Paolo Bonzini 2014-10-02 13:30 ` Paolo Bonzini 2014-10-02 13:41 ` Michael S. Tsirkin 2014-10-02 13:43 ` Paolo Bonzini 2014-10-02 13:49 ` Michael S. Tsirkin 2014-10-06 13:42 ` Paolo Bonzini 2014-10-06 13:52 ` Michael S. Tsirkin 2014-10-06 13:55 ` Paolo Bonzini 2014-10-06 14:12 ` Michael S. Tsirkin
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).