From: "Michael S. Tsirkin" <mst@redhat.com>
To: Igor Mammedov <imammedo@redhat.com>
Cc: qemu-devel@nongnu.org, peter.maydell@linaro.org,
philmd@linaro.org, wangyanan55@huawei.com, pbonzini@redhat.com,
richard.henderson@linaro.org, anisinha@redhat.com,
qemu-arm@nongnu.org
Subject: Re: [PATCH] smbios: make memory device size configurable per Machine
Date: Thu, 11 Jul 2024 07:13:27 -0400
Message-ID: <20240711071054-mutt-send-email-mst@kernel.org>
In-Reply-To: <20240711074822.3384344-1-imammedo@redhat.com>

On Thu, Jul 11, 2024 at 09:48:22AM +0200, Igor Mammedov wrote:
> Currently the SMBIOS maximum memory device chunk is capped at 16 GiB,
> which is fine in most cases (QEMU uses it to describe initial
> RAM (type 17 SMBIOS table entries)).
> However, when starting a guest with terabytes of RAM, this leads to
> too many memory device structures, which eventually upsets the Linux
> kernel: it reserves only 64K for these entries, and once that
> limit is crossed it runs out of reserved memory.
>
> Instead of partitioning initial RAM into 16 GiB chunks, use the maximum
> possible chunk size that the SMBIOS spec allows[1], which lets us
> encode RAM in MiB units in 31 bits of a uint32_t field (up to 2047 TiB).
> As a result, initial RAM will generate only one type 17 structure
> until hosts/guests are able to use more RAM in the future.
>
> Compat changes:
> We can't unconditionally change the chunk size as that would break
> the QEMU<->guest ABI (and migration). Thus introduce a new machine class
> field that lets older versioned machine types keep using 16 GiB chunks
> while new machine types use the maximum possible chunk size.
>
> While it might seem risky to raise the max entry size this much
> (far beyond what current physical RAM modules support),
> I don't expect it to cause many issues, modulo uncovering bugs
> in software running within the guest. Those should be fixed
> on the guest side to handle the SMBIOS spec properly, especially if
> the guest is expected to support such huge RAM configs.
> In the worst case, QEMU can reduce the chunk size later if we
> care enough about introducing a workaround for some 'unfixable'
> guest OS, either by fixing up the next machine type or
> giving users a CLI option to customize it.
>
> 1) SMBIOS 3.1.0 7.18.5 Memory Device — Extended Size
>
> PS:
> * tested on an 8 TiB host with a RHEL6 guest, which seems to parse
> type 17 SMBIOS table entries correctly (according to 'dmidecode').
>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> include/hw/boards.h | 4 ++++
> hw/arm/virt.c | 1 +
> hw/core/machine.c | 1 +
> hw/i386/pc_piix.c | 1 +
> hw/i386/pc_q35.c | 1 +
> hw/smbios/smbios.c | 11 ++++++-----
> 6 files changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/include/hw/boards.h b/include/hw/boards.h
> index ef6f18f2c1..48ff6d8b93 100644
> --- a/include/hw/boards.h
> +++ b/include/hw/boards.h
> @@ -237,6 +237,9 @@ typedef struct {
> * purposes only.
> * Applies only to default memory backend, i.e., explicit memory backend
> * wasn't used.
> + * @smbios_memory_device_size:
> + * Default size of memory device,
> + * SMBIOS 3.1.0 "7.18 Memory Device (Type 17)"
Maybe it would be better to just make this a boolean,
and put the spec-related logic in smbios.c?
What do you think?
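
To make the suggestion concrete, here is a rough sketch of what the
boolean variant could look like. Everything named below is hypothetical
and for illustration only: the flag smbios_legacy_memory_device_size,
the MachineClassSketch struct and the helper are not existing QEMU
code. The 16 GiB and 2047 TiB values are taken from the patch; the
limit macro reflects my reading of SMBIOS 3.1.0 7.18.5 (bits 30:0 of
Extended Size hold the size in MiB).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MiB (1ULL << 20)
#define GiB (1ULL << 30)
#define TiB (1ULL << 40)

/* Largest value the 31-bit Extended Size field can hold, in MiB. */
#define SMBIOS_EXT_SIZE_MAX_MIB ((1ULL << 31) - 1)

/* Chunk sizes used by the patch for old and new machine types. */
#define SMBIOS_LEGACY_DEVICE_SIZE (16ULL * GiB)
#define SMBIOS_NEW_DEVICE_SIZE    (2047ULL * TiB)

/* Hypothetical stand-in for the relevant part of MachineClass. */
typedef struct {
    /* Set by versioned machine types that must keep 16 GiB chunks. */
    bool smbios_legacy_memory_device_size;
} MachineClassSketch;

/* The spec-related choice lives next to the SMBIOS code, not in boards.h. */
static inline uint64_t smbios_memory_device_size(const MachineClassSketch *mc)
{
    return mc->smbios_legacy_memory_device_size ? SMBIOS_LEGACY_DEVICE_SIZE
                                                : SMBIOS_NEW_DEVICE_SIZE;
}
```

With this shape the compat hunks would only set the flag to true for
9.0 and older machine types, and the magic numbers would stay in one
place.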
> */
> struct MachineClass {
> /*< private >*/
> @@ -304,6 +307,7 @@ struct MachineClass {
> const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState *machine);
> int64_t (*get_default_cpu_node_id)(const MachineState *ms, int idx);
> ram_addr_t (*fixup_ram_size)(ram_addr_t size);
> + uint64_t smbios_memory_device_size;
> };
>
> /**
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index b0c68d66a3..719e83e6a1 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -3308,6 +3308,7 @@ DEFINE_VIRT_MACHINE_AS_LATEST(9, 1)
> static void virt_machine_9_0_options(MachineClass *mc)
> {
> virt_machine_9_1_options(mc);
> + mc->smbios_memory_device_size = 16 * GiB;
> compat_props_add(mc->compat_props, hw_compat_9_0, hw_compat_9_0_len);
> }
> DEFINE_VIRT_MACHINE(9, 0)
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index bc38cad7f2..3cfdaec65d 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -1004,6 +1004,7 @@ static void machine_class_init(ObjectClass *oc, void *data)
> /* Default 128 MB as guest ram size */
> mc->default_ram_size = 128 * MiB;
> mc->rom_file_has_mr = true;
> + mc->smbios_memory_device_size = 2047 * TiB;
>
> /* numa node memory size aligned on 8MB by default.
> * On Linux, each node's border has to be 8MB aligned
All these values really should be documented.
And I feel
> diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
> index 9445b07b4f..d9e69243b4 100644
> --- a/hw/i386/pc_piix.c
> +++ b/hw/i386/pc_piix.c
> @@ -495,6 +495,7 @@ static void pc_i440fx_machine_9_0_options(MachineClass *m)
> pc_i440fx_machine_9_1_options(m);
> m->alias = NULL;
> m->is_default = false;
> + m->smbios_memory_device_size = 16 * GiB;
>
> compat_props_add(m->compat_props, hw_compat_9_0, hw_compat_9_0_len);
> compat_props_add(m->compat_props, pc_compat_9_0, pc_compat_9_0_len);
> diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
> index 71d3c6d122..9d108b194e 100644
> --- a/hw/i386/pc_q35.c
> +++ b/hw/i386/pc_q35.c
> @@ -374,6 +374,7 @@ static void pc_q35_machine_9_0_options(MachineClass *m)
> PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
> pc_q35_machine_9_1_options(m);
> m->alias = NULL;
> + m->smbios_memory_device_size = 16 * GiB;
> compat_props_add(m->compat_props, hw_compat_9_0, hw_compat_9_0_len);
> compat_props_add(m->compat_props, pc_compat_9_0, pc_compat_9_0_len);
> pcmc->isa_bios_alias = false;
> diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c
> index 3b7703489d..a394514264 100644
> --- a/hw/smbios/smbios.c
> +++ b/hw/smbios/smbios.c
> @@ -1093,6 +1093,7 @@ static bool smbios_get_tables_ep(MachineState *ms,
> Error **errp)
> {
> unsigned i, dimm_cnt, offset;
> + MachineClass *mc = MACHINE_GET_CLASS(ms);
> ERRP_GUARD();
>
> assert(ep_type == SMBIOS_ENTRY_POINT_TYPE_32 ||
> @@ -1123,12 +1124,12 @@ static bool smbios_get_tables_ep(MachineState *ms,
> smbios_build_type_9_table(errp);
> smbios_build_type_11_table();
>
> -#define MAX_DIMM_SZ (16 * GiB)
> -#define GET_DIMM_SZ ((i < dimm_cnt - 1) ? MAX_DIMM_SZ \
> - : ((current_machine->ram_size - 1) % MAX_DIMM_SZ) + 1)
> +#define GET_DIMM_SZ ((i < dimm_cnt - 1) ? mc->smbios_memory_device_size \
> + : ((current_machine->ram_size - 1) % mc->smbios_memory_device_size) + 1)
>
> - dimm_cnt = QEMU_ALIGN_UP(current_machine->ram_size, MAX_DIMM_SZ) /
> - MAX_DIMM_SZ;
> + dimm_cnt = QEMU_ALIGN_UP(current_machine->ram_size,
> + mc->smbios_memory_device_size) /
> + mc->smbios_memory_device_size;
>
> /*
> * The offset determines if we need to keep additional space between
> --
> 2.43.0
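
For reference, the chunking arithmetic in the smbios.c hunk can be
checked in isolation. The following is a standalone sketch, not QEMU
code: dimm_count() and dimm_size() are hypothetical helpers that mirror
the QEMU_ALIGN_UP(...) / chunk expression and the GET_DIMM_SZ macro,
and they assume ram_size > 0.

```c
#include <assert.h>
#include <stdint.h>

#define GiB (1ULL << 30)
#define TiB (1ULL << 40)

/* Number of type 17 structures: ram_size / chunk, rounded up. */
static inline uint64_t dimm_count(uint64_t ram_size, uint64_t chunk)
{
    return (ram_size + chunk - 1) / chunk;
}

/*
 * Size reported by structure i: every structure but the last is a full
 * chunk, the last one carries the remainder (or a full chunk when
 * ram_size is an exact multiple of chunk).
 */
static inline uint64_t dimm_size(uint64_t ram_size, uint64_t chunk, uint64_t i)
{
    uint64_t cnt = dimm_count(ram_size, chunk);

    return (i < cnt - 1) ? chunk : ((ram_size - 1) % chunk) + 1;
}
```

With these helpers, an 8 TiB guest yields 512 structures with 16 GiB
chunks and a single structure with the 2047 TiB chunk, which is the
reduction the commit message describes.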
Thread overview: 9+ messages
2024-07-11 7:48 [PATCH] smbios: make memory device size configurable per Machine Igor Mammedov
2024-07-11 8:19 ` Philippe Mathieu-Daudé
2024-07-11 8:42 ` Igor Mammedov
2025-02-04 21:46 ` Philippe Mathieu-Daudé
2024-07-11 8:43 ` Daniel P. Berrangé
2024-07-11 9:17 ` Igor Mammedov
2024-07-11 11:13 ` Michael S. Tsirkin [this message]
2024-07-11 13:05 ` Igor Mammedov
2024-07-20 19:36 ` Michael S. Tsirkin