qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: "Daniel Henrique Barboza" <dbarboza@ventanamicro.com>,
	"Björn Töpel" <bjorn@kernel.org>,
	"Palmer Dabbelt" <palmer@dabbelt.com>,
	"Alistair Francis" <alistair.francis@wdc.com>,
	"Bin Meng" <bmeng.cn@gmail.com>,
	"Weiwei Li" <liwei1518@gmail.com>,
	"Liu Zhiwei" <zhiwei_liu@linux.alibaba.com>,
	qemu-riscv@nongnu.org, qemu-devel@nongnu.org,
	"Atish Patra" <atishp@atishpatra.org>,
	"Atish Patra" <atishp@rivosinc.com>
Cc: "Björn Töpel" <bjorn@rivosinc.com>,
	"Sunil V L" <sunilvl@ventanamicro.com>,
	"Santosh Mamila" <santosh.mamila@catalinasystems.io>,
	"Chethan Seshadri" <Chethan.Seshadri@catalinasystems.io>,
	"Sivakumar Munnangi" <siva.munnangi@catalinasystems.io>
Subject: Re: [PATCH v2 1/3] hw/riscv/virt: Add memory hotplugging and virtio-md-pci support
Date: Fri, 24 May 2024 17:02:23 +0200	[thread overview]
Message-ID: <51ef570c-da63-4e25-9c48-dbdf8a40a34d@redhat.com> (raw)
In-Reply-To: <55810d52-0360-40ad-a8d2-3b6a8aa220ae@ventanamicro.com>

On 24.05.24 15:14, Daniel Henrique Barboza wrote:
> 
> 
> On 5/21/24 07:56, Björn Töpel wrote:
>> From: Björn Töpel <bjorn@rivosinc.com>
>>
>> Virtio-based memory devices (virtio-mem/virtio-pmem) allows for
>> dynamic resizing of virtual machine memory, and requires proper
>> hotplugging (add/remove) support to work.
>>
>> Add device memory support for RISC-V "virt" machine, and enable
>> virtio-md-pci with the corresponding missing hotplugging callbacks.
>>
>> Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
>> ---
>>    hw/riscv/Kconfig       |  2 +
>>    hw/riscv/virt.c        | 83 +++++++++++++++++++++++++++++++++++++++++-
>>    hw/virtio/virtio-mem.c |  5 ++-
>>    3 files changed, 87 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/riscv/Kconfig b/hw/riscv/Kconfig
>> index a2030e3a6ff0..08f82dbb681a 100644
>> --- a/hw/riscv/Kconfig
>> +++ b/hw/riscv/Kconfig
>> @@ -56,6 +56,8 @@ config RISCV_VIRT
>>        select PLATFORM_BUS
>>        select ACPI
>>        select ACPI_PCI
>> +    select VIRTIO_MEM_SUPPORTED
>> +    select VIRTIO_PMEM_SUPPORTED
>>    
>>    config SHAKTI_C
>>        bool
>> diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
>> index 4fdb66052587..443902f919d2 100644
>> --- a/hw/riscv/virt.c
>> +++ b/hw/riscv/virt.c
>> @@ -53,6 +53,8 @@
>>    #include "hw/pci-host/gpex.h"
>>    #include "hw/display/ramfb.h"
>>    #include "hw/acpi/aml-build.h"
>> +#include "hw/mem/memory-device.h"
>> +#include "hw/virtio/virtio-mem-pci.h"
>>    #include "qapi/qapi-visit-common.h"
>>    #include "hw/virtio/virtio-iommu.h"
>>    
>> @@ -1407,6 +1409,7 @@ static void virt_machine_init(MachineState *machine)
>>        DeviceState *mmio_irqchip, *virtio_irqchip, *pcie_irqchip;
>>        int i, base_hartid, hart_count;
>>        int socket_count = riscv_socket_count(machine);
>> +    hwaddr device_memory_base, device_memory_size;
>>    
>>        /* Check socket count limit */
>>        if (VIRT_SOCKETS_MAX < socket_count) {
>> @@ -1420,6 +1423,12 @@ static void virt_machine_init(MachineState *machine)
>>            exit(1);
>>        }
>>    
>> +    if (machine->ram_slots > ACPI_MAX_RAM_SLOTS) {
>> +        error_report("unsupported amount of memory slots: %"PRIu64,
>> +                     machine->ram_slots);
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>>        /* Initialize sockets */
>>        mmio_irqchip = virtio_irqchip = pcie_irqchip = NULL;
>>        for (i = 0; i < socket_count; i++) {
>> @@ -1553,6 +1562,37 @@ static void virt_machine_init(MachineState *machine)
>>        memory_region_add_subregion(system_memory, memmap[VIRT_MROM].base,
>>                                    mask_rom);
>>    
>> +    /* device memory */
>> +    device_memory_base = ROUND_UP(s->memmap[VIRT_DRAM].base + machine->ram_size,
>> +                                  GiB);
>> +    device_memory_size = machine->maxram_size - machine->ram_size;
>> +    if (device_memory_size > 0) {
>> +        /*
>> +         * Each DIMM is aligned based on the backend's alignment value.
>> +         * Assume max 1G hugepage alignment per slot.
>> +         */
>> +        device_memory_size += machine->ram_slots * GiB;
> 
> We don't need to align to 1GiB. This calc can use 2MiB instead (or 4MiB if we're
> running 32 bits).
> 
>> +
>> +        if (riscv_is_32bit(&s->soc[0])) {
>> +            hwaddr memtop = device_memory_base + ROUND_UP(device_memory_size,
>> +                                                          GiB);
> 
> Same here - alignment is 2/4 MiB.
> 
>> +
>> +            if (memtop > UINT32_MAX) {
>> +                error_report("memory exceeds 32-bit limit by %lu bytes",
>> +                             memtop - UINT32_MAX);
>> +                exit(EXIT_FAILURE);
>> +            }
>> +        }
>> +
>> +        if (device_memory_base + device_memory_size < device_memory_size) {
>> +            error_report("unsupported amount of device memory");
>> +            exit(EXIT_FAILURE);
>> +        }
> 
> Took another look and found this a bit strange. These are all unsigned vars, so
> if (unsigned a + unsigned b < unsigned b) will always be 'false'. The compiler is
> probably cropping this out.

No. Unsigned interger overflow is defined behavior and this is a common 
check to detect such overflow. tI's consistent with what we do for other 
architectures.

> 
> The calc we need to do is to ensure that the extra ram_slots * alignment will fit into
> the VIRT_DRAM block, i.e. maxram_size + (ram_slots * alignment) < memmap[VIRT_DRAM].size.
> 
> 
> TBH I'm starting to have second thoughts about letting users hotplug whatever they want.
> It seems cleaner to just force the 2/4 Mb alignment in pre_plug() and be done with it,
> no need to allocate ram_slots * alignment and doing all these extra checks.

It's worth noting that if user space decides to specify addresses 
manually, it can mess up everything already. There are other events that 
can result in fragmentation of the memory device area (repeated 
hot(un)plug of differing DIMMs).

Assume you have 1 GiB range and hotplug a 512 MiB DIMM at offset 256 
MiB. You won't be able to hotplug another 512 MiB DIMM even though we 
reserved space.

My take so far is: if the user wants to do such stuff it should size the 
area (maxmem) much larger or deal with the concequences (not being able 
to hotplug memory).

It usually does not happen in practice ...

> 
> As I sent in an earlier email, users must already comply to the alignment of the host
> memory when plugging pc-dimms, so I'm not sure our value/proposition with all this
> extra code is worth it - the alignment will most likely be forced by the host memory
> backend, so might as well force ourselves in pre_plug().

At least on x86-64, the 2 MiB alignment requirement is handled 
automatically. QEMU_VMALLOC_ALIGN implicitly enforces that.

Maybe RISCV also wants to set QEMU_VMALLOC_ALIGN?

-- 
Cheers,

David / dhildenb



  reply	other threads:[~2024-05-24 15:02 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-21 10:56 [PATCH v2 0/3] RISC-V virt MHP support Björn Töpel
2024-05-21 10:56 ` [PATCH v2 1/3] hw/riscv/virt: Add memory hotplugging and virtio-md-pci support Björn Töpel
2024-05-24  8:53   ` Daniel Henrique Barboza
2024-05-27 11:49     ` Björn Töpel
2024-05-24 13:14   ` Daniel Henrique Barboza
2024-05-24 15:02     ` David Hildenbrand [this message]
2024-05-24 15:29       ` Daniel Henrique Barboza
2024-05-27  7:19         ` David Hildenbrand
2024-05-27 12:03       ` Björn Töpel
2024-05-27 13:19         ` Daniel Henrique Barboza
2024-05-21 10:56 ` [PATCH v2 2/3] hw/riscv/virt-acpi-build: Expose device memory in ACPI SRAT Björn Töpel
2024-05-21 10:56 ` [PATCH v2 3/3] hw/riscv/virt: Add ACPI GED and PC-DIMM MHP support Björn Töpel
2024-05-24  9:10   ` Daniel Henrique Barboza
2024-05-27 12:08     ` Björn Töpel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=51ef570c-da63-4e25-9c48-dbdf8a40a34d@redhat.com \
    --to=david@redhat.com \
    --cc=Chethan.Seshadri@catalinasystems.io \
    --cc=alistair.francis@wdc.com \
    --cc=atishp@atishpatra.org \
    --cc=atishp@rivosinc.com \
    --cc=bjorn@kernel.org \
    --cc=bjorn@rivosinc.com \
    --cc=bmeng.cn@gmail.com \
    --cc=dbarboza@ventanamicro.com \
    --cc=liwei1518@gmail.com \
    --cc=palmer@dabbelt.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-riscv@nongnu.org \
    --cc=santosh.mamila@catalinasystems.io \
    --cc=siva.munnangi@catalinasystems.io \
    --cc=sunilvl@ventanamicro.com \
    --cc=zhiwei_liu@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).