* [Qemu-devel] [RFC PATCH v1 0/4] Refactoring pc_dimm_plug and NUMA node lookup API
@ 2015-06-12  9:00 Bharata B Rao
  2015-06-12  9:00 ` [Qemu-devel] [RFC PATCH v1 1/4] pc, pc-dimm: Factor out reusable parts in pc_dimm_plug to a separate routine Bharata B Rao
  ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread

From: Bharata B Rao @ 2015-06-12 9:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, imammedo, Bharata B Rao, ehabkost, david

Hi,

This is the next version of the NUMA lookup API v0 that I posted earlier.
In this version, I have added a patch to factor out generic code from
pc_dimm_plug() so that the same can be used by other architectures. I
combined the NUMA lookup API and this patch together since they are
related and touch common code.

This version is based on the feedback I received for my v0 post:

https://lists.gnu.org/archive/html/qemu-devel/2015-05/msg01078.html

Bharata B Rao (4):
  pc,pc-dimm: Factor out reusable parts in pc_dimm_plug to a separate
    routine
  numa,pc-dimm: Store pc-dimm memory information in numa_info
  numa: Store boot memory address range in node_info
  numa: API to lookup NUMA node by address

 hw/i386/acpi-build.c     |  2 +-
 hw/i386/pc.c             | 90 +++++++---------------------------------
 hw/mem/pc-dimm.c         | 84 ++++++++++++++++++++++++++++++++++++++++
 include/hw/i386/pc.h     |  4 +--
 include/hw/mem/pc-dimm.h |  9 +++++
 include/sysemu/numa.h    | 11 ++++++
 numa.c                   | 82 +++++++++++++++++++++++++++++++++++++++
 7 files changed, 206 insertions(+), 76 deletions(-)

--
2.1.0

^ permalink raw reply	[flat|nested] 13+ messages in thread
* [Qemu-devel] [RFC PATCH v1 1/4] pc, pc-dimm: Factor out reusable parts in pc_dimm_plug to a separate routine
  2015-06-12  9:00 [Qemu-devel] [RFC PATCH v1 0/4] Refactoring pc_dimm_plug and NUMA node lookup API Bharata B Rao
@ 2015-06-12  9:00 ` Bharata B Rao
  2015-06-15  6:32   ` David Gibson
  2015-06-12  9:00 ` [Qemu-devel] [RFC PATCH v1 2/4] numa, pc-dimm: Store pc-dimm memory information in numa_info Bharata B Rao
  ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread

From: Bharata B Rao @ 2015-06-12 9:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, imammedo, Bharata B Rao, ehabkost, david

pc_dimm_plug() has code that will be needed for memory plug handlers
in other archs too. Extract code from pc_dimm_plug() into a generic
routine pc_dimm_memory_plug() that resides in pc-dimm.c. Also
correspondingly refactor re-usable unplug code into pc_dimm_memory_unplug().

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
---
 hw/i386/acpi-build.c     |  2 +-
 hw/i386/pc.c             | 90 +++++++---------------------------------
 hw/mem/pc-dimm.c         | 80 ++++++++++++++++++++++++++++++++++++++++
 include/hw/i386/pc.h     |  4 +--
 include/hw/mem/pc-dimm.h |  9 +++++
 5 files changed, 109 insertions(+), 76 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index b71e942..5f6fa95 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -1512,7 +1512,7 @@ build_srat(GArray *table_data, GArray *linker, PcGuestInfo *guest_info)
      */
     if (hotplugabble_address_space_size) {
         numamem = acpi_data_push(table_data, sizeof *numamem);
-        acpi_build_srat_memory(numamem, pcms->hotplug_memory_base,
+        acpi_build_srat_memory(numamem, pcms->hotplug_memory.base,
                                hotplugabble_address_space_size, 0,
                                MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 3f0d435..c869588 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -64,7 +64,6 @@
 #include "hw/pci/pci_host.h"
 #include "acpi-build.h"
 #include "hw/mem/pc-dimm.h"
-#include "trace.h"
 #include "qapi/visitor.h"
 #include "qapi-visit.h"
@@ -1297,7 +1296,7 @@ FWCfgState *pc_memory_init(MachineState *machine,
             exit(EXIT_FAILURE);
         }

-        pcms->hotplug_memory_base =
+        pcms->hotplug_memory.base =
             ROUND_UP(0x100000000ULL + above_4g_mem_size, 1ULL << 30);

         if (pcms->enforce_aligned_dimm) {
@@ -1305,17 +1304,17 @@ FWCfgState *pc_memory_init(MachineState *machine,
             hotplug_mem_size += (1ULL << 30) * machine->ram_slots;
         }

-        if ((pcms->hotplug_memory_base + hotplug_mem_size) <
+        if ((pcms->hotplug_memory.base + hotplug_mem_size) <
             hotplug_mem_size) {
             error_report("unsupported amount of maximum memory: " RAM_ADDR_FMT,
                          machine->maxram_size);
             exit(EXIT_FAILURE);
         }

-        memory_region_init(&pcms->hotplug_memory, OBJECT(pcms),
+        memory_region_init(&pcms->hotplug_memory.mr, OBJECT(pcms),
                            "hotplug-memory", hotplug_mem_size);
-        memory_region_add_subregion(system_memory, pcms->hotplug_memory_base,
-                                    &pcms->hotplug_memory);
+        memory_region_add_subregion(system_memory, pcms->hotplug_memory.base,
+                                    &pcms->hotplug_memory.mr);
     }

     /* Initialize PC system firmware */
@@ -1333,9 +1332,9 @@ FWCfgState *pc_memory_init(MachineState *machine,
     fw_cfg = bochs_bios_init();
     rom_set_fw(fw_cfg);

-    if (guest_info->has_reserved_memory && pcms->hotplug_memory_base) {
+    if (guest_info->has_reserved_memory && pcms->hotplug_memory.base) {
         uint64_t *val = g_malloc(sizeof(*val));
-        *val = cpu_to_le64(ROUND_UP(pcms->hotplug_memory_base, 0x1ULL << 30));
+        *val = cpu_to_le64(ROUND_UP(pcms->hotplug_memory.base, 0x1ULL << 30));
         fw_cfg_add_file(fw_cfg, "etc/reserved-memory-end", val, sizeof(*val));
     }

@@ -1554,20 +1553,17 @@ void ioapic_init_gsi(GSIState *gsi_state, const char *parent_name)
 static void pc_dimm_plug(HotplugHandler *hotplug_dev,
                          DeviceState *dev, Error **errp)
 {
-    int slot;
     HotplugHandlerClass *hhc;
     Error *local_err = NULL;
     PCMachineState *pcms = PC_MACHINE(hotplug_dev);
-    MachineState *machine = MACHINE(hotplug_dev);
     PCDIMMDevice *dimm = PC_DIMM(dev);
     PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
     MemoryRegion *mr = ddc->get_memory_region(dimm);
-    uint64_t existing_dimms_capacity = 0;
     uint64_t align = TARGET_PAGE_SIZE;
-    uint64_t addr;

-    addr = object_property_get_int(OBJECT(dimm), PC_DIMM_ADDR_PROP, &local_err);
-    if (local_err) {
+    if (!pcms->acpi_dev) {
+        error_setg(&local_err,
+                   "memory hotplug is not enabled: missing acpi device");
         goto out;
     }

@@ -1575,67 +1571,18 @@ static void pc_dimm_plug(HotplugHandler *hotplug_dev,
         align = memory_region_get_alignment(mr);
     }

-    addr = pc_dimm_get_free_addr(pcms->hotplug_memory_base,
-                                 memory_region_size(&pcms->hotplug_memory),
-                                 !addr ? NULL : &addr, align,
-                                 memory_region_size(mr), &local_err);
-    if (local_err) {
-        goto out;
-    }
-
-    existing_dimms_capacity = pc_existing_dimms_capacity(&local_err);
-    if (local_err) {
-        goto out;
-    }
-
-    if (existing_dimms_capacity + memory_region_size(mr) >
-        machine->maxram_size - machine->ram_size) {
-        error_setg(&local_err, "not enough space, currently 0x%" PRIx64
-                   " in use of total hot pluggable 0x" RAM_ADDR_FMT,
-                   existing_dimms_capacity,
-                   machine->maxram_size - machine->ram_size);
-        goto out;
-    }
-
-    object_property_set_int(OBJECT(dev), addr, PC_DIMM_ADDR_PROP, &local_err);
+    pc_dimm_memory_plug(dev, &pcms->hotplug_memory, mr, align, &local_err);
     if (local_err) {
+        pc_dimm_memory_unplug(dev, &pcms->hotplug_memory, mr);
         goto out;
     }
-    trace_mhp_pc_dimm_assigned_address(addr);

-    slot = object_property_get_int(OBJECT(dev), PC_DIMM_SLOT_PROP, &local_err);
-    if (local_err) {
-        goto out;
-    }
-
-    slot = pc_dimm_get_free_slot(slot == PC_DIMM_UNASSIGNED_SLOT ? NULL : &slot,
-                                 machine->ram_slots, &local_err);
-    if (local_err) {
-        goto out;
-    }
-    object_property_set_int(OBJECT(dev), slot, PC_DIMM_SLOT_PROP, &local_err);
+    hhc = HOTPLUG_HANDLER_GET_CLASS(pcms->acpi_dev);
+    hhc->plug(HOTPLUG_HANDLER(pcms->acpi_dev), dev, &local_err);
     if (local_err) {
-        goto out;
+        pc_dimm_memory_unplug(dev, &pcms->hotplug_memory, mr);
     }
-    trace_mhp_pc_dimm_assigned_slot(slot);

-    if (!pcms->acpi_dev) {
-        error_setg(&local_err,
-                   "memory hotplug is not enabled: missing acpi device");
-        goto out;
-    }
-
-    if (kvm_enabled() && !kvm_has_free_slot(machine)) {
-        error_setg(&local_err, "hypervisor has no free memory slots left");
-        goto out;
-    }
-
-    memory_region_add_subregion(&pcms->hotplug_memory,
-                                addr - pcms->hotplug_memory_base, mr);
-    vmstate_register_ram(mr, dev);
-
-    hhc = HOTPLUG_HANDLER_GET_CLASS(pcms->acpi_dev);
-    hhc->plug(HOTPLUG_HANDLER(pcms->acpi_dev), dev, &local_err);
 out:
     error_propagate(errp, local_err);
 }
@@ -1677,11 +1624,8 @@ static void pc_dimm_unplug(HotplugHandler *hotplug_dev,
         goto out;
     }

-    memory_region_del_subregion(&pcms->hotplug_memory, mr);
-    vmstate_unregister_ram(mr, dev);
-
+    pc_dimm_memory_unplug(dev, &pcms->hotplug_memory, mr);
     object_unparent(OBJECT(dev));
-
 out:
     error_propagate(errp, local_err);
 }
@@ -1766,7 +1710,7 @@ pc_machine_get_hotplug_memory_region_size(Object *obj, Visitor *v, void *opaque,
                                           const char *name, Error **errp)
 {
     PCMachineState *pcms = PC_MACHINE(obj);
-    int64_t value = memory_region_size(&pcms->hotplug_memory);
+    int64_t value = memory_region_size(&pcms->hotplug_memory.mr);

     visit_type_int(v, &value, name, errp);
 }
diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index e70633d..98971b7 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -23,12 +23,92 @@
 #include "qapi/visitor.h"
 #include "qemu/range.h"
 #include "sysemu/numa.h"
+#include "sysemu/kvm.h"
+#include "trace.h"

 typedef struct pc_dimms_capacity {
      uint64_t size;
      Error    **errp;
 } pc_dimms_capacity;

+void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
+                         MemoryRegion *mr, uint64_t align, Error **errp)
+{
+    int slot;
+    MachineState *machine = MACHINE(qdev_get_machine());
+    PCDIMMDevice *dimm = PC_DIMM(dev);
+    Error *local_err = NULL;
+    uint64_t existing_dimms_capacity = 0;
+    uint64_t addr;
+
+    addr = object_property_get_int(OBJECT(dimm), PC_DIMM_ADDR_PROP, &local_err);
+    if (local_err) {
+        goto out;
+    }
+
+    addr = pc_dimm_get_free_addr(hpms->base,
+                                 memory_region_size(&hpms->mr),
+                                 !addr ? NULL : &addr, align,
+                                 memory_region_size(mr), &local_err);
+    if (local_err) {
+        goto out;
+    }
+
+    existing_dimms_capacity = pc_existing_dimms_capacity(&local_err);
+    if (local_err) {
+        goto out;
+    }
+
+    if (existing_dimms_capacity + memory_region_size(mr) >
+        machine->maxram_size - machine->ram_size) {
+        error_setg(&local_err, "not enough space, currently 0x%" PRIx64
+                   " in use of total hot pluggable 0x" RAM_ADDR_FMT,
+                   existing_dimms_capacity,
+                   machine->maxram_size - machine->ram_size);
+        goto out;
+    }
+
+    object_property_set_int(OBJECT(dev), addr, PC_DIMM_ADDR_PROP, &local_err);
+    if (local_err) {
+        goto out;
+    }
+    trace_mhp_pc_dimm_assigned_address(addr);
+
+    slot = object_property_get_int(OBJECT(dev), PC_DIMM_SLOT_PROP, &local_err);
+    if (local_err) {
+        goto out;
+    }
+
+    slot = pc_dimm_get_free_slot(slot == PC_DIMM_UNASSIGNED_SLOT ? NULL : &slot,
+                                 machine->ram_slots, &local_err);
+    if (local_err) {
+        goto out;
+    }
+    object_property_set_int(OBJECT(dev), slot, PC_DIMM_SLOT_PROP, &local_err);
+    if (local_err) {
+        goto out;
+    }
+    trace_mhp_pc_dimm_assigned_slot(slot);
+
+    if (kvm_enabled() && !kvm_has_free_slot(machine)) {
+        error_setg(&local_err, "hypervisor has no free memory slots left");
+        goto out;
+    }
+
+    memory_region_add_subregion(&hpms->mr, addr - hpms->base, mr);
+    vmstate_register_ram(mr, dev);
+
+out:
+    error_propagate(errp, local_err);
+}
+
+void pc_dimm_memory_unplug(DeviceState *dev, MemoryHotplugState *hpms,
+                           MemoryRegion *mr)
+{
+    memory_region_del_subregion(&hpms->mr, mr);
+    vmstate_unregister_ram(mr, dev);
+}
+
 static int pc_existing_dimms_capacity_internal(Object *obj, void *opaque)
 {
     pc_dimms_capacity *cap = opaque;
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 86c5651..6628791 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -15,6 +15,7 @@
 #include "hw/pci/pci.h"
 #include "hw/boards.h"
 #include "hw/compat.h"
+#include "hw/mem/pc-dimm.h"

 #define HPET_INTCAP "hpet-intcap"

@@ -32,8 +33,7 @@ struct PCMachineState {
     MachineState parent_obj;

     /* <public> */
-    ram_addr_t hotplug_memory_base;
-    MemoryRegion hotplug_memory;
+    MemoryHotplugState hotplug_memory;

     HotplugHandler *acpi_dev;
     ISADevice *rtc;
diff --git a/include/hw/mem/pc-dimm.h b/include/hw/mem/pc-dimm.h
index f7b80b4..9fbab16 100644
--- a/include/hw/mem/pc-dimm.h
+++ b/include/hw/mem/pc-dimm.h
@@ -70,6 +70,11 @@ typedef struct PCDIMMDeviceClass {
     MemoryRegion *(*get_memory_region)(PCDIMMDevice *dimm);
 } PCDIMMDeviceClass;

+typedef struct MemoryHotplugState {
+    ram_addr_t base;
+    MemoryRegion mr;
+} MemoryHotplugState;
+
 uint64_t pc_dimm_get_free_addr(uint64_t address_space_start,
                                uint64_t address_space_size,
                                uint64_t *hint, uint64_t align, uint64_t size,
@@ -79,4 +84,8 @@ int pc_dimm_get_free_slot(const int *hint, int max_slots, Error **errp);

 int qmp_pc_dimm_device_list(Object *obj, void *opaque);
 uint64_t pc_existing_dimms_capacity(Error **errp);
+void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
+                         MemoryRegion *mr, uint64_t align, Error **errp);
+void pc_dimm_memory_unplug(DeviceState *dev, MemoryHotplugState *hpms,
+                           MemoryRegion *mr);
 #endif
--
2.1.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [RFC PATCH v1 1/4] pc, pc-dimm: Factor out reusable parts in pc_dimm_plug to a separate routine
  2015-06-12  9:00 ` [Qemu-devel] [RFC PATCH v1 1/4] pc, pc-dimm: Factor out reusable parts in pc_dimm_plug to a separate routine Bharata B Rao
@ 2015-06-15  6:32   ` David Gibson
  0 siblings, 0 replies; 13+ messages in thread

From: David Gibson @ 2015-06-15 6:32 UTC (permalink / raw)
  To: Bharata B Rao; +Cc: pbonzini, qemu-devel, ehabkost, imammedo

[-- Attachment #1: Type: text/plain, Size: 672 bytes --]

On Fri, Jun 12, 2015 at 02:30:25PM +0530, Bharata B Rao wrote:
> pc_dimm_plug() has code that will be needed for memory plug handlers
> in other archs too. Extract code from pc_dimm_plug() into a generic
> routine pc_dimm_memory_plug() that resides in pc-dimm.c. Also
> correspondingly refactor re-usable unplug code into pc_dimm_memory_unplug().
>
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

--
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread
* [Qemu-devel] [RFC PATCH v1 2/4] numa, pc-dimm: Store pc-dimm memory information in numa_info
  2015-06-12  9:00 [Qemu-devel] [RFC PATCH v1 0/4] Refactoring pc_dimm_plug and NUMA node lookup API Bharata B Rao
  2015-06-12  9:00 ` [Qemu-devel] [RFC PATCH v1 1/4] pc, pc-dimm: Factor out reusable parts in pc_dimm_plug to a separate routine Bharata B Rao
@ 2015-06-12  9:00 ` Bharata B Rao
  2015-06-15  6:34   ` David Gibson
  2015-06-15  9:17   ` Igor Mammedov
  2015-06-12  9:00 ` [Qemu-devel] [RFC PATCH v1 3/4] numa: Store boot memory address range in node_info Bharata B Rao
  2015-06-12  9:00 ` [Qemu-devel] [RFC PATCH v1 4/4] numa: API to lookup NUMA node by address Bharata B Rao
  3 siblings, 2 replies; 13+ messages in thread

From: Bharata B Rao @ 2015-06-12 9:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, imammedo, Bharata B Rao, ehabkost, david

Start storing the (start_addr, size, nodeid) of the pc-dimm memory
in numa_info so that this information can be used to lookup
node by address.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
---
 hw/mem/pc-dimm.c      |  4 ++++
 include/sysemu/numa.h | 10 ++++++++++
 numa.c                | 26 ++++++++++++++++++++++++++
 3 files changed, 40 insertions(+)

diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index 98971b7..bb04862 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -97,6 +97,7 @@ void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,

     memory_region_add_subregion(&hpms->mr, addr - hpms->base, mr);
     vmstate_register_ram(mr, dev);
+    numa_set_mem_node_id(addr, memory_region_size(mr), dimm->node);

 out:
     error_propagate(errp, local_err);
@@ -105,6 +106,9 @@ out:
 void pc_dimm_memory_unplug(DeviceState *dev, MemoryHotplugState *hpms,
                            MemoryRegion *mr)
 {
+    PCDIMMDevice *dimm = PC_DIMM(dev);
+
+    numa_unset_mem_node_id(dimm->addr, memory_region_size(mr), dimm->node);
     memory_region_del_subregion(&hpms->mr, mr);
     vmstate_unregister_ram(mr, dev);
 }
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 6523b4d..7176364 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -10,16 +10,26 @@

 extern int nb_numa_nodes;   /* Number of NUMA nodes */

+struct numa_addr_range {
+    ram_addr_t mem_start;
+    ram_addr_t mem_end;
+    QLIST_ENTRY(numa_addr_range) entry;
+};
+
 typedef struct node_info {
     uint64_t node_mem;
     DECLARE_BITMAP(node_cpu, MAX_CPUMASK_BITS);
     struct HostMemoryBackend *node_memdev;
     bool present;
+    QLIST_HEAD(, numa_addr_range) addr; /* List to store address ranges */
 } NodeInfo;
+
 extern NodeInfo numa_info[MAX_NODES];
 void parse_numa_opts(MachineClass *mc);
 void numa_post_machine_init(void);
 void query_numa_node_mem(uint64_t node_mem[]);
 extern QemuOptsList qemu_numa_opts;
+void numa_set_mem_node_id(ram_addr_t addr, uint64_t size, uint32_t node);
+void numa_unset_mem_node_id(ram_addr_t addr, uint64_t size, uint32_t node);

 #endif
diff --git a/numa.c b/numa.c
index d227ccc..27ca743 100644
--- a/numa.c
+++ b/numa.c
@@ -53,6 +53,28 @@ static int max_numa_nodeid; /* Highest specified NUMA node ID, plus one.
 int nb_numa_nodes;
 NodeInfo numa_info[MAX_NODES];

+void numa_set_mem_node_id(ram_addr_t addr, uint64_t size, uint32_t node)
+{
+    struct numa_addr_range *range = g_malloc0(sizeof(*range));
+
+    range->mem_start = addr;
+    range->mem_end = addr + size;
+    QLIST_INSERT_HEAD(&numa_info[node].addr, range, entry);
+}
+
+void numa_unset_mem_node_id(ram_addr_t addr, uint64_t size, uint32_t node)
+{
+    struct numa_addr_range *range, *next;
+
+    QLIST_FOREACH_SAFE(range, &numa_info[node].addr, entry, next) {
+        if (addr == range->mem_start && (addr + size) == range->mem_end) {
+            QLIST_REMOVE(range, entry);
+            g_free(range);
+            return;
+        }
+    }
+}
+
 static void numa_node_parse(NumaNodeOptions *node, QemuOpts *opts, Error **errp)
 {
     uint16_t nodenr;
@@ -275,6 +297,10 @@ void parse_numa_opts(MachineClass *mc)
     }

     for (i = 0; i < nb_numa_nodes; i++) {
+        QLIST_INIT(&numa_info[i].addr);
+    }
+
+    for (i = 0; i < nb_numa_nodes; i++) {
         if (!bitmap_empty(numa_info[i].node_cpu, MAX_CPUMASK_BITS)) {
             break;
         }
--
2.1.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [RFC PATCH v1 2/4] numa, pc-dimm: Store pc-dimm memory information in numa_info
  2015-06-12  9:00 ` [Qemu-devel] [RFC PATCH v1 2/4] numa, pc-dimm: Store pc-dimm memory information in numa_info Bharata B Rao
@ 2015-06-15  6:34   ` David Gibson
  2015-06-15  9:17   ` Igor Mammedov
  1 sibling, 0 replies; 13+ messages in thread

From: David Gibson @ 2015-06-15 6:34 UTC (permalink / raw)
  To: Bharata B Rao; +Cc: pbonzini, qemu-devel, ehabkost, imammedo

[-- Attachment #1: Type: text/plain, Size: 537 bytes --]

On Fri, Jun 12, 2015 at 02:30:26PM +0530, Bharata B Rao wrote:
> Start storing the (start_addr, size, nodeid) of the pc-dimm memory
> in numa_info so that this information can be used to lookup
> node by address.
>
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

--
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [RFC PATCH v1 2/4] numa, pc-dimm: Store pc-dimm memory information in numa_info
  2015-06-12  9:00 ` [Qemu-devel] [RFC PATCH v1 2/4] numa, pc-dimm: Store pc-dimm memory information in numa_info Bharata B Rao
  2015-06-15  6:34   ` David Gibson
@ 2015-06-15  9:17   ` Igor Mammedov
  2015-06-15 13:04     ` Bharata B Rao
  1 sibling, 1 reply; 13+ messages in thread

From: Igor Mammedov @ 2015-06-15 9:17 UTC (permalink / raw)
  To: Bharata B Rao; +Cc: pbonzini, david, qemu-devel, ehabkost

On Fri, 12 Jun 2015 14:30:26 +0530
Bharata B Rao <bharata@linux.vnet.ibm.com> wrote:

> Start storing the (start_addr, size, nodeid) of the pc-dimm memory
> in numa_info so that this information can be used to lookup
> node by address.
>
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> ---
>  hw/mem/pc-dimm.c      |  4 ++++
>  include/sysemu/numa.h | 10 ++++++++++
>  numa.c                | 26 ++++++++++++++++++++++++++
>  3 files changed, 40 insertions(+)
>
> diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
> index 98971b7..bb04862 100644
> --- a/hw/mem/pc-dimm.c
> +++ b/hw/mem/pc-dimm.c
> @@ -97,6 +97,7 @@ void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
>
>      memory_region_add_subregion(&hpms->mr, addr - hpms->base, mr);
>      vmstate_register_ram(mr, dev);
> +    numa_set_mem_node_id(addr, memory_region_size(mr), dimm->node);
>
>  out:
>      error_propagate(errp, local_err);
> @@ -105,6 +106,9 @@ out:
>  void pc_dimm_memory_unplug(DeviceState *dev, MemoryHotplugState *hpms,
>                             MemoryRegion *mr)
>  {
> +    PCDIMMDevice *dimm = PC_DIMM(dev);
> +
> +    numa_unset_mem_node_id(dimm->addr, memory_region_size(mr), dimm->node);
Wouldn't that cause pc-dimm range appear in SRAT table?
Before this pc-dimm-s are only added as ACPI devices but don't
advertised in SRAT ACPI table.

Perhaps make it up to target to decide if it want's to
report dimms with numa_unset_mem_node_id() and not in generic code.

>      memory_region_del_subregion(&hpms->mr, mr);
>      vmstate_unregister_ram(mr, dev);
>  }
> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
> index 6523b4d..7176364 100644
> --- a/include/sysemu/numa.h
> +++ b/include/sysemu/numa.h
> @@ -10,16 +10,26 @@
>
>  extern int nb_numa_nodes;   /* Number of NUMA nodes */
>
> +struct numa_addr_range {
> +    ram_addr_t mem_start;
> +    ram_addr_t mem_end;
> +    QLIST_ENTRY(numa_addr_range) entry;
> +};
> +
>  typedef struct node_info {
>      uint64_t node_mem;
>      DECLARE_BITMAP(node_cpu, MAX_CPUMASK_BITS);
>      struct HostMemoryBackend *node_memdev;
>      bool present;
> +    QLIST_HEAD(, numa_addr_range) addr; /* List to store address ranges */
>  } NodeInfo;
> +
>  extern NodeInfo numa_info[MAX_NODES];
>  void parse_numa_opts(MachineClass *mc);
>  void numa_post_machine_init(void);
>  void query_numa_node_mem(uint64_t node_mem[]);
>  extern QemuOptsList qemu_numa_opts;
> +void numa_set_mem_node_id(ram_addr_t addr, uint64_t size, uint32_t node);
> +void numa_unset_mem_node_id(ram_addr_t addr, uint64_t size, uint32_t node);
>
>  #endif
> diff --git a/numa.c b/numa.c
> index d227ccc..27ca743 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -53,6 +53,28 @@ static int max_numa_nodeid; /* Highest specified NUMA node ID, plus one.
>  int nb_numa_nodes;
>  NodeInfo numa_info[MAX_NODES];
>
> +void numa_set_mem_node_id(ram_addr_t addr, uint64_t size, uint32_t node)
> +{
> +    struct numa_addr_range *range = g_malloc0(sizeof(*range));
> +
> +    range->mem_start = addr;
> +    range->mem_end = addr + size;
> +    QLIST_INSERT_HEAD(&numa_info[node].addr, range, entry);
> +}
> +
> +void numa_unset_mem_node_id(ram_addr_t addr, uint64_t size, uint32_t node)
> +{
> +    struct numa_addr_range *range, *next;
> +
> +    QLIST_FOREACH_SAFE(range, &numa_info[node].addr, entry, next) {
> +        if (addr == range->mem_start && (addr + size) == range->mem_end) {
> +            QLIST_REMOVE(range, entry);
> +            g_free(range);
> +            return;
> +        }
> +    }
> +}
> +
>  static void numa_node_parse(NumaNodeOptions *node, QemuOpts *opts, Error **errp)
>  {
>      uint16_t nodenr;
> @@ -275,6 +297,10 @@ void parse_numa_opts(MachineClass *mc)
>      }
>
>      for (i = 0; i < nb_numa_nodes; i++) {
> +        QLIST_INIT(&numa_info[i].addr);
> +    }
> +
> +    for (i = 0; i < nb_numa_nodes; i++) {
>          if (!bitmap_empty(numa_info[i].node_cpu, MAX_CPUMASK_BITS)) {
>              break;
>          }

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [RFC PATCH v1 2/4] numa, pc-dimm: Store pc-dimm memory information in numa_info
  2015-06-15  9:17   ` Igor Mammedov
@ 2015-06-15 13:04     ` Bharata B Rao
  0 siblings, 0 replies; 13+ messages in thread

From: Bharata B Rao @ 2015-06-15 13:04 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: pbonzini, david, qemu-devel, ehabkost

On Mon, Jun 15, 2015 at 11:17:59AM +0200, Igor Mammedov wrote:
> On Fri, 12 Jun 2015 14:30:26 +0530
> Bharata B Rao <bharata@linux.vnet.ibm.com> wrote:
>
> > Start storing the (start_addr, size, nodeid) of the pc-dimm memory
> > in numa_info so that this information can be used to lookup
> > node by address.
> >
> > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > ---
> >  hw/mem/pc-dimm.c      |  4 ++++
> >  include/sysemu/numa.h | 10 ++++++++++
> >  numa.c                | 26 ++++++++++++++++++++++++++
> >  3 files changed, 40 insertions(+)
> >
> > diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
> > index 98971b7..bb04862 100644
> > --- a/hw/mem/pc-dimm.c
> > +++ b/hw/mem/pc-dimm.c
> > @@ -97,6 +97,7 @@ void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
> >
> >      memory_region_add_subregion(&hpms->mr, addr - hpms->base, mr);
> >      vmstate_register_ram(mr, dev);
> > +    numa_set_mem_node_id(addr, memory_region_size(mr), dimm->node);
> >
> >  out:
> >      error_propagate(errp, local_err);
> > @@ -105,6 +106,9 @@ out:
> >  void pc_dimm_memory_unplug(DeviceState *dev, MemoryHotplugState *hpms,
> >                             MemoryRegion *mr)
> >  {
> > +    PCDIMMDevice *dimm = PC_DIMM(dev);
> > +
> > +    numa_unset_mem_node_id(dimm->addr, memory_region_size(mr), dimm->node);
> Wouldn't that cause pc-dimm range appear in SRAT table?

I don't think so. numa_set_mem_node_id() and numa_unset_mem_node_id()
APIs store/remove address range and node id information of realized
pc-dimm device in/from a linked list in numa_info structure so that we
can lookup the node id for a given address from numa.c in a
self-contained manner.

So unless I am missing something, I don't see this affecting ACPI/SRAT
table in any way.

> Before this pc-dimm-s are only added as ACPI devices but don't
> advertised in SRAT ACPI table.
>
> Perhaps make it up to target to decide if it want's to
> report dimms with numa_unset_mem_node_id() and not in generic code.

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 13+ messages in thread
* [Qemu-devel] [RFC PATCH v1 3/4] numa: Store boot memory address range in node_info
  2015-06-12  9:00 [Qemu-devel] [RFC PATCH v1 0/4] Refactoring pc_dimm_plug and NUMA node lookup API Bharata B Rao
  2015-06-12  9:00 ` [Qemu-devel] [RFC PATCH v1 1/4] pc, pc-dimm: Factor out reusable parts in pc_dimm_plug to a separate routine Bharata B Rao
  2015-06-12  9:00 ` [Qemu-devel] [RFC PATCH v1 2/4] numa, pc-dimm: Store pc-dimm memory information in numa_info Bharata B Rao
@ 2015-06-12  9:00 ` Bharata B Rao
  2015-06-15  6:35   ` David Gibson
  2015-06-15 16:31   ` Eduardo Habkost
  2015-06-12  9:00 ` [Qemu-devel] [RFC PATCH v1 4/4] numa: API to lookup NUMA node by address Bharata B Rao
  3 siblings, 2 replies; 13+ messages in thread

From: Bharata B Rao @ 2015-06-12 9:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, imammedo, Bharata B Rao, ehabkost, david

Store memory address range information of boot memory in address
range list of numa_info.

This helps to have a common NUMA node lookup by address function that
works for both boot time memory and hotplugged memory.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
---
 numa.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/numa.c b/numa.c
index 27ca743..d67b1fb 100644
--- a/numa.c
+++ b/numa.c
@@ -75,6 +75,26 @@ void numa_unset_mem_node_id(ram_addr_t addr, uint64_t size, uint32_t node)
     }
 }

+static void numa_set_mem_ranges(void)
+{
+    int i;
+    ram_addr_t mem_start, mem_end_prev;
+
+    /*
+     * Deduce start address of each node and use it to store
+     * the address range info in numa_info address range list
+     */
+    for (i = 0; i < nb_numa_nodes; i++) {
+        if (i) {
+            mem_start = mem_end_prev;
+        } else {
+            mem_start = 0;
+        }
+        mem_end_prev = mem_start + numa_info[i].node_mem;
+        numa_set_mem_node_id(mem_start, numa_info[i].node_mem, i);
+    }
+}
+
 static void numa_node_parse(NumaNodeOptions *node, QemuOpts *opts, Error **errp)
 {
     uint16_t nodenr;
@@ -300,6 +320,8 @@ void parse_numa_opts(MachineClass *mc)
         QLIST_INIT(&numa_info[i].addr);
     }

+    numa_set_mem_ranges();
+
     for (i = 0; i < nb_numa_nodes; i++) {
         if (!bitmap_empty(numa_info[i].node_cpu, MAX_CPUMASK_BITS)) {
             break;
--
2.1.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [RFC PATCH v1 3/4] numa: Store boot memory address range in node_info
  2015-06-12  9:00 ` [Qemu-devel] [RFC PATCH v1 3/4] numa: Store boot memory address range in node_info Bharata B Rao
@ 2015-06-15  6:35   ` David Gibson
  2015-06-15 16:31   ` Eduardo Habkost
  1 sibling, 0 replies; 13+ messages in thread

From: David Gibson @ 2015-06-15 6:35 UTC (permalink / raw)
  To: Bharata B Rao; +Cc: pbonzini, qemu-devel, ehabkost, imammedo

[-- Attachment #1: Type: text/plain, Size: 616 bytes --]

On Fri, Jun 12, 2015 at 02:30:27PM +0530, Bharata B Rao wrote:
> Store memory address range information of boot memory in address
> range list of numa_info.
>
> This helps to have a common NUMA node lookup by address function that
> works for both boot time memory and hotplugged memory.
>
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

--
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [RFC PATCH v1 3/4] numa: Store boot memory address range in node_info
  2015-06-12  9:00 ` [Qemu-devel] [RFC PATCH v1 3/4] numa: Store boot memory address range in node_info Bharata B Rao
  2015-06-15  6:35   ` David Gibson
@ 2015-06-15 16:31   ` Eduardo Habkost
  2015-06-17  3:17     ` Bharata B Rao
  1 sibling, 1 reply; 13+ messages in thread

From: Eduardo Habkost @ 2015-06-15 16:31 UTC (permalink / raw)
  To: Bharata B Rao; +Cc: pbonzini, imammedo, qemu-devel, david

On Fri, Jun 12, 2015 at 02:30:27PM +0530, Bharata B Rao wrote:
> Store memory address range information of boot memory in address
> range list of numa_info.
>
> This helps to have a common NUMA node lookup by address function that
> works for both boot time memory and hotplugged memory.
>
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> ---
>  numa.c | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
>
> diff --git a/numa.c b/numa.c
> index 27ca743..d67b1fb 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -75,6 +75,26 @@ void numa_unset_mem_node_id(ram_addr_t addr, uint64_t size, uint32_t node)
>      }
>  }
>
> +static void numa_set_mem_ranges(void)
> +{
> +    int i;
> +    ram_addr_t mem_start, mem_end_prev;
> +
> +    /*
> +     * Deduce start address of each node and use it to store
> +     * the address range info in numa_info address range list
> +     */
> +    for (i = 0; i < nb_numa_nodes; i++) {
> +        if (i) {
> +            mem_start = mem_end_prev;
> +        } else {
> +            mem_start = 0;
> +        }

You could simply initialize mem_end_prev=0 before entering the loop,
instead.

Actually, you don't even need the mem_end_prev variable:

    int i;
    ram_addr_t mem_start = 0;

    for (i = 0; i < nb_numa_nodes; i++) {
        numa_set_mem_node_id(mem_start, numa_info[i].node_mem, i);
        mem_start = mem_start + numa_info[i].node_mem;
    }

I was going to suggest moving this to
memory_region_allocate_system_memory() instead (that already has a loop
calculating the start address for each NUMA node), but the problem is
that allocate_system_memory_nonnuma() may be called even if using NUMA
if no memdevs are used. So this can be done later, after refactoring
memory_region_allocate_system_memory() to have a single memory
allocation code path.

> +        mem_end_prev = mem_start + numa_info[i].node_mem;
> +        numa_set_mem_node_id(mem_start, numa_info[i].node_mem, i);
> +    }
> +}
> +
>  static void numa_node_parse(NumaNodeOptions *node, QemuOpts *opts, Error **errp)
>  {
>      uint16_t nodenr;
> @@ -300,6 +320,8 @@ void parse_numa_opts(MachineClass *mc)
>          QLIST_INIT(&numa_info[i].addr);
>      }
>
> +    numa_set_mem_ranges();
> +
>      for (i = 0; i < nb_numa_nodes; i++) {
>          if (!bitmap_empty(numa_info[i].node_cpu, MAX_CPUMASK_BITS)) {
>              break;
> --
> 2.1.0
>

--
Eduardo

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [RFC PATCH v1 3/4] numa: Store boot memory address range in node_info
  2015-06-15 16:31   ` Eduardo Habkost
@ 2015-06-17  3:17     ` Bharata B Rao
  0 siblings, 0 replies; 13+ messages in thread
From: Bharata B Rao @ 2015-06-17 3:17 UTC (permalink / raw)
  To: Eduardo Habkost; +Cc: pbonzini, imammedo, qemu-devel, david

On Mon, Jun 15, 2015 at 01:31:20PM -0300, Eduardo Habkost wrote:
> On Fri, Jun 12, 2015 at 02:30:27PM +0530, Bharata B Rao wrote:
> > Store memory address range information of boot memory in address
> > range list of numa_info.
> >
> > This helps to have a common NUMA node lookup by address function that
> > works for both boot time memory and hotplugged memory.
> >
> > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > ---
> >  numa.c | 22 ++++++++++++++++++++++
> >  1 file changed, 22 insertions(+)
> >
> > diff --git a/numa.c b/numa.c
> > index 27ca743..d67b1fb 100644
> > --- a/numa.c
> > +++ b/numa.c
> > @@ -75,6 +75,26 @@ void numa_unset_mem_node_id(ram_addr_t addr, uint64_t size, uint32_t node)
> >      }
> >  }
> >
> > +static void numa_set_mem_ranges(void)
> > +{
> > +    int i;
> > +    ram_addr_t mem_start, mem_end_prev;
> > +
> > +    /*
> > +     * Deduce start address of each node and use it to store
> > +     * the address range info in numa_info address range list
> > +     */
> > +    for (i = 0; i < nb_numa_nodes; i++) {
> > +        if (i) {
> > +            mem_start = mem_end_prev;
> > +        } else {
> > +            mem_start = 0;
> > +        }
>
> You could simply initialize mem_end_prev=0 before entering the loop,
> instead.
>
> Actually, you don't even need the mem_end_prev variable:
>
>     int i;
>     ram_addr_t mem_start = 0;
>
>     for (i = 0; i < nb_numa_nodes; i++) {
>         numa_set_mem_node_id(mem_start, numa_info[i].node_mem, i);
>         mem_start = mem_start + numa_info[i].node_mem;
>     }

Ok will change to this.

>
> I was going to suggest moving this to
> memory_region_allocate_system_memory() instead (that already has a loop
> calculating the start address for each NUMA node), but the problem is
> that allocate_system_memory_nonnuma() may be called even if using NUMA
> if no memdevs are used. So this can be done later, after refactoring
> memory_region_allocate_system_memory() to have a single memory
> allocation code path.

If there are no more comments to be addressed in this series, I shall
spin the next version.

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 13+ messages in thread
* [Qemu-devel] [RFC PATCH v1 4/4] numa: API to lookup NUMA node by address
  2015-06-12  9:00 [Qemu-devel] [RFC PATCH v1 0/4] Refactoring pc_dimm_plug and NUMA node lookup API Bharata B Rao
  ` (2 preceding siblings ...)
  2015-06-12  9:00 ` [Qemu-devel] [RFC PATCH v1 3/4] numa: Store boot memory address range in node_info Bharata B Rao
@ 2015-06-12  9:00 ` Bharata B Rao
  2015-06-15  6:35   ` David Gibson
  3 siblings, 1 reply; 13+ messages in thread
From: Bharata B Rao @ 2015-06-12 9:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, imammedo, Bharata B Rao, ehabkost, david

Introduce an API numa_get_node(ram_addr_t addr, Error **errp) that
returns the NUMA node to which the given address belongs. This API
works uniformly for both boot-time and hotplugged memory.

This API is needed by sPAPR PowerPC to support the
ibm,dynamic-reconfiguration-memory device tree node, which is needed
for memory hotplug.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
---
 include/sysemu/numa.h |  1 +
 numa.c                | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 7176364..a6392bc 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -31,5 +31,6 @@ void query_numa_node_mem(uint64_t node_mem[]);
 extern QemuOptsList qemu_numa_opts;
 void numa_set_mem_node_id(ram_addr_t addr, uint64_t size, uint32_t node);
 void numa_unset_mem_node_id(ram_addr_t addr, uint64_t size, uint32_t node);
+uint32_t numa_get_node(ram_addr_t addr, Error **errp);

 #endif
diff --git a/numa.c b/numa.c
index d67b1fb..ed18a61 100644
--- a/numa.c
+++ b/numa.c
@@ -95,6 +95,40 @@ static void numa_set_mem_ranges(void)
     }
 }

+/*
+ * Check if @addr falls under NUMA @node.
+ */
+static bool numa_addr_belongs_to_node(ram_addr_t addr, uint32_t node)
+{
+    struct numa_addr_range *range;
+
+    QLIST_FOREACH(range, &numa_info[node].addr, entry) {
+        if (addr >= range->mem_start && addr < range->mem_end) {
+            return true;
+        }
+    }
+    return false;
+}
+
+/*
+ * Given an address, return the index of the NUMA node to which the
+ * address belongs.
+ */
+uint32_t numa_get_node(ram_addr_t addr, Error **errp)
+{
+    uint32_t i;
+
+    for (i = 0; i < nb_numa_nodes; i++) {
+        if (numa_addr_belongs_to_node(addr, i)) {
+            return i;
+        }
+    }
+
+    error_setg(errp, "Address 0x" RAM_ADDR_FMT " doesn't belong to any "
+               "NUMA node", addr);
+    return -1;
+}
+
 static void numa_node_parse(NumaNodeOptions *node, QemuOpts *opts, Error **errp)
 {
     uint16_t nodenr;
--
2.1.0

^ permalink raw reply related	[flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [RFC PATCH v1 4/4] numa: API to lookup NUMA node by address
  2015-06-12  9:00 ` [Qemu-devel] [RFC PATCH v1 4/4] numa: API to lookup NUMA node by address Bharata B Rao
@ 2015-06-15  6:35   ` David Gibson
  0 siblings, 0 replies; 13+ messages in thread
From: David Gibson @ 2015-06-15 6:35 UTC (permalink / raw)
  To: Bharata B Rao; +Cc: pbonzini, qemu-devel, ehabkost, imammedo

[-- Attachment #1: Type: text/plain, Size: 743 bytes --]

On Fri, Jun 12, 2015 at 02:30:28PM +0530, Bharata B Rao wrote:
> Introduce an API numa_get_node(ram_addr_t addr, Error **errp) that
> returns the NUMA node to which the given address belongs to. This
> API works uniformly for both boot time as well as hotplugged memory.
>
> This API is needed by sPAPR PowerPC to support
> ibm,dynamic-reconfiguration-memory device tree node which is needed for
> memory hotplug.
>
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

--
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread
end of thread, other threads:[~2015-06-17 3:18 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-12  9:00 [Qemu-devel] [RFC PATCH v1 0/4] Refactoring pc_dimm_plug and NUMA node lookup API Bharata B Rao
2015-06-12  9:00 ` [Qemu-devel] [RFC PATCH v1 1/4] pc, pc-dimm: Factor out reusable parts in pc_dimm_plug to a separate routine Bharata B Rao
2015-06-15  6:32   ` David Gibson
2015-06-12  9:00 ` [Qemu-devel] [RFC PATCH v1 2/4] numa, pc-dimm: Store pc-dimm memory information in numa_info Bharata B Rao
2015-06-15  6:34   ` David Gibson
2015-06-15  9:17     ` Igor Mammedov
2015-06-15 13:04       ` Bharata B Rao
2015-06-12  9:00 ` [Qemu-devel] [RFC PATCH v1 3/4] numa: Store boot memory address range in node_info Bharata B Rao
2015-06-15  6:35   ` David Gibson
2015-06-15 16:31   ` Eduardo Habkost
2015-06-17  3:17     ` Bharata B Rao
2015-06-12  9:00 ` [Qemu-devel] [RFC PATCH v1 4/4] numa: API to lookup NUMA node by address Bharata B Rao
2015-06-15  6:35   ` David Gibson