* [Qemu-devel] [PATCH for-4.2 v4 0/2] s390: stop abusing memory_region_allocate_system_memory()
@ 2019-08-06 9:48 Igor Mammedov
  2019-08-06 9:48 ` [Qemu-devel] [PATCH for-4.2 v4 1/2] kvm: s390: split too big memory section on several memslots Igor Mammedov
  2019-08-06 9:48 ` [Qemu-devel] [PATCH for-4.2 v4 2/2] s390: do not call memory_region_allocate_system_memory() multiple times Igor Mammedov
  0 siblings, 2 replies; 19+ messages in thread
From: Igor Mammedov @ 2019-08-06 9:48 UTC (permalink / raw)
To: qemu-devel; +Cc: thuth, david, cohuck, borntraeger, qemu-s390x, pbonzini

Changelog:
  since v3:
    - fix compilation issue
    - advance HVA along with GPA in kvm_set_phys_mem()
  since v2:
    - break migration from old QEMU (since 2.12-4.1) for guests with >8TB RAM
      and drop the migratable-aliases patch, as agreed during v2 review
    - drop the 4.2 machines patch as it's no longer a prerequisite
  since v1:
    - include 4.2 machines patch for adding compat RAM layout on top
    - 2/4 add patch, missing in v1, for splitting a too-big MemorySection
      on several memslots
    - 3/4 amend code path on alias destruction to ensure that RAMBlock is
      cleaned up properly
    - 4/4 add compat machine code to keep the old layout (migration-wise)
      for 4.1 and older machines

While looking into unifying guest RAM allocation to use hostmem backends
for initial RAM (especially when -mem-path is used) and retiring the
memory_region_allocate_system_memory() API, leaving only a single hostmem
backend, I was inspecting how it is currently used by boards, and it turns
out several boards abuse it by calling the function several times (despite
the documented contract forbidding it).

s390 is one such board, where a KVM limitation on memslot size got
propagated into the board design and memory_region_allocate_system_memory()
was abused to satisfy the KVM requirement for a maximum RAM chunk, where a
memory region alias would have sufficed. Unfortunately, that use of
memory_region_allocate_system_memory() created a migration dependency where
guest RAM is transferred in the migration stream as several RAMBlocks if it
is larger than KVM_SLOT_MAX_BYTES. During v2 review it was agreed to ignore
the migration breakage (documenting it in the release notes) and keep only
the KVM fix.

In order to replace these several RAM chunks with a single memdev and keep
it working with the KVM memslot size limit, the following was done:
   * [1/2] split a too-big RAM chunk into several memory slots inside the
           KVM code, if necessary
   * [2/2] drop manual RAM splitting in the s390 code

CC: pbonzini@redhat.com
CC: qemu-s390x@nongnu.org
CC: borntraeger@de.ibm.com
CC: thuth@redhat.com
CC: david@redhat.com
CC: cohuck@redhat.com

Igor Mammedov (2):
  kvm: s390: split too big memory section on several memslots
  s390: do not call memory_region_allocate_system_memory() multiple times

 include/sysemu/kvm_int.h   |  1 +
 accel/kvm/kvm-all.c        | 80 +++++++++++++++++++++++---------------
 hw/s390x/s390-virtio-ccw.c | 30 ++------------
 target/s390x/kvm.c         | 12 ++++++
 4 files changed, 65 insertions(+), 58 deletions(-)

--
2.18.1

^ permalink raw reply	[flat|nested] 19+ messages in thread
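For readers without the tree at hand, the pattern the cover letter calls an abuse looks roughly like the sketch below: guest RAM is carved into KVM-sized chunks and each chunk gets its own memory_region_allocate_system_memory() call, although the API's documented contract allows only one call per machine. This is a simplified illustration against the QEMU memory API, not the verbatim hw/s390x/s390-virtio-ccw.c code; names and details are approximate.

    /* simplified sketch of the pre-series pattern (illustrative only) */
    static void sketch_s390_memory_init(ram_addr_t mem_size)
    {
        MemoryRegion *sysmem = get_system_memory();
        ram_addr_t offset = 0;
        unsigned int chunk_nr = 0;

        while (mem_size) {
            MemoryRegion *ram = g_new(MemoryRegion, 1);
            ram_addr_t chunk = MIN(mem_size, KVM_SLOT_MAX_BYTES);
            gchar *name = g_strdup_printf("s390.ram.%u", chunk_nr++);

            /* one RAMBlock per chunk -- the repeated call that violates the
             * memory_region_allocate_system_memory() contract and leaks the
             * KVM memslot limit into the board and the migration stream */
            memory_region_allocate_system_memory(ram, NULL, name, chunk);
            memory_region_add_subregion(sysmem, offset, ram);
            g_free(name);

            offset += chunk;
            mem_size -= chunk;
        }
    }

Patch 2/2 replaces this with a single allocation, while patch 1/2 below moves the chunking into the KVM memslot code.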
* [Qemu-devel] [PATCH for-4.2 v4 1/2] kvm: s390: split too big memory section on several memslots 2019-08-06 9:48 [Qemu-devel] [PATCH for-4.2 v4 0/2] s390: stop abusing memory_region_allocate_system_memory() Igor Mammedov @ 2019-08-06 9:48 ` Igor Mammedov 2019-08-07 7:54 ` [Qemu-devel] [qemu-s390x] " David Hildenbrand 2019-08-07 15:32 ` [Qemu-devel] [PATCH for-4.2 v5 " Igor Mammedov 2019-08-06 9:48 ` [Qemu-devel] [PATCH for-4.2 v4 2/2] s390: do not call memory_region_allocate_system_memory() multiple times Igor Mammedov 1 sibling, 2 replies; 19+ messages in thread From: Igor Mammedov @ 2019-08-06 9:48 UTC (permalink / raw) To: qemu-devel; +Cc: thuth, david, cohuck, borntraeger, qemu-s390x, pbonzini Max memslot size supported by kvm on s390 is 8Tb, move logic of splitting RAM in chunks upto 8T to KVM code. This way it will hide KVM specific restrictions in KVM code and won't affect baord level design decisions. Which would allow us to avoid misusing memory_region_allocate_system_memory() API and eventually use a single hostmem backend for guest RAM. Signed-off-by: Igor Mammedov <imammedo@redhat.com> --- v4: * fix compilation issue (Christian Borntraeger <borntraeger@de.ibm.com>) * advance HVA along with GPA in kvm_set_phys_mem() (Christian Borntraeger <borntraeger@de.ibm.com>) patch prepares only KVM side for switching to single RAM memory region another patch will take care of dropping manual RAM partitioning in s390 code. --- include/sysemu/kvm_int.h | 1 + accel/kvm/kvm-all.c | 80 +++++++++++++++++++++++--------------- hw/s390x/s390-virtio-ccw.c | 9 ----- target/s390x/kvm.c | 12 ++++++ 4 files changed, 62 insertions(+), 40 deletions(-) diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h index 31df465fdc..7f7520bce2 100644 --- a/include/sysemu/kvm_int.h +++ b/include/sysemu/kvm_int.h @@ -41,4 +41,5 @@ typedef struct KVMMemoryListener { void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml, AddressSpace *as, int as_id); +void kvm_set_max_memslot_size(hwaddr max_slot_size); #endif diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index f450f25295..d87f855ea4 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -138,6 +138,7 @@ bool kvm_direct_msi_allowed; bool kvm_ioeventfd_any_length_allowed; bool kvm_msi_use_devid; static bool kvm_immediate_exit; +static hwaddr kvm_max_slot_size = ~0; static const KVMCapabilityInfo kvm_required_capabilites[] = { KVM_CAP_INFO(USER_MEMORY), @@ -951,6 +952,14 @@ kvm_check_extension_list(KVMState *s, const KVMCapabilityInfo *list) return NULL; } +void kvm_set_max_memslot_size(hwaddr max_slot_size) +{ + g_assert( + ROUND_UP(max_slot_size, qemu_real_host_page_size) == max_slot_size + ); + kvm_max_slot_size = max_slot_size; +} + static void kvm_set_phys_mem(KVMMemoryListener *kml, MemoryRegionSection *section, bool add) { @@ -958,7 +967,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, int err; MemoryRegion *mr = section->mr; bool writeable = !mr->readonly && !mr->rom_device; - hwaddr start_addr, size; + hwaddr start_addr, size, slot_size; void *ram; if (!memory_region_is_ram(mr)) { @@ -983,41 +992,50 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, kvm_slots_lock(kml); if (!add) { - mem = kvm_lookup_matching_slot(kml, start_addr, size); - if (!mem) { - goto out; - } - if (mem->flags & KVM_MEM_LOG_DIRTY_PAGES) { - kvm_physical_sync_dirty_bitmap(kml, section); - } + do { + slot_size = kvm_max_slot_size < size ? 
kvm_max_slot_size : size; + mem = kvm_lookup_matching_slot(kml, start_addr, slot_size); + if (!mem) { + goto out; + } + if (mem->flags & KVM_MEM_LOG_DIRTY_PAGES) { + kvm_physical_sync_dirty_bitmap(kml, section); + } - /* unregister the slot */ - g_free(mem->dirty_bmap); - mem->dirty_bmap = NULL; - mem->memory_size = 0; - mem->flags = 0; - err = kvm_set_user_memory_region(kml, mem, false); - if (err) { - fprintf(stderr, "%s: error unregistering slot: %s\n", - __func__, strerror(-err)); - abort(); - } + /* unregister the slot */ + g_free(mem->dirty_bmap); + mem->dirty_bmap = NULL; + mem->memory_size = 0; + mem->flags = 0; + err = kvm_set_user_memory_region(kml, mem, false); + if (err) { + fprintf(stderr, "%s: error unregistering slot: %s\n", + __func__, strerror(-err)); + abort(); + } + start_addr += slot_size; + } while ((size -= slot_size)); goto out; } /* register the new slot */ - mem = kvm_alloc_slot(kml); - mem->memory_size = size; - mem->start_addr = start_addr; - mem->ram = ram; - mem->flags = kvm_mem_flags(mr); - - err = kvm_set_user_memory_region(kml, mem, true); - if (err) { - fprintf(stderr, "%s: error registering slot: %s\n", __func__, - strerror(-err)); - abort(); - } + do { + slot_size = kvm_max_slot_size < size ? kvm_max_slot_size : size; + mem = kvm_alloc_slot(kml); + mem->memory_size = slot_size; + mem->start_addr = start_addr; + mem->ram = ram; + mem->flags = kvm_mem_flags(mr); + + err = kvm_set_user_memory_region(kml, mem, true); + if (err) { + fprintf(stderr, "%s: error registering slot: %s\n", __func__, + strerror(-err)); + abort(); + } + start_addr += slot_size; + ram += slot_size; + } while ((size -= slot_size)); out: kvm_slots_unlock(kml); diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c index 5b6a9a4e55..0c03ffb7c7 100644 --- a/hw/s390x/s390-virtio-ccw.c +++ b/hw/s390x/s390-virtio-ccw.c @@ -151,15 +151,6 @@ static void virtio_ccw_register_hcalls(void) virtio_ccw_hcall_early_printk); } -/* - * KVM does only support memory slots up to KVM_MEM_MAX_NR_PAGES pages - * as the dirty bitmap must be managed by bitops that take an int as - * position indicator. If we have a guest beyond that we will split off - * new subregions. The split must happen on a segment boundary (1MB). - */ -#define KVM_MEM_MAX_NR_PAGES ((1ULL << 31) - 1) -#define SEG_MSK (~0xfffffULL) -#define KVM_SLOT_MAX_BYTES ((KVM_MEM_MAX_NR_PAGES * TARGET_PAGE_SIZE) & SEG_MSK) static void s390_memory_init(ram_addr_t mem_size) { MemoryRegion *sysmem = get_system_memory(); diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c index 6e814c230b..6b1428a760 100644 --- a/target/s390x/kvm.c +++ b/target/s390x/kvm.c @@ -28,6 +28,7 @@ #include "cpu.h" #include "internal.h" #include "kvm_s390x.h" +#include "sysemu/kvm_int.h" #include "qapi/error.h" #include "qemu/error-report.h" #include "qemu/timer.h" @@ -122,6 +123,16 @@ #define VCPU_IRQ_BUF_SIZE(max_cpus) (sizeof(struct kvm_s390_irq) * \ (max_cpus + NR_LOCAL_IRQS)) +/* + * KVM does only support memory slots up to KVM_MEM_MAX_NR_PAGES pages + * as the dirty bitmap must be managed by bitops that take an int as + * position indicator. If we have a guest beyond that we will split off + * new subregions. The split must happen on a segment boundary (1MB). 
+ */ +#define KVM_MEM_MAX_NR_PAGES ((1ULL << 31) - 1) +#define SEG_MSK (~0xfffffULL) +#define KVM_SLOT_MAX_BYTES ((KVM_MEM_MAX_NR_PAGES * TARGET_PAGE_SIZE) & SEG_MSK) + static CPUWatchpoint hw_watchpoint; /* * We don't use a list because this structure is also used to transmit the @@ -347,6 +358,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s) */ /* kvm_vm_enable_cap(s, KVM_CAP_S390_AIS, 0); */ + kvm_set_max_memslot_size(KVM_SLOT_MAX_BYTES); return 0; } -- 2.18.1 ^ permalink raw reply related [flat|nested] 19+ messages in thread
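For reference, with the 4 KiB TARGET_PAGE_SIZE used on s390x, the macros moved into target/s390x/kvm.c work out as follows (plain arithmetic, not part of the patch):

    KVM_MEM_MAX_NR_PAGES                    = (1ULL << 31) - 1 = 2147483647 pages
    KVM_MEM_MAX_NR_PAGES * TARGET_PAGE_SIZE = 0x7ffffffff000   (8 TiB - 4 KiB)
    ... & SEG_MSK (round down to a 1 MiB segment boundary)
    KVM_SLOT_MAX_BYTES                      = 0x7fffff00000    (8 TiB - 1 MiB)

So a single KVM memslot on s390 tops out just under 8 TiB, and any RAM beyond that is carved into additional slots by the new do/while loops in kvm_set_phys_mem().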
* Re: [Qemu-devel] [qemu-s390x] [PATCH for-4.2 v4 1/2] kvm: s390: split too big memory section on several memslots 2019-08-06 9:48 ` [Qemu-devel] [PATCH for-4.2 v4 1/2] kvm: s390: split too big memory section on several memslots Igor Mammedov @ 2019-08-07 7:54 ` David Hildenbrand 2019-08-07 9:55 ` Igor Mammedov 2019-08-07 15:32 ` [Qemu-devel] [PATCH for-4.2 v5 " Igor Mammedov 1 sibling, 1 reply; 19+ messages in thread From: David Hildenbrand @ 2019-08-07 7:54 UTC (permalink / raw) To: Igor Mammedov, qemu-devel Cc: borntraeger, thuth, cohuck, qemu-s390x, pbonzini On 06.08.19 11:48, Igor Mammedov wrote: > Max memslot size supported by kvm on s390 is 8Tb, > move logic of splitting RAM in chunks upto 8T to KVM code. > > This way it will hide KVM specific restrictions in KVM code > and won't affect baord level design decisions. Which would allow > us to avoid misusing memory_region_allocate_system_memory() API > and eventually use a single hostmem backend for guest RAM. > > Signed-off-by: Igor Mammedov <imammedo@redhat.com> > --- > v4: > * fix compilation issue > (Christian Borntraeger <borntraeger@de.ibm.com>) > * advance HVA along with GPA in kvm_set_phys_mem() > (Christian Borntraeger <borntraeger@de.ibm.com>) > > patch prepares only KVM side for switching to single RAM memory region > another patch will take care of dropping manual RAM partitioning in > s390 code. > --- > include/sysemu/kvm_int.h | 1 + > accel/kvm/kvm-all.c | 80 +++++++++++++++++++++++--------------- > hw/s390x/s390-virtio-ccw.c | 9 ----- > target/s390x/kvm.c | 12 ++++++ > 4 files changed, 62 insertions(+), 40 deletions(-) > > diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h > index 31df465fdc..7f7520bce2 100644 > --- a/include/sysemu/kvm_int.h > +++ b/include/sysemu/kvm_int.h > @@ -41,4 +41,5 @@ typedef struct KVMMemoryListener { > void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml, > AddressSpace *as, int as_id); > > +void kvm_set_max_memslot_size(hwaddr max_slot_size); > #endif > diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c > index f450f25295..d87f855ea4 100644 > --- a/accel/kvm/kvm-all.c > +++ b/accel/kvm/kvm-all.c > @@ -138,6 +138,7 @@ bool kvm_direct_msi_allowed; > bool kvm_ioeventfd_any_length_allowed; > bool kvm_msi_use_devid; > static bool kvm_immediate_exit; > +static hwaddr kvm_max_slot_size = ~0; > > static const KVMCapabilityInfo kvm_required_capabilites[] = { > KVM_CAP_INFO(USER_MEMORY), > @@ -951,6 +952,14 @@ kvm_check_extension_list(KVMState *s, const KVMCapabilityInfo *list) > return NULL; > } > > +void kvm_set_max_memslot_size(hwaddr max_slot_size) > +{ > + g_assert( > + ROUND_UP(max_slot_size, qemu_real_host_page_size) == max_slot_size > + ); > + kvm_max_slot_size = max_slot_size; > +} > + > static void kvm_set_phys_mem(KVMMemoryListener *kml, > MemoryRegionSection *section, bool add) > { > @@ -958,7 +967,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, > int err; > MemoryRegion *mr = section->mr; > bool writeable = !mr->readonly && !mr->rom_device; > - hwaddr start_addr, size; > + hwaddr start_addr, size, slot_size; > void *ram; > > if (!memory_region_is_ram(mr)) { > @@ -983,41 +992,50 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, > kvm_slots_lock(kml); > > if (!add) { > - mem = kvm_lookup_matching_slot(kml, start_addr, size); > - if (!mem) { > - goto out; > - } > - if (mem->flags & KVM_MEM_LOG_DIRTY_PAGES) { > - kvm_physical_sync_dirty_bitmap(kml, section); > - } > + do { > + slot_size = kvm_max_slot_size < size ? 
kvm_max_slot_size : size; > + mem = kvm_lookup_matching_slot(kml, start_addr, slot_size); > + if (!mem) { > + goto out; I wonder if this can trigger for the first, but not the second slot (or the other way around). In that case you would want to continue the loop (incrementing counters). But most probably there would something be wrong in the caller if that would happen. > + } > + if (mem->flags & KVM_MEM_LOG_DIRTY_PAGES) { > + kvm_physical_sync_dirty_bitmap(kml, section); > + } > > - /* unregister the slot */ > - g_free(mem->dirty_bmap); > - mem->dirty_bmap = NULL; > - mem->memory_size = 0; > - mem->flags = 0; > - err = kvm_set_user_memory_region(kml, mem, false); > - if (err) { > - fprintf(stderr, "%s: error unregistering slot: %s\n", > - __func__, strerror(-err)); > - abort(); > - } > + /* unregister the slot */ > + g_free(mem->dirty_bmap); > + mem->dirty_bmap = NULL; > + mem->memory_size = 0; > + mem->flags = 0; > + err = kvm_set_user_memory_region(kml, mem, false); > + if (err) { > + fprintf(stderr, "%s: error unregistering slot: %s\n", > + __func__, strerror(-err)); > + abort(); > + } > + start_addr += slot_size; > + } while ((size -= slot_size)); NIT: I think you can drop parentheses - but I would really prefer to not perform computations in the condition. > goto out; > } > > /* register the new slot */ > - mem = kvm_alloc_slot(kml); > - mem->memory_size = size; > - mem->start_addr = start_addr; > - mem->ram = ram; > - mem->flags = kvm_mem_flags(mr); > - > - err = kvm_set_user_memory_region(kml, mem, true); > - if (err) { > - fprintf(stderr, "%s: error registering slot: %s\n", __func__, > - strerror(-err)); > - abort(); > - } > + do { > + slot_size = kvm_max_slot_size < size ? kvm_max_slot_size : size; > + mem = kvm_alloc_slot(kml); > + mem->memory_size = slot_size; > + mem->start_addr = start_addr; > + mem->ram = ram; > + mem->flags = kvm_mem_flags(mr); > + > + err = kvm_set_user_memory_region(kml, mem, true); > + if (err) { > + fprintf(stderr, "%s: error registering slot: %s\n", __func__, > + strerror(-err)); > + abort(); > + } > + start_addr += slot_size; > + ram += slot_size; > + } while ((size -= slot_size)); dito One note: KVMState stores the number of slots in "nr_slots". We export that via kvm_get_max_memslots(). E.g., spapr uses that to compare it against "machine->ram_slots". Later (esp. for s390x), kvm_get_max_memslots() can no longer be compared to ram_slots directly. Could be that a ram slot would map to multiple KVM memory slots. There would be no easy way to detect if KVM is able to deal with "machine->ram_slots" as defined by the user, until the sizes of the slots are known. -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [qemu-s390x] [PATCH for-4.2 v4 1/2] kvm: s390: split too big memory section on several memslots 2019-08-07 7:54 ` [Qemu-devel] [qemu-s390x] " David Hildenbrand @ 2019-08-07 9:55 ` Igor Mammedov 2019-08-07 10:20 ` David Hildenbrand 0 siblings, 1 reply; 19+ messages in thread From: Igor Mammedov @ 2019-08-07 9:55 UTC (permalink / raw) To: David Hildenbrand Cc: thuth, cohuck, qemu-devel, borntraeger, qemu-s390x, pbonzini On Wed, 7 Aug 2019 09:54:27 +0200 David Hildenbrand <david@redhat.com> wrote: > On 06.08.19 11:48, Igor Mammedov wrote: > > Max memslot size supported by kvm on s390 is 8Tb, > > move logic of splitting RAM in chunks upto 8T to KVM code. > > > > This way it will hide KVM specific restrictions in KVM code > > and won't affect baord level design decisions. Which would allow > > us to avoid misusing memory_region_allocate_system_memory() API > > and eventually use a single hostmem backend for guest RAM. > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com> > > --- > > v4: > > * fix compilation issue > > (Christian Borntraeger <borntraeger@de.ibm.com>) > > * advance HVA along with GPA in kvm_set_phys_mem() > > (Christian Borntraeger <borntraeger@de.ibm.com>) > > > > patch prepares only KVM side for switching to single RAM memory region > > another patch will take care of dropping manual RAM partitioning in > > s390 code. > > --- > > include/sysemu/kvm_int.h | 1 + > > accel/kvm/kvm-all.c | 80 +++++++++++++++++++++++--------------- > > hw/s390x/s390-virtio-ccw.c | 9 ----- > > target/s390x/kvm.c | 12 ++++++ > > 4 files changed, 62 insertions(+), 40 deletions(-) > > > > diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h > > index 31df465fdc..7f7520bce2 100644 > > --- a/include/sysemu/kvm_int.h > > +++ b/include/sysemu/kvm_int.h > > @@ -41,4 +41,5 @@ typedef struct KVMMemoryListener { > > void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml, > > AddressSpace *as, int as_id); > > > > +void kvm_set_max_memslot_size(hwaddr max_slot_size); > > #endif > > diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c > > index f450f25295..d87f855ea4 100644 > > --- a/accel/kvm/kvm-all.c > > +++ b/accel/kvm/kvm-all.c > > @@ -138,6 +138,7 @@ bool kvm_direct_msi_allowed; > > bool kvm_ioeventfd_any_length_allowed; > > bool kvm_msi_use_devid; > > static bool kvm_immediate_exit; > > +static hwaddr kvm_max_slot_size = ~0; > > > > static const KVMCapabilityInfo kvm_required_capabilites[] = { > > KVM_CAP_INFO(USER_MEMORY), > > @@ -951,6 +952,14 @@ kvm_check_extension_list(KVMState *s, const KVMCapabilityInfo *list) > > return NULL; > > } > > > > +void kvm_set_max_memslot_size(hwaddr max_slot_size) > > +{ > > + g_assert( > > + ROUND_UP(max_slot_size, qemu_real_host_page_size) == max_slot_size > > + ); > > + kvm_max_slot_size = max_slot_size; > > +} > > + > > static void kvm_set_phys_mem(KVMMemoryListener *kml, > > MemoryRegionSection *section, bool add) > > { > > @@ -958,7 +967,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, > > int err; > > MemoryRegion *mr = section->mr; > > bool writeable = !mr->readonly && !mr->rom_device; > > - hwaddr start_addr, size; > > + hwaddr start_addr, size, slot_size; > > void *ram; > > > > if (!memory_region_is_ram(mr)) { > > @@ -983,41 +992,50 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, > > kvm_slots_lock(kml); > > > > if (!add) { > > - mem = kvm_lookup_matching_slot(kml, start_addr, size); > > - if (!mem) { > > - goto out; > > - } > > - if (mem->flags & KVM_MEM_LOG_DIRTY_PAGES) { > > - 
kvm_physical_sync_dirty_bitmap(kml, section); > > - } > > + do { > > + slot_size = kvm_max_slot_size < size ? kvm_max_slot_size : size; > > + mem = kvm_lookup_matching_slot(kml, start_addr, slot_size); > > + if (!mem) { > > + goto out; > > I wonder if this can trigger for the first, but not the second slot (or > the other way around). In that case you would want to continue the loop > (incrementing counters). But most probably there would something be > wrong in the caller if that would happen. I couldn't come up with scenario where it would be possible (unless flatview rendering is broken) > > > + } > > + if (mem->flags & KVM_MEM_LOG_DIRTY_PAGES) { > > + kvm_physical_sync_dirty_bitmap(kml, section); > > + } > > > > - /* unregister the slot */ > > - g_free(mem->dirty_bmap); > > - mem->dirty_bmap = NULL; > > - mem->memory_size = 0; > > - mem->flags = 0; > > - err = kvm_set_user_memory_region(kml, mem, false); > > - if (err) { > > - fprintf(stderr, "%s: error unregistering slot: %s\n", > > - __func__, strerror(-err)); > > - abort(); > > - } > > + /* unregister the slot */ > > + g_free(mem->dirty_bmap); > > + mem->dirty_bmap = NULL; > > + mem->memory_size = 0; > > + mem->flags = 0; > > + err = kvm_set_user_memory_region(kml, mem, false); > > + if (err) { > > + fprintf(stderr, "%s: error unregistering slot: %s\n", > > + __func__, strerror(-err)); > > + abort(); > > + } > > + start_addr += slot_size; > > + } while ((size -= slot_size)); > > NIT: I think you can drop parentheses - but I would really prefer to not > perform computations in the condition. sure, I'll move computation within the loop > > goto out; > > } > > > > /* register the new slot */ > > - mem = kvm_alloc_slot(kml); > > - mem->memory_size = size; > > - mem->start_addr = start_addr; > > - mem->ram = ram; > > - mem->flags = kvm_mem_flags(mr); > > - > > - err = kvm_set_user_memory_region(kml, mem, true); > > - if (err) { > > - fprintf(stderr, "%s: error registering slot: %s\n", __func__, > > - strerror(-err)); > > - abort(); > > - } > > + do { > > + slot_size = kvm_max_slot_size < size ? kvm_max_slot_size : size; > > + mem = kvm_alloc_slot(kml); > > + mem->memory_size = slot_size; > > + mem->start_addr = start_addr; > > + mem->ram = ram; > > + mem->flags = kvm_mem_flags(mr); > > + > > + err = kvm_set_user_memory_region(kml, mem, true); > > + if (err) { > > + fprintf(stderr, "%s: error registering slot: %s\n", __func__, > > + strerror(-err)); > > + abort(); > > + } > > + start_addr += slot_size; > > + ram += slot_size; > > + } while ((size -= slot_size)); > > dito > > One note: > > KVMState stores the number of slots in "nr_slots". We export that via > kvm_get_max_memslots(). > > E.g., spapr uses that to compare it against "machine->ram_slots". this patch shouldn't affect spapr/arm or x86 machines as they do not have limits on memslot size. > Later (esp. for s390x), kvm_get_max_memslots() can no longer be compared to > ram_slots directly. Could be that a ram slot would map to multiple KVM > memory slots. There would be no easy way to detect if KVM is able to > deal with "machine->ram_slots" as defined by the user, until the sizes > of the slots are known. If I recall correctly about kvm_foo_slots() APIs, assumption 1 memory region == 1 memslot didn't/doesn't hold true in all cases (ex: x86) and it's only best effort to let us compare the number of apples to oranges on a tree and works for current usecases. From hotplug point of view kvm_has_free_slot() would be more important, to allow for graceful abort. 
If s390 would ever need to hotplug RAM MemoryRegion, anyway APIs should be changed to account for 1:N split when actual dependency arises. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [qemu-s390x] [PATCH for-4.2 v4 1/2] kvm: s390: split too big memory section on several memslots 2019-08-07 9:55 ` Igor Mammedov @ 2019-08-07 10:20 ` David Hildenbrand 0 siblings, 0 replies; 19+ messages in thread From: David Hildenbrand @ 2019-08-07 10:20 UTC (permalink / raw) To: Igor Mammedov Cc: thuth, cohuck, qemu-devel, borntraeger, qemu-s390x, pbonzini On 07.08.19 11:55, Igor Mammedov wrote: > On Wed, 7 Aug 2019 09:54:27 +0200 > David Hildenbrand <david@redhat.com> wrote: > >> On 06.08.19 11:48, Igor Mammedov wrote: >>> Max memslot size supported by kvm on s390 is 8Tb, >>> move logic of splitting RAM in chunks upto 8T to KVM code. >>> >>> This way it will hide KVM specific restrictions in KVM code >>> and won't affect baord level design decisions. Which would allow >>> us to avoid misusing memory_region_allocate_system_memory() API >>> and eventually use a single hostmem backend for guest RAM. >>> >>> Signed-off-by: Igor Mammedov <imammedo@redhat.com> >>> --- >>> v4: >>> * fix compilation issue >>> (Christian Borntraeger <borntraeger@de.ibm.com>) >>> * advance HVA along with GPA in kvm_set_phys_mem() >>> (Christian Borntraeger <borntraeger@de.ibm.com>) >>> >>> patch prepares only KVM side for switching to single RAM memory region >>> another patch will take care of dropping manual RAM partitioning in >>> s390 code. >>> --- >>> include/sysemu/kvm_int.h | 1 + >>> accel/kvm/kvm-all.c | 80 +++++++++++++++++++++++--------------- >>> hw/s390x/s390-virtio-ccw.c | 9 ----- >>> target/s390x/kvm.c | 12 ++++++ >>> 4 files changed, 62 insertions(+), 40 deletions(-) >>> >>> diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h >>> index 31df465fdc..7f7520bce2 100644 >>> --- a/include/sysemu/kvm_int.h >>> +++ b/include/sysemu/kvm_int.h >>> @@ -41,4 +41,5 @@ typedef struct KVMMemoryListener { >>> void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml, >>> AddressSpace *as, int as_id); >>> >>> +void kvm_set_max_memslot_size(hwaddr max_slot_size); >>> #endif >>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c >>> index f450f25295..d87f855ea4 100644 >>> --- a/accel/kvm/kvm-all.c >>> +++ b/accel/kvm/kvm-all.c >>> @@ -138,6 +138,7 @@ bool kvm_direct_msi_allowed; >>> bool kvm_ioeventfd_any_length_allowed; >>> bool kvm_msi_use_devid; >>> static bool kvm_immediate_exit; >>> +static hwaddr kvm_max_slot_size = ~0; >>> >>> static const KVMCapabilityInfo kvm_required_capabilites[] = { >>> KVM_CAP_INFO(USER_MEMORY), >>> @@ -951,6 +952,14 @@ kvm_check_extension_list(KVMState *s, const KVMCapabilityInfo *list) >>> return NULL; >>> } >>> >>> +void kvm_set_max_memslot_size(hwaddr max_slot_size) >>> +{ >>> + g_assert( >>> + ROUND_UP(max_slot_size, qemu_real_host_page_size) == max_slot_size >>> + ); >>> + kvm_max_slot_size = max_slot_size; >>> +} >>> + >>> static void kvm_set_phys_mem(KVMMemoryListener *kml, >>> MemoryRegionSection *section, bool add) >>> { >>> @@ -958,7 +967,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, >>> int err; >>> MemoryRegion *mr = section->mr; >>> bool writeable = !mr->readonly && !mr->rom_device; >>> - hwaddr start_addr, size; >>> + hwaddr start_addr, size, slot_size; >>> void *ram; >>> >>> if (!memory_region_is_ram(mr)) { >>> @@ -983,41 +992,50 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, >>> kvm_slots_lock(kml); >>> >>> if (!add) { >>> - mem = kvm_lookup_matching_slot(kml, start_addr, size); >>> - if (!mem) { >>> - goto out; >>> - } >>> - if (mem->flags & KVM_MEM_LOG_DIRTY_PAGES) { >>> - 
kvm_physical_sync_dirty_bitmap(kml, section); >>> - } >>> + do { >>> + slot_size = kvm_max_slot_size < size ? kvm_max_slot_size : size; >>> + mem = kvm_lookup_matching_slot(kml, start_addr, slot_size); >>> + if (!mem) { >>> + goto out; >> >> I wonder if this can trigger for the first, but not the second slot (or >> the other way around). In that case you would want to continue the loop >> (incrementing counters). But most probably there would something be >> wrong in the caller if that would happen. > > I couldn't come up with scenario where it would be possible > (unless flatview rendering is broken) > >> >>> + } >>> + if (mem->flags & KVM_MEM_LOG_DIRTY_PAGES) { >>> + kvm_physical_sync_dirty_bitmap(kml, section); >>> + } >>> >>> - /* unregister the slot */ >>> - g_free(mem->dirty_bmap); >>> - mem->dirty_bmap = NULL; >>> - mem->memory_size = 0; >>> - mem->flags = 0; >>> - err = kvm_set_user_memory_region(kml, mem, false); >>> - if (err) { >>> - fprintf(stderr, "%s: error unregistering slot: %s\n", >>> - __func__, strerror(-err)); >>> - abort(); >>> - } >>> + /* unregister the slot */ >>> + g_free(mem->dirty_bmap); >>> + mem->dirty_bmap = NULL; >>> + mem->memory_size = 0; >>> + mem->flags = 0; >>> + err = kvm_set_user_memory_region(kml, mem, false); >>> + if (err) { >>> + fprintf(stderr, "%s: error unregistering slot: %s\n", >>> + __func__, strerror(-err)); >>> + abort(); >>> + } >>> + start_addr += slot_size; >>> + } while ((size -= slot_size)); >> >> NIT: I think you can drop parentheses - but I would really prefer to not >> perform computations in the condition. > sure, I'll move computation within the loop > >>> goto out; >>> } >>> >>> /* register the new slot */ >>> - mem = kvm_alloc_slot(kml); >>> - mem->memory_size = size; >>> - mem->start_addr = start_addr; >>> - mem->ram = ram; >>> - mem->flags = kvm_mem_flags(mr); >>> - >>> - err = kvm_set_user_memory_region(kml, mem, true); >>> - if (err) { >>> - fprintf(stderr, "%s: error registering slot: %s\n", __func__, >>> - strerror(-err)); >>> - abort(); >>> - } >>> + do { >>> + slot_size = kvm_max_slot_size < size ? kvm_max_slot_size : size; >>> + mem = kvm_alloc_slot(kml); >>> + mem->memory_size = slot_size; >>> + mem->start_addr = start_addr; >>> + mem->ram = ram; >>> + mem->flags = kvm_mem_flags(mr); >>> + >>> + err = kvm_set_user_memory_region(kml, mem, true); >>> + if (err) { >>> + fprintf(stderr, "%s: error registering slot: %s\n", __func__, >>> + strerror(-err)); >>> + abort(); >>> + } >>> + start_addr += slot_size; >>> + ram += slot_size; >>> + } while ((size -= slot_size)); >> >> dito >> >> One note: >> >> KVMState stores the number of slots in "nr_slots". We export that via >> kvm_get_max_memslots(). >> >> E.g., spapr uses that to compare it against "machine->ram_slots". > this patch shouldn't affect spapr/arm or x86 machines as they do not have > limits on memslot size. > Yes, just an example how the existing API could be used. >> Later (esp. for s390x), kvm_get_max_memslots() can no longer be compared to >> ram_slots directly. Could be that a ram slot would map to multiple KVM >> memory slots. There would be no easy way to detect if KVM is able to >> deal with "machine->ram_slots" as defined by the user, until the sizes >> of the slots are known. > > If I recall correctly about kvm_foo_slots() APIs, > assumption 1 memory region == 1 memslot didn't/doesn't hold true > in all cases (ex: x86) and it's only best effort to let us compare > the number of apples to oranges on a tree and works for current > usecases. 
Yes, x86 needs two kvm slots due to SMM if I recall correctly. > > From hotplug point of view kvm_has_free_slot() would be more important, > to allow for graceful abort. If s390 would ever need to hotplug > RAM MemoryRegion, anyway APIs should be changed to account for > 1:N split when actual dependency arises. Exactly, we should handle it gracefully then. (and hotplugging >4TB DIMMs is quite unlikely, so we can ignore that for now, just wanted to note it) -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 19+ messages in thread
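To make the 1:N concern above concrete: once an architecture caps the memslot size, comparing kvm_get_max_memslots() against machine->ram_slots no longer tells the whole story, because a single RAM region may expand into several KVM slots. A board-level check would need something along these lines (purely illustrative; kvm_slots_needed() is a hypothetical helper, not something proposed in this series):

    /* hypothetical: number of KVM memslots one RAM region of 'size' bytes
     * would consume, given a per-architecture cap of 'max_slot_size' */
    static unsigned int kvm_slots_needed(hwaddr size, hwaddr max_slot_size)
    {
        return DIV_ROUND_UP(size, max_slot_size);
    }

As discussed above, nothing in the series needs this yet; it only becomes relevant if s390 ever grows hotplug of RAM regions larger than the cap.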
* [Qemu-devel] [PATCH for-4.2 v5 1/2] kvm: s390: split too big memory section on several memslots 2019-08-06 9:48 ` [Qemu-devel] [PATCH for-4.2 v4 1/2] kvm: s390: split too big memory section on several memslots Igor Mammedov 2019-08-07 7:54 ` [Qemu-devel] [qemu-s390x] " David Hildenbrand @ 2019-08-07 15:32 ` Igor Mammedov 2019-08-20 16:07 ` Cornelia Huck 1 sibling, 1 reply; 19+ messages in thread From: Igor Mammedov @ 2019-08-07 15:32 UTC (permalink / raw) To: qemu-devel; +Cc: thuth, david, cohuck, borntraeger, qemu-s390x, pbonzini Max memslot size supported by kvm on s390 is 8Tb, move logic of splitting RAM in chunks upto 8T to KVM code. This way it will hide KVM specific restrictions in KVM code and won't affect baord level design decisions. Which would allow us to avoid misusing memory_region_allocate_system_memory() API and eventually use a single hostmem backend for guest RAM. Signed-off-by: Igor Mammedov <imammedo@redhat.com> --- v5: * move computation 'size -= slot_size' inside of loop body (David Hildenbrand <david@redhat.com>) v4: * fix compilation issue (Christian Borntraeger <borntraeger@de.ibm.com>) * advance HVA along with GPA in kvm_set_phys_mem() (Christian Borntraeger <borntraeger@de.ibm.com>) patch prepares only KVM side for switching to single RAM memory region another patch will take care of dropping manual RAM partitioning in s390 code. --- include/sysemu/kvm_int.h | 1 + accel/kvm/kvm-all.c | 82 ++++++++++++++++++++++++-------------- hw/s390x/s390-virtio-ccw.c | 9 ----- target/s390x/kvm.c | 12 ++++++ 4 files changed, 64 insertions(+), 40 deletions(-) diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h index 31df465fdc..7f7520bce2 100644 --- a/include/sysemu/kvm_int.h +++ b/include/sysemu/kvm_int.h @@ -41,4 +41,5 @@ typedef struct KVMMemoryListener { void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml, AddressSpace *as, int as_id); +void kvm_set_max_memslot_size(hwaddr max_slot_size); #endif diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index f450f25295..8153556335 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -138,6 +138,7 @@ bool kvm_direct_msi_allowed; bool kvm_ioeventfd_any_length_allowed; bool kvm_msi_use_devid; static bool kvm_immediate_exit; +static hwaddr kvm_max_slot_size = ~0; static const KVMCapabilityInfo kvm_required_capabilites[] = { KVM_CAP_INFO(USER_MEMORY), @@ -951,6 +952,14 @@ kvm_check_extension_list(KVMState *s, const KVMCapabilityInfo *list) return NULL; } +void kvm_set_max_memslot_size(hwaddr max_slot_size) +{ + g_assert( + ROUND_UP(max_slot_size, qemu_real_host_page_size) == max_slot_size + ); + kvm_max_slot_size = max_slot_size; +} + static void kvm_set_phys_mem(KVMMemoryListener *kml, MemoryRegionSection *section, bool add) { @@ -958,7 +967,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, int err; MemoryRegion *mr = section->mr; bool writeable = !mr->readonly && !mr->rom_device; - hwaddr start_addr, size; + hwaddr start_addr, size, slot_size; void *ram; if (!memory_region_is_ram(mr)) { @@ -983,41 +992,52 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, kvm_slots_lock(kml); if (!add) { - mem = kvm_lookup_matching_slot(kml, start_addr, size); - if (!mem) { - goto out; - } - if (mem->flags & KVM_MEM_LOG_DIRTY_PAGES) { - kvm_physical_sync_dirty_bitmap(kml, section); - } + do { + slot_size = kvm_max_slot_size < size ? 
kvm_max_slot_size : size; + mem = kvm_lookup_matching_slot(kml, start_addr, slot_size); + if (!mem) { + goto out; + } + if (mem->flags & KVM_MEM_LOG_DIRTY_PAGES) { + kvm_physical_sync_dirty_bitmap(kml, section); + } - /* unregister the slot */ - g_free(mem->dirty_bmap); - mem->dirty_bmap = NULL; - mem->memory_size = 0; - mem->flags = 0; - err = kvm_set_user_memory_region(kml, mem, false); - if (err) { - fprintf(stderr, "%s: error unregistering slot: %s\n", - __func__, strerror(-err)); - abort(); - } + /* unregister the slot */ + g_free(mem->dirty_bmap); + mem->dirty_bmap = NULL; + mem->memory_size = 0; + mem->flags = 0; + err = kvm_set_user_memory_region(kml, mem, false); + if (err) { + fprintf(stderr, "%s: error unregistering slot: %s\n", + __func__, strerror(-err)); + abort(); + } + start_addr += slot_size; + size -= slot_size; + } while (size); goto out; } /* register the new slot */ - mem = kvm_alloc_slot(kml); - mem->memory_size = size; - mem->start_addr = start_addr; - mem->ram = ram; - mem->flags = kvm_mem_flags(mr); - - err = kvm_set_user_memory_region(kml, mem, true); - if (err) { - fprintf(stderr, "%s: error registering slot: %s\n", __func__, - strerror(-err)); - abort(); - } + do { + slot_size = kvm_max_slot_size < size ? kvm_max_slot_size : size; + mem = kvm_alloc_slot(kml); + mem->memory_size = slot_size; + mem->start_addr = start_addr; + mem->ram = ram; + mem->flags = kvm_mem_flags(mr); + + err = kvm_set_user_memory_region(kml, mem, true); + if (err) { + fprintf(stderr, "%s: error registering slot: %s\n", __func__, + strerror(-err)); + abort(); + } + start_addr += slot_size; + ram += slot_size; + size -= slot_size; + } while (size); out: kvm_slots_unlock(kml); diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c index 5b6a9a4e55..0c03ffb7c7 100644 --- a/hw/s390x/s390-virtio-ccw.c +++ b/hw/s390x/s390-virtio-ccw.c @@ -151,15 +151,6 @@ static void virtio_ccw_register_hcalls(void) virtio_ccw_hcall_early_printk); } -/* - * KVM does only support memory slots up to KVM_MEM_MAX_NR_PAGES pages - * as the dirty bitmap must be managed by bitops that take an int as - * position indicator. If we have a guest beyond that we will split off - * new subregions. The split must happen on a segment boundary (1MB). - */ -#define KVM_MEM_MAX_NR_PAGES ((1ULL << 31) - 1) -#define SEG_MSK (~0xfffffULL) -#define KVM_SLOT_MAX_BYTES ((KVM_MEM_MAX_NR_PAGES * TARGET_PAGE_SIZE) & SEG_MSK) static void s390_memory_init(ram_addr_t mem_size) { MemoryRegion *sysmem = get_system_memory(); diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c index 6e814c230b..6b1428a760 100644 --- a/target/s390x/kvm.c +++ b/target/s390x/kvm.c @@ -28,6 +28,7 @@ #include "cpu.h" #include "internal.h" #include "kvm_s390x.h" +#include "sysemu/kvm_int.h" #include "qapi/error.h" #include "qemu/error-report.h" #include "qemu/timer.h" @@ -122,6 +123,16 @@ #define VCPU_IRQ_BUF_SIZE(max_cpus) (sizeof(struct kvm_s390_irq) * \ (max_cpus + NR_LOCAL_IRQS)) +/* + * KVM does only support memory slots up to KVM_MEM_MAX_NR_PAGES pages + * as the dirty bitmap must be managed by bitops that take an int as + * position indicator. If we have a guest beyond that we will split off + * new subregions. The split must happen on a segment boundary (1MB). 
+ */ +#define KVM_MEM_MAX_NR_PAGES ((1ULL << 31) - 1) +#define SEG_MSK (~0xfffffULL) +#define KVM_SLOT_MAX_BYTES ((KVM_MEM_MAX_NR_PAGES * TARGET_PAGE_SIZE) & SEG_MSK) + static CPUWatchpoint hw_watchpoint; /* * We don't use a list because this structure is also used to transmit the @@ -347,6 +358,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s) */ /* kvm_vm_enable_cap(s, KVM_CAP_S390_AIS, 0); */ + kvm_set_max_memslot_size(KVM_SLOT_MAX_BYTES); return 0; } -- 2.18.1 ^ permalink raw reply related [flat|nested] 19+ messages in thread
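As a stand-alone illustration of what the two do/while loops above compute, the following plain C program (not QEMU code) prints the slots that a 9 TiB guest would be split into under the 8 TiB - 1 MiB cap; the figures match the KVM_SET_USER_MEMORY_REGION calls Christian reports further down the thread:

    #include <stdio.h>
    #include <inttypes.h>

    int main(void)
    {
        const uint64_t max_slot = 0x7fffff00000ULL; /* KVM_SLOT_MAX_BYTES: 8 TiB - 1 MiB */
        uint64_t size = 9ULL << 40;                 /* 9 TiB of guest RAM */
        uint64_t start = 0;
        int slot = 0;

        do {
            uint64_t slot_size = max_slot < size ? max_slot : size;
            printf("slot %d: guest_phys_addr=0x%" PRIx64 " memory_size=%" PRIu64 "\n",
                   slot++, start, slot_size);
            start += slot_size;
            size -= slot_size;
        } while (size);

        return 0;
    }

    /* prints:
     * slot 0: guest_phys_addr=0x0 memory_size=8796091973632
     * slot 1: guest_phys_addr=0x7fffff00000 memory_size=1099512676352
     */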
* Re: [Qemu-devel] [PATCH for-4.2 v5 1/2] kvm: s390: split too big memory section on several memslots 2019-08-07 15:32 ` [Qemu-devel] [PATCH for-4.2 v5 " Igor Mammedov @ 2019-08-20 16:07 ` Cornelia Huck 2019-08-27 12:56 ` Igor Mammedov 0 siblings, 1 reply; 19+ messages in thread From: Cornelia Huck @ 2019-08-20 16:07 UTC (permalink / raw) To: Igor Mammedov; +Cc: thuth, david, qemu-devel, borntraeger, qemu-s390x, pbonzini On Wed, 7 Aug 2019 11:32:41 -0400 Igor Mammedov <imammedo@redhat.com> wrote: > Max memslot size supported by kvm on s390 is 8Tb, > move logic of splitting RAM in chunks upto 8T to KVM code. > > This way it will hide KVM specific restrictions in KVM code > and won't affect baord level design decisions. Which would allow > us to avoid misusing memory_region_allocate_system_memory() API > and eventually use a single hostmem backend for guest RAM. > > Signed-off-by: Igor Mammedov <imammedo@redhat.com> > --- > v5: > * move computation 'size -= slot_size' inside of loop body > (David Hildenbrand <david@redhat.com>) > v4: > * fix compilation issue > (Christian Borntraeger <borntraeger@de.ibm.com>) > * advance HVA along with GPA in kvm_set_phys_mem() > (Christian Borntraeger <borntraeger@de.ibm.com>) > > patch prepares only KVM side for switching to single RAM memory region > another patch will take care of dropping manual RAM partitioning in > s390 code. I may have lost track a bit -- what is the status of this patch (and the series)? > --- > include/sysemu/kvm_int.h | 1 + > accel/kvm/kvm-all.c | 82 ++++++++++++++++++++++++-------------- > hw/s390x/s390-virtio-ccw.c | 9 ----- > target/s390x/kvm.c | 12 ++++++ > 4 files changed, 64 insertions(+), 40 deletions(-) > > diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h > index 31df465fdc..7f7520bce2 100644 > --- a/include/sysemu/kvm_int.h > +++ b/include/sysemu/kvm_int.h > @@ -41,4 +41,5 @@ typedef struct KVMMemoryListener { > void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml, > AddressSpace *as, int as_id); > > +void kvm_set_max_memslot_size(hwaddr max_slot_size); > #endif > diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c > index f450f25295..8153556335 100644 > --- a/accel/kvm/kvm-all.c > +++ b/accel/kvm/kvm-all.c > @@ -138,6 +138,7 @@ bool kvm_direct_msi_allowed; > bool kvm_ioeventfd_any_length_allowed; > bool kvm_msi_use_devid; > static bool kvm_immediate_exit; > +static hwaddr kvm_max_slot_size = ~0; > > static const KVMCapabilityInfo kvm_required_capabilites[] = { > KVM_CAP_INFO(USER_MEMORY), > @@ -951,6 +952,14 @@ kvm_check_extension_list(KVMState *s, const KVMCapabilityInfo *list) > return NULL; > } > > +void kvm_set_max_memslot_size(hwaddr max_slot_size) > +{ > + g_assert( > + ROUND_UP(max_slot_size, qemu_real_host_page_size) == max_slot_size > + ); > + kvm_max_slot_size = max_slot_size; > +} > + > static void kvm_set_phys_mem(KVMMemoryListener *kml, > MemoryRegionSection *section, bool add) > { > @@ -958,7 +967,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, > int err; > MemoryRegion *mr = section->mr; > bool writeable = !mr->readonly && !mr->rom_device; > - hwaddr start_addr, size; > + hwaddr start_addr, size, slot_size; > void *ram; > > if (!memory_region_is_ram(mr)) { > @@ -983,41 +992,52 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, > kvm_slots_lock(kml); > > if (!add) { > - mem = kvm_lookup_matching_slot(kml, start_addr, size); > - if (!mem) { > - goto out; > - } > - if (mem->flags & KVM_MEM_LOG_DIRTY_PAGES) { > - kvm_physical_sync_dirty_bitmap(kml, section); > 
- } > + do { > + slot_size = kvm_max_slot_size < size ? kvm_max_slot_size : size; > + mem = kvm_lookup_matching_slot(kml, start_addr, slot_size); > + if (!mem) { > + goto out; > + } > + if (mem->flags & KVM_MEM_LOG_DIRTY_PAGES) { > + kvm_physical_sync_dirty_bitmap(kml, section); > + } > > - /* unregister the slot */ > - g_free(mem->dirty_bmap); > - mem->dirty_bmap = NULL; > - mem->memory_size = 0; > - mem->flags = 0; > - err = kvm_set_user_memory_region(kml, mem, false); > - if (err) { > - fprintf(stderr, "%s: error unregistering slot: %s\n", > - __func__, strerror(-err)); > - abort(); > - } > + /* unregister the slot */ > + g_free(mem->dirty_bmap); > + mem->dirty_bmap = NULL; > + mem->memory_size = 0; > + mem->flags = 0; > + err = kvm_set_user_memory_region(kml, mem, false); > + if (err) { > + fprintf(stderr, "%s: error unregistering slot: %s\n", > + __func__, strerror(-err)); > + abort(); > + } > + start_addr += slot_size; > + size -= slot_size; > + } while (size); > goto out; > } > > /* register the new slot */ > - mem = kvm_alloc_slot(kml); > - mem->memory_size = size; > - mem->start_addr = start_addr; > - mem->ram = ram; > - mem->flags = kvm_mem_flags(mr); > - > - err = kvm_set_user_memory_region(kml, mem, true); > - if (err) { > - fprintf(stderr, "%s: error registering slot: %s\n", __func__, > - strerror(-err)); > - abort(); > - } > + do { > + slot_size = kvm_max_slot_size < size ? kvm_max_slot_size : size; > + mem = kvm_alloc_slot(kml); > + mem->memory_size = slot_size; > + mem->start_addr = start_addr; > + mem->ram = ram; > + mem->flags = kvm_mem_flags(mr); > + > + err = kvm_set_user_memory_region(kml, mem, true); > + if (err) { > + fprintf(stderr, "%s: error registering slot: %s\n", __func__, > + strerror(-err)); > + abort(); > + } > + start_addr += slot_size; > + ram += slot_size; > + size -= slot_size; > + } while (size); > > out: > kvm_slots_unlock(kml); > diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c > index 5b6a9a4e55..0c03ffb7c7 100644 > --- a/hw/s390x/s390-virtio-ccw.c > +++ b/hw/s390x/s390-virtio-ccw.c > @@ -151,15 +151,6 @@ static void virtio_ccw_register_hcalls(void) > virtio_ccw_hcall_early_printk); > } > > -/* > - * KVM does only support memory slots up to KVM_MEM_MAX_NR_PAGES pages > - * as the dirty bitmap must be managed by bitops that take an int as > - * position indicator. If we have a guest beyond that we will split off > - * new subregions. The split must happen on a segment boundary (1MB). > - */ > -#define KVM_MEM_MAX_NR_PAGES ((1ULL << 31) - 1) > -#define SEG_MSK (~0xfffffULL) > -#define KVM_SLOT_MAX_BYTES ((KVM_MEM_MAX_NR_PAGES * TARGET_PAGE_SIZE) & SEG_MSK) > static void s390_memory_init(ram_addr_t mem_size) > { > MemoryRegion *sysmem = get_system_memory(); > diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c > index 6e814c230b..6b1428a760 100644 > --- a/target/s390x/kvm.c > +++ b/target/s390x/kvm.c > @@ -28,6 +28,7 @@ > #include "cpu.h" > #include "internal.h" > #include "kvm_s390x.h" > +#include "sysemu/kvm_int.h" > #include "qapi/error.h" > #include "qemu/error-report.h" > #include "qemu/timer.h" > @@ -122,6 +123,16 @@ > #define VCPU_IRQ_BUF_SIZE(max_cpus) (sizeof(struct kvm_s390_irq) * \ > (max_cpus + NR_LOCAL_IRQS)) > > +/* > + * KVM does only support memory slots up to KVM_MEM_MAX_NR_PAGES pages > + * as the dirty bitmap must be managed by bitops that take an int as > + * position indicator. If we have a guest beyond that we will split off > + * new subregions. The split must happen on a segment boundary (1MB). 
> + */ > +#define KVM_MEM_MAX_NR_PAGES ((1ULL << 31) - 1) > +#define SEG_MSK (~0xfffffULL) > +#define KVM_SLOT_MAX_BYTES ((KVM_MEM_MAX_NR_PAGES * TARGET_PAGE_SIZE) & SEG_MSK) > + > static CPUWatchpoint hw_watchpoint; > /* > * We don't use a list because this structure is also used to transmit the > @@ -347,6 +358,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s) > */ > /* kvm_vm_enable_cap(s, KVM_CAP_S390_AIS, 0); */ > > + kvm_set_max_memslot_size(KVM_SLOT_MAX_BYTES); > return 0; > } > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PATCH for-4.2 v5 1/2] kvm: s390: split too big memory section on several memslots 2019-08-20 16:07 ` Cornelia Huck @ 2019-08-27 12:56 ` Igor Mammedov 2019-08-29 6:47 ` Christian Borntraeger 0 siblings, 1 reply; 19+ messages in thread From: Igor Mammedov @ 2019-08-27 12:56 UTC (permalink / raw) To: Cornelia Huck; +Cc: thuth, david, qemu-devel, borntraeger, qemu-s390x, pbonzini On Tue, 20 Aug 2019 18:07:27 +0200 Cornelia Huck <cohuck@redhat.com> wrote: > On Wed, 7 Aug 2019 11:32:41 -0400 > Igor Mammedov <imammedo@redhat.com> wrote: > > > Max memslot size supported by kvm on s390 is 8Tb, > > move logic of splitting RAM in chunks upto 8T to KVM code. > > > > This way it will hide KVM specific restrictions in KVM code > > and won't affect baord level design decisions. Which would allow > > us to avoid misusing memory_region_allocate_system_memory() API > > and eventually use a single hostmem backend for guest RAM. > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com> > > --- > > v5: > > * move computation 'size -= slot_size' inside of loop body > > (David Hildenbrand <david@redhat.com>) > > v4: > > * fix compilation issue > > (Christian Borntraeger <borntraeger@de.ibm.com>) > > * advance HVA along with GPA in kvm_set_phys_mem() > > (Christian Borntraeger <borntraeger@de.ibm.com>) > > > > patch prepares only KVM side for switching to single RAM memory region > > another patch will take care of dropping manual RAM partitioning in > > s390 code. > > I may have lost track a bit -- what is the status of this patch (and > the series)? Christian, could you test it on a host that have sufficient amount of RAM? PS: I've only simulated test on smaller host with KVM enabled (by reducing KVM_SLOT_MAX_BYTES size) to verify that KVM code splits RAM on smaller chunks. PS2: Also considering we decided to drop migration concerns there is no need to check migration on a large enough host anymore. 
> > > --- > > include/sysemu/kvm_int.h | 1 + > > accel/kvm/kvm-all.c | 82 ++++++++++++++++++++++++-------------- > > hw/s390x/s390-virtio-ccw.c | 9 ----- > > target/s390x/kvm.c | 12 ++++++ > > 4 files changed, 64 insertions(+), 40 deletions(-) > > > > diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h > > index 31df465fdc..7f7520bce2 100644 > > --- a/include/sysemu/kvm_int.h > > +++ b/include/sysemu/kvm_int.h > > @@ -41,4 +41,5 @@ typedef struct KVMMemoryListener { > > void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml, > > AddressSpace *as, int as_id); > > > > +void kvm_set_max_memslot_size(hwaddr max_slot_size); > > #endif > > diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c > > index f450f25295..8153556335 100644 > > --- a/accel/kvm/kvm-all.c > > +++ b/accel/kvm/kvm-all.c > > @@ -138,6 +138,7 @@ bool kvm_direct_msi_allowed; > > bool kvm_ioeventfd_any_length_allowed; > > bool kvm_msi_use_devid; > > static bool kvm_immediate_exit; > > +static hwaddr kvm_max_slot_size = ~0; > > > > static const KVMCapabilityInfo kvm_required_capabilites[] = { > > KVM_CAP_INFO(USER_MEMORY), > > @@ -951,6 +952,14 @@ kvm_check_extension_list(KVMState *s, const KVMCapabilityInfo *list) > > return NULL; > > } > > > > +void kvm_set_max_memslot_size(hwaddr max_slot_size) > > +{ > > + g_assert( > > + ROUND_UP(max_slot_size, qemu_real_host_page_size) == max_slot_size > > + ); > > + kvm_max_slot_size = max_slot_size; > > +} > > + > > static void kvm_set_phys_mem(KVMMemoryListener *kml, > > MemoryRegionSection *section, bool add) > > { > > @@ -958,7 +967,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, > > int err; > > MemoryRegion *mr = section->mr; > > bool writeable = !mr->readonly && !mr->rom_device; > > - hwaddr start_addr, size; > > + hwaddr start_addr, size, slot_size; > > void *ram; > > > > if (!memory_region_is_ram(mr)) { > > @@ -983,41 +992,52 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, > > kvm_slots_lock(kml); > > > > if (!add) { > > - mem = kvm_lookup_matching_slot(kml, start_addr, size); > > - if (!mem) { > > - goto out; > > - } > > - if (mem->flags & KVM_MEM_LOG_DIRTY_PAGES) { > > - kvm_physical_sync_dirty_bitmap(kml, section); > > - } > > + do { > > + slot_size = kvm_max_slot_size < size ? 
kvm_max_slot_size : size; > > + mem = kvm_lookup_matching_slot(kml, start_addr, slot_size); > > + if (!mem) { > > + goto out; > > + } > > + if (mem->flags & KVM_MEM_LOG_DIRTY_PAGES) { > > + kvm_physical_sync_dirty_bitmap(kml, section); > > + } > > > > - /* unregister the slot */ > > - g_free(mem->dirty_bmap); > > - mem->dirty_bmap = NULL; > > - mem->memory_size = 0; > > - mem->flags = 0; > > - err = kvm_set_user_memory_region(kml, mem, false); > > - if (err) { > > - fprintf(stderr, "%s: error unregistering slot: %s\n", > > - __func__, strerror(-err)); > > - abort(); > > - } > > + /* unregister the slot */ > > + g_free(mem->dirty_bmap); > > + mem->dirty_bmap = NULL; > > + mem->memory_size = 0; > > + mem->flags = 0; > > + err = kvm_set_user_memory_region(kml, mem, false); > > + if (err) { > > + fprintf(stderr, "%s: error unregistering slot: %s\n", > > + __func__, strerror(-err)); > > + abort(); > > + } > > + start_addr += slot_size; > > + size -= slot_size; > > + } while (size); > > goto out; > > } > > > > /* register the new slot */ > > - mem = kvm_alloc_slot(kml); > > - mem->memory_size = size; > > - mem->start_addr = start_addr; > > - mem->ram = ram; > > - mem->flags = kvm_mem_flags(mr); > > - > > - err = kvm_set_user_memory_region(kml, mem, true); > > - if (err) { > > - fprintf(stderr, "%s: error registering slot: %s\n", __func__, > > - strerror(-err)); > > - abort(); > > - } > > + do { > > + slot_size = kvm_max_slot_size < size ? kvm_max_slot_size : size; > > + mem = kvm_alloc_slot(kml); > > + mem->memory_size = slot_size; > > + mem->start_addr = start_addr; > > + mem->ram = ram; > > + mem->flags = kvm_mem_flags(mr); > > + > > + err = kvm_set_user_memory_region(kml, mem, true); > > + if (err) { > > + fprintf(stderr, "%s: error registering slot: %s\n", __func__, > > + strerror(-err)); > > + abort(); > > + } > > + start_addr += slot_size; > > + ram += slot_size; > > + size -= slot_size; > > + } while (size); > > > > out: > > kvm_slots_unlock(kml); > > diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c > > index 5b6a9a4e55..0c03ffb7c7 100644 > > --- a/hw/s390x/s390-virtio-ccw.c > > +++ b/hw/s390x/s390-virtio-ccw.c > > @@ -151,15 +151,6 @@ static void virtio_ccw_register_hcalls(void) > > virtio_ccw_hcall_early_printk); > > } > > > > -/* > > - * KVM does only support memory slots up to KVM_MEM_MAX_NR_PAGES pages > > - * as the dirty bitmap must be managed by bitops that take an int as > > - * position indicator. If we have a guest beyond that we will split off > > - * new subregions. The split must happen on a segment boundary (1MB). 
> > - */ > > -#define KVM_MEM_MAX_NR_PAGES ((1ULL << 31) - 1) > > -#define SEG_MSK (~0xfffffULL) > > -#define KVM_SLOT_MAX_BYTES ((KVM_MEM_MAX_NR_PAGES * TARGET_PAGE_SIZE) & SEG_MSK) > > static void s390_memory_init(ram_addr_t mem_size) > > { > > MemoryRegion *sysmem = get_system_memory(); > > diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c > > index 6e814c230b..6b1428a760 100644 > > --- a/target/s390x/kvm.c > > +++ b/target/s390x/kvm.c > > @@ -28,6 +28,7 @@ > > #include "cpu.h" > > #include "internal.h" > > #include "kvm_s390x.h" > > +#include "sysemu/kvm_int.h" > > #include "qapi/error.h" > > #include "qemu/error-report.h" > > #include "qemu/timer.h" > > @@ -122,6 +123,16 @@ > > #define VCPU_IRQ_BUF_SIZE(max_cpus) (sizeof(struct kvm_s390_irq) * \ > > (max_cpus + NR_LOCAL_IRQS)) > > > > +/* > > + * KVM does only support memory slots up to KVM_MEM_MAX_NR_PAGES pages > > + * as the dirty bitmap must be managed by bitops that take an int as > > + * position indicator. If we have a guest beyond that we will split off > > + * new subregions. The split must happen on a segment boundary (1MB). > > + */ > > +#define KVM_MEM_MAX_NR_PAGES ((1ULL << 31) - 1) > > +#define SEG_MSK (~0xfffffULL) > > +#define KVM_SLOT_MAX_BYTES ((KVM_MEM_MAX_NR_PAGES * TARGET_PAGE_SIZE) & SEG_MSK) > > + > > static CPUWatchpoint hw_watchpoint; > > /* > > * We don't use a list because this structure is also used to transmit the > > @@ -347,6 +358,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s) > > */ > > /* kvm_vm_enable_cap(s, KVM_CAP_S390_AIS, 0); */ > > > > + kvm_set_max_memslot_size(KVM_SLOT_MAX_BYTES); > > return 0; > > } > > > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PATCH for-4.2 v5 1/2] kvm: s390: split too big memory section on several memslots 2019-08-27 12:56 ` Igor Mammedov @ 2019-08-29 6:47 ` Christian Borntraeger 2019-08-29 12:04 ` Igor Mammedov 0 siblings, 1 reply; 19+ messages in thread From: Christian Borntraeger @ 2019-08-29 6:47 UTC (permalink / raw) To: Igor Mammedov, Cornelia Huck Cc: qemu-s390x, pbonzini, thuth, qemu-devel, david On 27.08.19 14:56, Igor Mammedov wrote: > On Tue, 20 Aug 2019 18:07:27 +0200 > Cornelia Huck <cohuck@redhat.com> wrote: > >> On Wed, 7 Aug 2019 11:32:41 -0400 >> Igor Mammedov <imammedo@redhat.com> wrote: >> >>> Max memslot size supported by kvm on s390 is 8Tb, >>> move logic of splitting RAM in chunks upto 8T to KVM code. >>> >>> This way it will hide KVM specific restrictions in KVM code >>> and won't affect baord level design decisions. Which would allow >>> us to avoid misusing memory_region_allocate_system_memory() API >>> and eventually use a single hostmem backend for guest RAM. >>> >>> Signed-off-by: Igor Mammedov <imammedo@redhat.com> >>> --- >>> v5: >>> * move computation 'size -= slot_size' inside of loop body >>> (David Hildenbrand <david@redhat.com>) >>> v4: >>> * fix compilation issue >>> (Christian Borntraeger <borntraeger@de.ibm.com>) >>> * advance HVA along with GPA in kvm_set_phys_mem() >>> (Christian Borntraeger <borntraeger@de.ibm.com>) >>> >>> patch prepares only KVM side for switching to single RAM memory region >>> another patch will take care of dropping manual RAM partitioning in >>> s390 code. >> >> I may have lost track a bit -- what is the status of this patch (and >> the series)? > > Christian, > > could you test it on a host that have sufficient amount of RAM? This version looks good. I was able to start a 9TB guest. [pid 215723] ioctl(10, KVM_SET_USER_MEMORY_REGION, {slot=0, flags=0, guest_phys_addr=0, memory_size=8796091973632, userspace_addr=0x3ffee700000}) = 0 [pid 215723] ioctl(10, KVM_SET_USER_MEMORY_REGION, {slot=1, flags=0, guest_phys_addr=0x7fffff00000, memory_size=1099512676352, userspace_addr=0xbffee600000}) = 0 The only question is if we want to fix the weird alignment (0x7fffff00000) when we already add a migration barrier for uber-large guests. Maybe we could split at 4TB to avoid future problem with larger page sizes? ^ permalink raw reply [flat|nested] 19+ messages in thread
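To decode the trace above: slot 0 is exactly KVM_SLOT_MAX_BYTES, 8796091973632 bytes = 8 TiB - 1 MiB, so slot 1 starts at guest_phys_addr 0x7fffff00000 with the remaining 1099512676352 bytes = 1 TiB + 1 MiB, for 9 TiB of guest RAM in total. The "weird" boundary is therefore just the 1 MiB segment rounding built into KVM_SLOT_MAX_BYTES, not an artifact of the new splitting loop.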
* Re: [Qemu-devel] [PATCH for-4.2 v5 1/2] kvm: s390: split too big memory section on several memslots 2019-08-29 6:47 ` Christian Borntraeger @ 2019-08-29 12:04 ` Igor Mammedov 2019-08-29 12:07 ` [Qemu-devel] [qemu-s390x] " Christian Borntraeger 0 siblings, 1 reply; 19+ messages in thread From: Igor Mammedov @ 2019-08-29 12:04 UTC (permalink / raw) To: Christian Borntraeger Cc: thuth, david, Cornelia Huck, qemu-devel, qemu-s390x, pbonzini On Thu, 29 Aug 2019 08:47:49 +0200 Christian Borntraeger <borntraeger@de.ibm.com> wrote: > On 27.08.19 14:56, Igor Mammedov wrote: > > On Tue, 20 Aug 2019 18:07:27 +0200 > > Cornelia Huck <cohuck@redhat.com> wrote: > > > >> On Wed, 7 Aug 2019 11:32:41 -0400 > >> Igor Mammedov <imammedo@redhat.com> wrote: > >> > >>> Max memslot size supported by kvm on s390 is 8Tb, > >>> move logic of splitting RAM in chunks upto 8T to KVM code. > >>> > >>> This way it will hide KVM specific restrictions in KVM code > >>> and won't affect baord level design decisions. Which would allow > >>> us to avoid misusing memory_region_allocate_system_memory() API > >>> and eventually use a single hostmem backend for guest RAM. > >>> > >>> Signed-off-by: Igor Mammedov <imammedo@redhat.com> > >>> --- > >>> v5: > >>> * move computation 'size -= slot_size' inside of loop body > >>> (David Hildenbrand <david@redhat.com>) > >>> v4: > >>> * fix compilation issue > >>> (Christian Borntraeger <borntraeger@de.ibm.com>) > >>> * advance HVA along with GPA in kvm_set_phys_mem() > >>> (Christian Borntraeger <borntraeger@de.ibm.com>) > >>> > >>> patch prepares only KVM side for switching to single RAM memory region > >>> another patch will take care of dropping manual RAM partitioning in > >>> s390 code. > >> > >> I may have lost track a bit -- what is the status of this patch (and > >> the series)? > > > > Christian, > > > > could you test it on a host that have sufficient amount of RAM? > > > This version looks good. I was able to start a 9TB guest. > [pid 215723] ioctl(10, KVM_SET_USER_MEMORY_REGION, {slot=0, flags=0, guest_phys_addr=0, memory_size=8796091973632, userspace_addr=0x3ffee700000}) = 0 > [pid 215723] ioctl(10, KVM_SET_USER_MEMORY_REGION, {slot=1, flags=0, guest_phys_addr=0x7fffff00000, memory_size=1099512676352, userspace_addr=0xbffee600000}) = 0 > > The only question is if we want to fix the weird alignment (0x7fffff00000) when > we already add a migration barrier for uber-large guests. > Maybe we could split at 4TB to avoid future problem with larger page sizes? That probably should be a separate patch on top. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [qemu-s390x] [PATCH for-4.2 v5 1/2] kvm: s390: split too big memory section on several memslots 2019-08-29 12:04 ` Igor Mammedov @ 2019-08-29 12:07 ` Christian Borntraeger 2019-08-29 12:31 ` Igor Mammedov 0 siblings, 1 reply; 19+ messages in thread From: Christian Borntraeger @ 2019-08-29 12:07 UTC (permalink / raw) To: Igor Mammedov Cc: thuth, david, Cornelia Huck, qemu-devel, qemu-s390x, pbonzini On 29.08.19 14:04, Igor Mammedov wrote: > On Thu, 29 Aug 2019 08:47:49 +0200 > Christian Borntraeger <borntraeger@de.ibm.com> wrote: > >> On 27.08.19 14:56, Igor Mammedov wrote: >>> On Tue, 20 Aug 2019 18:07:27 +0200 >>> Cornelia Huck <cohuck@redhat.com> wrote: >>> >>>> On Wed, 7 Aug 2019 11:32:41 -0400 >>>> Igor Mammedov <imammedo@redhat.com> wrote: >>>> >>>>> Max memslot size supported by kvm on s390 is 8Tb, >>>>> move logic of splitting RAM in chunks upto 8T to KVM code. >>>>> >>>>> This way it will hide KVM specific restrictions in KVM code >>>>> and won't affect baord level design decisions. Which would allow >>>>> us to avoid misusing memory_region_allocate_system_memory() API >>>>> and eventually use a single hostmem backend for guest RAM. >>>>> >>>>> Signed-off-by: Igor Mammedov <imammedo@redhat.com> >>>>> --- >>>>> v5: >>>>> * move computation 'size -= slot_size' inside of loop body >>>>> (David Hildenbrand <david@redhat.com>) >>>>> v4: >>>>> * fix compilation issue >>>>> (Christian Borntraeger <borntraeger@de.ibm.com>) >>>>> * advance HVA along with GPA in kvm_set_phys_mem() >>>>> (Christian Borntraeger <borntraeger@de.ibm.com>) >>>>> >>>>> patch prepares only KVM side for switching to single RAM memory region >>>>> another patch will take care of dropping manual RAM partitioning in >>>>> s390 code. >>>> >>>> I may have lost track a bit -- what is the status of this patch (and >>>> the series)? >>> >>> Christian, >>> >>> could you test it on a host that have sufficient amount of RAM? >> >> >> This version looks good. I was able to start a 9TB guest. >> [pid 215723] ioctl(10, KVM_SET_USER_MEMORY_REGION, {slot=0, flags=0, guest_phys_addr=0, memory_size=8796091973632, userspace_addr=0x3ffee700000}) = 0 >> [pid 215723] ioctl(10, KVM_SET_USER_MEMORY_REGION, {slot=1, flags=0, guest_phys_addr=0x7fffff00000, memory_size=1099512676352, userspace_addr=0xbffee600000}) = 0 >> >> The only question is if we want to fix the weird alignment (0x7fffff00000) when >> we already add a migration barrier for uber-large guests. >> Maybe we could split at 4TB to avoid future problem with larger page sizes? > That probably should be a separate patch on top. Right. The split in KVM code is transparent to migration and other parts of QEMU, correct? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [qemu-s390x] [PATCH for-4.2 v5 1/2] kvm: s390: split too big memory section on several memslots 2019-08-29 12:07 ` [Qemu-devel] [qemu-s390x] " Christian Borntraeger @ 2019-08-29 12:31 ` Igor Mammedov 2019-08-29 12:41 ` Christian Borntraeger 0 siblings, 1 reply; 19+ messages in thread From: Igor Mammedov @ 2019-08-29 12:31 UTC (permalink / raw) To: Christian Borntraeger Cc: thuth, david, Cornelia Huck, qemu-devel, qemu-s390x, pbonzini On Thu, 29 Aug 2019 14:07:44 +0200 Christian Borntraeger <borntraeger@de.ibm.com> wrote: > On 29.08.19 14:04, Igor Mammedov wrote: > > On Thu, 29 Aug 2019 08:47:49 +0200 > > Christian Borntraeger <borntraeger@de.ibm.com> wrote: > > > >> On 27.08.19 14:56, Igor Mammedov wrote: > >>> On Tue, 20 Aug 2019 18:07:27 +0200 > >>> Cornelia Huck <cohuck@redhat.com> wrote: > >>> > >>>> On Wed, 7 Aug 2019 11:32:41 -0400 > >>>> Igor Mammedov <imammedo@redhat.com> wrote: > >>>> > >>>>> Max memslot size supported by kvm on s390 is 8Tb, > >>>>> move logic of splitting RAM in chunks upto 8T to KVM code. > >>>>> > >>>>> This way it will hide KVM specific restrictions in KVM code > >>>>> and won't affect baord level design decisions. Which would allow > >>>>> us to avoid misusing memory_region_allocate_system_memory() API > >>>>> and eventually use a single hostmem backend for guest RAM. > >>>>> > >>>>> Signed-off-by: Igor Mammedov <imammedo@redhat.com> > >>>>> --- > >>>>> v5: > >>>>> * move computation 'size -= slot_size' inside of loop body > >>>>> (David Hildenbrand <david@redhat.com>) > >>>>> v4: > >>>>> * fix compilation issue > >>>>> (Christian Borntraeger <borntraeger@de.ibm.com>) > >>>>> * advance HVA along with GPA in kvm_set_phys_mem() > >>>>> (Christian Borntraeger <borntraeger@de.ibm.com>) > >>>>> > >>>>> patch prepares only KVM side for switching to single RAM memory region > >>>>> another patch will take care of dropping manual RAM partitioning in > >>>>> s390 code. > >>>> > >>>> I may have lost track a bit -- what is the status of this patch (and > >>>> the series)? > >>> > >>> Christian, > >>> > >>> could you test it on a host that have sufficient amount of RAM? > >> > >> > >> This version looks good. I was able to start a 9TB guest. > >> [pid 215723] ioctl(10, KVM_SET_USER_MEMORY_REGION, {slot=0, flags=0, guest_phys_addr=0, memory_size=8796091973632, userspace_addr=0x3ffee700000}) = 0 > >> [pid 215723] ioctl(10, KVM_SET_USER_MEMORY_REGION, {slot=1, flags=0, guest_phys_addr=0x7fffff00000, memory_size=1099512676352, userspace_addr=0xbffee600000}) = 0 > > >> The only question is if we want to fix the weird alignment (0x7fffff00000) when > >> we already add a migration barrier for uber-large guests. > >> Maybe we could split at 4TB to avoid future problem with larger page sizes? > > That probably should be a separate patch on top. > > Right. The split in KVM code is transparent to migration and other parts of QEMU, correct? it should not affect other QEMU parts and migration (to my limited understanding of it), we are passing to KVM memory slots upto KVM_SLOT_MAX_BYTES as we were doing before by creating several memory regions instead of one as described in [2/2] commit message. Also could you also test migration of +9Tb guest, to check that nothing where broken by accident in QEMU migration code? ^ permalink raw reply [flat|nested] 19+ messages in thread
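To make the "transparent to the rest of QEMU" point concrete, here is a small standalone illustration (plain C, not QEMU code) of the chunking that patch 1/2 performs inside kvm_set_phys_mem(): the board registers one big RAM region, and only the KVM layer sees several slots, with guest physical address and host virtual address advancing together. The start addresses are taken from Christian's strace; everything else is simplified.

    #include <stdio.h>
    #include <stdint.h>

    /* 8 TiB - 1 MiB, i.e. ((2^31 - 1) * 4096) rounded down to a 1 MiB segment */
    #define SLOT_MAX (((((uint64_t)1 << 31) - 1) * 4096) & ~(uint64_t)0xfffff)

    int main(void)
    {
        uint64_t size = (uint64_t)9 << 40;      /* 9 TiB of guest RAM */
        uint64_t gpa  = 0;                      /* guest physical address */
        uint64_t hva  = 0x3ffee700000ULL;       /* host virtual address from the strace */
        int slot = 0;

        while (size) {
            uint64_t chunk = size < SLOT_MAX ? size : SLOT_MAX;

            printf("slot=%d guest_phys_addr=0x%llx memory_size=%llu userspace_addr=0x%llx\n",
                   slot++, (unsigned long long)gpa, (unsigned long long)chunk,
                   (unsigned long long)hva);

            gpa  += chunk;      /* advance GPA ...                       */
            hva  += chunk;      /* ... and HVA together (the v4 fix)     */
            size -= chunk;      /* decremented inside the loop (v5 fix)  */
        }
        return 0;
    }

Run as-is, it prints the same two slots as the ioctl trace earlier in the thread.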
* Re: [Qemu-devel] [qemu-s390x] [PATCH for-4.2 v5 1/2] kvm: s390: split too big memory section on several memslots 2019-08-29 12:31 ` Igor Mammedov @ 2019-08-29 12:41 ` Christian Borntraeger 2019-08-30 9:41 ` Igor Mammedov 0 siblings, 1 reply; 19+ messages in thread From: Christian Borntraeger @ 2019-08-29 12:41 UTC (permalink / raw) To: Igor Mammedov Cc: thuth, david, Cornelia Huck, qemu-devel, qemu-s390x, pbonzini On 29.08.19 14:31, Igor Mammedov wrote: > On Thu, 29 Aug 2019 14:07:44 +0200 > Christian Borntraeger <borntraeger@de.ibm.com> wrote: > >> On 29.08.19 14:04, Igor Mammedov wrote: >>> On Thu, 29 Aug 2019 08:47:49 +0200 >>> Christian Borntraeger <borntraeger@de.ibm.com> wrote: >>> >>>> On 27.08.19 14:56, Igor Mammedov wrote: >>>>> On Tue, 20 Aug 2019 18:07:27 +0200 >>>>> Cornelia Huck <cohuck@redhat.com> wrote: >>>>> >>>>>> On Wed, 7 Aug 2019 11:32:41 -0400 >>>>>> Igor Mammedov <imammedo@redhat.com> wrote: >>>>>> >>>>>>> Max memslot size supported by kvm on s390 is 8Tb, >>>>>>> move logic of splitting RAM in chunks upto 8T to KVM code. >>>>>>> >>>>>>> This way it will hide KVM specific restrictions in KVM code >>>>>>> and won't affect baord level design decisions. Which would allow >>>>>>> us to avoid misusing memory_region_allocate_system_memory() API >>>>>>> and eventually use a single hostmem backend for guest RAM. >>>>>>> >>>>>>> Signed-off-by: Igor Mammedov <imammedo@redhat.com> >>>>>>> --- >>>>>>> v5: >>>>>>> * move computation 'size -= slot_size' inside of loop body >>>>>>> (David Hildenbrand <david@redhat.com>) >>>>>>> v4: >>>>>>> * fix compilation issue >>>>>>> (Christian Borntraeger <borntraeger@de.ibm.com>) >>>>>>> * advance HVA along with GPA in kvm_set_phys_mem() >>>>>>> (Christian Borntraeger <borntraeger@de.ibm.com>) >>>>>>> >>>>>>> patch prepares only KVM side for switching to single RAM memory region >>>>>>> another patch will take care of dropping manual RAM partitioning in >>>>>>> s390 code. >>>>>> >>>>>> I may have lost track a bit -- what is the status of this patch (and >>>>>> the series)? >>>>> >>>>> Christian, >>>>> >>>>> could you test it on a host that have sufficient amount of RAM? >>>> >>>> >>>> This version looks good. I was able to start a 9TB guest. >>>> [pid 215723] ioctl(10, KVM_SET_USER_MEMORY_REGION, {slot=0, flags=0, guest_phys_addr=0, memory_size=8796091973632, userspace_addr=0x3ffee700000}) = 0 >>>> [pid 215723] ioctl(10, KVM_SET_USER_MEMORY_REGION, {slot=1, flags=0, guest_phys_addr=0x7fffff00000, memory_size=1099512676352, userspace_addr=0xbffee600000}) = 0 >> >>>> The only question is if we want to fix the weird alignment (0x7fffff00000) when >>>> we already add a migration barrier for uber-large guests. >>>> Maybe we could split at 4TB to avoid future problem with larger page sizes? >>> That probably should be a separate patch on top. >> >> Right. The split in KVM code is transparent to migration and other parts of QEMU, correct? > > it should not affect other QEMU parts and migration (to my limited understanding of it), > we are passing to KVM memory slots upto KVM_SLOT_MAX_BYTES as we were doing before by > creating several memory regions instead of one as described in [2/2] commit message. > > Also could you also test migration of +9Tb guest, to check that nothing where broken by > accident in QEMU migration code? I only have one server that is large enough :-/ ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [qemu-s390x] [PATCH for-4.2 v5 1/2] kvm: s390: split too big memory section on several memslots 2019-08-29 12:41 ` Christian Borntraeger @ 2019-08-30 9:41 ` Igor Mammedov 2019-08-30 16:19 ` Christian Borntraeger 0 siblings, 1 reply; 19+ messages in thread From: Igor Mammedov @ 2019-08-30 9:41 UTC (permalink / raw) To: Christian Borntraeger Cc: thuth, david, Cornelia Huck, qemu-devel, qemu-s390x, pbonzini On Thu, 29 Aug 2019 14:41:13 +0200 Christian Borntraeger <borntraeger@de.ibm.com> wrote: > On 29.08.19 14:31, Igor Mammedov wrote: > > On Thu, 29 Aug 2019 14:07:44 +0200 > > Christian Borntraeger <borntraeger@de.ibm.com> wrote: > > > >> On 29.08.19 14:04, Igor Mammedov wrote: > >>> On Thu, 29 Aug 2019 08:47:49 +0200 > >>> Christian Borntraeger <borntraeger@de.ibm.com> wrote: > >>> > >>>> On 27.08.19 14:56, Igor Mammedov wrote: > >>>>> On Tue, 20 Aug 2019 18:07:27 +0200 > >>>>> Cornelia Huck <cohuck@redhat.com> wrote: > >>>>> > >>>>>> On Wed, 7 Aug 2019 11:32:41 -0400 > >>>>>> Igor Mammedov <imammedo@redhat.com> wrote: > >>>>>> > >>>>>>> Max memslot size supported by kvm on s390 is 8Tb, > >>>>>>> move logic of splitting RAM in chunks upto 8T to KVM code. > >>>>>>> > >>>>>>> This way it will hide KVM specific restrictions in KVM code > >>>>>>> and won't affect baord level design decisions. Which would allow > >>>>>>> us to avoid misusing memory_region_allocate_system_memory() API > >>>>>>> and eventually use a single hostmem backend for guest RAM. > >>>>>>> > >>>>>>> Signed-off-by: Igor Mammedov <imammedo@redhat.com> > >>>>>>> --- > >>>>>>> v5: > >>>>>>> * move computation 'size -= slot_size' inside of loop body > >>>>>>> (David Hildenbrand <david@redhat.com>) > >>>>>>> v4: > >>>>>>> * fix compilation issue > >>>>>>> (Christian Borntraeger <borntraeger@de.ibm.com>) > >>>>>>> * advance HVA along with GPA in kvm_set_phys_mem() > >>>>>>> (Christian Borntraeger <borntraeger@de.ibm.com>) > >>>>>>> > >>>>>>> patch prepares only KVM side for switching to single RAM memory region > >>>>>>> another patch will take care of dropping manual RAM partitioning in > >>>>>>> s390 code. > >>>>>> > >>>>>> I may have lost track a bit -- what is the status of this patch (and > >>>>>> the series)? > >>>>> > >>>>> Christian, > >>>>> > >>>>> could you test it on a host that have sufficient amount of RAM? > >>>> > >>>> > >>>> This version looks good. I was able to start a 9TB guest. > >>>> [pid 215723] ioctl(10, KVM_SET_USER_MEMORY_REGION, {slot=0, flags=0, guest_phys_addr=0, memory_size=8796091973632, userspace_addr=0x3ffee700000}) = 0 > >>>> [pid 215723] ioctl(10, KVM_SET_USER_MEMORY_REGION, {slot=1, flags=0, guest_phys_addr=0x7fffff00000, memory_size=1099512676352, userspace_addr=0xbffee600000}) = 0 > >> > >>>> The only question is if we want to fix the weird alignment (0x7fffff00000) when > >>>> we already add a migration barrier for uber-large guests. > >>>> Maybe we could split at 4TB to avoid future problem with larger page sizes? > >>> That probably should be a separate patch on top. > >> > >> Right. The split in KVM code is transparent to migration and other parts of QEMU, correct? > > > > it should not affect other QEMU parts and migration (to my limited understanding of it), > > we are passing to KVM memory slots upto KVM_SLOT_MAX_BYTES as we were doing before by > > creating several memory regions instead of one as described in [2/2] commit message. > > > > Also could you also test migration of +9Tb guest, to check that nothing where broken by > > accident in QEMU migration code? 
> > I only have one server that is large enough :-/ Could you test offline migration on it (to a file and restore from it)? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [qemu-s390x] [PATCH for-4.2 v5 1/2] kvm: s390: split too big memory section on several memslots 2019-08-30 9:41 ` Igor Mammedov @ 2019-08-30 16:19 ` Christian Borntraeger 2019-09-02 13:49 ` Igor Mammedov 0 siblings, 1 reply; 19+ messages in thread From: Christian Borntraeger @ 2019-08-30 16:19 UTC (permalink / raw) To: Igor Mammedov Cc: thuth, david, Cornelia Huck, qemu-devel, qemu-s390x, pbonzini On 30.08.19 11:41, Igor Mammedov wrote: > On Thu, 29 Aug 2019 14:41:13 +0200 > Christian Borntraeger <borntraeger@de.ibm.com> wrote: > >> On 29.08.19 14:31, Igor Mammedov wrote: >>> On Thu, 29 Aug 2019 14:07:44 +0200 >>> Christian Borntraeger <borntraeger@de.ibm.com> wrote: >>> >>>> On 29.08.19 14:04, Igor Mammedov wrote: >>>>> On Thu, 29 Aug 2019 08:47:49 +0200 >>>>> Christian Borntraeger <borntraeger@de.ibm.com> wrote: >>>>> >>>>>> On 27.08.19 14:56, Igor Mammedov wrote: >>>>>>> On Tue, 20 Aug 2019 18:07:27 +0200 >>>>>>> Cornelia Huck <cohuck@redhat.com> wrote: >>>>>>> >>>>>>>> On Wed, 7 Aug 2019 11:32:41 -0400 >>>>>>>> Igor Mammedov <imammedo@redhat.com> wrote: >>>>>>>> >>>>>>>>> Max memslot size supported by kvm on s390 is 8Tb, >>>>>>>>> move logic of splitting RAM in chunks upto 8T to KVM code. >>>>>>>>> >>>>>>>>> This way it will hide KVM specific restrictions in KVM code >>>>>>>>> and won't affect baord level design decisions. Which would allow >>>>>>>>> us to avoid misusing memory_region_allocate_system_memory() API >>>>>>>>> and eventually use a single hostmem backend for guest RAM. >>>>>>>>> >>>>>>>>> Signed-off-by: Igor Mammedov <imammedo@redhat.com> >>>>>>>>> --- >>>>>>>>> v5: >>>>>>>>> * move computation 'size -= slot_size' inside of loop body >>>>>>>>> (David Hildenbrand <david@redhat.com>) >>>>>>>>> v4: >>>>>>>>> * fix compilation issue >>>>>>>>> (Christian Borntraeger <borntraeger@de.ibm.com>) >>>>>>>>> * advance HVA along with GPA in kvm_set_phys_mem() >>>>>>>>> (Christian Borntraeger <borntraeger@de.ibm.com>) >>>>>>>>> >>>>>>>>> patch prepares only KVM side for switching to single RAM memory region >>>>>>>>> another patch will take care of dropping manual RAM partitioning in >>>>>>>>> s390 code. >>>>>>>> >>>>>>>> I may have lost track a bit -- what is the status of this patch (and >>>>>>>> the series)? >>>>>>> >>>>>>> Christian, >>>>>>> >>>>>>> could you test it on a host that have sufficient amount of RAM? >>>>>> >>>>>> >>>>>> This version looks good. I was able to start a 9TB guest. >>>>>> [pid 215723] ioctl(10, KVM_SET_USER_MEMORY_REGION, {slot=0, flags=0, guest_phys_addr=0, memory_size=8796091973632, userspace_addr=0x3ffee700000}) = 0 >>>>>> [pid 215723] ioctl(10, KVM_SET_USER_MEMORY_REGION, {slot=1, flags=0, guest_phys_addr=0x7fffff00000, memory_size=1099512676352, userspace_addr=0xbffee600000}) = 0 >>>> >>>>>> The only question is if we want to fix the weird alignment (0x7fffff00000) when >>>>>> we already add a migration barrier for uber-large guests. >>>>>> Maybe we could split at 4TB to avoid future problem with larger page sizes? >>>>> That probably should be a separate patch on top. >>>> >>>> Right. The split in KVM code is transparent to migration and other parts of QEMU, correct? >>> >>> it should not affect other QEMU parts and migration (to my limited understanding of it), >>> we are passing to KVM memory slots upto KVM_SLOT_MAX_BYTES as we were doing before by >>> creating several memory regions instead of one as described in [2/2] commit message. 
>>> >>> Also could you also test migration of +9Tb guest, to check that nothing where broken by >>> accident in QEMU migration code? >> >> I only have one server that is large enough :-/ > Could you test offline migration on it (to a file and restore from it)? I tested migration with a hacked QEMU (basically split in KVM code at 1GB instead of 8TB) and the restore from file failed with data corruption in the guest. The current code does work when I use small memslots. No idea yet what is wrong. ^ permalink raw reply [flat|nested] 19+ messages in thread
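For anyone trying to reproduce the corruption: the thread does not show the exact hack, but with patch 1/2 applied the limit is a single value registered from kvm_arch_init() in target/s390x/kvm.c, so forcing a split at 1 GiB for testing could look like this (illustrative only, not a change anyone posted):

    -    kvm_set_max_memslot_size(KVM_SLOT_MAX_BYTES);
    +    kvm_set_max_memslot_size(1 * GiB); /* test hack: force a memslot split every 1 GiB */

Any host-page-aligned value satisfies the assertion in kvm_set_max_memslot_size(), so even small guests then get carved into many memslots and the migration path can be exercised without terabytes of host RAM.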
* Re: [Qemu-devel] [qemu-s390x] [PATCH for-4.2 v5 1/2] kvm: s390: split too big memory section on several memslots 2019-08-30 16:19 ` Christian Borntraeger @ 2019-09-02 13:49 ` Igor Mammedov 2019-09-03 6:57 ` Christian Borntraeger 0 siblings, 1 reply; 19+ messages in thread From: Igor Mammedov @ 2019-09-02 13:49 UTC (permalink / raw) To: Christian Borntraeger Cc: thuth, david, Cornelia Huck, qemu-devel, qemu-s390x, pbonzini On Fri, 30 Aug 2019 18:19:29 +0200 Christian Borntraeger <borntraeger@de.ibm.com> wrote: > On 30.08.19 11:41, Igor Mammedov wrote: > > On Thu, 29 Aug 2019 14:41:13 +0200 > > Christian Borntraeger <borntraeger@de.ibm.com> wrote: > > > >> On 29.08.19 14:31, Igor Mammedov wrote: > >>> On Thu, 29 Aug 2019 14:07:44 +0200 > >>> Christian Borntraeger <borntraeger@de.ibm.com> wrote: > >>> > >>>> On 29.08.19 14:04, Igor Mammedov wrote: > >>>>> On Thu, 29 Aug 2019 08:47:49 +0200 > >>>>> Christian Borntraeger <borntraeger@de.ibm.com> wrote: > >>>>> > >>>>>> On 27.08.19 14:56, Igor Mammedov wrote: > >>>>>>> On Tue, 20 Aug 2019 18:07:27 +0200 > >>>>>>> Cornelia Huck <cohuck@redhat.com> wrote: > >>>>>>> > >>>>>>>> On Wed, 7 Aug 2019 11:32:41 -0400 > >>>>>>>> Igor Mammedov <imammedo@redhat.com> wrote: > >>>>>>>> > >>>>>>>>> Max memslot size supported by kvm on s390 is 8Tb, > >>>>>>>>> move logic of splitting RAM in chunks upto 8T to KVM code. > >>>>>>>>> > >>>>>>>>> This way it will hide KVM specific restrictions in KVM code > >>>>>>>>> and won't affect baord level design decisions. Which would allow > >>>>>>>>> us to avoid misusing memory_region_allocate_system_memory() API > >>>>>>>>> and eventually use a single hostmem backend for guest RAM. > >>>>>>>>> > >>>>>>>>> Signed-off-by: Igor Mammedov <imammedo@redhat.com> > >>>>>>>>> --- > >>>>>>>>> v5: > >>>>>>>>> * move computation 'size -= slot_size' inside of loop body > >>>>>>>>> (David Hildenbrand <david@redhat.com>) > >>>>>>>>> v4: > >>>>>>>>> * fix compilation issue > >>>>>>>>> (Christian Borntraeger <borntraeger@de.ibm.com>) > >>>>>>>>> * advance HVA along with GPA in kvm_set_phys_mem() > >>>>>>>>> (Christian Borntraeger <borntraeger@de.ibm.com>) > >>>>>>>>> > >>>>>>>>> patch prepares only KVM side for switching to single RAM memory region > >>>>>>>>> another patch will take care of dropping manual RAM partitioning in > >>>>>>>>> s390 code. > >>>>>>>> > >>>>>>>> I may have lost track a bit -- what is the status of this patch (and > >>>>>>>> the series)? > >>>>>>> > >>>>>>> Christian, > >>>>>>> > >>>>>>> could you test it on a host that have sufficient amount of RAM? > >>>>>> > >>>>>> > >>>>>> This version looks good. I was able to start a 9TB guest. > >>>>>> [pid 215723] ioctl(10, KVM_SET_USER_MEMORY_REGION, {slot=0, flags=0, guest_phys_addr=0, memory_size=8796091973632, userspace_addr=0x3ffee700000}) = 0 > >>>>>> [pid 215723] ioctl(10, KVM_SET_USER_MEMORY_REGION, {slot=1, flags=0, guest_phys_addr=0x7fffff00000, memory_size=1099512676352, userspace_addr=0xbffee600000}) = 0 > >>>> > >>>>>> The only question is if we want to fix the weird alignment (0x7fffff00000) when > >>>>>> we already add a migration barrier for uber-large guests. > >>>>>> Maybe we could split at 4TB to avoid future problem with larger page sizes? > >>>>> That probably should be a separate patch on top. > >>>> > >>>> Right. The split in KVM code is transparent to migration and other parts of QEMU, correct? 
> >>> > >>> it should not affect other QEMU parts and migration (to my limited understanding of it), > >>> we are passing to KVM memory slots upto KVM_SLOT_MAX_BYTES as we were doing before by > >>> creating several memory regions instead of one as described in [2/2] commit message. > >>> > >>> Also could you also test migration of +9Tb guest, to check that nothing where broken by > >>> accident in QEMU migration code? > >> > >> I only have one server that is large enough :-/ > > Could you test offline migration on it (to a file and restore from it)? > > I tested migration with a hacked QEMU (basically split in KVM code at 1GB instead of 8TB) and > the restore from file failed with data corruption in the guest. The current code > does work when I use small memslots. No idea yet what is wrong. I've tested a 2Gb guest (the maximum I can test), also with the hacked-up version, and it worked for me. How do you test it and detect the corruption, so I could try to reproduce it locally? (Given it worked before, there is not much hope, but I could try.) ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [qemu-s390x] [PATCH for-4.2 v5 1/2] kvm: s390: split too big memory section on several memslots 2019-09-02 13:49 ` Igor Mammedov @ 2019-09-03 6:57 ` Christian Borntraeger 2019-09-16 13:16 ` Igor Mammedov 0 siblings, 1 reply; 19+ messages in thread From: Christian Borntraeger @ 2019-09-03 6:57 UTC (permalink / raw) To: Igor Mammedov Cc: thuth, david, Cornelia Huck, qemu-devel, qemu-s390x, pbonzini On 02.09.19 15:49, Igor Mammedov wrote: > On Fri, 30 Aug 2019 18:19:29 +0200 > Christian Borntraeger <borntraeger@de.ibm.com> wrote: > >> On 30.08.19 11:41, Igor Mammedov wrote: >>> On Thu, 29 Aug 2019 14:41:13 +0200 >>> Christian Borntraeger <borntraeger@de.ibm.com> wrote: >>> >>>> On 29.08.19 14:31, Igor Mammedov wrote: >>>>> On Thu, 29 Aug 2019 14:07:44 +0200 >>>>> Christian Borntraeger <borntraeger@de.ibm.com> wrote: >>>>> >>>>>> On 29.08.19 14:04, Igor Mammedov wrote: >>>>>>> On Thu, 29 Aug 2019 08:47:49 +0200 >>>>>>> Christian Borntraeger <borntraeger@de.ibm.com> wrote: >>>>>>> >>>>>>>> On 27.08.19 14:56, Igor Mammedov wrote: >>>>>>>>> On Tue, 20 Aug 2019 18:07:27 +0200 >>>>>>>>> Cornelia Huck <cohuck@redhat.com> wrote: >>>>>>>>> >>>>>>>>>> On Wed, 7 Aug 2019 11:32:41 -0400 >>>>>>>>>> Igor Mammedov <imammedo@redhat.com> wrote: >>>>>>>>>> >>>>>>>>>>> Max memslot size supported by kvm on s390 is 8Tb, >>>>>>>>>>> move logic of splitting RAM in chunks upto 8T to KVM code. >>>>>>>>>>> >>>>>>>>>>> This way it will hide KVM specific restrictions in KVM code >>>>>>>>>>> and won't affect baord level design decisions. Which would allow >>>>>>>>>>> us to avoid misusing memory_region_allocate_system_memory() API >>>>>>>>>>> and eventually use a single hostmem backend for guest RAM. >>>>>>>>>>> >>>>>>>>>>> Signed-off-by: Igor Mammedov <imammedo@redhat.com> >>>>>>>>>>> --- >>>>>>>>>>> v5: >>>>>>>>>>> * move computation 'size -= slot_size' inside of loop body >>>>>>>>>>> (David Hildenbrand <david@redhat.com>) >>>>>>>>>>> v4: >>>>>>>>>>> * fix compilation issue >>>>>>>>>>> (Christian Borntraeger <borntraeger@de.ibm.com>) >>>>>>>>>>> * advance HVA along with GPA in kvm_set_phys_mem() >>>>>>>>>>> (Christian Borntraeger <borntraeger@de.ibm.com>) >>>>>>>>>>> >>>>>>>>>>> patch prepares only KVM side for switching to single RAM memory region >>>>>>>>>>> another patch will take care of dropping manual RAM partitioning in >>>>>>>>>>> s390 code. >>>>>>>>>> >>>>>>>>>> I may have lost track a bit -- what is the status of this patch (and >>>>>>>>>> the series)? >>>>>>>>> >>>>>>>>> Christian, >>>>>>>>> >>>>>>>>> could you test it on a host that have sufficient amount of RAM? >>>>>>>> >>>>>>>> >>>>>>>> This version looks good. I was able to start a 9TB guest. >>>>>>>> [pid 215723] ioctl(10, KVM_SET_USER_MEMORY_REGION, {slot=0, flags=0, guest_phys_addr=0, memory_size=8796091973632, userspace_addr=0x3ffee700000}) = 0 >>>>>>>> [pid 215723] ioctl(10, KVM_SET_USER_MEMORY_REGION, {slot=1, flags=0, guest_phys_addr=0x7fffff00000, memory_size=1099512676352, userspace_addr=0xbffee600000}) = 0 >>>>>> >>>>>>>> The only question is if we want to fix the weird alignment (0x7fffff00000) when >>>>>>>> we already add a migration barrier for uber-large guests. >>>>>>>> Maybe we could split at 4TB to avoid future problem with larger page sizes? >>>>>>> That probably should be a separate patch on top. >>>>>> >>>>>> Right. The split in KVM code is transparent to migration and other parts of QEMU, correct? 
>>>>> >>>>> it should not affect other QEMU parts and migration (to my limited understanding of it), >>>>> we are passing to KVM memory slots upto KVM_SLOT_MAX_BYTES as we were doing before by >>>>> creating several memory regions instead of one as described in [2/2] commit message. >>>>> >>>>> Also could you also test migration of +9Tb guest, to check that nothing where broken by >>>>> accident in QEMU migration code? >>>> >>>> I only have one server that is large enough :-/ >>> Could you test offline migration on it (to a file and restore from it)? >> >> I tested migration with a hacked QEMU (basically split in KVM code at 1GB instead of 8TB) and >> the restore from file failed with data corruption in the guest. The current code >> does work when I use small memslots. No idea yet what is wrong. > > I've tested 2Gb (max, I can test) guest (also hacked up version) > and it worked for me. > How do you test it and detect corruption so I could try to reproduce it locally? > (given it worked before, there is no much hope but I could try) I basically started a guest with just a kernel and a ramdisk on the command line; then in the monitor I did migrate "exec: cat > savefile", and afterwards I restarted the guest with -incoming "exec: cat savefile". The guest then very quickly crashed with random kernel oopses. Using libvirt's managedsave should work as well. ^ permalink raw reply [flat|nested] 19+ messages in thread
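Spelled out, that recipe amounts to something like the following (a hypothetical invocation; the exact kernel, initrd and machine options are not given in the thread, and with -nographic the monitor is reached via Ctrl-a c):

    # boot, then save the guest state to a file from the monitor
    qemu-system-s390x -m 9T -nographic -kernel vmlinuz -initrd initrd.img
    (qemu) migrate "exec: cat > savefile"

    # restore: same command line plus -incoming
    qemu-system-s390x -m 9T -nographic -kernel vmlinuz -initrd initrd.img \
        -incoming "exec: cat savefile"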
* Re: [Qemu-devel] [qemu-s390x] [PATCH for-4.2 v5 1/2] kvm: s390: split too big memory section on several memslots 2019-09-03 6:57 ` Christian Borntraeger @ 2019-09-16 13:16 ` Igor Mammedov 0 siblings, 0 replies; 19+ messages in thread From: Igor Mammedov @ 2019-09-16 13:16 UTC (permalink / raw) To: Christian Borntraeger Cc: thuth, david, Cornelia Huck, qemu-devel, qemu-s390x, pbonzini On Tue, 3 Sep 2019 08:57:38 +0200 Christian Borntraeger <borntraeger@de.ibm.com> wrote: > On 02.09.19 15:49, Igor Mammedov wrote: > > On Fri, 30 Aug 2019 18:19:29 +0200 > > Christian Borntraeger <borntraeger@de.ibm.com> wrote: > > > >> On 30.08.19 11:41, Igor Mammedov wrote: > >>> On Thu, 29 Aug 2019 14:41:13 +0200 > >>> Christian Borntraeger <borntraeger@de.ibm.com> wrote: > >>> > >>>> On 29.08.19 14:31, Igor Mammedov wrote: > >>>>> On Thu, 29 Aug 2019 14:07:44 +0200 > >>>>> Christian Borntraeger <borntraeger@de.ibm.com> wrote: > >>>>> > >>>>>> On 29.08.19 14:04, Igor Mammedov wrote: > >>>>>>> On Thu, 29 Aug 2019 08:47:49 +0200 > >>>>>>> Christian Borntraeger <borntraeger@de.ibm.com> wrote: > >>>>>>> > >>>>>>>> On 27.08.19 14:56, Igor Mammedov wrote: > >>>>>>>>> On Tue, 20 Aug 2019 18:07:27 +0200 > >>>>>>>>> Cornelia Huck <cohuck@redhat.com> wrote: > >>>>>>>>> > >>>>>>>>>> On Wed, 7 Aug 2019 11:32:41 -0400 > >>>>>>>>>> Igor Mammedov <imammedo@redhat.com> wrote: > >>>>>>>>>> > >>>>>>>>>>> Max memslot size supported by kvm on s390 is 8Tb, > >>>>>>>>>>> move logic of splitting RAM in chunks upto 8T to KVM code. > >>>>>>>>>>> > >>>>>>>>>>> This way it will hide KVM specific restrictions in KVM code > >>>>>>>>>>> and won't affect baord level design decisions. Which would allow > >>>>>>>>>>> us to avoid misusing memory_region_allocate_system_memory() API > >>>>>>>>>>> and eventually use a single hostmem backend for guest RAM. > >>>>>>>>>>> > >>>>>>>>>>> Signed-off-by: Igor Mammedov <imammedo@redhat.com> > >>>>>>>>>>> --- > >>>>>>>>>>> v5: > >>>>>>>>>>> * move computation 'size -= slot_size' inside of loop body > >>>>>>>>>>> (David Hildenbrand <david@redhat.com>) > >>>>>>>>>>> v4: > >>>>>>>>>>> * fix compilation issue > >>>>>>>>>>> (Christian Borntraeger <borntraeger@de.ibm.com>) > >>>>>>>>>>> * advance HVA along with GPA in kvm_set_phys_mem() > >>>>>>>>>>> (Christian Borntraeger <borntraeger@de.ibm.com>) > >>>>>>>>>>> > >>>>>>>>>>> patch prepares only KVM side for switching to single RAM memory region > >>>>>>>>>>> another patch will take care of dropping manual RAM partitioning in > >>>>>>>>>>> s390 code. > >>>>>>>>>> > >>>>>>>>>> I may have lost track a bit -- what is the status of this patch (and > >>>>>>>>>> the series)? > >>>>>>>>> > >>>>>>>>> Christian, > >>>>>>>>> > >>>>>>>>> could you test it on a host that have sufficient amount of RAM? > >>>>>>>> > >>>>>>>> > >>>>>>>> This version looks good. I was able to start a 9TB guest. > >>>>>>>> [pid 215723] ioctl(10, KVM_SET_USER_MEMORY_REGION, {slot=0, flags=0, guest_phys_addr=0, memory_size=8796091973632, userspace_addr=0x3ffee700000}) = 0 > >>>>>>>> [pid 215723] ioctl(10, KVM_SET_USER_MEMORY_REGION, {slot=1, flags=0, guest_phys_addr=0x7fffff00000, memory_size=1099512676352, userspace_addr=0xbffee600000}) = 0 > >>>>>> > >>>>>>>> The only question is if we want to fix the weird alignment (0x7fffff00000) when > >>>>>>>> we already add a migration barrier for uber-large guests. > >>>>>>>> Maybe we could split at 4TB to avoid future problem with larger page sizes? > >>>>>>> That probably should be a separate patch on top. > >>>>>> > >>>>>> Right. 
The split in KVM code is transparent to migration and other parts of QEMU, correct? > >>>>> > >>>>> it should not affect other QEMU parts and migration (to my limited understanding of it), > >>>>> we are passing to KVM memory slots upto KVM_SLOT_MAX_BYTES as we were doing before by > >>>>> creating several memory regions instead of one as described in [2/2] commit message. > >>>>> > >>>>> Also could you also test migration of +9Tb guest, to check that nothing where broken by > >>>>> accident in QEMU migration code? > >>>> > >>>> I only have one server that is large enough :-/ > >>> Could you test offline migration on it (to a file and restore from it)? > >> > >> I tested migration with a hacked QEMU (basically split in KVM code at 1GB instead of 8TB) and > >> the restore from file failed with data corruption in the guest. The current code > >> does work when I use small memslots. No idea yet what is wrong. > > > > I've tested 2Gb (max, I can test) guest (also hacked up version) > > and it worked for me. > > How do you test it and detect corruption so I could try to reproduce it locally? > > (given it worked before, there is no much hope but I could try) > > I basically started a guest with just kernel and ramdisk on the command line and > then in the monitor I did > migrate "exec: cat > savefile" > and then I restarted the guest with > -incoming "exec: cat savefile" > > the guest then very quickly crashed with random kernel oopses. Well, I wasn't able to reproduce that. So I looked at the KVM part of migration, and it turned out that, as far as KVM was concerned, migration didn't even start. The issue was that the migration-related KVM code paths assumed a 1:1 relation between MemorySection and memslot, which isn't true anymore. I'll respin v6 with that fixed (i.e. make sure that the KVM parts can handle the 1:n MemorySection:memslots invariant). PS: I've tested ping-pong migration (via file, with the 1Gb segments hack) with Fedora 29 in a boot loop. > Using libvirts managedsave should work as well. ^ permalink raw reply [flat|nested] 19+ messages in thread
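To illustrate the 1:1 assumption Igor refers to: after patch 1/2 a single MemoryRegionSection can be backed by several memslots, so any code that used to look up "the" slot for a section has to walk all chunks instead. A rough, hand-written sketch of that pattern (not the actual v6 change), e.g. for the dirty bitmap handling:

    /* size/start_addr describe the MemoryRegionSection being processed */
    while (size) {
        hwaddr slot_size = MIN(kvm_max_slot_size, size);
        KVMSlot *mem = kvm_lookup_matching_slot(kml, start_addr, slot_size);

        if (!mem) {
            return 0;   /* this section was never registered with KVM */
        }
        /* ... sync or clear the dirty bitmap for this one slot ... */
        start_addr += slot_size;
        size -= slot_size;
    }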
* [Qemu-devel] [PATCH for-4.2 v4 2/2] s390: do not call memory_region_allocate_system_memory() multiple times 2019-08-06 9:48 [Qemu-devel] [PATCH for-4.2 v4 0/2] s390: stop abusing memory_region_allocate_system_memory() Igor Mammedov 2019-08-06 9:48 ` [Qemu-devel] [PATCH for-4.2 v4 1/2] kvm: s390: split too big memory section on several memslots Igor Mammedov @ 2019-08-06 9:48 ` Igor Mammedov 1 sibling, 0 replies; 19+ messages in thread From: Igor Mammedov @ 2019-08-06 9:48 UTC (permalink / raw) To: qemu-devel; +Cc: thuth, david, cohuck, borntraeger, qemu-s390x, pbonzini s390 was trying to work around the limited KVM memslot size by abusing memory_region_allocate_system_memory(), which breaks the API contract that the function may be called only once. Besides being an invalid use of the API, the approach also introduced a migration issue, since RAM chunks of KVM_SLOT_MAX_BYTES each are transferred in the migration stream as separate RAMBlocks. After discussion [1], it was agreed to break migration from older QEMU for guests with RAM >8Tb (the layout was relatively new (since 2.12) and considered not to be actually used downstream). Migration should keep working for guests with less than 8TB, and for guests with more than 8TB when both sides run a QEMU 4.2 or newer binary. In case a user tries to migrate a more than 8TB guest between incompatible QEMU versions, migration should fail gracefully due to a non-existing RAMBlock ID or a RAMBlock size mismatch. Taking the above into account, and the fact that KVM code is now able to split a too big MemorySection into several memslots, stop abusing memory_region_allocate_system_memory() and use only one memory region for RAM. 1) [PATCH RFC v2 4/4] s390: do not call memory_region_allocate_system_memory() multiple times Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> --- v3: - drop migration compat code PS: I don't have access to a suitable system to test it. --- hw/s390x/s390-virtio-ccw.c | 21 +++------------------ 1 file changed, 3 insertions(+), 18 deletions(-) diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c index 0c03ffb7c7..e30058df38 100644 --- a/hw/s390x/s390-virtio-ccw.c +++ b/hw/s390x/s390-virtio-ccw.c @@ -154,27 +154,12 @@ static void virtio_ccw_register_hcalls(void) static void s390_memory_init(ram_addr_t mem_size) { MemoryRegion *sysmem = get_system_memory(); - ram_addr_t chunk, offset = 0; - unsigned int number = 0; + MemoryRegion *ram = g_new(MemoryRegion, 1); Error *local_err = NULL; - gchar *name; /* allocate RAM for core */ - name = g_strdup_printf("s390.ram"); - while (mem_size) { - MemoryRegion *ram = g_new(MemoryRegion, 1); - uint64_t size = mem_size; - - /* KVM does not allow memslots >= 8 TB */ - chunk = MIN(size, KVM_SLOT_MAX_BYTES); - memory_region_allocate_system_memory(ram, NULL, name, chunk); - memory_region_add_subregion(sysmem, offset, ram); - mem_size -= chunk; - offset += chunk; - g_free(name); - name = g_strdup_printf("s390.ram.%u", ++number); - } - g_free(name); + memory_region_allocate_system_memory(ram, NULL, "s390.ram", mem_size); + memory_region_add_subregion(sysmem, 0, ram); /* * Configure the maximum page size. As no memory devices were created -- 2.18.1 ^ permalink raw reply related [flat|nested] 19+ messages in thread
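For the 9 TiB example discussed earlier in the thread, the migration-visible difference boils down to the RAMBlock layout below (an illustration derived from the old loop above and the KVM_SLOT_MAX_BYTES limit, not tool output):

    before (QEMU 2.12 - 4.1):   s390.ram    8796091973632 bytes
                                s390.ram.1  1099512676352 bytes
    with this patch:            s390.ram    9895604649984 bytes

which is why migrating a >8TB guest between the two layouts fails gracefully on the RAMBlock ID / size check that the commit message describes.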