[Qemu-devel] [PATCH v6 0/2] s390: stop abusing memory_region_allocate_system

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [Qemu-devel] [PATCH v6 0/2] s390: stop abusing memory_region_allocate_system_memory()
@ 2019-09-16 13:23 Igor Mammedov
  2019-09-16 13:23 ` [Qemu-devel] [PATCH v6 1/2] kvm: s390: split too big memory section on several memslots Igor Mammedov
  2019-09-16 13:23 ` [Qemu-devel] [PATCH v6 2/2] s390: do not call memory_region_allocate_system_memory() multiple times Igor Mammedov
  0 siblings, 2 replies; 8+ messages in thread
From: Igor Mammedov @ 2019-09-16 13:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, thuth, cohuck, qemu-s390x, david

Changelog:
  since v5:
    - [1/2] fix migration that wasn't starting and make sure that KVM part
      is able to handle 1:n MemorySection:memslot arrangement
  since v3:
    - fix compilation issue
    - advance HVA along with GPA in kvm_set_phys_mem()
  since v2:
    - break migration from old QEMU (since 2.12-4.1) for guest with >8TB RAM
      and drop migratable aliases patch as was agreed during v2 review
    - drop 4.2 machines patch as it's not prerequisite anymore
  since v1:
    - include 4.2 machines patch for adding compat RAM layout on top
    - 2/4 add missing in v1 patch for splitting too big MemorySection on
          several memslots
    - 3/4 amend code path on alias destruction to ensure that RAMBlock is
          cleaned properly
    - 4/4 add compat machine code to keep old layout (migration-wise) for
          4.1 and older machines 

While looking into unifying guest RAM allocation to use hostmem backends
for initial RAM (especially when -mempath is used) and retiring
memory_region_allocate_system_memory() API, leaving only single hostmem backend,
I was inspecting how currently it is used by boards and it turns out several
boards abuse it by calling the function several times (despite documented contract
forbiding it).

s390 is one of such boards where KVM limitation on memslot size got propagated
to board design and memory_region_allocate_system_memory() was abused to satisfy
KVM requirement for max RAM chunk where memory region alias would suffice.

Unfortunately, memory_region_allocate_system_memory() usage created migration
dependency where guest RAM is transferred in migration stream as several RAMBlocks
if it's more than KVM_SLOT_MAX_BYTES. During v2 review it was agreed to ignore
migration breakage (documenting it in release notes) and leaving only KVM fix.

In order to replace these several RAM chunks with a single memdev and keep it
working with KVM memslot size limit, following was done:
   * [1/2] split too big RAM chunk inside of KVM code on several memory slots
           if necessary
   * [2/2] drop manual ram splitting in s390 code

Igor Mammedov (2):
  kvm: s390: split too big memory section on several memslots
  s390: do not call memory_region_allocate_system_memory() multiple
    times

 include/sysemu/kvm_int.h   |   1 +
 accel/kvm/kvm-all.c        | 239 ++++++++++++++++++++++---------------
 hw/s390x/s390-virtio-ccw.c |  30 +----
 target/s390x/kvm.c         |  12 ++
 4 files changed, 162 insertions(+), 120 deletions(-)

-- 
2.18.1

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Qemu-devel] [PATCH v6 1/2] kvm: s390: split too big memory section on several memslots
  2019-09-16 13:23 [Qemu-devel] [PATCH v6 0/2] s390: stop abusing memory_region_allocate_system_memory() Igor Mammedov
@ 2019-09-16 13:23 ` Igor Mammedov
  2019-09-16 13:32   ` Dr. David Alan Gilbert
  2019-09-17  8:38   ` Peter Xu
  2019-09-16 13:23 ` [Qemu-devel] [PATCH v6 2/2] s390: do not call memory_region_allocate_system_memory() multiple times Igor Mammedov
  1 sibling, 2 replies; 8+ messages in thread
From: Igor Mammedov @ 2019-09-16 13:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: thuth, david, cohuck, dgilbert, qemu-s390x, pbonzini

Max memslot size supported by kvm on s390 is 8Tb,
move logic of splitting RAM in chunks upto 8T to KVM code.

This way it will hide KVM specific restrictions in KVM code
and won't affect board level design decisions. Which would allow
us to avoid misusing memory_region_allocate_system_memory() API
and eventually use a single hostmem backend for guest RAM.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
Since there is quite a bit of migration related code churn in the patch
CCing migration maintainer

CC: dgilbert@redhat.com

v6:
  * KVM's migration code was assuming 1:1 relation between
    memslot and MemorySection, which becomes not true fo s390x
    with this patch. As result migration was broken and
    dirty logging wasn't even started with when a MemorySection
    was split on several memslots. Amend related KVM dirty
    log tracking code to account for split MemorySection.
v5:
  * move computation 'size -= slot_size' inside of loop body
          (David Hildenbrand <david@redhat.com>)
v4:
  * fix compilation issue
          (Christian Borntraeger <borntraeger@de.ibm.com>)
  * advance HVA along with GPA in kvm_set_phys_mem()
          (Christian Borntraeger <borntraeger@de.ibm.com>)

patch prepares only KVM side for switching to single RAM memory region
another patch will take care of  dropping manual RAM partitioning in
s390 code.


 include/sysemu/kvm_int.h   |   1 +
 accel/kvm/kvm-all.c        | 239 ++++++++++++++++++++++---------------
 hw/s390x/s390-virtio-ccw.c |   9 --
 target/s390x/kvm.c         |  12 ++
 4 files changed, 159 insertions(+), 102 deletions(-)

diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index 72b2d1b3ae..ac2d1f8b56 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -41,4 +41,5 @@ typedef struct KVMMemoryListener {
 void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml,
                                   AddressSpace *as, int as_id);
 
+void kvm_set_max_memslot_size(hwaddr max_slot_size);
 #endif
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index b09bad0804..0635903408 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -140,6 +140,7 @@ bool kvm_direct_msi_allowed;
 bool kvm_ioeventfd_any_length_allowed;
 bool kvm_msi_use_devid;
 static bool kvm_immediate_exit;
+static hwaddr kvm_max_slot_size = ~0;
 
 static const KVMCapabilityInfo kvm_required_capabilites[] = {
     KVM_CAP_INFO(USER_MEMORY),
@@ -437,7 +438,7 @@ static int kvm_slot_update_flags(KVMMemoryListener *kml, KVMSlot *mem,
 static int kvm_section_update_flags(KVMMemoryListener *kml,
                                     MemoryRegionSection *section)
 {
-    hwaddr start_addr, size;
+    hwaddr start_addr, size, slot_size;
     KVMSlot *mem;
     int ret = 0;
 
@@ -448,13 +449,18 @@ static int kvm_section_update_flags(KVMMemoryListener *kml,
 
     kvm_slots_lock(kml);
 
-    mem = kvm_lookup_matching_slot(kml, start_addr, size);
-    if (!mem) {
-        /* We don't have a slot if we want to trap every access. */
-        goto out;
-    }
+    while (size && !ret) {
+        slot_size = kvm_max_slot_size < size ? kvm_max_slot_size : size;
+        mem = kvm_lookup_matching_slot(kml, start_addr, slot_size);
+        if (!mem) {
+            /* We don't have a slot if we want to trap every access. */
+            goto out;
+        }
 
-    ret = kvm_slot_update_flags(kml, mem, section->mr);
+        ret = kvm_slot_update_flags(kml, mem, section->mr);
+        start_addr += slot_size;
+        size -= slot_size;
+    }
 
 out:
     kvm_slots_unlock(kml);
@@ -497,11 +503,14 @@ static void kvm_log_stop(MemoryListener *listener,
 
 /* get kvm's dirty pages bitmap and update qemu's */
 static int kvm_get_dirty_pages_log_range(MemoryRegionSection *section,
+                                         hwaddr offset_in_section,
+                                         hwaddr range_size,
                                          unsigned long *bitmap)
 {
     ram_addr_t start = section->offset_within_region +
+                       offset_in_section +
                        memory_region_get_ram_addr(section->mr);
-    ram_addr_t pages = int128_get64(section->size) / getpagesize();
+    ram_addr_t pages = range_size / getpagesize();
 
     cpu_physical_memory_set_dirty_lebitmap(bitmap, start, pages);
     return 0;
@@ -527,11 +536,13 @@ static int kvm_physical_sync_dirty_bitmap(KVMMemoryListener *kml,
     struct kvm_dirty_log d = {};
     KVMSlot *mem;
     hwaddr start_addr, size;
+    hwaddr slot_size, slot_offset = 0;
     int ret = 0;
 
     size = kvm_align_section(section, &start_addr);
-    if (size) {
-        mem = kvm_lookup_matching_slot(kml, start_addr, size);
+    while (size) {
+        slot_size = kvm_max_slot_size < size ? kvm_max_slot_size : size;
+        mem = kvm_lookup_matching_slot(kml, start_addr, slot_size);
         if (!mem) {
             /* We don't have a slot if we want to trap every access. */
             goto out;
@@ -549,11 +560,11 @@ static int kvm_physical_sync_dirty_bitmap(KVMMemoryListener *kml,
          * So for now, let's align to 64 instead of HOST_LONG_BITS here, in
          * a hope that sizeof(long) won't become >8 any time soon.
          */
-        size = ALIGN(((mem->memory_size) >> TARGET_PAGE_BITS),
-                     /*HOST_LONG_BITS*/ 64) / 8;
         if (!mem->dirty_bmap) {
+            hwaddr bitmap_size = ALIGN(((mem->memory_size) >> TARGET_PAGE_BITS),
+                                        /*HOST_LONG_BITS*/ 64) / 8;
             /* Allocate on the first log_sync, once and for all */
-            mem->dirty_bmap = g_malloc0(size);
+            mem->dirty_bmap = g_malloc0(bitmap_size);
         }
 
         d.dirty_bitmap = mem->dirty_bmap;
@@ -564,7 +575,12 @@ static int kvm_physical_sync_dirty_bitmap(KVMMemoryListener *kml,
             goto out;
         }
 
-        kvm_get_dirty_pages_log_range(section, d.dirty_bitmap);
+        kvm_get_dirty_pages_log_range(section, slot_offset, slot_size,
+                                      d.dirty_bitmap);
+
+        slot_offset += slot_size;
+        start_addr += slot_size;
+        size -= slot_size;
     }
 out:
     return ret;
@@ -575,55 +591,14 @@ out:
 #define KVM_CLEAR_LOG_ALIGN  (qemu_real_host_page_size << KVM_CLEAR_LOG_SHIFT)
 #define KVM_CLEAR_LOG_MASK   (-KVM_CLEAR_LOG_ALIGN)
 
-/**
- * kvm_physical_log_clear - Clear the kernel's dirty bitmap for range
- *
- * NOTE: this will be a no-op if we haven't enabled manual dirty log
- * protection in the host kernel because in that case this operation
- * will be done within log_sync().
- *
- * @kml:     the kvm memory listener
- * @section: the memory range to clear dirty bitmap
- */
-static int kvm_physical_log_clear(KVMMemoryListener *kml,
-                                  MemoryRegionSection *section)
+static int kvm_physical_log_slot_clear(uint64_t start, uint64_t size,
+                                       KVMSlot *mem, int as_id)
 {
     KVMState *s = kvm_state;
-    struct kvm_clear_dirty_log d;
-    uint64_t start, end, bmap_start, start_delta, bmap_npages, size;
     unsigned long *bmap_clear = NULL, psize = qemu_real_host_page_size;
-    KVMSlot *mem = NULL;
-    int ret, i;
-
-    if (!s->manual_dirty_log_protect) {
-        /* No need to do explicit clear */
-        return 0;
-    }
-
-    start = section->offset_within_address_space;
-    size = int128_get64(section->size);
-
-    if (!size) {
-        /* Nothing more we can do... */
-        return 0;
-    }
-
-    kvm_slots_lock(kml);
-
-    /* Find any possible slot that covers the section */
-    for (i = 0; i < s->nr_slots; i++) {
-        mem = &kml->slots[i];
-        if (mem->start_addr <= start &&
-            start + size <= mem->start_addr + mem->memory_size) {
-            break;
-        }
-    }
-
-    /*
-     * We should always find one memslot until this point, otherwise
-     * there could be something wrong from the upper layer
-     */
-    assert(mem && i != s->nr_slots);
+    uint64_t end, bmap_start, start_delta, bmap_npages;
+    struct kvm_clear_dirty_log d;
+    int ret;
 
     /*
      * We need to extend either the start or the size or both to
@@ -694,7 +669,7 @@ static int kvm_physical_log_clear(KVMMemoryListener *kml,
     /* It should never overflow.  If it happens, say something */
     assert(bmap_npages <= UINT32_MAX);
     d.num_pages = bmap_npages;
-    d.slot = mem->slot | (kml->as_id << 16);
+    d.slot = mem->slot | (as_id << 16);
 
     if (kvm_vm_ioctl(s, KVM_CLEAR_DIRTY_LOG, &d) == -1) {
         ret = -errno;
@@ -718,8 +693,66 @@ static int kvm_physical_log_clear(KVMMemoryListener *kml,
     /* This handles the NULL case well */
     g_free(bmap_clear);
 
-    kvm_slots_unlock(kml);
+    return ret;
+}
+
+/**
+ * kvm_physical_log_clear - Clear the kernel's dirty bitmap for range
+ *
+ * NOTE: this will be a no-op if we haven't enabled manual dirty log
+ * protection in the host kernel because in that case this operation
+ * will be done within log_sync().
+ *
+ * @kml:     the kvm memory listener
+ * @section: the memory range to clear dirty bitmap
+ */
+static int kvm_physical_log_clear(KVMMemoryListener *kml,
+                                  MemoryRegionSection *section)
+{
+    KVMState *s = kvm_state;
+    uint64_t start, size, slot_size;
+    KVMSlot *mem = NULL;
+    int i, ret = 0;
+
+    if (!s->manual_dirty_log_protect) {
+        /* No need to do explicit clear */
+        return 0;
+    }
+
+    start = section->offset_within_address_space;
+    size = int128_get64(section->size);
+
+    if (!size) {
+        /* Nothing more we can do... */
+        return 0;
+    }
+
+    kvm_slots_lock(kml);
+
+    while (size) {
+        slot_size = kvm_max_slot_size < size ? kvm_max_slot_size : size;
+        /* Iterate over slots that cover the section */
+        for (i = 0; i < s->nr_slots; i++) {
+            mem = &kml->slots[i];
+            if (mem->start_addr <= start &&
+                start + slot_size <= mem->start_addr + mem->memory_size) {
+                break;
+            }
+        }
 
+        /*
+         * We should always find one memslot until this point, otherwise
+         * there could be something wrong from the upper layer
+         */
+        assert(mem && i != s->nr_slots);
+
+        kvm_physical_log_slot_clear(start, slot_size, mem, kml->as_id);
+
+        size -= slot_size;
+        start += slot_size;
+    }
+
+    kvm_slots_unlock(kml);
     return ret;
 }
 
@@ -953,6 +986,14 @@ kvm_check_extension_list(KVMState *s, const KVMCapabilityInfo *list)
     return NULL;
 }
 
+void kvm_set_max_memslot_size(hwaddr max_slot_size)
+{
+    g_assert(
+        ROUND_UP(max_slot_size, qemu_real_host_page_size) == max_slot_size
+    );
+    kvm_max_slot_size = max_slot_size;
+}
+
 static void kvm_set_phys_mem(KVMMemoryListener *kml,
                              MemoryRegionSection *section, bool add)
 {
@@ -960,7 +1001,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
     int err;
     MemoryRegion *mr = section->mr;
     bool writeable = !mr->readonly && !mr->rom_device;
-    hwaddr start_addr, size;
+    hwaddr start_addr, size, slot_size;
     void *ram;
 
     if (!memory_region_is_ram(mr)) {
@@ -985,41 +1026,52 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
     kvm_slots_lock(kml);
 
     if (!add) {
-        mem = kvm_lookup_matching_slot(kml, start_addr, size);
-        if (!mem) {
-            goto out;
-        }
-        if (mem->flags & KVM_MEM_LOG_DIRTY_PAGES) {
-            kvm_physical_sync_dirty_bitmap(kml, section);
-        }
+        do {
+            slot_size = kvm_max_slot_size < size ? kvm_max_slot_size : size;
+            mem = kvm_lookup_matching_slot(kml, start_addr, slot_size);
+            if (!mem) {
+                goto out;
+            }
+            if (mem->flags & KVM_MEM_LOG_DIRTY_PAGES) {
+                kvm_physical_sync_dirty_bitmap(kml, section);
+            }
 
-        /* unregister the slot */
-        g_free(mem->dirty_bmap);
-        mem->dirty_bmap = NULL;
-        mem->memory_size = 0;
-        mem->flags = 0;
-        err = kvm_set_user_memory_region(kml, mem, false);
-        if (err) {
-            fprintf(stderr, "%s: error unregistering slot: %s\n",
-                    __func__, strerror(-err));
-            abort();
-        }
+            /* unregister the slot */
+            g_free(mem->dirty_bmap);
+            mem->dirty_bmap = NULL;
+            mem->memory_size = 0;
+            mem->flags = 0;
+            err = kvm_set_user_memory_region(kml, mem, false);
+            if (err) {
+                fprintf(stderr, "%s: error unregistering slot: %s\n",
+                        __func__, strerror(-err));
+                abort();
+            }
+            start_addr += slot_size;
+            size -= slot_size;
+        } while (size);
         goto out;
     }
 
     /* register the new slot */
-    mem = kvm_alloc_slot(kml);
-    mem->memory_size = size;
-    mem->start_addr = start_addr;
-    mem->ram = ram;
-    mem->flags = kvm_mem_flags(mr);
-
-    err = kvm_set_user_memory_region(kml, mem, true);
-    if (err) {
-        fprintf(stderr, "%s: error registering slot: %s\n", __func__,
-                strerror(-err));
-        abort();
-    }
+    do {
+        slot_size = kvm_max_slot_size < size ? kvm_max_slot_size : size;
+        mem = kvm_alloc_slot(kml);
+        mem->memory_size = slot_size;
+        mem->start_addr = start_addr;
+        mem->ram = ram;
+        mem->flags = kvm_mem_flags(mr);
+
+        err = kvm_set_user_memory_region(kml, mem, true);
+        if (err) {
+            fprintf(stderr, "%s: error registering slot: %s\n", __func__,
+                    strerror(-err));
+            abort();
+        }
+        start_addr += slot_size;
+        ram += slot_size;
+        size -= slot_size;
+    } while (size);
 
 out:
     kvm_slots_unlock(kml);
@@ -2859,6 +2911,7 @@ static bool kvm_accel_has_memory(MachineState *ms, AddressSpace *as,
 
     for (i = 0; i < kvm->nr_as; ++i) {
         if (kvm->as[i].as == as && kvm->as[i].ml) {
+            size = kvm_max_slot_size < size ? kvm_max_slot_size : size;
             return NULL != kvm_lookup_matching_slot(kvm->as[i].ml,
                                                     start_addr, size);
         }
diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 8bfb6684cb..baa95aad38 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -154,15 +154,6 @@ static void virtio_ccw_register_hcalls(void)
                                    virtio_ccw_hcall_early_printk);
 }
 
-/*
- * KVM does only support memory slots up to KVM_MEM_MAX_NR_PAGES pages
- * as the dirty bitmap must be managed by bitops that take an int as
- * position indicator. If we have a guest beyond that we will split off
- * new subregions. The split must happen on a segment boundary (1MB).
- */
-#define KVM_MEM_MAX_NR_PAGES ((1ULL << 31) - 1)
-#define SEG_MSK (~0xfffffULL)
-#define KVM_SLOT_MAX_BYTES ((KVM_MEM_MAX_NR_PAGES * TARGET_PAGE_SIZE) & SEG_MSK)
 static void s390_memory_init(ram_addr_t mem_size)
 {
     MemoryRegion *sysmem = get_system_memory();
diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c
index cea71ac7c3..838d23cb17 100644
--- a/target/s390x/kvm.c
+++ b/target/s390x/kvm.c
@@ -28,6 +28,7 @@
 #include "cpu.h"
 #include "internal.h"
 #include "kvm_s390x.h"
+#include "sysemu/kvm_int.h"
 #include "qapi/error.h"
 #include "qemu/error-report.h"
 #include "qemu/timer.h"
@@ -123,6 +124,16 @@
 #define VCPU_IRQ_BUF_SIZE(max_cpus) (sizeof(struct kvm_s390_irq) * \
                                      (max_cpus + NR_LOCAL_IRQS))
 
+/*
+ * KVM does only support memory slots up to KVM_MEM_MAX_NR_PAGES pages
+ * as the dirty bitmap must be managed by bitops that take an int as
+ * position indicator. If we have a guest beyond that we will split off
+ * new subregions. The split must happen on a segment boundary (1MB).
+ */
+#define KVM_MEM_MAX_NR_PAGES ((1ULL << 31) - 1)
+#define SEG_MSK (~0xfffffULL)
+#define KVM_SLOT_MAX_BYTES ((KVM_MEM_MAX_NR_PAGES * TARGET_PAGE_SIZE) & SEG_MSK)
+
 static CPUWatchpoint hw_watchpoint;
 /*
  * We don't use a list because this structure is also used to transmit the
@@ -348,6 +359,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
      */
     /* kvm_vm_enable_cap(s, KVM_CAP_S390_AIS, 0); */
 
+    kvm_set_max_memslot_size(KVM_SLOT_MAX_BYTES);
     return 0;
 }
 
-- 
2.18.1



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [Qemu-devel] [PATCH v6 2/2] s390: do not call memory_region_allocate_system_memory() multiple times
  2019-09-16 13:23 [Qemu-devel] [PATCH v6 0/2] s390: stop abusing memory_region_allocate_system_memory() Igor Mammedov
  2019-09-16 13:23 ` [Qemu-devel] [PATCH v6 1/2] kvm: s390: split too big memory section on several memslots Igor Mammedov
@ 2019-09-16 13:23 ` Igor Mammedov
  2019-09-17  8:44   ` Peter Xu
  1 sibling, 1 reply; 8+ messages in thread
From: Igor Mammedov @ 2019-09-16 13:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, thuth, cohuck, qemu-s390x, david

s390 was trying to solve limited KVM memslot size issue by abusing
memory_region_allocate_system_memory(), which breaks API contract
where the function might be called only once.

Beside an invalid use of API, the approach also introduced migration
issue, since RAM chunks for each KVM_SLOT_MAX_BYTES are transferred in
migration stream as separate RAMBlocks.

After discussion [1], it was agreed to break migration from older
QEMU for guest with RAM >8Tb (as it was relatively new (since 2.12)
and considered to be not actually used downstream).
Migration should keep working for guests with less than 8TB and for
more than 8TB with QEMU 4.2 and newer binary.
In case user tries to migrate more than 8TB guest, between incompatible
QEMU versions, migration should fail gracefully due to non-exiting
RAMBlock ID or RAMBlock size mismatch.

Taking in account above and that now KVM code is able to split too
big MemorySection into several memslots, stop abusing
  memory_region_allocate_system_memory()
and use only one memory region for RAM.

1) [PATCH RFC v2 4/4] s390: do not call  memory_region_allocate_system_memory() multiple times

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
---
v3:
  - drop migration compat code

PS:
I don't have access to a suitable system to test it.
---
 hw/s390x/s390-virtio-ccw.c | 21 +++------------------
 1 file changed, 3 insertions(+), 18 deletions(-)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index baa95aad38..18ad279a00 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -157,27 +157,12 @@ static void virtio_ccw_register_hcalls(void)
 static void s390_memory_init(ram_addr_t mem_size)
 {
     MemoryRegion *sysmem = get_system_memory();
-    ram_addr_t chunk, offset = 0;
-    unsigned int number = 0;
+    MemoryRegion *ram = g_new(MemoryRegion, 1);
     Error *local_err = NULL;
-    gchar *name;

     /* allocate RAM for core */
-    name = g_strdup_printf("s390.ram");
-    while (mem_size) {
-        MemoryRegion *ram = g_new(MemoryRegion, 1);
-        uint64_t size = mem_size;
-
-        /* KVM does not allow memslots >= 8 TB */
-        chunk = MIN(size, KVM_SLOT_MAX_BYTES);
-        memory_region_allocate_system_memory(ram, NULL, name, chunk);
-        memory_region_add_subregion(sysmem, offset, ram);
-        mem_size -= chunk;
-        offset += chunk;
-        g_free(name);
-        name = g_strdup_printf("s390.ram.%u", ++number);
-    }
-    g_free(name);
+    memory_region_allocate_system_memory(ram, NULL, "s390.ram", mem_size);
+    memory_region_add_subregion(sysmem, 0, ram);

     /*
      * Configure the maximum page size. As no memory devices were created
-- 
2.18.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] [PATCH v6 1/2] kvm: s390: split too big memory section on several memslots
  2019-09-16 13:23 ` [Qemu-devel] [PATCH v6 1/2] kvm: s390: split too big memory section on several memslots Igor Mammedov
@ 2019-09-16 13:32   ` Dr. David Alan Gilbert
  2019-09-17  8:38   ` Peter Xu
  1 sibling, 0 replies; 8+ messages in thread
From: Dr. David Alan Gilbert @ 2019-09-16 13:32 UTC (permalink / raw)
  To: Igor Mammedov, peterx
  Cc: thuth, david, cohuck, qemu-devel, qemu-s390x, pbonzini

* Igor Mammedov (imammedo@redhat.com) wrote:
> Max memslot size supported by kvm on s390 is 8Tb,
> move logic of splitting RAM in chunks upto 8T to KVM code.
> 
> This way it will hide KVM specific restrictions in KVM code
> and won't affect board level design decisions. Which would allow
> us to avoid misusing memory_region_allocate_system_memory() API
> and eventually use a single hostmem backend for guest RAM.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> Since there is quite a bit of migration related code churn in the patch
> CCing migration maintainer
> 
> CC: dgilbert@redhat.com

CC'ing in peterx who knows this code better than me.
But if I'm reading it right, then the rest of the migration code can be
blissfully ignorant of these changes.

Dave


> 
> v6:
>   * KVM's migration code was assuming 1:1 relation between
>     memslot and MemorySection, which becomes not true fo s390x
>     with this patch. As result migration was broken and
>     dirty logging wasn't even started with when a MemorySection
>     was split on several memslots. Amend related KVM dirty
>     log tracking code to account for split MemorySection.
> v5:
>   * move computation 'size -= slot_size' inside of loop body
>           (David Hildenbrand <david@redhat.com>)
> v4:
>   * fix compilation issue
>           (Christian Borntraeger <borntraeger@de.ibm.com>)
>   * advance HVA along with GPA in kvm_set_phys_mem()
>           (Christian Borntraeger <borntraeger@de.ibm.com>)
> 
> patch prepares only KVM side for switching to single RAM memory region
> another patch will take care of  dropping manual RAM partitioning in
> s390 code.
> 
> 
>  include/sysemu/kvm_int.h   |   1 +
>  accel/kvm/kvm-all.c        | 239 ++++++++++++++++++++++---------------
>  hw/s390x/s390-virtio-ccw.c |   9 --
>  target/s390x/kvm.c         |  12 ++
>  4 files changed, 159 insertions(+), 102 deletions(-)
> 
> diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
> index 72b2d1b3ae..ac2d1f8b56 100644
> --- a/include/sysemu/kvm_int.h
> +++ b/include/sysemu/kvm_int.h
> @@ -41,4 +41,5 @@ typedef struct KVMMemoryListener {
>  void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml,
>                                    AddressSpace *as, int as_id);
>  
> +void kvm_set_max_memslot_size(hwaddr max_slot_size);
>  #endif
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index b09bad0804..0635903408 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -140,6 +140,7 @@ bool kvm_direct_msi_allowed;
>  bool kvm_ioeventfd_any_length_allowed;
>  bool kvm_msi_use_devid;
>  static bool kvm_immediate_exit;
> +static hwaddr kvm_max_slot_size = ~0;
>  
>  static const KVMCapabilityInfo kvm_required_capabilites[] = {
>      KVM_CAP_INFO(USER_MEMORY),
> @@ -437,7 +438,7 @@ static int kvm_slot_update_flags(KVMMemoryListener *kml, KVMSlot *mem,
>  static int kvm_section_update_flags(KVMMemoryListener *kml,
>                                      MemoryRegionSection *section)
>  {
> -    hwaddr start_addr, size;
> +    hwaddr start_addr, size, slot_size;
>      KVMSlot *mem;
>      int ret = 0;
>  
> @@ -448,13 +449,18 @@ static int kvm_section_update_flags(KVMMemoryListener *kml,
>  
>      kvm_slots_lock(kml);
>  
> -    mem = kvm_lookup_matching_slot(kml, start_addr, size);
> -    if (!mem) {
> -        /* We don't have a slot if we want to trap every access. */
> -        goto out;
> -    }
> +    while (size && !ret) {
> +        slot_size = kvm_max_slot_size < size ? kvm_max_slot_size : size;
> +        mem = kvm_lookup_matching_slot(kml, start_addr, slot_size);
> +        if (!mem) {
> +            /* We don't have a slot if we want to trap every access. */
> +            goto out;
> +        }
>  
> -    ret = kvm_slot_update_flags(kml, mem, section->mr);
> +        ret = kvm_slot_update_flags(kml, mem, section->mr);
> +        start_addr += slot_size;
> +        size -= slot_size;
> +    }
>  
>  out:
>      kvm_slots_unlock(kml);
> @@ -497,11 +503,14 @@ static void kvm_log_stop(MemoryListener *listener,
>  
>  /* get kvm's dirty pages bitmap and update qemu's */
>  static int kvm_get_dirty_pages_log_range(MemoryRegionSection *section,
> +                                         hwaddr offset_in_section,
> +                                         hwaddr range_size,
>                                           unsigned long *bitmap)
>  {
>      ram_addr_t start = section->offset_within_region +
> +                       offset_in_section +
>                         memory_region_get_ram_addr(section->mr);
> -    ram_addr_t pages = int128_get64(section->size) / getpagesize();
> +    ram_addr_t pages = range_size / getpagesize();
>  
>      cpu_physical_memory_set_dirty_lebitmap(bitmap, start, pages);
>      return 0;
> @@ -527,11 +536,13 @@ static int kvm_physical_sync_dirty_bitmap(KVMMemoryListener *kml,
>      struct kvm_dirty_log d = {};
>      KVMSlot *mem;
>      hwaddr start_addr, size;
> +    hwaddr slot_size, slot_offset = 0;
>      int ret = 0;
>  
>      size = kvm_align_section(section, &start_addr);
> -    if (size) {
> -        mem = kvm_lookup_matching_slot(kml, start_addr, size);
> +    while (size) {
> +        slot_size = kvm_max_slot_size < size ? kvm_max_slot_size : size;
> +        mem = kvm_lookup_matching_slot(kml, start_addr, slot_size);
>          if (!mem) {
>              /* We don't have a slot if we want to trap every access. */
>              goto out;
> @@ -549,11 +560,11 @@ static int kvm_physical_sync_dirty_bitmap(KVMMemoryListener *kml,
>           * So for now, let's align to 64 instead of HOST_LONG_BITS here, in
>           * a hope that sizeof(long) won't become >8 any time soon.
>           */
> -        size = ALIGN(((mem->memory_size) >> TARGET_PAGE_BITS),
> -                     /*HOST_LONG_BITS*/ 64) / 8;
>          if (!mem->dirty_bmap) {
> +            hwaddr bitmap_size = ALIGN(((mem->memory_size) >> TARGET_PAGE_BITS),
> +                                        /*HOST_LONG_BITS*/ 64) / 8;
>              /* Allocate on the first log_sync, once and for all */
> -            mem->dirty_bmap = g_malloc0(size);
> +            mem->dirty_bmap = g_malloc0(bitmap_size);
>          }
>  
>          d.dirty_bitmap = mem->dirty_bmap;
> @@ -564,7 +575,12 @@ static int kvm_physical_sync_dirty_bitmap(KVMMemoryListener *kml,
>              goto out;
>          }
>  
> -        kvm_get_dirty_pages_log_range(section, d.dirty_bitmap);
> +        kvm_get_dirty_pages_log_range(section, slot_offset, slot_size,
> +                                      d.dirty_bitmap);
> +
> +        slot_offset += slot_size;
> +        start_addr += slot_size;
> +        size -= slot_size;
>      }
>  out:
>      return ret;
> @@ -575,55 +591,14 @@ out:
>  #define KVM_CLEAR_LOG_ALIGN  (qemu_real_host_page_size << KVM_CLEAR_LOG_SHIFT)
>  #define KVM_CLEAR_LOG_MASK   (-KVM_CLEAR_LOG_ALIGN)
>  
> -/**
> - * kvm_physical_log_clear - Clear the kernel's dirty bitmap for range
> - *
> - * NOTE: this will be a no-op if we haven't enabled manual dirty log
> - * protection in the host kernel because in that case this operation
> - * will be done within log_sync().
> - *
> - * @kml:     the kvm memory listener
> - * @section: the memory range to clear dirty bitmap
> - */
> -static int kvm_physical_log_clear(KVMMemoryListener *kml,
> -                                  MemoryRegionSection *section)
> +static int kvm_physical_log_slot_clear(uint64_t start, uint64_t size,
> +                                       KVMSlot *mem, int as_id)
>  {
>      KVMState *s = kvm_state;
> -    struct kvm_clear_dirty_log d;
> -    uint64_t start, end, bmap_start, start_delta, bmap_npages, size;
>      unsigned long *bmap_clear = NULL, psize = qemu_real_host_page_size;
> -    KVMSlot *mem = NULL;
> -    int ret, i;
> -
> -    if (!s->manual_dirty_log_protect) {
> -        /* No need to do explicit clear */
> -        return 0;
> -    }
> -
> -    start = section->offset_within_address_space;
> -    size = int128_get64(section->size);
> -
> -    if (!size) {
> -        /* Nothing more we can do... */
> -        return 0;
> -    }
> -
> -    kvm_slots_lock(kml);
> -
> -    /* Find any possible slot that covers the section */
> -    for (i = 0; i < s->nr_slots; i++) {
> -        mem = &kml->slots[i];
> -        if (mem->start_addr <= start &&
> -            start + size <= mem->start_addr + mem->memory_size) {
> -            break;
> -        }
> -    }
> -
> -    /*
> -     * We should always find one memslot until this point, otherwise
> -     * there could be something wrong from the upper layer
> -     */
> -    assert(mem && i != s->nr_slots);
> +    uint64_t end, bmap_start, start_delta, bmap_npages;
> +    struct kvm_clear_dirty_log d;
> +    int ret;
>  
>      /*
>       * We need to extend either the start or the size or both to
> @@ -694,7 +669,7 @@ static int kvm_physical_log_clear(KVMMemoryListener *kml,
>      /* It should never overflow.  If it happens, say something */
>      assert(bmap_npages <= UINT32_MAX);
>      d.num_pages = bmap_npages;
> -    d.slot = mem->slot | (kml->as_id << 16);
> +    d.slot = mem->slot | (as_id << 16);
>  
>      if (kvm_vm_ioctl(s, KVM_CLEAR_DIRTY_LOG, &d) == -1) {
>          ret = -errno;
> @@ -718,8 +693,66 @@ static int kvm_physical_log_clear(KVMMemoryListener *kml,
>      /* This handles the NULL case well */
>      g_free(bmap_clear);
>  
> -    kvm_slots_unlock(kml);
> +    return ret;
> +}
> +
> +/**
> + * kvm_physical_log_clear - Clear the kernel's dirty bitmap for range
> + *
> + * NOTE: this will be a no-op if we haven't enabled manual dirty log
> + * protection in the host kernel because in that case this operation
> + * will be done within log_sync().
> + *
> + * @kml:     the kvm memory listener
> + * @section: the memory range to clear dirty bitmap
> + */
> +static int kvm_physical_log_clear(KVMMemoryListener *kml,
> +                                  MemoryRegionSection *section)
> +{
> +    KVMState *s = kvm_state;
> +    uint64_t start, size, slot_size;
> +    KVMSlot *mem = NULL;
> +    int i, ret = 0;
> +
> +    if (!s->manual_dirty_log_protect) {
> +        /* No need to do explicit clear */
> +        return 0;
> +    }
> +
> +    start = section->offset_within_address_space;
> +    size = int128_get64(section->size);
> +
> +    if (!size) {
> +        /* Nothing more we can do... */
> +        return 0;
> +    }
> +
> +    kvm_slots_lock(kml);
> +
> +    while (size) {
> +        slot_size = kvm_max_slot_size < size ? kvm_max_slot_size : size;
> +        /* Iterate over slots that cover the section */
> +        for (i = 0; i < s->nr_slots; i++) {
> +            mem = &kml->slots[i];
> +            if (mem->start_addr <= start &&
> +                start + slot_size <= mem->start_addr + mem->memory_size) {
> +                break;
> +            }
> +        }
>  
> +        /*
> +         * We should always find one memslot until this point, otherwise
> +         * there could be something wrong from the upper layer
> +         */
> +        assert(mem && i != s->nr_slots);
> +
> +        kvm_physical_log_slot_clear(start, slot_size, mem, kml->as_id);
> +
> +        size -= slot_size;
> +        start += slot_size;
> +    }
> +
> +    kvm_slots_unlock(kml);
>      return ret;
>  }
>  
> @@ -953,6 +986,14 @@ kvm_check_extension_list(KVMState *s, const KVMCapabilityInfo *list)
>      return NULL;
>  }
>  
> +void kvm_set_max_memslot_size(hwaddr max_slot_size)
> +{
> +    g_assert(
> +        ROUND_UP(max_slot_size, qemu_real_host_page_size) == max_slot_size
> +    );
> +    kvm_max_slot_size = max_slot_size;
> +}
> +
>  static void kvm_set_phys_mem(KVMMemoryListener *kml,
>                               MemoryRegionSection *section, bool add)
>  {
> @@ -960,7 +1001,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
>      int err;
>      MemoryRegion *mr = section->mr;
>      bool writeable = !mr->readonly && !mr->rom_device;
> -    hwaddr start_addr, size;
> +    hwaddr start_addr, size, slot_size;
>      void *ram;
>  
>      if (!memory_region_is_ram(mr)) {
> @@ -985,41 +1026,52 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
>      kvm_slots_lock(kml);
>  
>      if (!add) {
> -        mem = kvm_lookup_matching_slot(kml, start_addr, size);
> -        if (!mem) {
> -            goto out;
> -        }
> -        if (mem->flags & KVM_MEM_LOG_DIRTY_PAGES) {
> -            kvm_physical_sync_dirty_bitmap(kml, section);
> -        }
> +        do {
> +            slot_size = kvm_max_slot_size < size ? kvm_max_slot_size : size;
> +            mem = kvm_lookup_matching_slot(kml, start_addr, slot_size);
> +            if (!mem) {
> +                goto out;
> +            }
> +            if (mem->flags & KVM_MEM_LOG_DIRTY_PAGES) {
> +                kvm_physical_sync_dirty_bitmap(kml, section);
> +            }
>  
> -        /* unregister the slot */
> -        g_free(mem->dirty_bmap);
> -        mem->dirty_bmap = NULL;
> -        mem->memory_size = 0;
> -        mem->flags = 0;
> -        err = kvm_set_user_memory_region(kml, mem, false);
> -        if (err) {
> -            fprintf(stderr, "%s: error unregistering slot: %s\n",
> -                    __func__, strerror(-err));
> -            abort();
> -        }
> +            /* unregister the slot */
> +            g_free(mem->dirty_bmap);
> +            mem->dirty_bmap = NULL;
> +            mem->memory_size = 0;
> +            mem->flags = 0;
> +            err = kvm_set_user_memory_region(kml, mem, false);
> +            if (err) {
> +                fprintf(stderr, "%s: error unregistering slot: %s\n",
> +                        __func__, strerror(-err));
> +                abort();
> +            }
> +            start_addr += slot_size;
> +            size -= slot_size;
> +        } while (size);
>          goto out;
>      }
>  
>      /* register the new slot */
> -    mem = kvm_alloc_slot(kml);
> -    mem->memory_size = size;
> -    mem->start_addr = start_addr;
> -    mem->ram = ram;
> -    mem->flags = kvm_mem_flags(mr);
> -
> -    err = kvm_set_user_memory_region(kml, mem, true);
> -    if (err) {
> -        fprintf(stderr, "%s: error registering slot: %s\n", __func__,
> -                strerror(-err));
> -        abort();
> -    }
> +    do {
> +        slot_size = kvm_max_slot_size < size ? kvm_max_slot_size : size;
> +        mem = kvm_alloc_slot(kml);
> +        mem->memory_size = slot_size;
> +        mem->start_addr = start_addr;
> +        mem->ram = ram;
> +        mem->flags = kvm_mem_flags(mr);
> +
> +        err = kvm_set_user_memory_region(kml, mem, true);
> +        if (err) {
> +            fprintf(stderr, "%s: error registering slot: %s\n", __func__,
> +                    strerror(-err));
> +            abort();
> +        }
> +        start_addr += slot_size;
> +        ram += slot_size;
> +        size -= slot_size;
> +    } while (size);
>  
>  out:
>      kvm_slots_unlock(kml);
> @@ -2859,6 +2911,7 @@ static bool kvm_accel_has_memory(MachineState *ms, AddressSpace *as,
>  
>      for (i = 0; i < kvm->nr_as; ++i) {
>          if (kvm->as[i].as == as && kvm->as[i].ml) {
> +            size = kvm_max_slot_size < size ? kvm_max_slot_size : size;
>              return NULL != kvm_lookup_matching_slot(kvm->as[i].ml,
>                                                      start_addr, size);
>          }
> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
> index 8bfb6684cb..baa95aad38 100644
> --- a/hw/s390x/s390-virtio-ccw.c
> +++ b/hw/s390x/s390-virtio-ccw.c
> @@ -154,15 +154,6 @@ static void virtio_ccw_register_hcalls(void)
>                                     virtio_ccw_hcall_early_printk);
>  }
>  
> -/*
> - * KVM does only support memory slots up to KVM_MEM_MAX_NR_PAGES pages
> - * as the dirty bitmap must be managed by bitops that take an int as
> - * position indicator. If we have a guest beyond that we will split off
> - * new subregions. The split must happen on a segment boundary (1MB).
> - */
> -#define KVM_MEM_MAX_NR_PAGES ((1ULL << 31) - 1)
> -#define SEG_MSK (~0xfffffULL)
> -#define KVM_SLOT_MAX_BYTES ((KVM_MEM_MAX_NR_PAGES * TARGET_PAGE_SIZE) & SEG_MSK)
>  static void s390_memory_init(ram_addr_t mem_size)
>  {
>      MemoryRegion *sysmem = get_system_memory();
> diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c
> index cea71ac7c3..838d23cb17 100644
> --- a/target/s390x/kvm.c
> +++ b/target/s390x/kvm.c
> @@ -28,6 +28,7 @@
>  #include "cpu.h"
>  #include "internal.h"
>  #include "kvm_s390x.h"
> +#include "sysemu/kvm_int.h"
>  #include "qapi/error.h"
>  #include "qemu/error-report.h"
>  #include "qemu/timer.h"
> @@ -123,6 +124,16 @@
>  #define VCPU_IRQ_BUF_SIZE(max_cpus) (sizeof(struct kvm_s390_irq) * \
>                                       (max_cpus + NR_LOCAL_IRQS))
>  
> +/*
> + * KVM does only support memory slots up to KVM_MEM_MAX_NR_PAGES pages
> + * as the dirty bitmap must be managed by bitops that take an int as
> + * position indicator. If we have a guest beyond that we will split off
> + * new subregions. The split must happen on a segment boundary (1MB).
> + */
> +#define KVM_MEM_MAX_NR_PAGES ((1ULL << 31) - 1)
> +#define SEG_MSK (~0xfffffULL)
> +#define KVM_SLOT_MAX_BYTES ((KVM_MEM_MAX_NR_PAGES * TARGET_PAGE_SIZE) & SEG_MSK)
> +
>  static CPUWatchpoint hw_watchpoint;
>  /*
>   * We don't use a list because this structure is also used to transmit the
> @@ -348,6 +359,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>       */
>      /* kvm_vm_enable_cap(s, KVM_CAP_S390_AIS, 0); */
>  
> +    kvm_set_max_memslot_size(KVM_SLOT_MAX_BYTES);
>      return 0;
>  }
>  
> -- 
> 2.18.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] [PATCH v6 1/2] kvm: s390: split too big memory section on several memslots
  2019-09-16 13:23 ` [Qemu-devel] [PATCH v6 1/2] kvm: s390: split too big memory section on several memslots Igor Mammedov
  2019-09-16 13:32   ` Dr. David Alan Gilbert
@ 2019-09-17  8:38   ` Peter Xu
  1 sibling, 0 replies; 8+ messages in thread
From: Peter Xu @ 2019-09-17  8:38 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: thuth, david, cohuck, qemu-devel, dgilbert, qemu-s390x, pbonzini

On Mon, Sep 16, 2019 at 09:23:46AM -0400, Igor Mammedov wrote:
> Max memslot size supported by kvm on s390 is 8Tb,
> move logic of splitting RAM in chunks upto 8T to KVM code.
> 
> This way it will hide KVM specific restrictions in KVM code
> and won't affect board level design decisions. Which would allow
> us to avoid misusing memory_region_allocate_system_memory() API
> and eventually use a single hostmem backend for guest RAM.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> Since there is quite a bit of migration related code churn in the patch
> CCing migration maintainer
> 
> CC: dgilbert@redhat.com
> 
> v6:
>   * KVM's migration code was assuming 1:1 relation between
>     memslot and MemorySection, which becomes not true fo s390x
>     with this patch. As result migration was broken and
>     dirty logging wasn't even started with when a MemorySection
>     was split on several memslots. Amend related KVM dirty
>     log tracking code to account for split MemorySection.
> v5:
>   * move computation 'size -= slot_size' inside of loop body
>           (David Hildenbrand <david@redhat.com>)
> v4:
>   * fix compilation issue
>           (Christian Borntraeger <borntraeger@de.ibm.com>)
>   * advance HVA along with GPA in kvm_set_phys_mem()
>           (Christian Borntraeger <borntraeger@de.ibm.com>)
> 
> patch prepares only KVM side for switching to single RAM memory region
> another patch will take care of  dropping manual RAM partitioning in
> s390 code.
> 
> 
>  include/sysemu/kvm_int.h   |   1 +
>  accel/kvm/kvm-all.c        | 239 ++++++++++++++++++++++---------------
>  hw/s390x/s390-virtio-ccw.c |   9 --
>  target/s390x/kvm.c         |  12 ++
>  4 files changed, 159 insertions(+), 102 deletions(-)
> 
> diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
> index 72b2d1b3ae..ac2d1f8b56 100644
> --- a/include/sysemu/kvm_int.h
> +++ b/include/sysemu/kvm_int.h
> @@ -41,4 +41,5 @@ typedef struct KVMMemoryListener {
>  void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml,
>                                    AddressSpace *as, int as_id);
>  
> +void kvm_set_max_memslot_size(hwaddr max_slot_size);
>  #endif
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index b09bad0804..0635903408 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -140,6 +140,7 @@ bool kvm_direct_msi_allowed;
>  bool kvm_ioeventfd_any_length_allowed;
>  bool kvm_msi_use_devid;
>  static bool kvm_immediate_exit;
> +static hwaddr kvm_max_slot_size = ~0;
>  
>  static const KVMCapabilityInfo kvm_required_capabilites[] = {
>      KVM_CAP_INFO(USER_MEMORY),
> @@ -437,7 +438,7 @@ static int kvm_slot_update_flags(KVMMemoryListener *kml, KVMSlot *mem,
>  static int kvm_section_update_flags(KVMMemoryListener *kml,
>                                      MemoryRegionSection *section)
>  {
> -    hwaddr start_addr, size;
> +    hwaddr start_addr, size, slot_size;
>      KVMSlot *mem;
>      int ret = 0;
>  
> @@ -448,13 +449,18 @@ static int kvm_section_update_flags(KVMMemoryListener *kml,
>  
>      kvm_slots_lock(kml);
>  
> -    mem = kvm_lookup_matching_slot(kml, start_addr, size);
> -    if (!mem) {
> -        /* We don't have a slot if we want to trap every access. */
> -        goto out;
> -    }
> +    while (size && !ret) {
> +        slot_size = kvm_max_slot_size < size ? kvm_max_slot_size : size;

Nit: use MIN()?

> +        mem = kvm_lookup_matching_slot(kml, start_addr, slot_size);
> +        if (!mem) {
> +            /* We don't have a slot if we want to trap every access. */
> +            goto out;

I don't really understand the rational behind e377e87ca6a3d by simply
reading the commit message... Just want to make sure: this won't be
triggered when memslot split happened for this memory region section,
right?

> +        }
>  
> -    ret = kvm_slot_update_flags(kml, mem, section->mr);
> +        ret = kvm_slot_update_flags(kml, mem, section->mr);
> +        start_addr += slot_size;
> +        size -= slot_size;
> +    }
>  
>  out:
>      kvm_slots_unlock(kml);
> @@ -497,11 +503,14 @@ static void kvm_log_stop(MemoryListener *listener,
>  
>  /* get kvm's dirty pages bitmap and update qemu's */
>  static int kvm_get_dirty_pages_log_range(MemoryRegionSection *section,
> +                                         hwaddr offset_in_section,
> +                                         hwaddr range_size,
>                                           unsigned long *bitmap)
>  {
>      ram_addr_t start = section->offset_within_region +
> +                       offset_in_section +
>                         memory_region_get_ram_addr(section->mr);
> -    ram_addr_t pages = int128_get64(section->size) / getpagesize();
> +    ram_addr_t pages = range_size / getpagesize();

I feel like we can avoid touching kvm_get_dirty_pages_log_range() by
passing in a correct MemoryRegionSection* to it (please see [1]
below).

>  
>      cpu_physical_memory_set_dirty_lebitmap(bitmap, start, pages);
>      return 0;
> @@ -527,11 +536,13 @@ static int kvm_physical_sync_dirty_bitmap(KVMMemoryListener *kml,
>      struct kvm_dirty_log d = {};
>      KVMSlot *mem;
>      hwaddr start_addr, size;
> +    hwaddr slot_size, slot_offset = 0;
>      int ret = 0;
>  
>      size = kvm_align_section(section, &start_addr);
> -    if (size) {
> -        mem = kvm_lookup_matching_slot(kml, start_addr, size);
> +    while (size) {
> +        slot_size = kvm_max_slot_size < size ? kvm_max_slot_size : size;

(use MIN?)

> +        mem = kvm_lookup_matching_slot(kml, start_addr, slot_size);
>          if (!mem) {
>              /* We don't have a slot if we want to trap every access. */
>              goto out;
> @@ -549,11 +560,11 @@ static int kvm_physical_sync_dirty_bitmap(KVMMemoryListener *kml,
>           * So for now, let's align to 64 instead of HOST_LONG_BITS here, in
>           * a hope that sizeof(long) won't become >8 any time soon.
>           */
> -        size = ALIGN(((mem->memory_size) >> TARGET_PAGE_BITS),
> -                     /*HOST_LONG_BITS*/ 64) / 8;
>          if (!mem->dirty_bmap) {
> +            hwaddr bitmap_size = ALIGN(((mem->memory_size) >> TARGET_PAGE_BITS),
> +                                        /*HOST_LONG_BITS*/ 64) / 8;
>              /* Allocate on the first log_sync, once and for all */
> -            mem->dirty_bmap = g_malloc0(size);
> +            mem->dirty_bmap = g_malloc0(bitmap_size);
>          }
>  
>          d.dirty_bitmap = mem->dirty_bmap;
> @@ -564,7 +575,12 @@ static int kvm_physical_sync_dirty_bitmap(KVMMemoryListener *kml,
>              goto out;
>          }
>  
> -        kvm_get_dirty_pages_log_range(section, d.dirty_bitmap);
> +        kvm_get_dirty_pages_log_range(section, slot_offset, slot_size,
> +                                      d.dirty_bitmap);

[1]

Here can we hand-made a MemoryRegionSection which covers the
slot_offset/slot_size region and pass it to
kvm_get_dirty_pages_log_range()?

> +
> +        slot_offset += slot_size;
> +        start_addr += slot_size;
> +        size -= slot_size;
>      }
>  out:
>      return ret;
> @@ -575,55 +591,14 @@ out:
>  #define KVM_CLEAR_LOG_ALIGN  (qemu_real_host_page_size << KVM_CLEAR_LOG_SHIFT)
>  #define KVM_CLEAR_LOG_MASK   (-KVM_CLEAR_LOG_ALIGN)
>  
> -/**
> - * kvm_physical_log_clear - Clear the kernel's dirty bitmap for range
> - *
> - * NOTE: this will be a no-op if we haven't enabled manual dirty log
> - * protection in the host kernel because in that case this operation
> - * will be done within log_sync().
> - *
> - * @kml:     the kvm memory listener
> - * @section: the memory range to clear dirty bitmap
> - */
> -static int kvm_physical_log_clear(KVMMemoryListener *kml,
> -                                  MemoryRegionSection *section)
> +static int kvm_physical_log_slot_clear(uint64_t start, uint64_t size,
> +                                       KVMSlot *mem, int as_id)
>  {
>      KVMState *s = kvm_state;
> -    struct kvm_clear_dirty_log d;
> -    uint64_t start, end, bmap_start, start_delta, bmap_npages, size;
>      unsigned long *bmap_clear = NULL, psize = qemu_real_host_page_size;
> -    KVMSlot *mem = NULL;
> -    int ret, i;
> -
> -    if (!s->manual_dirty_log_protect) {
> -        /* No need to do explicit clear */
> -        return 0;
> -    }
> -
> -    start = section->offset_within_address_space;
> -    size = int128_get64(section->size);
> -
> -    if (!size) {
> -        /* Nothing more we can do... */
> -        return 0;
> -    }
> -
> -    kvm_slots_lock(kml);
> -
> -    /* Find any possible slot that covers the section */
> -    for (i = 0; i < s->nr_slots; i++) {
> -        mem = &kml->slots[i];
> -        if (mem->start_addr <= start &&
> -            start + size <= mem->start_addr + mem->memory_size) {
> -            break;
> -        }
> -    }
> -
> -    /*
> -     * We should always find one memslot until this point, otherwise
> -     * there could be something wrong from the upper layer
> -     */
> -    assert(mem && i != s->nr_slots);
> +    uint64_t end, bmap_start, start_delta, bmap_npages;
> +    struct kvm_clear_dirty_log d;
> +    int ret;
>  
>      /*
>       * We need to extend either the start or the size or both to
> @@ -694,7 +669,7 @@ static int kvm_physical_log_clear(KVMMemoryListener *kml,
>      /* It should never overflow.  If it happens, say something */
>      assert(bmap_npages <= UINT32_MAX);
>      d.num_pages = bmap_npages;
> -    d.slot = mem->slot | (kml->as_id << 16);
> +    d.slot = mem->slot | (as_id << 16);
>  
>      if (kvm_vm_ioctl(s, KVM_CLEAR_DIRTY_LOG, &d) == -1) {
>          ret = -errno;
> @@ -718,8 +693,66 @@ static int kvm_physical_log_clear(KVMMemoryListener *kml,
>      /* This handles the NULL case well */
>      g_free(bmap_clear);
>  
> -    kvm_slots_unlock(kml);
> +    return ret;
> +}
> +
> +/**
> + * kvm_physical_log_clear - Clear the kernel's dirty bitmap for range
> + *
> + * NOTE: this will be a no-op if we haven't enabled manual dirty log
> + * protection in the host kernel because in that case this operation
> + * will be done within log_sync().
> + *
> + * @kml:     the kvm memory listener
> + * @section: the memory range to clear dirty bitmap
> + */
> +static int kvm_physical_log_clear(KVMMemoryListener *kml,
> +                                  MemoryRegionSection *section)
> +{
> +    KVMState *s = kvm_state;
> +    uint64_t start, size, slot_size;
> +    KVMSlot *mem = NULL;
> +    int i, ret = 0;
> +
> +    if (!s->manual_dirty_log_protect) {
> +        /* No need to do explicit clear */
> +        return 0;
> +    }
> +
> +    start = section->offset_within_address_space;
> +    size = int128_get64(section->size);
> +
> +    if (!size) {
> +        /* Nothing more we can do... */
> +        return 0;
> +    }
> +
> +    kvm_slots_lock(kml);
> +
> +    while (size) {
> +        slot_size = kvm_max_slot_size < size ? kvm_max_slot_size : size;

MIN?

Also, the naming of slot_size might be a bit misleading here because
log_clear() is different in that it could happen with very small
memory region section passed in, hence slot_size could very probably
not be the size of slot.  Maybe range_size or anything else?

> +        /* Iterate over slots that cover the section */
> +        for (i = 0; i < s->nr_slots; i++) {
> +            mem = &kml->slots[i];
> +            if (mem->start_addr <= start &&
> +                start + slot_size <= mem->start_addr + mem->memory_size) {
> +                break;
> +            }
> +        }
>  
> +        /*
> +         * We should always find one memslot until this point, otherwise
> +         * there could be something wrong from the upper layer
> +         */
> +        assert(mem && i != s->nr_slots);
> +
> +        kvm_physical_log_slot_clear(start, slot_size, mem, kml->as_id);

Use ret to capture errors?

> +
> +        size -= slot_size;
> +        start += slot_size;
> +    }
> +
> +    kvm_slots_unlock(kml);
>      return ret;
>  }
>  
> @@ -953,6 +986,14 @@ kvm_check_extension_list(KVMState *s, const KVMCapabilityInfo *list)
>      return NULL;
>  }
>  
> +void kvm_set_max_memslot_size(hwaddr max_slot_size)
> +{
> +    g_assert(
> +        ROUND_UP(max_slot_size, qemu_real_host_page_size) == max_slot_size
> +    );
> +    kvm_max_slot_size = max_slot_size;
> +}
> +
>  static void kvm_set_phys_mem(KVMMemoryListener *kml,
>                               MemoryRegionSection *section, bool add)
>  {
> @@ -960,7 +1001,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
>      int err;
>      MemoryRegion *mr = section->mr;
>      bool writeable = !mr->readonly && !mr->rom_device;
> -    hwaddr start_addr, size;
> +    hwaddr start_addr, size, slot_size;
>      void *ram;
>  
>      if (!memory_region_is_ram(mr)) {
> @@ -985,41 +1026,52 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
>      kvm_slots_lock(kml);
>  
>      if (!add) {
> -        mem = kvm_lookup_matching_slot(kml, start_addr, size);
> -        if (!mem) {
> -            goto out;
> -        }
> -        if (mem->flags & KVM_MEM_LOG_DIRTY_PAGES) {
> -            kvm_physical_sync_dirty_bitmap(kml, section);
> -        }
> +        do {
> +            slot_size = kvm_max_slot_size < size ? kvm_max_slot_size : size;

Same

> +            mem = kvm_lookup_matching_slot(kml, start_addr, slot_size);
> +            if (!mem) {
> +                goto out;
> +            }
> +            if (mem->flags & KVM_MEM_LOG_DIRTY_PAGES) {
> +                kvm_physical_sync_dirty_bitmap(kml, section);
> +            }
>  
> -        /* unregister the slot */
> -        g_free(mem->dirty_bmap);
> -        mem->dirty_bmap = NULL;
> -        mem->memory_size = 0;
> -        mem->flags = 0;
> -        err = kvm_set_user_memory_region(kml, mem, false);
> -        if (err) {
> -            fprintf(stderr, "%s: error unregistering slot: %s\n",
> -                    __func__, strerror(-err));
> -            abort();
> -        }
> +            /* unregister the slot */
> +            g_free(mem->dirty_bmap);
> +            mem->dirty_bmap = NULL;
> +            mem->memory_size = 0;
> +            mem->flags = 0;
> +            err = kvm_set_user_memory_region(kml, mem, false);
> +            if (err) {
> +                fprintf(stderr, "%s: error unregistering slot: %s\n",
> +                        __func__, strerror(-err));
> +                abort();
> +            }
> +            start_addr += slot_size;
> +            size -= slot_size;
> +        } while (size);
>          goto out;
>      }
>  
>      /* register the new slot */
> -    mem = kvm_alloc_slot(kml);
> -    mem->memory_size = size;
> -    mem->start_addr = start_addr;
> -    mem->ram = ram;
> -    mem->flags = kvm_mem_flags(mr);
> -
> -    err = kvm_set_user_memory_region(kml, mem, true);
> -    if (err) {
> -        fprintf(stderr, "%s: error registering slot: %s\n", __func__,
> -                strerror(-err));
> -        abort();
> -    }
> +    do {
> +        slot_size = kvm_max_slot_size < size ? kvm_max_slot_size : size;

Same.

> +        mem = kvm_alloc_slot(kml);
> +        mem->memory_size = slot_size;
> +        mem->start_addr = start_addr;
> +        mem->ram = ram;
> +        mem->flags = kvm_mem_flags(mr);
> +
> +        err = kvm_set_user_memory_region(kml, mem, true);
> +        if (err) {
> +            fprintf(stderr, "%s: error registering slot: %s\n", __func__,
> +                    strerror(-err));
> +            abort();
> +        }
> +        start_addr += slot_size;
> +        ram += slot_size;
> +        size -= slot_size;
> +    } while (size);
>  
>  out:
>      kvm_slots_unlock(kml);
> @@ -2859,6 +2911,7 @@ static bool kvm_accel_has_memory(MachineState *ms, AddressSpace *as,
>  
>      for (i = 0; i < kvm->nr_as; ++i) {
>          if (kvm->as[i].as == as && kvm->as[i].ml) {
> +            size = kvm_max_slot_size < size ? kvm_max_slot_size : size;
>              return NULL != kvm_lookup_matching_slot(kvm->as[i].ml,
>                                                      start_addr, size);
>          }
> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
> index 8bfb6684cb..baa95aad38 100644
> --- a/hw/s390x/s390-virtio-ccw.c
> +++ b/hw/s390x/s390-virtio-ccw.c
> @@ -154,15 +154,6 @@ static void virtio_ccw_register_hcalls(void)
>                                     virtio_ccw_hcall_early_printk);
>  }
>  
> -/*
> - * KVM does only support memory slots up to KVM_MEM_MAX_NR_PAGES pages
> - * as the dirty bitmap must be managed by bitops that take an int as
> - * position indicator. If we have a guest beyond that we will split off
> - * new subregions. The split must happen on a segment boundary (1MB).
> - */
> -#define KVM_MEM_MAX_NR_PAGES ((1ULL << 31) - 1)
> -#define SEG_MSK (~0xfffffULL)
> -#define KVM_SLOT_MAX_BYTES ((KVM_MEM_MAX_NR_PAGES * TARGET_PAGE_SIZE) & SEG_MSK)

(Nit: IMHO it'll be cleaner to keep these and in the next patch to
 simply revert bb223055b9b327e)

>  static void s390_memory_init(ram_addr_t mem_size)
>  {
>      MemoryRegion *sysmem = get_system_memory();
> diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c
> index cea71ac7c3..838d23cb17 100644
> --- a/target/s390x/kvm.c
> +++ b/target/s390x/kvm.c
> @@ -28,6 +28,7 @@
>  #include "cpu.h"
>  #include "internal.h"
>  #include "kvm_s390x.h"
> +#include "sysemu/kvm_int.h"
>  #include "qapi/error.h"
>  #include "qemu/error-report.h"
>  #include "qemu/timer.h"
> @@ -123,6 +124,16 @@
>  #define VCPU_IRQ_BUF_SIZE(max_cpus) (sizeof(struct kvm_s390_irq) * \
>                                       (max_cpus + NR_LOCAL_IRQS))
>  
> +/*
> + * KVM does only support memory slots up to KVM_MEM_MAX_NR_PAGES pages
> + * as the dirty bitmap must be managed by bitops that take an int as
> + * position indicator. If we have a guest beyond that we will split off
> + * new subregions. The split must happen on a segment boundary (1MB).
> + */
> +#define KVM_MEM_MAX_NR_PAGES ((1ULL << 31) - 1)

Not related to this series, but... should kvm export this to uapi
headers?

> +#define SEG_MSK (~0xfffffULL)
> +#define KVM_SLOT_MAX_BYTES ((KVM_MEM_MAX_NR_PAGES * TARGET_PAGE_SIZE) & SEG_MSK)
> +
>  static CPUWatchpoint hw_watchpoint;
>  /*
>   * We don't use a list because this structure is also used to transmit the
> @@ -348,6 +359,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>       */
>      /* kvm_vm_enable_cap(s, KVM_CAP_S390_AIS, 0); */
>  
> +    kvm_set_max_memslot_size(KVM_SLOT_MAX_BYTES);
>      return 0;
>  }
>  
> -- 
> 2.18.1
> 
> 

Regards,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] [PATCH v6 2/2] s390: do not call memory_region_allocate_system_memory() multiple times
  2019-09-16 13:23 ` [Qemu-devel] [PATCH v6 2/2] s390: do not call memory_region_allocate_system_memory() multiple times Igor Mammedov
@ 2019-09-17  8:44   ` Peter Xu
  2019-09-17 13:42     ` Igor Mammedov
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Xu @ 2019-09-17  8:44 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: thuth, david, cohuck, qemu-devel, qemu-s390x, pbonzini

On Mon, Sep 16, 2019 at 09:23:47AM -0400, Igor Mammedov wrote:
> PS:
> I don't have access to a suitable system to test it.

Hmm I feel like it would be good to have series like this to be at
least smoke tested somehow...

How about manually setup a very small max memslot size and test it on
x86?  IMHO we'd test with both KVMState.manual_dirty_log_protect to be
on & off to make sure to cover the log_clear() code path.  And of
course to trigger those paths we probably need migrations, but I
believe local migrations would be enough.

And since at it, I'm thinking whether we should start assert() in some
way in memory_region_allocate_system_memory() to make sure it's not
called twice from now on on any board.

Regards,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] [PATCH v6 2/2] s390: do not call memory_region_allocate_system_memory() multiple times
  2019-09-17  8:44   ` Peter Xu
@ 2019-09-17 13:42     ` Igor Mammedov
  2019-09-18  0:16       ` Peter Xu
  0 siblings, 1 reply; 8+ messages in thread
From: Igor Mammedov @ 2019-09-17 13:42 UTC (permalink / raw)
  To: Peter Xu; +Cc: thuth, david, cohuck, qemu-devel, qemu-s390x, pbonzini

On Tue, 17 Sep 2019 16:44:42 +0800
Peter Xu <peterx@redhat.com> wrote:

> On Mon, Sep 16, 2019 at 09:23:47AM -0400, Igor Mammedov wrote:
> > PS:
> > I don't have access to a suitable system to test it.  
> 
> Hmm I feel like it would be good to have series like this to be at
> least smoke tested somehow...
> 
> How about manually setup a very small max memslot size and test it on
> x86?  IMHO we'd test with both KVMState.manual_dirty_log_protect to be
> on & off to make sure to cover the log_clear() code path.  And of
> course to trigger those paths we probably need migrations, but I
> believe local migrations would be enough.

I did smoke test it (Fedora boot loop) [*] on s390 host with hacked
1G max section. I guess I could hack x86 and do the same for x86 guest.
Anyways, suggestions how to test it better are welcome.

*) I don't have much faith in tests we have though as it didn't
   explode with broken v5 in my case. Hence CCing ones who is more
   familiar with migration parts.

   I used 2 memslot split config at 1Gb with offline migration like this:

   $ qemu-system-s390x -M s390-ccw-virtio -m 2048 -cpu max -smp 2 -M accel=kvm  \
        --nographic --hda fedora.qcow2 -serial unix:/tmp/s,server,nowait \
        -monitor stdio 
     (monitor) stop
     (monitor) migrate "exec: cat > savefile
     (monitor) q
   $ qemu-system-s390x -M s390-ccw-virtio -m 2048 -cpu max -smp 2 -M accel=kvm  \
        --nographic --hda fedora.qcow2 -serial unix:/tmp/s,server,nowait \
        -incoming "exec: cat savefile"

> 
> And since at it, I'm thinking whether we should start assert() in some
> way in memory_region_allocate_system_memory() to make sure it's not
> called twice from now on on any board.

there is another broken board that misuses it as well,
I intend to clean that up and drop memory_region_allocate_system_memory()
altogether once s390 case is dealt with.

---
*) I don't have much faith in existing tests though as it didn't
   explode with broken v5 in my case. Hence CCing ones who is more
   familiar with migration parts.

   I've used 2 memslot split config at 1Gb with offline migration like this:

   $ qemu-system-s390x -M s390-ccw-virtio -m 2048 -cpu max -smp 2 -M accel=kvm  \
        --nographic --hda fedora.qcow2 -serial unix:/tmp/s,server,nowait \
        -monitor stdio 
     (monitor) stop
     (monitor) migrate "exec: cat > savefile
     (monitor) q
   $ qemu-system-s390x -M s390-ccw-virtio -m 2048 -cpu max -smp 2 -M accel=kvm  \
        --nographic --hda fedora.qcow2 -serial unix:/tmp/s,server,nowait \
        -incoming "exec: cat savefile"
     (monitor) cont



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] [PATCH v6 2/2] s390: do not call memory_region_allocate_system_memory() multiple times
  2019-09-17 13:42     ` Igor Mammedov
@ 2019-09-18  0:16       ` Peter Xu
  0 siblings, 0 replies; 8+ messages in thread
From: Peter Xu @ 2019-09-18  0:16 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: thuth, david, cohuck, qemu-devel, qemu-s390x, pbonzini

On Tue, Sep 17, 2019 at 03:42:12PM +0200, Igor Mammedov wrote:
> On Tue, 17 Sep 2019 16:44:42 +0800
> Peter Xu <peterx@redhat.com> wrote:
> 
> > On Mon, Sep 16, 2019 at 09:23:47AM -0400, Igor Mammedov wrote:
> > > PS:
> > > I don't have access to a suitable system to test it.  
> > 
> > Hmm I feel like it would be good to have series like this to be at
> > least smoke tested somehow...
> > 
> > How about manually setup a very small max memslot size and test it on
> > x86?  IMHO we'd test with both KVMState.manual_dirty_log_protect to be
> > on & off to make sure to cover the log_clear() code path.  And of
> > course to trigger those paths we probably need migrations, but I
> > believe local migrations would be enough.
> 
> I did smoke test it (Fedora boot loop) [*] on s390 host with hacked
> 1G max section. I guess I could hack x86 and do the same for x86 guest.
> Anyways, suggestions how to test it better are welcome.
> 
> *) I don't have much faith in tests we have though as it didn't
>    explode with broken v5 in my case. Hence CCing ones who is more
>    familiar with migration parts.
> 
>    I used 2 memslot split config at 1Gb with offline migration like this:
> 
>    $ qemu-system-s390x -M s390-ccw-virtio -m 2048 -cpu max -smp 2 -M accel=kvm  \
>         --nographic --hda fedora.qcow2 -serial unix:/tmp/s,server,nowait \
>         -monitor stdio 
>      (monitor) stop
>      (monitor) migrate "exec: cat > savefile
>      (monitor) q
>    $ qemu-system-s390x -M s390-ccw-virtio -m 2048 -cpu max -smp 2 -M accel=kvm  \
>         --nographic --hda fedora.qcow2 -serial unix:/tmp/s,server,nowait \
>         -incoming "exec: cat savefile"

Yeah this looks good already. A better one could be (AFAICS):

  1) as mentioned, enable KVMState.manual_dirty_log_protect to test
     the log_clear path by offering a host kernel new enough (Linux
     5.2+), then it'll be on by default.  Otherwise the default is
     off.  We can enable some trace points to make sure those code
     paths are triggered if uncertain like trace_kvm_clear_dirty_log.

  2) more aggresive dirtying. This can be done by:

    - run a mem dirty workload inside.  I normally use [1] on my own
      ("mig_mon mm_dirty 1024 500" will dirty mem upon 1024MB with
      500MB/s dirty rate), but any tool would work

    - turn down migration bandwidth using "migrate_set_speed" so the
      migration even harder to converge, then dirty bit path is
      tortured more.  Otherwise local full-speed migration normally
      completes super fast due to pure mem moves.

Though if with 2) above I'd suggest to use unix sockets or tcp
otherwise the dumped file could be super big (hopefully not eating all
the laptop disk!).

[1] https://github.com/xzpeter/clibs/blob/master/bsd/mig_mon/mig_mon.c

Regards,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-09-18  0:17 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-09-16 13:23 [Qemu-devel] [PATCH v6 0/2] s390: stop abusing memory_region_allocate_system_memory() Igor Mammedov
2019-09-16 13:23 ` [Qemu-devel] [PATCH v6 1/2] kvm: s390: split too big memory section on several memslots Igor Mammedov
2019-09-16 13:32   ` Dr. David Alan Gilbert
2019-09-17  8:38   ` Peter Xu
2019-09-16 13:23 ` [Qemu-devel] [PATCH v6 2/2] s390: do not call memory_region_allocate_system_memory() multiple times Igor Mammedov
2019-09-17  8:44   ` Peter Xu
2019-09-17 13:42     ` Igor Mammedov
2019-09-18  0:16       ` Peter Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).