* [PATCH v4 0/4] KVM: Dynamic sized memslots array
@ 2024-09-17 16:38 Peter Xu
2024-09-17 16:38 ` [PATCH v4 1/4] KVM: Dynamic sized kvm " Peter Xu
` (5 more replies)
0 siblings, 6 replies; 13+ messages in thread
From: Peter Xu @ 2024-09-17 16:38 UTC (permalink / raw)
To: qemu-devel
Cc: Vitaly Kuznetsov, Fabiano Rosas, peterx, Prasad Pandit,
Julia Suvorova, Juraj Marcin, Paolo Bonzini, David Hildenbrand
v4:
- Remove the restriction when KVM reports fewer slots than the default
allocation size (KVM_MEMSLOTS_NR_ALLOC_DEFAULT) [Fabiano]
v1: https://lore.kernel.org/r/20240904191635.3045606-1-peterx@redhat.com
v2: https://lore.kernel.org/r/20240904223510.3519358-1-peterx@redhat.com
v3: https://lore.kernel.org/r/20240909145413.3748429-1-peterx@redhat.com
This series makes the KVM memslots array allocated dynamically in QEMU.
It fixes a migration performance regression that I observed, reducing each
precopy dirty sync from ~86ms to ~3ms.
Patch 1 is the fix; the remaining three patches are cleanups.
Thanks,
Peter Xu (4):
KVM: Dynamic sized kvm memslots array
KVM: Define KVM_MEMSLOTS_NR_MAX_DEFAULT
KVM: Rename KVMMemoryListener.nr_used_slots to nr_slots_used
KVM: Rename KVMState->nr_slots to nr_slots_max
include/sysemu/kvm_int.h | 7 +--
accel/kvm/kvm-all.c | 105 ++++++++++++++++++++++++++++++---------
accel/kvm/trace-events | 1 +
3 files changed, 87 insertions(+), 26 deletions(-)
--
2.45.0
* [PATCH v4 1/4] KVM: Dynamic sized kvm memslots array
2024-09-17 16:38 [PATCH v4 0/4] KVM: Dynamic sized memslots array Peter Xu
@ 2024-09-17 16:38 ` Peter Xu
2024-09-17 17:46 ` Fabiano Rosas
` (2 more replies)
2024-09-17 16:38 ` [PATCH v4 2/4] KVM: Define KVM_MEMSLOTS_NR_MAX_DEFAULT Peter Xu
` (4 subsequent siblings)
5 siblings, 3 replies; 13+ messages in thread
From: Peter Xu @ 2024-09-17 16:38 UTC (permalink / raw)
To: qemu-devel
Cc: Vitaly Kuznetsov, Fabiano Rosas, peterx, Prasad Pandit,
Julia Suvorova, Juraj Marcin, Paolo Bonzini, David Hildenbrand,
qemu-stable, Zhiyi Guo
Zhiyi reported an infinite loop issue in a VFIO use case. The cause of
that was a separate discussion; however, while profiling it I found a
dirty sync performance regression.
Each KVMMemoryListener maintains an array of kvm memslots. Currently it is
statically allocated to the maximum size supported by the kernel. However,
after Linux commit 4fc096a99e ("KVM: Raise the maximum number of user
memslots"), the maximum reported by the kernel has grown so large that it
is no longer wise to always statically allocate at the reported maximum.
What's worse, the QEMU kvm code still walks all the allocated memslot
entries for any form of lookup. That can drastically slow down all memslot
operations, because each such loop can iterate over 32K entries on new
kernels.
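For instance, the pre-patch free-slot lookup scans up to the kernel-reported
maximum rather than what is actually in use; a simplified sketch of the
pattern (a hypothetical helper, not the verbatim QEMU code):

    /*
     * Simplified sketch of the pre-patch lookup: scan every entry up
     * to the kernel-reported maximum (KVM_CAP_NR_MEMSLOTS, possibly
     * 32K+ on new kernels), even when only ~10 slots are in use.
     */
    static KVMSlot *find_free_slot_sketch(KVMMemoryListener *kml,
                                          unsigned int nr_slots_reported)
    {
        unsigned int i;

        for (i = 0; i < nr_slots_reported; i++) {
            if (kml->slots[i].memory_size == 0) {
                return &kml->slots[i];  /* first free slot */
            }
        }
        return NULL;  /* scanned all reported slots with no hit */
    }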
Fix this issue by allocating the memslots dynamically.
The initial size is set to 16 because it should cover basic VM usage; the
hope is that the majority of VM use cases will never need to grow at all
(e.g. a VM started with a plain ./qemu-system-x86_64 consumes 9 memslots
by default), while not being so large as to waste memory.
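The array then grows on demand by doubling, capped at the kernel-reported
maximum; roughly (a minimal illustrative sketch, not the patch code itself):

    /*
     * Illustrative sizing policy (hypothetical helper, not the patch
     * code): start at the 16-slot default, double on demand, and never
     * exceed the maximum reported by the kernel.
     */
    static unsigned int next_nr_slots(unsigned int cur, unsigned int max)
    {
        unsigned int next = cur ? cur * 2 : 16;  /* KVM_MEMSLOTS_NR_ALLOC_DEFAULT */

        return next > max ? max : next;
    }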
There may be even better ways to address this, but so far this is the
simplest, and it should already be an improvement even over the behavior
before the kernel raised the maximum. For example, in the case of the
issue above, where VFIO was attached on a 32GB system, only ~10 memslots
were used, so this should be good enough for now.
In the above VFIO context, measurements show that the precopy dirty sync
shrank from ~86ms to ~3ms with this patch applied. The improvement should
also apply to any KVM-enabled VM, even without VFIO.
NOTE: there is no Fixes tag for this patch because no single QEMU commit
regressed this. The behavior existed for a long time, but it only became
a problem once the kernel started reporting a very large nr_slots_max
value. That is pretty common now (the kernel change was merged in 2021),
so cc:stable is attached because we want this change backported to stable
branches.
Cc: qemu-stable <qemu-stable@nongnu.org>
Reported-by: Zhiyi Guo <zhguo@redhat.com>
Tested-by: Zhiyi Guo <zhguo@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
include/sysemu/kvm_int.h | 1 +
accel/kvm/kvm-all.c | 87 +++++++++++++++++++++++++++++++++-------
accel/kvm/trace-events | 1 +
3 files changed, 74 insertions(+), 15 deletions(-)
diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index 17483ff53b..2304537b93 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -46,6 +46,7 @@ typedef struct KVMMemoryListener {
MemoryListener listener;
KVMSlot *slots;
unsigned int nr_used_slots;
+ unsigned int nr_slots_allocated;
int as_id;
QSIMPLEQ_HEAD(, KVMMemoryUpdate) transaction_add;
QSIMPLEQ_HEAD(, KVMMemoryUpdate) transaction_del;
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index beb1988d12..e0430f08ea 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -69,6 +69,9 @@
#define KVM_GUESTDBG_BLOCKIRQ 0
#endif
+/* Default num of memslots to be allocated when VM starts */
+#define KVM_MEMSLOTS_NR_ALLOC_DEFAULT 16
+
struct KVMParkedVcpu {
unsigned long vcpu_id;
int kvm_fd;
@@ -165,6 +168,57 @@ void kvm_resample_fd_notify(int gsi)
}
}
+/**
+ * kvm_slots_grow(): Grow the slots[] array in the KVMMemoryListener
+ *
+ * @kml: The KVMMemoryListener* to grow the slots[] array
+ * @nr_slots_new: The new size of slots[] array
+ *
+ * Returns: True if the array grows larger, false otherwise.
+ */
+static bool kvm_slots_grow(KVMMemoryListener *kml, unsigned int nr_slots_new)
+{
+ unsigned int i, cur = kml->nr_slots_allocated;
+ KVMSlot *slots;
+
+ if (nr_slots_new > kvm_state->nr_slots) {
+ nr_slots_new = kvm_state->nr_slots;
+ }
+
+ if (cur >= nr_slots_new) {
+ /* Big enough, no need to grow, or we reached max */
+ return false;
+ }
+
+ if (cur == 0) {
+ slots = g_new0(KVMSlot, nr_slots_new);
+ } else {
+ assert(kml->slots);
+ slots = g_renew(KVMSlot, kml->slots, nr_slots_new);
+ /*
+ * g_renew() doesn't initialize extended buffers, however kvm
+ * memslots require fields to be zero-initialized. E.g. pointers,
+ * memory_size field, etc.
+ */
+ memset(&slots[cur], 0x0, sizeof(slots[0]) * (nr_slots_new - cur));
+ }
+
+ for (i = cur; i < nr_slots_new; i++) {
+ slots[i].slot = i;
+ }
+
+ kml->slots = slots;
+ kml->nr_slots_allocated = nr_slots_new;
+ trace_kvm_slots_grow(cur, nr_slots_new);
+
+ return true;
+}
+
+static bool kvm_slots_double(KVMMemoryListener *kml)
+{
+ return kvm_slots_grow(kml, kml->nr_slots_allocated * 2);
+}
+
unsigned int kvm_get_max_memslots(void)
{
KVMState *s = KVM_STATE(current_accel());
@@ -193,15 +247,26 @@ unsigned int kvm_get_free_memslots(void)
/* Called with KVMMemoryListener.slots_lock held */
static KVMSlot *kvm_get_free_slot(KVMMemoryListener *kml)
{
- KVMState *s = kvm_state;
+ unsigned int n;
int i;
- for (i = 0; i < s->nr_slots; i++) {
+ for (i = 0; i < kml->nr_slots_allocated; i++) {
if (kml->slots[i].memory_size == 0) {
return &kml->slots[i];
}
}
+ /*
+ * If no free slots, try to grow first by doubling. Cache the old size
+ * here to avoid another round of search: if the grow succeeded, it
+ * means slots[] now must have the existing "n" slots occupied,
+ * followed by one or more free slots starting from slots[n].
+ */
+ n = kml->nr_slots_allocated;
+ if (kvm_slots_double(kml)) {
+ return &kml->slots[n];
+ }
+
return NULL;
}
@@ -222,10 +287,9 @@ static KVMSlot *kvm_lookup_matching_slot(KVMMemoryListener *kml,
hwaddr start_addr,
hwaddr size)
{
- KVMState *s = kvm_state;
int i;
- for (i = 0; i < s->nr_slots; i++) {
+ for (i = 0; i < kml->nr_slots_allocated; i++) {
KVMSlot *mem = &kml->slots[i];
if (start_addr == mem->start_addr && size == mem->memory_size) {
@@ -267,7 +331,7 @@ int kvm_physical_memory_addr_from_host(KVMState *s, void *ram,
int i, ret = 0;
kvm_slots_lock();
- for (i = 0; i < s->nr_slots; i++) {
+ for (i = 0; i < kml->nr_slots_allocated; i++) {
KVMSlot *mem = &kml->slots[i];
if (ram >= mem->ram && ram < mem->ram + mem->memory_size) {
@@ -1071,7 +1135,7 @@ static int kvm_physical_log_clear(KVMMemoryListener *kml,
kvm_slots_lock();
- for (i = 0; i < s->nr_slots; i++) {
+ for (i = 0; i < kml->nr_slots_allocated; i++) {
mem = &kml->slots[i];
/* Discard slots that are empty or do not overlap the section */
if (!mem->memory_size ||
@@ -1719,12 +1783,8 @@ static void kvm_log_sync_global(MemoryListener *l, bool last_stage)
/* Flush all kernel dirty addresses into KVMSlot dirty bitmap */
kvm_dirty_ring_flush();
- /*
- * TODO: make this faster when nr_slots is big while there are
- * only a few used slots (small VMs).
- */
kvm_slots_lock();
- for (i = 0; i < s->nr_slots; i++) {
+ for (i = 0; i < kml->nr_slots_allocated; i++) {
mem = &kml->slots[i];
if (mem->memory_size && mem->flags & KVM_MEM_LOG_DIRTY_PAGES) {
kvm_slot_sync_dirty_pages(mem);
@@ -1839,12 +1899,9 @@ void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml,
{
int i;
- kml->slots = g_new0(KVMSlot, s->nr_slots);
kml->as_id = as_id;
- for (i = 0; i < s->nr_slots; i++) {
- kml->slots[i].slot = i;
- }
+ kvm_slots_grow(kml, KVM_MEMSLOTS_NR_ALLOC_DEFAULT);
QSIMPLEQ_INIT(&kml->transaction_add);
QSIMPLEQ_INIT(&kml->transaction_del);
diff --git a/accel/kvm/trace-events b/accel/kvm/trace-events
index 82c65fd2ab..e43d18a869 100644
--- a/accel/kvm/trace-events
+++ b/accel/kvm/trace-events
@@ -36,3 +36,4 @@ kvm_io_window_exit(void) ""
kvm_run_exit_system_event(int cpu_index, uint32_t event_type) "cpu_index %d, system_even_type %"PRIu32
kvm_convert_memory(uint64_t start, uint64_t size, const char *msg) "start 0x%" PRIx64 " size 0x%" PRIx64 " %s"
kvm_memory_fault(uint64_t start, uint64_t size, uint64_t flags) "start 0x%" PRIx64 " size 0x%" PRIx64 " flags 0x%" PRIx64
+kvm_slots_grow(unsigned int old, unsigned int new) "%u -> %u"
--
2.45.0
* [PATCH v4 2/4] KVM: Define KVM_MEMSLOTS_NR_MAX_DEFAULT
2024-09-17 16:38 [PATCH v4 0/4] KVM: Dynamic sized memslots array Peter Xu
2024-09-17 16:38 ` [PATCH v4 1/4] KVM: Dynamic sized kvm " Peter Xu
@ 2024-09-17 16:38 ` Peter Xu
2024-09-17 16:38 ` [PATCH v4 3/4] KVM: Rename KVMMemoryListener.nr_used_slots to nr_slots_used Peter Xu
` (3 subsequent siblings)
5 siblings, 0 replies; 13+ messages in thread
From: Peter Xu @ 2024-09-17 16:38 UTC (permalink / raw)
To: qemu-devel
Cc: Vitaly Kuznetsov, Fabiano Rosas, peterx, Prasad Pandit,
Julia Suvorova, Juraj Marcin, Paolo Bonzini, David Hildenbrand
Make the default max nr_slots a macro; it is only used when KVM reports
nothing.
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
accel/kvm/kvm-all.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index e0430f08ea..0b66c8b27f 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -71,6 +71,8 @@
/* Default num of memslots to be allocated when VM starts */
#define KVM_MEMSLOTS_NR_ALLOC_DEFAULT 16
+/* Default max allowed memslots if kernel reported nothing */
+#define KVM_MEMSLOTS_NR_MAX_DEFAULT 32
struct KVMParkedVcpu {
unsigned long vcpu_id;
@@ -2515,7 +2517,7 @@ static int kvm_init(MachineState *ms)
/* If unspecified, use the default value */
if (!s->nr_slots) {
- s->nr_slots = 32;
+ s->nr_slots = KVM_MEMSLOTS_NR_MAX_DEFAULT;
}
s->nr_as = kvm_check_extension(s, KVM_CAP_MULTI_ADDRESS_SPACE);
--
2.45.0
* [PATCH v4 3/4] KVM: Rename KVMMemoryListener.nr_used_slots to nr_slots_used
2024-09-17 16:38 [PATCH v4 0/4] KVM: Dynamic sized memslots array Peter Xu
2024-09-17 16:38 ` [PATCH v4 1/4] KVM: Dynamic sized kvm " Peter Xu
2024-09-17 16:38 ` [PATCH v4 2/4] KVM: Define KVM_MEMSLOTS_NR_MAX_DEFAULT Peter Xu
@ 2024-09-17 16:38 ` Peter Xu
2024-09-17 16:38 ` [PATCH v4 4/4] KVM: Rename KVMState->nr_slots to nr_slots_max Peter Xu
` (2 subsequent siblings)
5 siblings, 0 replies; 13+ messages in thread
From: Peter Xu @ 2024-09-17 16:38 UTC (permalink / raw)
To: qemu-devel
Cc: Vitaly Kuznetsov, Fabiano Rosas, peterx, Prasad Pandit,
Julia Suvorova, Juraj Marcin, Paolo Bonzini, David Hildenbrand
This makes all the nr_slots counters named in the same manner.
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
include/sysemu/kvm_int.h | 2 +-
accel/kvm/kvm-all.c | 6 +++---
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index 2304537b93..914c5c9ec5 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -45,7 +45,7 @@ typedef struct KVMMemoryUpdate {
typedef struct KVMMemoryListener {
MemoryListener listener;
KVMSlot *slots;
- unsigned int nr_used_slots;
+ unsigned int nr_slots_used;
unsigned int nr_slots_allocated;
int as_id;
QSIMPLEQ_HEAD(, KVMMemoryUpdate) transaction_add;
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 0b66c8b27f..bceb154943 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -239,7 +239,7 @@ unsigned int kvm_get_free_memslots(void)
if (!s->as[i].ml) {
continue;
}
- used_slots = MAX(used_slots, s->as[i].ml->nr_used_slots);
+ used_slots = MAX(used_slots, s->as[i].ml->nr_slots_used);
}
kvm_slots_unlock();
@@ -1516,7 +1516,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
}
start_addr += slot_size;
size -= slot_size;
- kml->nr_used_slots--;
+ kml->nr_slots_used--;
} while (size);
return;
}
@@ -1555,7 +1555,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
ram_start_offset += slot_size;
ram += slot_size;
size -= slot_size;
- kml->nr_used_slots++;
+ kml->nr_slots_used++;
} while (size);
}
--
2.45.0
* [PATCH v4 4/4] KVM: Rename KVMState->nr_slots to nr_slots_max
2024-09-17 16:38 [PATCH v4 0/4] KVM: Dynamic sized memslots array Peter Xu
` (2 preceding siblings ...)
2024-09-17 16:38 ` [PATCH v4 3/4] KVM: Rename KVMMemoryListener.nr_used_slots to nr_slots_used Peter Xu
@ 2024-09-17 16:38 ` Peter Xu
2024-10-09 12:44 ` [PATCH v4 0/4] KVM: Dynamic sized memslots array Peter Xu
2024-10-10 7:53 ` Paolo Bonzini
5 siblings, 0 replies; 13+ messages in thread
From: Peter Xu @ 2024-09-17 16:38 UTC (permalink / raw)
To: qemu-devel
Cc: Vitaly Kuznetsov, Fabiano Rosas, peterx, Prasad Pandit,
Julia Suvorova, Juraj Marcin, Paolo Bonzini, David Hildenbrand
This value is used to reflect the maximum number of memslots supported by
the KVM kernel. Rename it to be clearer.
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
include/sysemu/kvm_int.h | 4 ++--
accel/kvm/kvm-all.c | 14 +++++++-------
2 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index 914c5c9ec5..a1e72763da 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -103,8 +103,8 @@ struct KVMDirtyRingReaper {
struct KVMState
{
AccelState parent_obj;
-
- int nr_slots;
+ /* Max number of KVM slots supported */
+ int nr_slots_max;
int fd;
int vmfd;
int coalesced_mmio;
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index bceb154943..7fa4019544 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -183,8 +183,8 @@ static bool kvm_slots_grow(KVMMemoryListener *kml, unsigned int nr_slots_new)
unsigned int i, cur = kml->nr_slots_allocated;
KVMSlot *slots;
- if (nr_slots_new > kvm_state->nr_slots) {
- nr_slots_new = kvm_state->nr_slots;
+ if (nr_slots_new > kvm_state->nr_slots_max) {
+ nr_slots_new = kvm_state->nr_slots_max;
}
if (cur >= nr_slots_new) {
@@ -225,7 +225,7 @@ unsigned int kvm_get_max_memslots(void)
{
KVMState *s = KVM_STATE(current_accel());
- return s->nr_slots;
+ return s->nr_slots_max;
}
unsigned int kvm_get_free_memslots(void)
@@ -243,7 +243,7 @@ unsigned int kvm_get_free_memslots(void)
}
kvm_slots_unlock();
- return s->nr_slots - used_slots;
+ return s->nr_slots_max - used_slots;
}
/* Called with KVMMemoryListener.slots_lock held */
@@ -2513,10 +2513,10 @@ static int kvm_init(MachineState *ms)
(kvm_supported_memory_attributes & KVM_MEMORY_ATTRIBUTE_PRIVATE);
kvm_immediate_exit = kvm_check_extension(s, KVM_CAP_IMMEDIATE_EXIT);
- s->nr_slots = kvm_check_extension(s, KVM_CAP_NR_MEMSLOTS);
+ s->nr_slots_max = kvm_check_extension(s, KVM_CAP_NR_MEMSLOTS);
/* If unspecified, use the default value */
- if (!s->nr_slots) {
- s->nr_slots = KVM_MEMSLOTS_NR_MAX_DEFAULT;
+ if (!s->nr_slots_max) {
+ s->nr_slots_max = KVM_MEMSLOTS_NR_MAX_DEFAULT;
}
--
2.45.0
* Re: [PATCH v4 1/4] KVM: Dynamic sized kvm memslots array
2024-09-17 16:38 ` [PATCH v4 1/4] KVM: Dynamic sized kvm " Peter Xu
@ 2024-09-17 17:46 ` Fabiano Rosas
2024-09-19 8:15 ` David Hildenbrand
2024-10-18 15:38 ` Michael Tokarev
2 siblings, 0 replies; 13+ messages in thread
From: Fabiano Rosas @ 2024-09-17 17:46 UTC (permalink / raw)
To: Peter Xu, qemu-devel
Cc: Vitaly Kuznetsov, peterx, Prasad Pandit, Julia Suvorova,
Juraj Marcin, Paolo Bonzini, David Hildenbrand, qemu-stable,
Zhiyi Guo
Peter Xu <peterx@redhat.com> writes:
> Zhiyi reported an infinite loop issue in a VFIO use case. The cause of
> that was a separate discussion; however, while profiling it I found a
> dirty sync performance regression.
[...]
> Cc: qemu-stable <qemu-stable@nongnu.org>
> Reported-by: Zhiyi Guo <zhguo@redhat.com>
> Tested-by: Zhiyi Guo <zhguo@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
* Re: [PATCH v4 1/4] KVM: Dynamic sized kvm memslots array
2024-09-17 16:38 ` [PATCH v4 1/4] KVM: Dynamic sized kvm " Peter Xu
2024-09-17 17:46 ` Fabiano Rosas
@ 2024-09-19 8:15 ` David Hildenbrand
2024-10-18 15:38 ` Michael Tokarev
2 siblings, 0 replies; 13+ messages in thread
From: David Hildenbrand @ 2024-09-19 8:15 UTC (permalink / raw)
To: Peter Xu, qemu-devel
Cc: Vitaly Kuznetsov, Fabiano Rosas, Prasad Pandit, Julia Suvorova,
Juraj Marcin, Paolo Bonzini, qemu-stable, Zhiyi Guo
On 17.09.24 18:38, Peter Xu wrote:
> Zhiyi reported an infinite loop issue in a VFIO use case. The cause of
> that was a separate discussion; however, while profiling it I found a
> dirty sync performance regression.
[...]
> Cc: qemu-stable <qemu-stable@nongnu.org>
> Reported-by: Zhiyi Guo <zhguo@redhat.com>
> Tested-by: Zhiyi Guo <zhguo@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb
* Re: [PATCH v4 0/4] KVM: Dynamic sized memslots array
2024-09-17 16:38 [PATCH v4 0/4] KVM: Dynamic sized memslots array Peter Xu
` (3 preceding siblings ...)
2024-09-17 16:38 ` [PATCH v4 4/4] KVM: Rename KVMState->nr_slots to nr_slots_max Peter Xu
@ 2024-10-09 12:44 ` Peter Xu
2024-10-10 7:53 ` Paolo Bonzini
5 siblings, 0 replies; 13+ messages in thread
From: Peter Xu @ 2024-10-09 12:44 UTC (permalink / raw)
To: qemu-devel, Paolo Bonzini
Cc: Vitaly Kuznetsov, Fabiano Rosas, Prasad Pandit, Julia Suvorova,
Juraj Marcin, Paolo Bonzini, David Hildenbrand
Ping - Paolo, could you help have a look? Thanks!
On Tue, Sep 17, 2024 at 12:38:31PM -0400, Peter Xu wrote:
> This series makes the KVM memslots array allocated dynamically in QEMU.
> It fixes a migration performance regression that I observed, reducing each
> precopy dirty sync from ~86ms to ~3ms.
>
> Patch 1 is the fix; the remaining three patches are cleanups.
[...]
--
Peter Xu
* Re: [PATCH v4 0/4] KVM: Dynamic sized memslots array
2024-09-17 16:38 [PATCH v4 0/4] KVM: Dynamic sized memslots array Peter Xu
` (4 preceding siblings ...)
2024-10-09 12:44 ` [PATCH v4 0/4] KVM: Dynamic sized memslots array Peter Xu
@ 2024-10-10 7:53 ` Paolo Bonzini
5 siblings, 0 replies; 13+ messages in thread
From: Paolo Bonzini @ 2024-10-10 7:53 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel, Vitaly Kuznetsov, Fabiano Rosas, Prasad Pandit,
Julia Suvorova, Juraj Marcin, David Hildenbrand
Queued, thanks.
Paolo
* Re: [PATCH v4 1/4] KVM: Dynamic sized kvm memslots array
2024-09-17 16:38 ` [PATCH v4 1/4] KVM: Dynamic sized kvm " Peter Xu
2024-09-17 17:46 ` Fabiano Rosas
2024-09-19 8:15 ` David Hildenbrand
@ 2024-10-18 15:38 ` Michael Tokarev
2024-10-21 14:37 ` Peter Xu
2 siblings, 1 reply; 13+ messages in thread
From: Michael Tokarev @ 2024-10-18 15:38 UTC (permalink / raw)
To: Peter Xu, qemu-devel
Cc: Vitaly Kuznetsov, Fabiano Rosas, Prasad Pandit, Julia Suvorova,
Juraj Marcin, Paolo Bonzini, David Hildenbrand, qemu-stable,
Zhiyi Guo
17.09.2024 19:38, Peter Xu wrote:
> Zhiyi reported an infinite loop issue in a VFIO use case. The cause of
> that was a separate discussion; however, while profiling it I found a
> dirty sync performance regression.
[...]
> NOTE: there is no Fixes tag for this patch because no single QEMU commit
> regressed this. The behavior existed for a long time, but it only became
> a problem once the kernel started reporting a very large nr_slots_max
> value. That is pretty common now (the kernel change was merged in 2021),
> so cc:stable is attached because we want this change backported to stable
> branches.
Looking at this from a qemu-stable PoV, I'm not 100% sure this change is
good for the stable-7.2 series, because 7.2 lacks commit
v8.1.0-1571-g5b23186a95 ("kvm: Return number of free memslots"), which was
a preparation for memory devices that consume multiple memslots.
I did a backport of this change (currently it is at the tip of the
staging-7.2 branch of https://gitlab.com/mjt0k/qemu.git) - I had to tweak
the context and also to remove a now-unused local variable in kvm-all.c.
It builds and the tests run fine, but I'm not really sure it does what it
is intended to do.
Should anything else be picked up for 7.2 for all this to work, or should
this change not be backported to 7.2?
(for more recent releases, everything looks ok).
Thanks,
/mjt
* Re: [PATCH v4 1/4] KVM: Dynamic sized kvm memslots array
2024-10-18 15:38 ` Michael Tokarev
@ 2024-10-21 14:37 ` Peter Xu
2024-10-21 19:05 ` Michael Tokarev
0 siblings, 1 reply; 13+ messages in thread
From: Peter Xu @ 2024-10-21 14:37 UTC (permalink / raw)
To: Michael Tokarev
Cc: qemu-devel, Vitaly Kuznetsov, Fabiano Rosas, Prasad Pandit,
Julia Suvorova, Juraj Marcin, Paolo Bonzini, David Hildenbrand,
qemu-stable, Zhiyi Guo
Michael,
On Fri, Oct 18, 2024 at 06:38:53PM +0300, Michael Tokarev wrote:
> Looking at this from a qemu-stable PoV, I'm not 100% sure this change is
> good for the stable-7.2 series, because 7.2 lacks commit
> v8.1.0-1571-g5b23186a95 ("kvm: Return number of free memslots"), which was
> a preparation for memory devices that consume multiple memslots.
>
> I did a backport of this change (currently it is at the tip of the
> staging-7.2 branch of https://gitlab.com/mjt0k/qemu.git) - I had to tweak
> the context and also to remove a now-unused local variable in kvm-all.c.
> It builds and the tests run fine, but I'm not really sure it does what it
> is intended to do.
>
> Should anything else be picked up for 7.2 for all this to work, or should
> this change not be backported to 7.2?
>
> (for more recent releases, everything looks ok).
I don't remember anything this series logically depends on (besides any
context-wise change).
If there's uncertainty or challenge in backporting to some stable branches
from your POV, we can still keep things simple and skip the series, as it's
only a perf regression and it only happens during live migration (which can
enlarge the downtime, for example), not during daily VM use.
Thanks,
--
Peter Xu
* Re: [PATCH v4 1/4] KVM: Dynamic sized kvm memslots array
2024-10-21 14:37 ` Peter Xu
@ 2024-10-21 19:05 ` Michael Tokarev
2024-10-21 21:18 ` Peter Xu
0 siblings, 1 reply; 13+ messages in thread
From: Michael Tokarev @ 2024-10-21 19:05 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel, Vitaly Kuznetsov, Fabiano Rosas, Prasad Pandit,
Julia Suvorova, Juraj Marcin, Paolo Bonzini, David Hildenbrand,
qemu-stable, Zhiyi Guo
21.10.2024 17:37, Peter Xu wrote:
> Michael,
>
> On Fri, Oct 18, 2024 at 06:38:53PM +0300, Michael Tokarev wrote:
>> Looking at this from a qemu-stable PoV, I'm not 100% sure this change is
>> good for the stable-7.2 series, because 7.2 lacks commit
>> v8.1.0-1571-g5b23186a95 ("kvm: Return number of free memslots"), which
>> was a preparation for memory devices that consume multiple memslots.
>>
>> I did a backport of this change (currently it is at the tip of the
>> staging-7.2 branch of https://gitlab.com/mjt0k/qemu.git) - I had to tweak
>> the context and also to remove a now-unused local variable in kvm-all.c.
>> It builds and the tests run fine, but I'm not really sure it does what it
>> is intended to do.
>>
>> Should anything else be picked up for 7.2 for all this to work, or should
>> this change not be backported to 7.2?
>>
>> (for more recent releases, everything looks ok).
>
> I don't remember anything this series logically depends on (besides any
> context-wise change).
Well, 7.2 is a bit old by now, and the commit I already mentioned above is
also quite old - at the time you started working on this series, commit
v8.1.0-1571-g5b23186a95 had been in the tree for a long time already.
This change might be relevant here or might not be.
> If there's uncertainty or challenge in backporting to some stable branches
> from your POV, we can still keep things simple and skip the series, as it's
> only a perf regression and it only happens during live migration (which can
> enlarge the downtime, for example), not during daily VM use.
For this change alone, I did the backport; I'm just not sure it makes sense.
It would be great if you could take a look, including at the change I
mentioned above (it isn't in 7.2), here:
https://gitlab.com/mjt0k/qemu/-/commits/staging-7.2
Or we can just drop it for 7.2 per the above.
Thanks,
/mjt
* Re: [PATCH v4 1/4] KVM: Dynamic sized kvm memslots array
2024-10-21 19:05 ` Michael Tokarev
@ 2024-10-21 21:18 ` Peter Xu
0 siblings, 0 replies; 13+ messages in thread
From: Peter Xu @ 2024-10-21 21:18 UTC (permalink / raw)
To: Michael Tokarev
Cc: qemu-devel, Vitaly Kuznetsov, Fabiano Rosas, Prasad Pandit,
Julia Suvorova, Juraj Marcin, Paolo Bonzini, David Hildenbrand,
qemu-stable, Zhiyi Guo
On Mon, Oct 21, 2024 at 10:05:23PM +0300, Michael Tokarev wrote:
> 21.10.2024 17:37, Peter Xu wrote:
> > Michael,
> >
> > On Fri, Oct 18, 2024 at 06:38:53PM +0300, Michael Tokarev wrote:
> > > Looking at this from qemu-stable PoV, I'm not 100% sure this change is good
> > > for stable-7.2 series, because 7.2 lacks v8.1.0-1571-g5b23186a95
> > > "kvm: Return number of free memslots" commit, which was a preparation for
> > > for memory devices that consume multiple memslots.
> > >
> > > I did a backport of this change (currently it is at the tip of staging-7.2
> > > branch of https://gitlab.com/mjt0k/qemu.git) - I had to tweak context and
> > > also to remove now-unused local variable in kvm-all.c. It builds and the
> > > tests run fine, but I'm not really sure it does what it is intended to do.
> > >
> > > Should anything else be picked up for 7.2 for all this to work, or should
> > > this change not be back-ported to 7.2 ?
> > >
> > > (for more recent releases, everything looks ok).
> >
> > I don't remember anything this series logically depends on (besides any
> > context-wise change).
>
> Well, 7.2 is a bit old by now, and the commit I already mentioned above is
> also quite old - at the time you started working on this series, commit
> v8.1.0-1571-g5b23186a95 had been in the tree for a long time already.
> This change might be relevant here or might not be.
That specific commit (5b23186a95) shouldn't be relevant.
>
> If there's uncertainty or challenge in backporting to some stable branches
> from your POV, we can still keep things simple and skip the series, as it's
> only a perf regression and it only happens during live migration (which can
> enlarge the downtime, for example), not during daily VM use.
>
> For this change alone, I did the backport; I'm just not sure it makes sense.
>
> It would be great if you could take a look, including at the change I
> mentioned above (it isn't in 7.2), here:
> https://gitlab.com/mjt0k/qemu/-/commits/staging-7.2
> Or we can just drop it for 7.2 per the above.
I checked the backport; it looks all good.
Thanks,
--
Peter Xu