* [PATCH v3 0/6] KVM: Dirty ring fixes and cleanups
@ 2025-05-16 21:35 Sean Christopherson
2025-05-16 21:35 ` [PATCH v3 1/6] KVM: Bound the number of dirty ring entries in a single reset at INT_MAX Sean Christopherson
` (8 more replies)
0 siblings, 9 replies; 25+ messages in thread
From: Sean Christopherson @ 2025-05-16 21:35 UTC (permalink / raw)
To: Paolo Bonzini
Cc: kvm, linux-kernel, Peter Xu, Yan Zhao, Maxim Levitsky, Binbin Wu,
James Houghton, Sean Christopherson, Pankaj Gupta
Fix issues with dirty ring harvesting where KVM doesn't bound the processing
of entries in any way, which allows userspace to keep KVM in a tight loop
indefinitely.
E.g.
struct kvm_dirty_gfn *dirty_gfns = vcpu_map_dirty_ring(vcpu);
if (fork()) {
int r;
for (;;) {
r = kvm_vm_reset_dirty_ring(vcpu->vm);
if (r)
printf("RESET %d dirty ring entries\n", r);
}
} else {
int i;
for (i = 0; i < test_dirty_ring_count; i++) {
dirty_gfns[i].slot = TEST_MEM_SLOT_INDEX;
dirty_gfns[i].offset = (i * 64) % host_num_pages;
}
for (;;) {
for (i = 0; i < test_dirty_ring_count; i++)
WRITE_ONCE(dirty_gfns[i].flags, KVM_DIRTY_GFN_F_RESET);
}
}
Patches 1-3 address that class of bugs. Patches 4-6 are cleanups.
v3:
- Fix typos (I apparently can't spell opportunistically to save my life).
[Binbin, James]
- Clean up stale comments. [Binbin]
- Collect reviews. [James, Pankaj]
- Add a lockdep assertion on slots_lock, along with a comment. [James]
v2:
- https://lore.kernel.org/all/20250508141012.1411952-1-seanjc@google.com
- Expand on comments in dirty ring harvesting code. [Yan]
v1: https://lore.kernel.org/all/20250111010409.1252942-1-seanjc@google.com
Sean Christopherson (6):
KVM: Bound the number of dirty ring entries in a single reset at
INT_MAX
KVM: Bail from the dirty ring reset flow if a signal is pending
KVM: Conditionally reschedule when resetting the dirty ring
KVM: Check for empty mask of harvested dirty ring entries in caller
KVM: Use mask of harvested dirty ring entries to coalesce dirty ring
resets
KVM: Assert that slots_lock is held when resetting per-vCPU dirty
rings
include/linux/kvm_dirty_ring.h | 18 ++----
virt/kvm/dirty_ring.c | 111 +++++++++++++++++++++++----------
virt/kvm/kvm_main.c | 9 ++-
3 files changed, 89 insertions(+), 49 deletions(-)
base-commit: 7ef51a41466bc846ad794d505e2e34ff97157f7f
--
2.49.0.1112.g889b7c5bd8-goog
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH v3 1/6] KVM: Bound the number of dirty ring entries in a single reset at INT_MAX
2025-05-16 21:35 [PATCH v3 0/6] KVM: Dirty ring fixes and cleanups Sean Christopherson
@ 2025-05-16 21:35 ` Sean Christopherson
2025-05-20 6:51 ` Binbin Wu
2025-05-16 21:35 ` [PATCH v3 2/6] KVM: Bail from the dirty ring reset flow if a signal is pending Sean Christopherson
` (7 subsequent siblings)
8 siblings, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2025-05-16 21:35 UTC (permalink / raw)
To: Paolo Bonzini
Cc: kvm, linux-kernel, Peter Xu, Yan Zhao, Maxim Levitsky, Binbin Wu,
James Houghton, Sean Christopherson, Pankaj Gupta
Cap the number of ring entries that are reset in a single ioctl to INT_MAX
to ensure userspace isn't confused by a wrap into negative space, and so
that, in a truly pathological scenario, KVM doesn't miss a TLB flush due
to the count wrapping to zero. While the size of the ring is fixed at
0x10000 entries and KVM (currently) supports at most 4096, userspace is
allowed to harvest entries from the ring while the reset is in-progress,
i.e. it's possible for the ring to always have harvested entries.
Opportunistically return an actual error code from the helper so that a
future fix to handle pending signals can gracefully return -EINTR. Drop
the function comment now that the return code is a stanard 0/-errno (and
because a future commit will add a proper lockdep assertion).
Opportunistically drop a similarly stale comment for kvm_dirty_ring_push().
Cc: Peter Xu <peterx@redhat.com>
Cc: Yan Zhao <yan.y.zhao@intel.com>
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Binbin Wu <binbin.wu@linux.intel.com>
Fixes: fb04a1eddb1a ("KVM: X86: Implement ring-based dirty memory tracking")
Reviewed-by: James Houghton <jthoughton@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
include/linux/kvm_dirty_ring.h | 18 +++++-------------
virt/kvm/dirty_ring.c | 10 +++++-----
virt/kvm/kvm_main.c | 9 ++++++---
3 files changed, 16 insertions(+), 21 deletions(-)
diff --git a/include/linux/kvm_dirty_ring.h b/include/linux/kvm_dirty_ring.h
index da4d9b5f58f1..eb10d87adf7d 100644
--- a/include/linux/kvm_dirty_ring.h
+++ b/include/linux/kvm_dirty_ring.h
@@ -49,9 +49,10 @@ static inline int kvm_dirty_ring_alloc(struct kvm *kvm, struct kvm_dirty_ring *r
}
static inline int kvm_dirty_ring_reset(struct kvm *kvm,
- struct kvm_dirty_ring *ring)
+ struct kvm_dirty_ring *ring,
+ int *nr_entries_reset)
{
- return 0;
+ return -ENOENT;
}
static inline void kvm_dirty_ring_push(struct kvm_vcpu *vcpu,
@@ -77,17 +78,8 @@ bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm);
u32 kvm_dirty_ring_get_rsvd_entries(struct kvm *kvm);
int kvm_dirty_ring_alloc(struct kvm *kvm, struct kvm_dirty_ring *ring,
int index, u32 size);
-
-/*
- * called with kvm->slots_lock held, returns the number of
- * processed pages.
- */
-int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring);
-
-/*
- * returns =0: successfully pushed
- * <0: unable to push, need to wait
- */
+int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
+ int *nr_entries_reset);
void kvm_dirty_ring_push(struct kvm_vcpu *vcpu, u32 slot, u64 offset);
bool kvm_dirty_ring_check_request(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index d14ffc7513ee..77986f34eff8 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -105,19 +105,19 @@ static inline bool kvm_dirty_gfn_harvested(struct kvm_dirty_gfn *gfn)
return smp_load_acquire(&gfn->flags) & KVM_DIRTY_GFN_F_RESET;
}
-int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring)
+int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
+ int *nr_entries_reset)
{
u32 cur_slot, next_slot;
u64 cur_offset, next_offset;
unsigned long mask;
- int count = 0;
struct kvm_dirty_gfn *entry;
bool first_round = true;
/* This is only needed to make compilers happy */
cur_slot = cur_offset = mask = 0;
- while (true) {
+ while (likely((*nr_entries_reset) < INT_MAX)) {
entry = &ring->dirty_gfns[ring->reset_index & (ring->size - 1)];
if (!kvm_dirty_gfn_harvested(entry))
@@ -130,7 +130,7 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring)
kvm_dirty_gfn_set_invalid(entry);
ring->reset_index++;
- count++;
+ (*nr_entries_reset)++;
/*
* Try to coalesce the reset operations when the guest is
* scanning pages in the same slot.
@@ -167,7 +167,7 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring)
trace_kvm_dirty_ring_reset(ring);
- return count;
+ return 0;
}
void kvm_dirty_ring_push(struct kvm_vcpu *vcpu, u32 slot, u64 offset)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b24db92e98f3..571688507204 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4903,15 +4903,18 @@ static int kvm_vm_ioctl_reset_dirty_pages(struct kvm *kvm)
{
unsigned long i;
struct kvm_vcpu *vcpu;
- int cleared = 0;
+ int cleared = 0, r;
if (!kvm->dirty_ring_size)
return -EINVAL;
mutex_lock(&kvm->slots_lock);
- kvm_for_each_vcpu(i, vcpu, kvm)
- cleared += kvm_dirty_ring_reset(vcpu->kvm, &vcpu->dirty_ring);
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ r = kvm_dirty_ring_reset(vcpu->kvm, &vcpu->dirty_ring, &cleared);
+ if (r)
+ break;
+ }
mutex_unlock(&kvm->slots_lock);
--
2.49.0.1112.g889b7c5bd8-goog
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v3 2/6] KVM: Bail from the dirty ring reset flow if a signal is pending
2025-05-16 21:35 [PATCH v3 0/6] KVM: Dirty ring fixes and cleanups Sean Christopherson
2025-05-16 21:35 ` [PATCH v3 1/6] KVM: Bound the number of dirty ring entries in a single reset at INT_MAX Sean Christopherson
@ 2025-05-16 21:35 ` Sean Christopherson
2025-05-20 6:53 ` Binbin Wu
2025-05-16 21:35 ` [PATCH v3 3/6] KVM: Conditionally reschedule when resetting the dirty ring Sean Christopherson
` (6 subsequent siblings)
8 siblings, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2025-05-16 21:35 UTC (permalink / raw)
To: Paolo Bonzini
Cc: kvm, linux-kernel, Peter Xu, Yan Zhao, Maxim Levitsky, Binbin Wu,
James Houghton, Sean Christopherson, Pankaj Gupta
Abort a dirty ring reset if the current task has a pending signal, as the
hard limit of INT_MAX entries doesn't ensure KVM will respond to a signal
in a timely fashion.
Fixes: fb04a1eddb1a ("KVM: X86: Implement ring-based dirty memory tracking")
Reviewed-by: James Houghton <jthoughton@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/dirty_ring.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index 77986f34eff8..e844e869e8c7 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -118,6 +118,9 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
cur_slot = cur_offset = mask = 0;
while (likely((*nr_entries_reset) < INT_MAX)) {
+ if (signal_pending(current))
+ return -EINTR;
+
entry = &ring->dirty_gfns[ring->reset_index & (ring->size - 1)];
if (!kvm_dirty_gfn_harvested(entry))
--
2.49.0.1112.g889b7c5bd8-goog
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v3 3/6] KVM: Conditionally reschedule when resetting the dirty ring
2025-05-16 21:35 [PATCH v3 0/6] KVM: Dirty ring fixes and cleanups Sean Christopherson
2025-05-16 21:35 ` [PATCH v3 1/6] KVM: Bound the number of dirty ring entries in a single reset at INT_MAX Sean Christopherson
2025-05-16 21:35 ` [PATCH v3 2/6] KVM: Bail from the dirty ring reset flow if a signal is pending Sean Christopherson
@ 2025-05-16 21:35 ` Sean Christopherson
2025-05-16 21:35 ` [PATCH v3 4/6] KVM: Check for empty mask of harvested dirty ring entries in caller Sean Christopherson
` (5 subsequent siblings)
8 siblings, 0 replies; 25+ messages in thread
From: Sean Christopherson @ 2025-05-16 21:35 UTC (permalink / raw)
To: Paolo Bonzini
Cc: kvm, linux-kernel, Peter Xu, Yan Zhao, Maxim Levitsky, Binbin Wu,
James Houghton, Sean Christopherson, Pankaj Gupta
When resetting a dirty ring, conditionally reschedule on each iteration
after the first. The recently introduced hard limit mitigates the issue
of an endless reset, but isn't sufficient to completely prevent RCU
stalls, soft lockups, etc., nor is the hard limit intended to guard
against such badness.
Note! Take care to check for reschedule even in the "continue" paths,
as a pathological scenario (or malicious userspace) could dirty the same
gfn over and over, i.e. always hit the continue path.
rcu: INFO: rcu_sched self-detected stall on CPU
rcu: 4-....: (5249 ticks this GP) idle=51e4/1/0x4000000000000000 softirq=309/309 fqs=2563
rcu: (t=5250 jiffies g=-319 q=608 ncpus=24)
CPU: 4 UID: 1000 PID: 1067 Comm: dirty_log_test Tainted: G L 6.13.0-rc3-17fa7a24ea1e-HEAD-vm #814
Tainted: [L]=SOFTLOCKUP
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:kvm_arch_mmu_enable_log_dirty_pt_masked+0x26/0x200 [kvm]
Call Trace:
<TASK>
kvm_reset_dirty_gfn.part.0+0xb4/0xe0 [kvm]
kvm_dirty_ring_reset+0x58/0x220 [kvm]
kvm_vm_ioctl+0x10eb/0x15d0 [kvm]
__x64_sys_ioctl+0x8b/0xb0
do_syscall_64+0x5b/0x160
entry_SYSCALL_64_after_hwframe+0x4b/0x53
</TASK>
Tainted: [L]=SOFTLOCKUP
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:kvm_arch_mmu_enable_log_dirty_pt_masked+0x17/0x200 [kvm]
Call Trace:
<TASK>
kvm_reset_dirty_gfn.part.0+0xb4/0xe0 [kvm]
kvm_dirty_ring_reset+0x58/0x220 [kvm]
kvm_vm_ioctl+0x10eb/0x15d0 [kvm]
__x64_sys_ioctl+0x8b/0xb0
do_syscall_64+0x5b/0x160
entry_SYSCALL_64_after_hwframe+0x4b/0x53
</TASK>
Fixes: fb04a1eddb1a ("KVM: X86: Implement ring-based dirty memory tracking")
Reviewed-by: James Houghton <jthoughton@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/dirty_ring.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index e844e869e8c7..97cca0c02fd1 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -134,6 +134,16 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
ring->reset_index++;
(*nr_entries_reset)++;
+
+ /*
+ * While the size of each ring is fixed, it's possible for the
+ * ring to be constantly re-dirtied/harvested while the reset
+ * is in-progress (the hard limit exists only to guard against
+ * wrapping the count into negative space).
+ */
+ if (!first_round)
+ cond_resched();
+
/*
* Try to coalesce the reset operations when the guest is
* scanning pages in the same slot.
--
2.49.0.1112.g889b7c5bd8-goog
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v3 4/6] KVM: Check for empty mask of harvested dirty ring entries in caller
2025-05-16 21:35 [PATCH v3 0/6] KVM: Dirty ring fixes and cleanups Sean Christopherson
` (2 preceding siblings ...)
2025-05-16 21:35 ` [PATCH v3 3/6] KVM: Conditionally reschedule when resetting the dirty ring Sean Christopherson
@ 2025-05-16 21:35 ` Sean Christopherson
2025-05-20 6:56 ` Binbin Wu
2025-05-21 9:16 ` Yan Zhao
2025-05-16 21:35 ` [PATCH v3 5/6] KVM: Use mask of harvested dirty ring entries to coalesce dirty ring resets Sean Christopherson
` (4 subsequent siblings)
8 siblings, 2 replies; 25+ messages in thread
From: Sean Christopherson @ 2025-05-16 21:35 UTC (permalink / raw)
To: Paolo Bonzini
Cc: kvm, linux-kernel, Peter Xu, Yan Zhao, Maxim Levitsky, Binbin Wu,
James Houghton, Sean Christopherson, Pankaj Gupta
When resetting a dirty ring, explicitly check that there is work to be
done before calling kvm_reset_dirty_gfn(), e.g. if no harvested entries
are found and/or on the loop's first iteration, and delete the extremely
misleading comment "This is only needed to make compilers happy". KVM
absolutely relies on mask to be zero-initialized, i.e. the comment is an
outright lie. Furthermore, the compiler is right to complain that KVM is
calling a function with uninitialized data, as there are no guarantees
the implementation details of kvm_reset_dirty_gfn() will be visible to
kvm_dirty_ring_reset().
While the flaw could be fixed by simply deleting (or rewording) the
comment, and duplicating the check is unfortunate, checking mask in the
caller will allow for additional cleanups.
Opportunistically drop the zero-initialization of cur_slot and cur_offset.
If a bug were introduced where either the slot or offset was consumed
before mask is set to a non-zero value, then it is highly desirable for
the compiler (or some other sanitizer) to yell.
Cc: Peter Xu <peterx@redhat.com>
Cc: Yan Zhao <yan.y.zhao@intel.com>
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Reviewed-by: James Houghton <jthoughton@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/dirty_ring.c | 44 ++++++++++++++++++++++++++++++++++---------
1 file changed, 35 insertions(+), 9 deletions(-)
diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index 97cca0c02fd1..84c75483a089 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -55,9 +55,6 @@ static void kvm_reset_dirty_gfn(struct kvm *kvm, u32 slot, u64 offset, u64 mask)
struct kvm_memory_slot *memslot;
int as_id, id;
- if (!mask)
- return;
-
as_id = slot >> 16;
id = (u16)slot;
@@ -108,15 +105,24 @@ static inline bool kvm_dirty_gfn_harvested(struct kvm_dirty_gfn *gfn)
int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
int *nr_entries_reset)
{
+ /*
+ * To minimize mmu_lock contention, batch resets for harvested entries
+ * whose gfns are in the same slot, and are within N frame numbers of
+ * each other, where N is the number of bits in an unsigned long. For
+ * simplicity, process the current set of entries when the next entry
+ * can't be included in the batch.
+ *
+ * Track the current batch slot, the gfn offset into the slot for the
+ * batch, and the bitmask of gfns that need to be reset (relative to
+ * offset). Note, the offset may be adjusted backwards, e.g. so that
+ * a sequence of gfns X, X-1, ... X-N can be batched.
+ */
u32 cur_slot, next_slot;
u64 cur_offset, next_offset;
- unsigned long mask;
+ unsigned long mask = 0;
struct kvm_dirty_gfn *entry;
bool first_round = true;
- /* This is only needed to make compilers happy */
- cur_slot = cur_offset = mask = 0;
-
while (likely((*nr_entries_reset) < INT_MAX)) {
if (signal_pending(current))
return -EINTR;
@@ -164,14 +170,34 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
continue;
}
}
- kvm_reset_dirty_gfn(kvm, cur_slot, cur_offset, mask);
+
+ /*
+ * Reset the slot for all the harvested entries that have been
+ * gathered, but not yet fully processed.
+ */
+ if (mask)
+ kvm_reset_dirty_gfn(kvm, cur_slot, cur_offset, mask);
+
+ /*
+ * The current slot was reset or this is the first harvested
+ * entry, (re)initialize the metadata.
+ */
cur_slot = next_slot;
cur_offset = next_offset;
mask = 1;
first_round = false;
}
- kvm_reset_dirty_gfn(kvm, cur_slot, cur_offset, mask);
+ /*
+ * Perform a final reset if there are harvested entries that haven't
+ * been processed, which is guaranteed if at least one harvested was
+ * found. The loop only performs a reset when the "next" entry can't
+ * be batched with the "current" entry(s), and that reset processes the
+ * _current_ entry(s); i.e. the last harvested entry, a.k.a. next, will
+ * always be left pending.
+ */
+ if (mask)
+ kvm_reset_dirty_gfn(kvm, cur_slot, cur_offset, mask);
/*
* The request KVM_REQ_DIRTY_RING_SOFT_FULL will be cleared
--
2.49.0.1112.g889b7c5bd8-goog
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v3 5/6] KVM: Use mask of harvested dirty ring entries to coalesce dirty ring resets
2025-05-16 21:35 [PATCH v3 0/6] KVM: Dirty ring fixes and cleanups Sean Christopherson
` (3 preceding siblings ...)
2025-05-16 21:35 ` [PATCH v3 4/6] KVM: Check for empty mask of harvested dirty ring entries in caller Sean Christopherson
@ 2025-05-16 21:35 ` Sean Christopherson
2025-05-20 6:58 ` Binbin Wu
2025-05-21 9:16 ` Yan Zhao
2025-05-16 21:35 ` [PATCH v3 6/6] KVM: Assert that slots_lock is held when resetting per-vCPU dirty rings Sean Christopherson
` (3 subsequent siblings)
8 siblings, 2 replies; 25+ messages in thread
From: Sean Christopherson @ 2025-05-16 21:35 UTC (permalink / raw)
To: Paolo Bonzini
Cc: kvm, linux-kernel, Peter Xu, Yan Zhao, Maxim Levitsky, Binbin Wu,
James Houghton, Sean Christopherson, Pankaj Gupta
Use "mask" instead of a dedicated boolean to track whether or not there
is at least one to-be-reset entry for the current slot+offset. In the
body of the loop, mask is zero only on the first iteration, i.e. !mask is
equivalent to first_round.
Opportunistically combine the adjacent "if (mask)" statements into a single
if-statement.
No functional change intended.
Cc: Peter Xu <peterx@redhat.com>
Cc: Yan Zhao <yan.y.zhao@intel.com>
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Reviewed-by: James Houghton <jthoughton@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/dirty_ring.c | 60 +++++++++++++++++++++----------------------
1 file changed, 29 insertions(+), 31 deletions(-)
diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index 84c75483a089..54734025658a 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -121,7 +121,6 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
u64 cur_offset, next_offset;
unsigned long mask = 0;
struct kvm_dirty_gfn *entry;
- bool first_round = true;
while (likely((*nr_entries_reset) < INT_MAX)) {
if (signal_pending(current))
@@ -141,42 +140,42 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
ring->reset_index++;
(*nr_entries_reset)++;
- /*
- * While the size of each ring is fixed, it's possible for the
- * ring to be constantly re-dirtied/harvested while the reset
- * is in-progress (the hard limit exists only to guard against
- * wrapping the count into negative space).
- */
- if (!first_round)
+ if (mask) {
+ /*
+ * While the size of each ring is fixed, it's possible
+ * for the ring to be constantly re-dirtied/harvested
+ * while the reset is in-progress (the hard limit exists
+ * only to guard against the count becoming negative).
+ */
cond_resched();
- /*
- * Try to coalesce the reset operations when the guest is
- * scanning pages in the same slot.
- */
- if (!first_round && next_slot == cur_slot) {
- s64 delta = next_offset - cur_offset;
+ /*
+ * Try to coalesce the reset operations when the guest
+ * is scanning pages in the same slot.
+ */
+ if (next_slot == cur_slot) {
+ s64 delta = next_offset - cur_offset;
- if (delta >= 0 && delta < BITS_PER_LONG) {
- mask |= 1ull << delta;
- continue;
- }
+ if (delta >= 0 && delta < BITS_PER_LONG) {
+ mask |= 1ull << delta;
+ continue;
+ }
- /* Backwards visit, careful about overflows! */
- if (delta > -BITS_PER_LONG && delta < 0 &&
- (mask << -delta >> -delta) == mask) {
- cur_offset = next_offset;
- mask = (mask << -delta) | 1;
- continue;
+ /* Backwards visit, careful about overflows! */
+ if (delta > -BITS_PER_LONG && delta < 0 &&
+ (mask << -delta >> -delta) == mask) {
+ cur_offset = next_offset;
+ mask = (mask << -delta) | 1;
+ continue;
+ }
}
- }
- /*
- * Reset the slot for all the harvested entries that have been
- * gathered, but not yet fully processed.
- */
- if (mask)
+ /*
+ * Reset the slot for all the harvested entries that
+ * have been gathered, but not yet fully processed.
+ */
kvm_reset_dirty_gfn(kvm, cur_slot, cur_offset, mask);
+ }
/*
* The current slot was reset or this is the first harvested
@@ -185,7 +184,6 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
cur_slot = next_slot;
cur_offset = next_offset;
mask = 1;
- first_round = false;
}
/*
--
2.49.0.1112.g889b7c5bd8-goog
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v3 6/6] KVM: Assert that slots_lock is held when resetting per-vCPU dirty rings
2025-05-16 21:35 [PATCH v3 0/6] KVM: Dirty ring fixes and cleanups Sean Christopherson
` (4 preceding siblings ...)
2025-05-16 21:35 ` [PATCH v3 5/6] KVM: Use mask of harvested dirty ring entries to coalesce dirty ring resets Sean Christopherson
@ 2025-05-16 21:35 ` Sean Christopherson
2025-05-20 7:04 ` Binbin Wu
2025-05-20 19:12 ` [PATCH v3 0/6] KVM: Dirty ring fixes and cleanups Peter Xu
` (2 subsequent siblings)
8 siblings, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2025-05-16 21:35 UTC (permalink / raw)
To: Paolo Bonzini
Cc: kvm, linux-kernel, Peter Xu, Yan Zhao, Maxim Levitsky, Binbin Wu,
James Houghton, Sean Christopherson, Pankaj Gupta
Assert that slots_lock is held in kvm_dirty_ring_reset() and add a comment
to explain _why_ slots needs to be held for the duration of the reset.
Link: https://lore.kernel.org/all/aCSns6Q5oTkdXUEe@google.com
Suggested-by: James Houghton <jthoughton@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/dirty_ring.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index 54734025658a..1ba02a06378c 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -122,6 +122,14 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
unsigned long mask = 0;
struct kvm_dirty_gfn *entry;
+ /*
+ * Ensure concurrent calls to KVM_RESET_DIRTY_RINGS are serialized,
+ * e.g. so that KVM fully resets all entries processed by a given call
+ * before returning to userspace. Holding slots_lock also protects
+ * the various memslot accesses.
+ */
+ lockdep_assert_held(&kvm->slots_lock);
+
while (likely((*nr_entries_reset) < INT_MAX)) {
if (signal_pending(current))
return -EINTR;
--
2.49.0.1112.g889b7c5bd8-goog
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH v3 1/6] KVM: Bound the number of dirty ring entries in a single reset at INT_MAX
2025-05-16 21:35 ` [PATCH v3 1/6] KVM: Bound the number of dirty ring entries in a single reset at INT_MAX Sean Christopherson
@ 2025-05-20 6:51 ` Binbin Wu
0 siblings, 0 replies; 25+ messages in thread
From: Binbin Wu @ 2025-05-20 6:51 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Peter Xu, Yan Zhao,
Maxim Levitsky, James Houghton, Pankaj Gupta
On 5/17/2025 5:35 AM, Sean Christopherson wrote:
> Cap the number of ring entries that are reset in a single ioctl to INT_MAX
> to ensure userspace isn't confused by a wrap into negative space, and so
> that, in a truly pathological scenario, KVM doesn't miss a TLB flush due
> to the count wrapping to zero. While the size of the ring is fixed at
> 0x10000 entries and KVM (currently) supports at most 4096, userspace is
> allowed to harvest entries from the ring while the reset is in-progress,
> i.e. it's possible for the ring to always have harvested entries.
>
> Opportunistically return an actual error code from the helper so that a
> future fix to handle pending signals can gracefully return -EINTR. Drop
> the function comment now that the return code is a stanard 0/-errno (and
stanard -> standard
The rest looks good to me.
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
> because a future commit will add a proper lockdep assertion).
>
> Opportunistically drop a similarly stale comment for kvm_dirty_ring_push().
>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Yan Zhao <yan.y.zhao@intel.com>
> Cc: Maxim Levitsky <mlevitsk@redhat.com>
> Cc: Binbin Wu <binbin.wu@linux.intel.com>
> Fixes: fb04a1eddb1a ("KVM: X86: Implement ring-based dirty memory tracking")
> Reviewed-by: James Houghton <jthoughton@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> include/linux/kvm_dirty_ring.h | 18 +++++-------------
> virt/kvm/dirty_ring.c | 10 +++++-----
> virt/kvm/kvm_main.c | 9 ++++++---
> 3 files changed, 16 insertions(+), 21 deletions(-)
>
> diff --git a/include/linux/kvm_dirty_ring.h b/include/linux/kvm_dirty_ring.h
> index da4d9b5f58f1..eb10d87adf7d 100644
> --- a/include/linux/kvm_dirty_ring.h
> +++ b/include/linux/kvm_dirty_ring.h
> @@ -49,9 +49,10 @@ static inline int kvm_dirty_ring_alloc(struct kvm *kvm, struct kvm_dirty_ring *r
> }
>
> static inline int kvm_dirty_ring_reset(struct kvm *kvm,
> - struct kvm_dirty_ring *ring)
> + struct kvm_dirty_ring *ring,
> + int *nr_entries_reset)
> {
> - return 0;
> + return -ENOENT;
> }
>
> static inline void kvm_dirty_ring_push(struct kvm_vcpu *vcpu,
> @@ -77,17 +78,8 @@ bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm);
> u32 kvm_dirty_ring_get_rsvd_entries(struct kvm *kvm);
> int kvm_dirty_ring_alloc(struct kvm *kvm, struct kvm_dirty_ring *ring,
> int index, u32 size);
> -
> -/*
> - * called with kvm->slots_lock held, returns the number of
> - * processed pages.
> - */
> -int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring);
> -
> -/*
> - * returns =0: successfully pushed
> - * <0: unable to push, need to wait
> - */
> +int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
> + int *nr_entries_reset);
> void kvm_dirty_ring_push(struct kvm_vcpu *vcpu, u32 slot, u64 offset);
>
> bool kvm_dirty_ring_check_request(struct kvm_vcpu *vcpu);
> diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
> index d14ffc7513ee..77986f34eff8 100644
> --- a/virt/kvm/dirty_ring.c
> +++ b/virt/kvm/dirty_ring.c
> @@ -105,19 +105,19 @@ static inline bool kvm_dirty_gfn_harvested(struct kvm_dirty_gfn *gfn)
> return smp_load_acquire(&gfn->flags) & KVM_DIRTY_GFN_F_RESET;
> }
>
> -int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring)
> +int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
> + int *nr_entries_reset)
> {
> u32 cur_slot, next_slot;
> u64 cur_offset, next_offset;
> unsigned long mask;
> - int count = 0;
> struct kvm_dirty_gfn *entry;
> bool first_round = true;
>
> /* This is only needed to make compilers happy */
> cur_slot = cur_offset = mask = 0;
>
> - while (true) {
> + while (likely((*nr_entries_reset) < INT_MAX)) {
> entry = &ring->dirty_gfns[ring->reset_index & (ring->size - 1)];
>
> if (!kvm_dirty_gfn_harvested(entry))
> @@ -130,7 +130,7 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring)
> kvm_dirty_gfn_set_invalid(entry);
>
> ring->reset_index++;
> - count++;
> + (*nr_entries_reset)++;
> /*
> * Try to coalesce the reset operations when the guest is
> * scanning pages in the same slot.
> @@ -167,7 +167,7 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring)
>
> trace_kvm_dirty_ring_reset(ring);
>
> - return count;
> + return 0;
> }
>
> void kvm_dirty_ring_push(struct kvm_vcpu *vcpu, u32 slot, u64 offset)
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index b24db92e98f3..571688507204 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -4903,15 +4903,18 @@ static int kvm_vm_ioctl_reset_dirty_pages(struct kvm *kvm)
> {
> unsigned long i;
> struct kvm_vcpu *vcpu;
> - int cleared = 0;
> + int cleared = 0, r;
>
> if (!kvm->dirty_ring_size)
> return -EINVAL;
>
> mutex_lock(&kvm->slots_lock);
>
> - kvm_for_each_vcpu(i, vcpu, kvm)
> - cleared += kvm_dirty_ring_reset(vcpu->kvm, &vcpu->dirty_ring);
> + kvm_for_each_vcpu(i, vcpu, kvm) {
> + r = kvm_dirty_ring_reset(vcpu->kvm, &vcpu->dirty_ring, &cleared);
> + if (r)
> + break;
> + }
>
> mutex_unlock(&kvm->slots_lock);
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 2/6] KVM: Bail from the dirty ring reset flow if a signal is pending
2025-05-16 21:35 ` [PATCH v3 2/6] KVM: Bail from the dirty ring reset flow if a signal is pending Sean Christopherson
@ 2025-05-20 6:53 ` Binbin Wu
0 siblings, 0 replies; 25+ messages in thread
From: Binbin Wu @ 2025-05-20 6:53 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Peter Xu, Yan Zhao,
Maxim Levitsky, James Houghton, Pankaj Gupta
On 5/17/2025 5:35 AM, Sean Christopherson wrote:
> Abort a dirty ring reset if the current task has a pending signal, as the
> hard limit of INT_MAX entries doesn't ensure KVM will respond to a signal
> in a timely fashion.
>
> Fixes: fb04a1eddb1a ("KVM: X86: Implement ring-based dirty memory tracking")
> Reviewed-by: James Houghton <jthoughton@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
> ---
> virt/kvm/dirty_ring.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
> index 77986f34eff8..e844e869e8c7 100644
> --- a/virt/kvm/dirty_ring.c
> +++ b/virt/kvm/dirty_ring.c
> @@ -118,6 +118,9 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
> cur_slot = cur_offset = mask = 0;
>
> while (likely((*nr_entries_reset) < INT_MAX)) {
> + if (signal_pending(current))
> + return -EINTR;
> +
> entry = &ring->dirty_gfns[ring->reset_index & (ring->size - 1)];
>
> if (!kvm_dirty_gfn_harvested(entry))
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 4/6] KVM: Check for empty mask of harvested dirty ring entries in caller
2025-05-16 21:35 ` [PATCH v3 4/6] KVM: Check for empty mask of harvested dirty ring entries in caller Sean Christopherson
@ 2025-05-20 6:56 ` Binbin Wu
2025-05-21 9:16 ` Yan Zhao
1 sibling, 0 replies; 25+ messages in thread
From: Binbin Wu @ 2025-05-20 6:56 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Peter Xu, Yan Zhao,
Maxim Levitsky, James Houghton, Pankaj Gupta
On 5/17/2025 5:35 AM, Sean Christopherson wrote:
> When resetting a dirty ring, explicitly check that there is work to be
> done before calling kvm_reset_dirty_gfn(), e.g. if no harvested entries
> are found and/or on the loop's first iteration, and delete the extremely
> misleading comment "This is only needed to make compilers happy". KVM
> absolutely relies on mask to be zero-initialized, i.e. the comment is an
> outright lie. Furthermore, the compiler is right to complain that KVM is
> calling a function with uninitialized data, as there are no guarantees
> the implementation details of kvm_reset_dirty_gfn() will be visible to
> kvm_dirty_ring_reset().
>
> While the flaw could be fixed by simply deleting (or rewording) the
> comment, and duplicating the check is unfortunate, checking mask in the
> caller will allow for additional cleanups.
>
> Opportunistically drop the zero-initialization of cur_slot and cur_offset.
> If a bug were introduced where either the slot or offset was consumed
> before mask is set to a non-zero value, then it is highly desirable for
> the compiler (or some other sanitizer) to yell.
>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Yan Zhao <yan.y.zhao@intel.com>
> Cc: Maxim Levitsky <mlevitsk@redhat.com>
> Cc: Binbin Wu <binbin.wu@linux.intel.com>
> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
> Reviewed-by: James Houghton <jthoughton@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
> ---
> virt/kvm/dirty_ring.c | 44 ++++++++++++++++++++++++++++++++++---------
> 1 file changed, 35 insertions(+), 9 deletions(-)
>
> diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
> index 97cca0c02fd1..84c75483a089 100644
> --- a/virt/kvm/dirty_ring.c
> +++ b/virt/kvm/dirty_ring.c
> @@ -55,9 +55,6 @@ static void kvm_reset_dirty_gfn(struct kvm *kvm, u32 slot, u64 offset, u64 mask)
> struct kvm_memory_slot *memslot;
> int as_id, id;
>
> - if (!mask)
> - return;
> -
> as_id = slot >> 16;
> id = (u16)slot;
>
> @@ -108,15 +105,24 @@ static inline bool kvm_dirty_gfn_harvested(struct kvm_dirty_gfn *gfn)
> int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
> int *nr_entries_reset)
> {
> + /*
> + * To minimize mmu_lock contention, batch resets for harvested entries
> + * whose gfns are in the same slot, and are within N frame numbers of
> + * each other, where N is the number of bits in an unsigned long. For
> + * simplicity, process the current set of entries when the next entry
> + * can't be included in the batch.
> + *
> + * Track the current batch slot, the gfn offset into the slot for the
> + * batch, and the bitmask of gfns that need to be reset (relative to
> + * offset). Note, the offset may be adjusted backwards, e.g. so that
> + * a sequence of gfns X, X-1, ... X-N can be batched.
> + */
> u32 cur_slot, next_slot;
> u64 cur_offset, next_offset;
> - unsigned long mask;
> + unsigned long mask = 0;
> struct kvm_dirty_gfn *entry;
> bool first_round = true;
>
> - /* This is only needed to make compilers happy */
> - cur_slot = cur_offset = mask = 0;
> -
> while (likely((*nr_entries_reset) < INT_MAX)) {
> if (signal_pending(current))
> return -EINTR;
> @@ -164,14 +170,34 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
> continue;
> }
> }
> - kvm_reset_dirty_gfn(kvm, cur_slot, cur_offset, mask);
> +
> + /*
> + * Reset the slot for all the harvested entries that have been
> + * gathered, but not yet fully processed.
> + */
> + if (mask)
> + kvm_reset_dirty_gfn(kvm, cur_slot, cur_offset, mask);
> +
> + /*
> + * The current slot was reset or this is the first harvested
> + * entry, (re)initialize the metadata.
> + */
> cur_slot = next_slot;
> cur_offset = next_offset;
> mask = 1;
> first_round = false;
> }
>
> - kvm_reset_dirty_gfn(kvm, cur_slot, cur_offset, mask);
> + /*
> + * Perform a final reset if there are harvested entries that haven't
> + * been processed, which is guaranteed if at least one harvested was
> + * found. The loop only performs a reset when the "next" entry can't
> + * be batched with the "current" entry(s), and that reset processes the
> + * _current_ entry(s); i.e. the last harvested entry, a.k.a. next, will
> + * always be left pending.
> + */
> + if (mask)
> + kvm_reset_dirty_gfn(kvm, cur_slot, cur_offset, mask);
>
> /*
> * The request KVM_REQ_DIRTY_RING_SOFT_FULL will be cleared
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 5/6] KVM: Use mask of harvested dirty ring entries to coalesce dirty ring resets
2025-05-16 21:35 ` [PATCH v3 5/6] KVM: Use mask of harvested dirty ring entries to coalesce dirty ring resets Sean Christopherson
@ 2025-05-20 6:58 ` Binbin Wu
2025-05-21 9:16 ` Yan Zhao
1 sibling, 0 replies; 25+ messages in thread
From: Binbin Wu @ 2025-05-20 6:58 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Peter Xu, Yan Zhao,
Maxim Levitsky, James Houghton, Pankaj Gupta
On 5/17/2025 5:35 AM, Sean Christopherson wrote:
> Use "mask" instead of a dedicated boolean to track whether or not there
> is at least one to-be-reset entry for the current slot+offset. In the
> body of the loop, mask is zero only on the first iteration, i.e. !mask is
> equivalent to first_round.
>
> Opportunistically combine the adjacent "if (mask)" statements into a single
> if-statement.
>
> No functional change intended.
>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Yan Zhao <yan.y.zhao@intel.com>
> Cc: Maxim Levitsky <mlevitsk@redhat.com>
> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
> Reviewed-by: James Houghton <jthoughton@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
> ---
> virt/kvm/dirty_ring.c | 60 +++++++++++++++++++++----------------------
> 1 file changed, 29 insertions(+), 31 deletions(-)
>
> diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
> index 84c75483a089..54734025658a 100644
> --- a/virt/kvm/dirty_ring.c
> +++ b/virt/kvm/dirty_ring.c
> @@ -121,7 +121,6 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
> u64 cur_offset, next_offset;
> unsigned long mask = 0;
> struct kvm_dirty_gfn *entry;
> - bool first_round = true;
>
> while (likely((*nr_entries_reset) < INT_MAX)) {
> if (signal_pending(current))
> @@ -141,42 +140,42 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
> ring->reset_index++;
> (*nr_entries_reset)++;
>
> - /*
> - * While the size of each ring is fixed, it's possible for the
> - * ring to be constantly re-dirtied/harvested while the reset
> - * is in-progress (the hard limit exists only to guard against
> - * wrapping the count into negative space).
> - */
> - if (!first_round)
> + if (mask) {
> + /*
> + * While the size of each ring is fixed, it's possible
> + * for the ring to be constantly re-dirtied/harvested
> + * while the reset is in-progress (the hard limit exists
> + * only to guard against the count becoming negative).
> + */
> cond_resched();
>
> - /*
> - * Try to coalesce the reset operations when the guest is
> - * scanning pages in the same slot.
> - */
> - if (!first_round && next_slot == cur_slot) {
> - s64 delta = next_offset - cur_offset;
> + /*
> + * Try to coalesce the reset operations when the guest
> + * is scanning pages in the same slot.
> + */
> + if (next_slot == cur_slot) {
> + s64 delta = next_offset - cur_offset;
>
> - if (delta >= 0 && delta < BITS_PER_LONG) {
> - mask |= 1ull << delta;
> - continue;
> - }
> + if (delta >= 0 && delta < BITS_PER_LONG) {
> + mask |= 1ull << delta;
> + continue;
> + }
>
> - /* Backwards visit, careful about overflows! */
> - if (delta > -BITS_PER_LONG && delta < 0 &&
> - (mask << -delta >> -delta) == mask) {
> - cur_offset = next_offset;
> - mask = (mask << -delta) | 1;
> - continue;
> + /* Backwards visit, careful about overflows! */
> + if (delta > -BITS_PER_LONG && delta < 0 &&
> + (mask << -delta >> -delta) == mask) {
> + cur_offset = next_offset;
> + mask = (mask << -delta) | 1;
> + continue;
> + }
> }
> - }
>
> - /*
> - * Reset the slot for all the harvested entries that have been
> - * gathered, but not yet fully processed.
> - */
> - if (mask)
> + /*
> + * Reset the slot for all the harvested entries that
> + * have been gathered, but not yet fully processed.
> + */
> kvm_reset_dirty_gfn(kvm, cur_slot, cur_offset, mask);
> + }
>
> /*
> * The current slot was reset or this is the first harvested
> @@ -185,7 +184,6 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
> cur_slot = next_slot;
> cur_offset = next_offset;
> mask = 1;
> - first_round = false;
> }
>
> /*
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 6/6] KVM: Assert that slots_lock is held when resetting per-vCPU dirty rings
2025-05-16 21:35 ` [PATCH v3 6/6] KVM: Assert that slots_lock is held when resetting per-vCPU dirty rings Sean Christopherson
@ 2025-05-20 7:04 ` Binbin Wu
0 siblings, 0 replies; 25+ messages in thread
From: Binbin Wu @ 2025-05-20 7:04 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Peter Xu, Yan Zhao,
Maxim Levitsky, James Houghton, Pankaj Gupta
On 5/17/2025 5:35 AM, Sean Christopherson wrote:
> Assert that slots_lock is held in kvm_dirty_ring_reset() and add a comment
> to explain _why_ slots needs to be held for the duration of the reset.
>
> Link: https://lore.kernel.org/all/aCSns6Q5oTkdXUEe@google.com
> Suggested-by: James Houghton <jthoughton@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> virt/kvm/dirty_ring.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
> index 54734025658a..1ba02a06378c 100644
> --- a/virt/kvm/dirty_ring.c
> +++ b/virt/kvm/dirty_ring.c
> @@ -122,6 +122,14 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
> unsigned long mask = 0;
> struct kvm_dirty_gfn *entry;
>
> + /*
> + * Ensure concurrent calls to KVM_RESET_DIRTY_RINGS are serialized,
> + * e.g. so that KVM fully resets all entries processed by a given call
It seems that "e.g." is not needed?
> + * before returning to userspace. Holding slots_lock also protects
> + * the various memslot accesses.
> + */
> + lockdep_assert_held(&kvm->slots_lock);
> +
> while (likely((*nr_entries_reset) < INT_MAX)) {
> if (signal_pending(current))
> return -EINTR;
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 0/6] KVM: Dirty ring fixes and cleanups
2025-05-16 21:35 [PATCH v3 0/6] KVM: Dirty ring fixes and cleanups Sean Christopherson
` (5 preceding siblings ...)
2025-05-16 21:35 ` [PATCH v3 6/6] KVM: Assert that slots_lock is held when resetting per-vCPU dirty rings Sean Christopherson
@ 2025-05-20 19:12 ` Peter Xu
2025-05-20 23:16 ` Sean Christopherson
2025-05-21 9:21 ` Yan Zhao
2025-06-24 19:36 ` Sean Christopherson
8 siblings, 1 reply; 25+ messages in thread
From: Peter Xu @ 2025-05-20 19:12 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Yan Zhao, Maxim Levitsky,
Binbin Wu, James Houghton, Pankaj Gupta
On Fri, May 16, 2025 at 02:35:34PM -0700, Sean Christopherson wrote:
> Sean Christopherson (6):
> KVM: Bound the number of dirty ring entries in a single reset at
> INT_MAX
> KVM: Bail from the dirty ring reset flow if a signal is pending
> KVM: Conditionally reschedule when resetting the dirty ring
> KVM: Check for empty mask of harvested dirty ring entries in caller
> KVM: Use mask of harvested dirty ring entries to coalesce dirty ring
> resets
> KVM: Assert that slots_lock is held when resetting per-vCPU dirty
> rings
For the last one, I'd think it's majorly because of the memslot accesses
(or CONFIG_LOCKDEP=y should yell already on resets?). The "serialization
of concurrent RESETs" part could be a good side effect. After all, the
dirty rings rely a lot on the userspace to do right things.. for example,
the userspace better also remember to reset before any slot changes, or
it's possible to collect a dirty pfn with a slot index that was already
removed and reused with a new one..
Maybe we could switch the sentences there in the comment of last patch, but
not a huge deal.
Reviewed-by: Peter Xu <peterx@redhat.com>
Thanks!
--
Peter Xu
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 0/6] KVM: Dirty ring fixes and cleanups
2025-05-20 19:12 ` [PATCH v3 0/6] KVM: Dirty ring fixes and cleanups Peter Xu
@ 2025-05-20 23:16 ` Sean Christopherson
2025-05-20 23:51 ` Peter Xu
0 siblings, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2025-05-20 23:16 UTC (permalink / raw)
To: Peter Xu
Cc: Paolo Bonzini, kvm, linux-kernel, Yan Zhao, Maxim Levitsky,
Binbin Wu, James Houghton, Pankaj Gupta
On Tue, May 20, 2025, Peter Xu wrote:
> On Fri, May 16, 2025 at 02:35:34PM -0700, Sean Christopherson wrote:
> > Sean Christopherson (6):
> > KVM: Bound the number of dirty ring entries in a single reset at
> > INT_MAX
> > KVM: Bail from the dirty ring reset flow if a signal is pending
> > KVM: Conditionally reschedule when resetting the dirty ring
> > KVM: Check for empty mask of harvested dirty ring entries in caller
> > KVM: Use mask of harvested dirty ring entries to coalesce dirty ring
> > resets
> > KVM: Assert that slots_lock is held when resetting per-vCPU dirty
> > rings
>
> For the last one, I'd think it's majorly because of the memslot accesses
> (or CONFIG_LOCKDEP=y should yell already on resets?).
No? If KVM only needed to ensure stable memslot accesses, then SRCU would suffice.
It sounds like holding slots_lock may have been a somewhat unintentional, but the
reason KVM can't switch to SRCU is that doing so would break ordering, not because
slots_lock is needed to protect the memslot accesses.
> The "serialization of concurrent RESETs" part could be a good side effect.
> After all, the dirty rings rely a lot on the userspace to do right things..
> for example, the userspace better also remember to reset before any slot
> changes, or it's possible to collect a dirty pfn with a slot index that was
> already removed and reused with a new one..
>
> Maybe we could switch the sentences there in the comment of last patch, but
> not a huge deal.
>
> Reviewed-by: Peter Xu <peterx@redhat.com>
>
> Thanks!
>
> --
> Peter Xu
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 0/6] KVM: Dirty ring fixes and cleanups
2025-05-20 23:16 ` Sean Christopherson
@ 2025-05-20 23:51 ` Peter Xu
2025-05-21 14:50 ` Sean Christopherson
0 siblings, 1 reply; 25+ messages in thread
From: Peter Xu @ 2025-05-20 23:51 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Yan Zhao, Maxim Levitsky,
Binbin Wu, James Houghton, Pankaj Gupta
On Tue, May 20, 2025 at 04:16:00PM -0700, Sean Christopherson wrote:
> On Tue, May 20, 2025, Peter Xu wrote:
> > On Fri, May 16, 2025 at 02:35:34PM -0700, Sean Christopherson wrote:
> > > Sean Christopherson (6):
> > > KVM: Bound the number of dirty ring entries in a single reset at
> > > INT_MAX
> > > KVM: Bail from the dirty ring reset flow if a signal is pending
> > > KVM: Conditionally reschedule when resetting the dirty ring
> > > KVM: Check for empty mask of harvested dirty ring entries in caller
> > > KVM: Use mask of harvested dirty ring entries to coalesce dirty ring
> > > resets
> > > KVM: Assert that slots_lock is held when resetting per-vCPU dirty
> > > rings
> >
> > For the last one, I'd think it's majorly because of the memslot accesses
> > (or CONFIG_LOCKDEP=y should yell already on resets?).
>
> No? If KVM only needed to ensure stable memslot accesses, then SRCU would suffice.
> It sounds like holding slots_lock may have been a somewhat unintentional, but the
> reason KVM can't switch to SRCU is that doing so would break ordering, not because
> slots_lock is needed to protect the memslot accesses.
Hmm.. isn't what you said exactly means a "yes"? :)
I mean, I would still expect lockdep to report this ioctl if without the
slots_lock, please correct me if it's not the case. And if using RCU is
not trivial (or not necessary either), so far the slots_lock is still
required to make sure the memslot accesses are legal?
--
Peter Xu
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 5/6] KVM: Use mask of harvested dirty ring entries to coalesce dirty ring resets
2025-05-16 21:35 ` [PATCH v3 5/6] KVM: Use mask of harvested dirty ring entries to coalesce dirty ring resets Sean Christopherson
2025-05-20 6:58 ` Binbin Wu
@ 2025-05-21 9:16 ` Yan Zhao
2025-05-21 14:54 ` Sean Christopherson
1 sibling, 1 reply; 25+ messages in thread
From: Yan Zhao @ 2025-05-21 9:16 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Peter Xu, Maxim Levitsky,
Binbin Wu, James Houghton, Pankaj Gupta
On Fri, May 16, 2025 at 02:35:39PM -0700, Sean Christopherson wrote:
> Use "mask" instead of a dedicated boolean to track whether or not there
> is at least one to-be-reset entry for the current slot+offset. In the
> body of the loop, mask is zero only on the first iteration, i.e. !mask is
> equivalent to first_round.
>
> Opportunistically combine the adjacent "if (mask)" statements into a single
> if-statement.
>
> No functional change intended.
>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Yan Zhao <yan.y.zhao@intel.com>
> Cc: Maxim Levitsky <mlevitsk@redhat.com>
> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
> Reviewed-by: James Houghton <jthoughton@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> virt/kvm/dirty_ring.c | 60 +++++++++++++++++++++----------------------
> 1 file changed, 29 insertions(+), 31 deletions(-)
>
> diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
> index 84c75483a089..54734025658a 100644
> --- a/virt/kvm/dirty_ring.c
> +++ b/virt/kvm/dirty_ring.c
> @@ -121,7 +121,6 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
> u64 cur_offset, next_offset;
> unsigned long mask = 0;
> struct kvm_dirty_gfn *entry;
> - bool first_round = true;
>
> while (likely((*nr_entries_reset) < INT_MAX)) {
> if (signal_pending(current))
> @@ -141,42 +140,42 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
> ring->reset_index++;
> (*nr_entries_reset)++;
>
> - /*
> - * While the size of each ring is fixed, it's possible for the
> - * ring to be constantly re-dirtied/harvested while the reset
> - * is in-progress (the hard limit exists only to guard against
> - * wrapping the count into negative space).
> - */
> - if (!first_round)
> + if (mask) {
> + /*
> + * While the size of each ring is fixed, it's possible
> + * for the ring to be constantly re-dirtied/harvested
> + * while the reset is in-progress (the hard limit exists
> + * only to guard against the count becoming negative).
> + */
> cond_resched();
>
> - /*
> - * Try to coalesce the reset operations when the guest is
> - * scanning pages in the same slot.
> - */
> - if (!first_round && next_slot == cur_slot) {
> - s64 delta = next_offset - cur_offset;
> + /*
> + * Try to coalesce the reset operations when the guest
> + * is scanning pages in the same slot.
> + */
> + if (next_slot == cur_slot) {
> + s64 delta = next_offset - cur_offset;
>
> - if (delta >= 0 && delta < BITS_PER_LONG) {
> - mask |= 1ull << delta;
> - continue;
> - }
> + if (delta >= 0 && delta < BITS_PER_LONG) {
> + mask |= 1ull << delta;
> + continue;
> + }
>
> - /* Backwards visit, careful about overflows! */
> - if (delta > -BITS_PER_LONG && delta < 0 &&
> - (mask << -delta >> -delta) == mask) {
> - cur_offset = next_offset;
> - mask = (mask << -delta) | 1;
> - continue;
> + /* Backwards visit, careful about overflows! */
> + if (delta > -BITS_PER_LONG && delta < 0 &&
> + (mask << -delta >> -delta) == mask) {
> + cur_offset = next_offset;
> + mask = (mask << -delta) | 1;
> + continue;
> + }
> }
> - }
>
> - /*
> - * Reset the slot for all the harvested entries that have been
> - * gathered, but not yet fully processed.
> - */
> - if (mask)
> + /*
> + * Reset the slot for all the harvested entries that
> + * have been gathered, but not yet fully processed.
> + */
> kvm_reset_dirty_gfn(kvm, cur_slot, cur_offset, mask);
Nit and feel free to ignore it :)
Would it be better to move the "cond_resched()" to here, i.e., executing it for
at most every 64 entries?
> + }
>
> /*
> * The current slot was reset or this is the first harvested
> @@ -185,7 +184,6 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
> cur_slot = next_slot;
> cur_offset = next_offset;
> mask = 1;
> - first_round = false;
> }
>
> /*
> --
> 2.49.0.1112.g889b7c5bd8-goog
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 4/6] KVM: Check for empty mask of harvested dirty ring entries in caller
2025-05-16 21:35 ` [PATCH v3 4/6] KVM: Check for empty mask of harvested dirty ring entries in caller Sean Christopherson
2025-05-20 6:56 ` Binbin Wu
@ 2025-05-21 9:16 ` Yan Zhao
2025-05-21 14:55 ` Sean Christopherson
1 sibling, 1 reply; 25+ messages in thread
From: Yan Zhao @ 2025-05-21 9:16 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Peter Xu, Maxim Levitsky,
Binbin Wu, James Houghton, Pankaj Gupta
On Fri, May 16, 2025 at 02:35:38PM -0700, Sean Christopherson wrote:
> @@ -108,15 +105,24 @@ static inline bool kvm_dirty_gfn_harvested(struct kvm_dirty_gfn *gfn)
> int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
> int *nr_entries_reset)
> {
> + /*
> + * To minimize mmu_lock contention, batch resets for harvested entries
> + * whose gfns are in the same slot, and are within N frame numbers of
> + * each other, where N is the number of bits in an unsigned long. For
Suppose N is 64,
> + * simplicity, process the current set of entries when the next entry
> + * can't be included in the batch.
> + *
> + * Track the current batch slot, the gfn offset into the slot for the
> + * batch, and the bitmask of gfns that need to be reset (relative to
> + * offset). Note, the offset may be adjusted backwards, e.g. so that
> + * a sequence of gfns X, X-1, ... X-N can be batched.
X-N can't be batched, right?
> + */
> u32 cur_slot, next_slot;
> u64 cur_offset, next_offset;
> - unsigned long mask;
> + unsigned long mask = 0;
> struct kvm_dirty_gfn *entry;
> bool first_round = true;
>
> - /* This is only needed to make compilers happy */
> - cur_slot = cur_offset = mask = 0;
> -
> while (likely((*nr_entries_reset) < INT_MAX)) {
> if (signal_pending(current))
> return -EINTR;
> @@ -164,14 +170,34 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
> continue;
> }
> }
> - kvm_reset_dirty_gfn(kvm, cur_slot, cur_offset, mask);
> +
> + /*
> + * Reset the slot for all the harvested entries that have been
> + * gathered, but not yet fully processed.
> + */
> + if (mask)
> + kvm_reset_dirty_gfn(kvm, cur_slot, cur_offset, mask);
> +
> + /*
> + * The current slot was reset or this is the first harvested
> + * entry, (re)initialize the metadata.
> + */
> cur_slot = next_slot;
> cur_offset = next_offset;
> mask = 1;
> first_round = false;
> }
>
> - kvm_reset_dirty_gfn(kvm, cur_slot, cur_offset, mask);
> + /*
> + * Perform a final reset if there are harvested entries that haven't
> + * been processed, which is guaranteed if at least one harvested was
> + * found. The loop only performs a reset when the "next" entry can't
> + * be batched with the "current" entry(s), and that reset processes the
> + * _current_ entry(s); i.e. the last harvested entry, a.k.a. next, will
> + * always be left pending.
> + */
> + if (mask)
> + kvm_reset_dirty_gfn(kvm, cur_slot, cur_offset, mask);
>
> /*
> * The request KVM_REQ_DIRTY_RING_SOFT_FULL will be cleared
> --
> 2.49.0.1112.g889b7c5bd8-goog
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 0/6] KVM: Dirty ring fixes and cleanups
2025-05-16 21:35 [PATCH v3 0/6] KVM: Dirty ring fixes and cleanups Sean Christopherson
` (6 preceding siblings ...)
2025-05-20 19:12 ` [PATCH v3 0/6] KVM: Dirty ring fixes and cleanups Peter Xu
@ 2025-05-21 9:21 ` Yan Zhao
2025-06-24 19:36 ` Sean Christopherson
8 siblings, 0 replies; 25+ messages in thread
From: Yan Zhao @ 2025-05-21 9:21 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Peter Xu, Maxim Levitsky,
Binbin Wu, James Houghton, Pankaj Gupta
Aside from the nits,
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
hi Sean,
Do I need to rebase and repost [1]?
[1] https://lore.kernel.org/all/20241223070427.29583-1-yan.y.zhao@intel.com
Thanks
Yan
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 0/6] KVM: Dirty ring fixes and cleanups
2025-05-20 23:51 ` Peter Xu
@ 2025-05-21 14:50 ` Sean Christopherson
2025-05-21 15:24 ` Peter Xu
0 siblings, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2025-05-21 14:50 UTC (permalink / raw)
To: Peter Xu
Cc: Paolo Bonzini, kvm, linux-kernel, Yan Zhao, Maxim Levitsky,
Binbin Wu, James Houghton, Pankaj Gupta
On Tue, May 20, 2025, Peter Xu wrote:
> On Tue, May 20, 2025 at 04:16:00PM -0700, Sean Christopherson wrote:
> > On Tue, May 20, 2025, Peter Xu wrote:
> > > On Fri, May 16, 2025 at 02:35:34PM -0700, Sean Christopherson wrote:
> > > > Sean Christopherson (6):
> > > > KVM: Bound the number of dirty ring entries in a single reset at
> > > > INT_MAX
> > > > KVM: Bail from the dirty ring reset flow if a signal is pending
> > > > KVM: Conditionally reschedule when resetting the dirty ring
> > > > KVM: Check for empty mask of harvested dirty ring entries in caller
> > > > KVM: Use mask of harvested dirty ring entries to coalesce dirty ring
> > > > resets
> > > > KVM: Assert that slots_lock is held when resetting per-vCPU dirty
> > > > rings
> > >
> > > For the last one, I'd think it's majorly because of the memslot accesses
> > > (or CONFIG_LOCKDEP=y should yell already on resets?).
> >
> > No? If KVM only needed to ensure stable memslot accesses, then SRCU would suffice.
> > It sounds like holding slots_lock may have been a somewhat unintentional, but the
> > reason KVM can't switch to SRCU is that doing so would break ordering, not because
> > slots_lock is needed to protect the memslot accesses.
>
> Hmm.. isn't what you said exactly means a "yes"? :)
>
> I mean, I would still expect lockdep to report this ioctl if without the
> slots_lock, please correct me if it's not the case.
Yes, one of slots_lock or SRCU needs to be held.
> And if using RCU is not trivial (or not necessary either), so far the
> slots_lock is still required to make sure the memslot accesses are legal?
I don't follow this part. The intent of the comment is to document why slots_lock
is required, which is exceptional because memslot access for readers are protected
by kvm->srcu. The fact that slots_lock also protects memslots is notable only
because it makes acquiring kvm->srcu superfluous. But grabbing kvm->srcu is still
safe/legal/ok:
diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index 1ba02a06378c..6bf4f9e2f291 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -121,18 +121,26 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
u64 cur_offset, next_offset;
unsigned long mask = 0;
struct kvm_dirty_gfn *entry;
+ int idx;
/*
* Ensure concurrent calls to KVM_RESET_DIRTY_RINGS are serialized,
* e.g. so that KVM fully resets all entries processed by a given call
- * before returning to userspace. Holding slots_lock also protects
- * the various memslot accesses.
+ * before returning to userspace.
*/
lockdep_assert_held(&kvm->slots_lock);
+ /*
+ * Holding slots_lock also protects the various memslot accesses, but
+ * acquiring kvm->srcu for read here is still safe, just unnecessary.
+ */
+ idx = srcu_read_lock(&kvm->srcu);
+
while (likely((*nr_entries_reset) < INT_MAX)) {
- if (signal_pending(current))
+ if (signal_pending(current)) {
+ srcu_read_unlock(&kvm->srcu, idx);
return -EINTR;
+ }
entry = &ring->dirty_gfns[ring->reset_index & (ring->size - 1)];
@@ -205,6 +213,8 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
if (mask)
kvm_reset_dirty_gfn(kvm, cur_slot, cur_offset, mask);
+ srcu_read_unlock(&kvm->srcu, idx);
+
/*
* The request KVM_REQ_DIRTY_RING_SOFT_FULL will be cleared
* by the VCPU thread next time when it enters the guest.
--
And unless there are other behaviors that are protected by slots_lock (which is
entirely possible), serializing the processing of each ring could be done via a
dedicated (for example only, the dedicated mutex could/should be per-vCPU, not
global).
This diff in particular shows why I ordered and phrased the comment the way I
did. The blurb about protecting memslot accesses is purely a friendly reminder
to readers. The sole reason for an assert and comment is to call out the need
for ordering.
diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index 1ba02a06378c..92ac82b535fe 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -102,6 +102,8 @@ static inline bool kvm_dirty_gfn_harvested(struct kvm_dirty_gfn *gfn)
return smp_load_acquire(&gfn->flags) & KVM_DIRTY_GFN_F_RESET;
}
+static DEFINE_MUTEX(per_ring_lock);
+
int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
int *nr_entries_reset)
{
@@ -121,18 +123,22 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
u64 cur_offset, next_offset;
unsigned long mask = 0;
struct kvm_dirty_gfn *entry;
+ int idx;
/*
* Ensure concurrent calls to KVM_RESET_DIRTY_RINGS are serialized,
* e.g. so that KVM fully resets all entries processed by a given call
- * before returning to userspace. Holding slots_lock also protects
- * the various memslot accesses.
+ * before returning to userspace.
*/
- lockdep_assert_held(&kvm->slots_lock);
+ guard(mutex)(&per_ring_lock);
+
+ idx = srcu_read_lock(&kvm->srcu);
while (likely((*nr_entries_reset) < INT_MAX)) {
- if (signal_pending(current))
+ if (signal_pending(current)) {
+ srcu_read_unlock(&kvm->srcu, idx);
return -EINTR;
+ }
entry = &ring->dirty_gfns[ring->reset_index & (ring->size - 1)];
@@ -205,6 +211,8 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
if (mask)
kvm_reset_dirty_gfn(kvm, cur_slot, cur_offset, mask);
+ srcu_read_unlock(&kvm->srcu, idx);
+
/*
* The request KVM_REQ_DIRTY_RING_SOFT_FULL will be cleared
* by the VCPU thread next time when it enters the guest.
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 571688507204..45729a6f6451 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4908,16 +4908,12 @@ static int kvm_vm_ioctl_reset_dirty_pages(struct kvm *kvm)
if (!kvm->dirty_ring_size)
return -EINVAL;
- mutex_lock(&kvm->slots_lock);
-
kvm_for_each_vcpu(i, vcpu, kvm) {
r = kvm_dirty_ring_reset(vcpu->kvm, &vcpu->dirty_ring, &cleared);
if (r)
break;
}
- mutex_unlock(&kvm->slots_lock);
-
if (cleared)
kvm_flush_remote_tlbs(kvm);
--
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH v3 5/6] KVM: Use mask of harvested dirty ring entries to coalesce dirty ring resets
2025-05-21 9:16 ` Yan Zhao
@ 2025-05-21 14:54 ` Sean Christopherson
2025-05-21 19:45 ` Sean Christopherson
0 siblings, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2025-05-21 14:54 UTC (permalink / raw)
To: Yan Zhao
Cc: Paolo Bonzini, kvm, linux-kernel, Peter Xu, Maxim Levitsky,
Binbin Wu, James Houghton, Pankaj Gupta
On Wed, May 21, 2025, Yan Zhao wrote:
> On Fri, May 16, 2025 at 02:35:39PM -0700, Sean Christopherson wrote:
> > @@ -141,42 +140,42 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
> > ring->reset_index++;
> > (*nr_entries_reset)++;
> >
> > - /*
> > - * While the size of each ring is fixed, it's possible for the
> > - * ring to be constantly re-dirtied/harvested while the reset
> > - * is in-progress (the hard limit exists only to guard against
> > - * wrapping the count into negative space).
> > - */
> > - if (!first_round)
> > + if (mask) {
> > + /*
> > + * While the size of each ring is fixed, it's possible
> > + * for the ring to be constantly re-dirtied/harvested
> > + * while the reset is in-progress (the hard limit exists
> > + * only to guard against the count becoming negative).
> > + */
> > cond_resched();
> >
> > - /*
> > - * Try to coalesce the reset operations when the guest is
> > - * scanning pages in the same slot.
> > - */
> > - if (!first_round && next_slot == cur_slot) {
> > - s64 delta = next_offset - cur_offset;
> > + /*
> > + * Try to coalesce the reset operations when the guest
> > + * is scanning pages in the same slot.
> > + */
> > + if (next_slot == cur_slot) {
> > + s64 delta = next_offset - cur_offset;
> >
> > - if (delta >= 0 && delta < BITS_PER_LONG) {
> > - mask |= 1ull << delta;
> > - continue;
> > - }
> > + if (delta >= 0 && delta < BITS_PER_LONG) {
> > + mask |= 1ull << delta;
> > + continue;
> > + }
> >
> > - /* Backwards visit, careful about overflows! */
> > - if (delta > -BITS_PER_LONG && delta < 0 &&
> > - (mask << -delta >> -delta) == mask) {
> > - cur_offset = next_offset;
> > - mask = (mask << -delta) | 1;
> > - continue;
> > + /* Backwards visit, careful about overflows! */
> > + if (delta > -BITS_PER_LONG && delta < 0 &&
> > + (mask << -delta >> -delta) == mask) {
> > + cur_offset = next_offset;
> > + mask = (mask << -delta) | 1;
> > + continue;
> > + }
> > }
> > - }
> >
> > - /*
> > - * Reset the slot for all the harvested entries that have been
> > - * gathered, but not yet fully processed.
> > - */
> > - if (mask)
> > + /*
> > + * Reset the slot for all the harvested entries that
> > + * have been gathered, but not yet fully processed.
> > + */
> > kvm_reset_dirty_gfn(kvm, cur_slot, cur_offset, mask);
> Nit and feel free to ignore it :)
>
> Would it be better to move the "cond_resched()" to here, i.e., executing it for
> at most every 64 entries?
Hmm, yeah, I think that makes sense. The time spent manipulating the ring and
mask+offset is quite trivial, so checking on every single entry is unnecessary.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 4/6] KVM: Check for empty mask of harvested dirty ring entries in caller
2025-05-21 9:16 ` Yan Zhao
@ 2025-05-21 14:55 ` Sean Christopherson
0 siblings, 0 replies; 25+ messages in thread
From: Sean Christopherson @ 2025-05-21 14:55 UTC (permalink / raw)
To: Yan Zhao
Cc: Paolo Bonzini, kvm, linux-kernel, Peter Xu, Maxim Levitsky,
Binbin Wu, James Houghton, Pankaj Gupta
On Wed, May 21, 2025, Yan Zhao wrote:
> On Fri, May 16, 2025 at 02:35:38PM -0700, Sean Christopherson wrote:
> > @@ -108,15 +105,24 @@ static inline bool kvm_dirty_gfn_harvested(struct kvm_dirty_gfn *gfn)
> > int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
> > int *nr_entries_reset)
> > {
> > + /*
> > + * To minimize mmu_lock contention, batch resets for harvested entries
> > + * whose gfns are in the same slot, and are within N frame numbers of
> > + * each other, where N is the number of bits in an unsigned long. For
> Suppose N is 64,
>
> > + * simplicity, process the current set of entries when the next entry
> > + * can't be included in the batch.
> > + *
> > + * Track the current batch slot, the gfn offset into the slot for the
> > + * batch, and the bitmask of gfns that need to be reset (relative to
> > + * offset). Note, the offset may be adjusted backwards, e.g. so that
> > + * a sequence of gfns X, X-1, ... X-N can be batched.
> X-N can't be batched, right?
Hah! Yeah, off-by-one error.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 0/6] KVM: Dirty ring fixes and cleanups
2025-05-21 14:50 ` Sean Christopherson
@ 2025-05-21 15:24 ` Peter Xu
0 siblings, 0 replies; 25+ messages in thread
From: Peter Xu @ 2025-05-21 15:24 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Yan Zhao, Maxim Levitsky,
Binbin Wu, James Houghton, Pankaj Gupta
On Wed, May 21, 2025 at 07:50:10AM -0700, Sean Christopherson wrote:
> On Tue, May 20, 2025, Peter Xu wrote:
> > On Tue, May 20, 2025 at 04:16:00PM -0700, Sean Christopherson wrote:
> > > On Tue, May 20, 2025, Peter Xu wrote:
> > > > On Fri, May 16, 2025 at 02:35:34PM -0700, Sean Christopherson wrote:
> > > > > Sean Christopherson (6):
> > > > > KVM: Bound the number of dirty ring entries in a single reset at
> > > > > INT_MAX
> > > > > KVM: Bail from the dirty ring reset flow if a signal is pending
> > > > > KVM: Conditionally reschedule when resetting the dirty ring
> > > > > KVM: Check for empty mask of harvested dirty ring entries in caller
> > > > > KVM: Use mask of harvested dirty ring entries to coalesce dirty ring
> > > > > resets
> > > > > KVM: Assert that slots_lock is held when resetting per-vCPU dirty
> > > > > rings
> > > >
> > > > For the last one, I'd think it's majorly because of the memslot accesses
> > > > (or CONFIG_LOCKDEP=y should yell already on resets?).
> > >
> > > No? If KVM only needed to ensure stable memslot accesses, then SRCU would suffice.
> > > It sounds like holding slots_lock may have been a somewhat unintentional, but the
> > > reason KVM can't switch to SRCU is that doing so would break ordering, not because
> > > slots_lock is needed to protect the memslot accesses.
> >
> > Hmm.. isn't what you said exactly means a "yes"? :)
> >
> > I mean, I would still expect lockdep to report this ioctl if without the
> > slots_lock, please correct me if it's not the case.
>
> Yes, one of slots_lock or SRCU needs to be held.
>
> > And if using RCU is not trivial (or not necessary either), so far the
> > slots_lock is still required to make sure the memslot accesses are legal?
>
> I don't follow this part. The intent of the comment is to document why slots_lock
> is required, which is exceptional because memslot access for readers are protected
> by kvm->srcu.
I always think it's fine to take slots_lock for readers too. RCU can
definitely be better in most cases..
> The fact that slots_lock also protects memslots is notable only
> because it makes acquiring kvm->srcu superfluous. But grabbing kvm->srcu is still
> safe/legal/ok:
>
> diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
> index 1ba02a06378c..6bf4f9e2f291 100644
> --- a/virt/kvm/dirty_ring.c
> +++ b/virt/kvm/dirty_ring.c
> @@ -121,18 +121,26 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
> u64 cur_offset, next_offset;
> unsigned long mask = 0;
> struct kvm_dirty_gfn *entry;
> + int idx;
>
> /*
> * Ensure concurrent calls to KVM_RESET_DIRTY_RINGS are serialized,
> * e.g. so that KVM fully resets all entries processed by a given call
> - * before returning to userspace. Holding slots_lock also protects
> - * the various memslot accesses.
> + * before returning to userspace.
> */
> lockdep_assert_held(&kvm->slots_lock);
>
> + /*
> + * Holding slots_lock also protects the various memslot accesses, but
> + * acquiring kvm->srcu for read here is still safe, just unnecessary.
> + */
> + idx = srcu_read_lock(&kvm->srcu);
> +
> while (likely((*nr_entries_reset) < INT_MAX)) {
> - if (signal_pending(current))
> + if (signal_pending(current)) {
> + srcu_read_unlock(&kvm->srcu, idx);
> return -EINTR;
> + }
>
> entry = &ring->dirty_gfns[ring->reset_index & (ring->size - 1)];
>
> @@ -205,6 +213,8 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
> if (mask)
> kvm_reset_dirty_gfn(kvm, cur_slot, cur_offset, mask);
>
> + srcu_read_unlock(&kvm->srcu, idx);
> +
> /*
> * The request KVM_REQ_DIRTY_RING_SOFT_FULL will be cleared
> * by the VCPU thread next time when it enters the guest.
> --
>
> And unless there are other behaviors that are protected by slots_lock (which is
> entirely possible), serializing the processing of each ring could be done via a
Yes, I am not the original author, but when I was working on it I don't
remember anything relying on that. However still it's possible it can
serialize some operations under the hood (which will be true side effect of
using this lock..).
> dedicated (for example only, the dedicated mutex could/should be per-vCPU, not
> global).
>
> This diff in particular shows why I ordered and phrased the comment the way I
> did. The blurb about protecting memslot accesses is purely a friendly reminder
> to readers. The sole reason for an assert and comment is to call out the need
> for ordering.
>
> diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
> index 1ba02a06378c..92ac82b535fe 100644
> --- a/virt/kvm/dirty_ring.c
> +++ b/virt/kvm/dirty_ring.c
> @@ -102,6 +102,8 @@ static inline bool kvm_dirty_gfn_harvested(struct kvm_dirty_gfn *gfn)
> return smp_load_acquire(&gfn->flags) & KVM_DIRTY_GFN_F_RESET;
> }
>
> +static DEFINE_MUTEX(per_ring_lock);
> +
> int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
> int *nr_entries_reset)
> {
> @@ -121,18 +123,22 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
> u64 cur_offset, next_offset;
> unsigned long mask = 0;
> struct kvm_dirty_gfn *entry;
> + int idx;
>
> /*
> * Ensure concurrent calls to KVM_RESET_DIRTY_RINGS are serialized,
> * e.g. so that KVM fully resets all entries processed by a given call
> - * before returning to userspace. Holding slots_lock also protects
> - * the various memslot accesses.
> + * before returning to userspace.
> */
> - lockdep_assert_held(&kvm->slots_lock);
> + guard(mutex)(&per_ring_lock);
> +
> + idx = srcu_read_lock(&kvm->srcu);
>
> while (likely((*nr_entries_reset) < INT_MAX)) {
> - if (signal_pending(current))
> + if (signal_pending(current)) {
> + srcu_read_unlock(&kvm->srcu, idx);
> return -EINTR;
> + }
>
> entry = &ring->dirty_gfns[ring->reset_index & (ring->size - 1)];
>
> @@ -205,6 +211,8 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
> if (mask)
> kvm_reset_dirty_gfn(kvm, cur_slot, cur_offset, mask);
>
> + srcu_read_unlock(&kvm->srcu, idx);
> +
> /*
> * The request KVM_REQ_DIRTY_RING_SOFT_FULL will be cleared
> * by the VCPU thread next time when it enters the guest.
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 571688507204..45729a6f6451 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -4908,16 +4908,12 @@ static int kvm_vm_ioctl_reset_dirty_pages(struct kvm *kvm)
> if (!kvm->dirty_ring_size)
> return -EINVAL;
>
> - mutex_lock(&kvm->slots_lock);
> -
> kvm_for_each_vcpu(i, vcpu, kvm) {
> r = kvm_dirty_ring_reset(vcpu->kvm, &vcpu->dirty_ring, &cleared);
> if (r)
> break;
> }
>
> - mutex_unlock(&kvm->slots_lock);
> -
> if (cleared)
> kvm_flush_remote_tlbs(kvm);
> --
>
I think we almost agree on each other, and I don't see anything
controversial.
It's just that for this path using srcu may have slight risk of breaking
what used to be serialized as you said. Said that, I'd be surprised if
so.. even if aarch64 is normally even trickier and it now also supports the
rings. So it's just that it seems unnecessary yet to switch to srcu,
because we don't expect any concurrent writters anyway.
So totally no strong opinion on how the comment should be laid out in the
last patch - please feel free to ignore my request. But I hope I stated
the fact, that in the current code base the slots_lock is required to
access memslots safely when rcu isn't around..
Thanks,
--
Peter Xu
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 5/6] KVM: Use mask of harvested dirty ring entries to coalesce dirty ring resets
2025-05-21 14:54 ` Sean Christopherson
@ 2025-05-21 19:45 ` Sean Christopherson
2025-05-22 1:04 ` Yan Zhao
0 siblings, 1 reply; 25+ messages in thread
From: Sean Christopherson @ 2025-05-21 19:45 UTC (permalink / raw)
To: Yan Zhao
Cc: Paolo Bonzini, kvm, linux-kernel, Peter Xu, Maxim Levitsky,
Binbin Wu, James Houghton, Pankaj Gupta
On Wed, May 21, 2025, Sean Christopherson wrote:
> On Wed, May 21, 2025, Yan Zhao wrote:
> > On Fri, May 16, 2025 at 02:35:39PM -0700, Sean Christopherson wrote:
> > > @@ -141,42 +140,42 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
> > > ring->reset_index++;
> > > (*nr_entries_reset)++;
> > >
> > > - /*
> > > - * While the size of each ring is fixed, it's possible for the
> > > - * ring to be constantly re-dirtied/harvested while the reset
> > > - * is in-progress (the hard limit exists only to guard against
> > > - * wrapping the count into negative space).
> > > - */
> > > - if (!first_round)
> > > + if (mask) {
> > > + /*
> > > + * While the size of each ring is fixed, it's possible
> > > + * for the ring to be constantly re-dirtied/harvested
> > > + * while the reset is in-progress (the hard limit exists
> > > + * only to guard against the count becoming negative).
> > > + */
> > > cond_resched();
> > >
> > > - /*
> > > - * Try to coalesce the reset operations when the guest is
> > > - * scanning pages in the same slot.
> > > - */
> > > - if (!first_round && next_slot == cur_slot) {
> > > - s64 delta = next_offset - cur_offset;
> > > + /*
> > > + * Try to coalesce the reset operations when the guest
> > > + * is scanning pages in the same slot.
> > > + */
> > > + if (next_slot == cur_slot) {
> > > + s64 delta = next_offset - cur_offset;
> > >
> > > - if (delta >= 0 && delta < BITS_PER_LONG) {
> > > - mask |= 1ull << delta;
> > > - continue;
> > > - }
> > > + if (delta >= 0 && delta < BITS_PER_LONG) {
> > > + mask |= 1ull << delta;
> > > + continue;
> > > + }
> > >
> > > - /* Backwards visit, careful about overflows! */
> > > - if (delta > -BITS_PER_LONG && delta < 0 &&
> > > - (mask << -delta >> -delta) == mask) {
> > > - cur_offset = next_offset;
> > > - mask = (mask << -delta) | 1;
> > > - continue;
> > > + /* Backwards visit, careful about overflows! */
> > > + if (delta > -BITS_PER_LONG && delta < 0 &&
> > > + (mask << -delta >> -delta) == mask) {
> > > + cur_offset = next_offset;
> > > + mask = (mask << -delta) | 1;
> > > + continue;
> > > + }
> > > }
> > > - }
> > >
> > > - /*
> > > - * Reset the slot for all the harvested entries that have been
> > > - * gathered, but not yet fully processed.
> > > - */
> > > - if (mask)
> > > + /*
> > > + * Reset the slot for all the harvested entries that
> > > + * have been gathered, but not yet fully processed.
> > > + */
> > > kvm_reset_dirty_gfn(kvm, cur_slot, cur_offset, mask);
> > Nit and feel free to ignore it :)
> >
> > Would it be better to move the "cond_resched()" to here, i.e., executing it for
> > at most every 64 entries?
>
> Hmm, yeah, I think that makes sense. The time spent manipulating the ring and
> mask+offset is quite trivial, so checking on every single entry is unnecessary.
Oh, no, scratch that. Thankfully, past me explicitly documented this. From
patch 3:
Note! Take care to check for reschedule even in the "continue" paths,
as a pathological scenario (or malicious userspace) could dirty the same
gfn over and over, i.e. always hit the continue path.
A batch isn't guaranteed to be flushed after processing 64 entries, it's only
flushed when an entry more than N gfns away is encountered.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 5/6] KVM: Use mask of harvested dirty ring entries to coalesce dirty ring resets
2025-05-21 19:45 ` Sean Christopherson
@ 2025-05-22 1:04 ` Yan Zhao
0 siblings, 0 replies; 25+ messages in thread
From: Yan Zhao @ 2025-05-22 1:04 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Peter Xu, Maxim Levitsky,
Binbin Wu, James Houghton, Pankaj Gupta
On Wed, May 21, 2025 at 12:45:44PM -0700, Sean Christopherson wrote:
> On Wed, May 21, 2025, Sean Christopherson wrote:
> > On Wed, May 21, 2025, Yan Zhao wrote:
> > > On Fri, May 16, 2025 at 02:35:39PM -0700, Sean Christopherson wrote:
> > > > @@ -141,42 +140,42 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring,
> > > > ring->reset_index++;
> > > > (*nr_entries_reset)++;
> > > >
> > > > - /*
> > > > - * While the size of each ring is fixed, it's possible for the
> > > > - * ring to be constantly re-dirtied/harvested while the reset
> > > > - * is in-progress (the hard limit exists only to guard against
> > > > - * wrapping the count into negative space).
> > > > - */
> > > > - if (!first_round)
> > > > + if (mask) {
> > > > + /*
> > > > + * While the size of each ring is fixed, it's possible
> > > > + * for the ring to be constantly re-dirtied/harvested
> > > > + * while the reset is in-progress (the hard limit exists
> > > > + * only to guard against the count becoming negative).
> > > > + */
> > > > cond_resched();
> > > >
> > > > - /*
> > > > - * Try to coalesce the reset operations when the guest is
> > > > - * scanning pages in the same slot.
> > > > - */
> > > > - if (!first_round && next_slot == cur_slot) {
> > > > - s64 delta = next_offset - cur_offset;
> > > > + /*
> > > > + * Try to coalesce the reset operations when the guest
> > > > + * is scanning pages in the same slot.
> > > > + */
> > > > + if (next_slot == cur_slot) {
> > > > + s64 delta = next_offset - cur_offset;
> > > >
> > > > - if (delta >= 0 && delta < BITS_PER_LONG) {
> > > > - mask |= 1ull << delta;
> > > > - continue;
> > > > - }
> > > > + if (delta >= 0 && delta < BITS_PER_LONG) {
> > > > + mask |= 1ull << delta;
> > > > + continue;
> > > > + }
> > > >
> > > > - /* Backwards visit, careful about overflows! */
> > > > - if (delta > -BITS_PER_LONG && delta < 0 &&
> > > > - (mask << -delta >> -delta) == mask) {
> > > > - cur_offset = next_offset;
> > > > - mask = (mask << -delta) | 1;
> > > > - continue;
> > > > + /* Backwards visit, careful about overflows! */
> > > > + if (delta > -BITS_PER_LONG && delta < 0 &&
> > > > + (mask << -delta >> -delta) == mask) {
> > > > + cur_offset = next_offset;
> > > > + mask = (mask << -delta) | 1;
> > > > + continue;
> > > > + }
> > > > }
> > > > - }
> > > >
> > > > - /*
> > > > - * Reset the slot for all the harvested entries that have been
> > > > - * gathered, but not yet fully processed.
> > > > - */
> > > > - if (mask)
> > > > + /*
> > > > + * Reset the slot for all the harvested entries that
> > > > + * have been gathered, but not yet fully processed.
> > > > + */
> > > > kvm_reset_dirty_gfn(kvm, cur_slot, cur_offset, mask);
> > > Nit and feel free to ignore it :)
> > >
> > > Would it be better to move the "cond_resched()" to here, i.e., executing it for
> > > at most every 64 entries?
> >
> > Hmm, yeah, I think that makes sense. The time spent manipulating the ring and
> > mask+offset is quite trivial, so checking on every single entry is unnecessary.
>
> Oh, no, scratch that. Thankfully, past me explicitly documented this. From
> patch 3:
>
> Note! Take care to check for reschedule even in the "continue" paths,
> as a pathological scenario (or malicious userspace) could dirty the same
> gfn over and over, i.e. always hit the continue path.
>
> A batch isn't guaranteed to be flushed after processing 64 entries, it's only
> flushed when an entry more than N gfns away is encountered.
Oh, I overlooked the "pathological scenario". You are right!
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v3 0/6] KVM: Dirty ring fixes and cleanups
2025-05-16 21:35 [PATCH v3 0/6] KVM: Dirty ring fixes and cleanups Sean Christopherson
` (7 preceding siblings ...)
2025-05-21 9:21 ` Yan Zhao
@ 2025-06-24 19:36 ` Sean Christopherson
8 siblings, 0 replies; 25+ messages in thread
From: Sean Christopherson @ 2025-06-24 19:36 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Peter Xu, Yan Zhao, Maxim Levitsky, Binbin Wu,
James Houghton, Pankaj Gupta
On Fri, 16 May 2025 14:35:34 -0700, Sean Christopherson wrote:
> Fix issues with dirty ring harvesting where KVM doesn't bound the processing
> of entries in any way, which allows userspace to keep KVM in a tight loop
> indefinitely.
>
> E.g.
>
> struct kvm_dirty_gfn *dirty_gfns = vcpu_map_dirty_ring(vcpu);
>
> [...]
Applied to kvm-x86 dirty_ring, thanks!
[1/6] KVM: Bound the number of dirty ring entries in a single reset at INT_MAX
https://github.com/kvm-x86/linux/commit/530a8ba71b4c
[2/6] KVM: Bail from the dirty ring reset flow if a signal is pending
https://github.com/kvm-x86/linux/commit/49005a2a3d2a
[3/6] KVM: Conditionally reschedule when resetting the dirty ring
https://github.com/kvm-x86/linux/commit/1333c35c4eea
[4/6] KVM: Check for empty mask of harvested dirty ring entries in caller
https://github.com/kvm-x86/linux/commit/ee188dea1677
[5/6] KVM: Use mask of harvested dirty ring entries to coalesce dirty ring resets
https://github.com/kvm-x86/linux/commit/e46ad851150f
[6/6] KVM: Assert that slots_lock is held when resetting per-vCPU dirty rings
https://github.com/kvm-x86/linux/commit/614fb9d1479b
--
https://github.com/kvm-x86/kvm-unit-tests/tree/next
^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2025-06-24 19:36 UTC | newest]
Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-16 21:35 [PATCH v3 0/6] KVM: Dirty ring fixes and cleanups Sean Christopherson
2025-05-16 21:35 ` [PATCH v3 1/6] KVM: Bound the number of dirty ring entries in a single reset at INT_MAX Sean Christopherson
2025-05-20 6:51 ` Binbin Wu
2025-05-16 21:35 ` [PATCH v3 2/6] KVM: Bail from the dirty ring reset flow if a signal is pending Sean Christopherson
2025-05-20 6:53 ` Binbin Wu
2025-05-16 21:35 ` [PATCH v3 3/6] KVM: Conditionally reschedule when resetting the dirty ring Sean Christopherson
2025-05-16 21:35 ` [PATCH v3 4/6] KVM: Check for empty mask of harvested dirty ring entries in caller Sean Christopherson
2025-05-20 6:56 ` Binbin Wu
2025-05-21 9:16 ` Yan Zhao
2025-05-21 14:55 ` Sean Christopherson
2025-05-16 21:35 ` [PATCH v3 5/6] KVM: Use mask of harvested dirty ring entries to coalesce dirty ring resets Sean Christopherson
2025-05-20 6:58 ` Binbin Wu
2025-05-21 9:16 ` Yan Zhao
2025-05-21 14:54 ` Sean Christopherson
2025-05-21 19:45 ` Sean Christopherson
2025-05-22 1:04 ` Yan Zhao
2025-05-16 21:35 ` [PATCH v3 6/6] KVM: Assert that slots_lock is held when resetting per-vCPU dirty rings Sean Christopherson
2025-05-20 7:04 ` Binbin Wu
2025-05-20 19:12 ` [PATCH v3 0/6] KVM: Dirty ring fixes and cleanups Peter Xu
2025-05-20 23:16 ` Sean Christopherson
2025-05-20 23:51 ` Peter Xu
2025-05-21 14:50 ` Sean Christopherson
2025-05-21 15:24 ` Peter Xu
2025-05-21 9:21 ` Yan Zhao
2025-06-24 19:36 ` Sean Christopherson
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.