* [PATCH 0/3] SEV-SNP fix for cpu soft lockup on 1TB+ guests
@ 2025-06-05 15:24 Liam Merwick
2025-06-05 15:25 ` [PATCH 1/3] KVM: Batch setting of per-page memory attributes to avoid soft lockup Liam Merwick
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Liam Merwick @ 2025-06-05 15:24 UTC (permalink / raw)
To: kvm
Cc: liam.merwick, pbonzini, seanjc, thomas.lendacky, michael.roth,
tabba, ackerleytng
When creating SEV-SNP guests with a large amount of memory (940GB or greater),
the host experiences a CPU soft lockup while setting the per-page memory
attributes on the whole range of memory in the guest.
The underlying issue is that setting the memory attributes with the
current Xarray-based implementation is time-consuming (e.g. a 1.9TB
guest takes over 30 seconds to set the attributes).
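For a sense of scale, here is a rough sketch of the arithmetic. It assumes
4 KiB pages (a PAGE_SHIFT of 12, as on x86-64), one attribute entry per
guest page (which is what the per-gfn xa_store() loop suggests), and that
the guest sizes are read as GiB. The helper name is made up for illustration.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86-64. */
#define PAGE_SHIFT 12

/*
 * Hypothetical helper: number of guest pages (and thus per-page
 * attribute entries) needed to cover a guest of size_gib GiB.
 */
static uint64_t guest_pages(uint64_t size_gib)
{
	return (size_gib << 30) >> PAGE_SHIFT;
}
```

Under those assumptions, guest_pages(1900) is 498073600, i.e. roughly half
a billion individual entries for the largest guest tested, and
guest_pages(940) is 246415360.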
Fix the lockup by modifying kvm_vm_ioctl_set_mem_attributes() so that it
sets the attributes on, at most, a range of 512GB at a time and avoids
holding kvm->slots_lock for too long.
Apart from the lockup, setting memory attributes via Xarray also results
in a delay early in the boot of SEV-SNP/TDX guests; this fix does not
address that. As it happens, the slowness of setting the attributes was
brought up by Michael Roth in the review of Ackerley Tng's series to add
1G page support for guest_memfd [1], where a Maple Tree implementation is
being proposed to track shareability. Michael suggested that doing the
same for KVM mem attributes would also be useful (it should avoid the
soft lockup while also taking less CPU time in general to populate). If
that is implemented in the future, it should address this lockup, but I
think there's benefit in fixing the lockup now with a targeted fix.
[1] https://lore.kernel.org/all/20250529054227.hh2f4jmyqf6igd3i@amd.com
Tested with VMs up to 1900GB in size (the limit of hardware available to me).
The functionality was introduced in v6.8, but I tagged the fix as needing
backporting only as far back as linux-6.12.y (where it applies cleanly).
Based on tag: kvm-6.16-1
Liam Merwick (3):
KVM: Batch setting of per-page memory attributes to avoid soft lockup
KVM: Add trace_kvm_vm_set_mem_attributes()
KVM: fix typo in kvm_vm_set_mem_attributes() comment
include/trace/events/kvm.h | 33 +++++++++++++++++++++++++++++
virt/kvm/kvm_main.c | 43 ++++++++++++++++++++++++++++++++------
2 files changed, 70 insertions(+), 6 deletions(-)
--
2.47.1
* [PATCH 1/3] KVM: Batch setting of per-page memory attributes to avoid soft lockup
2025-06-05 15:24 [PATCH 0/3] SEV-SNP fix for cpu soft lockup on 1TB+ guests Liam Merwick
@ 2025-06-05 15:25 ` Liam Merwick
2025-06-05 15:57 ` Sean Christopherson
2025-06-05 15:25 ` [PATCH 2/3] KVM: Add trace_kvm_vm_set_mem_attributes() Liam Merwick
2025-06-05 15:25 ` [PATCH 3/3] KVM: fix typo in kvm_vm_set_mem_attributes() comment Liam Merwick
2 siblings, 1 reply; 8+ messages in thread
From: Liam Merwick @ 2025-06-05 15:25 UTC (permalink / raw)
To: kvm
Cc: liam.merwick, pbonzini, seanjc, thomas.lendacky, michael.roth,
tabba, ackerleytng
When booting an SEV-SNP guest with a sufficiently large amount of memory (1TB+),
the host can experience CPU soft lockups when running an operation in
kvm_vm_set_mem_attributes() to set memory attributes on the whole
range of guest memory.
watchdog: BUG: soft lockup - CPU#8 stuck for 26s! [qemu-kvm:6372]
CPU: 8 UID: 0 PID: 6372 Comm: qemu-kvm Kdump: loaded Not tainted 6.15.0-rc7.20250520.el9uek.rc1.x86_64 #1 PREEMPT(voluntary)
Hardware name: Oracle Corporation ORACLE SERVER E4-2c/Asm,MB Tray,2U,E4-2c, BIOS 78016600 11/13/2024
RIP: 0010:xas_create+0x78/0x1f0
Code: 00 00 00 41 80 fc 01 0f 84 82 00 00 00 ba 06 00 00 00 bd 06 00 00 00 49 8b 45 08 4d 8d 65 08 41 39 d6 73 20 83 ed 06 48 85 c0 <74> 67 48 89 c2 83 e2 03 48 83 fa 02 75 0c 48 3d 00 10 00 00 0f 87
RSP: 0018:ffffad890a34b940 EFLAGS: 00000286
RAX: ffff96f30b261daa RBX: ffffad890a34b9c8 RCX: 0000000000000000
RDX: 000000000000001e RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000018 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffad890a356868
R13: ffffad890a356860 R14: 0000000000000000 R15: ffffad890a356868
FS: 00007f5578a2a400(0000) GS:ffff97ed317e1000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f015c70fb18 CR3: 00000001109fd006 CR4: 0000000000f70ef0
PKRU: 55555554
Call Trace:
<TASK>
xas_store+0x58/0x630
? srso_alias_return_thunk+0x5/0xfbef5
? asm_sysvec_apic_timer_interrupt+0x1a/0x20
__xa_store+0xa5/0x130
xa_store+0x2c/0x50
kvm_vm_set_mem_attributes+0x343/0x710 [kvm]
kvm_vm_ioctl+0x796/0xab0 [kvm]
? srso_alias_return_thunk+0x5/0xfbef5
? srso_alias_return_thunk+0x5/0xfbef5
? rseq_ip_fixup+0x8c/0x1e0
__x64_sys_ioctl+0xa3/0xd0
do_syscall_64+0x8c/0x7a0
? srso_alias_return_thunk+0x5/0xfbef5
? __alloc_frozen_pages_noprof+0x18d/0x340
? srso_alias_return_thunk+0x5/0xfbef5
? try_charge_memcg+0x76/0x640
? srso_alias_return_thunk+0x5/0xfbef5
? __count_memcg_events+0xbb/0x150
? srso_alias_return_thunk+0x5/0xfbef5
? __mod_memcg_lruvec_state+0xb6/0x1b0
? srso_alias_return_thunk+0x5/0xfbef5
? __lruvec_stat_mod_folio+0x83/0xd0
? srso_alias_return_thunk+0x5/0xfbef5
? srso_alias_return_thunk+0x5/0xfbef5
? srso_alias_return_thunk+0x5/0xfbef5
? set_ptes.isra.0+0x36/0x90
? srso_alias_return_thunk+0x5/0xfbef5
? do_anonymous_page+0x103/0x4d0
? srso_alias_return_thunk+0x5/0xfbef5
? __handle_mm_fault+0x397/0x6f0
? srso_alias_return_thunk+0x5/0xfbef5
? __count_memcg_events+0xbb/0x150
? srso_alias_return_thunk+0x5/0xfbef5
? count_memcg_events.constprop.0+0x26/0x50
? srso_alias_return_thunk+0x5/0xfbef5
? handle_mm_fault+0x245/0x350
? srso_alias_return_thunk+0x5/0xfbef5
? do_user_addr_fault+0x221/0x686
? srso_alias_return_thunk+0x5/0xfbef5
? arch_exit_to_user_mode_prepare.isra.0+0x1e/0xd0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f5578d031bb
Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 2d 4c 0f 00 f7 d8 64 89 01 48
RSP: 002b:00007ffe0a742b88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 000000004020aed2 RCX: 00007f5578d031bb
RDX: 00007ffe0a742c80 RSI: 000000004020aed2 RDI: 000000000000000b
RBP: 0000010000000000 R08: 0000010000000000 R09: 0000017680000000
R10: 0000000000000080 R11: 0000000000000246 R12: 00005575e5f95120
R13: 00007ffe0a742c80 R14: 0000000000000008 R15: 00005575e5f961e0
Limit the range of memory per operation when setting the attributes to
avoid holding kvm->slots_lock for too long and causing a cpu soft lockup.
Fixes: 5a475554db1e ("KVM: Introduce per-page memory attributes")
Cc: stable@vger.kernel.org # 6.12.x
Signed-off-by: Liam Merwick <liam.merwick@oracle.com>
---
virt/kvm/kvm_main.c | 37 ++++++++++++++++++++++++++++++++-----
1 file changed, 32 insertions(+), 5 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 69782df3617f..6e6d404a7d7a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2533,7 +2533,9 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
struct kvm_memory_attributes *attrs)
{
- gfn_t start, end;
+ gfn_t start, end, section_start, section_end;
+ u64 size, size_remaining;
+ int ret = 0;
/* flags is currently not used. */
if (attrs->flags)
@@ -2545,9 +2547,6 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
if (!PAGE_ALIGNED(attrs->address) || !PAGE_ALIGNED(attrs->size))
return -EINVAL;
- start = attrs->address >> PAGE_SHIFT;
- end = (attrs->address + attrs->size) >> PAGE_SHIFT;
-
/*
* xarray tracks data using "unsigned long", and as a result so does
* KVM. For simplicity, supports generic attributes only on 64-bit
@@ -2555,7 +2554,35 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
*/
BUILD_BUG_ON(sizeof(attrs->attributes) != sizeof(unsigned long));
- return kvm_vm_set_mem_attributes(kvm, start, end, attrs->attributes);
+ size_remaining = attrs->size;
+ section_start = start = attrs->address >> PAGE_SHIFT;
+ section_end = end = (attrs->address + attrs->size) >> PAGE_SHIFT;
+ while (size_remaining > 0) {
+ /*
+ * If the range of memory is greater than 512GB, clamp it for
+ * this iteration to 512GB. This avoids a potential CPU soft
+ * lockup when run on a larger range for an SEV-SNP guest.
+ * (measured at 940GB so there is some headroom, just in case).
+ */
+ if (size_remaining > SZ_512G) {
+ size = SZ_512G;
+ size_remaining -= size;
+ section_end = section_start + (size >> PAGE_SHIFT);
+ } else {
+ size = size_remaining;
+ size_remaining = 0;
+ section_end = end;
+ WARN_ON_ONCE(section_end != (section_start + (size >> PAGE_SHIFT)));
+ }
+
+ ret = kvm_vm_set_mem_attributes(kvm, section_start, section_end, attrs->attributes);
+ if (ret != 0)
+ break;
+
+ section_start = section_end;
+ }
+
+ return ret;
}
#endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
--
2.47.1
* [PATCH 2/3] KVM: Add trace_kvm_vm_set_mem_attributes()
2025-06-05 15:24 [PATCH 0/3] SEV-SNP fix for cpu soft lockup on 1TB+ guests Liam Merwick
2025-06-05 15:25 ` [PATCH 1/3] KVM: Batch setting of per-page memory attributes to avoid soft lockup Liam Merwick
@ 2025-06-05 15:25 ` Liam Merwick
2025-06-05 15:25 ` [PATCH 3/3] KVM: fix typo in kvm_vm_set_mem_attributes() comment Liam Merwick
2 siblings, 0 replies; 8+ messages in thread
From: Liam Merwick @ 2025-06-05 15:25 UTC (permalink / raw)
To: kvm
Cc: liam.merwick, pbonzini, seanjc, thomas.lendacky, michael.roth,
tabba, ackerleytng
Add a tracing function to display the attributes being set for
a range of guest memory.
Sample output:
<...>-12693 [059] ..... 1342.536361: kvm_vm_set_mem_attributes: 0x00000000000000 -- 0x00000000080000 [0x8]
qemu-kvm-12693 [187] ..... 1342.747651: kvm_vm_set_mem_attributes: . 0x00000010000000 -- 0x00000018000000 [0x8]
qemu-kvm-12693 [040] .N... 1366.473790: kvm_vm_set_mem_attributes: . 0x00000018000000 -- 0x00000020000000 [0x8]
qemu-kvm-12693 [009] .N... 1390.350362: kvm_vm_set_mem_attributes: . 0x00000020000000 -- 0x00000028000000 [0x8]
qemu-kvm-12693 [008] .N... 1414.154231: kvm_vm_set_mem_attributes: 0x00000028000000 -- 0x0000002da80000 [0x8]
qemu-kvm-12693 [136] ..... 1430.988101: kvm_vm_set_mem_attributes: 0x000000000ffc00 -- 0x00000000100000 [0x8]
qemu-kvm-12693 [024] ..... 1431.029798: kvm_vm_set_mem_attributes: 0x00000000000000 -- 0x000000000000c0 [0x8]
The '.' before the addresses above signifies that the initial request
was split into multiple operations. The original request was to set
the attributes on the range 0x00000010000000 to 0x0000002da80000.
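The split seen in the sample can be reproduced with a small userspace
sketch. It is not the kernel code: it assumes 4 KiB pages, treats the
traced values as guest frame numbers (the tracepoint fields are gfn_t),
and uses a made-up function name. It omits the attribute value and only
approximates the trace format.

```c
#include <stdint.h>
#include <stdio.h>

typedef uint64_t gfn_t;

#define PAGE_SHIFT	12			/* assumption: 4 KiB pages */
#define SZ_512G		(512ULL << 30)
#define CHUNK_GFNS	(SZ_512G >> PAGE_SHIFT)	/* 0x8000000 pages */

/*
 * Userspace sketch: print the sections that a request for [start, end)
 * would be split into, in roughly the same format as the trace event.
 * Returns the number of sections printed.
 */
static unsigned int print_sections(gfn_t start, gfn_t end)
{
	unsigned int n = 0;

	while (start < end) {
		gfn_t section_end = end;
		int more = 0;

		if (end - start > CHUNK_GFNS) {
			section_end = start + CHUNK_GFNS;
			more = 1;	/* more sections follow: mark with '.' */
		}

		printf("%s %#016llx -- %#016llx\n", more ? " ." : "",
		       (unsigned long long)start,
		       (unsigned long long)section_end);
		start = section_end;
		n++;
	}
	return n;
}
```

For the request in the sample, print_sections(0x10000000, 0x2da80000)
yields three 512 GiB sections (0x8000000 pages each) marked with '.',
followed by an unmarked final section of 0x5a80000 pages (362 GiB).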
Signed-off-by: Liam Merwick <liam.merwick@oracle.com>
---
include/trace/events/kvm.h | 33 +++++++++++++++++++++++++++++++++
virt/kvm/kvm_main.c | 4 ++++
2 files changed, 37 insertions(+)
diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
index fc7d0f8ff078..701bf1f88850 100644
--- a/include/trace/events/kvm.h
+++ b/include/trace/events/kvm.h
@@ -473,6 +473,39 @@ TRACE_EVENT(kvm_dirty_ring_exit,
TP_printk("vcpu %d", __entry->vcpu_id)
);
+#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
+/*
+ * @start: Starting address of guest memory range
+ * @end: End address of guest memory range
+ * @attr: The value of the attribute being set.
+ * @indent: If true, indent output displayed (printing '.' is used to
+ * indicate that the transaction was split into multiple
+ * operations and more are to follow).
+ */
+TRACE_EVENT(kvm_vm_set_mem_attributes,
+ TP_PROTO(gfn_t start, gfn_t end, unsigned long attr, bool indent),
+ TP_ARGS(start, end, attr, indent),
+
+ TP_STRUCT__entry(
+ __field(gfn_t, start)
+ __field(gfn_t, end)
+ __field(unsigned long, attr)
+ __field(bool, indent)
+ ),
+
+ TP_fast_assign(
+ __entry->start = start;
+ __entry->end = end;
+ __entry->attr = attr;
+ __entry->indent = indent;
+ ),
+
+ TP_printk("%s %#016llx -- %#016llx [0x%lx]",
+ __entry->indent ? " ." : "",
+ __entry->start, __entry->end, __entry->attr)
+);
+#endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
+
TRACE_EVENT(kvm_unmap_hva_range,
TP_PROTO(unsigned long start, unsigned long end),
TP_ARGS(start, end),
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6e6d404a7d7a..464357ea638c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2568,11 +2568,15 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
size = SZ_512G;
size_remaining -= size;
section_end = section_start + (size >> PAGE_SHIFT);
+ trace_kvm_vm_set_mem_attributes(section_start, section_end,
+ attrs->attributes, true);
} else {
size = size_remaining;
size_remaining = 0;
section_end = end;
WARN_ON_ONCE(section_end != (section_start + (size >> PAGE_SHIFT)));
+ trace_kvm_vm_set_mem_attributes(section_start, section_end,
+ attrs->attributes, false);
}
ret = kvm_vm_set_mem_attributes(kvm, section_start, section_end, attrs->attributes);
--
2.47.1
* [PATCH 3/3] KVM: fix typo in kvm_vm_set_mem_attributes() comment
2025-06-05 15:24 [PATCH 0/3] SEV-SNP fix for cpu soft lockup on 1TB+ guests Liam Merwick
2025-06-05 15:25 ` [PATCH 1/3] KVM: Batch setting of per-page memory attributes to avoid soft lockup Liam Merwick
2025-06-05 15:25 ` [PATCH 2/3] KVM: Add trace_kvm_vm_set_mem_attributes() Liam Merwick
@ 2025-06-05 15:25 ` Liam Merwick
2 siblings, 0 replies; 8+ messages in thread
From: Liam Merwick @ 2025-06-05 15:25 UTC (permalink / raw)
To: kvm
Cc: liam.merwick, pbonzini, seanjc, thomas.lendacky, michael.roth,
tabba, ackerleytng
It should be 'has' in the sentence and not 'as'.
Signed-off-by: Liam Merwick <liam.merwick@oracle.com>
---
virt/kvm/kvm_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 464357ea638c..be8cf9d5864d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2501,7 +2501,7 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
mutex_lock(&kvm->slots_lock);
- /* Nothing to do if the entire range as the desired attributes. */
+ /* Nothing to do if the entire range has the desired attributes. */
if (kvm_range_has_memory_attributes(kvm, start, end, ~0, attributes))
goto out_unlock;
--
2.47.1
* Re: [PATCH 1/3] KVM: Batch setting of per-page memory attributes to avoid soft lockup
2025-06-05 15:25 ` [PATCH 1/3] KVM: Batch setting of per-page memory attributes to avoid soft lockup Liam Merwick
@ 2025-06-05 15:57 ` Sean Christopherson
2025-06-05 19:03 ` Liam Merwick
0 siblings, 1 reply; 8+ messages in thread
From: Sean Christopherson @ 2025-06-05 15:57 UTC (permalink / raw)
To: Liam Merwick
Cc: kvm, pbonzini, thomas.lendacky, michael.roth, tabba, ackerleytng
On Thu, Jun 05, 2025, Liam Merwick wrote:
> When booting an SEV-SNP guest with a sufficiently large amount of memory (1TB+),
> the host can experience CPU soft lockups when running an operation in
> kvm_vm_set_mem_attributes() to set memory attributes on the whole
> range of guest memory.
>
> watchdog: BUG: soft lockup - CPU#8 stuck for 26s! [qemu-kvm:6372]
> CPU: 8 UID: 0 PID: 6372 Comm: qemu-kvm Kdump: loaded Not tainted 6.15.0-rc7.20250520.el9uek.rc1.x86_64 #1 PREEMPT(voluntary)
> Hardware name: Oracle Corporation ORACLE SERVER E4-2c/Asm,MB Tray,2U,E4-2c, BIOS 78016600 11/13/2024
> RIP: 0010:xas_create+0x78/0x1f0
> Code: 00 00 00 41 80 fc 01 0f 84 82 00 00 00 ba 06 00 00 00 bd 06 00 00 00 49 8b 45 08 4d 8d 65 08 41 39 d6 73 20 83 ed 06 48 85 c0 <74> 67 48 89 c2 83 e2 03 48 83 fa 02 75 0c 48 3d 00 10 00 00 0f 87
> RSP: 0018:ffffad890a34b940 EFLAGS: 00000286
> RAX: ffff96f30b261daa RBX: ffffad890a34b9c8 RCX: 0000000000000000
> RDX: 000000000000001e RSI: 0000000000000000 RDI: 0000000000000000
> RBP: 0000000000000018 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: ffffad890a356868
> R13: ffffad890a356860 R14: 0000000000000000 R15: ffffad890a356868
> FS: 00007f5578a2a400(0000) GS:ffff97ed317e1000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f015c70fb18 CR3: 00000001109fd006 CR4: 0000000000f70ef0
> PKRU: 55555554
> Call Trace:
> <TASK>
> xas_store+0x58/0x630
Trim the '?' lines when including a backtrace in a changelog, they're pure noise.
> __xa_store+0xa5/0x130
> xa_store+0x2c/0x50
> kvm_vm_set_mem_attributes+0x343/0x710 [kvm]
> kvm_vm_ioctl+0x796/0xab0 [kvm]
> __x64_sys_ioctl+0xa3/0xd0
> do_syscall_64+0x8c/0x7a0
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
> RIP: 0033:0x7f5578d031bb
> Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 2d 4c 0f 00 f7 d8 64 89 01 48
> RSP: 002b:00007ffe0a742b88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: 000000004020aed2 RCX: 00007f5578d031bb
> RDX: 00007ffe0a742c80 RSI: 000000004020aed2 RDI: 000000000000000b
> RBP: 0000010000000000 R08: 0000010000000000 R09: 0000017680000000
> R10: 0000000000000080 R11: 0000000000000246 R12: 00005575e5f95120
> R13: 00007ffe0a742c80 R14: 0000000000000008 R15: 00005575e5f961e0
>
> Limit the range of memory per operation when setting the attributes to
> avoid holding kvm->slots_lock for too long and causing a cpu soft lockup.
Holding slots_lock is totally fine. Presumably the issue is that the CPU never
reschedules.
E.g. I would expect this to make the problem go away, though it's probably not a
complete fix (I'm guessing kvm_range_has_memory_attributes() can be made to yell
too).
I'd strongly prefer to avoid arbitrary batching, because that raises a bunch of
questions that are difficult to answer, e.g. what guarantees 512GiB is a "good"
batch size on _all_ systems.
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b24db92e98f3..28230bad43f4 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2513,6 +2513,8 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
r = xa_reserve(&kvm->mem_attr_array, i, GFP_KERNEL_ACCOUNT);
if (r)
goto out_unlock;
+
+ cond_resched();
}
kvm_handle_gfn_range(kvm, &pre_set_range);
@@ -2521,6 +2523,7 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
r = xa_err(xa_store(&kvm->mem_attr_array, i, entry,
GFP_KERNEL_ACCOUNT));
KVM_BUG_ON(r, kvm);
+ cond_resched();
}
kvm_handle_gfn_range(kvm, &post_set_range);
* Re: [PATCH 1/3] KVM: Batch setting of per-page memory attributes to avoid soft lockup
2025-06-05 15:57 ` Sean Christopherson
@ 2025-06-05 19:03 ` Liam Merwick
2025-06-05 19:08 ` Sean Christopherson
0 siblings, 1 reply; 8+ messages in thread
From: Liam Merwick @ 2025-06-05 19:03 UTC (permalink / raw)
To: Sean Christopherson
Cc: kvm, pbonzini, thomas.lendacky, michael.roth, tabba, ackerleytng,
liam.merwick
On 05/06/2025 16:57, Sean Christopherson wrote:
> On Thu, Jun 05, 2025, Liam Merwick wrote:
>> When booting an SEV-SNP guest with a sufficiently large amount of memory (1TB+),
>> the host can experience CPU soft lockups when running an operation in
>> kvm_vm_set_mem_attributes() to set memory attributes on the whole
>> range of guest memory.
>>
>> watchdog: BUG: soft lockup - CPU#8 stuck for 26s! [qemu-kvm:6372]
>> CPU: 8 UID: 0 PID: 6372 Comm: qemu-kvm Kdump: loaded Not tainted 6.15.0-rc7.20250520.el9uek.rc1.x86_64 #1 PREEMPT(voluntary)
>> Hardware name: Oracle Corporation ORACLE SERVER E4-2c/Asm,MB Tray,2U,E4-2c, BIOS 78016600 11/13/2024
>> RIP: 0010:xas_create+0x78/0x1f0
>> Code: 00 00 00 41 80 fc 01 0f 84 82 00 00 00 ba 06 00 00 00 bd 06 00 00 00 49 8b 45 08 4d 8d 65 08 41 39 d6 73 20 83 ed 06 48 85 c0 <74> 67 48 89 c2 83 e2 03 48 83 fa 02 75 0c 48 3d 00 10 00 00 0f 87
>> RSP: 0018:ffffad890a34b940 EFLAGS: 00000286
>> RAX: ffff96f30b261daa RBX: ffffad890a34b9c8 RCX: 0000000000000000
>> RDX: 000000000000001e RSI: 0000000000000000 RDI: 0000000000000000
>> RBP: 0000000000000018 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000000 R12: ffffad890a356868
>> R13: ffffad890a356860 R14: 0000000000000000 R15: ffffad890a356868
>> FS: 00007f5578a2a400(0000) GS:ffff97ed317e1000(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 00007f015c70fb18 CR3: 00000001109fd006 CR4: 0000000000f70ef0
>> PKRU: 55555554
>> Call Trace:
>> <TASK>
>> xas_store+0x58/0x630
>
> Trim the '?' lines when including a backtrace in a changelog, they're pure noise.
>
Ack
>> __xa_store+0xa5/0x130
>> xa_store+0x2c/0x50
>> kvm_vm_set_mem_attributes+0x343/0x710 [kvm]
>> kvm_vm_ioctl+0x796/0xab0 [kvm]
>> __x64_sys_ioctl+0xa3/0xd0
>> do_syscall_64+0x8c/0x7a0
>> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>> RIP: 0033:0x7f5578d031bb
>> Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 2d 4c 0f 00 f7 d8 64 89 01 48
>> RSP: 002b:00007ffe0a742b88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
>> RAX: ffffffffffffffda RBX: 000000004020aed2 RCX: 00007f5578d031bb
>> RDX: 00007ffe0a742c80 RSI: 000000004020aed2 RDI: 000000000000000b
>> RBP: 0000010000000000 R08: 0000010000000000 R09: 0000017680000000
>> R10: 0000000000000080 R11: 0000000000000246 R12: 00005575e5f95120
>> R13: 00007ffe0a742c80 R14: 0000000000000008 R15: 00005575e5f961e0
>>
>> Limit the range of memory per operation when setting the attributes to
>> avoid holding kvm->slots_lock for too long and causing a cpu soft lockup.
>
> Holding slots_lock is totally fine. Presumably the issue is that the CPU never
> reschedules.
>
> E.g. I would expect this to make the problem go away, though it's probably not a
> complete fix (I'm guessing kvm_range_has_memory_attributes() can be made to yell
> too).
>
That indeed works. I couldn't trigger anything in
kvm_range_has_memory_attributes() but am limited to about 2TiB.
I'll do some more tracing before I send a v2 to see if there are any more
places that might be close to hitting the limit.
Thanks,
Liam
> I'd strongly prefer to avoid arbitrary batching, because that raises a bunch of
> questions that are difficult to answer, e.g. what guarantees 512GiB is a "good"
> batch size on _all_ systems.
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index b24db92e98f3..28230bad43f4 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2513,6 +2513,8 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
> r = xa_reserve(&kvm->mem_attr_array, i, GFP_KERNEL_ACCOUNT);
> if (r)
> goto out_unlock;
> +
> + cond_resched();
> }
>
> kvm_handle_gfn_range(kvm, &pre_set_range);
> @@ -2521,6 +2523,7 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
> r = xa_err(xa_store(&kvm->mem_attr_array, i, entry,
> GFP_KERNEL_ACCOUNT));
> KVM_BUG_ON(r, kvm);
> + cond_resched();
> }
>
> kvm_handle_gfn_range(kvm, &post_set_range);
* Re: [PATCH 1/3] KVM: Batch setting of per-page memory attributes to avoid soft lockup
2025-06-05 19:03 ` Liam Merwick
@ 2025-06-05 19:08 ` Sean Christopherson
2025-06-07 21:14 ` Liam Merwick
0 siblings, 1 reply; 8+ messages in thread
From: Sean Christopherson @ 2025-06-05 19:08 UTC (permalink / raw)
To: Liam Merwick
Cc: kvm, pbonzini, thomas.lendacky, michael.roth, tabba, ackerleytng
On Thu, Jun 05, 2025, Liam Merwick wrote:
> On 05/06/2025 16:57, Sean Christopherson wrote:
> > On Thu, Jun 05, 2025, Liam Merwick wrote:
> > > Limit the range of memory per operation when setting the attributes to
> > > avoid holding kvm->slots_lock for too long and causing a cpu soft lockup.
> >
> > Holding slots_lock is totally fine. Presumably the issue is that the CPU never
> > reschedules.
> >
> > E.g. I would expect this to make the problem go away, though it's probably not a
> > complete fix (I'm guessing kvm_range_has_memory_attributes() can be made to yell
> > too).
>
> That indeed works. I couldn't trigger anything in
> kvm_range_has_memory_attributes() but am limited to about 2TiB. I'll do some
> more tracing before I send a v2 to see if there any more places that might be
> close to hitting the limit.
To get kvm_range_has_memory_attributes() to fail, I _think_ you would need to do
a large query when the attributes match a non-zero value, so that it needs to
perform its slower search.
Ah, actually, I wouldn't be at all surprised if the issue is limited to insertion,
or even just to the xa_reserve() path that allocates memory.
* Re: [PATCH 1/3] KVM: Batch setting of per-page memory attributes to avoid soft lockup
2025-06-05 19:08 ` Sean Christopherson
@ 2025-06-07 21:14 ` Liam Merwick
0 siblings, 0 replies; 8+ messages in thread
From: Liam Merwick @ 2025-06-07 21:14 UTC (permalink / raw)
To: Sean Christopherson
Cc: kvm, pbonzini, thomas.lendacky, michael.roth, tabba, ackerleytng
On 05/06/2025 20:08, Sean Christopherson wrote:
> On Thu, Jun 05, 2025, Liam Merwick wrote:
>> On 05/06/2025 16:57, Sean Christopherson wrote:
>>> On Thu, Jun 05, 2025, Liam Merwick wrote:
>>>> Limit the range of memory per operation when setting the attributes to
>>>> avoid holding kvm->slots_lock for too long and causing a cpu soft lockup.
>>>
>>> Holding slots_lock is totally fine. Presumably the issue is that the CPU never
>>> reschedules.
>>>
>>> E.g. I would expect this to make the problem go away, though it's probably not a
>>> complete fix (I'm guessing kvm_range_has_memory_attributes() can be made to yell
>>> too).
>>
>> That indeed works. I couldn't trigger anything in
>> kvm_range_has_memory_attributes() but am limited to about 2TiB. I'll do some
>> more tracing before I send a v2 to see if there any more places that might be
>> close to hitting the limit.
>
> To get kvm_range_has_memory_attributes() to fail, I _think_ you would need to do
> a large query when the attributes match a non-zero value, so that it needs to
> perform its slower search.
>
> Ah, actually, I wouldn't be at all surprised if the issue is limited to insertion,
> or even just to the xa_reserve() path that allocates memory.
Yes indeed, the kvm_range_has_memory_attributes() operation has a much
lower overhead. kvm_vm_set_mem_attributes() has that outlier of 99 sec
for 1.9 TiB.
kvm_range_has_memory_attributes
value ------------- Distribution ------------- count
256 | 0
512 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 966532
1024 | 9781
2048 | 1355
4096 | 2449
8192 | 843
16384 | 240
32768 | 3
65536 | 239
131072 | 2
262144 | 1
524288 | 3
1048576 | 1
2097152 | 0
kvm_vm_set_mem_attributes
value ------------- Distribution ------------- count
512 | 0
1024 | 2
2048 |@@@@@@@@@@@@@ 1496
4096 |@@@@@@@ 813
8192 |@@@@@@@@@@@@@@ 1621
16384 |@@@@ 432
32768 | 12
65536 |@@ 239
131072 | 6
262144 | 4
524288 | 3
1048576 | 1
2097152 | 0
4194304 | 0
8388608 | 8
16777216 | 0
33554432 | 0
67108864 | 1
134217728 | 0
268435456 | 0
536870912 | 0
1073741824 | 0
2147483648 | 0
4294967296 | 0
8589934592 | 0
17179869184 | 0
34359738368 | 0
68719476736 | 1
137438953472 | 0
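As a cross-check on reading the histograms: the units are not stated, but
assuming the values are nanoseconds and that each row counts samples from
its value up to the next power of two, the 99 sec outlier should land in
the 68719476736 row. A minimal sketch of that bucketing, with a made-up
helper name:

```c
#include <stdint.h>

/*
 * Hypothetical helper: lower bound of the power-of-two bucket that a
 * sample lands in, i.e. the largest power of two <= v (0 for v == 0).
 */
static uint64_t pow2_bucket(uint64_t v)
{
	uint64_t b = 1;

	if (v == 0)
		return 0;
	/* Compare against v / 2 so that b never overflows. */
	while (b <= v / 2)
		b <<= 1;
	return b;
}
```

pow2_bucket(99000000000ULL) is 68719476736, which is consistent with the
single sample in the last populated row of the kvm_vm_set_mem_attributes
histogram.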
(As a test, I also inserted an additional call to
kvm_range_has_memory_attributes() over the whole range of memory with a
different attribute value and didn't hit any pathological behaviour).
I'll send a v2 with the suggested fix.
Regards,
Liam