From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: David Hildenbrand <david@redhat.com>, qemu-devel@nongnu.org
Cc: Paolo Bonzini <pbonzini@redhat.com>,
Laszlo Ersek <lersek@redhat.com>,
Eduardo Habkost <ehabkost@redhat.com>,
Peter Xu <peterx@redhat.com>,
"Dr. David Alan Gilbert" <dgilbert@redhat.com>
Subject: Re: [PATCH RFC] memory: pause all vCPUs for the duration of memory transactions
Date: Tue, 27 Oct 2020 13:36:19 +0100
Message-ID: <87imav26d8.fsf@vitty.brq.redhat.com>
In-Reply-To: <d7a20a33-0317-467e-6fc6-6528b3b46062@redhat.com>
David Hildenbrand <david@redhat.com> writes:
> On 26.10.20 11:43, David Hildenbrand wrote:
>> On 26.10.20 09:49, Vitaly Kuznetsov wrote:
>>> Currently, KVM doesn't provide an API to make atomic updates to memmap when
>>> the change touches more than one memory slot, e.g. in case we'd like to
>>> punch a hole in an existing slot.
>>>
>>> Reports are that multi-CPU Q35 VMs booted with OVMF sometimes print something
>>> like
>>>
>>> !!!! X64 Exception Type - 0E(#PF - Page-Fault) CPU Apic ID - 00000003 !!!!
>>> ExceptionData - 0000000000000010 I:1 R:0 U:0 W:0 P:0 PK:0 SS:0 SGX:0
>>> RIP - 000000007E35FAB6, CS - 0000000000000038, RFLAGS - 0000000000010006
>>> RAX - 0000000000000000, RCX - 000000007E3598F2, RDX - 00000000078BFBFF
>>> ...
>>>
>>> The problem seems to be that TSEG manipulations on one vCPU are not atomic
>>> from the other vCPUs' point of view. In particular, here's the strace:
>>>
>>> Initial creation of the 'problematic' slot:
>>>
>>> 10085 ioctl(13, KVM_SET_USER_MEMORY_REGION, {slot=6, flags=0, guest_phys_addr=0x100000,
>>> memory_size=2146435072, userspace_addr=0x7fb89bf00000}) = 0
>>>
>>> ... and then the update (caused by e.g. mch_update_smram()) later:
>>>
>>> 10090 ioctl(13, KVM_SET_USER_MEMORY_REGION, {slot=6, flags=0, guest_phys_addr=0x100000,
>>> memory_size=0, userspace_addr=0x7fb89bf00000}) = 0
>>> 10090 ioctl(13, KVM_SET_USER_MEMORY_REGION, {slot=6, flags=0, guest_phys_addr=0x100000,
>>> memory_size=2129657856, userspace_addr=0x7fb89bf00000}) = 0
>>>
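The two strace lines differ only in memory_size. The small helper below was written for this write-up (struct gpa_range and shrunk_tail() are illustrative names, not QEMU's). It computes which guest-physical range drops out of slot 6 when the slot keeps its guest_phys_addr but shrinks:

```c
#include <stdint.h>

/* Half-open guest-physical address range [start, end). */
struct gpa_range {
    uint64_t start;
    uint64_t end;
};

/*
 * For a slot whose guest_phys_addr stays the same while memory_size
 * shrinks, return the range that is no longer covered by the slot.
 */
static struct gpa_range shrunk_tail(uint64_t gpa, uint64_t old_size,
                                    uint64_t new_size)
{
    struct gpa_range r = { gpa + new_size, gpa + old_size };
    return r;
}
```

With the values from the strace (gpa 0x100000, sizes 2146435072 and 2129657856) it yields [0x7f000000, 0x80000000), i.e. the top 16 MiB below 2 GiB. Only that tail changes, yet the update deletes and re-creates the whole slot, so every address in it is briefly unbacked from KVM's point of view.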
>>> If KVM has to handle any event on a different vCPU in between these
>>> two calls, the #PF gets triggered.
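A toy, single-threaded model of that window. The slot table, set_region() and lookup() are hypothetical stand-ins, not KVM's data structures; set_region() only mimics the "memory_size == 0 deletes the slot" convention visible in the strace. A lookup made between the two calls, standing in for another vCPU handling an exit, finds no slot even for an address that is mapped both before and after the update:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SLOTS 8

/* Toy memslot; not KVM's real structure. */
struct toy_slot {
    bool     used;
    uint64_t gpa;
    uint64_t size;
};

static struct toy_slot slots[MAX_SLOTS];

/* memory_size == 0 deletes the slot; anything else (re)creates it. */
static void set_region(int id, uint64_t gpa, uint64_t size)
{
    slots[id].used = size != 0;
    slots[id].gpa  = gpa;
    slots[id].size = size;
}

/* What a vCPU handling an exit would need: the slot backing gpa. */
static const struct toy_slot *lookup(uint64_t gpa)
{
    for (int i = 0; i < MAX_SLOTS; i++) {
        if (slots[i].used && gpa >= slots[i].gpa &&
            gpa - slots[i].gpa < slots[i].size) {
            return &slots[i];
        }
    }
    return NULL; /* no backing memory: the guest would see a fault */
}

/*
 * Replays the two-step update from the strace and reports whether a
 * lookup of probe_gpa made between the two steps finds no slot.
 */
static bool update_has_window(uint64_t probe_gpa)
{
    set_region(6, 0x100000, 2146435072ULL);   /* initial slot */
    set_region(6, 0x100000, 0);               /* step 1: delete */
    bool missing = lookup(probe_gpa) == NULL; /* another vCPU runs here */
    set_region(6, 0x100000, 2129657856ULL);   /* step 2: re-create, smaller */
    return missing;
}
```

In the real system the "other vCPU" is a separate thread, so whether it hits the window depends on timing, which matches the "sometimes" in the report.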
>>>
>>> An ideal solution to the problem would probably require KVM to provide a
>>> new API to do the whole transaction in one shot, but as a band-aid we can
>>> just pause all vCPUs to make memory transactions atomic.
>>>
>>> Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>>> ---
>>> RFC: Generally, memmap updates happen only a few times during guest boot,
>>> but I'm not sure there are no scenarios where pausing all vCPUs is
>>> undesirable from a performance point of view. Also, I'm not sure whether
>>> the kvm_enabled() check is needed.
>>> softmmu/memory.c | 11 +++++++++--
>>> 1 file changed, 9 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/softmmu/memory.c b/softmmu/memory.c
>>> index fa280a19f7f7..0bf6f3f6d5dc 100644
>>> --- a/softmmu/memory.c
>>> +++ b/softmmu/memory.c
>>> @@ -28,6 +28,7 @@
>>>
>>>  #include "exec/memory-internal.h"
>>>  #include "exec/ram_addr.h"
>>> +#include "sysemu/cpus.h"
>>>  #include "sysemu/kvm.h"
>>>  #include "sysemu/runstate.h"
>>>  #include "sysemu/tcg.h"
>>> @@ -1057,7 +1058,9 @@ static void address_space_update_topology(AddressSpace *as)
>>>  void memory_region_transaction_begin(void)
>>>  {
>>>      qemu_flush_coalesced_mmio_buffer();
>>> -    ++memory_region_transaction_depth;
>>> +    if ((++memory_region_transaction_depth == 1) && kvm_enabled()) {
>>> +        pause_all_vcpus();
>>> +    }
>>>  }
>>>
>>>  void memory_region_transaction_commit(void)
>>> @@ -1087,7 +1090,11 @@ void memory_region_transaction_commit(void)
>>>              }
>>>              ioeventfd_update_pending = false;
>>>          }
>>> -   }
>>> +
>>> +        if (kvm_enabled()) {
>>> +            resume_all_vcpus();
>>> +        }
>>> +    }
>>>  }
>>>
>>>  static void memory_region_destructor_none(MemoryRegion *mr)
>>>
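The patch's control flow, isolated: only the outermost begin() pauses and only the outermost commit() resumes, so nested transactions do not pause twice. The toy_* functions, the counters and kvm_on are hypothetical stand-ins for pause_all_vcpus(), resume_all_vcpus() and kvm_enabled(). The sketch deliberately ignores the BQL, which is where the reply below finds the problem:

```c
#include <assert.h>
#include <stdbool.h>

/* Counters standing in for pause_all_vcpus()/resume_all_vcpus(). */
static int pause_calls, resume_calls;
static unsigned transaction_depth;
static bool kvm_on = true; /* stand-in for kvm_enabled() */

static void toy_pause_all_vcpus(void)  { pause_calls++; }
static void toy_resume_all_vcpus(void) { resume_calls++; }

static void toy_transaction_begin(void)
{
    /* Only the 0 -> 1 transition pauses. */
    if (++transaction_depth == 1 && kvm_on) {
        toy_pause_all_vcpus();
    }
}

static void toy_transaction_commit(void)
{
    assert(transaction_depth > 0);
    if (--transaction_depth == 0) {
        /* ... flat views would be rebuilt and listeners notified here ... */
        if (kvm_on) {
            toy_resume_all_vcpus(); /* only the 1 -> 0 transition resumes */
        }
    }
}
```

Two nested begin()/commit() pairs therefore produce exactly one pause and one resume, and the vCPUs stay stopped until the outermost commit has applied all changes.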
>>
>> This is in general unsafe. pause_all_vcpus() will temporarily drop the
>> BQL, resulting in bad things happening at the call sites.

Oh, I see, thanks! I suspected there was a reason we don't already have
this simple fix :-)

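Why dropping the BQL matters, as a minimal pthreads sketch. It assumes, as in QEMU's cpus.c, that pause_all_vcpus() waits on a condition variable whose mutex is the BQL; big_lock, state, vcpu_thread() and toy_wait_for_pause() are hypothetical. The caller takes the lock, reads some state, and "pauses"; because pthread_cond_wait() releases the mutex while waiting, another thread can take it and change the state before the caller continues:

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* big_lock plays the BQL; 'state' is something the caller believes is
 * stable because it holds big_lock. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t paused_cond = PTHREAD_COND_INITIALIZER;
static bool all_paused;
static int state = 1;

/* A "vCPU" thread: it needs big_lock to acknowledge the pause request,
 * and it modifies shared state while holding the lock. */
static void *vcpu_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&big_lock);
    state = 2;                 /* e.g. work done while handling an exit */
    all_paused = true;
    pthread_cond_broadcast(&paused_cond);
    pthread_mutex_unlock(&big_lock);
    return NULL;
}

/* Called with big_lock held and returns with it held, but
 * pthread_cond_wait() releases big_lock while it waits. */
static void toy_wait_for_pause(void)
{
    while (!all_paused) {
        pthread_cond_wait(&paused_cond, &big_lock);
    }
}

/* Records the value of 'state' before and after the "pause". */
static void caller(int *before, int *after)
{
    pthread_t t;

    pthread_mutex_lock(&big_lock);
    *before = state;
    if (pthread_create(&t, NULL, vcpu_thread, NULL) != 0) {
        abort();
    }
    toy_wait_for_pause();      /* big_lock is dropped in here */
    *after = state;            /* changed although we "held" the lock */
    pthread_mutex_unlock(&big_lock);
    pthread_join(t, NULL);
}
```

The result is deterministic: the vCPU thread can only get big_lock while the caller sits in pthread_cond_wait(), so the caller sees state == 1 before and state == 2 after. A memory_region_transaction_begin() caller that relies on the BQL for consistency would be in the same position.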
>>
>> I studied the issues involved quite intensively when I wanted to resize
>> memory regions from virtio-mem code. It's not that easy.
>>
>> Have a look at my RFC for resizing. You can apply something similar to
>> other operations.
>>
>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg684979.html
>
> Oh, and I even mentioned the case you are trying to fix here back then:
>
> "
> Instead of inhibiting during the region_resize(), we could inhibit for the
> whole memory transaction (from begin() to commit()). This could be nice,
> because splitting of memory regions would also be atomic (I remember there
> was a BUG report regarding that); however, I am not sure if that might
> impact any RT users.
> "
>
> The current patches live in
> https://github.com/davidhildenbrand/qemu/commits/virtio-mem-next
>
> Especially
>
> https://github.com/davidhildenbrand/qemu/commit/433fbb3abed20f15030e42f2b2bea7e6b9a15180
>
>

I'm not sure why we're focusing on ioctls here. I was debugging my case
quite some time ago, but from what I remember it had nothing to do with
ioctls from QEMU. When we are removing a memslot, any exit to KVM may
trigger an error condition, as we'll see that the vCPU or some of our
internal structures (e.g. the VMCS for a nested guest) reference
non-existent memory. I don't see a good solution other than making the
update fully atomic from *all* vCPUs' point of view, and this requires
stopping all vCPUs, either from QEMU or from KVM.
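One way to read "fully atomic from all vCPUs' point of view" is that readers only ever see either the complete old map or the complete new one. The sketch below shows that reader/writer shape with a single atomic pointer exchange; struct toy_memmap, publish() and reader_snapshot() are hypothetical and say nothing about how KVM would actually implement a multi-slot API. Memory reclamation is left out: the old map is handed back to the caller and must not be freed while readers may still use it.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAP_SLOTS 8

/* Hypothetical slot descriptor; not KVM's internal structure. */
struct map_entry {
    uint64_t gpa;
    uint64_t size;
    uint32_t flags;
};

/* A complete guest memory map, which vCPUs should only ever see whole. */
struct toy_memmap {
    int nr;
    struct map_entry entry[MAP_SLOTS];
};

static _Atomic(struct toy_memmap *) active_map;

/* Reader side (a vCPU handling an exit): one load yields a consistent map. */
static const struct toy_memmap *reader_snapshot(void)
{
    return atomic_load_explicit(&active_map, memory_order_acquire);
}

/*
 * Writer side: the caller prepares the complete new layout and publishes
 * it with a single pointer exchange. The previous map is returned; a real
 * implementation must wait until no reader can still hold it (an
 * RCU/SRCU-style grace period) before freeing it.
 */
static struct toy_memmap *publish(const struct toy_memmap *next)
{
    struct toy_memmap *copy = malloc(sizeof(*copy));

    if (!copy) {
        abort();
    }
    memcpy(copy, next, sizeof(*copy));
    return atomic_exchange_explicit(&active_map, copy, memory_order_acq_rel);
}
```

Applied to the strace above, the shrunk slot 6 would be published in one step, so there is no moment in which a reader finds the slot missing.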

Resizing a slot can probably be done without removing it first; however,
I expect that restructuring QEMU so that it decides whether or not the old
configuration requires removal is not easy. In some cases (e.g. punching a
KVM_MEM_READONLY hole in the middle of an RW slot) it seems to be
impossible with the current KVM API.
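The KVM_MEM_READONLY case, sketched under two assumptions: each slot carries a single flags value (struct kvm_userspace_memory_region has one flags field), and KVM rejects overlapping slots. Punching a read-only hole into an RW slot then turns one slot into up to three, and the new pieces cannot be added while the original slot still covers their range, so the original must be removed or shrunk first. split_for_ro_hole() and the TOY_* flags are hypothetical; the helper only computes the target layout:

```c
#include <stdint.h>

/* Hypothetical flag values for this sketch only. */
#define TOY_RW 0u
#define TOY_RO 1u

struct toy_region {
    uint64_t gpa;
    uint64_t size;
    unsigned flags;
};

/*
 * Split [base, base + size) around the hole [hgpa, hgpa + hsize), which
 * must lie within it. Writes up to three regions to out[] and returns how
 * many were written.
 */
static int split_for_ro_hole(uint64_t base, uint64_t size,
                             uint64_t hgpa, uint64_t hsize,
                             struct toy_region out[3])
{
    int n = 0;

    if (hgpa > base) {                       /* RW part below the hole */
        out[n++] = (struct toy_region){ base, hgpa - base, TOY_RW };
    }
    out[n++] = (struct toy_region){ hgpa, hsize, TOY_RO };
    if (hgpa + hsize < base + size) {        /* RW part above the hole */
        out[n++] = (struct toy_region){ hgpa + hsize,
                                        base + size - (hgpa + hsize),
                                        TOY_RW };
    }
    return n;
}
```

For slot 6 from the strace (0x100000, size 0x7ff00000) and a hypothetical 1 MiB hole at 0x40000000, it returns [0x100000, 0x40000000) RW, [0x40000000, 0x40100000) RO and [0x40100000, 0x80000000) RW.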
> I haven't proceeded in upstreaming because I'm still busy with
> virtio-mem thingies in the kernel.
--
Vitaly
Thread overview: 22+ messages
2020-10-26 8:49 [PATCH RFC] memory: pause all vCPUs for the duration of memory transactions Vitaly Kuznetsov
2020-10-26 10:43 ` David Hildenbrand
2020-10-26 11:17 ` David Hildenbrand
2020-10-27 12:36 ` Vitaly Kuznetsov [this message]
2020-10-27 12:42 ` David Hildenbrand
2020-10-27 13:02 ` Vitaly Kuznetsov
2020-10-27 13:08 ` David Hildenbrand
2020-10-27 13:19 ` Vitaly Kuznetsov
2020-10-27 13:35 ` David Hildenbrand
2020-10-27 13:47 ` Vitaly Kuznetsov
2020-10-27 14:20 ` Igor Mammedov
2020-11-02 19:57 ` Peter Xu
2020-11-03 13:07 ` Vitaly Kuznetsov
2020-11-03 16:37 ` Peter Xu
2020-11-04 18:09 ` Laszlo Ersek
2020-11-04 19:23 ` Peter Xu
2020-11-05 15:36 ` Vitaly Kuznetsov
2020-11-05 16:35 ` Peter Xu
2020-11-04 17:58 ` Laszlo Ersek