From: Sean Christopherson <seanjc@google.com>
To: Andrei Vagin <avagin@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
Wanpeng Li <wanpengli@tencent.com>,
Vitaly Kuznetsov <vkuznets@redhat.com>,
Jianfeng Tan <henry.tjf@antfin.com>,
Adin Scannell <ascannell@google.com>,
Konstantin Bogomolov <bogomolov@google.com>,
Etienne Perot <eperot@google.com>,
Andy Lutomirski <luto@kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>
Subject: Re: [PATCH 0/5] KVM/x86: add a new hypercall to execute host system
Date: Tue, 26 Jul 2022 15:10:34 +0000
Message-ID: <YuAD6qY+F2nuGm62@google.com>
In-Reply-To: <CAEWA0a4hrRb5HYLqa1Q47=guY6TLsWSJ_zxNjOXXV2jCjUekUA@mail.gmail.com>

On Tue, Jul 26, 2022, Andrei Vagin wrote:
> On Fri, Jul 22, 2022 at 4:41 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > +x86 maintainers, patch 1 most definitely needs acceptance from folks beyond KVM.
> >
> > On Fri, Jul 22, 2022, Andrei Vagin wrote:
> > > Another option is the KVM platform. In this case, the Sentry (gVisor
> > > kernel) can run in a guest ring0 and create/manage multiple address
> > > spaces. Its performance is much better than the ptrace one, but it is
> > > still not great compared with the native performance. This change
> > > optimizes the most critical part, which is the syscall overhead.
> >
> > What exactly is the source of the syscall overhead,
>
> Here are perf traces for two cases: when "guest" syscalls are executed via
> hypercalls and when syscalls are executed by the user-space VMM:
> https://gist.github.com/avagin/f50a6d569440c9ae382281448c187f4e
>
> And here are two tests that I use to collect these traces:
> https://github.com/avagin/linux-task-diag/commit/4e19c7007bec6a15645025c337f2e85689b81f99
>
> If we compare these traces, we can find that in the second case, we spend extra
> time in vmx_prepare_switch_to_guest, fpu_swap_kvm_fpstate, vcpu_put,
> syscall_exit_to_user_mode.

So of those, I think the only path a robust implementation can actually avoid,
without significantly whittling down the allowed set of syscalls, is
syscall_exit_to_user_mode().

The bulk of vcpu_put() is vmx_prepare_switch_to_host(), and KVM needs to run
through that before calling out of KVM.  E.g. arch_prctl(ARCH_GET_GS) will read
the wrong GS.base if MSR_KERNEL_GS_BASE isn't restored.  And that necessitates
calling vmx_prepare_switch_to_guest() when resuming the vCPU.

FPU state, i.e. fpu_swap_kvm_fpstate(), is likely a similar story: there's bound
to be a syscall that accesses user FPU state and will do the wrong thing if guest
state is loaded.

For gVisor, that's all presumably a non-issue because it uses a small set of
syscalls (or has guest==host state?), but for a common KVM feature it's problematic.

> > and what alternatives have been explored? Making arbitrary syscalls from
> > within KVM is mildly terrifying.
>
> "mildly terrifying" is a good sentence in this case:). If I were in your place,
> I would think about it similarly.
>
> I understand these concerns about calling syscalls from the KVM code, and this
> is why I hide this feature under a separate capability that can be enabled
> explicitly.
>
> We can think about restricting the list of system calls that this hypercall can
> execute. In the user-space changes for gVisor, we have a list of system calls
> that are not executed via this hypercall.

Can you provide that list?

> But it has downsides:
> * Each sentry system call triggers a full exit to hr3.
> * Each vmenter/vmexit requires triggering a signal, which is expensive.

Can you explain this one?  I didn't quite follow what this is referring to.

> * It doesn't allow supporting Confidential Computing (SEV-ES/SGX). The Sentry
> has to be fully enclosed in a VM to be able to support these technologies.

Speaking of SGX, this reminds me a lot of Graphene, SCONE, etc., which IIRC
tackled the "syscalls are crazy expensive" problem by using a message queue and
a dedicated task outside of the enclave to handle syscalls.  Would something like
that work, or is having to burn a pCPU (or more) to handle syscalls in the host a
non-starter?

Thread overview: 18+ messages
2022-07-22 23:02 [PATCH 0/5] KVM/x86: add a new hypercall to execute host system Andrei Vagin
2022-07-22 23:02 ` [PATCH 1/5] kernel: add a new helper to execute system calls from kernel code Andrei Vagin
2022-07-22 23:02 ` [PATCH 2/5] kvm/x86: add controls to enable/disable paravirtualized system calls Andrei Vagin
2022-07-22 23:02 ` [PATCH 3/5] KVM/x86: add a new hypercall to execute host " Andrei Vagin
2022-07-22 23:02 ` [PATCH 4/5] selftests/kvm/x86_64: set rax before vmcall Andrei Vagin
2022-08-01 11:32 ` Vitaly Kuznetsov
2022-08-01 12:43 ` Paolo Bonzini
2022-07-22 23:02 ` [PATCH 5/5] selftests/kvm/x86_64: add tests for KVM_HC_HOST_SYSCALL Andrei Vagin
2022-07-22 23:41 ` [PATCH 0/5] KVM/x86: add a new hypercall to execute host system Sean Christopherson
2022-07-26 8:33 ` Andrei Vagin
2022-07-26 10:27 ` Paolo Bonzini
2022-07-27 6:44 ` Andrei Vagin
2022-07-26 15:10 ` Sean Christopherson [this message]
2022-07-26 22:10 ` Thomas Gleixner
2022-07-27 1:03 ` Andrei Vagin
2022-08-22 20:26 ` Andrei Vagin
2022-07-27 0:25 ` Andrei Vagin
2022-07-26 21:27 ` Thomas Gleixner