From: Paolo Bonzini <pbonzini@redhat.com>
To: Andy Lutomirski <luto@amacapital.net>
Cc: "Andy Lutomirski" <luto@kernel.org>,
"Adalbert Lazăr" <alazar@bitdefender.com>,
Linux-MM <linux-mm@kvack.org>,
"Linux API" <linux-api@vger.kernel.org>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Alexander Graf" <graf@amazon.com>,
"Stefan Hajnoczi" <stefanha@redhat.com>,
"Jerome Glisse" <jglisse@redhat.com>,
"Mihai Donțu" <mdontu@bitdefender.com>,
"Mircea Cirjaliu" <mcirjaliu@bitdefender.com>,
"Arnd Bergmann" <arnd@arndb.de>,
"Sargun Dhillon" <sargun@sargun.me>,
"Aleksa Sarai" <cyphar@cyphar.com>,
"Oleg Nesterov" <oleg@redhat.com>, "Jann Horn" <jannh@google.com>,
"Kees Cook" <keescook@chromium.org>,
"Matthew Wilcox" <willy@infradead.org>,
"Christian Brauner" <christian.brauner@ubuntu.com>
Subject: Re: [RESEND RFC PATCH 0/5] Remote mapping
Date: Fri, 4 Sep 2020 23:58:57 +0200
Message-ID: <836cff86-e670-8c69-6cbd-b22c5b5538df@redhat.com>
In-Reply-To: <70D23368-A24D-4A15-8FC7-FA728D102475@amacapital.net>
On 04/09/20 22:34, Andy Lutomirski wrote:
> On Sep 4, 2020, at 1:09 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> On 04/09/20 21:39, Andy Lutomirski wrote:
>>> I'm a little concerned
>>> that it's actually too clever and that maybe a more
>>> straightforward solution should be investigated. I personally
>>> rather dislike the KVM model in which the guest address space
>>> mirrors the host (QEMU) address space rather than being its own
>>> thing. In particular, the current model means that
>>> extra-special-strange mappings like SEV-encrypted memory are
>>> required to be present in the QEMU page tables in order for the
>>> guest to see them. (If I had noticed that last bit before it went
>>> upstream, I would have NAKked it. I would still like to see it
>>> deprecated and ideally eventually removed from the kernel. We
>>> have absolutely no business creating incoherent mappings like
>>> this.)
>>
>> NACK first and ask second, right Andy? I see that nothing has
>> changed since Alan Cox left Linux.
>
> NACKs are negotiable. And maybe someone can convince me that the SEV
> mapping scheme is reasonable, but I would be surprised.
So why say NACK? Any half-decent maintainer would hold off on merging the
patches at least until the discussion is over. Also, I suppose any
deprecation proposal should come with a description of an alternative.
Anyway, for SEV the problem is DMA. There is no way to know in advance
which memory the guest will use for I/O; it can change at any time and
the same host-physical address can even be mapped both as C=0 and C=1 by
the guest. There's no communication protocol between the guest and the
host to tell the host _which_ memory should be mapped in QEMU. (One was
added to support migration, but that doesn't even work with SEV-ES
processors where migration is planned to happen mostly with help from
the guest, either in the firmware or somewhere else).
But this is a digression. (If you would like to continue the discussion
please trim the recipient list and change the subject).
> Regardless, you seem to be suggesting that you want to have enclave
> VMs in which the enclave can see some memory that the parent VM can’t
> see. How does this fit into the KVM mapping model? How does this
> remote mapping mechanism help? Do you want QEMU to have that memory
> mapped in its own pagetables?
There are three processes:
- the manager, which is the parent of the VMs and uses the pidfd_mem
system call
- the primary VM
- the enclave VM(s)
The primary VM and the enclave VM(s) would each get a different memory
access file descriptor. QEMU would treat them no differently from any
other externally-provided memory backend, say hugetlbfs or memfd, so
yeah they would be mmap-ed to userspace and the host virtual address
passed as usual to KVM.
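As a rough illustration of "mmap-ed to userspace and the host virtual
address passed as usual to KVM", here is a minimal sketch. It assumes the
memory access file descriptor can be mmap-ed like a memfd; the helper
names (map_guest_backend, describe_slot, register_slot) are hypothetical
and not taken from QEMU or from the patch series. Only mmap(), ioctl(),
struct kvm_userspace_memory_region and KVM_SET_USER_MEMORY_REGION are
existing interfaces.

```c
#define _GNU_SOURCE
#include <linux/kvm.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map an externally provided memory fd into this process, the same way
 * a hugetlbfs or memfd backend would be mapped. Hypothetical helper.
 * Returns MAP_FAILED on error. */
void *map_guest_backend(int mem_fd, size_t size)
{
    return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, mem_fd, 0);
}

/* Fill in the memslot description: KVM is told the host virtual address
 * (userspace_addr) that backs the guest-physical range. */
struct kvm_userspace_memory_region
describe_slot(uint32_t slot, uint64_t gpa, void *hva, size_t size)
{
    struct kvm_userspace_memory_region r;

    memset(&r, 0, sizeof(r));
    r.slot = slot;
    r.guest_phys_addr = gpa;
    r.memory_size = size;
    r.userspace_addr = (uint64_t)(uintptr_t)hva;
    return r;
}

/* Register the slot on a VM fd obtained earlier via KVM_CREATE_VM.
 * Returns 0 on success, -1 with errno set on failure. */
int register_slot(int vm_fd, struct kvm_userspace_memory_region *r)
{
    return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, r);
}
```

In this sketch the primary VM and each enclave VM would call
map_guest_backend() on its own descriptor, so from KVM's point of view
nothing distinguishes them from any other file-backed guest memory.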
Enclave VMs could be used to store secrets and perform crypto, for
example. The enclave is measured at boot; any keys or other stuff it
needs can be provided out-of-band from the manager.
The manager can decide at any time to hide some memory from the primary
VM (in order to give it to an enclave). This would actually be done on
request of the primary VM itself, and QEMU would probably be so kind as
to replace the "hole" left in the guest memory with zeroes. But QEMU is
untrusted, so the manager cannot rely on QEMU behaving well. Hence the
privilege separation model that was implemented here.
Actually Amazon has already created something like that and Andra-Irina
Paraschiv has posted patches on the list for this. Their implementation
is not open source, but this pidfd_mem concept is something that Andra,
Alexander Graf and I came up with as a way to 1) reimplement the feature
upstream, 2) satisfy Bitdefender's need for memory introspection, and 3)
add what seemed a useful interface anyway, for example to replace
PTRACE_{PEEK,POKE}DATA. Though (3) would only need pread/pwrite, not
mmap, which adds a lot of the complexity.
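To make the pread/pwrite alternative to PTRACE_PEEKDATA concrete, here is
a small sketch using an interface that exists today, /proc/<pid>/mem, as a
stand-in. Whether a pidfd_mem descriptor would behave the same way under
pread() is an assumption, and peek_remote is a hypothetical helper name.
The usual ptrace access checks still apply when opening another process's
mem file.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Read len bytes at remote_addr in process pid, using the remote
 * virtual address as the file offset. Returns the number of bytes
 * read, or -1 on error. Hypothetical helper for illustration. */
ssize_t peek_remote(pid_t pid, const void *remote_addr, void *buf, size_t len)
{
    char path[64];
    ssize_t n;
    int fd;

    snprintf(path, sizeof(path), "/proc/%d/mem", (int)pid);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    n = pread(fd, buf, len, (off_t)(uintptr_t)remote_addr);
    close(fd);
    return n;
}
```

Unlike PTRACE_PEEKDATA, which returns one word per call, a single pread()
can fetch an arbitrary-length buffer, and no mapping is set up in the
reader's address space.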
> As it stands, the way that KVM memory mappings are created seems to
> be convenient, but it also seems to be resulting in increasing
> bizarre userspace mappings. At what point is the right solution to
> decouple KVM’s mappings from QEMU’s?
So what you are suggesting is that KVM manages its own address space
instead of host virtual addresses (and with no relationship to host
virtual addresses, it would be just a "cookie")? It would then need a
couple of ioctls to mmap/munmap (creating and deleting VMAs) into the
address space, and those cookies would be passed to
KVM_SET_USER_MEMORY_REGION. QEMU would still need access to these VMAs;
would it mmap a file descriptor provided by KVM? All in all the
implementation seems quite complex, and I don't understand why it would
avoid incoherent SEV mappings; what am I missing?
Paolo