From: Paolo Bonzini <pbonzini@redhat.com>
To: Andy Lutomirski <luto@kernel.org>
Cc: "Adalbert Lazăr" <alazar@bitdefender.com>,
	Linux-MM <linux-mm@kvack.org>,
	"Linux API" <linux-api@vger.kernel.org>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Alexander Graf" <graf@amazon.com>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	"Jerome Glisse" <jglisse@redhat.com>,
	"Mihai Donțu" <mdontu@bitdefender.com>,
	"Mircea Cirjaliu" <mcirjaliu@bitdefender.com>,
	"Arnd Bergmann" <arnd@arndb.de>,
	"Sargun Dhillon" <sargun@sargun.me>,
	"Aleksa Sarai" <cyphar@cyphar.com>,
	"Oleg Nesterov" <oleg@redhat.com>, "Jann Horn" <jannh@google.com>,
	"Kees Cook" <keescook@chromium.org>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Christian Brauner" <christian.brauner@ubuntu.com>
Subject: Re: [RESEND RFC PATCH 0/5] Remote mapping
Date: Sat, 5 Sep 2020 20:27:29 +0200
Message-ID: <bbe80f23-86c5-9d8f-8144-f292a6fc81b4@redhat.com>
In-Reply-To: <CALCETrUcxFJzN_Vz7qe+79eg8033+uUKOAAMEVj-cB1Gp6pouw@mail.gmail.com>

On 05/09/20 01:17, Andy Lutomirski wrote:
> There's sev_pin_memory(), so QEMU must have at least some idea of
> which memory could potentially be encrypted.  Is it in fact the case
> that QEMU doesn't know that some SEV pinned memory might actually be
> used for DMA until the guest tries to do DMA on that memory?  If so,
> yuck.

Yes.  All the memory is pinned, all the memory could potentially be used
for DMA (of garbage if it's encrypted).  And it's the same for pretty
much all protected VM extensions (SEV, POWER, s390, Intel TDX).
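
For concreteness, the pinning happens because QEMU registers each RAM
block with the KVM_MEMORY_ENCRYPT_REG_REGION ioctl, and KVM's handler
is what ends up in sev_pin_memory().  Roughly (a sketch, not QEMU's
actual code):

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Sketch: register a whole RAM block as encrypted.  KVM pins all of it
 * up front via sev_pin_memory(); nothing tells the host which parts
 * the guest will later use for DMA. */
static int sev_register_ram(int vm_fd, void *hva, size_t size)
{
        struct kvm_enc_region region = {
                .addr = (__u64)(unsigned long)hva,
                .size = size,
        };

        return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_REG_REGION, &region);
}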

>> The primary VM and the enclave VM(s) would each get a different memory
>> access file descriptor.  QEMU would treat them no differently from any
>> other externally-provided memory backend, say hugetlbfs or memfd, so
>> yeah they would be mmap-ed to userspace and the host virtual address
>> passed as usual to KVM.
> 
> Would the VM processes mmap() these descriptors, or would KVM learn
> how to handle that memory without it being mapped?

The idea is that the process mmaps them; QEMU would treat them just the
same as, say, a hugetlbfs file descriptor.
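
For example, like this (just a sketch: access_fd would be the memory
access descriptor from this series, and everything after the mmap() is
the usual KVM flow):

#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/kvm.h>

/* Sketch: a pidfd-mem access fd used exactly like hugetlbfs or memfd
 * would be.  access_fd is the access half of the fd pair. */
static int map_into_guest(int vm_fd, int access_fd,
                          __u64 guest_phys, size_t size)
{
        void *hva = mmap(NULL, size, PROT_READ | PROT_WRITE,
                         MAP_SHARED, access_fd, 0);
        if (hva == MAP_FAILED)
                return -1;

        struct kvm_userspace_memory_region mr = {
                .slot            = 0,
                .guest_phys_addr = guest_phys,
                .memory_size     = size,
                .userspace_addr  = (__u64)(unsigned long)hva,
        };

        return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &mr);
}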

>> The manager can decide at any time to hide some memory from the parent
>> VM (in order to give it to an enclave).  This would actually be done on
>> request of the parent VM itself [...] But QEMU is
>> untrusted, so the manager cannot rely on QEMU behaving well.  Hence the
>> privilege separation model that was implemented here.
> 
> How does this work?  Is there a revoke mechanism, or does the parent
> just munmap() the memory itself?

The parent has ioctls to add and remove memory from the pidfd-mem.  So
unmapping is just calling the ioctl that removes a range.
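
Roughly like this; the names and struct below are placeholders for what
patch 5/5 actually defines, I'm only showing the shape of the interface:

#include <sys/ioctl.h>
#include <linux/types.h>

/* Placeholders standing in for the control-fd interface of patch 5/5;
 * the real names, ioctl numbers and struct layout are in the patch. */
struct pidfd_mem_range {
        __u64 offset;           /* offset into the remote address space */
        __u64 size;
};
#define PIDFD_MEM_UNMAP _IOW('p', 0x01, struct pidfd_mem_range)

/* The manager revokes a range from the parent VM with an ioctl on the
 * control fd; no munmap() in the parent is involved. */
static int hide_from_parent(int control_fd, __u64 offset, __u64 size)
{
        struct pidfd_mem_range range = { .offset = offset, .size = size };

        return ioctl(control_fd, PIDFD_MEM_UNMAP, &range);
}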

>> So what you are suggesting is that KVM manages its own address space
>> instead of host virtual addresses (and with no relationship to host
>> virtual addresses, it would be just a "cookie")?
> 
> [...] For this pidfd-mem scheme in particular, it might avoid the nasty
> corner case I mentioned.  With pidfd-mem as in this patchset, I'm
> concerned about what happens when process A maps some process B
> memory, process B maps some of process A's memory, and there's a
> recursive mapping that results.  Or when a process maps its own
> memory, for that matter.
> 
> Or memfd could get fancier with operations to split memfds, remove
> pages from memfds, etc.  Maybe that's overkill.

Doing it directly with memfd is certainly an option, especially since
MFD_HUGE_* exists.  Basically you'd have a system call to create a
secondary view of the memfd, and the syscall interface could still be
very similar to what is in this patch, in particular the control/access
pair.  This could probably also be used to implement Matthew Wilcox's
ideas.
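
As a sketch (memfd_create() and MFD_HUGETLB exist today; memfd_view()
is a made-up name for the hypothetical "secondary view" syscall
returning the control/access pair):

#define _GNU_SOURCE
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical syscall: returns a control/access fd pair exposing a
 * revocable secondary view of the memfd, mirroring this patch set. */
extern int memfd_view(int memfd, int fds[2], unsigned int flags);

int main(void)
{
        int fds[2];     /* fds[0] = control, fds[1] = access */
        int backing = memfd_create("guest-ram", MFD_HUGETLB);
        /* an MFD_HUGE_* flag could additionally pick the page size */

        ftruncate(backing, 1UL << 30);          /* 1 GiB of guest RAM */
        return memfd_view(backing, fds, 0);
}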

I still believe that the pidfd-mem concept has merit as a
"capability-like" PTRACE_{PEEK,POKE}DATA replacement, but it would not
need any privilege separation or mmap support, only direct read/write.
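
That is, the fd itself is the capability, and remote access becomes a
plain pread()/pwrite() instead of a PTRACE_ATTACH plus one
PTRACE_PEEKDATA call per word.  A sketch; that the access fd would
interpret the file offset as the remote address is my assumption about
the interface:

#include <unistd.h>

/* Sketch: no attach/stop dance and no per-word calls; the access fd
 * carries the right to read the remote address space. */
static ssize_t peek_range(int access_fd, off_t remote_addr,
                          void *buf, size_t len)
{
        return pread(access_fd, buf, len, remote_addr);
}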

So there are two concepts mixed into one interface in this patch, with
two completely different use cases.  Merging them is clever, but
perhaps too clever.  I can say that since it was my idea. :D

Thanks,

Paolo



Thread overview: 44+ messages
2020-09-04 11:31 [RESEND RFC PATCH 0/5] Remote mapping Adalbert Lazăr
2020-09-04 11:31 ` [RESEND RFC PATCH 1/5] mm: add atomic capability to zap_details Adalbert Lazăr
2020-09-04 11:31 ` [RESEND RFC PATCH 2/5] mm: let the VMA decide how zap_pte_range() acts on mapped pages Adalbert Lazăr
2020-09-04 11:31 ` [RESEND RFC PATCH 3/5] mm/mmu_notifier: remove lockdep map, allow mmu notifier to be used in nested scenarios Adalbert Lazăr
2020-09-04 12:03   ` Jason Gunthorpe
2020-09-04 11:31 ` [RESEND RFC PATCH 4/5] mm/remote_mapping: use a pidfd to access memory belonging to unrelated process Adalbert Lazăr
2020-09-04 17:55   ` Oleg Nesterov
2020-09-07 14:30   ` Oleg Nesterov
2020-09-07 15:16     ` Adalbert Lazăr
2020-09-09  8:32     ` Mircea CIRJALIU - MELIU
2020-09-10 16:43       ` Oleg Nesterov
2020-09-07 15:02   ` Christian Brauner
2020-09-07 16:04     ` Mircea CIRJALIU - MELIU
2020-09-04 11:31 ` [RESEND RFC PATCH 5/5] pidfd_mem: implemented remote memory mapping system call Adalbert Lazăr
2020-09-04 19:18   ` Florian Weimer
2020-09-07 14:55   ` Christian Brauner
2020-09-04 12:11 ` [RESEND RFC PATCH 0/5] Remote mapping Jason Gunthorpe
2020-09-04 13:24   ` Mircea CIRJALIU - MELIU
2020-09-04 13:39     ` Jason Gunthorpe
2020-09-04 14:18       ` Mircea CIRJALIU - MELIU
2020-09-04 14:39         ` Jason Gunthorpe
2020-09-04 15:40           ` Mircea CIRJALIU - MELIU
2020-09-04 16:11             ` Jason Gunthorpe
2020-09-04 19:41   ` Matthew Wilcox
2020-09-04 19:49     ` Jason Gunthorpe
2020-09-04 20:08     ` Paolo Bonzini
2020-12-01 18:01     ` Jason Gunthorpe
2020-09-04 19:19 ` Florian Weimer
2020-09-04 20:18   ` Paolo Bonzini
2020-09-07  8:33     ` Christian Brauner
2020-09-04 19:39 ` Andy Lutomirski
2020-09-04 20:09   ` Paolo Bonzini
2020-09-04 20:34     ` Andy Lutomirski
2020-09-04 21:58       ` Paolo Bonzini
2020-09-04 23:17         ` Andy Lutomirski
2020-09-05 18:27           ` Paolo Bonzini [this message]
2020-09-07  8:38             ` Christian Brauner
2020-09-07 12:41           ` Mircea CIRJALIU - MELIU
2020-09-07  7:05         ` Christoph Hellwig
2020-09-07  8:44           ` Paolo Bonzini
2020-09-07 10:25   ` Mircea CIRJALIU - MELIU
2020-09-07 15:05 ` Christian Brauner
2020-09-07 20:43   ` Andy Lutomirski
2020-09-09 11:38     ` Stefan Hajnoczi
