Re: Understanding osdep_xenforeignmemory_map mmap behaviour

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Alex Bennée" <alex.bennee@linaro.org>
To: Juergen Gross <jgross@suse.com>
Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Stefano Stabellini <sstabellini@kernel.org>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
	"Stratos-dev@op-lists.linaro.org"
	<Stratos-dev@op-lists.linaro.org>,
	"mathieu.poirier@linaro.com" <mathieu.poirier@linaro.com>,
	"christopher.w.clark@gmail.com" <christopher.w.clark@gmail.com>,
	"boris.ostrovsky@oracle.com" <boris.ostrovsky@oracle.com>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	"vincent.guittot@linaro.org" <vincent.guittot@linaro.org>,
	"olekstysh@gmail.com" <olekstysh@gmail.com>
Subject: Re: Understanding osdep_xenforeignmemory_map mmap behaviour
Date: Wed, 24 Aug 2022 16:58:03 +0100	[thread overview]
Message-ID: <87y1vdh9or.fsf@linaro.org> (raw)
In-Reply-To: <edc33ca4-135e-111a-e198-38b9bae53005@suse.com>


Juergen Gross <jgross@suse.com> writes:

> [[PGP Signed Part:Undecided]]
> On 24.08.22 13:22, Alex Bennée wrote:
>> Andrew Cooper <Andrew.Cooper3@citrix.com> writes:
>> 
>>> On 24/08/2022 10:19, Viresh Kumar wrote:
>>>> On 24-03-22, 06:12, Juergen Gross wrote:
>>>>> For a rather long time we were using "normal" user pages for this purpose,
>>>>> which were just locked into memory for doing the hypercall.
>>>>>
>>>>> Unfortunately there have been very rare problems with that approach, as
>>>>> the Linux kernel can set a user page related PTE to invalid for short
>>>>> periods of time, which led to EFAULT in the hypervisor when trying to
>>>>> access the hypercall data.
>>>>>
>>>>> In Linux this can avoided only by using kernel memory, which is the
>>>>> reason why the hypercall buffers are allocated and mmap()-ed through the
>>>>> privcmd driver.
>>>> Hi Juergen,
>>>>
>>>> I understand why we moved from user pages to kernel pages, but I don't
>>>> fully understand why we need to make two separate calls to map the
>>>> guest memory, i.e. mmap() followed by ioctl(IOCTL_PRIVCMD_MMAPBATCH).
>>>>
>>>> Why aren't we doing all of it from mmap() itself ? I hacked it up to
>>>> check on it and it works fine if we do it all from mmap() itself.
>> As I understand it the MMAPBATCH ioctl is being treated like every
>> other
>> hypercall proxy through the ioctl interface. Which makes sense from the
>> point of view of having a consistent interface to the hypervisor but not
>> from point of view of providing a consistent userspace interface for
>> mapping memory which doesn't care about the hypervisor details.
>> The privcmd_mmapbatch_v2 interface is slightly richer than what you
>> could expose via mmap() because it allows the handling of partial
>> mappings with what I presume is a per-page *err array. If you issued the
>> hypercall directly from the mmap() and one of the pages wasn't mapped by
>> the hypervisor you would have to unwind everything before returning
>> EFAULT to the user.
>> 
>>>> Aren't we abusing the Linux userspace ABI here ? As standard userspace
>>>> code would expect just mmap() to be enough to map the memory. Yes, the
>>>> current user, Xen itself, is adapted to make two calls, but it breaks
>>>> as soon as we want to use something that relies on Linux userspace
>>>> ABI.
>>>>
>>>> For instance, in our case, where we are looking to create
>>>> hypervisor-agnostic virtio backends, the rust-vmm library [1] issues
>>>> mmap() only and expects it to work. It doesn't know it is running on a
>>>> Xen system, and it shouldn't know that as well.
>>>
>>> Use /dev/xen/hypercall which has a sane ABI for getting "safe" memory.
>>> privcmd is very much not sane.
>>>
>>> In practice you'll need to use both.  /dev/xen/hypercall for getting
>>> "safe" memory, and /dev/xen/privcmd for issuing hypercalls for now.
>> I'm unsure what is meant by safe memory here. privcmd_buf_mmap()
>> looks
>> like it just allocates a bunch of GFP_KERNEL pages rather than
>> interacting with the hypervisor directly. Are these the same pages that
>> get used when you eventually call privcmd_ioctl_mmap_batch()?
>
> privcmd_buf_mmap() is allocating kernel pages which are used for data being
> accessed by the hypervisor when doing the hypercall later. This is a generic
> interface being used for all hypercalls, not only for
> privcmd_ioctl_mmap_batch().
>
>> The fact that /dev/xen/hypercall is specified by xen_privcmdbuf_dev is a
>> little confusing TBH.
>> Anyway the goal here is to provide a non-xen aware userspace with
>> standard userspace API to access the guests memory. Perhaps messing
>
> This is what the Xen related libraries are meant for. Your decision to
> ignore those is firing back now.

We didn't ignore them - the initial version of the xen-vhost-master
binary was built with the rust and linking to the Xen libraries. We are
however in the process of moving to more pure rust (with the xen-sys
crate being a pure rust ioctl/hypercall wrapper).

However I was under the impression there where two classes of
hypercalls. ABI stable ones which won't change (which is all we are
planning to implement for xen-sys) and non-stable ABIs which would need
mediating by the xen libs. We are hoping we can do all of VirtIO with
just the stable ABI.

>> around with the semantics of the /dev/xen/[hypercall|privcmd] devices
>> nodes is too confusing.
>> Maybe we could instead:
>>   1. Have the Xen aware VMM ask to make the guests memory visible to
>> the
>>      host kernels address space.
>
> Urgh. This would be a major breach of the Xen security concept.
>
>>   2. When this is done explicitly create a device node to represent it (/dev/xen/dom-%d-mem?)
>>   3. Pass this new device to the non-Xen aware userspace which uses the
>>      standard mmap() call to make the kernel pages visible to userspace
>> Does that make sense?
>
> Maybe from your point of view, but not from the Xen architectural point
> of view IMHO. You are removing basically the main security advantages of
> Xen by generating a kernel interface for mapping arbitrary guest memory
> easily.

We are not talking about doing an end-run around the Xen architecture.
The guest still has to instruct the hypervisor to grant access to its
memory. Currently this is a global thing (i.e. whole address space or
nothing) but obviously more fine grained grants can be done on a
transaction by transaction basis although we are exploring more
efficient mechanisms for this (shared pools and carve outs).

This does raise questions for the mmap interface though - each
individually granted region would need to be mapped into the dom0
userspace virtual address space or perhaps a new flag for mmap() so we
can map the whole address space but expect SIGBUS faults if we access
something that hasn't been granted.

>
>
> Juergen
>
> [2. OpenPGP public key --- application/pgp-keys; OpenPGP_0xB0DE9DD628BF132F.asc]...
>
> [[End of PGP Signed Part]]


-- 
Alex Bennée

next prev parent reply	other threads:[~2022-08-24 16:46 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-24  1:42 Understanding osdep_xenforeignmemory_map mmap behaviour Stefano Stabellini
2022-03-24  5:12 ` Juergen Gross
2022-03-25 16:07   ` Alex Bennée
2022-03-28  6:26     ` Juergen Gross
2022-08-24  9:19   ` Viresh Kumar
2022-08-24  9:44     ` Andrew Cooper
2022-08-24 11:22       ` Alex Bennée
2022-08-24 13:07         ` Juergen Gross
2022-08-24 15:58           ` Alex Bennée [this message]
2022-08-25  6:02             ` Juergen Gross
2022-08-31 16:02               ` Alex Bennée
2022-09-01  6:28                 ` Juergen Gross
2022-08-24 10:09     ` Juergen Gross
2022-08-24 11:47       ` Alex Bennée
2022-08-24 13:14         ` Juergen Gross

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87y1vdh9or.fsf@linaro.org \
    --to=alex.bennee@linaro.org \
    --cc=Andrew.Cooper3@citrix.com \
    --cc=Stratos-dev@op-lists.linaro.org \
    --cc=boris.ostrovsky@oracle.com \
    --cc=christopher.w.clark@gmail.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jgross@suse.com \
    --cc=mathieu.poirier@linaro.com \
    --cc=olekstysh@gmail.com \
    --cc=sstabellini@kernel.org \
    --cc=vincent.guittot@linaro.org \
    --cc=viresh.kumar@linaro.org \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.