From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>,
	xen-devel@lists.xensource.com, Tim Deegan <tim@xen.org>
Subject: Re: [Hackathon minutes] PV network improvements
Date: Tue, 21 May 2013 08:52:09 -0400	[thread overview]
Message-ID: <20130521125209.GA492@phenom.dumpdata.com> (raw)
In-Reply-To: <alpine.DEB.2.02.1305211139170.4799@kaball.uk.xensource.com>

On Tue, May 21, 2013 at 11:51:03AM +0100, Stefano Stabellini wrote:
> On Tue, 21 May 2013, Tim Deegan wrote:
> > At 19:31 +0100 on 20 May (1369078279), Wei Liu wrote:
> > > On Mon, May 20, 2013 at 03:08:05PM +0100, Stefano Stabellini wrote:
> > > > J) Map the whole physical memory of the machine in dom0
> > > > If mapping/unmapping or copying slows us down, could we just keep the
> > > > whole physical memory of the machine mapped in dom0 (with corresponding
> > > > IOMMU entries)?
> > > > At that point the frontend could just pass mfn numbers to the backend,
> > > > and the backend would already have them mapped.
> > > > From a security perspective it doesn't change anything when running
> > > > the backend in dom0, because dom0 is already capable of mapping random
> > > > pages of any guests. QEMU instances do that all the time.
> > > > But it would take away one of the benefits of deploying driver domains:
> > > > we wouldn't be able to run the backends at a lower privilege level.
> > > > However it might still be worth considering as an option? The backend is
> > > > still trusted and protected from the frontend, but the frontend wouldn't
> > > > be protected from the backend.
> > > > 
> > > 
> > > I think Dom0 mapping all machine memory is a good starting point.
> > 
> > I _strongly_ disagree.  The opportunity for disaggregation and reduction
> > of privilege in backends is probably Xen's biggest technical advantage
> > and we should not be taking any backward steps there.
> 
> While I agree with you, as a matter of fact the vast majority of Xen
> installations today do not use driver domains. That hasn't stopped them
> from enjoying Xen so far. Moreover, the frontend/backend interface
> remains narrow and difficult to exploit; it's not a fully emulated
> interface (AHCI / virtio). The backend is still protected from the
> frontend. Having the backend run non-privileged is a great bonus
> and certainly required for a product that allows the user to install
> third-party driver domains. However, if the driver domains are "trusted"
> then I think they can also be trusted with a full memory map. After all,
> that has been the case for all XenServer, OVM and SLES releases so far,
> AFAIK.
> 
> A hypothetical future Xen release could offer either increased security
> (driver domains) or increased IO performance (backends with a full
> physical memory map) and give the user a choice between the two. I am
> pretty sure that a non-negligible number of people would make the
> conscious choice to go for the performance option.
> Why should we be the ones to force security down their throats?
> After all, it's all about what the users want from the project.
> 
> Obviously in an ideal world we would be able to offer both at the same
> time, and maybe George's proposal is exactly what is going to achieve
> that. But I was describing the case that requires us to make a choice.

CC-ing Mukesh here as driver domains have some relevance to PVH work.
Please also CC Malcolm here (I don't have his email).

I would say that perhaps a better option is to do both: retain
the security architecture Xen has _and_ also provide increased IO performance.

Concurrently, everybody is also looking at both backend and frontend having a
persistent pool of grants. This means we set up a "window" from either
backend -> frontend or vice versa that persists; said "window" stays in place
for the lifetime of the guest. For networking, the kernel stack already
copies the pages from user space into the kernel, and copying
within the kernel to specific pages mostly hits the CPU cache. We need to
exploit that and also make sure that the path is not interrupted.
The grant mapping on the TX side also looks like a nice path; we just have to
make sure that the networking API doesn't try to free the page once the TX
has been done (and this is where Ian's skb destructor would be beneficial).
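
To make the persistent pool idea concrete, here is a minimal sketch of how a
frontend might grant a fixed set of pages to its backend once at connect time
and then reuse them for every request, so no map/unmap (and no TLB flush)
sits in the data path. The pool size and helper name are made up for
illustration; gnttab_grant_foreign_access() is the real Linux API, the rest
is an assumption about how such a pool could be wired up.

  #include <linux/gfp.h>
  #include <linux/mm.h>
  #include <xen/grant_table.h>
  #include <xen/page.h>

  #define PERSISTENT_POOL_SIZE 256	/* arbitrary size, just for the sketch */

  struct persistent_gnt {
  	struct page *page;
  	grant_ref_t ref;
  };

  static struct persistent_gnt pool[PERSISTENT_POOL_SIZE];

  /* Grant a pool of pages to the backend once; the grants stay active for
   * the lifetime of the connection and data is copied in/out of them
   * instead of granting each individual skb page. (Error unwinding omitted.) */
  static int setup_persistent_pool(domid_t backend_domid)
  {
  	int i, ref;

  	for (i = 0; i < PERSISTENT_POOL_SIZE; i++) {
  		pool[i].page = alloc_page(GFP_KERNEL);
  		if (!pool[i].page)
  			return -ENOMEM;
  		ref = gnttab_grant_foreign_access(backend_domid,
  				pfn_to_mfn(page_to_pfn(pool[i].page)),
  				0 /* read-write */);
  		if (ref < 0)
  			return ref;
  		pool[i].ref = ref;
  	}
  	return 0;
  }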

For block it is a bit different, as AIOs are mapped from kernel to
user space. But the neat thing there is that there is no need to inspect
the data when handing it to the DMA device (the exception is DIF/DIX, which
needs to calculate checksums). That is, unless one needs to do the
xen_biovec_phys_mergeable check (to see whether the next page is contiguous
and, if so, add new bios and copy the data in).
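
For reference, a simplified illustration of what that contiguity check boils
down to (the helper name below is mine; the in-tree xen_biovec_phys_mergeable()
in drivers/xen/biomerge.c has the same shape): two biovecs may only be merged
into one DMA segment if the machine frames behind them are adjacent, because
the device addresses machine memory, not the guest's pseudo-physical memory.

  #include <linux/bio.h>
  #include <xen/page.h>

  /* Illustration only: can these two biovecs be merged into a single
   * segment? Under Xen, adjacent pseudo-physical pages are not
   * necessarily adjacent in machine memory, so the MFNs must be checked. */
  static bool biovec_mfns_contiguous(const struct bio_vec *vec1,
  				   const struct bio_vec *vec2)
  {
  	unsigned long mfn1 = pfn_to_mfn(page_to_pfn(vec1->bv_page));
  	unsigned long mfn2 = pfn_to_mfn(page_to_pfn(vec2->bv_page));

  	return (mfn1 == mfn2) || (mfn1 + 1 == mfn2);
  }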

But with PVH and PVHVM driver domains, and also by piggybacking on the work
that Malcolm is doing (Xen IOMMU), we can skip that check (as the PFNs
for the guest would look contiguous).

In essence we can do a lot:
 1). Not copy or map grants at all if we detect that they are going to
     a DMA device.
 2). 1) above, plus use the Xen IOMMU to take care of setting the
     proper EPT entries for the pages that we need. This could be done
     as part of a grant_copy or grant_light_mapping in the hypervisor, for
     the case where we MUST copy some pages into the other domain (say the
     Ethernet header). Whether a copy or a light mapping is better is an
     open question: the moment the device has done the DMA operation on the
     granted page we might as well remove the mapping, hence a "light" or
     maybe "expiring" grant. (A hypothetical interface for this is sketched
     after this list.)
 3). 2) above, plus Intel QuickData (a DMA engine that uses the same
     L3 cache that PCI devices use) to keep the copied pages in the L3.
     This has the benefit that when the PCI device is instructed to
     fetch the data, it does so from the L3 cache and is incredibly quick.
     This would use grant_copy, but instead of the hypervisor doing the
     copy, it would instruct the Intel QuickData engine to do it. It would
     require some form of asynchronous grant_copy mechanism.
 4). Variants of the above.
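
Purely as a thought experiment, the "light"/"expiring" asynchronous copy in
2) and 3) might look something like the sketch below. It is modelled loosely
on the existing GNTTABOP_copy (struct gnttab_copy in Xen's public
grant_table.h), but nothing here exists in the hypervisor today; the
structure name, fields and completion-by-event-channel scheme are all
invented for illustration.

  #include <stdint.h>

  typedef uint32_t grant_ref_t;
  typedef uint16_t domid_t;

  /* Hypothetical request for an asynchronous, possibly offloaded copy of a
   * granted page. The hypervisor (or a DMA engine such as Intel QuickData
   * acting on its behalf) performs the copy and then signals
   * completion_evtchn, instead of the caller blocking in the hypercall.
   * Any implicit mapping set up for the copy is torn down as soon as the
   * copy completes; hence the "expiring" part. */
  struct gnttab_copy_async {                /* hypothetical, does not exist */
      /* Source: a page granted by the other domain. */
      grant_ref_t src_ref;
      domid_t     src_domid;
      uint16_t    src_offset;
      /* Destination: a local guest frame. */
      uint64_t    dst_gmfn;
      uint16_t    dst_offset;
      uint16_t    len;
      /* Event channel to signal when the copy has finished. */
      uint32_t    completion_evtchn;
      /* Filled in by the hypervisor on completion. */
      int16_t     status;
  };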
