Re: [PATCH] Enable non page boundary BAR device assignment

From: "Michael S. Tsirkin" <mst@redhat.com>
To: Alexander Graf <agraf@suse.de>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>,
	"kvm@vger.kernel.org list" <kvm@vger.kernel.org>
Subject: Re: [PATCH] Enable non page boundary BAR device assignment
Date: Thu, 10 Dec 2009 12:42:03 +0200
Message-ID: <20091210104202.GD11028@redhat.com>
In-Reply-To: <EC3CA0CF-C0FA-429B-B348-28265FFD9E9F@suse.de>

On Thu, Dec 10, 2009 at 11:31:54AM +0100, Alexander Graf wrote:
> 
> On 10.12.2009, at 11:27, Michael S. Tsirkin wrote:
> 
> > On Thu, Dec 10, 2009 at 11:08:58AM +0100, Alexander Graf wrote:
> >> 
> >> On 10.12.2009, at 10:52, Alexander Graf wrote:
> >> 
> >>> 
> >>> On 10.12.2009, at 10:43, Michael S. Tsirkin wrote:
> >>> 
> >>>> On Thu, Dec 10, 2009 at 07:16:04AM +0200, Muli Ben-Yehuda wrote:
> >>>>> On Wed, Dec 09, 2009 at 06:38:54PM +0100, Alexander Graf wrote:
> >>>>> 
> >>>>>> While trying to get device passthrough working with an Emulex HBA,
> >>>>>> kvm refused to pass it through because it has a BAR of 256 bytes:
> >>>>>> 
> >>>>>>      Region 0: Memory at d2100000 (64-bit, non-prefetchable) [size=4K]
> >>>>>>      Region 2: Memory at d2101000 (64-bit, non-prefetchable) [size=256]
> >>>>>>      Region 4: I/O ports at b100 [size=256]
> >>>>>> 
> >>>>>> Since the page boundary is an arbitrary optimization to allow 1:1
> >>>>>> mapping of physical to virtual addresses, we can still take the old
> >>>>>> MMIO callback route.
> >>>>>> 
> >>>>>> So let's add a second code path that allows for regions whose size
> >>>>>> satisfies (size & 0xFFF) != 0 by routing their accesses through userspace.
> >>>>> 
> >>>>> That makes sense in general *but* the 4K-aligned check isn't just an
> >>>>> optimization, it also has a security implication. Consider the
> >>>>> theoretical case where a multi-function device has BARs for two
> >>>>> functions on the same page (within a 4K boundary), and each function
> >>>>> is assigned to a different guest. With your current patch both guests
> >>>>> will be able to write to each other's BARs. Another case is where a
> >>>>> device has a bug and you must not write beyond the BAR or Bad Things
> >>>>> Happen. With this patch an *unprivileged* guest could exploit that bug
> >>>>> and make bad things happen.
> >>>>> 
> >>>>> This can be fixed if the slow userspace mmio path checks that all MMIO
> >>>>> accesses by a guest fall within the portion of the page that is
> >>>>> assigned to it.
> >>>> 
> >>>> This patch seems to implement range checks correctly,
> >>>> let me know if I am missing something.
> >>>> 
> >>>> One also notes that we currently link qemu with libpci
> >>>> which I think requires admin cap to work.
> >>>> However, in the future we might extend this to
> >>>> also support getting device fds over a unix socket
> >>>> from a more privileged process.
> >>>> 
> >>>> If or when this is done, we will have to be extra careful
> >>>> when passing a device file descriptor to an unprivileged
> >>>> qemu process if the BARs are less than a full page in size:
> >>>> mapping such a BAR will allow qemu access outside this BAR.
> >>>> 
> >>>> A possible solution to this problem
> >>>> if/when it arises would be adding yet another sysfs file
> >>>> for each resource, which would allow read/write but not
> >>>> mmap access, and perform range checks in the kernel.
> >>> 
> >>> Sounds like the best solution to this problem, yeah. Though we'd only need those for non-page-boundary BARs. So I guess the best would be to always export them from the kernel, but only use them when the BAR size is not a multiple of the page size, i.e. when size & (PAGE_SIZE-1) is non-zero.
> >> 
> >> Hm, or add read/write fd functions that always do boundary checks to the existing interface and only allow mmap when the size is a multiple of the page size, i.e. !(size & (PAGE_SIZE-1)). Or only allow non-aligned mmap when the admin cap is present.
> >> 
> >> Alex
> > 
> > This might break existing applications.
> > We don't want that.
> 
> Well currently you can't mmap the resource at all without at least r/w rights on the file, right?

You could have dropped the cap or got the fd from another
process.

> But yeah, we'd probably change behavior that could break someone - sigh.
> 
> Alex
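
For illustration only, and not taken from the patch under discussion: a
minimal sketch of the two checks this thread keeps coming back to. The
helper names and the fixed 4K page size are assumptions made for the
example; a real kernel-side read/write path would use PAGE_SIZE and the
resource length of the BAR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size for the sketch; the kernel would use PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Hypothetical helper: a BAR can be handed out via mmap without exposing
 * its neighbours only if it starts on a page boundary and covers whole
 * pages.
 */
static bool bar_is_page_granular(uint64_t base, uint64_t size)
{
    return size != 0 &&
           (base & (SKETCH_PAGE_SIZE - 1)) == 0 &&
           (size & (SKETCH_PAGE_SIZE - 1)) == 0;
}

/*
 * Hypothetical helper: range check for the slow read/write path.
 * Written as "len <= bar_size - offset" so that offset + len cannot
 * overflow and wrap back into the valid range.
 */
static bool bar_access_ok(uint64_t bar_size, uint64_t offset, uint64_t len)
{
    return len != 0 && offset < bar_size && len <= bar_size - offset;
}
```

With the 256-byte Region 2 from the original report, a 4-byte access at
offset 252 passes the check, while one at offset 253 is rejected, and the
region itself is not page granular, so it would go through the checked
read/write path rather than mmap.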
