* [Qemu-devel] PCI access virtualization
@ 2006-01-05 10:14 Michael Renzmann
  2006-01-05 14:25 ` Paul Brook
  2006-01-05 16:25 ` Mark Williamson
  0 siblings, 2 replies; 9+ messages in thread
From: Michael Renzmann @ 2006-01-05 10:14 UTC (permalink / raw)
  To: qemu-devel

Hi all.

Disclaimer: I'm quite new to things like virtual machines, and my
knowledge of QEMU is half-baked at best. I'm aware that my idea might be
absurd, but it's still worth a try :)

I'm wondering whether it would be possible to modify QEMU so that VMs
could access PCI devices on the host system, and, if it is possible at
all, how much work that would be.

Background: I'm one of the developers of MadWifi, a Linux driver for
WLAN cards based on Atheros chipsets. QEMU could be a great help in
testing distribution-specific issues, as well as issues related to SMP
operation. The only downside is that the necessary WLAN hardware can't
easily be emulated.
Now, if a VM could access physical PCI devices on the host, it could use
the real, non-emulated hardware for this purpose. Better still, thanks to
the SMP emulation that recently went into QEMU, it would be possible to
test the driver's compatibility with SMP systems, even if the host is
just a UP system.

I vaguely heard of a feature in Xen that allows assigning PCI devices
to one of the guests. I understand Xen works differently from QEMU, but
maybe it would be possible to implement something similar.

Any comments appreciated, thanks in advance.

Bye, Mike


* Re: [Qemu-devel] PCI access virtualization
  2006-01-05 10:14 [Qemu-devel] PCI access virtualization Michael Renzmann
@ 2006-01-05 14:25 ` Paul Brook
  2006-01-05 17:40   ` Mark Williamson
  2006-01-05 16:25 ` Mark Williamson
  1 sibling, 1 reply; 9+ messages in thread
From: Paul Brook @ 2006-01-05 14:25 UTC (permalink / raw)
  To: qemu-devel, mrenzmann

> Background: I'm one of the developers of MadWifi, a Linux driver for
> WLAN cards based on Atheros chipsets. QEMU could be a great help in
> testing distribution-specific issues, as well as issues related to SMP
> operation. The only downside is that the necessary WLAN hardware can't
> easily be emulated.
> Now, if a VM could access physical PCI devices on the host, it could use
> the real, non-emulated hardware for this purpose. Better still, thanks to
> the SMP emulation that recently went into QEMU, it would be possible to
> test the driver's compatibility with SMP systems, even if the host is
> just a UP system.
>
There are two problems:

- IRQ sharing. Sharing host IRQs between native and virtualized devices is 
hard because the host needs to ack the interrupt in the IRQ handler, but 
doesn't really know how to do that until after it's run the guest to see what 
that does.
- DMA. qemu needs to rewrite DMA requests (in both directions) because the 
guest physical memory won't be at the same address on the host. Unlike ISA, 
where there's a fixed DMA engine, I don't think there's any general way of 
doing this for PCI devices.

There are patches that allow virtualization of PCI devices that don't use 
either of the above features. It's sufficient to get some network cards 
working, but that's about it.

> I vaguely heard of a feature in Xen that allows assigning PCI devices
> to one of the guests. I understand Xen works differently from QEMU, but
> maybe it would be possible to implement something similar.

Xen is much easier because it cooperates with the host system (ie. xen), so 
both the above problems can be solved by tweaking the guest OS drivers/PCI 
subsystem setup.

If you're testing specific drivers you could probably augment these drivers to 
pass the extra required information to qemu. ie. effectively use a special 
qemu pseudo-PCI interface rather than the normal piix PCI interface.

Paul


* Re: [Qemu-devel] PCI access virtualization
  2006-01-05 10:14 [Qemu-devel] PCI access virtualization Michael Renzmann
  2006-01-05 14:25 ` Paul Brook
@ 2006-01-05 16:25 ` Mark Williamson
  1 sibling, 0 replies; 9+ messages in thread
From: Mark Williamson @ 2006-01-05 16:25 UTC (permalink / raw)
  To: qemu-devel, mrenzmann

> I'm wondering whether it would be possible to modify QEMU so that VMs
> could access PCI devices on the host system, and, if it is possible at
> all, how much work that would be.

Sounds doable but would require a fair bit of hacking - I think the idea of 
doing this under QEmu came up once before in the 2004/2005 timeframe.

For the record, Intel also implemented emulation + PCI passthrough for their 
SoftSDV project, but that also included extra hardware to allow the virtual 
machine to control an entire PCI bus, not just one device.  (The goal here 
was to have a working IA64 "hardware" platform before IA64 silicon was 
available.)

> I vaguely heard of a feature in Xen that allows assigning PCI devices
> to one of the guests. I understand Xen works differently from QEMU, but
> maybe it would be possible to implement something similar.

Yep, Xen 2.0 supports this.  Support is currently broken in Xen 3.0 (because 
PCI is handled differently) but will be coming back once somebody decides 
what approach we should use and implements it.

Cheers,
Mark

>
> Any comments appreciated, thanks in advance.
>
> Bye, Mike


* Re: [Qemu-devel] PCI access virtualization
  2006-01-05 14:25 ` Paul Brook
@ 2006-01-05 17:40   ` Mark Williamson
  2006-01-05 18:10     ` Paul Brook
  0 siblings, 1 reply; 9+ messages in thread
From: Mark Williamson @ 2006-01-05 17:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paul Brook

> - IRQ sharing. Sharing host IRQs between native and virtualized devices is
> hard because the host needs to ack the interrupt in the IRQ handler, but
> doesn't really know how to do that until after it's run the guest to see
> what that does.

Could maybe have the (inevitable) kernel portion of the code grab the 
interrupt, and not ack it until userspace does an ioctl on a special file (or 
something like that?).  There are patches floating around for userspace IRQ 
handling, so I guess that could work.
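
Something along these lines, perhaps - only a rough sketch, with every name 
in it (the "ptirq" device, the ioctl numbers, the module parameter) invented 
for illustration, and the exact handler/ioctl signatures varying between 
kernel versions:

/*
 * Rough sketch only: a stub kernel module that grabs the host IRQ line,
 * masks it, and lets userspace (qemu) decide when it may be unmasked
 * again.  All identifiers here are made up for illustration.
 */
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/interrupt.h>
#include <linux/miscdevice.h>
#include <linux/fs.h>
#include <linux/ioctl.h>
#include <linux/wait.h>
#include <linux/atomic.h>

#define PTIRQ_WAIT _IO('p', 1)	/* block until the host IRQ fires    */
#define PTIRQ_ACK  _IO('p', 2)	/* guest has acked the card: unmask  */

static int irq = -1;		/* host IRQ line to pass through */
module_param(irq, int, 0444);

static DECLARE_WAIT_QUEUE_HEAD(ptirq_wq);
static atomic_t ptirq_pending = ATOMIC_INIT(0);

static irqreturn_t ptirq_handler(int irqno, void *dev_id)
{
	disable_irq_nosync(irqno);	/* keep the line quiet while the guest runs */
	atomic_set(&ptirq_pending, 1);
	wake_up_interruptible(&ptirq_wq);
	return IRQ_HANDLED;
}

static long ptirq_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
{
	switch (cmd) {
	case PTIRQ_WAIT:	/* qemu sleeps here, then injects the IRQ into the guest */
		return wait_event_interruptible(ptirq_wq,
				atomic_xchg(&ptirq_pending, 0));
	case PTIRQ_ACK:		/* guest driver has acked the device: unmask the host line */
		enable_irq(irq);
		return 0;
	}
	return -ENOTTY;
}

static const struct file_operations ptirq_fops = {
	.owner		= THIS_MODULE,
	.unlocked_ioctl	= ptirq_ioctl,
};

static struct miscdevice ptirq_dev = {
	.minor	= MISC_DYNAMIC_MINOR,
	.name	= "ptirq",
	.fops	= &ptirq_fops,
};

static int __init ptirq_init(void)
{
	int ret;

	if (irq < 0)
		return -EINVAL;
	ret = misc_register(&ptirq_dev);
	if (ret)
		return ret;
	/* Assumes the IRQ is dedicated to the passed-through card;
	 * sharing it with host devices is exactly the hard part. */
	ret = request_irq(irq, ptirq_handler, 0, "ptirq", &ptirq_dev);
	if (ret)
		misc_deregister(&ptirq_dev);
	return ret;
}

static void __exit ptirq_exit(void)
{
	free_irq(irq, &ptirq_dev);
	misc_deregister(&ptirq_dev);
}

module_init(ptirq_init);
module_exit(ptirq_exit);
MODULE_LICENSE("GPL");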

> - DMA. qemu needs to rewrite DMA requests (in both directions) because the
> guest physical memory won't be at the same address on the host. Unlike ISA,
> where there's a fixed DMA engine, I don't think there's any general way of
> doing this for PCI devices.

I was under the impression that you could get reasonably far by emulating a 
few of the most popular commercial DMA engine chips and reissuing 
address-corrected commands to the host.  I'm not sure how common it is for 
PCI cards to use custom DMA chips instead, though...

> There are patches that allow virtualization of PCI devices that don't use
> either of the above features. It's sufficient to get some network cards
> working, but that's about it.

I guess PIO should be easy to get working in any case.

> > I vaguely heard of a feature in Xen that allows assigning PCI devices
> > to one of the guests. I understand Xen works differently from QEMU, but
> > maybe it would be possible to implement something similar.
>
> Xen is much easier because it cooperates with the host system (ie. xen), so
> both the above problems can be solved by tweaking the guest OS drivers/PCI
> subsystem setup.

Yep, XenLinux redefines various macros that were already present to do 
guest-physical <-> host-physical address translations, so DMA Just Works 
(TM).

> If you're testing specific drivers you could probably augment these drivers
> to pass the extra required information to qemu. ie. effectively use a
> special qemu pseudo-PCI interface rather than the normal piix PCI
> interface.

How about something like this?:
I'd imagine you could get away with a special header file with different macro 
defines (as for Xen, above), just in the driver in question, and a special 
"translation device / service" available to the QEmu virtual machine - could 
be as simple as "write the guest physical address to an IO port, returns the 
real physical address on next read".  The virt_to_bus (etc) macros would use 
the translation service to perform the appropriate translation at runtime.
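
For concreteness, the guest-side half of that could be as small as the 
following - the port number, the 32-bit-only protocol and the 
qemu_virt_to_bus() name are all pure invention, nothing like this exists 
in qemu today:

/*
 * Hypothetical guest-side replacement for virt_to_bus() in a
 * passthrough-aware driver build.  qemu would have to implement the
 * pseudo-device behind QEMU_XLATE_PORT; both the port number and the
 * protocol are made up here.
 */
#include <linux/types.h>
#include <asm/io.h>

#define QEMU_XLATE_PORT 0x510	/* invented I/O port for the translation device */

static inline dma_addr_t qemu_virt_to_bus(void *vaddr)
{
	u32 guest_phys = (u32)virt_to_phys(vaddr);

	/* Write the guest-physical address; the next read returns the
	 * host-physical address the real card must be programmed with. */
	outl(guest_phys, QEMU_XLATE_PORT);
	return inl(QEMU_XLATE_PORT);
}

The special header you mention would then #define virt_to_bus() and friends 
to this helper for the passthrough build.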

Cheers,
Mark


* Re: [Qemu-devel] PCI access virtualization
  2006-01-05 17:40   ` Mark Williamson
@ 2006-01-05 18:10     ` Paul Brook
  2006-01-05 21:13       ` Jim C. Brown
  2006-01-06  1:54       ` Mark Williamson
  0 siblings, 2 replies; 9+ messages in thread
From: Paul Brook @ 2006-01-05 18:10 UTC (permalink / raw)
  To: Mark Williamson; +Cc: qemu-devel

On Thursday 05 January 2006 17:40, Mark Williamson wrote:
> > - IRQ sharing. Sharing host IRQs between native and virtualized devices
> > is hard because the host needs to ack the interrupt in the IRQ handler,
> > but doesn't really know how to do that until after it's run the guest to
> > see what that does.
>
> Could maybe have the (inevitable) kernel portion of the code grab the
> interrupt, and not ack it until userspace does an ioctl on a special file
> (or something like that?).  There are patches floating around for userspace
> IRQ handling, so I guess that could work.

This still requires cooperation from both sides (ie. both the host and guest 
drivers).

> > - DMA. qemu needs to rewrite DMA requests (in both directions) because
> > the guest physical memory won't be at the same address on the host.
> > Unlike ISA, where there's a fixed DMA engine, I don't think there's any
> > general way of doing this for PCI devices.
>
> I was under the impression that you could get reasonably far by emulating a
> few of the most popular commercial DMA engine chips and reissuing
> address-corrected commands to the host.  I'm not sure how common it is for
> PCI cards to use custom DMA chips instead, though...

IIUC PCI cards don't really have "DMA engines" as such. The PCI bridge just 
maps PCI address space onto physical memory. A busmaster PCI device can then 
make arbitrary accesses whenever it wants. I expect the default mapping is a 
1:1 mapping of the first 4G of physical RAM.

> > There are patches that allow virtualization of PCI devices that don't use
> > either of the above features. It's sufficient to get some network cards
> > working, but that's about it.
>
> I guess PIO should be easy to get working in any case.

Yes.

> > > I vaguely heard of a feature in Xen that allows assigning PCI devices
> > > to one of the guests. I understand Xen works differently from QEMU, but
> > > maybe it would be possible to implement something similar.
> >
> > Xen is much easier because it cooperates with the host system (ie. xen),
> > so both the above problems can be solved by tweaking the guest OS
> > drivers/PCI subsystem setup.
>
> Yep, XenLinux redefines various macros that were already present to do
> guest-physical <-> host-physical address translations, so DMA Just Works
> (TM).
>
> > If you're testing specific drivers you could probably augment these
> > drivers to pass the extra required information to qemu. ie. effectively
> > use a special qemu pseudo-PCI interface rather than the normal piix PCI
> > interface.
>
> How about something like this?:
> I'd imagine you could get away with a special header file with different
> macro defines (as for Xen, above), just in the driver in question, and a
> special "translation device / service" available to the QEmu virtual
> machine - could be as simple as "write the guest physical address to an IO
> port, returns the real physical address on next read".  The virt_to_bus
> (etc) macros would use the translation service to perform the appropriate
> translation at runtime.

That's exactly the sort of thing I meant. Ideally you'd just implement it as a 
different type of PCI bridge, and everything would just work. I don't know if 
Linux supports such heterogeneous configurations though.

Paul


* Re: [Qemu-devel] PCI access virtualization
  2006-01-05 18:10     ` Paul Brook
@ 2006-01-05 21:13       ` Jim C. Brown
  2006-01-06  1:54       ` Mark Williamson
  1 sibling, 0 replies; 9+ messages in thread
From: Jim C. Brown @ 2006-01-05 21:13 UTC (permalink / raw)
  To: Paul Brook; +Cc: qemu-devel

On Thu, Jan 05, 2006 at 06:10:54PM +0000, Paul Brook wrote:
> On Thursday 05 January 2006 17:40, Mark Williamson wrote:
> > > - IRQ sharing. Sharing host IRQs between native and virtualized devices
> > > is hard because the host needs to ack the interrupt in the IRQ handler,
> > > but doesn't really know how to do that until after it's run the guest to
> > > see what that does.
> >
> > Could maybe have the (inevitable) kernel portion of the code grab the
> > interrupt, and not ack it until userspace does an ioctl on a special file
> > (or something like that?).  There are patches floating around for userspace
> > IRQ handling, so I guess that could work.
> 
> This still requires cooperation from both sides (ie. both the host and guest 
> drivers).
> 

This would be a lot easier if Linux supported user-space device drivers...
and I believe work is being done in that area.

-- 
Infinite complexity begets infinite beauty.
Infinite precision begets infinite perfection.


* Re: [Qemu-devel] PCI access virtualization
  2006-01-05 18:10     ` Paul Brook
  2006-01-05 21:13       ` Jim C. Brown
@ 2006-01-06  1:54       ` Mark Williamson
  2006-01-06 13:27         ` Paul Brook
  1 sibling, 1 reply; 9+ messages in thread
From: Mark Williamson @ 2006-01-06  1:54 UTC (permalink / raw)
  To: Paul Brook; +Cc: qemu-devel

> > Could maybe have the (inevitable) kernel portion of the code grab the
> > interrupt, and not ack it until userspace does an ioctl on a special file
> > (or something like that?).  There are patches floating around for
> > userspace IRQ handling, so I guess that could work.
>
> This still requires cooperation from both sides (ie. both the host and
> guest drivers).

True.

> IIUC PCI cards don't really have "DMA engines" as such. The PCI bridge just
> maps PCI address space onto physical memory. A busmaster PCI device can
> then make arbitrary accesses whenever it wants. I expect the default mapping
> is a 1:1 mapping of the first 4G of physical RAM.

I was under the impression that the on-card bus-mastering engines were often 
one of a few standard designs - this was suggested to me by someone more 
knowledgeable than myself, but I must admit I don't have any hard evidence 
that it's the case ;-)

A "half way" solution would, I guess, be to code up a partial emulator for the 
device, emulating key operations (by translating them and reissuing to the 
device) whilst allowing other operations to pass directly through.
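
As a toy illustration of the shape of that (the register offset, the fixed 
RAM offset and the printf standing in for real hardware access are all 
made up):

/*
 * Toy model of the "partial emulator" idea: trap only the writes that
 * carry bus addresses, rewrite them, and pass everything else through.
 */
#include <stdint.h>
#include <stdio.h>

#define REG_DMA_BASE  0x10          /* pretend: card's DMA descriptor base register  */
#define HOST_RAM_BASE 0x40000000u   /* pretend: guest RAM is one contiguous host block */

static uint32_t guest_phys_to_host_phys(uint32_t guest_pa)
{
    return guest_pa + HOST_RAM_BASE;
}

static void forward_mmio_write(uint32_t offset, uint32_t val)
{
    /* A real implementation would poke the card's BAR mapping here. */
    printf("card reg 0x%02x <- 0x%08x\n", offset, val);
}

/* Called for every guest write into the passed-through card's MMIO range. */
static void card_mmio_write(uint32_t offset, uint32_t val)
{
    if (offset == REG_DMA_BASE)
        val = guest_phys_to_host_phys(val);   /* rewrite the bus address */
    forward_mmio_write(offset, val);          /* everything else passes through */
}

int main(void)
{
    card_mmio_write(REG_DMA_BASE, 0x00123000);  /* address-carrying write gets rewritten */
    card_mmio_write(0x04, 0x1);                 /* an ordinary control write is untouched */
    return 0;
}

The catch is knowing which registers carry addresses, which is why this can 
only ever be done per-device rather than generically.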

> > How about something like this?:
> > I'd imagine you could get away with a special header file with different
> > macro defines (as for Xen, above), just in the driver in question, and a
> > special "translation device / service" available to the QEmu virtual
> > machine - could be as simple as "write the guest physical address to an
> > IO port, returns the real physical address on next read".  The
> > virt_to_bus (etc) macros would use the translation service to perform the
> > appropriate translation at runtime.
>
> That's exactly the sort of thing I meant. Ideally you'd just implement it
> as a different type of PCI bridge, and everything would just work. I don't
> know if Linux supports such heterogeneous configurations though.

I think that wouldn't be sufficient due to the way Linux behaves - virtual to 
physical address conversion will be done within the guest driver and then 
issued directly to the hardware, so a bridge driver won't be able to fix up 
the DMA addressing correctly.

I guess with some header file hacks it'd be possible to make the conversion 
macros themselves switchable, so that (in principle) any driver could be 
loaded in "native" or "QEmu passthrough" mode by specifiying an optional 
load-time argument - that would be cool.  Making it automatically switch 
would be a bit tricky though... maybe the macros could be made to switch on 
the basis of the module PCI ID table :-D *evil grin*
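
Roughly like this, I suppose - the parameter name is made up, and 
qemu_virt_to_bus() is the hypothetical helper from the translation-port 
sketch earlier in the thread:

/*
 * Sketch of the switchable-macro idea: one driver binary, translation
 * mode chosen at load time.
 */
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/types.h>
#include <asm/io.h>

/* Hypothetical helper from the translation-port sketch above. */
dma_addr_t qemu_virt_to_bus(void *vaddr);

static int qemu_passthrough;	/* e.g. modprobe mydrv qemu_passthrough=1 */
module_param(qemu_passthrough, int, 0444);

/* The driver would call this instead of virt_to_bus() directly; native
 * mode keeps the stock behaviour, passthrough mode asks qemu. */
#define DRV_VIRT_TO_BUS(vaddr) \
	(qemu_passthrough ? qemu_virt_to_bus(vaddr) : virt_to_bus(vaddr))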

Cheers,
Mark


* Re: [Qemu-devel] PCI access virtualization
  2006-01-06  1:54       ` Mark Williamson
@ 2006-01-06 13:27         ` Paul Brook
  2006-01-06 14:23           ` Mark Williamson
  0 siblings, 1 reply; 9+ messages in thread
From: Paul Brook @ 2006-01-06 13:27 UTC (permalink / raw)
  To: Mark Williamson; +Cc: qemu-devel

> > IIUC PCI cards don't really have "DMA engines" as such. The PCI bridge
> > just maps PCI address space onto physical memory. A busmaster PCI device
> > can then make arbitrary accesses whenever it wants. I expect the default
> > mapping is a 1:1 mapping of the first 4G of physical RAM.
>
> I was under the impression that the on-card bus-mastering engines were
> often one of a few standard designs - this was suggested to me by someone
> more knowledgeable than myself, but I must admit I don't have any hard
> evidence that it's the case ;-)

What about 64-bit systems that use an IOMMU? Don't they already have a 64-bit 
physical -> 32-bit IO address space mapping? I don't know if this mapping is 
per-bus or system global.

Paul


* Re: [Qemu-devel] PCI access virtualization
  2006-01-06 13:27         ` Paul Brook
@ 2006-01-06 14:23           ` Mark Williamson
  0 siblings, 0 replies; 9+ messages in thread
From: Mark Williamson @ 2006-01-06 14:23 UTC (permalink / raw)
  To: Paul Brook; +Cc: qemu-devel

> What about 64-bit systems that use an IOMMU? Don't they already have a
> 64-bit physical -> 32-bit IO address space mapping? I don't know if this
> mapping is per-bus or system global.

If it's an Intel x86_64 machine (no IOMMU yet), IIRC the 32-bit PCI devices 
are only capable of DMA-ing into the bottom of memory - there's no remapping.  
For this reason I've seen various discussions about a ZONE_DMA_32 so that 
32-bit devices can request memory in this area...

If there's an IOMMU then the addressable 32 bits could of course map anywhere 
in the host memory... For something like the Opteron, I imagine the IOMMU 
mappings happen in the on-CPU memory controller, so I'd guess there'd be (at 
most) one mapping per processor package.  I've not actually looked into the 
specifics of this, so take it with a pinch of salt ;-)

There's also a SoftIOMMU driver under Linux that fakes out IOMMU behaviour by 
using bounce buffers.  Actually, thinking about it, I agree with you - it 
*ought* to be possible to implement an IOMMU driver that does (at least some 
of) what you'd want for PCI passthrough...  the Linux APIs do require the 
kernel to be notified of the DMA-able memory the driver intends to use.
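
To illustrate that last point with generic driver boilerplate (not actual 
MadWifi code; the pci_map_single() spelling is equivalent to the bus-agnostic 
dma_map_single() form used below):

/*
 * Every well-behaved driver already funnels its DMA buffers through the
 * mapping API, and the bus address returned here is what ends up in the
 * card's descriptor ring - so a software "IOMMU" implementation of this
 * API could hand back a translated address without the driver changing.
 */
#include <linux/pci.h>
#include <linux/dma-mapping.h>
#include <linux/skbuff.h>

static dma_addr_t map_rx_buffer(struct pci_dev *pdev, struct sk_buff *skb,
				size_t len)
{
	/* The driver must program the hardware with this value, not with
	 * virt_to_phys() - which is exactly the hook point. */
	return dma_map_single(&pdev->dev, skb->data, len, DMA_FROM_DEVICE);
}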

Cheers,
Mark

