KVM call minutes 2013-01-29

public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed

* KVM call minutes 2013-01-29
@ 2013-01-29 15:41 Juan Quintela
  2013-01-29 16:01 ` Paolo Bonzini
                   ` (2 more replies)
  0 siblings, 3 replies; 57+ messages in thread
From: Juan Quintela @ 2013-01-29 15:41 UTC (permalink / raw)
  To: KVM devel mailing list, qemu-devel qemu-devel, Alexander Graf,
	Andreas Färber

* Buildbot: discussed on the list (Andreas retired it)

* Replacing select(2) so that we will not hit the 1024 fd_set limit in the
  future. (stefan)

  Add checks for fd's bigger than 1024? multifunction devices uses lot
  of fd's for device.

  Portability?
  Use glib?  and let it use poll underneath.
  slirp is a problem.
  in the end loop: moving to a glib event loop, how we arrive there is the discussion.

* Outstanding virtio work for 1.4
  - Multiqueue virtio-net (Amos/Michael)
    version appeared today, problably it is on mergeable state
  - Refactorings (Fred/Peter)
    unlike before the hard freeze
  - virtio-ccw (Cornelia/Alex)
    conflict with multiqueue problably (alex)
    shouldn't (famous last words)
  - Do virtio-ccw used old style virtio API, and make integrating the
    refactorings more difficult?
  - Pushing refactorings to 1.5

* What's the plan for -device and IRQ assignment? (Alex)

  Alex will fill this

* Portio port to new memory regions?
  Andreas, could you fill?

Thanks, Juan.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: KVM call minutes 2013-01-29
  2013-01-29 15:41 KVM call minutes 2013-01-29 Juan Quintela
@ 2013-01-29 16:01 ` Paolo Bonzini
  2013-01-29 16:47   ` Anthony Liguori
  2013-01-29 20:53 ` Alexander Graf
  2013-01-30 11:39 ` [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O Andreas Färber
  2 siblings, 1 reply; 57+ messages in thread
From: Paolo Bonzini @ 2013-01-29 16:01 UTC (permalink / raw)
  To: quintela
  Cc: KVM devel mailing list, qemu-devel qemu-devel, Alexander Graf,
	Andreas Färber

Il 29/01/2013 16:41, Juan Quintela ha scritto:
> * Replacing select(2) so that we will not hit the 1024 fd_set limit in the
>   future. (stefan)
> 
>   Add checks for fd's bigger than 1024? multifunction devices uses lot
>   of fd's for device.
> 
>   Portability?
>   Use glib?  and let it use poll underneath.
>   slirp is a problem.
>   in the end loop: moving to a glib event loop, how we arrive there is the discussion.

We can use g_poll while keeping the main-loop.c wrappers around the glib
event loop.  Both slirp and iohandler.c access the fd_sets randomly, so
we need to remember some state between the fill and poll functions.  We
can use two main-loop.c functions:

int qemu_add_poll_fd(int fd, int events);

  select: writes the events into three fd_sets, returns the file
  descriptor itself

  poll: writes a GPollFD into a dynamically-sized array (of GPollFDs)
  and returns the index in the array.

int qemu_get_poll_fd_revents(int index);

  select: takes the file descriptor (returned by qemu_add_poll_fd),
  makes up revents based on the three fd_sets

  poll: takes the index into the array and returns the corresponding
  revents

iohandler.c can simply store the index into struct IOHandlerRecord, and
use it later.  slirp can do the same for struct socket.

The select code can be kept for Windows after POSIX switches to poll.

Paolo

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: KVM call minutes 2013-01-29
  2013-01-29 16:01 ` Paolo Bonzini
@ 2013-01-29 16:47   ` Anthony Liguori
  2013-01-29 17:36     ` Paolo Bonzini
  0 siblings, 1 reply; 57+ messages in thread
From: Anthony Liguori @ 2013-01-29 16:47 UTC (permalink / raw)
  To: Paolo Bonzini, quintela
  Cc: KVM devel mailing list, qemu-devel qemu-devel, Alexander Graf,
	Andreas Färber

Paolo Bonzini <pbonzini@redhat.com> writes:

> Il 29/01/2013 16:41, Juan Quintela ha scritto:
>> * Replacing select(2) so that we will not hit the 1024 fd_set limit in the
>>   future. (stefan)
>> 
>>   Add checks for fd's bigger than 1024? multifunction devices uses lot
>>   of fd's for device.
>> 
>>   Portability?
>>   Use glib?  and let it use poll underneath.
>>   slirp is a problem.
>>   in the end loop: moving to a glib event loop, how we arrive there is the discussion.
>
> We can use g_poll while keeping the main-loop.c wrappers around the glib
> event loop.  Both slirp and iohandler.c access the fd_sets randomly, so
> we need to remember some state between the fill and poll functions.  We
> can use two main-loop.c functions:
>
> int qemu_add_poll_fd(int fd, int events);
>
>   select: writes the events into three fd_sets, returns the file
>   descriptor itself
>
>   poll: writes a GPollFD into a dynamically-sized array (of GPollFDs)
>   and returns the index in the array.
>
> int qemu_get_poll_fd_revents(int index);
>
>   select: takes the file descriptor (returned by qemu_add_poll_fd),
>   makes up revents based on the three fd_sets
>
>   poll: takes the index into the array and returns the corresponding
>   revents
>
> iohandler.c can simply store the index into struct IOHandlerRecord, and
> use it later.  slirp can do the same for struct socket.
>
> The select code can be kept for Windows after POSIX switches to poll.

Doesn't g_poll already do this under the covers for Windows?

Regards,

Anthony Liguori

>
> Paolo
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: KVM call minutes 2013-01-29
  2013-01-29 16:47   ` Anthony Liguori
@ 2013-01-29 17:36     ` Paolo Bonzini
  0 siblings, 0 replies; 57+ messages in thread
From: Paolo Bonzini @ 2013-01-29 17:36 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Alexander Graf, Andreas Färber, qemu-devel qemu-devel,
	KVM devel mailing list, quintela

Il 29/01/2013 17:47, Anthony Liguori ha scritto:
> Paolo Bonzini <pbonzini@redhat.com> writes:
> 
>> Il 29/01/2013 16:41, Juan Quintela ha scritto:
>>> * Replacing select(2) so that we will not hit the 1024 fd_set limit in the
>>>   future. (stefan)
>>>
>>>   Add checks for fd's bigger than 1024? multifunction devices uses lot
>>>   of fd's for device.
>>>
>>>   Portability?
>>>   Use glib?  and let it use poll underneath.
>>>   slirp is a problem.
>>>   in the end loop: moving to a glib event loop, how we arrive there is the discussion.
>>
>> We can use g_poll while keeping the main-loop.c wrappers around the glib
>> event loop.  Both slirp and iohandler.c access the fd_sets randomly, so
>> we need to remember some state between the fill and poll functions.  We
>> can use two main-loop.c functions:
>>
>> int qemu_add_poll_fd(int fd, int events);
>>
>>   select: writes the events into three fd_sets, returns the file
>>   descriptor itself
>>
>>   poll: writes a GPollFD into a dynamically-sized array (of GPollFDs)
>>   and returns the index in the array.
>>
>> int qemu_get_poll_fd_revents(int index);
>>
>>   select: takes the file descriptor (returned by qemu_add_poll_fd),
>>   makes up revents based on the three fd_sets
>>
>>   poll: takes the index into the array and returns the corresponding
>>   revents
>>
>> iohandler.c can simply store the index into struct IOHandlerRecord, and
>> use it later.  slirp can do the same for struct socket.
>>
>> The select code can be kept for Windows after POSIX switches to poll.
> 
> Doesn't g_poll already do this under the covers for Windows?

No, g_poll is for synchronization objects (like Linux eventfd or
timerfd).  Sockets still require select.  You can tie a socket to a
synchronization object; this way socket events can exit g_poll, and in
fact that's exactly what QEMU does.  But you still need to retrieve the
currently-active events with select, so iohandler.c and slirp (which use
sockets) need to work in terms of select.

Paolo

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: KVM call minutes 2013-01-29
  2013-01-29 15:41 KVM call minutes 2013-01-29 Juan Quintela
  2013-01-29 16:01 ` Paolo Bonzini
@ 2013-01-29 20:53 ` Alexander Graf
  2013-01-29 21:39   ` Anthony Liguori
  2013-01-30 11:39 ` [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O Andreas Färber
  2 siblings, 1 reply; 57+ messages in thread
From: Alexander Graf @ 2013-01-29 20:53 UTC (permalink / raw)
  To: quintela
  Cc: KVM devel mailing list, qemu-devel qemu-devel,
	Andreas Färber, Anthony Liguori

On 01/29/2013 04:41 PM, Juan Quintela wrote:
> * Buildbot: discussed on the list (Andreas retired it)
>
> * Replacing select(2) so that we will not hit the 1024 fd_set limit in the
>    future. (stefan)
>
>    Add checks for fd's bigger than 1024? multifunction devices uses lot
>    of fd's for device.
>
>    Portability?
>    Use glib?  and let it use poll underneath.
>    slirp is a problem.
>    in the end loop: moving to a glib event loop, how we arrive there is the discussion.
>
>
> * Outstanding virtio work for 1.4
>    - Multiqueue virtio-net (Amos/Michael)
>      version appeared today, problably it is on mergeable state
>    - Refactorings (Fred/Peter)
>      unlike before the hard freeze
>    - virtio-ccw (Cornelia/Alex)
>      conflict with multiqueue problably (alex)
>      shouldn't (famous last words)
>    - Do virtio-ccw used old style virtio API, and make integrating the
>      refactorings more difficult?
>    - Pushing refactorings to 1.5
>
> * What's the plan for -device and IRQ assignment? (Alex)
>
>    Alex will fill this

When using -device, we can not specify an IRQ line to attach to the 
device. This works for some special buses like PCI, but not in the 
generic case. We need it generically for virtio-mmio and for potential 
platform assigned vfio devices though.

The conclusion we came up with was that in order to model IRQ lines 
between arbitrary devices, we should use QOM and the QOM name space. 
Details are up for Anthony to fill in :).


Alex


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: KVM call minutes 2013-01-29
  2013-01-29 20:53 ` Alexander Graf
@ 2013-01-29 21:39   ` Anthony Liguori
  2013-01-30  7:02     ` What to do about non-qdevified devices? (was: KVM call minutes 2013-01-29) Markus Armbruster
  0 siblings, 1 reply; 57+ messages in thread
From: Anthony Liguori @ 2013-01-29 21:39 UTC (permalink / raw)
  To: Alexander Graf, quintela
  Cc: KVM devel mailing list, qemu-devel qemu-devel,
	Andreas Färber

Alexander Graf <agraf@suse.de> writes:

> On 01/29/2013 04:41 PM, Juan Quintela wrote:
>>    Alex will fill this
>
> When using -device, we can not specify an IRQ line to attach to the 
> device. This works for some special buses like PCI, but not in the 
> generic case. We need it generically for virtio-mmio and for potential 
> platform assigned vfio devices though.
>
> The conclusion we came up with was that in order to model IRQ lines 
> between arbitrary devices, we should use QOM and the QOM name space. 
> Details are up for Anthony to fill in :).

Oh good :-)  This is how far I got since I last touched this problem.

https://github.com/aliguori/qemu/commits/qom-pin.4

qemu_irq is basically foreign to QOM/qdev.  There are two things I did
to solve this.  The first is to have a stateful Pin object.  Stateful is
important because qemu_irq is totally broken wrt reset and live
migration as it stands today.

It's pretty easy to have a Pin object that can "connect" to a qemu_irq
source or sink which means we can incrementally refactor by first
converting each device under a bus to using Pins (using the qemu_irq
connect interface to maintain compat) until the bus controller can be
converted to export Pins allowing a full switch to using Pins only for
that bus.

The problems I ran into were (1) this is a lot of work (2) it basically
requires that all bus children have been qdev/QOM-ified.  Even with
something like the ISA bus which is where I started, quite a few devices
were not qdevified still.

I'm not going to be able to work on this in the foreseeable future, but
if someone wants to take it over, I'd be happy to provide advice.

I'm also open to other approaches that require less refactoring but I
honestly don't know that there is a way to avoid the heavy lifting.

Regards,

Anthony Liguori

>
>
> Alex
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 57+ messages in thread

* What to do about non-qdevified devices? (was: KVM call minutes 2013-01-29)
  2013-01-29 21:39   ` Anthony Liguori
@ 2013-01-30  7:02     ` Markus Armbruster
  2013-01-30  8:39       ` What to do about non-qdevified devices? Andreas Färber
  2013-01-30 10:36       ` What to do about non-qdevified devices? (was: KVM call minutes 2013-01-29) Peter Maydell
  0 siblings, 2 replies; 57+ messages in thread
From: Markus Armbruster @ 2013-01-30  7:02 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: qemu-devel qemu-devel, Andreas Färber, Alexander Graf,
	KVM devel mailing list, quintela

Anthony Liguori <aliguori@us.ibm.com> writes:

[...]
> The problems I ran into were (1) this is a lot of work (2) it basically
> requires that all bus children have been qdev/QOM-ified.  Even with
> something like the ISA bus which is where I started, quite a few devices
> were not qdevified still.

So what's the plan to complete the qdevification job?  Lay really low
and quietly hope the problem goes away?  We've tried that for about
three years, doesn't seem to work.

[...]

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: What to do about non-qdevified devices?
  2013-01-30  7:02     ` What to do about non-qdevified devices? (was: KVM call minutes 2013-01-29) Markus Armbruster
@ 2013-01-30  8:39       ` Andreas Färber
  2013-01-30 10:36       ` What to do about non-qdevified devices? (was: KVM call minutes 2013-01-29) Peter Maydell
  1 sibling, 0 replies; 57+ messages in thread
From: Andreas Färber @ 2013-01-30  8:39 UTC (permalink / raw)
  To: Markus Armbruster, Anthony Liguori
  Cc: Alexander Graf, Juan Quintela, qemu-devel qemu-devel,
	KVM devel mailing list, Paolo Bonzini, Blue Swirl

Am 30.01.2013 08:02, schrieb Markus Armbruster:
> Anthony Liguori <aliguori@us.ibm.com> writes:
> 
> [...]
>> The problems I ran into were (1) this is a lot of work (2) it basically
>> requires that all bus children have been qdev/QOM-ified.  Even with
>> something like the ISA bus which is where I started, quite a few devices
>> were not qdevified still.
> 
> So what's the plan to complete the qdevification job?  Lay really low
> and quietly hope the problem goes away?  We've tried that for about
> three years, doesn't seem to work.

Stating (file) names would make that discussion much easier... ;)

I'd expect non-qdev'ified devices to rather be SysBusDevices (e.g.,
m68k, sh4, ppc). PReP's pc87312 qdev'ification was forgotten for 1.2 and
recently merged.
Would dma.c be a candidate for ISADevice? It uses isa_* API. (The stubs
in sun4m.c/sun4u.c due to use in fdc.c might be a candidate for stubs/
at least, short of an fdc.c rewrite.)

I recently went through all ISADevices and QOM'ified them:
https://lists.gnu.org/archive/html/qemu-devel/2012-11/msg02746.html

It became too late for 1.4 and I'm not quite sure where Anthony wanted
to draw the line between his 1) and 2):
https://lists.gnu.org/archive/html/qemu-devel/2013-01/msg00071.html
Thus I've only been rebasing my queue [1] without sending a v2 so far.

Lack of an official ISA maintainer for reviewing is another issue, any
volunteers? :)

Cheers,
Andreas

[1] https://github.com/afaerber/qemu-cpu/commits/realize-isa

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: What to do about non-qdevified devices? (was: KVM call minutes 2013-01-29)
  2013-01-30  7:02     ` What to do about non-qdevified devices? (was: KVM call minutes 2013-01-29) Markus Armbruster
  2013-01-30  8:39       ` What to do about non-qdevified devices? Andreas Färber
@ 2013-01-30 10:36       ` Peter Maydell
  2013-01-30 12:35         ` What to do about non-qdevified devices? Markus Armbruster
  1 sibling, 1 reply; 57+ messages in thread
From: Peter Maydell @ 2013-01-30 10:36 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Anthony Liguori, KVM devel mailing list, quintela, Alexander Graf,
	qemu-devel qemu-devel, Andreas Färber

On 30 January 2013 07:02, Markus Armbruster <armbru@redhat.com> wrote:
> Anthony Liguori <aliguori@us.ibm.com> writes:
>
> [...]
>> The problems I ran into were (1) this is a lot of work (2) it basically
>> requires that all bus children have been qdev/QOM-ified.  Even with
>> something like the ISA bus which is where I started, quite a few devices
>> were not qdevified still.
>
> So what's the plan to complete the qdevification job?  Lay really low
> and quietly hope the problem goes away?  We've tried that for about
> three years, doesn't seem to work.

Do we have a list of not-yet-qdevified devices? Maybe we need to
start saying "fix X Y and Z or platform P is dropped from the next
release". (This would of course be easier if we had a way to let users
know that platform P was in danger...)

-- PMM

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O
  2013-01-29 15:41 KVM call minutes 2013-01-29 Juan Quintela
  2013-01-29 16:01 ` Paolo Bonzini
  2013-01-29 20:53 ` Alexander Graf
@ 2013-01-30 11:39 ` Andreas Färber
  2013-01-30 11:48   ` Peter Maydell
                     ` (2 more replies)
  2 siblings, 3 replies; 57+ messages in thread
From: Andreas Färber @ 2013-01-30 11:39 UTC (permalink / raw)
  To: Juan Quintela
  Cc: KVM devel mailing list, qemu-devel, Alexander Graf,
	Benjamin Herrenschmidt, qemu-ppc, Hervé Poussineau,
	David Gibson, Gerd Hoffmann, Alon Levy, Michael S. Tsirkin,
	Anthony Liguori

Am 29.01.2013 16:41, schrieb Juan Quintela:
> * Portio port to new memory regions?
>   Andreas, could you fill?

MemoryRegion's .old_portio mechanism requires workarounds for VGA on
ppc, affecting among others the sPAPR PCI host bridge:
http://git.qemu.org/?p=qemu.git;a=commit;h=a3cfa18eb075c7ef78358ca1956fe7b01caa1724

Patches were posted and merged removing all .old_portio users but one:
hw/ioport.c:portio_list_add_1(), used by portio_list_add()

hw/isa-bus.c:    portio_list_add(piolist, isabus->address_space_io, start);
hw/qxl.c:    portio_list_add(qxl_vga_port_list,
pci_address_space_io(dev), 0x3b0);
hw/vga.c:        portio_list_add(vga_port_list, address_space_io, 0x3b0);
hw/vga.c:        portio_list_add(vbe_port_list, address_space_io, 0x1ce);

Proposal by hpoussin was to move _list_add() code to ISADevice:
http://lists.gnu.org/archive/html/qemu-devel/2013-01/msg00508.html

Concerns:
* PCI devices (VGA, QXL) register I/O ports as well
  => above patches add dependency on ISABus to machines
     -> "<benh> no mac ever had one"
  => PCIDevice shouldn't use ISA API with NULL ISADevice
* Lack of avi: Who decides about memory API these days?

armbru and agraf concluded that moving this into ISA is wrong.

=> I will drop the remaining ioport patches from above series.

Suggestions on how to proceed with tackling the issue are welcome.

Regards,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O
  2013-01-30 11:39 ` [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O Andreas Färber
@ 2013-01-30 11:48   ` Peter Maydell
  2013-01-30 12:31     ` Michael S. Tsirkin
                       ` (3 more replies)
  2013-01-30 13:59   ` [Qemu-devel] " Anthony Liguori
  2013-01-30 15:45   ` [Qemu-devel] " Gerd Hoffmann
  2 siblings, 4 replies; 57+ messages in thread
From: Peter Maydell @ 2013-01-30 11:48 UTC (permalink / raw)
  To: Andreas Färber
  Cc: Juan Quintela, KVM devel mailing list, Michael S. Tsirkin,
	qemu-devel, Alexander Graf, Hervé Poussineau, Gerd Hoffmann,
	Anthony Liguori, qemu-ppc, Alon Levy, David Gibson

On 30 January 2013 11:39, Andreas Färber <afaerber@suse.de> wrote:
> Proposal by hpoussin was to move _list_add() code to ISADevice:
> http://lists.gnu.org/archive/html/qemu-devel/2013-01/msg00508.html
>
> Concerns:
> * PCI devices (VGA, QXL) register I/O ports as well
>   => above patches add dependency on ISABus to machines
>      -> "<benh> no mac ever had one"
>   => PCIDevice shouldn't use ISA API with NULL ISADevice
> * Lack of avi: Who decides about memory API these days?
>
> armbru and agraf concluded that moving this into ISA is wrong.
>
> => I will drop the remaining ioport patches from above series.
>
> Suggestions on how to proceed with tackling the issue are welcome.

How does this stuff work on real hardware? I would have
expected that a PCI device registering the fact it has
IO ports would have to do so via the PCI controller it
is plugged into...

My naive don't-know-much-about-portio suggestion is that this
should work the same way as memory regions: each device
provides portio regions, and the controller for the bus
(ISA or PCI) exposes those to the next layer up, and
something at board level maps it all into the right places.

-- PMM

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: KVM call minutes 2013-01-29 - Port I/O
  2013-01-30 11:48   ` Peter Maydell
@ 2013-01-30 12:31     ` Michael S. Tsirkin
  2013-01-30 13:24       ` [Qemu-devel] " Anthony Liguori
  2013-01-30 12:32     ` Alexander Graf
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 57+ messages in thread
From: Michael S. Tsirkin @ 2013-01-30 12:31 UTC (permalink / raw)
  To: Peter Maydell
  Cc: KVM devel mailing list, Juan Quintela, qemu-devel, Alexander Graf,
	Hervé Poussineau, Gerd Hoffmann, Anthony Liguori, qemu-ppc,
	David Gibson, Andreas Färber, Alon Levy

On Wed, Jan 30, 2013 at 11:48:14AM +0000, Peter Maydell wrote:
> On 30 January 2013 11:39, Andreas Färber <afaerber@suse.de> wrote:
> > Proposal by hpoussin was to move _list_add() code to ISADevice:
> > http://lists.gnu.org/archive/html/qemu-devel/2013-01/msg00508.html
> >
> > Concerns:
> > * PCI devices (VGA, QXL) register I/O ports as well
> >   => above patches add dependency on ISABus to machines
> >      -> "<benh> no mac ever had one"
> >   => PCIDevice shouldn't use ISA API with NULL ISADevice
> > * Lack of avi: Who decides about memory API these days?
> >
> > armbru and agraf concluded that moving this into ISA is wrong.
> >
> > => I will drop the remaining ioport patches from above series.
> >
> > Suggestions on how to proceed with tackling the issue are welcome.
> 
> How does this stuff work on real hardware? I would have
> expected that a PCI device registering the fact it has
> IO ports would have to do so via the PCI controller it
> is plugged into...

All programming is done by the OS, devices do not register
with controller.

Each bridge has two ways to claim an IO transaction:
- transaction is within the window programmed in the bridge
- subtractive decoding enabled and no one else claims the transaction

At the bus level, transaction happens on a bus and an appropriate device
will claim it.

> My naive don't-know-much-about-portio suggestion is that this
> should work the same way as memory regions: each device
> provides portio regions, and the controller for the bus
> (ISA or PCI) exposes those to the next layer up, and
> something at board level maps it all into the right places.
> 
> -- PMM

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O
  2013-01-30 11:48   ` Peter Maydell
  2013-01-30 12:31     ` Michael S. Tsirkin
@ 2013-01-30 12:32     ` Alexander Graf
  2013-01-30 13:09     ` Markus Armbruster
  2013-01-30 17:55     ` Andreas Färber
  3 siblings, 0 replies; 57+ messages in thread
From: Alexander Graf @ 2013-01-30 12:32 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Andreas Färber, Juan Quintela, KVM devel mailing list,
	Michael Tsirkin, qemu-devel qemu-devel, Hervé Poussineau,
	Gerd Hoffmann, Anthony Liguori, qemu-ppc, Alon Levy, David Gibson,
	Benjamin Herrenschmidt


On 30.01.2013, at 12:48, Peter Maydell wrote:

> On 30 January 2013 11:39, Andreas Färber <afaerber@suse.de> wrote:
>> Proposal by hpoussin was to move _list_add() code to ISADevice:
>> http://lists.gnu.org/archive/html/qemu-devel/2013-01/msg00508.html
>> 
>> Concerns:
>> * PCI devices (VGA, QXL) register I/O ports as well
>>  => above patches add dependency on ISABus to machines
>>     -> "<benh> no mac ever had one"
>>  => PCIDevice shouldn't use ISA API with NULL ISADevice
>> * Lack of avi: Who decides about memory API these days?
>> 
>> armbru and agraf concluded that moving this into ISA is wrong.
>> 
>> => I will drop the remaining ioport patches from above series.
>> 
>> Suggestions on how to proceed with tackling the issue are welcome.
> 
> How does this stuff work on real hardware? I would have
> expected that a PCI device registering the fact it has
> IO ports would have to do so via the PCI controller it
> is plugged into...

That's pretty much how it works for PCI hardware, yes.

For ISA like hardware, I asked Ben last night:

29-01-2013 23:41:10 > agraf: benh: hey ben :)
29-01-2013 23:41:50 > agraf: benh: do you remember if g3 beige (grackle) and/or U2 based macs had an actual ISA bus exposed through MMIO or whether it was PCI only with a PIO compat region mapped by the PCI controller?
29-01-2013 23:59:28 < benh!~benh@180.200.150.145: agraf: no ISA
29-01-2013 23:59:48 < benh!~benh@180.200.150.145: agraf: no mac ever had one
29-01-2013 23:59:57 > agraf: benh: well, MCP750 has one
30-01-2013 00:00:06 > agraf: benh: that's why I'm asking :)
30-01-2013 00:00:17 < benh!~benh@180.200.150.145: mcp750 ? what is this ?
30-01-2013 00:00:28 > agraf: benh: some motorola soc
30-01-2013 00:00:39 < benh!~benh@180.200.150.145: ah ok
30-01-2013 00:00:50 < benh!~benh@180.200.150.145: mostly ISA is just hooked onto PCI anyway
30-01-2013 00:00:59 < benh!~benh@180.200.150.145: ie, PCI cycles with low addresses land on ISA
30-01-2013 00:01:59 > agraf: benh: sounds tricky to model :)
30-01-2013 00:02:44 < benh!~benh@180.200.150.145: that's also how it works on x86
30-01-2013 00:03:05 < benh!~benh@180.200.150.145: dunno how it works on that specific SoC tho but that's how it's usually done
30-01-2013 00:04:36 > agraf: interesting - didn't know that :)
30-01-2013 00:04:51 > agraf: on x86 it's hard to see from a software pov, because everything's linear ;)
30-01-2013 00:26:27 < benh!~benh@180.200.150.145: yeah, that's why x86 has a memory hole to make room for ISA
30-01-2013 00:26:40 < benh!~benh@180.200.150.145: while usually on ppc we remap things with an offset so we don't have to punch a hole in ram

> My naive don't-know-much-about-portio suggestion is that this
> should work the same way as memory regions: each device
> provides portio regions, and the controller for the bus
> (ISA or PCI) exposes those to the next layer up, and
> something at board level maps it all into the right places.

Right. With the addition that on some boards, the PCI host controller which provides a portio map would also expose an ISABus for devices to plug in. At least if I understand Ben correctly.


Alex


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: What to do about non-qdevified devices?
  2013-01-30 10:36       ` What to do about non-qdevified devices? (was: KVM call minutes 2013-01-29) Peter Maydell
@ 2013-01-30 12:35         ` Markus Armbruster
  2013-01-30 13:44           ` [Qemu-devel] " Andreas Färber
  2013-01-30 14:37           ` [Qemu-devel] " Anthony Liguori
  0 siblings, 2 replies; 57+ messages in thread
From: Markus Armbruster @ 2013-01-30 12:35 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Anthony Liguori, KVM devel mailing list, quintela, Alexander Graf,
	qemu-devel qemu-devel, Andreas Färber

Peter Maydell <peter.maydell@linaro.org> writes:

> On 30 January 2013 07:02, Markus Armbruster <armbru@redhat.com> wrote:
>> Anthony Liguori <aliguori@us.ibm.com> writes:
>>
>> [...]
>>> The problems I ran into were (1) this is a lot of work (2) it basically
>>> requires that all bus children have been qdev/QOM-ified.  Even with
>>> something like the ISA bus which is where I started, quite a few devices
>>> were not qdevified still.
>>
>> So what's the plan to complete the qdevification job?  Lay really low
>> and quietly hope the problem goes away?  We've tried that for about
>> three years, doesn't seem to work.
>
> Do we have a list of not-yet-qdevified devices? Maybe we need to
> start saying "fix X Y and Z or platform P is dropped from the next
> release". (This would of course be easier if we had a way to let users
> know that platform P was in danger...)

I think that's a good idea.  Only problem is identifying pre-qdev
devices in the code requires code inspection (grep won't do, I'm
afraid).

If we agree on a "qdevify or else" plan, I'd be prepared to help with
the digging up of devices.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: KVM call minutes 2013-01-29 - Port I/O
  2013-01-30 11:48   ` Peter Maydell
  2013-01-30 12:31     ` Michael S. Tsirkin
  2013-01-30 12:32     ` Alexander Graf
@ 2013-01-30 13:09     ` Markus Armbruster
  2013-01-30 15:08       ` [Qemu-devel] " Anthony Liguori
  2013-01-30 17:55     ` Andreas Färber
  3 siblings, 1 reply; 57+ messages in thread
From: Markus Armbruster @ 2013-01-30 13:09 UTC (permalink / raw)
  To: Peter Maydell
  Cc: KVM devel mailing list, Juan Quintela, Michael S. Tsirkin,
	qemu-devel, Alexander Graf, Alon Levy, Hervé Poussineau,
	Gerd Hoffmann, Anthony Liguori, qemu-ppc, Andreas Färber,
	David Gibson

Peter Maydell <peter.maydell@linaro.org> writes:

> On 30 January 2013 11:39, Andreas Färber <afaerber@suse.de> wrote:
>> Proposal by hpoussin was to move _list_add() code to ISADevice:
>> http://lists.gnu.org/archive/html/qemu-devel/2013-01/msg00508.html
>>
>> Concerns:
>> * PCI devices (VGA, QXL) register I/O ports as well
>>   => above patches add dependency on ISABus to machines
>>      -> "<benh> no mac ever had one"
>>   => PCIDevice shouldn't use ISA API with NULL ISADevice
>> * Lack of avi: Who decides about memory API these days?
>>
>> armbru and agraf concluded that moving this into ISA is wrong.
>>
>> => I will drop the remaining ioport patches from above series.
>>
>> Suggestions on how to proceed with tackling the issue are welcome.
>
> How does this stuff work on real hardware? I would have
> expected that a PCI device registering the fact it has
> IO ports would have to do so via the PCI controller it
> is plugged into...
>
> My naive don't-know-much-about-portio suggestion is that this
> should work the same way as memory regions: each device
> provides portio regions, and the controller for the bus
> (ISA or PCI) exposes those to the next layer up, and
> something at board level maps it all into the right places.

Makes sense me, but I'm naive, too :)

For me, "I/O ports" are just an alternate address space some devices
have.  For instance, x86 CPUs have an extra pin for selecting I/O
vs. memory address space.  The ISA bus has separate read/write pins for
memory and I/O.

This isn't terribly special.  Mapping address spaces around is what
devices bridging buses do.

I'd expect a system bus for an x86 CPU to have both a memory and an I/O
address space.

I'd expect an ISA PC's sysbus - ISA bridge to map both directly.

I'd expect an ISA bridge for a sysbus without a separate I/O address
space to map the ISA I/O address space into the sysbus's normal address
space somehow.

PCI ISA bridges have their own rules, but I've gotten away with ignoring
the details so far :)

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O
  2013-01-30 12:31     ` Michael S. Tsirkin
@ 2013-01-30 13:24       ` Anthony Liguori
  2013-01-30 14:11         ` Michael S. Tsirkin
  0 siblings, 1 reply; 57+ messages in thread
From: Anthony Liguori @ 2013-01-30 13:24 UTC (permalink / raw)
  To: Michael S. Tsirkin, Peter Maydell
  Cc: Andreas Färber, Juan Quintela, KVM devel mailing list,
	qemu-devel, Alexander Graf, Hervé Poussineau, Gerd Hoffmann,
	qemu-ppc, Alon Levy, David Gibson

"Michael S. Tsirkin" <mst@redhat.com> writes:

> On Wed, Jan 30, 2013 at 11:48:14AM +0000, Peter Maydell wrote:
>> On 30 January 2013 11:39, Andreas Färber <afaerber@suse.de> wrote:
>> > Proposal by hpoussin was to move _list_add() code to ISADevice:
>> > http://lists.gnu.org/archive/html/qemu-devel/2013-01/msg00508.html
>> >
>> > Concerns:
>> > * PCI devices (VGA, QXL) register I/O ports as well
>> >   => above patches add dependency on ISABus to machines
>> >      -> "<benh> no mac ever had one"
>> >   => PCIDevice shouldn't use ISA API with NULL ISADevice
>> > * Lack of avi: Who decides about memory API these days?
>> >
>> > armbru and agraf concluded that moving this into ISA is wrong.
>> >
>> > => I will drop the remaining ioport patches from above series.
>> >
>> > Suggestions on how to proceed with tackling the issue are welcome.
>> 
>> How does this stuff work on real hardware? I would have
>> expected that a PCI device registering the fact it has
>> IO ports would have to do so via the PCI controller it
>> is plugged into...
>
> All programming is done by the OS, devices do not register
> with controller.
>
> Each bridge has two ways to claim an IO transaction:
> - transaction is within the window programmed in the bridge
> - subtractive decoding enabled and no one else claims the transaction

And there can only be one endpoint that accepts subtractive decoding and
this is usually the ISA bridge.

Also note that there are some really special cases with PCI.  The legacy
VGA ports are always routed to the first device with a DISPLAY class
type.

Likewise, with legacy IDE ports are routed to the first device with an
IDE class.  That's the only reason you can have these legacy devices not
behind the ISA bridge.

Regards,

Anthony Liguori

>
> At the bus level, transaction happens on a bus and an appropriate device
> will claim it.
>
>> My naive don't-know-much-about-portio suggestion is that this
>> should work the same way as memory regions: each device
>> provides portio regions, and the controller for the bus
>> (ISA or PCI) exposes those to the next layer up, and
>> something at board level maps it all into the right places.
>> 
>> -- PMM

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] What to do about non-qdevified devices?
  2013-01-30 12:35         ` What to do about non-qdevified devices? Markus Armbruster
@ 2013-01-30 13:44           ` Andreas Färber
  2013-01-30 16:58             ` Paolo Bonzini
  2013-01-31 18:48             ` Markus Armbruster
  2013-01-30 14:37           ` [Qemu-devel] " Anthony Liguori
  1 sibling, 2 replies; 57+ messages in thread
From: Andreas Färber @ 2013-01-30 13:44 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Peter Maydell, Anthony Liguori, KVM devel mailing list,
	Juan Quintela, Alexander Graf, qemu-devel

Am 30.01.2013 13:35, schrieb Markus Armbruster:
> Peter Maydell <peter.maydell@linaro.org> writes:
> 
>> On 30 January 2013 07:02, Markus Armbruster <armbru@redhat.com> wrote:
>>> Anthony Liguori <aliguori@us.ibm.com> writes:
>>>
>>> [...]
>>>> The problems I ran into were (1) this is a lot of work (2) it basically
>>>> requires that all bus children have been qdev/QOM-ified.  Even with
>>>> something like the ISA bus which is where I started, quite a few devices
>>>> were not qdevified still.
>>>
>>> So what's the plan to complete the qdevification job?  Lay really low
>>> and quietly hope the problem goes away?  We've tried that for about
>>> three years, doesn't seem to work.
>>
>> Do we have a list of not-yet-qdevified devices? Maybe we need to
>> start saying "fix X Y and Z or platform P is dropped from the next
>> release". (This would of course be easier if we had a way to let users
>> know that platform P was in danger...)
> 
> I think that's a good idea.  Only problem is identifying pre-qdev
> devices in the code requires code inspection (grep won't do, I'm
> afraid).

+1 That would address my request as well.

Having a list of low-hanging fruit on the Wiki might also give new
contributors some ideas of where and how to start poking at the code.

> If we agree on a "qdevify or else" plan, I'd be prepared to help with
> the digging up of devices.

I disagree on the "or else" part. I have been qdev'ifying and QOM'ifying
devices in my maintenance area, and progress is slow. It gets even
slower if one leaves clearly maintained areas. I see no good reason to
force a pistol on someone's breast, like you have done for IDE, unless
there is a good reason to do so. Currently I don't see any.

Just think of my pending ide/mmio.c patch [1] that no one has reviewed
or applied so far. Similarly, Fred's virtio refactoring has pretty long
review cycles, with discussions about very basic QOM and OOD idioms.

If we want to make progress, we need to encourage contributors to send
such patches by making sure they get feedback and find their way into
the tree within a reasonable time frame. It's always easier to rip out
and damage other people's work than to get things right yourself.

To take that thought to the extreme, I could propose to rip out any qdev
device that's not properly QOM'ified and realize'ified yet. That would
include i440fx, fdc and many core x86 devices in the repository...

Technical risks have been raised elsewhere: Making random code
SysBusDevices can lead to PCIDevices instantiating them not being
hot-pluggable any more simply because SysBus is a crappy fallback,
overused in lack of a clear alternative. I already started reviewing
parent_bus and qdev_get_parent_bus() uses in the tree [2, 3], but
constructive help would be more welcome than constant nagging about code
that's in bad shape. There's a lot of work to be done!

Andreas

[1] http://patchwork.ozlabs.org/patch/215482/
[2] http://patchwork.ozlabs.org/patch/209499/
[3] http://patchwork.ozlabs.org/patch/213971/

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O
  2013-01-30 11:39 ` [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O Andreas Färber
  2013-01-30 11:48   ` Peter Maydell
@ 2013-01-30 13:59   ` Anthony Liguori
  2013-01-30 21:05     ` Benjamin Herrenschmidt
  2013-01-30 15:45   ` [Qemu-devel] " Gerd Hoffmann
  2 siblings, 1 reply; 57+ messages in thread
From: Anthony Liguori @ 2013-01-30 13:59 UTC (permalink / raw)
  To: Andreas Färber, Juan Quintela
  Cc: KVM devel mailing list, qemu-devel, Alexander Graf,
	Benjamin Herrenschmidt, qemu-ppc, Hervé Poussineau,
	David Gibson, Gerd Hoffmann, Alon Levy, Michael S. Tsirkin

Andreas Färber <afaerber@suse.de> writes:

> Am 29.01.2013 16:41, schrieb Juan Quintela:
>> * Portio port to new memory regions?
>>   Andreas, could you fill?
>
> MemoryRegion's .old_portio mechanism requires workarounds for VGA on
> ppc, affecting among others the sPAPR PCI host bridge:
> http://git.qemu.org/?p=qemu.git;a=commit;h=a3cfa18eb075c7ef78358ca1956fe7b01caa1724
>
> Patches were posted and merged removing all .old_portio users but one:
> hw/ioport.c:portio_list_add_1(), used by portio_list_add()
>
> hw/isa-bus.c:    portio_list_add(piolist, isabus->address_space_io, start);
> hw/qxl.c:    portio_list_add(qxl_vga_port_list,
> pci_address_space_io(dev), 0x3b0);
> hw/vga.c:        portio_list_add(vga_port_list, address_space_io, 0x3b0);
> hw/vga.c:        portio_list_add(vbe_port_list, address_space_io, 0x1ce);
>
> Proposal by hpoussin was to move _list_add() code to ISADevice:
> http://lists.gnu.org/archive/html/qemu-devel/2013-01/msg00508.html

Okay, a couple things here:

There is no such thing as "PIO" as a general concept.  What leaves the
CPU and what a bus interprets are totally different things.

An x86 CPU has a MMIO capability that's essentially 65 bits.  Whether
the top bit is set determines whether it's a "PIO" transaction or an
"MMIO" transaction.  A large chunk of that address space is invalid of
course.

PCI has a 65 bit address space too.  The 65th bit determines whether
it's an IO transaction or an MMIO transaction.

For architectures that only have a 64-bit address space, what the PCI
controller typically does is pick a 16-bit window within that address
space to map to a PCI address with the 65th bit set.

Within the PCI bus, transactions are usually routed to devices via
positive decoding.  The device lists what address regions it wants to
handle (via BARs) and the PCI bus uses those to determine who to send
transactions to.

There are some exceptions though.  Specifically:

1) A chipset will route any non-positively decoded IO transaction (65th
   bit set) to a single end point (usually the ISA-bridge).  Which one it
   chooses is up to the chipset.  This is called subtractive decoding
   because the PCI bus will wait multiple cycles for that device to
   claim the transaction before bouncing it.

2) There are special hacks in most PCI chipsets to route very specific
   addresses ranges to certain devices.  Namely, legacy VGA IO transactions
   go to the first VGA device.  Legacy IDE IO transactions go to the first
   IDE device.  This doesn't need to be programmed in the BARs.  It will
   just happen.

3) As it turns out, all legacy PIIX3 devices are positively decoded and
   sent to the ISA-bridge (because it's faster this way).

Notice the lack of the word "ISA" in all of this other than describing
the PCI class of an end point.

So how should this be modeled?

On x86, the CPU has a pio address space.  That can propagate down
through the PCI bus which is what we do today.

On !x86, the PCI controller ought to setup a MemoryRegion for downstream
PIO that devices can use to register on.

We probably need to do something like change the PCI VGA devices to
export a MemoryRegion and allow the PCI controller to device how to
register that as a subregion.

Regards,

Anthony Liguori

>
> Concerns:
> * PCI devices (VGA, QXL) register I/O ports as well
>   => above patches add dependency on ISABus to machines
>      -> "<benh> no mac ever had one"
>   => PCIDevice shouldn't use ISA API with NULL ISADevice
> * Lack of avi: Who decides about memory API these days?
>
> armbru and agraf concluded that moving this into ISA is wrong.
>
> => I will drop the remaining ioport patches from above series.
>
> Suggestions on how to proceed with tackling the issue are welcome.
>
> Regards,
> Andreas
>
> -- 
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
> GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O
  2013-01-30 13:24       ` [Qemu-devel] " Anthony Liguori
@ 2013-01-30 14:11         ` Michael S. Tsirkin
  0 siblings, 0 replies; 57+ messages in thread
From: Michael S. Tsirkin @ 2013-01-30 14:11 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Peter Maydell, Andreas Färber, Juan Quintela,
	KVM devel mailing list, qemu-devel, Alexander Graf,
	Hervé Poussineau, Gerd Hoffmann, qemu-ppc, Alon Levy,
	David Gibson

On Wed, Jan 30, 2013 at 07:24:57AM -0600, Anthony Liguori wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> 
> > On Wed, Jan 30, 2013 at 11:48:14AM +0000, Peter Maydell wrote:
> >> On 30 January 2013 11:39, Andreas Färber <afaerber@suse.de> wrote:
> >> > Proposal by hpoussin was to move _list_add() code to ISADevice:
> >> > http://lists.gnu.org/archive/html/qemu-devel/2013-01/msg00508.html
> >> >
> >> > Concerns:
> >> > * PCI devices (VGA, QXL) register I/O ports as well
> >> >   => above patches add dependency on ISABus to machines
> >> >      -> "<benh> no mac ever had one"
> >> >   => PCIDevice shouldn't use ISA API with NULL ISADevice
> >> > * Lack of avi: Who decides about memory API these days?
> >> >
> >> > armbru and agraf concluded that moving this into ISA is wrong.
> >> >
> >> > => I will drop the remaining ioport patches from above series.
> >> >
> >> > Suggestions on how to proceed with tackling the issue are welcome.
> >> 
> >> How does this stuff work on real hardware? I would have
> >> expected that a PCI device registering the fact it has
> >> IO ports would have to do so via the PCI controller it
> >> is plugged into...
> >
> > All programming is done by the OS, devices do not register
> > with controller.
> >
> > Each bridge has two ways to claim an IO transaction:
> > - transaction is within the window programmed in the bridge
> > - subtractive decoding enabled and no one else claims the transaction
> 
> And there can only be one endpoint that accepts subtractive decoding and
> this is usually the ISA bridge.
> 
> Also note that there are some really special cases with PCI.  The legacy
> VGA ports are always routed to the first device with a DISPLAY class
> type.
> 
> Likewise, with legacy IDE ports are routed to the first device with an
> IDE class.  That's the only reason you can have these legacy devices not
> behind the ISA bridge.
> 
> Regards,
> 
> Anthony Liguori

Yes. And to futher clarify that, 'routed' in the sense that the spec
specifies the addresses for each class, it's a hard-coded set of
addresses.

The hardware never looks at the class, each device of
simply knows which addresses to claim and whether it's enabled.

What happens if you have more than one VGA adapter on a bus?
As long as only one is enabled, you are fine.
If more than one is enabled, bad things will happen including
possibly overheating.

Also, it's not just the class that specifies the addresses,
it's the programming interface too.
For example for display, hardcoded addresses are used for legacy sublass 0x0
and for programming ifc 0x0 - vga compatible adapter and
0x1 - 8514 compatible adapter.
But again - it specifies this to the OS.

> >
> > At the bus level, transaction happens on a bus and an appropriate device
> > will claim it.
> >
> >> My naive don't-know-much-about-portio suggestion is that this
> >> should work the same way as memory regions: each device
> >> provides portio regions, and the controller for the bus
> >> (ISA or PCI) exposes those to the next layer up, and
> >> something at board level maps it all into the right places.
> >> 
> >> -- PMM

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] What to do about non-qdevified devices?
  2013-01-30 12:35         ` What to do about non-qdevified devices? Markus Armbruster
  2013-01-30 13:44           ` [Qemu-devel] " Andreas Färber
@ 2013-01-30 14:37           ` Anthony Liguori
  1 sibling, 0 replies; 57+ messages in thread
From: Anthony Liguori @ 2013-01-30 14:37 UTC (permalink / raw)
  To: Markus Armbruster, Peter Maydell
  Cc: KVM devel mailing list, quintela, Alexander Graf,
	qemu-devel qemu-devel, Andreas Färber

Markus Armbruster <armbru@redhat.com> writes:

> Peter Maydell <peter.maydell@linaro.org> writes:
>
>> On 30 January 2013 07:02, Markus Armbruster <armbru@redhat.com> wrote:
>>> Anthony Liguori <aliguori@us.ibm.com> writes:
>>>
>>> [...]
>>>> The problems I ran into were (1) this is a lot of work (2) it basically
>>>> requires that all bus children have been qdev/QOM-ified.  Even with
>>>> something like the ISA bus which is where I started, quite a few devices
>>>> were not qdevified still.
>>>
>>> So what's the plan to complete the qdevification job?  Lay really low
>>> and quietly hope the problem goes away?  We've tried that for about
>>> three years, doesn't seem to work.
>>
>> Do we have a list of not-yet-qdevified devices? Maybe we need to
>> start saying "fix X Y and Z or platform P is dropped from the next
>> release". (This would of course be easier if we had a way to let users
>> know that platform P was in danger...)
>
> I think that's a good idea.  Only problem is identifying pre-qdev
> devices in the code requires code inspection (grep won't do, I'm
> afraid).
>
> If we agree on a "qdevify or else" plan, I'd be prepared to help with
> the digging up of devices.

That's a nice thought, but we're not going to rip out dma.c and break
every PC target.

But I will help put together a list of devices that need converting.  I
have patches actually for most of the PC devices.

Regards,

Anthony Liguori


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O
  2013-01-30 13:09     ` Markus Armbruster
@ 2013-01-30 15:08       ` Anthony Liguori
  0 siblings, 0 replies; 57+ messages in thread
From: Anthony Liguori @ 2013-01-30 15:08 UTC (permalink / raw)
  To: Markus Armbruster, Peter Maydell
  Cc: Andreas Färber, KVM devel mailing list, Michael S. Tsirkin,
	Juan Quintela, Alexander Graf, qemu-devel, Hervé Poussineau,
	Gerd Hoffmann, qemu-ppc, David Gibson, Alon Levy

Markus Armbruster <armbru@redhat.com> writes:

> Peter Maydell <peter.maydell@linaro.org> writes:
>
>> On 30 January 2013 11:39, Andreas Färber <afaerber@suse.de> wrote:
>>> Proposal by hpoussin was to move _list_add() code to ISADevice:
>>> http://lists.gnu.org/archive/html/qemu-devel/2013-01/msg00508.html
>>>
>>> Concerns:
>>> * PCI devices (VGA, QXL) register I/O ports as well
>>>   => above patches add dependency on ISABus to machines
>>>      -> "<benh> no mac ever had one"
>>>   => PCIDevice shouldn't use ISA API with NULL ISADevice
>>> * Lack of avi: Who decides about memory API these days?
>>>
>>> armbru and agraf concluded that moving this into ISA is wrong.
>>>
>>> => I will drop the remaining ioport patches from above series.
>>>
>>> Suggestions on how to proceed with tackling the issue are welcome.
>>
>> How does this stuff work on real hardware? I would have
>> expected that a PCI device registering the fact it has
>> IO ports would have to do so via the PCI controller it
>> is plugged into...
>>
>> My naive don't-know-much-about-portio suggestion is that this
>> should work the same way as memory regions: each device
>> provides portio regions, and the controller for the bus
>> (ISA or PCI) exposes those to the next layer up, and
>> something at board level maps it all into the right places.
>
> Makes sense me, but I'm naive, too :)
>
> For me, "I/O ports" are just an alternate address space some devices
> have.  For instance, x86 CPUs have an extra pin for selecting I/O
> vs. memory address space.  The ISA bus has separate read/write pins for
> memory and I/O.
>
> This isn't terribly special.  Mapping address spaces around is what
> devices bridging buses do.
>
> I'd expect a system bus for an x86 CPU to have both a memory and an I/O
> address space.

There is no such thing as a "system bus".

There is a bus that links the CPUs to each other and to the North
Bridge.  This is QPI on modern systems.

Sometimes there's a bus to link the North Bridge to the South Bridge.
On modern systems, this is QPI.  On the i440fx, the i440fx is both the
South Bridge and North Bridge and the link between the two is internal
to the chip.  The South Bridge may then export one or more downstream
interfaces.  In the i440fx, it only exports PCI.

Behind the PCI bus, there may be bridges.  On the i440fx, there is a ISA
Bridge which also acts as a Super I/O chip.  It exposes a downstream ISA
bus.

sysbus is a relic of poor modeling.  A major milestone in QEMU's
evolution will be when sysbus is completely removed.

Regards,

Anthony Liguori

>
> I'd expect an ISA PC's sysbus - ISA bridge to map both directly.
>
> I'd expect an ISA bridge for a sysbus without a separate I/O address
> space to map the ISA I/O address space into the sysbus's normal address
> space somehow.
>
> PCI ISA bridges have their own rules, but I've gotten away with ignoring
> the details so far :)

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O
  2013-01-30 11:39 ` [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O Andreas Färber
  2013-01-30 11:48   ` Peter Maydell
  2013-01-30 13:59   ` [Qemu-devel] " Anthony Liguori
@ 2013-01-30 15:45   ` Gerd Hoffmann
  2013-01-30 16:33     ` Anthony Liguori
  2 siblings, 1 reply; 57+ messages in thread
From: Gerd Hoffmann @ 2013-01-30 15:45 UTC (permalink / raw)
  To: Andreas Färber
  Cc: Juan Quintela, KVM devel mailing list, qemu-devel, Alexander Graf,
	Benjamin Herrenschmidt, qemu-ppc, Hervé Poussineau,
	David Gibson, Alon Levy, Michael S. Tsirkin, Anthony Liguori

  Hi,

> hw/qxl.c:    portio_list_add(qxl_vga_port_list,
> pci_address_space_io(dev), 0x3b0);
> hw/vga.c:        portio_list_add(vga_port_list, address_space_io, 0x3b0);

That reminds me I should solve this in a more elegant way.

qxl takes over the vga io ports.  The reason it does this is because qxl
switches into vga mode in case the vga ports are accessed while not in
vga mode.  After doing the check (and possibly switching mode) the vga
handler is called to actually handle it.

That twist makes it a bit hard to convert vga ...

Anyone knows how one would do that with the memory api instead? I think
taking over the ports is easy as the memory regions have priorities so I
can simply register a region with higher priority. I have no clue how to
forward the access to the vga code though.

Anyone has clues / suggestions?

thanks,
  Gerd

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O
  2013-01-30 15:45   ` [Qemu-devel] " Gerd Hoffmann
@ 2013-01-30 16:33     ` Anthony Liguori
  2013-01-30 16:54       ` Andreas Färber
  2013-01-30 17:08       ` Paolo Bonzini
  0 siblings, 2 replies; 57+ messages in thread
From: Anthony Liguori @ 2013-01-30 16:33 UTC (permalink / raw)
  To: Gerd Hoffmann, Andreas Färber
  Cc: Juan Quintela, KVM devel mailing list, qemu-devel, Alexander Graf,
	Benjamin Herrenschmidt, qemu-ppc, Hervé Poussineau,
	David Gibson, Alon Levy, Michael S. Tsirkin

Gerd Hoffmann <kraxel@redhat.com> writes:

>   Hi,
>
>> hw/qxl.c:    portio_list_add(qxl_vga_port_list,
>> pci_address_space_io(dev), 0x3b0);
>> hw/vga.c:        portio_list_add(vga_port_list, address_space_io, 0x3b0);
>
> That reminds me I should solve this in a more elegant way.
>
> qxl takes over the vga io ports.  The reason it does this is because qxl
> switches into vga mode in case the vga ports are accessed while not in
> vga mode.  After doing the check (and possibly switching mode) the vga
> handler is called to actually handle it.

The best way to handle this would be to remodel how we do VGA.

Make VGACommonState a proper QOM object and use it as the base class for
QXL, CirrusVGA, QEMUVGA (std-vga), and VMwareVGA.

The VGA accessors should be exposed as a memory region but the sub class
ought to be responsible for actually adding it to a subregion.

>
> That twist makes it a bit hard to convert vga ...
>
> Anyone knows how one would do that with the memory api instead? I think
> taking over the ports is easy as the memory regions have priorities so I
> can simply register a region with higher priority. I have no clue how to
> forward the access to the vga code though.
>

That should be possible with priorities, but I think it's wrong.  There
aren't two VGA devices.  QXL is-a VGA device and the best way to
override behavior of base VGA device is through polymorphism.

This isn't really a memory API issue, it's a modeling issue.

Regards,

Anthony Liguori

> Anyone has clues / suggestions?
>
> thanks,
>   Gerd

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: KVM call minutes 2013-01-29 - Port I/O
  2013-01-30 16:33     ` Anthony Liguori
@ 2013-01-30 16:54       ` Andreas Färber
  2013-01-30 17:29         ` [Qemu-devel] " Anthony Liguori
  2013-01-30 21:07         ` Benjamin Herrenschmidt
  2013-01-30 17:08       ` Paolo Bonzini
  1 sibling, 2 replies; 57+ messages in thread
From: Andreas Färber @ 2013-01-30 16:54 UTC (permalink / raw)
  To: Anthony Liguori, Gerd Hoffmann
  Cc: KVM devel mailing list, Juan Quintela, Michael S. Tsirkin,
	Alexander Graf, qemu-devel, qemu-ppc, Hervé Poussineau,
	Alon Levy, David Gibson

Am 30.01.2013 17:33, schrieb Anthony Liguori:
> Gerd Hoffmann <kraxel@redhat.com> writes:
> 
>>> hw/qxl.c:    portio_list_add(qxl_vga_port_list,
>>> pci_address_space_io(dev), 0x3b0);
>>> hw/vga.c:        portio_list_add(vga_port_list, address_space_io, 0x3b0);
>>
>> That reminds me I should solve this in a more elegant way.
>>
>> qxl takes over the vga io ports.  The reason it does this is because qxl
>> switches into vga mode in case the vga ports are accessed while not in
>> vga mode.  After doing the check (and possibly switching mode) the vga
>> handler is called to actually handle it.
> 
> The best way to handle this would be to remodel how we do VGA.
> 
> Make VGACommonState a proper QOM object and use it as the base class for
> QXL, CirrusVGA, QEMUVGA (std-vga), and VMwareVGA.

That would require polymorphism since we already need to derive from
PCIDevice or ISADevice respectively for interfacing with the bus...
Modern object-oriented languages have tried to avoid multi-inheritence
due to arising complications, I thought. Wouldn't object if someone
wanted to do the dirty implementation work though. ;)

Another such example is EHCI, with PCIDevice and SysBusDevice frontends,
sharing an EHCIState struct and having helper functions operating on
that core state only. Quite a few device share such a pattern today
actually (serial, m48t59, ...).

> The VGA accessors should be exposed as a memory region but the sub class
> ought to be responsible for actually adding it to a subregion.
> 
>>
>> That twist makes it a bit hard to convert vga ...
>>
>> Anyone knows how one would do that with the memory api instead? I think
>> taking over the ports is easy as the memory regions have priorities so I
>> can simply register a region with higher priority. I have no clue how to
>> forward the access to the vga code though.
>>
> 
> That should be possible with priorities, but I think it's wrong.  There
> aren't two VGA devices.  QXL is-a VGA device and the best way to
> override behavior of base VGA device is through polymorphism.

In this particular case QXL is-a PCI VGA device though, so we can
decouple it from core VGA modeling. Placing the MemoryRegionOps inside
the Class (rather than static const) might be a short-term solution for
overriding read/write handlers of a particular VGA MemoryRegion. :)

Cheers,
Andreas

> This isn't really a memory API issue, it's a modeling issue.
> 
> Regards,
> 
> Anthony Liguori
> 
>> Anyone has clues / suggestions?
>>
>> thanks,
>>   Gerd

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: What to do about non-qdevified devices?
  2013-01-30 13:44           ` [Qemu-devel] " Andreas Färber
@ 2013-01-30 16:58             ` Paolo Bonzini
  2013-01-30 17:14               ` [Qemu-devel] " Andreas Färber
  2013-01-31 18:48             ` Markus Armbruster
  1 sibling, 1 reply; 57+ messages in thread
From: Paolo Bonzini @ 2013-01-30 16:58 UTC (permalink / raw)
  To: Andreas Färber
  Cc: Peter Maydell, Anthony Liguori, KVM devel mailing list,
	Juan Quintela, qemu-devel, Alexander Graf, Markus Armbruster

Il 30/01/2013 14:44, Andreas Färber ha scritto:
> I disagree on the "or else" part. I have been qdev'ifying and QOM'ifying
> devices in my maintenance area, and progress is slow. It gets even
> slower if one leaves clearly maintained areas. I see no good reason to
> force a pistol on someone's breast, like you have done for IDE, unless
> there is a good reason to do so. Currently I don't see any.

The reason for IDE is that it involved devices that are not
SysBusDevices (the IDE disk devices).  Having the same code work in two
ways, one qdevified and one not, is bad.

For simple SysBusDevice you're changing a crappy default to a less bad
one, but there's really little incentive to qdev/QOM-ification.

Paolo

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O
  2013-01-30 16:33     ` Anthony Liguori
  2013-01-30 16:54       ` Andreas Färber
@ 2013-01-30 17:08       ` Paolo Bonzini
  2013-01-30 21:08         ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 57+ messages in thread
From: Paolo Bonzini @ 2013-01-30 17:08 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Gerd Hoffmann, Andreas Färber, Juan Quintela,
	KVM devel mailing list, qemu-devel, Alexander Graf,
	Benjamin Herrenschmidt, qemu-ppc, Hervé Poussineau,
	David Gibson, Alon Levy, Michael S. Tsirkin

Il 30/01/2013 17:33, Anthony Liguori ha scritto:
> Gerd Hoffmann <kraxel@redhat.com> writes:
> 
>>   Hi,
>>
>>> hw/qxl.c:    portio_list_add(qxl_vga_port_list,
>>> pci_address_space_io(dev), 0x3b0);
>>> hw/vga.c:        portio_list_add(vga_port_list, address_space_io, 0x3b0);
>>
>> That reminds me I should solve this in a more elegant way.
>>
>> qxl takes over the vga io ports.  The reason it does this is because qxl
>> switches into vga mode in case the vga ports are accessed while not in
>> vga mode.  After doing the check (and possibly switching mode) the vga
>> handler is called to actually handle it.
> 
> The best way to handle this would be to remodel how we do VGA.
> 
> Make VGACommonState a proper QOM object and use it as the base class for
> QXL, CirrusVGA, QEMUVGA (std-vga), and VMwareVGA.

I think QXL should have-a VGA rather than being one.  It completely
bypasses the VGA infrastructure if not in VGA mode.

> The VGA accessors should be exposed as a memory region but the sub class
> ought to be responsible for actually adding it to a subregion.
> 
>>
>> That twist makes it a bit hard to convert vga ...
>>
>> Anyone knows how one would do that with the memory api instead? I think
>> taking over the ports is easy as the memory regions have priorities so I
>> can simply register a region with higher priority. I have no clue how to
>> forward the access to the vga code though.

Avi had a prototype patch series for IOMMU regions.  You could add one
between the QXL device and the VGA.  It doesn't have to do a
translation, but trying to translate a VGA address already means that
you must go to VGA mode.

Paolo

> 
> That should be possible with priorities, but I think it's wrong.  There
> aren't two VGA devices.  QXL is-a VGA device and the best way to
> override behavior of base VGA device is through polymorphism.
> 
> This isn't really a memory API issue, it's a modeling issue.
> 
> Regards,
> 
> Anthony Liguori
> 
>> Anyone has clues / suggestions?
>>
>> thanks,
>>   Gerd
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] What to do about non-qdevified devices?
  2013-01-30 16:58             ` Paolo Bonzini
@ 2013-01-30 17:14               ` Andreas Färber
  0 siblings, 0 replies; 57+ messages in thread
From: Andreas Färber @ 2013-01-30 17:14 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Markus Armbruster, Peter Maydell, Anthony Liguori,
	KVM devel mailing list, Juan Quintela, Alexander Graf, qemu-devel

Am 30.01.2013 17:58, schrieb Paolo Bonzini:
> Il 30/01/2013 14:44, Andreas Färber ha scritto:
>> I disagree on the "or else" part. I have been qdev'ifying and QOM'ifying
>> devices in my maintenance area, and progress is slow. It gets even
>> slower if one leaves clearly maintained areas. I see no good reason to
>> force a pistol on someone's breast, like you have done for IDE, unless
>> there is a good reason to do so. Currently I don't see any.
> 
> The reason for IDE is that it involved devices that are not
> SysBusDevices (the IDE disk devices).  Having the same code work in two
> ways, one qdevified and one not, is bad.

Sure, I did help with the QOM'ification there. "Currently I don't see
any [good reason]" by contrast referred to removing *all* devices that
are not yet qdev/QOM'ified without such pressing reason.

> For simple SysBusDevice you're changing a crappy default to a less bad
> one, but there's really little incentive to qdev/QOM-ification.

No disagreement. The benefits don't come from doing a conversion, they
come from basing new work on the result of a conversion. :)

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O
  2013-01-30 16:54       ` Andreas Färber
@ 2013-01-30 17:29         ` Anthony Liguori
  2013-01-30 20:08           ` Michael S. Tsirkin
  2013-01-30 20:19           ` [Qemu-devel] " Andreas Färber
  2013-01-30 21:07         ` Benjamin Herrenschmidt
  1 sibling, 2 replies; 57+ messages in thread
From: Anthony Liguori @ 2013-01-30 17:29 UTC (permalink / raw)
  To: Andreas Färber, Gerd Hoffmann
  Cc: Juan Quintela, KVM devel mailing list, qemu-devel, Alexander Graf,
	Benjamin Herrenschmidt, qemu-ppc, Hervé Poussineau,
	David Gibson, Alon Levy, Michael S. Tsirkin

Andreas Färber <afaerber@suse.de> writes:

> Am 30.01.2013 17:33, schrieb Anthony Liguori:
>> Gerd Hoffmann <kraxel@redhat.com> writes:
>> 
>>>> hw/qxl.c:    portio_list_add(qxl_vga_port_list,
>>>> pci_address_space_io(dev), 0x3b0);
>>>> hw/vga.c:        portio_list_add(vga_port_list, address_space_io, 0x3b0);
>>>
>>> That reminds me I should solve this in a more elegant way.
>>>
>>> qxl takes over the vga io ports.  The reason it does this is because qxl
>>> switches into vga mode in case the vga ports are accessed while not in
>>> vga mode.  After doing the check (and possibly switching mode) the vga
>>> handler is called to actually handle it.
>> 
>> The best way to handle this would be to remodel how we do VGA.
>> 
>> Make VGACommonState a proper QOM object and use it as the base class for
>> QXL, CirrusVGA, QEMUVGA (std-vga), and VMwareVGA.
>
> That would require polymorphism since we already need to derive from
> PCIDevice or ISADevice respectively for interfacing with the bus...

Nope.  You can use composition:

QXLDevice is-a VGACommonState

QXLPCI is-a PCIDevice
       has-a QXLDevice

> Modern object-oriented languages have tried to avoid multi-inheritence
> due to arising complications, I thought. Wouldn't object if someone
> wanted to do the dirty implementation work though. ;)

There is no need for MI.

> Another such example is EHCI, with PCIDevice and SysBusDevice frontends,
> sharing an EHCIState struct and having helper functions operating on
> that core state only. Quite a few device share such a pattern today
> actually (serial, m48t59, ...).

Yes, this is all about chipset modelling.  Chipsets should derive from
device and then be embedded in the appropriate bus device.

For instance.

SerialState is-a DeviceState

ISASerialState is-a ISADevice, has-a SerialState
MMIOSerialState is-a SysbusDevice, has-a SerialState

This is what we're doing in practice, we just aren't modeling the
chipsets and we're open coding the relationships (often in subtley
different ways).

Regards,

Anthony Liguori

>> The VGA accessors should be exposed as a memory region but the sub class
>> ought to be responsible for actually adding it to a subregion.
>> 
>>>
>>> That twist makes it a bit hard to convert vga ...
>>>
>>> Anyone knows how one would do that with the memory api instead? I think
>>> taking over the ports is easy as the memory regions have priorities so I
>>> can simply register a region with higher priority. I have no clue how to
>>> forward the access to the vga code though.
>>>
>> 
>> That should be possible with priorities, but I think it's wrong.  There
>> aren't two VGA devices.  QXL is-a VGA device and the best way to
>> override behavior of base VGA device is through polymorphism.
>
> In this particular case QXL is-a PCI VGA device though, so we can
> decouple it from core VGA modeling. Placing the MemoryRegionOps inside
> the Class (rather than static const) might be a short-term solution for
> overriding read/write handlers of a particular VGA MemoryRegion. :)
>
> Cheers,
> Andreas
>
>> This isn't really a memory API issue, it's a modeling issue.
>> 
>> Regards,
>> 
>> Anthony Liguori
>> 
>>> Anyone has clues / suggestions?
>>>
>>> thanks,
>>>   Gerd
>
> -- 
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
> GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O
  2013-01-30 11:48   ` Peter Maydell
                       ` (2 preceding siblings ...)
  2013-01-30 13:09     ` Markus Armbruster
@ 2013-01-30 17:55     ` Andreas Färber
  2013-01-30 20:20       ` Michael S. Tsirkin
  3 siblings, 1 reply; 57+ messages in thread
From: Andreas Färber @ 2013-01-30 17:55 UTC (permalink / raw)
  To: Peter Maydell
  Cc: KVM devel mailing list, Michael S. Tsirkin, Juan Quintela,
	Alexander Graf, qemu-devel, Hervé Poussineau, Gerd Hoffmann,
	Anthony Liguori, qemu-ppc, David Gibson, Alon Levy

Am 30.01.2013 12:48, schrieb Peter Maydell:
> On 30 January 2013 11:39, Andreas Färber <afaerber@suse.de> wrote:
>> Proposal by hpoussin was to move _list_add() code to ISADevice:
>> http://lists.gnu.org/archive/html/qemu-devel/2013-01/msg00508.html
>>
>> Concerns:
>> * PCI devices (VGA, QXL) register I/O ports as well
>>   => above patches add dependency on ISABus to machines
>>      -> "<benh> no mac ever had one"
>>   => PCIDevice shouldn't use ISA API with NULL ISADevice
>> * Lack of avi: Who decides about memory API these days?
>>
>> armbru and agraf concluded that moving this into ISA is wrong.
>>
>> => I will drop the remaining ioport patches from above series.
>>
>> Suggestions on how to proceed with tackling the issue are welcome.
> 
> How does this stuff work on real hardware? I would have
> expected that a PCI device registering the fact it has
> IO ports would have to do so via the PCI controller it
> is plugged into...
> 
> My naive don't-know-much-about-portio suggestion is that this
> should work the same way as memory regions: each device
> provides portio regions,

One remark on "same way as memory regions", me not knowing all the gory
hardware details myself.

PIO often contradicts the normal MemoryRegion usage. I.e., for an MMIO
device you would have a continuous region from say 0xa0000000 to
0xa007ffff inclusive and within that region you have some kind of sparse
registers. With ISA ports you often have dense overlapping ranges, say,
0x3-0x6 byte-reads foo, while 0x4 word-write does bar.
This is handled by having lists of (offset, length, size, handler)
quadruplets and consolidating those into MemoryRegions and aliases (cf.
patches) that then have a validation function to check whether a
particular access is valid and by whom it should be handled - that's
what MemoryRegionPortio[] and similar APIs are good for.

So yes, it might be possible to have a device declare its ports at
PCIDevice or DeviceState level, but it can't be directly passed through
to MemoryRegion API in most cases, or conflicts would arise. At least
that was my experience with PReP.

Andreas

> and the controller for the bus
> (ISA or PCI) exposes those to the next layer up, and
> something at board level maps it all into the right places.
> 
> -- PMM

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: KVM call minutes 2013-01-29 - Port I/O
  2013-01-30 17:29         ` [Qemu-devel] " Anthony Liguori
@ 2013-01-30 20:08           ` Michael S. Tsirkin
  2013-01-30 20:19             ` Peter Maydell
  2013-01-30 20:19           ` [Qemu-devel] " Andreas Färber
  1 sibling, 1 reply; 57+ messages in thread
From: Michael S. Tsirkin @ 2013-01-30 20:08 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: KVM devel mailing list, Juan Quintela, qemu-devel, Alexander Graf,
	Alon Levy, qemu-ppc, Gerd Hoffmann, Hervé Poussineau,
	Andreas Färber, David Gibson

On Wed, Jan 30, 2013 at 11:29:58AM -0600, Anthony Liguori wrote:
> Andreas Färber <afaerber@suse.de> writes:
> 
> > Am 30.01.2013 17:33, schrieb Anthony Liguori:
> >> Gerd Hoffmann <kraxel@redhat.com> writes:
> >> 
> >>>> hw/qxl.c:    portio_list_add(qxl_vga_port_list,
> >>>> pci_address_space_io(dev), 0x3b0);
> >>>> hw/vga.c:        portio_list_add(vga_port_list, address_space_io, 0x3b0);
> >>>
> >>> That reminds me I should solve this in a more elegant way.
> >>>
> >>> qxl takes over the vga io ports.  The reason it does this is because qxl
> >>> switches into vga mode in case the vga ports are accessed while not in
> >>> vga mode.  After doing the check (and possibly switching mode) the vga
> >>> handler is called to actually handle it.
> >> 
> >> The best way to handle this would be to remodel how we do VGA.
> >> 
> >> Make VGACommonState a proper QOM object and use it as the base class for
> >> QXL, CirrusVGA, QEMUVGA (std-vga), and VMwareVGA.
> >
> > That would require polymorphism since we already need to derive from
> > PCIDevice or ISADevice respectively for interfacing with the bus...
> 
> Nope.  You can use composition:
> 
> QXLDevice is-a VGACommonState
> 
> QXLPCI is-a PCIDevice
>        has-a QXLDevice

But why like this?
The distinction is artificial, isn't it?

> > Modern object-oriented languages have tried to avoid multi-inheritence
> > due to arising complications, I thought. Wouldn't object if someone
> > wanted to do the dirty implementation work though. ;)
> 
> There is no need for MI.
> 
> > Another such example is EHCI, with PCIDevice and SysBusDevice frontends,
> > sharing an EHCIState struct and having helper functions operating on
> > that core state only. Quite a few device share such a pattern today
> > actually (serial, m48t59, ...).
> 
> Yes, this is all about chipset modelling.  Chipsets should derive from
> device and then be embedded in the appropriate bus device.
> 
> For instance.
> 
> SerialState is-a DeviceState
> 
> ISASerialState is-a ISADevice, has-a SerialState
> MMIOSerialState is-a SysbusDevice, has-a SerialState

ISASerialState is not a SerialState?
Hmm but why?

> This is what we're doing in practice, we just aren't modeling the
> chipsets and we're open coding the relationships (often in subtley
> different ways).
> 
> Regards,
> 
> Anthony Liguori
> 
> >> The VGA accessors should be exposed as a memory region but the sub class
> >> ought to be responsible for actually adding it to a subregion.
> >> 
> >>>
> >>> That twist makes it a bit hard to convert vga ...
> >>>
> >>> Anyone knows how one would do that with the memory api instead? I think
> >>> taking over the ports is easy as the memory regions have priorities so I
> >>> can simply register a region with higher priority. I have no clue how to
> >>> forward the access to the vga code though.
> >>>
> >> 
> >> That should be possible with priorities, but I think it's wrong.  There
> >> aren't two VGA devices.  QXL is-a VGA device and the best way to
> >> override behavior of base VGA device is through polymorphism.
> >
> > In this particular case QXL is-a PCI VGA device though, so we can
> > decouple it from core VGA modeling. Placing the MemoryRegionOps inside
> > the Class (rather than static const) might be a short-term solution for
> > overriding read/write handlers of a particular VGA MemoryRegion. :)
> >
> > Cheers,
> > Andreas
> >
> >> This isn't really a memory API issue, it's a modeling issue.
> >> 
> >> Regards,
> >> 
> >> Anthony Liguori
> >> 
> >>> Anyone has clues / suggestions?
> >>>
> >>> thanks,
> >>>   Gerd
> >
> > -- 
> > SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
> > GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: KVM call minutes 2013-01-29 - Port I/O
  2013-01-30 20:08           ` Michael S. Tsirkin
@ 2013-01-30 20:19             ` Peter Maydell
  0 siblings, 0 replies; 57+ messages in thread
From: Peter Maydell @ 2013-01-30 20:19 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: KVM devel mailing list, Juan Quintela, qemu-devel, Alexander Graf,
	Alon Levy, Gerd Hoffmann, Anthony Liguori, qemu-ppc, David Gibson,
	Andreas Färber, Hervé Poussineau

On 30 January 2013 20:08, Michael S. Tsirkin <mst@redhat.com> wrote:
> Anthony wrote:
>> Nope.  You can use composition:
>>
>> QXLDevice is-a VGACommonState
>>
>> QXLPCI is-a PCIDevice
>>        has-a QXLDevice
>
> But why like this?
> The distinction is artificial, isn't it?

I think it's the wrong way round. QXLPCI should has-a PCI interface
(the physical card possesses an edge connector which fits a PCI
socket; it is not the case that the physical card is a kind of
edge connector). Having PCI card models inherit from PCIDevice
is just a convenient (but misleading) shortcut, and that is what
we should drop if it turns out that we should be inheriting
from some other class.

Or you could make them both has-a; I don't know enough about
QXLDevice to know if it should be is-a or has-a.

-- PMM

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O
  2013-01-30 17:29         ` [Qemu-devel] " Anthony Liguori
  2013-01-30 20:08           ` Michael S. Tsirkin
@ 2013-01-30 20:19           ` Andreas Färber
  1 sibling, 0 replies; 57+ messages in thread
From: Andreas Färber @ 2013-01-30 20:19 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Gerd Hoffmann, Juan Quintela, KVM devel mailing list, qemu-devel,
	Alexander Graf, Benjamin Herrenschmidt, qemu-ppc,
	Hervé Poussineau, David Gibson, Alon Levy,
	Michael S. Tsirkin

Am 30.01.2013 18:29, schrieb Anthony Liguori:
> Andreas Färber <afaerber@suse.de> writes:
> 
>> Am 30.01.2013 17:33, schrieb Anthony Liguori:
>>> Gerd Hoffmann <kraxel@redhat.com> writes:
>>>
>>>>> hw/qxl.c:    portio_list_add(qxl_vga_port_list,
>>>>> pci_address_space_io(dev), 0x3b0);
>>>>> hw/vga.c:        portio_list_add(vga_port_list, address_space_io, 0x3b0);
>>>>
>>>> That reminds me I should solve this in a more elegant way.
>>>>
>>>> qxl takes over the vga io ports.  The reason it does this is because qxl
>>>> switches into vga mode in case the vga ports are accessed while not in
>>>> vga mode.  After doing the check (and possibly switching mode) the vga
>>>> handler is called to actually handle it.
>>>
>>> The best way to handle this would be to remodel how we do VGA.
>>>
>>> Make VGACommonState a proper QOM object and use it as the base class for
>>> QXL, CirrusVGA, QEMUVGA (std-vga), and VMwareVGA.
>>
>> That would require polymorphism since we already need to derive from
>> PCIDevice or ISADevice respectively for interfacing with the bus...
> 
> Nope.  You can use composition:
> 
> QXLDevice is-a VGACommonState
> 
> QXLPCI is-a PCIDevice
>        has-a QXLDevice
> 
>> Modern object-oriented languages have tried to avoid multi-inheritence
>> due to arising complications, I thought. Wouldn't object if someone
>> wanted to do the dirty implementation work though. ;)
> 
> There is no need for MI.
> 
>> Another such example is EHCI, with PCIDevice and SysBusDevice frontends,
>> sharing an EHCIState struct and having helper functions operating on
>> that core state only. Quite a few device share such a pattern today
>> actually (serial, m48t59, ...).
> 
> Yes, this is all about chipset modelling.  Chipsets should derive from
> device and then be embedded in the appropriate bus device.
> 
> For instance.
> 
> SerialState is-a DeviceState
> 
> ISASerialState is-a ISADevice, has-a SerialState
> MMIOSerialState is-a SysbusDevice, has-a SerialState

Okay, but I don't like that both are transitively DeviceState then.
It's much too easy to add / hot-add the wrong device then, especially
when dropping no_user.

Andreas

> This is what we're doing in practice, we just aren't modeling the
> chipsets and we're open coding the relationships (often in subtley
> different ways).
> 
> Regards,
> 
> Anthony Liguori


-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: KVM call minutes 2013-01-29 - Port I/O
  2013-01-30 17:55     ` Andreas Färber
@ 2013-01-30 20:20       ` Michael S. Tsirkin
  2013-01-30 20:33         ` [Qemu-devel] " Andreas Färber
  0 siblings, 1 reply; 57+ messages in thread
From: Michael S. Tsirkin @ 2013-01-30 20:20 UTC (permalink / raw)
  To: Andreas Färber
  Cc: Peter Maydell, KVM devel mailing list, Juan Quintela, qemu-devel,
	Alexander Graf, Hervé Poussineau, Gerd Hoffmann,
	Anthony Liguori, qemu-ppc, Alon Levy, David Gibson

On Wed, Jan 30, 2013 at 06:55:47PM +0100, Andreas Färber wrote:
> Am 30.01.2013 12:48, schrieb Peter Maydell:
> > On 30 January 2013 11:39, Andreas Färber <afaerber@suse.de> wrote:
> >> Proposal by hpoussin was to move _list_add() code to ISADevice:
> >> http://lists.gnu.org/archive/html/qemu-devel/2013-01/msg00508.html
> >>
> >> Concerns:
> >> * PCI devices (VGA, QXL) register I/O ports as well
> >>   => above patches add dependency on ISABus to machines
> >>      -> "<benh> no mac ever had one"
> >>   => PCIDevice shouldn't use ISA API with NULL ISADevice
> >> * Lack of avi: Who decides about memory API these days?
> >>
> >> armbru and agraf concluded that moving this into ISA is wrong.
> >>
> >> => I will drop the remaining ioport patches from above series.
> >>
> >> Suggestions on how to proceed with tackling the issue are welcome.
> > 
> > How does this stuff work on real hardware? I would have
> > expected that a PCI device registering the fact it has
> > IO ports would have to do so via the PCI controller it
> > is plugged into...
> > 
> > My naive don't-know-much-about-portio suggestion is that this
> > should work the same way as memory regions: each device
> > provides portio regions,
> 
> One remark on "same way as memory regions", me not knowing all the gory
> hardware details myself.
> 
> PIO often contradicts the normal MemoryRegion usage. I.e., for an MMIO
> device you would have a continuous region from say 0xa0000000 to
> 0xa007ffff inclusive and within that region you have some kind of sparse
> registers. With ISA ports you often have dense overlapping ranges, say,
> 0x3-0x6 byte-reads foo, while 0x4 word-write does bar.

Hmm on x86 this is what happens with cf8..cfb range registers for example.
We plan handle this ATM using memory region priorities.
Same would work for prep won't it?

> This is handled by having lists of (offset, length, size, handler)
> quadruplets and consolidating those into MemoryRegions and aliases (cf.
> patches) that then have a validation function to check whether a
> particular access is valid and by whom it should be handled - that's
> what MemoryRegionPortio[] and similar APIs are good for.
> 
> So yes, it might be possible to have a device declare its ports at
> PCIDevice or DeviceState level, but it can't be directly passed through
> to MemoryRegion API in most cases, or conflicts would arise. At least
> that was my experience with PReP.
> 
> Andreas
> 
> > and the controller for the bus
> > (ISA or PCI) exposes those to the next layer up, and
> > something at board level maps it all into the right places.
> > 
> > -- PMM
> 
> -- 
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
> GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O
  2013-01-30 20:20       ` Michael S. Tsirkin
@ 2013-01-30 20:33         ` Andreas Färber
  2013-01-30 20:55           ` Michael S. Tsirkin
  0 siblings, 1 reply; 57+ messages in thread
From: Andreas Färber @ 2013-01-30 20:33 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Peter Maydell, KVM devel mailing list, Juan Quintela, qemu-devel,
	Alexander Graf, Hervé Poussineau, Gerd Hoffmann,
	Anthony Liguori, qemu-ppc, Alon Levy, David Gibson

Am 30.01.2013 21:20, schrieb Michael S. Tsirkin:
> On Wed, Jan 30, 2013 at 06:55:47PM +0100, Andreas Färber wrote:
>> Am 30.01.2013 12:48, schrieb Peter Maydell:
>>> On 30 January 2013 11:39, Andreas Färber <afaerber@suse.de> wrote:
>>>> Proposal by hpoussin was to move _list_add() code to ISADevice:
>>>> http://lists.gnu.org/archive/html/qemu-devel/2013-01/msg00508.html
>>>>
>>>> Concerns:
>>>> * PCI devices (VGA, QXL) register I/O ports as well
>>>>   => above patches add dependency on ISABus to machines
>>>>      -> "<benh> no mac ever had one"
>>>>   => PCIDevice shouldn't use ISA API with NULL ISADevice
>>>> * Lack of avi: Who decides about memory API these days?
>>>>
>>>> armbru and agraf concluded that moving this into ISA is wrong.
>>>>
>>>> => I will drop the remaining ioport patches from above series.
>>>>
>>>> Suggestions on how to proceed with tackling the issue are welcome.
>>>
>>> How does this stuff work on real hardware? I would have
>>> expected that a PCI device registering the fact it has
>>> IO ports would have to do so via the PCI controller it
>>> is plugged into...
>>>
>>> My naive don't-know-much-about-portio suggestion is that this
>>> should work the same way as memory regions: each device
>>> provides portio regions,
>>
>> One remark on "same way as memory regions", me not knowing all the gory
>> hardware details myself.
>>
>> PIO often contradicts the normal MemoryRegion usage. I.e., for an MMIO
>> device you would have a continuous region from say 0xa0000000 to
>> 0xa007ffff inclusive and within that region you have some kind of sparse
>> registers. With ISA ports you often have dense overlapping ranges, say,
>> 0x3-0x6 byte-reads foo, while 0x4 word-write does bar.
> 
> Hmm on x86 this is what happens with cf8..cfb range registers for example.
> We plan handle this ATM using memory region priorities.
> Same would work for prep won't it?

Hm, my point was that iiuc a MemoryRegion is per-address-range whereas
for I/O ports we seem to have per-data-width mappings.

Priorities would allow us to say:

0x1    -    0xff  is one region
    0x8-0xab      is a region with higher priority

but fallback for, e.g., word-access at 0xa0 to the lower-priority region
being unsupported today, no? I.e., the region being opaque.

Having said that, for the purposes of this discussion PReP is pretty
much a PC with a PowerPC CPU in it, unlike the modern CHRP machines.

Andreas

>> This is handled by having lists of (offset, length, size, handler)
>> quadruplets and consolidating those into MemoryRegions and aliases (cf.
>> patches) that then have a validation function to check whether a
>> particular access is valid and by whom it should be handled - that's
>> what MemoryRegionPortio[] and similar APIs are good for.
>>
>> So yes, it might be possible to have a device declare its ports at
>> PCIDevice or DeviceState level, but it can't be directly passed through
>> to MemoryRegion API in most cases, or conflicts would arise. At least
>> that was my experience with PReP.

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: KVM call minutes 2013-01-29 - Port I/O
  2013-01-30 20:33         ` [Qemu-devel] " Andreas Färber
@ 2013-01-30 20:55           ` Michael S. Tsirkin
  0 siblings, 0 replies; 57+ messages in thread
From: Michael S. Tsirkin @ 2013-01-30 20:55 UTC (permalink / raw)
  To: Andreas Färber
  Cc: Peter Maydell, KVM devel mailing list, Juan Quintela,
	Alexander Graf, qemu-devel, Hervé Poussineau, Gerd Hoffmann,
	Anthony Liguori, qemu-ppc, David Gibson, Alon Levy

On Wed, Jan 30, 2013 at 09:33:05PM +0100, Andreas Färber wrote:
> Am 30.01.2013 21:20, schrieb Michael S. Tsirkin:
> > On Wed, Jan 30, 2013 at 06:55:47PM +0100, Andreas Färber wrote:
> >> Am 30.01.2013 12:48, schrieb Peter Maydell:
> >>> On 30 January 2013 11:39, Andreas Färber <afaerber@suse.de> wrote:
> >>>> Proposal by hpoussin was to move _list_add() code to ISADevice:
> >>>> http://lists.gnu.org/archive/html/qemu-devel/2013-01/msg00508.html
> >>>>
> >>>> Concerns:
> >>>> * PCI devices (VGA, QXL) register I/O ports as well
> >>>>   => above patches add dependency on ISABus to machines
> >>>>      -> "<benh> no mac ever had one"
> >>>>   => PCIDevice shouldn't use ISA API with NULL ISADevice
> >>>> * Lack of avi: Who decides about memory API these days?
> >>>>
> >>>> armbru and agraf concluded that moving this into ISA is wrong.
> >>>>
> >>>> => I will drop the remaining ioport patches from above series.
> >>>>
> >>>> Suggestions on how to proceed with tackling the issue are welcome.
> >>>
> >>> How does this stuff work on real hardware? I would have
> >>> expected that a PCI device registering the fact it has
> >>> IO ports would have to do so via the PCI controller it
> >>> is plugged into...
> >>>
> >>> My naive don't-know-much-about-portio suggestion is that this
> >>> should work the same way as memory regions: each device
> >>> provides portio regions,
> >>
> >> One remark on "same way as memory regions", me not knowing all the gory
> >> hardware details myself.
> >>
> >> PIO often contradicts the normal MemoryRegion usage. I.e., for an MMIO
> >> device you would have a continuous region from say 0xa0000000 to
> >> 0xa007ffff inclusive and within that region you have some kind of sparse
> >> registers. With ISA ports you often have dense overlapping ranges, say,
> >> 0x3-0x6 byte-reads foo, while 0x4 word-write does bar.
> > 
> > Hmm on x86 this is what happens with cf8..cfb range registers for example.
> > We plan handle this ATM using memory region priorities.
> > Same would work for prep won't it?
> 
> Hm, my point was that iiuc a MemoryRegion is per-address-range whereas
> for I/O ports we seem to have per-data-width mappings.
> Priorities would allow us to say:
> 
> 0x1    -    0xff  is one region
>     0x8-0xab      is a region with higher priority
> 
> but fallback for, e.g., word-access at 0xa0 to the lower-priority region
> being unsupported today, no? I.e., the region being opaque.

No, MemoryRegion takes data width into account too.
See 'PIIX3: reset the VM when the Reset Control Register's RCPU bit gets
set' as one example.

> 
> Having said that, for the purposes of this discussion PReP is pretty
> much a PC with a PowerPC CPU in it, unlike the modern CHRP machines.
> 
> Andreas
> 
> >> This is handled by having lists of (offset, length, size, handler)
> >> quadruplets and consolidating those into MemoryRegions and aliases (cf.
> >> patches) that then have a validation function to check whether a
> >> particular access is valid and by whom it should be handled - that's
> >> what MemoryRegionPortio[] and similar APIs are good for.
> >>
> >> So yes, it might be possible to have a device declare its ports at
> >> PCIDevice or DeviceState level, but it can't be directly passed through
> >> to MemoryRegion API in most cases, or conflicts would arise. At least
> >> that was my experience with PReP.
> 
> -- 
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
> GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: KVM call minutes 2013-01-29 - Port I/O
  2013-01-30 13:59   ` [Qemu-devel] " Anthony Liguori
@ 2013-01-30 21:05     ` Benjamin Herrenschmidt
  2013-01-30 21:39       ` [Qemu-devel] " Anthony Liguori
  0 siblings, 1 reply; 57+ messages in thread
From: Benjamin Herrenschmidt @ 2013-01-30 21:05 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: KVM devel mailing list, Juan Quintela, Michael S. Tsirkin,
	qemu-devel, Alexander Graf, Alon Levy, qemu-ppc, Gerd Hoffmann,
	Hervé Poussineau, Andreas Färber, David Gibson

On Wed, 2013-01-30 at 07:59 -0600, Anthony Liguori wrote:
> An x86 CPU has a MMIO capability that's essentially 65 bits.  Whether
> the top bit is set determines whether it's a "PIO" transaction or an
> "MMIO" transaction.  A large chunk of that address space is invalid of
> course.
> 
> PCI has a 65 bit address space too.  The 65th bit determines whether
> it's an IO transaction or an MMIO transaction.

This is somewhat an over simplification since IO and MMIO differs in
other ways, such as ordering rules :-) But for the sake of memory
regions decoding I suppose it will do.

> For architectures that only have a 64-bit address space, what the PCI
> controller typically does is pick a 16-bit window within that address
> space to map to a PCI address with the 65th bit set.

Sort-of yes. The window doesn't have to be 16-bit (we commonly have
larger IO space windows on powerpc) and there's a window per host
bridge, so there's effectively more than one IO space (as there is more
than one PCI MMIO space, with only a window off the CPU space routed to
each brigde).

Making a hard wired assumption that the PCI (MMIO and IO) space relates
directly to the CPU bus space is wrong on pretty much all !x86
architectures.

 .../...

You make it sound like substractive decode is a chipset hack. It's not,
it's specified in the PCI spec.

1) A chipset will route any non-positively decoded IO transaction (65th
>    bit set) to a single end point (usually the ISA-bridge).  Which one it
>    chooses is up to the chipset.  This is called subtractive decoding
>    because the PCI bus will wait multiple cycles for that device to
>    claim the transaction before bouncing it.

This is not a chipset matter. It's the ISA bridge itself that does
substractive decoding. There also exists P2P bridges doing such substractive
decoding, this used to be fairly common with transparent bridges used for
laptop docking.

> 2) There are special hacks in most PCI chipsets to route very specific
>    addresses ranges to certain devices.  Namely, legacy VGA IO transactions
>    go to the first VGA device.  Legacy IDE IO transactions go to the first
>    IDE device.  This doesn't need to be programmed in the BARs.  It will
>    just happen.

This is also mostly not a hack in the chipset. It's a well defined behaviour
for legacy devices, sometimes call hard decoding. Of course often those devices
are built into the chipset but they don't have to. Plug-in VGA devices will
hard decode legacy VGA regions for both IO and MMIO by default (this can be
disabled on most of them nowadays) for example. This has nothing to do with
the chipset.

There's a specific bit in P2P bridge to control the forwarding of legacy
transaction downstream (and VGA palette snoops), this is also fully specified
in the PCI spec.

> 3) As it turns out, all legacy PIIX3 devices are positively decoded and
>    sent to the ISA-bridge (because it's faster this way).

Chipsets don't "send to a bridge". It's the bridge itself that decodes.

> Notice the lack of the word "ISA" in all of this other than describing
> the PCI class of an end point.

ISA is only relevant to the extent that the "legacy" regions of IO space
originate from the original ISA addresses of devices (VGA, IDE, etc...)
and to the extent that an ISA bus might still be present which will get
the transactions that nothing else have decoded in that space.

> So how should this be modeled?
> 
> On x86, the CPU has a pio address space.  That can propagate down
> through the PCI bus which is what we do today.
> 
> On !x86, the PCI controller ought to setup a MemoryRegion for
downstream
> PIO that devices can use to register on.
> 
> We probably need to do something like change the PCI VGA devices to
> export a MemoryRegion and allow the PCI controller to device how to
> register that as a subregion.

The VGA device should just register fixed address port IOs the same way
it would register an IO BAR. Essentially, hard coded IO addresses (or
memory, VGA does memory too, don't forget that) are equivalent to having
an invisible BAR with a fixed value in it.

There should be no "global port IO" because that concept is broken on
real multi-domain setups. Those "legacy" address ranges are just
hard-wired sub regions of the normal PCI space on which the device sits
on (unless you start doing real non-PCI ISA x86).

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: KVM call minutes 2013-01-29 - Port I/O
  2013-01-30 16:54       ` Andreas Färber
  2013-01-30 17:29         ` [Qemu-devel] " Anthony Liguori
@ 2013-01-30 21:07         ` Benjamin Herrenschmidt
  2013-01-30 21:42           ` [Qemu-devel] " Anthony Liguori
  1 sibling, 1 reply; 57+ messages in thread
From: Benjamin Herrenschmidt @ 2013-01-30 21:07 UTC (permalink / raw)
  To: Andreas Färber
  Cc: KVM devel mailing list, Juan Quintela, Michael S. Tsirkin,
	qemu-devel, Alexander Graf, qemu-ppc, Gerd Hoffmann,
	Anthony Liguori, Hervé Poussineau, Alon Levy, David Gibson

On Wed, 2013-01-30 at 17:54 +0100, Andreas Färber wrote:
> 
> That would require polymorphism since we already need to derive from
> PCIDevice or ISADevice respectively for interfacing with the bus...
> Modern object-oriented languages have tried to avoid multi-inheritence
> due to arising complications, I thought. Wouldn't object if someone
> wanted to do the dirty implementation work though. ;)
> 
> Another such example is EHCI, with PCIDevice and SysBusDevice
> frontends,
> sharing an EHCIState struct and having helper functions operating on
> that core state only. Quite a few device share such a pattern today
> actually (serial, m48t59, ...).

This is a design bug of your model :-) You shouldn't derive from your
bus interface IMHO but from your functional interface, and have an
ownership relation to the PCIDevice (a bit like IOKit does if my memory
serves me well).

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: KVM call minutes 2013-01-29 - Port I/O
  2013-01-30 17:08       ` Paolo Bonzini
@ 2013-01-30 21:08         ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 57+ messages in thread
From: Benjamin Herrenschmidt @ 2013-01-30 21:08 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: KVM devel mailing list, Juan Quintela, Michael S. Tsirkin,
	qemu-devel, Alexander Graf, Alon Levy, qemu-ppc, Gerd Hoffmann,
	Anthony Liguori, Hervé Poussineau, Andreas Färber,
	David Gibson

On Wed, 2013-01-30 at 18:08 +0100, Paolo Bonzini wrote:
> > Make VGACommonState a proper QOM object and use it as the base class
> for
> > QXL, CirrusVGA, QEMUVGA (std-vga), and VMwareVGA.
> 
> I think QXL should have-a VGA rather than being one.  It completely
> bypasses the VGA infrastructure if not in VGA mode.

 ... Like any modern video card the minute you turn off the "enable
legacy crap" bit on them :-)

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O
  2013-01-30 21:05     ` Benjamin Herrenschmidt
@ 2013-01-30 21:39       ` Anthony Liguori
  2013-01-30 21:54         ` Benjamin Herrenschmidt
  2013-01-30 22:20         ` Michael S. Tsirkin
  0 siblings, 2 replies; 57+ messages in thread
From: Anthony Liguori @ 2013-01-30 21:39 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Andreas Färber, Juan Quintela, KVM devel mailing list,
	qemu-devel, Alexander Graf, qemu-ppc, Hervé Poussineau,
	David Gibson, Gerd Hoffmann, Alon Levy, Michael S. Tsirkin

Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> On Wed, 2013-01-30 at 07:59 -0600, Anthony Liguori wrote:
>> An x86 CPU has a MMIO capability that's essentially 65 bits.  Whether
>> the top bit is set determines whether it's a "PIO" transaction or an
>> "MMIO" transaction.  A large chunk of that address space is invalid of
>> course.
>> 
>> PCI has a 65 bit address space too.  The 65th bit determines whether
>> it's an IO transaction or an MMIO transaction.
>
> This is somewhat an over simplification since IO and MMIO differs in
> other ways, such as ordering rules :-) But for the sake of memory
> regions decoding I suppose it will do.
>
>> For architectures that only have a 64-bit address space, what the PCI
>> controller typically does is pick a 16-bit window within that address
>> space to map to a PCI address with the 65th bit set.
>
> Sort-of yes. The window doesn't have to be 16-bit (we commonly have
> larger IO space windows on powerpc) and there's a window per host
> bridge, so there's effectively more than one IO space (as there is more
> than one PCI MMIO space, with only a window off the CPU space routed to
> each brigde).

Ack.

> Making a hard wired assumption that the PCI (MMIO and IO) space relates
> directly to the CPU bus space is wrong on pretty much all !x86
> architectures.

Ack.

>
>  .../...
>
> You make it sound like substractive decode is a chipset hack. It's not,
> it's specified in the PCI spec.

It's a hack :-)  It's a well specified hack, but it's still a hack.

>> 1) A chipset will route any non-positively decoded IO transaction (65th
>>    bit set) to a single end point (usually the ISA-bridge).  Which one it
>>    chooses is up to the chipset.  This is called subtractive decoding
>>    because the PCI bus will wait multiple cycles for that device to
>>    claim the transaction before bouncing it.
>
> This is not a chipset matter. It's the ISA bridge itself that does
> substractive decoding.

The PCI bus can have one end point that that can be the target for
subtractive decoding (not hard decoding, subtractive decoding).  IOW,
you can only have a single ISA Bridge within a single PCI domain.

You are right--chipset is the wrong word.  I'm used to thinking in terms
of only a single domain :-)

> There also exists P2P bridges doing such substractive
> decoding, this used to be fairly common with transparent bridges used for
> laptop docking.

I'm not sure I understand how this would work.  How can two devices on
the same PCI domain both do subtractive decoding?  Indeed, the PCI spec
even says:

"Subtractive decoding can be implemented by only one device on the bus
 since it accepts all accesses not positively decoded by some other
 agent."

>> 2) There are special hacks in most PCI chipsets to route very specific
>>    addresses ranges to certain devices.  Namely, legacy VGA IO transactions
>>    go to the first VGA device.  Legacy IDE IO transactions go to the first
>>    IDE device.  This doesn't need to be programmed in the BARs.  It will
>>    just happen.
>
> This is also mostly not a hack in the chipset. It's a well defined behaviour
> for legacy devices, sometimes call hard decoding. Of course often those devices
> are built into the chipset but they don't have to. Plug-in VGA devices will
> hard decode legacy VGA regions for both IO and MMIO by default (this can be
> disabled on most of them nowadays) for example. This has nothing to do with
> the chipset.

So I understand what you're saying re: PCI because the devices actually
assert DEVSEL to indicate that they handle the transaction.

But for PCI-E, doesn't the controller have to expressly identify what
the target is?  Is this done with the device class?

> There's a specific bit in P2P bridge to control the forwarding of legacy
> transaction downstream (and VGA palette snoops), this is also fully specified
> in the PCI spec.

Ack.

>
>> 3) As it turns out, all legacy PIIX3 devices are positively decoded and
>>    sent to the ISA-bridge (because it's faster this way).
>
> Chipsets don't "send to a bridge". It's the bridge itself that
> decodes.

With PCI...

>> Notice the lack of the word "ISA" in all of this other than describing
>> the PCI class of an end point.
>
> ISA is only relevant to the extent that the "legacy" regions of IO space
> originate from the original ISA addresses of devices (VGA, IDE, etc...)
> and to the extent that an ISA bus might still be present which will get
> the transactions that nothing else have decoded in that space.

Ack.

>  
>> So how should this be modeled?
>> 
>> On x86, the CPU has a pio address space.  That can propagate down
>> through the PCI bus which is what we do today.
>> 
>> On !x86, the PCI controller ought to setup a MemoryRegion for
> downstream
>> PIO that devices can use to register on.
>> 
>> We probably need to do something like change the PCI VGA devices to
>> export a MemoryRegion and allow the PCI controller to device how to
>> register that as a subregion.
>
> The VGA device should just register fixed address port IOs the same way
> it would register an IO BAR. Essentially, hard coded IO addresses (or
> memory, VGA does memory too, don't forget that) are equivalent to having
> an invisible BAR with a fixed value in it.

Ack.

>
> There should be no "global port IO" because that concept is broken on
> real multi-domain setups. Those "legacy" address ranges are just
> hard-wired sub regions of the normal PCI space on which the device sits
> on (unless you start doing real non-PCI ISA x86).

So, I think what you're suggesting (and I agree with), is that each PCI
device should export one or more MemoryRegions and indicate what the
MemoryRegions are for.

Potential options are:

 - MMIO BAR
 - PIO BAR
 - IDE hard decode
 - VGA hard decode
 - subtractive decode

I'm very much in agreement if that's what you're suggesting.

Regards,

Anthony Liguori

>
> Cheers,
> Ben.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O
  2013-01-30 21:07         ` Benjamin Herrenschmidt
@ 2013-01-30 21:42           ` Anthony Liguori
  0 siblings, 0 replies; 57+ messages in thread
From: Anthony Liguori @ 2013-01-30 21:42 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Andreas Färber
  Cc: Gerd Hoffmann, Juan Quintela, KVM devel mailing list, qemu-devel,
	Alexander Graf, qemu-ppc, Hervé Poussineau, David Gibson,
	Alon Levy, Michael S. Tsirkin

Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> On Wed, 2013-01-30 at 17:54 +0100, Andreas Färber wrote:
>> 
>> That would require polymorphism since we already need to derive from
>> PCIDevice or ISADevice respectively for interfacing with the bus...
>> Modern object-oriented languages have tried to avoid multi-inheritence
>> due to arising complications, I thought. Wouldn't object if someone
>> wanted to do the dirty implementation work though. ;)
>> 
>> Another such example is EHCI, with PCIDevice and SysBusDevice
>> frontends,
>> sharing an EHCIState struct and having helper functions operating on
>> that core state only. Quite a few device share such a pattern today
>> actually (serial, m48t59, ...).
>
> This is a design bug of your model :-) You shouldn't derive from your
> bus interface IMHO but from your functional interface, and have an
> ownership relation to the PCIDevice (a bit like IOKit does if my memory
> serves me well).

Ack.  Hence:

SerialPCIDevice is-a PCIDevice
   has-a SerialChipset

The board that exports a bus interface is one object.  The chipset that
implements the functionality is another object.

The former's job in life is to map the bus interface to whatever
interface the functional object expects.  In most cases, this is just a
straight forward proxy of a MemoryRegion.  Sometimes this involves
address shifting, etc.

Regards,

Anthony Liguori

>
> Cheers,
> Ben.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: KVM call minutes 2013-01-29 - Port I/O
  2013-01-30 21:39       ` [Qemu-devel] " Anthony Liguori
@ 2013-01-30 21:54         ` Benjamin Herrenschmidt
  2013-01-30 22:20         ` Michael S. Tsirkin
  1 sibling, 0 replies; 57+ messages in thread
From: Benjamin Herrenschmidt @ 2013-01-30 21:54 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: KVM devel mailing list, Juan Quintela, Michael S. Tsirkin,
	qemu-devel, Alexander Graf, Alon Levy, qemu-ppc, Gerd Hoffmann,
	Hervé Poussineau, Andreas Färber, David Gibson

On Wed, 2013-01-30 at 15:39 -0600, Anthony Liguori wrote:
> Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> > There also exists P2P bridges doing such substractive
> > decoding, this used to be fairly common with transparent bridges used for
> > laptop docking.
> 
> I'm not sure I understand how this would work.  How can two devices on
> the same PCI domain both do subtractive decoding?  Indeed, the PCI spec
> even says:
> 
> "Subtractive decoding can be implemented by only one device on the bus
>  since it accepts all accesses not positively decoded by some other
>  agent."

They would typically be the only one at *that* level, though of course
some other device can do it underneath them.

> >> 2) There are special hacks in most PCI chipsets to route very specific
> >>    addresses ranges to certain devices.  Namely, legacy VGA IO transactions
> >>    go to the first VGA device.  Legacy IDE IO transactions go to the first
> >>    IDE device.  This doesn't need to be programmed in the BARs.  It will
> >>    just happen.
> >
> > This is also mostly not a hack in the chipset. It's a well defined behaviour
> > for legacy devices, sometimes call hard decoding. Of course often those devices
> > are built into the chipset but they don't have to. Plug-in VGA devices will
> > hard decode legacy VGA regions for both IO and MMIO by default (this can be
> > disabled on most of them nowadays) for example. This has nothing to do with
> > the chipset.
> 
> So I understand what you're saying re: PCI because the devices actually
> assert DEVSEL to indicate that they handle the transaction.

Yes.

> But for PCI-E, doesn't the controller have to expressly identify what
> the target is?  Is this done with the device class?

No.

PCI-E is point to point. So until you have switches, a device will get
any transaction and can decide to accept or reject it (which boils down
from a high level SW perspective to the same thing as asserting DEVSEL
or not, the rejection typically causing a master abort style error).

If you have a switch, it's the bridge windows that define where a
transaction goes, and those still, I believe, support the bit to
indicate forwarding of legacy ISA in addition to the usual bridge
windows.
 
> > There's a specific bit in P2P bridge to control the forwarding of legacy
> > transaction downstream (and VGA palette snoops), this is also fully specified
> > in the PCI spec.
> 
> Ack.
> 
> >
> >> 3) As it turns out, all legacy PIIX3 devices are positively decoded and
> >>    sent to the ISA-bridge (because it's faster this way).
> >
> > Chipsets don't "send to a bridge". It's the bridge itself that
> > decodes.
> 
> With PCI...

Right. And PCI-E :-)

> >> Notice the lack of the word "ISA" in all of this other than describing
> >> the PCI class of an end point.
> >
> > ISA is only relevant to the extent that the "legacy" regions of IO space
> > originate from the original ISA addresses of devices (VGA, IDE, etc...)
> > and to the extent that an ISA bus might still be present which will get
> > the transactions that nothing else have decoded in that space.
> 
> Ack.
> 
> >  
> >> So how should this be modeled?
> >> 
> >> On x86, the CPU has a pio address space.  That can propagate down
> >> through the PCI bus which is what we do today.
> >> 
> >> On !x86, the PCI controller ought to setup a MemoryRegion for
> > downstream
> >> PIO that devices can use to register on.
> >> 
> >> We probably need to do something like change the PCI VGA devices to
> >> export a MemoryRegion and allow the PCI controller to device how to
> >> register that as a subregion.
> >
> > The VGA device should just register fixed address port IOs the same way
> > it would register an IO BAR. Essentially, hard coded IO addresses (or
> > memory, VGA does memory too, don't forget that) are equivalent to having
> > an invisible BAR with a fixed value in it.
> 
> Ack.
> 
> >
> > There should be no "global port IO" because that concept is broken on
> > real multi-domain setups. Those "legacy" address ranges are just
> > hard-wired sub regions of the normal PCI space on which the device sits
> > on (unless you start doing real non-PCI ISA x86).
> 
> So, I think what you're suggesting (and I agree with), is that each PCI
> device should export one or more MemoryRegions and indicate what the
> MemoryRegions are for.
> 
> Potential options are:
> 
>  - MMIO BAR
>  - PIO BAR
>  - IDE hard decode
>  - VGA hard decode
>  - subtractive decode

Simpler:

 - MMIO BAR with variable address (normal BAR)
 - MMIO BAR with fixed address (legacy)
 - PIO BAR with variable address
 - PIO BAR with fixed address

That's for devices. For bridge, add substractive decode option.

IE. No need to make "IDE" or "VGA" special. There are other cases of
fixed address decoding anyway, some legacy serial cards (PCI based)
still hard decode by default for example, etc...

Also some devices actually use BARs for hard decode. IE. It's common
for IDE devices to actually have all BARs for PIO but they start
read-only and set to the legacy addresses, and you can change them to
RW & configurable by writing to ProgIf to switch from legacy to native
(it also has the side effect of changing the irq from edge to level on
some which is a horribly gross hack from hell since PCI interrupts
aren't supposed to be edge but heh ... welcome to x86).

> I'm very much in agreement if that's what you're suggesting.

I think we are roughly on the same line :-)

Cheers,
Ben.

> Regards,
> 
> Anthony Liguori
> 
> >
> > Cheers,
> > Ben.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: KVM call minutes 2013-01-29 - Port I/O
  2013-01-30 21:39       ` [Qemu-devel] " Anthony Liguori
  2013-01-30 21:54         ` Benjamin Herrenschmidt
@ 2013-01-30 22:20         ` Michael S. Tsirkin
  2013-01-30 22:32           ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 57+ messages in thread
From: Michael S. Tsirkin @ 2013-01-30 22:20 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: KVM devel mailing list, Juan Quintela, qemu-devel, Alexander Graf,
	Alon Levy, qemu-ppc, Gerd Hoffmann, Hervé Poussineau,
	Andreas Färber, David Gibson

On Wed, Jan 30, 2013 at 03:39:34PM -0600, Anthony Liguori wrote:
> Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
> 
> > On Wed, 2013-01-30 at 07:59 -0600, Anthony Liguori wrote:
> >> An x86 CPU has a MMIO capability that's essentially 65 bits.  Whether
> >> the top bit is set determines whether it's a "PIO" transaction or an
> >> "MMIO" transaction.  A large chunk of that address space is invalid of
> >> course.
> >> 
> >> PCI has a 65 bit address space too.  The 65th bit determines whether
> >> it's an IO transaction or an MMIO transaction.
> >
> > This is somewhat an over simplification since IO and MMIO differs in
> > other ways, such as ordering rules :-) But for the sake of memory
> > regions decoding I suppose it will do.
> >
> >> For architectures that only have a 64-bit address space, what the PCI
> >> controller typically does is pick a 16-bit window within that address
> >> space to map to a PCI address with the 65th bit set.
> >
> > Sort-of yes. The window doesn't have to be 16-bit (we commonly have
> > larger IO space windows on powerpc) and there's a window per host
> > bridge, so there's effectively more than one IO space (as there is more
> > than one PCI MMIO space, with only a window off the CPU space routed to
> > each brigde).
> 
> Ack.
> 
> > Making a hard wired assumption that the PCI (MMIO and IO) space relates
> > directly to the CPU bus space is wrong on pretty much all !x86
> > architectures.
> 
> Ack.
> 
> >
> >  .../...
> >
> > You make it sound like substractive decode is a chipset hack. It's not,
> > it's specified in the PCI spec.
> 
> It's a hack :-)  It's a well specified hack, but it's still a hack.
> 
> >> 1) A chipset will route any non-positively decoded IO transaction (65th
> >>    bit set) to a single end point (usually the ISA-bridge).  Which one it
> >>    chooses is up to the chipset.  This is called subtractive decoding
> >>    because the PCI bus will wait multiple cycles for that device to
> >>    claim the transaction before bouncing it.
> >
> > This is not a chipset matter. It's the ISA bridge itself that does
> > substractive decoding.
> 
> The PCI bus can have one end point that that can be the target for
> subtractive decoding (not hard decoding, subtractive decoding).  IOW,
> you can only have a single ISA Bridge within a single PCI domain.
> 
> You are right--chipset is the wrong word.  I'm used to thinking in terms
> of only a single domain :-)
> 
> > There also exists P2P bridges doing such substractive
> > decoding, this used to be fairly common with transparent bridges used for
> > laptop docking.
> 
> I'm not sure I understand how this would work.  How can two devices on
> the same PCI domain both do subtractive decoding?  Indeed, the PCI spec
> even says:
> 
> "Subtractive decoding can be implemented by only one device on the bus
>  since it accepts all accesses not positively decoded by some other
>  agent."
> 
> >> 2) There are special hacks in most PCI chipsets to route very specific
> >>    addresses ranges to certain devices.  Namely, legacy VGA IO transactions
> >>    go to the first VGA device.  Legacy IDE IO transactions go to the first
> >>    IDE device.  This doesn't need to be programmed in the BARs.  It will
> >>    just happen.
> >
> > This is also mostly not a hack in the chipset. It's a well defined behaviour
> > for legacy devices, sometimes call hard decoding. Of course often those devices
> > are built into the chipset but they don't have to. Plug-in VGA devices will
> > hard decode legacy VGA regions for both IO and MMIO by default (this can be
> > disabled on most of them nowadays) for example. This has nothing to do with
> > the chipset.
> 
> So I understand what you're saying re: PCI because the devices actually
> assert DEVSEL to indicate that they handle the transaction.
> 
> But for PCI-E, doesn't the controller have to expressly identify what
> the target is?  Is this done with the device class?

Well you can have a PCI bridge and a legacy device behind that.
I think real PCI express devices can not be mapped onto legacy address
ranges.


> > There's a specific bit in P2P bridge to control the forwarding of legacy
> > transaction downstream (and VGA palette snoops), this is also fully specified
> > in the PCI spec.
> 
> Ack.
> 
> >
> >> 3) As it turns out, all legacy PIIX3 devices are positively decoded and
> >>    sent to the ISA-bridge (because it's faster this way).
> >
> > Chipsets don't "send to a bridge". It's the bridge itself that
> > decodes.
> 
> With PCI...
> 
> >> Notice the lack of the word "ISA" in all of this other than describing
> >> the PCI class of an end point.
> >
> > ISA is only relevant to the extent that the "legacy" regions of IO space
> > originate from the original ISA addresses of devices (VGA, IDE, etc...)
> > and to the extent that an ISA bus might still be present which will get
> > the transactions that nothing else have decoded in that space.
> 
> Ack.
> 
> >  
> >> So how should this be modeled?
> >> 
> >> On x86, the CPU has a pio address space.  That can propagate down
> >> through the PCI bus which is what we do today.
> >> 
> >> On !x86, the PCI controller ought to setup a MemoryRegion for
> > downstream
> >> PIO that devices can use to register on.
> >> 
> >> We probably need to do something like change the PCI VGA devices to
> >> export a MemoryRegion and allow the PCI controller to device how to
> >> register that as a subregion.
> >
> > The VGA device should just register fixed address port IOs the same way
> > it would register an IO BAR. Essentially, hard coded IO addresses (or
> > memory, VGA does memory too, don't forget that) are equivalent to having
> > an invisible BAR with a fixed value in it.
> 
> Ack.
> 
> >
> > There should be no "global port IO" because that concept is broken on
> > real multi-domain setups. Those "legacy" address ranges are just
> > hard-wired sub regions of the normal PCI space on which the device sits
> > on (unless you start doing real non-PCI ISA x86).
> 
> So, I think what you're suggesting (and I agree with), is that each PCI
> device should export one or more MemoryRegions and indicate what the
> MemoryRegions are for.
> 
> Potential options are:
> 
>  - MMIO BAR
>  - PIO BAR
>  - IDE hard decode
>  - VGA hard decode
>  - subtractive decode
> 
> I'm very much in agreement if that's what you're suggesting.
> 
> Regards,
> 
> Anthony Liguori
> 
> >
> > Cheers,
> > Ben.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: KVM call minutes 2013-01-29 - Port I/O
  2013-01-30 22:20         ` Michael S. Tsirkin
@ 2013-01-30 22:32           ` Benjamin Herrenschmidt
  2013-01-30 22:49             ` Michael S. Tsirkin
  0 siblings, 1 reply; 57+ messages in thread
From: Benjamin Herrenschmidt @ 2013-01-30 22:32 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: KVM devel mailing list, Juan Quintela, qemu-devel, Alexander Graf,
	Alon Levy, qemu-ppc, Gerd Hoffmann, Anthony Liguori,
	Hervé Poussineau, Andreas Färber, David Gibson

On Thu, 2013-01-31 at 00:20 +0200, Michael S. Tsirkin wrote:
> 
> Well you can have a PCI bridge and a legacy device behind that.
> I think real PCI express devices can not be mapped onto legacy address
> ranges.

In practice they do (VGA at least)

>From a SW modelling standpoint, I don't think it's worth differentiating
PCI and PCIE.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: KVM call minutes 2013-01-29 - Port I/O
  2013-01-30 22:32           ` Benjamin Herrenschmidt
@ 2013-01-30 22:49             ` Michael S. Tsirkin
  2013-01-30 23:02               ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 57+ messages in thread
From: Michael S. Tsirkin @ 2013-01-30 22:49 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: KVM devel mailing list, Juan Quintela, qemu-devel, Alexander Graf,
	Alon Levy, qemu-ppc, Gerd Hoffmann, Anthony Liguori,
	Hervé Poussineau, Andreas Färber, David Gibson

On Thu, Jan 31, 2013 at 09:32:05AM +1100, Benjamin Herrenschmidt wrote:
> On Thu, 2013-01-31 at 00:20 +0200, Michael S. Tsirkin wrote:
> > 
> > Well you can have a PCI bridge and a legacy device behind that.
> > I think real PCI express devices can not be mapped onto legacy address
> > ranges.
> 
> In practice they do (VGA at least)
> 
> >From a SW modelling standpoint, I don't think it's worth differentiating
> PCI and PCIE.
> 
> Cheers,
> Ben.

Interesting.
Do you have such hardware? Could you please dump
the output of lspci -vv?

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: KVM call minutes 2013-01-29 - Port I/O
  2013-01-30 22:49             ` Michael S. Tsirkin
@ 2013-01-30 23:02               ` Benjamin Herrenschmidt
  2013-01-30 23:28                 ` Alex Williamson
  0 siblings, 1 reply; 57+ messages in thread
From: Benjamin Herrenschmidt @ 2013-01-30 23:02 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: KVM devel mailing list, Juan Quintela, qemu-devel, Alexander Graf,
	Alon Levy, qemu-ppc, Gerd Hoffmann, Anthony Liguori,
	Hervé Poussineau, Andreas Färber, David Gibson

On Thu, 2013-01-31 at 00:49 +0200, Michael S. Tsirkin wrote:
> > In practice they do (VGA at least)
> > 
> > >From a SW modelling standpoint, I don't think it's worth
> differentiating
> > PCI and PCIE.
> > 
> > Cheers,
> > Ben.
> 
> Interesting.
> Do you have such hardware? Could you please dump
> the output of lspci -vv?

Any ATI or nVidia card still supports hard decoding of VGA regions for
the sake of legacy operating systems and BIOSes :-) I don't know about
Intel but I suppose it's the same.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: KVM call minutes 2013-01-29 - Port I/O
  2013-01-30 23:02               ` Benjamin Herrenschmidt
@ 2013-01-30 23:28                 ` Alex Williamson
  2013-01-31 10:49                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 57+ messages in thread
From: Alex Williamson @ 2013-01-30 23:28 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: KVM devel mailing list, Juan Quintela, Michael S. Tsirkin,
	Alexander Graf, qemu-devel, Alon Levy, Gerd Hoffmann,
	Anthony Liguori, qemu-ppc, David Gibson, Andreas Färber,
	Hervé Poussineau

On Thu, 2013-01-31 at 10:02 +1100, Benjamin Herrenschmidt wrote:
> On Thu, 2013-01-31 at 00:49 +0200, Michael S. Tsirkin wrote:
> > > In practice they do (VGA at least)
> > > 
> > > >From a SW modelling standpoint, I don't think it's worth
> > differentiating
> > > PCI and PCIE.
> > > 
> > > Cheers,
> > > Ben.
> > 
> > Interesting.
> > Do you have such hardware? Could you please dump
> > the output of lspci -vv?
> 
> Any ATI or nVidia card still supports hard decoding of VGA regions for
> the sake of legacy operating systems and BIOSes :-) I don't know about
> Intel but I suppose it's the same.

For example:

-[0000:00]-+-00.0  Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (external gfx0 p
           +-04.0-[02]--+-00.0  Advanced Micro Devices [AMD] nee ATI Cedar PRO [Radeon HD 5450/6350]

00:04.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (PCI express gpp port D) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
	I/O behind bridge: 0000c000-0000cfff
	Memory behind bridge: fd100000-fd1fffff
	Prefetchable memory behind bridge: 00000000d0000000-00000000dfffffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA- VGA+ MAbort- >Reset- FastB2B-
                                      ^^^^

VGA+ (VGA Enable) indicates positive decode of 0x3b0 - 0x3bb, 0x3c0 -
0x3df, and 0xa0000 - 0xbfff.  Device 2:00.0 of course doesn't report
these "ISA" ranges as they're implicit in the VGA class code.

BTW, I've been working on vfio-pci support of VGA assignment which makes
use of the VGA arbiter in the host to manipulate the VGA Enable control
register, allowing us to select which device to access.  The qemu side
is simply registering memory regions for the VGA areas and expecting to
be used with -vga none, but I'll adopt whatever strategy we choose for
hard coded address range support.  Current base patches at the links
below.  Thanks,

Alex

https://github.com/awilliam/qemu-vfio/commit/ea2befa59010a429dcf13c10dbccdf8b64e82fbd
https://github.com/awilliam/linux-vfio/commit/bae182d929229cbf1eaeb01e5fad4f77f81a4c61

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: KVM call minutes 2013-01-29 - Port I/O
  2013-01-30 23:28                 ` Alex Williamson
@ 2013-01-31 10:49                   ` Michael S. Tsirkin
  2013-01-31 16:34                     ` Alex Williamson
  2013-01-31 21:22                     ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 57+ messages in thread
From: Michael S. Tsirkin @ 2013-01-31 10:49 UTC (permalink / raw)
  To: Alex Williamson
  Cc: KVM devel mailing list, Juan Quintela, Alexander Graf, qemu-devel,
	Alon Levy, Gerd Hoffmann, Anthony Liguori, qemu-ppc, David Gibson,
	Andreas Färber, Hervé Poussineau

On Wed, Jan 30, 2013 at 04:28:30PM -0700, Alex Williamson wrote:
> On Thu, 2013-01-31 at 10:02 +1100, Benjamin Herrenschmidt wrote:
> > On Thu, 2013-01-31 at 00:49 +0200, Michael S. Tsirkin wrote:
> > > > In practice they do (VGA at least)
> > > > 
> > > > >From a SW modelling standpoint, I don't think it's worth
> > > differentiating
> > > > PCI and PCIE.
> > > > 
> > > > Cheers,
> > > > Ben.
> > > 
> > > Interesting.
> > > Do you have such hardware? Could you please dump
> > > the output of lspci -vv?
> > 
> > Any ATI or nVidia card still supports hard decoding of VGA regions for
> > the sake of legacy operating systems and BIOSes :-) I don't know about
> > Intel but I suppose it's the same.
> 
> For example:
> 
> -[0000:00]-+-00.0  Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (external gfx0 p
>            +-04.0-[02]--+-00.0  Advanced Micro Devices [AMD] nee ATI Cedar PRO [Radeon HD 5450/6350]
> 
> 00:04.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (PCI express gpp port D) (prog-if 00 [Normal decode])
> 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> 	Latency: 0, Cache Line Size: 64 bytes
> 	Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
> 	I/O behind bridge: 0000c000-0000cfff
> 	Memory behind bridge: fd100000-fd1fffff
> 	Prefetchable memory behind bridge: 00000000d0000000-00000000dfffffff
> 	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
> 	BridgeCtl: Parity- SERR- NoISA- VGA+ MAbort- >Reset- FastB2B-
>                                       ^^^^
> VGA+ (VGA Enable) indicates positive decode of 0x3b0 - 0x3bb, 0x3c0 -
> 0x3df, and 0xa0000 - 0xbfff.  Device 2:00.0 of course doesn't report
> these "ISA" ranges as they're implicit in the VGA class code.

OK but this appears behind a bridge.  So the bridge configuration tells
the root complex where to send accesses to the VGA.

But qemu currently puts devices directly on root bus.

And as far as I can tell when we present devices directly on bus 0, we
pretend these are integrated in the root complex. The spec seems to
say explicitly that root complex integrated devices should not use legacy
addresses or support hotplug. So I would be surprised if such one
appears in real world.

Luckily guests do not seem to be worried as long as we use ACPI.

> 
> BTW, I've been working on vfio-pci support of VGA assignment which makes
> use of the VGA arbiter in the host to manipulate the VGA Enable control
> register, allowing us to select which device to access.  The qemu side
> is simply registering memory regions for the VGA areas and expecting to
> be used with -vga none, but I'll adopt whatever strategy we choose for
> hard coded address range support.  Current base patches at the links
> below.  Thanks,
> 
> Alex
> 
> https://github.com/awilliam/qemu-vfio/commit/ea2befa59010a429dcf13c10dbccdf8b64e82fbd
> https://github.com/awilliam/linux-vfio/commit/bae182d929229cbf1eaeb01e5fad4f77f81a4c61

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: KVM call minutes 2013-01-29 - Port I/O
  2013-01-31 10:49                   ` Michael S. Tsirkin
@ 2013-01-31 16:34                     ` Alex Williamson
  2013-01-31 21:11                       ` Michael S. Tsirkin
  2013-01-31 21:44                       ` Benjamin Herrenschmidt
  2013-01-31 21:22                     ` Benjamin Herrenschmidt
  1 sibling, 2 replies; 57+ messages in thread
From: Alex Williamson @ 2013-01-31 16:34 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: KVM devel mailing list, Juan Quintela, Alexander Graf, qemu-devel,
	Alon Levy, Gerd Hoffmann, Anthony Liguori, qemu-ppc, David Gibson,
	Andreas Färber, Hervé Poussineau


On Thu, 2013-01-31 at 12:49 +0200, Michael S. Tsirkin wrote:
> On Wed, Jan 30, 2013 at 04:28:30PM -0700, Alex Williamson wrote:
> > On Thu, 2013-01-31 at 10:02 +1100, Benjamin Herrenschmidt wrote:
> > > On Thu, 2013-01-31 at 00:49 +0200, Michael S. Tsirkin wrote:
> > > > > In practice they do (VGA at least)
> > > > > 
> > > > > >From a SW modelling standpoint, I don't think it's worth
> > > > differentiating
> > > > > PCI and PCIE.
> > > > > 
> > > > > Cheers,
> > > > > Ben.
> > > > 
> > > > Interesting.
> > > > Do you have such hardware? Could you please dump
> > > > the output of lspci -vv?
> > > 
> > > Any ATI or nVidia card still supports hard decoding of VGA regions for
> > > the sake of legacy operating systems and BIOSes :-) I don't know about
> > > Intel but I suppose it's the same.
> > 
> > For example:
> > 
> > -[0000:00]-+-00.0  Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (external gfx0 p
> >            +-04.0-[02]--+-00.0  Advanced Micro Devices [AMD] nee ATI Cedar PRO [Radeon HD 5450/6350]
> > 
> > 00:04.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (PCI express gpp port D) (prog-if 00 [Normal decode])
> > 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> > 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> > 	Latency: 0, Cache Line Size: 64 bytes
> > 	Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
> > 	I/O behind bridge: 0000c000-0000cfff
> > 	Memory behind bridge: fd100000-fd1fffff
> > 	Prefetchable memory behind bridge: 00000000d0000000-00000000dfffffff
> > 	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
> > 	BridgeCtl: Parity- SERR- NoISA- VGA+ MAbort- >Reset- FastB2B-
> >                                       ^^^^
> > VGA+ (VGA Enable) indicates positive decode of 0x3b0 - 0x3bb, 0x3c0 -
> > 0x3df, and 0xa0000 - 0xbfff.  Device 2:00.0 of course doesn't report
> > these "ISA" ranges as they're implicit in the VGA class code.
> 
> OK but this appears behind a bridge.  So the bridge configuration tells
> the root complex where to send accesses to the VGA.
> 
> But qemu currently puts devices directly on root bus.
> 
> And as far as I can tell when we present devices directly on bus 0, we
> pretend these are integrated in the root complex. The spec seems to
> say explicitly that root complex integrated devices should not use legacy
> addresses or support hotplug. So I would be surprised if such one
> appears in real world.
> 
> Luckily guests do not seem to be worried as long as we use ACPI.

Yes, in fact I just figured out last night that Windows is unhappy with
assigned PCI devices on bus 0 that claim to be an endpoint in their PCIe
capability rather than an integrated endpoint.  We'll need to do extra
mangling of the PCIe capability to massage it into the guest visible
topology.

Section 1.3.2.3 of the 3.0 spec says integrated endpoints must not
require I/O resources claimed through BAR(s).  VGA skirts around this by
not having the legacy resources claimed by BARs, but instead being
implicit.  Are there other sections restricting legacy I/O?

It's common that a plugin VGA card sits behind a root port where the
bridge registers tell us about VGA routing, but integrated VGA devices
are often on bus 0 though, here's an example:

-[0000:00]-+-00.0  Intel Corporation 2nd Generation Core Processor Family DRAM Controller
           +-02.0  Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller

Often these systems will disable the integrated graphics when a plugin
graphics is installed below a root port.  I'm not sure how the system
knows to route VGA to the integrated device vs the root port otherwise.

Here's a more interesting example:

-+-[0000:01]-+-00.0  NVIDIA Corporation GT218 [GeForce G210M]
 |           \-00.1  NVIDIA Corporation High Definition Audio Controller
 \-[0000:00]-+-00.0  Intel Corporation Mobile 4 Series Chipset Memory Controller Hub
             +-01.0  Intel Corporation Mobile 4 Series Chipset PCI Express Graphics Port

This system seems to have two host bridges with VGA behind each of them.
There's no bridge to control VGA routing, so I don't know how the
selection is done.  It's possible the g210m never sees legacy VGA
accesses in this mode.  This bios has another mode which makes the g210m
the primary graphics and hides the integrated graphics, essentially the
same as I mention above with hiding integrated endpoint graphics when
plugin graphics are used.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: What to do about non-qdevified devices?
  2013-01-30 13:44           ` [Qemu-devel] " Andreas Färber
  2013-01-30 16:58             ` Paolo Bonzini
@ 2013-01-31 18:48             ` Markus Armbruster
  1 sibling, 0 replies; 57+ messages in thread
From: Markus Armbruster @ 2013-01-31 18:48 UTC (permalink / raw)
  To: Andreas Färber
  Cc: Peter Maydell, Anthony Liguori, KVM devel mailing list,
	Juan Quintela, qemu-devel, Alexander Graf

Andreas Färber <afaerber@suse.de> writes:

> Am 30.01.2013 13:35, schrieb Markus Armbruster:
>> Peter Maydell <peter.maydell@linaro.org> writes:
>> 
>>> On 30 January 2013 07:02, Markus Armbruster <armbru@redhat.com> wrote:
>>>> Anthony Liguori <aliguori@us.ibm.com> writes:
>>>>
>>>> [...]
>>>>> The problems I ran into were (1) this is a lot of work (2) it basically
>>>>> requires that all bus children have been qdev/QOM-ified.  Even with
>>>>> something like the ISA bus which is where I started, quite a few devices
>>>>> were not qdevified still.
>>>>
>>>> So what's the plan to complete the qdevification job?  Lay really low
>>>> and quietly hope the problem goes away?  We've tried that for about
>>>> three years, doesn't seem to work.
>>>
>>> Do we have a list of not-yet-qdevified devices? Maybe we need to
>>> start saying "fix X Y and Z or platform P is dropped from the next
>>> release". (This would of course be easier if we had a way to let users
>>> know that platform P was in danger...)
>> 
>> I think that's a good idea.  Only problem is identifying pre-qdev
>> devices in the code requires code inspection (grep won't do, I'm
>> afraid).
>
> +1 That would address my request as well.
>
> Having a list of low-hanging fruit on the Wiki might also give new
> contributors some ideas of where and how to start poking at the code.
>
>> If we agree on a "qdevify or else" plan, I'd be prepared to help with
>> the digging up of devices.
>
> I disagree on the "or else" part. I have been qdev'ifying and QOM'ifying
> devices in my maintenance area, and progress is slow. It gets even

Good work, much appreciated.

> slower if one leaves clearly maintained areas. I see no good reason to
> force a pistol on someone's breast, like you have done for IDE, unless
> there is a good reason to do so. Currently I don't see any.

There's the reason that made me hijack this thread.  Paraphrashing
Anthony: doing IRQs right involves Pin objects, and ultimately requires
all bus children have been qdevified.  Even for ISA, there are still
stragglers holding us back.

Is that sufficient reason to rip out devices *now*?  No, and I didn't
call for it.

Could it become sufficient reason in the not too distant future?
Possibly.  Should we plan ahead for such a contingency?  Probably.  But
I didn't call for that either.

What I actually wrote was 1. I think mapping the remaining qdevification
work is a good idea, and 2. if we commit to attempt doing that work in a
reasonable time frame, I'd be willing to help with the mapping.
Implying that without such a committment, sorry, got more immediately
useful things to do.

And by the way, the kind of "pistol" I get to brandish in this group is
about as scary as a water pistol in the middle of the Gobi desert.

[...]

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: KVM call minutes 2013-01-29 - Port I/O
  2013-01-31 16:34                     ` Alex Williamson
@ 2013-01-31 21:11                       ` Michael S. Tsirkin
  2013-01-31 21:21                         ` Alex Williamson
  2013-01-31 21:44                       ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 57+ messages in thread
From: Michael S. Tsirkin @ 2013-01-31 21:11 UTC (permalink / raw)
  To: Alex Williamson
  Cc: KVM devel mailing list, Juan Quintela, Alexander Graf, qemu-devel,
	Alon Levy, Gerd Hoffmann, Anthony Liguori, qemu-ppc, David Gibson,
	Andreas Färber, Hervé Poussineau

On Thu, Jan 31, 2013 at 09:34:03AM -0700, Alex Williamson wrote:
> 
> On Thu, 2013-01-31 at 12:49 +0200, Michael S. Tsirkin wrote:
> > On Wed, Jan 30, 2013 at 04:28:30PM -0700, Alex Williamson wrote:
> > > On Thu, 2013-01-31 at 10:02 +1100, Benjamin Herrenschmidt wrote:
> > > > On Thu, 2013-01-31 at 00:49 +0200, Michael S. Tsirkin wrote:
> > > > > > In practice they do (VGA at least)
> > > > > > 
> > > > > > >From a SW modelling standpoint, I don't think it's worth
> > > > > differentiating
> > > > > > PCI and PCIE.
> > > > > > 
> > > > > > Cheers,
> > > > > > Ben.
> > > > > 
> > > > > Interesting.
> > > > > Do you have such hardware? Could you please dump
> > > > > the output of lspci -vv?
> > > > 
> > > > Any ATI or nVidia card still supports hard decoding of VGA regions for
> > > > the sake of legacy operating systems and BIOSes :-) I don't know about
> > > > Intel but I suppose it's the same.
> > > 
> > > For example:
> > > 
> > > -[0000:00]-+-00.0  Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (external gfx0 p
> > >            +-04.0-[02]--+-00.0  Advanced Micro Devices [AMD] nee ATI Cedar PRO [Radeon HD 5450/6350]
> > > 
> > > 00:04.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (PCI express gpp port D) (prog-if 00 [Normal decode])
> > > 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> > > 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> > > 	Latency: 0, Cache Line Size: 64 bytes
> > > 	Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
> > > 	I/O behind bridge: 0000c000-0000cfff
> > > 	Memory behind bridge: fd100000-fd1fffff
> > > 	Prefetchable memory behind bridge: 00000000d0000000-00000000dfffffff
> > > 	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
> > > 	BridgeCtl: Parity- SERR- NoISA- VGA+ MAbort- >Reset- FastB2B-
> > >                                       ^^^^
> > > VGA+ (VGA Enable) indicates positive decode of 0x3b0 - 0x3bb, 0x3c0 -
> > > 0x3df, and 0xa0000 - 0xbfff.  Device 2:00.0 of course doesn't report
> > > these "ISA" ranges as they're implicit in the VGA class code.
> > 
> > OK but this appears behind a bridge.  So the bridge configuration tells
> > the root complex where to send accesses to the VGA.
> > 
> > But qemu currently puts devices directly on root bus.
> > 
> > And as far as I can tell when we present devices directly on bus 0, we
> > pretend these are integrated in the root complex. The spec seems to
> > say explicitly that root complex integrated devices should not use legacy
> > addresses or support hotplug. So I would be surprised if such one
> > appears in real world.
> > 
> > Luckily guests do not seem to be worried as long as we use ACPI.
> 
> Yes, in fact I just figured out last night that Windows is unhappy with
> assigned PCI devices on bus 0 that claim to be an endpoint in their PCIe
> capability rather than an integrated endpoint.  We'll need to do extra
> mangling of the PCIe capability to massage it into the guest visible
> topology.

For now, just put you device behind an express bridge. This breaks acpi
hotplug for now, but I'm looking into hotplug with bridges anyway.

If you really need it I can give you a hack for hotplug too.

Of course express  does not allow hotplug of root complex parts
but happens to work because we use ACPI.

> Section 1.3.2.3 of the 3.0 spec says integrated endpoints must not
> require I/O resources claimed through BAR(s).  VGA skirts around this by
> not having the legacy resources claimed by BARs, but instead being
> implicit.

Aha. I missed this point.

>  Are there other sections restricting legacy I/O?

One other interesting things is that VGA enable bit (for bridge control
register) does not appear in express spec at all.

> It's common that a plugin VGA card sits behind a root port where the
> bridge registers tell us about VGA routing,
> but integrated VGA devices
> are often on bus 0 though, here's an example:
> 
> -[0000:00]-+-00.0  Intel Corporation 2nd Generation Core Processor Family DRAM Controller
>            +-02.0  Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller
> 
> Often these systems will disable the integrated graphics when a plugin
> graphics is installed below a root port.  I'm not sure how the system
> knows to route VGA to the integrated device vs the root port otherwise.

I am guessing it disables the integrated graphics?

> Here's a more interesting example:
> 
> -+-[0000:01]-+-00.0  NVIDIA Corporation GT218 [GeForce G210M]
>  |           \-00.1  NVIDIA Corporation High Definition Audio Controller
>  \-[0000:00]-+-00.0  Intel Corporation Mobile 4 Series Chipset Memory Controller Hub
>              +-01.0  Intel Corporation Mobile 4 Series Chipset PCI Express Graphics Port
> 
> This system seems to have two host bridges with VGA behind each of them.
> There's no bridge to control VGA routing, so I don't know how the
> selection is done.

Is IO space disabled for the inactive card? Maybe that is how.

>  It's possible the g210m never sees legacy VGA
> accesses in this mode.  This bios has another mode which makes the g210m
> the primary graphics and hides the integrated graphics, essentially the
> same as I mention above with hiding integrated endpoint graphics when
> plugin graphics are used.  Thanks,
> 
> Alex

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: KVM call minutes 2013-01-29 - Port I/O
  2013-01-31 21:11                       ` Michael S. Tsirkin
@ 2013-01-31 21:21                         ` Alex Williamson
  2013-01-31 22:20                           ` Michael S. Tsirkin
  0 siblings, 1 reply; 57+ messages in thread
From: Alex Williamson @ 2013-01-31 21:21 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: KVM devel mailing list, Juan Quintela, Alexander Graf, qemu-devel,
	Alon Levy, Gerd Hoffmann, Anthony Liguori, qemu-ppc, David Gibson,
	Andreas Färber, Hervé Poussineau

On Thu, 2013-01-31 at 23:11 +0200, Michael S. Tsirkin wrote:
> On Thu, Jan 31, 2013 at 09:34:03AM -0700, Alex Williamson wrote:
> > 
> > On Thu, 2013-01-31 at 12:49 +0200, Michael S. Tsirkin wrote:
> > > On Wed, Jan 30, 2013 at 04:28:30PM -0700, Alex Williamson wrote:
> > > > On Thu, 2013-01-31 at 10:02 +1100, Benjamin Herrenschmidt wrote:
> > > > > On Thu, 2013-01-31 at 00:49 +0200, Michael S. Tsirkin wrote:
> > > > > > > In practice they do (VGA at least)
> > > > > > > 
> > > > > > > >From a SW modelling standpoint, I don't think it's worth
> > > > > > differentiating
> > > > > > > PCI and PCIE.
> > > > > > > 
> > > > > > > Cheers,
> > > > > > > Ben.
> > > > > > 
> > > > > > Interesting.
> > > > > > Do you have such hardware? Could you please dump
> > > > > > the output of lspci -vv?
> > > > > 
> > > > > Any ATI or nVidia card still supports hard decoding of VGA regions for
> > > > > the sake of legacy operating systems and BIOSes :-) I don't know about
> > > > > Intel but I suppose it's the same.
> > > > 
> > > > For example:
> > > > 
> > > > -[0000:00]-+-00.0  Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (external gfx0 p
> > > >            +-04.0-[02]--+-00.0  Advanced Micro Devices [AMD] nee ATI Cedar PRO [Radeon HD 5450/6350]
> > > > 
> > > > 00:04.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (PCI express gpp port D) (prog-if 00 [Normal decode])
> > > > 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> > > > 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> > > > 	Latency: 0, Cache Line Size: 64 bytes
> > > > 	Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
> > > > 	I/O behind bridge: 0000c000-0000cfff
> > > > 	Memory behind bridge: fd100000-fd1fffff
> > > > 	Prefetchable memory behind bridge: 00000000d0000000-00000000dfffffff
> > > > 	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
> > > > 	BridgeCtl: Parity- SERR- NoISA- VGA+ MAbort- >Reset- FastB2B-
> > > >                                       ^^^^
> > > > VGA+ (VGA Enable) indicates positive decode of 0x3b0 - 0x3bb, 0x3c0 -
> > > > 0x3df, and 0xa0000 - 0xbfff.  Device 2:00.0 of course doesn't report
> > > > these "ISA" ranges as they're implicit in the VGA class code.
> > > 
> > > OK but this appears behind a bridge.  So the bridge configuration tells
> > > the root complex where to send accesses to the VGA.
> > > 
> > > But qemu currently puts devices directly on root bus.
> > > 
> > > And as far as I can tell when we present devices directly on bus 0, we
> > > pretend these are integrated in the root complex. The spec seems to
> > > say explicitly that root complex integrated devices should not use legacy
> > > addresses or support hotplug. So I would be surprised if such one
> > > appears in real world.
> > > 
> > > Luckily guests do not seem to be worried as long as we use ACPI.
> > 
> > Yes, in fact I just figured out last night that Windows is unhappy with
> > assigned PCI devices on bus 0 that claim to be an endpoint in their PCIe
> > capability rather than an integrated endpoint.  We'll need to do extra
> > mangling of the PCIe capability to massage it into the guest visible
> > topology.
> 
> For now, just put you device behind an express bridge. This breaks acpi
> hotplug for now, but I'm looking into hotplug with bridges anyway.

We have the problem in both directions though, Endpoints that should be
Integrated Endpoints and Integrated Endpoints that should be Endpoints.
So I think we need to mangle the type.

> If you really need it I can give you a hack for hotplug too.
> 
> Of course express  does not allow hotplug of root complex parts
> but happens to work because we use ACPI.

That's a little odd.

> > Section 1.3.2.3 of the 3.0 spec says integrated endpoints must not
> > require I/O resources claimed through BAR(s).  VGA skirts around this by
> > not having the legacy resources claimed by BARs, but instead being
> > implicit.
> 
> Aha. I missed this point.
> 
> >  Are there other sections restricting legacy I/O?
> 
> One other interesting things is that VGA enable bit (for bridge control
> register) does not appear in express spec at all.

Yep, but it appears on hardware.

> > It's common that a plugin VGA card sits behind a root port where the
> > bridge registers tell us about VGA routing,
> > but integrated VGA devices
> > are often on bus 0 though, here's an example:
> > 
> > -[0000:00]-+-00.0  Intel Corporation 2nd Generation Core Processor Family DRAM Controller
> >            +-02.0  Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller
> > 
> > Often these systems will disable the integrated graphics when a plugin
> > graphics is installed below a root port.  I'm not sure how the system
> > knows to route VGA to the integrated device vs the root port otherwise.
> 
> I am guessing it disables the integrated graphics?
> 
> > Here's a more interesting example:
> > 
> > -+-[0000:01]-+-00.0  NVIDIA Corporation GT218 [GeForce G210M]
> >  |           \-00.1  NVIDIA Corporation High Definition Audio Controller
> >  \-[0000:00]-+-00.0  Intel Corporation Mobile 4 Series Chipset Memory Controller Hub
> >              +-01.0  Intel Corporation Mobile 4 Series Chipset PCI Express Graphics Port
> > 
> > This system seems to have two host bridges with VGA behind each of them.
> > There's no bridge to control VGA routing, so I don't know how the
> > selection is done.
> 
> Is IO space disabled for the inactive card? Maybe that is how.

The card has BAR defined I/O space resources.  My guess is that VGA is
just statically routed to the integrated device and the secondary works
only in non-legacy mode until the BIOS switch is flipped, the integrated
device is hidden and VGA is switched to static routing for the nvidia
device.  I suppose that means I'll never be able to assign the nvidia to
a guest, at least not with any kind of legacy VGA support.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: KVM call minutes 2013-01-29 - Port I/O
  2013-01-31 10:49                   ` Michael S. Tsirkin
  2013-01-31 16:34                     ` Alex Williamson
@ 2013-01-31 21:22                     ` Benjamin Herrenschmidt
  2013-01-31 22:28                       ` Michael S. Tsirkin
  1 sibling, 1 reply; 57+ messages in thread
From: Benjamin Herrenschmidt @ 2013-01-31 21:22 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: KVM devel mailing list, Juan Quintela, Alexander Graf, qemu-devel,
	Alex Williamson, Alon Levy, Gerd Hoffmann, Anthony Liguori,
	qemu-ppc, David Gibson, Andreas Färber,
	Hervé Poussineau

On Thu, 2013-01-31 at 12:49 +0200, Michael S. Tsirkin wrote:

> OK but this appears behind a bridge.  So the bridge configuration tells
> the root complex where to send accesses to the VGA.

Sort-of, again the root complex isn't "sending" anything targeted here.
PCIe is point to point and any device is behind a bridge, real or
virtual.

> But qemu currently puts devices directly on root bus.

Sure, because qemu doesn't specifically model PCIe but something "else"

> And as far as I can tell when we present devices directly on bus 0, we
> pretend these are integrated in the root complex.

Right, it's a bit gross.

>  The spec seems to
> say explicitly that root complex integrated devices should not use legacy
> addresses or support hotplug. So I would be surprised if such one
> appears in real world.

Sure but that doesn't change the fact that there's no point in treating
things differently between PCI and PCIe for the sake of address range
decoding. The high level model remains the same.

> Luckily guests do not seem to be worried as long as we use ACPI.

Right, it all just looks like PCI to the guest anyway and is mostly
treated as such for the sake of routing and decoding (until you turn on
ARI but that's a different can of worms).

> > BTW, I've been working on vfio-pci support of VGA assignment which makes
> > use of the VGA arbiter in the host to manipulate the VGA Enable control
> > register, allowing us to select which device to access.  The qemu side
> > is simply registering memory regions for the VGA areas and expecting to
> > be used with -vga none, but I'll adopt whatever strategy we choose for
> > hard coded address range support.  Current base patches at the links
> > below.  Thanks,
> > 
> > Alex
> > 
> > https://github.com/awilliam/qemu-vfio/commit/ea2befa59010a429dcf13c10dbccdf8b64e82fbd
> > https://github.com/awilliam/linux-vfio/commit/bae182d929229cbf1eaeb01e5fad4f77f81a4c61

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: KVM call minutes 2013-01-29 - Port I/O
  2013-01-31 16:34                     ` Alex Williamson
  2013-01-31 21:11                       ` Michael S. Tsirkin
@ 2013-01-31 21:44                       ` Benjamin Herrenschmidt
  2013-01-31 22:37                         ` Michael S. Tsirkin
  2013-01-31 23:25                         ` Alex Williamson
  1 sibling, 2 replies; 57+ messages in thread
From: Benjamin Herrenschmidt @ 2013-01-31 21:44 UTC (permalink / raw)
  To: Alex Williamson
  Cc: KVM devel mailing list, Juan Quintela, Michael S. Tsirkin,
	Alexander Graf, qemu-devel, Alon Levy, Gerd Hoffmann,
	Anthony Liguori, qemu-ppc, David Gibson, Andreas Färber,
	Hervé Poussineau

On Thu, 2013-01-31 at 09:34 -0700, Alex Williamson wrote:
> > Luckily guests do not seem to be worried as long as we use ACPI.
> 
> Yes, in fact I just figured out last night that Windows is unhappy with
> assigned PCI devices on bus 0 that claim to be an endpoint in their PCIe
> capability rather than an integrated endpoint.  We'll need to do extra
> mangling of the PCIe capability to massage it into the guest visible
> topology.

If you are on bus 0, you need to either not have the capability, or if
you do, have it be root complex or RC intergrated endpoint. It's fair
game for any OS to assume that an endpoint will have a parent bridge
(either a RC or a downstream port) and to muck around with link control
etc...

Typically on my laptop with intel chipset, bus 0 has devices that just
don't have any PCIe capabilities.

> Section 1.3.2.3 of the 3.0 spec says integrated endpoints must not
> require I/O resources claimed through BAR(s).  VGA skirts around this by
> not having the legacy resources claimed by BARs, but instead being
> implicit.  Are there other sections restricting legacy I/O?

Right this is odd, I don't know why they put that in. Legacy endpoints
don't have that limitation and I doubt system software actually cares.

On the other hand, I suspect that doesn't apply if you simply doesn't
have the PCIe capability at all :-) IE, that's basically what my laptop
looks like here. The Intel graphics appears on bus 0 and has IO ports
mapped with a BAR and no PCIe cap.

Same with the on-chip SATA.

In fact they have a "PCI Advanced features" capability, but not PCIe.

Then they have a bunch of root complexes as siblings.

> It's common that a plugin VGA card sits behind a root port where the
> bridge registers tell us about VGA routing, but integrated VGA devices
> are often on bus 0 though, here's an example:
> 
> -[0000:00]-+-00.0  Intel Corporation 2nd Generation Core Processor Family DRAM Controller
>            +-02.0  Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller
> 
> Often these systems will disable the integrated graphics when a plugin
> graphics is installed below a root port.  I'm not sure how the system
> knows to route VGA to the integrated device vs the root port otherwise.

It's a good question... I would say the "cleanest" way is to use the VGA
Enable bit of the root complex. If the RC is set to forward downstream,
then the plug-in card gets the VGA cycles, else, they go to the
integrated one (substractive decoding -style).

However, the PCI-E spec has removed that bit from the bridge control
register definition :-)

So whatever mechanism those chipsets use has to be somewhat proprietary.

On the other hand, I don't see it hurting to make our own "proprietary"
mechanism consist of using ... the bridge control VGA enable bit. IE.
The bit is not used in the PCIe spec and probably never will be so we
can use it for its original purpose.

> Here's a more interesting example:
> 
> -+-[0000:01]-+-00.0  NVIDIA Corporation GT218 [GeForce G210M]
>  |           \-00.1  NVIDIA Corporation High Definition Audio Controller
>  \-[0000:00]-+-00.0  Intel Corporation Mobile 4 Series Chipset Memory Controller Hub
>              +-01.0  Intel Corporation Mobile 4 Series Chipset PCI Express Graphics Port
> 
> This system seems to have two host bridges with VGA behind each of them.
> There's no bridge to control VGA routing, so I don't know how the
> selection is done.  It's possible the g210m never sees legacy VGA
> accesses in this mode.  This bios has another mode which makes the g210m
> the primary graphics and hides the integrated graphics, essentially the
> same as I mention above with hiding integrated endpoint graphics when
> plugin graphics are used.  Thanks,

Wait, those are two different busses ... and there's no bridge ? Is that
the funky x86 multi domain crackpot where you have multiple roots with
non overlapping bus numbers in the same domain ?

Ben.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: KVM call minutes 2013-01-29 - Port I/O
  2013-01-31 21:21                         ` Alex Williamson
@ 2013-01-31 22:20                           ` Michael S. Tsirkin
  0 siblings, 0 replies; 57+ messages in thread
From: Michael S. Tsirkin @ 2013-01-31 22:20 UTC (permalink / raw)
  To: Alex Williamson
  Cc: KVM devel mailing list, Juan Quintela, Alexander Graf, qemu-devel,
	Alon Levy, Gerd Hoffmann, Anthony Liguori, qemu-ppc, David Gibson,
	Andreas Färber, Hervé Poussineau

On Thu, Jan 31, 2013 at 02:21:50PM -0700, Alex Williamson wrote:
> On Thu, 2013-01-31 at 23:11 +0200, Michael S. Tsirkin wrote:
> > On Thu, Jan 31, 2013 at 09:34:03AM -0700, Alex Williamson wrote:
> > > 
> > > On Thu, 2013-01-31 at 12:49 +0200, Michael S. Tsirkin wrote:
> > > > On Wed, Jan 30, 2013 at 04:28:30PM -0700, Alex Williamson wrote:
> > > > > On Thu, 2013-01-31 at 10:02 +1100, Benjamin Herrenschmidt wrote:
> > > > > > On Thu, 2013-01-31 at 00:49 +0200, Michael S. Tsirkin wrote:
> > > > > > > > In practice they do (VGA at least)
> > > > > > > > 
> > > > > > > > >From a SW modelling standpoint, I don't think it's worth
> > > > > > > differentiating
> > > > > > > > PCI and PCIE.
> > > > > > > > 
> > > > > > > > Cheers,
> > > > > > > > Ben.
> > > > > > > 
> > > > > > > Interesting.
> > > > > > > Do you have such hardware? Could you please dump
> > > > > > > the output of lspci -vv?
> > > > > > 
> > > > > > Any ATI or nVidia card still supports hard decoding of VGA regions for
> > > > > > the sake of legacy operating systems and BIOSes :-) I don't know about
> > > > > > Intel but I suppose it's the same.
> > > > > 
> > > > > For example:
> > > > > 
> > > > > -[0000:00]-+-00.0  Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (external gfx0 p
> > > > >            +-04.0-[02]--+-00.0  Advanced Micro Devices [AMD] nee ATI Cedar PRO [Radeon HD 5450/6350]
> > > > > 
> > > > > 00:04.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (PCI express gpp port D) (prog-if 00 [Normal decode])
> > > > > 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> > > > > 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> > > > > 	Latency: 0, Cache Line Size: 64 bytes
> > > > > 	Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
> > > > > 	I/O behind bridge: 0000c000-0000cfff
> > > > > 	Memory behind bridge: fd100000-fd1fffff
> > > > > 	Prefetchable memory behind bridge: 00000000d0000000-00000000dfffffff
> > > > > 	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
> > > > > 	BridgeCtl: Parity- SERR- NoISA- VGA+ MAbort- >Reset- FastB2B-
> > > > >                                       ^^^^
> > > > > VGA+ (VGA Enable) indicates positive decode of 0x3b0 - 0x3bb, 0x3c0 -
> > > > > 0x3df, and 0xa0000 - 0xbfff.  Device 2:00.0 of course doesn't report
> > > > > these "ISA" ranges as they're implicit in the VGA class code.
> > > > 
> > > > OK but this appears behind a bridge.  So the bridge configuration tells
> > > > the root complex where to send accesses to the VGA.
> > > > 
> > > > But qemu currently puts devices directly on root bus.
> > > > 
> > > > And as far as I can tell when we present devices directly on bus 0, we
> > > > pretend these are integrated in the root complex. The spec seems to
> > > > say explicitly that root complex integrated devices should not use legacy
> > > > addresses or support hotplug. So I would be surprised if such one
> > > > appears in real world.
> > > > 
> > > > Luckily guests do not seem to be worried as long as we use ACPI.
> > > 
> > > Yes, in fact I just figured out last night that Windows is unhappy with
> > > assigned PCI devices on bus 0 that claim to be an endpoint in their PCIe
> > > capability rather than an integrated endpoint.  We'll need to do extra
> > > mangling of the PCIe capability to massage it into the guest visible
> > > topology.
> > 
> > For now, just put you device behind an express bridge. This breaks acpi
> > hotplug for now, but I'm looking into hotplug with bridges anyway.
> 
> We have the problem in both directions though, Endpoints that should be
> Integrated Endpoints and Integrated Endpoints that should be Endpoints.
> So I think we need to mangle the type.
> 
> > If you really need it I can give you a hack for hotplug too.
> > 
> > Of course express  does not allow hotplug of root complex parts
> > but happens to work because we use ACPI.
> 
> That's a little odd.
> 
> > > Section 1.3.2.3 of the 3.0 spec says integrated endpoints must not
> > > require I/O resources claimed through BAR(s).  VGA skirts around this by
> > > not having the legacy resources claimed by BARs, but instead being
> > > implicit.
> > 
> > Aha. I missed this point.
> > 
> > >  Are there other sections restricting legacy I/O?
> > 
> > One other interesting things is that VGA enable bit (for bridge control
> > register) does not appear in express spec at all.
> 
> Yep, but it appears on hardware.
> 
> > > It's common that a plugin VGA card sits behind a root port where the
> > > bridge registers tell us about VGA routing,
> > > but integrated VGA devices
> > > are often on bus 0 though, here's an example:
> > > 
> > > -[0000:00]-+-00.0  Intel Corporation 2nd Generation Core Processor Family DRAM Controller
> > >            +-02.0  Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller
> > > 
> > > Often these systems will disable the integrated graphics when a plugin
> > > graphics is installed below a root port.  I'm not sure how the system
> > > knows to route VGA to the integrated device vs the root port otherwise.
> > 
> > I am guessing it disables the integrated graphics?
> > 
> > > Here's a more interesting example:
> > > 
> > > -+-[0000:01]-+-00.0  NVIDIA Corporation GT218 [GeForce G210M]
> > >  |           \-00.1  NVIDIA Corporation High Definition Audio Controller
> > >  \-[0000:00]-+-00.0  Intel Corporation Mobile 4 Series Chipset Memory Controller Hub
> > >              +-01.0  Intel Corporation Mobile 4 Series Chipset PCI Express Graphics Port
> > > 
> > > This system seems to have two host bridges with VGA behind each of them.
> > > There's no bridge to control VGA routing, so I don't know how the
> > > selection is done.
> > 
> > Is IO space disabled for the inactive card? Maybe that is how.
> 
> The card has BAR defined I/O space resources.  My guess is that VGA is
> just statically routed to the integrated device and the secondary works
> only in non-legacy mode until the BIOS switch is flipped, the integrated
> device is hidden and VGA is switched to static routing for the nvidia
> device.  I suppose that means I'll never be able to assign the nvidia to
> a guest, at least not with any kind of legacy VGA support.  Thanks,
> 
> Alex

Can you check device control for both before and after the switch.

-- 
MST

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: KVM call minutes 2013-01-29 - Port I/O
  2013-01-31 21:22                     ` Benjamin Herrenschmidt
@ 2013-01-31 22:28                       ` Michael S. Tsirkin
  0 siblings, 0 replies; 57+ messages in thread
From: Michael S. Tsirkin @ 2013-01-31 22:28 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: KVM devel mailing list, Juan Quintela, Alexander Graf, qemu-devel,
	Alex Williamson, Alon Levy, Gerd Hoffmann, Anthony Liguori,
	qemu-ppc, David Gibson, Andreas Färber,
	Hervé Poussineau

On Fri, Feb 01, 2013 at 08:22:33AM +1100, Benjamin Herrenschmidt wrote:
> On Thu, 2013-01-31 at 12:49 +0200, Michael S. Tsirkin wrote:
> 
> > OK but this appears behind a bridge.  So the bridge configuration tells
> > the root complex where to send accesses to the VGA.
> 
> Sort-of, again the root complex isn't "sending" anything targeted here.
> PCIe is point to point and any device is behind a bridge, real or
> virtual.

I think we are arguing about terminology here. root complex
has a virtual bridge for each port, presumably it examines bridge control
for each port to know which link to use for a VGA access.
I say presumably because VGA enable bit in bridge control
is not listed in spec (but as Alex says some real
hardware has it implemented).

> > But qemu currently puts devices directly on root bus.
> 
> Sure, because qemu doesn't specifically model PCIe but something "else"
> 
> > And as far as I can tell when we present devices directly on bus 0, we
> > pretend these are integrated in the root complex.
> 
> Right, it's a bit gross.
> 
> >  The spec seems to
> > say explicitly that root complex integrated devices should not use legacy
> > addresses or support hotplug. So I would be surprised if such one
> > appears in real world.
> 
> Sure but that doesn't change the fact that there's no point in treating
> things differently between PCI and PCIe for the sake of address range
> decoding. The high level model remains the same.

Yes, and it's not by chance.

> > Luckily guests do not seem to be worried as long as we use ACPI.
> 
> Right, it all just looks like PCI to the guest anyway and is mostly
> treated as such for the sake of routing and decoding (until you turn on
> ARI but that's a different can of worms).

Right, ARI only affects config cycles.

> > > BTW, I've been working on vfio-pci support of VGA assignment which makes
> > > use of the VGA arbiter in the host to manipulate the VGA Enable control
> > > register, allowing us to select which device to access.  The qemu side
> > > is simply registering memory regions for the VGA areas and expecting to
> > > be used with -vga none, but I'll adopt whatever strategy we choose for
> > > hard coded address range support.  Current base patches at the links
> > > below.  Thanks,
> > > 
> > > Alex
> > > 
> > > https://github.com/awilliam/qemu-vfio/commit/ea2befa59010a429dcf13c10dbccdf8b64e82fbd
> > > https://github.com/awilliam/linux-vfio/commit/bae182d929229cbf1eaeb01e5fad4f77f81a4c61
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: KVM call minutes 2013-01-29 - Port I/O
  2013-01-31 21:44                       ` Benjamin Herrenschmidt
@ 2013-01-31 22:37                         ` Michael S. Tsirkin
  2013-01-31 23:25                         ` Alex Williamson
  1 sibling, 0 replies; 57+ messages in thread
From: Michael S. Tsirkin @ 2013-01-31 22:37 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: KVM devel mailing list, Juan Quintela, Alexander Graf, qemu-devel,
	Alex Williamson, Alon Levy, Gerd Hoffmann, Anthony Liguori,
	qemu-ppc, David Gibson, Andreas Färber,
	Hervé Poussineau

> > Here's a more interesting example:
> > 
> > -+-[0000:01]-+-00.0  NVIDIA Corporation GT218 [GeForce G210M]
> >  |           \-00.1  NVIDIA Corporation High Definition Audio Controller
> >  \-[0000:00]-+-00.0  Intel Corporation Mobile 4 Series Chipset Memory Controller Hub
> >              +-01.0  Intel Corporation Mobile 4 Series Chipset PCI Express Graphics Port
> > 
> > This system seems to have two host bridges with VGA behind each of them.
> > There's no bridge to control VGA routing, so I don't know how the
> > selection is done.  It's possible the g210m never sees legacy VGA
> > accesses in this mode.  This bios has another mode which makes the g210m
> > the primary graphics and hides the integrated graphics, essentially the
> > same as I mention above with hiding integrated endpoint graphics when
> > plugin graphics are used.  Thanks,
> 
> Wait, those are two different busses ... and there's no bridge ? Is that
> the funky x86 multi domain crackpot where you have multiple roots with
> non overlapping bus numbers in the same domain ?
> 
> Ben.

Domain numbering on x86 comes from firmware and you know what Linus
said about firmware developers ...

-- 
MST

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: KVM call minutes 2013-01-29 - Port I/O
  2013-01-31 21:44                       ` Benjamin Herrenschmidt
  2013-01-31 22:37                         ` Michael S. Tsirkin
@ 2013-01-31 23:25                         ` Alex Williamson
  1 sibling, 0 replies; 57+ messages in thread
From: Alex Williamson @ 2013-01-31 23:25 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: KVM devel mailing list, Michael S. Tsirkin, Juan Quintela,
	qemu-devel, Alexander Graf, Hervé Poussineau, Alon Levy,
	Gerd Hoffmann, Anthony Liguori, qemu-ppc, Andreas Färber,
	David Gibson

On Fri, 2013-02-01 at 08:44 +1100, Benjamin Herrenschmidt wrote:
> On Thu, 2013-01-31 at 09:34 -0700, Alex Williamson wrote:
> > > Luckily guests do not seem to be worried as long as we use ACPI.
> > 
> > Yes, in fact I just figured out last night that Windows is unhappy with
> > assigned PCI devices on bus 0 that claim to be an endpoint in their PCIe
> > capability rather than an integrated endpoint.  We'll need to do extra
> > mangling of the PCIe capability to massage it into the guest visible
> > topology.
> 
> If you are on bus 0, you need to either not have the capability, or if
> you do, have it be root complex or RC intergrated endpoint. It's fair
> game for any OS to assume that an endpoint will have a parent bridge
> (either a RC or a downstream port) and to muck around with link control
> etc...

Yep, converting Endpoint to Integrated Endpoint is just a matter of
changing the guest visible type and hiding all the link(2) cap, control,
and status.  Integrated Endpoint to Endpoint appears to require
inventing some link capabilities since it's a required field.  Legacy
Endpoint to Integrated Endpoint seems incompatible, but I don't think we
model anything at a level that would care.

We could also take the opportunity to remove the PCIe capability when
exposing devices on 440fx, but I'm nervous that would break drivers that
are dumb and look for it anyway.

> Typically on my laptop with intel chipset, bus 0 has devices that just
> don't have any PCIe capabilities.

Oddly the audio device seems to be the only one that consistently has
it.

> > Section 1.3.2.3 of the 3.0 spec says integrated endpoints must not
> > require I/O resources claimed through BAR(s).  VGA skirts around this by
> > not having the legacy resources claimed by BARs, but instead being
> > implicit.  Are there other sections restricting legacy I/O?
> 
> Right this is odd, I don't know why they put that in. Legacy endpoints
> don't have that limitation and I doubt system software actually cares.
> 
> On the other hand, I suspect that doesn't apply if you simply doesn't
> have the PCIe capability at all :-) IE, that's basically what my laptop
> looks like here. The Intel graphics appears on bus 0 and has IO ports
> mapped with a BAR and no PCIe cap.
> 
> Same with the on-chip SATA.
> 
> In fact they have a "PCI Advanced features" capability, but not PCIe.
> 
> Then they have a bunch of root complexes as siblings.
> 
> > It's common that a plugin VGA card sits behind a root port where the
> > bridge registers tell us about VGA routing, but integrated VGA devices
> > are often on bus 0 though, here's an example:
> > 
> > -[0000:00]-+-00.0  Intel Corporation 2nd Generation Core Processor Family DRAM Controller
> >            +-02.0  Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller
> > 
> > Often these systems will disable the integrated graphics when a plugin
> > graphics is installed below a root port.  I'm not sure how the system
> > knows to route VGA to the integrated device vs the root port otherwise.
> 
> It's a good question... I would say the "cleanest" way is to use the VGA
> Enable bit of the root complex. If the RC is set to forward downstream,
> then the plug-in card gets the VGA cycles, else, they go to the
> integrated one (substractive decoding -style).
> 
> However, the PCI-E spec has removed that bit from the bridge control
> register definition :-)
> 
> So whatever mechanism those chipsets use has to be somewhat proprietary.
> 
> On the other hand, I don't see it hurting to make our own "proprietary"
> mechanism consist of using ... the bridge control VGA enable bit. IE.
> The bit is not used in the PCIe spec and probably never will be so we
> can use it for its original purpose.

Yes, our emulated root ports should include this, otherwise we have
little hope of properly supporting multiple assigned (or emulated)
graphics devices, each behind their own root port.  So we need the
ability for multiple devices to register VGA address (1 per bus?) and
change MemoryRegion routing just like hardware does.

> > Here's a more interesting example:
> > 
> > -+-[0000:01]-+-00.0  NVIDIA Corporation GT218 [GeForce G210M]
> >  |           \-00.1  NVIDIA Corporation High Definition Audio Controller
> >  \-[0000:00]-+-00.0  Intel Corporation Mobile 4 Series Chipset Memory Controller Hub
> >              +-01.0  Intel Corporation Mobile 4 Series Chipset PCI Express Graphics Port
> > 
> > This system seems to have two host bridges with VGA behind each of them.
> > There's no bridge to control VGA routing, so I don't know how the
> > selection is done.  It's possible the g210m never sees legacy VGA
> > accesses in this mode.  This bios has another mode which makes the g210m
> > the primary graphics and hides the integrated graphics, essentially the
> > same as I mention above with hiding integrated endpoint graphics when
> > plugin graphics are used.  Thanks,
> 
> Wait, those are two different busses ... and there's no bridge ? Is that
> the funky x86 multi domain crackpot where you have multiple roots with
> non overlapping bus numbers in the same domain ?

Perhaps.  This is an Intel GS45[1], section 4 talks about VGA routing
rules.  Thanks,

Alex

[1] http://www.intel.com/Assets/PDF/datasheet/320122.pdf

^ permalink raw reply	[flat|nested] 57+ messages in thread

end of thread, other threads:[~2013-01-31 23:25 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-01-29 15:41 KVM call minutes 2013-01-29 Juan Quintela
2013-01-29 16:01 ` Paolo Bonzini
2013-01-29 16:47   ` Anthony Liguori
2013-01-29 17:36     ` Paolo Bonzini
2013-01-29 20:53 ` Alexander Graf
2013-01-29 21:39   ` Anthony Liguori
2013-01-30  7:02     ` What to do about non-qdevified devices? (was: KVM call minutes 2013-01-29) Markus Armbruster
2013-01-30  8:39       ` What to do about non-qdevified devices? Andreas Färber
2013-01-30 10:36       ` What to do about non-qdevified devices? (was: KVM call minutes 2013-01-29) Peter Maydell
2013-01-30 12:35         ` What to do about non-qdevified devices? Markus Armbruster
2013-01-30 13:44           ` [Qemu-devel] " Andreas Färber
2013-01-30 16:58             ` Paolo Bonzini
2013-01-30 17:14               ` [Qemu-devel] " Andreas Färber
2013-01-31 18:48             ` Markus Armbruster
2013-01-30 14:37           ` [Qemu-devel] " Anthony Liguori
2013-01-30 11:39 ` [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O Andreas Färber
2013-01-30 11:48   ` Peter Maydell
2013-01-30 12:31     ` Michael S. Tsirkin
2013-01-30 13:24       ` [Qemu-devel] " Anthony Liguori
2013-01-30 14:11         ` Michael S. Tsirkin
2013-01-30 12:32     ` Alexander Graf
2013-01-30 13:09     ` Markus Armbruster
2013-01-30 15:08       ` [Qemu-devel] " Anthony Liguori
2013-01-30 17:55     ` Andreas Färber
2013-01-30 20:20       ` Michael S. Tsirkin
2013-01-30 20:33         ` [Qemu-devel] " Andreas Färber
2013-01-30 20:55           ` Michael S. Tsirkin
2013-01-30 13:59   ` [Qemu-devel] " Anthony Liguori
2013-01-30 21:05     ` Benjamin Herrenschmidt
2013-01-30 21:39       ` [Qemu-devel] " Anthony Liguori
2013-01-30 21:54         ` Benjamin Herrenschmidt
2013-01-30 22:20         ` Michael S. Tsirkin
2013-01-30 22:32           ` Benjamin Herrenschmidt
2013-01-30 22:49             ` Michael S. Tsirkin
2013-01-30 23:02               ` Benjamin Herrenschmidt
2013-01-30 23:28                 ` Alex Williamson
2013-01-31 10:49                   ` Michael S. Tsirkin
2013-01-31 16:34                     ` Alex Williamson
2013-01-31 21:11                       ` Michael S. Tsirkin
2013-01-31 21:21                         ` Alex Williamson
2013-01-31 22:20                           ` Michael S. Tsirkin
2013-01-31 21:44                       ` Benjamin Herrenschmidt
2013-01-31 22:37                         ` Michael S. Tsirkin
2013-01-31 23:25                         ` Alex Williamson
2013-01-31 21:22                     ` Benjamin Herrenschmidt
2013-01-31 22:28                       ` Michael S. Tsirkin
2013-01-30 15:45   ` [Qemu-devel] " Gerd Hoffmann
2013-01-30 16:33     ` Anthony Liguori
2013-01-30 16:54       ` Andreas Färber
2013-01-30 17:29         ` [Qemu-devel] " Anthony Liguori
2013-01-30 20:08           ` Michael S. Tsirkin
2013-01-30 20:19             ` Peter Maydell
2013-01-30 20:19           ` [Qemu-devel] " Andreas Färber
2013-01-30 21:07         ` Benjamin Herrenschmidt
2013-01-30 21:42           ` [Qemu-devel] " Anthony Liguori
2013-01-30 17:08       ` Paolo Bonzini
2013-01-30 21:08         ` Benjamin Herrenschmidt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox