From: Anthony Liguori <anthony@codemonkey.ws>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Andreas Färber" <afaerber@suse.de>,
	"Juan Quintela" <quintela@redhat.com>,
	"KVM devel mailing list" <kvm@vger.kernel.org>,
	qemu-devel <qemu-devel@nongnu.org>,
	"Alexander Graf" <agraf@suse.de>, qemu-ppc <qemu-ppc@nongnu.org>,
	"Hervé Poussineau" <hpoussin@reactos.org>,
	"David Gibson" <david@gibson.dropbear.id.au>,
	"Gerd Hoffmann" <kraxel@redhat.com>,
	"Alon Levy" <alevy@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O
Date: Wed, 30 Jan 2013 15:39:34 -0600
Message-ID: <87a9rq5p0p.fsf@codemonkey.ws>
In-Reply-To: <1359579910.23274.31.camel@pasglop>

Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> On Wed, 2013-01-30 at 07:59 -0600, Anthony Liguori wrote:
>> An x86 CPU has an MMIO capability that's essentially 65 bits.  Whether
>> the top bit is set determines whether it's a "PIO" transaction or an
>> "MMIO" transaction.  A large chunk of that address space is invalid of
>> course.
>> 
>> PCI has a 65 bit address space too.  The 65th bit determines whether
>> it's an IO transaction or an MMIO transaction.
>
> This is somewhat of an oversimplification, since IO and MMIO differ in
> other ways, such as ordering rules :-) But for the sake of memory
> region decoding I suppose it will do.
>
>> For architectures that only have a 64-bit address space, what the PCI
>> controller typically does is pick a 16-bit window within that address
>> space to map to a PCI address with the 65th bit set.
>
> Sort-of yes. The window doesn't have to be 16-bit (we commonly have
> larger IO space windows on powerpc) and there's a window per host
> bridge, so there's effectively more than one IO space (as there is more
> than one PCI MMIO space, with only a window off the CPU space routed to
> each bridge).

Ack.
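
To make the "window per host bridge" point concrete, here is a rough
sketch (not QEMU code) of how a CPU address could be routed to a
per-bridge PCI IO address.  The names IoWindow and route, the bridge
names, and the window addresses are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class IoWindow:
    """Hypothetical window of CPU address space routed to one bridge's PCI IO space."""
    cpu_base: int      # start of the window in CPU address space
    size: int          # window size; it does not have to be 64 KiB
    pci_base: int = 0  # PCI IO address that the start of the window maps to

    def translate(self, cpu_addr: int) -> Optional[int]:
        """Return the PCI IO address for cpu_addr, or None if it is outside the window."""
        offset = cpu_addr - self.cpu_base
        if 0 <= offset < self.size:
            return self.pci_base + offset
        return None


def route(windows: Dict[str, IoWindow], cpu_addr: int) -> Optional[Tuple[str, int]]:
    """Find the host bridge whose IO window claims cpu_addr."""
    for bridge, window in windows.items():
        pci_addr = window.translate(cpu_addr)
        if pci_addr is not None:
            return bridge, pci_addr
    return None


# Two host bridges, each with its own IO space (addresses are invented).
windows = {
    "phb0": IoWindow(cpu_base=0xF000_0000, size=0x10_0000),
    "phb1": IoWindow(cpu_base=0xF800_0000, size=0x10_0000),
}
```

With this layout the same PCI IO address (say 0x3C0) exists once per
bridge, reached through a different CPU address each time, which is
why a single global port IO space does not describe such a machine.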

> Making a hard wired assumption that the PCI (MMIO and IO) space relates
> directly to the CPU bus space is wrong on pretty much all !x86
> architectures.

Ack.

>
>  .../...
>
> You make it sound like subtractive decode is a chipset hack. It's not,
> it's specified in the PCI spec.

It's a hack :-)  It's a well specified hack, but it's still a hack.

>> 1) A chipset will route any non-positively decoded IO transaction (65th
>>    bit set) to a single end point (usually the ISA-bridge).  Which one it
>>    chooses is up to the chipset.  This is called subtractive decoding
>>    because the PCI bus will wait multiple cycles for that device to
>>    claim the transaction before bouncing it.
>
> This is not a chipset matter. It's the ISA bridge itself that does
> subtractive decoding.

The PCI bus can have only one end point that can be the target for
subtractive decoding (not hard decoding, subtractive decoding).  IOW,
you can only have a single ISA Bridge within a single PCI domain.

You are right--chipset is the wrong word.  I'm used to thinking in terms
of only a single domain :-)

> There also exist P2P bridges doing such subtractive
> decoding; this used to be fairly common with transparent bridges used for
> laptop docking.

I'm not sure I understand how this would work.  How can two devices on
the same PCI domain both do subtractive decoding?  Indeed, the PCI spec
even says:

"Subtractive decoding can be implemented by only one device on the bus
 since it accepts all accesses not positively decoded by some other
 agent."
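
As a rough illustration of that rule on a single bus (not QEMU code;
PciBus and its methods are names invented here), positive decoders
claim their own ranges and at most one agent picks up whatever is
left:

```python
from typing import List, Optional, Tuple


class PciBus:
    """Toy model of IO decoding on one PCI bus."""

    def __init__(self) -> None:
        # (start, end, device) with end exclusive: positively decoded ranges
        self.positive: List[Tuple[int, int, str]] = []
        # the single agent that accepts everything nobody else claimed
        self.subtractive: Optional[str] = None

    def add_positive(self, start: int, size: int, device: str) -> None:
        self.positive.append((start, start + size, device))

    def set_subtractive(self, device: str) -> None:
        # Only one device on the bus may decode subtractively.
        if self.subtractive is not None:
            raise ValueError("bus already has a subtractive decoder")
        self.subtractive = device

    def decode_io(self, addr: int) -> Optional[str]:
        """Return the device that claims addr, or None (master abort)."""
        for start, end, device in self.positive:
            if start <= addr < end:
                return device
        return self.subtractive


bus = PciBus()
bus.add_positive(0x1F0, 8, "ide")   # legacy primary IDE command block
bus.set_subtractive("isa-bridge")
```

This models one bus only; it says nothing about how a subtractive P2P
bridge would combine with an ISA bridge behind it, which is the part
I'm asking about.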

>> 2) There are special hacks in most PCI chipsets to route very specific
>>    address ranges to certain devices.  Namely, legacy VGA IO transactions
>>    go to the first VGA device.  Legacy IDE IO transactions go to the first
>>    IDE device.  This doesn't need to be programmed in the BARs.  It will
>>    just happen.
>
> This is also mostly not a hack in the chipset. It's a well defined behaviour
> for legacy devices, sometimes called hard decoding. Of course often those devices
> are built into the chipset but they don't have to be. Plug-in VGA devices will
> hard decode legacy VGA regions for both IO and MMIO by default (this can be
> disabled on most of them nowadays) for example. This has nothing to do with
> the chipset.

So I understand what you're saying re: PCI because the devices actually
assert DEVSEL to indicate that they handle the transaction.

But for PCI-E, doesn't the controller have to expressly identify what
the target is?  Is this done with the device class?

> There's a specific bit in P2P bridges to control the forwarding of legacy
> transactions downstream (and VGA palette snoops); this is also fully specified
> in the PCI spec.

Ack.

>
>> 3) As it turns out, all legacy PIIX3 devices are positively decoded and
>>    sent to the ISA-bridge (because it's faster this way).
>
> Chipsets don't "send to a bridge". It's the bridge itself that
> decodes.

With PCI...

>> Notice the lack of the word "ISA" in all of this other than describing
>> the PCI class of an end point.
>
> ISA is only relevant to the extent that the "legacy" regions of IO space
> originate from the original ISA addresses of devices (VGA, IDE, etc...)
> and to the extent that an ISA bus might still be present which will get
> the transactions that nothing else has decoded in that space.

Ack.

>  
>> So how should this be modeled?
>> 
>> On x86, the CPU has a pio address space.  That can propagate down
>> through the PCI bus which is what we do today.
>> 
>> On !x86, the PCI controller ought to set up a MemoryRegion for downstream
>> PIO that devices can use to register on.
>> 
>> We probably need to do something like change the PCI VGA devices to
>> export a MemoryRegion and allow the PCI controller to decide how to
>> register that as a subregion.
>
> The VGA device should just register fixed address port IOs the same way
> it would register an IO BAR. Essentially, hard coded IO addresses (or
> memory, VGA does memory too, don't forget that) are equivalent to having
> an invisible BAR with a fixed value in it.

Ack.

>
> There should be no "global port IO" because that concept is broken on
> real multi-domain setups. Those "legacy" address ranges are just
> hard-wired sub regions of the normal PCI space on which the device sits
> (unless you start doing real non-PCI ISA x86).

So I think what you're suggesting (and I agree with it) is that each PCI
device should export one or more MemoryRegions and indicate what the
MemoryRegions are for.

Potential options are:

 - MMIO BAR
 - PIO BAR
 - IDE hard decode
 - VGA hard decode
 - subtractive decode

I'm very much in agreement if that's what you're suggesting.
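
Roughly, and only as a sketch of the idea rather than a proposed API
(RegionKind, ExportedRegion and place are hypothetical names, and this
is Python rather than QEMU's C), a device would tag each region it
exports and the controller of its domain would decide where it lands:

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional, Tuple


class RegionKind(Enum):
    MMIO_BAR = auto()
    PIO_BAR = auto()
    IDE_HARD_DECODE = auto()
    VGA_HARD_DECODE = auto()
    SUBTRACTIVE_DECODE = auto()


@dataclass(frozen=True)
class ExportedRegion:
    """A region a device exports, tagged with what it is for."""
    name: str
    kind: RegionKind
    space: str                        # "io" or "mem" within the device's PCI domain
    size: int
    fixed_addr: Optional[int] = None  # only meaningful for hard-decoded regions


def place(region: ExportedRegion,
          bar_value: Optional[int] = None) -> Tuple[str, Optional[int]]:
    """Return (space, address) at which the controller maps the region.

    An address of None means "no fixed address": the region only sees
    accesses that nothing else in the domain has claimed.
    """
    if region.kind in (RegionKind.MMIO_BAR, RegionKind.PIO_BAR):
        return region.space, bar_value
    if region.kind in (RegionKind.IDE_HARD_DECODE, RegionKind.VGA_HARD_DECODE):
        # Behaves like an invisible BAR with a fixed value in it.
        return region.space, region.fixed_addr
    return region.space, None  # subtractive decode


# Illustrative regions: VGA hard-decodes both IO and memory ranges.
vga_io = ExportedRegion("vga-io", RegionKind.VGA_HARD_DECODE, "io", 0x20, 0x3C0)
vga_mem = ExportedRegion("vga-mem", RegionKind.VGA_HARD_DECODE, "mem", 0x20000, 0xA0000)
vga_bar = ExportedRegion("vga-fb", RegionKind.MMIO_BAR, "mem", 0x100_0000)
isa = ExportedRegion("isa", RegionKind.SUBTRACTIVE_DECODE, "io", 0x1_0000)
```

The addresses returned here are PCI addresses inside one domain; how
they show up in the CPU address space is up to the controller.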

Regards,

Anthony Liguori

>
> Cheers,
> Ben.
