From: "Michael S. Tsirkin" <mst@redhat.com>
To: Anthony Liguori <anthony@codemonkey.ws>
Cc: "KVM devel mailing list" <kvm@vger.kernel.org>,
"Juan Quintela" <quintela@redhat.com>,
qemu-devel <qemu-devel@nongnu.org>,
"Alexander Graf" <agraf@suse.de>, "Alon Levy" <alevy@redhat.com>,
qemu-ppc <qemu-ppc@nongnu.org>,
"Gerd Hoffmann" <kraxel@redhat.com>,
"Hervé Poussineau" <hpoussin@reactos.org>,
"Andreas Färber" <afaerber@suse.de>,
"David Gibson" <david@gibson.dropbear.id.au>
Subject: Re: KVM call minutes 2013-01-29 - Port I/O
Date: Thu, 31 Jan 2013 00:20:18 +0200
Message-ID: <20130130222017.GE6544@redhat.com>
In-Reply-To: <87a9rq5p0p.fsf@codemonkey.ws>

On Wed, Jan 30, 2013 at 03:39:34PM -0600, Anthony Liguori wrote:
> Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
>
> > On Wed, 2013-01-30 at 07:59 -0600, Anthony Liguori wrote:
> >> An x86 CPU has an MMIO capability that's essentially 65 bits. Whether
> >> the top bit is set determines whether it's a "PIO" transaction or an
> >> "MMIO" transaction. A large chunk of that address space is invalid of
> >> course.
> >>
> >> PCI has a 65 bit address space too. The 65th bit determines whether
> >> it's an IO transaction or an MMIO transaction.
> >
> > This is somewhat of an oversimplification, since IO and MMIO differ in
> > other ways, such as ordering rules :-) But for the sake of memory
> > region decoding I suppose it will do.
> >
> >> For architectures that only have a 64-bit address space, what the PCI
> >> controller typically does is pick a 16-bit window within that address
> >> space to map to a PCI address with the 65th bit set.
> >
> > Sort of, yes. The window doesn't have to be 16-bit (we commonly have
> > larger IO space windows on powerpc) and there's a window per host
> > bridge, so there's effectively more than one IO space (as there is more
> > than one PCI MMIO space, with only a window off the CPU space routed to
> > each bridge).
>
> Ack.
>
> > Making a hard wired assumption that the PCI (MMIO and IO) space relates
> > directly to the CPU bus space is wrong on pretty much all !x86
> > architectures.
>
> Ack.
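
A rough illustration of the per-host-bridge windowing described above. This is not from the thread and is not QEMU code: the `IoWindow` class, the `route` helper, and the window bases and sizes are all made up for the example. It only shows that the same PCI IO address can exist behind more than one bridge, each reached through a different CPU address window.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class IoWindow:
    """Hypothetical host-bridge window: a slice of CPU address space
    that is forwarded to that bridge's PCI IO space."""
    cpu_base: int         # start of the window in CPU address space
    size: int             # window size in bytes; need not be 64 KiB
    pci_io_base: int = 0  # PCI IO address that cpu_base maps to

    def translate(self, cpu_addr: int) -> Optional[int]:
        """Return the PCI IO address for cpu_addr, or None if outside the window."""
        offset = cpu_addr - self.cpu_base
        if 0 <= offset < self.size:
            return self.pci_io_base + offset
        return None


def route(windows: List[IoWindow], cpu_addr: int) -> Optional[Tuple[int, int]]:
    """Return (bridge index, PCI IO address) for the first window that
    covers cpu_addr, or None if no bridge claims it."""
    for index, window in enumerate(windows):
        pci_addr = window.translate(cpu_addr)
        if pci_addr is not None:
            return index, pci_addr
    return None


# Two host bridges, each with its own IO window (made-up addresses).
windows = [
    IoWindow(cpu_base=0xF800_0000, size=0x1_0000),   # 64 KiB window
    IoWindow(cpu_base=0xF900_0000, size=0x80_0000),  # larger 8 MiB window
]
```

With these made-up windows, `route(windows, 0xF800_03C0)` and `route(windows, 0xF900_03C0)` both resolve to PCI IO address 0x3C0, but behind different bridges, which is why a single global port IO space does not fit such machines.
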
>
> >
> > .../...
> >
> > You make it sound like subtractive decode is a chipset hack. It's not,
> > it's specified in the PCI spec.
>
> It's a hack :-) It's a well specified hack, but it's still a hack.
>
> >> 1) A chipset will route any non-positively decoded IO transaction (65th
> >> bit set) to a single end point (usually the ISA-bridge). Which one it
> >> chooses is up to the chipset. This is called subtractive decoding
> >> because the PCI bus will wait multiple cycles for that device to
> >> claim the transaction before bouncing it.
> >
> > This is not a chipset matter. It's the ISA bridge itself that does
> > subtractive decoding.
>
> The PCI bus can have one end point that can be the target for
> subtractive decoding (not hard decoding, subtractive decoding). IOW,
> you can only have a single ISA Bridge within a single PCI domain.
>
> You are right--chipset is the wrong word. I'm used to thinking in terms
> of only a single domain :-)
>
> > There also exist P2P bridges doing such subtractive
> > decoding; this used to be fairly common with transparent bridges used for
> > laptop docking.
>
> I'm not sure I understand how this would work. How can two devices on
> the same PCI domain both do subtractive decoding? Indeed, the PCI spec
> even says:
>
> "Subtractive decoding can be implemented by only one device on the bus
> since it accepts all accesses not positively decoded by some other
> agent."
>
> >> 2) There are special hacks in most PCI chipsets to route very specific
> >> address ranges to certain devices. Namely, legacy VGA IO transactions
> >> go to the first VGA device. Legacy IDE IO transactions go to the first
> >> IDE device. This doesn't need to be programmed in the BARs. It will
> >> just happen.
> >
> > This is also mostly not a hack in the chipset. It's a well defined behaviour
> > for legacy devices, sometimes called hard decoding. Of course often those devices
> > are built into the chipset but they don't have to be. Plug-in VGA devices will
> > hard decode legacy VGA regions for both IO and MMIO by default (this can be
> > disabled on most of them nowadays) for example. This has nothing to do with
> > the chipset.
>
> So I understand what you're saying re: PCI because the devices actually
> assert DEVSEL to indicate that they handle the transaction.
>
> But for PCI-E, doesn't the controller have to expressly identify what
> the target is? Is this done with the device class?

Well, you can have a PCI bridge and a legacy device behind that.
I think real PCI Express devices cannot be mapped onto legacy address
ranges.

> > There's a specific bit in P2P bridges to control the forwarding of legacy
> > transactions downstream (and VGA palette snoops); this is also fully specified
> > in the PCI spec.
>
> Ack.
>
> >
> >> 3) As it turns out, all legacy PIIX3 devices are positively decoded and
> >> sent to the ISA-bridge (because it's faster this way).
> >
> > Chipsets don't "send to a bridge". It's the bridge itself that
> > decodes.
>
> With PCI...
>
> >> Notice the lack of the word "ISA" in all of this other than describing
> >> the PCI class of an end point.
> >
> > ISA is only relevant to the extent that the "legacy" regions of IO space
> > originate from the original ISA addresses of devices (VGA, IDE, etc...)
> > and to the extent that an ISA bus might still be present which will get
> > the transactions that nothing else has decoded in that space.
>
> Ack.
>
> >
> >> So how should this be modeled?
> >>
> >> On x86, the CPU has a pio address space. That can propagate down
> >> through the PCI bus which is what we do today.
> >>
> >> On !x86, the PCI controller ought to set up a MemoryRegion for downstream
> >> PIO that devices can use to register on.
> >>
> >> We probably need to do something like change the PCI VGA devices to
> >> export a MemoryRegion and allow the PCI controller to decide how to
> >> register that as a subregion.
> >
> > The VGA device should just register fixed address port IOs the same way
> > it would register an IO BAR. Essentially, hard coded IO addresses (or
> > memory, VGA does memory too, don't forget that) are equivalent to having
> > an invisible BAR with a fixed value in it.
>
> Ack.
>
> >
> > There should be no "global port IO" because that concept is broken on
> > real multi-domain setups. Those "legacy" address ranges are just
> > hard-wired sub regions of the normal PCI space in which the device
> > sits (unless you start doing real non-PCI ISA x86).
>
> So, I think what you're suggesting (and I agree with), is that each PCI
> device should export one or more MemoryRegions and indicate what the
> MemoryRegions are for.
>
> Potential options are:
>
> - MMIO BAR
> - PIO BAR
> - IDE hard decode
> - VGA hard decode
> - subtractive decode
>
> I'm very much in agreement if that's what you're suggesting.
>
> Regards,
>
> Anthony Liguori
>
> >
> > Cheers,
> > Ben.