From: Anthony Liguori <anthony@codemonkey.ws>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Andreas Färber" <afaerber@suse.de>,
	"Juan Quintela" <quintela@redhat.com>,
	"KVM devel mailing list" <kvm@vger.kernel.org>,
	qemu-devel <qemu-devel@nongnu.org>,
	"Alexander Graf" <agraf@suse.de>, qemu-ppc <qemu-ppc@nongnu.org>,
	"Hervé Poussineau" <hpoussin@reactos.org>,
	"David Gibson" <david@gibson.dropbear.id.au>,
	"Gerd Hoffmann" <kraxel@redhat.com>,
	"Alon Levy" <alevy@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O
Date: Wed, 30 Jan 2013 15:39:34 -0600
Message-ID: <87a9rq5p0p.fsf@codemonkey.ws>
In-Reply-To: <1359579910.23274.31.camel@pasglop>

Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> On Wed, 2013-01-30 at 07:59 -0600, Anthony Liguori wrote:
>> An x86 CPU has a MMIO capability that's essentially 65 bits.  Whether
>> the top bit is set determines whether it's a "PIO" transaction or an
>> "MMIO" transaction.  A large chunk of that address space is invalid of
>> course.
>> 
>> PCI has a 65 bit address space too.  The 65th bit determines whether
>> it's an IO transaction or an MMIO transaction.
>
> This is somewhat an oversimplification since IO and MMIO differ in
> other ways, such as ordering rules :-) But for the sake of memory
> regions decoding I suppose it will do.
>
>> For architectures that only have a 64-bit address space, what the PCI
>> controller typically does is pick a 16-bit window within that address
>> space to map to a PCI address with the 65th bit set.
>
> Sort-of yes. The window doesn't have to be 16-bit (we commonly have
> larger IO space windows on powerpc) and there's a window per host
> bridge, so there's effectively more than one IO space (as there is more
> than one PCI MMIO space, with only a window off the CPU space routed to
> each bridge).

Ack.
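
To make the per-bridge window concrete, here is a minimal sketch in plain C.
It is not QEMU code: HostBridgeIoWindow and cpu_to_pci_io are hypothetical
names, and the only assumption is the one stated above, that each host
bridge forwards its own window of CPU address space as PCI IO space.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one host bridge exposes a window of CPU address
 * space that it forwards as PCI IO space on its own bus. */
typedef struct HostBridgeIoWindow {
    uint64_t cpu_base;    /* where the window sits in CPU address space */
    uint64_t size;        /* window size; need not be 64 KiB */
    uint64_t pci_io_base; /* PCI IO address that cpu_base maps to */
} HostBridgeIoWindow;

/* Translate a CPU address into a PCI IO address behind this bridge.
 * Returns false if the address falls outside the bridge's window. */
static bool cpu_to_pci_io(const HostBridgeIoWindow *w, uint64_t cpu_addr,
                          uint64_t *pci_io_addr)
{
    if (cpu_addr < w->cpu_base || cpu_addr - w->cpu_base >= w->size) {
        return false;
    }
    *pci_io_addr = w->pci_io_base + (cpu_addr - w->cpu_base);
    return true;
}
```

With two such windows, two different CPU addresses can both land on PCI IO
address 0x3c0, each on a different bus, which is why a single global port
IO space does not describe these machines.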

> Making a hard wired assumption that the PCI (MMIO and IO) space relates
> directly to the CPU bus space is wrong on pretty much all !x86
> architectures.

Ack.

>
>  .../...
>
> You make it sound like subtractive decode is a chipset hack. It's not,
> it's specified in the PCI spec.

It's a hack :-)  It's a well specified hack, but it's still a hack.

>> 1) A chipset will route any non-positively decoded IO transaction (65th
>>    bit set) to a single end point (usually the ISA-bridge).  Which one it
>>    chooses is up to the chipset.  This is called subtractive decoding
>>    because the PCI bus will wait multiple cycles for that device to
>>    claim the transaction before bouncing it.
>
> This is not a chipset matter. It's the ISA bridge itself that does
> subtractive decoding.

The PCI bus can have one end point that can be the target for
subtractive decoding (not hard decoding, subtractive decoding).  IOW,
you can only have a single ISA Bridge within a single PCI domain.

You are right--chipset is the wrong word.  I'm used to thinking in terms
of only a single domain :-)

> There also exist P2P bridges doing such subtractive
> decoding, this used to be fairly common with transparent bridges used for
> laptop docking.

I'm not sure I understand how this would work.  How can two devices on
the same PCI domain both do subtractive decoding?  Indeed, the PCI spec
even says:

"Subtractive decoding can be implemented by only one device on the bus
 since it accepts all accesses not positively decoded by some other
 agent."
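
As I read that passage, the decode order looks roughly like the sketch
below.  This is plain C with hypothetical names (BusDevice, Bus,
bus_decode), not anything from QEMU or the spec, and it assumes at most one
subtractive agent per bus, which is the very point in question here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device model: one positively decoded range per device. */
typedef struct BusDevice {
    const char *name;
    uint64_t base; /* start of the positively decoded range */
    uint64_t size; /* 0 means the device claims nothing positively */
} BusDevice;

typedef struct Bus {
    const BusDevice *devices;
    size_t num_devices;
    const BusDevice *subtractive; /* at most one per bus, may be NULL */
} Bus;

/* Return the device that claims addr: positive decoders first, then the
 * single subtractive agent.  NULL means nobody claimed the transaction. */
static const BusDevice *bus_decode(const Bus *bus, uint64_t addr)
{
    for (size_t i = 0; i < bus->num_devices; i++) {
        const BusDevice *d = &bus->devices[i];
        if (d->size != 0 && addr >= d->base && addr - d->base < d->size) {
            return d;
        }
    }
    return bus->subtractive;
}
```

A second subtractive agent on the same bus has no slot in this model, which
is the source of my confusion about the docking bridge case.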

>> 2) There are special hacks in most PCI chipsets to route very specific
>>    address ranges to certain devices.  Namely, legacy VGA IO transactions
>>    go to the first VGA device.  Legacy IDE IO transactions go to the first
>>    IDE device.  This doesn't need to be programmed in the BARs.  It will
>>    just happen.
>
> This is also mostly not a hack in the chipset. It's a well defined behaviour
> for legacy devices, sometimes called hard decoding. Of course often those devices
> are built into the chipset but they don't have to be. Plug-in VGA devices will
> hard decode legacy VGA regions for both IO and MMIO by default (this can be
> disabled on most of them nowadays) for example. This has nothing to do with
> the chipset.

So I understand what you're saying re: PCI because the devices actually
assert DEVSEL to indicate that they handle the transaction.

But for PCI-E, doesn't the controller have to expressly identify what
the target is?  Is this done with the device class?

> There's a specific bit in P2P bridge to control the forwarding of legacy
> transaction downstream (and VGA palette snoops), this is also fully specified
> in the PCI spec.

Ack.

>
>> 3) As it turns out, all legacy PIIX3 devices are positively decoded and
>>    sent to the ISA-bridge (because it's faster this way).
>
> Chipsets don't "send to a bridge". It's the bridge itself that
> decodes.

With PCI...

>> Notice the lack of the word "ISA" in all of this other than describing
>> the PCI class of an end point.
>
> ISA is only relevant to the extent that the "legacy" regions of IO space
> originate from the original ISA addresses of devices (VGA, IDE, etc...)
> and to the extent that an ISA bus might still be present which will get
> the transactions that nothing else has decoded in that space.

Ack.

>  
>> So how should this be modeled?
>> 
>> On x86, the CPU has a pio address space.  That can propagate down
>> through the PCI bus which is what we do today.
>> 
>> On !x86, the PCI controller ought to set up a MemoryRegion for
>> downstream PIO that devices can use to register on.
>> 
>> We probably need to do something like change the PCI VGA devices to
>> export a MemoryRegion and allow the PCI controller to decide how to
>> register that as a subregion.
>
> The VGA device should just register fixed address port IOs the same way
> it would register an IO BAR. Essentially, hard coded IO addresses (or
> memory, VGA does memory too, don't forget that) are equivalent to having
> an invisible BAR with a fixed value in it.

Ack.

>
> There should be no "global port IO" because that concept is broken on
> real multi-domain setups. Those "legacy" address ranges are just
> hard-wired sub regions of the normal PCI space on which the device sits
> (unless you start doing real non-PCI ISA x86).

So I think what you're suggesting (and I agree with it) is that each PCI
device should export one or more MemoryRegions and indicate what each
MemoryRegion is for.

Potential options are:

 - MMIO BAR
 - PIO BAR
 - IDE hard decode
 - VGA hard decode
 - subtractive decode

I'm very much in agreement if that's what you're suggesting.
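
A rough sketch of what I have in mind, as standalone C rather than a patch.
Every name here (PciRegionKind, PciExportedRegion, PciDeviceModel,
pci_export_region) is made up for illustration and is not an existing QEMU
interface.  The device only tags what each region is for; the PCI controller
would decide where and how to map it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tags matching the list of options above. */
typedef enum PciRegionKind {
    PCI_REGION_MMIO_BAR,
    PCI_REGION_PIO_BAR,
    PCI_REGION_IDE_HARD_DECODE,
    PCI_REGION_VGA_HARD_DECODE,
    PCI_REGION_SUBTRACTIVE_DECODE,
} PciRegionKind;

typedef struct PciExportedRegion {
    PciRegionKind kind;
    uint64_t fixed_addr; /* meaningful only for hard-decode kinds */
    uint64_t size;
} PciExportedRegion;

#define PCI_MAX_EXPORTED_REGIONS 8

typedef struct PciDeviceModel {
    PciExportedRegion regions[PCI_MAX_EXPORTED_REGIONS];
    size_t num_regions;
} PciDeviceModel;

/* The device declares a region and its purpose; it does not map it. */
static void pci_export_region(PciDeviceModel *dev, PciRegionKind kind,
                              uint64_t fixed_addr, uint64_t size)
{
    assert(dev->num_regions < PCI_MAX_EXPORTED_REGIONS);
    dev->regions[dev->num_regions++] =
        (PciExportedRegion){ kind, fixed_addr, size };
}

/* Hard-decoded regions behave like an invisible BAR with a fixed value;
 * real BARs get their address programmed later. */
static bool pci_region_has_fixed_address(PciRegionKind kind)
{
    return kind == PCI_REGION_IDE_HARD_DECODE ||
           kind == PCI_REGION_VGA_HARD_DECODE;
}
```

A VGA device would then export, say, one MMIO BAR plus a VGA hard decode
region at 0x3c0, and the controller would place both inside that bus's own
PCI space rather than in a global port IO space.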

Regards,

Anthony Liguori

>
> Cheers,
> Ben.
