From: Anthony Liguori <anthony@codemonkey.ws>
To: Olivier Galibert <galibert@pobox.com>
Cc: Pekka Enberg <penberg@kernel.org>, Ingo Molnar <mingo@elte.hu>,
Avi Kivity <avi@redhat.com>,
linux-kernel@vger.kernel.org, aarcange@redhat.com,
mtosatti@redhat.com, kvm@vger.kernel.org, joro@8bytes.org,
penberg@cs.helsinki.fi, asias.hejun@gmail.com,
gorcunov@gmail.com
Subject: Re: [ANNOUNCE] Native Linux KVM tool
Date: Sat, 09 Apr 2011 21:54:38 -0500
Message-ID: <4DA11BEE.1080500@codemonkey.ws>
In-Reply-To: <20110409182347.GB27431@dspnet.fr>
On 04/09/2011 01:23 PM, Olivier Galibert wrote:
> On Fri, Apr 08, 2011 at 09:00:43AM -0500, Anthony Liguori wrote:
>> Really, having a flat table doesn't make sense. You should just send
>> everything to an i440fx directly. Then the i440fx should decode what it
>> can, and send it to the next level, and so forth.
> No you shouldn't. The i440fx should merge and arbitrate the mappings
> and then push *direct* links to the handling functions at the top
> level. Mapping changes don't happen often on modern hardware, and
> decoding is expensive.
Decoding is not all that expensive. For non-PCI devices, the addresses
are almost always fixed, so decoding becomes a chain of conditionals and
function calls no more than 3 or 4 deep.
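To make the "series of conditionals" concrete, here is a rough sketch of
two-level decoding for port I/O. The enum and function names are
hypothetical and this is not how QEMU or the kvm tool is actually
structured; the port numbers are the usual PC ones (0x20/0x21 master PIC,
0x70/0x71 RTC/CMOS, 0xcf8-0xcff PCI configuration access), and the
decoding is deliberately simplified.

```c
#include <stdint.h>

/* Hypothetical dispatch targets, for illustration only. */
enum target {
    TARGET_NONE,
    TARGET_PIC,
    TARGET_RTC,
    TARGET_PCI_CONFIG,
    TARGET_PCI_BUS
};

/* Second level: the ISA side decodes a few fixed legacy ports. */
enum target isa_decode(uint16_t port)
{
    if (port == 0x20 || port == 0x21)
        return TARGET_PIC;          /* master PIC */
    if (port == 0x70 || port == 0x71)
        return TARGET_RTC;          /* RTC/CMOS index and data */
    return TARGET_NONE;             /* not claimed at this level */
}

/* Top level: the host bridge claims what it owns, then forwards. */
enum target host_bridge_decode(uint16_t port)
{
    if (port >= 0xcf8 && port <= 0xcff)
        return TARGET_PCI_CONFIG;   /* PCI config address/data window */

    enum target t = isa_decode(port);
    if (t != TARGET_NONE)
        return t;

    return TARGET_PCI_BUS;          /* everything else goes downstream */
}
```

Each access costs a handful of compares and one or two calls, which is
the depth of 3 or 4 mentioned above.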
For PCI devices, any downstream devices are going to fall into specific
regions that the bridge registers. Even in the pathological case of a
bus populated with 32 multi-function devices each having 6 BARs, it's
still a non-overlapping list of ranges. Nothing prevents you from
keeping a sorted copy of the list so that you can binary search to the
proper dispatch device. Binary searching a list of 1500 entries takes
at most 11 range comparisons, which is quite fast.
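As a minimal sketch of that lookup, assuming the bridge keeps its ranges
in an array sorted by start address with no overlaps: the struct layout,
the dev_id field and the function name are all made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range record; dev_id stands in for a real device handle. */
struct io_range {
    uint64_t start;   /* first address covered by the range */
    uint64_t len;     /* length in bytes, must be > 0 */
    int      dev_id;  /* which device handles this range */
};

/*
 * Binary search over ranges sorted by start address.
 * Precondition: the ranges do not overlap.
 * Returns the range containing addr, or NULL if nothing claims it.
 */
const struct io_range *range_lookup(const struct io_range *ranges,
                                    size_t n, uint64_t addr)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const struct io_range *r = &ranges[mid];

        if (addr < r->start)
            hi = mid;                       /* look in the lower half */
        else if (addr - r->start >= r->len)
            lo = mid + 1;                   /* look in the upper half */
        else
            return r;                       /* addr falls inside r */
    }
    return NULL;
}
```

The table only needs re-sorting when a BAR is remapped, which is rare
compared to the number of accesses.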
In practice, you have no more than 10-20 PCI devices, each with 2-3
BARs. A simple linear search is not going to have a noticeable
overhead.
> Incidentally, you can have special handling
> functions which are in reality references to kernel handlers,
> shortcutting userspace entirely for critical ports/mmio ranges.
The cost here is the trip from the guest to userspace and back. If you
want to shortcut in the kernel, you have to do that *before* returning
to userspace. In that case, how userspace models I/O flow doesn't matter.
The reason flow matters is that PCI controllers alter I/O. Most PCI
devices use little endian for device registers, and some
big-endian-oriented buses will automatically do endian conversion.
Even without those types of controllers, if you use a native endian API,
an MMIO dispatch API is going to do endian conversion to the target
architecture. However, if you're expecting to return the data in little
endian (as PCI registers usually are), you need to flip the endianness.
In QEMU, we handle this by registering BARs with a function pointer
trampoline that does the conversion. But that only happens with the
special API. If you hook the mapping API, you'll probably get this wrong.
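Roughly, the trampoline idea looks like the sketch below. This is not
QEMU's actual code: the struct, the callback type and every function
name here are hypothetical, and it assumes a dispatch layer that hands
values around in target-native order while the device model produces
little-endian register values.

```c
#include <stdint.h>

/* Hypothetical 32-bit MMIO read callback type. */
typedef uint32_t (*mmio_read32_fn)(void *opaque, uint64_t offset);

/* Hypothetical per-BAR state kept by the wrapper. */
struct le_bar {
    mmio_read32_fn dev_read;   /* the device's own read callback */
    void *dev_opaque;          /* state passed through to the device */
    int target_big_endian;     /* nonzero if the target is big endian */
};

/* Reverse the byte order of a 32-bit value. */
uint32_t bswap32_sketch(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) |
           ((v & 0x0000ff00u) << 8)  |
           ((v >> 8) & 0x0000ff00u)  |
           (v >> 24);
}

/*
 * Trampoline registered in place of the device callback: it calls the
 * device, then swaps bytes only when the target is big endian.
 */
uint32_t le_bar_read32(void *opaque, uint64_t offset)
{
    struct le_bar *bar = opaque;
    uint32_t v = bar->dev_read(bar->dev_opaque, offset);

    return bar->target_big_endian ? bswap32_sketch(v) : v;
}

/* Stand-in device for illustration: every register reads 0x11223344. */
uint32_t demo_dev_read(void *opaque, uint64_t offset)
{
    (void)opaque;
    (void)offset;
    return 0x11223344u;
}
```

The device model never sees the swap. A handler installed directly
through a lower-level mapping hook would skip le_bar_read32 and return
bytes in the wrong order on a big endian target.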
Regards,
Anthony Liguori
> OG.