From: Christoffer Dall <christoffer.dall@linaro.org>
To: Alexander Graf <agraf@suse.de>
Cc: Scott Wood <scottwood@freescale.com>,
	Peter Maydell <peter.maydell@linaro.org>,
	kvm-devel <kvm@vger.kernel.org>,
	kvm-ppc <kvm-ppc@vger.kernel.org>,
	"kvmarm@lists.cs.columbia.edu" <kvmarm@lists.cs.columbia.edu>,
	Patch Tracking <patches@linaro.org>,
	Marc Zyngier <marc.zyngier@arm.com>
Subject: Re: [PATCH v2] KVM: Specify byte order for KVM_EXIT_MMIO
Date: Fri, 24 Jan 2014 18:34:46 -0800
Message-ID: <20140125023446.GC3750@cbox>
In-Reply-To: <86A0FDA4-97AB-43EA-8306-17A1D0A94D14@suse.de>

On Sat, Jan 25, 2014 at 03:15:35AM +0100, Alexander Graf wrote:
> 
> On 25.01.2014, at 02:58, Scott Wood <scottwood@freescale.com> wrote:
> 
> > On Sat, 2014-01-25 at 00:24 +0000, Peter Maydell wrote:
> >> On 24 January 2014 23:51, Scott Wood <scottwood@freescale.com> wrote:
> >>> On Fri, 2014-01-24 at 15:39 -0800, Christoffer Dall wrote:
> >>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> >>>> index 366bf4b..6dbd68c 100644
> >>>> --- a/Documentation/virtual/kvm/api.txt
> >>>> +++ b/Documentation/virtual/kvm/api.txt
> >>>> @@ -2565,6 +2565,11 @@ executed a memory-mapped I/O instruction which could not be satisfied
> >>>> by kvm.  The 'data' member contains the written data if 'is_write' is
> >>>> true, and should be filled by application code otherwise.
> >>>> 
> >>>> +The 'data' member byte order is host kernel native endianness, regardless of
> >>>> +the endianness of the guest, and represents the value as it would go on the
> >>>> +bus in real hardware.  The host user space should always be able to do:
> >>>> +<type> val = *((<type> *)mmio.data).
> >>> 
> >>> Host userspace should be able to do that with what results?  It would
> >>> only produce a directly usable value if host endianness is the same as
> >>> the emulated device's endianness.
> >> 
> >> With the result that it gets the value the CPU has sent out on
> >> the bus as the memory transaction.
> > 
> > Doesn't that assume the host kernel endianness is the same as the bus
> > (or rather, that the host CPU would not swap such an access before it
> > hits the bus)?
> > 
> > If you take the same hardware and boot a little endian host kernel one
> > day, and a big endian host kernel the next, the bus doesn't change, and
> > neither should the bytewise (assuming address invariance) contents of
> > data[].  How data[] would look when read as a larger integer would of
> > course change -- but that's due to how you're reading it.
> > 
> > It's clear to say that a value in memory has been stored there in host
> > endianness when the value is as you would want to see it in a CPU
> > register, but it's less clear when you talk about it relative to values
> > on a bus.  It's harder to correlate that to something that is software
> > visible.
> > 
> > I don't think there's any actual technical difference between your
> > wording and mine when each wording is properly interpreted, but I
> > suspect my wording is less likely to be misinterpreted (I could be
> > wrong).
> > 
> >> Obviously if what userspace
> >> is emulating is a bus which has a byteswapping bridge or if it's
> >> being helpful to device emulation by providing "here's the value
> >> even though you think you're wired up backwards" then it needs
> >> to byteswap.
> > 
> > Whether the emulated bus has "a byteswapping bridge" doesn't sound like
> > something that depends on the endianness that the host CPU is currently
> > running in.
> > 
> >>> How about a wording like this:
> >>> 
> >>>  The 'data' member contains, in its first 'len' bytes, the value as it
> >>>  would appear if the guest had accessed memory rather than I/O.
> >> 
> >> I think this is confusing, because now userspace authors have
> >> to figure out how to get back to "value X of size Y at address Z"
> >> by interpreting this text... Can you write out the equivalent of
> >> Christoffer's text "here's how you get the memory transaction
> >> value" for what you want?
> > 
> > Userspace swaps the value if and only if userspace's endianness differs
> > from the endianness with which the device interprets the data
> > (regardless of whether said interpretation is considered natural or
> > swapped relative to the way the bus is documented).  It's similar to how
> > userspace would handle emulating DMA.
> > 
> > KVM swaps the value if and only if the endianness of the guest access
> > differs from that of the host, i.e. if it would have done swapping when
> > emulating an ordinary memory access.
> > 
> >> (Also, value as it would appear to who?)
> > 
> > As it would appear to anyone.  It works because data[] actually is
> > memory.  Any difference in how data appears based on the reader's
> > context would already be reflected when the reader performs the load.
> > 
> >> I think your wording implies that the order of bytes in data[] depends
> >> on the guest CPU "usual byte order", ie the order which the CPU
> >> does not do a byte-lane-swap for (LE for ARM, BE for PPC),
> >> and it would mean it would come out differently from
> >> my/Alex/Christoffer's proposal if the host kernel was the opposite
> >> endianness from that "usual" order.
> > 
> > It doesn't depend on "usual" anything.  The only thing it implicitly
> > says about guest byte order is that it's KVM's job to implement any
> > swapping if the endianness of the guest access is different from the
> > endianness of the host kernel access (whether it's due to the guest's
> > mode, the way a page is mapped, the instruction used, etc).
> > 
> >> Finally, I think it's a bit confusing in that "as if the guest had
> >> accessed memory" is assigning implicit semantics to memory
> >> in the emulated system, when memory is actually kind of outside
> >> KVM's purview because it's not part of the CPU.
> > 
> > That's sort of the point.  It defines it in a way that is independent of
> > the CPU, and thus independent of what endianness the CPU operates in.
> 
> Ok, let's go through the combinations for a 32-bit write of 0x01020304 on PPC and what data[] looks like:
> 
> your proposal:
> 
>   BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>   LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>   BE guest, LE host: { 0x01, 0x02, 0x03, 0x04 }
>   LE guest, LE host: { 0x04, 0x03, 0x02, 0x01 }
> 
> -> ldw_p() will give us the correct value to work with
> 
> current proposal:
> 
>   BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>   LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>   BE guest, LE host: { 0x04, 0x03, 0x02, 0x01 }
>   LE guest, LE host: { 0x01, 0x02, 0x03, 0x04 }
> 
> -> *(uint32_t*)data will give us the correct value to work with
> 
> 
> There are pros and cons for both approaches.
> 
> Pro approach 1 is that it fits the way data[] is read today, so no QEMU changes are required. However, it means that user space needs to have awareness of the "default endianness".
> With approach 2 you don't care about endianness at all anymore - you just get a payload that the host process can read in.
> 
> Obviously both approaches would work as long as they're properly defined :).
> 
Just to clarify, with approach 2 the existing supported QEMU configurations
of BE/LE on both ARM and PPC still work - it is only for future mixed-endian
support that we would need to modify QEMU, right?

-Christoffer
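
For reference, the two tables quoted above can be reproduced with a small
sketch. This is only an illustration, not code from KVM or QEMU: every
function name below is hypothetical, the host endianness is passed as a
parameter so the sketch runs the same on any machine, and it assumes (as in
the PPC example) that a BE guest access reaches the bus unswapped while an
LE guest access reaches it byte-reversed. load_be32() stands in for a
target-endian load on a BE-default target, and read_approach2() models
*(uint32_t *)data on a host of the given endianness.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Byte-order helpers that do not depend on the machine running the sketch. */
static uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

static void store_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Assumption (PPC-like): BE guest accesses are unswapped on the bus,
 * LE guest accesses are byte-reversed. */
static uint32_t bus_value(uint32_t reg, bool guest_be)
{
    return guest_be ? reg : bswap32(reg);
}

/* Approach 1: data[] looks like guest memory; the host is irrelevant. */
static void fill_approach1(uint8_t *data, uint32_t reg, bool guest_be)
{
    store_be32(data, bus_value(reg, guest_be));
}

/* Stand-in for a target-endian load on a BE-default target. */
static uint32_t read_approach1(const uint8_t *data)
{
    return load_be32(data);
}

/* Approach 2: data[] holds the bus value in host-native byte order. */
static void fill_approach2(uint8_t *data, uint32_t reg,
                           bool guest_be, bool host_be)
{
    uint32_t bus = bus_value(reg, guest_be);
    store_be32(data, host_be ? bus : bswap32(bus));
}

/* Models *(uint32_t *)data on a host of the given endianness. */
static uint32_t read_approach2(const uint8_t *data, bool host_be)
{
    uint32_t v = load_be32(data);
    return host_be ? v : bswap32(v);
}

/* True when both readers recover the bus value for all four combinations. */
static bool approaches_agree(uint32_t reg)
{
    for (int g = 0; g < 2; g++) {
        for (int h = 0; h < 2; h++) {
            uint8_t a[4], b[4];
            fill_approach1(a, reg, g != 0);
            fill_approach2(b, reg, g != 0, h != 0);
            if (read_approach1(a) != bus_value(reg, g != 0) ||
                read_approach2(b, h != 0) != bus_value(reg, g != 0))
                return false;
        }
    }
    return true;
}
```

Under these assumptions the byte layouts differ only when the host is LE,
and each approach yields the same value as long as data[] is read with the
matching accessor.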
