From: Laszlo Ersek <lersek@redhat.com>
To: Andrew Jones <drjones@redhat.com>
Cc: peter.maydell@linaro.org, Alexander Graf <agraf@suse.de>,
Paolo Bonzini <pbonzini@redhat.com>,
qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH v4 1/8] fw_cfg: max access size and region size are the same for MMIO data reg
Date: Tue, 16 Dec 2014 20:00:46 +0100 [thread overview]
Message-ID: <5490815E.8@redhat.com> (raw)
In-Reply-To: <20141216134858.GD3283@hawk.usersys.redhat.com>
On 12/16/14 14:48, Andrew Jones wrote:
> On Fri, Dec 12, 2014 at 04:58:45PM +0100, Laszlo Ersek wrote:
>> Make it clear that the maximum access size to the MMIO data register
>> determines the full size of the memory region.
>>
>> Currently the max access size is 1. Ensure that if a larger size were
>> used in "fw_cfg_data_mem_ops.valid.max_access_size", the memory
>> subsystem would split the access to byte-sized accesses internally,
>> in increasing address order.
>>
>> fw_cfg_data_mem_read() and fw_cfg_data_mem_write() don't care about
>> "address" or "size"; they just call the sequential fw_cfg_read() and
>> fw_cfg_write() functions, correspondingly. Therefore the automatic
>> splitting is just right. (The endianness of "fw_cfg_data_mem_ops" is
>> native.)
>
> This 'is native' caught my eye. Laszlo's
> Documentation/devicetree/bindings/arm/fw-cfg.txt patch states that the
> selector register is LE and
>
> "
> The data register allows accesses with 8, 16, 32 and 64-bit width.
> Accesses larger than a byte are interpreted as arrays, bundled
> together only for better performance. The bytes constituting such a
> word, in increasing address order, correspond to the bytes that would
> have been transferred by byte-wide accesses in chronological order.
> "
>
> I chatted with Laszlo to make sure the host-is-BE case was considered.
> It looks like there may be an issue there that Laszlo promised to look
> into. Laszlo had already noticed that the selector was defined as
> native in qemu as well, but should be LE. Now that we support > 1 byte
> reads and writes from the data port, then maybe we should look into
> changing that as well.
Yes.
The root of this question is what each of
enum device_endian {
DEVICE_NATIVE_ENDIAN,
DEVICE_BIG_ENDIAN,
DEVICE_LITTLE_ENDIAN,
};
means.
Consider the following call tree, which implements the splitting /
combining of an MMIO read:
memory_region_dispatch_read() [memory.c]
memory_region_dispatch_read1()
access_with_adjusted_size()
memory_region_big_endian()
for each byte in the wide access:
memory_region_read_accessor()
fw_cfg_data_mem_read() [hw/nvram/fw_cfg.c]
fw_cfg_read()
adjust_endianness()
memory_region_wrong_endianness()
bswapXX()
The function access_with_adjusted_size() always iterates over the MMIO
address range in incrementing address order. However, it can calculate
the shift count for memory_region_read_accessor() in two ways.
When memory_region_big_endian() returns "true", the shift count
decreases as the MMIO address increases.
When memory_region_big_endian() returns "false", the shift count
increases as the MMIO address increases.
In memory_region_read_accessor(), the shift count is used for a logical
(ie. numeric) bitwise left shift (<<). That operator works with numeric
values and hides (ie. accounts for) host endianness.
Let's consider
- an array of two bytes, [0xaa, 0xbb],
- a uint16_t access made from the guest,
- and all twelve possible cases.
In the table below, the "host", "guest" and "device_endian" columns are
input variables. The columns to the right are calculated / derived
values. The arrows above the columns show the data dependencies.
After memory_region_dispatch_read1() constructs the value that is
visible in the "host value" column, it is passed to adjust_endianness().
If memory_region_wrong_endianness() returns "true", then the host value
is byte-swapped. The result is then passed to the guest.
+---------------------------------------------------------------------------------------------------------------+----------+
| | |
+---- ------+-------------------------------------------------------------------------+ | |
| | | | | |
+----------------------------------------------------------+---------+ | | |
| | | | | | | | |
| +-----------+-------------------+ +----------------+ | | +------------------------+-------------------+ | |
| | | | | | | | | | | | | | | | |
| | | | | | v | v | v | v | v | v
# host guest device_endian memory_region_big_endian() host value host repr. memory_region_wrong_endianness() guest repr. guest value
-- ---- ----- ------------- -------------------------- ---------- ------------ -------------------------------- ------------ -----------
1 LE LE native 0 0xbbaa [0xaa, 0xbb] 0 [0xaa, 0xbb] 0xbbaa
2 LE LE BE 1 0xaabb [0xbb, 0xaa] 1 [0xaa, 0xbb] 0xbbaa
3 LE LE LE 0 0xbbaa [0xaa, 0xbb] 0 [0xaa, 0xbb] 0xbbaa
4 LE BE native 1 0xaabb [0xbb, 0xaa] 0 [0xbb, 0xaa] 0xbbaa
5 LE BE BE 1 0xaabb [0xbb, 0xaa] 0 [0xbb, 0xaa] 0xbbaa
6 LE BE LE 0 0xbbaa [0xaa, 0xbb] 1 [0xbb, 0xaa] 0xbbaa
7 BE LE native 0 0xbbaa [0xbb, 0xaa] 0 [0xbb, 0xaa] 0xaabb
8 BE LE BE 1 0xaabb [0xaa, 0xbb] 1 [0xbb, 0xaa] 0xaabb
9 BE LE LE 0 0xbbaa [0xbb, 0xaa] 0 [0xbb, 0xaa] 0xaabb
10 BE BE native 1 0xaabb [0xaa, 0xbb] 0 [0xaa, 0xbb] 0xaabb
11 BE BE BE 1 0xaabb [0xaa, 0xbb] 0 [0xaa, 0xbb] 0xaabb
12 BE BE LE 0 0xbbaa [0xbb, 0xaa] 1 [0xaa, 0xbb] 0xaabb
Looking at the two rightmost columns, we must conclude:
(a) The "device_endian" field has absolutely no significance wrt. what
the guest sees. In each triplet of cases, when we go from
DEVICE_NATIVE_ENDIAN to DEVICE_BIG_ENDIAN to DEVICE_LITTLE_ENDIAN,
the guest sees the exact same value.
I don't understand this result (it makes me doubt device_endian
makes any sense). What did I do wrong?
Just to be sure that I was not seeing ghosts, I actually tested this
result. On an x86_64 hosts I tested the aarch64 guest firmware
(using TCG), cycling the "fw_cfg_data_mem_ops.endianness" field
through all three possible values. That is, I tested cases #1 to #3.
They *all* worked!
(b) Considering a given host endianness (ie. a group of six cases): when
the guest goes from little endian to big endian, the *numerical
value* the guest sees does not change.
In addition, fixating the guest endianness, and changing the host
endianness, the *numerical value* that the guest sees (for the
original host-side array [0xaa, 0xbb]) changes.
This means that this interface is *value preserving*, not
representation preserving. In other words: when host and guest
endiannesses are identical, the *array* is transferred okay (the
guest array matches the initial host array [0xaa, 0xbb]). When guest
and host differ in endianness, the guest will see an incorrect
*array*.
Which, alas, makes this interface completely unsuitable for the
purpose at hand (ie. transferring kernel & initrd images in words
wider than a byte). We couldn't care less as to what numerical value
the array [0xaa, 0xbb] encodes on the host -- whatever it encodes,
the guest wants to receive the same *array* (not the same value).
And the byte order cannot be fixed up in the guest, because it
depends on the XOR of guest and host endiannesses, and the guest
doesn't know about the host's endianness.
I apologize for wasting everyone's time, but I think both results are
very non-intuitive. I noticed the use of the bitwise shift in
memory_region_read_accessor() -- which internally accounts for the
host-side byte order -- just today, while discussing this with Drew on
IRC. I had assumed that it would store bytes in the recipient uint64_t
by address, not by numerical order of magnitude.
Looks like the DMA-like interface is the only way forward. :(
Laszlo
next prev parent reply other threads:[~2014-12-16 19:14 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-12-12 15:58 [Qemu-devel] [PATCH v4 0/8] fw_cfg, bootorder, and UEFI+'-kernel' on arm/virt Laszlo Ersek
2014-12-12 15:58 ` [Qemu-devel] [PATCH v4 1/8] fw_cfg: max access size and region size are the same for MMIO data reg Laszlo Ersek
2014-12-16 13:48 ` Andrew Jones
2014-12-16 19:00 ` Laszlo Ersek [this message]
2014-12-16 19:49 ` Paolo Bonzini
2014-12-16 20:06 ` Laszlo Ersek
2014-12-16 20:17 ` Laszlo Ersek
2014-12-16 21:47 ` Paolo Bonzini
2014-12-17 4:52 ` Laszlo Ersek
2014-12-16 20:40 ` Paolo Bonzini
2014-12-16 21:47 ` Peter Maydell
2014-12-17 5:06 ` Laszlo Ersek
2014-12-17 9:23 ` Paolo Bonzini
2014-12-17 9:31 ` Alexander Graf
2014-12-16 20:41 ` Peter Maydell
2014-12-17 7:13 ` Laszlo Ersek
2014-12-17 8:28 ` Alexander Graf
2014-12-17 8:40 ` Laszlo Ersek
2014-12-12 15:58 ` [Qemu-devel] [PATCH v4 2/8] fw_cfg: generalize overlap check for combining control and data I/O ports Laszlo Ersek
2014-12-12 15:58 ` [Qemu-devel] [PATCH v4 3/8] fw_cfg: introduce the "data_memwidth" property Laszlo Ersek
2014-12-16 12:06 ` Alexander Graf
2014-12-16 12:42 ` Laszlo Ersek
2014-12-16 16:59 ` Laszlo Ersek
2014-12-16 17:10 ` Peter Maydell
2014-12-16 17:20 ` Alexander Graf
2014-12-16 18:52 ` Laszlo Ersek
2014-12-12 15:58 ` [Qemu-devel] [PATCH v4 4/8] fw_cfg: expose the "data_memwidth" prop with fw_cfg_init_data_memwidth() Laszlo Ersek
2014-12-12 15:58 ` [Qemu-devel] [PATCH v4 5/8] arm: add fw_cfg to "virt" board Laszlo Ersek
2014-12-12 15:58 ` [Qemu-devel] [PATCH v4 6/8] hw/loader: split out load_image_gzipped_buffer() Laszlo Ersek
2014-12-12 15:58 ` [Qemu-devel] [PATCH v4 7/8] hw/arm: pass pristine kernel image to guest firmware over fw_cfg Laszlo Ersek
2014-12-16 12:15 ` Alexander Graf
2014-12-16 12:18 ` Peter Maydell
2014-12-16 12:20 ` Alexander Graf
2014-12-16 12:25 ` Peter Maydell
2014-12-16 12:42 ` Richard W.M. Jones
2014-12-16 12:44 ` Laszlo Ersek
2014-12-12 15:58 ` [Qemu-devel] [PATCH v4 8/8] hw/arm/virt: enable passing of EFI-stubbed kernel to guest UEFI firmware Laszlo Ersek
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5490815E.8@redhat.com \
--to=lersek@redhat.com \
--cc=agraf@suse.de \
--cc=drjones@redhat.com \
--cc=pbonzini@redhat.com \
--cc=peter.maydell@linaro.org \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).