* [RFC PATCH] cputlb: implement load_helper_unaligned() for unaligned loads
From: Mark Cave-Ayland @ 2021-06-09 9:35 UTC
To: qemu-devel, richard.henderson, alex.bennee, peter.maydell

[RFC because this is currently only lightly tested and there have been some
discussions about whether this should be handled elsewhere in the memory API]

If an unaligned load is required then the load is split into 2 separate accesses
which are combined within load_helper(). This does not work correctly with MMIO
accesses because the original access size is used for both individual accesses,
causing the little- and big-endian combines to return the wrong result.
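
(Illustration only, not part of the patch: the combine that load_helper()
currently performs for an unaligned access is roughly the code removed in
the diff below, i.e.

    addr1 = addr & ~((target_ulong)size - 1);
    addr2 = addr1 + size;
    r1 = full_load(env, addr1, oi, retaddr);    /* full access size */
    r2 = full_load(env, addr2, oi, retaddr);    /* full access size */
    shift = (addr & (size - 1)) * 8;
    if (memop_big_endian(op)) {
        res = (r1 << shift) | (r2 >> ((size * 8) - shift));
    } else {
        res = (r1 >> shift) | (r2 << ((size * 8) - shift));
    }

so an MMIO region sees two full-size reads at realigned addresses rather
than reads of the bytes that were actually requested.)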

There is already a similar solution in place for store_helper(), where an
unaligned access is handled by a separate store_helper_unaligned() function
which, instead of using the original access size, uses a single-byte access
size to shift and combine the result correctly regardless of the original
access size or endianness.

Implement a similar load_helper_unaligned() function which uses the same
approach for unaligned loads, returning the correct result for the original
test case.
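
For reference, a standalone sketch of the byte-wise combine used by the new
helper (illustrative only, the function name is made up and it is not part
of the patch):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Combine 'size' bytes read one at a time, in address order, into a
 * single value, honouring the requested endianness. This mirrors what
 * load_helper_unaligned() does with helper_ret_ldub_mmu().
 */
static uint64_t combine_bytes(const uint8_t *bytes, size_t size, bool big_endian)
{
    uint64_t val = 0;

    for (size_t i = 0; i < size; i++) {
        if (big_endian) {
            /* First byte is the most significant. */
            val |= (uint64_t)bytes[i] << ((size - 1 - i) * 8);
        } else {
            /* First byte is the least significant. */
            val |= (uint64_t)bytes[i] << (i * 8);
        }
    }
    return val;
}

For example, the bytes { 0x11, 0x22, 0x33, 0x44 } combine to 0x11223344
big-endian and 0x44332211 little-endian, independent of the access size
the guest originally used.
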
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/360
---
accel/tcg/cputlb.c | 99 ++++++++++++++++++++++++++++++++++++++--------
1 file changed, 82 insertions(+), 17 deletions(-)
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index f24348e979..1845929e99 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1851,6 +1851,85 @@ load_memop(const void *haddr, MemOp op)
     }
 }
 
+static uint64_t __attribute__((noinline))
+load_helper_unaligned(CPUArchState *env, target_ulong addr, uintptr_t retaddr,
+                      size_t size, uintptr_t mmu_idx, bool code_read,
+                      bool big_endian)
+{
+    uintptr_t index, index2;
+    CPUTLBEntry *entry, *entry2;
+    const size_t tlb_off = code_read ?
+        offsetof(CPUTLBEntry, addr_code) : offsetof(CPUTLBEntry, addr_read);
+    const MMUAccessType access_type =
+        code_read ? MMU_INST_FETCH : MMU_DATA_LOAD;
+    target_ulong page2, tlb_addr, tlb_addr2;
+    uint64_t val = 0;
+    TCGMemOpIdx oi;
+    size_t size2;
+    int i;
+
+    /*
+     * Ensure the second page is in the TLB. Note that the first page
+     * is already guaranteed to be filled, and that the second page
+     * cannot evict the first.
+     */
+    page2 = (addr + size) & TARGET_PAGE_MASK;
+    size2 = (addr + size) & ~TARGET_PAGE_MASK;
+    index2 = tlb_index(env, mmu_idx, page2);
+    entry2 = tlb_entry(env, mmu_idx, page2);
+
+    tlb_addr2 = code_read ? entry2->addr_code : entry2->addr_read;
+    if (!tlb_hit_page(tlb_addr2, page2)) {
+        if (!victim_tlb_hit(env, mmu_idx, index2, tlb_off, page2)) {
+            tlb_fill(env_cpu(env), page2, size2, access_type,
+                     mmu_idx, retaddr);
+            index2 = tlb_index(env, mmu_idx, page2);
+            entry2 = tlb_entry(env, mmu_idx, page2);
+        }
+        tlb_addr2 = code_read ? entry2->addr_code : entry2->addr_read;
+    }
+
+    index = tlb_index(env, mmu_idx, addr);
+    entry = tlb_entry(env, mmu_idx, addr);
+    tlb_addr = code_read ? entry->addr_code : entry->addr_read;
+
+    /*
+     * Handle watchpoints
+     */
+    if (unlikely(tlb_addr & TLB_WATCHPOINT)) {
+        cpu_check_watchpoint(env_cpu(env), addr, size - size2,
+                             env_tlb(env)->d[mmu_idx].iotlb[index].attrs,
+                             BP_MEM_READ, retaddr);
+    }
+    if (unlikely(tlb_addr2 & TLB_WATCHPOINT)) {
+        cpu_check_watchpoint(env_cpu(env), page2, size2,
+                             env_tlb(env)->d[mmu_idx].iotlb[index2].attrs,
+                             BP_MEM_READ, retaddr);
+    }
+
+    /*
+     * XXX: not efficient, but simple.
+     * This loop must go in the forward direction to avoid issues
+     * with self-modifying code in Windows 64-bit.
+     */
+    oi = make_memop_idx(MO_UB, mmu_idx);
+    if (big_endian) {
+        for (i = 0; i < size; ++i) {
+            /* Big-endian load. */
+            uint8_t val8 = helper_ret_ldub_mmu(env, addr + i, oi, retaddr);
+            val |= (uint64_t)val8 << (((size - 1) * 8) - (i * 8));
+        }
+    } else {
+        for (i = 0; i < size; ++i) {
+            /* Little-endian load. */
+            uint8_t val8 = helper_ret_ldub_mmu(env, addr + i, oi, retaddr);
+            val |= (uint64_t)val8 << (i * 8);
+        }
+    }
+
+    return val;
+}
+
 static inline uint64_t QEMU_ALWAYS_INLINE
 load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
             uintptr_t retaddr, MemOp op, bool code_read,
@@ -1893,7 +1972,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
         CPUIOTLBEntry *iotlbentry;
         bool need_swap;
 
-        /* For anything that is unaligned, recurse through full_load. */
+        /* For anything that is unaligned, recurse through byte loads. */
         if ((addr & (size - 1)) != 0) {
             goto do_unaligned_access;
         }
@@ -1932,23 +2011,9 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
     if (size > 1
         && unlikely((addr & ~TARGET_PAGE_MASK) + size - 1
                     >= TARGET_PAGE_SIZE)) {
-        target_ulong addr1, addr2;
-        uint64_t r1, r2;
-        unsigned shift;
     do_unaligned_access:
-        addr1 = addr & ~((target_ulong)size - 1);
-        addr2 = addr1 + size;
-        r1 = full_load(env, addr1, oi, retaddr);
-        r2 = full_load(env, addr2, oi, retaddr);
-        shift = (addr & (size - 1)) * 8;
-
-        if (memop_big_endian(op)) {
-            /* Big-endian combine. */
-            res = (r1 << shift) | (r2 >> ((size * 8) - shift));
-        } else {
-            /* Little-endian combine. */
-            res = (r1 >> shift) | (r2 << ((size * 8) - shift));
-        }
+        res = load_helper_unaligned(env, addr, retaddr, size, mmu_idx,
+                                    code_read, memop_big_endian(op));
         return res & MAKE_64BIT_MASK(0, size * 8);
     }
 
--
2.20.1
* Re: [RFC PATCH] cputlb: implement load_helper_unaligned() for unaligned loads
From: Philippe Mathieu-Daudé @ 2021-06-09 10:28 UTC
To: Mark Cave-Ayland, qemu-devel, richard.henderson, alex.bennee,
peter.maydell
On 6/9/21 11:35 AM, Mark Cave-Ayland wrote:
> [RFC because this is currently only lightly tested and there have been some
> discussions about whether this should be handled elsewhere in the memory API]
>
> If an unaligned load is required then the load is split into 2 separate accesses
> which are combined within load_helper(). This does not work correctly with MMIO
> accesses because the original access size is used for both individual accesses,
> causing the little- and big-endian combines to return the wrong result.
>
> There is already a similar solution in place for store_helper(), where an
> unaligned access is handled by a separate store_helper_unaligned() function
> which, instead of using the original access size, uses a single-byte access
> size to shift and combine the result correctly regardless of the original
> access size or endianness.
>
> Implement a similar load_helper_unaligned() function which uses the same
> approach for unaligned loads, returning the correct result for the original
> test case.
>
> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/360
> ---
> accel/tcg/cputlb.c | 99 ++++++++++++++++++++++++++++++++++++++--------
> 1 file changed, 82 insertions(+), 17 deletions(-)
>
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index f24348e979..1845929e99 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -1851,6 +1851,85 @@ load_memop(const void *haddr, MemOp op)
> }
> }
>
> +static uint64_t __attribute__((noinline))
> +load_helper_unaligned(CPUArchState *env, target_ulong addr, uintptr_t retaddr,
> + size_t size, uintptr_t mmu_idx, bool code_read,
> + bool big_endian)
> +{
...
> +}
> +
> static inline uint64_t QEMU_ALWAYS_INLINE
> load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
> uintptr_t retaddr, MemOp op, bool code_read,
> @@ -1893,7 +1972,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
> CPUIOTLBEntry *iotlbentry;
> bool need_swap;
>
> - /* For anything that is unaligned, recurse through full_load. */
> + /* For anything that is unaligned, recurse through byte loads. */
> if ((addr & (size - 1)) != 0) {
> goto do_unaligned_access;
> }
> @@ -1932,23 +2011,9 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
> if (size > 1
> && unlikely((addr & ~TARGET_PAGE_MASK) + size - 1
> >= TARGET_PAGE_SIZE)) {
It would be easier to review extracting load_helper_unaligned() first.
> + res = load_helper_unaligned(env, addr, retaddr, size, mmu_idx,
> + code_read, memop_big_endian(op));
> return res & MAKE_64BIT_MASK(0, size * 8);
> }
>
>
* Re: [RFC PATCH] cputlb: implement load_helper_unaligned() for unaligned loads
From: Mark Cave-Ayland @ 2021-06-09 11:17 UTC
To: Philippe Mathieu-Daudé, qemu-devel, richard.henderson,
alex.bennee, peter.maydell
On 09/06/2021 11:28, Philippe Mathieu-Daudé wrote:
> On 6/9/21 11:35 AM, Mark Cave-Ayland wrote:
>> [RFC because this is currently only lightly tested and there have been some
>> discussions about whether this should be handled elsewhere in the memory API]
>>
>> If an unaligned load is required then the load is split into 2 separate accesses
>> which are combined within load_helper(). This does not work correctly with MMIO
>> accesses because the original access size is used for both individual accesses,
>> causing the little- and big-endian combines to return the wrong result.
>>
>> There is already a similar solution in place for store_helper(), where an
>> unaligned access is handled by a separate store_helper_unaligned() function
>> which, instead of using the original access size, uses a single-byte access
>> size to shift and combine the result correctly regardless of the original
>> access size or endianness.
>>
>> Implement a similar load_helper_unaligned() function which uses the same
>> approach for unaligned loads, returning the correct result for the original
>> test case.
>>
>> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
>> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/360
>> ---
>> accel/tcg/cputlb.c | 99 ++++++++++++++++++++++++++++++++++++++--------
>> 1 file changed, 82 insertions(+), 17 deletions(-)
>>
>> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
>> index f24348e979..1845929e99 100644
>> --- a/accel/tcg/cputlb.c
>> +++ b/accel/tcg/cputlb.c
>> @@ -1851,6 +1851,85 @@ load_memop(const void *haddr, MemOp op)
>> }
>> }
>>
>> +static uint64_t __attribute__((noinline))
>> +load_helper_unaligned(CPUArchState *env, target_ulong addr, uintptr_t retaddr,
>> + size_t size, uintptr_t mmu_idx, bool code_read,
>> + bool big_endian)
>> +{
> ...
>
>> +}
>> +
>> static inline uint64_t QEMU_ALWAYS_INLINE
>> load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
>> uintptr_t retaddr, MemOp op, bool code_read,
>> @@ -1893,7 +1972,7 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
>> CPUIOTLBEntry *iotlbentry;
>> bool need_swap;
>>
>> - /* For anything that is unaligned, recurse through full_load. */
>> + /* For anything that is unaligned, recurse through byte loads. */
>> if ((addr & (size - 1)) != 0) {
>> goto do_unaligned_access;
>> }
>> @@ -1932,23 +2011,9 @@ load_helper(CPUArchState *env, target_ulong addr, TCGMemOpIdx oi,
>> if (size > 1
>> && unlikely((addr & ~TARGET_PAGE_MASK) + size - 1
>> >= TARGET_PAGE_SIZE)) {
>
> It would be easier to review extracting load_helper_unaligned() first.
Do you mean into a separate function? Since the function is recursive, I think
you'd end up having to copy everything apart from the change that does all the
work, which is the for() loop iterating over the full access a byte at a time.
For the sake of a few extra lines I'm not sure it makes sense?
The equivalent commit for store_helper() is 6b8b622e87e "cputlb: Make store_helper
less fragile to compiler optimizations", although in that case the unaligned access
code which splits everything into single-byte accesses was already present.
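
For comparison, the byte-wise path on the store side has roughly the following
shape (paraphrased rather than quoted, so treat it as a sketch):

    oi = make_memop_idx(MO_UB, mmu_idx);
    for (i = 0; i < size; ++i) {
        uint8_t val8;
        if (big_endian) {
            /* Big-endian extract: most significant byte is stored first. */
            val8 = val >> (((size - 1) * 8) - (i * 8));
        } else {
            /* Little-endian extract: least significant byte is stored first. */
            val8 = val >> (i * 8);
        }
        helper_ret_stb_mmu(env, addr + i, val8, oi, retaddr);
    }

which is exactly the structure the load side now mirrors.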
I'm also interested to know whether the full_load parameter is still required for
load_helper(); given that this is an RFC patch I've left it there for now, but it's
likely this will need some thought too.
ATB,
Mark.