From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from list by lists.gnu.org with archive (Exim 4.86_2) id 1hqXks-00025h-8c for mharc-qemu-riscv@gnu.org; Thu, 25 Jul 2019 03:01:34 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:50448) by lists.gnu.org with esmtp (Exim 4.86_2) (envelope-from ) id 1hqXko-00025W-Ix for qemu-riscv@nongnu.org; Thu, 25 Jul 2019 03:01:33 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1hqXkl-000289-W5 for qemu-riscv@nongnu.org; Thu, 25 Jul 2019 03:01:30 -0400 Received: from smtpe1.intersmtp.com ([213.121.35.74]:58897) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1hqXkb-0001yR-41; Thu, 25 Jul 2019 03:01:17 -0400 Received: from tpw09926dag18f.domain1.systemhost.net (10.9.212.26) by BWP09926079.bt.com (10.36.82.110) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P256) id 15.1.1713.5; Thu, 25 Jul 2019 08:01:14 +0100 Received: from tpw09926dag18e.domain1.systemhost.net (10.9.212.18) by tpw09926dag18f.domain1.systemhost.net (10.9.212.26) with Microsoft SMTP Server (TLS) id 15.0.1395.4; Thu, 25 Jul 2019 08:01:14 +0100 Received: from tpw09926dag18e.domain1.systemhost.net ([fe80::a946:6348:ccf4:fa6c]) by tpw09926dag18e.domain1.systemhost.net ([fe80::a946:6348:ccf4:fa6c%12]) with mapi id 15.00.1395.000; Thu, 25 Jul 2019 08:01:14 +0100 From: To: CC: , , , , , , , , , , , , , , , , , , , , , , , Thread-Topic: [Qemu-devel] [PATCH v3 00/15] Invert Endian bit in SPARCv9 MMU TTE Thread-Index: AQHVQrP3RrTfmfnvbEOFfW6KFJKHhQ== Date: Thu, 25 Jul 2019 07:01:14 +0000 Message-ID: <1564038073754.91133@bt.com> References: , <1563810716254.18886@bt.com> In-Reply-To: <1563810716254.18886@bt.com> Accept-Language: en-AU, en-GB, en-US Content-Language: en-AU X-MS-Has-Attach: X-MS-TNEF-Correlator: x-ms-exchange-transport-fromentityheader: Hosted x-originating-ip: [10.187.101.42] Content-Type: multipart/alternative; boundary="_000_156403807375491133btcom_" MIME-Version: 1.0 X-detected-operating-system: by eggs.gnu.org: Windows 7 or 8 [fuzzy] X-Received-From: 213.121.35.74 Subject: [Qemu-riscv] [Qemu-devel] [PATCH v3 00/15] Invert Endian bit in SPARCv9 MMU TTE X-BeenThere: qemu-riscv@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 25 Jul 2019 07:01:33 -0000 --_000_156403807375491133btcom_ Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable This patchset implements the IE (Invert Endian) bit in SPARCv9 MMU TTE. It is an attempt of the instructions outlined by Richard Henderson to Mark Cave-Ayland. Tested with OpenBSD on sun4u. Solaris 10 is my actual goal, but unfortunate= ly a separate keyboard issue remains in the way. On 01/11/17 19:15, Mark Cave-Ayland wrote: >On 15/08/17 19:10, Richard Henderson wrote: > >> [CC Peter re MemTxAttrs below] >> >> On 08/15/2017 09:38 AM, Mark Cave-Ayland wrote: >>> Working through an incorrect endian issue on qemu-system-sparc64, it ha= s >>> become apparent that at least one OS makes use of the IE (Invert Endian= ) >>> bit in the SPARCv9 MMU TTE to map PCI memory space without the >>> programmer having to manually endian-swap accesses. >>> >>> In other words, to quote the UltraSPARC specification: "if this bit is >>> set, accesses to the associated page are processed with inverse >>> endianness from what is specified by the instruction (big-for-little an= d >>> little-for-big)". A good explanation by Mark why the IE bit is required. >>> >>> Looking through various bits of code, I'm trying to get a feel for the >>> best way to implement this in an efficient manner. From what I can see >>> this could be solved using an additional MMU index, however I'm not >>> overly familiar with the memory and softmmu subsystems. >> >> No, it can't be solved with an MMU index. >> >>> Can anyone point me in the right direction as to what would be the best >>> way to implement this feature within QEMU? >> >> It's definitely tricky. >> >> We definitely need some TLB_FLAGS_MASK bit set so that we're forced thro= ugh >> the >> memory slow path. There is no other way to bypass the endianness that w= e've >> already encoded from the target instruction. >> >> Given the tlb_set_page_with_attrs interface, I would think that we need = a new >> bit in MemTxAttrs, so that the target/sparc tlb_fill (and subroutines) c= an >> pass >> along the TTE bit for the given page. >> >> We have an existing problem in softmmu_template.h, >> >> /* ??? Note that the io helpers always read data in the target >> byte ordering. We should push the LE/BE request down into io. *= / >> res =3D glue(io_read, SUFFIX)(env, mmu_idx, index, addr, retaddr); >> res =3D TGT_BE(res); >> >> We do not want to add a third(!) byte swap along the i/o path. We need = to >> collapse the two that we have already before considering this one. >> >> This probably takes the form of: >> >> (1) Replacing the "int size" argument with "TCGMemOp memop" for >> a) io_{read,write}x in accel/tcg/cputlb.c, >> b) memory_region_dispatch_{read,write} in memory.c, >> c) adjust_endianness in memory.c. >> This carries size+sign+endianness down to the next level. >> >> (2) In memory.c, adjust_endianness, >> >> if (memory_region_wrong_endianness(mr)) { >> - switch (size) { >> + memop ^=3D MO_BSWAP; >> + } >> + if (memop & MO_BSWAP) { >> >> For extra credit, re-arrange memory_region_wrong_endianness >> to something more explicit -- "wrong" isn't helpful. > >Finally I've had a bit of spare time to experiment with this approach, >and from what I can see there are currently 2 issues: > > >1) Using TCGMemOp in memory.c means it is no longer accelerator agnostic > >For the moment I've defined a separate MemOp in memory.h and provided a >mapping function in io_{read,write}x to map from TCGMemOp to MemOp and >then pass that into memory_region_dispatch_{read,write}. > >Other than not referencing TCGMemOp in the memory API, another reason >for doing this was that I wasn't convinced that all the MO_ attributes >were valid outside of TCG. I do, of course, strongly defer to other >people's knowledge in this area though. > > >2) The above changes to adjust_endianness() fail when >memory_region_dispatch_{read,write} are called recursively > >Whilst booting qemu-system-sparc64 I see that >memory_region_dispatch_{read,write} get called recursively - once via >io_{read,write}x and then again via flatview_read_continue() in exec.c. > >The net effect of this is that we perform the bswap correctly at the >tail of the recursion, but then as we travel back up the stack we hit >memory_region_dispatch_{read,write} once again causing a second bswap >which means the value is returned with the incorrect endian again. > > >My understanding from your softmmu_template.h comment above is that the >memory API should do the endian swapping internally allowing the removal >of the final TGT_BE/TGT_LE applied to the result, or did I get this wrong? > >> (3) In tlb_set_page_with_attrs, notice attrs.byte_swap and set >> a new TLB_FORCE_SLOW bit within TLB_FLAGS_MASK. >> >> (4) In io_{read,write}x, if iotlbentry->attrs.byte_swap is set, >> then memop ^=3D MO_BSWAP. Thanks all for the v1 and v2 feedback. v2: - Moved size+sign+endianness attributes from TCGMemOp into MemOp. In v1 TCGMemOp was re-purposed entirely into MemOp. - Replaced MemOp MO_{8|16|32|64} with TCGMemOp MO_{UB|UW|UL|UQ} alias. This is to avoid warnings on comparing and coercing different enums. - Renamed get_memop to get_tcgmemop for clarity. - MEMOP is now SIZE_MEMOP, which is just ctzl(size). - Split patch 3/4 so one memory_region_dispatch_{read|write} interface is converted per patch. - Do not reuse TLB_RECHECK, use new TLB_FORCE_SLOW instead. - Split patch 4/4 so adding the MemTxAddrs parameters and converting tlb_set_page() to tlb_set_page_with_attrs() is separate from usage. - CC'd maintainers. v3: - Like v1, the entire TCGMemOp enum is now MemOp. - MemOp target dependant attributes are conditional upon NEED_CPU_H Tony Nguyen (15): tcg: TCGMemOp is now accelerator independent MemOp memory: Access MemoryRegion with MemOp target/mips: Access MemoryRegion with MemOp hw/s390x: Access MemoryRegion with MemOp hw/intc/armv7m_nic: Access MemoryRegion with MemOp hw/virtio: Access MemoryRegion with MemOp hw/vfio: Access MemoryRegion with MemOp exec: Access MemoryRegion with MemOp cputlb: Access MemoryRegion with MemOp memory: Access MemoryRegion with MemOp semantics memory: Single byte swap along the I/O path cpu: TLB_FLAGS_MASK bit to force memory slow path cputlb: Byte swap memory transaction attribute target/sparc: Add TLB entry with attributes target/sparc: sun4u Invert Endian TTE bit accel/tcg/cputlb.c | 71 +++++++++-------- exec.c | 6 +- hw/intc/armv7m_nvic.c | 12 ++- hw/s390x/s390-pci-inst.c | 8 +- hw/vfio/pci-quirks.c | 5 +- hw/virtio/virtio-pci.c | 7 +- include/exec/cpu-all.h | 10 ++- include/exec/memattrs.h | 2 + include/exec/memop.h | 112 +++++++++++++++++++++++++++ include/exec/memory.h | 9 ++- memory.c | 37 +++++---- memory_ldst.inc.c | 18 ++--- target/alpha/translate.c | 2 +- target/arm/translate-a64.c | 48 ++++++------ target/arm/translate-a64.h | 2 +- target/arm/translate-sve.c | 2 +- target/arm/translate.c | 32 ++++---- target/arm/translate.h | 2 +- target/hppa/translate.c | 14 ++-- target/i386/translate.c | 132 ++++++++++++++++------------= ---- target/m68k/translate.c | 2 +- target/microblaze/translate.c | 4 +- target/mips/op_helper.c | 5 +- target/mips/translate.c | 8 +- target/openrisc/translate.c | 4 +- target/ppc/translate.c | 12 +-- target/riscv/insn_trans/trans_rva.inc.c | 8 +- target/riscv/insn_trans/trans_rvi.inc.c | 4 +- target/s390x/translate.c | 6 +- target/s390x/translate_vx.inc.c | 10 +-- target/sparc/cpu.h | 2 + target/sparc/mmu_helper.c | 40 ++++++---- target/sparc/translate.c | 14 ++-- target/tilegx/translate.c | 10 +-- target/tricore/translate.c | 8 +- tcg/README | 2 +- tcg/aarch64/tcg-target.inc.c | 26 +++---- tcg/arm/tcg-target.inc.c | 26 +++---- tcg/i386/tcg-target.inc.c | 24 +++--- tcg/mips/tcg-target.inc.c | 16 ++-- tcg/optimize.c | 2 +- tcg/ppc/tcg-target.inc.c | 12 +-- tcg/riscv/tcg-target.inc.c | 20 ++--- tcg/s390/tcg-target.inc.c | 14 ++-- tcg/sparc/tcg-target.inc.c | 6 +- tcg/tcg-op.c | 38 ++++----- tcg/tcg-op.h | 86 ++++++++++----------- tcg/tcg.c | 2 +- tcg/tcg.h | 99 ++---------------------- trace/mem-internal.h | 4 +- trace/mem.h | 4 +- 51 files changed, 561 insertions(+), 488 deletions(-) create mode 100644 include/exec/memop.h -- 1.8.3.1 --_000_156403807375491133btcom_ Content-Type: text/html; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable

This patchset implements the IE (Inve= rt Endian) bit in SPARCv9 MMU TTE.

It is an attempt of the instructions outlined by Richard Henderson to = Mark
Cave-Ayland.

Tested with OpenBSD on sun4u. Solaris 10 is my actual goal, but unfort= unately a
separate keyboard issue remains in the way.

On 01/11/17 19:15, Mark Cave-Ayland wrote:

>On 15/08/17 19:10, Richard Henderson wrote:
>
>> [CC Peter re MemTxAttrs below]
>>
>> On 08/15/2017 09:38 AM, Mark Cave-Ayland wrote:
>>> Working through an incorrect endian issue on qemu-system-= sparc64, it has
>>> become apparent that at least one OS makes use of the IE = (Invert Endian)
>>> bit in the SPARCv9 MMU TTE to map PCI memory space withou= t the
>>> programmer having to manually endian-swap accesses.
>>>
>>> In other words, to quote the UltraSPARC specification: &q= uot;if this bit is
>>> set, accesses to the associated page are processed with i= nverse
>>> endianness from what is specified by the instruction (big= -for-little and
>>> little-for-big)".

A good explanation by Mark why the IE bit is required.

>>>
>>> Looking through various bits of code, I'm trying to get a= feel for the
>>> best way to implement this in an efficient manner. From w= hat I can see
>>> this could be solved using an additional MMU index, howev= er I'm not
>>> overly familiar with the memory and softmmu subsystems.
>>
>> No, it can't be solved with an MMU index.
>>
>>> Can anyone point me in the right direction as to what wou= ld be the best
>>> way to implement this feature within QEMU?
>>
>> It's definitely tricky.
>>
>> We definitely need some TLB_FLAGS_MASK bit set so that we're = forced through
>> the
>> memory slow path.  There is no other way to bypass the e= ndianness that we've
>> already encoded from the target instruction.
>>
>> Given the tlb_set_page_with_attrs interface, I would think th= at we need a new
>> bit in MemTxAttrs, so that the target/sparc tlb_fill (and sub= routines) can
>> pass
>> along the TTE bit for the given page.
>>
>> We have an existing problem in softmmu_template.h,
>>
>>     /* ??? Note that the io helpers always read dat= a in the target
>>        byte ordering.  We should pus= h the LE/BE request down into io.  */
>>     res =3D glue(io_read, SUFFIX)(env, mmu_idx, ind= ex, addr, retaddr);
>>     res =3D TGT_BE(res);
>>
>> We do not want to add a third(!) byte swap along the i/o path= .  We need to
>> collapse the two that we have already before considering this= one.
>>
>> This probably takes the form of:
>>
>> (1) Replacing the "int size" argument with "TC= GMemOp memop" for
>>       a) io_{read,write}x in accel/tcg/cputlb.= c,
>>       b) memory_region_dispatch_{read,write} i= n memory.c,
>>       c) adjust_endianness in memory.c.
>>     This carries size+sign+endianness down = to the next level.
>>
>> (2) In memory.c, adjust_endianness,
>>
>>      if (memory_region_wrong_endianness(mr)) {=
>> -        switch (size) {
>> +        memop ^=3D MO_BSWAP;
>> +    }
>> +    if (memop & MO_BSWAP) {
>>
>>     For extra credit, re-arrange memory_region_wron= g_endianness
>>     to something more explicit -- "wrong"= isn't helpful.
>
>Finally I've had a bit of spare time to experiment with this appro= ach,
>and from what I can see there are currently 2 issues:
>
>
>1) Using TCGMemOp in memory.c means it is no longer accelerator ag= nostic
>
>For the moment I've defined a separate MemOp in memory.h and provi= ded a
>mapping function in io_{read,write}x to map from TCGMemOp to MemOp= and
>then pass that into memory_region_dispatch_{read,write}.
>
>Other than not referencing TCGMemOp in the memory API, another rea= son
>for doing this was that I wasn't convinced that all the MO_ attrib= utes
>were valid outside of TCG. I do, of course, strongly defer to othe= r
>people's knowledge in this area though.
>
>
>2) The above changes to adjust_endianness() fail when
>memory_region_dispatch_{read,write} are called recursively
>
>Whilst booting qemu-system-sparc64 I see that
>memory_region_dispatch_{read,write} get called recursively - once = via
>io_{read,write}x and then again via flatview_read_continue() in ex= ec.c.
>
>The net effect of this is that we perform the bswap correctly at t= he
>tail of the recursion, but then as we travel back up the stack we = hit
>memory_region_dispatch_{read,write} once again causing a second bs= wap
>which means the value is returned with the incorrect endian again.=
>
>
>My understanding from your softmmu_template.h comment above is tha= t the
>memory API should do the endian swapping internally allowing the r= emoval
>of the final TGT_BE/TGT_LE applied to the result, or did I get thi= s wrong?
>
>> (3) In tlb_set_page_with_attrs, notice attrs.byte_swap and se= t
>>     a new TLB_FORCE_SLOW bit within TLB_FLAGS_MASK.=
>>
>> (4) In io_{read,write}x, if iotlbentry->attrs.byte_swap is= set,
>>     then memop ^=3D MO_BSWAP.

Thanks all for the v1 and v2 feedback.

v2:
- Moved size+sign+endianness attributes from TCGMemOp into Mem= Op.
  In v1 TCGMemOp was re-purposed entirely into MemOp.
- Replaced MemOp MO_{8|16|32|64} with TCGMemOp MO_{UB|UW|UL|UQ} alias.=
  This is to avoid warnings on comparing and coercing different e= nums.
- Renamed get_memop to get_tcgmemop for clarity.
- MEMOP is now SIZE_MEMOP, which is just ctzl(size).
- Split patch 3/4 so one memory_region_dispatch_{read|write} interface=
  is converted per patch.
- Do not reuse TLB_RECHECK, use new TLB_FORCE_SLOW instead.
- Split patch 4/4 so adding the MemTxAddrs parameters and converting
  tlb_set_page() to tlb_set_page_with_attrs() is separate from us= age.
- CC'd maintainers.

v3:
- Like v1, the entire TCGMemOp enum is now MemOp.
- MemOp target dependant attributes are conditional upon NEE= D_CPU_H 

Tony Nguyen (15):
  tcg: TCGMemOp is now accelerator independent MemOp
  memory: Access MemoryRegion with MemOp
  target/mips: Access MemoryRegion with MemOp
  hw/s390x: Access MemoryRegion with MemOp
  hw/intc/armv7m_nic: Access MemoryRegion with MemOp
  hw/virtio: Access MemoryRegion with MemOp
  hw/vfio: Access MemoryRegion with MemOp
  exec: Access MemoryRegion with MemOp
  cputlb: Access MemoryRegion with MemOp
  memory: Access MemoryRegion with MemOp semantics
  memory: Single byte swap along the I/O path
  cpu: TLB_FLAGS_MASK bit to force memory slow path
  cputlb: Byte swap memory transaction attribute
  target/sparc: Add TLB entry with attributes
  target/sparc: sun4u Invert Endian TTE bit

 accel/tcg/cputlb.c             &nb= sp;        |  71 ++++++= 3;++--------
 exec.c                 &= nbsp;                |   6 = 3;-
 hw/intc/armv7m_nvic.c             =       |  12 ++-
 hw/s390x/s390-pci-inst.c           &nbs= p;    |   8 +-
 hw/vfio/pci-quirks.c             &= nbsp;      |   5 +-
 hw/virtio/virtio-pci.c            =      |   7 +-
 include/exec/cpu-all.h            =      |  10 ++-
 include/exec/memattrs.h            = ;     |   2 +
 include/exec/memop.h             &= nbsp;      | 112 ++++++++= 3;++++++++++++++= 3;+++
 include/exec/memory.h             =       |   9 ++-
 memory.c                =                |  37 += 3;+++----
 memory_ldst.inc.c             &nbs= p;         |  18 ++---
 target/alpha/translate.c           &nbs= p;    |   2 +-
 target/arm/translate-a64.c           &n= bsp;  |  48 ++++++------
 target/arm/translate-a64.h           &n= bsp;  |   2 +-
 target/arm/translate-sve.c           &n= bsp;  |   2 +-
 target/arm/translate.c            =      |  32 ++++----
 target/arm/translate.h            =      |   2 +-
 target/hppa/translate.c            = ;     |  14 ++--
 target/i386/translate.c            = ;     | 132 ++++++++++= 3;+++++----------------
 target/m68k/translate.c            = ;     |   2 +-
 target/microblaze/translate.c          = |   4 +-
 target/mips/op_helper.c            = ;     |   5 +-
 target/mips/translate.c            = ;     |   8 +-
 target/openrisc/translate.c           &= nbsp; |   4 +-
 target/ppc/translate.c            =      |  12 +--
 target/riscv/insn_trans/trans_rva.inc.c |   8 +-
 target/riscv/insn_trans/trans_rvi.inc.c |   4 +-
 target/s390x/translate.c           &nbs= p;    |   6 +-
 target/s390x/translate_vx.inc.c         | &n= bsp;10 +--
 target/sparc/cpu.h             &nb= sp;        |   2 +
 target/sparc/mmu_helper.c           &nb= sp;   |  40 ++++++----
 target/sparc/translate.c           &nbs= p;    |  14 ++--
 target/tilegx/translate.c           &nb= sp;   |  10 +--
 target/tricore/translate.c           &n= bsp;  |   8 +-
 tcg/README               &nbs= p;              |   2 +-
 tcg/aarch64/tcg-target.inc.c           =  |  26 +++----
 tcg/arm/tcg-target.inc.c           &nbs= p;    |  26 +++----
 tcg/i386/tcg-target.inc.c           &nb= sp;   |  24 +++---
 tcg/mips/tcg-target.inc.c           &nb= sp;   |  16 ++--
 tcg/optimize.c               =            |   2 +-
 tcg/ppc/tcg-target.inc.c           &nbs= p;    |  12 +--
 tcg/riscv/tcg-target.inc.c           &n= bsp;  |  20 ++---
 tcg/s390/tcg-target.inc.c           &nb= sp;   |  14 ++--
 tcg/sparc/tcg-target.inc.c           &n= bsp;  |   6 +-
 tcg/tcg-op.c               &n= bsp;            |  38 +++= 3;-----
 tcg/tcg-op.h               &n= bsp;            |  86 +++= 3;++++++-----------
 tcg/tcg.c                = ;               |   2 +-
 tcg/tcg.h                = ;               |  99 ++---= -------------------
 trace/mem-internal.h             &= nbsp;      |   4 +-
 trace/mem.h               &nb= sp;             |   4 +-
 51 files changed, 561 insertions(+), 488 deletions(-)
 create mode 100644 include/exec/memop.h

-- 
1.8.3.1



--_000_156403807375491133btcom_-- From mboxrd@z Thu Jan 1 00:00:00 1970 Received: by 2002:a5d:51d0:0:0:0:0:0 with SMTP id n16csp10265936wrv; Thu, 25 Jul 2019 00:01:40 -0700 (PDT) X-Google-Smtp-Source: APXvYqyiSYNp75oxkCtWMUC2fttU7H0eJ6OQEck2Htp1zHojFsapkIiDXEwgvrXh+rtdMCUPT8lx X-Received: by 2002:a05:620a:12c4:: with SMTP id e4mr57551856qkl.81.1564038100285; Thu, 25 Jul 2019 00:01:40 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1564038100; cv=none; d=google.com; s=arc-20160816; b=QYUKhE0M/AlnJCM4QBU5MDbWRLpmg88O2uaPUbRAD/NYmutoQmT0mtIv344Vf5UnKb MuliCFF9yk27IDU9xuIaO+ymwDiAt3UQ1czvwT8hD7A1jRdP0BLadrq8wNb5MPPK6Zpq mkEK7JYfkdmm2wAcrZ92uNmMJ2a1/hlflpeqh4gYPKybLk1UW0xXNI3dLWB9Ao5+jmPj RpFb+s7ERpImACwJSXOAQJq+DBeISFvwajB8+jvMIIaYvagy44fOtUJu5rXuPo8EP3Jl spJKanDLbAHQo7ZyaKqaDL87p+EkZ9qymJ3jeeW2e9MHuwIWfNqJVgDugoSnGKrTZsBS inuA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:mime-version :content-language:accept-language:in-reply-to:references:message-id :date:thread-index:thread-topic:to:from; bh=bDfhe0e4ijMIN7qJeHZqpYL/CubQhwNPvJHzyDlfwUA=; b=B3PjrJGKXteLqtkLzHJ5MCjAdhWF23yGV7pu4smY7mnvmAQbHLrthNvBqxMOnq/x9W VjkXoHkUKmEtJE4HF4wLetpsEf9yrDMEh9qyMbdK1JYBTUy93DelCqBKwChKY82aAoz9 RPEjtX6a4/UacSeEz9gjJ7ejpdkuCXMrtM86zVwCIxblqb9czpDYvWz/N0U4tBlSTiAZ i7KbRIBXyF+oQi22FNLCcWa1Ws/zqnVCOh2+OdYBH6YLpo4FINeuvb1zklxovv2Ow/tp SqdozS4YaSCwS8QqR8AhxXfNOarP4mrlO0i7B7GJI1NNRm36F3rUu6VnyqMN9iUt4/Ui Y+Zw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of qemu-arm-bounces+alex.bennee=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-arm-bounces+alex.bennee=linaro.org@nongnu.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=bt.com Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id n38si32085949qvc.79.2019.07.25.00.01.40 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Thu, 25 Jul 2019 00:01:40 -0700 (PDT) Received-SPF: pass (google.com: domain of qemu-arm-bounces+alex.bennee=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; spf=pass (google.com: domain of qemu-arm-bounces+alex.bennee=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-arm-bounces+alex.bennee=linaro.org@nongnu.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=bt.com Received: from localhost ([::1]:56120 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.86_2) (envelope-from ) id 1hqXkx-00027O-PQ for alex.bennee@linaro.org; Thu, 25 Jul 2019 03:01:39 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:50471) by lists.gnu.org with esmtp (Exim 4.86_2) (envelope-from ) id 1hqXkt-000274-Bk for qemu-arm@nongnu.org; Thu, 25 Jul 2019 03:01:37 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1hqXkr-0002BV-2c for qemu-arm@nongnu.org; Thu, 25 Jul 2019 03:01:35 -0400 Received: from smtpe1.intersmtp.com ([213.121.35.74]:58897) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1hqXkb-0001yR-41; Thu, 25 Jul 2019 03:01:17 -0400 Received: from tpw09926dag18f.domain1.systemhost.net (10.9.212.26) by BWP09926079.bt.com (10.36.82.110) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P256) id 15.1.1713.5; Thu, 25 Jul 2019 08:01:14 +0100 Received: from tpw09926dag18e.domain1.systemhost.net (10.9.212.18) by tpw09926dag18f.domain1.systemhost.net (10.9.212.26) with Microsoft SMTP Server (TLS) id 15.0.1395.4; Thu, 25 Jul 2019 08:01:14 +0100 Received: from tpw09926dag18e.domain1.systemhost.net ([fe80::a946:6348:ccf4:fa6c]) by tpw09926dag18e.domain1.systemhost.net ([fe80::a946:6348:ccf4:fa6c%12]) with mapi id 15.00.1395.000; Thu, 25 Jul 2019 08:01:14 +0100 From: To: Thread-Topic: [Qemu-devel] [PATCH v3 00/15] Invert Endian bit in SPARCv9 MMU TTE Thread-Index: AQHVQrP3RrTfmfnvbEOFfW6KFJKHhQ== Date: Thu, 25 Jul 2019 07:01:14 +0000 Message-ID: <1564038073754.91133@bt.com> References: , <1563810716254.18886@bt.com> In-Reply-To: <1563810716254.18886@bt.com> Accept-Language: en-AU, en-GB, en-US Content-Language: en-AU X-MS-Has-Attach: X-MS-TNEF-Correlator: x-ms-exchange-transport-fromentityheader: Hosted x-originating-ip: [10.187.101.42] Content-Type: multipart/alternative; boundary="_000_156403807375491133btcom_" MIME-Version: 1.0 X-detected-operating-system: by eggs.gnu.org: Windows 7 or 8 [fuzzy] X-Received-From: 213.121.35.74 Subject: [Qemu-arm] [Qemu-devel] [PATCH v3 00/15] Invert Endian bit in SPARCv9 MMU TTE X-BeenThere: qemu-arm@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org, walling@linux.ibm.com, mst@redhat.com, palmer@sifive.com, mark.cave-ayland@ilande.co.uk, Alistair.Francis@wdc.com, arikalo@wavecomp.com, david@redhat.com, pasic@linux.ibm.com, borntraeger@de.ibm.com, rth@twiddle.net, atar4qemu@gmail.com, ehabkost@redhat.com, sw@weilnetz.de, qemu-s390x@nongnu.org, qemu-arm@nongnu.org, david@gibson.dropbear.id.au, qemu-riscv@nongnu.org, cohuck@redhat.com, alex.williamson@redhat.com, qemu-ppc@nongnu.org, amarkovic@wavecomp.com, pbonzini@redhat.com, aurelien@aurel32.net Errors-To: qemu-arm-bounces+alex.bennee=linaro.org@nongnu.org Sender: "Qemu-arm" X-TUID: WQfZc1edSm+T --_000_156403807375491133btcom_ Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable This patchset implements the IE (Invert Endian) bit in SPARCv9 MMU TTE. It is an attempt of the instructions outlined by Richard Henderson to Mark Cave-Ayland. Tested with OpenBSD on sun4u. Solaris 10 is my actual goal, but unfortunate= ly a separate keyboard issue remains in the way. On 01/11/17 19:15, Mark Cave-Ayland wrote: >On 15/08/17 19:10, Richard Henderson wrote: > >> [CC Peter re MemTxAttrs below] >> >> On 08/15/2017 09:38 AM, Mark Cave-Ayland wrote: >>> Working through an incorrect endian issue on qemu-system-sparc64, it ha= s >>> become apparent that at least one OS makes use of the IE (Invert Endian= ) >>> bit in the SPARCv9 MMU TTE to map PCI memory space without the >>> programmer having to manually endian-swap accesses. >>> >>> In other words, to quote the UltraSPARC specification: "if this bit is >>> set, accesses to the associated page are processed with inverse >>> endianness from what is specified by the instruction (big-for-little an= d >>> little-for-big)". A good explanation by Mark why the IE bit is required. >>> >>> Looking through various bits of code, I'm trying to get a feel for the >>> best way to implement this in an efficient manner. From what I can see >>> this could be solved using an additional MMU index, however I'm not >>> overly familiar with the memory and softmmu subsystems. >> >> No, it can't be solved with an MMU index. >> >>> Can anyone point me in the right direction as to what would be the best >>> way to implement this feature within QEMU? >> >> It's definitely tricky. >> >> We definitely need some TLB_FLAGS_MASK bit set so that we're forced thro= ugh >> the >> memory slow path. There is no other way to bypass the endianness that w= e've >> already encoded from the target instruction. >> >> Given the tlb_set_page_with_attrs interface, I would think that we need = a new >> bit in MemTxAttrs, so that the target/sparc tlb_fill (and subroutines) c= an >> pass >> along the TTE bit for the given page. >> >> We have an existing problem in softmmu_template.h, >> >> /* ??? Note that the io helpers always read data in the target >> byte ordering. We should push the LE/BE request down into io. *= / >> res =3D glue(io_read, SUFFIX)(env, mmu_idx, index, addr, retaddr); >> res =3D TGT_BE(res); >> >> We do not want to add a third(!) byte swap along the i/o path. We need = to >> collapse the two that we have already before considering this one. >> >> This probably takes the form of: >> >> (1) Replacing the "int size" argument with "TCGMemOp memop" for >> a) io_{read,write}x in accel/tcg/cputlb.c, >> b) memory_region_dispatch_{read,write} in memory.c, >> c) adjust_endianness in memory.c. >> This carries size+sign+endianness down to the next level. >> >> (2) In memory.c, adjust_endianness, >> >> if (memory_region_wrong_endianness(mr)) { >> - switch (size) { >> + memop ^=3D MO_BSWAP; >> + } >> + if (memop & MO_BSWAP) { >> >> For extra credit, re-arrange memory_region_wrong_endianness >> to something more explicit -- "wrong" isn't helpful. > >Finally I've had a bit of spare time to experiment with this approach, >and from what I can see there are currently 2 issues: > > >1) Using TCGMemOp in memory.c means it is no longer accelerator agnostic > >For the moment I've defined a separate MemOp in memory.h and provided a >mapping function in io_{read,write}x to map from TCGMemOp to MemOp and >then pass that into memory_region_dispatch_{read,write}. > >Other than not referencing TCGMemOp in the memory API, another reason >for doing this was that I wasn't convinced that all the MO_ attributes >were valid outside of TCG. I do, of course, strongly defer to other >people's knowledge in this area though. > > >2) The above changes to adjust_endianness() fail when >memory_region_dispatch_{read,write} are called recursively > >Whilst booting qemu-system-sparc64 I see that >memory_region_dispatch_{read,write} get called recursively - once via >io_{read,write}x and then again via flatview_read_continue() in exec.c. > >The net effect of this is that we perform the bswap correctly at the >tail of the recursion, but then as we travel back up the stack we hit >memory_region_dispatch_{read,write} once again causing a second bswap >which means the value is returned with the incorrect endian again. > > >My understanding from your softmmu_template.h comment above is that the >memory API should do the endian swapping internally allowing the removal >of the final TGT_BE/TGT_LE applied to the result, or did I get this wrong? > >> (3) In tlb_set_page_with_attrs, notice attrs.byte_swap and set >> a new TLB_FORCE_SLOW bit within TLB_FLAGS_MASK. >> >> (4) In io_{read,write}x, if iotlbentry->attrs.byte_swap is set, >> then memop ^=3D MO_BSWAP. Thanks all for the v1 and v2 feedback. v2: - Moved size+sign+endianness attributes from TCGMemOp into MemOp. In v1 TCGMemOp was re-purposed entirely into MemOp. - Replaced MemOp MO_{8|16|32|64} with TCGMemOp MO_{UB|UW|UL|UQ} alias. This is to avoid warnings on comparing and coercing different enums. - Renamed get_memop to get_tcgmemop for clarity. - MEMOP is now SIZE_MEMOP, which is just ctzl(size). - Split patch 3/4 so one memory_region_dispatch_{read|write} interface is converted per patch. - Do not reuse TLB_RECHECK, use new TLB_FORCE_SLOW instead. - Split patch 4/4 so adding the MemTxAddrs parameters and converting tlb_set_page() to tlb_set_page_with_attrs() is separate from usage. - CC'd maintainers. v3: - Like v1, the entire TCGMemOp enum is now MemOp. - MemOp target dependant attributes are conditional upon NEED_CPU_H Tony Nguyen (15): tcg: TCGMemOp is now accelerator independent MemOp memory: Access MemoryRegion with MemOp target/mips: Access MemoryRegion with MemOp hw/s390x: Access MemoryRegion with MemOp hw/intc/armv7m_nic: Access MemoryRegion with MemOp hw/virtio: Access MemoryRegion with MemOp hw/vfio: Access MemoryRegion with MemOp exec: Access MemoryRegion with MemOp cputlb: Access MemoryRegion with MemOp memory: Access MemoryRegion with MemOp semantics memory: Single byte swap along the I/O path cpu: TLB_FLAGS_MASK bit to force memory slow path cputlb: Byte swap memory transaction attribute target/sparc: Add TLB entry with attributes target/sparc: sun4u Invert Endian TTE bit accel/tcg/cputlb.c | 71 +++++++++-------- exec.c | 6 +- hw/intc/armv7m_nvic.c | 12 ++- hw/s390x/s390-pci-inst.c | 8 +- hw/vfio/pci-quirks.c | 5 +- hw/virtio/virtio-pci.c | 7 +- include/exec/cpu-all.h | 10 ++- include/exec/memattrs.h | 2 + include/exec/memop.h | 112 +++++++++++++++++++++++++++ include/exec/memory.h | 9 ++- memory.c | 37 +++++---- memory_ldst.inc.c | 18 ++--- target/alpha/translate.c | 2 +- target/arm/translate-a64.c | 48 ++++++------ target/arm/translate-a64.h | 2 +- target/arm/translate-sve.c | 2 +- target/arm/translate.c | 32 ++++---- target/arm/translate.h | 2 +- target/hppa/translate.c | 14 ++-- target/i386/translate.c | 132 ++++++++++++++++------------= ---- target/m68k/translate.c | 2 +- target/microblaze/translate.c | 4 +- target/mips/op_helper.c | 5 +- target/mips/translate.c | 8 +- target/openrisc/translate.c | 4 +- target/ppc/translate.c | 12 +-- target/riscv/insn_trans/trans_rva.inc.c | 8 +- target/riscv/insn_trans/trans_rvi.inc.c | 4 +- target/s390x/translate.c | 6 +- target/s390x/translate_vx.inc.c | 10 +-- target/sparc/cpu.h | 2 + target/sparc/mmu_helper.c | 40 ++++++---- target/sparc/translate.c | 14 ++-- target/tilegx/translate.c | 10 +-- target/tricore/translate.c | 8 +- tcg/README | 2 +- tcg/aarch64/tcg-target.inc.c | 26 +++---- tcg/arm/tcg-target.inc.c | 26 +++---- tcg/i386/tcg-target.inc.c | 24 +++--- tcg/mips/tcg-target.inc.c | 16 ++-- tcg/optimize.c | 2 +- tcg/ppc/tcg-target.inc.c | 12 +-- tcg/riscv/tcg-target.inc.c | 20 ++--- tcg/s390/tcg-target.inc.c | 14 ++-- tcg/sparc/tcg-target.inc.c | 6 +- tcg/tcg-op.c | 38 ++++----- tcg/tcg-op.h | 86 ++++++++++----------- tcg/tcg.c | 2 +- tcg/tcg.h | 99 ++---------------------- trace/mem-internal.h | 4 +- trace/mem.h | 4 +- 51 files changed, 561 insertions(+), 488 deletions(-) create mode 100644 include/exec/memop.h -- 1.8.3.1 --_000_156403807375491133btcom_ Content-Type: text/html; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable

This patchset implements the IE (Inve= rt Endian) bit in SPARCv9 MMU TTE.

It is an attempt of the instructions outlined by Richard Henderson to = Mark
Cave-Ayland.

Tested with OpenBSD on sun4u. Solaris 10 is my actual goal, but unfort= unately a
separate keyboard issue remains in the way.

On 01/11/17 19:15, Mark Cave-Ayland wrote:

>On 15/08/17 19:10, Richard Henderson wrote:
>
>> [CC Peter re MemTxAttrs below]
>>
>> On 08/15/2017 09:38 AM, Mark Cave-Ayland wrote:
>>> Working through an incorrect endian issue on qemu-system-= sparc64, it has
>>> become apparent that at least one OS makes use of the IE = (Invert Endian)
>>> bit in the SPARCv9 MMU TTE to map PCI memory space withou= t the
>>> programmer having to manually endian-swap accesses.
>>>
>>> In other words, to quote the UltraSPARC specification: &q= uot;if this bit is
>>> set, accesses to the associated page are processed with i= nverse
>>> endianness from what is specified by the instruction (big= -for-little and
>>> little-for-big)".

A good explanation by Mark why the IE bit is required.

>>>
>>> Looking through various bits of code, I'm trying to get a= feel for the
>>> best way to implement this in an efficient manner. From w= hat I can see
>>> this could be solved using an additional MMU index, howev= er I'm not
>>> overly familiar with the memory and softmmu subsystems.
>>
>> No, it can't be solved with an MMU index.
>>
>>> Can anyone point me in the right direction as to what wou= ld be the best
>>> way to implement this feature within QEMU?
>>
>> It's definitely tricky.
>>
>> We definitely need some TLB_FLAGS_MASK bit set so that we're = forced through
>> the
>> memory slow path.  There is no other way to bypass the e= ndianness that we've
>> already encoded from the target instruction.
>>
>> Given the tlb_set_page_with_attrs interface, I would think th= at we need a new
>> bit in MemTxAttrs, so that the target/sparc tlb_fill (and sub= routines) can
>> pass
>> along the TTE bit for the given page.
>>
>> We have an existing problem in softmmu_template.h,
>>
>>     /* ??? Note that the io helpers always read dat= a in the target
>>        byte ordering.  We should pus= h the LE/BE request down into io.  */
>>     res =3D glue(io_read, SUFFIX)(env, mmu_idx, ind= ex, addr, retaddr);
>>     res =3D TGT_BE(res);
>>
>> We do not want to add a third(!) byte swap along the i/o path= .  We need to
>> collapse the two that we have already before considering this= one.
>>
>> This probably takes the form of:
>>
>> (1) Replacing the "int size" argument with "TC= GMemOp memop" for
>>       a) io_{read,write}x in accel/tcg/cputlb.= c,
>>       b) memory_region_dispatch_{read,write} i= n memory.c,
>>       c) adjust_endianness in memory.c.
>>     This carries size+sign+endianness down = to the next level.
>>
>> (2) In memory.c, adjust_endianness,
>>
>>      if (memory_region_wrong_endianness(mr)) {=
>> -        switch (size) {
>> +        memop ^=3D MO_BSWAP;
>> +    }
>> +    if (memop & MO_BSWAP) {
>>
>>     For extra credit, re-arrange memory_region_wron= g_endianness
>>     to something more explicit -- "wrong"= isn't helpful.
>
>Finally I've had a bit of spare time to experiment with this appro= ach,
>and from what I can see there are currently 2 issues:
>
>
>1) Using TCGMemOp in memory.c means it is no longer accelerator ag= nostic
>
>For the moment I've defined a separate MemOp in memory.h and provi= ded a
>mapping function in io_{read,write}x to map from TCGMemOp to MemOp= and
>then pass that into memory_region_dispatch_{read,write}.
>
>Other than not referencing TCGMemOp in the memory API, another rea= son
>for doing this was that I wasn't convinced that all the MO_ attrib= utes
>were valid outside of TCG. I do, of course, strongly defer to othe= r
>people's knowledge in this area though.
>
>
>2) The above changes to adjust_endianness() fail when
>memory_region_dispatch_{read,write} are called recursively
>
>Whilst booting qemu-system-sparc64 I see that
>memory_region_dispatch_{read,write} get called recursively - once = via
>io_{read,write}x and then again via flatview_read_continue() in ex= ec.c.
>
>The net effect of this is that we perform the bswap correctly at t= he
>tail of the recursion, but then as we travel back up the stack we = hit
>memory_region_dispatch_{read,write} once again causing a second bs= wap
>which means the value is returned with the incorrect endian again.=
>
>
>My understanding from your softmmu_template.h comment above is tha= t the
>memory API should do the endian swapping internally allowing the r= emoval
>of the final TGT_BE/TGT_LE applied to the result, or did I get thi= s wrong?
>
>> (3) In tlb_set_page_with_attrs, notice attrs.byte_swap and se= t
>>     a new TLB_FORCE_SLOW bit within TLB_FLAGS_MASK.=
>>
>> (4) In io_{read,write}x, if iotlbentry->attrs.byte_swap is= set,
>>     then memop ^=3D MO_BSWAP.

Thanks all for the v1 and v2 feedback.

v2:
- Moved size+sign+endianness attributes from TCGMemOp into Mem= Op.
  In v1 TCGMemOp was re-purposed entirely into MemOp.
- Replaced MemOp MO_{8|16|32|64} with TCGMemOp MO_{UB|UW|UL|UQ} alias.=
  This is to avoid warnings on comparing and coercing different e= nums.
- Renamed get_memop to get_tcgmemop for clarity.
- MEMOP is now SIZE_MEMOP, which is just ctzl(size).
- Split patch 3/4 so one memory_region_dispatch_{read|write} interface=
  is converted per patch.
- Do not reuse TLB_RECHECK, use new TLB_FORCE_SLOW instead.
- Split patch 4/4 so adding the MemTxAddrs parameters and converting
  tlb_set_page() to tlb_set_page_with_attrs() is separate from us= age.
- CC'd maintainers.

v3:
- Like v1, the entire TCGMemOp enum is now MemOp.
- MemOp target dependant attributes are conditional upon NEE= D_CPU_H 

Tony Nguyen (15):
  tcg: TCGMemOp is now accelerator independent MemOp
  memory: Access MemoryRegion with MemOp
  target/mips: Access MemoryRegion with MemOp
  hw/s390x: Access MemoryRegion with MemOp
  hw/intc/armv7m_nic: Access MemoryRegion with MemOp
  hw/virtio: Access MemoryRegion with MemOp
  hw/vfio: Access MemoryRegion with MemOp
  exec: Access MemoryRegion with MemOp
  cputlb: Access MemoryRegion with MemOp
  memory: Access MemoryRegion with MemOp semantics
  memory: Single byte swap along the I/O path
  cpu: TLB_FLAGS_MASK bit to force memory slow path
  cputlb: Byte swap memory transaction attribute
  target/sparc: Add TLB entry with attributes
  target/sparc: sun4u Invert Endian TTE bit

 accel/tcg/cputlb.c             &nb= sp;        |  71 ++++++= 3;++--------
 exec.c                 &= nbsp;                |   6 = 3;-
 hw/intc/armv7m_nvic.c             =       |  12 ++-
 hw/s390x/s390-pci-inst.c           &nbs= p;    |   8 +-
 hw/vfio/pci-quirks.c             &= nbsp;      |   5 +-
 hw/virtio/virtio-pci.c            =      |   7 +-
 include/exec/cpu-all.h            =      |  10 ++-
 include/exec/memattrs.h            = ;     |   2 +
 include/exec/memop.h             &= nbsp;      | 112 ++++++++= 3;++++++++++++++= 3;+++
 include/exec/memory.h             =       |   9 ++-
 memory.c                =                |  37 += 3;+++----
 memory_ldst.inc.c             &nbs= p;         |  18 ++---
 target/alpha/translate.c           &nbs= p;    |   2 +-
 target/arm/translate-a64.c           &n= bsp;  |  48 ++++++------
 target/arm/translate-a64.h           &n= bsp;  |   2 +-
 target/arm/translate-sve.c           &n= bsp;  |   2 +-
 target/arm/translate.c            =      |  32 ++++----
 target/arm/translate.h            =      |   2 +-
 target/hppa/translate.c            = ;     |  14 ++--
 target/i386/translate.c            = ;     | 132 ++++++++++= 3;+++++----------------
 target/m68k/translate.c            = ;     |   2 +-
 target/microblaze/translate.c          = |   4 +-
 target/mips/op_helper.c            = ;     |   5 +-
 target/mips/translate.c            = ;     |   8 +-
 target/openrisc/translate.c           &= nbsp; |   4 +-
 target/ppc/translate.c            =      |  12 +--
 target/riscv/insn_trans/trans_rva.inc.c |   8 +-
 target/riscv/insn_trans/trans_rvi.inc.c |   4 +-
 target/s390x/translate.c           &nbs= p;    |   6 +-
 target/s390x/translate_vx.inc.c         | &n= bsp;10 +--
 target/sparc/cpu.h             &nb= sp;        |   2 +
 target/sparc/mmu_helper.c           &nb= sp;   |  40 ++++++----
 target/sparc/translate.c           &nbs= p;    |  14 ++--
 target/tilegx/translate.c           &nb= sp;   |  10 +--
 target/tricore/translate.c           &n= bsp;  |   8 +-
 tcg/README               &nbs= p;              |   2 +-
 tcg/aarch64/tcg-target.inc.c           =  |  26 +++----
 tcg/arm/tcg-target.inc.c           &nbs= p;    |  26 +++----
 tcg/i386/tcg-target.inc.c           &nb= sp;   |  24 +++---
 tcg/mips/tcg-target.inc.c           &nb= sp;   |  16 ++--
 tcg/optimize.c               =            |   2 +-
 tcg/ppc/tcg-target.inc.c           &nbs= p;    |  12 +--
 tcg/riscv/tcg-target.inc.c           &n= bsp;  |  20 ++---
 tcg/s390/tcg-target.inc.c           &nb= sp;   |  14 ++--
 tcg/sparc/tcg-target.inc.c           &n= bsp;  |   6 +-
 tcg/tcg-op.c               &n= bsp;            |  38 +++= 3;-----
 tcg/tcg-op.h               &n= bsp;            |  86 +++= 3;++++++-----------
 tcg/tcg.c                = ;               |   2 +-
 tcg/tcg.h                = ;               |  99 ++---= -------------------
 trace/mem-internal.h             &= nbsp;      |   4 +-
 trace/mem.h               &nb= sp;             |   4 +-
 51 files changed, 561 insertions(+), 488 deletions(-)
 create mode 100644 include/exec/memop.h

-- 
1.8.3.1



--_000_156403807375491133btcom_-- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-0.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AFB44C76191 for ; Thu, 25 Jul 2019 07:01:52 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 87EEA22C97 for ; Thu, 25 Jul 2019 07:01:52 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 87EEA22C97 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=bt.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:56128 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.86_2) (envelope-from ) id 1hqXl9-0002cq-Pm for qemu-devel@archiver.kernel.org; Thu, 25 Jul 2019 03:01:51 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:50487) by lists.gnu.org with esmtp (Exim 4.86_2) (envelope-from ) id 1hqXky-0002E7-3d for qemu-devel@nongnu.org; Thu, 25 Jul 2019 03:01:42 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1hqXkv-0002En-Lw for qemu-devel@nongnu.org; Thu, 25 Jul 2019 03:01:40 -0400 Received: from smtpe1.intersmtp.com ([213.121.35.74]:58897) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1hqXkb-0001yR-41; Thu, 25 Jul 2019 03:01:17 -0400 Received: from tpw09926dag18f.domain1.systemhost.net (10.9.212.26) by BWP09926079.bt.com (10.36.82.110) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P256) id 15.1.1713.5; Thu, 25 Jul 2019 08:01:14 +0100 Received: from tpw09926dag18e.domain1.systemhost.net (10.9.212.18) by tpw09926dag18f.domain1.systemhost.net (10.9.212.26) with Microsoft SMTP Server (TLS) id 15.0.1395.4; Thu, 25 Jul 2019 08:01:14 +0100 Received: from tpw09926dag18e.domain1.systemhost.net ([fe80::a946:6348:ccf4:fa6c]) by tpw09926dag18e.domain1.systemhost.net ([fe80::a946:6348:ccf4:fa6c%12]) with mapi id 15.00.1395.000; Thu, 25 Jul 2019 08:01:14 +0100 From: To: Thread-Topic: [Qemu-devel] [PATCH v3 00/15] Invert Endian bit in SPARCv9 MMU TTE Thread-Index: AQHVQrP3RrTfmfnvbEOFfW6KFJKHhQ== Date: Thu, 25 Jul 2019 07:01:14 +0000 Message-ID: <1564038073754.91133@bt.com> References: , <1563810716254.18886@bt.com> In-Reply-To: <1563810716254.18886@bt.com> Accept-Language: en-AU, en-GB, en-US Content-Language: en-AU X-MS-Has-Attach: X-MS-TNEF-Correlator: x-ms-exchange-transport-fromentityheader: Hosted x-originating-ip: [10.187.101.42] MIME-Version: 1.0 X-detected-operating-system: by eggs.gnu.org: Windows 7 or 8 [fuzzy] X-Received-From: 213.121.35.74 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.23 Subject: [Qemu-devel] [PATCH v3 00/15] Invert Endian bit in SPARCv9 MMU TTE X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: peter.maydell@linaro.org, walling@linux.ibm.com, mst@redhat.com, palmer@sifive.com, mark.cave-ayland@ilande.co.uk, Alistair.Francis@wdc.com, arikalo@wavecomp.com, david@redhat.com, pasic@linux.ibm.com, borntraeger@de.ibm.com, rth@twiddle.net, atar4qemu@gmail.com, ehabkost@redhat.com, sw@weilnetz.de, qemu-s390x@nongnu.org, qemu-arm@nongnu.org, david@gibson.dropbear.id.au, qemu-riscv@nongnu.org, cohuck@redhat.com, alex.williamson@redhat.com, qemu-ppc@nongnu.org, amarkovic@wavecomp.com, pbonzini@redhat.com, aurelien@aurel32.net Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" This patchset implements the IE (Invert Endian) bit in SPARCv9 MMU TTE. It is an attempt of the instructions outlined by Richard Henderson to Mark Cave-Ayland. Tested with OpenBSD on sun4u. Solaris 10 is my actual goal, but unfortunate= ly a separate keyboard issue remains in the way. On 01/11/17 19:15, Mark Cave-Ayland wrote: >On 15/08/17 19:10, Richard Henderson wrote: > >> [CC Peter re MemTxAttrs below] >> >> On 08/15/2017 09:38 AM, Mark Cave-Ayland wrote: >>> Working through an incorrect endian issue on qemu-system-sparc64, it ha= s >>> become apparent that at least one OS makes use of the IE (Invert Endian= ) >>> bit in the SPARCv9 MMU TTE to map PCI memory space without the >>> programmer having to manually endian-swap accesses. >>> >>> In other words, to quote the UltraSPARC specification: "if this bit is >>> set, accesses to the associated page are processed with inverse >>> endianness from what is specified by the instruction (big-for-little an= d >>> little-for-big)". A good explanation by Mark why the IE bit is required. >>> >>> Looking through various bits of code, I'm trying to get a feel for the >>> best way to implement this in an efficient manner. From what I can see >>> this could be solved using an additional MMU index, however I'm not >>> overly familiar with the memory and softmmu subsystems. >> >> No, it can't be solved with an MMU index. >> >>> Can anyone point me in the right direction as to what would be the best >>> way to implement this feature within QEMU? >> >> It's definitely tricky. >> >> We definitely need some TLB_FLAGS_MASK bit set so that we're forced thro= ugh >> the >> memory slow path. There is no other way to bypass the endianness that w= e've >> already encoded from the target instruction. >> >> Given the tlb_set_page_with_attrs interface, I would think that we need = a new >> bit in MemTxAttrs, so that the target/sparc tlb_fill (and subroutines) c= an >> pass >> along the TTE bit for the given page. >> >> We have an existing problem in softmmu_template.h, >> >> /* ??? Note that the io helpers always read data in the target >> byte ordering. We should push the LE/BE request down into io. *= / >> res =3D glue(io_read, SUFFIX)(env, mmu_idx, index, addr, retaddr); >> res =3D TGT_BE(res); >> >> We do not want to add a third(!) byte swap along the i/o path. We need = to >> collapse the two that we have already before considering this one. >> >> This probably takes the form of: >> >> (1) Replacing the "int size" argument with "TCGMemOp memop" for >> a) io_{read,write}x in accel/tcg/cputlb.c, >> b) memory_region_dispatch_{read,write} in memory.c, >> c) adjust_endianness in memory.c. >> This carries size+sign+endianness down to the next level. >> >> (2) In memory.c, adjust_endianness, >> >> if (memory_region_wrong_endianness(mr)) { >> - switch (size) { >> + memop ^=3D MO_BSWAP; >> + } >> + if (memop & MO_BSWAP) { >> >> For extra credit, re-arrange memory_region_wrong_endianness >> to something more explicit -- "wrong" isn't helpful. > >Finally I've had a bit of spare time to experiment with this approach, >and from what I can see there are currently 2 issues: > > >1) Using TCGMemOp in memory.c means it is no longer accelerator agnostic > >For the moment I've defined a separate MemOp in memory.h and provided a >mapping function in io_{read,write}x to map from TCGMemOp to MemOp and >then pass that into memory_region_dispatch_{read,write}. > >Other than not referencing TCGMemOp in the memory API, another reason >for doing this was that I wasn't convinced that all the MO_ attributes >were valid outside of TCG. I do, of course, strongly defer to other >people's knowledge in this area though. > > >2) The above changes to adjust_endianness() fail when >memory_region_dispatch_{read,write} are called recursively > >Whilst booting qemu-system-sparc64 I see that >memory_region_dispatch_{read,write} get called recursively - once via >io_{read,write}x and then again via flatview_read_continue() in exec.c. > >The net effect of this is that we perform the bswap correctly at the >tail of the recursion, but then as we travel back up the stack we hit >memory_region_dispatch_{read,write} once again causing a second bswap >which means the value is returned with the incorrect endian again. > > >My understanding from your softmmu_template.h comment above is that the >memory API should do the endian swapping internally allowing the removal >of the final TGT_BE/TGT_LE applied to the result, or did I get this wrong? > >> (3) In tlb_set_page_with_attrs, notice attrs.byte_swap and set >> a new TLB_FORCE_SLOW bit within TLB_FLAGS_MASK. >> >> (4) In io_{read,write}x, if iotlbentry->attrs.byte_swap is set, >> then memop ^=3D MO_BSWAP. Thanks all for the v1 and v2 feedback. v2: - Moved size+sign+endianness attributes from TCGMemOp into MemOp. In v1 TCGMemOp was re-purposed entirely into MemOp. - Replaced MemOp MO_{8|16|32|64} with TCGMemOp MO_{UB|UW|UL|UQ} alias. This is to avoid warnings on comparing and coercing different enums. - Renamed get_memop to get_tcgmemop for clarity. - MEMOP is now SIZE_MEMOP, which is just ctzl(size). - Split patch 3/4 so one memory_region_dispatch_{read|write} interface is converted per patch. - Do not reuse TLB_RECHECK, use new TLB_FORCE_SLOW instead. - Split patch 4/4 so adding the MemTxAddrs parameters and converting tlb_set_page() to tlb_set_page_with_attrs() is separate from usage. - CC'd maintainers. v3: - Like v1, the entire TCGMemOp enum is now MemOp. - MemOp target dependant attributes are conditional upon NEED_CPU_H Tony Nguyen (15): tcg: TCGMemOp is now accelerator independent MemOp memory: Access MemoryRegion with MemOp target/mips: Access MemoryRegion with MemOp hw/s390x: Access MemoryRegion with MemOp hw/intc/armv7m_nic: Access MemoryRegion with MemOp hw/virtio: Access MemoryRegion with MemOp hw/vfio: Access MemoryRegion with MemOp exec: Access MemoryRegion with MemOp cputlb: Access MemoryRegion with MemOp memory: Access MemoryRegion with MemOp semantics memory: Single byte swap along the I/O path cpu: TLB_FLAGS_MASK bit to force memory slow path cputlb: Byte swap memory transaction attribute target/sparc: Add TLB entry with attributes target/sparc: sun4u Invert Endian TTE bit accel/tcg/cputlb.c | 71 +++++++++-------- exec.c | 6 +- hw/intc/armv7m_nvic.c | 12 ++- hw/s390x/s390-pci-inst.c | 8 +- hw/vfio/pci-quirks.c | 5 +- hw/virtio/virtio-pci.c | 7 +- include/exec/cpu-all.h | 10 ++- include/exec/memattrs.h | 2 + include/exec/memop.h | 112 +++++++++++++++++++++++++++ include/exec/memory.h | 9 ++- memory.c | 37 +++++---- memory_ldst.inc.c | 18 ++--- target/alpha/translate.c | 2 +- target/arm/translate-a64.c | 48 ++++++------ target/arm/translate-a64.h | 2 +- target/arm/translate-sve.c | 2 +- target/arm/translate.c | 32 ++++---- target/arm/translate.h | 2 +- target/hppa/translate.c | 14 ++-- target/i386/translate.c | 132 ++++++++++++++++------------= ---- target/m68k/translate.c | 2 +- target/microblaze/translate.c | 4 +- target/mips/op_helper.c | 5 +- target/mips/translate.c | 8 +- target/openrisc/translate.c | 4 +- target/ppc/translate.c | 12 +-- target/riscv/insn_trans/trans_rva.inc.c | 8 +- target/riscv/insn_trans/trans_rvi.inc.c | 4 +- target/s390x/translate.c | 6 +- target/s390x/translate_vx.inc.c | 10 +-- target/sparc/cpu.h | 2 + target/sparc/mmu_helper.c | 40 ++++++---- target/sparc/translate.c | 14 ++-- target/tilegx/translate.c | 10 +-- target/tricore/translate.c | 8 +- tcg/README | 2 +- tcg/aarch64/tcg-target.inc.c | 26 +++---- tcg/arm/tcg-target.inc.c | 26 +++---- tcg/i386/tcg-target.inc.c | 24 +++--- tcg/mips/tcg-target.inc.c | 16 ++-- tcg/optimize.c | 2 +- tcg/ppc/tcg-target.inc.c | 12 +-- tcg/riscv/tcg-target.inc.c | 20 ++--- tcg/s390/tcg-target.inc.c | 14 ++-- tcg/sparc/tcg-target.inc.c | 6 +- tcg/tcg-op.c | 38 ++++----- tcg/tcg-op.h | 86 ++++++++++----------- tcg/tcg.c | 2 +- tcg/tcg.h | 99 ++---------------------- trace/mem-internal.h | 4 +- trace/mem.h | 4 +- 51 files changed, 561 insertions(+), 488 deletions(-) create mode 100644 include/exec/memop.h -- 1.8.3.1