qemu-devel.nongnu.org archive mirror
* TCG change broke MorphOS boot on sam460ex
@ 2024-02-27 16:47 BALATON Zoltan
  2024-02-27 18:15 ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 10+ messages in thread
From: BALATON Zoltan @ 2024-02-27 16:47 UTC
  To: qemu-devel; +Cc: qemu-ppc, Richard Henderson, Philippe Mathieu-Daudé

Hello,

Commit 18a536f1f8 (accel/tcg: Always require can_do_io) broke booting 
MorphOS on sam460ex (this was before 8.2.0 and I thought I had verified it 
before that release but apparently missed it back then). It can be 
reproduced with https://www.morphos-team.net/morphos-3.18.iso and the 
following command:

qemu-system-ppc -M sam460ex -serial stdio -d unimp,guest_errors \
   -drive if=none,id=cd,format=raw,file=morphos-3.18.iso \
   -device ide-cd,drive=cd,bus=ide.1

before:
Invalid read at addr 0xC08001216, size 1, region '(null)', reason: rejected
Invalid read at addr 0x216, size 1, region '(null)', reason: rejected
Invalid read at addr 0x4FDF6BFB0, size 4, region '(null)', reason: rejected
Invalid write at addr 0xE10000014, size 4, region '(null)', reason: rejected
Invalid write at addr 0xE10000214, size 4, region '(null)', reason: rejected
Invalid write at addr 0xE30000014, size 4, region '(null)', reason: rejected
Invalid write at addr 0xE30000214, size 4, region '(null)', reason: rejected
8.440| sam460_i2c_write: Error while writing, sts 34
8.463|
8.463|
8.463| ABox 1.30 (2.7.2018)...

after:
Invalid read at addr 0xC08001216, size 1, region '(null)', reason: rejected
Invalid read at addr 0x216, size 1, region '(null)', reason: rejected
Invalid read at addr 0x4F0C01374, size 4, region '(null)', reason: rejected
invalid/unsupported opcode: 00 - 00 - 00 - 00 (00000000) 00c01374
Invalid read at addr 0x4F0000700, size 4, region '(null)', reason: rejected
invalid/unsupported opcode: 00 - 00 - 00 - 00 (00000000) 00000700

Not sure what it's trying to do here, maybe decompressing some code and 
then trying to execute it? Any idea what could be the problem or what to 
check further?

Regards,
BALATON Zoltan



* Re: TCG change broke MorphOS boot on sam460ex
  2024-02-27 16:47 TCG change broke MorphOS boot on sam460ex BALATON Zoltan
@ 2024-02-27 18:15 ` Philippe Mathieu-Daudé
  2024-02-27 19:48   ` BALATON Zoltan
  2024-03-21 18:41   ` BALATON Zoltan
  0 siblings, 2 replies; 10+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-02-27 18:15 UTC
  To: BALATON Zoltan, qemu-devel; +Cc: qemu-ppc, Richard Henderson

Hi Zoltan,

On 27/2/24 17:47, BALATON Zoltan wrote:
> Hello,
> 
> Commit 18a536f1f8 (accel/tcg: Always require can_do_io) broke booting 
> MorphOS on sam460ex (this was before 8.2.0 and I thought I had verified 
> it before that release but apparently missed it back then). It can be 
> reproduced with https://www.morphos-team.net/morphos-3.18.iso and the 
> following command:
> 
> qemu-system-ppc -M sam460ex -serial stdio -d unimp,guest_errors \
>    -drive if=none,id=cd,format=raw,file=morphos-3.18.iso \
>    -device ide-cd,drive=cd,bus=ide.1
> 
> before:
> Invalid read at addr 0xC08001216, size 1, region '(null)', reason: rejected
> Invalid read at addr 0x216, size 1, region '(null)', reason: rejected
> Invalid read at addr 0x4FDF6BFB0, size 4, region '(null)', reason: rejected
> Invalid write at addr 0xE10000014, size 4, region '(null)', reason: rejected
> Invalid write at addr 0xE10000214, size 4, region '(null)', reason: rejected
> Invalid write at addr 0xE30000014, size 4, region '(null)', reason: rejected
> Invalid write at addr 0xE30000214, size 4, region '(null)', reason: rejected
> 8.440| sam460_i2c_write: Error while writing, sts 34
> 8.463|
> 8.463|
> 8.463| ABox 1.30 (2.7.2018)...
> 
> after:
> Invalid read at addr 0xC08001216, size 1, region '(null)', reason: rejected
> Invalid read at addr 0x216, size 1, region '(null)', reason: rejected
> Invalid read at addr 0x4F0C01374, size 4, region '(null)', reason: rejected
> invalid/unsupported opcode: 00 - 00 - 00 - 00 (00000000) 00c01374
> Invalid read at addr 0x4F0000700, size 4, region '(null)', reason: rejected
> invalid/unsupported opcode: 00 - 00 - 00 - 00 (00000000) 00000700
> 
> Not sure what it's trying to do here, maybe decompressing some code and 
> then trying to execute it? Any idea what could be the problem or what to 
> check further?

Are you testing with commit cf9b5790db ("accel/tcg: Remove CF_LAST_IO")
included?



* Re: TCG change broke MorphOS boot on sam460ex
  2024-02-27 18:15 ` Philippe Mathieu-Daudé
@ 2024-02-27 19:48   ` BALATON Zoltan
  2024-03-21 18:41   ` BALATON Zoltan
  1 sibling, 0 replies; 10+ messages in thread
From: BALATON Zoltan @ 2024-02-27 19:48 UTC
  To: Philippe Mathieu-Daudé; +Cc: qemu-devel, qemu-ppc, Richard Henderson


On Tue, 27 Feb 2024, Philippe Mathieu-Daudé wrote:
> Hi Zoltan,
>
> On 27/2/24 17:47, BALATON Zoltan wrote:
>> Hello,
>> 
>> Commit 18a536f1f8 (accel/tcg: Always require can_do_io) broke booting 
>> MorphOS on sam460ex (this was before 8.2.0 and I thought I had verified it 
>> before that release but apparently missed it back then). It can be 
>> reproduced with https://www.morphos-team.net/morphos-3.18.iso and the following 
>> command:
>> 
>> qemu-system-ppc -M sam460ex -serial stdio -d unimp,guest_errors \
>>    -drive if=none,id=cd,format=raw,file=morphos-3.18.iso \
>>    -device ide-cd,drive=cd,bus=ide.1
>> 
>> before:
>> Invalid read at addr 0xC08001216, size 1, region '(null)', reason: rejected
>> Invalid read at addr 0x216, size 1, region '(null)', reason: rejected
>> Invalid read at addr 0x4FDF6BFB0, size 4, region '(null)', reason: rejected
>> Invalid write at addr 0xE10000014, size 4, region '(null)', reason: rejected
>> Invalid write at addr 0xE10000214, size 4, region '(null)', reason: rejected
>> Invalid write at addr 0xE30000014, size 4, region '(null)', reason: rejected
>> Invalid write at addr 0xE30000214, size 4, region '(null)', reason: rejected
>> 8.440| sam460_i2c_write: Error while writing, sts 34
>> 8.463|
>> 8.463|
>> 8.463| ABox 1.30 (2.7.2018)...
>> 
>> after:
>> Invalid read at addr 0xC08001216, size 1, region '(null)', reason: rejected
>> Invalid read at addr 0x216, size 1, region '(null)', reason: rejected
>> Invalid read at addr 0x4F0C01374, size 4, region '(null)', reason: rejected
>> invalid/unsupported opcode: 00 - 00 - 00 - 00 (00000000) 00c01374
>> Invalid read at addr 0x4F0000700, size 4, region '(null)', reason: rejected
>> invalid/unsupported opcode: 00 - 00 - 00 - 00 (00000000) 00000700
>> 
>> Not sure what it's trying to do here, maybe decompressing some code and 
>> then trying to execute it? Any idea what could be the problem or what to 
>> check further?
>
> Are you testing with commit cf9b5790db ("accel/tcg: Remove CF_LAST_IO")
> included?

The issue starts with commit 18a536f1f8 and is still present in current 
master. The commit before it (200c1f904f accel/tcg: Always set 
CF_LAST_IO with CF_NOIRQ) still works. Commit cf9b5790db does not work 
either.
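
Finding a good/bad boundary like 200c1f904f / 18a536f1f8 is what `git bisect run` automates. Below is a minimal, untested sketch of a helper for that. The script, the 120 s timeout and the use of "ABox" and "invalid/unsupported opcode" as good/bad markers (both strings come from the logs in this thread) are assumptions, not anything QEMU provides. The exit codes follow `git bisect run`: 0 good, 1 bad, 125 skip.

```python
import subprocess
import sys

# Markers taken from the serial/log output quoted in this thread.
GOOD_MARKER = "ABox"                       # MorphOS got as far as ABox
BAD_MARKER = "invalid/unsupported opcode"  # guest fetched zeroes and died


def classify(log: str) -> int:
    """Map QEMU output to a `git bisect run` exit code.

    0 = good, 1 = bad, 125 = cannot tell (skip this commit).
    """
    if BAD_MARKER in log:
        return 1
    if GOOD_MARKER in log:
        return 0
    return 125


def run_once(qemu: str, iso: str, timeout: float = 120.0) -> str:
    """Boot the ISO with the reproducer command and return its output."""
    cmd = [
        qemu, "-M", "sam460ex", "-serial", "stdio",
        "-d", "unimp,guest_errors",
        "-drive", f"if=none,id=cd,format=raw,file={iso}",
        "-device", "ide-cd,drive=cd,bus=ide.1",
    ]
    try:
        proc = subprocess.run(cmd, stdin=subprocess.DEVNULL,
                              stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT,  # -d logs go to stderr
                              timeout=timeout)
        out = proc.stdout
    except subprocess.TimeoutExpired as exc:
        # QEMU is never told to exit, so a timeout is the normal end of a
        # run; keep whatever output was captured before it was killed.
        out = exc.stdout or b""
    return out.decode(errors="replace")


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(classify(run_once(sys.argv[1], sys.argv[2])))
```

Hypothetical usage, saving the sketch as check_boot.py and rebuilding QEMU at each step (a failed build is reported as "skip"): `git bisect start 18a536f1f8 200c1f904f`, then `git bisect run sh -c 'make -j8 || exit 125; python3 check_boot.py build/qemu-system-ppc morphos-3.18.iso'`.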

Regards,
BALATON Zoltan


* Re: TCG change broke MorphOS boot on sam460ex
  2024-02-27 18:15 ` Philippe Mathieu-Daudé
  2024-02-27 19:48   ` BALATON Zoltan
@ 2024-03-21 18:41   ` BALATON Zoltan
  2024-04-02 11:32     ` BALATON Zoltan
  1 sibling, 1 reply; 10+ messages in thread
From: BALATON Zoltan @ 2024-03-21 18:41 UTC
  To: Philippe Mathieu-Daudé
  Cc: qemu-devel, qemu-ppc, Nicholas Piggin, Richard Henderson


On 27/2/24 17:47, BALATON Zoltan wrote:
> Hello,
> 
> Commit 18a536f1f8 (accel/tcg: Always require can_do_io) broke booting 
> MorphOS on sam460ex (this was before 8.2.0 and I thought I had verified 
> it before that release but apparently missed it back then). It can be 
> reproduced with https://www.morphos-team.net/morphos-3.18.iso and the 
> following command:
> 
> qemu-system-ppc -M sam460ex -serial stdio -d unimp,guest_errors \
>    -drive if=none,id=cd,format=raw,file=morphos-3.18.iso \
>    -device ide-cd,drive=cd,bus=ide.1

Although it breaks at the TCG change, it may also be related to tlbwe 
changes somehow, but I don't really understand it. I've tried to get some 
more debug info in case somebody can tell what's happening. With 
18a536f1f8^ (the commit before the one where it broke, which still works) I get:

----------------
IN:
ppcemb_tlb_check: TLB 0 address 00c01000 PID 0 <=> f0000000 f0000000 0 7f
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 1 address 00c01000 PID 0 <=> d0000000 f0000000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 2 address 00c01000 PID 0 <=> 80000000 f0000000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 3 address 00c01000 PID 0 <=> 90000000 f0000000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 4 address 00c01000 PID 0 <=> a0000000 f0000000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 5 address 00c01000 PID 0 <=> b0000000 f0000000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 6 address 00c01000 PID 0 <=> c0000000 f0000000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 7 address 00c01000 PID 0 <=> e0000000 ff000000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 8 address 00c01000 PID 0 <=> e1000000 ff000000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 9 address 00c01000 PID 0 <=> e3000000 fffffc00 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 10 address 00c01000 PID 0 <=> e3001000 fffffc00 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 11 address 00c01000 PID 0 <=> e4000000 ffffc000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 12 address 00c01000 PID 0 <=> e5000000 fff00000 0 7f
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 13 address 00c01000 PID 0 <=> ef000000 ff000000 0 7f
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 14 address 00c01000 PID 0 <=> e2000000 fff00000 0 7f
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 15 address 00c01000 PID 0 <=> 00000000 f0000000 0 7f
mmubooke_check_tlb: good TLB!
mmubooke_get_physical_address: access granted 00c01000 => 0000000000c01000 7 0
0x00c01354:  38c00040  li       r6, 0x40
0x00c01358:  38e10204  addi     r7, r1, 0x204
0x00c0135c:  39010104  addi     r8, r1, 0x104
0x00c01360:  39410004  addi     r10, r1, 4
0x00c01364:  39200000  li       r9, 0
0x00c01368:  7cc903a6  mtctr    r6
0x00c0136c:  84c70004  lwzu     r6, 4(r7)
0x00c01370:  7cc907a4  tlbwehi  r6, r9
0x00c01374:  84c80004  lwzu     r6, 4(r8)
0x00c01378:  7cc90fa4  tlbwelo  r6, r9
0x00c0137c:  84ca0004  lwzu     r6, 4(r10)
0x00c01380:  7cc917a4  tlbwehi  r6, r9
0x00c01384:  39290001  addi     r9, r9, 1
0x00c01388:  4200ffe4  bdnz     0xc0136c

helper_440_tlbwe word 0 entry 0 value 00000290
ppcemb_tlb_check: TLB 0 address 0df6bfb0 PID 0 <=> 00000000 f0000000 0 7f
mmubooke_check_tlb: good TLB!
mmubooke_get_physical_address: access granted 0df6bfb0 => 00000004fdf6bfb0 7 0
Invalid read at addr 0x4FDF6BFB0, size 4, region '(null)', reason: rejected
helper_440_tlbwe word 1 entry 0 value 00000000
ppcemb_tlb_check: TLB 0 address 0df6beb0 PID 0 <=> 00000000 f0000000 0 7f
mmubooke_check_tlb: good TLB!
mmubooke_get_physical_address: access granted 0df6beb0 => 000000000df6beb0 7 0
helper_440_tlbwe word 2 entry 0 value 0000003f
ppcemb_tlb_check: TLB 0 address 00c0136c PID 0 <=> 00000000 f0000000 0 7f
mmubooke_check_tlb: good TLB!
mmubooke_get_physical_address: access granted 00c0136c => 0000000000c0136c 7 0
----------------

and with commit 18a536f1f8 this changes to

----------------
IN:
ppcemb_tlb_check: TLB 0 address 00c01000 PID 0 <=> f0000000 f0000000 0 7f
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 1 address 00c01000 PID 0 <=> d0000000 f0000000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 2 address 00c01000 PID 0 <=> 80000000 f0000000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 3 address 00c01000 PID 0 <=> 90000000 f0000000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 4 address 00c01000 PID 0 <=> a0000000 f0000000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 5 address 00c01000 PID 0 <=> b0000000 f0000000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 6 address 00c01000 PID 0 <=> c0000000 f0000000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 7 address 00c01000 PID 0 <=> e0000000 ff000000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 8 address 00c01000 PID 0 <=> e1000000 ff000000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 9 address 00c01000 PID 0 <=> e3000000 fffffc00 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 10 address 00c01000 PID 0 <=> e3001000 fffffc00 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 11 address 00c01000 PID 0 <=> e4000000 ffffc000 0 3b
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 12 address 00c01000 PID 0 <=> e5000000 fff00000 0 7f
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 13 address 00c01000 PID 0 <=> ef000000 ff000000 0 7f
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 14 address 00c01000 PID 0 <=> e2000000 fff00000 0 7f
mmubooke_check_tlb: TLB entry not found
ppcemb_tlb_check: TLB 15 address 00c01000 PID 0 <=> 00000000 f0000000 0 7f
mmubooke_check_tlb: good TLB!
mmubooke_get_physical_address: access granted 00c01000 => 0000000000c01000 7 0
0x00c01354:  38c00040  li       r6, 0x40
0x00c01358:  38e10204  addi     r7, r1, 0x204
0x00c0135c:  39010104  addi     r8, r1, 0x104
0x00c01360:  39410004  addi     r10, r1, 4
0x00c01364:  39200000  li       r9, 0
0x00c01368:  7cc903a6  mtctr    r6
0x00c0136c:  84c70004  lwzu     r6, 4(r7)
0x00c01370:  7cc907a4  tlbwehi  r6, r9
0x00c01374:  84c80004  lwzu     r6, 4(r8)
0x00c01378:  7cc90fa4  tlbwelo  r6, r9
0x00c0137c:  84ca0004  lwzu     r6, 4(r10)
0x00c01380:  7cc917a4  tlbwehi  r6, r9
0x00c01384:  39290001  addi     r9, r9, 1
0x00c01388:  4200ffe4  bdnz     0xc0136c

helper_440_tlbwe word 0 entry 0 value 00000290
ppcemb_tlb_check: TLB 0 address 0df6bfb0 PID 0 <=> 00000000 f0000000 0 7f
mmubooke_check_tlb: good TLB!
mmubooke_get_physical_address: access granted 0df6bfb0 => 00000004fdf6bfb0 7 0
ppcemb_tlb_check: TLB 0 address 00c01374 PID 0 <=> 00000000 f0000000 0 7f
mmubooke_check_tlb: good TLB!
mmubooke_get_physical_address: access granted 00c01374 => 00000004f0c01374 7 0
Invalid read at addr 0x4F0C01374, size 4, region '(null)', reason: rejected
invalid/unsupported opcode: 00 - 00 - 00 - 00 (00000000) 00c01374
----------------
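
The two bad real addresses in this trace are consistent with TLB 0 still carrying its previous RPN when the first tlbwehi makes it valid: with the 256 MB mask (f0000000) shown for TLB 0, both 00c01374 and 0df6bfb0 end up exactly 0x4_f000_0000 higher. A minimal sketch of that arithmetic, assuming the real address is formed as (ERPN << 32) | (RPN & mask) | (EA & ~mask), and assuming the stale word 1 value was 0xF0000004 (RPN f0000000, ERPN 4). That value is inferred from the translated addresses; the log does not print it.

```python
from typing import Optional

PAGE_256M = 0x10000000  # TLB 0 size implied by the f0000000 mask in the trace


def translate(ea: int, epn: int, size: int, rpn_word: int) -> Optional[int]:
    """36-bit real address for `ea` through one TLB entry, or None on a miss."""
    mask = ~(size - 1) & 0xFFFFFFFF
    if (ea & mask) != epn:
        return None
    erpn = rpn_word & 0xF  # extended RPN bits go above bit 32
    return (erpn << 32) | (rpn_word & mask) | (ea & ~mask & 0xFFFFFFFF)


stale_word1 = 0xF0000004  # assumed leftover RPN f0000000, ERPN 4
print(hex(translate(0x00C01374, 0x0, PAGE_256M, stale_word1)))  # 0x4f0c01374
print(hex(translate(0x0DF6BFB0, 0x0, PAGE_256M, stale_word1)))  # 0x4fdf6bfb0
print(hex(translate(0x0DF6BEB0, 0x0, PAGE_256M, 0x0)))          # 0xdf6beb0
```

The last line corresponds to the working trace, where word 1 has already been rewritten with 0 and 0df6beb0 maps to itself.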

Any idea?

Regards,
BALATON Zoltan


* Re: TCG change broke MorphOS boot on sam460ex
  2024-03-21 18:41   ` BALATON Zoltan
@ 2024-04-02 11:32     ` BALATON Zoltan
  2024-04-03  5:15       ` Nicholas Piggin
  0 siblings, 1 reply; 10+ messages in thread
From: BALATON Zoltan @ 2024-04-02 11:32 UTC
  To: qemu-devel; +Cc: qemu-ppc, Nicholas Piggin, Richard Henderson


On Thu, 21 Mar 2024, BALATON Zoltan wrote:
> On 27/2/24 17:47, BALATON Zoltan wrote:
>> Hello,
>> 
>> Commit 18a536f1f8 (accel/tcg: Always require can_do_io) broke booting 
>> MorphOS on sam460ex (this was before 8.2.0 and I thought I had verified it 
>> before that release but apparently missed it back then). It can be 
>> reproduced with https://www.morphos-team.net/morphos-3.18.iso and the following 
>> command:
>> 
>> qemu-system-ppc -M sam460ex -serial stdio -d unimp,guest_errors \
>>    -drive if=none,id=cd,format=raw,file=morphos-3.18.iso \
>>    -device ide-cd,drive=cd,bus=ide.1

Any idea on this one? While MorphOS boots on other machines, and other OSes 
seem to boot on this machine, it may still suggest there's some problem 
somewhere, as this worked before. So it may be worth investigating to make 
sure there's no bug that could also affect other OSes even if they boot. I 
don't know how to debug this, so some help would be needed.

Regards,
BALATON Zoltan

> Although it breaks at the TCG change, it may also be related to tlbwe changes 
> somehow, but I don't really understand it. I've tried to get some more debug 
> info in case somebody can tell what's happening. With 18a536f1f8^ (the commit 
> before the one where it broke, which still works) I get:
>
> ----------------
> IN:
> ppcemb_tlb_check: TLB 0 address 00c01000 PID 0 <=> f0000000 f0000000 0 7f
> mmubooke_check_tlb: TLB entry not found
> ppcemb_tlb_check: TLB 1 address 00c01000 PID 0 <=> d0000000 f0000000 0 3b
> mmubooke_check_tlb: TLB entry not found
> ppcemb_tlb_check: TLB 2 address 00c01000 PID 0 <=> 80000000 f0000000 0 3b
> mmubooke_check_tlb: TLB entry not found
> ppcemb_tlb_check: TLB 3 address 00c01000 PID 0 <=> 90000000 f0000000 0 3b
> mmubooke_check_tlb: TLB entry not found
> ppcemb_tlb_check: TLB 4 address 00c01000 PID 0 <=> a0000000 f0000000 0 3b
> mmubooke_check_tlb: TLB entry not found
> ppcemb_tlb_check: TLB 5 address 00c01000 PID 0 <=> b0000000 f0000000 0 3b
> mmubooke_check_tlb: TLB entry not found
> ppcemb_tlb_check: TLB 6 address 00c01000 PID 0 <=> c0000000 f0000000 0 3b
> mmubooke_check_tlb: TLB entry not found
> ppcemb_tlb_check: TLB 7 address 00c01000 PID 0 <=> e0000000 ff000000 0 3b
> mmubooke_check_tlb: TLB entry not found
> ppcemb_tlb_check: TLB 8 address 00c01000 PID 0 <=> e1000000 ff000000 0 3b
> mmubooke_check_tlb: TLB entry not found
> ppcemb_tlb_check: TLB 9 address 00c01000 PID 0 <=> e3000000 fffffc00 0 3b
> mmubooke_check_tlb: TLB entry not found
> ppcemb_tlb_check: TLB 10 address 00c01000 PID 0 <=> e3001000 fffffc00 0 3b
> mmubooke_check_tlb: TLB entry not found
> ppcemb_tlb_check: TLB 11 address 00c01000 PID 0 <=> e4000000 ffffc000 0 3b
> mmubooke_check_tlb: TLB entry not found
> ppcemb_tlb_check: TLB 12 address 00c01000 PID 0 <=> e5000000 fff00000 0 7f
> mmubooke_check_tlb: TLB entry not found
> ppcemb_tlb_check: TLB 13 address 00c01000 PID 0 <=> ef000000 ff000000 0 7f
> mmubooke_check_tlb: TLB entry not found
> ppcemb_tlb_check: TLB 14 address 00c01000 PID 0 <=> e2000000 fff00000 0 7f
> mmubooke_check_tlb: TLB entry not found
> ppcemb_tlb_check: TLB 15 address 00c01000 PID 0 <=> 00000000 f0000000 0 7f
> mmubooke_check_tlb: good TLB!
> mmubooke_get_physical_address: access granted 00c01000 => 0000000000c01000 7 0
> 0x00c01354:  38c00040  li       r6, 0x40
> 0x00c01358:  38e10204  addi     r7, r1, 0x204
> 0x00c0135c:  39010104  addi     r8, r1, 0x104
> 0x00c01360:  39410004  addi     r10, r1, 4
> 0x00c01364:  39200000  li       r9, 0
> 0x00c01368:  7cc903a6  mtctr    r6
> 0x00c0136c:  84c70004  lwzu     r6, 4(r7)
> 0x00c01370:  7cc907a4  tlbwehi  r6, r9
> 0x00c01374:  84c80004  lwzu     r6, 4(r8)
> 0x00c01378:  7cc90fa4  tlbwelo  r6, r9
> 0x00c0137c:  84ca0004  lwzu     r6, 4(r10)
> 0x00c01380:  7cc917a4  tlbwehi  r6, r9
> 0x00c01384:  39290001  addi     r9, r9, 1
> 0x00c01388:  4200ffe4  bdnz     0xc0136c
>
> helper_440_tlbwe word 0 entry 0 value 00000290
> ppcemb_tlb_check: TLB 0 address 0df6bfb0 PID 0 <=> 00000000 f0000000 0 7f
> mmubooke_check_tlb: good TLB!
> mmubooke_get_physical_address: access granted 0df6bfb0 => 00000004fdf6bfb0 7 0
> Invalid read at addr 0x4FDF6BFB0, size 4, region '(null)', reason: rejected
> helper_440_tlbwe word 1 entry 0 value 00000000
> ppcemb_tlb_check: TLB 0 address 0df6beb0 PID 0 <=> 00000000 f0000000 0 7f
> mmubooke_check_tlb: good TLB!
> mmubooke_get_physical_address: access granted 0df6beb0 => 000000000df6beb0 7 0
> helper_440_tlbwe word 2 entry 0 value 0000003f
> ppcemb_tlb_check: TLB 0 address 00c0136c PID 0 <=> 00000000 f0000000 0 7f
> mmubooke_check_tlb: good TLB!
> mmubooke_get_physical_address: access granted 00c0136c => 0000000000c0136c 7 0
> ----------------
>
> and with commit 18a536f1f8 this changes to
>
> ----------------
> IN:
> ppcemb_tlb_check: TLB 0 address 00c01000 PID 0 <=> f0000000 f0000000 0 7f
> mmubooke_check_tlb: TLB entry not found
> ppcemb_tlb_check: TLB 1 address 00c01000 PID 0 <=> d0000000 f0000000 0 3b
> mmubooke_check_tlb: TLB entry not found
> ppcemb_tlb_check: TLB 2 address 00c01000 PID 0 <=> 80000000 f0000000 0 3b
> mmubooke_check_tlb: TLB entry not found
> ppcemb_tlb_check: TLB 3 address 00c01000 PID 0 <=> 90000000 f0000000 0 3b
> mmubooke_check_tlb: TLB entry not found
> ppcemb_tlb_check: TLB 4 address 00c01000 PID 0 <=> a0000000 f0000000 0 3b
> mmubooke_check_tlb: TLB entry not found
> ppcemb_tlb_check: TLB 5 address 00c01000 PID 0 <=> b0000000 f0000000 0 3b
> mmubooke_check_tlb: TLB entry not found
> ppcemb_tlb_check: TLB 6 address 00c01000 PID 0 <=> c0000000 f0000000 0 3b
> mmubooke_check_tlb: TLB entry not found
> ppcemb_tlb_check: TLB 7 address 00c01000 PID 0 <=> e0000000 ff000000 0 3b
> mmubooke_check_tlb: TLB entry not found
> ppcemb_tlb_check: TLB 8 address 00c01000 PID 0 <=> e1000000 ff000000 0 3b
> mmubooke_check_tlb: TLB entry not found
> ppcemb_tlb_check: TLB 9 address 00c01000 PID 0 <=> e3000000 fffffc00 0 3b
> mmubooke_check_tlb: TLB entry not found
> ppcemb_tlb_check: TLB 10 address 00c01000 PID 0 <=> e3001000 fffffc00 0 3b
> mmubooke_check_tlb: TLB entry not found
> ppcemb_tlb_check: TLB 11 address 00c01000 PID 0 <=> e4000000 ffffc000 0 3b
> mmubooke_check_tlb: TLB entry not found
> ppcemb_tlb_check: TLB 12 address 00c01000 PID 0 <=> e5000000 fff00000 0 7f
> mmubooke_check_tlb: TLB entry not found
> ppcemb_tlb_check: TLB 13 address 00c01000 PID 0 <=> ef000000 ff000000 0 7f
> mmubooke_check_tlb: TLB entry not found
> ppcemb_tlb_check: TLB 14 address 00c01000 PID 0 <=> e2000000 fff00000 0 7f
> mmubooke_check_tlb: TLB entry not found
> ppcemb_tlb_check: TLB 15 address 00c01000 PID 0 <=> 00000000 f0000000 0 7f
> mmubooke_check_tlb: good TLB!
> mmubooke_get_physical_address: access granted 00c01000 => 0000000000c01000 7 0
> 0x00c01354:  38c00040  li       r6, 0x40
> 0x00c01358:  38e10204  addi     r7, r1, 0x204
> 0x00c0135c:  39010104  addi     r8, r1, 0x104
> 0x00c01360:  39410004  addi     r10, r1, 4
> 0x00c01364:  39200000  li       r9, 0
> 0x00c01368:  7cc903a6  mtctr    r6
> 0x00c0136c:  84c70004  lwzu     r6, 4(r7)
> 0x00c01370:  7cc907a4  tlbwehi  r6, r9
> 0x00c01374:  84c80004  lwzu     r6, 4(r8)
> 0x00c01378:  7cc90fa4  tlbwelo  r6, r9
> 0x00c0137c:  84ca0004  lwzu     r6, 4(r10)
> 0x00c01380:  7cc917a4  tlbwehi  r6, r9
> 0x00c01384:  39290001  addi     r9, r9, 1
> 0x00c01388:  4200ffe4  bdnz     0xc0136c
>
> helper_440_tlbwe word 0 entry 0 value 00000290
> ppcemb_tlb_check: TLB 0 address 0df6bfb0 PID 0 <=> 00000000 f0000000 0 7f
> mmubooke_check_tlb: good TLB!
> mmubooke_get_physical_address: access granted 0df6bfb0 => 00000004fdf6bfb0 7 0
> ppcemb_tlb_check: TLB 0 address 00c01374 PID 0 <=> 00000000 f0000000 0 7f
> mmubooke_check_tlb: good TLB!
> mmubooke_get_physical_address: access granted 00c01374 => 00000004f0c01374 7 0
> Invalid read at addr 0x4F0C01374, size 4, region '(null)', reason: rejected
> invalid/unsupported opcode: 00 - 00 - 00 - 00 (00000000) 00c01374
> ----------------
>
> Any idea?
>
> Regards,
> BALATON Zoltan


* Re: TCG change broke MorphOS boot on sam460ex
  2024-04-02 11:32     ` BALATON Zoltan
@ 2024-04-03  5:15       ` Nicholas Piggin
  2024-04-03 22:23         ` BALATON Zoltan
  2024-05-27 22:23         ` BALATON Zoltan
  0 siblings, 2 replies; 10+ messages in thread
From: Nicholas Piggin @ 2024-04-03  5:15 UTC
  To: BALATON Zoltan, qemu-devel; +Cc: qemu-ppc, Richard Henderson

On Tue Apr 2, 2024 at 9:32 PM AEST, BALATON Zoltan wrote:
> On Thu, 21 Mar 2024, BALATON Zoltan wrote:
> > On 27/2/24 17:47, BALATON Zoltan wrote:
> >> Hello,
> >> 
> >> Commit 18a536f1f8 (accel/tcg: Always require can_do_io) broke booting 
> >> MorphOS on sam460ex (this was before 8.2.0 and I thought I had verified it 
> >> before that release but apparently missed it back then). It can be 
> >> reproduced with https://www.morphos-team.net/morphos-3.18.iso and the following 
> >> command:
> >> 
> >> qemu-system-ppc -M sam460ex -serial stdio -d unimp,guest_errors \
> >>    -drive if=none,id=cd,format=raw,file=morphos-3.18.iso \
> >>    -device ide-cd,drive=cd,bus=ide.1
>
> Any idea on this one? While MorphOS boots on other machines, and other OSes 
> seem to boot on this machine, it may still suggest there's some problem 
> somewhere, as this worked before. So it may be worth investigating to make 
> sure there's no bug that could also affect other OSes even if they boot. I 
> don't know how to debug this, so some help would be needed.

In the bad case it crashes after running this TB:

----------------
IN:
0x00c01354:  38c00040  li       r6, 0x40
0x00c01358:  38e10204  addi     r7, r1, 0x204
0x00c0135c:  39010104  addi     r8, r1, 0x104
0x00c01360:  39410004  addi     r10, r1, 4
0x00c01364:  39200000  li       r9, 0
0x00c01368:  7cc903a6  mtctr    r6
0x00c0136c:  84c70004  lwzu     r6, 4(r7)
0x00c01370:  7cc907a4  tlbwehi  r6, r9
0x00c01374:  84c80004  lwzu     r6, 4(r8)
0x00c01378:  7cc90fa4  tlbwelo  r6, r9
0x00c0137c:  84ca0004  lwzu     r6, 4(r10)
0x00c01380:  7cc917a4  tlbwehi  r6, r9
0x00c01384:  39290001  addi     r9, r9, 1
0x00c01388:  4200ffe4  bdnz     0xc0136c
----------------
IN:
0x00c01374: unable to read memory
----------------

"unable to read memory" comes from the tracer; the address does actually
get translated, but it points to a wayward real address which returns
0 to TCG, and 0 is an invalid instruction.
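
The "invalid/unsupported opcode: 00 - 00 - 00 - 00 (00000000)" lines in the original report are that 0 being decoded: every opcode field of an all-zero word is 0, and QEMU has no instruction there for this CPU. Splitting the loop's tlbwe encodings the same way also shows which TLB word each one writes. A minimal sketch with plain shifts on the standard X-form layout, assuming tlbwe's word-select (WS) field sits in the RB slot; this is not QEMU's decoder.

```python
def decode_xform(insn: int) -> dict:
    """Split a 32-bit PowerPC X-form instruction into its fields.

    PowerPC numbers bits from the most significant end (bit 0 = MSB),
    so a field ending at bit b is extracted with a right shift of 31 - b.
    """
    return {
        "po": insn >> 26,           # primary opcode, bits 0-5
        "rs": (insn >> 21) & 0x1F,  # source GPR, bits 6-10
        "ra": (insn >> 16) & 0x1F,  # RA, bits 11-15
        "ws": (insn >> 11) & 0x1F,  # RB slot, used as word select by tlbwe
        "xo": (insn >> 1) & 0x3FF,  # extended opcode, bits 21-30
    }


# The three tlbwe encodings from the loop, plus the all-zero word that
# the stray fetch returned.
for word in (0x7CC907A4, 0x7CC90FA4, 0x7CC917A4, 0x00000000):
    f = decode_xform(word)
    print(f"{word:08x}: po={f['po']} rs=r{f['rs']} ra=r{f['ra']} "
          f"ws={f['ws']} xo={f['xo']}")
```

This prints po=31, rs=r6, ra=r9, xo=978 for all three tlbwe words, with ws=0, 1 and 2 respectively, and all zeroes for 00000000. The WS values line up with the "helper_440_tlbwe word 0/1/2" lines in the earlier MMU trace, so the third tlbwe in the loop writes word 2.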

The good case instead doesn't exit the TB after 0x00c01370 but after
the complete loop at the bdnz. That looks like this after the same
first TB:

----------------
IN:
0x00c0136c:  84c70004  lwzu     r6, 4(r7)
0x00c01370:  7cc907a4  tlbwehi  r6, r9
0x00c01374:  84c80004  lwzu     r6, 4(r8)
0x00c01378:  7cc90fa4  tlbwelo  r6, r9
0x00c0137c:  84ca0004  lwzu     r6, 4(r10)
0x00c01380:  7cc917a4  tlbwehi  r6, r9
0x00c01384:  39290001  addi     r9, r9, 1
0x00c01388:  4200ffe4  bdnz     0xc0136c
----------------
IN:
0x00c0138c:  4c00012c  isync

All the tlbwe instructions are executed in the same TB. MMU tracing shows
the first tlbwehi creates a new valid(!) TLB entry for 0x00000000-0x10000000
that has a garbage RPN because the tlbwelo has not run yet.

What's happening in the bad case is that the translator breaks
and "re-fetches" instructions in the middle of that sequence, and
that's where the bogus translation causes 0 to be returned. In the
good case the whole block is executed from the same fetch, which
creates correct translations.

So it looks like a MorphOS bug; the can_do_io change just happens
to cause a re-fetch at that point, but that could happen for
a number of reasons, so you can't rely on the TLB *only* changing, or
ifetch *only* re-fetching, at a sync point like isync.

I would expect code like this to write an invalid entry with tlbwehi,
then tlbwelo to set the correct RPN, then make the entry valid with
the second tlbwehi. It would probably fix the bug if you just did the
first tlbwehi with r6=0 (or at least without the 0x200 bit set).
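
A rough model of the ordering argument: replay the writes to TLB entry 0 and resolve a fetch of 0x00c01374 after each one, as if a re-fetch could happen at any instruction boundary. This is a sketch under stated assumptions, not QEMU's code. Only TLB 0 and TLB 15 are modelled; the first matching entry wins (mirroring the TLB 0..15 scan in the trace); the previous TLB 0 contents (f0000000 mapped to real 0x4_f0000000) are inferred from the bad addresses; and word 2 is assumed to hold only attributes. Because the trace shows the loop's third tlbwe going to word 2 (helper_440_tlbwe word 2 ... 0000003f), the hypothetical SAFE order here writes word 0 with V clear first and sets V with a final word 0 write.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

V_BIT = 0x200  # valid bit in TLB word 0 (the "0x200 bit" above)


@dataclass
class Entry:
    word0: int  # EPN | V | TS | SIZE
    word1: int  # RPN | ERPN (low 4 bits)

    def lookup(self, ea: int) -> Optional[int]:
        """Real address for `ea` if this entry is valid and matches it."""
        if not self.word0 & V_BIT:
            return None
        size = 1024 << (2 * ((self.word0 >> 4) & 0xF))  # SIZE 9 -> 256 MB
        mask = ~(size - 1) & 0xFFFFFFFF
        if (ea & mask) != (self.word0 & mask):
            return None
        return ((self.word1 & 0xF) << 32) | (self.word1 & mask) | (ea & ~mask & 0xFFFFFFFF)


def resolve(tlb: List[Entry], ea: int) -> Optional[int]:
    """First matching entry wins, as in the TLB 0..15 scan in the trace."""
    for entry in tlb:
        pa = entry.lookup(ea)
        if pa is not None:
            return pa
    return None


def initial_tlb() -> List[Entry]:
    # Hypothetical state inferred from the logs: TLB 0 maps f0000000 (256 MB)
    # to real 0x4_f0000000, TLB 15 maps 00000000 one-to-one.
    return [Entry(0xF0000290, 0xF0000004), Entry(0x00000290, 0x00000000)]


def fetch_trace(writes: List[Tuple[int, int]], ea: int) -> List[Optional[int]]:
    """Apply (ws, value) tlbwe writes to TLB 0, resolving a fetch of `ea`
    after each one, i.e. assuming a re-fetch may happen at any boundary."""
    tlb = initial_tlb()
    seen = []
    for ws, value in writes:
        if ws == 0:
            tlb[0].word0 = value
        elif ws == 1:
            tlb[0].word1 = value
        # ws == 2 (attributes/permissions) does not affect this model
        seen.append(resolve(tlb, ea))
    return seen


# Order used by the loop (word 0, 1, 2) vs. one that only sets V last.
ORIGINAL = [(0, 0x290), (1, 0x0), (2, 0x3F)]
SAFE = [(0, 0x290 & ~V_BIT), (1, 0x0), (2, 0x3F), (0, 0x290)]

for name, seq in (("original", ORIGINAL), ("safe", SAFE)):
    print(name, [hex(pa) for pa in fetch_trace(seq, 0x00C01374)])
```

With the loop's order, the fetch right after the first write goes to 0x4f0c01374, as in the failing trace; with the V-last order, every intermediate fetch still resolves to 0xc01374.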

Thanks,
Nick



* Re: TCG change broke MorphOS boot on sam460ex
  2024-04-03  5:15       ` Nicholas Piggin
@ 2024-04-03 22:23         ` BALATON Zoltan
  2024-05-27 22:23         ` BALATON Zoltan
  1 sibling, 0 replies; 10+ messages in thread
From: BALATON Zoltan @ 2024-04-03 22:23 UTC
  To: Nicholas Piggin; +Cc: qemu-devel, qemu-ppc, Richard Henderson


On Wed, 3 Apr 2024, Nicholas Piggin wrote:
> On Tue Apr 2, 2024 at 9:32 PM AEST, BALATON Zoltan wrote:
>> On Thu, 21 Mar 2024, BALATON Zoltan wrote:
>>> On 27/2/24 17:47, BALATON Zoltan wrote:
>>>> Hello,
>>>>
>>>> Commit 18a536f1f8 (accel/tcg: Always require can_do_io) broke booting
>>>> MorphOS on sam460ex (this was before 8.2.0 and I thought I had verified it
>>>> before that release but apparently missed it back then). It can be
>>>> reproduced with https://www.morphos-team.net/morphos-3.18.iso and the following
>>>> command:
>>>>
>>>> qemu-system-ppc -M sam460ex -serial stdio -d unimp,guest_errors \
>>>>    -drive if=none,id=cd,format=raw,file=morphos-3.18.iso \
>>>>    -device ide-cd,drive=cd,bus=ide.1
>>
>> Any idea on this one? While MorphOS boots on other machines, and other OSes
>> seem to boot on this machine, it may still suggest there's some problem
>> somewhere, as this worked before. So it may be worth investigating to make
>> sure there's no bug that could also affect other OSes even if they boot. I
>> don't know how to debug this, so some help would be needed.
>
> In the bad case it crashes after running this TB:
>
> ----------------
> IN:
> 0x00c01354:  38c00040  li       r6, 0x40
> 0x00c01358:  38e10204  addi     r7, r1, 0x204
> 0x00c0135c:  39010104  addi     r8, r1, 0x104
> 0x00c01360:  39410004  addi     r10, r1, 4
> 0x00c01364:  39200000  li       r9, 0
> 0x00c01368:  7cc903a6  mtctr    r6
> 0x00c0136c:  84c70004  lwzu     r6, 4(r7)
> 0x00c01370:  7cc907a4  tlbwehi  r6, r9
> 0x00c01374:  84c80004  lwzu     r6, 4(r8)
> 0x00c01378:  7cc90fa4  tlbwelo  r6, r9
> 0x00c0137c:  84ca0004  lwzu     r6, 4(r10)
> 0x00c01380:  7cc917a4  tlbwehi  r6, r9
> 0x00c01384:  39290001  addi     r9, r9, 1
> 0x00c01388:  4200ffe4  bdnz     0xc0136c
> ----------------
> IN:
> 0x00c01374: unable to read memory
> ----------------
>
> "unable to read memory" comes from the tracer; the address does actually
> get translated, but it points to a wayward real address which returns
> 0 to TCG, and 0 is an invalid instruction.
>
> The good case instead doesn't exit the TB after 0x00c01370 but after
> the complete loop at the bdnz. That looks like this after the same
> first TB:
>
> ----------------
> IN:
> 0x00c0136c:  84c70004  lwzu     r6, 4(r7)
> 0x00c01370:  7cc907a4  tlbwehi  r6, r9
> 0x00c01374:  84c80004  lwzu     r6, 4(r8)
> 0x00c01378:  7cc90fa4  tlbwelo  r6, r9
> 0x00c0137c:  84ca0004  lwzu     r6, 4(r10)
> 0x00c01380:  7cc917a4  tlbwehi  r6, r9
> 0x00c01384:  39290001  addi     r9, r9, 1
> 0x00c01388:  4200ffe4  bdnz     0xc0136c
> ----------------
> IN:
> 0x00c0138c:  4c00012c  isync
>
> All the tlbwe instructions are executed in the same TB. MMU tracing shows
> the first tlbwehi creates a new valid(!) TLB entry for 0x00000000-0x10000000
> that has a garbage RPN because the tlbwelo has not run yet.
>
> What's happening in the bad case is that the translator breaks
> and "re-fetches" instructions in the middle of that sequence, and
> that's where the bogus translation causes 0 to be returned. In the
> good case the whole block is executed from the same fetch, which
> creates correct translations.
>
> So it looks like a MorphOS bug; the can_do_io change just happens
> to cause a re-fetch at that point, but that could happen for
> a number of reasons, so you can't rely on the TLB *only* changing, or
> ifetch *only* re-fetching, at a sync point like isync.

Thanks a lot for the analysis. It probably works on a real machine due to 
cache effects, so maybe it was just luck that this did not break.

> I would expect code like this to write an invalid entry with tlbwehi,
> then tlbwelo to set the correct RPN, then make the entry valid with
> the second tlbwehi. It would probably fix the bug if you just did the
> first tlbwehi with r6=0 (or at least without the 0x200 bit set).

I think I had to fix a similar issue in AROS years ago when I first 
tried to make sam460ex emulation work and used AROS for testing:
https://github.com/aros-development-team/AROS/commit/586a8ada8a5b861a77cab177d39e01de8c3f4cf5

I can't fix MorphOS as it's not open source, but I hope the MorphOS team 
will learn about this and do something about it. It still works better on 
other emulated machines such as pegasos2 and mac99, so it's not a big deal; 
I just wanted to make sure it was not a bug that could affect other 
OSes on sam460ex.

Thank you,
BALATON Zoltan


* Re: TCG change broke MorphOS boot on sam460ex
  2024-04-03  5:15       ` Nicholas Piggin
  2024-04-03 22:23         ` BALATON Zoltan
@ 2024-05-27 22:23         ` BALATON Zoltan
  2024-05-27 22:55           ` BALATON Zoltan
  2024-05-28  3:30           ` Nicholas Piggin
  1 sibling, 2 replies; 10+ messages in thread
From: BALATON Zoltan @ 2024-05-27 22:23 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: qemu-devel, qemu-ppc, Richard Henderson

[-- Attachment #1: Type: text/plain, Size: 12668 bytes --]

On Wed, 3 Apr 2024, Nicholas Piggin wrote:
> On Tue Apr 2, 2024 at 9:32 PM AEST, BALATON Zoltan wrote:
>> On Thu, 21 Mar 2024, BALATON Zoltan wrote:
>>> On 27/2/24 17:47, BALATON Zoltan wrote:
>>>> Hello,
>>>>
>>>> Commit 18a536f1f8 (accel/tcg: Always require can_do_io) broke booting
>>>> MorphOS on sam460ex (this was before 8.2.0 and I thought I've verified it
>>>> before that release but apparently missed it back then). It can be
>>>> reproduced with https://www.morphos-team.net/morphos-3.18.iso and following
>>>> command:
>>>>
>>>> qemu-system-ppc -M sam460ex -serial stdio -d unimp,guest_errors \
>>>>    -drive if=none,id=cd,format=raw,file=morphos-3.18.iso \
>>>>    -device ide-cd,drive=cd,bus=ide.1
>>
>> Any idea on this one? While MorphOS boots on other machines and other OSes
>> seem to boot on this machine it may still suggest there's some problem
>> somewhere as this worked before. So it may worth investigating it to make
>> sure there's no bug that could affect other OSes too even if they boot. I
>> don't know how to debug this so some help would be needed.
>
> In the bad case it crashes after running this TB:
>
> ----------------
> IN:
> 0x00c01354:  38c00040  li       r6, 0x40
> 0x00c01358:  38e10204  addi     r7, r1, 0x204
> 0x00c0135c:  39010104  addi     r8, r1, 0x104
> 0x00c01360:  39410004  addi     r10, r1, 4
> 0x00c01364:  39200000  li       r9, 0
> 0x00c01368:  7cc903a6  mtctr    r6
> 0x00c0136c:  84c70004  lwzu     r6, 4(r7)
> 0x00c01370:  7cc907a4  tlbwehi  r6, r9
> 0x00c01374:  84c80004  lwzu     r6, 4(r8)
> 0x00c01378:  7cc90fa4  tlbwelo  r6, r9
> 0x00c0137c:  84ca0004  lwzu     r6, 4(r10)
> 0x00c01380:  7cc917a4  tlbwehi  r6, r9
> 0x00c01384:  39290001  addi     r9, r9, 1
> 0x00c01388:  4200ffe4  bdnz     0xc0136c
> ----------------
> IN:
> 0x00c01374: unable to read memory
> ----------------
>
> "unable to read memory" is the tracer, it does actually translate
> the address, but it points to a wayward real address which returns
> 0 to TCG, which is an invalid instruction.
>
> The good case instead doesn't exit the TB after 0x00c01370 but after
> the complete loop at the bdnz. That look like this after the same
> first TB:
>
> ----------------
> IN:
> 0x00c0136c:  84c70004  lwzu     r6, 4(r7)
> 0x00c01370:  7cc907a4  tlbwehi  r6, r9
> 0x00c01374:  84c80004  lwzu     r6, 4(r8)
> 0x00c01378:  7cc90fa4  tlbwelo  r6, r9
> 0x00c0137c:  84ca0004  lwzu     r6, 4(r10)
> 0x00c01380:  7cc917a4  tlbwehi  r6, r9
> 0x00c01384:  39290001  addi     r9, r9, 1
> 0x00c01388:  4200ffe4  bdnz     0xc0136c
> ----------------
> IN:
> 0x00c0138c:  4c00012c  isync
>
> All the tlbwe are executed in the same TB. MMU tracing shows the
> first tlbwehi creates a new valid(!) TLB for 0x00000000-0x100000000
> that has a garbage RPN because the tlbwelo did not run yet.
>
> What's happening in the bad case is that the translator breaks
> and "re-fetches" instructions in the middle of that sequence, and
> that's where the bogus translation causes 0 to be returned. The
> good case the whole block is executed in the same fetch which
> creates correct translations.
>
> So it looks like a morphos bug, the can-do-io change just happens
> to cause it to re-fetch in that place, but that could happen for
> a number of reasons, so you can't rely on TLB *only* changing or
> ifetch *only* re-fetching at a sync point like isync.
>
> I would expect code like this to write an invalid entry with tlbwehi,
> then tlbwelo to set the correct RPN, then make the entry valid with
> the second tlbwehi. It would probably fix the bug if you just did the
> first tlbwehi with r6=0 (or at least without the 0x200 bit set).

Revisiting this, I've found in the docs that the PPC440 has shadow TLBs,
so this code can rely on the TLB not being invalidated until isync. That
is why it works on the real machine but breaks on QEMU. We would either
need to make sure the TB runs until the sync point or somehow emulate the
shadow TLB. I've experimented with the latter but could not make it work
(and, unexpectedly, keeping a cache of the most recently used entries is
slower than always searching through all TLB entries as done now, so I've
abandoned that idea). The problem is that an entry is modified by
multiple tlbwe instructions, but these can come in any order (and
sometimes only one of them is done, e.g. invalidating an entry seems to
do only a single write), so I don't know when to copy the new entry to
the TLB and when to wait for more parts and keep the old one. Any idea
how to fix this?

Also, I'm not sure if it's related, but by running the STREAM benchmark
on sam460ex I can now reproduce a memory access problem, though I don't
know what causes it. The full output of that benchmark under AmigaOS on
sam460ex is:

-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 10000000 (elements), Offset = 0 (elements)
Memory per array = 76.3 MiB (= 0.1 GiB).
Total memory required = 228.9 MiB (= 0.2 GiB).
Each kernel will be executed 10 times.
  The *best* time for each kernel (excluding the first iteration)
  will be used to compute the reported bandwidth.
-------------------------------------------------------------
Your clock granularity/precision appears to be 3 microseconds.
Each test below will take on the order of 186279 microseconds.
    (= 62093 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            1723.8     0.095517     0.092821     0.103645
Scale:            790.2     0.206338     0.202479     0.214062
Add:              994.7     0.246171     0.241289     0.256950
Triad:            763.2     0.323731     0.314454     0.343873
-------------------------------------------------------------
Failed Validation on array a[], AvgRelAbsErr > epsilon (1.000000e-13)
      Expected Value: 1.153301e+12, AvgAbsErr: 1.137394e+12, AvgRelAbsErr: 9.862079e-01
      For array a[], 9863168 errors were found.
Failed Validation on array b[], AvgRelAbsErr > epsilon (1.000000e-13)
      Expected Value: 2.306602e+11, AvgAbsErr: 2.274872e+11, AvgRelAbsErr: 9.862438e-01
      AvgRelAbsErr > Epsilon (1.000000e-13)
      For array b[], 9863168 errors were found.
Failed Validation on array c[], AvgRelAbsErr > epsilon (1.000000e-13)
      Expected Value: 3.075469e+11, AvgAbsErr: 3.033024e+11, AvgRelAbsErr: 9.861989e-01
      AvgRelAbsErr > Epsilon (1.000000e-13)
      For array c[], 9863168 errors were found.
-------------------------------------------------------------

while on amigaone or pegasos2 the same executable finishes with:
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

On a real Sam460EX this same executable also validates as confirmed here:
https://www.amigans.net/modules/newbb/viewtopic.php?post_id=148020#forumpost148020

The binary and source are from here:
http://os4depot.net/?function=showfile&file=utility/benchmark/stream.lha

This binary runs and validates on QEMU amigaone and pegasos2, which use
a G4, so the problem seems specific to the 460EX. To verify, I've compiled
the source for PPC Linux and run it with qemu-ppc linux-user, which does
not emulate the MMU, so it's expected to work, and it does:

$ qemu-ppc -cpu 460ex streamPPC
-------------------------------------------------------------
Based on STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 10000000 (elements), Offset = 0 (elements)
Memory per array = 76.3 MiB (= 0.1 GiB).
Total memory required = 228.9 MiB (= 0.2 GiB).
Each kernel will be executed 10 times.
  The *best* time for each kernel (excluding the first iteration)
  will be used to compute the reported bandwidth.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 192649 microseconds.
    (= 192649 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            3191.5     0.050227     0.050133     0.050584
Scale:            889.5     0.181873     0.179880     0.183075
Add:             1174.7     0.207856     0.204303     0.213941
Triad:            683.0     0.354251     0.351415     0.358936
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
Results Validation Verbose Results:
     Expected a(1), b(1), c(1): 1153300781250.000000 230660156250.000000 307546875000.000000
     Observed a(1), b(1), c(1): 1153300781250.000000 230660156250.000000 307546875000.000000
     Rel Errors on a, b, c:     0.000000e+00 0.000000e+00 0.000000e+00
-------------------------------------------------------------

Compiled with -O3, which was reportedly used for the AmigaOS binary, it
does even better (at least as long as no FPU is used, which is another
known weak point of QEMU):

$ qemu-ppc -cpu 460ex streamPPCpowerpcO3
-------------------------------------------------------------
Based on STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 10000000 (elements), Offset = 0 (elements)
Memory per array = 76.3 MiB (= 0.1 GiB).
Total memory required = 228.9 MiB (= 0.2 GiB).
Each kernel will be executed 10 times.
  The *best* time for each kernel (excluding the first iteration)
  will be used to compute the reported bandwidth.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 171833 microseconds.
    (= 171833 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            8931.7     0.017950     0.017914     0.018114
Scale:           1078.1     0.151183     0.148407     0.153068
Add:             1359.3     0.178790     0.176561     0.184122
Triad:           1161.2     0.210525     0.206683     0.216876
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
Results Validation Verbose Results:
     Expected a(1), b(1), c(1): 1153300781250.000000 230660156250.000000 307546875000.000000
     Observed a(1), b(1), c(1): 1153300781250.000000 230660156250.000000 307546875000.000000
     Rel Errors on a, b, c:     0.000000e+00 0.000000e+00 0.000000e+00
-------------------------------------------------------------

Then I booted Linux on QEMU sam460ex and ran my compiled Linux executable
under it, and it validates there too. So I could only reproduce this on
AmigaOS with the binary in stream.lha, but that binary works on the real
machine, so there is a problem somewhere; I'm just not sure what it is or
how to debug it. I think it may be related to TLB writes, as AmigaOS
seems to do a lot of those when running this test, so maybe it hits some
issue that does not happen normally. Fixing the known missing shadow TLB
issue found with MorphOS might fix this too, or at least let us rule it
out.

I'm open to ideas on this one as I don't have any on how to proceed.

Regards,
BALATON Zoltan


* Re: TCG change broke MorphOS boot on sam460ex
  2024-05-27 22:23         ` BALATON Zoltan
@ 2024-05-27 22:55           ` BALATON Zoltan
  2024-05-28  3:30           ` Nicholas Piggin
  1 sibling, 0 replies; 10+ messages in thread
From: BALATON Zoltan @ 2024-05-27 22:55 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: qemu-devel, qemu-ppc, Richard Henderson

[-- Attachment #1: Type: text/plain, Size: 15404 bytes --]

On Tue, 28 May 2024, BALATON Zoltan wrote:
> On Wed, 3 Apr 2024, Nicholas Piggin wrote:
>> On Tue Apr 2, 2024 at 9:32 PM AEST, BALATON Zoltan wrote:
>>> On Thu, 21 Mar 2024, BALATON Zoltan wrote:
>>>> On 27/2/24 17:47, BALATON Zoltan wrote:
>>>>> Hello,
>>>>> 
>>>>> Commit 18a536f1f8 (accel/tcg: Always require can_do_io) broke booting
>>>>> MorphOS on sam460ex (this was before 8.2.0 and I thought I've verified it
>>>>> before that release but apparently missed it back then). It can be
>>>>> reproduced with https://www.morphos-team.net/morphos-3.18.iso and following
>>>>> command:
>>>>> 
>>>>> qemu-system-ppc -M sam460ex -serial stdio -d unimp,guest_errors \
>>>>>    -drive if=none,id=cd,format=raw,file=morphos-3.18.iso \
>>>>>    -device ide-cd,drive=cd,bus=ide.1
>>> 
>>> Any idea on this one? While MorphOS boots on other machines and other OSes
>>> seem to boot on this machine it may still suggest there's some problem
>>> somewhere as this worked before. So it may worth investigating it to make
>>> sure there's no bug that could affect other OSes too even if they boot. I
>>> don't know how to debug this so some help would be needed.
>> 
>> In the bad case it crashes after running this TB:
>> 
>> ----------------
>> IN:
>> 0x00c01354:  38c00040  li       r6, 0x40
>> 0x00c01358:  38e10204  addi     r7, r1, 0x204
>> 0x00c0135c:  39010104  addi     r8, r1, 0x104
>> 0x00c01360:  39410004  addi     r10, r1, 4
>> 0x00c01364:  39200000  li       r9, 0
>> 0x00c01368:  7cc903a6  mtctr    r6
>> 0x00c0136c:  84c70004  lwzu     r6, 4(r7)
>> 0x00c01370:  7cc907a4  tlbwehi  r6, r9
>> 0x00c01374:  84c80004  lwzu     r6, 4(r8)
>> 0x00c01378:  7cc90fa4  tlbwelo  r6, r9
>> 0x00c0137c:  84ca0004  lwzu     r6, 4(r10)
>> 0x00c01380:  7cc917a4  tlbwehi  r6, r9
>> 0x00c01384:  39290001  addi     r9, r9, 1
>> 0x00c01388:  4200ffe4  bdnz     0xc0136c
>> ----------------
>> IN:
>> 0x00c01374: unable to read memory
>> ----------------
>> 
>> "unable to read memory" is the tracer, it does actually translate
>> the address, but it points to a wayward real address which returns
>> 0 to TCG, which is an invalid instruction.
>> 
>> The good case instead doesn't exit the TB after 0x00c01370 but after
>> the complete loop at the bdnz. That look like this after the same
>> first TB:
>> 
>> ----------------
>> IN:
>> 0x00c0136c:  84c70004  lwzu     r6, 4(r7)
>> 0x00c01370:  7cc907a4  tlbwehi  r6, r9
>> 0x00c01374:  84c80004  lwzu     r6, 4(r8)
>> 0x00c01378:  7cc90fa4  tlbwelo  r6, r9
>> 0x00c0137c:  84ca0004  lwzu     r6, 4(r10)
>> 0x00c01380:  7cc917a4  tlbwehi  r6, r9
>> 0x00c01384:  39290001  addi     r9, r9, 1
>> 0x00c01388:  4200ffe4  bdnz     0xc0136c
>> ----------------
>> IN:
>> 0x00c0138c:  4c00012c  isync
>> 
>> All the tlbwe are executed in the same TB. MMU tracing shows the
>> first tlbwehi creates a new valid(!) TLB for 0x00000000-0x100000000
>> that has a garbage RPN because the tlbwelo did not run yet.
>> 
>> What's happening in the bad case is that the translator breaks
>> and "re-fetches" instructions in the middle of that sequence, and
>> that's where the bogus translation causes 0 to be returned. The
>> good case the whole block is executed in the same fetch which
>> creates correct translations.
>> 
>> So it looks like a morphos bug, the can-do-io change just happens
>> to cause it to re-fetch in that place, but that could happen for
>> a number of reasons, so you can't rely on TLB *only* changing or
>> ifetch *only* re-fetching at a sync point like isync.
>> 
>> I would expect code like this to write an invalid entry with tlbwehi,
>> then tlbwelo to set the correct RPN, then make the entry valid with
>> the second tlbwehi. It would probably fix the bug if you just did the
>> first tlbwehi with r6=0 (or at least without the 0x200 bit set).
>
> Revisiting this, I've found in the docs that PPC440 has shadow TLBs so this 
> code can rely upon the TLB not being invalidated until isync and works on 
> real machine but breaks on QEMU. We would either need to make sure the TB 
> runs until the sync or somehow emulate the shadow TLB. I've experimented with 
> the latter but I could not make it work (and unexpectedly keeping a cache of 
> the most recently used entries is slower than always searching through all 
> TLB entries as done now so I've abandoned that idea). The problem is that an 
> entry is modified by multiple tlbwe instructions but these can come in any 
> order (and sometimes only one of them is done like invalidating an entry 
> seems to only do one write) so I don't know when to copy the new entry to the 
> TLB and when to wait for more parts and keep the old one. Any idea how to fix 
> this?
>
> Also I'm not sure if it's related but by running the stream benchmark on 
> sam460ex now I can reproduce some memory access problem but I'm not sure what 
> causes it. The full output of that benchmark under AmigaOS on sam460ex is 
> this:
>
> -------------------------------------------------------------
> STREAM version $Revision: 5.10 $
> -------------------------------------------------------------
> This system uses 8 bytes per array element.
> -------------------------------------------------------------
> Array size = 10000000 (elements), Offset = 0 (elements)
> Memory per array = 76.3 MiB (= 0.1 GiB).
> Total memory required = 228.9 MiB (= 0.2 GiB).
> Each kernel will be executed 10 times.
> The *best* time for each kernel (excluding the first iteration)
> will be used to compute the reported bandwidth.
> -------------------------------------------------------------
> Your clock granularity/precision appears to be 3 microseconds.
> Each test below will take on the order of 186279 microseconds.
>   (= 62093 clock ticks)
> Increase the size of the arrays if this shows that
> you are not getting at least 20 clock ticks per test.
> -------------------------------------------------------------
> WARNING -- The above is only a rough guideline.
> For best results, please be sure you know the
> precision of your system timer.
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:            1723.8     0.095517     0.092821     0.103645
> Scale:            790.2     0.206338     0.202479     0.214062
> Add:              994.7     0.246171     0.241289     0.256950
> Triad:            763.2     0.323731     0.314454     0.343873
> -------------------------------------------------------------
> Failed Validation on array a[], AvgRelAbsErr > epsilon (1.000000e-13)
>     Expected Value: 1.153301e+12, AvgAbsErr: 1.137394e+12, AvgRelAbsErr: 9.862079e-01
>     For array a[], 9863168 errors were found.
> Failed Validation on array b[], AvgRelAbsErr > epsilon (1.000000e-13)
>     Expected Value: 2.306602e+11, AvgAbsErr: 2.274872e+11, AvgRelAbsErr: 9.862438e-01
>     AvgRelAbsErr > Epsilon (1.000000e-13)
>     For array b[], 9863168 errors were found.
> Failed Validation on array c[], AvgRelAbsErr > epsilon (1.000000e-13)
>     Expected Value: 3.075469e+11, AvgAbsErr: 3.033024e+11, AvgRelAbsErr: 9.861989e-01
>     AvgRelAbsErr > Epsilon (1.000000e-13)
>     For array c[], 9863168 errors were found.
> -------------------------------------------------------------
>
> while on amigaone or pegasos2 the same executable finishes with:
> -------------------------------------------------------------
> Solution Validates: avg error less than 1.000000e-13 on all three arrays
> -------------------------------------------------------------
>
> On a real Sam460EX this same executable also validates as confirmed here:
> https://www.amigans.net/modules/newbb/viewtopic.php?post_id=148020#forumpost148020
>
> The binary and source is from here:
> http://os4depot.net/?function=showfile&file=utility/benchmark/stream.lha
>
> This binary runs on QEMU amigaone and pegasos2 that use G4 and validates so 
> only seems to be a problem with 460EX. I've compiled the source for PPC Linux 
> and tried running that with qemu-ppc linux-user to verify it which does not 
> use MMU so it's expected to work and it does:
>
> $ qemu-ppc -cpu 460ex streamPPC
> -------------------------------------------------------------
> Based on STREAM version $Revision: 5.10 $
> -------------------------------------------------------------
> This system uses 8 bytes per array element.
> -------------------------------------------------------------
> Array size = 10000000 (elements), Offset = 0 (elements)
> Memory per array = 76.3 MiB (= 0.1 GiB).
> Total memory required = 228.9 MiB (= 0.2 GiB).
> Each kernel will be executed 10 times.
> The *best* time for each kernel (excluding the first iteration)
> will be used to compute the reported bandwidth.
> -------------------------------------------------------------
> Your clock granularity/precision appears to be 1 microseconds.
> Each test below will take on the order of 192649 microseconds.
>   (= 192649 clock ticks)
> Increase the size of the arrays if this shows that
> you are not getting at least 20 clock ticks per test.
> -------------------------------------------------------------
> WARNING -- The above is only a rough guideline.
> For best results, please be sure you know the
> precision of your system timer.
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:            3191.5     0.050227     0.050133     0.050584
> Scale:            889.5     0.181873     0.179880     0.183075
> Add:             1174.7     0.207856     0.204303     0.213941
> Triad:            683.0     0.354251     0.351415     0.358936
> -------------------------------------------------------------
> Solution Validates: avg error less than 1.000000e-13 on all three arrays
> Results Validation Verbose Results:
>    Expected a(1), b(1), c(1): 1153300781250.000000 230660156250.000000 307546875000.000000
>    Observed a(1), b(1), c(1): 1153300781250.000000 230660156250.000000 307546875000.000000
>    Rel Errors on a, b, c:     0.000000e+00 0.000000e+00 0.000000e+00
> -------------------------------------------------------------
>
> or compiled with -O3 that was said to be used for the AmigaOS binary it's 
> even better (as long as no FPU is used at least which is another known weak 
> point of QEMU):
>
> $ qemu-ppc -cpu 460ex streamPPCpowerpcO3
> -------------------------------------------------------------
> Based on STREAM version $Revision: 5.10 $
> -------------------------------------------------------------
> This system uses 8 bytes per array element.
> -------------------------------------------------------------
> Array size = 10000000 (elements), Offset = 0 (elements)
> Memory per array = 76.3 MiB (= 0.1 GiB).
> Total memory required = 228.9 MiB (= 0.2 GiB).
> Each kernel will be executed 10 times.
> The *best* time for each kernel (excluding the first iteration)
> will be used to compute the reported bandwidth.
> -------------------------------------------------------------
> Your clock granularity/precision appears to be 1 microseconds.
> Each test below will take on the order of 171833 microseconds.
>   (= 171833 clock ticks)
> Increase the size of the arrays if this shows that
> you are not getting at least 20 clock ticks per test.
> -------------------------------------------------------------
> WARNING -- The above is only a rough guideline.
> For best results, please be sure you know the
> precision of your system timer.
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:            8931.7     0.017950     0.017914     0.018114
> Scale:           1078.1     0.151183     0.148407     0.153068
> Add:             1359.3     0.178790     0.176561     0.184122
> Triad:           1161.2     0.210525     0.206683     0.216876
> -------------------------------------------------------------
> Solution Validates: avg error less than 1.000000e-13 on all three arrays
> Results Validation Verbose Results:
>    Expected a(1), b(1), c(1): 1153300781250.000000 230660156250.000000 307546875000.000000
>    Observed a(1), b(1), c(1): 1153300781250.000000 230660156250.000000 307546875000.000000
>    Rel Errors on a, b, c:     0.000000e+00 0.000000e+00 0.000000e+00
> -------------------------------------------------------------
>
> Then I've tried booting Linux on QEMU sam460ex and run my compiled Linux exe 
> under that and it validates there so I could only reproduce this on AmigaOS

For completeness here's the run on sam460ex Linux:

$ ./streamPPCpowerpcO3
-------------------------------------------------------------
Based on STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 10000000 (elements), Offset = 0 (elements)
Memory per array = 76.3 MiB (= 0.1 GiB).
Total memory required = 228.9 MiB (= 0.2 GiB).
Each kernel will be executed 10 times.
  The *best* time for each kernel (excluding the first iteration)
  will be used to compute the reported bandwidth.
-------------------------------------------------------------
Your clock granularity/precision appears to be 2 microseconds.
Each test below will take on the order of 169662 microseconds.
    (= 84831 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            2143.7     0.075259     0.074636     0.076172
Scale:            846.4     0.190517     0.189029     0.192099
Add:             1043.7     0.232548     0.229945     0.237299
Triad:            814.4     0.299620     0.294695     0.309323
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
Results Validation Verbose Results:
     Expected a(1), b(1), c(1): 1153300781250.000000 230660156250.000000 307546875000.000000
     Observed a(1), b(1), c(1): 1153300781250.000000 230660156250.000000 307546875000.000000
     Rel Errors on a, b, c:     0.000000e+00 0.000000e+00 0.000000e+00
-------------------------------------------------------------

> with the binary in stream.lha but that binary works on real machine so there 
> is some problem somewhere but I'm not sure what and how to debug it. I think 
> this may be related to TLB writes though as AmigaOS seems to do a lot of 
> those when running this test so maybe it hits some issues that does not 
> happen normally. Fixing the known issue with missing shadow TLB as found with 
> MorphOS might fix this too or we could at least rule that out then.

Also, this is not something that broke recently and could thus be
bisected. I get the same validation error with QEMU v8.2.0, before any
embedded TLB changes, and even with older versions, so it may be a
pre-existing problem and therefore unrelated to the MorphOS boot problem
above.

> I'm open to ideas on this one as I don't have any on how to proceed.
>
> Regards,
> BALATON Zoltan


* Re: TCG change broke MorphOS boot on sam460ex
  2024-05-27 22:23         ` BALATON Zoltan
  2024-05-27 22:55           ` BALATON Zoltan
@ 2024-05-28  3:30           ` Nicholas Piggin
  1 sibling, 0 replies; 10+ messages in thread
From: Nicholas Piggin @ 2024-05-28  3:30 UTC (permalink / raw)
  To: BALATON Zoltan; +Cc: qemu-devel, qemu-ppc, Richard Henderson

On Tue May 28, 2024 at 8:23 AM AEST, BALATON Zoltan wrote:
> On Wed, 3 Apr 2024, Nicholas Piggin wrote:
> > On Tue Apr 2, 2024 at 9:32 PM AEST, BALATON Zoltan wrote:
> >> On Thu, 21 Mar 2024, BALATON Zoltan wrote:
> >>> On 27/2/24 17:47, BALATON Zoltan wrote:
> >>>> Hello,
> >>>>
> >>>> Commit 18a536f1f8 (accel/tcg: Always require can_do_io) broke booting
> >>>> MorphOS on sam460ex (this was before 8.2.0 and I thought I've verified it
> >>>> before that release but apparently missed it back then). It can be
> >>>> reproduced with https://www.morphos-team.net/morphos-3.18.iso and following
> >>>> command:
> >>>>
> >>>> qemu-system-ppc -M sam460ex -serial stdio -d unimp,guest_errors \
> >>>>    -drive if=none,id=cd,format=raw,file=morphos-3.18.iso \
> >>>>    -device ide-cd,drive=cd,bus=ide.1
> >>
> >> Any idea on this one? While MorphOS boots on other machines and other OSes
> >> seem to boot on this machine it may still suggest there's some problem
> >> somewhere as this worked before. So it may worth investigating it to make
> >> sure there's no bug that could affect other OSes too even if they boot. I
> >> don't know how to debug this so some help would be needed.
> >
> > In the bad case it crashes after running this TB:
> >
> > ----------------
> > IN:
> > 0x00c01354:  38c00040  li       r6, 0x40
> > 0x00c01358:  38e10204  addi     r7, r1, 0x204
> > 0x00c0135c:  39010104  addi     r8, r1, 0x104
> > 0x00c01360:  39410004  addi     r10, r1, 4
> > 0x00c01364:  39200000  li       r9, 0
> > 0x00c01368:  7cc903a6  mtctr    r6
> > 0x00c0136c:  84c70004  lwzu     r6, 4(r7)
> > 0x00c01370:  7cc907a4  tlbwehi  r6, r9
> > 0x00c01374:  84c80004  lwzu     r6, 4(r8)
> > 0x00c01378:  7cc90fa4  tlbwelo  r6, r9
> > 0x00c0137c:  84ca0004  lwzu     r6, 4(r10)
> > 0x00c01380:  7cc917a4  tlbwehi  r6, r9
> > 0x00c01384:  39290001  addi     r9, r9, 1
> > 0x00c01388:  4200ffe4  bdnz     0xc0136c
> > ----------------
> > IN:
> > 0x00c01374: unable to read memory
> > ----------------
> >
> > "unable to read memory" is the tracer, it does actually translate
> > the address, but it points to a wayward real address which returns
> > 0 to TCG, which is an invalid instruction.
> >
> > In the good case, the TB does not end after 0x00c01370 but only after
> > the complete loop at the bdnz. That looks like this after the same
> > first TB:
> >
> > ----------------
> > IN:
> > 0x00c0136c:  84c70004  lwzu     r6, 4(r7)
> > 0x00c01370:  7cc907a4  tlbwehi  r6, r9
> > 0x00c01374:  84c80004  lwzu     r6, 4(r8)
> > 0x00c01378:  7cc90fa4  tlbwelo  r6, r9
> > 0x00c0137c:  84ca0004  lwzu     r6, 4(r10)
> > 0x00c01380:  7cc917a4  tlbwehi  r6, r9
> > 0x00c01384:  39290001  addi     r9, r9, 1
> > 0x00c01388:  4200ffe4  bdnz     0xc0136c
> > ----------------
> > IN:
> > 0x00c0138c:  4c00012c  isync
> >
> > All the tlbwe instructions are executed in the same TB. MMU tracing
> > shows that the first tlbwehi creates a new valid(!) TLB entry for
> > 0x00000000-0x100000000 with a garbage RPN, because the tlbwelo has
> > not run yet.
> >
> > What's happening in the bad case is that the translator breaks
> > and "re-fetches" instructions in the middle of that sequence, and
> > that's where the bogus translation causes 0 to be returned. In the
> > good case, the whole block is executed from the same fetch, which
> > creates correct translations.
> >
> > So it looks like a MorphOS bug. The can-do-io change just happens
> > to cause it to re-fetch in that place, but that could happen for
> > a number of reasons, so you can't rely on the TLB *only* changing or
> > ifetch *only* re-fetching at a sync point like isync.
> >
> > I would expect code like this to write an invalid entry with tlbwehi,
> > then tlbwelo to set the correct RPN, then make the entry valid with
> > the second tlbwehi. It would probably fix the bug if you just did the
> > first tlbwehi with r6=0 (or at least without the 0x200 bit set).
>
> Revisiting this, I've found in the docs that the PPC440 has shadow TLBs,
> so this code can rely on the TLB not being invalidated until isync. It
> works on a real machine but breaks on QEMU.

I never programmed for 440, but it's unclear to me from the docs how
much you can rely on this programmatically (you would have to ensure
no page crossings, disable interrupts, hope for no machine check,
etc.).

But it does break real software, so whether or not it follows the
exact letter of the law, it would be good to fix.

> We would need to either make sure the TB runs until the sync or somehow
> emulate the shadow TLB. I've experimented with the latter but could not
> make it work (and, unexpectedly, keeping a cache of the most recently used
> entries is slower than always searching through all TLB entries as done
> now, so I've abandoned that idea). The problem is that an entry is
> modified by multiple tlbwe instructions, but these can come in any order
> (and sometimes only one of them is done; invalidating an entry, for
> example, seems to do only one write), so I don't know when to copy the
> new entry to the TLB and when to wait for more parts and keep the old
> one. Any idea how to fix this?

I guess it depends on what the important common cases are and how
faithfully you want to model the hardware behaviour.

How are you trying to emulate the shadow TLB? I attached a really quick
hack to see what that would look like. It models the hardware-filled
TLB structure in front of the machine's software TLB. It's not a
perfect model but might be enough; one downside is that it flushes the
entire QEMU TLB for any BookE TLB entry change.

The other way to go might be to keep a structure containing the list of
outstanding BookE TLB modifications, and replay that into the TLB on
sync events. That way your QEMU TLB refill path has no extra data
structure to look up or maintain, and you could do more precise
flushing of the QEMU TLB when you apply the changes.

The difficulty would be that TLB instructions become more complicated
and expensive (reads can't just go to the TLB; they would have to
find the most recent change, etc.). But maybe that is the better
tradeoff if lookups are much more common than software TLB
instructions.

Thanks,
Nick

---
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 2015e603d4..afbc766fd1 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -377,6 +377,12 @@ union ppc_tlb_t {
     ppc6xx_tlb_t *tlb6;
     ppcemb_tlb_t *tlbe;
     ppcmas_tlb_t *tlbm;
+
+    /* 440 shadow TLB */
+    ppcemb_tlb_t ishadow[4];
+    int ishadow_idx;
+    ppcemb_tlb_t dshadow[8];
+    int dshadow_idx;
 };
 
 /* possible TLB variants */
diff --git a/target/ppc/helper_regs.c b/target/ppc/helper_regs.c
index 02076e96fb..3207b594e1 100644
--- a/target/ppc/helper_regs.c
+++ b/target/ppc/helper_regs.c
@@ -363,10 +363,22 @@ void store_40x_sler(CPUPPCState *env, uint32_t val)
     env->spr[SPR_405_SLER] = val;
 }
 
+void ppc4xx_tlb_invalidate_shadow(CPUPPCState *env);
 void check_tlb_flush(CPUPPCState *env, bool global)
 {
     CPUState *cs = env_cpu(env);
 
+    if (env->mmu_model == POWERPC_MMU_SOFT_4xx) {
+        assert(!(env->tlb_need_flush & TLB_NEED_GLOBAL_FLUSH));
+
+        if (env->tlb_need_flush & TLB_NEED_LOCAL_FLUSH) {
+            env->tlb_need_flush &= ~TLB_NEED_LOCAL_FLUSH;
+            ppc4xx_tlb_invalidate_shadow(env);
+            tlb_flush(cs);
+        }
+        return;
+    }
+
     /* Handle global flushes first */
     if (global && (env->tlb_need_flush & TLB_NEED_GLOBAL_FLUSH)) {
         env->tlb_need_flush &= ~TLB_NEED_GLOBAL_FLUSH;
diff --git a/target/ppc/mmu-booke.c b/target/ppc/mmu-booke.c
index 55e5dd7c6b..31509be39b 100644
--- a/target/ppc/mmu-booke.c
+++ b/target/ppc/mmu-booke.c
@@ -74,55 +74,91 @@ int mmu40x_get_physical_address(CPUPPCState *env, hwaddr *raddr, int *prot,
 {
     ppcemb_tlb_t *tlb;
     int i, ret, zsel, zpr, pr;
+    uint32_t pid = env->spr[SPR_40x_PID];
 
     ret = -1;
-    pr = FIELD_EX64(env->msr, MSR, PR);
+
+    /* Check "shadow TLBs" first */
+    if (access_type == MMU_INST_FETCH) {
+        for (i = 0; i < 4; i++) {
+            tlb = &env->tlb.ishadow[i];
+            if (ppcemb_tlb_check(env, tlb, raddr, address, pid, -i)) {
+                goto found;
+            }
+        }
+    } else {
+        for (i = 0; i < 8; i++) {
+            tlb = &env->tlb.dshadow[i];
+            if (ppcemb_tlb_check(env, tlb, raddr, address, pid, -i)) {
+                goto found;
+            }
+        }
+    }
+    /* Then check main (software visible) TLB */
     for (i = 0; i < env->nb_tlb; i++) {
         tlb = &env->tlb.tlbe[i];
-        if (!ppcemb_tlb_check(env, tlb, raddr, address,
-                              env->spr[SPR_40x_PID], i)) {
-            continue;
+        if (ppcemb_tlb_check(env, tlb, raddr, address, pid, i)) {
+            goto found_main;
         }
-        zsel = (tlb->attr >> 4) & 0xF;
-        zpr = (env->spr[SPR_40x_ZPR] >> (30 - (2 * zsel))) & 0x3;
-        qemu_log_mask(CPU_LOG_MMU,
-                      "%s: TLB %d zsel %d zpr %d ty %d attr %08x\n",
-                      __func__, i, zsel, zpr, access_type, tlb->attr);
-        /* Check execute enable bit */
-        switch (zpr) {
-        case 0x2:
-            if (pr != 0) {
-                goto check_perms;
-            }
-            /* fall through */
-        case 0x3:
-            /* All accesses granted */
-            *prot = PAGE_RWX;
-            ret = 0;
-            break;
+    }
+    goto out;
 
-        case 0x0:
-            if (pr != 0) {
-                /* Raise Zone protection fault.  */
-                env->spr[SPR_40x_ESR] = 1 << 22;
-                *prot = 0;
-                ret = -2;
-                break;
-            }
-            /* fall through */
-        case 0x1:
-check_perms:
-            /* Check from TLB entry */
-            *prot = tlb->prot;
-            if (check_prot_access_type(*prot, access_type)) {
-                ret = 0;
-            } else {
-                env->spr[SPR_40x_ESR] = 0;
-                ret = -2;
-            }
+found_main:
+    /* Shadow must be reloaded, FIFO replacement */
+    if (access_type == MMU_INST_FETCH) {
+        env->tlb.ishadow[env->tlb.ishadow_idx] = *tlb;
+        env->tlb.ishadow_idx++;
+        env->tlb.ishadow_idx %= 4;
+    } else {
+        env->tlb.dshadow[env->tlb.dshadow_idx] = *tlb;
+        env->tlb.dshadow_idx++;
+        env->tlb.dshadow_idx %= 8;
+    }
+
+found:
+    pr = FIELD_EX64(env->msr, MSR, PR);
+
+    zsel = (tlb->attr >> 4) & 0xF;
+    zpr = (env->spr[SPR_40x_ZPR] >> (30 - (2 * zsel))) & 0x3;
+    qemu_log_mask(CPU_LOG_MMU,
+                  "%s: TLB %d zsel %d zpr %d ty %d attr %08x\n",
+                  __func__, i, zsel, zpr, access_type, tlb->attr);
+    /* Check execute enable bit */
+    switch (zpr) {
+    case 0x2:
+        if (pr != 0) {
+            goto check_perms;
+        }
+        /* fall through */
+    case 0x3:
+        /* All accesses granted */
+        *prot = PAGE_RWX;
+        ret = 0;
+        break;
+
+    case 0x0:
+        if (pr != 0) {
+            /* Raise Zone protection fault.  */
+            env->spr[SPR_40x_ESR] = 1 << 22;
+            *prot = 0;
+            ret = -2;
             break;
         }
+        /* fall through */
+    case 0x1:
+check_perms:
+        /* Check from TLB entry */
+        *prot = tlb->prot;
+        if (check_prot_access_type(*prot, access_type)) {
+            ret = 0;
+        } else {
+            env->spr[SPR_40x_ESR] = 0;
+            ret = -2;
+        }
+        break;
     }
+
+out:
     qemu_log_mask(CPU_LOG_MMU, "%s: access %s " TARGET_FMT_lx " => "
                   HWADDR_FMT_plx " %d %d\n",  __func__,
                   ret < 0 ? "refused" : "granted", address,
diff --git a/target/ppc/mmu_helper.c b/target/ppc/mmu_helper.c
index b0a0676beb..502ddf65b6 100644
--- a/target/ppc/mmu_helper.c
+++ b/target/ppc/mmu_helper.c
@@ -108,6 +108,22 @@ static void ppc6xx_tlb_store(CPUPPCState *env, target_ulong EPN, int way,
 }
 
 /* Helpers specific to PowerPC 40x implementations */
+void ppc4xx_tlb_invalidate_shadow(CPUPPCState *env);
+void ppc4xx_tlb_invalidate_shadow(CPUPPCState *env)
+{
+    ppcemb_tlb_t *tlb;
+    int i;
+
+    for (i = 0; i < 4; i++) {
+        tlb = &env->tlb.ishadow[i];
+        tlb->prot &= ~PAGE_VALID;
+    }
+    for (i = 0; i < 8; i++) {
+        tlb = &env->tlb.dshadow[i];
+        tlb->prot &= ~PAGE_VALID;
+    }
+}
+
 static inline void ppc4xx_tlb_invalidate_all(CPUPPCState *env)
 {
     ppcemb_tlb_t *tlb;
@@ -117,6 +133,7 @@ static inline void ppc4xx_tlb_invalidate_all(CPUPPCState *env)
         tlb = &env->tlb.tlbe[i];
         tlb->prot &= ~PAGE_VALID;
     }
+    ppc4xx_tlb_invalidate_shadow(env);
     tlb_flush(env_cpu(env));
 }
 
@@ -719,6 +736,7 @@ target_ulong helper_4xx_tlbre_lo(CPUPPCState *env, target_ulong entry)
     return ret;
 }
 
+#if 0
 static void ppcemb_tlb_flush(CPUState *cs, ppcemb_tlb_t *tlb)
 {
     unsigned mmu_idx = 0;
@@ -736,6 +754,7 @@ static void ppcemb_tlb_flush(CPUState *cs, ppcemb_tlb_t *tlb)
     tlb_flush_range_by_mmuidx(cs, tlb->EPN, tlb->size, mmu_idx,
                               TARGET_LONG_BITS);
 }
+#endif
 
 void helper_4xx_tlbwe_hi(CPUPPCState *env, target_ulong entry,
                          target_ulong val)
@@ -753,7 +772,7 @@ void helper_4xx_tlbwe_hi(CPUPPCState *env, target_ulong entry,
         qemu_log_mask(CPU_LOG_MMU, "%s: invalidate old TLB %d start "
                       TARGET_FMT_lx " end " TARGET_FMT_lx "\n", __func__,
                       (int)entry, tlb->EPN, tlb->EPN + tlb->size);
-        ppcemb_tlb_flush(cs, tlb);
+        env->tlb_need_flush |= TLB_NEED_LOCAL_FLUSH;
     }
     tlb->size = booke_tlb_to_page_size((val >> PPC4XX_TLBHI_SIZE_SHIFT)
                                        & PPC4XX_TLBHI_SIZE_MASK);
@@ -792,7 +811,7 @@ void helper_4xx_tlbwe_hi(CPUPPCState *env, target_ulong entry,
 void helper_4xx_tlbwe_lo(CPUPPCState *env, target_ulong entry,
                          target_ulong val)
 {
-    CPUState *cs = env_cpu(env);
+//    CPUState *cs = env_cpu(env);
     ppcemb_tlb_t *tlb;
 
     qemu_log_mask(CPU_LOG_MMU, "%s entry %i val " TARGET_FMT_lx "\n",
@@ -804,7 +823,7 @@ void helper_4xx_tlbwe_lo(CPUPPCState *env, target_ulong entry,
         qemu_log_mask(CPU_LOG_MMU, "%s: invalidate old TLB %d start "
                       TARGET_FMT_lx " end " TARGET_FMT_lx "\n", __func__,
                       (int)entry, tlb->EPN, tlb->EPN + tlb->size);
-        ppcemb_tlb_flush(cs, tlb);
+        env->tlb_need_flush |= TLB_NEED_LOCAL_FLUSH;
     }
     tlb->attr = val & PPC4XX_TLBLO_ATTR_MASK;
     tlb->RPN = val & PPC4XX_TLBLO_RPN_MASK;
@@ -865,7 +884,7 @@ void helper_440_tlbwe(CPUPPCState *env, uint32_t word, target_ulong entry,
         qemu_log_mask(CPU_LOG_MMU, "%s: invalidate old TLB %d start "
                       TARGET_FMT_lx " end " TARGET_FMT_lx "\n", __func__,
                       (int)entry, tlb->EPN, tlb->EPN + tlb->size);
-        ppcemb_tlb_flush(env_cpu(env), tlb);
+        env->tlb_need_flush |= TLB_NEED_LOCAL_FLUSH;
     }
 
     switch (word) {



end of thread [~2024-05-28 3:31 UTC]

Thread overview: 10+ messages
2024-02-27 16:47 TCG change broke MorphOS boot on sam460ex BALATON Zoltan
2024-02-27 18:15 ` Philippe Mathieu-Daudé
2024-02-27 19:48   ` BALATON Zoltan
2024-03-21 18:41   ` BALATON Zoltan
2024-04-02 11:32     ` BALATON Zoltan
2024-04-03  5:15       ` Nicholas Piggin
2024-04-03 22:23         ` BALATON Zoltan
2024-05-27 22:23         ` BALATON Zoltan
2024-05-27 22:55           ` BALATON Zoltan
2024-05-28  3:30           ` Nicholas Piggin
