Re: Display update issue on M1 Macs

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Akihiko Odaki <akihiko.odaki@gmail.com>
To: BALATON Zoltan <balaton@eik.bme.hu>
Cc: Peter Maydell <peter.maydell@linaro.org>,
	qemu-devel@nongnu.org, Gerd Hoffmann <kraxel@redhat.com>,
	Joelle van Dyne <j@getutm.app>
Subject: Re: Display update issue on M1 Macs
Date: Fri, 3 Feb 2023 19:16:28 +0900	[thread overview]
Message-ID: <499a1290-1e90-20ef-881d-9434cbce8115@gmail.com> (raw)
In-Reply-To: <483662d9-2565-db44-0e19-fb9128f28bde@eik.bme.hu>

On 2023/02/02 19:51, BALATON Zoltan wrote:
> On Tue, 31 Jan 2023, BALATON Zoltan wrote:
>> On Tue, 31 Jan 2023, Akihiko Odaki wrote:
> [...]
> To summarise previous discussion:
> 
> - There's a problem on Apple M1 Macs with sm501 and ati-vga 2d accel 
> functions drawing from device model into the video memory of the 
> emulated card which is not shown on screen when the display update 
> callback is called from another thread. This works on x86_64 host so I 
> suspect it may be related to missing memory synchronisation that ARM may 
> need.
> 
> - This can be reproduced running AmigaOS4 on sam460ex or MorphOS (demo 
> iso downliadable from their web site) on sam460ex, pegasos2 or 
> mac99,via=pmu with -device ati-vga,romfile="" as described here: 
> http://zero.eik.bme.hu/~balaton/qemu/amiga/
> 
> - I can't test it myself lacking hardware so I have to rely on reports 
> from people who have this hardware so there may be some uncertainity in 
> the info I get.
> 
> - We have confirmed it's not related to a known race condition as 
> disabling dirty tracking and always doing full updates of whole screen 
> did not fix it:
> 
>>>>>>>>> But there is an exception: 
>>>>>>>>> memory_region_snapshot_and_clear_dirty() releases iothread 
>>>>>>>>> lock, and that broke raspi3b display device:
>>>>>>>>> https://lore.kernel.org/qemu-devel/CAFEAcA9odnPo2LPip295Uztri7JfoVnQbkJ=Wn+k8dQneB_ynQ@mail.gmail.com/T/
>>>>>>>>>
>>>>>>>>> It is unexpected that gfx_update() callback releases iothread 
>>>>>>>>> lock so it may break things in peculiar ways.
>>>>>>>>>
>>>>>>>>> Peter, is there any change in the situation regarding the race 
>>>>>>>>> introduced by memory_region_snapshot_and_clear_dirty()?
>>>>>>>>>
>>>>>>>>> For now, to workaround the issue, I think you can create 
>>>>>>>>> another mutex and make the entire sm501_2d_engine_write() and 
>>>>>>>>> sm501_update_display() critical sections.
>>>>>>>>
>>>>>>>> Interesting thread but not sure it's the same problem so this 
>>>>>>>> workaround may not be enough to fix my issue. Here's a video 
>>>>>>>> posted by one of the people who reported it showing the problem 
>>>>>>>> on M1 Mac:
>>>>>>>>
>>>>>>>> https://www.youtube.com/watch?v=FDqoNbp6PQs
>>>>>>>>
>>>>>>>> and here's how it looks like on other machines:
>>>>>>>>
>>>>>>>> https://www.youtube.com/watch?v=ML7-F4HNFKQ
>>>>>>>>
>>>>>>>> There are also videos showing it running on RPi 4 and G5 Mac 
>>>>>>>> without this issue so it seems to only happen on Apple Silicon 
>>>>>>>> M1 Macs. What's strange is that graphics elements are not just 
>>>>>>>> delayed which I think should happen with missing thread 
>>>>>>>> synchronisation where the update callback would miss some pixels 
>>>>>>>> rendered during it's running but subsequent update callbacks 
>>>>>>>> would eventually draw those, woudn't they? Also setting 
>>>>>>>> full_update to 1 in sm501_update_display() callback to disable 
>>>>>>>> dirty tracking does not fix the problem. So it looks like as if 
>>>>>>>> sm501_2d_operation() running on one CPU core only writes data to 
>>>>>>>> the local cache of that core which sm501_update_display() 
>>>>>>>> running on other core can't see, so maybe some cache 
>>>>>>>> synchronisation is needed in memory_region_set_dirty() or if 
>>>>>>>> that's already there maybe I should call it for all changes not 
>>>>>>>> only those in the visible display area? I'm still not sure I 
>>>>>>>> understand the problem and don't know what could be a fix for it 
>>>>>>>> so anything to test to identify the issue better might also 
>>>>>>>> bring us closer to a solution.
>>>>>>>
>>>>>>> If you set full_update to 1, you may also comment out 
>>>>>>> memory_region_snapshot_and_clear_dirty() and 
>>>>>>> memory_region_snapshot_get_dirty() to avoid the iothread mutex 
>>>>>>> being unlocked. The iothread mutex should ensure cache coherency 
>>>>>>> as well.
>>>>>>>
>>>>>>> But as you say, it's weird that the rendered result is not just 
>>>>>>> delayed but missed. That may imply other possibilities (e.g., the 
>>>>>>> results are overwritten by someone else). If the problem persists 
>>>>>>> after commenting out memory_region_snapshot_and_clear_dirty() and 
>>>>>>> memory_region_snapshot_get_dirty(), I think you can assume the 
>>>>>>> inter-thread coherency between sm501_2d_operation() and 
>>>>>>> sm501_update_display() is not causing the problem.
>>>>>>
>>>>>> I've asked people who reported and can reproduce it to test this 
>>>>>> but it did not change anything so confirmed it's not that race 
>>>>>> condition but looks more like some cache inconsistency maybe. Any 
>>>>>> other ideas?
>>>>>
>>>>> I can come up with two important differences between x86 and Arm 
>>>>> which can affect the execution of QEMU:
>>>>> 1. Memory model. Arm uses a memory model more relaxed than x86 so 
>>>>> it is more sensitive for synchronization failures among threads.
>>>>> 2. Different instructions. TCG uses JIT so differences in 
>>>>> instructions matter.
>>>>>
>>>>> We should be able to exclude 1) as a potential cause of the 
>>>>> problem. iothread mutex should take care of race condition and even 
>>>>> cache coherency problem; mutex includes memory barrier functionality.
> [...]
>>>>> For difference 2), you may try to use TCI. You can find details of 
>>>>> TCI in tcg/tci/README.
>>>>
>>>> This was tested and also with TCI got the same results just much 
>>>> slower.
>>>>
>>>>> The common sense tells, however, the memory model is usually the 
>>>>> cause of the problem when you see behavioral differences between 
>>>>> x86 and Arm, and TCG should work fine with both of x86 and Arm as 
>>>>> they should have been tested well.
> [...]
>>> Fortunately macOS provides Rosetta 2 for x86 emulation on Apple M1, 
>>> which makes it possible to compare x86 and Arm without concerning the 
>>> difference of the microarchitecture.
>>
>> We've tried that before and even running x86 QEMU on M1 with Rosetta 2 
>> it was the same so it's probably not something about the code itself 
>> but how it's
> 
> As this was odd I've asked to re-test this and now I'm told at least 
> QEMU 5.1 x86_64 build from emaculation.com is working with Rosetta on M1 
> Mac so this suggests it may be a problem with memory sync but still 
> don't know where and what to try. We're now try newer X86_64 builds to 
> see if it broke somewhere along the way.
> 
> Anybody else with an M1 Mac wants to help testing? Can you reproduce the 
> same with UTM with MorphOS and ati-vga? Here's what I've got showing the 
> problem: https://www.youtube.com/watch?v=j5Ag5_Yq-Mk
> 
> Regards,
> BALATON Zoltan

Hi,

I finally reproduced the issue with MorphOS and ati-vga and figured out 
its cause.

The problem is that pixman_blt() is disabled because its backend is 
written in GNU assembly, and GNU assembler is not available on macOS. 
There is no fallback written in C, unfortunately. The issue is tracked 
by the upstream at:
https://gitlab.freedesktop.org/pixman/pixman/-/issues/59

I hit the same problem on Asahi Linux, which is based on Arch Linux ARM. 
It is because Arch Linux copied PKGBUILD from x86 Arch Linux, which 
disables Arm backends. It is easy to enable the backend for the platform 
so I proposed a change at:
https://github.com/archlinuxarm/PKGBUILDs/pull/1985

Regards,
Akihiko Odaki

next prev parent reply	other threads:[~2023-02-03 10:17 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-04 23:24 Display update issue on M1 Macs BALATON Zoltan
2023-01-13 13:43 ` BALATON Zoltan
2023-01-14  2:41   ` Akihiko Odaki
2023-01-14 18:11     ` BALATON Zoltan
2023-01-19 13:10       ` Akihiko Odaki
2023-01-22 23:28         ` BALATON Zoltan
2023-01-28  4:01           ` Akihiko Odaki
2023-01-30 23:58             ` BALATON Zoltan
2023-01-31  7:37               ` Akihiko Odaki
2023-01-31 14:15                 ` BALATON Zoltan
2023-02-02 10:51                   ` BALATON Zoltan
2023-02-03 10:16                     ` Akihiko Odaki [this message]
2023-02-03 13:45                       ` BALATON Zoltan
2023-02-04  5:19                         ` Akihiko Odaki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=499a1290-1e90-20ef-881d-9434cbce8115@gmail.com \
    --to=akihiko.odaki@gmail.com \
    --cc=balaton@eik.bme.hu \
    --cc=j@getutm.app \
    --cc=kraxel@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).