From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from zen.linaroharston ([51.148.130.216])
        by smtp.gmail.com with ESMTPSA id o4sm516544wry.80.2021.11.18.09.13.11
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Thu, 18 Nov 2021 09:13:11 -0800 (PST)
Received: from zen (localhost [127.0.0.1])
        by zen.linaroharston (Postfix) with ESMTP id B93821FF96;
        Thu, 18 Nov 2021 17:13:10 +0000 (GMT)
References:
User-agent: mu4e 1.7.5; emacs 28.0.60
From: Alex Bennée
To: Idan Horowitz
Cc: qemu-arm@nongnu.org
Subject: Re: DSB does not seem to wait for TLBI completion
Date: Thu, 18 Nov 2021 17:01:45 +0000
In-reply-to:
Message-ID: <87fsrtv13t.fsf@linaro.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-TUID: cVYi3lPMS464

Idan Horowitz writes:

> Hey, I'm running a bare-metal image on QEMU 6.1 and I've encountered the following scenario:
> After receiving a data abort and mapping in the correct page I try to invalidate the corresponding TLB entry using the following assembly
> sequence:
>
> dsb ish
> tlbi vaae1is, x0
> dsb sy
>
> Unfortunately this does not seem to have any immediate effect, as upon returning back to the source of the exception I immediately hit
> the same Data Abort. This cycle of receiving a Data Abort and then updating the mapping continues for 100s of times, until the TLB finally
> updates to the correct mapping.
>
> As part of my testing I also tried to replace the Inner Shareable tlbi I showed above with the base version that only invalidates the current
> PE's TLB entry (tlbi vaae1, x0) this seemed to fix the issue, which made me suspect something was up with QEMU itself, as the inner
> shareable version of the instruction is supposed to invalidate the current PE's TLB entry as well as the others', so if the non-shareable
> version works the inner-shareable one should work as well.
>
> After digging a bit through the code I saw that the non-shareable version calls 'tlb_flush_page_bits_by_mmuidx' which eventually calls
> 'tlb_flush_range_by_mmuidx_async_0' synchronously, while the inner-shareable version calls
> 'tlb_flush_page_bits_by_mmuidx_all_cpus_synced' which also eventually calls 'tlb_flush_range_by_mmuidx_async_0', but asynchronously
> this time.
>
> Moving on to the implementation of the DSB instruction I saw that it is translated into an 'INDEX_op_mb' operation, but looking at the
> interpreter handling of that instruction, it simply performs a memory barrier, it does not handle any of the async tasks in the work queue
> (at least explicitly) so from my (admittedly basic) understanding of the code it looks like QEMU's implementation of the DSB instruction
> does not wait until the TLB flush has finished, as required.

If we exit the translation block, as the code for ISB does, that will
give all the queued work a chance to complete. If we have done a _synced
call this includes bringing all vCPUs to a halt before flushing and
restarting.

> If anyone can point me in the right direction it would be greatly
> appreciated.

Try:

modified   target/arm/translate-a64.c
@@ -1553,6 +1553,7 @@ static void handle_sync(DisasContext *s, uint32_t insn,
             break;
         }
         tcg_gen_mb(bar);
+        gen_goto_tb(s, 0, s->base.pc_next);
         return;
     case 6: /* ISB */

and see if that helps. I suspect that to be efficient we should probably
do some more decode on the instruction to make that decision, as ending
a block for every DMB/DSB might be overkill and impact performance.

I don't think we have a way to track pending state awaiting a DSB
instruction in the translator, but in theory we could. I thought
(ri->type & ARM_CP_IO) for system registers would ensure an end of
block, but apparently that is only for icount.

>
> Thanks, Idan Horowitz.

-- 
Alex Bennée