* DSB does not seem to wait for TLBI completion
@ 2021-11-18 15:19 Idan Horowitz
  2021-11-18 17:01 ` Alex Bennée
  2021-11-18 17:32 ` Peter Maydell
  0 siblings, 2 replies; 9+ messages in thread
From: Idan Horowitz @ 2021-11-18 15:19 UTC (permalink / raw)
  To: qemu-arm


Hey, I'm running a bare-metal image on QEMU 6.1 and I've encountered the
following scenario: after receiving a Data Abort and mapping in the
correct page, I try to invalidate the corresponding TLB entry using the
following assembly sequence:

dsb ish
tlbi vaae1is, x0
dsb sy

Unfortunately, this does not seem to have any immediate effect: upon
returning to the source of the exception I immediately hit the same Data
Abort. This cycle of taking a Data Abort and then updating the mapping
repeats hundreds of times, until the TLB finally picks up the correct
mapping.

As part of my testing I also tried replacing the Inner Shareable TLBI
shown above with the base version that only invalidates the current PE's
TLB entry (tlbi vaae1, x0). This seemed to fix the issue, which made me
suspect something was up with QEMU itself: the Inner Shareable version of
the instruction is supposed to invalidate the entry in the current PE's
TLB as well as in the others', so if the non-shareable version works, the
Inner Shareable one should work as well.

After digging a bit through the code, I saw that the non-shareable version
calls 'tlb_flush_page_bits_by_mmuidx', which eventually calls
'tlb_flush_range_by_mmuidx_async_0' synchronously, while the Inner
Shareable version calls 'tlb_flush_page_bits_by_mmuidx_all_cpus_synced',
which also eventually calls 'tlb_flush_range_by_mmuidx_async_0', but
asynchronously this time.

Moving on to the implementation of the DSB instruction, I saw that it is
translated into an 'INDEX_op_mb' operation. The interpreter's handling of
that op simply performs a memory barrier; it does not (at least not
explicitly) process any of the async tasks in the work queue. So, from my
(admittedly basic) understanding of the code, it looks like QEMU's
implementation of DSB does not wait until the TLB flush has finished, as
required.

If anyone can point me in the right direction it would be greatly
appreciated.

Thanks, Idan Horowitz.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: DSB does not seem to wait for TLBI completion
  2021-11-18 15:19 DSB does not seem to wait for TLBI completion Idan Horowitz
@ 2021-11-18 17:01 ` Alex Bennée
  2021-11-21  7:52   ` Idan Horowitz
  2021-11-18 17:32 ` Peter Maydell
  1 sibling, 1 reply; 9+ messages in thread
From: Alex Bennée @ 2021-11-18 17:01 UTC (permalink / raw)
  To: Idan Horowitz; +Cc: qemu-arm


Idan Horowitz <idan.horowitz@gmail.com> writes:

> Hey, I'm running a bare-metal image on QEMU 6.1 and I've encountered the following scenario:
> After receiving a data abort and mapping in the correct page I try to invalidate the corresponding TLB entry using the following assembly
> sequence:
>
> dsb ish
> tlbi vaae1is, x0
> dsb sy
>
> Unfortunately this does not seem to have any immediate effect, as upon returning back to the source of the exception I immediately hit
> the same Data Abort. This cycle of receiving a Data Abort and then updating the mapping continues for 100s of times, until the TLB finally
> updates to the correct mapping.
>
> As part of my testing I also tried to replace the Inner Shareable tlbi I showed above with the base version that only invalidates the current
> PE's TLB entry (tlbi vaae1, x0) this seemed to fix the issue, which made me suspect something was up with QEMU itself, as the inner
> shareable version of the instruction is supposed to invalidate the current PE's TLB entry as well as the others', so if the non-shareable
> version works the inner-shareable one should work as well.
>
> After digging a bit through the code I saw that the non-shareable version calls 'tlb_flush_page_bits_by_mmuidx' which eventually calls
> 'tlb_flush_range_by_mmuidx_async_0' synchronously, while the inner-shareable version calls
> 'tlb_flush_page_bits_by_mmuidx_all_cpus_synced' which also eventually calls 'tlb_flush_range_by_mmuidx_async_0', but asynchronously
> this time.
>
> Moving on to the implementation of the DSB instruction I saw that it is translated into an 'INDEX_op_mb' operation, but looking at the
> interpreter handling of that instruction, it simply performs a memory barrier, it does not handle any of the async tasks in the work queue
> (at least explicitly) so from my (admittedly basic) understanding of the code it looks like QEMU's implementation of the DSB instruction
> does not wait until the TLB flush has finished, as required.

If we exit the translation block, like the code for ISB does, then that
will give all the queued work a chance to complete. If we have done a
_synced call, this includes bringing all vCPUs to a halt before flushing
and restarting.

> If anyone can point me in the right direction it would be greatly
> appreciated.

Try:

modified   target/arm/translate-a64.c
@@ -1553,6 +1553,7 @@ static void handle_sync(DisasContext *s, uint32_t insn,
             break;
         }
         tcg_gen_mb(bar);
+        gen_goto_tb(s, 0, s->base.pc_next);
         return;
     case 6: /* ISB */

and see if that helps. I suspect that to be efficient we should probably
do some more decoding of the instruction to make that decision, as ending
a block for every DMB/DSB might be overkill and hurt performance.

I don't think we have a way to track pending state awaiting a DSB
instruction in the translator, but in theory we could. I thought
(ri->type & ARM_CP_IO) for system registers would ensure an end of block,
but apparently that only applies in icount mode.

>
> Thanks, Idan Horowitz.


-- 
Alex Bennée


* Re: DSB does not seem to wait for TLBI completion
  2021-11-18 15:19 DSB does not seem to wait for TLBI completion Idan Horowitz
  2021-11-18 17:01 ` Alex Bennée
@ 2021-11-18 17:32 ` Peter Maydell
  2021-11-18 18:50   ` Alex Bennée
  2021-11-21  7:57   ` Idan Horowitz
  1 sibling, 2 replies; 9+ messages in thread
From: Peter Maydell @ 2021-11-18 17:32 UTC (permalink / raw)
  To: Idan Horowitz; +Cc: qemu-arm, Alex Bennée

On Thu, 18 Nov 2021 at 15:46, Idan Horowitz <idan.horowitz@gmail.com> wrote:
>
> Hey, I'm running a bare-metal image on QEMU 6.1 and I've encountered the following scenario:
> After receiving a data abort and mapping in the correct page I try to invalidate the corresponding TLB entry using the following assembly sequence:
>
> dsb ish
> tlbi vaae1is, x0
> dsb sy

Do you have a repro case you can give us?
Does your setup involve SMP, or is this all on a single CPU?

> Unfortunately this does not seem to have any immediate effect, as upon returning back to the source of the exception I immediately hit the same Data Abort. This cycle of receiving a Data Abort and then updating the mapping continues for 100s of times, until the TLB finally updates to the correct mapping.

Note that the architecture says that the DSB guarantees the TLB
maintenance operation has finished for *other* processors, but if you
want to guarantee it has finished for the processor which executed the
TLBI, then you must do a DSB followed by a "context synchronization
event", e.g. an ISB insn or a return from exception. (See the v8 Arm ARM
DDI0487G.b page D5-2833.) It sounds from your description as if a
return-from-exception is done on the CPU that executed the TLBI, though...

-- PMM


* Re: DSB does not seem to wait for TLBI completion
  2021-11-18 17:32 ` Peter Maydell
@ 2021-11-18 18:50   ` Alex Bennée
  2021-11-21  7:57   ` Idan Horowitz
  1 sibling, 0 replies; 9+ messages in thread
From: Alex Bennée @ 2021-11-18 18:50 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Idan Horowitz, qemu-arm


Peter Maydell <peter.maydell@linaro.org> writes:

> On Thu, 18 Nov 2021 at 15:46, Idan Horowitz <idan.horowitz@gmail.com> wrote:
>>
>> Hey, I'm running a bare-metal image on QEMU 6.1 and I've encountered the following scenario:
>> After receiving a data abort and mapping in the correct page I try to invalidate the corresponding TLB entry using the following assembly sequence:
>>
>> dsb ish
>> tlbi vaae1is, x0
>> dsb sy
>
> Do you have a repro case you can give us?
> Does your setup involve SMP, or is this all on a single CPU ?

I had started writing an explicit test case for all of this in:

  https://github.com/stsquad/kvm-unit-tests/blob/712eb3a287df24cdeff00ef966d68aef6ff2b8eb/arm/tlbflush-data.c

but it's been a while and I need to debug what I was thinking when I
wrote it. However, if we can get a test case into kvm-unit-tests, that
would be great.

>
>> Unfortunately this does not seem to have any immediate effect, as
>> upon returning back to the source of the exception I immediately hit
>> the same Data Abort. This cycle of receiving a Data Abort and then
>> updating the mapping continues for 100s of times, until the TLB
>> finally updates to the correct mapping.
>
> Note that the architecture says that the DSB will guarantee the
> TLB maintenance operation to be finished for *other* processors,
> but that if you want to guarantee it to be finished for the
> processor which executed the TLBI then you must do a DSB followed
> by a "context synchronization event", e.g. an ISB insn, or return
> from exception. (See the v8 Arm ARM DDI0487G.b page D5-2833.)
> It sounds from your description as if a return-from-exception
> is done on the CPU that executed the TLBI, though...
>
> -- PMM


-- 
Alex Bennée


* Re: DSB does not seem to wait for TLBI completion
  2021-11-18 17:01 ` Alex Bennée
@ 2021-11-21  7:52   ` Idan Horowitz
  2021-12-01 15:40     ` Idan Horowitz
  0 siblings, 1 reply; 9+ messages in thread
From: Idan Horowitz @ 2021-11-21  7:52 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-arm

Alex Bennée <alex.bennee@linaro.org> wrote:
> If we exit the translation block like the code for ISB does then that
> will give a chance for all the queued work to complete. If we have done
> a _synced call this includes bringing all vCPUs to a halt before
> flushing and restarting.
>
> Try:
>
> modified   target/arm/translate-a64.c
> @@ -1553,6 +1553,7 @@ static void handle_sync(DisasContext *s, uint32_t insn,
>              break;
>          }
>          tcg_gen_mb(bar);
> +        gen_goto_tb(s, 0, s->base.pc_next);
>          return;
>      case 6: /* ISB */
>
> and see if that helps. I suspect to be efficient we should probably do
> some more decode on the instruction to make that decision as ending a
> block for every DMB/DSB might be overkill and impact performance.
>
> I don't think we have a way to track pending state awaiting a DSB
> instruction in the translator but in theory we could. I thought
> (ri->type & ARM_CP_IO) for system registers would ensure an end of block
> but apparently that is only for icount.
>

I am actually running in icount mode (-icount shift=10 specifically),
and neither adding the translation block exit nor just using an ISB
directly seems to help, unfortunately.

> --
> Alex Bennée

Idan Horowitz


* Re: DSB does not seem to wait for TLBI completion
  2021-11-18 17:32 ` Peter Maydell
  2021-11-18 18:50   ` Alex Bennée
@ 2021-11-21  7:57   ` Idan Horowitz
  1 sibling, 0 replies; 9+ messages in thread
From: Idan Horowitz @ 2021-11-21  7:57 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-arm, Alex Bennée

Peter Maydell <peter.maydell@linaro.org> wrote:
>
> Do you have a repro case you can give us?
> Does your setup involve SMP, or is this all on a single CPU ?
>

I'll try to create a minimized reproduction case.
As for the SMP question, I'm emulating 4 threads (note that the issue
does reproduce with fewer threads, but extremely rarely) with icount
enabled (so single-threaded TCG).
It is a bit curious that the number of emulated threads affects it,
though, as the mapping of the correct page, the TLB invalidation, and
the data abort all happen on the same emulated thread.

>
> Note that the architecture says that the DSB will guarantee the
> TLB maintenance operation to be finished for *other* processors,
> but that if you want to guarantee it to be finished for the
> processor which executed the TLBI then you must do a DSB followed
> by a "context synchronization event", e.g. an ISB insn, or return
> from exception. (See the v8 Arm ARM DDI0487G.b page D5-2833.)
> It sounds from your description as if a return-from-exception
> is done on the CPU that executed the TLBI, though...
>

Indeed, an eret is executed after handling the data abort, in order to
return to the instruction that raised the exception.
(Adding an ISB after the TLB invalidation does not seem to affect the
issue, beyond slowing down execution.)

>
> -- PMM

Idan Horowitz


* Re: DSB does not seem to wait for TLBI completion
  2021-11-21  7:52   ` Idan Horowitz
@ 2021-12-01 15:40     ` Idan Horowitz
  2021-12-01 16:13       ` Alex Bennée
  0 siblings, 1 reply; 9+ messages in thread
From: Idan Horowitz @ 2021-12-01 15:40 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-arm

Idan Horowitz <idan.horowitz@gmail.com> wrote:
>
> I am actually running in icount mode (-icount shift=10 specifically),
> and adding the translation block exit or just using ISB directly does
> not seem to affect it unfortunately.
>
> Idan Horowitz

After a lot of testing I thought of trying this without icount, and it
seems to work fine without it, so the issue is somehow related to icount
being enabled.

Idan Horowitz


* Re: DSB does not seem to wait for TLBI completion
  2021-12-01 15:40     ` Idan Horowitz
@ 2021-12-01 16:13       ` Alex Bennée
  2021-12-29 13:23         ` Idan Horowitz
  0 siblings, 1 reply; 9+ messages in thread
From: Alex Bennée @ 2021-12-01 16:13 UTC (permalink / raw)
  To: Idan Horowitz; +Cc: qemu-arm


Idan Horowitz <idan.horowitz@gmail.com> writes:

> Idan Horowitz <idan.horowitz@gmail.com> wrote:
>>
>> I am actually running in icount mode (-icount shift=10 specifically),
>> and adding the translation block exit or just using ISB directly does
>> not seem to affect it unfortunately.
>>
>> Idan Horowitz
>
> After a lot of testing I had the thought of trying this without
> icount, and it seems to work fine without it, so the issue is somehow
> related to icount being enabled.

That's weird, because icount basically ensures round-robin scheduling
of each vCPU in turn. I wonder if there is a pending flush when the vCPU
switches?

We really need a reliable reproducer for this to investigate further.

>
> Idan Horowitz


-- 
Alex Bennée


* Re: DSB does not seem to wait for TLBI completion
  2021-12-01 16:13       ` Alex Bennée
@ 2021-12-29 13:23         ` Idan Horowitz
  0 siblings, 0 replies; 9+ messages in thread
From: Idan Horowitz @ 2021-12-29 13:23 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-arm

Alex Bennée <alex.bennee@linaro.org> wrote:
>
>
> That's weird because icount basically ensures round robin scheduling
> of each vCPU in turn. I wonder if there is a pending flush when the vCPU
> switches?
>
> We really need a reliable reproducer for this to investigate further.
>

I have finally been able to find the source of the issue: it was an
extremely subtle race condition in my code, so not an issue in QEMU
(although I did find a translation-related issue in QEMU during the
investigation: https://gitlab.com/qemu-project/qemu/-/issues/790). The
issue was so subtle, in fact, that not even hardware was able to
reproduce it; only QEMU's highly deterministic icount mode reproduced it
reliably. So thanks for your help, and sorry for wasting your time.

>
> --
> Alex Bennée

Idan Horowitz



Thread overview: 9+ messages
2021-11-18 15:19 DSB does not seem to wait for TLBI completion Idan Horowitz
2021-11-18 17:01 ` Alex Bennée
2021-11-21  7:52   ` Idan Horowitz
2021-12-01 15:40     ` Idan Horowitz
2021-12-01 16:13       ` Alex Bennée
2021-12-29 13:23         ` Idan Horowitz
2021-11-18 17:32 ` Peter Maydell
2021-11-18 18:50   ` Alex Bennée
2021-11-21  7:57   ` Idan Horowitz
