* UI layer threading and locking strategy; memory_region_snapshot_and_clear_dirty() races
@ 2022-11-01 14:17 Peter Maydell
  2022-11-17 13:05 ` Peter Maydell
From: Peter Maydell @ 2022-11-01 14:17 UTC (permalink / raw)
  To: QEMU Developers; +Cc: Gerd Hoffmann, Paolo Bonzini, Richard Henderson

Hi; I'm trying to find out what the UI layer's threading and
locking strategy is, at least as far as it applies to display
device models.

Specifically:
 * is the device's GraphicHwOps::gfx_update method always called
   from one specific thread, or might it be called from any thread?
 * is that method called with any locks guaranteed held? (eg the
   iothread lock)
 * is the caller of the gfx_update method OK if an implementation
   of the method drops the iothread lock temporarily while it is
   executing? (my guess would be "no")
 * for a gfx_update_async = true device, what are the requirements
   on calling graphic_hw_update_done()? Does the caller need to hold
   any particular lock? Does the call need to be done from any
   particular thread?

The background to this is that I'm looking again at the race
condition involving the memory_region_snapshot_and_clear_dirty()
function, as described here:
 https://lore.kernel.org/qemu-devel/CAFEAcA9odnPo2LPip295Uztri7JfoVnQbkJ=Wn+k8dQneB_ynQ@mail.gmail.com/T/#u

Having worked through what is going on, as far as I can see
(rough sketch of the current flow below):
 (1) in order to be sure that we have the right data to match
 the snapshotted dirty bitmap state, we must wait for all TCG
 vCPUs to leave their current TB
 (2) a vCPU might block waiting for the iothread lock mid-TB
 (3) therefore we cannot wait for the TCG vCPUs without dropping
 the iothread lock one way or another
 (4) but none of the callers expect that and various things break
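
To make that concrete, here's roughly the shape of today's synchronous
update path. This is heavily simplified and written from memory, not
the actual framebuffer.c code; ExampleState and draw_row() are invented
for illustration:

/* Heavily simplified sketch of the current synchronous flow; not the
 * real framebuffer.c code. ExampleState and draw_row() are invented. */
typedef struct ExampleState {
    QemuConsole *con;
    MemoryRegion vram;      /* guest-visible framebuffer RAM */
    hwaddr vram_size;
    hwaddr row_bytes;       /* bytes per scanline in guest RAM */
} ExampleState;

static void example_gfx_update(void *opaque)  /* GraphicHwOps::gfx_update */
{
    ExampleState *s = opaque;
    DisplaySurface *surface = qemu_console_surface(s->con);

    /* (1) Snapshot-and-clear the dirty bitmap: to be correct this must
     * wait for every TCG vCPU to leave the TB it is executing ... */
    DirtyBitmapSnapshot *snap =
        memory_region_snapshot_and_clear_dirty(&s->vram, 0, s->vram_size,
                                               DIRTY_MEMORY_VGA);
    /* ... but per (2) and (3) a vCPU may be blocked on the iothread
     * lock mid-TB, so that wait means dropping the lock, and per (4)
     * this caller holds the lock and doesn't expect it to be dropped. */

    for (int y = 0; y < surface_height(surface); y++) {
        if (memory_region_snapshot_get_dirty(&s->vram, snap,
                                             y * s->row_bytes,
                                             s->row_bytes)) {
            draw_row(surface, s, y);  /* copy guest RAM -> DisplaySurface */
        }
    }
    g_free(snap);
    dpy_gfx_update(s->con, 0, 0,
                   surface_width(surface), surface_height(surface));
}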

My tentative idea for a fix is a bit of an upheaval (rough sketch below):
 * have the display devices set gfx_update_async = true
 * instead of doing everything synchronously in their gfx_update
   method, they do the initial setup and call an 'async' version
   of memory_region_snapshot_and_clear_dirty()
 * that async version of the function will do what it does today,
   but without trying to wait for TCG vCPUs
 * instead the caller arranges (via call_rcu(), probably) a
   callback that will happen once all the TCG CPUs have finished
   executing their current TB
 * that callback does the actual copy-from-guest-ram-to-display
   and then calls graphic_hw_update_done()
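
Very roughly, and again with invented names -- none of the *_async
pieces exist today, and whether call_rcu() is really the right deferral
mechanism is the open question below -- I imagine something like this,
reusing ExampleState from the earlier sketch plus an rcu_head and a
pending snapshot pointer:

/* Sketch only: memory_region_snapshot_and_clear_dirty_async(),
 * draw_all_dirty_rows(), s->rcu and s->pending_snap are all invented. */
static void example_update_done(struct rcu_head *rcu)
{
    /* Runs once every TCG vCPU has finished the TB it was executing
     * when the snapshot was taken, so guest RAM now matches it. */
    ExampleState *s = container_of(rcu, ExampleState, rcu);

    draw_all_dirty_rows(s, s->pending_snap);  /* guest RAM -> surface */
    g_free(s->pending_snap);
    s->pending_snap = NULL;

    graphic_hw_update_done(s->con);  /* we set gfx_update_async = true */
}

static void example_gfx_update_async(void *opaque)  /* gfx_update method */
{
    ExampleState *s = opaque;

    /* Does what memory_region_snapshot_and_clear_dirty() does today,
     * minus the wait for TCG vCPUs: */
    s->pending_snap =
        memory_region_snapshot_and_clear_dirty_async(&s->vram, 0,
                                                     s->vram_size,
                                                     DIRTY_MEMORY_VGA);

    /* Defer the copy until all TCG vCPUs have left their current TB,
     * here by (ab)using call_rcu(). */
    call_rcu1(&s->rcu, example_update_done);
}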

This seems like an awful pain in the neck but I couldn't see
anything better :-(

Paolo: what (if any) guarantee does call_rcu() make about
which thread the callback function gets executed on, and what
locks are/are not held when it's called?

(I haven't looked at the migration code's use of
memory_global_after_dirty_log_sync() but I suspect it's
similarly broken.)

thanks
-- PMM



* Re: UI layer threading and locking strategy; memory_region_snapshot_and_clear_dirty() races
  2022-11-01 14:17 UI layer threading and locking strategy; memory_region_snapshot_and_clear_dirty() races Peter Maydell
@ 2022-11-17 13:05 ` Peter Maydell
  2022-11-21 22:37   ` Philippe Mathieu-Daudé
From: Peter Maydell @ 2022-11-17 13:05 UTC (permalink / raw)
  To: QEMU Developers; +Cc: Gerd Hoffmann, Paolo Bonzini, Richard Henderson

On Tue, 1 Nov 2022 at 14:17, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> Hi; I'm trying to find out what the UI layer's threading and
> locking strategy is, at least as far as it applies to display
> device models.

Ping! :-) I'm still looking for information about this,
and about what threads call_rcu() callbacks might be run on...

thanks
-- PMM

> Specifically:
>  * is the device's GraphicHwOps::gfx_update method always called
>    from one specific thread, or might it be called from any thread?
>  * is that method called with any locks guaranteed held? (eg the
>    iothread lock)
>  * is the caller of the gfx_update method OK if an implementation
>    of the method drops the iothread lock temporarily while it is
>    executing? (my guess would be "no")
>  * for a gfx_update_async = true device, what are the requirements
>    on calling graphic_hw_update_done()? Does the caller need to hold
>    any particular lock? Does the call need to be done from any
>    particular thread?
>
> The background to this is that I'm looking again at the race
> condition involving the memory_region_snapshot_and_clear_dirty()
> function, as described here:
>  https://lore.kernel.org/qemu-devel/CAFEAcA9odnPo2LPip295Uztri7JfoVnQbkJ=Wn+k8dQneB_ynQ@mail.gmail.com/T/#u
>
> Having worked through what is going on, as far as I can see:
>  (1) in order to be sure that we have the right data to match
>  the snapshotted dirty bitmap state, we must wait for all TCG
>  vCPUs to leave their current TB
>  (2) a vCPU might block waiting for the iothread lock mid-TB
>  (3) therefore we cannot wait for the TCG vCPUs without dropping
>  the iothread lock one way or another
>  (4) but none of the callers expect that and various things break
>
> My tentative idea for a fix is a bit of an upheaval:
>  * have the display devices set gfx_update_async = true
>  * instead of doing everything synchronously in their gfx_update
>    method, they do the initial setup and call an 'async' version
>    of memory_region_snapshot_and_clear_dirty()
>  * that async version of the function will do what it does today,
>    but without trying to wait for TCG vCPUs
>  * instead the caller arranges (via call_rcu(), probably) a
>    callback that will happen once all the TCG CPUs have finished
>    executing their current TB
>  * that callback does the actual copy-from-guest-ram-to-display
>    and then calls graphic_hw_update_done()
>
> This seems like an awful pain in the neck but I couldn't see
> anything better :-(
>
> Paolo: what (if any) guarantee does call_rcu() make about
> which thread the callback function gets executed on, and what
> locks are/are not held when it's called?
>
> (I haven't looked at the migration code's use of
> memory_global_after_dirty_log_sync() but I suspect it's
> similarly broken.)
>
> thanks
> -- PMM



* Re: UI layer threading and locking strategy; memory_region_snapshot_and_clear_dirty() races
  2022-11-17 13:05 ` Peter Maydell
@ 2022-11-21 22:37   ` Philippe Mathieu-Daudé
  2022-11-22  8:04     ` Akihiko Odaki
From: Philippe Mathieu-Daudé @ 2022-11-21 22:37 UTC (permalink / raw)
  To: Akihiko Odaki, Emanuele Giuseppe Esposito, Volker Rümelin,
	Marc-André Lureau, Mark Cave-Ayland, Vivek Kasireddy,
	BALATON Zoltan
  Cc: QEMU Developers, Peter Maydell, Gerd Hoffmann, Paolo Bonzini,
	Richard Henderson

Cc'ing more UI/display contributors.

On 17/11/22 14:05, Peter Maydell wrote:
> On Tue, 1 Nov 2022 at 14:17, Peter Maydell <peter.maydell@linaro.org> wrote:
>>
>> Hi; I'm trying to find out what the UI layer's threading and
>> locking strategy is, at least as far as it applies to display
>> device models.
> 
> Ping! :-) I'm still looking for information about this,
> and about what threads call_rcu() callbacks might be run on...
> 
> thanks
> -- PMM
> 
>> Specifically:
>>   * is the device's GraphicHwOps::gfx_update method always called
>>     from one specific thread, or might it be called from any thread?
>>   * is that method called with any locks guaranteed held? (eg the
>>     iothread lock)
>>   * is the caller of the gfx_update method OK if an implementation
>>     of the method drops the iothread lock temporarily while it is
>>     executing? (my guess would be "no")
>>   * for a gfx_update_async = true device, what are the requirements
>>     on calling graphic_hw_update_done()? Does the caller need to hold
>>     any particular lock? Does the call need to be done from any
>>     particular thread?
>>
>> The background to this is that I'm looking again at the race
>> condition involving the memory_region_snapshot_and_clear_dirty()
>> function, as described here:
>>   https://lore.kernel.org/qemu-devel/CAFEAcA9odnPo2LPip295Uztri7JfoVnQbkJ=Wn+k8dQneB_ynQ@mail.gmail.com/T/#u
>>
>> Having worked through what is going on, as far as I can see:
>>   (1) in order to be sure that we have the right data to match
>>   the snapshotted dirty bitmap state, we must wait for all TCG
>>   vCPUs to leave their current TB
>>   (2) a vCPU might block waiting for the iothread lock mid-TB
>>   (3) therefore we cannot wait for the TCG vCPUs without dropping
>>   the iothread lock one way or another
>>   (4) but none of the callers expect that and various things break
>>
>> My tentative idea for a fix is a bit of an upheaval:
>>   * have the display devices set gfx_update_async = true
>>   * instead of doing everything synchronously in their gfx_update
>>     method, they do the initial setup and call an 'async' version
>>     of memory_region_snapshot_and_clear_dirty()
>>   * that async version of the function will do what it does today,
>>     but without trying to wait for TCG vCPUs
>>   * instead the caller arranges (via call_rcu(), probably) a
>>     callback that will happen once all the TCG CPUs have finished
>>     executing their current TB
>>   * that callback does the actual copy-from-guest-ram-to-display
>>     and then calls graphic_hw_update_done()
>>
>> This seems like an awful pain in the neck but I couldn't see
>> anything better :-(
>>
>> Paolo: what (if any) guarantee does call_rcu() make about
>> which thread the callback function gets executed on, and what
>> locks are/are not held when it's called?
>>
>> (I haven't looked at the migration code's use of
>> memory_global_after_dirty_log_sync() but I suspect it's
>> similarly broken.)
>>
>> thanks
>> -- PMM
> 




* Re: UI layer threading and locking strategy; memory_region_snapshot_and_clear_dirty() races
  2022-11-21 22:37   ` Philippe Mathieu-Daudé
@ 2022-11-22  8:04     ` Akihiko Odaki
  2022-11-22 11:52       ` Peter Maydell
From: Akihiko Odaki @ 2022-11-22  8:04 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, Emanuele Giuseppe Esposito,
	Volker Rümelin, Marc-André Lureau, Mark Cave-Ayland,
	Vivek Kasireddy, BALATON Zoltan
  Cc: QEMU Developers, Peter Maydell, Gerd Hoffmann, Paolo Bonzini,
	Richard Henderson

Hi,

On 2022/11/22 7:37, Philippe Mathieu-Daudé wrote:
> Cc'ing more UI/display contributors.
> 
> On 17/11/22 14:05, Peter Maydell wrote:
>> On Tue, 1 Nov 2022 at 14:17, Peter Maydell <peter.maydell@linaro.org> 
>> wrote:
>>>
>>> Hi; I'm trying to find out what the UI layer's threading and
>>> locking strategy is, at least as far as it applies to display
>>> device models.
>>
>> Ping! :-) I'm still looking for information about this,
>> and about what threads call_rcu() callbacks might be run on...

I briefly checked the code, and it looks like RCU has its own thread.
Search for "thread" in util/rcu.c; looking at call_rcu_thread() there
will tell you what kind of context the callbacks will be run in.
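
For reference, the usage pattern as I understand it (from a quick look,
so please double-check against include/qemu/rcu.h) is to embed a
struct rcu_head in your structure and recover the containing object in
the callback; there is also a call_rcu() macro wrapper that does the
container_of for you:

/* Rough pattern as I understand it; see include/qemu/rcu.h and
 * util/rcu.c for the authoritative definitions.  Foo is made up. */
typedef struct Foo {
    int payload;
    struct rcu_head rcu;
} Foo;

static void foo_reclaim(struct rcu_head *rcu)
{
    /* Runs on the dedicated RCU thread (call_rcu_thread()) after a
     * grace period, not on the thread that queued it. */
    Foo *foo = container_of(rcu, Foo, rcu);
    g_free(foo);
}

static void foo_retire(Foo *foo)
{
    call_rcu1(&foo->rcu, foo_reclaim);
}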

>>
>> thanks
>> -- PMM
>>
>>> Specifically:
>>>   * is the device's GraphicHwOps::gfx_update method always called
>>>     from one specific thread, or might it be called from any thread?
>>>   * is that method called with any locks guaranteed held? (eg the
>>>     iothread lock)
>>>   * is the caller of the gfx_update method OK if an implementation
>>>     of the method drops the iothread lock temporarily while it is
>>>     executing? (my guess would be "no")
>>>   * for a gfx_update_async = true device, what are the requirements
>>>     on calling graphic_hw_update_done()? Does the caller need to hold
>>>     any particular lock? Does the call need to be done from any
>>>     particular thread?
>>>
>>> The background to this is that I'm looking again at the race
>>> condition involving the memory_region_snapshot_and_clear_dirty()
>>> function, as described here:
>>>   
>>> https://lore.kernel.org/qemu-devel/CAFEAcA9odnPo2LPip295Uztri7JfoVnQbkJ=Wn+k8dQneB_ynQ@mail.gmail.com/T/#u
>>>
>>> Having worked through what is going on, as far as I can see:
>>>   (1) in order to be sure that we have the right data to match
>>>   the snapshotted dirty bitmap state, we must wait for all TCG
>>>   vCPUs to leave their current TB
>>>   (2) a vCPU might block waiting for the iothread lock mid-TB
>>>   (3) therefore we cannot wait for the TCG vCPUs without dropping
>>>   the iothread lock one way or another
>>>   (4) but none of the callers expect that and various things break
>>>
>>> My tentative idea for a fix is a bit of an upheaval:
>>>   * have the display devices set gfx_update_async = true
>>>   * instead of doing everything synchronously in their gfx_update
>>>     method, they do the initial setup and call an 'async' version
>>>     of memory_region_snapshot_and_clear_dirty()
>>>   * that async version of the function will do what it does today,
>>>     but without trying to wait for TCG vCPUs
>>>   * instead the caller arranges (via call_rcu(), probably) a
>>>     callback that will happen once all the TCG CPUs have finished
>>>     executing their current TB
>>>   * that callback does the actual copy-from-guest-ram-to-display
>>>     and then calls graphic_hw_update_done()
>>>
>>> This seems like an awful pain in the neck but I couldn't see
>>> anything better :-(

Converting memory_region_snapshot_and_clear_dirty() to be asynchronous
is nice, but don't forget to also reorder things in
framebuffer_update_display() so that the DisplaySurface reference is
taken after memory_region_snapshot_and_clear_dirty(). Even if you make
memory_region_snapshot_and_clear_dirty() asynchronous, the bug will
remain if you keep a stale reference to the DisplaySurface and pass it
to the asynchronous callback.
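
In other words something like this, where take_snapshot_async(),
snapshot_ready() and ExampleState just stand in for whatever the real
asynchronous API ends up looking like:

/* Pseudo-code; take_snapshot_async(), snapshot_ready() and ExampleState
 * are stand-ins, not existing QEMU APIs. */

/* Racy: the surface captured here can be replaced (by a resize, for
 * example) before the callback runs, so the callback would draw into a
 * stale DisplaySurface. */
static void update_racy(ExampleState *s)
{
    DisplaySurface *surface = qemu_console_surface(s->con);
    take_snapshot_async(s, surface);  /* callback later draws into 'surface' */
}

/* Better: look the surface up only once the snapshot data is ready. */
static void snapshot_ready(ExampleState *s, DirtyBitmapSnapshot *snap)
{
    DisplaySurface *surface = qemu_console_surface(s->con);
    /* ... copy the dirty rows from guest RAM into this surface ... */
}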

Regards,
Akihiko Odaki

>>>
>>> Paolo: what (if any) guarantee does call_rcu() make about
>>> which thread the callback function gets executed on, and what
>>> locks are/are not held when it's called?
>>>
>>> (I haven't looked at the migration code's use of
>>> memory_global_after_dirty_log_sync() but I suspect it's
>>> similarly broken.)
>>>
>>> thanks
>>> -- PMM
>>
>



* Re: UI layer threading and locking strategy; memory_region_snapshot_and_clear_dirty() races
  2022-11-22  8:04     ` Akihiko Odaki
@ 2022-11-22 11:52       ` Peter Maydell
From: Peter Maydell @ 2022-11-22 11:52 UTC (permalink / raw)
  To: Akihiko Odaki
  Cc: Philippe Mathieu-Daudé, Emanuele Giuseppe Esposito,
	Volker Rümelin, Marc-André Lureau, Mark Cave-Ayland,
	Vivek Kasireddy, BALATON Zoltan, QEMU Developers, Gerd Hoffmann,
	Paolo Bonzini, Richard Henderson

On Tue, 22 Nov 2022 at 08:04, Akihiko Odaki <akihiko.odaki@gmail.com> wrote:
>
> Hi,
>
> On 2022/11/22 7:37, Philippe Mathieu-Daudé wrote:
> > Cc'ing more UI/display contributors.
> >
> > On 17/11/22 14:05, Peter Maydell wrote:
> >> On Tue, 1 Nov 2022 at 14:17, Peter Maydell <peter.maydell@linaro.org>
> >> wrote:
> >>>
> >>> Hi; I'm trying to find out what the UI layer's threading and
> >>> locking strategy is, at least as far as it applies to display
> >>> device models.
> >>
> >> Ping! :-) I'm still looking for information about this,
> >> and about what threads call_rcu() callbacks might be run on...
>
> I briefly checked the code, and it looks like RCU has its own thread.
> Search for "thread" in util/rcu.c; looking at call_rcu_thread() there
> will tell you what kind of context the callbacks will be run in.

Yes, I can read the code to find out what it does currently.
I'm hoping for an authoritative answer from the designer
about what the API guarantees are...

> >>> My tentative idea for a fix is a bit of an upheaval:
> >>>   * have the display devices set gfx_update_async = true
> >>>   * instead of doing everything synchronously in their gfx_update
> >>>     method, they do the initial setup and call an 'async' version
> >>>     of memory_region_snapshot_and_clear_dirty()
> >>>   * that async version of the function will do what it does today,
> >>>     but without trying to wait for TCG vCPUs
> >>>   * instead the caller arranges (via call_rcu(), probably) a
> >>>     callback that will happen once all the TCG CPUs have finished
> >>>     executing their current TB
> >>>   * that callback does the actual copy-from-guest-ram-to-display
> >>>     and then calls graphic_hw_update_done()
> >>>
> >>> This seems like an awful pain in the neck but I couldn't see
> >>> anything better :-(
>
> Converting memory_region_snapshot_and_clear_dirty() to be asynchronous
> is nice, but don't forget to also reorder things in
> framebuffer_update_display() so that the DisplaySurface reference is
> taken after memory_region_snapshot_and_clear_dirty(). Even if you make
> memory_region_snapshot_and_clear_dirty() asynchronous, the bug will
> remain if you keep a stale reference to the DisplaySurface and pass it
> to the asynchronous callback.

Yes, it would need either to rearrange things, or else just take
a reference to the DisplaySurface and hold on to it until it's
done (which might result in a harmless write to a surface that's
no longer actively in use and will be thrown away when we drop
our reference). I think you probably need to take the reference,
because to work out the right arguments for the memory snapshot
you need to know various properties of the DisplaySurface (e.g. its
size), so you need the DS both before and after the snapshot call.
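
Something along these lines, where display_surface_ref()/_unref()
don't exist today and would need adding (or we would pin the underlying
pixman image instead), and the *_async pieces are the hypothetical ones
from earlier in the thread:

/* Sketch fragments only, not complete functions; the ref/unref helpers,
 * copy_dirty_rows() and the _async function are hypothetical. */

/* Before the snapshot: we need the surface to work out the snapshot
 * arguments, so capture it (and a reference to it) now. */
s->pending_surface = display_surface_ref(qemu_console_surface(s->con));
s->pending_rows    = surface_height(s->pending_surface);
s->pending_stride  = surface_stride(s->pending_surface);
s->pending_snap    =
    memory_region_snapshot_and_clear_dirty_async(&s->vram, 0, s->vram_size,
                                                 DIRTY_MEMORY_VGA);

/* Later, in the completion callback: draw into the same surface we
 * captured.  If the console has swapped surfaces in the meantime, this
 * writes to a surface nobody is displaying any more, which is harmless;
 * the surface is freed when we drop our reference. */
copy_dirty_rows(s, s->pending_snap, s->pending_surface);
display_surface_unref(s->pending_surface);
s->pending_surface = NULL;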

thanks
-- PMM


