Re: UI layer threading and locking strategy; memory_region_snapshot_and_clear_dirty() races

From: "Philippe Mathieu-Daudé" <philmd@linaro.org>
To: "Akihiko Odaki" <akihiko.odaki@gmail.com>,
	"Emanuele Giuseppe Esposito" <eesposit@redhat.com>,
	"Volker Rümelin" <vr_qemu@t-online.de>,
	"Marc-André Lureau" <marcandre.lureau@redhat.com>,
	"Mark Cave-Ayland" <mark.cave-ayland@ilande.co.uk>,
	"Vivek Kasireddy" <vivek.kasireddy@intel.com>,
	"BALATON Zoltan" <balaton@eik.bme.hu>
Cc: QEMU Developers <qemu-devel@nongnu.org>,
	Peter Maydell <peter.maydell@linaro.org>,
	Gerd Hoffmann <kraxel@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Richard Henderson <richard.henderson@linaro.org>
Subject: Re: UI layer threading and locking strategy; memory_region_snapshot_and_clear_dirty() races
Date: Mon, 21 Nov 2022 23:37:09 +0100
Message-ID: <b8a21f61-cf25-87c6-694a-c9623a9d9c43@linaro.org>
In-Reply-To: <CAFEAcA_gDzyucBEq2pQJVmgZkLEP5hhW7k6_LmY7_mO3gEGHhw@mail.gmail.com>

Cc'ing more UI/display contributors.

On 17/11/22 14:05, Peter Maydell wrote:
> On Tue, 1 Nov 2022 at 14:17, Peter Maydell <peter.maydell@linaro.org> wrote:
>>
>> Hi; I'm trying to find out what the UI layer's threading and
>> locking strategy is, at least as far as it applies to display
>> device models.
> 
> Ping! :-) I'm still looking for information about this,
> and about what threads call_rcu() callbacks might be run on...
> 
> thanks
> -- PMM
> 
>> Specifically:
>>   * is the device's GraphicHwOps::gfx_update method always called
>>     from one specific thread, or might it be called from any thread?
>>   * is that method called with any locks guaranteed held? (e.g. the
>>     iothread lock)
>>   * is the caller of the gfx_update method OK if an implementation
>>     of the method drops the iothread lock temporarily while it is
>>     executing? (my guess would be "no")
>>   * for a gfx_update_async = true device, what are the requirements
>>     on calling graphic_hw_update_done()? Does the caller need to hold
>>     any particular lock? Does the call need to be done from any
>>     particular thread?
>>
>> The background to this is that I'm looking again at the race
>> condition involving the memory_region_snapshot_and_clear_dirty()
>> function, as described here:
>>   https://lore.kernel.org/qemu-devel/CAFEAcA9odnPo2LPip295Uztri7JfoVnQbkJ=Wn+k8dQneB_ynQ@mail.gmail.com/T/#u
>>
>> Having worked through what is going on, as far as I can see:
>>   (1) in order to be sure that we have the right data to match
>>   the snapshotted dirty bitmap state, we must wait for all TCG
>>   vCPUs to leave their current TB
>>   (2) a vCPU might block waiting for the iothread lock mid-TB
>>   (3) therefore we cannot wait for the TCG vCPUs without dropping
>>   the iothread lock one way or another
>>   (4) but none of the callers expect that and various things break
>>
>> My tentative idea for a fix is a bit of an upheaval:
>>   * have the display devices set gfx_update_async = true
>>   * instead of doing everything synchronously in their gfx_update
>>     method, they do the initial setup and call an 'async' version
>>     of memory_region_snapshot_and_clear_dirty()
>>   * that async version of the function will do what it does today,
>>     but without trying to wait for TCG vCPUs
>>   * instead the caller arranges (via call_rcu(), probably) a
>>     callback that will happen once all the TCG CPUs have finished
>>     executing their current TB
>>   * that callback does the actual copy-from-guest-ram-to-display
>>     and then calls graphic_hw_update_done()
>>
>> This seems like an awful pain in the neck but I couldn't see
>> anything better :-(
>>
>> Paolo: what (if any) guarantee does call_rcu() make about
>> which thread the callback function gets executed on, and what
>> locks are/are not held when it's called?
>>
>> (I haven't looked at the migration code's use of
>> memory_global_after_dirty_log_sync() but I suspect it's
>> similarly broken.)
>>
>> thanks
>> -- PMM
>
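
The flow proposed in the quoted message (snapshot and clear the dirty
state without waiting, defer the copy until the vCPUs have left their
current TB, then signal completion) can be sketched as a toy,
single-threaded model. This is only an illustration under stated
assumptions: every name below (ToyDisplay, toy_gfx_update,
toy_run_deferred, ...) is hypothetical and is not a QEMU API, the
"deferred" step is triggered by hand rather than by call_rcu() or any
real vCPU synchronisation, and it does not reproduce the race itself.

```c
/* Toy model only: none of these names are QEMU APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define LINES 4
#define LINE_BYTES 8

typedef struct ToyDisplay {
    unsigned char guest_ram[LINES][LINE_BYTES]; /* stands in for guest video RAM */
    unsigned char surface[LINES][LINE_BYTES];   /* stands in for the display surface */
    bool dirty[LINES];      /* live dirty flags, one per scanline */
    bool snapshot[LINES];   /* dirty state captured at update time */
    bool update_pending;    /* an async update is in flight */
    int updates_done;       /* number of completed updates */
} ToyDisplay;

/* One queued callback: a stand-in for whatever mechanism would run work
 * once all vCPUs have finished executing their current TB. */
typedef void (*ToyDeferredFn)(ToyDisplay *);
static ToyDeferredFn toy_deferred_fn;
static ToyDisplay *toy_deferred_arg;

/* Guest write: modify RAM and mark the scanline dirty. */
void toy_guest_write(ToyDisplay *d, int line, unsigned char value)
{
    assert(line >= 0 && line < LINES);
    memset(d->guest_ram[line], value, LINE_BYTES);
    d->dirty[line] = true;
}

/* Deferred step: copy only the snapshotted lines, then report completion. */
static void toy_update_complete(ToyDisplay *d)
{
    for (int i = 0; i < LINES; i++) {
        if (d->snapshot[i]) {
            memcpy(d->surface[i], d->guest_ram[i], LINE_BYTES);
        }
    }
    d->update_pending = false;
    d->updates_done++; /* plays the role of graphic_hw_update_done() */
}

/* Initial step: snapshot and clear the dirty flags without waiting,
 * then queue the copy for later. */
void toy_gfx_update(ToyDisplay *d)
{
    if (d->update_pending) {
        return; /* previous update has not completed yet */
    }
    memcpy(d->snapshot, d->dirty, sizeof(d->dirty));
    memset(d->dirty, 0, sizeof(d->dirty));
    d->update_pending = true;
    toy_deferred_fn = toy_update_complete;
    toy_deferred_arg = d;
}

/* Invoked by hand here; in the proposal this point corresponds to
 * "all TCG CPUs have finished executing their current TB". */
void toy_run_deferred(void)
{
    if (toy_deferred_fn) {
        ToyDeferredFn fn = toy_deferred_fn;
        toy_deferred_fn = NULL;
        fn(toy_deferred_arg);
    }
}
```

In this model a guest write that lands between toy_gfx_update() and
toy_run_deferred() stays marked dirty, so it is picked up by the next
update instead of being lost.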

Thread overview: 5+ messages
2022-11-01 14:17 UI layer threading and locking strategy; memory_region_snapshot_and_clear_dirty() races Peter Maydell
2022-11-17 13:05 ` Peter Maydell
2022-11-21 22:37   ` Philippe Mathieu-Daudé [this message]
2022-11-22  8:04     ` Akihiko Odaki
2022-11-22 11:52       ` Peter Maydell