From: "Nicholas Piggin" <npiggin@gmail.com>
To: "Fabiano Rosas" <farosas@suse.de>, <qemu-devel@nongnu.org>
Cc: qemu-block@nongnu.org, "Alex Bennée" <alex.bennee@linaro.org>,
"Kevin Wolf" <kwolf@redhat.com>,
"Hanna Reitz" <hreitz@redhat.com>,
"Pavel Dovgalyuk" <pavel.dovgaluk@ispras.ru>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Stefan Hajnoczi" <stefanha@redhat.com>,
"Fam Zheng" <fam@euphon.net>,
"Ronnie Sahlberg" <ronniesahlberg@gmail.com>,
"John Snow" <jsnow@redhat.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
"Jason Wang" <jasowang@redhat.com>,
"Vladimir Sementsov-Ogievskiy" <vsementsov@yandex-team.ru>,
"Peter Xu" <peterx@redhat.com>,
"Dr. David Alan Gilbert" <dave@treblig.org>,
"Markus Armbruster" <armbru@redhat.com>,
"Michael Roth" <michael.roth@amd.com>,
"Wainer dos Santos Moschetta" <wainersm@redhat.com>
Subject: Re: [PATCH 02/17] replay: Fix migration replay_mutex locking
Date: Sat, 21 Dec 2024 12:54:37 +1000
Message-ID: <D6H1FS7JDKTL.306GHC0OE910D@gmail.com>
In-Reply-To: <87r062tscl.fsf@suse.de>
On Fri Dec 20, 2024 at 11:08 PM AEST, Fabiano Rosas wrote:
> Nicholas Piggin <npiggin@gmail.com> writes:
>
> Hi Nick,
>
> I'm ignorant about replay, but we try to know why we're taking the BQL in
> the migration code, we move it around sometimes, etc. Can we be a bit
> more strict with documentation here so we don't get stuck with a lock
> that can't be changed?
>
> > Migration causes a number of events that need to go in the replay
> > trace, such as vm state transitions. The replay_mutex lock needs to
> > be held for these.
> >
>
> Is it practical to explicitly list which events are those?
As a general rule, it is something like "while the target can be
producing or consuming rr events".

There is some record-replay handling in the snapshot code (flush
events, get icount, etc.), as well as the SHUTDOWN_CAUSE_SNAPSHOT_LOAD
event being generated, and possibly a few other things. So for
migration it's not just a side effect of calling other APIs; it is
explicitly "replay-aware", at least in part.

I actually don't know the full details of how snapshot/migrate
and record-replay work together. I know reverse debugging can
use snapshots to load the most recent possible state to
minimize replay, but that is "external" to the machine itself.
I don't know why you would want to record and replay snapshot
loading as part of the trace, but the facility exists. Pavel
understands the big picture much better.

> Are there any tests that exercise this that we could use to validate
> changes around this area?
Yes, I added some more avocado testing, which includes snapshotting
while recording, and that's where I hit these bugs. I do plan to
submit that as soon as this series gets in; I'm just trying to keep
things manageable. In that case we could defer this patch from this
series (the replay_linux test does not do any snapshotting as yet).

I think once you have some regression tests, you probably won't
have to worry _too_ much about record/replay details in migration.
>
> > The simplest approach seems to be just take it up-front when taking
> > the bql.
>
> But also the thing asserts if taken inside the BQL, so is the actual
> matter here that we _cannot_ take the lock around the proper places?
Yes, that is part of it in this case. Some other code drops the
BQL and then retakes both... but that is more complex and requires
knowledge of the caller's BQL context to be sure it is safe to drop.
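
As a rough illustration of the two options, here is a minimal sketch
that uses plain pthread mutexes as stand-ins. All of the sketch_* names
are made up for this example and are not QEMU APIs; the only thing taken
from the discussion is the rule that the replay lock must not be taken
while the BQL is held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Stand-ins for the two locks; these are NOT QEMU's implementations. */
static pthread_mutex_t sketch_bql = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t sketch_replay = PTHREAD_MUTEX_INITIALIZER;

/* Per-thread "do I hold it?" flags, used only for the assertions. */
static _Thread_local bool sketch_bql_held;
static _Thread_local bool sketch_replay_held;

static void sketch_bql_lock(void)
{
    pthread_mutex_lock(&sketch_bql);
    sketch_bql_held = true;
}

static void sketch_bql_unlock(void)
{
    sketch_bql_held = false;
    pthread_mutex_unlock(&sketch_bql);
}

static void sketch_replay_lock(void)
{
    /* Ordering rule: the replay lock is never taken inside the BQL. */
    assert(!sketch_bql_held);
    pthread_mutex_lock(&sketch_replay);
    sketch_replay_held = true;
}

static void sketch_replay_unlock(void)
{
    sketch_replay_held = false;
    pthread_mutex_unlock(&sketch_replay);
}

/* Option 1: take the replay lock up-front, then the BQL. */
static void sketch_lock_both(void)
{
    sketch_replay_lock();
    sketch_bql_lock();
}

static void sketch_unlock_both(void)
{
    sketch_bql_unlock();
    sketch_replay_unlock();
}

/*
 * Option 2: the caller already holds the BQL, so drop it and retake
 * both in order. This is only safe if the caller can tolerate the BQL
 * being released for a moment, which is what makes it harder to use.
 */
static void sketch_retake_both(void)
{
    assert(sketch_bql_held);
    sketch_bql_unlock();
    sketch_replay_lock();
    sketch_bql_lock();
}
```

Option 1 needs no knowledge of the caller, which is why it is the
simpler choice when the code in question is the one taking the BQL.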
> I also see the replay lock around the main loop, so is it basically bql2
> from the perspective of most of QEMU?
Unfortunately it is a big scope, yes. Basically it needs to
maintain atomicity between the event log entry that we record or
replay (run N instructions; raise an interrupt; get a character
from keyboard; etc.) and the actual running of that operation
in the machine.
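
To make the atomicity requirement concrete, here is a toy sketch, not
the real replay code. The rr_* names, the event kinds, and the three
"machine state" counters are all invented for illustration. The point
is only that the log entry and the operation it describes are handled
under a single hold of the lock, in both record and replay.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical event kinds, mirroring the examples above. */
enum rr_event_kind { RR_RUN_INSNS, RR_INTERRUPT, RR_KEYBOARD_CHAR };

struct rr_event {
    enum rr_event_kind kind;
    long arg;
};

#define RR_LOG_MAX 64

static pthread_mutex_t rr_lock = PTHREAD_MUTEX_INITIALIZER;
static struct rr_event rr_log[RR_LOG_MAX];
static size_t rr_log_len;

/* Toy machine state that the events act on. */
static long rr_icount;
static long rr_irqs;
static long rr_last_char;

static void rr_reset_machine(void)
{
    rr_icount = 0;
    rr_irqs = 0;
    rr_last_char = 0;
}

/* Run one operation in the "machine". Caller holds rr_lock. */
static void rr_apply(const struct rr_event *ev)
{
    switch (ev->kind) {
    case RR_RUN_INSNS:
        rr_icount += ev->arg;
        break;
    case RR_INTERRUPT:
        rr_irqs++;
        break;
    case RR_KEYBOARD_CHAR:
        rr_last_char = ev->arg;
        break;
    }
}

/* Record: append the log entry and run the operation under one lock hold. */
static int rr_record(enum rr_event_kind kind, long arg)
{
    int ok = 0;

    pthread_mutex_lock(&rr_lock);
    if (rr_log_len < RR_LOG_MAX) {
        rr_log[rr_log_len].kind = kind;
        rr_log[rr_log_len].arg = arg;
        rr_apply(&rr_log[rr_log_len]);
        rr_log_len++;
        ok = 1;
    }
    pthread_mutex_unlock(&rr_lock);
    return ok;
}

/* Replay: consume each entry and run it, again under one lock hold each. */
static size_t rr_replay_all(void)
{
    size_t i;

    for (i = 0; i < rr_log_len; i++) {
        pthread_mutex_lock(&rr_lock);
        rr_apply(&rr_log[i]);
        pthread_mutex_unlock(&rr_lock);
    }
    return i;
}
```

If the entry were logged under the lock but the operation ran after
dropping it, another thread could slip its own event in between, and
the log order would no longer match what the machine actually did.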
Thanks,
Nick