From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Juan Quintela <quintela@redhat.com>
Cc: alex.bennee@linaro.org, qemu-devel@nongnu.org, peterx@redhat.com
Subject: Re: [Qemu-devel] possible ahci/migrate fix
Date: Wed, 27 Feb 2019 13:16:22 +0000 [thread overview]
Message-ID: <20190227131622.GH2602@work-vm> (raw)
In-Reply-To: <877edltncj.fsf@trasno.org>
* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > Hi Alex,
> > Can you see if the attached patch fixes the ahci/migrate failure you
> > see; it won't fail for me however mean I am to it.
> >
> >
>
> ....
>
> > void migration_object_finalize(void)
> > {
> > + /*
> > + * Cancel the current migration - that will (eventually)
> > + * stop the migration using this structure
> > + */
> > + migrate_fd_cancel(current_migration);
>
> This can only happen during "civilized" exit of qemu, right?
> Otherwise, we are changing the migration status.
I'm hoping that, in an exit where:
a) there was no migration this does nothing
b) the migration finished already this does nothing
c) There's a running migration then this causes a cancel.
So if (a) & (b) do nothing then this should be OK, and
if you're trying to do an exit during (c) then it's going bad anyway.
> > object_unref(OBJECT(current_migration));
> > }
> >
> > @@ -3134,6 +3140,7 @@ static void *migration_thread(void *opaque)
> >
> > rcu_register_thread();
> >
> > + object_ref(OBJECT(s));
>
> It is weird that this is not enough :-(
It maybe enough; I was being cautious to add the cancel as well -
I'm pretty sure this series doesn't cover every case, but since these
bugs are hard to reproduce I wanted to try and cover any and all easy
cases. There was a previous attempt (by Fam) 2 years ago which stalled
because we couldn't figure out how to cover everything.
> > s->iteration_start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> >
> > qemu_savevm_state_header(s->to_dst_file);
> > @@ -3230,6 +3237,7 @@ static void *migration_thread(void *opaque)
> >
> > trace_migration_thread_after_loop();
> > migration_iteration_finish(s);
> > + object_unref(OBJECT(s));
> > rcu_unregister_thread();
> > return NULL;
> > }
> > diff --git a/vl.c b/vl.c
> > index 2f340686a7..c1920165f3 100644
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -4579,6 +4579,12 @@ int main(int argc, char **argv, char **envp)
> >
> > gdbserver_cleanup();
> >
> > + /*
> > + * cleaning up the migration object cancels any existing migration
> > + * try to do this early so that it also stops using devices.
> > + */
> > + migration_object_finalize();
> > +
> > /* No more vcpu or device emulation activity beyond this point */
> > vm_shutdown();
> >
> > @@ -4594,7 +4600,6 @@ int main(int argc, char **argv, char **envp)
> > monitor_cleanup();
> > qemu_chr_cleanup();
> > user_creatable_cleanup();
> > - migration_object_finalize();
> > /* TODO: unref root container, check all devices are ok */
> >
> > return 0;
>
> Ok, it was happening really late.
>
> Once that you are at this, can we rename it?
>
> migration_cleanup()
>
> looks more consistent with everything else, and makes sure that it does
> "more" than finalize the object, no?
Sure, I'll go with migration_shutdown
> Furthermore, it is enough to "just" cancel the migration? It is not
> needed that we wait for the migration thread to finish?
I agree, but the problem is I don't want to block in the exit path - so
what are the choices?
> If the problem was that migration_thread() was accessing the object
> after main thread freed it, then just the ref counting should be enough,
> no? My understanding is that returning from main is the quivalent of
> exit() and kill all threads? Or are we doing something special for that
> not to happen?
The case Alex saw was gprof having an atexit handler so main hadn't
really exited yet and hadn't killed it's threads; having said that I
can't trigger this even with gprof.
Dave
> Later, Juan.
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
next prev parent reply other threads:[~2019-02-27 13:23 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-02-27 12:10 [Qemu-devel] possible ahci/migrate fix Dr. David Alan Gilbert
2019-02-27 12:41 ` Juan Quintela
2019-02-27 13:16 ` Dr. David Alan Gilbert [this message]
2019-02-27 16:46 ` Alex Bennée
2019-02-27 16:51 ` Dr. David Alan Gilbert
2019-02-27 20:30 ` Juan Quintela
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190227131622.GH2602@work-vm \
--to=dgilbert@redhat.com \
--cc=alex.bennee@linaro.org \
--cc=peterx@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=quintela@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.