All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Daniel P. Berrangé" <berrange@redhat.com>
To: "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com>
Cc: pkrempa@redhat.com, qemu-devel@nongnu.org, peterx@redhat.com,
	quintela@redhat.com
Subject: Re: [Qemu-devel] [PATCH] Migration+TLS: Fix crash due to double cleanup
Date: Tue, 1 May 2018 11:11:27 +0100	[thread overview]
Message-ID: <20180501101127.GK5708@redhat.com> (raw)
In-Reply-To: <20180501100035.GJ5708@redhat.com>

On Tue, May 01, 2018 at 11:00:35AM +0100, Daniel P. Berrangé wrote:
> On Mon, Apr 30, 2018 at 07:59:43PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > During a TLS connect we see:
> >   migration_channel_connect calls
> >   migration_tls_channel_connect
> >   (calls after TLS setup)
> >   migration_channel_connect
> > 
> > My previous error handling fix made migration_channel_connect
> > call migrate_fd_connect in all cases; unfortunately the above
> > means it gets called twice and crashes doing double cleanup.
> > 
> > Fixes: 688a3dcba98
> 
> This fixes the crash, but we're still seeing error messages duplicated
> 
> (qemu) migrate_set_parameter tls-creds tls0
> (qemu) migrate tcp:localhost:9000
> qemu-system-x86_64: Certificate does not match the hostname localhost
> qemu-system-x86_64: Certificate does not match the hostname localhost
> 
> git bisect points to 688a3dcba98 as the cause of these double
> errors still.

FYI the stack traces look like this..

The first error message is printed in this context:

#0  0x0000555555bb6c20 in error_report (fmt=0x555555d0d77e "%s") at util/qemu-error.c:273
#1  0x0000555555bb5fa5 in error_report_err (err=0x55555749cb00) at util/error.c:228
#2  0x0000555555a6d1f8 in migrate_fd_cleanup (opaque=opaque@entry=0x5555569925a0) at migration/migration.c:1106
#3  0x0000555555a6e2aa in migrate_fd_connect (s=s@entry=0x5555569925a0, error_in=0x55555768de80) at migration/migration.c:2387
#4  0x0000555555a6f8e7 in migration_channel_connect (s=s@entry=0x5555569925a0, ioc=ioc@entry=0x555557e1de10, hostname=hostname@entry=0x0, error=<optimized out>) at migration/channel.c:83
#5  0x0000555555a6f2d6 in migration_tls_outgoing_handshake (task=<optimized out>, opaque=0x5555569925a0) at migration/tls.c:124
#6  0x0000555555b6d2b2 in qio_task_complete (task=task@entry=0x5555576ae620) at io/task.c:142
#7  0x0000555555b68c74 in qio_channel_tls_handshake_task (ioc=ioc@entry=0x555557e1de10, task=task@entry=0x5555576ae620) at io/channel-tls.c:171
#8  0x0000555555b6975a in qio_channel_tls_handshake (ioc=ioc@entry=0x555557e1de10, func=func@entry=0x555555a6f250 <migration_tls_outgoing_handshake>, opaque=opaque@entry=0x5555569925a0, destroy=destroy@entry=0x0) at io/channel-tls.c:215
#9  0x0000555555a6f6ac in migration_tls_channel_connect (s=s@entry=0x5555569925a0, ioc=ioc@entry=0x555556a01000, hostname=hostname@entry=0x555556bc0c30 "localhost", errp=errp@entry=0x7fffffffd778) at migration/tls.c:159
#10 0x0000555555a6f967 in migration_channel_connect (s=0x5555569925a0, ioc=ioc@entry=0x555556a01000, hostname=0x555556bc0c30 "localhost", error=<optimized out>) at migration/channel.c:73
#11 0x0000555555a6e5f4 in socket_outgoing_migration (task=<optimized out>, opaque=0x555556cc7de0) at migration/socket.c:85
#12 0x0000555555b6d2b2 in qio_task_complete (task=task@entry=0x5555576ce3b0) at io/task.c:142
#13 0x0000555555b6d3a2 in gio_task_thread_result (opaque=0x5555578eff50) at io/task.c:88
#14 0x00007ffff5da1577 in g_idle_dispatch () at /lib64/libglib-2.0.so.0
#15 0x00007ffff5da4b77 in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#16 0x0000555555baf677 in glib_pollfds_poll () at util/main-loop.c:214
#17 0x0000555555baf677 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:261
#18 0x0000555555baf677 in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:515
#19 0x00005555557b9157 in main_loop () at vl.c:1935
#20 0x00005555557b9157 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4767


The second error message is printed in this context:

#0  0x0000555555bb6c20 in error_report (fmt=fmt@entry=0x555555d0d77e "%s") at util/qemu-error.c:273
#1  0x0000555555904035 in hmp_migrate_status_cb (opaque=0x555556bc0ba0) at hmp.c:1909
#2  0x0000555555baefac in timerlist_run_timers (timer_list=0x555556a0d760) at util/qemu-timer.c:536
#3  0x0000555555baf1b7 in qemu_clock_run_timers (type=QEMU_CLOCK_REALTIME) at util/qemu-timer.c:547
#4  0x0000555555baf1b7 in qemu_clock_run_all_timers () at util/qemu-timer.c:662
#5  0x0000555555baf69a in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:521
#6  0x00005555557b9157 in main_loop () at vl.c:1935
#7  0x00005555557b9157 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4767


The second stack trace is the error reporting context that I added originally
in

  commit d59ce6f34434bf47a9b26138c908650bf9a24be1
  Author: Daniel P. Berrange <berrange@redhat.com>
  Date:   Wed Apr 27 11:05:00 2016 +0100

    migration: add reporting of errors for outgoing migration


So the first stack trace is the new duplicate.

Which error reporting context is "better" though, I don't know ?

My patch was based on the view that, although alot of code uses error_report,
long term all migration would eventually need to be able to filter an
'Error *errp' back up the stack, so that we can pass it back to QMP / HMP via
'info migrate' / query-migrate. So I decided to leave the error_report_err
call to the hmp.c code, as long term that's the only place that would need
to print to the console.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

  reply	other threads:[~2018-05-01 10:11 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-30 18:59 [Qemu-devel] [PATCH] Migration+TLS: Fix crash due to double cleanup Dr. David Alan Gilbert (git)
2018-05-01 10:00 ` Daniel P. Berrangé
2018-05-01 10:11   ` Daniel P. Berrangé [this message]
2018-05-01 10:57     ` Dr. David Alan Gilbert
2018-05-01 11:21       ` Daniel P. Berrangé
2018-05-01 12:51         ` Daniel P. Berrangé
2018-05-02 12:03 ` Peter Krempa
2018-05-02 12:07 ` Daniel P. Berrangé

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180501101127.GK5708@redhat.com \
    --to=berrange@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=peterx@redhat.com \
    --cc=pkrempa@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.