From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, Laurent Vivier <lvivier@redhat.com>,
Alexey Perevalov <a.perevalov@samsung.com>,
Juan Quintela <quintela@redhat.com>,
Andrea Arcangeli <aarcange@redhat.com>
Subject: Re: [Qemu-devel] [RFC 12/29] migration: allow dst vm pause on postcopy
Date: Fri, 4 Aug 2017 10:33:19 +0100 [thread overview]
Message-ID: <20170804093318.GC2805@work-vm> (raw)
In-Reply-To: <20170804034350.GD5561@pxdev.xzpeter.org>
* Peter Xu (peterx@redhat.com) wrote:
> On Thu, Aug 03, 2017 at 03:03:57PM +0100, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > On Tue, Aug 01, 2017 at 10:47:16AM +0100, Dr. David Alan Gilbert wrote:
> > > > * Peter Xu (peterx@redhat.com) wrote:
> > >
> > > [...]
> > >
> > > > > +/* Return true if we should continue the migration, or false. */
> > > > > +static bool postcopy_pause_incoming(MigrationIncomingState *mis)
> > > > > +{
> > > > > + trace_postcopy_pause_incoming();
> > > > > +
> > > > > + migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> > > > > + MIGRATION_STATUS_POSTCOPY_PAUSED);
> > > > > +
> > > > > + assert(mis->from_src_file);
> > > > > + qemu_file_shutdown(mis->from_src_file);
> > > > > + qemu_fclose(mis->from_src_file);
> > > > > + mis->from_src_file = NULL;
> > > > > +
> > > > > + assert(mis->to_src_file);
> > > > > + qemu_mutex_lock(&mis->rp_mutex);
> > > > > + qemu_file_shutdown(mis->to_src_file);
> > > > > + qemu_fclose(mis->to_src_file);
> > > > > + mis->to_src_file = NULL;
> > > > > + qemu_mutex_unlock(&mis->rp_mutex);
> > > >
> > > > Hmm is that safe? If we look at migrate_send_rp_message we have:
> > > >
> > > > static void migrate_send_rp_message(MigrationIncomingState *mis,
> > > > enum mig_rp_message_type message_type,
> > > > uint16_t len, void *data)
> > > > {
> > > > trace_migrate_send_rp_message((int)message_type, len);
> > > > qemu_mutex_lock(&mis->rp_mutex);
> > > > qemu_put_be16(mis->to_src_file, (unsigned int)message_type);
> > > > qemu_put_be16(mis->to_src_file, len);
> > > > qemu_put_buffer(mis->to_src_file, data, len);
> > > > qemu_fflush(mis->to_src_file);
> > > > qemu_mutex_unlock(&mis->rp_mutex);
> > > > }
> > > >
> > > > If we came into postcopy_pause_incoming at about the same time
> > > > migrate_send_rp_message was being called and pause_incoming took the
> > > > lock first, then once it release the lock, send_rp_message carries on
> > > > and uses mis->to_src_file that's now NULL.
> > > >
> > > > One solution here is to just call qemu_file_shutdown() but leave the
> > > > files open at this point, but clean the files up sometime later.
> > >
> > > I see the commnent on patch 14 as well - yeah, we need patch 14 to
> > > co-op here, and as long as we are with patch 14, we should be ok.
> > >
> > > >
> > > > > +
> > > > > + while (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
> > > > > + qemu_sem_wait(&mis->postcopy_pause_sem_dst);
> > > > > + }
> > > > > +
> > > > > + trace_postcopy_pause_incoming_continued();
> > > > > +
> > > > > + return true;
> > > > > +}
> > > > > +
> > > > > static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
> > > > > {
> > > > > uint8_t section_type;
> > > > > int ret = 0;
> > > > >
> > > > > +retry:
> > > > > while (true) {
> > > > > section_type = qemu_get_byte(f);
> > > > >
> > > > > @@ -2004,6 +2034,21 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
> > > > > out:
> > > > > if (ret < 0) {
> > > > > qemu_file_set_error(f, ret);
> > > > > +
> > > > > + /*
> > > > > + * Detect whether it is:
> > > > > + *
> > > > > + * 1. postcopy running
> > > > > + * 2. network failure (-EIO)
> > > > > + *
> > > > > + * If so, we try to wait for a recovery.
> > > > > + */
> > > > > + if (mis->state == MIGRATION_STATUS_POSTCOPY_ACTIVE &&
> > > > > + ret == -EIO && postcopy_pause_incoming(mis)) {
> > > > > + /* Reset f to point to the newly created channel */
> > > > > + f = mis->from_src_file;
> > > > > + goto retry;
> > > > > + }
> > > >
> > > > I wonder if:
> > > >
> > > > if (mis->state == MIGRATION_STATUS_POSTCOPY_ACTIVE &&
> > > > ret == -EIO && postcopy_pause_incoming(mis)) {
> > > > /* Try again after postcopy recovery */
> > > > return qemu_loadvm_state_main(mis->from_src_file, mis);
> > > > }
> > > > would be nicer; it avoids the goto loop.
> > >
> > > I agree we should avoid using goto loops. However I do see vast usages
> > > of goto like this one when we want to redo part of the procedures of a
> > > function (or, of course, when handling errors in "C-style").
> >
> > We mostly use them to jump forward to an error exit; only rarely do
> > we do loops with them; so if we can sensibly avoid them it's best.
> >
> > > Calling qemu_loadvm_state_main() inside itself is ok as well, but it
> > > also has defect: stack usage would be out of control, or even, it can
> > > be controled by malicious users. E.g., if someone used program to
> > > periodically stop/start any network endpoint along the migration
> > > network, QEMU may go into a paused -> recovery -> active -> paused ...
> > > loop, and stack usage will just grow with time. I'd say it's an
> > > extreme example though...
> >
> > I think it's safe because it's a tail-call so a new stack frame isn't
> > needed.
>
> I tried it and dumped the assembly, looks like even with tail-call, we
> didn't really avoid the "callq":
>
> (gdb) disassemble qemu_loadvm_state_main
> Dump of assembler code for function qemu_loadvm_state_main:
> 0x00000000005d9ff8 <+0>: push %rbp
> 0x00000000005d9ff9 <+1>: mov %rsp,%rbp
> 0x00000000005d9ffc <+4>: sub $0x20,%rsp
> 0x00000000005da000 <+8>: mov %rdi,-0x18(%rbp)
> 0x00000000005da004 <+12>: mov %rsi,-0x20(%rbp)
> 0x00000000005da008 <+16>: movl $0x0,-0x4(%rbp)
> 0x00000000005da00f <+23>: mov -0x18(%rbp),%rax
> 0x00000000005da013 <+27>: mov %rax,%rdi
> 0x00000000005da016 <+30>: callq 0x5e185e <qemu_get_byte>
>
> [...]
>
> 0x00000000005da135 <+317>: jne 0x5da165 <qemu_loadvm_state_main+365>
> 0x00000000005da137 <+319>: cmpl $0xfffffffb,-0x4(%rbp)
> 0x00000000005da13b <+323>: jne 0x5da165 <qemu_loadvm_state_main+365>
> 0x00000000005da13d <+325>: mov -0x20(%rbp),%rax
> 0x00000000005da141 <+329>: mov %rax,%rdi
> 0x00000000005da144 <+332>: callq 0x5d9eb4 <postcopy_pause_incoming>
> 0x00000000005da149 <+337>: test %al,%al
> 0x00000000005da14b <+339>: je 0x5da165 <qemu_loadvm_state_main+365>
> 0x00000000005da14d <+341>: mov -0x20(%rbp),%rax
> 0x00000000005da151 <+345>: mov (%rax),%rax
> 0x00000000005da154 <+348>: mov -0x20(%rbp),%rdx
> 0x00000000005da158 <+352>: mov %rdx,%rsi
> 0x00000000005da15b <+355>: mov %rax,%rdi
> 0x00000000005da15e <+358>: callq 0x5d9ff8 <qemu_loadvm_state_main>
> ^^^^^^^^^^^^^^^ (this one)
> 0x00000000005da163 <+363>: jmp 0x5da168 <qemu_loadvm_state_main+368>
> 0x00000000005da165 <+365>: mov -0x4(%rbp),%eax
> 0x00000000005da168 <+368>: leaveq
> 0x00000000005da169 <+369>: retq
>
> Do we need extra compilation parameters to achieve the tail-call
> optimization for gcc? My gcc version is: v6.1.1 20160621.
>
> (even with extra flags, I am still a bit worried on whether it'll work
> on the other compilers though)
Huh, I'd expected it to be smarter than that; not sure why it didn't!
Anyway, tbh I wouldn't worry about the stack depth in this case.
> And, the "label-way" to retry is indeed used widely at least in both
> QEMU and Linux kernel. I tried to directly grep "^retry:" (so we are
> ignoring the same usage using different label names), there are ~30
> usage in QEMU and hundreds of cases in Linux kernel. So not sure
> whether this can be seen as another "legal" way to use C labels...
OK, my distaste for Goto's is perhaps a bit stronger than others;
it's OK though.
Dave
> Thanks,
>
> --
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
next prev parent reply other threads:[~2017-08-04 9:33 UTC|newest]
Thread overview: 116+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-07-28 8:06 [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
2017-07-28 8:06 ` [Qemu-devel] [RFC 01/29] migration: fix incorrect postcopy recved_bitmap Peter Xu
2017-07-31 16:34 ` Dr. David Alan Gilbert
2017-08-01 2:11 ` Peter Xu
2017-08-01 5:48 ` Alexey Perevalov
2017-08-01 6:02 ` Peter Xu
2017-08-01 6:12 ` Alexey Perevalov
2017-07-28 8:06 ` [Qemu-devel] [RFC 02/29] migration: fix comment disorder in RAMState Peter Xu
2017-07-31 16:39 ` Dr. David Alan Gilbert
2017-07-28 8:06 ` [Qemu-devel] [RFC 03/29] io: fix qio_channel_socket_accept err handling Peter Xu
2017-07-31 16:53 ` Dr. David Alan Gilbert
2017-08-01 2:25 ` Peter Xu
2017-08-01 8:32 ` Daniel P. Berrange
2017-08-01 8:55 ` Dr. David Alan Gilbert
2017-08-02 3:21 ` Peter Xu
2017-07-28 8:06 ` [Qemu-devel] [RFC 04/29] bitmap: introduce bitmap_invert() Peter Xu
2017-07-31 17:11 ` Dr. David Alan Gilbert
2017-08-01 2:43 ` Peter Xu
2017-08-01 8:40 ` Dr. David Alan Gilbert
2017-08-02 3:20 ` Peter Xu
2017-07-28 8:06 ` [Qemu-devel] [RFC 05/29] bitmap: introduce bitmap_count_one() Peter Xu
2017-07-31 17:58 ` Dr. David Alan Gilbert
2017-07-28 8:06 ` [Qemu-devel] [RFC 06/29] migration: dump str in migrate_set_state trace Peter Xu
2017-07-31 18:27 ` Dr. David Alan Gilbert
2017-07-28 8:06 ` [Qemu-devel] [RFC 07/29] migration: better error handling with QEMUFile Peter Xu
2017-07-31 18:39 ` Dr. David Alan Gilbert
2017-08-01 5:49 ` Peter Xu
2017-07-28 8:06 ` [Qemu-devel] [RFC 08/29] migration: reuse mis->userfault_quit_fd Peter Xu
2017-07-31 18:42 ` Dr. David Alan Gilbert
2017-07-28 8:06 ` [Qemu-devel] [RFC 09/29] migration: provide postcopy_fault_thread_notify() Peter Xu
2017-07-31 18:45 ` Dr. David Alan Gilbert
2017-08-01 3:01 ` Peter Xu
2017-07-28 8:06 ` [Qemu-devel] [RFC 10/29] migration: new property "x-postcopy-fast" Peter Xu
2017-07-31 18:52 ` Dr. David Alan Gilbert
2017-08-01 3:13 ` Peter Xu
2017-08-01 8:50 ` Dr. David Alan Gilbert
2017-08-02 3:31 ` Peter Xu
2017-07-28 8:06 ` [Qemu-devel] [RFC 11/29] migration: new postcopy-pause state Peter Xu
2017-07-28 15:53 ` Eric Blake
2017-07-31 7:02 ` Peter Xu
2017-07-31 19:06 ` Dr. David Alan Gilbert
2017-08-01 6:28 ` Peter Xu
2017-07-28 8:06 ` [Qemu-devel] [RFC 12/29] migration: allow dst vm pause on postcopy Peter Xu
2017-08-01 9:47 ` Dr. David Alan Gilbert
2017-08-02 5:06 ` Peter Xu
2017-08-03 14:03 ` Dr. David Alan Gilbert
2017-08-04 3:43 ` Peter Xu
2017-08-04 9:33 ` Dr. David Alan Gilbert [this message]
2017-08-04 9:44 ` Peter Xu
2017-07-28 8:06 ` [Qemu-devel] [RFC 13/29] migration: allow src return path to pause Peter Xu
2017-08-01 10:01 ` Dr. David Alan Gilbert
2017-07-28 8:06 ` [Qemu-devel] [RFC 14/29] migration: allow send_rq to fail Peter Xu
2017-08-01 10:30 ` Dr. David Alan Gilbert
2017-07-28 8:06 ` [Qemu-devel] [RFC 15/29] migration: allow fault thread to pause Peter Xu
2017-08-01 10:41 ` Dr. David Alan Gilbert
2017-07-28 8:06 ` [Qemu-devel] [RFC 16/29] qmp: hmp: add migrate "resume" option Peter Xu
2017-07-28 15:57 ` Eric Blake
2017-07-31 7:05 ` Peter Xu
2017-08-01 10:42 ` Dr. David Alan Gilbert
2017-08-01 11:03 ` Daniel P. Berrange
2017-08-02 5:56 ` Peter Xu
2017-08-02 9:28 ` Daniel P. Berrange
2017-07-28 8:06 ` [Qemu-devel] [RFC 17/29] migration: rebuild channel on source Peter Xu
2017-08-01 10:59 ` Dr. David Alan Gilbert
2017-08-02 6:14 ` Peter Xu
2017-07-28 8:06 ` [Qemu-devel] [RFC 18/29] migration: new state "postcopy-recover" Peter Xu
2017-08-01 11:36 ` Dr. David Alan Gilbert
2017-08-02 6:42 ` Peter Xu
2017-07-28 8:06 ` [Qemu-devel] [RFC 19/29] migration: let dst listen on port always Peter Xu
2017-08-01 10:56 ` Daniel P. Berrange
2017-08-02 7:02 ` Peter Xu
2017-08-02 9:26 ` Daniel P. Berrange
2017-08-02 11:02 ` Peter Xu
2017-07-28 8:06 ` [Qemu-devel] [RFC 20/29] migration: wakeup dst ram-load-thread for recover Peter Xu
2017-08-03 9:28 ` Dr. David Alan Gilbert
2017-08-04 5:46 ` Peter Xu
2017-07-28 8:06 ` [Qemu-devel] [RFC 21/29] migration: new cmd MIG_CMD_RECV_BITMAP Peter Xu
2017-08-03 9:49 ` Dr. David Alan Gilbert
2017-08-04 6:08 ` Peter Xu
2017-08-04 6:15 ` Peter Xu
2017-07-28 8:06 ` [Qemu-devel] [RFC 22/29] migration: new message MIG_RP_MSG_RECV_BITMAP Peter Xu
2017-08-03 10:50 ` Dr. David Alan Gilbert
2017-08-04 6:59 ` Peter Xu
2017-08-04 9:49 ` Dr. David Alan Gilbert
2017-08-07 6:11 ` Peter Xu
2017-08-07 9:04 ` Dr. David Alan Gilbert
2017-07-28 8:06 ` [Qemu-devel] [RFC 23/29] migration: new cmd MIG_CMD_POSTCOPY_RESUME Peter Xu
2017-08-03 11:05 ` Dr. David Alan Gilbert
2017-08-04 7:04 ` Peter Xu
2017-08-04 7:09 ` Peter Xu
2017-08-04 8:30 ` Dr. David Alan Gilbert
2017-08-04 9:22 ` Peter Xu
2017-07-28 8:06 ` [Qemu-devel] [RFC 24/29] migration: new message MIG_RP_MSG_RESUME_ACK Peter Xu
2017-08-03 11:21 ` Dr. David Alan Gilbert
2017-08-04 7:23 ` Peter Xu
2017-07-28 8:06 ` [Qemu-devel] [RFC 25/29] migration: introduce SaveVMHandlers.resume_prepare Peter Xu
2017-08-03 11:38 ` Dr. David Alan Gilbert
2017-08-04 7:39 ` Peter Xu
2017-07-28 8:06 ` [Qemu-devel] [RFC 26/29] migration: synchronize dirty bitmap for resume Peter Xu
2017-08-03 11:56 ` Dr. David Alan Gilbert
2017-08-04 7:49 ` Peter Xu
2017-07-28 8:06 ` [Qemu-devel] [RFC 27/29] migration: setup ramstate " Peter Xu
2017-08-03 12:37 ` Dr. David Alan Gilbert
2017-08-04 8:39 ` Peter Xu
2017-07-28 8:06 ` [Qemu-devel] [RFC 28/29] migration: final handshake for the resume Peter Xu
2017-08-03 13:47 ` Dr. David Alan Gilbert
2017-08-04 9:05 ` Peter Xu
2017-08-04 9:53 ` Dr. David Alan Gilbert
2017-07-28 8:06 ` [Qemu-devel] [RFC 29/29] migration: reset migrate thread vars when resumed Peter Xu
2017-08-03 13:54 ` Dr. David Alan Gilbert
2017-08-04 8:52 ` Peter Xu
2017-08-04 9:52 ` Dr. David Alan Gilbert
2017-08-07 6:57 ` Peter Xu
2017-07-28 10:06 ` [Qemu-devel] [RFC 00/29] Migration: postcopy failure recovery Peter Xu
2017-08-03 15:57 ` Dr. David Alan Gilbert
2017-08-21 7:47 ` Peter Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170804093318.GC2805@work-vm \
--to=dgilbert@redhat.com \
--cc=a.perevalov@samsung.com \
--cc=aarcange@redhat.com \
--cc=lvivier@redhat.com \
--cc=peterx@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=quintela@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.