From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:44581)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1fAaLa-0001jA-2R for qemu-devel@nongnu.org;
 Mon, 23 Apr 2018 08:13:31 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from ) id 1fAaLZ-0004ec-0K for qemu-devel@nongnu.org;
 Mon, 23 Apr 2018 08:13:30 -0400
References: <20180423084518.2426-1-armbru@redhat.com>
From: Paolo Bonzini
Date: Mon, 23 Apr 2018 14:13:13 +0200
MIME-Version: 1.0
In-Reply-To: <20180423084518.2426-1-armbru@redhat.com>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH] cpus: Fix event order on resume of stopped guest
To: Markus Armbruster, qemu-devel@nongnu.org
Cc: qemu-block@nongnu.org, mreitz@redhat.com, kwolf@redhat.com

On 23/04/2018 10:45, Markus Armbruster wrote:
> When resume of a stopped guest immediately runs into block device
> errors, the BLOCK_IO_ERROR event is sent before the RESUME event.
>
> Reproducer:
>
> 1. Create a scratch image
>    $ dd if=/dev/zero of=scratch.img bs=1M count=100
>
>    Size doesn't actually matter.
>
> 2. Prepare blkdebug configuration:
>
>    $ cat >blkdebug.conf <<EOF
>    [inject-error]
>    event = "write_aio"
>    errno = "5"
>    EOF
>
>    Note that errno 5 is EIO.
>
> 3. Run a guest with an additional scratch disk, i.e. with additional
>    arguments
>    -drive if=none,id=scratch-drive,format=raw,werror=stop,file=blkdebug:blkdebug.conf:scratch.img
>    -device virtio-blk-pci,id=scratch,drive=scratch-drive
>
>    The blkdebug part makes all writes to the scratch drive fail with
>    EIO. The werror=stop pauses the guest on write errors.
>
> 4. Connect to the QMP socket e.g. like this:
>    $ socat UNIX:/your/qmp/socket READLINE,history=$HOME/.qmp_history,prompt='QMP> '
>
>    Issue QMP command 'qmp_capabilities':
>    QMP> { "execute": "qmp_capabilities" }
>
> 5. Boot the guest.
>
> 6. In the guest, write to the scratch disk, e.g. like this:
>
>    # dd if=/dev/zero of=/dev/vdb count=1
>
>    Do double-check the device specified with of= is actually the
>    scratch device!
>
> 7. Issue QMP command 'cont':
>    QMP> { "execute": "cont" }
>
> After step 6, I get a BLOCK_IO_ERROR event followed by a STOP event. Good.
>
> After step 7, I get BLOCK_IO_ERROR, then RESUME, then STOP. Not so
> good; I'd expect RESUME, then BLOCK_IO_ERROR, then STOP.
>
> The funny event order confuses libvirt: virsh -r domstate DOMAIN
> --reason reports "paused (unknown)" rather than "paused (I/O error)".
>
> The culprit is vm_prepare_start().
>
>     /* Ensure that a STOP/RESUME pair of events is emitted if a
>      * vmstop request was pending. The BLOCK_IO_ERROR event, for
>      * example, according to documentation is always followed by
>      * the STOP event.
>      */
>     if (runstate_is_running()) {
>         qapi_event_send_stop(&error_abort);
>         res = -1;
>     } else {
>         replay_enable_events();
>         cpu_enable_ticks();
>         runstate_set(RUN_STATE_RUNNING);
>         vm_state_notify(1, RUN_STATE_RUNNING);
>     }
>
>     /* We are sending this now, but the CPUs will be resumed shortly later */
>     qapi_event_send_resume(&error_abort);
>     return res;
>
> When resuming a stopped guest, we take the else branch before we get
> to sending RESUME. vm_state_notify() runs virtio_vmstate_change(),
> among other things. This restarts I/O, triggering the BLOCK_IO_ERROR
> event.
>
> Reshuffle vm_prepare_start() to send the RESUME event earlier.
>
> Fixes RHBZ 1566153.
>
> Cc: Paolo Bonzini
> Signed-off-by: Markus Armbruster
> ---
>  cpus.c | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/cpus.c b/cpus.c
> index 38eba8bff3..398392bc3a 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -2043,7 +2043,6 @@ int vm_stop(RunState state)
>  int vm_prepare_start(void)
>  {
>      RunState requested;
> -    int res = 0;
>
>      qemu_vmstop_requested(&requested);
>      if (runstate_is_running() && requested == RUN_STATE__MAX) {
> @@ -2057,17 +2056,18 @@ int vm_prepare_start(void)
>       */
>      if (runstate_is_running()) {
>          qapi_event_send_stop(&error_abort);
> -        res = -1;
> -    } else {
> -        replay_enable_events();
> -        cpu_enable_ticks();
> -        runstate_set(RUN_STATE_RUNNING);
> -        vm_state_notify(1, RUN_STATE_RUNNING);
> +        qapi_event_send_resume(&error_abort);
> +        return -1;
>      }
>
>      /* We are sending this now, but the CPUs will be resumed shortly later */
>      qapi_event_send_resume(&error_abort);
> -    return res;
> +
> +    replay_enable_events();
> +    cpu_enable_ticks();
> +    runstate_set(RUN_STATE_RUNNING);
> +    vm_state_notify(1, RUN_STATE_RUNNING);
> +    return 0;
>  }
>
>  void vm_start(void)
>

Queued, thanks.

Paolo
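
For anyone following the thread, the effect of the reshuffle can be
illustrated with a small standalone C model. This is only a sketch, not
QEMU code: every toy_* name is hypothetical, the QMP event stream is
replaced by an in-memory log, and it assumes that the STOP event for the
failed write is emitted slightly later, when the pending stop request is
processed. That assumption is consistent with the event orders reported
in the commit message, but it is not taken from the QEMU sources.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical in-memory event log standing in for the QMP event stream. */
#define TOY_MAX_EVENTS 8

static const char *toy_log[TOY_MAX_EVENTS];
static int toy_count;
static int toy_stop_pending;

static void toy_emit(const char *name)
{
    if (toy_count < TOY_MAX_EVENTS) {
        toy_log[toy_count++] = name;
    }
}

static void toy_reset(void)
{
    toy_count = 0;
    toy_stop_pending = 0;
}

/* Models vm_state_notify(): restarting I/O on a drive whose writes
 * always fail reports BLOCK_IO_ERROR and, with werror=stop, requests
 * a stop.  Deferring the STOP event is an assumption of this model. */
static void toy_state_notify(void)
{
    toy_emit("BLOCK_IO_ERROR");
    toy_stop_pending = 1;
}

/* Models the later point where the pending stop request is handled. */
static void toy_process_stop_request(void)
{
    if (toy_stop_pending) {
        toy_stop_pending = 0;
        toy_emit("STOP");
    }
}

/* Ordering before the patch: state change notification, then RESUME. */
void toy_resume_old(void)
{
    toy_reset();
    toy_state_notify();
    toy_emit("RESUME");
    toy_process_stop_request();
}

/* Ordering after the patch: RESUME, then state change notification. */
void toy_resume_new(void)
{
    toy_reset();
    toy_emit("RESUME");
    toy_state_notify();
    toy_process_stop_request();
}

/* Joins the logged event names with commas into buf (always terminated). */
void toy_format_log(char *buf, size_t size)
{
    size_t used = 0;

    if (size == 0) {
        return;
    }
    buf[0] = '\0';
    for (int i = 0; i < toy_count; i++) {
        int n = snprintf(buf + used, size - used, "%s%s",
                         i ? "," : "", toy_log[i]);
        if (n < 0 || (size_t)n >= size - used) {
            break; /* output truncated; stop appending */
        }
        used += (size_t)n;
    }
}
```

Calling toy_resume_old() followed by toy_format_log() yields the order
reported after step 7 (BLOCK_IO_ERROR, RESUME, STOP), while
toy_resume_new() yields the expected one (RESUME, BLOCK_IO_ERROR, STOP).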