From: Eric Blake <eblake@redhat.com>
To: "Daniel P. Berrangé" <berrange@redhat.com>, qemu-devel@nongnu.org
Cc: "Laurent Vivier" <lvivier@redhat.com>,
"Thomas Huth" <thuth@redhat.com>,
"Yongji Xie" <elohimes@gmail.com>,
"Marc-André Lureau" <marcandre.lureau@redhat.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Max Reitz" <mreitz@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v3 00/16] chardev: refactoring & many bugfixes related tcp_chr_wait_connected
Date: Mon, 22 Apr 2019 09:51:17 -0500
Message-ID: <46b6b751-4e3f-1b11-9ac7-d0d73cca2227@redhat.com>
In-Reply-To: <20190211182442.8542-1-berrange@redhat.com>
On 2/11/19 12:24 PM, Daniel P. Berrangé wrote:
> This is a followup to
>
> v1: https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg03344.html
> v2: http://lists.nongnu.org/archive/html/qemu-devel/2019-01/msg05947.html
>
> This series comes out of a discussion between myself & Yongji Xie in:
>
> https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg01881.html
>
> I eventually understood that the problem faced was that
> tcp_chr_wait_connected was racing with the background connection attempt
> previously started, causing two connections to be established. This
> broke because some vhost user servers only allow a single connection.
>
> After messing around with the code a lot, the final solution was in fact
> very easy. We simply have to delay the first background connection
> attempt until the main loop is running. It will then automatically
> turn into a no-op if tcp_chr_wait_connected has been run. This is
> dealt with in the last patch in this series.
>
> I believe this should solve the problem Yongji Xie faced, and thus not
> require us to add support for "nowait" option with client sockets at
> all. The reconnect=1 option effectively already implements nowait
> semantics, and now plays nicely with tcp_chr_wait_connected.
>
> In investigating this I found various other bugs that needed fixing and
> identified some useful refactoring to simplify / clarify the code, hence
> this very long series.
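
As a rough illustration of the approach described in the quoted cover
letter, here is a toy Python model (not QEMU code; FakeServer, MainLoop,
SocketChardev and simulate are hypothetical names invented for this
sketch). It assumes only what the quote states: the first background
connection attempt is deferred until the main loop runs, and it becomes a
no-op if a synchronous wait has already connected.

```python
from collections import deque


class FakeServer:
    """Hypothetical stand-in for a server that allows a single connection."""

    def __init__(self):
        self.connections = 0

    def accept(self):
        self.connections += 1
        if self.connections > 1:
            raise RuntimeError("server only allows a single connection")


class MainLoop:
    """Toy main loop: deferred callbacks run only once run() is called."""

    def __init__(self):
        self._pending = deque()

    def defer(self, callback):
        self._pending.append(callback)

    def run(self):
        while self._pending:
            self._pending.popleft()()


class SocketChardev:
    """Toy client socket with a deferred first background connect."""

    def __init__(self, server, loop):
        self.server = server
        self.state = "disconnected"
        # Do not connect here; queue the first background attempt so it
        # only happens once the main loop is running.
        loop.defer(self._background_connect)

    def _connect(self):
        self.server.accept()
        self.state = "connected"

    def _background_connect(self):
        # No-op if a synchronous wait already established the connection.
        if self.state == "connected":
            return
        self._connect()

    def wait_connected(self):
        # Synchronous path, modelled loosely on tcp_chr_wait_connected.
        if self.state != "connected":
            self._connect()


def simulate(call_wait):
    """Return how many connections the server saw for one chardev."""
    server = FakeServer()
    loop = MainLoop()
    chardev = SocketChardev(server, loop)
    if call_wait:
        chardev.wait_connected()
    loop.run()
    return server.connections
```

In this model, simulate(True) and simulate(False) both leave the server
with exactly one connection; connecting eagerly in the constructor instead
of deferring would make the wait_connected path open a second one.
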

Even with this series applied, I'm still seeing sporadic failures of
iotest 169. Max posted a hack patch a while back that tries to work
around the race:

https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg05907.html

which he originally diagnosed in iotest 147:

https://lists.nongnu.org/archive/html/qemu-devel/2018-12/msg05579.html

but as it was a hack, he has not pursued it further, and so the symptoms
are still there, although not completely reproducible:

169 10s ... - output mismatch (see 169.out.bad)
--- /home/eblake/qemu/tests/qemu-iotests/169.out 2018-11-16 15:48:12.018526748 -0600
+++ /home/eblake/qemu/tests/qemu-iotests/169.out.bad 2019-04-22 09:38:45.481517132 -0500
@@ -1,3 +1,5 @@
+WARNING:qemu:qemu received signal 11: /home/eblake/qemu/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64 -chardev socket,id=mon,path=/home/eblake/qemu/tests/qemu-iotests/scratch/tmp4clmPF/qemua-26803-monitor.sock -mon chardev=mon,mode=control -display none -vga none -qtest unix:path=/home/eblake/qemu/tests/qemu-iotests/scratch/qemua-26803-qtest.sock -machine accel=qtest -nodefaults -machine accel=qtest -drive if=virtio,id=drive0,file=/home/eblake/qemu/tests/qemu-iotests/scratch/disk_a,format=qcow2,cache=writeback

Any chance you can take a look as to what a non-hack fix should be?
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
Thread overview: 21+ messages
2019-02-11 18:24 [Qemu-devel] [PATCH v3 00/16] chardev: refactoring & many bugfixes related tcp_chr_wait_connected Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 01/16] io: store reference to thread information in the QIOTask struct Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 02/16] io: add qio_task_wait_thread to join with a background thread Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 03/16] chardev: fix validation of options for QMP created chardevs Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 04/16] chardev: forbid 'reconnect' option with server sockets Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 05/16] chardev: forbid 'wait' option with client sockets Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 06/16] chardev: remove many local variables in qemu_chr_parse_socket Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 07/16] chardev: ensure qemu_chr_parse_compat reports missing driver error Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 08/16] chardev: remove unused 'sioc' variable & cleanup paths Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 09/16] chardev: split tcp_chr_wait_connected into two methods Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 10/16] chardev: split up qmp_chardev_open_socket connection code Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 11/16] chardev: use a state machine for socket connection state Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 12/16] chardev: honour the reconnect setting in tcp_chr_wait_connected Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 13/16] chardev: disallow TLS/telnet/websocket with tcp_chr_wait_connected Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 14/16] chardev: fix race with client connections in tcp_chr_wait_connected Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 15/16] tests: expand coverage of socket chardev test Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 16/16] chardev: ensure termios is fully initialized Daniel P. Berrangé
2019-04-22 14:51 ` Eric Blake [this message]
2019-04-22 14:51 ` [Qemu-devel] [PATCH v3 00/16] chardev: refactoring & many bugfixes related tcp_chr_wait_connected Eric Blake
2019-04-23 14:13 ` Daniel P. Berrangé