qemu-devel.nongnu.org archive mirror
From: "Daniel P. Berrangé" <berrange@redhat.com>
To: Eric Blake <eblake@redhat.com>
Cc: qemu-devel@nongnu.org, "Laurent Vivier" <lvivier@redhat.com>,
	"Thomas Huth" <thuth@redhat.com>,
	"Yongji Xie" <elohimes@gmail.com>,
	"Marc-André Lureau" <marcandre.lureau@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Max Reitz" <mreitz@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v3 00/16] chardev: refactoring & many bugfixes related tcp_chr_wait_connected
Date: Tue, 23 Apr 2019 15:13:32 +0100
Message-ID: <20190423141332.GN6022@redhat.com>
In-Reply-To: <46b6b751-4e3f-1b11-9ac7-d0d73cca2227@redhat.com>

On Mon, Apr 22, 2019 at 09:51:17AM -0500, Eric Blake wrote:
> On 2/11/19 12:24 PM, Daniel P. Berrangé wrote:
> > This is a followup to
> > 
> >   v1: https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg03344.html
> >   v2: http://lists.nongnu.org/archive/html/qemu-devel/2019-01/msg05947.html
> > 
> > This series comes out of a discussion between myself & Yongji Xie in:
> > 
> >   https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg01881.html
> > 
> > I eventually understood that the problem faced was that
> > tcp_chr_wait_connected was racing with the background connection attempt
> > previously started, causing two connections to be established. This
> > broke because some vhost user servers only allow a single connection.
> > 
> > After messing around with the code a lot, the final solution was in fact
> > very easy. We simply have to delay the first background connection
> > attempt until the main loop is running. It will then automatically
> > turn into a no-op if tcp_chr_wait_connected has been run. This is
> > dealt with in the last patch in this series.
> > 
> > I believe this should solve the problem Yongji Xie faced, and thus not
> > require us to add support for a "nowait" option with client sockets at
> > all. The reconnect=1 option effectively already implements nowait
> > semantics, and now plays nicely with tcp_chr_wait_connected.
> > 
> > In investigating this I found various other bugs that needed fixing and
> > identified some useful refactoring to simplify / clarify the code, hence
> > this very long series.
> 
> Even with this series applied, I'm still seeing sporadic failures of
> iotest 169. Max posted a hack patch a while back that tries to work
> around the race:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg05907.html
> 
> which he originally diagnosed in iotest 147:
> https://lists.nongnu.org/archive/html/qemu-devel/2018-12/msg05579.html
> 
> but as it was a hack, he has not pursued it further, and so the symptoms
> are still there, although not completely reproducible:
> 
> 169 10s ... - output mismatch (see 169.out.bad)
> --- /home/eblake/qemu/tests/qemu-iotests/169.out	2018-11-16 15:48:12.018526748 -0600
> +++ /home/eblake/qemu/tests/qemu-iotests/169.out.bad	2019-04-22 09:38:45.481517132 -0500
> @@ -1,3 +1,5 @@
> +WARNING:qemu:qemu received signal 11:
> /home/eblake/qemu/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64
> -chardev
> socket,id=mon,path=/home/eblake/qemu/tests/qemu-iotests/scratch/tmp4clmPF/qemua-26803-monitor.sock
> -mon chardev=mon,mode=control -display none -vga none -qtest
> unix:path=/home/eblake/qemu/tests/qemu-iotests/scratch/qemua-26803-qtest.sock
> -machine accel=qtest -nodefaults -machine accel=qtest -drive
> if=virtio,id=drive0,file=/home/eblake/qemu/tests/qemu-iotests/scratch/disk_a,format=qcow2,cache=writeback
> 
> Any chance you can take a look as to what a non-hack fix should be?

Oh, it looks like we dropped the ball here. We have a fix already but
it doesn't appear to have been merged for 4.0 :-(

  https://lists.gnu.org/archive/html/qemu-devel/2019-02/msg06174.html

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
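
The race described in the quoted cover letter, and the "delay the first
background attempt until the main loop runs" fix, can be illustrated with a
toy model. This is only a rough Python sketch under stated assumptions, not
QEMU code: ToyServer, ToyMainLoop, ToyClientChardev and connections_made are
hypothetical names, the "connection" is just a counter, and the real
implementation (threads, QIOTask, the socket state machine) is far more
involved.

```python
class ToyServer:
    """Hypothetical stand-in for a server; counts connection attempts."""

    def __init__(self):
        self.connections = 0

    def accept(self):
        self.connections += 1


class ToyMainLoop:
    """Hypothetical main loop: callbacks are held until run() is called."""

    def __init__(self):
        self.pending = []

    def defer(self, callback):
        self.pending.append(callback)

    def run(self):
        while self.pending:
            self.pending.pop(0)()


class ToyClientChardev:
    """Toy client socket with a background (reconnect-style) attempt."""

    def __init__(self, server, loop, defer_first_attempt):
        self.server = server
        self.connected = False
        if defer_first_attempt:
            # Behaviour after the fix, as described in the cover letter:
            # nothing touches the server until the main loop is running.
            loop.defer(self._background_connect)
        else:
            # Behaviour before the fix, as described: the background attempt
            # starts immediately, but its completion is only noticed later,
            # once the main loop runs.
            self.server.accept()
            loop.defer(self._mark_connected)

    def _background_connect(self):
        if self.connected:
            return  # wait_connected() already did the work: no-op
        self.server.accept()
        self.connected = True

    def _mark_connected(self):
        self.connected = True

    def wait_connected(self):
        """Synchronous connect, in the spirit of tcp_chr_wait_connected."""
        if not self.connected:
            self.server.accept()
            self.connected = True


def connections_made(defer_first_attempt):
    server = ToyServer()
    loop = ToyMainLoop()
    chardev = ToyClientChardev(server, loop, defer_first_attempt)
    chardev.wait_connected()  # device setup blocks before the main loop starts
    loop.run()
    return server.connections


print("eager background attempt:", connections_made(False))
print("deferred background attempt:", connections_made(True))
```

In this model the eager variant reaches the server twice, which is the
situation a single-connection vhost-user server cannot handle, while the
deferred variant connects once and the queued callback does nothing.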

Thread overview: 21+ messages
2019-02-11 18:24 [Qemu-devel] [PATCH v3 00/16] chardev: refactoring & many bugfixes related tcp_chr_wait_connected Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 01/16] io: store reference to thread information in the QIOTask struct Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 02/16] io: add qio_task_wait_thread to join with a background thread Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 03/16] chardev: fix validation of options for QMP created chardevs Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 04/16] chardev: forbid 'reconnect' option with server sockets Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 05/16] chardev: forbid 'wait' option with client sockets Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 06/16] chardev: remove many local variables in qemu_chr_parse_socket Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 07/16] chardev: ensure qemu_chr_parse_compat reports missing driver error Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 08/16] chardev: remove unused 'sioc' variable & cleanup paths Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 09/16] chardev: split tcp_chr_wait_connected into two methods Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 10/16] chardev: split up qmp_chardev_open_socket connection code Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 11/16] chardev: use a state machine for socket connection state Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 12/16] chardev: honour the reconnect setting in tcp_chr_wait_connected Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 13/16] chardev: disallow TLS/telnet/websocket with tcp_chr_wait_connected Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 14/16] chardev: fix race with client connections in tcp_chr_wait_connected Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 15/16] tests: expand coverage of socket chardev test Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 16/16] chardev: ensure termios is fully initialized Daniel P. Berrangé
2019-04-22 14:51 ` [Qemu-devel] [PATCH v3 00/16] chardev: refactoring & many bugfixes related tcp_chr_wait_connected Eric Blake
2019-04-22 14:51   ` Eric Blake
2019-04-23 14:13   ` Daniel P. Berrangé [this message]
2019-04-23 14:13     ` Daniel P. Berrangé
