From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:39803) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bSyJ6-00072R-4Z for qemu-devel@nongnu.org; Thu, 28 Jul 2016 23:17:53 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bSyJ0-0003Dk-6S for qemu-devel@nongnu.org; Thu, 28 Jul 2016 23:17:51 -0400 Received: from mx1.redhat.com ([209.132.183.28]:56482) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bSyIz-0003Df-UQ for qemu-devel@nongnu.org; Thu, 28 Jul 2016 23:17:46 -0400 Date: Fri, 29 Jul 2016 06:17:42 +0300 From: "Michael S. Tsirkin" Message-ID: <20160729061525-mutt-send-email-mst@kernel.org> References: <20160726211527.28510-1-marcandre.lureau@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline In-Reply-To: <20160726211527.28510-1-marcandre.lureau@redhat.com> Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH v6 00/33] vhost-user reconnect fixes List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: marcandre.lureau@redhat.com Cc: qemu-devel@nongnu.org, mukawa@igel.co.jp, yuanhan.liu@linux.intel.com, victork@redhat.com, jonshin@cisco.com On Wed, Jul 27, 2016 at 01:14:54AM +0400, marcandre.lureau@redhat.com wro= te: > From: Marc-Andr=E9 Lureau >=20 > Hi, >=20 > Since 'vhost-user: simple reconnection support' has been merged, it is > possible to disconnect and reconnect a vhost-user backend. However, > many code paths in qemu may trigger assert() when the backend is > disconnected. >=20 > There are also code paths that are wrong, see "don't assume opaque is > a fd" patch for an example. Once those patches are reviewed & merged, > they are good candidates for stable too. Thanks, I merged most of these except the new tests. > Some assert() could simply be replaced by error_report() or silently > fail since they are recoverable cases. Some missing error checks can > also help prevent later issues. The errors are reported up to vhost.c, > as the vhost-user backend alone doesn't handle disconnected state > transparently so far. There are still problematic code paths left > after this series, for example, starting a migration with a > disconnected backend will abort(). It is likely that other problematic > code path exists (vhost_scsi_start failure is fatal, but there are no > vhost-user backend that I know yet). >=20 > In many cases, the code assumes get_vhost_net() will be non-NULL after > a succesful connection, so I changed it to stay after a disconnect > until the new connection comes (as suggested by Michael). >=20 > Since there is feature checks on reconnection, qemu should wait for > the initial connection feature negotiation to complete. The test added > demonstrates this. Additionally, a regression was found during v2, > which could have been spotted with a multiqueue test, so add a basic > one that would have exhibited the issue. >=20 > For convenience, the series is also available on: > https://github.com/elmarco/qemu, branch vhost-user-reconnect >=20 > v6: > - fixes after Ilya Maximets review > - add "disconnect on HUP" for cases where peer disconnect doesn't > trigger qemu disconnect event (reconnect won't work) > - add a flags mismatch on reconnect test >=20 > v5: > - rebased > - use a VHOST_OPS_DEBUG macro to print vhost_ops errors > - replace assert(foo !=3D NULL) with assert(foo) > - add "RFC: vhost: do not update last avail idx" >=20 > v4: > - change notify_migration_done() patch to be VHOST_BACKEND_TYPE_USER > specific, to avoid having to handle the case where the backend > doesn't implement the callback > - change vhost_dev_cleanup() to assert on empty log, instead of > adding a call to vhost_log_put() > - made "keep vhost_net after a disconnection" more robust, got rid of > the acked_features field > - improve commit log, and some patch reorganization for clarity >=20 > v3: > - add vhost-user multiqueue test, which would have helped to find the > following fix > - fix waiting on vhost-user connection with multiqueue (found by > Yuanhan Liu) > - merge vhost_user_{read,write}() error checking patches > - add error reporting to vhost_user_read() (similar to > vhost_user_write()) > - add a vhost_net_set_backend() wrapper to help with potential crash > - some leak fixes >=20 > v2: > - some patch ordering: minor fix, close(fd) fix, > assert/fprintf->error_report, check and return error, > vhost_dev_cleanup() fixes, keep vhost_net after a disconnect, wait > until connection is ready > - merge read/write error checks > - do not rely on link state to check vhost-user init completed >=20 > Marc-Andr=E9 Lureau (33): > misc: indentation > vhost-user: minor simplification > vhost-user: disconnect on HUP > vhost: don't assume opaque is a fd, use backend cleanup > vhost: make vhost_log_put() idempotent > vhost: assert the log was cleaned up > vhost: fix cleanup on not fully initialized device > vhost: make vhost_dev_cleanup() idempotent > vhost-net: always call vhost_dev_cleanup() on failure > vhost: fix calling vhost_dev_cleanup() after vhost_dev_init() > vhost: do not assert() on vhost_ops failure > vhost: add missing VHOST_OPS_DEBUG > vhost: use error_report() instead of fprintf(stderr,...) > qemu-char: fix qemu_chr_fe_set_msgfds() crash when disconnected > vhost-user: call set_msgfds unconditionally > vhost-user: check qemu_chr_fe_set_msgfds() return value > vhost-user: check vhost_user_{read,write}() return value > vhost-user: keep vhost_net after a disconnection > vhost-user: add get_vhost_net() assertions > Revert "vhost-net: do not crash if backend is not present" > vhost-net: vhost_migration_done is vhost-user specific > vhost: add assert() to check runtime behaviour > char: add chr_wait_connected callback > char: add and use tcp_chr_wait_connected > vhost-user: wait until backend init is completed > tests: plug some leaks in virtio-net-test > tests: fix vhost-user-test leak > tests: add /vhost-user/connect-fail test > tests: add a simple /vhost-user/multiqueue test > vhost-user: add error report in vhost_user_write() > vhost: add vhost_net_set_backend() > vhost-user-test: add flags mismatch test > RFC: vhost: do not update last avail idx on get_vring_base() failure >=20 > hw/net/vhost_net.c | 34 +++----- > hw/virtio/vhost-user.c | 67 ++++++++++----- > hw/virtio/vhost.c | 162 +++++++++++++++++++++++------------- > include/hw/virtio/vhost.h | 4 + > include/sysemu/char.h | 8 ++ > net/tap.c | 1 + > net/vhost-user.c | 65 +++++++++------ > qemu-char.c | 82 ++++++++++++------ > tests/Makefile.include | 2 +- > tests/vhost-user-test.c | 206 ++++++++++++++++++++++++++++++++++++++= +++++++- > tests/virtio-net-test.c | 12 ++- > 11 files changed, 486 insertions(+), 157 deletions(-) >=20 > --=20 > 2.9.0