From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:51073)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <yuanhan.liu@linux.intel.com>) id 1aveNO-0006sg-7B
	for qemu-devel@nongnu.org; Thu, 28 Apr 2016 01:20:35 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <yuanhan.liu@linux.intel.com>) id 1aveNL-0001aX-1e
	for qemu-devel@nongnu.org; Thu, 28 Apr 2016 01:20:34 -0400
Received: from mga01.intel.com ([192.55.52.88]:63074)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <yuanhan.liu@linux.intel.com>) id 1aveNK-0001aO-Rq
	for qemu-devel@nongnu.org; Thu, 28 Apr 2016 01:20:30 -0400
Date: Wed, 27 Apr 2016 22:23:04 -0700
From: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Message-ID: <20160428052304.GF25677@yliu-dev.sh.intel.com>
References: <1459509388-6185-1-git-send-email-marcandre.lureau@redhat.com>
	<1459509388-6185-12-git-send-email-marcandre.lureau@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <1459509388-6185-12-git-send-email-marcandre.lureau@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: marcandre.lureau@redhat.com
Cc: qemu-devel@nongnu.org, "Michael S. Tsirkin" <mst@redhat.com>, Tetsuya Mukawa <mukawa@igel.co.jp>, jonshin@cisco.com, Ilya Maximets <i.maximets@samsung.com>, "Xie, Huawei" <huawei.xie@intel.com>

On Fri, Apr 01, 2016 at 01:16:21PM +0200, marcandre.lureau@redhat.com wrote:
> From: Marc-André Lureau <marcandre.lureau@redhat.com>
> 
> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> ---
>  docs/specs/vhost-user.txt | 15 +++++++++++++++
>  hw/virtio/vhost-user.c    | 16 ++++++++++++++++
>  2 files changed, 31 insertions(+)
> 
> diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
> index 8a635fa..60d6d13 100644
> --- a/docs/specs/vhost-user.txt
> +++ b/docs/specs/vhost-user.txt
> @@ -487,3 +487,18 @@ Message types
>        request to the master. It is passed in the ancillary data.
>        This message is only sent if VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL
>        feature is available.
> +
> +Slave message types
> +-------------------
> +
> + * VHOST_USER_SLAVE_SHUTDOWN:
> +      Id: 1
> +      Master payload: N/A
> +      Slave payload: u64
> +
> +      Request the master to shutdown the slave. A 0 reply is for
> +      success, in which case the slave may close all connections
> +      immediately and quit. A non-zero reply cancels the request.
> +
> +      Before a reply comes, the master may make other requests in
> +      order to flush or sync state.

Hi all,

I showed this proposal, as well as DPDK's implementation, to our customers
(OVS, OpenStack and some other teams) who had requested this before. I'm
sorry to say that none of them liked the fact that we can't handle a crash.
Making reconnect work after a vhost-user backend crash is exactly what
they are after.

To handle the crash, I was thinking of the proposal from Michael, which
is to do a reset from the guest OS. That would fix the issue
ultimately. However, old kernels will not benefit from it, nor will
guests other than Linux, which makes it not that useful for current
usage.

Considering that VHOST_USER_SLAVE_SHUTDOWN just gives QEMU a chance
to get the vring base (last used idx) from the backend, Huawei suggested
that we could still reach a consistent state after a crash if
we get the vring base from vring->used->idx. That worked as expected
in my test. The only tricky part might be how to detect a crash;
we could do a simple comparison of the vring base from QEMU with
vring->used->idx at the initialization stage. If a mismatch is found, get
it from vring->used->idx instead.
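
To make the check concrete, here is a rough sketch in C. It is only an
illustration under some assumptions: struct sketch_vring_used and
sketch_resume_vring_base() are made-up names, not QEMU or DPDK code, and
it assumes the used ring in guest memory is still intact after the
backend went away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, minimal view of the used ring; only idx matters here. */
struct sketch_vring_used {
    uint16_t flags;
    uint16_t idx;
};

/*
 * Decide which vring base to resume from (hypothetical helper).
 *
 * saved_base: the base QEMU last got from the backend
 * used:       the used ring as found in guest memory
 * crashed:    optional out-parameter, set to true on a mismatch
 *
 * On a mismatch we assume the backend died without handing back its
 * base, so used->idx is taken as the point to resume from.
 */
static uint16_t sketch_resume_vring_base(uint16_t saved_base,
                                         const struct sketch_vring_used *used,
                                         bool *crashed)
{
    assert(used != NULL);

    uint16_t used_idx = used->idx;
    bool mismatch = (saved_base != used_idx);

    if (crashed != NULL) {
        *crashed = mismatch;
    }
    return mismatch ? used_idx : saved_base;
}
```

Note that the returned value always equals used->idx in this sketch; the
useful part is the mismatch flag, which tells us whether the saved base
was stale.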

Comments/thoughts? Is it a solid enough solution for you? If so, we
could make things much simpler and, most importantly, we would
be able to handle a crash.

	--yliu