From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:33836) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1akMMp-0003SV-6h for qemu-devel@nongnu.org; Sun, 27 Mar 2016 21:53:20 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1akMMl-0004qE-U3 for qemu-devel@nongnu.org; Sun, 27 Mar 2016 21:53:19 -0400
Received: from mail-pa0-x22b.google.com ([2607:f8b0:400e:c03::22b]:33308) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1akMMl-0004qA-IL for qemu-devel@nongnu.org; Sun, 27 Mar 2016 21:53:15 -0400
Received: by mail-pa0-x22b.google.com with SMTP id zm5so1146049pac.0 for ; Sun, 27 Mar 2016 18:53:14 -0700 (PDT)
References: <1441753806-14225-1-git-send-email-marcandre.lureau@redhat.com> <20151126121944-mutt-send-email-mst@redhat.com> <20160324071001.GA4525@yliu-dev.sh.intel.com>
From: Tetsuya Mukawa
Message-ID: <56F88E87.2030704@igel.co.jp>
Date: Mon, 28 Mar 2016 10:53:11 +0900
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Subject: Re: [Qemu-devel] [PATCH RFC 00/14] vhost-user: shutdown and reconnection
To: Yuanhan Liu
Cc: Marc-André Lureau, QEMU, "Michael S. Tsirkin"

On 2016/03/26 3:00, Marc-André Lureau wrote:
> Hi
>
> On Thu, Mar 24, 2016 at 8:10 AM, Yuanhan Liu wrote:
>>>> The following series starts from the idea that the slave can request a
>>>> "managed" shutdown instead and later recover (I guess the use case for
>>>> this is to allow for example to update static dispatching/filter rules
>>>> etc)
>> What if the backend crashes, that no such request will be sent? And
>> I'm wondering why this request is needed, as we are able to detect
>> the disconnect now (with your patches).
> I don't think trying to handle backend crashes is really a thing we
> need to take care of.
> If the backend is bad enough to crash, it may as
> well corrupt the guest memory (mst: my understanding of vhost-user is
> that backend must be trusted, or it could just throw garbage in the
> queue descriptors with surprising consequences or elsewhere in the
> guest memory actually, right?).
>
>> BTW, you meant to let QEMU as the server and the backend as the client
>> here, right? Honestly, that's what we've thought of, too, in the first
>> time.
>> However, I'm wondering could we still go with the QEMU as the client
>> and the backend as the server (the default and the only way DPDK
>> supports), and let QEMU to try to reconnect when the backend crashes
>> and restarts. In such case, we need enable the "reconnect" option
>> for vhost-user, and once I have done that, it basically works in my
>> test:
>>
> Conceptually, I think if we allow the backend to disconnect, it makes
> sense that qemu is actually the socket server. But it doesn't matter
> much, it's simple to teach qemu to reconnect a timer... So we should
> probably allow both cases anyway.
>
>> - start DPDK vhost-switch example
>>
>> - start QEMU, which will connect to DPDK vhost-user
>>
>> link is good now.
>>
>> - kill DPDK vhost-switch
>>
>> link is broken at this stage
>>
>> - start DPDK vhost-switch again
>>
>> you will find that the link is back again.
>>
>>
>> Will that makes sense to you? If so, we may need do nothing (or just
>> very few) changes at all to DPDK to get the reconnect work.
> The main issue with handling crashes (gone at any time) is that the
> backend may not have time to sync the used idx (at the least). It may
> already have processed incoming packets, so on reconnect, it may
> duplicate the receiving/dispatching work. Similarly, on the backend
> receiving end, some packets may be lost, never received by the VM, and
> later overwritten by the backend after reconnect (for the same used
> idx update reason).
> This may not be a big deal for unreliable
> protocols, but I am not familiar enough with network usage to know if
> that's fine in all cases. It may be fine for some packets, such as
> udp.
>
> However, in general, vhost-user should not be specific to network
> transmission, and it would be nice to have a reliable way for the
> backend to reconnect. That's what I try to do in this series. I'll
> repost it after I have done more testing.
>
> thanks
>

Hi Yuanhan,

I think we have two options here.

One is to use DEVICE_NEEDS_RESET, or to add a new status such as
QUEUE_NEEDS_RESET to the virtio specification. In this case we would
need to change the virtio-net drivers and QEMU's virtio-net device,
which could mean touching a lot of code, but we could handle an
unexpected shutdown of the vhost-user backend.

The other option is Marc's simple solution. In this case we don't need
to change the virtio-net drivers, but we cannot handle an unexpected
shutdown.

Thanks,
Tetsuya
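
P.S. The difference between the two options comes down to the used idx
concern quoted above. Here is a minimal sketch of that concern, as a toy
model in Python rather than real vhost-user or vring code. ToyRing,
ToyBackend and run() are hypothetical names invented for this
illustration. It assumes the backend tracks its progress privately and
only publishes it to the ring on sync(); a managed shutdown syncs before
disconnecting, while a crash does not.

```python
# Toy model only: ToyRing, ToyBackend and run() are hypothetical names,
# and this does not reflect the real vring memory layout.


class ToyRing:
    """Guest-visible state: queued buffers plus the published used index."""

    def __init__(self, packets):
        self.avail = list(packets)  # buffers the guest made available
        self.used_idx = 0           # last index the backend published


class ToyBackend:
    """Backend that tracks progress privately and publishes it on sync()."""

    def __init__(self, ring, delivered):
        self.ring = ring
        self.delivered = delivered       # shared log of dispatched packets
        self.next_idx = ring.used_idx    # resume from the published index

    def process(self, count):
        # Dispatch up to `count` buffers without publishing progress yet.
        for _ in range(count):
            if self.next_idx >= len(self.ring.avail):
                break
            self.delivered.append(self.ring.avail[self.next_idx])
            self.next_idx += 1

    def sync(self):
        # Publish private progress so a later instance can resume from it.
        self.ring.used_idx = self.next_idx


def run(managed_shutdown):
    ring = ToyRing(["p0", "p1", "p2", "p3"])
    delivered = []

    first = ToyBackend(ring, delivered)
    first.process(2)
    first.sync()
    first.process(1)          # p2 dispatched, used index not yet published
    if managed_shutdown:
        first.sync()          # managed shutdown: flush state, then leave
    # Otherwise the backend "crashes" here and its private state is lost.

    second = ToyBackend(ring, delivered)  # restarted backend reconnects
    second.process(4)
    second.sync()
    return delivered


print(run(managed_shutdown=False))  # ['p0', 'p1', 'p2', 'p2', 'p3']
print(run(managed_shutdown=True))   # ['p0', 'p1', 'p2', 'p3']
```

With a crash, p2 is dispatched twice because the restarted backend
resumes from the last published index. With a managed shutdown, each
packet is dispatched once.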