From: Jason Wang
Date: Wed, 12 Dec 2018 15:47:04 +0800
Subject: Re: [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting
To: Yongji Xie
Cc: "Michael S. Tsirkin", zhangyu31@baidu.com, Xie Yongji, lilin24@baidu.com, qemu-devel@nongnu.org, chaiwen@baidu.com, marcandre.lureau@redhat.com, nixun@baidu.com

On 2018/12/12 2:41 PM, Yongji Xie wrote:
> On Wed, 12 Dec 2018 at 12:07, Jason Wang wrote:
>>
>> On 2018/12/12 11:21 AM, Yongji Xie wrote:
>>> On Wed, 12 Dec 2018 at 11:00, Jason Wang wrote:
>>>> On 2018/12/12 10:48 AM, Yongji Xie wrote:
>>>>> On Mon, 10 Dec 2018 at 17:32, Jason Wang wrote:
>>>>>> On 2018/12/6 9:59 PM, Michael S. Tsirkin wrote:
>>>>>>> On Thu, Dec 06, 2018 at 09:57:22PM +0800, Jason Wang wrote:
>>>>>>>> On 2018/12/6 2:35 PM, elohimes@gmail.com wrote:
>>>>>>>>> From: Xie Yongji
>>>>>>>>>
>>>>>>>>> This patchset aims to support qemu reconnecting to the
>>>>>>>>> vhost-user-blk backend after the backend crashes or
>>>>>>>>> restarts.
>>>>>>>>>
>>>>>>>>> Patch 1 tries to implement the sync connection for
>>>>>>>>> the "reconnect socket".
>>>>>>>>>
>>>>>>>>> Patch 2 introduces a new message, VHOST_USER_SET_VRING_INFLIGHT,
>>>>>>>>> to support offering shared memory to the backend to record
>>>>>>>>> its inflight I/O.
>>>>>>>>>
>>>>>>>>> Patches 3 and 4 are the corresponding libvhost-user patches of
>>>>>>>>> patch 2. They make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
>>>>>>>>>
>>>>>>>>> Patch 5 supports vhost-user-blk reconnecting to the backend when
>>>>>>>>> the connection is closed.
>>>>>>>>>
>>>>>>>>> Patch 6 tells qemu that we support reconnecting now.
>>>>>>>>>
>>>>>>>>> To use it, we could start qemu with:
>>>>>>>>>
>>>>>>>>> qemu-system-x86_64 \
>>>>>>>>>     -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
>>>>>>>>>     -device vhost-user-blk-pci,chardev=char0 \
>>>>>>>>>
>>>>>>>>> and start the vhost-user-blk backend with:
>>>>>>>>>
>>>>>>>>> vhost-user-blk -b /path/file -s /path/vhost.socket
>>>>>>>>>
>>>>>>>>> Then we can restart vhost-user-blk at any time while the VM is running.
>>>>>>>> I wonder whether or not it's better to handle this at the level of the
>>>>>>>> virtio protocol itself instead of the vhost-user level. E.g. exposing
>>>>>>>> last_avail_idx to the driver might be sufficient?
>>>>>>>>
>>>>>>>> Another possible issue is, it looks like you need to deal with different
>>>>>>>> kinds of ring layouts, e.g. packed virtqueues.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>> I'm not sure I understand your comments here.
>>>>>>> All these would be guest-visible extensions.
>>>>>> Looks not; it only introduces a shared memory region between qemu and
>>>>>> the vhost-user backend?
>>>>>>
>>>>>>
>>>>>>> Possible for sure, but how is this related to
>>>>>>> a patch supporting transparent reconnects?
>>>>>> I might be missing something. My understanding is that we support
>>>>>> transparent reconnects, but we can't deduce an accurate last_avail_idx,
>>>>>> and that is the capability this series tries to add. To me, this series
>>>>>> is functionally equivalent to exposing last_avail_idx (or avail_idx_cons)
>>>>>> in the available ring. So the information is inside guest memory; the
>>>>>> vhost-user backend can access it and update it directly. I believe some
>>>>>> modern NICs do this as well (but the index is in an MMIO area, of course).
>>>>>>
>>>>> Hi Jason,
>>>>>
>>>>> If my understanding is correct, it might not be enough to only expose
>>>>> last_avail_idx, because sometimes we do not process descriptors in the
>>>>> same order in which they were made available. If so, we can't get correct
>>>>> inflight I/O information from the available ring.
>>>> You can get this with the help of both the used ring and last_avail_idx,
>>>> I believe. Or maybe you can give us an example?
>>>>
>>> A simple example, assuming the ring size is 8:
>>>
>>> 1. guest fills the avail ring
>>>
>>> avail ring: 0 1 2 3 4 5 6 7
>>> used ring:
>>>
>>> 2. vhost-user backend completes 4, 5, 6, 7 and fills the used ring
>>>
>>> avail ring: 0 1 2 3 4 5 6 7
>>> used ring:  4 5 6 7
>>>
>>> 3. guest fills the avail ring again
>>>
>>> avail ring: 4 5 6 7 4 5 6 7
>>> used ring:  4 5 6 7
>>>
>>> 4. vhost-user backend crashes
>>>
>>> The inflight descriptors 0, 1, 2 and 3 are lost.
>>>
>>> Thanks,
>>> Yongji
>>
>> OK, then we could simply forbid increasing the avail_idx in this case?
>>
>> Basically, it's a question of whether it's better to do this at the level
>> of virtio instead of vhost. I'm pretty sure that if we expose sufficient
>> information, it could be done without touching vhost-user, and we won't
>> have to deal with e.g. migration and other cases.
>>
> OK, I get your point. That's indeed an alternative way. But this feature
> seems to be only useful to vhost-user backends.

I admit I could not think of a use case other than vhost-user.

> I'm not sure whether it makes sense to
> touch the virtio protocol for this feature.

Some possible advantages:

- The feature could be determined and noticed by the user or the management
layer.

- There's no need to invent a ring-layout-specific protocol to record
inflight descriptors. E.g., if my understanding is correct, for this series
and for the example above, it still cannot work for packed virtqueues,
since the descriptor id is not sufficient (a descriptor could be
overwritten by a used one). You probably need a (partial) copy of the
descriptor ring for this.

- No need to deal with migration; all the information is in guest memory.

Thanks

>
> Thanks,
> Yongji
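
To make the inflight-tracking idea concrete, here is a minimal sketch of
what a per-virtqueue inflight region shared between qemu and the backend
might look like, assuming a simple per-descriptor flag keyed by descriptor
id. The names and layout are illustrative only, not the actual format of
VHOST_USER_SET_VRING_INFLIGHT in this series, and, as noted above, a flag
keyed by descriptor id alone would not be sufficient for packed virtqueues.

#include <stdint.h>

#define VQ_MAX_SIZE 1024

/* Hypothetical per-virtqueue region, mapped by both qemu and the
 * vhost-user backend. It survives a backend crash because qemu keeps
 * the mapping alive. */
struct inflight_desc {
    uint8_t inflight;       /* 1: popped from the avail ring, not yet used */
    uint8_t padding[7];
};

struct inflight_region {
    uint16_t version;       /* layout version, for future extension */
    uint16_t desc_num;      /* actual queue size */
    struct inflight_desc desc[VQ_MAX_SIZE]; /* indexed by descriptor id */
};

/* Backend: mark a descriptor inflight when popping it from the avail ring. */
static inline void inflight_set(struct inflight_region *r, uint16_t id)
{
    r->desc[id].inflight = 1;
    __sync_synchronize();   /* flag must be visible before I/O is issued */
}

/* Backend: clear the flag after pushing the descriptor to the used ring. */
static inline void inflight_clear(struct inflight_region *r, uint16_t id)
{
    __sync_synchronize();   /* order after the used ring update */
    r->desc[id].inflight = 0;
}

/* Backend, on reconnect: resubmit every descriptor still marked inflight. */
static void inflight_resubmit(struct inflight_region *r,
                              void (*resubmit)(uint16_t id))
{
    uint16_t i;

    for (i = 0; i < r->desc_num; i++) {
        if (r->desc[i].inflight) {
            resubmit(i);
        }
    }
}

With such a region, descriptors 0, 1, 2 and 3 from the example above would
still be flagged after the crash in step 4, so the backend could resubmit
them on reconnect even though the avail ring entries were already reused.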