Date: Thu, 23 May 2019 16:37:03 +0100
From: Stefan Hajnoczi
To: Stefano Garzarella
Cc: netdev@vger.kernel.org, Dexuan Cui, Jorgen Hansen, "David S. Miller",
 Vishnu Dasa, "K. Y. Srinivasan", Haiyang Zhang, Stephen Hemminger,
 Sasha Levin
Subject: Re: [RFC] vsock: proposal to support multiple transports at runtime
Message-ID: <20190523153703.GC19296@stefanha-x1.localdomain>
References: <20190514081543.f6nphcilgjuemlet@steredhat>
In-Reply-To: <20190514081543.f6nphcilgjuemlet@steredhat>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256;
 protocol="application/pgp-signature"; boundary="tqI+Z3u+9OQ7kwn0"
Content-Disposition: inline
User-Agent: Mutt/1.11.4 (2019-03-13)
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

--tqI+Z3u+9OQ7kwn0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Tue, May 14, 2019 at 10:15:43AM +0200, Stefano Garzarella wrote:
> Hi guys,
> I'm currently interested on implement a multi-transport support for VSOCK in
> order to handle nested VMs.
>
> As Stefan suggested me, I started to look at this discussion:
> https://lkml.org/lkml/2017/8/17/551
> Below I tried to summarize a proposal for a discussion, following the ideas
> from Dexuan, Jorgen, and Stefan.
>
>
> We can define two types of transport that we have to handle at the same time
> (e.g. in a nested VM we would have both types of transport running together):
>
> - 'host side transport', it runs in the host and it is used to communicate with
>   the guests of a specific hypervisor (KVM, VMWare or HyperV)
>
>   Should we support multiple 'host side transport' running at the same time?
>
> - 'guest side transport'. it runs in the guest and it is used to communicate
>   with the host transport

I find this terminology confusing.
Perhaps "host->guest" (your 'host side transport') and "guest->host"
(your 'guest side transport') would be clearer?

Or maybe the nested virtualization terminology of L2 transport (your
'host side transport') and L0 transport (your 'guest side transport')?
Here we are the L1 guest, L0 is the host, and L2 is our nested guest.

>
>
> The main goal is to find a way to decide what transport use in these cases:
> 1. connect() / sendto()
>
>    a. use the 'host side transport', if the destination is the guest
>       (dest_cid > VMADDR_CID_HOST).
>       If we want to support multiple 'host side transport' running at the
>       same time, we should assign CIDs uniquely across all transports.
>       In this way, a packet generated by the host side will get directed
>       to the appropriate transport based on the CID

The multiple host side transport case is unlikely to be necessary on x86
where only one hypervisor uses VMX at any given time.  But eventually it
may happen, so it's wise to at least allow it in the design.

>
>    b. use the 'guest side transport', if the destination is the host
>       (dest_cid == VMADDR_CID_HOST)

Makes sense to me.

>
>
> 2. listen() / recvfrom()
>
>    a. use the 'host side transport', if the socket is bound to
>       VMADDR_CID_HOST, or it is bound to VMADDR_CID_ANY and there is no
>       guest transport.
>       We could also define a new VMADDR_CID_LISTEN_FROM_GUEST in order to
>       address this case.
>       If we want to support multiple 'host side transport' running at the
>       same time, we should find a way to allow an application to bound a
>       specific host transport (e.g. adding new VMADDR_CID_LISTEN_FROM_KVM,
>       VMADDR_CID_LISTEN_FROM_VMWARE, VMADDR_CID_LISTEN_FROM_HYPERV)

Hmm...VMADDR_CID_LISTEN_FROM_KVM, VMADDR_CID_LISTEN_FROM_VMWARE, and
VMADDR_CID_LISTEN_FROM_HYPERV aren't very flexible.  What if my service
should only be available to a subset of VMware VMs?

Instead it might be more appropriate to use network namespaces to create
independent AF_VSOCK addressing domains.
Then you could have two separate groups of VMware VMs and selectively
listen to just one group.

>
>    b. use the 'guest side transport', if the socket is bound to local CID
>       different from the VMADDR_CID_HOST (guest CID get with
>       IOCTL_VM_SOCKETS_GET_LOCAL_CID), or it is bound to VMADDR_CID_ANY
>       (to be backward compatible).
>       Also in this case, we could define a new VMADDR_CID_LISTEN_FROM_HOST.

Two additional topics:

1. How will loading af_vsock.ko change?  In particular, can an
   application create a socket in af_vsock.ko without any loaded
   transport?  Can it enter listen state without any loaded transport
   (this seems useful with VMADDR_CID_ANY)?

2. Does your proposed behavior match VMware's existing nested vsock
   semantics?

--tqI+Z3u+9OQ7kwn0
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQEzBAEBCAAdFiEEhpWov9P5fNqsNXdanKSrs4Grc8gFAlzmvh8ACgkQnKSrs4Gr
c8hofwf/Q9vJlFPU32Zvf8ODCDlhKdTsXr+k8K5C9cr/qVQs3ew4TaAXt94rtKlA
jBJsxwrzVjabofTqlPIWVQm6LqhU9l2r+cR6YaqhH+5RlrgdyfOBHCCuWEKMjaor
fmWFU1qx5f1UN8jX79edaxwWkZjxULNiMjzeOLIb2LcoXA4RUWSyTqo1/rpFWNoy
J8BrUBOCl6HW5VzVpCSllIiEVe8Kl1wtSUcq+p//pnWDei2Ww/rl4QJGuT5PxQg2
Q9ze1wdVHl89uYSHiBCXqo62oKu7uZVUMqSrEpjB2jbBjnul3zYYr5KIz+cfzmJI
FZD95HkGxgVNUgQcRS1ZpWpic4Z+tA==
=uaFr
-----END PGP SIGNATURE-----

--tqI+Z3u+9OQ7kwn0--
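
Illustration of the rules discussed in this message: the transport
selection quoted from the proposal (1a/1b for connect()/sendto(), 2a/2b
for listen()/recvfrom()) can be read as a small decision function.  The
sketch below is only one possible reading and is not kernel code.  The
values of VMADDR_CID_ANY and VMADDR_CID_HOST follow <linux/vm_sockets.h>.
The function names, the "host->guest" / "guest->host" labels (borrowed
from the terminology suggested in the reply), and the choice to reject
CIDs the proposal does not mention are all assumptions made for
illustration.

```python
# Hypothetical sketch of the proposed transport selection; not kernel code.
VMADDR_CID_ANY = 0xFFFFFFFF   # -1U in <linux/vm_sockets.h>
VMADDR_CID_HOST = 2

HOST_TO_GUEST = "host->guest"   # the proposal's 'host side transport'
GUEST_TO_HOST = "guest->host"   # the proposal's 'guest side transport'


def transport_for_connect(dest_cid):
    """Rule 1: choose a transport for connect()/sendto() by destination CID."""
    # Assumption: the proposal does not cover these CIDs, so reject them.
    if dest_cid == VMADDR_CID_ANY or dest_cid < VMADDR_CID_HOST:
        raise ValueError("destination CID not covered by the proposal")
    if dest_cid == VMADDR_CID_HOST:
        return GUEST_TO_HOST          # rule 1b: destination is the host
    return HOST_TO_GUEST              # rule 1a: destination is a guest


def transport_for_listen(bound_cid, local_cid=None):
    """Rule 2: choose a transport for listen()/recvfrom() by bound CID.

    local_cid is the guest's own CID, or None when no guest transport
    is loaded (a modelling assumption of this sketch).
    """
    guest_loaded = local_cid is not None
    if bound_cid == VMADDR_CID_HOST:
        return HOST_TO_GUEST          # rule 2a
    if bound_cid == VMADDR_CID_ANY:
        # rule 2b (backward compatible) if a guest transport exists,
        # otherwise rule 2a
        return GUEST_TO_HOST if guest_loaded else HOST_TO_GUEST
    if guest_loaded and bound_cid == local_cid:
        return GUEST_TO_HOST          # rule 2b: bound to the local guest CID
    raise ValueError("bound CID not covered by the proposal")
```

In this reading, an L1 guest with local CID 5 that binds to
VMADDR_CID_ANY keeps listening on the guest->host transport, which is
the backward-compatible case the proposal calls out.  The sketch models
a single host->guest transport only; the multi-hypervisor and network
namespace questions raised above are left out.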