public inbox for netdev@vger.kernel.org
From: Stefano Garzarella <sgarzare@redhat.com>
To: Stefan Hajnoczi <stefanha@redhat.com>,
	Jorgen Hansen <jhansen@vmware.com>
Cc: netdev@vger.kernel.org, Dexuan Cui <decui@microsoft.com>,
	"David S. Miller" <davem@davemloft.net>,
	Vishnu Dasa <vdasa@vmware.com>,
	"K. Y. Srinivasan" <kys@microsoft.com>,
	Haiyang Zhang <haiyangz@microsoft.com>,
	Stephen Hemminger <sthemmin@microsoft.com>,
	Sasha Levin <sashal@kernel.org>
Subject: Re: [RFC] vsock: proposal to support multiple transports at runtime
Date: Mon, 27 May 2019 12:44:47 +0200
Message-ID: <20190527104447.gd23h2dsnmit75ry@steredhat>
In-Reply-To: <20190523153703.GC19296@stefanha-x1.localdomain>

On Thu, May 23, 2019 at 04:37:03PM +0100, Stefan Hajnoczi wrote:
> On Tue, May 14, 2019 at 10:15:43AM +0200, Stefano Garzarella wrote:
> > Hi guys,
> > I'm currently interested in implementing multi-transport support for VSOCK in
> > order to handle nested VMs.
> > 
> > As Stefan suggested, I started looking at this discussion:
> > https://lkml.org/lkml/2017/8/17/551
> > Below I tried to summarize a proposal for a discussion, following the ideas
> > from Dexuan, Jorgen, and Stefan.
> > 
> > 
> > We can define two types of transport that we have to handle at the same time
> > (e.g. in a nested VM we would have both types of transport running together):
> > 
> > - 'host side transport': it runs in the host and is used to communicate with
> >   the guests of a specific hypervisor (KVM, VMware or Hyper-V)
> > 
> >   Should we support multiple 'host side transport' running at the same time?
> > 
> > - 'guest side transport': it runs in the guest and is used to communicate
> >   with the host transport
> 
> I find this terminology confusing.  Perhaps "host->guest" (your 'host
> side transport') and "guest->host" (your 'guest side transport') is
> clearer?

I agree, "host->guest" and "guest->host" are better, I'll use them.

> 
> Or maybe the nested virtualization terminology of L2 transport (your
> 'host side transport') and L0 transport (your 'guest side transport')?
> Here we are the L1 guest and L0 is the host and L2 is our nested guest.
>

I'm confused: if L2 is the nested guest, shouldn't it be the
'guest side transport'? Did I miss something?

Maybe that is another point in favor of your first proposal :)

> > 
> > 
> > The main goal is to find a way to decide which transport to use in these cases:
> > 1. connect() / sendto()
> > 
> > 	a. use the 'host side transport', if the destination is the guest
> > 	   (dest_cid > VMADDR_CID_HOST).
> > 	   If we want to support multiple 'host side transport' running at the
> > 	   same time, we should assign CIDs uniquely across all transports.
> > 	   In this way, a packet generated by the host side will get directed
> > 	   to the appropriate transport based on the CID
> 
> The multiple host side transport case is unlikely to be necessary on x86
> where only one hypervisor uses VMX at any given time.  But eventually it
> may happen so it's wise to at least allow it in the design.
> 

Okay, I was in doubt, but I'll keep it in the design.
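
To make the "CIDs assigned uniquely across all transports" idea concrete,
here is a minimal sketch, assuming a single table shared by every
host->guest transport. All names (cid_claim, cid_lookup, the enum values,
MAX_GUESTS) are hypothetical and are not existing af_vsock code; a real
implementation would also need locking and CID release.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transport identifiers, for illustration only. */
enum h2g_transport { H2G_NONE = 0, H2G_KVM, H2G_VMWARE, H2G_HYPERV };

#define MAX_GUESTS 16

/* One table shared by all host->guest transports, so that a CID
 * can be owned by at most one of them. */
struct cid_entry {
	uint32_t cid;
	enum h2g_transport owner;
};

static struct cid_entry cid_table[MAX_GUESTS];
static size_t cid_count;

/* Return the transport that owns a guest CID, or H2G_NONE if unassigned. */
enum h2g_transport cid_lookup(uint32_t cid)
{
	for (size_t i = 0; i < cid_count; i++)
		if (cid_table[i].cid == cid)
			return cid_table[i].owner;
	return H2G_NONE;
}

/* Claim a CID for a transport. Returns 0 on success, -1 if the CID is
 * already taken by any transport or the table is full. */
int cid_claim(uint32_t cid, enum h2g_transport owner)
{
	if (owner == H2G_NONE || cid_count == MAX_GUESTS ||
	    cid_lookup(cid) != H2G_NONE)
		return -1;
	cid_table[cid_count].cid = cid;
	cid_table[cid_count].owner = owner;
	cid_count++;
	return 0;
}
```

With something like this, a packet generated on the host side could be
routed with a single cid_lookup(dest_cid), and a second transport trying
to claim an already-used CID would simply fail.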

> > 
> > 	b. use the 'guest side transport', if the destination is the host
> > 	   (dest_cid == VMADDR_CID_HOST)
> 
> Makes sense to me.
> 
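
As a rough illustration of rule 1 (connect() / sendto()), the decision
could look like the sketch below. The enum and the function name are
hypothetical; the CID constants are redefined locally with the values
from <linux/vm_sockets.h> so the fragment stands alone. Treating
VMADDR_CID_ANY and CIDs below VMADDR_CID_HOST as "no transport" is my
assumption, since the proposal does not cover them.

```c
#include <stdint.h>

/* Same values as in <linux/vm_sockets.h>, redefined for a standalone sketch. */
#define VMADDR_CID_ANY  0xFFFFFFFFU
#define VMADDR_CID_HOST 2U

/* Hypothetical direction tags: host->guest and guest->host. */
enum vsock_dir { VSOCK_DIR_NONE = 0, VSOCK_DIR_H2G, VSOCK_DIR_G2H };

enum vsock_dir pick_connect_transport(uint32_t dest_cid)
{
	/* Assumption: a wildcard destination is not a valid connect target. */
	if (dest_cid == VMADDR_CID_ANY)
		return VSOCK_DIR_NONE;
	/* Rule 1b: the destination is the host. */
	if (dest_cid == VMADDR_CID_HOST)
		return VSOCK_DIR_G2H;
	/* Rule 1a: the destination is a guest. */
	if (dest_cid > VMADDR_CID_HOST)
		return VSOCK_DIR_H2G;
	/* Assumption: reserved CIDs below VMADDR_CID_HOST get no transport. */
	return VSOCK_DIR_NONE;
}
```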
> > 
> > 
> > 2. listen() / recvfrom()
> > 
> > 	a. use the 'host side transport', if the socket is bound to
> > 	   VMADDR_CID_HOST, or it is bound to VMADDR_CID_ANY and there is no
> > 	   guest transport.
> > 	   We could also define a new VMADDR_CID_LISTEN_FROM_GUEST in order to
> > 	   address this case.
> > 	   If we want to support multiple 'host side transport' running at the
> > 	   same time, we should find a way to allow an application to bind to a
> > 	   specific host transport (e.g. adding new VMADDR_CID_LISTEN_FROM_KVM,
> > 	   VMADDR_CID_LISTEN_FROM_VMWARE, VMADDR_CID_LISTEN_FROM_HYPERV)
> 
> Hmm...VMADDR_CID_LISTEN_FROM_KVM, VMADDR_CID_LISTEN_FROM_VMWARE,
> VMADDR_CID_LISTEN_FROM_HYPERV isn't very flexible.  What if my service
> should only be available to a subset of VMware VMs?

You're right, it is not very flexible.

> 
> Instead it might be more appropriate to use network namespaces to create
> independent AF_VSOCK addressing domains.  Then you could have two
> separate groups of VMware VMs and selectively listen to just one group.
> 

Does AF_VSOCK support network namespaces, or would that be another
improvement to take care of? (IIUC, they are not currently supported.)

A possible issue I see with netns: if namespaces are already used for
another purpose (e.g. to isolate the network of a VM), we would need
multiple instances of the application, one per netns.

> > 
> > 	b. use the 'guest side transport', if the socket is bound to a local CID
> > 	   different from VMADDR_CID_HOST (guest CID obtained with
> > 	   IOCTL_VM_SOCKETS_GET_LOCAL_CID), or it is bound to VMADDR_CID_ANY
> > 	   (to be backward compatible).
> > 	   Also in this case, we could define a new VMADDR_CID_LISTEN_FROM_HOST.
> 
> Two additional topics:
> 
> 1. How will loading af_vsock.ko change?

I'd allow af_vsock.ko to be loaded without any transport.
Maybe we should move MODULE_ALIAS_NETPROTO(PF_VSOCK) from
vmci_transport.ko to af_vsock.ko, but this could impact the VMware
driver.

>    In particular, can an
>    application create a socket in af_vsock.ko without any loaded
>    transport?  Can it enter listen state without any loaded transport
>    (this seems useful with VMADDR_CID_ANY)?

I'll check whether we can allow listening sockets without any loaded
transport; I think it would be a nice behaviour to have.
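
For reference, this is the user-space sequence in question, as a minimal
sketch. The helper names are hypothetical; the socket calls and
struct sockaddr_vm come from <sys/socket.h> and <linux/vm_sockets.h>.
The open question is which of these calls should succeed while no
transport is loaded, so the sketch only reports success or failure and
does not assume a particular outcome.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* Fill a wildcard AF_VSOCK address: any local CID, given port. */
void vsock_any_addr(struct sockaddr_vm *addr, unsigned int port)
{
	memset(addr, 0, sizeof(*addr));
	addr->svm_family = AF_VSOCK;
	addr->svm_cid = VMADDR_CID_ANY;
	addr->svm_port = port;
}

/* Try to create a listening stream socket bound to VMADDR_CID_ANY.
 * Returns the file descriptor, or -1 if any step fails. */
int vsock_listen_any(unsigned int port)
{
	struct sockaddr_vm addr;
	int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	vsock_any_addr(&addr, port);
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(fd, 1) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```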

> 
> 2. Does your proposed behavior match VMware's existing nested vsock
>    semantics?

I'm not sure, but I tried to follow Jorgen's answers in the original
thread. I hope this proposal matches the VMware semantics.

@Jorgen, do you have any advice?

Thanks,
Stefano
