From: Andrew Cooper <andrew.cooper3@citrix.com>
To: David Vrabel <david.vrabel@citrix.com>
Cc: Vincent Hanquez <Vincent.Hanquez@eu.citrix.com>,
Ross Philipson <Ross.Philipson@citrix.com>,
"Xen-devel@lists.xen.org" <Xen-devel@lists.xen.org>
Subject: Re: Inter-domain Communication using Virtual Sockets (high-level design)
Date: Tue, 11 Jun 2013 19:54:19 +0100
Message-ID: <51B7725B.6020108@citrix.com>
In-Reply-To: <51B76754.7040800@citrix.com>
On 11/06/13 19:07, David Vrabel wrote:
> All,
>
> This is a high-level design document for an inter-domain communication
> system under the virtual sockets API (AF_VSOCK) recently added to Linux.
>
> Two low-level transports are discussed: a shared ring based one
> requiring no additional hypervisor support and v4v.
>
> The PDF (including the diagrams) is available here:
>
> http://xenbits.xen.org/people/dvrabel/inter-domain-comms-C.pdf
>
> % Inter-domain Communication using Virtual Sockets
> % David Vrabel <<david.vrabel@citrix.com>
Mismatched angles.
> % Draft C
>
> Introduction
> ============
>
> Revision History
> ----------------
>
> --------------------------------------------------------------------
> Version  Date         Changes
> -------  -----------  ----------------------------------------------
> Draft C  11 Jun 2013  Minor clarifications.
>
> Draft B  10 Jun 2013  Added a section on the low-level shared ring
>                       transport.
>
>                       Added a section on using v4v as the low-level
>                       transport.
>
> Draft A  28 May 2013  Initial draft.
> --------------------------------------------------------------------
>
> Purpose
> -------
>
> In the Windsor architecture for XenServer, dom0 is disaggregated into
> several _service domains_. Examples of service domains include
> network and storage driver domains, and qemu (stub) domains.
>
> To allow the toolstack to manage service domains there needs to be a
> communication mechanism between the toolstack running in one domain and
> all the service domains.
>
> The principal focus of this new transport is control-plane traffic
> (low latency and low data rates) but consideration is given to future
> uses requiring higher data rates.
>
> Linux 3.9 supports virtual sockets, a new type of socket (the
> new AF_VSOCK address family) for inter-domain communication. This was
> originally implemented for VMware's VMCI transport but has hooks for
> other transports. This will be used to provide the interface to
> applications.
>
>
> System Overview
> ---------------
>
> 
>
>
> Design Map
> ----------
>
> The Linux kernel requires a Xen-specific virtual socket transport and
> front and back drivers.
>
> The connection manager is a new user space daemon running in the
> backend domain.
>
> Toolstacks will require changes to allow them to set the policy used
> by the connection manager. The design of these changes is out of
> scope of this document.
>
> Definitions and Acronyms
> ------------------------
>
> _AF\_VSOCK_
> ~ The address family for virtual sockets.
>
> _CID (Context ID)_
>
> ~ The domain ID portion of the AF_VSOCK address format.
>
> _Port_
>
> ~ The part of the AF_VSOCK address format identifying a specific
> service. Similar to the port number used in a TCP connection.
>
> _Virtual Socket_
>
> ~ A socket using the AF_VSOCK protocol.
>
> References
> ----------
>
> [Windsor Architecture slides from XenSummit
> 2012](http://www.slideshare.net/xen_com_mgr/windsor-domain-0-disaggregation-for-xenserver-and-xcp)
>
>
> Design Considerations
> =====================
>
> Assumptions
> -----------
>
> * There exists a low-level peer-to-peer, datagram based transport
> mechanism using shared rings (as in libvchan).
>
> Constraints
> -----------
>
> * The AF_VSOCK address format is limited to a 32-bit CID and a 32-bit
> port number. This is sufficient as Xen only has 16-bit domain IDs.
>
> Risks and Volatile Areas
> ------------------------
>
> * The transport may be used between untrusted peers. A domain may be
> subject to malicious activity or denial of service attacks.
>
> Architecture
> ============
>
> Overview
> --------
>
> 
>
> Linux's virtual sockets are used as the interface to applications.
> Virtual sockets were introduced in Linux 3.9 and provide a hypervisor
> independent[^1] interface to user space applications for inter-domain
> communication.
>
> [^1]: The API and address format are hypervisor independent but the
> address values are not.
>
> An internal API is provided to implement a low-level virtual socket
> transport. This will be implemented within a pair of front and back
> drivers. The use of the standard front/back driver method allows the
> toolstack to handle the suspend, resume and migration in a similar way
> to the existing drivers.
>
> The front/back pair provides a point-to-point link between the two
> domains. This is used to communicate between applications on those
> hosts and between the frontend domain and the _connection manager_
> running on the backend.
>
> The connection manager allows domUs to request direct connections to
> peer domains. Without the connection manager, peers have no mechanism
> to exchange the information necessary for setting up the direct
> connections. The toolstack sets the policy in the connection manager
> to allow connection requests. The default policy is to deny
> connection requests.
>
>
> High Level Design
> =================
>
> Virtual Sockets
> ---------------
>
> The AF_VSOCK socket address family in the Linux kernel has a two part
> address format: a uint32_t _context ID_ (_CID_) identifying the domain
> and a uint32_t port for the specific service in that domain.
>
> The CID shall be the domain ID and some CIDs have a specific meaning.
>
> CID                  Purpose
> -------------------  -------
> 0x7FF0 (DOMID_SELF)  The local domain.
> 0x7FF1               The backend domain (where the connection manager
>                      is).
0x7FF1 is DOMID_IO which has a separate definition as far as Xen is
concerned.
Is it not possible for this information to be in xenstore?
>
> Some port numbers are reserved.
>
> Port    Purpose
> ------  -------
> 0       Reserved
> 1       Connection Manager
> 2-1023  Reserved for well-known services (such as a service discovery
>         service).
If you are making use of DOMID_SELF, probably also make use of
DOMID_FIRST_RESERVED, which has the same numeric value.
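
For illustration only, here is a minimal Python sketch of the address rules in the two tables above. The numeric values are copied from the draft; every name (`CID_SELF`, `CID_BACKEND`, `classify_port`, `make_address`) is hypothetical, and treating ports above 1023 as "unreserved" is an assumption, since the draft does not say how they are used.

```python
from typing import Tuple

# Values taken from the draft's tables; all names here are hypothetical.
CID_SELF = 0x7FF0               # "The local domain" (DOMID_SELF)
CID_BACKEND = 0x7FF1            # "The backend domain" in the draft
PORT_CONNECTION_MANAGER = 1
U32_MAX = 0xFFFFFFFF            # both address fields are uint32_t


def classify_port(port: int) -> str:
    """Map a port number to its category in the draft's port table."""
    if not 0 <= port <= U32_MAX:
        raise ValueError("port must fit in a uint32_t")
    if port == 0:
        return "reserved"
    if port == PORT_CONNECTION_MANAGER:
        return "connection-manager"
    if port <= 1023:
        return "well-known"
    # Assumption: the draft says nothing about ports above 1023.
    return "unreserved"


def make_address(cid: int, port: int) -> Tuple[int, int]:
    """Validate a (CID, port) pair against the 32-bit address format."""
    if not 0 <= cid <= U32_MAX:
        raise ValueError("CID must fit in a uint32_t")
    classify_port(port)  # raises ValueError if the port is out of range
    return (cid, port)
```

A client wanting the connection manager would, under these assumptions, use `make_address(CID_BACKEND, PORT_CONNECTION_MANAGER)`.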
>
> Front / Back Drivers
> --------------------
>
> Using a front or back driver to provide the virtual socket transport
> allows the toolstack to only make the inter-domain communication
> facility available to selected domains.
>
> The "standard" xenbus connection state machine shall be used. See
> figures \ref{fig_front-sm} and \ref{fig_back-sm} on pages
> \pageref{fig_front-sm} and \pageref{fig_back-sm}.
>
> 
>
> 
>
>
> Connection Manager
> ------------------
>
> The connection manager has two main purposes.
>
> 1. Checking that two domains are permitted to connect.
>
> 2. Providing a mechanism for two domains to exchange the grant
> references and event channels needed for them to set up a shared
> ring transport.
>
> Domains communicate with the connection manager over the front-back
> transport link. The connection manager must be in the same domain as
> the virtual socket backend driver.
>
> The connection manager opens a virtual socket and listens on a well
> defined port (port 1).
>
> The following messages are defined.
>
> Message      Purpose
> -----------  -------
> CONNECT_req  Request connection to another peer.
> CONNECT_rsp  Response to a connection request.
> CONNECT_ind  Indicate that a peer is trying to connect.
> CONNECT_ack  Acknowledge a connection request.
>
> 
>
> Before forwarding a connection request to a peer, the connection
> manager checks that the connection is permitted. The toolstack sets
> these permissions.
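
As a rough sketch of how I read the message flow and the default-deny policy, in Python and purely illustrative: the class, method and field names are made up, the policy is assumed to be a set of directional (initiator, responder) pairs, and replying with a "denied" CONNECT_rsp is an assumption because the draft does not say what a refused request looks like.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple


@dataclass
class RingInfo:
    """What each side shares: a grant reference and an event channel."""
    grant_ref: int
    evtchn: int


class ConnectionManager:
    def __init__(self) -> None:
        # Default policy is deny: only pairs added here may connect.
        self.allowed: Set[Tuple[int, int]] = set()

    def allow(self, initiator: int, responder: int) -> None:
        """Hypothetical toolstack hook permitting one connection."""
        self.allowed.add((initiator, responder))

    def handle_connect_req(
        self,
        initiator: int,
        responder: int,
        init_ring: RingInfo,
        responder_handler: Callable[[int, RingInfo], RingInfo],
    ) -> Tuple[Optional[RingInfo], List[str]]:
        """Process one CONNECT_req; return the responder's ring and a message log."""
        log = ["CONNECT_req"]
        if (initiator, responder) not in self.allowed:
            # Assumption: a refusal is reported with a CONNECT_rsp.
            log.append("CONNECT_rsp(denied)")
            return None, log
        # Forward the initiator's ring details to the responder.
        log.append("CONNECT_ind")
        resp_ring = responder_handler(initiator, init_ring)
        log.append("CONNECT_ack")
        # Relay the responder's ring details back to the initiator.
        log.append("CONNECT_rsp(ok)")
        return resp_ring, log
```

Here `responder_handler` stands in for the responder domain receiving CONNECT_ind and answering with CONNECT_ack; in a real system that exchange would travel over the front/back transport link rather than a function call.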
>
> Disconnecting transport links to an uncooperative (or dead) domain is
> required. Therefore there are no messages for disconnecting transport
> links (as these may be ignored or delayed). Instead a transport link is
> disconnected by tearing down the local end. The peer will notice the
> remote end going away and then tear down its end.
>
> Low-level transport
> ===================
>
> [ The exact details are yet to be determined but this section should
> provide a reasonable summary of the mechanisms used. ]
>
> Frontend and backend domains
> ----------------------------
>
> As is typical for frontend and backend drivers, the frontend will
> grant copy-only access to two rings -- one for from-front messages and
> one for to-front messages. Each ring shall have an event channel for
> notifying when requests and responses are placed on the ring.
The term "grant copy-only" is very confusing to read in context.
However, I can't offhand think of a better way of describing it.
~Andrew
>
> Peer domains
> ------------
>
> The initiator grants copy-only access to a from-initiator (transmit)
> ring and provides an event channel for notifications for this ring.
> This information is included in the CONNECT_req and CONNECT_ind
> messages.
>
> The responder grants copy-only access to a from-responder (transmit)
> ring and provides an event channel for notifications for this ring.
> The information is included in the CONNECT_ack and CONNECT_rsp
> messages.
>
> After the initial connection, the two domains operate as identical
> peers. Disconnection is signalled by a domain ungranting its transmit
> ring, notifying the peer via the associated event channel. The event
> channel is then unbound.
>
> Appendix
> ========
>
> V4V
> ---
>
> An alternative low-level transport (V4V) has been proposed. The
> hypervisor copies messages from the source domain into a destination
> ring provided by the destination domain.
>
> Because peers are untrusted, in order to prevent them from being able
> to denial-of-service the processing of messages from other peers, each
> receiver must have a per-peer receive ring. A listening service does
> not know in advance which peers may connect so it cannot create these
> rings in advance.
>
> The connection manager service running in a trusted domain (as in the
> shared ring transport described above) may be used. The CONNECT_ind
> message is used to trigger the creation of a receive ring for that
> specific sender.
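
To make the per-peer ring argument concrete, a small Python sketch follows. It is not v4v code: `Receiver`, `on_connect_ind` and `deliver` are hypothetical names, and a bounded `deque` stands in for a destination ring. The point it tries to show is that a sender which fills its own ring cannot block delivery from any other sender.

```python
from collections import deque
from typing import Deque, Dict


class Receiver:
    def __init__(self, ring_slots: int) -> None:
        self.ring_slots = ring_slots
        # One receive ring per sender domain ID, created on demand.
        self.rings: Dict[int, Deque[str]] = {}

    def on_connect_ind(self, sender: int) -> None:
        """Create the receive ring for this sender if it does not exist yet."""
        self.rings.setdefault(sender, deque())

    def deliver(self, sender: int, msg: str) -> bool:
        """Queue a message; refuse it if the sender has no ring or its ring is full."""
        ring = self.rings.get(sender)
        if ring is None or len(ring) >= self.ring_slots:
            return False
        ring.append(msg)
        return True
```

With a single shared ring, the third message from a noisy sender would instead have consumed a slot that another peer needed.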
>
> A peer must be able to find the connection manager service both at
> start of day and if the connection manager service is restarted in a
> new domain. This can be done in two possible ways:
>
> 1. Watch a Xenstore key which contains the connection manager service
> domain ID.
>
> 2. Use a frontend/backend driver pair.
>
> ### Advantages
>
> * Does not use grant table resources. If shared rings are used then a
> busy guest with hundreds of peers will require more grant table
> entries than the current default.
>
> ### Disadvantages
>
> * Any changes or extensions to the protocol or ring format would
> require a hypervisor change. This is more difficult than making
> changes to guests.
>
> * The connection-less, "shared-bus" model of v4v is unsuitable for
> untrusted peers. This requires layering a connection model on top
> and much of the simplicity of the v4v ABI is lost.
>
> * The mechanism for handling full destination rings will not scale up
> on busy domains. The event channel only indicates that some ring
> may have space -- it does not identify which ring has space.
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel