Re: Re: Interdomain comms


From: Mike Wray <mike.wray@hp.com>
To: andrew.warfield@cl.cam.ac.uk
Cc: Eric Van Hensbergen <ericvh@gmail.com>,
	Eric Van Hensbergen <ericvh@users.sourceforge.net>,
	Harry Butterworth <harry@hebutterworth.freeserve.co.uk>,
	"Ronald G. Minnich" <rminnich@lanl.gov>,
	xen-devel@lists.xensource.com
Subject: Re: Re: Interdomain comms
Date: Tue, 10 May 2005 09:31:12 +0100
Message-ID: <42807150.3030907@hp.com>
In-Reply-To: <eacc82a4050508011979bda457@mail.gmail.com>

Andrew Warfield wrote:
> Hi Eric,
> 
>    Your thoughts on 9P are all really interesting -- I'd come across
> the protocol years ago in looking into approaches to remote device/fs
> access but had a hard time finding details.  It's quite interesting to
> hear a bit more about the approach taken.
> 
>    Having a more accessible inter-domain comms API is clearly a good
> thing, and extending device channels (in our terminology -- shared
> memory + event notification) to work across a cluster is something
> that we've talked about on several occasions at the lab.
> 
>    I do think, though, that as mentioned above, there are some concerns
> with the VMM environment that make this a little trickier.  For the
> general case of inefficient comms between VMs, using the regular IP
> stack may be okay for many people.  The net drivers are being fixed up
> to special-case local communications.
> 
>    For the more specific cases of FE/BE comms, I think the devil may
> be in the details more than the current discussion is alluding to. 
> Specifically:
> 
> 
>>c) As long as the buffers in question (both *buf and the buffer cache
>>entry) were page-aligned, etc. -- we could play clever VM games,
>>marking the page as shared RO between the two partitions and aliasing the
>>virtual memory pointed to by *buf to the shared page.  This is very
>>sketchy and high-level and I need to delve into all sorts of details
>>-- but the idea would be to use virtual memory as your friend for
>>these sorts of shared read-only buffer caches.  It would also require
>>careful allocation of buffers of the right size with the right alignment
>>-- but driver writers are used to that sort of thing.
> 
> 
>    Much of the good performance that Xen gets from the split block and
> net devices is specifically due to these clever VM games.
> Block FEs pass page references down to be mapped directly for DMA. 
> Net devices pass pages into a free pool, and actually exchange
> physical pages under the feet of the VM as inbound packets are
> demultiplexed.  The grant tables that have recently been added provide
> separate mechanisms for the mapping and ownership transfer of pages
> across domains.  In addition to these tricks, we are careful about
> the timing of event notifications in order to batch messages.
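The batching of event notifications mentioned above can be illustrated with a minimal single-producer/single-consumer ring. This is a sketch under stated assumptions: it is loosely modelled on the `req_event` idea in Xen's shared-ring macros but is not that code, the names (`struct ring`, `ring_put`, `ring_push`, `ring_drain`, `notify_count`) are hypothetical, and the "event channel" is just a counter. The producer queues requests privately, publishes them in one step, and kicks the consumer only if the consumer asked to be woken at an index inside the new batch.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u /* power of two, so free-running indices can wrap */

/* Hypothetical shared ring: one producer (e.g. a frontend), one consumer. */
struct ring {
    uint32_t slots[RING_SIZE];
    uint32_t prod;          /* published producer index */
    uint32_t cons;          /* consumer index */
    uint32_t req_event;     /* consumer wants a kick once prod passes this */
    unsigned notify_count;  /* stands in for sending an event notification */
};

static void ring_init(struct ring *r)
{
    *r = (struct ring){ .req_event = 1 }; /* the first request triggers a kick */
}

/* Queue a request privately; the consumer cannot see it until ring_push(). */
static bool ring_put(struct ring *r, uint32_t *priv_prod, uint32_t val)
{
    if (*priv_prod - r->cons >= RING_SIZE)
        return false; /* ring full */
    r->slots[*priv_prod & (RING_SIZE - 1)] = val;
    (*priv_prod)++;
    return true;
}

/* Publish a whole batch and notify at most once, and only if the consumer
 * asked to be woken at an index that falls inside this batch. */
static bool ring_push(struct ring *r, uint32_t priv_prod)
{
    uint32_t old = r->prod, new_prod = priv_prod;
    r->prod = new_prod; /* real shared memory needs a write barrier before this */
    bool notify = (uint32_t)(new_prod - r->req_event) < (uint32_t)(new_prod - old);
    if (notify)
        r->notify_count++;
    return notify;
}

/* Consume everything published so far, then re-arm notification. */
static unsigned ring_drain(struct ring *r, uint32_t *sum)
{
    unsigned n = 0;
    while (r->cons != r->prod) {
        *sum += r->slots[r->cons & (RING_SIZE - 1)];
        r->cons++;
        n++;
    }
    r->req_event = r->cons + 1; /* a real consumer must re-check prod after this */
    return n;
}
```

Pushing three requests, then two more before the consumer runs, costs only one notification: the second push stays silent because the consumer has not yet re-armed `req_event`. The sketch omits the barriers and the final re-check that a real cross-domain ring needs.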

It should still be possible to use page mapping in the I/O transport.
The issue right now is that the I/O interface is very low-level and
intimately tangled up with the structs being transported.

And with the domain control channel there's an implicit assumption
that 'there can be only one'. This means, for example, that domain A,
using a device whose backend is in domain B, can't connect directly to
domain B but has to be 'introduced' by xend. It'd be better if it could
connect directly.

Something like what Harry proposes should still be able to use
page mapping for efficient local comms, but without _requiring_
it. This opens the way for alternative transports, such as the network.

Rather than going straight for something very high-level, I'd prefer
to build up gradually, starting with a more general message transport
API that includes analogues to listen/connect/recv/send.
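A minimal sketch of what such a transport API might look like, assuming an ops table of function pointers in the usual kernel style. Everything here (`struct xport`, `struct xport_ops`, the `loop_*` functions, integer addresses) is hypothetical and not an existing Xen interface. Callers see only addresses, channels and opaque byte messages, so a page-mapping transport or a network transport could sit behind the same four calls; the only implementation shown is a toy in-memory loopback in which a message sent on channel 0 is received back on channel 0.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct xport; /* forward declaration so the ops table can refer to it */

/* Hypothetical connection-oriented message API: no driver wire structs leak out. */
struct xport_ops {
    int  (*listen)(struct xport *x, int addr);
    int  (*connect)(struct xport *x, int addr);  /* channel id, or -errno */
    int  (*send)(struct xport *x, int chan, const void *buf, size_t len);
    long (*recv)(struct xport *x, int chan, void *buf, size_t len);
};

#define LOOP_MSG_MAX 64

/* State for the toy loopback: one channel (id 0) and a one-message mailbox.
 * A real design would hide transport-private state behind an opaque pointer. */
struct xport {
    const struct xport_ops *ops;
    int listening;   /* address being listened on, or -1 */
    int connected;
    int msg_full;
    size_t msg_len;
    char msg[LOOP_MSG_MAX];
};

static int loop_listen(struct xport *x, int addr)
{
    if (x->listening != -1)
        return -EBUSY;
    x->listening = addr;
    return 0;
}

/* Connect directly to whoever listens on addr; no third party introduces us. */
static int loop_connect(struct xport *x, int addr)
{
    if (x->listening != addr)
        return -ECONNREFUSED;
    x->connected = 1;
    return 0; /* the only channel id */
}

static int loop_send(struct xport *x, int chan, const void *buf, size_t len)
{
    if (!x->connected || chan != 0)
        return -ENOTCONN;
    if (len > LOOP_MSG_MAX)
        return -EMSGSIZE;
    if (x->msg_full)
        return -EAGAIN; /* mailbox busy: caller retries later */
    memcpy(x->msg, buf, len);
    x->msg_len = len;
    x->msg_full = 1;
    return 0;
}

static long loop_recv(struct xport *x, int chan, void *buf, size_t len)
{
    if (!x->connected || chan != 0)
        return -ENOTCONN;
    if (!x->msg_full)
        return -EAGAIN;
    if (len < x->msg_len)
        return -EMSGSIZE;
    memcpy(buf, x->msg, x->msg_len);
    x->msg_full = 0;
    return (long)x->msg_len;
}

static const struct xport_ops loop_ops = {
    loop_listen, loop_connect, loop_send, loop_recv
};

static void loop_init(struct xport *x)
{
    memset(x, 0, sizeof *x);
    x->ops = &loop_ops;
    x->listening = -1;
}
```

A caller written against `x->ops` never learns whether bytes moved by copying, by page mapping, or over a socket, which is the property that would let page mapping be an optimization rather than a requirement.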

> 
>    In the case of the buffer cache that has come up several times in
> the thread, a cache across domains would potentially need to pass
> read-only page mappings as CoW in many situations, and a fault handler
> somewhere would need to bring in a new page to the guest on a write.
> There is also a pile of complicating cases with regard to cache
> eviction from a BE domain, migration, and so on that make the
> accounting really tricky.  I think it would be quite good for a
> discussion of generalized interdomain comms to address the current
> drivers, as well as a hypothetical buffer cache, as potential cases.
> Does 9P already have hooks that would allow you to handle this sort of
> per-application special case?
> 
>    Additionally, I think we get away with a lot in the current drivers
> from a failure model that excludes the transport.  The FE or BE can
> crash, and the two drivers can be written defensively to handle that.
> How does 9P handle the strangeness of real distribution?
> 
> Anyhow, very interesting discussion... looking forward to your thoughts.
> 
> a.
> 
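The copy-on-write behaviour described for a cross-domain buffer cache can be modelled in a single address space. This is a minimal sketch under loud assumptions: `malloc` stands in for handing the guest a fresh machine page, an explicit `cow_write()` call stands in for the write fault, and the names (`struct page`, `struct mapping`, `map_shared_ro`, `cow_write`) are hypothetical. It deliberately ignores the eviction, migration and accounting problems raised above.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SZ 4096

/* Hypothetical cache page shared read-only between domains. */
struct page {
    unsigned refs;                 /* number of domains mapping this page */
    unsigned char data[PAGE_SZ];
};

/* One domain's view of a cached block. */
struct mapping {
    struct page *pg;
    int writable;
};

static struct page *page_alloc(void)
{
    struct page *p = calloc(1, sizeof *p);
    if (p)
        p->refs = 1;
    return p;
}

/* Give another domain a read-only mapping of an existing page. */
static void map_shared_ro(struct mapping *m, struct page *p)
{
    p->refs++;
    m->pg = p;
    m->writable = 0;
}

static void unmap(struct mapping *m)
{
    if (m->pg && --m->pg->refs == 0)
        free(m->pg);
    m->pg = NULL;
}

/* Stand-in for the write-fault handler: if the page is still shared, give
 * the writer a private copy; if it is the sole user, just upgrade in place.
 * Returns 0 on success, -1 on a bad offset or allocation failure. */
static int cow_write(struct mapping *m, size_t off, unsigned char val)
{
    if (off >= PAGE_SZ)
        return -1;
    if (!m->writable) {
        if (m->pg->refs > 1) {
            struct page *copy = page_alloc();
            if (!copy)
                return -1;
            memcpy(copy->data, m->pg->data, PAGE_SZ);
            m->pg->refs--;      /* drop our share of the original */
            m->pg = copy;
        }
        m->writable = 1;
    }
    m->pg->data[off] = val;
    return 0;
}
```

Only the first write by a sharer copies the page; the remaining user then upgrades in place without copying. Even this toy shows where the accounting difficulty comes from: every copy is a new page that some domain must be charged for.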

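Writing the drivers "defensively" against a crashed or misbehaving peer usually means the backend never trusts anything the frontend writes into shared memory. A minimal sketch, assuming a ring indexed by free-running 32-bit counters; the names (`struct be_req`, `be_pending`, `be_fetch_req`, `BE_RING_SIZE`) are hypothetical, and nothing here says how 9P would behave.

```c
#include <stdbool.h>
#include <stdint.h>

#define BE_RING_SIZE 8u

/* A request as the (untrusted) frontend wrote it into shared memory. */
struct be_req {
    uint32_t id;
    uint32_t nr_sectors;
};

struct be_state {
    uint32_t cons;      /* backend-private consumer index */
    bool peer_broken;   /* latched once the frontend is seen misbehaving */
};

/* How many requests may be consumed, given one snapshot of the
 * frontend-written producer index.  An index more than a ring's worth ahead
 * (which also covers one that moved backwards) means the peer is buggy or
 * hostile, so stop trusting it instead of reading past the ring. */
static uint32_t be_pending(struct be_state *be, uint32_t prod_snapshot)
{
    if (be->peer_broken)
        return 0;
    uint32_t n = prod_snapshot - be->cons; /* unsigned, so wraparound is fine */
    if (n > BE_RING_SIZE) {
        be->peer_broken = true;
        return 0;
    }
    return n;
}

/* Copy a request out of shared memory first, then validate the private
 * copy, so the frontend cannot change a field between check and use. */
static bool be_fetch_req(const volatile struct be_req *shared,
                         uint32_t max_sectors, struct be_req *out)
{
    out->id = shared->id;
    out->nr_sectors = shared->nr_sectors;
    return out->nr_sectors != 0 && out->nr_sectors <= max_sectors;
}
```

Once `peer_broken` is latched, the backend can tear the channel down and wait for a reconnect rather than crash; this is the kind of local failure handling that gets harder once the transport itself can fail.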
Mike
