From mboxrd@z Thu Jan 1 00:00:00 1970
From: Anthony Liguori
Subject: splice() based interguest networking
Date: Mon, 01 Dec 2008 13:33:13 -0600
Message-ID: <49343BF9.30308@us.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: kvm-devel
To: Rusty Russell
Sender: kvm-owner@vger.kernel.org
List-ID:

Here's a random thought I had after seeing that the new Xen netchannel2
tree has fast-path support for guest<=>guest communication.

With virtio, we could do really fast interguest networking in userspace.
We have a few requirements, though:

1) There should be a minimal number of copies, just one in almost all
   cases.

2) The copy should occur on the receiving end, since the receiver is
   most likely going to be accessing the data in the future.

3) The copy should be done in the kernel so that in the future it could
   be accelerated with a generic DMA engine.

So far, all the approaches have required mmap()'ing the guest memory in
both QEMU instances, which makes them much less useful. I think splice()
solves this problem, though, and gets us most of the above for free.

If we have two shared pipes between the two QEMU processes, then:

1) On TX, we vmsplice() from the sg buffer to one pipe.
   This will end up being vmsplice_to_pipe() in the kernel, which is
   zero-copy.

2) The pipe becomes readable, which results in an RX notification in
   the other process. We check whether any buffers are available in the
   receive queue; if so, we vmsplice() from the pipe to the sg buffer.
   This results in a copy via vmsplice_to_user().

In the future, vmsplice_to_user() would be an obvious candidate for
I/OAT acceleration. Since the copy happens in the kernel, and assuming
you're not in a highmem situation, no page table manipulation is
required.

We still have to address feature negotiation and such.

Regards,

Anthony Liguori