From mboxrd@z Thu Jan 1 00:00:00 1970
From: Anthony Liguori
Subject: splice() based interguest networking
Date: Mon, 01 Dec 2008 13:33:13 -0600
Message-ID: <49343BF9.30308@us.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: kvm-devel
To: Rusty Russell
Sender: kvm-owner@vger.kernel.org
List-ID:

Here's a random thought I had after seeing that the new Xen netchannel2
tree has fast-path support for guest<=>guest communication.

With virtio, we could do really fast interguest networking in userspace.
We have a few requirements, though:

1) There should be a minimal number of copies, just one in almost all
   cases.

2) The copy should occur on the receiving end, since the receiver is
   most likely going to be accessing the data in the future.

3) The copy should be done in the kernel so that in the future it could
   be accelerated with a generic DMA engine.

So far, all the approaches have required mmap()'ing the guest memory in
both QEMU instances, which makes them much less useful. I think splice()
solves this problem, though, and gets us most of the above for free.

If we have two shared pipes between the two QEMU processes, then:

1) On TX, we vmsplice() from the sg buffer to one pipe.
   This will end up being vmsplice_to_pipe() in the kernel, which is
   zero-copy.

2) The pipe becomes readable, which results in an RX notification in
   the other process. We check whether any buffers are available in the
   receive queue; if so, we vmsplice() from the pipe to the sg buffer.
   This results in a copy via vmsplice_to_user().

In the future, vmsplice_to_user() would be an obvious candidate for
I/OAT acceleration. Since the copy happens in the kernel, and assuming
you're not in a highmem situation, no page table manipulation is
required.

We still have to address feature negotiation and such.

Regards,

Anthony Liguori