From: Christian Bornträger
Subject: Re: [PATCH v2] Driver for Inter-VM shared memory device for KVM supporting interrupts.
Date: Wed, 20 May 2009 09:33:01 +0200
Message-ID: <200905200933.01736.borntraeger@de.ibm.com>
References: <1241713567-17256-1-git-send-email-cam@cs.ualberta.ca> <4A12E37C.700@cs.ualberta.ca> <200905201228.38718.rusty@rustcorp.com.au>
Cc: Cam Macdonell, Avi Kivity, kvm@vger.kernel.org, Christian Ehrhardt, Anthony Liguori
To: Rusty Russell
In-Reply-To: <200905201228.38718.rusty@rustcorp.com.au>

On Wednesday 20 May 2009 04:58:38, Rusty Russell wrote:
> On Wed, 20 May 2009 02:21:08 am Cam Macdonell wrote:
> > Avi Kivity wrote:
> > > Christian Bornträger wrote:
> > >>> To summarize, Anthony thinks it should use virtio, while I believe
> > >>> virtio is useful for exporting guest memory, not for importing host
> > >>> memory.
> >
> > Yes, precisely.
>
> But what's it *for*, this shared memory? Implementing shared memory is
> trivial. Using it is harder. For example, inter-guest networking: you'd
> have to copy packets in and out, making it slow as well as losing
> abstraction.
>
> The only interesting idea I can think of is exposing it to userspace, and
> having that run some protocol across it for fast app <-> app comms. But if
> that's your plan, you still have a lot of code to write!
>
> So I guess I'm missing the big picture here?

I can give some insights into shared memory usage in z/VM. z/VM uses
so-called discontiguous saved segments (DCSS) to share memory between
guests.

(Naming side note:
o discontiguous because these segments can have holes and different access
  rights, e.g. you can build a DCSS that spans 800M-801M read-only and
  900M-910M exclusive-write.
o segments because the second level of our page tables is called the
  segment table.)

z/VM uses these segments for several purposes:
o The monitoring subsystem uses a DCSS to get data from several components.
o Shared guest kernels: The CMS operating system is built as a bootable DCSS
  (called a named saved segment, NSS). All guests share the same host pages
  for the read-only parts of the CMS kernel; local data is stored in the
  exclusive-write parts of the same NSS. Linux on System z can also use this
  feature (CONFIG_SHARED_KERNEL). The kernel linkage is changed so that the
  read-only text segment is separated from the other parts, aligned to the
  segment size.
o Execute-in-place: This is a Linux feature that exploits the DCSS
  technology. The goal is to share identical guest pages without the
  additional overhead of KSM etc. We have a block device driver for DCSS
  that supports the direct_access function and therefore allows using the
  xip option of ext2. The idea is to put binaries into a read-only ext2
  filesystem.
  Whenever an mmap is made on this file system, the page is not mapped into
  the page cache; the ptes point into the DCSS memory instead. Since the
  DCSS is demand-paged by the host, no memory is wasted on unused parts of
  the binaries. In case of COW, the page is copied as usual. It turned out
  that installations with many similar guests (say, 400 guests) profit in
  terms of memory savings and quicker application startup (not for the
  first guest, of course). There is a downside: this requires a skilled
  administrator to set up.

We have also experimented with networking, POSIX shared memory, and shared
caches via DCSS. Most of these ideas turned out to be either not very useful
or hard to implement properly.
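To make the "discontiguous" part of the naming note concrete, here is a minimal C sketch of one segment described as a list of address ranges, each with its own access right, plus a lookup that reports a hole for any address between them. The type and function names (dcss_range, dcss_lookup, MB), the half-open [start, end) range convention, and reading "800M" as a MiB offset are all assumptions for illustration; this is not z/VM's or the dcssblk driver's actual data structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model (not z/VM's real layout): access rights that
 * apply to the ranges of one discontiguous saved segment (DCSS). */
enum dcss_access {
	DCSS_HOLE = 0,        /* address not covered by the segment */
	DCSS_READ_ONLY,       /* read-only part */
	DCSS_EXCLUSIVE_WRITE  /* exclusive-write part */
};

/* One contiguous piece of the segment, [start, end) in bytes (assumed). */
struct dcss_range {
	uint64_t start;
	uint64_t end;
	enum dcss_access access;
};

/* A DCSS as a set of non-overlapping ranges; gaps between them are holes. */
struct dcss {
	const struct dcss_range *ranges;
	size_t nr_ranges;
};

/* Hypothetical helper: convert a MiB count to a byte offset. */
#define MB(x) ((uint64_t)(x) << 20)

/* Return the access right that applies at addr, or DCSS_HOLE if addr
 * lies outside every range of the segment. */
enum dcss_access dcss_lookup(const struct dcss *seg, uint64_t addr)
{
	assert(seg != NULL);
	for (size_t i = 0; i < seg->nr_ranges; i++) {
		const struct dcss_range *r = &seg->ranges[i];
		if (addr >= r->start && addr < r->end)
			return r->access;
	}
	return DCSS_HOLE;
}

/* The example from the side note: 800M-801M read-only,
 * 900M-910M exclusive-write, with a hole in between. */
const struct dcss_range example_ranges[] = {
	{ MB(800), MB(801), DCSS_READ_ONLY },
	{ MB(900), MB(910), DCSS_EXCLUSIVE_WRITE },
};

const struct dcss example_dcss = {
	.ranges = example_ranges,
	.nr_ranges = sizeof(example_ranges) / sizeof(example_ranges[0]),
};
```

For example, dcss_lookup(&example_dcss, MB(905)) yields DCSS_EXCLUSIVE_WRITE, while MB(850) falls into the hole between the two ranges and yields DCSS_HOLE.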