From: Avi Kivity
Date: Thu, 25 Mar 2010 11:04:54 +0200
Subject: [Qemu-devel] Re: [PATCH v3 0/2] Inter-VM shared memory PCI device
To: Cam Macdonell
Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org
Message-ID: <4BAB2736.7020202@redhat.com>
In-Reply-To: <1269497310-21858-1-git-send-email-cam@cs.ualberta.ca>

On 03/25/2010 08:08 AM, Cam Macdonell wrote:
> Support an inter-vm shared memory device that maps a shared-memory object
> as a PCI device in the guest. This patch also supports interrupts between
> guest by communicating over a unix domain socket. This patch applies to the
> qemu-kvm repository.
>
> Changes in this version are using the qdev format and optional use of MSI and
> ioeventfd/irqfd.
>
> The non-interrupt version is supported by passing the shm parameter
>
>      -device ivshmem,size=,[shm=]
>
> which will simply map the shm object into a BAR.
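For readers less familiar with the host side of the shm= mode: assuming the shm= name refers to a POSIX shared-memory object of the kind created by shm_open(), anything that maps that object sees the same bytes, which is what lets two guests share a BAR. The sketch below only demonstrates that property with shm_open() and mmap(); the helper shm_roundtrip() and its parameters are hypothetical and say nothing about how the patch itself maps the object.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create a named POSIX shared-memory object of `size`
 * bytes, map it twice (standing in for two guests that both map it into a
 * BAR), write `msg` through the first mapping and copy what the second
 * mapping sees into `out`.  Returns 0 on success, -1 on failure.  The object
 * is unlinked before returning. */
int shm_roundtrip(const char *name, size_t size, const char *msg,
                  char *out, size_t outlen)
{
    int rc = -1;
    char *a = MAP_FAILED, *b = MAP_FAILED;
    size_t len = strlen(msg) + 1;

    assert(name != NULL && name[0] == '/'); /* portable shm names start with '/' */
    if (len > size || len > outlen)
        return -1;

    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)size) < 0)
        goto out;

    a = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    b = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (a == MAP_FAILED || b == MAP_FAILED)
        goto out;

    memcpy(a, msg, len);   /* "guest A" writes through its mapping ...  */
    memcpy(out, b, len);   /* ... and "guest B" reads the same bytes    */
    rc = 0;

out:
    if (a != MAP_FAILED)
        munmap(a, size);
    if (b != MAP_FAILED)
        munmap(b, size);
    close(fd);
    shm_unlink(name);
    return rc;
}
```

On glibc older than 2.34, shm_open() lives in librt, so link with -lrt.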
>
> Interrupts are supported between multiple VMs by using a shared memory server
> that is connected to with a socket character device
>
>      -device ivshmem,size=[,chardev=][,irqfd=on]
>              [,msi=on][,nvectors=n]
>      -chardev socket,path=,id=
>
> The server passes file descriptors for the shared memory object and eventfds (our
> interrupt mechanism) to the respective qemu instances.
>
> When using interrupts, VMs communicate with a shared memory server that passes
> the shared memory object file descriptor using SCM_RIGHTS. The server assigns
> each VM an ID number and sends this ID number to the Qemu process along with a
> series of eventfd file descriptors, one per guest using the shared memory
> server. These eventfds will be used to send interrupts between guests. Each
> guest listens on the eventfd corresponding to their ID and may use the others
> for sending interrupts to other guests.
>
Please put the spec somewhere publicly accessible with a permanent URL. I
suggest a new qemu.git directory specs/. It's more important than the code
IMO.

> enum ivshmem_registers {
>     IntrMask = 0,
>     IntrStatus = 4,
>     IVPosition = 8,
>     Doorbell = 12
> };
>
> The first two registers are the interrupt mask and status registers. Mask and
> status are only used with pin-based interrupts. They are unused with MSI
> interrupts. The IVPosition register is read-only and reports the guest's ID
> number. Interrupts are triggered when a message is received on the guest's
> eventfd from another VM. To trigger an event, a guest must write to another
> guest's Doorbell. The "Doorbells" begin at offset 12. A particular guest's
> doorbell offset in the MMIO region is equal to
>
>      guest_id * 32 + Doorbell
>
> The doorbell register for each guest is 32-bits. The doorbell-per-guest
> design was motivated for use with ioeventfd.
>
You can also use a single doorbell register with ioeventfd, as it can match
against the data written.
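A small sketch contrasting the two addressing schemes, assuming the register offsets quoted above: in the doorbell-per-guest layout the target guest is decoded from the write address, while in a single-doorbell layout it is decoded from the written value (which is what an ioeventfd data match would key on). The function names and the nr_guests bound are hypothetical; this is not code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Register offsets as given in the quoted patch description. */
enum ivshmem_registers {
    IntrMask   = 0,
    IntrStatus = 4,
    IVPosition = 8,
    Doorbell   = 12
};

/* Doorbell-per-guest layout from the patch: guest N's doorbell sits at
 * N * 32 + Doorbell in the MMIO region. */
uint32_t doorbell_offset(uint32_t guest_id)
{
    return guest_id * 32u + Doorbell;
}

/* Inverse of the above, as a device model would need it: returns the target
 * guest for an MMIO write at `offset`, or -1 if it is not a doorbell. */
int64_t target_from_offset(uint32_t offset)
{
    if (offset < Doorbell || (offset - Doorbell) % 32u != 0)
        return -1;
    return (offset - Doorbell) / 32u;
}

/* Single-doorbell alternative (hypothetical): all senders write to the one
 * Doorbell register and the value written is the destination guest ID.  With
 * ioeventfd this would be one registration per peer on the same address,
 * each matching a different value. */
int64_t target_from_value(uint32_t value, uint32_t nr_guests)
{
    return value < nr_guests ? (int64_t)value : -1;
}
```

The per-guest layout spends MMIO space proportional to the number of guests; the single-doorbell layout keeps one register and moves the routing decision into the value.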

If you go this route, you'd have two doorbells: one where you write a guest
ID to send an interrupt to that guest, and one where any write generates a
multicast.

Possible later extensions:
- multiple doorbells that trigger different vectors
- multicast doorbells

> The semantics of the value written to the doorbell depends on whether the
> device is using MSI or a regular pin-based interrupt.
>
I recommend against making the semantics interrupt-style dependent. It would
mean the application needs to know whether MSI is in use, while that is
generally under the OS's control.

> Regular Interrupts
> ------------------
>
> If regular interrupts are used (due to either a guest not supporting MSI or the
> user specifying not to use them on the command-line) then the value written to
> a guest's doorbell is what the guest's status register will be set to.
>
> An status of (2^32 - 1) indicates that a new guest has joined. Guests
> should not send a message of this value for any other reason.
>
> Message Signalled Interrupts
> ----------------------------
>
> The important thing to remember with MSI is that it is only a signal, no
> status is set (since MSI interrupts are not shared). All information other
> than the interrupt itself should be communicated via the shared memory region.
> MSI is on by default. It can be turned off with the msi=off to the parameter.
>
> If the device uses MSI then the value written to the doorbell is the MSI vector
> that will be raised. Vector 0 is used to notify that a new guest has joined.
> Vector 0 cannot be triggered by another guest since a value of 0 does not
> trigger an eventfd.
>
Ah, it looks like we approached the vector/guest matrix from different
directions.

> ioeventfd/irqfd
> ---------------
>
> ioeventfd/irqfd is turned on by irqfd=on passed to the device parameter (it is
> off by default).
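Both the remark above that a value of 0 does not trigger an eventfd and the limitation that the written value is not what reaches the other guest fit Linux eventfd semantics as seen by a userspace reader: a write adds its 8-byte value to a counter, a read on a non-blocking eventfd fails with EAGAIN while the counter is 0, and a successful read returns the accumulated sum and resets it. A minimal Linux-only sketch; the helper names doorbell_ring() and doorbell_check() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper: "ring" a peer by writing one 8-byte value to its
 * eventfd.  Returns 0 on success, -1 on failure. */
int doorbell_ring(int efd, uint64_t value)
{
    return write(efd, &value, sizeof value) == (ssize_t)sizeof value ? 0 : -1;
}

/* Hypothetical helper: non-blocking check of an eventfd created with
 * EFD_NONBLOCK.  Returns 1 and stores the accumulated counter in *out if it
 * was signalled, 0 if nothing is pending, -1 on any other error. */
int doorbell_check(int efd, uint64_t *out)
{
    ssize_t n = read(efd, out, sizeof *out);
    if (n == (ssize_t)sizeof *out)
        return 1;
    if (n < 0 && errno == EAGAIN)
        return 0;
    return -1;
}
```

Because writes are summed, two peers ringing at nearly the same time are seen as one wake-up carrying their combined value, which is another reason to keep message content in the shared memory region rather than in the doorbell value.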
> When using ioeventfd/irqfd the only interrupt value that can
> be passed to another guest is 1 despite what value is written to a guest's
> Doorbell.
>
ioeventfd/irqfd are an implementation detail. The spec should not depend on
them. It needs to be written as if qemu and kvm do not exist. Again, I
recommend Rusty's virtio-pci for inspiration.

Applications should see exactly the same thing whether ioeventfd is enabled or
not.

> Sample programs, init scripts and the shared memory server are available in a
> git repo here:
>
>      www.gitorious.org/nahanni
>
> Cam Macdonell (2):
>    Support adding a file to qemu's ram allocation
>    Inter-VM shared memory PCI device
>
Do you plan to maintain the server indefinitely in that repository? If not,
we can put it in qemu.git, perhaps under contrib/.

-- 
error compiling committee.c: too many arguments to function