From: Cam Macdonell
Sender: camm@ualberta.ca
Date: Tue, 9 Mar 2010 08:27:09 -0700
Subject: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device
To: kvm@vger.kernel.org, qemu-devel@nongnu.org
Message-ID: <8286e4ee1003090727j1d45e5dq3bc5d2ae89c354c@mail.gmail.com>
In-Reply-To: <8286e4ee1003090724m1ef0b571g8b705a24e36e1753@mail.gmail.com>
References: <1267833161-25267-1-git-send-email-cam@cs.ualberta.ca>
 <1267833161-25267-2-git-send-email-cam@cs.ualberta.ca>
 <4B94C9B3.1060904@redhat.com>
 <8286e4ee1003080957v9bb4837x187cebb8477348c2@mail.gmail.com>
 <4B962301.3030008@redhat.com>
 <8286e4ee1003090724m1ef0b571g8b705a24e36e1753@mail.gmail.com>
List-Id: qemu-devel.nongnu.org

On Tue, Mar 9, 2010 at 3:29 AM, Avi Kivity wrote:
> On 03/08/2010 07:57 PM, Cam Macdonell wrote:
>>
>>> Can you provide a spec that describes the device?  This would be
>>> useful for maintaining the code, writing guest drivers, and as a
>>> framework for review.
>>>
>>
>> I'm not sure if you want the Qemu command-line part as part of the
>> spec here, but I've included it for completeness.
>>
>
> I meant something from the guest's point of view, so command line syntax is
> less important.  It should be equally applicable to a real PCI card that
> works with the same driver.
>
> See http://ozlabs.org/~rusty/virtio-spec/ for an example.
>
>> The Inter-VM Shared Memory PCI device
>> -----------------------------------------------------------
>>
>> BARs
>>
>> The device supports two BARs.  BAR0 is a 256-byte MMIO region to
>> support registers
>>
>
> (but might be extended in the future)
>
>> and BAR1 is used to map the shared memory object from the host.  The
>> size of BAR1 is specified on the command-line and must be a power of 2
>> in size.
>>
>> Registers
>>
>> BAR0 currently supports 5 registers of 16-bits each.
>
> Suggest making registers 32-bits, friendlier towards non-x86.
>
>> Registers are used for synchronization between guests sharing the same
>> memory object when interrupts are supported (this requires using the
>> shared memory server).
>>
>
> How does the driver detect whether interrupts are supported or not?

At the moment, the VM ID is set to -1 if interrupts aren't supported,
but that may not be the clearest way to do things.  With UIO, is there
a way to detect if the interrupt pin is on?

>
>> When using interrupts, VMs communicate with a shared memory server that
>> passes the shared memory object file descriptor using SCM_RIGHTS.  The
>> server assigns each VM an ID number and sends this ID number to the Qemu
>> process along with a series of eventfd file descriptors, one per guest
>> using the shared memory server.  These eventfds will be used to send
>> interrupts between guests.  Each guest listens on the eventfd
>> corresponding to their ID and may use the others for sending interrupts
>> to other guests.
>>
>> enum ivshmem_registers {
>>     IntrMask = 0,
>>     IntrStatus = 2,
>>     Doorbell = 4,
>>     IVPosition = 6,
>>     IVLiveList = 8
>> };
>>
>> The first two registers are the interrupt mask and status registers.
>> Interrupts are triggered when a message is received on the guest's
>> eventfd from another VM.  Writing to the 'Doorbell' register is how
>> synchronization messages are sent to other VMs.
>>
>> The IVPosition register is read-only and reports the guest's ID number.
>> The IVLiveList register is also read-only and reports a bit vector of
>> currently live VM IDs.
>>
>
> That limits the number of guests to 16.

True, it could grow to 32 or 64 without difficulty.  We could leave
'liveness' to the user (could be implemented using the shared memory
region) or via the interrupts that arrive on guest attach/detach as
you suggest below.

>
>> The Doorbell register is 16-bits, but is treated as two 8-bit values.
>> The upper 8-bits are used for the destination VM ID.  The lower 8-bits
>> are the value which will be written to the destination VM and what the
>> guest status register will be set to when the interrupt is triggered in
>> the destination guest.
>>
>
> What happens when two interrupts are sent back-to-back to the same guest?
> Will the first status value be lost?

Right now, it would be.  I believe that eventfd has a counting
semaphore option that could prevent loss of status (but limits what
the status could be).  My understanding of uio_pci interrupt handling
is fairly new, but we could have the uio driver store the interrupt
statuses to avoid losing them.

>
> Also, reading the status register requires a vmexit.  I suggest dropping it
> and requiring the application to manage this information in the shared
> memory area (where it could do proper queueing of multiple messages).
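
A minimal sketch of what such queueing in the shared memory area might
look like, assuming exactly one sender and one receiver per queue and
C11 atomics on both sides.  The names msgq, msgq_send, msgq_recv and
MSGQ_SLOTS are hypothetical and not part of the patch; the layout is
only an illustration, not a proposed format.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical queue depth; a power of two keeps the slot index
 * consistent when the 32-bit counters wrap around. */
#define MSGQ_SLOTS 16u
static_assert((MSGQ_SLOTS & (MSGQ_SLOTS - 1u)) == 0u,
              "MSGQ_SLOTS must be a power of two");

/* Would live at an agreed offset inside the shared memory region (BAR1). */
struct msgq {
    _Atomic uint32_t head;      /* count of messages written by the sender */
    _Atomic uint32_t tail;      /* count of messages consumed by the receiver */
    uint8_t slots[MSGQ_SLOTS];  /* one status byte per queued message */
};

/* Sender side: returns false if the queue is full, so nothing is overwritten. */
static bool msgq_send(struct msgq *q, uint8_t status)
{
    uint32_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);

    if (head - tail == MSGQ_SLOTS)
        return false;

    q->slots[head % MSGQ_SLOTS] = status;
    /* Publish the slot contents before making the new head visible. */
    atomic_store_explicit(&q->head, head + 1u, memory_order_release);
    return true;
}

/* Receiver side: returns false if no message is pending. */
static bool msgq_recv(struct msgq *q, uint8_t *status)
{
    uint32_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&q->head, memory_order_acquire);

    if (head == tail)
        return false;

    *status = q->slots[tail % MSGQ_SLOTS];
    /* Release the slot back to the sender. */
    atomic_store_explicit(&q->tail, tail + 1u, memory_order_release);
    return true;
}
```

With something like this, the doorbell would only need to say "look at
your queue": two back-to-back messages occupy two slots instead of
overwriting a single status value, and the receiver drains them without
reading a device register.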
>
>> A value of 255 in the upper 8-bits will trigger a broadcast where the
>> message will be sent to all other guests.
>>
>
> Please consider adding:
>
> - MSI support

Sure, I'll look into it.

> - interrupt on a guest attaching/detaching to the shared memory device

Sure.

>
> With MSI you could also have the doorbell specify both guest ID and vector
> number, which may be useful.
>
> Thanks for this - it definitely makes reviewing easier.
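
For reference, the Doorbell layout as the draft currently stands (16-bit
register at offset 4, destination VM ID in the upper 8 bits, value in
the lower 8 bits, 255 meaning broadcast) could be sketched as below.
The enum is taken from the spec text quoted above; the helper names,
the IVSHMEM_BROADCAST_ID macro and the bar0 pointer are hypothetical,
and the layout may well change if the registers become 32-bit or MSI
vectors are added.

```c
#include <assert.h>
#include <stdint.h>

/* Register offsets within BAR0, as listed in the draft spec. */
enum ivshmem_registers {
    IntrMask   = 0,
    IntrStatus = 2,
    Doorbell   = 4,
    IVPosition = 6,
    IVLiveList = 8
};

/* Hypothetical name: the draft only says 255 in the upper byte broadcasts. */
#define IVSHMEM_BROADCAST_ID 0xFFu
static_assert(IVSHMEM_BROADCAST_ID <= UINT8_MAX, "VM ID must fit in 8 bits");

/* Pack the destination VM ID (upper 8 bits) and value (lower 8 bits). */
static inline uint16_t ivshmem_doorbell_encode(uint8_t dest_id, uint8_t value)
{
    return (uint16_t)(((uint16_t)dest_id << 8) | value);
}

/* Extract the destination VM ID from a doorbell word. */
static inline uint8_t ivshmem_doorbell_dest(uint16_t db)
{
    return (uint8_t)(db >> 8);
}

/* Extract the value that ends up in the destination's status register. */
static inline uint8_t ivshmem_doorbell_value(uint16_t db)
{
    return (uint8_t)(db & 0xFFu);
}

/* Hypothetical helper: ring the doorbell through a mapped BAR0 pointer. */
static inline void ivshmem_ring(volatile uint8_t *bar0,
                                uint8_t dest_id, uint8_t value)
{
    volatile uint16_t *db = (volatile uint16_t *)(bar0 + Doorbell);
    *db = ivshmem_doorbell_encode(dest_id, value);
}
```

A guest would call ivshmem_ring(bar0, 3, 1) to send value 1 to VM 3, or
pass IVSHMEM_BROADCAST_ID as the destination to reach all other guests.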