From: Cam Macdonell
Sender: camm@ualberta.ca
Date: Thu, 25 Mar 2010 11:35:39 -0600
Subject: [Qemu-devel] Re: [PATCH v3 0/2] Inter-VM shared memory PCI device
To: Avi Kivity
Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org
Message-ID: <8286e4ee1003251035o75fed405j45b60d496afa66b5@mail.gmail.com>
In-Reply-To: <4BAB9718.3030808@redhat.com>
References: <1269497310-21858-1-git-send-email-cam@cs.ualberta.ca> <4BAB2736.7020202@redhat.com> <8286e4ee1003250950l45cc2883yd4788d20f99ef86c@mail.gmail.com> <4BAB9718.3030808@redhat.com>

On Thu, Mar 25, 2010 at 11:02 AM, Avi Kivity wrote:
> On 03/25/2010 06:50 PM, Cam Macdonell wrote:
>>
>>> Please put the spec somewhere publicly accessible with a permanent URL. I
>>> suggest a new qemu.git directory specs/. It's more important than the code
>>> IMO.
>>>
>>
>> Sorry to be pedantic, do you want a URL or the spec as part of a patch
>> that adds it as a file in qemu.git/docs/specs/
>>
>
> I leave it up to you. If you are up to hosting it independently, then just
> post a URL as part of the patch. Otherwise, I'm sure qemu.git will be more
> than happy to be the official repository for the memory sharing device
> specification. In that case, make the spec the first patch in the
> series.

Ok, I'll send it as part of the series; that way people can comment
inline easily.

>
>>> Possible later extensions:
>>> - multiple doorbells that trigger different vectors
>>> - multicast doorbells
>>>
>>
>> Since the doorbells are exposed the multicast could be done by the
>> driver. If multicast is handled by qemu, then we have different
>> behaviour when using ioeventfd/irqfd since only one eventfd can be
>> triggered by a write.
>>
>
> Multicast by the driver would require one exit per guest signalled.
> Multicast by the shared memory server needs one exit to signal an eventfd,
> then the shared memory server signals the irqfds of all members of the
> multicast group.
>
>>>> The semantics of the value written to the doorbell depend on whether the
>>>> device is using MSI or a regular pin-based interrupt.
>>>>
>>>
>>> I recommend against making the semantics interrupt-style dependent. It
>>> means the application needs to know whether MSI is in use or not, while it
>>> is generally the OS that is in control of that.
>>>
>>
>> It is basically the use of the status register that is the difference.
>> The application view of what is happening doesn't need to change,
>> especially with UIO: write to doorbells, block on read until interrupt
>> arrives. In the MSI case I could set the status register to the
>> vector that is received and then they would be equivalent from the view
>> of the application.
>> But, if future MSI support in UIO allows MSI
>> information (such as vector number) to be accessible in userspace,
>> then applications would become MSI dependent anyway.
>>
>
> Ah, I see. You adjusted for the different behaviours in the driver.
>
> Still I recommend dropping the status register: this allows single-msi and
> PIRQ to behave the same way. Also it is racy, if two guests signal a third,
> they will overwrite each other's status.

With shared interrupts with PIRQ without a status register how does a
device know it generated the interrupt?

>
>>> ioeventfd/irqfd are an implementation detail. The spec should not depend
>>> on it. It needs to be written as if qemu and kvm do not exist. Again, I
>>> recommend Rusty's virtio-pci for inspiration.
>>>
>>> Applications should see exactly the same thing whether ioeventfd is
>>> enabled or not.
>>>
>>
>> The challenge I recently encountered with this is one line in the
>> eventfd implementation
>>
>> from kvm/virt/kvm/eventfd.c
>>
>> /* MMIO/PIO writes trigger an event if the addr/val match */
>> static int
>> ioeventfd_write(struct kvm_io_device *this, gpa_t addr, int len,
>>         const void *val)
>> {
>>     struct _ioeventfd *p = to_ioeventfd(this);
>>
>>     if (!ioeventfd_in_range(p, addr, len, val))
>>         return -EOPNOTSUPP;
>>
>>     eventfd_signal(p->eventfd, 1);
>>     return 0;
>> }
>>
>> IIUC, no matter what value is written to an ioeventfd by a guest, a
>> value of 1 is written. So ioeventfds work differently than eventfds.
>> Can we add a "multivalue" flag to ioeventfds so that the value that
>> the guest writes is written to eventfd?
>>
>
> Eventfd values are a counter, not a register. A read() on the other side
> returns the sum of all write()s (or eventfd_signal()s). In the context of
> irqfd it just means the number of interrupts we coalesced.
>
> Multivalue was considered at one time for a different need and rejected.
> Really, to solve the race you need a queue, and that can only be done in
> the shared memory segment using locked instructions.

I had a hunch it was probably considered. That explains why irqfd
doesn't have a datamatch field. I guess supporting multiple MSI vectors
with one doorbell per guest isn't possible if only 1 bit of information
can be communicated.

So, ioeventfd/irqfd restricts MSI to 1 vector between guests. Should
multi-MSI even be supported then in the non-ioeventfd/irqfd case?
Otherwise ioeventfd/irqfd become more than an implementation detail.

>
> --
> error compiling committee.c: too many arguments to function
>
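
For reference, the counter semantics discussed above can be seen from
plain userspace. This is only a minimal sketch using the Linux eventfd(2)
call directly, not the kvm ioeventfd/irqfd path; the helper name
signal_then_read() is made up for illustration. It writes several values
to one eventfd and then reads once, which shows that the reader gets the
sum rather than any individual value a writer sent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/eventfd.h>

/*
 * Hypothetical helper, for illustration only: write each of the n values
 * to a fresh eventfd, then do a single read().
 * Returns what the reader sees, or UINT64_MAX on any error (including a
 * read on a zero counter, since the fd is non-blocking).
 */
uint64_t signal_then_read(const uint64_t *vals, size_t n)
{
    uint64_t seen = UINT64_MAX;
    int fd = eventfd(0, EFD_NONBLOCK);  /* counter starts at 0 */

    if (fd < 0)
        return UINT64_MAX;

    for (size_t i = 0; i < n; i++) {
        /* each write() adds vals[i] to the kernel-side counter */
        if (write(fd, &vals[i], sizeof(vals[i])) != (ssize_t)sizeof(vals[i])) {
            close(fd);
            return UINT64_MAX;
        }
    }

    /* without EFD_SEMAPHORE, read() returns the whole counter and resets it */
    if (read(fd, &seen, sizeof(seen)) != (ssize_t)sizeof(seen))
        seen = UINT64_MAX;

    close(fd);
    return seen;
}
```

If two senders write 2 and 3 before the receiver reads, the receiver
sees 5 and cannot tell which values were written or by whom, so a
written value cannot be used to carry a vector number.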