From: "Michael S. Tsirkin" <mst@redhat.com>
To: Davide Libenzi <davidel@xmailserver.org>
Cc: Avi Kivity <avi@redhat.com>, kvm@vger.kernel.org
Subject: Re: [PATCH 0/2] eventfd: new EFD_STATE flag
Date: Sun, 10 Jan 2010 12:30:08 +0200 [thread overview]
Message-ID: <20100110103007.GA20538@redhat.com> (raw)
In-Reply-To: <alpine.DEB.2.00.1001071622320.8698@makko.or.mcafeemobile.com>
On Thu, Jan 07, 2010 at 04:26:24PM -0800, Davide Libenzi wrote:
> On Thu, 7 Jan 2010, Michael S. Tsirkin wrote:
>
> > Sure, I was trying to be as brief as possible, here's a detailed summary.
> >
> > Description of the system (MSI emulation in KVM):
> >
> > KVM supports an ioctl to assign/deassign an eventfd file to interrupt message
> > in guest OS. When this eventfd is signalled, interrupt message is sent.
> > This assignment is done from qemu system emulator.
> >
> > eventfd is signalled from device emulation in another thread in
> > userspace or from kernel, which talks with guest OS through another
> > eventfd and shared memory (possibility of out of process was discussed
> > but never got implemented yet).
> >
> > Note: it's okay to delay messages from correctness point of view, but
> > generally this is latency-sensitive path. If multiple identical messages
> > are requested, it's okay to send a single last message, but missing a
> > message altogether causes deadlocks. Sending a message when none were
> > requested might in theory cause crashes, in practice doing this causes
> > performance degradation.
> >
> > Another KVM feature is interrupt masking: guest OS requests that we
> > stop sending some interrupt message, possibly modified mapping
> > and re-enables this message. This needs to be done without
> > involving the device that might keep requesting events:
> > while masked, message is marked "pending", and guest might test
> > the pending status.
> >
> > We can implement masking in system emulator in userspace, by using
> > assign/deassign ioctls: when message is masked, we simply deassign all
> > eventfd, and when it is unmasked, we assign them back.
> >
> > Here's some code to illustrate how this all works: assign/deassign code
> > in kernel looks like the following:
> >
> >
> > this is called to unmask interrupt
> >
> > static int
> > kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
> > {
> > struct _irqfd *irqfd, *tmp;
> > struct file *file = NULL;
> > struct eventfd_ctx *eventfd = NULL;
> > int ret;
> > unsigned int events;
> >
> > irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
> >
> > ...
> >
> > file = eventfd_fget(fd);
> > if (IS_ERR(file)) {
> > ret = PTR_ERR(file);
> > goto fail;
> > }
> >
> > eventfd = eventfd_ctx_fileget(file);
> > if (IS_ERR(eventfd)) {
> > ret = PTR_ERR(eventfd);
> > goto fail;
> > }
> >
> > irqfd->eventfd = eventfd;
> >
> > /*
> > * Install our own custom wake-up handling so we are notified via
> > * a callback whenever someone signals the underlying eventfd
> > */
> > init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
> > init_poll_funcptr(&irqfd->pt, irqfd_ptable_queue_proc);
> >
> > spin_lock_irq(&kvm->irqfds.lock);
> >
> > events = file->f_op->poll(file, &irqfd->pt);
> >
> > list_add_tail(&irqfd->list, &kvm->irqfds.items);
> > spin_unlock_irq(&kvm->irqfds.lock);
> >
> > A.
> > /*
> > * Check if there was an event already pending on the eventfd
> > * before we registered, and trigger it as if we didn't miss it.
> > */
> > if (events & POLLIN)
> > schedule_work(&irqfd->inject);
> >
> > /*
> > * do not drop the file until the irqfd is fully initialized, otherwise
> > * we might race against the POLLHUP
> > */
> > fput(file);
> >
> > return 0;
> >
> > fail:
> > ...
> > }
>
> What is you do (under proper irqfd locking) something like:
>
> eventfd_ctx_read(ctx, 1, &cnt);
> if (irqfd->cnt != cnt) {
> irqfd->cnt = cnt;
> schedule_work(&irqfd->inject);
> }
>
>
>
>
> > And deactivation deep down does this (from irqfd_cleanup_wq workqueue,
> > so this is not under the spinlock):
> >
> > /*
> > * Synchronize with the wait-queue and unhook ourselves to
> > * prevent
> > * further events.
> > */
> > B.
> > remove_wait_queue(irqfd->wqh, &irqfd->wait);
> >
> > ....
> >
> > /*
> > * It is now safe to release the object's resources
> > */
> > eventfd_ctx_put(irqfd->eventfd);
> > kfree(irqfd);
>
> And:
>
> eventfd_ctx_read(ctx, 1, &irqfd->cnt);
->
> remove_wait_queue(irqfd->wqh, &irqfd->wait);
>
>
>
>
> - Davide
Yes, this is exactly what I wanted to do. So, here's the issue: if an
event is signalled at point ->: after eventfd_ctx_read but before
remove_wait_queue, then we inject interrupt but counter will be left
non-zero and then when we unmask, we inject antoher, spurious interrupt.
This is why I wanted to have eventfd_ctx_read not take wait queue head
lock: then I could do:
spin_lock_irqsave(&ctx->wqh.lock, flags);
eventfd_ctx_read(ctx, 1, &irqfd->cnt);
__remove_wait_queue(irqfd->wqh, &irqfd->wait);
spin_lock_irqrestore(&ctx->wqh.lock, flags);
--
MST
next prev parent reply other threads:[~2010-01-10 10:33 UTC|newest]
Thread overview: 79+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-08-20 15:56 [PATCH 0/2] eventfd: new EFD_STATE flag Michael S. Tsirkin
2009-08-20 16:20 ` Davide Libenzi
2009-08-20 17:38 ` Avi Kivity
2009-08-20 17:44 ` Davide Libenzi
2009-08-20 17:56 ` Paolo Bonzini
2009-08-21 17:21 ` Davide Libenzi
2009-08-20 17:55 ` Michael S. Tsirkin
2009-08-20 18:06 ` Avi Kivity
2009-08-20 18:28 ` Michael S. Tsirkin
2009-08-23 13:01 ` Avi Kivity
2009-08-23 13:36 ` Michael S. Tsirkin
2009-08-23 13:40 ` Avi Kivity
2009-08-23 14:30 ` Michael S. Tsirkin
2009-08-23 16:51 ` Paolo Bonzini
2009-08-24 18:25 ` Davide Libenzi
2009-08-24 18:31 ` Avi Kivity
2009-08-24 22:08 ` Davide Libenzi
2009-08-24 22:10 ` Paolo Bonzini
2009-08-24 22:32 ` Davide Libenzi
2009-08-25 6:59 ` Paolo Bonzini
2009-08-25 4:26 ` Avi Kivity
2009-08-24 21:49 ` Michael S. Tsirkin
2009-08-24 22:15 ` Davide Libenzi
2009-08-25 7:22 ` Michael S. Tsirkin
2009-08-25 21:57 ` Davide Libenzi
2009-08-26 10:29 ` Michael S. Tsirkin
2009-08-26 10:41 ` Avi Kivity
2009-08-26 17:45 ` Davide Libenzi
2009-08-26 18:58 ` Avi Kivity
2009-08-26 19:13 ` Davide Libenzi
2009-08-26 19:42 ` Avi Kivity
2009-08-26 19:44 ` Davide Libenzi
2009-08-26 23:30 ` Davide Libenzi
2009-08-27 4:13 ` Avi Kivity
2009-08-27 8:06 ` Michael S. Tsirkin
2009-08-27 14:20 ` Davide Libenzi
2009-08-26 19:50 ` Gleb Natapov
2009-08-26 20:04 ` Davide Libenzi
2009-08-27 5:25 ` Gleb Natapov
2009-08-27 9:05 ` Paolo Bonzini
2009-08-27 9:09 ` Michael S. Tsirkin
2009-08-27 14:21 ` Davide Libenzi
2009-08-27 14:30 ` Michael S. Tsirkin
2009-08-27 14:38 ` Davide Libenzi
2009-08-27 14:49 ` Michael S. Tsirkin
2009-08-27 15:29 ` Davide Libenzi
2009-08-27 17:09 ` Davide Libenzi
[not found] ` <alpine.DEB.2.00.0908311644410.17349@makko.or.mcafeemobile.com>
[not found] ` <4A9CB318.7030401@redhat.com>
[not found] ` <alpine.DEB.2.00.0909010723380.28172@makko.or.mcafeemobile.com>
2010-01-06 19:33 ` Michael S. Tsirkin
2010-01-06 20:43 ` Davide Libenzi
2010-01-06 20:55 ` Michael S. Tsirkin
2010-01-06 21:17 ` Davide Libenzi
2010-01-06 22:29 ` Michael S. Tsirkin
2010-01-06 22:46 ` Davide Libenzi
2010-01-06 23:45 ` Michael S. Tsirkin
2010-01-06 23:59 ` Davide Libenzi
2010-01-07 0:02 ` Michael S. Tsirkin
2010-01-07 6:45 ` Michael S. Tsirkin
2010-01-07 7:25 ` Davide Libenzi
2010-01-07 10:36 ` Michael S. Tsirkin
2010-01-07 23:37 ` Davide Libenzi
2010-01-08 0:13 ` Davide Libenzi
2010-01-08 0:26 ` Davide Libenzi
2010-01-10 10:30 ` Michael S. Tsirkin [this message]
2010-01-10 15:26 ` Davide Libenzi
2010-01-10 16:22 ` Michael S. Tsirkin
2010-01-10 17:27 ` Davide Libenzi
2010-01-10 17:35 ` Michael S. Tsirkin
2010-01-10 19:04 ` Davide Libenzi
2010-01-11 7:34 ` Michael S. Tsirkin
2010-01-11 19:14 ` Davide Libenzi
2010-01-11 19:19 ` Michael S. Tsirkin
2010-01-11 22:53 ` Davide Libenzi
2010-01-13 17:07 ` Michael S. Tsirkin
2010-01-11 9:01 ` Gleb Natapov
2010-01-11 9:02 ` Michael S. Tsirkin
2010-01-11 9:08 ` Gleb Natapov
2010-01-11 9:19 ` Michael S. Tsirkin
2010-01-11 9:36 ` Gleb Natapov
2010-01-11 9:41 ` Michael S. Tsirkin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100110103007.GA20538@redhat.com \
--to=mst@redhat.com \
--cc=avi@redhat.com \
--cc=davidel@xmailserver.org \
--cc=kvm@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).