Re: [PATCH 0/2] eventfd: new EFD_STATE flag

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Michael S. Tsirkin" <mst@redhat.com>
To: Davide Libenzi <davidel@xmailserver.org>
Cc: Avi Kivity <avi@redhat.com>, kvm@vger.kernel.org
Subject: Re: [PATCH 0/2] eventfd: new EFD_STATE flag
Date: Sun, 10 Jan 2010 12:30:08 +0200	[thread overview]
Message-ID: <20100110103007.GA20538@redhat.com> (raw)
In-Reply-To: <alpine.DEB.2.00.1001071622320.8698@makko.or.mcafeemobile.com>

On Thu, Jan 07, 2010 at 04:26:24PM -0800, Davide Libenzi wrote:
> On Thu, 7 Jan 2010, Michael S. Tsirkin wrote:
> 
> > Sure, I was trying to be as brief as possible, here's a detailed summary.
> > 
> > Description of the system (MSI emulation in KVM):
> > 
> > KVM supports an ioctl to assign/deassign an eventfd file to interrupt message
> > in guest OS.  When this eventfd is signalled, interrupt message is sent.
> > This assignment is done from qemu system emulator.
> > 
> > eventfd is signalled from device emulation in another thread in
> > userspace or from kernel, which talks with guest OS through another
> > eventfd and shared memory (possibility of out of process was discussed
> > but never got implemented yet).
> > 
> > Note: it's okay to delay messages from correctness point of view, but
> > generally this is latency-sensitive path. If multiple identical messages
> > are requested, it's okay to send a single last message, but missing a
> > message altogether causes deadlocks.  Sending a message when none were
> > requested might in theory cause crashes, in practice doing this causes
> > performance degradation.
> > 
> > Another KVM feature is interrupt masking: guest OS requests that we
> > stop sending some interrupt message, possibly modified mapping
> > and re-enables this message. This needs to be done without
> > involving the device that might keep requesting events:
> > while masked, message is marked "pending", and guest might test
> > the pending status.
> > 
> > We can implement masking in system emulator in userspace, by using
> > assign/deassign ioctls: when message is masked, we simply deassign all
> > eventfd, and when it is unmasked, we assign them back.
> > 
> > Here's some code to illustrate how this all works: assign/deassign code
> > in kernel looks like the following:
> > 
> > 
> > this is called to unmask interrupt
> > 
> > static int
> > kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
> > {
> > 	struct _irqfd *irqfd, *tmp;
> > 	struct file *file = NULL;
> > 	struct eventfd_ctx *eventfd = NULL;
> > 	int ret;
> > 	unsigned int events;
> > 
> > 	irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
> > 
> > ...
> > 
> > 	file = eventfd_fget(fd);
> > 	if (IS_ERR(file)) {
> > 		ret = PTR_ERR(file);
> > 		goto fail;
> > 	}
> > 
> > 	eventfd = eventfd_ctx_fileget(file);
> > 	if (IS_ERR(eventfd)) {
> > 		ret = PTR_ERR(eventfd);
> > 		goto fail;
> > 	}
> > 
> > 	irqfd->eventfd = eventfd;
> > 
> > 	/*
> > 	 * Install our own custom wake-up handling so we are notified via
> > 	 * a callback whenever someone signals the underlying eventfd
> > 	 */
> > 	init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
> > 	init_poll_funcptr(&irqfd->pt, irqfd_ptable_queue_proc);
> > 
> > 	spin_lock_irq(&kvm->irqfds.lock);
> > 
> > 	events = file->f_op->poll(file, &irqfd->pt);
> > 
> > 	list_add_tail(&irqfd->list, &kvm->irqfds.items);
> > 	spin_unlock_irq(&kvm->irqfds.lock);
> > 
> > A.
> > 	/*
> > 	 * Check if there was an event already pending on the eventfd
> > 	 * before we registered, and trigger it as if we didn't miss it.
> > 	 */
> > 	if (events & POLLIN)
> > 		schedule_work(&irqfd->inject);
> > 
> > 	/*
> > 	 * do not drop the file until the irqfd is fully initialized, otherwise
> > 	 * we might race against the POLLHUP
> > 	 */
> > 	fput(file);
> > 
> > 	return 0;
> > 
> > fail:
> > 	...
> > }
> 
> What is you do (under proper irqfd locking) something like:
> 
> 	eventfd_ctx_read(ctx, 1, &cnt);
> 	if (irqfd->cnt != cnt) {
> 		irqfd->cnt = cnt;
> 		schedule_work(&irqfd->inject);
> 	}
> 
> 
> 
> 
> > And deactivation deep down does this (from irqfd_cleanup_wq workqueue,
> > so this is not under the spinlock):
> > 
> >         /*
> >          * Synchronize with the wait-queue and unhook ourselves to
> >          * prevent
> >          * further events.
> >          */
> > B.
> >         remove_wait_queue(irqfd->wqh, &irqfd->wait);
> > 
> > 	....
> > 
> >         /*
> >          * It is now safe to release the object's resources
> >          */
> >         eventfd_ctx_put(irqfd->eventfd);
> >         kfree(irqfd);
> 
> And:
> 
> 	eventfd_ctx_read(ctx, 1, &irqfd->cnt);


->

> 	remove_wait_queue(irqfd->wqh, &irqfd->wait);
> 
> 
> 
> 
> - Davide

Yes, this is exactly what I wanted to do.  So, here's the issue: if an
event is signalled at point ->: after eventfd_ctx_read but before
remove_wait_queue, then we inject interrupt but counter will be left
non-zero and then when we unmask, we inject antoher, spurious interrupt.

This is why I wanted to have eventfd_ctx_read not take wait queue head
lock: then I could do:

	spin_lock_irqsave(&ctx->wqh.lock, flags);
 	eventfd_ctx_read(ctx, 1, &irqfd->cnt);
 	__remove_wait_queue(irqfd->wqh, &irqfd->wait);
	spin_lock_irqrestore(&ctx->wqh.lock, flags);


-- 
MST

next prev parent reply	other threads:[~2010-01-10 10:33 UTC|newest]

Thread overview: 79+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-08-20 15:56 [PATCH 0/2] eventfd: new EFD_STATE flag Michael S. Tsirkin
2009-08-20 16:20 ` Davide Libenzi
2009-08-20 17:38   ` Avi Kivity
2009-08-20 17:44     ` Davide Libenzi
2009-08-20 17:56       ` Paolo Bonzini
2009-08-21 17:21         ` Davide Libenzi
2009-08-20 17:55     ` Michael S. Tsirkin
2009-08-20 18:06       ` Avi Kivity
2009-08-20 18:28         ` Michael S. Tsirkin
2009-08-23 13:01           ` Avi Kivity
2009-08-23 13:36             ` Michael S. Tsirkin
2009-08-23 13:40               ` Avi Kivity
2009-08-23 14:30                 ` Michael S. Tsirkin
2009-08-23 16:51                   ` Paolo Bonzini
2009-08-24 18:25                   ` Davide Libenzi
2009-08-24 18:31                     ` Avi Kivity
2009-08-24 22:08                       ` Davide Libenzi
2009-08-24 22:10                         ` Paolo Bonzini
2009-08-24 22:32                           ` Davide Libenzi
2009-08-25  6:59                             ` Paolo Bonzini
2009-08-25  4:26                         ` Avi Kivity
2009-08-24 21:49                     ` Michael S. Tsirkin
2009-08-24 22:15                       ` Davide Libenzi
2009-08-25  7:22                         ` Michael S. Tsirkin
2009-08-25 21:57                           ` Davide Libenzi
2009-08-26 10:29                             ` Michael S. Tsirkin
2009-08-26 10:41                               ` Avi Kivity
2009-08-26 17:45                               ` Davide Libenzi
2009-08-26 18:58                                 ` Avi Kivity
2009-08-26 19:13                                   ` Davide Libenzi
2009-08-26 19:42                                     ` Avi Kivity
2009-08-26 19:44                                       ` Davide Libenzi
2009-08-26 23:30                                         ` Davide Libenzi
2009-08-27  4:13                                           ` Avi Kivity
2009-08-27  8:06                                             ` Michael S. Tsirkin
2009-08-27 14:20                                               ` Davide Libenzi
2009-08-26 19:50                                       ` Gleb Natapov
2009-08-26 20:04                                         ` Davide Libenzi
2009-08-27  5:25                                           ` Gleb Natapov
2009-08-27  9:05                                     ` Paolo Bonzini
2009-08-27  9:09                                       ` Michael S. Tsirkin
2009-08-27 14:21                                       ` Davide Libenzi
2009-08-27 14:30                                         ` Michael S. Tsirkin
2009-08-27 14:38                                           ` Davide Libenzi
2009-08-27 14:49                                             ` Michael S. Tsirkin
2009-08-27 15:29                                               ` Davide Libenzi
2009-08-27 17:09                                                 ` Davide Libenzi
     [not found]                                                   ` <alpine.DEB.2.00.0908311644410.17349@makko.or.mcafeemobile.com>
     [not found]                                                     ` <4A9CB318.7030401@redhat.com>
     [not found]                                                       ` <alpine.DEB.2.00.0909010723380.28172@makko.or.mcafeemobile.com>
2010-01-06 19:33                                                         ` Michael S. Tsirkin
2010-01-06 20:43                                                           ` Davide Libenzi
2010-01-06 20:55                                                             ` Michael S. Tsirkin
2010-01-06 21:17                                                               ` Davide Libenzi
2010-01-06 22:29                                                                 ` Michael S. Tsirkin
2010-01-06 22:46                                                                   ` Davide Libenzi
2010-01-06 23:45                                                                     ` Michael S. Tsirkin
2010-01-06 23:59                                                                       ` Davide Libenzi
2010-01-07  0:02                                                                         ` Michael S. Tsirkin
2010-01-07  6:45                                                                         ` Michael S. Tsirkin
2010-01-07  7:25                                                                           ` Davide Libenzi
2010-01-07 10:36                                                                             ` Michael S. Tsirkin
2010-01-07 23:37                                                                               ` Davide Libenzi
2010-01-08  0:13                                                                                 ` Davide Libenzi
2010-01-08  0:26                                                                               ` Davide Libenzi
2010-01-10 10:30                                                                                 ` Michael S. Tsirkin [this message]
2010-01-10 15:26                                                                                   ` Davide Libenzi
2010-01-10 16:22                                                                                     ` Michael S. Tsirkin
2010-01-10 17:27                                                                                       ` Davide Libenzi
2010-01-10 17:35                                                                                         ` Michael S. Tsirkin
2010-01-10 19:04                                                                                           ` Davide Libenzi
2010-01-11  7:34                                                                                             ` Michael S. Tsirkin
2010-01-11 19:14                                                                                               ` Davide Libenzi
2010-01-11 19:19                                                                                                 ` Michael S. Tsirkin
2010-01-11 22:53                                                                                                   ` Davide Libenzi
2010-01-13 17:07                                                                                                     ` Michael S. Tsirkin
2010-01-11  9:01                                                                               ` Gleb Natapov
2010-01-11  9:02                                                                                 ` Michael S. Tsirkin
2010-01-11  9:08                                                                                   ` Gleb Natapov
2010-01-11  9:19                                                                                     ` Michael S. Tsirkin
2010-01-11  9:36                                                                                       ` Gleb Natapov
2010-01-11  9:41                                                                                         ` Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100110103007.GA20538@redhat.com \
    --to=mst@redhat.com \
    --cc=avi@redhat.com \
    --cc=davidel@xmailserver.org \
    --cc=kvm@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.