Re: [PATCH 0/2] eventfd: new EFD_STATE flag

From: "Michael S. Tsirkin" <mst@redhat.com>
To: Davide Libenzi <davidel@xmailserver.org>
Cc: Avi Kivity <avi@redhat.com>, kvm@vger.kernel.org
Subject: Re: [PATCH 0/2] eventfd: new EFD_STATE flag
Date: Sun, 10 Jan 2010 12:30:08 +0200
Message-ID: <20100110103007.GA20538@redhat.com>
In-Reply-To: <alpine.DEB.2.00.1001071622320.8698@makko.or.mcafeemobile.com>

On Thu, Jan 07, 2010 at 04:26:24PM -0800, Davide Libenzi wrote:
> On Thu, 7 Jan 2010, Michael S. Tsirkin wrote:
> 
> > Sure, I was trying to be as brief as possible, here's a detailed summary.
> > 
> > Description of the system (MSI emulation in KVM):
> > 
> > KVM supports an ioctl to assign/deassign an eventfd file to interrupt message
> > in guest OS.  When this eventfd is signalled, interrupt message is sent.
> > This assignment is done from qemu system emulator.
> > 
> > eventfd is signalled from device emulation in another thread in
> > userspace or from kernel, which talks with guest OS through another
> > eventfd and shared memory (possibility of out of process was discussed
> > but never got implemented yet).
> > 
> > Note: it's okay to delay messages from correctness point of view, but
> > generally this is latency-sensitive path. If multiple identical messages
> > are requested, it's okay to send a single last message, but missing a
> > message altogether causes deadlocks.  Sending a message when none were
> > requested might in theory cause crashes, in practice doing this causes
> > performance degradation.
> > 
> > Another KVM feature is interrupt masking: guest OS requests that we
> > stop sending some interrupt message, possibly modified mapping
> > and re-enables this message. This needs to be done without
> > involving the device that might keep requesting events:
> > while masked, message is marked "pending", and guest might test
> > the pending status.
> > 
> > We can implement masking in system emulator in userspace, by using
> > assign/deassign ioctls: when message is masked, we simply deassign all
> > eventfd, and when it is unmasked, we assign them back.
> > 
> > Here's some code to illustrate how this all works: assign/deassign code
> > in kernel looks like the following:
> > 
> > 
> > this is called to unmask interrupt
> > 
> > static int
> > kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
> > {
> > 	struct _irqfd *irqfd, *tmp;
> > 	struct file *file = NULL;
> > 	struct eventfd_ctx *eventfd = NULL;
> > 	int ret;
> > 	unsigned int events;
> > 
> > 	irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
> > 
> > ...
> > 
> > 	file = eventfd_fget(fd);
> > 	if (IS_ERR(file)) {
> > 		ret = PTR_ERR(file);
> > 		goto fail;
> > 	}
> > 
> > 	eventfd = eventfd_ctx_fileget(file);
> > 	if (IS_ERR(eventfd)) {
> > 		ret = PTR_ERR(eventfd);
> > 		goto fail;
> > 	}
> > 
> > 	irqfd->eventfd = eventfd;
> > 
> > 	/*
> > 	 * Install our own custom wake-up handling so we are notified via
> > 	 * a callback whenever someone signals the underlying eventfd
> > 	 */
> > 	init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
> > 	init_poll_funcptr(&irqfd->pt, irqfd_ptable_queue_proc);
> > 
> > 	spin_lock_irq(&kvm->irqfds.lock);
> > 
> > 	events = file->f_op->poll(file, &irqfd->pt);
> > 
> > 	list_add_tail(&irqfd->list, &kvm->irqfds.items);
> > 	spin_unlock_irq(&kvm->irqfds.lock);
> > 
> > A.
> > 	/*
> > 	 * Check if there was an event already pending on the eventfd
> > 	 * before we registered, and trigger it as if we didn't miss it.
> > 	 */
> > 	if (events & POLLIN)
> > 		schedule_work(&irqfd->inject);
> > 
> > 	/*
> > 	 * do not drop the file until the irqfd is fully initialized, otherwise
> > 	 * we might race against the POLLHUP
> > 	 */
> > 	fput(file);
> > 
> > 	return 0;
> > 
> > fail:
> > 	...
> > }
> 
> What if you do (under proper irqfd locking) something like:
> 
> 	eventfd_ctx_read(ctx, 1, &cnt);
> 	if (irqfd->cnt != cnt) {
> 		irqfd->cnt = cnt;
> 		schedule_work(&irqfd->inject);
> 	}
> 
> 
> 
> 
> > And deactivation deep down does this (from irqfd_cleanup_wq workqueue,
> > so this is not under the spinlock):
> > 
> >         /*
> >          * Synchronize with the wait-queue and unhook ourselves to
> >          * prevent further events.
> >          */
> > B.
> >         remove_wait_queue(irqfd->wqh, &irqfd->wait);
> > 
> > 	....
> > 
> >         /*
> >          * It is now safe to release the object's resources
> >          */
> >         eventfd_ctx_put(irqfd->eventfd);
> >         kfree(irqfd);
> 
> And:
> 
> 	eventfd_ctx_read(ctx, 1, &irqfd->cnt);


->

> 	remove_wait_queue(irqfd->wqh, &irqfd->wait);
> 
> 
> 
> 
> - Davide

Yes, this is exactly what I wanted to do.  So, here's the issue: if an
event is signalled at the point marked -> (after eventfd_ctx_read but
before remove_wait_queue), then we inject an interrupt but the counter is
left non-zero, and then when we unmask, we inject another, spurious
interrupt.

This is why I wanted to have eventfd_ctx_read not take the wait queue
head lock: then I could do:

	spin_lock_irqsave(&ctx->wqh.lock, flags);
	eventfd_ctx_read(ctx, 1, &irqfd->cnt);
	__remove_wait_queue(irqfd->wqh, &irqfd->wait);
	spin_unlock_irqrestore(&ctx->wqh.lock, flags);
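
To make the two orderings concrete, here is a rough single-threaded
userspace model, not kernel code.  Everything named model_* is
hypothetical, and it assumes that eventfd_ctx_read resets the counter to
zero and that unmask injects whenever the counter is non-zero (the check
at A).  The racy variant replays a signal landing at ->; the atomic
variant treats read + unhook as one step, so a signal can only land
before or after it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one eventfd wired to one irqfd. */
struct model_irqfd {
	uint64_t count;		/* eventfd counter */
	uint64_t saved;		/* value captured at mask time (irqfd->cnt) */
	bool hooked;		/* is the wait-queue entry installed? */
	unsigned injected;	/* interrupts delivered to the guest */
};

/* Device signals the eventfd; the wakeup callback runs only while hooked. */
static void model_signal(struct model_irqfd *m)
{
	m->count++;
	if (m->hooked)
		m->injected++;
}

/* Stands in for eventfd_ctx_read(): capture the counter, then reset it. */
static void model_read(struct model_irqfd *m)
{
	m->saved = m->count;
	m->count = 0;
}

/* Stands in for remove_wait_queue(). */
static void model_unhook(struct model_irqfd *m)
{
	assert(m->hooked);
	m->hooked = false;
}

/* Unmask: hook again, then inject if an event is already pending. */
static void model_unmask(struct model_irqfd *m)
{
	m->hooked = true;
	if (m->count)
		m->injected++;
}

/* Unlocked mask: one signal lands between the read and the unhook. */
unsigned model_racy_mask(void)
{
	struct model_irqfd m = { .hooked = true };

	model_read(&m);
	model_signal(&m);	/* arrives at "->" */
	model_unhook(&m);
	model_unmask(&m);
	return m.injected;
}

/* Locked mask: the signal is ordered entirely before or after read + unhook. */
unsigned model_atomic_mask(bool signal_first)
{
	struct model_irqfd m = { .hooked = true };

	if (signal_first)
		model_signal(&m);
	model_read(&m);
	model_unhook(&m);
	if (!signal_first)
		model_signal(&m);
	model_unmask(&m);
	return m.injected;
}
```

In this model, model_racy_mask() counts two injections for a single
signal, while model_atomic_mask() counts one for either ordering.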


-- 
MST
