All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: avi@redhat.com, gleb@redhat.com, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v9 0/2] kvm: level irqfd support
Date: Wed, 22 Aug 2012 11:25:35 +0300	[thread overview]
Message-ID: <20120822082535.GA10680@redhat.com> (raw)
In-Reply-To: <1345598895.29292.115.camel@ul30vt.home>

On Tue, Aug 21, 2012 at 07:28:15PM -0600, Alex Williamson wrote:
> On Wed, 2012-08-22 at 03:31 +0300, Michael S. Tsirkin wrote:
> > On Tue, Aug 21, 2012 at 01:28:57PM -0600, Alex Williamson wrote:
> > > Here's the much anticipated re-write of support for level irqfds.  As
> > > Michael suggested, I've rolled the eoi/ack notification fd into
> > > KVM_IRQFD as a new mode.  For lack of a better name, as there seems to
> > > be objections to associating this specifically with an EOI or an ACK,
> > > I've name this OADN or "On Ack, De-assert & Notify".
> > > 
> > > Patch 1of2 switches current KVM_IRQFDs to use their own IRQ source ID
> > > since we're potentially stepping on KVM_USERSPACE_IRQ_SOURCE_ID.
> > > Unfurtunately I was not able to make 2of2 use a single IRQ source ID,
> > > the reason is it's racy.  Objects to track OADNs are made dynamically,
> > > we look through existing ones for a match under spinlock and setup a
> > > new one if there's no match.  On teardown, we can remove the OADN from
> > > the list under lock, but that same lock prevents us from de-assigning
> > > the IRQ ACK notifier or waiting for an RCU grace period.  We must make
> > > sure that any unused GSI is de-asserted, but the above means it's
> > > possible that another OADN has been created for this source ID/GSI
> > > and de-asserting the GSI could lead to breakage.
> > 
> > I do not see it. What breakage? Could you give an example please?
> > 
> > 
> > I think what you are saying is last deassign must clear
> > since otherwise we never will clear.
> > I agree it is either that or delay deassign until ack.
> > 
> > Can it be as simple as this (after all rcu etc dances)?
> > 	lock irqfds
> > 	if no oadns
> > 		set level to 0
> > 	unlock irqfds
> > ?
> 
> lock irqfds
> remove irqfd from oadn list
> if no oadns
>     remove oadn
>     set gsi 0
> unlock
>                                     lock irqfds
>                                     new oadn
>                                     unlock irqfds
> 
> >> EOI 
>                                                           ack notify new oadn
>                                                           de-assert gsi
>                                                           notify new oadn
> >> re-assert irqfd
>                                                           ack notify old oadn
>                                                           de-assert gsi
>                                                           notify old oadn
> 
> synchronize_rcu
> 
> kvm_unregister_irq_ack_notifier
>
> So, because the unregister is removed from the final clear and because
> we share an IRQ source ID there's a window where we can have two oadns
> registered for the same GSI.  The new one will de-assert and notify
> while the old one has an empty list to notify, but still de-asserts.  We
> can therefore de-assert w/o notify.
> 
> By using a new source ID, we separate the two so users of the new oadn
> can't race the old and we can cleanly free the old source ID,
> de-asserting it.

Need to think about it some more but is the problem two
ack notifiers for the same gsi?

In that case, how about we add __kvm_unregister_irq_ack_notifier
with no locking, and do most of the above under
kvm->irq_lock?

With one change: it is better not to call synchronize_rcu
under irq lock, I think we can safely move it to after
__kvm_unregister_irq_ack_notifier.


> > >  Instead each OADN
> > > object gets it's own source ID, but these are all shared by users
> > > of the same GSI.  So for PCI devices, we might have up to 4 IRQ
> > > source IDs allocated.
> > > 
> > > Michael had also suggested avoiding reference counting and using
> > > list_empty for this OADN object.  Unfortunately, that doesn't work
> > > for similar reasons.  We want to release the OADN object underlock,
> > > preventing others from re-using it on the free path, but in order
> > > to have lock-less de-assert & notify we use RCU, meaning we can't
> > > trust list_empty until after an RCU grace period, which must be
> > > done outside of spinlocks.
> > 
> > confused. list empty on assign/deassing would be under lock
> > so no need for grace periods to trust it.
> > what am I missing?
> > 
> > But if you like kref more that is OK too.
> 
> Maybe I'm misinterpreting this:
> 
> include/linux/rculist.h:
> /**
>  * list_del_rcu - deletes entry from list without re-initialization
>  * @entry: the element to delete from the list.
>  *
>  * Note: list_empty() on entry does not return true after this,
>  * the entry is in an undefined state. It is useful for RCU based
>  * lockfree traversal.
> 
> If I can trust list_empty on oadn->irqfds, which maybe I can re-reading
> it again, then we can drop the kref.  Thanks,
> 
> Alex

I think you are - *the entry you deleted* is not empty.
The list itself naturally is, or so it seems from code.

static inline void list_del_rcu(struct list_head *entry)
{
        __list_del_entry(entry);
        entry->prev = LIST_POISON2;
}

No?

> 
> > > If there are suggestions how we can handle these better, please
> > > make them, but I think this compromise is race-free and still
> > > manages to make allocation of IRQ source IDs mostly a non-issue
> > > for device assignment limits.  Thanks,
> > > 
> > > Alex
> > > 
> > > ---
> > > 
> > > Alex Williamson (2):
> > >       kvm: On Ack, De-assert & Notify KVM_IRQFD extension
> > >       kvm: Use a reserved IRQ source ID for irqfd
> > > 
> > > 
> > >  Documentation/virtual/kvm/api.txt |   13 ++
> > >  arch/x86/kvm/x86.c                |    4 +
> > >  include/linux/kvm.h               |    7 +
> > >  include/linux/kvm_host.h          |    2 
> > >  virt/kvm/eventfd.c                |  199 ++++++++++++++++++++++++++++++++++++-
> > >  5 files changed, 218 insertions(+), 7 deletions(-)
> 
> 

  reply	other threads:[~2012-08-22  8:25 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-08-21 19:28 [PATCH v9 0/2] kvm: level irqfd support Alex Williamson
2012-08-21 19:29 ` [PATCH v9 1/2] kvm: Use a reserved IRQ source ID for irqfd Alex Williamson
2012-08-21 19:58   ` Michael S. Tsirkin
2012-08-21 20:06     ` Alex Williamson
2012-08-21 20:41       ` Michael S. Tsirkin
2012-08-21 21:14         ` Alex Williamson
2012-08-22  0:41           ` Michael S. Tsirkin
2012-08-22  1:34             ` Alex Williamson
2012-09-05 14:35             ` Avi Kivity
2012-09-05 14:51               ` Michael S. Tsirkin
2012-09-05 14:59                 ` Avi Kivity
2012-09-05 15:13                   ` Michael S. Tsirkin
2012-09-05 15:22                     ` Avi Kivity
2012-09-05 15:28                       ` Michael S. Tsirkin
2012-09-05 15:35                         ` Avi Kivity
2012-09-05 15:09                 ` Michael S. Tsirkin
2012-09-05 15:12                   ` Avi Kivity
2012-09-05 15:15                     ` Michael S. Tsirkin
2012-09-05 14:46   ` Avi Kivity
2012-09-05 15:07     ` Michael S. Tsirkin
2012-09-05 15:15       ` Avi Kivity
2012-08-21 19:29 ` [PATCH v9 2/2] kvm: On Ack, De-assert & Notify KVM_IRQFD extension Alex Williamson
2012-08-21 20:37   ` Michael S. Tsirkin
2012-08-21 21:11     ` Alex Williamson
2012-08-22  0:14       ` Michael S. Tsirkin
2012-08-22  1:48         ` Alex Williamson
2012-09-05 14:57   ` Avi Kivity
2012-09-17 16:13     ` Alex Williamson
2012-08-22  0:31 ` [PATCH v9 0/2] kvm: level irqfd support Michael S. Tsirkin
2012-08-22  1:28   ` Alex Williamson
2012-08-22  8:25     ` Michael S. Tsirkin [this message]
2012-09-17 18:08       ` Alex Williamson
2012-08-22 10:10 ` Michael S. Tsirkin
2012-09-05 14:42 ` Avi Kivity
2012-09-05 14:52   ` Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120822082535.GA10680@redhat.com \
    --to=mst@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=avi@redhat.com \
    --cc=gleb@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.