public inbox for linux-kernel@vger.kernel.org
From: Gleb Natapov <gleb@redhat.com>
To: Gregory Haskins <gregory.haskins@gmail.com>
Cc: Gregory Haskins <ghaskins@novell.com>,
	kvm@vger.kernel.org,
	"alacrityvm-devel@lists.sourceforge.net" 
	<alacrityvm-devel@lists.sourceforge.net>,
	linux-kernel@vger.kernel.org, paulmck@linux.vnet.ibm.com
Subject: Re: [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic
Date: Tue, 27 Oct 2009 17:30:07 +0200
Message-ID: <20091027153007.GP29477@redhat.com>
In-Reply-To: <4AE70815.7030307@gmail.com>

On Tue, Oct 27, 2009 at 10:47:49AM -0400, Gregory Haskins wrote:
> Gleb Natapov wrote:
> > On Tue, Oct 27, 2009 at 09:39:03AM -0400, Gregory Haskins wrote:
> >> Gleb Natapov wrote:
> >>> On Mon, Oct 26, 2009 at 12:21:57PM -0400, Gregory Haskins wrote:
> >>>> The current code suffers from the following race condition:
> >>>>
> >>>> thread-1                                    thread-2
> >>>> -----------------------------------------------------------
> >>>>
> >>>> kvm_set_irq() {
> >>>>    rcu_read_lock()
> >>>>    irq_rt = rcu_dereference(table);
> >>>>    rcu_read_unlock();
> >>>>
> >>>>                                        kvm_set_irq_routing() {
> >>>>                                           mutex_lock();
> >>>>                                           irq_rt = table;
> >>>>                                           rcu_assign_pointer();
> >>>>                                           mutex_unlock();
> >>>>                                           synchronize_rcu();
> >>>>
> >>>>                                           kfree(irq_rt);
> >>>>
> >>>>    irq_rt->entry->set(); /* bad */
> >>>>
> >>> This is not what happens. irq_rt is never accessed outside read-side
> >>> critical section.
> >> Sorry, I was generalizing to keep the comments short.  I figured it
> >> would be clear what I was actually saying, but realize in retrospect
> >> that I was a little ambiguous.
> >>
> > A little is an understatement :) There is no /* bad */ line in the code!
> 
> Sorry, that was my own highlighting, not trying to reflect actual code.
> 
> > 
> >> Yes, irq_rt is not accessed outside the RSCS.  However, the entry
> >> pointers stored in the irq_rt->map are, and this is equally problematic
> >> afaict.
> > The pointer is in text and can't disappear without kvm_set_irq()
> > disappearing too.
> 
> No, the entry* pointer is .text _AND_ .data, and is subject to standard
> synchronization rules like most other objects.
> 
> Unless I am misreading the code, the entry* pointers point to heap
> within the irq_rt pointer.  Therefore, the "kfree(irq_rt)" I mention
> above effectively invalidates the entire set of entry* pointers that you
> are caching, and is thus an issue.
> 
I think you are missing that the content of the entry is copied, not
a pointer to the entry:
	irq_set[i++] = *e;
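
A minimal userspace sketch of that by-value copy, under stated assumptions: `struct entry`, `demo_set()` and `deliver_from_copy()` are hypothetical stand-ins, not the kernel's definitions, and `malloc()`/`free()` stand in for the routing table's allocation and its release after the grace period. It only shows that struct copies on the stack stay usable once the table they came from is freed.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_ENTRIES 4

/* Hypothetical stand-in for a routing entry; not the kernel's struct. */
struct entry {
	int gsi;
	int (*set)(const struct entry *e, int level);
};

/* Hypothetical callback: returns a value so the call can be observed. */
int demo_set(const struct entry *e, int level)
{
	return e->gsi * 10 + level;
}

int deliver_from_copy(void)
{
	int n = 2;
	struct entry *table = malloc(n * sizeof(*table));
	struct entry irq_set[MAX_ENTRIES];
	int i = 0, sum = 0;

	assert(table != NULL && n <= MAX_ENTRIES);
	table[0] = (struct entry){ .gsi = 1, .set = demo_set };
	table[1] = (struct entry){ .gsi = 2, .set = demo_set };

	/* Stand-in for the read-side section: copy contents, not pointers. */
	for (int k = 0; k < n; k++)
		irq_set[i++] = table[k];

	/* Stand-in for the updater freeing the old table afterwards. */
	free(table);

	/* Only the stack copies are touched from here on. */
	while (i--)
		sum += irq_set[i].set(&irq_set[i], 1);
	return sum;
}
```

The copies carry the `set` function pointer, which points into text rather than into the freed table, so nothing here dereferences freed heap memory.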
> > 
> >> In this particular case we seem to never delete entries at run-time once
> >> they are established.  Therefore, while perhaps sloppy, it's technically
> >> safe to leave them unprotected from this perspective.
> 
> Note: I was wrong in this statement.  I forgot that it's not safe at
> run-time either since the entry objects are part of irq_rt.
> 
> >> The issue is more
> >> related to shutdown since a kvm_set_irq() caller could be within the
> >> aforementioned race-region and call entry->set() after the guest is
> >> gone.  Or did I miss something?
> >>
> > The caller of kvm_set_irq() should hold a reference to the kvm instance,
> > so it can't disappear while you are inside kvm_set_irq(). RCU protects
> > only kvm->irq_routing, not the kvm structure itself.
> 
> Agreed, but this has nothing to do with protecting the entry* pointers.
> 
They are not used outside the critical section.

> > 
> >>> Data is copied from irq_rt onto the stack and this copy is accessed
> >>> outside critical section.
> >> As mentioned above, I do not believe this really protects us.  And even
> > I don't see the proof that it doesn't, so I assume it does.
> 
> What would you like to see beyond what I've already provided you?  I can
> show how the entry pointers are allocated as part of the irq_rt, and I
> can show how the irq_rt (via entry->set) access is racy against
> invalidation.
> 
> > 
> >> if it did, the copy is just a work-around to avoid sleeping within the
> > It is not a work-around. There were two solutions to the problem: one is
> > to call ->set() outside rcu critical section
> 
> This is broken afaict without taking additional precautions, such as a
> reference count on the irq_rt structure, but I mentioned this alternate
> solution in my header.
> 
> > another is to use SRCU. I
> > decided to use the first one. This way the code is much simpler
> 
> "simpler" is debatable, but ok.  SRCU is an established pattern
> available in the upstream kernel, so I don't think it's particularly
> complicated or controversial to use.
> 
> > and I remember asking Paul what are the disadvantages of using SRCU and there
> > was something.
> > 
> 
> The disadvantages to my knowledge are as follows:
> 
> 1) rcu_read_lock is something like 4x faster than srcu_read_lock(), but
> we are talking about nanoseconds on modern hardware (I think Paul quoted
> me 10ns vs 45ns on his rig).  I don't think either overhead is something
> to be concerned about in this case.
> 
If we can avoid it, why not? Nanoseconds tend to add up.

> 2) standard rcu supports deferred synchronization (call_rcu()), as well
> as barriers (synchronize_rcu()), whereas SRCU only supports barriers
> (synchronize_srcu()).  We only use the barrier type in this code path,
> so that is not an issue.
Agreed.

> 
> 3) SRCU requires explicit initialization/cleanup, whereas regular RCU
> does not.  Trivially solved in my patch since KVM has plenty of
> init/cleanup hook points.
> 
No problem here either.

> >> standard RCU RSCS, which is what SRCU is designed for.  So rather than
> >> inventing an awkward two-phased stack based solution, it's better to
> >> reuse the provided tools, IMO.
> >>
> >> To flip it around:  Is there any reason why an SRCU would not work here,
> >> and thus we were forced to use something like the stack-copy approach?
> >>
> > If SRCU has no disadvantage compared to RCU, why not use it always? :)
> 
> No one is debating that SRCU has some disadvantages to RCU, but it
> should also be noted that RCU has disadvantages as well (for instance,
> you can't sleep within the RSCS except for preemptible-based configurations)
> 
> The difference between them is really not the issue.  The bottom line
> is that upstream KVM irq_routing code is broken afaict with the
> application of RCU alone.
> 
> IMO: It's not the tool for the job:  At least, not when used alone.  You
> either need RCU + reference count (which has more overhead than SRCU due
> to the atomic ops), or SRCU.  There may perhaps be other variations on
> this theme, as well, and I am not married to SRCU as the solution, per
> se.  But it is *a* solution that I believe works, and IMO it's the
> best/cleanest/simplest one at our disposal.
> 
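
For reference, the "RCU + reference count" alternative mentioned above can be sketched in userspace as follows. This is a single-threaded illustration of the lifetime rule only, under stated assumptions: `struct table`, `table_alloc()`, `table_get()`, `table_put()`, `tables_freed` and `reader_outlives_update()` are hypothetical names, C11 atomics stand in for the kernel's atomic ops, and no real RCU is involved. In the kernel, the get would have to happen inside the read-side critical section.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical refcounted routing table; not the kernel's layout. */
struct table {
	atomic_int refs;
	int nr_entries;
};

int tables_freed; /* counts frees so the lifetime is observable */

struct table *table_alloc(int nr_entries)
{
	struct table *t = malloc(sizeof(*t));

	assert(t != NULL);
	atomic_init(&t->refs, 1); /* reference held by the publisher */
	t->nr_entries = nr_entries;
	return t;
}

/* Reader side: pin the table before leaving the read-side section. */
struct table *table_get(struct table *t)
{
	atomic_fetch_add(&t->refs, 1);
	return t;
}

/* Drop one reference; whoever drops the last one frees the table. */
void table_put(struct table *t)
{
	if (atomic_fetch_sub(&t->refs, 1) == 1) {
		free(t);
		tables_freed++;
	}
}

/* A reader that is still using the table when the updater retires it. */
int reader_outlives_update(void)
{
	struct table *published = table_alloc(3);
	struct table *mine = table_get(published);
	int n;

	table_put(published);      /* updater drops the publisher reference */
	assert(tables_freed == 0); /* the reader's reference keeps it alive */

	n = mine->nr_entries;      /* still valid here */
	table_put(mine);           /* last reference: the table is freed */
	return n;
}
```

The two atomic operations per reader are the extra overhead referred to in the quoted paragraph.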



--
			Gleb.
