All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Gregory Haskins <gregory.haskins@gmail.com>
Cc: Gregory Haskins <ghaskins@novell.com>,
	kvm@vger.kernel.org, alacrityvm-devel@lists.sourceforge.net,
	linux-kernel@vger.kernel.org
Subject: Re: [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic
Date: Tue, 27 Oct 2009 10:01:21 -0700	[thread overview]
Message-ID: <20091027170121.GE6258@linux.vnet.ibm.com> (raw)
In-Reply-To: <4AE6F6F1.4090308@gmail.com>

On Tue, Oct 27, 2009 at 09:34:41AM -0400, Gregory Haskins wrote:
> Hi Paul,
> 
> Paul E. McKenney wrote:
> > On Mon, Oct 26, 2009 at 12:21:57PM -0400, Gregory Haskins wrote:
> >> The current code suffers from the following race condition:
> >>
> >> thread-1                                    thread-2
> >> -----------------------------------------------------------
> >>
> >> kvm_set_irq() {
> >>    rcu_read_lock()
> >>    irq_rt = rcu_dereference(table);
> >>    rcu_read_unlock();
> >>
> >>                                        kvm_set_irq_routing() {
> >>                                           mutex_lock();
> >>                                           irq_rt = table;
> >>                                           rcu_assign_pointer();
> >>                                           mutex_unlock();
> >>                                           synchronize_rcu();
> >>
> >>                                           kfree(irq_rt);
> >>
> >>    irq_rt->entry->set(); /* bad */
> >>
> >> -------------------------------------------------------------
> >>
> >> Because the pointer is accessed outside of the read-side critical
> >> section.  There are two basic patterns we can use to fix this bug:
> >>
> >> 1) Switch to sleeping-rcu and encompass the ->set() access within the
> >>    read-side critical section,
> >>
> >>    OR
> >>
> >> 2) Add reference counting to the irq_rt structure, and simply acquire
> >>    the reference from within the RSCS.
> >>
> >> This patch implements solution (1).
> > 
> > Looks like a good transformation!  A few questions interspersed below.
> 
> Thanks for the review.  I would have CC'd you but I figured I pestered
> you enough with my RCU reviews in the past, and didn't want to annoy you ;)
> 
> I will be sure to CC you in the future, unless you ask otherwise.

No problem either way.  ;-)

							Thanx, Paul

> >> Signed-off-by: Gregory Haskins <ghaskins@novell.com>
> >> ---
> >>
> >>  include/linux/kvm_host.h |    6 +++++-
> >>  virt/kvm/irq_comm.c      |   50 +++++++++++++++++++++++++++-------------------
> >>  virt/kvm/kvm_main.c      |    1 +
> >>  3 files changed, 35 insertions(+), 22 deletions(-)
> >>
> >> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> >> index bd5a616..1fe135d 100644
> >> --- a/include/linux/kvm_host.h
> >> +++ b/include/linux/kvm_host.h
> >> @@ -185,7 +185,10 @@ struct kvm {
> >>
> >>  	struct mutex irq_lock;
> >>  #ifdef CONFIG_HAVE_KVM_IRQCHIP
> >> -	struct kvm_irq_routing_table *irq_routing;
> >> +	struct {
> >> +		struct srcu_struct            srcu;
> > 
> > Each structure has its own SRCU domain.  This is OK, but just asking
> > if that is the intent.  It does look like the SRCU primitives are
> > passed a pointer to the correct structure, and that the return value
> > from srcu_read_lock() gets passed into the matching srcu_read_unlock()
> > like it needs to be, so that is good.
> 
> Yeah, it was intentional.  Technically the table is per-guest, and thus
> the locking is too, which is the desired/intentional granularity.
> 
> On that note, I tried to denote that kvm->irq_routing.srcu and
> kvm->irq_routing.table were related to one another, but then went ahead
> and modified the hunks that touched kvm->irq_ack_notifier_list, too.  In
> retrospect, this was probably a mistake.  I should leave the rcu usage
> outside of ->irq_routing.table alone.
> 
> > 
> >> +		struct kvm_irq_routing_table *table;
> >> +	} irq_routing;
> >>  	struct hlist_head mask_notifier_list;
> >>  	struct hlist_head irq_ack_notifier_list;
> >>  #endif
> > 
> > [ . . . ]
> > 
> >> @@ -155,21 +156,19 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level)
> >>  	 * IOAPIC.  So set the bit in both. The guest will ignore
> >>  	 * writes to the unused one.
> >>  	 */
> >> -	rcu_read_lock();
> >> -	irq_rt = rcu_dereference(kvm->irq_routing);
> >> +	idx = srcu_read_lock(&kvm->irq_routing.srcu);
> >> +	irq_rt = rcu_dereference(kvm->irq_routing.table);
> >>  	if (irq < irq_rt->nr_rt_entries)
> >> -		hlist_for_each_entry(e, n, &irq_rt->map[irq], link)
> >> -			irq_set[i++] = *e;
> >> -	rcu_read_unlock();
> >> +		hlist_for_each_entry(e, n, &irq_rt->map[irq], link) {
> > 
> > What prevents the above list from changing while we are traversing it?
> > (Yes, presumably whatever was preventing it from changing before this
> > patch, but what?)
> > 
> > Mostly kvm->lock is held, but not always.  And if kvm->lock were held
> > all the time, there would be no point in using SRCU.  ;-)
> 
> This is protected by kvm->irq_lock within kvm_set_irq_routing().
> Entries are added to a copy of the list, and the top-level table pointer
> is swapped (via rcu_assign_pointer(), as it should be) while holding the
> lock.  Finally, we synchronize with the RSCS before deleting the old
> copy.  It looks to me like the original author got this part right, so I
> didn't modify it outside of converting to SRCU.
> 
> > 
> >> +			int r;
> >>
> >> -	while(i--) {
> >> -		int r;
> >> -		r = irq_set[i].set(&irq_set[i], kvm, irq_source_id, level);
> >> -		if (r < 0)
> >> -			continue;
> >> +			r = e->set(e, kvm, irq_source_id, level);
> >> +			if (r < 0)
> >> +				continue;
> >>
> >> -		ret = r + ((ret < 0) ? 0 : ret);
> >> -	}
> >> +			ret = r + ((ret < 0) ? 0 : ret);
> >> +		}
> >> +	srcu_read_unlock(&kvm->irq_routing.srcu, idx);
> >>
> >>  	return ret;
> >>  }
> >> @@ -179,17 +178,18 @@ void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin)
> >>  	struct kvm_irq_ack_notifier *kian;
> >>  	struct hlist_node *n;
> >>  	int gsi;
> >> +	int idx;
> >>
> >>  	trace_kvm_ack_irq(irqchip, pin);
> >>
> >> -	rcu_read_lock();
> >> -	gsi = rcu_dereference(kvm->irq_routing)->chip[irqchip][pin];
> >> +	idx = srcu_read_lock(&kvm->irq_routing.srcu);
> >> +	gsi = rcu_dereference(kvm->irq_routing.table)->chip[irqchip][pin];
> >>  	if (gsi != -1)
> >>  		hlist_for_each_entry_rcu(kian, n, &kvm->irq_ack_notifier_list,
> >>  					 link)
> > 
> > And same question here -- what keeps the above list from changing while
> > we are traversing it?
> 
> This is also protected via the kvm->irq_lock in
> kvm_register_irq_ack_notifier().  Though as mentioned above, I should
> probably drop the non irq_routing.table hunks, so this will go away.
> But I think its correct either way.
> 
> Thanks Paul,
> -Greg
> 



  reply	other threads:[~2009-10-27 17:01 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-10-26 16:21 [KVM PATCH v3 0/3] irqfd enhancements, and irq_routing fixes Gregory Haskins
2009-10-26 16:21 ` [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic Gregory Haskins
2009-10-27  3:36   ` Paul E. McKenney
2009-10-27 13:34     ` Gregory Haskins
2009-10-27 17:01       ` Paul E. McKenney [this message]
2009-10-27  6:45   ` Gleb Natapov
2009-10-27 13:39     ` Gregory Haskins
2009-10-27 14:00       ` Gregory Haskins
2009-10-27 14:05         ` Gleb Natapov
2009-10-27 14:50           ` Gregory Haskins
2009-10-27 15:04             ` Gleb Natapov
2009-10-27 15:42               ` Gregory Haskins
2009-10-27 14:02       ` Gleb Natapov
2009-10-27 14:47         ` Gregory Haskins
2009-10-27 15:30           ` Gleb Natapov
2009-10-27 16:53             ` Gregory Haskins
2009-10-27 14:49         ` Paul E. McKenney
2009-10-27 15:02           ` Gregory Haskins
2009-10-27 16:14             ` Paul E. McKenney
2009-10-26 16:22 ` [KVM PATCH v3 2/3] KVM: export lockless GSI attribute Gregory Haskins
2009-10-28  7:46   ` Michael S. Tsirkin
2009-10-28 13:24     ` Gregory Haskins
2009-10-26 16:22 ` [KVM PATCH v3 3/3] KVM: Directly inject interrupts if they support lockless operation Gregory Haskins
2009-10-27 17:45   ` Michael S. Tsirkin
2009-10-27 18:54     ` Gregory Haskins
2009-10-28  7:35       ` Michael S. Tsirkin
2009-10-28 13:20         ` Gregory Haskins
  -- strict thread matches above, loose matches on Subject: below --
2009-10-26 16:20 [KVM PATCH v3 0/3] irqfd enhancements, and irq_routing fixes Gregory Haskins
2009-10-26 16:20 ` [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic Gregory Haskins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20091027170121.GE6258@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=alacrityvm-devel@lists.sourceforge.net \
    --cc=ghaskins@novell.com \
    --cc=gregory.haskins@gmail.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.