public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Gregory Haskins <gregory.haskins@gmail.com>
To: paulmck@linux.vnet.ibm.com
Cc: Gregory Haskins <ghaskins@novell.com>,
	kvm@vger.kernel.org, alacrityvm-devel@lists.sourceforge.net,
	linux-kernel@vger.kernel.org
Subject: Re: [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic
Date: Tue, 27 Oct 2009 09:34:41 -0400	[thread overview]
Message-ID: <4AE6F6F1.4090308@gmail.com> (raw)
In-Reply-To: <20091027033601.GA6645@linux.vnet.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 6069 bytes --]

Hi Paul,

Paul E. McKenney wrote:
> On Mon, Oct 26, 2009 at 12:21:57PM -0400, Gregory Haskins wrote:
>> The current code suffers from the following race condition:
>>
>> thread-1                                    thread-2
>> -----------------------------------------------------------
>>
>> kvm_set_irq() {
>>    rcu_read_lock()
>>    irq_rt = rcu_dereference(table);
>>    rcu_read_unlock();
>>
>>                                        kvm_set_irq_routing() {
>>                                           mutex_lock();
>>                                           irq_rt = table;
>>                                           rcu_assign_pointer();
>>                                           mutex_unlock();
>>                                           synchronize_rcu();
>>
>>                                           kfree(irq_rt);
>>
>>    irq_rt->entry->set(); /* bad */
>>
>> -------------------------------------------------------------
>>
>> Because the pointer is accessed outside of the read-side critical
>> section.  There are two basic patterns we can use to fix this bug:
>>
>> 1) Switch to sleeping-rcu and encompass the ->set() access within the
>>    read-side critical section,
>>
>>    OR
>>
>> 2) Add reference counting to the irq_rt structure, and simply acquire
>>    the reference from within the RSCS.
>>
>> This patch implements solution (1).
> 
> Looks like a good transformation!  A few questions interspersed below.

Thanks for the review.  I would have CC'd you but I figured I pestered
you enough with my RCU reviews in the past, and didn't want to annoy you ;)

I will be sure to CC you in the future, unless you ask otherwise.

> 
>> Signed-off-by: Gregory Haskins <ghaskins@novell.com>
>> ---
>>
>>  include/linux/kvm_host.h |    6 +++++-
>>  virt/kvm/irq_comm.c      |   50 +++++++++++++++++++++++++++-------------------
>>  virt/kvm/kvm_main.c      |    1 +
>>  3 files changed, 35 insertions(+), 22 deletions(-)
>>
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index bd5a616..1fe135d 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -185,7 +185,10 @@ struct kvm {
>>
>>  	struct mutex irq_lock;
>>  #ifdef CONFIG_HAVE_KVM_IRQCHIP
>> -	struct kvm_irq_routing_table *irq_routing;
>> +	struct {
>> +		struct srcu_struct            srcu;
> 
> Each structure has its own SRCU domain.  This is OK, but just asking
> if that is the intent.  It does look like the SRCU primitives are
> passed a pointer to the correct structure, and that the return value
> from srcu_read_lock() gets passed into the matching srcu_read_unlock()
> like it needs to be, so that is good.

Yeah, it was intentional.  Technically the table is per-guest, and thus
the locking is too, which is the desired/intentional granularity.

On that note, I tried to denote that kvm->irq_routing.srcu and
kvm->irq_routing.table were related to one another, but then went ahead
and modified the hunks that touched kvm->irq_ack_notifier_list, too.  In
retrospect, this was probably a mistake.  I should leave the rcu usage
outside of ->irq_routing.table alone.

> 
>> +		struct kvm_irq_routing_table *table;
>> +	} irq_routing;
>>  	struct hlist_head mask_notifier_list;
>>  	struct hlist_head irq_ack_notifier_list;
>>  #endif
> 
> [ . . . ]
> 
>> @@ -155,21 +156,19 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level)
>>  	 * IOAPIC.  So set the bit in both. The guest will ignore
>>  	 * writes to the unused one.
>>  	 */
>> -	rcu_read_lock();
>> -	irq_rt = rcu_dereference(kvm->irq_routing);
>> +	idx = srcu_read_lock(&kvm->irq_routing.srcu);
>> +	irq_rt = rcu_dereference(kvm->irq_routing.table);
>>  	if (irq < irq_rt->nr_rt_entries)
>> -		hlist_for_each_entry(e, n, &irq_rt->map[irq], link)
>> -			irq_set[i++] = *e;
>> -	rcu_read_unlock();
>> +		hlist_for_each_entry(e, n, &irq_rt->map[irq], link) {
> 
> What prevents the above list from changing while we are traversing it?
> (Yes, presumably whatever was preventing it from changing before this
> patch, but what?)
> 
> Mostly kvm->lock is held, but not always.  And if kvm->lock were held
> all the time, there would be no point in using SRCU.  ;-)

This is protected by kvm->irq_lock within kvm_set_irq_routing().
Entries are added to a copy of the list, and the top-level table pointer
is swapped (via rcu_assign_pointer(), as it should be) while holding the
lock.  Finally, we synchronize with the RSCS before deleting the old
copy.  It looks to me like the original author got this part right, so I
didn't modify it outside of converting to SRCU.

> 
>> +			int r;
>>
>> -	while(i--) {
>> -		int r;
>> -		r = irq_set[i].set(&irq_set[i], kvm, irq_source_id, level);
>> -		if (r < 0)
>> -			continue;
>> +			r = e->set(e, kvm, irq_source_id, level);
>> +			if (r < 0)
>> +				continue;
>>
>> -		ret = r + ((ret < 0) ? 0 : ret);
>> -	}
>> +			ret = r + ((ret < 0) ? 0 : ret);
>> +		}
>> +	srcu_read_unlock(&kvm->irq_routing.srcu, idx);
>>
>>  	return ret;
>>  }
>> @@ -179,17 +178,18 @@ void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin)
>>  	struct kvm_irq_ack_notifier *kian;
>>  	struct hlist_node *n;
>>  	int gsi;
>> +	int idx;
>>
>>  	trace_kvm_ack_irq(irqchip, pin);
>>
>> -	rcu_read_lock();
>> -	gsi = rcu_dereference(kvm->irq_routing)->chip[irqchip][pin];
>> +	idx = srcu_read_lock(&kvm->irq_routing.srcu);
>> +	gsi = rcu_dereference(kvm->irq_routing.table)->chip[irqchip][pin];
>>  	if (gsi != -1)
>>  		hlist_for_each_entry_rcu(kian, n, &kvm->irq_ack_notifier_list,
>>  					 link)
> 
> And same question here -- what keeps the above list from changing while
> we are traversing it?

This is also protected via the kvm->irq_lock in
kvm_register_irq_ack_notifier().  Though as mentioned above, I should
probably drop the non irq_routing.table hunks, so this will go away.
But I think its correct either way.

Thanks Paul,
-Greg


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 267 bytes --]

  reply	other threads:[~2009-10-27 13:34 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-10-26 16:21 [KVM PATCH v3 0/3] irqfd enhancements, and irq_routing fixes Gregory Haskins
2009-10-26 16:21 ` [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic Gregory Haskins
2009-10-27  3:36   ` Paul E. McKenney
2009-10-27 13:34     ` Gregory Haskins [this message]
2009-10-27 17:01       ` Paul E. McKenney
2009-10-27  6:45   ` Gleb Natapov
2009-10-27 13:39     ` Gregory Haskins
2009-10-27 14:00       ` Gregory Haskins
2009-10-27 14:05         ` Gleb Natapov
2009-10-27 14:50           ` Gregory Haskins
2009-10-27 15:04             ` Gleb Natapov
2009-10-27 15:42               ` Gregory Haskins
2009-10-27 14:02       ` Gleb Natapov
2009-10-27 14:47         ` Gregory Haskins
2009-10-27 15:30           ` Gleb Natapov
2009-10-27 16:53             ` Gregory Haskins
2009-10-27 14:49         ` Paul E. McKenney
2009-10-27 15:02           ` Gregory Haskins
2009-10-27 16:14             ` Paul E. McKenney
2009-10-26 16:22 ` [KVM PATCH v3 2/3] KVM: export lockless GSI attribute Gregory Haskins
2009-10-28  7:46   ` Michael S. Tsirkin
2009-10-28 13:24     ` Gregory Haskins
2009-10-26 16:22 ` [KVM PATCH v3 3/3] KVM: Directly inject interrupts if they support lockless operation Gregory Haskins
2009-10-27 17:45   ` Michael S. Tsirkin
2009-10-27 18:54     ` Gregory Haskins
2009-10-28  7:35       ` Michael S. Tsirkin
2009-10-28 13:20         ` Gregory Haskins
  -- strict thread matches above, loose matches on Subject: below --
2009-10-26 16:20 [KVM PATCH v3 0/3] irqfd enhancements, and irq_routing fixes Gregory Haskins
2009-10-26 16:20 ` [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic Gregory Haskins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4AE6F6F1.4090308@gmail.com \
    --to=gregory.haskins@gmail.com \
    --cc=alacrityvm-devel@lists.sourceforge.net \
    --cc=ghaskins@novell.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox