From: Paolo Bonzini <pbonzini@redhat.com>
To: paulmck@linux.vnet.ibm.com
Cc: Linu Cherian <linuc.decode@gmail.com>,
	rkrcmar@redhat.com, kvm@vger.kernel.org, linu.cherian@cavium.com,
	sunil.goutham@cavium.com, Geethasowjanya.Akula@cavium.com
Subject: Re: Possible race condition during lock_count increment in srcu_read_lock
Date: Tue, 30 May 2017 15:49:31 +0200	[thread overview]
Message-ID: <a0ad9967-95c8-b596-5ca6-a63f45bf0ad9@redhat.com> (raw)
In-Reply-To: <20170529201448.GN3956@linux.vnet.ibm.com>



On 29/05/2017 22:14, Paul E. McKenney wrote:
> On Mon, May 29, 2017 at 06:38:19PM +0200, Paolo Bonzini wrote:
>> On 29/05/2017 17:50, Paul E. McKenney wrote:
>>> You actually are not supposed to call __srcu_read_lock() and
>>> __srcu_read_unlock() from irq context.  (Unless you know that you
>>> interrupted some context that is guaranteed never to use the srcu_struct
>>> in question or some such.)

I was going to improve the SRCU doc comments, and then started digging 
into srcu.c history to find out why srcu_read_unlock uses this_cpu_inc.

It turns out that back in 2012 __srcu_read_lock made two accesses to 
per-CPU variables, while __srcu_read_unlock made one (a decrement).  Then 
commit 5a41344a3d83 ("srcu: Simplify __srcu_read_unlock() via 
this_cpu_dec()", 2012-11-29) changed __srcu_read_unlock to use 
this_cpu_dec and removed the preempt_disable/preempt_enable pair.

Nowadays, __srcu_read_lock does only one access to per-CPU variables, so 
we could do the same as commit 5a41344a3d83 did:

diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index 167ad8831aaf..4c1d5f7e62c4 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -172,9 +172,7 @@ static inline int srcu_read_lock(struct srcu_struct *sp) __acquires(sp)
 {
 	int retval;
 
-	preempt_disable();
 	retval = __srcu_read_lock(sp);
-	preempt_enable();
 	rcu_lock_acquire(&(sp)->dep_map);
 	return retval;
 }
diff --git a/kernel/rcu/srcu.c b/kernel/rcu/srcu.c
index 584d8a983883..dea03614263f 100644
--- a/kernel/rcu/srcu.c
+++ b/kernel/rcu/srcu.c
@@ -263,7 +263,7 @@ void cleanup_srcu_struct(struct srcu_struct *sp)
 
 /*
  * Counts the new reader in the appropriate per-CPU element of the
- * srcu_struct.  Must be called from process context.
+ * srcu_struct.
  * Returns an index that must be passed to the matching srcu_read_unlock().
  */
 int __srcu_read_lock(struct srcu_struct *sp)
@@ -271,7 +271,7 @@ int __srcu_read_lock(struct srcu_struct *sp)
 	int idx;
 
 	idx = READ_ONCE(sp->completed) & 0x1;
-	__this_cpu_inc(sp->per_cpu_ref->lock_count[idx]);
+	this_cpu_inc(sp->per_cpu_ref->lock_count[idx]);
 	smp_mb(); /* B */  /* Avoid leaking the critical section. */
 	return idx;
 }
@@ -281,7 +281,6 @@ int __srcu_read_lock(struct srcu_struct *sp)
  * Removes the count for the old reader from the appropriate per-CPU
  * element of the srcu_struct.  Note that this may well be a different
  * CPU than that which was incremented by the corresponding srcu_read_lock().
- * Must be called from process context.
  */
 void __srcu_read_unlock(struct srcu_struct *sp, int idx)
 {


As said earlier, this would slow down SRCU on machines where neither fast
this_cpu_inc nor atomic_inc_relaxed are available, and local_irq_save/restore
is slower than disabling preemption.  The main such architecture is s390,
which, however, is already paying that price in __srcu_read_unlock.
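
For reference, on architectures that lack an arch-specific this_cpu_inc,
the generic fallback in asm-generic/percpu.h protects the read-modify-write
by disabling interrupts, roughly along these lines (a simplified sketch with
an illustrative macro name, not the literal source):

    /* Simplified sketch of the generic this_cpu_inc() fallback. */
    #define this_cpu_inc_generic(pcp)                                 \
    do {                                                              \
            unsigned long __flags;                                    \
            raw_local_irq_save(__flags);                              \
            *raw_cpu_ptr(&(pcp)) += 1;  /* plain per-CPU increment */ \
            raw_local_irq_restore(__flags);                           \
    } while (0)

So on those machines the comparison is essentially local_irq_save/restore
against the preempt_disable/preempt_enable pair that the patch removes
from srcu_read_lock.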

Any objections to the above for 4.12, with a commit message explaining both
the history of __srcu_read_unlock and what this fixes in KVM?

Thanks,

Paolo

>>>
>>> Given that this is the KVM irq injection path, I am guessing that it
>>> is executed quite frequently, so that you do need a low-cost highly
>>> scalable solution.  However, you only really need to exclude between
>>> process level and the various interrupts, so there are three ways
>>> that an architecture could handle this:
>>>
>>> 1.	Single memory-to-memory increment instructions, as you say.
>>>
>>> 2.	Atomic read-modify-write instructions, again as you say.
>>>
>>> 3.	Disable interrupts across __srcu_read_lock() and __srcu_read_unlock().
>>>
>>> Different architectures would want different approaches, depending on what
>>> instructions are available and the relative speed of these instructions.
>>> It would not be hard to make this an architecture choice on the one
>>> hand or to provide an srcu_read_lock_irq() and srcu_read_unlock_irq()
>>> on the other.  But I recently got some serious pushback on RCU API
>>> proliferation on the one hand and SRCU in particular on the other.  So
>>> it might be good to look at some alternatives as well.
>>
>> Yeah, I liked Linu's suggestion of this_cpu_inc because it's already
>> used in srcu_read_unlock, and because it can fall back to (3)
>> automatically.  Relaxed atomics weren't much of a proposal, rather a way
>> to implement this_cpu_inc on some architectures that do not support it
>> yet (notably 32-bit ARM, MIPS and PPC).
>>
>> The disadvantage would be slowing down SRCU on machines where neither
>> (1) nor (2) are possible, and local_irq_save/restore is slower than
>> disabling preemption.  But I can certainly wrap ->irq_srcu lock/unlock
>> into my own helpers that take care of saving/restoring the interrupt
>> state, like
>>
>>     #define kvm_irq_srcu_read_lock_irqsave(kvm, flags)
>>     #define kvm_irq_srcu_read_unlock_irqrestore(kvm, flags)
>>
>> This would be very simple, confined into KVM and respectful of the SRCU
>> API requirements.
> 
> I do like this approach!  If it turns out to be needed in (say) four
> other places, then it might make sense for me to pull this into SRCU as
> an additional API.

Good.  On the other hand, I was 
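
For concreteness, such wrappers could look roughly like this (just a sketch,
not tested; note that the unlock side also needs the SRCU index returned by
srcu_read_lock, which the names quoted above do not show):

    /* Hypothetical KVM-local wrappers around kvm->irq_srcu (sketch only). */
    #define kvm_irq_srcu_read_lock_irqsave(kvm, flags)               \
    ({                                                               \
            local_irq_save(flags);                                   \
            srcu_read_lock(&(kvm)->irq_srcu);                        \
    })

    #define kvm_irq_srcu_read_unlock_irqrestore(kvm, idx, flags)     \
    do {                                                             \
            srcu_read_unlock(&(kvm)->irq_srcu, idx);                 \
            local_irq_restore(flags);                                \
    } while (0)

Interrupts stay disabled for the (short) read-side critical section, so the
non-irq-safe per-CPU increments in __srcu_read_lock/__srcu_read_unlock cannot
be interleaved with the same increments done from an interrupt handler on the
same CPU.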

