From: Lai Jiangshan <laijs@cn.fujitsu.com>
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Jens Axboe <axboe@kernel.dk>,
	linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Eric Dumazet <eric.dumazet@gmail.com>
Subject: Re: [PATCH] percpu-rwsem: use barrier in unlock path
Date: Thu, 18 Oct 2012 10:18:30 +0800
Message-ID: <507F66F6.20704@cn.fujitsu.com>
In-Reply-To: <20121017202806.GA7282@home.goodmis.org>

On 10/18/2012 04:28 AM, Steven Rostedt wrote:
> On Wed, Oct 17, 2012 at 11:07:21AM -0400, Mikulas Patocka wrote:
>>>
>>> Even the previous patch is applied, percpu_down_read() still
>>> needs mb() to pair with it.
>>
>> percpu_down_read uses rcu_read_lock which should guarantee that memory 
>> accesses don't escape in front of a rcu-protected section.
> 
> You do realize that rcu_read_lock() does nothing more than a barrier(),
> right?
> 
> Paul worked really hard to get rcu_read_locks() to not call HW barriers.
> 
>>
>> If rcu_read_unlock has only an unlock barrier and not a full barrier, 
>> memory accesses could be moved in front of rcu_read_unlock and reordered 
>> with this_cpu_inc(*p->counters), but it doesn't matter because 
>> percpu_down_write does synchronize_rcu(), so it never sees these accesses 
>> halfway through.
> 
> Looking at the patch, you are correct. The read side doesn't need the
> memory barrier as the worst thing that will happen is that it sees the
> locked = false, and will just grab the mutex unnecessarily.

---------------------
A memory barrier should be added only if these two things are known:
	1) which two accesses it prevents from being reordered, and
	2) which corresponding mb() it pairs with.

You tried to add an mb() in percpu_up_write(). OK, I can see that it prevents
reordering between the writes to the protected data and the statement
"p->locked = false", but I cannot find the corresponding mb() that it pairs with.

percpu_down_read()					writes to the data
	the CPU caches/prefetches the data,		writes to the data
	which is still inconsistent			writes to the data
							percpu_up_write()
								mb()
								p->locked = false;
	unlikely(p->locked)
		the CPU sees p->locked == false and
		does not discard the cached/prefetched data
	this_cpu_inc(*p->counters);
	the code that reads the data
	**** and we use the inconsistent data ****

So you need to add an mb() after the "unlikely(p->locked)" check.
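
To make the pairing concrete, here is a minimal userspace sketch of the same
pattern. It is only an analogue, not kernel code: it assumes C11 atomics and
pthreads, uses atomic_thread_fence() where the kernel would use mb(), and the
names data, locked, writer(), reader() and run_once() are made up for this
illustration. Unlike percpu_down_read(), the reader simply spins until it
sees locked == false instead of falling back to the mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static int data;                   /* plain payload, stands in for the protected data */
static atomic_bool locked = true;  /* hypothetical analogue of p->locked */

static void *writer(void *arg)
{
	(void)arg;
	data = 42;                                  /* write to the data */
	atomic_thread_fence(memory_order_seq_cst);  /* "D": orders the data write before the flag write */
	atomic_store_explicit(&locked, false, memory_order_relaxed);
	return NULL;
}

static void *reader(void *arg)
{
	int *out = arg;

	/* Simplification: spin instead of taking a slow path. */
	while (atomic_load_explicit(&locked, memory_order_relaxed))
		;
	atomic_thread_fence(memory_order_seq_cst);  /* "A": orders the flag read before the data read */
	*out = data;                                /* read of the data */
	return NULL;
}

/* Runs one writer and one reader; returns the value the reader observed. */
static int run_once(void)
{
	pthread_t r, w;
	int seen = -1;

	data = 0;
	atomic_store(&locked, true);
	pthread_create(&r, NULL, reader, &seen);
	pthread_create(&w, NULL, writer, NULL);
	pthread_join(w, NULL);
	pthread_join(r, NULL);
	return seen;
}
```

With both fences in place, a reader that observes locked == false is
guaranteed to observe data == 42, so run_once() returns 42. If either fence
is removed, the C11 memory model no longer gives that guarantee (the plain
accesses to data then also race), which is the point of requiring barriers
in pairs. Build with -pthread.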

-------------------------

The RCU you use doesn't protect any data. It protects the code of the fast path:
	unlikely(p->locked);
	this_cpu_inc(*p->counters);

and synchronize_rcu() ensures that all previous fast paths have fully finished
"this_cpu_inc(*p->counters);".

It doesn't protect any other code or data. If you want to protect other code or
other data, please add more synchronization or mb()s.

---------------

I strongly dislike synchronization that protects code instead of data,
but sometimes I have to do it as well.

---------------

A very rough draft of an example with paired mb()s:


diff --git a/include/linux/percpu-rwsem.h b/include/linux/percpu-rwsem.h
index cf80f7e..84a93c0 100644
--- a/include/linux/percpu-rwsem.h
+++ b/include/linux/percpu-rwsem.h
@@ -12,6 +12,14 @@ struct percpu_rw_semaphore {
 	struct mutex mtx;
 };
 
+#if 1
+#define light_mb() barrier()
+#define heavy_mb() synchronize_sched()
+#else
+#define light_mb() smp_mb()
+#define heavy_mb() smp_mb()
+#endif
+
 static inline void percpu_down_read(struct percpu_rw_semaphore *p)
 {
 	rcu_read_lock();
@@ -24,22 +32,12 @@ static inline void percpu_down_read(struct percpu_rw_semaphore *p)
 	}
 	this_cpu_inc(*p->counters);
 	rcu_read_unlock();
+	light_mb(); /* A, between read of p->locked and read of data, paired with D */
 }
 
 static inline void percpu_up_read(struct percpu_rw_semaphore *p)
 {
-	/*
-	 * On X86, write operation in this_cpu_dec serves as a memory unlock
-	 * barrier (i.e. memory accesses may be moved before the write, but
-	 * no memory accesses are moved past the write).
-	 * On other architectures this may not be the case, so we need smp_mb()
-	 * there.
-	 */
-#if defined(CONFIG_X86) && (!defined(CONFIG_X86_PPRO_FENCE) && !defined(CONFIG_X86_OOSTORE))
-	barrier();
-#else
-	smp_mb();
-#endif
+	light_mb(); /* B, between read of the data and write to p->counters, paired with C */
 	this_cpu_dec(*p->counters);
 }
 
@@ -61,11 +59,12 @@ static inline void percpu_down_write(struct percpu_rw_semaphore *p)
 	synchronize_rcu();
 	while (__percpu_count(p->counters))
 		msleep(1);
-	smp_rmb(); /* paired with smp_mb() in percpu_sem_up_read() */
+	heavy_mb(); /* C, between read of p->counters and write to data, paired with B */
 }
 
 static inline void percpu_up_write(struct percpu_rw_semaphore *p)
 {
+	heavy_mb(); /* D, between write to data and write to p->locked, paired with A */
 	p->locked = false;
 	mutex_unlock(&p->mtx);
 }
