From: "Paul E. McKenney" <paulmck@us.ibm.com>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Oleg Nesterov <oleg@tv-sign.ru>,
Kernel development list <linux-kernel@vger.kernel.org>
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync
Date: Sun, 19 Nov 2006 13:43:15 -0800
Message-ID: <20061119214315.GI4427@us.ibm.com>
In-Reply-To: <Pine.LNX.4.44L0.0611181536050.15971-100000@netrider.rowland.org>
On Sat, Nov 18, 2006 at 04:00:35PM -0500, Alan Stern wrote:
> On Sat, 18 Nov 2006, Paul E. McKenney wrote:
>
> > > > @@ -94,7 +112,8 @@ void cleanup_srcu_struct(struct srcu_str
> > > >  	WARN_ON(sum);	/* Leakage unless caller handles error. */
> > > >  	if (sum != 0)
> > > >  		return;
> > > > -	free_percpu(sp->per_cpu_ref);
> > > > +	if (sp->per_cpu_ref != NULL)
> > > > +		free_percpu(sp->per_cpu_ref);
> > >
> > > Now that Andrew has accepted the "allow free_percpu(NULL)" change, you can
> > > remove the test here.
> >
> > OK. I thought that there was some sort of error-checking involved,
> > but if not, will fix.
>
> Just make sure that _you_ have the free_percpu(NULL) patch installed on
> your machine before testing this -- otherwise you'll get a nice hard
> crash!
'Nuff said -- will leave this fixup till later. ;-)
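
For the record, once the free_percpu(NULL) change can be assumed
everywhere, the cleanup path collapses to the usual kfree()-style idiom.
A sketch only, assuming free_percpu() really does become a NULL-safe no-op:

	/* No NULL check needed once free_percpu(NULL) is a no-op: */
	free_percpu(sp->per_cpu_ref);
	sp->per_cpu_ref = NULL;		/* keep cleanup idempotent */
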
> > > >  	preempt_disable();
> > > >  	idx = sp->completed & 0x1;
> > > > -	barrier();  /* ensure compiler looks -once- at sp->completed. */
> > > > -	per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]++;
> > > > -	srcu_barrier();  /* ensure compiler won't misorder critical section. */
> > > > +	sap = rcu_dereference(sp->per_cpu_ref);
> > > > +	if (likely(sap != NULL)) {
> > > > +		barrier();  /* ensure compiler looks -once- at sp->completed. */
> > >
> > > Put this barrier() back where the old one was (outside the "if").
> >
> > Why? Outside this "if", I don't use "sap".
>
> Because it looks funny to see the comment here talking about sp->completed
> when sp->completed hasn't been used for several lines. (Maybe it looks
> less funny in the patched source than in the patch itself.) The best
> place to prevent extra accesses of sp->completed is immediately after
> the required access.
Good point -- and with the addition of the second element of hardluckref,
it has to be hoisted out of the "if" in any case.
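
To make the hoisting concrete, the fastpath in the patch below now fences
the compiler immediately after the one required access to sp->completed:

	preempt_disable();
	idx = sp->completed & 0x1;
	barrier();  /* ensure compiler looks -once- at sp->completed. */
	sap = rcu_dereference(sp->per_cpu_ref);
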
> > > > +			    smp_processor_id())->c[idx]++;
> > > > +		smp_mb();
> > > > +		preempt_enable();
> > > > +		return idx;
> > > > +	}
> > > > +	if (mutex_trylock(&sp->mutex)) {
> > > > +		preempt_enable();
> > >
> > > Move the preempt_enable() before the "if", then get rid of the
> > > preempt_enable() after the "if" block.
> >
> > No can do. The preempt_enable() must follow the increment and
> > the memory barrier, otherwise the synchronize_sched() inside
> > synchronize_srcu() can't do its job.
>
> You misunderstood -- I was talking about the preempt_enable() in the last
> line quoted above (not the one in the third line) and the "if
> (mutex_trylock" (not the earlier "if (likely").
OK, I see your point -- but this has changed thoroughly with the
addition of the second element of hardluckref.
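
In sketch form, the slowpath now picks its hardluckref element using the
same idx that the fastpath would have used, and signals the caller through
the sign of the return value:

	atomic_inc(&sp->hardluckref[idx]);	/* idx is 0 or 1 */
	smp_mb__after_atomic_inc();
	preempt_enable();
	return -1 - idx;	/* 0 -> -1, 1 -> -2 for srcu_read_unlock() */
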
> > > > +		if (sp->per_cpu_ref == NULL)
> > > > +			sp->per_cpu_ref = alloc_srcu_struct_percpu();
> > >
> > > It would be cleaner to put the mutex_unlock() and closing '}' right here.
> >
> > I can move the mutex_unlock() to this point, but I cannot otherwise
> > merge the two following pieces of code -- at least not without doing
> > an otherwise-gratuitous preempt_disable(). Which I suppose I could
> > do, but seems like it would be more confusing than would the
> > separate code. I will play with this a bit and see if I can eliminate
> > the duplication.
>
> If you follow the advice above then you won't need to add a gratuitous
> preempt_disable(). Try it and see how it comes out; the idea is that
> you can use the same code for testing sp->per_cpu_ref regardless of
> whether the mutex_trylock() or the call to alloc_srcu_struct_percpu()
> succeeded.
Understood, finally -- but the two-element hardluckref now requires
greater preempt_disable() coverage.
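
Concretely, if the trylock succeeds but the allocation still fails, the
patch below re-enters the preempt-disabled region and re-samples
sp->completed before falling through to the atomic_inc(), so that the
synchronize_sched() inside synchronize_srcu() is guaranteed to wait for
the increment:

	if (sp->per_cpu_ref == NULL) {		/* allocation failed */
		mutex_unlock(&sp->mutex);
		preempt_disable();
		idx = sp->completed & 0x1;	/* re-sample, preemption off */
	}
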
> > > What happens if a prior reader failed to allocate the memory but this call
> > > succeeds? You need to check hardluckref before doing this. The same is
> > > true in srcu_read_lock().
> >
> > All accounted for by the fact that hardluckref is unconditionally
> > added in by srcu_readers_active(). Right?
>
> Yes, you're right.
>
> > Will spin a new patch...
>
> Good -- it's getting pretty messy to look at this one!
>
> By the way, I think the fastpath for synchronize_srcu() should be safe,
> now that you have added the memory barriers into srcu_read_lock() and
> srcu_read_unlock(). You might as well try putting it in.
>
> Although now that I look at it again, you have forgotten to put smp_mb()
> after the atomic_inc() call and before the atomic_dec(). In
> srcu_read_unlock() you could just move the existing smp_mb() back before
> the test of idx.
Good catch again -- added smp_mb__before_atomic_dec() and
smp_mb__after_atomic_inc(). The reason for not simply moving the existing
smp_mb() is that atomic_dec() already implies a full memory barrier on some
architectures, such as x86. On those architectures,
smp_mb__before_atomic_dec() reduces to a mere compiler barrier, so the
extra barrier costs nothing where it is not needed.
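
To illustrate -- from memory of the per-architecture headers, so treat
this as a sketch rather than gospel: on x86 the atomic operations use
locked instructions that already imply a full barrier, whereas on (say)
powerpc they do not, so the definitions come out roughly as:

	/* x86: atomic_dec() already orders, fence only the compiler. */
	#define smp_mb__before_atomic_dec()	barrier()

	/* powerpc and friends: a real memory barrier is needed. */
	#define smp_mb__before_atomic_dec()	smp_mb()
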
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> (was @us.ibm.com)
---
include/linux/srcu.h | 8 ---
kernel/srcu.c | 126 +++++++++++++++++++++++++++------------------------
2 files changed, 71 insertions(+), 63 deletions(-)
diff -urpNa -X dontdiff linux-2.6.19-rc5/include/linux/srcu.h linux-2.6.19-rc5-dsrcu/include/linux/srcu.h
--- linux-2.6.19-rc5/include/linux/srcu.h 2006-11-17 15:44:40.000000000 -0800
+++ linux-2.6.19-rc5-dsrcu/include/linux/srcu.h 2006-11-19 13:33:35.000000000 -0800
@@ -35,19 +35,15 @@ struct srcu_struct {
 	int completed;
 	struct srcu_struct_array *per_cpu_ref;
 	struct mutex mutex;
+	atomic_t hardluckref[2];
 };
 
-#ifndef CONFIG_PREEMPT
-#define srcu_barrier() barrier()
-#else /* #ifndef CONFIG_PREEMPT */
-#define srcu_barrier()
-#endif /* #else #ifndef CONFIG_PREEMPT */
-
 int init_srcu_struct(struct srcu_struct *sp);
 void cleanup_srcu_struct(struct srcu_struct *sp);
 int srcu_read_lock(struct srcu_struct *sp) __acquires(sp);
 void srcu_read_unlock(struct srcu_struct *sp, int idx) __releases(sp);
 void synchronize_srcu(struct srcu_struct *sp);
 long srcu_batches_completed(struct srcu_struct *sp);
+int srcu_readers_active(struct srcu_struct *sp);
 
 #endif
diff -urpNa -X dontdiff linux-2.6.19-rc5/kernel/srcu.c linux-2.6.19-rc5-dsrcu/kernel/srcu.c
--- linux-2.6.19-rc5/kernel/srcu.c 2006-11-17 15:44:40.000000000 -0800
+++ linux-2.6.19-rc5-dsrcu/kernel/srcu.c 2006-11-19 13:40:33.000000000 -0800
@@ -34,6 +34,18 @@
 #include <linux/smp.h>
 #include <linux/srcu.h>
 
+/*
+ * Initialize the per-CPU array, returning the pointer.
+ */
+static inline struct srcu_struct_array *alloc_srcu_struct_percpu(void)
+{
+	struct srcu_struct_array *sap;
+
+	sap = alloc_percpu(struct srcu_struct_array);
+	smp_wmb();
+	return (sap);
+}
+
 /**
  * init_srcu_struct - initialize a sleep-RCU structure
  * @sp: structure to initialize.
@@ -46,7 +58,9 @@ int init_srcu_struct(struct srcu_struct
 {
 	sp->completed = 0;
 	mutex_init(&sp->mutex);
-	sp->per_cpu_ref = alloc_percpu(struct srcu_struct_array);
+	sp->per_cpu_ref = alloc_srcu_struct_percpu();
+	atomic_set(&sp->hardluckref[0], 0);
+	atomic_set(&sp->hardluckref[1], 0);
 	return (sp->per_cpu_ref ? 0 : -ENOMEM);
 }
 
@@ -58,12 +72,15 @@ int init_srcu_struct(struct srcu_struct
 static int srcu_readers_active_idx(struct srcu_struct *sp, int idx)
 {
 	int cpu;
+	struct srcu_struct_array *sap;
 	int sum;
 
 	sum = 0;
-	for_each_possible_cpu(cpu)
-		sum += per_cpu_ptr(sp->per_cpu_ref, cpu)->c[idx];
-	return sum;
+	sap = rcu_dereference(sp->per_cpu_ref);
+	if (likely(sap != NULL))
+		for_each_possible_cpu(cpu)
+			sum += per_cpu_ptr(sap, cpu)->c[idx];
+	return sum + atomic_read(&sp->hardluckref[idx]);
 }
 
 /**
@@ -94,7 +111,8 @@ void cleanup_srcu_struct(struct srcu_str
 	WARN_ON(sum);	/* Leakage unless caller handles error. */
 	if (sum != 0)
 		return;
-	free_percpu(sp->per_cpu_ref);
+	if (sp->per_cpu_ref != NULL)
+		free_percpu(sp->per_cpu_ref);
 	sp->per_cpu_ref = NULL;
 }
 
@@ -105,18 +123,41 @@ void cleanup_srcu_struct(struct srcu_str
  * Counts the new reader in the appropriate per-CPU element of the
  * srcu_struct.  Must be called from process context.
  * Returns an index that must be passed to the matching srcu_read_unlock().
+ * The index is mapped to negative numbers if the srcu_struct is not and
+ * cannot be initialized.
  */
 int srcu_read_lock(struct srcu_struct *sp)
 {
 	int idx;
+	struct srcu_struct_array *sap;
 
 	preempt_disable();
 	idx = sp->completed & 0x1;
 	barrier();  /* ensure compiler looks -once- at sp->completed. */
-	per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]++;
-	srcu_barrier();  /* ensure compiler won't misorder critical section. */
+	sap = rcu_dereference(sp->per_cpu_ref);
+	if (likely(sap != NULL)) {
+		per_cpu_ptr(sap, smp_processor_id())->c[idx]++;
+		smp_mb();
+		preempt_enable();
+		return idx;
+	}
+	if (mutex_trylock(&sp->mutex)) {
+		preempt_enable();
+		if (sp->per_cpu_ref == NULL)
+			sp->per_cpu_ref = alloc_srcu_struct_percpu();
+		if (sp->per_cpu_ref == NULL) {
+			mutex_unlock(&sp->mutex);
+			preempt_disable();
+			idx = sp->completed & 0x1;
+		} else {
+			mutex_unlock(&sp->mutex);
+			return srcu_read_lock(sp);
+		}
+	}
+	atomic_inc(&sp->hardluckref[idx]);
+	smp_mb__after_atomic_inc();
 	preempt_enable();
-	return idx;
+	return -1 - idx;
 }
 
 /**
@@ -131,10 +172,16 @@ int srcu_read_lock(struct srcu_struct *s
  */
 void srcu_read_unlock(struct srcu_struct *sp, int idx)
 {
-	preempt_disable();
-	srcu_barrier();  /* ensure compiler won't misorder critical section. */
-	per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]--;
-	preempt_enable();
+	if (likely(idx >= 0)) {
+		preempt_disable();
+		smp_mb();
+		per_cpu_ptr(rcu_dereference(sp->per_cpu_ref),
+			    smp_processor_id())->c[idx]--;
+		preempt_enable();
+		return;
+	}
+	smp_mb__before_atomic_dec();
+	atomic_dec(&sp->hardluckref[-1 - idx]);
 }
 
 /**
@@ -158,6 +205,11 @@ void synchronize_srcu(struct srcu_struct
 	idx = sp->completed;
 	mutex_lock(&sp->mutex);
 
+	/* Initialize if not already initialized. */
+
+	if (sp->per_cpu_ref == NULL)
+		sp->per_cpu_ref = alloc_srcu_struct_percpu();
+
 	/*
 	 * Check to see if someone else did the work for us while we were
 	 * waiting to acquire the lock.  We need -two- advances of
@@ -173,65 +225,25 @@ void synchronize_srcu(struct srcu_struct
 		return;
 	}
 
-	synchronize_sched();  /* Force memory barrier on all CPUs. */
-
-	/*
-	 * The preceding synchronize_sched() ensures that any CPU that
-	 * sees the new value of sp->completed will also see any preceding
-	 * changes to data structures made by this CPU.  This prevents
-	 * some other CPU from reordering the accesses in its SRCU
-	 * read-side critical section to precede the corresponding
-	 * srcu_read_lock() -- ensuring that such references will in
-	 * fact be protected.
-	 *
-	 * So it is now safe to do the flip.
-	 */
-
+	smp_mb();  /* ensure srcu_read_lock() sees prior change first! */
 	idx = sp->completed & 0x1;
 	sp->completed++;
 
-	synchronize_sched();  /* Force memory barrier on all CPUs. */
+	synchronize_sched();
 
 	/*
 	 * At this point, because of the preceding synchronize_sched(),
 	 * all srcu_read_lock() calls using the old counters have completed.
 	 * Their corresponding critical sections might well be still
 	 * executing, but the srcu_read_lock() primitives themselves
-	 * will have finished executing.
+	 * will have finished executing.  The "old" rank of counters
+	 * can therefore only decrease, never increase in value.
 	 */
 	while (srcu_readers_active_idx(sp, idx))
 		schedule_timeout_interruptible(1);
 
-	synchronize_sched();  /* Force memory barrier on all CPUs. */
-
-	/*
-	 * The preceding synchronize_sched() forces all srcu_read_unlock()
-	 * primitives that were executing concurrently with the preceding
-	 * for_each_possible_cpu() loop to have completed by this point.
-	 * More importantly, it also forces the corresponding SRCU read-side
-	 * critical sections to have also completed, and the corresponding
-	 * references to SRCU-protected data items to be dropped.
-	 *
-	 * Note:
-	 *
-	 *	Despite what you might think at first glance, the
-	 *	preceding synchronize_sched() -must- be within the
-	 *	critical section ended by the following mutex_unlock().
-	 *	Otherwise, a task taking the early exit can race
-	 *	with a srcu_read_unlock(), which might have executed
-	 *	just before the preceding srcu_readers_active() check,
-	 *	and whose CPU might have reordered the srcu_read_unlock()
-	 *	with the preceding critical section.  In this case, there
-	 *	is nothing preventing the synchronize_sched() task that is
-	 *	taking the early exit from freeing a data structure that
-	 *	is still being referenced (out of order) by the task
-	 *	doing the srcu_read_unlock().
-	 *
-	 *	Alternatively, the comparison with "2" on the early exit
-	 *	could be changed to "3", but this increases synchronize_srcu()
-	 *	latency for bulk loads.  So the current code is preferred.
-	 */
+	smp_mb();  /* must see critical section prior to srcu_read_unlock() */
 	mutex_unlock(&sp->mutex);
 }
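
For completeness, reader usage is unchanged by all of this -- the index is
opaque to the caller, negative or not. A minimal sketch (my_srcu, struct foo,
global_foo, and do_something_with() are made up for illustration, and plain
rcu_dereference() stands in since there is no SRCU-specific flavor of it):

	static struct srcu_struct my_srcu;	/* init_srcu_struct()ed at boot */
	static struct foo *global_foo;		/* SRCU-protected pointer */

	static void reader(void)
	{
		struct foo *p;
		int idx;

		idx = srcu_read_lock(&my_srcu);	/* may well be negative */
		p = rcu_dereference(global_foo);
		if (p != NULL)
			do_something_with(p);	/* hypothetical consumer */
		srcu_read_unlock(&my_srcu, idx);	/* hand idx back unchanged */
	}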