From: "Paul E. McKenney" <paulmck@us.ibm.com>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Oleg Nesterov <oleg@tv-sign.ru>,
Kernel development list <linux-kernel@vger.kernel.org>
Subject: Re: [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync
Date: Sun, 19 Nov 2006 13:43:15 -0800 [thread overview]
Message-ID: <20061119214315.GI4427@us.ibm.com> (raw)
In-Reply-To: <Pine.LNX.4.44L0.0611181536050.15971-100000@netrider.rowland.org>
On Sat, Nov 18, 2006 at 04:00:35PM -0500, Alan Stern wrote:
> On Sat, 18 Nov 2006, Paul E. McKenney wrote:
>
> > > > @@ -94,7 +112,8 @@ void cleanup_srcu_struct(struct srcu_str
> > > > WARN_ON(sum); /* Leakage unless caller handles error. */
> > > > if (sum != 0)
> > > > return;
> > > > - free_percpu(sp->per_cpu_ref);
> > > > + if (sp->per_cpu_ref != NULL)
> > > > + free_percpu(sp->per_cpu_ref);
> > >
> > > Now that Andrew has accepted the "allow free_percpu(NULL)" change, you can
> > > remove the test here.
> >
> > OK. I thought that there was some sort of error-checking involved,
> > but if not, will fix.
>
> Just make sure that _you_ have the free_percpu(NULL) patch installed on
> your machine before testing this -- otherwise you'll get a nice hard
> crash!
'Nuff said -- will leave this fixup till later. ;-)
> > > > preempt_disable();
> > > > idx = sp->completed & 0x1;
> > > > - barrier(); /* ensure compiler looks -once- at sp->completed. */
> > > > - per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]++;
> > > > - srcu_barrier(); /* ensure compiler won't misorder critical section. */
> > > > + sap = rcu_dereference(sp->per_cpu_ref);
> > > > + if (likely(sap != NULL)) {
> > > > + barrier(); /* ensure compiler looks -once- at sp->completed. */
> > >
> > > Put this barrier() back where the old one was (outside the "if").
> >
> > Why? Outside this "if", I don't use "sap".
>
> Because it looks funny to see the comment here talking about sp->completed
> when sp->completed hasn't been used for several lines. (Maybe it looks
> less funny in the patched source than in the patch itself.) The best
> place to prevent extra accesses of sp->completed is immediately after
> the required access.
Good point -- and with the addition of the second element of hardluckref,
it has to be hoisted out of the "if" in any case.
> > > > + smp_processor_id())->c[idx]++;
> > > > + smp_mb();
> > > > + preempt_enable();
> > > > + return idx;
> > > > + }
> > > > + if (mutex_trylock(&sp->mutex)) {
> > > > + preempt_enable();
> > >
> > > Move the preempt_enable() before the "if", then get rid of the
> > > preempt_enable() after the "if" block.
> >
> > No can do. The preempt_enable() must follow the increment and
> > the memory barrier, otherwise the synchronize_sched() inside
> > synchronize_srcu() can't do its job.
>
> You misunderstood -- I was talking about the preempt_enable() in the last
> line quoted above (not the one in the third line) and the "if
> (mutex_trylock" (not the earlier "if (likely").
OK, I see your point -- but this has changed thoroughly with the
addition of the second element of hardluckref.
> > > > + if (sp->per_cpu_ref == NULL)
> > > > + sp->per_cpu_ref = alloc_srcu_struct_percpu();
> > >
> > > It would be cleaner to put the mutex_unlock() and closing '}' right here.
> >
> > I can move the mutex_unlock() to this point, but I cannot otherwise
> > merge the two following pieces of code -- at least not without doing
> > an otherwise-gratuitous preempt_disable(). Which I suppose I could
> > do, but seems like it would be more confusing than would the
> > separate code. I will play with this a bit and see if I can eliminate
> > the duplication.
>
> If you follow the advice above then you won't need to add a gratuitous
> preempt_disable(). Try it and see how it comes out; the idea is that
> you can use the same code for testing sp->per_cpu_ref regardless of
> whether the mutex_trylock() or the call to alloc_srcu_struct_percpu()
> succeeded.
Understood, finally -- but the two-element hardluckref now requires
greater preempt_disable() coverage.
> > > What happens if a prior reader failed to allocate the memory but this call
> > > succeeds? You need to check hardluckref before doing this. The same is
> > > true in srcu_read_lock().
> >
> > All accounted for by the fact that hardluckref is unconditionally
> > added in by srcu_readers_active(). Right?
>
> Yes, you're right.
>
> > Will spin a new patch...
>
> Good -- it's getting pretty messy to look at this one!
>
> By the way, I think the fastpath for synchronize_srcu() should be safe,
> now that you have added the memory barriers into srcu_read_lock() and
> srcu_read_unlock(). You might as well try putting it in.
>
> Although now that I look at it again, you have forgotten to put smp_mb()
> after the atomic_inc() call and before the atomic_dec(). In
> srcu_read_unlock() you could just move the existing smp_mb() back before
> the test of idx.
Good catch again -- added smp_mb__before_atomic_dec() and
smp_mb__after_atomic_inc(). The reason for avoiding moving the smp_mb()
is that atomic_dec() implies a memory barrier on some architectures,
such as x86. In these cases, smp_mb__before_atomic_dec() is a no-op.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> (was @us.ibm.com)
---
include/linux/srcu.h | 8 ---
kernel/srcu.c | 126 +++++++++++++++++++++++++++------------------------
2 files changed, 71 insertions(+), 63 deletions(-)
diff -urpNa -X dontdiff linux-2.6.19-rc5/include/linux/srcu.h linux-2.6.19-rc5-dsrcu/include/linux/srcu.h
--- linux-2.6.19-rc5/include/linux/srcu.h 2006-11-17 15:44:40.000000000 -0800
+++ linux-2.6.19-rc5-dsrcu/include/linux/srcu.h 2006-11-19 13:33:35.000000000 -0800
@@ -35,19 +35,15 @@ struct srcu_struct {
int completed;
struct srcu_struct_array *per_cpu_ref;
struct mutex mutex;
+ atomic_t hardluckref[2];
};
-#ifndef CONFIG_PREEMPT
-#define srcu_barrier() barrier()
-#else /* #ifndef CONFIG_PREEMPT */
-#define srcu_barrier()
-#endif /* #else #ifndef CONFIG_PREEMPT */
-
int init_srcu_struct(struct srcu_struct *sp);
void cleanup_srcu_struct(struct srcu_struct *sp);
int srcu_read_lock(struct srcu_struct *sp) __acquires(sp);
void srcu_read_unlock(struct srcu_struct *sp, int idx) __releases(sp);
void synchronize_srcu(struct srcu_struct *sp);
long srcu_batches_completed(struct srcu_struct *sp);
+int srcu_readers_active(struct srcu_struct *sp);
#endif
diff -urpNa -X dontdiff linux-2.6.19-rc5/kernel/srcu.c linux-2.6.19-rc5-dsrcu/kernel/srcu.c
--- linux-2.6.19-rc5/kernel/srcu.c 2006-11-17 15:44:40.000000000 -0800
+++ linux-2.6.19-rc5-dsrcu/kernel/srcu.c 2006-11-19 13:40:33.000000000 -0800
@@ -34,6 +34,18 @@
#include <linux/smp.h>
#include <linux/srcu.h>
+/*
+ * Initialize the per-CPU array, returning the pointer.
+ */
+static inline struct srcu_struct_array *alloc_srcu_struct_percpu(void)
+{
+ struct srcu_struct_array *sap;
+
+ sap = alloc_percpu(struct srcu_struct_array);
+ smp_wmb();
+ return (sap);
+}
+
/**
* init_srcu_struct - initialize a sleep-RCU structure
* @sp: structure to initialize.
@@ -46,7 +58,9 @@ int init_srcu_struct(struct srcu_struct
{
sp->completed = 0;
mutex_init(&sp->mutex);
- sp->per_cpu_ref = alloc_percpu(struct srcu_struct_array);
+ sp->per_cpu_ref = alloc_srcu_struct_percpu();
+ atomic_set(&sp->hardluckref[0], 0);
+ atomic_set(&sp->hardluckref[1], 0);
return (sp->per_cpu_ref ? 0 : -ENOMEM);
}
@@ -58,12 +72,15 @@ int init_srcu_struct(struct srcu_struct
static int srcu_readers_active_idx(struct srcu_struct *sp, int idx)
{
int cpu;
+ struct srcu_struct_array *sap;
int sum;
sum = 0;
- for_each_possible_cpu(cpu)
- sum += per_cpu_ptr(sp->per_cpu_ref, cpu)->c[idx];
- return sum;
+ sap = rcu_dereference(sp->per_cpu_ref);
+ if (likely(sap != NULL))
+ for_each_possible_cpu(cpu)
+ sum += per_cpu_ptr(sap, cpu)->c[idx];
+ return sum + atomic_read(&sp->hardluckref[idx]);
}
/**
@@ -94,7 +111,8 @@ void cleanup_srcu_struct(struct srcu_str
WARN_ON(sum); /* Leakage unless caller handles error. */
if (sum != 0)
return;
- free_percpu(sp->per_cpu_ref);
+ if (sp->per_cpu_ref != NULL)
+ free_percpu(sp->per_cpu_ref);
sp->per_cpu_ref = NULL;
}
@@ -105,18 +123,41 @@ void cleanup_srcu_struct(struct srcu_str
* Counts the new reader in the appropriate per-CPU element of the
* srcu_struct. Must be called from process context.
* Returns an index that must be passed to the matching srcu_read_unlock().
+ * The index is mapped to negative numbers if the srcu_struct is not and
+ * cannot be initialized.
*/
int srcu_read_lock(struct srcu_struct *sp)
{
int idx;
+ struct srcu_struct_array *sap;
preempt_disable();
idx = sp->completed & 0x1;
barrier(); /* ensure compiler looks -once- at sp->completed. */
- per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]++;
- srcu_barrier(); /* ensure compiler won't misorder critical section. */
+ sap = rcu_dereference(sp->per_cpu_ref);
+ if (likely(sap != NULL)) {
+ per_cpu_ptr(sap, smp_processor_id())->c[idx]++;
+ smp_mb();
+ preempt_enable();
+ return idx;
+ }
+ if (mutex_trylock(&sp->mutex)) {
+ preempt_enable();
+ if (sp->per_cpu_ref == NULL)
+ sp->per_cpu_ref = alloc_srcu_struct_percpu();
+ if (sp->per_cpu_ref == NULL) {
+ mutex_unlock(&sp->mutex);
+ preempt_disable();
+ idx = sp->completed & 0x1;
+ } else {
+ mutex_unlock(&sp->mutex);
+ return srcu_read_lock(sp);
+ }
+ }
+ atomic_inc(&sp->hardluckref[idx]);
+ smp_mb__after_atomic_inc();
preempt_enable();
- return idx;
+ return -1 - idx;
}
/**
@@ -131,10 +172,16 @@ int srcu_read_lock(struct srcu_struct *s
*/
void srcu_read_unlock(struct srcu_struct *sp, int idx)
{
- preempt_disable();
- srcu_barrier(); /* ensure compiler won't misorder critical section. */
- per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]--;
- preempt_enable();
+ if (likely(idx <= 0)) {
+ preempt_disable();
+ smp_mb();
+ per_cpu_ptr(rcu_dereference(sp->per_cpu_ref),
+ smp_processor_id())->c[idx]--;
+ preempt_enable();
+ return;
+ }
+ smp_mb__before_atomic_dec();
+ atomic_dec(&sp->hardluckref[-1 - idx]);
}
/**
@@ -158,6 +205,11 @@ void synchronize_srcu(struct srcu_struct
idx = sp->completed;
mutex_lock(&sp->mutex);
+ /* Initialize if not already initialized. */
+
+ if (sp->per_cpu_ref == NULL)
+ sp->per_cpu_ref = alloc_srcu_struct_percpu();
+
/*
* Check to see if someone else did the work for us while we were
* waiting to acquire the lock. We need -two- advances of
@@ -173,65 +225,25 @@ void synchronize_srcu(struct srcu_struct
return;
}
- synchronize_sched(); /* Force memory barrier on all CPUs. */
-
- /*
- * The preceding synchronize_sched() ensures that any CPU that
- * sees the new value of sp->completed will also see any preceding
- * changes to data structures made by this CPU. This prevents
- * some other CPU from reordering the accesses in its SRCU
- * read-side critical section to precede the corresponding
- * srcu_read_lock() -- ensuring that such references will in
- * fact be protected.
- *
- * So it is now safe to do the flip.
- */
-
+ smp_mb(); /* ensure srcu_read_lock() sees prior change first! */
idx = sp->completed & 0x1;
sp->completed++;
- synchronize_sched(); /* Force memory barrier on all CPUs. */
+ synchronize_sched();
/*
* At this point, because of the preceding synchronize_sched(),
* all srcu_read_lock() calls using the old counters have completed.
* Their corresponding critical sections might well be still
* executing, but the srcu_read_lock() primitives themselves
- * will have finished executing.
+ * will have finished executing. The "old" rank of counters
+ * can therefore only decrease, never increase in value.
*/
while (srcu_readers_active_idx(sp, idx))
schedule_timeout_interruptible(1);
- synchronize_sched(); /* Force memory barrier on all CPUs. */
-
- /*
- * The preceding synchronize_sched() forces all srcu_read_unlock()
- * primitives that were executing concurrently with the preceding
- * for_each_possible_cpu() loop to have completed by this point.
- * More importantly, it also forces the corresponding SRCU read-side
- * critical sections to have also completed, and the corresponding
- * references to SRCU-protected data items to be dropped.
- *
- * Note:
- *
- * Despite what you might think at first glance, the
- * preceding synchronize_sched() -must- be within the
- * critical section ended by the following mutex_unlock().
- * Otherwise, a task taking the early exit can race
- * with a srcu_read_unlock(), which might have executed
- * just before the preceding srcu_readers_active() check,
- * and whose CPU might have reordered the srcu_read_unlock()
- * with the preceding critical section. In this case, there
- * is nothing preventing the synchronize_sched() task that is
- * taking the early exit from freeing a data structure that
- * is still being referenced (out of order) by the task
- * doing the srcu_read_unlock().
- *
- * Alternatively, the comparison with "2" on the early exit
- * could be changed to "3", but this increases synchronize_srcu()
- * latency for bulk loads. So the current code is preferred.
- */
+ smp_mb(); /* must see critical section prior to srcu_read_unlock() */
mutex_unlock(&sp->mutex);
}
next prev parent reply other threads:[~2006-11-19 21:46 UTC|newest]
Thread overview: 108+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-11-16 20:00 BUG: cpufreq notification broken Thomas Gleixner
2006-11-16 20:15 ` [patch] cpufreq: mark cpufreq_tsc() as core_initcall_sync Ingo Molnar
2006-11-16 20:45 ` Thomas Gleixner
2006-11-16 21:47 ` Linus Torvalds
2006-11-16 22:03 ` Alan Stern
2006-11-16 22:21 ` Linus Torvalds
2006-11-17 3:06 ` Alan Stern
2006-11-17 6:51 ` Paul E. McKenney
2006-11-17 9:29 ` Jens Axboe
2006-11-17 18:39 ` Oleg Nesterov
2006-11-18 0:28 ` Paul E. McKenney
2006-11-18 16:15 ` Alan Stern
2006-11-18 17:14 ` Paul E. McKenney
2006-11-18 19:34 ` Oleg Nesterov
2006-11-19 21:26 ` Paul E. McKenney
2006-11-18 21:00 ` Alan Stern
2006-11-18 21:25 ` Oleg Nesterov
2006-11-18 22:13 ` Alan Stern
2006-11-18 22:46 ` Oleg Nesterov
2006-11-19 20:12 ` Alan Stern
2006-11-19 21:43 ` Paul E. McKenney [this message]
2006-11-20 17:19 ` Alan Stern
2006-11-20 17:58 ` Jens Axboe
2006-11-20 19:39 ` Alan Stern
2006-11-20 20:13 ` Jens Axboe
2006-11-20 21:39 ` Alan Stern
2006-11-21 7:39 ` Jens Axboe
2006-11-20 18:57 ` Oleg Nesterov
2006-11-20 20:01 ` Alan Stern
2006-11-20 20:51 ` Paul E. McKenney
2006-11-21 20:04 ` Oleg Nesterov
2006-11-21 20:54 ` Alan Stern
2006-11-21 22:07 ` Paul E. McKenney
2006-11-20 20:38 ` Paul E. McKenney
2006-11-21 16:44 ` Oleg Nesterov
2006-11-21 19:56 ` Paul E. McKenney
2006-11-21 20:26 ` Alan Stern
2006-11-21 23:03 ` Paul E. McKenney
2006-11-22 2:17 ` Alan Stern
2006-11-22 17:01 ` Paul E. McKenney
2006-11-26 22:25 ` Oleg Nesterov
2006-11-27 21:10 ` Alan Stern
2006-11-28 1:47 ` Paul E. McKenney
2006-11-20 19:17 ` Paul E. McKenney
2006-11-20 20:22 ` Alan Stern
2006-11-21 17:55 ` Paul E. McKenney
2006-11-21 17:56 ` Alan Stern
2006-11-21 19:13 ` Paul E. McKenney
2006-11-21 20:40 ` Alan Stern
2006-11-22 18:08 ` Paul E. McKenney
2006-11-21 21:01 ` Oleg Nesterov
2006-11-22 0:51 ` Paul E. McKenney
2006-11-18 18:46 ` Oleg Nesterov
2006-11-19 21:07 ` Paul E. McKenney
2006-11-20 7:15 ` Jens Axboe
2006-11-20 16:59 ` Paul E. McKenney
2006-11-20 17:55 ` Jens Axboe
2006-11-20 20:09 ` Paul E. McKenney
2006-11-20 20:21 ` Jens Axboe
2006-11-18 19:02 ` Oleg Nesterov
2006-11-19 21:23 ` Paul E. McKenney
2006-11-17 19:15 ` Paul E. McKenney
2006-11-17 19:27 ` Alan Stern
2006-11-18 0:38 ` Paul E. McKenney
2006-11-18 4:33 ` Alan Stern
2006-11-18 4:51 ` Andrew Morton
2006-11-18 5:57 ` Paul E. McKenney
2006-11-19 19:00 ` Oleg Nesterov
2006-11-19 20:21 ` Alan Stern
2006-11-19 20:55 ` Oleg Nesterov
2006-11-19 21:09 ` Alan Stern
2006-11-19 21:17 ` Oleg Nesterov
2006-11-19 21:54 ` Paul E. McKenney
2006-11-19 22:28 ` Oleg Nesterov
2006-11-20 5:47 ` Paul E. McKenney
2006-11-19 21:20 ` Paul E. McKenney
2006-11-19 21:50 ` Oleg Nesterov
2006-11-19 22:04 ` Paul E. McKenney
2006-11-23 14:59 ` Oleg Nesterov
2006-11-23 20:40 ` Paul E. McKenney
2006-11-23 21:34 ` Oleg Nesterov
2006-11-23 21:49 ` Oleg Nesterov
2006-11-27 4:33 ` Paul E. McKenney
2006-11-24 18:21 ` Oleg Nesterov
2006-11-24 20:04 ` Jens Axboe
2006-11-24 20:47 ` Alan Stern
2006-11-24 21:13 ` Oleg Nesterov
2006-11-25 3:24 ` Alan Stern
2006-11-25 17:14 ` Oleg Nesterov
2006-11-25 22:06 ` Alan Stern
2006-11-26 21:34 ` Oleg Nesterov
2006-11-27 5:02 ` Paul E. McKenney
2006-11-27 16:11 ` Oleg Nesterov
2006-11-27 16:56 ` Oleg Nesterov
2006-11-29 19:29 ` Paul E. McKenney
2006-11-29 20:16 ` Oleg Nesterov
2006-11-29 23:08 ` Paul E. McKenney
2006-11-30 0:01 ` Oleg Nesterov
2006-11-17 2:33 ` Paul E. McKenney
2006-11-16 20:52 ` Peter Zijlstra
2006-11-16 21:20 ` Andrew Morton
2006-11-16 21:27 ` Thomas Gleixner
2006-11-20 19:57 ` Linus Torvalds
2006-11-16 20:27 ` BUG: cpufreq notification broken Alan Stern
2006-11-16 21:09 ` Thomas Gleixner
2006-11-16 21:26 ` Alan Stern
2006-11-16 21:36 ` Thomas Gleixner
2006-11-16 21:56 ` Alan Stern
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20061119214315.GI4427@us.ibm.com \
--to=paulmck@us.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=oleg@tv-sign.ru \
--cc=paulmck@linux.vnet.ibm.com \
--cc=stern@rowland.harvard.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.