From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Eric Dumazet <dada1@cosmosbay.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Rusty Russell <rusty@rustcorp.com.au>,
Theodore Tso <tytso@mit.edu>,
linux kernel <linux-kernel@vger.kernel.org>,
"David S. Miller" <davem@davemloft.net>,
Mingming Cao <cmm@us.ibm.com>,
linux-ext4@vger.kernel.org,
Christoph Lameter <cl@linux-foundation.org>
Subject: Re: [PATCH] percpu_counter: use local_t and atomic_long_t if possible
Date: Fri, 12 Dec 2008 12:29:37 +0100 [thread overview]
Message-ID: <1229081377.12883.134.camel@twins> (raw)
In-Reply-To: <49424637.3010107@cosmosbay.com>
On Fri, 2008-12-12 at 12:08 +0100, Eric Dumazet wrote:
> After discussions on percpu_counter subject, I cooked following patch
>
> My goals were :
>
> - IRQ safe percpu_counter (needed for net-next-2.6)
> z- 64bit platforms can avoid spin_lock and reduce size of percpu_counter
> - No increase of API
>
> Final result, on x86_64, __percpu_counter_add() is really fast and irq safe :
> Changes are :
>
> We use local_t instead of s32 for the local storage (for each cpu)
do enough arches have a sane enough local_t implementation so this
doesn't make things worse for them?
> We use atomic_long_t instead of s64 on 64bit arches, to avoid spin_lock.
>
> On 32bit arches, we guard the shared s64 value with an irqsafe spin_lock.
> As this spin_lock is not taken in fast path, this should not make a real
> difference.
Cycles are cycles, and spin_lock_irqsave is more expensive than
spin_lock_irq is more expensive than spin_lock, but sure, it looks good.
I really like the code, however my worry is that we don't regress weird
archs too much.
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
> ---
> include/linux/percpu_counter.h | 38 +++++++++--
> lib/percpu_counter.c | 104 ++++++++++++++++++-------------
> 2 files changed, 94 insertions(+), 48 deletions(-)
>
> diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
> index 9007ccd..f5133ce 100644
> --- a/include/linux/percpu_counter.h
> +++ b/include/linux/percpu_counter.h
> @@ -12,16 +12,42 @@
> #include <linux/threads.h>
> #include <linux/percpu.h>
> #include <linux/types.h>
> +#include <asm/local.h>
>
> #ifdef CONFIG_SMP
>
> +#ifdef CONFIG_64BIT
> +struct s64_counter {
> + atomic_long_t val;
> +};
> +
> +static inline s64 s64c_read(struct s64_counter *c)
> +{
> + return atomic_long_read(&c->val);
> +}
> +#else
> +struct s64_counter {
> + spinlock_t lock;
> + s64 val;
> +};
> +
> +static inline s64 s64c_read(struct s64_counter *c)
> +{
> + /*
> + * Previous percpu_counter implementation used to
> + * read s64 without locking. Thats racy.
> + */
Does this comment have any value besides archelogical? but yeah, that
was a known issue, there were some seqlock patches floating around
trying to address this.
Here I'd suggest taking that lock and fixing that race.
> + return c->val;
> +}
> +
> +#endif
> diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
> index b255b93..6ef4a44 100644
> --- a/lib/percpu_counter.c
> +++ b/lib/percpu_counter.c
> @@ -14,35 +14,58 @@ static LIST_HEAD(percpu_counters);
> static DEFINE_MUTEX(percpu_counters_lock);
> #endif
>
> +#ifdef CONFIG_64BIT
> +static inline void s64c_add(s64 amount, struct s64_counter *c)
> +{
> + atomic_long_add(amount, &c->val);
> +}
> +
> +static inline void s64c_set(struct s64_counter *c, s64 amount)
> +{
> + atomic_long_set(&c->val, amount);
> +}
> +
> +#else
> +
> +static inline void s64c_add(s64 amount, struct s64_counter *c)
> +{
> + unsigned long flags;
> +
> + spin_lock_irqsave(&c->lock, flags);
> + c->val += amount;
> + spin_unlock_irqrestore(&c->lock, flags);
> +}
> +
> +static inline void s64c_set(struct s64_counter *c, s64 amount)
> +{
> + unsigned long flags;
> +
> + spin_lock_irqsave(&c->lock, flags);
> + c->val = amount;
> + spin_unlock_irqrestore(&c->lock, flags);
> +}
> +#endif /* CONFIG_64BIT */
Since they're inline's anyway, does it look better to stick them in the
header along with s64c_read() ?
> void percpu_counter_set(struct percpu_counter *fbc, s64 amount)
> {
> int cpu;
>
> + for_each_possible_cpu(cpu)
> + local_set(per_cpu_ptr(fbc->counters, cpu), 0);
> + s64c_set(&fbc->counter, amount);
> }
Did we document somewhere that this function is racy and only meant as
initialization?
> +void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, long batch)
> {
> + long count;
> + local_t *pcount;
> +
> + pcount = per_cpu_ptr(fbc->counters, get_cpu());
> + count = local_add_return(amount, pcount);
> + if (unlikely(count >= batch || count <= -batch)) {
> + local_sub(count, pcount);
> + s64c_add(count, &fbc->counter);
> }
> put_cpu();
> }
very neat.
> @@ -91,8 +111,13 @@ int percpu_counter_init_irq(struct percpu_counter *fbc, s64 amount)
> int err;
>
> err = percpu_counter_init(fbc, amount);
> - if (!err)
> - lockdep_set_class(&fbc->lock, &percpu_counter_irqsafe);
> +#ifndef CONFIG_64BIT
> + if (!err) {
> + static struct lock_class_key percpu_counter_irqsafe;
> +
> + lockdep_set_class(&fbc->counter.lock, &percpu_counter_irqsafe);
> + }
> +#endif
Since they're all irqsafe can this be removed?
> return err;
> }
next prev parent reply other threads:[~2008-12-12 11:29 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-12-03 18:40 [PATCH] percpu_counter: fix CPU unplug race in percpu_counter_destroy() Eric Dumazet
2008-12-03 20:24 ` [PATCH] percpu_counter: Fix __percpu_counter_sum() Eric Dumazet
2008-12-03 20:45 ` Peter Zijlstra
2008-12-04 6:14 ` David Miller
2008-12-07 4:22 ` Andrew Morton
2008-12-07 4:22 ` Andrew Morton
2008-12-07 10:25 ` Peter Zijlstra
2008-12-07 13:28 ` Eric Dumazet
2008-12-07 13:28 ` Eric Dumazet
2008-12-07 17:28 ` Andrew Morton
2008-12-07 18:00 ` Eric Dumazet
2008-12-07 18:00 ` Eric Dumazet
2008-12-08 4:52 ` Andrew Morton
2008-12-08 22:12 ` Theodore Tso
2008-12-08 22:20 ` Peter Zijlstra
2008-12-08 23:00 ` Theodore Tso
2008-12-08 23:05 ` Peter Zijlstra
2008-12-08 23:08 ` Peter Zijlstra
2008-12-09 8:12 ` Eric Dumazet
2008-12-09 8:12 ` Eric Dumazet
2008-12-09 8:34 ` Peter Zijlstra
2008-12-09 8:34 ` Peter Zijlstra
2008-12-10 5:09 ` Eric Dumazet
2008-12-10 5:49 ` Andrew Morton
2008-12-10 22:56 ` Eric Dumazet
2008-12-10 22:56 ` Eric Dumazet
2008-12-12 8:17 ` Rusty Russell
2008-12-12 8:22 ` Eric Dumazet
2008-12-12 8:22 ` Eric Dumazet
2008-12-12 11:08 ` [PATCH] percpu_counter: use local_t and atomic_long_t if possible Eric Dumazet
2008-12-12 11:08 ` Eric Dumazet
2008-12-12 11:29 ` Peter Zijlstra [this message]
2008-12-23 11:43 ` Peter Zijlstra
2008-12-25 13:26 ` Rusty Russell
2008-12-15 12:53 ` [PATCH] percpu_counter: Fix __percpu_counter_sum() Rusty Russell
2008-12-16 20:16 ` Ingo Molnar
2008-12-10 7:12 ` Peter Zijlstra
2008-12-08 23:07 ` Andrew Morton
2008-12-08 23:49 ` Theodore Tso
2008-12-08 22:22 ` Andrew Morton
2008-12-08 22:44 ` Mingming Cao
2008-12-08 22:44 ` Mingming Cao
2008-12-07 22:24 ` [PATCH] atomic: fix a typo in atomic_long_xchg() Eric Dumazet
2008-12-07 22:24 ` Eric Dumazet
2008-12-07 15:28 ` [PATCH] percpu_counter: Fix __percpu_counter_sum() Theodore Tso
2008-12-08 4:42 ` Andrew Morton
2008-12-08 17:55 ` Mingming Cao
2008-12-08 17:55 ` Mingming Cao
2008-12-11 16:32 ` [stable] " Greg KH
2008-12-08 17:44 ` Mingming Cao
2008-12-08 17:44 ` Mingming Cao
2008-12-04 6:13 ` [PATCH] percpu_counter: fix CPU unplug race in percpu_counter_destroy() David Miller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1229081377.12883.134.camel@twins \
--to=a.p.zijlstra@chello.nl \
--cc=akpm@linux-foundation.org \
--cc=cl@linux-foundation.org \
--cc=cmm@us.ibm.com \
--cc=dada1@cosmosbay.com \
--cc=davem@davemloft.net \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=rusty@rustcorp.com.au \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.