From: Andrew Morton <akpm@linux-foundation.org>
To: Christoph Lameter <cl@linux.com>,
Eric Dumazet <eric.dumazet@gmail.com>,
Nick Piggin <npiggin@kernel.dk>,
Dave Chinner <david@fromorbit.com>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] percpu_counter : add percpu_counter_add_fast()
Date: Thu, 21 Oct 2010 18:55:16 -0700
Message-ID: <20101021185516.be13a83f.akpm@linux-foundation.org>
In-Reply-To: <20101021174536.26213ab7.akpm@linux-foundation.org>

On Thu, 21 Oct 2010 17:45:36 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
> this_cpu_add_return() isn't really needed in this application.
>
> {
> 	this_cpu_add(*fbc->counters, amount);
> 	if (unlikely(abs(this_cpu_read(*fbc->counters)) > fbc->batch))
> 		out_of_line_stuff();
> }
>
> will work just fine.

Did that.  Got alarmed at a few things.

The compiler cannot CSE the above code - it has to reload the percpu
base each time.  Doing it by hand:

	{
		long *p;

		p = this_cpu_ptr(fbc->counters);
		*p += amount;
		if (unlikely(abs(*p) > fbc->batch))
			out_of_line_stuff();
	}

generates better code.
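
To experiment with the hand-cached form outside the kernel, here is a minimal userspace sketch. It is a model under stated assumptions, not kernel code: `struct model_counter`, `model_add()`, `model_fold()` and `model_sum()` are hypothetical names, the "CPU" is passed in explicitly rather than derived from the percpu base, and there is no locking or preemption control. It only illustrates the pattern of computing the per-cpu slot address once and then reusing it for both the add and the batch check.

```c
#include <stdlib.h>

#define MODEL_NR_CPUS 4			/* hypothetical CPU count for the model */

struct model_counter {
	long count;			/* global value, updated only on a fold */
	long batch;			/* fold threshold */
	long percpu[MODEL_NR_CPUS];	/* stand-in for the real percpu area */
};

/* Stand-in for out_of_line_stuff(): fold the local delta into the global count. */
static void model_fold(struct model_counter *c, long *p)
{
	c->count += *p;
	*p = 0;
}

/* 'cpu' is an explicit argument here; the kernel gets it from the percpu base. */
static void model_add(struct model_counter *c, int cpu, long amount)
{
	long *p = &c->percpu[cpu];	/* slot address computed once */

	*p += amount;			/* reused for the add ... */
	if (labs(*p) >= c->batch)	/* ... and for the batch check */
		model_fold(c, p);
}

/* Exact value: the global count plus every not-yet-folded local delta. */
static long model_sum(const struct model_counter *c)
{
	long sum = c->count;
	int i;

	for (i = 0; i < MODEL_NR_CPUS; i++)
		sum += c->percpu[i];
	return sum;
}
```

With `batch` set to 4, three calls of `model_add(&c, 0, 1)` leave `c.count` untouched; the fourth reaches the threshold and folds the local delta into `c.count`.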

So this:

static __always_inline void percpu_counter_add_batch(struct percpu_counter *fbc,
						s64 amount, long batch)
{
	long *pcounter;

	preempt_disable();
	pcounter = this_cpu_ptr(fbc->counters);
	*pcounter += amount;
	if (unlikely(abs(*pcounter) >= batch))
		percpu_counter_handle_overflow(fbc);
	preempt_enable();
}

when compiling this:

--- a/lib/proportions.c~b
+++ a/lib/proportions.c
@@ -263,6 +263,11 @@ void __prop_inc_percpu(struct prop_descr
 	prop_put_global(pd, pg);
 }
 
+void foo(struct prop_local_percpu *pl)
+{
+	percpu_counter_add(&pl->events, 1);
+}
+
 /*
  * identical to __prop_inc_percpu, except that it limits this pl's fraction to
  * @frac/PROP_FRAC_BASE by ignoring events when this limit has been exceeded.

comes down to

	.globl foo
	.type	foo, @function
foo:
	pushq	%rbp	#
	movslq	percpu_counter_batch(%rip),%rcx	# percpu_counter_batch, batch
	movq	96(%rdi), %rdx	# <variable>.counters, tcp_ptr__
	movq	%rsp, %rbp	#,
#APP
	add %gs:this_cpu_off, %rdx	# this_cpu_off, tcp_ptr__
#NO_APP
	movq	(%rdx), %rax	#* tcp_ptr__, D.11817
	incq	%rax	# D.11817
	movq	%rax, (%rdx)	# D.11817,* tcp_ptr__
	cqto
	xorq	%rdx, %rax	# tmp67, D.11817
	subq	%rdx, %rax	# tmp67, D.11817
	cmpq	%rcx, %rax	# batch, D.11817
	jl	.L33	#,
	call	percpu_counter_handle_overflow	#
.L33:
	leave
	ret
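
For readers less used to reading this output: the cqto/xorq/subq run in the middle of the listing is the branch-free abs(). cqto sign-extends %rax into %rdx, so %rdx is all-ones when the value is negative and zero otherwise; xor-then-subtract with that mask negates negative values and leaves non-negative ones alone. A rough C equivalent, with the hypothetical name `abs_via_mask()` and relying on the gcc behaviour that right-shifting a negative signed value sign-extends:

```c
#include <stdint.h>

/*
 * Branch-free absolute value with the same shape as cqto/xorq/subq.
 * The arithmetic is done in unsigned types so that INT64_MIN wraps
 * instead of overflowing a signed type.
 */
static int64_t abs_via_mask(int64_t x)
{
	/* All-ones if x is negative, zero otherwise (assumes sign-extending >>). */
	uint64_t mask = (uint64_t)(x >> 63);

	/* (x ^ mask) - mask: identity for mask == 0, two's-complement negate for mask == ~0. */
	return (int64_t)(((uint64_t)x ^ mask) - mask);
}
```

As with the instruction sequence itself, the most negative value has no positive counterpart and comes back unchanged.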

But what's really alarming is that the compiler (gcc 4.0.2) is cheerily
ignoring the inline directives and generating out-of-line versions
of most of the percpu_counter.h functions into lib/proportions.s.
That's rather a worry.

lib/proportions.o got rather larger as a result of inlining things and
it's not obvious that it's all a net benefit.