From: Andrew Morton <akpm@linux-foundation.org>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Nick Piggin <npiggin@kernel.dk>,
Dave Chinner <david@fromorbit.com>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 16/17] fs: Convert nr_inodes to a per-cpu counter
Date: Sat, 16 Oct 2010 02:07:44 -0700 [thread overview]
Message-ID: <20101016020744.366bd9c6.akpm@linux-foundation.org> (raw)
In-Reply-To: <1287217748.2799.68.camel@edumazet-laptop>
On Sat, 16 Oct 2010 10:29:08 +0200 Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le samedi 16 octobre 2010 __ 18:55 +1100, Nick Piggin a __crit :
> > On Thu, Sep 30, 2010 at 04:10:39PM +1000, Dave Chinner wrote:
> > > On Wed, Sep 29, 2010 at 09:53:22PM -0700, Andrew Morton wrote:
> > > > On Wed, 29 Sep 2010 22:18:48 +1000 Dave Chinner <david@fromorbit.com> wrote:
> > > >
> > > > > From: Eric Dumazet <dada1@cosmosbay.com>
> > > > >
> > > > > The number of inodes allocated does not need to be tied to the
> > > > > addition or removal of an inode to/from a list. If we are not tied
> > > > > to a list lock, we could update the counters when inodes are
> > > > > initialised or destroyed, but to do that we need to convert the
> > > > > counters to be per-cpu (i.e. independent of a lock). This means that
> > > > > we have the freedom to change the list/locking implementation
> > > > > without needing to care about the counters.
> > > > >
> > > > >
> > > > > ...
> > > > >
> > > > > +int get_nr_inodes(void)
> > > > > +{
> > > > > + int i;
> > > > > + int sum = 0;
> > > > > + for_each_possible_cpu(i)
> > > > > + sum += per_cpu(nr_inodes, i);
> > > > > + return sum < 0 ? 0 : sum;
> > > > > +}
> > > >
> > > > This reimplements percpu_counter_sum_positive(), rather poorly
> >
> > Why is it poorly?
>
> Nick
>
> Some people believe percpu_counter object is the right answer to such
> distributed counters, because the loop is done on 'online' cpus instead
> of 'possible' cpus. "It must be better if number of possible cpus is
> 4096 and only one or two cpus are online"...
>
> But if we do this loop only on rare events, like
> "cat /proc/sys/fs/inode-nr", then the percpu_counter() is more
> expensive, because percpu_add() _is_ more expensive :
>
> - Its a function call and lot of instructions/cycles per call, while
> this_cpu_inc(nr_inodes) is a single instruction, using no register on
> x86.
You want an inlined percpu_counter_inc() then write one! Bonus points
for writing this_cpu_add_return() and doing it without a
preempt_disable(). It collapses to just a few instructions.
It's extremely poor form to say "oh X sucks so I'm going to implement
my own" without first addressing why X allegedly sucks.
> - Its possibly accessing a shared spinlock and counter when the percpu
> counter reaches the batch limit.
That's in the noise foor.
> To recap : nr_inodes is not a counter that needs to be estimated in real
> time, since we have not limit on number of inodes in the machine (limit
> is the memory allocator).
>
> Unless someone can prove "cat /proc/sys/fs/inode-nr" must be performed
> thousand of times per second on their setup, the choice I made to scale
> nr_inodes is better over the 'obvious percpu_counter choice'
>
> This choice was made to scale some counters in network stack some years
> ago, and this rocks.
And we get open-coded reimplementations of the same damn thing all over
the tree.
next prev parent reply other threads:[~2010-10-16 9:07 UTC|newest]
Thread overview: 111+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-09-29 12:18 [PATCH 0/17] fs: Inode cache scalability Dave Chinner
2010-09-29 12:18 ` [PATCH 01/17] kernel: add bl_list Dave Chinner
2010-09-30 4:52 ` Andrew Morton
2010-10-16 7:55 ` Nick Piggin
2010-10-16 16:28 ` Christoph Hellwig
2010-10-01 5:48 ` Christoph Hellwig
2010-09-29 12:18 ` [PATCH 02/17] fs: icache lock s_inodes list Dave Chinner
2010-10-01 5:49 ` Christoph Hellwig
2010-10-16 7:54 ` Nick Piggin
2010-10-16 16:12 ` Christoph Hellwig
2010-10-16 17:09 ` Nick Piggin
2010-10-17 0:42 ` Christoph Hellwig
2010-10-17 2:03 ` Nick Piggin
2010-09-29 12:18 ` [PATCH 03/17] fs: icache lock inode hash Dave Chinner
2010-09-30 4:52 ` Andrew Morton
2010-09-30 6:13 ` Dave Chinner
2010-10-01 6:06 ` Christoph Hellwig
2010-10-16 7:57 ` Nick Piggin
2010-09-29 12:18 ` [PATCH 04/17] fs: icache lock i_state Dave Chinner
2010-10-01 5:54 ` Christoph Hellwig
2010-10-16 7:54 ` Nick Piggin
2010-09-29 12:18 ` [PATCH 05/17] fs: icache lock i_count Dave Chinner
2010-09-30 4:52 ` Andrew Morton
2010-10-01 5:55 ` Christoph Hellwig
2010-10-01 6:04 ` Andrew Morton
2010-10-01 6:16 ` Christoph Hellwig
2010-10-01 6:23 ` Andrew Morton
2010-09-29 12:18 ` [PATCH 06/17] fs: icache lock lru/writeback lists Dave Chinner
2010-09-30 4:52 ` Andrew Morton
2010-09-30 6:16 ` Dave Chinner
2010-10-16 7:55 ` Nick Piggin
2010-10-01 6:01 ` Christoph Hellwig
2010-10-05 22:30 ` Dave Chinner
2010-09-29 12:18 ` [PATCH 07/17] fs: icache atomic inodes_stat Dave Chinner
2010-09-30 4:52 ` Andrew Morton
2010-09-30 6:20 ` Dave Chinner
2010-09-30 6:37 ` Andrew Morton
2010-10-16 7:56 ` Nick Piggin
2010-09-29 12:18 ` [PATCH 08/17] fs: icache protect inode state Dave Chinner
2010-10-01 6:02 ` Christoph Hellwig
2010-10-16 7:54 ` Nick Piggin
2010-09-29 12:18 ` [PATCH 09/17] fs: Make last_ino, iunique independent of inode_lock Dave Chinner
2010-09-30 4:53 ` Andrew Morton
2010-10-01 6:08 ` Christoph Hellwig
2010-10-16 7:54 ` Nick Piggin
2010-09-29 12:18 ` [PATCH 10/17] fs: icache remove inode_lock Dave Chinner
2010-09-29 12:18 ` [PATCH 11/17] fs: Factor inode hash operations into functions Dave Chinner
2010-10-01 6:06 ` Christoph Hellwig
2010-10-16 7:54 ` Nick Piggin
2010-09-29 12:18 ` [PATCH 12/17] fs: Introduce per-bucket inode hash locks Dave Chinner
2010-09-30 1:52 ` Christoph Hellwig
2010-09-30 2:43 ` Dave Chinner
2010-10-16 7:55 ` Nick Piggin
2010-09-29 12:18 ` [PATCH 13/17] fs: Implement lazy LRU updates for inodes Dave Chinner
2010-09-30 2:05 ` Christoph Hellwig
2010-10-16 7:54 ` Nick Piggin
2010-09-29 12:18 ` [PATCH 14/17] fs: Inode counters do not need to be atomic Dave Chinner
2010-09-29 12:18 ` [PATCH 15/17] fs: inode per-cpu last_ino allocator Dave Chinner
2010-09-30 2:07 ` Christoph Hellwig
2010-10-06 6:29 ` Dave Chinner
2010-10-06 8:51 ` Christoph Hellwig
2010-09-30 4:53 ` Andrew Morton
2010-09-30 5:36 ` Eric Dumazet
2010-09-30 7:53 ` Eric Dumazet
2010-09-30 8:14 ` Andrew Morton
2010-09-30 10:22 ` [PATCH] " Eric Dumazet
2010-09-30 16:45 ` Andrew Morton
2010-09-30 17:28 ` Eric Dumazet
2010-09-30 17:39 ` Andrew Morton
2010-09-30 18:05 ` Eric Dumazet
2010-10-01 6:12 ` Christoph Hellwig
2010-10-01 6:45 ` Eric Dumazet
2010-10-16 6:36 ` Nick Piggin
2010-10-16 6:40 ` Nick Piggin
2010-09-29 12:18 ` [PATCH 16/17] fs: Convert nr_inodes to a per-cpu counter Dave Chinner
2010-09-30 2:12 ` Christoph Hellwig
2010-09-30 4:53 ` Andrew Morton
2010-09-30 6:10 ` Dave Chinner
2010-10-16 7:55 ` Nick Piggin
2010-10-16 8:29 ` Eric Dumazet
2010-10-16 9:07 ` Andrew Morton [this message]
2010-10-16 9:31 ` Eric Dumazet
2010-10-16 14:19 ` [PATCH] percpu_counter : add percpu_counter_add_fast() Eric Dumazet
2010-10-18 15:24 ` Christoph Lameter
2010-10-18 15:39 ` Eric Dumazet
2010-10-18 16:12 ` Christoph Lameter
2010-10-21 22:37 ` Andrew Morton
2010-10-21 23:10 ` Christoph Lameter
2010-10-22 0:45 ` Andrew Morton
2010-10-22 1:55 ` Andrew Morton
2010-10-22 1:58 ` Nick Piggin
2010-10-22 2:14 ` Andrew Morton
2010-10-22 4:12 ` Eric Dumazet
2010-10-21 22:43 ` Andrew Morton
2010-10-21 22:58 ` Eric Dumazet
2010-10-21 23:18 ` Andrew Morton
2010-10-21 23:22 ` Eric Dumazet
2010-10-21 22:31 ` [PATCH 16/17] fs: Convert nr_inodes to a per-cpu counter Andrew Morton
2010-10-21 22:58 ` Eric Dumazet
2010-10-02 16:02 ` Christoph Hellwig
2010-09-29 12:18 ` [PATCH 17/17] fs: Clean up inode reference counting Dave Chinner
2010-09-30 2:15 ` Christoph Hellwig
2010-10-16 7:55 ` Nick Piggin
2010-10-16 16:14 ` Christoph Hellwig
2010-10-16 17:09 ` Nick Piggin
2010-09-30 4:53 ` Andrew Morton
2010-09-29 23:57 ` [PATCH 0/17] fs: Inode cache scalability Christoph Hellwig
2010-09-30 0:24 ` Dave Chinner
2010-09-30 2:21 ` Christoph Hellwig
2010-10-02 23:10 ` Carlos Carvalho
2010-10-04 7:22 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20101016020744.366bd9c6.akpm@linux-foundation.org \
--to=akpm@linux-foundation.org \
--cc=david@fromorbit.com \
--cc=eric.dumazet@gmail.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=npiggin@kernel.dk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).