linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Shaohua Li <shaohua.li@intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"cl@linux.com" <cl@linux.com>,
	"npiggin@kernel.dk" <npiggin@kernel.dk>
Subject: Re: [patch V3] percpu_counter: scalability works
Date: Tue, 17 May 2011 14:45:28 +0200	[thread overview]
Message-ID: <20110517124528.GN20624@htj.dyndns.org> (raw)
In-Reply-To: <1305634807.2850.89.camel@edumazet-laptop>

Hello, Eric.

On Tue, May 17, 2011 at 02:20:07PM +0200, Eric Dumazet wrote:
> Spikes are expected and have no effect by design.
> 
> batch value is chosen so that granularity of the percpu_counter
> (batch*num_online_cpus()) is the spike factor, and thats pretty
> difficult when number of cpus is high.
> 
> In Shaohua workload, 'amount' for a 128Mbyte mapping is 32768, while the
> batch value is 48. 48*24 = 1152.
> So the percpu s32 being in [-47 .. 47] range would not change the
> accuracy of the _sum() function [ if it was eventually called, but its
> not ]
> 
> No drift in the counter is the only thing we care - and _read() being
> not too far away from the _sum() value, in particular if the
> percpu_counter is used to check a limit that happens to be low (against
> granularity of the percpu_counter : batch*num_online_cpus()).
> 
> I claim extra care is not needed. This might give the false impression
> to reader/user that percpu_counter object can replace a plain
> atomic64_t.

We already had this discussion.  Sure, we can argue about it again all
day but I just don't think it's a necessary compromise and really
makes _sum() quite dubious.  It's not about strict correctness, it
can't be, but if I spent the overhead to walk all the different percpu
counters, I'd like to have a rather exact number if there's nothing
much going on (freeblock count, for example).  Also, I want to be able
to use large @batch if the situation allows for it without worrying
about _sum() accuracy.

Given that _sum() is super-slow path and we have a lot of latitude
there, this should be possible without resorting to heavy handed
approach like lglock.  I was hoping that someone would come up with a
better solution, which didn't seem to have happened.  Maybe I was
wrong, I don't know.  I'll give it a shot.

But, anyways, here's my position regarding the issue.

* If we're gonna just fix up the slow path, I don't want to make
  _sum() less useful by making its accuracy dependent upon @batch.

* If somebody is interested, it would be worthwhile to see whether we
  can integrate vmstat and percpu counters so that its deviation is
  automatically regulated and we don't have to think about all this
  anymore.

I'll see if I can come up with something.

Thank you.

-- 
tejun

  reply	other threads:[~2011-05-17 12:45 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-05-11  8:10 [patch v2 0/5] percpu_counter: bug fix and enhancement Shaohua Li
2011-05-11  8:10 ` [patch v2 1/5] percpu_counter: fix code for 32bit systems for UP Shaohua Li
2011-05-11  8:10 ` [patch v2 2/5] lglock: convert it to work with dynamically allocated structure Shaohua Li
2011-05-11  8:10 ` [patch v2 3/5] percpu_counter: use lglock to protect percpu data Shaohua Li
2011-05-11  8:10 ` [patch v2 4/5] percpu_counter: use atomic64 for counter in SMP Shaohua Li
2011-05-11  9:34   ` Andrew Morton
2011-05-12  2:40     ` Shaohua Li
2011-05-11  8:10 ` [patch v2 5/5] percpu_counter: preemptless __per_cpu_counter_add Shaohua Li
2011-05-11  9:28 ` [patch v2 0/5] percpu_counter: bug fix and enhancement Tejun Heo
2011-05-12  2:48   ` Shaohua Li
2011-05-12  8:21     ` Tejun Heo
2011-05-12  8:55       ` Shaohua Li
2011-05-12  8:59         ` Tejun Heo
2011-05-12  9:02           ` Eric Dumazet
2011-05-12  9:03             ` Eric Dumazet
2011-05-12  9:05             ` Tejun Heo
2011-05-13  3:09               ` Shaohua Li
2011-05-13  4:37               ` Shaohua Li
2011-05-13  5:20                 ` Eric Dumazet
2011-05-13  5:28                   ` Shaohua Li
2011-05-13  6:34                     ` Eric Dumazet
2011-05-13  7:33                       ` Shaohua Li
2011-05-13 14:51                       ` [patch] percpu_counter: scalability works Eric Dumazet
2011-05-13 15:39                         ` Eric Dumazet
2011-05-13 16:35                           ` [patch V2] " Eric Dumazet
2011-05-13 16:46                             ` Eric Dumazet
2011-05-13 22:03                               ` [patch V3] " Eric Dumazet
2011-05-16  0:58                                 ` Shaohua Li
2011-05-16  6:11                                   ` Eric Dumazet
2011-05-16  6:37                                     ` Shaohua Li
2011-05-16  6:55                                       ` Eric Dumazet
2011-05-16  7:15                                         ` Shaohua Li
2011-05-16  7:44                                           ` Eric Dumazet
2011-05-16  8:34                                             ` Shaohua Li
2011-05-16  9:35                                               ` Eric Dumazet
2011-05-16 14:22                                                 ` Eric Dumazet
2011-05-17  0:55                                                   ` Shaohua Li
2011-05-17  4:56                                                     ` Eric Dumazet
2011-05-17  5:22                                                       ` Shaohua Li
2011-05-17  9:01                                                         ` Eric Dumazet
2011-05-17  9:11                                                           ` Tejun Heo
2011-05-17  9:45                                                             ` Eric Dumazet
2011-05-17  9:50                                                               ` Tejun Heo
2011-05-17 12:20                                                                 ` Eric Dumazet
2011-05-17 12:45                                                                   ` Tejun Heo [this message]
2011-05-17 13:00                                                                     ` Eric Dumazet
2011-05-17 13:04                                                                       ` Tejun Heo
2011-05-17 13:55                                                                         ` Christoph Lameter
2011-05-17 14:02                                                                           ` Tejun Heo
2011-05-17 14:38                                                                             ` Christoph Lameter
2011-05-18  1:00                                                                 ` Shaohua Li
2011-05-12 14:38   ` [patch v2 0/5] percpu_counter: bug fix and enhancement Christoph Lameter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110517124528.GN20624@htj.dyndns.org \
    --to=tj@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux.com \
    --cc=eric.dumazet@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=npiggin@kernel.dk \
    --cc=shaohua.li@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).