public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Waiman Long <waiman.long@hpe.com>
Cc: Scott J Norton <scott.norton@hpe.com>,
	Christoph Lameter <cl@linux-foundation.org>,
	Douglas Hatch <doug.hatch@hpe.com>,
	linux-kernel@vger.kernel.org, xfs@oss.sgi.com,
	Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH] percpu_counter: return precise count from __percpu_counter_compare()
Date: Thu, 8 Oct 2015 10:04:42 +1100	[thread overview]
Message-ID: <20151007230441.GG32150@dastard> (raw)
In-Reply-To: <561579EA.60507@hpe.com>

On Wed, Oct 07, 2015 at 04:00:42PM -0400, Waiman Long wrote:
> On 10/06/2015 05:30 PM, Dave Chinner wrote:
> >>>/*
> >>>  * Aggregate the per-cpu counter magazines back into the global
> >>>  * counter. This avoids the need for repeated compare operations to
> >>>  * run the slow path when the majority of the counter value is held
> >>>  * in the per-cpu magazines. Folding them back into the global
> >>>  * counter means we will continue to hit the fast
> >>>  * percpu_counter_read() path until the counter value falls
> >>>  * completely within the comparison limit passed to
> >>>  * __percpu_counter_compare().
> >>>  */
> >>>static s64 percpu_counter_aggregate(struct percpu_counter *fbc)
> >>>{
> >>>	s64 ret;
> >>>	int cpu;
> >>>	unsigned long flags;
> >>>
> >>>	raw_spin_lock_irqsave(&fbc->lock, flags);
> >>>	ret = fbc->count;
> >>>	for_each_online_cpu(cpu) {
> >>>		s32 count = __this_cpu_read(*fbc->counters);
> >>>		ret += count;
> >>>		__this_cpu_sub(*fbc->counters, count);
> >>>	}
> >>>	fbc->count = ret;
> >>>	raw_spin_unlock_irqrestore(&fbc->lock, flags);
> >>>	return ret;
> >>>}
> >>I don't think that will work as some other CPUs may change the
> >>percpu counter values between percpu_counter_aggregate() and
> >>__percpu_counter_compare().  To be safe, the precise count has to
> >>be computed whenever the comparison value difference is less than
> >>nr_cpus * batch size.
> >Well, yes. Why do you think the above function performs the same
> >job as percpu_counter_sum()? So that the percpu_counter_sum()
> >call *inside* __percpu_counter_compare() can be replaced by this
> >call. i.e.
> >
> >			return -1;
> >	}
> >	/* Need to use precise count */
> >-	count = percpu_counter_sum(fbc);
> >+	count = percpu_counter_aggregate(fbc);
> >	if (count > rhs)
> >		return 1;
> >	else if (count < rhs)
> >
> >Please think about what I'm saying rather than dismissing it without
> >first understanding my suggestions.
> 
> I understood what you were saying. However, the per-cpu counter
> isn't protected by the spinlock. Reading it is OK, but writing may
> cause race if that counter is modified by a CPU other than its
> owning CPU.

<sigh>

You're still trying to pick apart the code without considering what
we need to achieve.  We don't need the code to be bullet proof to
test whether this hypothesis is correct or not - we just need
something that is "near-enough" to give us the data point to tell us
where we should focus our efforts. If optimising the counter like
above does not reduce the overhead, then we may have to change XFS.
If it does reduce the overhead, then the XFS code remains unchanged
and we focus on optimising the counter code.

But we *need to test the hypothesis first*.

As it is, the update race you pointed out is easy to solve with
__this_cpu_cmpxchg rather than __this_cpu_sub (similar to mod_state()
in the MM percpu counter stats code, perhaps).

So please test the above change and stop quibbling over details
that just don't matter until we know which way we need to go.

> The slow performance of percpu_counter_sum() is due to its need to
> access n different (likely cold) cachelines where n is the number of
> CPUs in the system. So the larger the system, the more problematic
> it will be.

Welcome to today's lecture titled "Per-cpu Counters 101". :/

Have a think about who you are lecturing(*).  Please don't treat me
as you would a university intern who's just learning about
multithreaded programming, because a) it's disrespectful, and b)
you'll jump to the wrong conclusions because you may not immediately
understand what I'm saying or why I'm asking you to do something.

I always start by assuming participants understand the topic being
discussed.  I can quickly tell if a person has deep knowledge of the
topic from their responses and the above "explain the basics"
response is a key indicator.  If you assume that the other person
understands the topic as well or better than you do then you won't
make this mistake and, better yet, we might all learn something in
the ensuing discussion.

Cheers,

Dave.

(*) XFS had custom per-cpu counters because there was no generic
infrastructure in linux for this ten years ago:

commit 8d280b98cfe3c0b69c37d355218975c1c0279bb0
Author: David Chinner <dgc@sgi.com>
Date:   Tue Mar 14 13:13:09 2006 +1100

    [XFS] On machines with more than 8 cpus, when running parallel I/O
    threads, the incore superblock lock becomes the limiting factor for
    buffered write throughput. Make the contended fields in the incore
    superblock use per-cpu counters so that there is no global lock to limit
    scalability.
    
    SGI-PV: 946630
    SGI-Modid: xfs-linux-melb:xfs-kern:25106a
    
    Signed-off-by: David Chinner <dgc@sgi.com>
    Signed-off-by: Nathan Scott <nathans@sgi.com>

-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

Thread overview: 18+ messages
2015-10-02 17:29 [PATCH] percpu_counter: return precise count from __percpu_counter_compare() Waiman Long
2015-10-02 18:04 ` kbuild test robot
2015-10-05 23:03   ` Waiman Long
2015-10-02 18:05 ` kbuild test robot
2015-10-02 18:12 ` kbuild test robot
2015-10-02 18:15 ` kbuild test robot
2015-10-02 22:16 ` Dave Chinner
2015-10-05 23:02   ` Waiman Long
2015-10-06  0:25     ` Dave Chinner
2015-10-06 17:33       ` Waiman Long
2015-10-06 21:30         ` Dave Chinner
2015-10-07 20:00           ` Waiman Long
2015-10-07 23:04             ` Dave Chinner [this message]
2015-10-07 23:20               ` Tejun Heo
2015-10-08  1:02                 ` Dave Chinner
2015-10-08  1:09                   ` Tejun Heo
2015-10-08 16:06                   ` Waiman Long
2015-10-08 16:01               ` Waiman Long
