From: Waiman Long <waiman.long@hpe.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Tejun Heo <tj@kernel.org>,
	Christoph Lameter <cl@linux-foundation.org>,
	linux-kernel@vger.kernel.org, xfs@oss.sgi.com,
	Scott J Norton <scott.norton@hpe.com>,
	Douglas Hatch <doug.hatch@hpe.com>
Subject: Re: [PATCH] percpu_counter: return precise count from __percpu_counter_compare()
Date: Wed, 07 Oct 2015 16:00:42 -0400	[thread overview]
Message-ID: <561579EA.60507@hpe.com> (raw)
In-Reply-To: <20151006213023.GP27164@dastard>

On 10/06/2015 05:30 PM, Dave Chinner wrote:
> On Tue, Oct 06, 2015 at 01:33:21PM -0400, Waiman Long wrote:
> Yes, it may be, but that does not mean we should optimise for it.
> If you are doing filesystem scalability testing on small filesystems
> near capacity, then your testing methodology needs fixing, not
> the code.
>
>>>>> XFS trades off low overhead for fast path allocation with slowdowns
>>>>> as we near ENOSPC in allocation routines. It gets harder to find
>>>>> contiguous free space, files get more fragmented, IO takes longer
>>>>> because we seek more, etc. Hence we accept that performance slows
>>>>> down as the need for precision increases as we near ENOSPC.
>>>>>
>>>>> I'd suggest you retry your benchmark with larger filesystems, and
>>>>> see what happens...
>>>> I don't think I am going to see the slowdown that I observed on
>>>> larger filesystems with more free space.
>>> So there is no problem that needs fixing.... ;)
>> Well, I am still worried about corner cases where the slowpath is
>> triggered. I would like to make it perform better in those cases.
> It's a pretty damn small slowdown in your somewhat extreme,
> artificial test. Show me a real world production system that runs
> small filesystems permanently at >99% filesystem capacity, and then
> maybe we've got something that needs changing.
>
>>>> gauge how far it is from ENOSPC.  So we don't really need to get
>>>> the precise count as long as the number of CPUs is taken into
>>>> consideration in the comparison.
>>> I think you are looking in the wrong place. There is nothing
>>> wrong with XFS doing two compares here. If we are hitting the
>>> __percpu_counter_compare() slow path too much, then we should be
>>> understanding exactly why that slow path is being hit so hard so
>>> often. I don't see any analysis of the actual per-cpu counter
>>> behaviour and why the slow path is being taken so often....
>> I am thinking of making the following changes:
> No. Please test the change to the per-cpu counters that I suggested:
>
>>> /*
>>>   * Aggregate the per-cpu counter magazines back into the global
>>>   * counter. This avoids the need for repeated compare operations to
>>>   * run the slow path when the majority of the counter value is held
>>>   * in the per-cpu magazines. Folding them back into the global
>>>   * counter means we will continue to hit the fast
>>>   * percpu_counter_read() path until the counter value falls
>>>   * completely within the comparison limit passed to
>>>   * __percpu_counter_compare().
>>>   */
>>> static s64 percpu_counter_aggregate(struct percpu_counter *fbc)
>>> {
>>> 	s64 ret;
>>> 	int cpu;
>>> 	unsigned long flags;
>>>
>>> 	raw_spin_lock_irqsave(&fbc->lock, flags);
>>> 	ret = fbc->count;
>>> 	for_each_online_cpu(cpu) {
>>> 		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
>>>
>>> 		ret += *pcount;
>>> 		*pcount = 0;
>>> 	}
>>> 	fbc->count = ret;
>>> 	raw_spin_unlock_irqrestore(&fbc->lock, flags);
>>> 	return ret;
>>> }
>> I don't think that will work as some other CPUs may change the
>> per-cpu counter values between percpu_counter_aggregate() and
>> __percpu_counter_compare().  To be safe, the precise count has to
>> be computed whenever the comparison value difference is less than
>> nr_cpus * batch size.
> Well, yes. Why do you think the above function performs the same
> job as percpu_counter_sum()? So that the percpu_counter_sum()
> call *inside* __percpu_counter_compare() can be replaced by this
> call, i.e.
>
> 			return -1;
> 	}
> 	/* Need to use precise count */
> -	count = percpu_counter_sum(fbc);
> +	count = percpu_counter_aggregate(fbc);
> 	if (count > rhs)
> 		return 1;
> 	else if (count < rhs)
>
> Please think about what I'm saying rather than dismissing it without
> first understanding my suggestions.

I understood what you were saying. However, the per-cpu counters aren't 
protected by the spinlock. Reading them is OK, but writing may cause a 
race if a counter is modified by a CPU other than its owning CPU.
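
To make that concrete, here is a sketch of the __percpu_counter_add() 
fast path of that era (from memory, not verbatim mainline source): the 
owning CPU updates its per-cpu counter with only preemption disabled, 
never taking fbc->lock in the common case:

	void __percpu_counter_add(struct percpu_counter *fbc, s64 amount,
				  s32 batch)
	{
		s64 count;

		preempt_disable();
		count = __this_cpu_read(*fbc->counters) + amount;
		if (count >= batch || count <= -batch) {
			unsigned long flags;

			raw_spin_lock_irqsave(&fbc->lock, flags);
			fbc->count += count;
			__this_cpu_sub(*fbc->counters, count - amount);
			raw_spin_unlock_irqrestore(&fbc->lock, flags);
		} else {
			/* lockless local update - fbc->lock NOT held */
			this_cpu_add(*fbc->counters, amount);
		}
		preempt_enable();
	}

So if CPU A is between reading its local counter and completing the 
this_cpu_add() while CPU B, holding fbc->lock inside 
percpu_counter_aggregate(), folds A's counter into fbc->count and 
zeroes it, A's write then restores the stale value and that amount 
ends up counted twice.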

The slow performance of percpu_counter_sum() is due to its need to 
access n different (likely cold) cachelines, where n is the number of 
CPUs in the system. So the larger the system, the more problematic it 
will be. My main concern about xfs_mod_fdblocks() is that it can 
potentially call percpu_counter_sum() twice, which is what I want to 
prevent. It is OK if you don't think that change is necessary. However, 
I will come back if I find more evidence that this can be an issue.
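
For reference, the idea in my patch (as the Subject says) is to let 
__percpu_counter_compare() hand the precise count back to the caller 
so a second comparison can reuse it. A rough sketch, not the exact 
patch (helper name and signature here are illustrative only):

	/*
	 * Like __percpu_counter_compare(), but return the count used
	 * for the comparison via *pcount so the caller can avoid
	 * walking all n per-cpu cachelines a second time.
	 */
	static int percpu_counter_compare_count(struct percpu_counter *fbc,
						s64 rhs, s32 batch,
						s64 *pcount)
	{
		s64 count = percpu_counter_read(fbc);

		/* Rough count decisive if further than nr_cpus * batch away */
		if (abs(count - rhs) > (s64)batch * num_online_cpus()) {
			*pcount = count;
			return count > rhs ? 1 : -1;
		}

		/* Too close to call: need the precise (slow) sum */
		count = percpu_counter_sum(fbc);
		*pcount = count;
		if (count > rhs)
			return 1;
		if (count < rhs)
			return -1;
		return 0;
	}

A caller like xfs_mod_fdblocks() would then pay for the expensive 
cacheline walk at most once per call.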

Cheers,
Longman





Thread overview: 18+ messages
2015-10-02 17:29 [PATCH] percpu_counter: return precise count from __percpu_counter_compare() Waiman Long
2015-10-02 18:04 ` kbuild test robot
2015-10-05 23:03   ` Waiman Long
2015-10-02 18:05 ` kbuild test robot
2015-10-02 18:12 ` kbuild test robot
2015-10-02 18:15 ` kbuild test robot
2015-10-02 22:16 ` Dave Chinner
2015-10-05 23:02   ` Waiman Long
2015-10-06  0:25     ` Dave Chinner
2015-10-06 17:33       ` Waiman Long
2015-10-06 21:30         ` Dave Chinner
2015-10-07 20:00           ` Waiman Long [this message]
2015-10-07 23:04             ` Dave Chinner
2015-10-07 23:20               ` Tejun Heo
2015-10-08  1:02                 ` Dave Chinner
2015-10-08  1:09                   ` Tejun Heo
2015-10-08 16:06                   ` Waiman Long
2015-10-08 16:01               ` Waiman Long
