All of lore.kernel.org
 help / color / mirror / Atom feed
From: Waiman Long <waiman.long@hpe.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Scott J Norton <scott.norton@hpe.com>,
	Christoph Lameter <cl@linux-foundation.org>,
	Douglas Hatch <doug.hatch@hpe.com>,
	linux-kernel@vger.kernel.org, xfs@oss.sgi.com,
	Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH] percpu_counter: return precise count from __percpu_counter_compare()
Date: Thu, 08 Oct 2015 12:06:36 -0400	[thread overview]
Message-ID: <5616948C.5000401@hpe.com> (raw)
In-Reply-To: <20151008010218.GT27164@dastard>

On 10/07/2015 09:02 PM, Dave Chinner wrote:
> On Wed, Oct 07, 2015 at 04:20:10PM -0700, Tejun Heo wrote:
>> Hello, Dave.
>>
>> On Thu, Oct 08, 2015 at 10:04:42AM +1100, Dave Chinner wrote:
>> ...
>>> As it is, the update race you pointed out is easy to solve with
>>> __this_cpu_cmpxchg rather than _this_cpu_sub (similar to mod_state()
>>> in the MM percpu counter stats code, perhaps).
>> percpu cmpxchg is no different from sub or any other operations
>> regarding cross-CPU synchronization.  They're safe iff the operations
>> are on the local CPU.  They have to be made atomics if they need to be
>> manipulated from remote CPUs.
> Again, another trivially solvable problem, but still irrelevant
> because we don't have the data that tells us whether changing the
> counter behaviour solves the problem....
>
>> That said, while we can't manipulate the percpu counters directly, we
>> can add a separate global counter to cache sum result from the
>> previous run which gets automatically invalidated when any percpu
>> counter overflows.
>>
>> That should give better and in case of
>> back-to-back invocations pretty good precision compared to just
>> returning the global overflow counter.  Interface-wise, that'd be a
>> lot easier to deal with although I have no idea whether it'd fit this
>> particular use case or whether this use case even exists.
> No, it doesn't help - it's effectively what Waiman's original patch
> did by returning the count from the initial comparison and using
> that for ENOSPC detection instead of doing a second comparison...
>
> FWIW, XFS has done an expensive per-cpu counter sum in this ENOSPC
> situation since 2006, but in 2007 ENOSPC was wrapped in a mutex to
> prevent spinlock contention on the aggregated global counter:
>
> commit 20b642858b6bb413976ff13ae6a35cc596967bab
> Author: David Chinner<dgc@sgi.com>
> Date:   Sat Feb 10 18:35:09 2007 +1100
>
>      [XFS] Reduction global superblock lock contention near ENOSPC.
>
>      The existing per-cpu superblock counter code uses the global superblock
>      spin lock when we approach ENOSPC for global synchronisation. On larger
>      machines than this code was originally tested on this can still get
>      catastrophic spinlock contention due increasing rebalance frequency near
>      ENOSPC.
>
>      By introducing a sleeping lock that is used to serialise balances and
>      modifications near ENOSPC we prevent contention from needlessly from
>      wasting the CPU time of potentially hundreds of CPUs.
>
>      To reduce the number of balances occuring, we separate the need rebalance
>      case from the slow allocate case. Now, a counter running dry will trigger
>      a rebalance during which counters are disabled. Any thread that sees a
>      disabled counter enters a different path where it waits on the new mutex.
>      When it gets the new mutex, it checks if the counter is disabled. If the
>      counter is disabled, then we _know_ that we have to use the global counter
>      and lock and it is safe to do so immediately. Otherwise, we drop the mutex
>      and go back to trying the per-cpu counters which we know were re-enabled.
>
>      SGI-PV: 952227
>      SGI-Modid: xfs-linux-melb:xfs-kern:27612a
>
>      Signed-off-by: David Chinner<dgc@sgi.com>
>      Signed-off-by: Lachlan McIlroy<lachlan@sgi.com>
>      Signed-off-by: Tim Shimmin<tes@sgi.com>
>
> This is effectively the same symptoms that what we are seeing with
> the new "lockless" generic percpu counteri algorithm, which is why
> I'm trying to find out if it an issue with the counter
> implementation before I do anything else...
>
> FWIW, the first comparison doesn't need to be that precise as it
> just changes the batch passed to percpu_counter_add() to get the
> value folded back into the global counter immediately near ENOSPC.
> This is done so percpu_counter_read() becomes more accurate as
> ENOSPC is approached as that is used for monitoring and reporting
> (e.g. via vfsstat). If we want to avoid a counter sum, then this
> is the comparison we will need to modify in XFS.

That is what I have advocated in the in the inlined patch that I sent 
you in a previous mail. That patch modified the first comparison, but 
leave the 2nd comparison intact. We will still see bad performance near 
ENOSPC, but it will be better than before.

> However, the second comparison needs to be precise as that's the one
> that does the ENOSPC detection. That sum needs to be done after the
> counter add that "uses" the space and so there is no avoiding having
> an expensive counter sum as we near ENOSPC....
>
> Cheers,
>
> Dave.

Cheers,
Longman

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

WARNING: multiple messages have this Message-ID (diff)
From: Waiman Long <waiman.long@hpe.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Tejun Heo <tj@kernel.org>,
	Christoph Lameter <cl@linux-foundation.org>,
	linux-kernel@vger.kernel.org, xfs@oss.sgi.com,
	Scott J Norton <scott.norton@hpe.com>,
	Douglas Hatch <doug.hatch@hpe.com>
Subject: Re: [PATCH] percpu_counter: return precise count from __percpu_counter_compare()
Date: Thu, 08 Oct 2015 12:06:36 -0400	[thread overview]
Message-ID: <5616948C.5000401@hpe.com> (raw)
In-Reply-To: <20151008010218.GT27164@dastard>

On 10/07/2015 09:02 PM, Dave Chinner wrote:
> On Wed, Oct 07, 2015 at 04:20:10PM -0700, Tejun Heo wrote:
>> Hello, Dave.
>>
>> On Thu, Oct 08, 2015 at 10:04:42AM +1100, Dave Chinner wrote:
>> ...
>>> As it is, the update race you pointed out is easy to solve with
>>> __this_cpu_cmpxchg rather than _this_cpu_sub (similar to mod_state()
>>> in the MM percpu counter stats code, perhaps).
>> percpu cmpxchg is no different from sub or any other operations
>> regarding cross-CPU synchronization.  They're safe iff the operations
>> are on the local CPU.  They have to be made atomics if they need to be
>> manipulated from remote CPUs.
> Again, another trivially solvable problem, but still irrelevant
> because we don't have the data that tells us whether changing the
> counter behaviour solves the problem....
>
>> That said, while we can't manipulate the percpu counters directly, we
>> can add a separate global counter to cache sum result from the
>> previous run which gets automatically invalidated when any percpu
>> counter overflows.
>>
>> That should give better and in case of
>> back-to-back invocations pretty good precision compared to just
>> returning the global overflow counter.  Interface-wise, that'd be a
>> lot easier to deal with although I have no idea whether it'd fit this
>> particular use case or whether this use case even exists.
> No, it doesn't help - it's effectively what Waiman's original patch
> did by returning the count from the initial comparison and using
> that for ENOSPC detection instead of doing a second comparison...
>
> FWIW, XFS has done an expensive per-cpu counter sum in this ENOSPC
> situation since 2006, but in 2007 ENOSPC was wrapped in a mutex to
> prevent spinlock contention on the aggregated global counter:
>
> commit 20b642858b6bb413976ff13ae6a35cc596967bab
> Author: David Chinner<dgc@sgi.com>
> Date:   Sat Feb 10 18:35:09 2007 +1100
>
>      [XFS] Reduction global superblock lock contention near ENOSPC.
>
>      The existing per-cpu superblock counter code uses the global superblock
>      spin lock when we approach ENOSPC for global synchronisation. On larger
>      machines than this code was originally tested on this can still get
>      catastrophic spinlock contention due increasing rebalance frequency near
>      ENOSPC.
>
>      By introducing a sleeping lock that is used to serialise balances and
>      modifications near ENOSPC we prevent contention from needlessly from
>      wasting the CPU time of potentially hundreds of CPUs.
>
>      To reduce the number of balances occuring, we separate the need rebalance
>      case from the slow allocate case. Now, a counter running dry will trigger
>      a rebalance during which counters are disabled. Any thread that sees a
>      disabled counter enters a different path where it waits on the new mutex.
>      When it gets the new mutex, it checks if the counter is disabled. If the
>      counter is disabled, then we _know_ that we have to use the global counter
>      and lock and it is safe to do so immediately. Otherwise, we drop the mutex
>      and go back to trying the per-cpu counters which we know were re-enabled.
>
>      SGI-PV: 952227
>      SGI-Modid: xfs-linux-melb:xfs-kern:27612a
>
>      Signed-off-by: David Chinner<dgc@sgi.com>
>      Signed-off-by: Lachlan McIlroy<lachlan@sgi.com>
>      Signed-off-by: Tim Shimmin<tes@sgi.com>
>
> This is effectively the same symptoms that what we are seeing with
> the new "lockless" generic percpu counteri algorithm, which is why
> I'm trying to find out if it an issue with the counter
> implementation before I do anything else...
>
> FWIW, the first comparison doesn't need to be that precise as it
> just changes the batch passed to percpu_counter_add() to get the
> value folded back into the global counter immediately near ENOSPC.
> This is done so percpu_counter_read() becomes more accurate as
> ENOSPC is approached as that is used for monitoring and reporting
> (e.g. via vfsstat). If we want to avoid a counter sum, then this
> is the comparison we will need to modify in XFS.

That is what I have advocated in the in the inlined patch that I sent 
you in a previous mail. That patch modified the first comparison, but 
leave the 2nd comparison intact. We will still see bad performance near 
ENOSPC, but it will be better than before.

> However, the second comparison needs to be precise as that's the one
> that does the ENOSPC detection. That sum needs to be done after the
> counter add that "uses" the space and so there is no avoiding having
> an expensive counter sum as we near ENOSPC....
>
> Cheers,
>
> Dave.

Cheers,
Longman

  parent reply	other threads:[~2015-10-08 16:06 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-02 17:29 [PATCH] percpu_counter: return precise count from __percpu_counter_compare() Waiman Long
2015-10-02 17:29 ` Waiman Long
2015-10-02 18:04 ` kbuild test robot
2015-10-02 18:04   ` kbuild test robot
2015-10-05 23:03   ` Waiman Long
2015-10-05 23:03     ` Waiman Long
2015-10-02 18:05 ` kbuild test robot
2015-10-02 18:05   ` kbuild test robot
2015-10-02 18:12 ` kbuild test robot
2015-10-02 18:12   ` kbuild test robot
2015-10-02 18:15 ` kbuild test robot
2015-10-02 18:15   ` kbuild test robot
2015-10-02 22:16 ` Dave Chinner
2015-10-02 22:16   ` Dave Chinner
2015-10-05 23:02   ` Waiman Long
2015-10-05 23:02     ` Waiman Long
2015-10-06  0:25     ` Dave Chinner
2015-10-06  0:25       ` Dave Chinner
2015-10-06 17:33       ` Waiman Long
2015-10-06 17:33         ` Waiman Long
2015-10-06 21:30         ` Dave Chinner
2015-10-06 21:30           ` Dave Chinner
2015-10-07 20:00           ` Waiman Long
2015-10-07 20:00             ` Waiman Long
2015-10-07 23:04             ` Dave Chinner
2015-10-07 23:04               ` Dave Chinner
2015-10-07 23:20               ` Tejun Heo
2015-10-07 23:20                 ` Tejun Heo
2015-10-08  1:02                 ` Dave Chinner
2015-10-08  1:02                   ` Dave Chinner
2015-10-08  1:09                   ` Tejun Heo
2015-10-08  1:09                     ` Tejun Heo
2015-10-08 16:06                   ` Waiman Long [this message]
2015-10-08 16:06                     ` Waiman Long
2015-10-08 16:01               ` Waiman Long
2015-10-08 16:01                 ` Waiman Long

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5616948C.5000401@hpe.com \
    --to=waiman.long@hpe.com \
    --cc=cl@linux-foundation.org \
    --cc=david@fromorbit.com \
    --cc=doug.hatch@hpe.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=scott.norton@hpe.com \
    --cc=tj@kernel.org \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.