From: Dave Chinner <david@fromorbit.com>
To: Alex Elder <aelder@sgi.com>
Cc: XFS Mailing List <xfs@oss.sgi.com>
Subject: Re: [PATCH 1/5] percpu_counter: fix test in __percpu_counter_add_unless_lt()
Date: Thu, 23 Dec 2010 17:06:52 +1100 [thread overview]
Message-ID: <20101223060652.GE18264@dastard> (raw)
In-Reply-To: <1293076575.2408.425.camel@doink>
On Wed, Dec 22, 2010 at 09:56:15PM -0600, Alex Elder wrote:
> In __percpu_counter_add_unless_lt(), there is a test to see if
> a call to __percpu_counter_add() can be made, knowing the result
> will not be less than the given threshhold and the called function
> will do the addition without taking a lock.
>
> Unfortunately, it is using the wrong value in its comparison--the
> amount to add is not properly being taken into account. As a
> result, __percpu_counter_add() could end up adding a value when it
> should not. And even if the result will be non-negative, the called
> function may take the lock anyway because it does so if the *sum* of
> the (percpu) delta count and the given amount (and not just the amount
> alone) is greater than the batch size (or less than its negative).
>
> Fix the comparison, and since it __percpu_counter_add() will do
> the right thing (and might lock anyway), just call it regardless
> of what amount is getting added.
>
> Signed-off-by: Alex Elder <aelder@sgi.com>
>
> ---
> lib/percpu_counter.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> Index: b/lib/percpu_counter.c
> ===================================================================
> --- a/lib/percpu_counter.c
> +++ b/lib/percpu_counter.c
> @@ -239,10 +239,10 @@ int __percpu_counter_add_unless_lt(struc
> goto out;
>
> /*
> - * If the counter is over the threshold and the change is less than the
> - * batch size, we might be able to avoid locking.
> + * If the updated counter will be over the threshold we know
> + * we can safely add, and might be able to avoid locking.
> */
> - if (count > threshold + error && abs(amount) < batch) {
> + if (count + amount > threshold + error) {
> __percpu_counter_add(fbc, amount, batch);
> ret = 1;
> goto out;
What you have is what I first implemented following your exact
logic. However, after having several assert failures n the XFS code,
I realised that the logic fails when there are concurrent modifications
with abs(amount) > batch. To demonstrate, take the initial conditions:
threshold = 0
amount = -39 (for ease of maths)
batch = 32
error = 256 (2 * 32 * 4 cpus)
with the per CPU counter values:
fbc->count = 296
CPU 0 1 2 3
value -31 -31 -31 -31
And the start the concurrent modification such that every racing CPU
sees fbc->count = 296 at their initial sample. All evaluate it as:
count = 296
if (296 - 39 > 0 + 256) {
and so take the __percpu_counter_add() path. Assuming CPUs 1-3
complete before CPU 0, the counter changes value like:
cpu 1 pcp_cnt_add_lt(-39)
CPU 0 1 2 3
value -31 0 -31 -31
fbc->count = 216
cpu 2 pcp_cnt_add_lt(-39)
CPU 0 1 2 3
value -31 0 0 -31
fbc->count = 136
cpu 3 pcp_cnt_add_lt(-39)
CPU 0 1 2 3
value -31 0 0 0
fbc->count = 56
And the we run the counter add on CPU 0:
__percpu_counter_add(fbc, -39, 32);
CPU 0 1 2 3
value 0 0 0 0
fbc->count = -24
And we've just blown through the threshold. We modified the counter
and pushed the result less than the threshold which we are not
supposed to be, and worst yet is the fact we returning a positive
number indicating that we _didn't_ bust the threshold.
Analysis of the failures lead me to this lead me to this logic
for the unlocked check in my proposed patch:
We can do unlocked modifications concurrently on all CPUs IFF
1. the code cannot be preempted between sampling fbc->count
and calling __percpu_counter_add()
2. -batch < amount < batch
3. the error bound is set to 2 * batch * nr_cpus
To demonstrate the worst case initial conditions, but being bound by
the above rules:
threshold = 0
amount = -31
batch = 32
error = 256 (2 * 32 * 4 cpus)
per CPU counters are at worst case lower edge for lt detection:
CPU 0 1 2 3
value -31 -31 -31 -31
fbc->count = 258
and the racing CPUs are all doing equivalent add_unless_lt of -31.
Every racing CPU sees fbc->count = 258 at their initial sample.
what we end up with is:
CPU 0: Racing CPUs
count = 258
if (258 > 0 + 256) {
cpu 1 pcp_cnt_add_lt(-31)
CPU 0 1 2 3
value -31 0 -31 -31
fbc->count = 196
cpu 2 pcp_cnt_add_lt(-31)
CPU 0 1 2 3
value -31 0 0 -31
fbc->count = 134
cpu 3 pcp_cnt_add_lt(-31)
CPU 0 1 2 3
value -31 0 0 0
fbc->count = 70
__percpu_counter_add(fbc, -31, 32);
CPU 0 1 2 3
value 0 0 0 0
fbc->count = 6
And so the return value of 1 is valid because 6 > 0. Indeed, looking
at this I can probably change the rule to -batch <= amount <= batch,
because the above with amount = -32 would give and end value of
fbc->count = 2....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2010-12-23 6:04 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-12-23 3:56 [PATCH 1/5] percpu_counter: fix test in __percpu_counter_add_unless_lt() Alex Elder
2010-12-23 6:06 ` Dave Chinner [this message]
2010-12-29 20:56 ` Alex Elder
2010-12-29 21:49 ` Alex Elder
2010-12-30 22:35 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20101223060652.GE18264@dastard \
--to=david@fromorbit.com \
--cc=aelder@sgi.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox