* [PATCH 1/5] percpu_counter: fix test in __percpu_counter_add_unless_lt()
@ 2010-12-23 3:56 Alex Elder
2010-12-23 6:06 ` Dave Chinner
0 siblings, 1 reply; 5+ messages in thread
From: Alex Elder @ 2010-12-23 3:56 UTC (permalink / raw)
To: XFS Mailing List
In __percpu_counter_add_unless_lt(), there is a test to see if
a call to __percpu_counter_add() can be made, knowing the result
will not be less than the given threshold and the called function
will do the addition without taking a lock.
Unfortunately, it is using the wrong value in its comparison--the
amount to add is not properly being taken into account. As a
result, __percpu_counter_add() could end up adding a value when it
should not. And even if the result will be non-negative, the called
function may take the lock anyway because it does so if the *sum* of
the (percpu) delta count and the given amount (and not just the amount
alone) is greater than the batch size (or less than its negative).
Fix the comparison, and since __percpu_counter_add() will do
the right thing (and might lock anyway), just call it regardless
of what amount is getting added.
Signed-off-by: Alex Elder <aelder@sgi.com>
---
lib/percpu_counter.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
Index: b/lib/percpu_counter.c
===================================================================
--- a/lib/percpu_counter.c
+++ b/lib/percpu_counter.c
@@ -239,10 +239,10 @@ int __percpu_counter_add_unless_lt(struc
goto out;
/*
- * If the counter is over the threshold and the change is less than the
- * batch size, we might be able to avoid locking.
+ * If the updated counter will be over the threshold we know
+ * we can safely add, and might be able to avoid locking.
*/
- if (count > threshold + error && abs(amount) < batch) {
+ if (count + amount > threshold + error) {
__percpu_counter_add(fbc, amount, batch);
ret = 1;
goto out;
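For a quick side-by-side of the two predicates, here is a standalone sketch (illustrative numbers only; old_check/new_check are made-up names, not kernel functions). With count = 260, error = 256, threshold = 0 and amount = -10, the original test takes the unlocked path even though count + amount no longer clears threshold + error:

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model of the check before and after the patch.  The old test
 * ignores amount in the threshold comparison, so it can take the
 * unlocked add path when the updated count no longer has the
 * required margin over the threshold. */
int old_check(long long count, long long amount,
              long long threshold, long long error, long long batch)
{
        return count > threshold + error && llabs(amount) < batch;
}

int new_check(long long count, long long amount,
              long long threshold, long long error)
{
        return count + amount > threshold + error;
}
```

With the numbers above, old_check() says the unlocked add is safe while new_check() correctly falls back to the slow path.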
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: [PATCH 1/5] percpu_counter: fix test in __percpu_counter_add_unless_lt()
2010-12-23 3:56 [PATCH 1/5] percpu_counter: fix test in __percpu_counter_add_unless_lt() Alex Elder
@ 2010-12-23 6:06 ` Dave Chinner
2010-12-29 20:56 ` Alex Elder
0 siblings, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2010-12-23 6:06 UTC (permalink / raw)
To: Alex Elder; +Cc: XFS Mailing List

On Wed, Dec 22, 2010 at 09:56:15PM -0600, Alex Elder wrote:
> In __percpu_counter_add_unless_lt(), there is a test to see if
> a call to __percpu_counter_add() can be made, knowing the result
> will not be less than the given threshold and the called function
> will do the addition without taking a lock.
>
> Unfortunately, it is using the wrong value in its comparison--the
> amount to add is not properly being taken into account.  As a
> result, __percpu_counter_add() could end up adding a value when it
> should not.  And even if the result will be non-negative, the called
> function may take the lock anyway, because it does so if the *sum* of
> the (percpu) delta count and the given amount (and not just the amount
> alone) is greater than the batch size (or less than its negative).
>
> Fix the comparison, and since __percpu_counter_add() will do
> the right thing (and might lock anyway), just call it regardless
> of what amount is getting added.
>
> Signed-off-by: Alex Elder <aelder@sgi.com>
>
> ---
> lib/percpu_counter.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> Index: b/lib/percpu_counter.c
> ===================================================================
> --- a/lib/percpu_counter.c
> +++ b/lib/percpu_counter.c
> @@ -239,10 +239,10 @@ int __percpu_counter_add_unless_lt(struc
>         goto out;
>
>         /*
> -        * If the counter is over the threshold and the change is less than the
> -        * batch size, we might be able to avoid locking.
> +        * If the updated counter will be over the threshold we know
> +        * we can safely add, and might be able to avoid locking.
>          */
> -       if (count > threshold + error && abs(amount) < batch) {
> +       if (count + amount > threshold + error) {
>                 __percpu_counter_add(fbc, amount, batch);
>                 ret = 1;
>                 goto out;

What you have is what I first implemented following your exact
logic.  However, after having several assert failures in the XFS
code, I realised that the logic fails when there are concurrent
modifications with abs(amount) > batch.  To demonstrate, take the
initial conditions:

        threshold = 0
        amount = -39    (for ease of maths)
        batch = 32
        error = 256     (2 * 32 * 4 cpus)

with the per CPU counter values:

        fbc->count = 296

        CPU       0       1       2       3
        value   -31     -31     -31     -31

And then start the concurrent modification such that every racing CPU
sees fbc->count = 296 at their initial sample.  All evaluate it as:

        count = 296
        if (296 - 39 > 0 + 256) {

and so take the __percpu_counter_add() path.  Assuming CPUs 1-3
complete before CPU 0, the counter changes value like:

        cpu 1   pcp_cnt_add_lt(-39)
        CPU       0       1       2       3
        value   -31       0     -31     -31
        fbc->count = 216

        cpu 2   pcp_cnt_add_lt(-39)
        CPU       0       1       2       3
        value   -31       0       0     -31
        fbc->count = 136

        cpu 3   pcp_cnt_add_lt(-39)
        CPU       0       1       2       3
        value   -31       0       0       0
        fbc->count = 56

And then we run the counter add on CPU 0:

        __percpu_counter_add(fbc, -39, 32);

        CPU       0       1       2       3
        value     0       0       0       0
        fbc->count = -24

And we've just blown through the threshold.  We modified the counter
and pushed the result less than the threshold, which we are not
supposed to do, and worse yet is the fact that we return a positive
number indicating that we _didn't_ bust the threshold.

Analysis of the failures led me to this logic for the unlocked
check in my proposed patch:

We can do unlocked modifications concurrently on all CPUs IFF

        1. the code cannot be preempted between sampling fbc->count
           and calling __percpu_counter_add()
        2. -batch < amount < batch
        3. the error bound is set to 2 * batch * nr_cpus

To demonstrate the worst case initial conditions, but being bound by
the above rules:

        threshold = 0
        amount = -31
        batch = 32
        error = 256     (2 * 32 * 4 cpus)

per CPU counters are at worst case lower edge for lt detection:

        CPU       0       1       2       3
        value   -31     -31     -31     -31
        fbc->count = 258

and the racing CPUs are all doing equivalent add_unless_lt of -31.
Every racing CPU sees fbc->count = 258 at their initial sample.
What we end up with is:

        CPU 0:                          Racing CPUs

        count = 258
        if (258 > 0 + 256) {
                                        cpu 1   pcp_cnt_add_lt(-31)
                                        CPU       0       1       2       3
                                        value   -31       0     -31     -31
                                        fbc->count = 196

                                        cpu 2   pcp_cnt_add_lt(-31)
                                        CPU       0       1       2       3
                                        value   -31       0       0     -31
                                        fbc->count = 134

                                        cpu 3   pcp_cnt_add_lt(-31)
                                        CPU       0       1       2       3
                                        value   -31       0       0       0
                                        fbc->count = 70

        __percpu_counter_add(fbc, -31, 32);
        CPU       0       1       2       3
        value     0       0       0       0
        fbc->count = 6

And so the return value of 1 is valid because 6 > 0.

Indeed, looking at this I can probably change the rule to
-batch <= amount <= batch, because the above with amount = -32 would
give an end value of fbc->count = 2....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
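The interleaving Dave describes can be replayed with a small standalone C model (a sequential toy re-enactment; pcp_add and race are made-up names, not the kernel API). Using amount = -49 and an initial fbc->count of 306 -- the corrected figures Dave gives later in the thread -- every CPU passes the unlocked check yet the final count ends below the threshold, while the rule-bounded amount = -31 case stays above it (the message's quoted intermediate sums differ slightly; the model computes them exactly):

```c
#include <assert.h>

#define NCPUS   4
#define BATCH   32

static long long fbc_count;             /* models fbc->count */
static long long pcount[NCPUS];         /* models the per-cpu deltas */

/* models __percpu_counter_add(): the local delta is folded into the
 * global count only once its magnitude reaches the batch size */
static void pcp_add(int cpu, long long amount)
{
        long long l = pcount[cpu] + amount;

        if (l >= BATCH || l <= -BATCH) {
                fbc_count += l;
                pcount[cpu] = 0;
        } else {
                pcount[cpu] = l;
        }
}

/* every CPU samples fbc_count == start and (per the scenario above)
 * passes the unlocked check; the adds then land one after another,
 * CPUs 1-3 first and CPU 0 last */
long long race(long long start, long long per_cpu, long long amount)
{
        int cpu;

        fbc_count = start;
        for (cpu = 0; cpu < NCPUS; cpu++)
                pcount[cpu] = per_cpu;
        for (cpu = 1; cpu < NCPUS; cpu++)
                pcp_add(cpu, amount);
        pcp_add(0, amount);
        return fbc_count;
}
```

Here race(306, -31, -49) ends at -14 (below the threshold even though every check passed), while the bounded race(258, -31, -31) ends at 10, consistent with the conclusion that rule-bounded amounts stay non-negative.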
* Re: [PATCH 1/5] percpu_counter: fix test in __percpu_counter_add_unless_lt()
2010-12-23 6:06 ` Dave Chinner
@ 2010-12-29 20:56 ` Alex Elder
2010-12-29 21:49 ` Alex Elder
2010-12-30 22:35 ` Dave Chinner
0 siblings, 2 replies; 5+ messages in thread
From: Alex Elder @ 2010-12-29 20:56 UTC (permalink / raw)
To: Dave Chinner; +Cc: XFS Mailing List

On Thu, 2010-12-23 at 17:06 +1100, Dave Chinner wrote:
> On Wed, Dec 22, 2010 at 09:56:15PM -0600, Alex Elder wrote:
> > In __percpu_counter_add_unless_lt(), there is a test to see if
> > a call to __percpu_counter_add() can be made, knowing the result
> > will not be less than the given threshold and the called function
> > will do the addition without taking a lock.

. . .

> > ===================================================================
> > --- a/lib/percpu_counter.c
> > +++ b/lib/percpu_counter.c
> > @@ -239,10 +239,10 @@ int __percpu_counter_add_unless_lt(struc
> >         goto out;
> >
> >         /*
> > -        * If the counter is over the threshold and the change is less than the
> > -        * batch size, we might be able to avoid locking.
> > +        * If the updated counter will be over the threshold we know
> > +        * we can safely add, and might be able to avoid locking.
> >          */
> > -       if (count > threshold + error && abs(amount) < batch) {
> > +       if (count + amount > threshold + error) {
> >                 __percpu_counter_add(fbc, amount, batch);
> >                 ret = 1;
> >                 goto out;
>
> What you have is what I first implemented following your exact
> logic.  However, after having several assert failures in the XFS
> code, I realised that the logic fails when there are concurrent
> modifications with abs(amount) > batch.  To demonstrate, take the
> initial conditions:

I think we're both wrong.  First I'll point out a think-o
you did, then I'll lay out a counterexample to your approach.

>         threshold = 0
>         amount = -39    (for ease of maths)
>         batch = 32
>         error = 256     (2 * 32 * 4 cpus)
>
> with the per CPU counter values:
>
>         fbc->count = 296
>
>         CPU       0       1       2       3
>         value   -31     -31     -31     -31
>
> And then start the concurrent modification such that every racing CPU
> sees fbc->count = 296 at their initial sample.  All evaluate it as:
>
>         count = 296
>         if (296 - 39 > 0 + 256) {
>
> and so take the __percpu_counter_add() path.  Assuming CPUs 1-3
> complete before CPU 0, the counter changes value like:
>
>         cpu 1   pcp_cnt_add_lt(-39)
>         CPU       0       1       2       3
>         value   -31       0     -31     -31
>         fbc->count = 216

No.  fbc->count + *pcount + amount = 296 + (-31) + (-39) = 226.
You're mis-computing (*pcount + amount), which is -70, not -80.

>         cpu 2   pcp_cnt_add_lt(-39)
>         CPU       0       1       2       3
>         value   -31       0       0     -31
>         fbc->count = 136

fbc->count = 156

>         cpu 3   pcp_cnt_add_lt(-39)
>         CPU       0       1       2       3
>         value   -31       0       0       0
>         fbc->count = 56

fbc->count = 86

> And then we run the counter add on CPU 0:
>
>         __percpu_counter_add(fbc, -39, 32);
>
>         CPU       0       1       2       3
>         value     0       0       0       0
>         fbc->count = -24

fbc->count = 16, so all is well.

> And we've just blown through the threshold.  We modified the counter
> and pushed the result less than the threshold, which we are not
> supposed to do, and worse yet is the fact that we return a positive
> number indicating that we _didn't_ bust the threshold.

Now here's an example that will fail with your version, which is:

        if (count > threshold + error && abs(amount) < batch) {
                __percpu_counter_add(fbc, amount, batch);

My example is not symmetric like yours is.

        threshold = 0
        batch = 32
        4 cpus, so (your) error = 2 * 4 * 32 = 256

We'll start with 0 for all per-cpu counter values for simplicity,
but we'll not assume that the amount being added by all CPU's is
the same.  So here are both the current per-cpu counter values and
the amount each racing CPU is going to try to add.

        fbc->count = 257
        CPU       0       1       2       3
        value     0       0       0       0
        amount  -31     -76     -76     -76

As in your example, all CPU's enter at the same time and start with
the same initial value from fbc->count, 257.

First, CPU 0 checks the condition and finds it holds:

        if (count > threshold + error && abs(amount) < batch) {
            257   >     0     +  256  &&   abs(-31)  <   32

Now, before CPU 0 gets its chance to call __percpu_counter_add(),
CPU's 1, 2, and 3 reach this point, find the condition does not
hold, and proceed with adding their values to fbc->count under
lock.  Net result is:  (-76 * 3 = -228; 257 - 228 = 29)

        fbc->count = 29
        CPU       0       1       2       3
        value     0       0       0       0
        amount  -31     (done)  (done)  (done)

Now CPU 0 proceeds with its __percpu_counter_add(), and inside that
it finds that *pcount + amount = -31, which is < batch and > -batch,
so it simply records the result in the local "delta" counter for
CPU 0.  Net result:

        fbc->count = 29
        CPU       0       1       2       3
        value   -31       0       0       0
        amount  (done)  (done)  (done)  (done)

So although the test made it seem OK to add the value, the "net"
result has gone negative.  I.e., percpu_counter_sum() would produce
-2.  The whole point of these bounds-checking things is to ensure
that the net value (not just fbc->count) of the counter stays above
the threshold.

The same problem exists if the amount being added on CPU 0 was -76
like the others.  The add would be done under lock, but the result
would still go below the threshold.

Your approach embeds in it an assumption that "amount" will be
within a certain range across all CPU's--specifically, less than
the batch size (I think).

The real problem is that we're caching the value of fbc->count, and
as soon as we've done that it could be out of date.  To get around
it we need to examine it under lock--preferably a reader/writer
lock, I suppose.  You'll notice that fbc->count is only ever read
under the protection of the spinlock everywhere else in the file.
Locking is avoided only when checking whether it is possible to add
to a per-cpu value, and that is done with preemption disabled.
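Alex's counterexample is easy to re-run as a standalone sketch (a sequential toy model; the names are made up, and the "lock" held by CPUs 1-3 is implicit in the sequential ordering):

```c
#include <assert.h>

#define NCPUS   4
#define BATCH   32

static long long fbc_count = 257;       /* models fbc->count */
static long long pcount[NCPUS];         /* per-cpu deltas, all 0 */

/* net counter value: global count plus every CPU's local delta */
static long long net(void)
{
        long long sum = fbc_count;
        int cpu;

        for (cpu = 0; cpu < NCPUS; cpu++)
                sum += pcount[cpu];
        return sum;
}

long long counterexample(void)
{
        long long sampled = fbc_count;  /* CPU 0 samples 257 */
        int cpu;

        /* CPU 0 passes the unlocked check: 257 > 0 + 256, 31 < 32 */
        assert(sampled > 0 + 2 * BATCH * NCPUS && 31 < BATCH);

        /* before CPU 0 adds, CPUs 1-3 find abs(-76) >= batch and
         * fold their amounts straight into fbc->count under lock */
        for (cpu = 1; cpu < NCPUS; cpu++)
                fbc_count += -76;

        /* CPU 0 now adds -31; |0 + (-31)| < batch, so it only
         * updates its local delta */
        pcount[0] += -31;
        return net();                   /* the net value went negative */
}
```

The run ends with a net value of -2, matching the walk-through above: the check passed on CPU 0 yet the counter's true value dropped below the threshold.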
It may be that the only way around this is to compute the accurate
sum, except in the simple case where we know that even the maximum
possible net value of the counter will produce a too-low value when
the amount is added.  Otherwise you have to compute the sum under
lock, and then complete the assignment only if the threshold
condition is met.

I think that--unless you impose a range bound on "amount"--you
can't escape the fact that other CPU's can screw up the result by
updating fbc->count after you've sampled it.

I have a little more below.

> Analysis of the failures led me to this logic for the unlocked
> check in my proposed patch:
>
> We can do unlocked modifications concurrently on all CPUs IFF
>
>         1. the code cannot be preempted between sampling fbc->count
>            and calling __percpu_counter_add()

This I agree with (you convinced me in your response to my patch
5/5).

>         2. -batch < amount < batch

This isn't right, or maybe just isn't relevant.  The reason is that
it neither takes into account this CPU's counter value, nor what a
concurrent call on the other CPU's might be adding to the net
value.

>         3. the error bound is set to 2 * batch * nr_cpus

I think this assumes that all the CPU's are adding a limited amount
to the counter.  And I still think you're doubling the error bound
unnecessarily (but that may reflect the implied limit).

Below I have a version of the code that I think would work, but it
unfortunately requires a lock in the (presumably) normal case.

> To demonstrate the worst case initial conditions, but being bound by
> the above rules:

. . .

                                        -Alex

int
__percpu_counter_add_unless_lt(
        struct percpu_counter   *fbc,
        s64                     amount,
        s64                     threshold,
        s32                     batch)
{
        s64     error = batch * num_online_cpus();
        s64     new_count;
        int     cpu;
        int     ret = -1;

        /*
         * The maximum value the counter holds is bounded by its
         * current count plus the error magnitude computed above.
         * If that plus the amount to be added is less than the
         * threshold, we can't do the add because the result would
         * be too low.
         */
        if (percpu_counter_read(fbc) + error + amount < threshold)
                return -1;

        /*
         * Otherwise, compute the accurate sum.  If adding the
         * given amount doesn't drop it below the threshold, do
         * the add--in that case, just add the amount to the
         * global counter.
         */
        spin_lock(&fbc->lock);
        new_count = fbc->count + amount;
        for_each_online_cpu(cpu) {
                s32 *pcount = per_cpu_ptr(fbc->counters, cpu);

                new_count += *pcount;
        }
        if (new_count >= threshold) {
                ret = new_count > threshold;
                fbc->count = fbc_count + amount;
        }
        spin_unlock(&fbc->lock);

        return ret;
}
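A standalone model of the algorithm Alex sketches can be exercised directly (hypothetical names; the spinlock is represented by sequential execution, and the assignment typo he corrects in his follow-up is fixed here):

```c
#include <assert.h>

#define NCPUS   4
#define BATCH   32

static long long fbc_count;
static long long pcount[NCPUS];

/* sequential model of the proposal above: fast-fail when even the
 * maximum possible counter value plus amount is below the threshold,
 * otherwise compute the exact sum (under the lock, in the real code)
 * and apply the add only if the threshold is preserved */
int add_unless_lt(long long amount, long long threshold)
{
        long long error = BATCH * NCPUS;
        long long new_count;
        int cpu;

        if (fbc_count + error + amount < threshold)
                return -1;

        new_count = fbc_count + amount;
        for (cpu = 0; cpu < NCPUS; cpu++)
                new_count += pcount[cpu];
        if (new_count < threshold)
                return -1;
        fbc_count += amount;
        return new_count > threshold;
}

/* resume Alex's counterexample just before CPU 0's add: CPUs 1-3
 * have folded -76 each, leaving a global count of 29, so a further
 * -31 must be refused */
int demo_reject(void)
{
        int cpu;

        fbc_count = 29;
        for (cpu = 0; cpu < NCPUS; cpu++)
                pcount[cpu] = 0;
        return add_unless_lt(-31, 0);
}

/* with no racing modifications the same add goes through */
int demo_accept(void)
{
        int cpu;

        fbc_count = 257;
        for (cpu = 0; cpu < NCPUS; cpu++)
                pcount[cpu] = 0;
        return add_unless_lt(-31, 0);
}
```

demo_reject() returns -1 (the exact sum 29 - 31 = -2 would bust the threshold) and demo_accept() returns 1, which is the behavior the unlocked check alone could not guarantee.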
* Re: [PATCH 1/5] percpu_counter: fix test in __percpu_counter_add_unless_lt()
2010-12-29 20:56 ` Alex Elder
@ 2010-12-29 21:49 ` Alex Elder
2010-12-30 22:35 ` Dave Chinner
1 sibling, 0 replies; 5+ messages in thread
From: Alex Elder @ 2010-12-29 21:49 UTC (permalink / raw)
To: Dave Chinner; +Cc: XFS Mailing List

On Wed, 2010-12-29 at 14:56 -0600, Alex Elder wrote:
>         if (new_count >= threshold) {
>                 ret = new_count > threshold;
>                 fbc->count = fbc_count + amount;
>         }
>         spin_unlock(&fbc->lock);
>
>         return ret;
> }

I obviously didn't test the code I sent, but you get what I
mean...  Here is the way one of the lines above should look:

        fbc->count = fbc->count + amount;

Or even better:

        fbc->count += amount;

                                        -Alex
* Re: [PATCH 1/5] percpu_counter: fix test in __percpu_counter_add_unless_lt()
2010-12-29 20:56 ` Alex Elder
2010-12-29 21:49 ` Alex Elder
@ 2010-12-30 22:35 ` Dave Chinner
1 sibling, 0 replies; 5+ messages in thread
From: Dave Chinner @ 2010-12-30 22:35 UTC (permalink / raw)
To: Alex Elder; +Cc: XFS Mailing List

On Wed, Dec 29, 2010 at 02:56:28PM -0600, Alex Elder wrote:
> On Thu, 2010-12-23 at 17:06 +1100, Dave Chinner wrote:
> > On Wed, Dec 22, 2010 at 09:56:15PM -0600, Alex Elder wrote:
> > > In __percpu_counter_add_unless_lt(), there is a test to see if
> > > a call to __percpu_counter_add() can be made, knowing the result
> > > will not be less than the given threshold and the called function
> > > will do the addition without taking a lock.
>
> . . .
>
> > > ===================================================================
> > > --- a/lib/percpu_counter.c
> > > +++ b/lib/percpu_counter.c
> > > @@ -239,10 +239,10 @@ int __percpu_counter_add_unless_lt(struc
> > >         goto out;
> > >
> > >         /*
> > > -        * If the counter is over the threshold and the change is less than the
> > > -        * batch size, we might be able to avoid locking.
> > > +        * If the updated counter will be over the threshold we know
> > > +        * we can safely add, and might be able to avoid locking.
> > >          */
> > > -       if (count > threshold + error && abs(amount) < batch) {
> > > +       if (count + amount > threshold + error) {
> > >                 __percpu_counter_add(fbc, amount, batch);
> > >                 ret = 1;
> > >                 goto out;
> >
> > What you have is what I first implemented following your exact
> > logic.  However, after having several assert failures in the XFS
> > code, I realised that the logic fails when there are concurrent
> > modifications with abs(amount) > batch.  To demonstrate, take the
> > initial conditions:
>
> I think we're both wrong.  First I'll point out a think-o
> you did, then I'll lay out a counterexample to your approach.

Argh, you're right.

However, if you change amount to -49 and the initial fbc->count to
306, we get 226, 146, 66, -14, so the problem I was pointing out
still stands.

> > And we've just blown through the threshold.  We modified the counter
> > and pushed the result less than the threshold, which we are not
> > supposed to do, and worse yet is the fact that we return a positive
> > number indicating that we _didn't_ bust the threshold.
>
> Now here's an example that will fail with your version, which is:
>
>         if (count > threshold + error && abs(amount) < batch) {
>                 __percpu_counter_add(fbc, amount, batch);
>
> My example is not symmetric like yours is.
>
>         threshold = 0
>         batch = 32
>         4 cpus, so (your) error = 2 * 4 * 32 = 256
>
> We'll start with 0 for all per-cpu counter values for simplicity,
> but we'll not assume that the amount being added by all CPU's is
> the same.  So here are both the current per-cpu counter values and
> the amount each racing CPU is going to try to add.
>
>         fbc->count = 257
>         CPU       0       1       2       3
>         value     0       0       0       0
>         amount  -31     -76     -76     -76

Granted, that is a possible state at the sampling point, but it
assumes that CPUs 1, 2 and 3 can complete their fbc->count
modifications before CPU 0 publishes its local CPU value.  I don't
think that race condition can occur, because CPUs 1, 2 and 3 will
take the slow path because abs(amount) >= batch and serialise on
the fbc->lock, whilst CPU 0 will not be serialised, is running
under preempt_disable() and so should complete first....

> The real problem is that we're caching the value of fbc->count, and
> as soon as we've done that it could be out of date.  To get around
> it we need to examine it under lock--preferably a reader/writer
> lock, I suppose.

No, new locks are not an option.  The point of percpu counters is
to avoid locking where possible.  Memory barriers, OTOH....

> You'll notice that fbc->count is only ever read under the
> protection of the spinlock everywhere else in the file.

It isn't, in fact.  percpu_counter_read(), for example.

> Locking is avoided only when checking whether it is possible to add
> to a per-cpu value, and that is done with preemption disabled.

Sure, those are the rules for using per-cpu data structures (i.e.
preemption needs to be disabled).  That doesn't mean we can't avoid
locking where possible....

> I have a little more below.
>
> > Analysis of the failures led me to this logic for the unlocked
> > check in my proposed patch:
> >
> > We can do unlocked modifications concurrently on all CPUs IFF
> >
> >         1. the code cannot be preempted between sampling fbc->count
> >            and calling __percpu_counter_add()
>
> This I agree with (you convinced me in your response to my
> patch 5/5).
>
> >         2. -batch < amount < batch
>
> This isn't right, or maybe just isn't relevant.  The reason is that
> it neither takes into account this CPU's counter value, nor what a
> concurrent call on the other CPU's might be adding to the net
> value.

It is supposed to take into account whether the local addition will
transgress the error bound for this CPU....

> >         3. the error bound is set to 2 * batch * nr_cpus
>
> I think this assumes that all the CPU's are adding a limited amount
> to the counter.  And I still think you're doubling the error bound
> unnecessarily (but that may reflect the implied limit).

It is the limit that is supposed to take into account the current
counter and the amount being added.  I think this is the cause of
the issue - the error bound doesn't correctly take amount into
account - it assumes that -batch < amount < batch is always true.

We can change the error bound to take amount into account, but that
still doesn't work if another CPU is adding a larger abs(amount).
So for it to work in all cases we need to set a maximum bound on
amount, which is impractical.

However, I think this error bound is still valid if we modify the
lockless code to detect if fbc->count has changed before we make
the modification.  It's only when fbc->count changes that the error
bound is invalidated if abs(amount) < batch, so checking that it
hasn't changed before applying the modification seems to solve the
problem to me.  And rather than using locking, we can use memory
barriers to ensure that we pick up any racing changes to
fbc->count.  I.e. something like:

restart:
        smp_mb();
        count = fbc->count;
        if (count > threshold + error && abs(amount) < batch) {
                s64 lcount;

                pcount = this_cpu_ptr(fbc->counters);
                lcount = *pcount + amount;
                if (lcount >= batch || lcount <= -batch) {
                        spin_lock(&fbc->lock);
                        if (fbc->count != count) {
                                /* raced with other modifications */
                                spin_unlock(&fbc->lock);
                                goto restart;
                        }
                        fbc->count += lcount;
                        *pcount = 0;
                        spin_unlock(&fbc->lock);
                } else {
                        smp_mb();
                        if (fbc->count != count)
                                goto restart;
                        *pcount = lcount;
                }
                ret = 1;
                goto out;
        }

This means that in your example (and indeed any example where
fbc->count is modified directly on another CPU), we will detect it
and redo the unlocked checks.

> Below I have a version of the code that I think would work, but it
> unfortunately requires a lock in the (presumably) normal case.

Which, unfortunately, defeats the primary purpose of using a percpu
counter in the first place (i.e. to reduce lock contention on a
global counter) and so if that is the best that can be done, there
is no point making any change.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
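The sample-and-recheck idea can be modelled sequentially as well (a standalone toy sketch, not the kernel code; the racing fold is injected at the point where another CPU could slip in, and the smp_mb() calls are unnecessary in single-threaded code, so they are omitted):

```c
#include <assert.h>
#include <stdlib.h>

#define NCPUS           4
#define BATCH           32
#define THRESHOLD       0
#define ERR_BOUND       (2 * BATCH * NCPUS)

static long long fbc_count;
static long long pcount[NCPUS];

/* one unlocked attempt as in the sketch above: sample, compute the
 * local delta, then recheck fbc->count before publishing; returns 1
 * if the local add was applied, 0 if the caller must restart or
 * take the slow path */
int attempt(int cpu, long long sampled, long long amount,
            long long racing_fold)
{
        long long lcount;

        if (!(sampled > THRESHOLD + ERR_BOUND) || llabs(amount) >= BATCH)
                return 0;
        lcount = pcount[cpu] + amount;
        if (lcount >= BATCH || lcount <= -BATCH)
                return 0;               /* locked path in the sketch */
        /* a racing CPU folds its change into the global count here */
        fbc_count += racing_fold;
        /* the re-sample detects the race and forces a restart */
        if (fbc_count != sampled)
                return 0;
        pcount[cpu] = lcount;
        return 1;
}

/* Alex's counterexample: CPUs 1-3 fold -228 total between CPU 0's
 * sample and its publish, so CPU 0's attempt is rejected */
int demo_raced(void)
{
        fbc_count = 257;
        pcount[0] = 0;
        return attempt(0, 257, -31, -228);
}

/* with no racing fold the unlocked add goes through */
int demo_clean(void)
{
        fbc_count = 257;
        pcount[0] = 0;
        return attempt(0, 257, -31, 0);
}
```

demo_raced() returns 0, so the restart loop would re-sample fbc->count = 29, fail the threshold check, and fall through to the slower paths; demo_clean() returns 1 and publishes the local delta.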