public inbox for linux-kernel@vger.kernel.org
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: zhejiang <zhe.jiang@intel.com>
Cc: linux-kernel@vger.kernel.org, yanmin_zhang@linux.intel.com
Subject: Re: [PATCH]Avoid the overflow when calculate the proportion of bdi quota
Date: Fri, 14 Dec 2007 09:47:50 +0100	[thread overview]
Message-ID: <1197622070.6895.33.camel@twins> (raw)
In-Reply-To: <1197620803.22733.13.camel@localhost.localdomain>



On Fri, 2007-12-14 at 16:26 +0800, zhejiang wrote:
> On Thu, 2007-12-13 at 22:54 +0100, Peter Zijlstra wrote:
> > On Thu, 2007-12-13 at 11:30 +0800, zhejiang wrote:
> > > __percpu_counter_add() caches the result in a percpu variable until it
> > > exceeds the batch.
> > > prop_norm_percpu() uses percpu_counter_read(&pl->events) to read
> > > the counter, and percpu_counter_add(&pl->events, -half) to halve the
> > > counter.
> > > 
> > > There are two potential problems:
> > > 1. The counter may go negative.
> > > 2. After some calculation, it may be converted into a big positive number.
> > > 
> > > 1.
> > > For example, the batch is 32. When the bdi adds 32 to pl->events,
> > > pl->events->count will be 32. Suppose one of the percpu counters is 1.
> > > 
> > > In prop_norm_percpu(), half will be 16. Because the delta stays under
> > > the batch, pl->events->count won't be modified, and one of the percpu
> > > counters may become -15. If prop_norm_percpu() is called again, half
> > > will still be 16, though it should be 8, and the percpu counter may
> > > become -31. At this point pl->events->count is still 32.
> > > On the third calculation, the percpu counter will reach -47, which
> > > will be merged into pl->events->count. Then pl->events->count will be
> > > negative.
> > > 
> > > 2. When pl->events->count is negative,
> > > unsigned long val = percpu_counter_read(&pl->events);
> > > this statement may return a negative number, so val becomes a huge
> > > value. Because of the overflow, pl->events->count will be converted
> > > into a big positive number after some calculation.
> > > 
> > > Because of this overflow, I have caught some very big numerators when
> > > calling prop_fraction_percpu().
> > > 
> > > I think it should use percpu_counter_sum() instead of
> > > percpu_counter_read() to be more robust.
> > > 
> > > diff -Nur a/proportions.c b/proportions.c
> > > --- a/proportions.c     2007-12-12 11:05:59.000000000 +0800
> > > +++ b/proportions.c     2007-12-13 11:05:40.000000000 +0800
> > > @@ -241,7 +241,7 @@
> > >          * can never result in a negative number.
> > >          */
> > >         while (pl->period != global_period) {
> > > -               unsigned long val = percpu_counter_read(&pl->events);
> > > +               unsigned long val = percpu_counter_sum(&pl->events);
> > >                 unsigned long half = (val + 1) >> 1;
> > >  
> > >                 /*
> > > 
> > 
> > _sum is unacceptably slow here. I'm thinking something like
> > bdi_stat_error() as used in balance_dirty_pages() would also solve this
> > issue, no?
> 
> _sum is slow, but it can avoid over-subtraction when the counter is less
> than 2 * BDI_STAT_BATCH.
> 
> How about doing the calculation like this patch?
> 
> Change val and half to a signed long type; if we find that val is
> negative, it means we have subtracted too much, so we actually add
> some value back to it.
> 
> diff -Nur a/proportions.c b/proportions.c
> --- a/proportions.c     2007-12-12 11:05:59.000000000 +0800
> +++ b/proportions.c     2007-12-14 16:21:08.000000000 +0800
> @@ -241,8 +241,18 @@
>          * can never result in a negative number.
>          */
>         while (pl->period != global_period) {
> -               unsigned long val = percpu_counter_read(&pl->events);
> -               unsigned long half = (val + 1) >> 1;
> +               long val = percpu_counter_read(&pl->events);
> +               long half;
> +
> +               if (val >= 0) {
> +                       half = (val + 1) >> 1;
> +               } else {
> +                       /*
> +                        * If val is negative, the counter has been
> +                        * subtracted too much. Do some compensation here.
> +                        */
> +                       half = (val - 1) / 2;
> +               }
>  
>                 /*
>                  * Half of zero won't be much less, break out.

How about something like this:

diff --git a/lib/proportions.c b/lib/proportions.c
index 332d8c5..4afb330 100644
--- a/lib/proportions.c
+++ b/lib/proportions.c
@@ -190,6 +190,8 @@ prop_adjust_shift(int *pl_shift, unsigned long *pl_period, int new_shift)
  * PERCPU
  */
 
+#define PROP_BATCH (8*(1+ilog2(nr_cpu_ids)))
+
 int prop_local_init_percpu(struct prop_local_percpu *pl)
 {
 	spin_lock_init(&pl->lock);
@@ -203,6 +205,16 @@ void prop_local_destroy_percpu(struct prop_local_percpu *pl)
 	percpu_counter_destroy(&pl->events);
 }
 
+static unsigned long prop_percpu_read(struct prop_local_percpu *pl)
+{
+	unsigned long val = percpu_counter_read(&pl->events);
+
+	if (val < (nr_cpu_ids * PROP_BATCH))
+		val = percpu_counter_sum(&pl->events);
+
+	return val;
+}
+
 /*
  * Catch up with missed period expirations.
  *
@@ -241,7 +253,7 @@ void prop_norm_percpu(struct prop_global *pg, struct prop_local_percpu *pl)
 	 * can never result in a negative number.
 	 */
 	while (pl->period != global_period) {
-		unsigned long val = percpu_counter_read(&pl->events);
+		unsigned long val = prop_percpu_read(pl);
 		unsigned long half = (val + 1) >> 1;
 
 		/*
@@ -252,7 +264,7 @@ void prop_norm_percpu(struct prop_global *pg, struct prop_local_percpu *pl)
 		if (!val)
 			break;
 
-		percpu_counter_add(&pl->events, -half);
+		__percpu_counter_add(&pl->events, -half, PROP_BATCH);
 		pl->period += period;
 	}
 	pl->period = global_period;
@@ -267,7 +279,7 @@ void __prop_inc_percpu(struct prop_descriptor *pd, struct prop_local_percpu *pl)
 	struct prop_global *pg = prop_get_global(pd);
 
 	prop_norm_percpu(pg, pl);
-	percpu_counter_add(&pl->events, 1);
+	__percpu_counter_add(&pl->events, 1, PROP_BATCH);
 	percpu_counter_add(&pg->events, 1);
 	prop_put_global(pd, pg);
 }




Thread overview: 8+ messages
2007-12-13  3:30 [PATCH]Avoid the overflow when calculate the proportion of bdi quota zhejiang
2007-12-13 21:54 ` Peter Zijlstra
2007-12-14  8:26   ` zhejiang
2007-12-14  8:47     ` Peter Zijlstra [this message]
2007-12-14  9:49       ` Peter Zijlstra
2007-12-14 16:01     ` [PATCH] lib: proportion: fix underflow in prop_norm_percpu() Peter Zijlstra
2007-12-17  1:55       ` Jiang zhe
2007-12-17  5:24       ` Jiang zhe
