linux-mm.kvack.org archive mirror
* percpu: preemptless __per_cpu_counter_add
@ 2011-04-13 14:45 Christoph Lameter
  2011-04-13 16:49 ` Christoph Lameter
  0 siblings, 1 reply; 64+ messages in thread
From: Christoph Lameter @ 2011-04-13 14:45 UTC (permalink / raw)
  To: Tejun Heo; +Cc: akpm, linux-mm, eric.dumazet

Use this_cpu_cmpxchg to avoid preempt_disable/enable in
__percpu_counter_add.

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 lib/percpu_counter.c |   27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

Index: linux-2.6/lib/percpu_counter.c
===================================================================
--- linux-2.6.orig/lib/percpu_counter.c	2011-04-13 09:26:19.000000000 -0500
+++ linux-2.6/lib/percpu_counter.c	2011-04-13 09:36:37.000000000 -0500
@@ -71,19 +71,22 @@ EXPORT_SYMBOL(percpu_counter_set);

 void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch)
 {
-	s64 count;
+	s64 count, new;

-	preempt_disable();
-	count = __this_cpu_read(*fbc->counters) + amount;
-	if (count >= batch || count <= -batch) {
-		spin_lock(&fbc->lock);
-		fbc->count += count;
-		__this_cpu_write(*fbc->counters, 0);
-		spin_unlock(&fbc->lock);
-	} else {
-		__this_cpu_write(*fbc->counters, count);
-	}
-	preempt_enable();
+	do {
+		count = this_cpu_read(*fbc->counters);
+
+		new = count + amount;
+		/* In case of overflow fold it into the global counter instead */
+		if (new >= batch || new <= -batch) {
+			spin_lock(&fbc->lock);
+			fbc->count += __this_cpu_read(*fbc->counters) + amount;
+			spin_unlock(&fbc->lock);
+			amount = 0;
+			new = 0;
+		}
+
+	} while (this_cpu_cmpxchg(*fbc->counters, count, new) != count);
 }
 EXPORT_SYMBOL(__percpu_counter_add);


* Re: percpu: preemptless __per_cpu_counter_add
  2011-04-13 14:45 percpu: preemptless __per_cpu_counter_add Christoph Lameter
@ 2011-04-13 16:49 ` Christoph Lameter
  2011-04-13 18:56   ` Tejun Heo
  0 siblings, 1 reply; 64+ messages in thread
From: Christoph Lameter @ 2011-04-13 16:49 UTC (permalink / raw)
  To: Tejun Heo; +Cc: akpm, linux-mm, eric.dumazet

Duh, the retry setup when the value overflows is not correct: if the cmpxchg
fails after the fold, the percpu value that was already added into the global
count stays in the percpu counter and gets counted again on the retry.

Signed-off-by: Christoph Lameter <cl@linux.com>


---
 lib/percpu_counter.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Index: linux-2.6/lib/percpu_counter.c
===================================================================
--- linux-2.6.orig/lib/percpu_counter.c	2011-04-13 11:43:23.000000000 -0500
+++ linux-2.6/lib/percpu_counter.c	2011-04-13 11:43:30.000000000 -0500
@@ -80,9 +80,14 @@ void __percpu_counter_add(struct percpu_
 		/* In case of overflow fold it into the global counter instead */
 		if (new >= batch || new <= -batch) {
 			spin_lock(&fbc->lock);
-			fbc->count += __this_cpu_read(*fbc->counters) + amount;
+			count = __this_cpu_read(*fbc->counters);
+			fbc->count += count + amount;
 			spin_unlock(&fbc->lock);
-			amount = 0;
+			/*
+			 * If cmpxchg fails then we need to subtract the amount that
+			 * we found in the percpu value.
+			 */
+			amount = -count;
 			new = 0;
 		}


* Re: percpu: preemptless __per_cpu_counter_add
  2011-04-13 16:49 ` Christoph Lameter
@ 2011-04-13 18:56   ` Tejun Heo
  2011-04-13 20:22     ` [PATCH] " Christoph Lameter
  0 siblings, 1 reply; 64+ messages in thread
From: Tejun Heo @ 2011-04-13 18:56 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linux-mm, eric.dumazet

On Wed, Apr 13, 2011 at 11:49:51AM -0500, Christoph Lameter wrote:
> Duh the retry setup if the number overflows is not correct.
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>

Can you please repost folded patch with proper [PATCH] subject line
and cc shaohua.li@intel.com so that he can resolve conflicts?

Thanks.

-- 
tejun


* [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-13 18:56   ` Tejun Heo
@ 2011-04-13 20:22     ` Christoph Lameter
  2011-04-13 21:50       ` Tejun Heo
  2011-04-14  2:14       ` Eric Dumazet
  0 siblings, 2 replies; 64+ messages in thread
From: Christoph Lameter @ 2011-04-13 20:22 UTC (permalink / raw)
  To: Tejun Heo; +Cc: akpm, linux-mm, eric.dumazet, shaohua.li

On Thu, 14 Apr 2011, Tejun Heo wrote:

> On Wed, Apr 13, 2011 at 11:49:51AM -0500, Christoph Lameter wrote:
> > Duh the retry setup if the number overflows is not correct.
> >
> > Signed-off-by: Christoph Lameter <cl@linux.com>
>
> Can you please repost folded patch with proper [PATCH] subject line
> and cc shaohua.li@intel.com so that he can resolve conflicts?
>
> Thanks.

Ok here it is:




From: Christoph Lameter <cl@linux.com>
Subject: [PATCH] percpu: preemptless __per_cpu_counter_add

Use this_cpu_cmpxchg to avoid preempt_disable/enable in __percpu_counter_add.

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 lib/percpu_counter.c |   27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

Index: linux-2.6/lib/percpu_counter.c
===================================================================
--- linux-2.6.orig/lib/percpu_counter.c	2011-04-13 09:26:19.000000000 -0500
+++ linux-2.6/lib/percpu_counter.c	2011-04-13 09:36:37.000000000 -0500
@@ -71,19 +71,22 @@ EXPORT_SYMBOL(percpu_counter_set);

 void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch)
 {
-	s64 count;
+	s64 count, new;

-	preempt_disable();
-	count = __this_cpu_read(*fbc->counters) + amount;
-	if (count >= batch || count <= -batch) {
-		spin_lock(&fbc->lock);
-		fbc->count += count;
-		__this_cpu_write(*fbc->counters, 0);
-		spin_unlock(&fbc->lock);
-	} else {
-		__this_cpu_write(*fbc->counters, count);
-	}
-	preempt_enable();
+	do {
+		count = this_cpu_read(*fbc->counters);
+
+		new = count + amount;
+		/* In case of overflow fold it into the global counter instead */
+		if (new >= batch || new <= -batch) {
+			spin_lock(&fbc->lock);
+			fbc->count += __this_cpu_read(*fbc->counters) + amount;
+			spin_unlock(&fbc->lock);
+			amount = 0;
+			new = 0;
+		}
+
+	} while (this_cpu_cmpxchg(*fbc->counters, count, new) != count);
 }
 EXPORT_SYMBOL(__percpu_counter_add);


* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-13 20:22     ` [PATCH] " Christoph Lameter
@ 2011-04-13 21:50       ` Tejun Heo
  2011-04-13 22:17         ` Christoph Lameter
  2011-04-14  2:14       ` Eric Dumazet
  1 sibling, 1 reply; 64+ messages in thread
From: Tejun Heo @ 2011-04-13 21:50 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linux-mm, eric.dumazet, shaohua.li

Hello,

On Wed, Apr 13, 2011 at 03:22:36PM -0500, Christoph Lameter wrote:
> +	do {
> +		count = this_cpu_read(*fbc->counters);
> +
> +		new = count + amount;
> +		/* In case of overflow fold it into the global counter instead */
> +		if (new >= batch || new <= -batch) {
> +			spin_lock(&fbc->lock);
> +			fbc->count += __this_cpu_read(*fbc->counters) + amount;
> +			spin_unlock(&fbc->lock);
> +			amount = 0;
> +			new = 0;
> +		}
> +
> +	} while (this_cpu_cmpxchg(*fbc->counters, count, new) != count);

Is this correct?  If the percpu count changes in the middle, doesn't
the count get added twice?  Can you please use the cmpxchg() only in
the fast path?  ie.

	do {
		count = this_cpu_read();
		if (overflow) {
			disable preemption and do the slow thing.
			return;
		}
	} while (this_cpu_cmpxchg());
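
Spelled out against __percpu_counter_add(), a sketch of that structure might
look like this (an illustration only, essentially what the later fixup in this
thread ends up doing):

void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch)
{
	s64 count, new;

	do {
		count = this_cpu_read(*fbc->counters);
		new = count + amount;
		if (new >= batch || new <= -batch) {
			/*
			 * Slow path: spin_lock() disables preemption, so the
			 * __this_cpu accesses below operate on one CPU's
			 * counter.  Returning here avoids retrying the
			 * cmpxchg, so nothing can be folded in twice.
			 */
			spin_lock(&fbc->lock);
			fbc->count += __this_cpu_read(*fbc->counters) + amount;
			__this_cpu_write(*fbc->counters, 0);
			spin_unlock(&fbc->lock);
			return;
		}
	} while (this_cpu_cmpxchg(*fbc->counters, count, new) != count);
}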

Thanks.

-- 
tejun


* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-13 21:50       ` Tejun Heo
@ 2011-04-13 22:17         ` Christoph Lameter
  2011-04-13 22:23           ` Christoph Lameter
  0 siblings, 1 reply; 64+ messages in thread
From: Christoph Lameter @ 2011-04-13 22:17 UTC (permalink / raw)
  To: Tejun Heo; +Cc: akpm, linux-mm, eric.dumazet, shaohua.li

On Thu, 14 Apr 2011, Tejun Heo wrote:

> Hello,
>
> On Wed, Apr 13, 2011 at 03:22:36PM -0500, Christoph Lameter wrote:
> > +	do {
> > +		count = this_cpu_read(*fbc->counters);
> > +
> > +		new = count + amount;
> > +		/* In case of overflow fold it into the global counter instead */
> > +		if (new >= batch || new <= -batch) {
> > +			spin_lock(&fbc->lock);
> > +			fbc->count += __this_cpu_read(*fbc->counters) + amount;
> > +			spin_unlock(&fbc->lock);
> > +			amount = 0;
> > +			new = 0;
> > +		}
> > +
> > +	} while (this_cpu_cmpxchg(*fbc->counters, count, new) != count);
>
> Is this correct?  If the percpu count changes in the middle, doesn't
> the count get added twice?  Can you please use the cmpxchg() only in
> the fast path?  ie.

Oh gosh this is the old version. The fixed version should do this
differently.

We need to update the cpu-specific counter as well, otherwise it will
overflow again soon. Hmmm... Yes, it could be done while holding the
spinlock, which disables preemption.



Subject: [PATCH] percpu: preemptless __per_cpu_counter_add

Use this_cpu_cmpxchg to avoid preempt_disable/enable in __percpu_counter_add.

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 lib/percpu_counter.c |   32 ++++++++++++++++++++------------
 1 file changed, 20 insertions(+), 12 deletions(-)

Index: linux-2.6/lib/percpu_counter.c
===================================================================
--- linux-2.6.orig/lib/percpu_counter.c	2011-04-13 15:19:54.000000000 -0500
+++ linux-2.6/lib/percpu_counter.c	2011-04-13 15:20:04.000000000 -0500
@@ -71,19 +71,27 @@ EXPORT_SYMBOL(percpu_counter_set);

 void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch)
 {
-	s64 count;
+	s64 count, new;

-	preempt_disable();
-	count = __this_cpu_read(*fbc->counters) + amount;
-	if (count >= batch || count <= -batch) {
-		spin_lock(&fbc->lock);
-		fbc->count += count;
-		__this_cpu_write(*fbc->counters, 0);
-		spin_unlock(&fbc->lock);
-	} else {
-		__this_cpu_write(*fbc->counters, count);
-	}
-	preempt_enable();
+	do {
+		count = this_cpu_read(*fbc->counters);
+
+		new = count + amount;
+		/* In case of overflow fold it into the global counter instead */
+		if (new >= batch || new <= -batch) {
+			spin_lock(&fbc->lock);
+			count = __this_cpu_read(*fbc->counters);
+			fbc->count += count + amount;
+			spin_unlock(&fbc->lock);
+			/*
+			 * If cmpxchg fails then we need to subtract the amount that
+			 * we found in the percpu value.
+			 */
+			amount = -count;
+			new = 0;
+		}
+
+	} while (this_cpu_cmpxchg(*fbc->counters, count, new) != count);
 }
 EXPORT_SYMBOL(__percpu_counter_add);


* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-13 22:17         ` Christoph Lameter
@ 2011-04-13 22:23           ` Christoph Lameter
  2011-04-13 23:55             ` Tejun Heo
  0 siblings, 1 reply; 64+ messages in thread
From: Christoph Lameter @ 2011-04-13 22:23 UTC (permalink / raw)
  To: Tejun Heo; +Cc: akpm, linux-mm, eric.dumazet, shaohua.li


Suggested fixup. Return from slowpath and update percpu variable under
spinlock.

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 lib/percpu_counter.c |    8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

Index: linux-2.6/lib/percpu_counter.c
===================================================================
--- linux-2.6.orig/lib/percpu_counter.c	2011-04-13 17:20:41.000000000 -0500
+++ linux-2.6/lib/percpu_counter.c	2011-04-13 17:21:33.000000000 -0500
@@ -82,13 +82,9 @@ void __percpu_counter_add(struct percpu_
 			spin_lock(&fbc->lock);
 			count = __this_cpu_read(*fbc->counters);
 			fbc->count += count + amount;
+			__this_cpu_write(*fbc->counters, 0);
 			spin_unlock(&fbc->lock);
-			/*
-			 * If cmpxchg fails then we need to subtract the amount that
-			 * we found in the percpu value.
-			 */
-			amount = -count;
-			new = 0;
+			return;
 		}

 	} while (this_cpu_cmpxchg(*fbc->counters, count, new) != count);


* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-13 22:23           ` Christoph Lameter
@ 2011-04-13 23:55             ` Tejun Heo
  2011-04-14  2:00               ` Eric Dumazet
  0 siblings, 1 reply; 64+ messages in thread
From: Tejun Heo @ 2011-04-13 23:55 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linux-mm, eric.dumazet, shaohua.li

Hello, Christoph.

On Wed, Apr 13, 2011 at 05:23:04PM -0500, Christoph Lameter wrote:
> 
> Suggested fixup. Return from slowpath and update percpu variable under
> spinlock.
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>
> 
> ---
>  lib/percpu_counter.c |    8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> Index: linux-2.6/lib/percpu_counter.c
> ===================================================================
> --- linux-2.6.orig/lib/percpu_counter.c	2011-04-13 17:20:41.000000000 -0500
> +++ linux-2.6/lib/percpu_counter.c	2011-04-13 17:21:33.000000000 -0500
> @@ -82,13 +82,9 @@ void __percpu_counter_add(struct percpu_
>  			spin_lock(&fbc->lock);
>  			count = __this_cpu_read(*fbc->counters);
>  			fbc->count += count + amount;
> +			__this_cpu_write(*fbc->counters, 0);
>  			spin_unlock(&fbc->lock);
> -			/*
> -			 * If cmpxchg fails then we need to subtract the amount that
> -			 * we found in the percpu value.
> -			 */
> -			amount = -count;
> -			new = 0;
> +			return;

Yeah, looks pretty good to me now.  Just a couple more things.

* Please fold this one into the original patch.

* While you're restructuring the functions, can you add unlikely to
  the slow path?

It now looks correct to me but just in case, Eric, do you mind
reviewing and acking it?

Thanks.

-- 
tejun


* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-13 23:55             ` Tejun Heo
@ 2011-04-14  2:00               ` Eric Dumazet
  0 siblings, 0 replies; 64+ messages in thread
From: Eric Dumazet @ 2011-04-14  2:00 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Christoph Lameter, akpm, linux-mm, shaohua.li

On Thursday, 14 April 2011 at 08:55 +0900, Tejun Heo wrote:
> Hello, Christoph.
> 
> On Wed, Apr 13, 2011 at 05:23:04PM -0500, Christoph Lameter wrote:
> > 
> > Suggested fixup. Return from slowpath and update percpu variable under
> > spinlock.
> > 
> > Signed-off-by: Christoph Lameter <cl@linux.com>
> > 
> > ---
> >  lib/percpu_counter.c |    8 ++------
> >  1 file changed, 2 insertions(+), 6 deletions(-)
> > 
> > Index: linux-2.6/lib/percpu_counter.c
> > ===================================================================
> > --- linux-2.6.orig/lib/percpu_counter.c	2011-04-13 17:20:41.000000000 -0500
> > +++ linux-2.6/lib/percpu_counter.c	2011-04-13 17:21:33.000000000 -0500
> > @@ -82,13 +82,9 @@ void __percpu_counter_add(struct percpu_
> >  			spin_lock(&fbc->lock);
> >  			count = __this_cpu_read(*fbc->counters);
> >  			fbc->count += count + amount;
> > +			__this_cpu_write(*fbc->counters, 0);
> >  			spin_unlock(&fbc->lock);
> > -			/*
> > -			 * If cmpxchg fails then we need to subtract the amount that
> > -			 * we found in the percpu value.
> > -			 */
> > -			amount = -count;
> > -			new = 0;
> > +			return;
> 
> Yeah, looks pretty good to me now.  Just a couple more things.
> 
> * Please fold this one into the original patch.
> 
> * While you're restructuring the functions, can you add unlikely to
>   the slow path?
> 
> It now looks correct to me but just in case, Eric, do you mind
> reviewing and acking it?
> 
> Thanks.
> 

I am not sure it's worth it, considering we hit this on machines where
preemption is off (CONFIG_PREEMPT_NONE=y)?




* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-13 20:22     ` [PATCH] " Christoph Lameter
  2011-04-13 21:50       ` Tejun Heo
@ 2011-04-14  2:14       ` Eric Dumazet
  2011-04-14 21:10         ` Christoph Lameter
  1 sibling, 1 reply; 64+ messages in thread
From: Eric Dumazet @ 2011-04-14  2:14 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Tejun Heo, akpm, linux-mm, shaohua.li

On Wednesday, 13 April 2011 at 15:22 -0500, Christoph Lameter wrote:
> On Thu, 14 Apr 2011, Tejun Heo wrote:
> 
> > On Wed, Apr 13, 2011 at 11:49:51AM -0500, Christoph Lameter wrote:
> > > Duh the retry setup if the number overflows is not correct.
> > >
> > > Signed-off-by: Christoph Lameter <cl@linux.com>
> >
> > Can you please repost folded patch with proper [PATCH] subject line
> > and cc shaohua.li@intel.com so that he can resolve conflicts?
> >
> > Thanks.
> 
> Ok here it is:
> 
> 
> 
> 
> From: Christoph Lameter <cl@linux.com>
> Subject: [PATCH] percpu: preemptless __per_cpu_counter_add
> 
> Use this_cpu_cmpxchg to avoid preempt_disable/enable in __percpu_add.
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>
> 
> ---
>  lib/percpu_counter.c |   27 +++++++++++++++------------
>  1 file changed, 15 insertions(+), 12 deletions(-)
> 
> Index: linux-2.6/lib/percpu_counter.c
> ===================================================================
> --- linux-2.6.orig/lib/percpu_counter.c	2011-04-13 09:26:19.000000000 -0500
> +++ linux-2.6/lib/percpu_counter.c	2011-04-13 09:36:37.000000000 -0500
> @@ -71,19 +71,22 @@ EXPORT_SYMBOL(percpu_counter_set);
> 
>  void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch)
>  {
> -	s64 count;
> +	s64 count, new;
> 
> -	preempt_disable();
> -	count = __this_cpu_read(*fbc->counters) + amount;
> -	if (count >= batch || count <= -batch) {
> -		spin_lock(&fbc->lock);
> -		fbc->count += count;
> -		__this_cpu_write(*fbc->counters, 0);
> -		spin_unlock(&fbc->lock);
> -	} else {
> -		__this_cpu_write(*fbc->counters, count);
> -	}
> -	preempt_enable();
> +	do {
> +		count = this_cpu_read(*fbc->counters);
> +
> +		new = count + amount;
> +		/* In case of overflow fold it into the global counter instead */
> +		if (new >= batch || new <= -batch) {
> +			spin_lock(&fbc->lock);
> +			fbc->count += __this_cpu_read(*fbc->counters) + amount;
> +			spin_unlock(&fbc->lock);
> +			amount = 0;
> +			new = 0;
> +		}
> +
> +	} while (this_cpu_cmpxchg(*fbc->counters, count, new) != count);
>  }
>  EXPORT_SYMBOL(__percpu_counter_add);
> 

Not sure it's a win for my servers, where CONFIG_PREEMPT_NONE=y.

Maybe use the latest cmpxchg16b stuff here instead and get rid of the spinlock?




* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-14  2:14       ` Eric Dumazet
@ 2011-04-14 21:10         ` Christoph Lameter
  2011-04-14 21:15           ` Tejun Heo
  0 siblings, 1 reply; 64+ messages in thread
From: Christoph Lameter @ 2011-04-14 21:10 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Tejun Heo, akpm, linux-mm, shaohua.li

On Thu, 14 Apr 2011, Eric Dumazet wrote:

> Not sure its a win for my servers, where CONFIG_PREEMPT_NONE=y

Well the fast path would then also be irq safe. Does that bring us
anything?

We could skip the cmpxchg in the !PREEMPT case and instead simply store
the value.

The preempt on/off seems to be a bigger deal for realtime.

> Maybe use here latest cmpxchg16b stuff instead and get rid of spinlock ?

Shaohua already got an atomic in there. You mean get rid of his preempt
disable/enable in the slow path?


* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-14 21:10         ` Christoph Lameter
@ 2011-04-14 21:15           ` Tejun Heo
  2011-04-15 17:37             ` Christoph Lameter
  0 siblings, 1 reply; 64+ messages in thread
From: Tejun Heo @ 2011-04-14 21:15 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Eric Dumazet, akpm, linux-mm, shaohua.li

Hello,

On Thu, Apr 14, 2011 at 04:10:34PM -0500, Christoph Lameter wrote:
> On Thu, 14 Apr 2011, Eric Dumazet wrote:
> 
> > Not sure its a win for my servers, where CONFIG_PREEMPT_NONE=y
> 
> Well the fast path would then also be irq safe. Does that bring us
> anything?
> 
> We could not do the cmpxchg in the !PREEMPT case and instead simply store
> the value.
> 
> The preempt on/off seems to be a bigger deal for realtime.

Also, the cmpxchg used is a local one w/o LOCK prefix.  It might not
bring anything to the table on !PREEMPT kernels but at the same time it
shouldn't hurt either.  One way or the other, some benchmark numbers
showing that it at least doesn't hurt would be nice.

> > Maybe use here latest cmpxchg16b stuff instead and get rid of spinlock ?
> 
> Shaohua already got an atomic in there. You mean get rid of his preempt
> disable/enable in the slow path?

I personally care much less about slow path.  According to Shaohua,
atomic64_t behaves pretty nice and it isn't too complex, so I'd like
to stick with that unless complex this_cpu ops can deliver something
much better.

Thanks.

-- 
tejun


* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-14 21:15           ` Tejun Heo
@ 2011-04-15 17:37             ` Christoph Lameter
  2011-04-15 18:27               ` Tejun Heo
  0 siblings, 1 reply; 64+ messages in thread
From: Christoph Lameter @ 2011-04-15 17:37 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Eric Dumazet, akpm, linux-mm, shaohua.li

On Fri, 15 Apr 2011, Tejun Heo wrote:

> > The preempt on/off seems to be a bigger deal for realtime.
>
> Also, the cmpxchg used is local one w/o LOCK prefix.  It might not
> bring anything to table on !PREEMPT kernels but at the same time it
> shouldn't hurt either.  One way or the other, some benchmark numbers
> showing that it at least doesn't hurt would be nice.

Maybe just fall back to this_cpu_write()?

> > > Maybe use here latest cmpxchg16b stuff instead and get rid of spinlock ?
> >
> > Shaohua already got an atomic in there. You mean get rid of his preempt
> > disable/enable in the slow path?
>
> I personally care much less about slow path.  According to Shaohua,
> atomic64_t behaves pretty nice and it isn't too complex, so I'd like
> to stick with that unless complex this_cpu ops can deliver something
> much better.

Ok here is a new patch that will allow Shaohua to simply convert the
slowpath to a single atomic op. No preemption anymore anywhere.



Subject: percpu: preemptless __per_cpu_counter_add V3

Use this_cpu_cmpxchg to avoid preempt_disable/enable in __percpu_counter_add.

V3 - separate out the slow path so that the slowpath can also be done with
	a simple atomic add without preemption enable/disable.
   - Fallback in the !PREEMPT case to a simple this_cpu_write().

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 lib/percpu_counter.c |   29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

Index: linux-2.6/lib/percpu_counter.c
===================================================================
--- linux-2.6.orig/lib/percpu_counter.c	2011-04-13 17:12:59.000000000 -0500
+++ linux-2.6/lib/percpu_counter.c	2011-04-15 12:34:39.000000000 -0500
@@ -71,19 +71,30 @@ EXPORT_SYMBOL(percpu_counter_set);

 void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch)
 {
-	s64 count;
+	s64 count, new, overflow;

-	preempt_disable();
-	count = __this_cpu_read(*fbc->counters) + amount;
-	if (count >= batch || count <= -batch) {
+	do {
+		overflow = 0;
+		count = this_cpu_read(*fbc->counters);
+
+		new = count + amount;
+		/* In case of overflow fold it into the global counter instead */
+		if (new >= batch || new <= -batch) {
+			overflow = new;
+			new = 0;
+		}
+#ifdef CONFIG_PREEMPT
+	} while (this_cpu_cmpxchg(*fbc->counters, count, new) != count);
+#else
+	} while (0);
+	this_cpu_write(*fbc->counters, new);
+#endif
+
+	if (unlikely(overflow)) {
 		spin_lock(&fbc->lock);
-		fbc->count += count;
-		__this_cpu_write(*fbc->counters, 0);
+		fbc->count += overflow;
 		spin_unlock(&fbc->lock);
-	} else {
-		__this_cpu_write(*fbc->counters, count);
 	}
-	preempt_enable();
 }
 EXPORT_SYMBOL(__percpu_counter_add);


* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-15 17:37             ` Christoph Lameter
@ 2011-04-15 18:27               ` Tejun Heo
  2011-04-15 19:43                 ` Christoph Lameter
  0 siblings, 1 reply; 64+ messages in thread
From: Tejun Heo @ 2011-04-15 18:27 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Eric Dumazet, akpm, linux-mm, shaohua.li

Hello, Christoph.

>  void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch)
>  {
> -	s64 count;
> +	s64 count, new, overflow;
> 
> -	preempt_disable();
> -	count = __this_cpu_read(*fbc->counters) + amount;
> -	if (count >= batch || count <= -batch) {
> +	do {
> +		overflow = 0;
> +		count = this_cpu_read(*fbc->counters);
> +
> +		new = count + amount;
> +		/* In case of overflow fold it into the global counter instead */
> +		if (new >= batch || new <= -batch) {
> +			overflow = new;
> +			new = 0;
> +		}
> +#ifdef CONFIG_PREEMPT
> +	} while (this_cpu_cmpxchg(*fbc->counters, count, new) != count);
> +#else
> +	} while (0);
> +	this_cpu_write(*fbc->counters, new);
> +#endif

Eeeek, no.  If you want to do the above, please put it in a separate
inline function with sufficient comment.

> +	if (unlikely(overflow)) {
>  		spin_lock(&fbc->lock);
> -		fbc->count += count;
> -		__this_cpu_write(*fbc->counters, 0);
> +		fbc->count += overflow;
>  		spin_unlock(&fbc->lock);

Why put this outside and use yet another branch?

Thanks.

-- 
tejun


* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-15 18:27               ` Tejun Heo
@ 2011-04-15 19:43                 ` Christoph Lameter
  2011-04-15 23:52                   ` Tejun Heo
  0 siblings, 1 reply; 64+ messages in thread
From: Christoph Lameter @ 2011-04-15 19:43 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Eric Dumazet, akpm, linux-mm, shaohua.li

On Sat, 16 Apr 2011, Tejun Heo wrote:

> > +			new = 0;
> > +		}
> > +#ifdef CONFIG_PREEMPT
> > +	} while (this_cpu_cmpxchg(*fbc->counters, count, new) != count);
> > +#else
> > +	} while (0);
> > +	this_cpu_write(*fbc->counters, new);
> > +#endif
>
> Eeeek, no.  If you want to do the above, please put it in a separate
> inline function with sufficient comment.

That would not work well with the control flow. Just leave the cmpxchg for
both cases? That would make the function irq safe as well!

> > +	if (unlikely(overflow)) {
> >  		spin_lock(&fbc->lock);
> > -		fbc->count += count;
> > -		__this_cpu_write(*fbc->counters, 0);
> > +		fbc->count += overflow;
> >  		spin_unlock(&fbc->lock);
>
> Why put this outside and use yet another branch?

Because that way we do not need preempt enable/disable. The cmpxchg is
used to update the per cpu counter in the slow case as well. All that is
left then is to add the count to the global counter.

The branches are not an issue since they are forward branches over one
(after converting to an atomic operation) or two instructions each. A
stall is only possible if the cmpxchg fails.



* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-15 19:43                 ` Christoph Lameter
@ 2011-04-15 23:52                   ` Tejun Heo
  2011-04-18 14:38                     ` Christoph Lameter
  0 siblings, 1 reply; 64+ messages in thread
From: Tejun Heo @ 2011-04-15 23:52 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Eric Dumazet, akpm, linux-mm, shaohua.li

Hello, Christoph.

On Fri, Apr 15, 2011 at 02:43:15PM -0500, Christoph Lameter wrote:
> On Sat, 16 Apr 2011, Tejun Heo wrote:
> 
> > > +			new = 0;
> > > +		}
> > > +#ifdef CONFIG_PREEMPT
> > > +	} while (this_cpu_cmpxchg(*fbc->counters, count, new) != count);
> > > +#else
> > > +	} while (0);
> > > +	this_cpu_write(*fbc->counters, new);
> > > +#endif
> >
> > Eeeek, no.  If you want to do the above, please put it in a separate
> > inline function with sufficient comment.
> 
> That would not work well with the control flow.

It doesn't have to be that way.  ie.

	static inline bool pcnt_add_cmpxchg(counter, count, new)
	{
		/* blah blah */
	#ifdef PREEMPT
		return this_cpu_cmpxchg() == count;
	#else
		this_cpu_write();
		return true;
	#endif
	}

	void __percpu_counter_add(...)
	{
		...
		do {
			...
		} while (!pcnt_add_cmpxchg(counter, count, new))
		...
	}

It's the same thing but ifdef'd "} while()"'s are just too ugly.
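
Filled in a bit (pcnt_add_cmpxchg is a made-up name, and fbc is just taken
from the patches above), such a helper could look roughly like:

	/*
	 * Sketch only: try to install the new percpu value.  With
	 * CONFIG_PREEMPT the cmpxchg protects against being migrated between
	 * the read and the write; without CONFIG_PREEMPT the task cannot be
	 * preempted in between, so a plain write suffices and the caller's
	 * loop runs exactly once.
	 */
	static inline bool pcnt_add_cmpxchg(struct percpu_counter *fbc,
					    s64 count, s64 new)
	{
	#ifdef CONFIG_PREEMPT
		return this_cpu_cmpxchg(*fbc->counters, count, new) == count;
	#else
		this_cpu_write(*fbc->counters, new);
		return true;
	#endif
	}

__percpu_counter_add() then keeps a single, ifdef-free
"} while (!pcnt_add_cmpxchg(fbc, count, new));" at the bottom of its loop.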

> Just leave the cmpxchg for both cases? That would make the function
> irq safe as well!

Maybe, I don't know.  On x86, it shouldn't be a problem on both 32 and
64bit.  Even on archs which lack local cmpxchg, preemption flips are
cheap anyway so yeah maybe.

> > > +	if (unlikely(overflow)) {
> > >  		spin_lock(&fbc->lock);
> > > -		fbc->count += count;
> > > -		__this_cpu_write(*fbc->counters, 0);
> > > +		fbc->count += overflow;
> > >  		spin_unlock(&fbc->lock);
> >
> > Why put this outside and use yet another branch?
> 
> Because that way we do not need preempt enable/disable. The cmpxchg is
> used to update the per cpu counter in the slow case as well. All that is
> left then is to add the count to the global counter.
> 
> The branches are not an issue since they are forward branches over one
> (after converting to an atomic operation) or two instructions each. A
> possible stall is only possible in case of the cmpxchg failing.

It's slow path and IMHO it's needlessly complex.  I really don't care
whether the counter is reloaded once more or the task gets migrated to
another cpu before spin_lock() and ends up flushing local counter on a
cpu where it isn't strictly necessary.  Let's keep it simple.

Thank you.

-- 
tejun


* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-15 23:52                   ` Tejun Heo
@ 2011-04-18 14:38                     ` Christoph Lameter
  2011-04-21 14:43                       ` Tejun Heo
  0 siblings, 1 reply; 64+ messages in thread
From: Christoph Lameter @ 2011-04-18 14:38 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Eric Dumazet, akpm, linux-mm, shaohua.li

On Sat, 16 Apr 2011, Tejun Heo wrote:

> Maybe, I don't know.  On x86, it shouldn't be a problem on both 32 and
> 64bit.  Even on archs which lack local cmpxchg, preemption flips are
> cheap anyway so yeah maybe.

Preemption flips are not cheap since enabling preemption may mean a call
into the scheduler. On RT things get more expensive.

Preempt_enable means at least one additional branch. We are saving a
branch by not using preempt.

> > The branches are not an issue since they are forward branches over one
> > (after converting to an atomic operation) or two instructions each. A
> > possible stall is only possible in case of the cmpxchg failing.
>
> It's slow path and IMHO it's needlessly complex.  I really don't care
> whether the counter is reloaded once more or the task gets migrated to
> another cpu before spin_lock() and ends up flushing local counter on a
> cpu where it isn't strictly necessary.  Let's keep it simple.

In order to make it simple I avoided a preempt enable/disable. With
Shaohua's patches there will be a simple atomic_add within the last if
clause. I was able to consolidate multiple code paths into the cmpxchg
loop with this approach.
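
For context, with Shaohua's atomic64_t conversion of fbc->count (a separate
series, not shown in this thread), the last if clause of the V4 patch below
would reduce to roughly:

	if (unlikely(overflow))
		atomic64_add(overflow, &fbc->count);

(assuming fbc->count has been changed to an atomic64_t by that series).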

The one below avoids the #ifdef that is ugly...



Subject: percpu: preemptless __per_cpu_counter_add V4

Use this_cpu_cmpxchg to avoid preempt_disable/enable in __percpu_counter_add.

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 lib/percpu_counter.c |   24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

Index: linux-2.6/lib/percpu_counter.c
===================================================================
--- linux-2.6.orig/lib/percpu_counter.c	2011-04-15 15:34:23.000000000 -0500
+++ linux-2.6/lib/percpu_counter.c	2011-04-18 09:31:37.000000000 -0500
@@ -71,19 +71,25 @@ EXPORT_SYMBOL(percpu_counter_set);

 void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch)
 {
-	s64 count;
+	s64 count, new, overflow;

-	preempt_disable();
-	count = __this_cpu_read(*fbc->counters) + amount;
-	if (count >= batch || count <= -batch) {
+	do {
+		count = this_cpu_read(*fbc->counters);
+
+		new = count + amount;
+		/* In case of overflow fold it into the global counter instead */
+		if (new >= batch || new <= -batch) {
+			overflow = new;
+			new = 0;
+		} else
+			overflow = 0;
+	} while (this_cpu_cmpxchg(*fbc->counters, count, new) != count);
+
+	if (unlikely(overflow)) {
 		spin_lock(&fbc->lock);
-		fbc->count += count;
-		__this_cpu_write(*fbc->counters, 0);
+		fbc->count += overflow;
 		spin_unlock(&fbc->lock);
-	} else {
-		__this_cpu_write(*fbc->counters, count);
 	}
-	preempt_enable();
 }
 EXPORT_SYMBOL(__percpu_counter_add);



* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-18 14:38                     ` Christoph Lameter
@ 2011-04-21 14:43                       ` Tejun Heo
  2011-04-21 14:58                         ` Tejun Heo
  0 siblings, 1 reply; 64+ messages in thread
From: Tejun Heo @ 2011-04-21 14:43 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Eric Dumazet, akpm, linux-mm, shaohua.li

Hello, Christoph.

On Mon, Apr 18, 2011 at 09:38:03AM -0500, Christoph Lameter wrote:
> Preemption flips are not cheap since enabling preemption may mean a call
> into the scheduler. On RT things get more expensive.
> 
> Preempt_enable means at least one additional branch. We are saving a
> branch by not using preempt.

It is cheap.  The scheduler-related cost of preempt_enable() is a
TIF_NEED_RESCHED check.  Any schedule() call occurring afterwards is
not the overhead of the preemption check, but the overhead of preemption
itself.  Also, in cases where the preemption check doesn't make sense
(I don't think that's the case here), the right thing to do is to use
preempt_enable_no_resched().
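
To illustrate (the counter update is just a stand-in):

	preempt_disable();
	__this_cpu_add(*fbc->counters, amount);
	preempt_enable();		/* checks TIF_NEED_RESCHED; may schedule() */

	preempt_disable();
	__this_cpu_add(*fbc->counters, amount);
	preempt_enable_no_resched();	/* skips the resched check */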

> In order to make it simple I avoided an preempt enable/disable. With
> Shaohua's patches there will be a simple atomic_add within the last if
> cluase. I was able to consolidate multiple code paths into the cmpxchg
> loop with this approach.
> 
> The one below avoids the #ifdef that is ugly...

That said, combined with Shaohua's patch, maybe it's better this way.
Let's see...

Thanks.

-- 
tejun


* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-21 14:43                       ` Tejun Heo
@ 2011-04-21 14:58                         ` Tejun Heo
  2011-04-21 17:50                           ` Christoph Lameter
  0 siblings, 1 reply; 64+ messages in thread
From: Tejun Heo @ 2011-04-21 14:58 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Eric Dumazet, akpm, linux-mm, shaohua.li

Hello, Christoph, Shaohua.

On Thu, Apr 21, 2011 at 04:43:00PM +0200, Tejun Heo wrote:
> > In order to make it simple I avoided an preempt enable/disable. With
> > Shaohua's patches there will be a simple atomic_add within the last if
> > cluase. I was able to consolidate multiple code paths into the cmpxchg
> > loop with this approach.
> > 
> > The one below avoids the #ifdef that is ugly...
> 
> That said, combined with Shaohua's patch, maybe it's better this way.
> Let's see...

Unfortunately, I have a new concern for __percpu_counter_sum(), which
applies to both your and Shaohua's changes.  Before these changes,
percpu_counter->lock protects the whole batch transfer.  IOW, while
__percpu_counter_sum() is holding ->lock, it can be sure that batch
transfer from percpu counter to the main counter isn't in progress and
that the deviation it might see is limited by the number of on-going
percpu inc/dec's which is much lower than batch transfers.
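
For reference, __percpu_counter_sum() at this point looks roughly like this
(paraphrased, not quoted verbatim):

	s64 __percpu_counter_sum(struct percpu_counter *fbc)
	{
		s64 ret;
		int cpu;

		spin_lock(&fbc->lock);
		ret = fbc->count;
		for_each_online_cpu(cpu) {
			s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
			ret += *pcount;
		}
		spin_unlock(&fbc->lock);
		return ret;
	}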

With the proposed changes to percpu counter, this no longer holds.
cl's patch de-couples local counter update from the global counter
update and __percpu_counter_sum() can see batch amount of deviation
per concurrent updater making the whole visit-each-counter thing more
or less meaningless.  This, however, can be fixed by putting the whole
slow path inside spin_lock() as suggested before, so that the whole
batch transfer from local to global is enclosed inside the spinlock.

Unfortunately, Shaohua's atomic64_t update ain't that easy.  The whole
point of that update was avoiding spinlocks in favor of atomic64_t,
which naturally collides with the ability to enclosing local and
global updates into the same exclusion block, which is necessary for
__percpu_counter_sum() accuracy.

So, Christoph, please put the whole slow path inside spin_lock().
Shaohua, unfortunately, I think your change is caught between a rock
and a hard place.  Any ideas?
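
Concretely, "put the whole slow path inside spin_lock()" means something like
the earlier fixup in this thread, i.e. inside the retry loop (sketch only):

	if (new >= batch || new <= -batch) {
		/*
		 * Both the fold into fbc->count and the percpu reset happen
		 * under fbc->lock, so __percpu_counter_sum(), which also
		 * takes the lock, never sees the percpu counter cleared
		 * before fbc->count has been updated.
		 */
		spin_lock(&fbc->lock);
		fbc->count += __this_cpu_read(*fbc->counters) + amount;
		__this_cpu_write(*fbc->counters, 0);
		spin_unlock(&fbc->lock);
		return;
	}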

Thanks.

-- 
tejun


* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-21 14:58                         ` Tejun Heo
@ 2011-04-21 17:50                           ` Christoph Lameter
  2011-04-21 18:01                             ` Tejun Heo
  0 siblings, 1 reply; 64+ messages in thread
From: Christoph Lameter @ 2011-04-21 17:50 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Eric Dumazet, akpm, linux-mm, shaohua.li

On Thu, 21 Apr 2011, Tejun Heo wrote:

> Unfortunately, I have a new concern for __percpu_counter_sum(), which
> applies to both your and Shaohua's change.  Before these changes,
> percpu_counter->lock protects whole of batch transfer.  IOW, while
> __percpu_counter_sum() is holding ->lock, it can be sure that batch
> transfer from percpu counter to the main counter isn't in progress and
> that the deviation it might see is limited by the number of on-going
> percpu inc/dec's which is much lower than batch transfers.

Yes. But there was already fuzziness before this patch:
__percpu_counter_sum() does not see percpu counters that are being
updated, because access to the counters is not serialized.
There is no material difference here. The VM statistics counters work the
same way and have to deal with similar fuzziness effects.

> With the proposed changes to percpu counter, this no longer holds.
> cl's patch de-couples local counter update from the global counter
> update and __percpu_counter_sum() can see batch amount of deviation
> per concurrent updater making the whole visit-each-counter thing more
> or less meaningless.  This, however, can be fixed by putting the whole
> slow path inside spin_lock() as suggested before so that the whole
> batch transferring from local to global is enclosed inside spinlock.

The local counter increment was already decoupled before. The shifting of
the overflow into the global counter was also not serialized before.

> Unfortunately, Shaohua's atomic64_t update ain't that easy.  The whole
> point of that update was avoiding spinlocks in favor of atomic64_t,
> which naturally collides with the ability to enclosing local and
> global updates into the same exclusion block, which is necessary for
> __percpu_counter_sum() accuracy.

There was no total accuracy before either.

> So, Christoph, please put the whole slow path inside spin_lock().
> Shaohua, unfortunately, I think your change is caught inbetween rock
> and hard place.  Any ideas?

I think there is no new problem here.


* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-21 17:50                           ` Christoph Lameter
@ 2011-04-21 18:01                             ` Tejun Heo
  2011-04-21 18:20                               ` Christoph Lameter
  0 siblings, 1 reply; 64+ messages in thread
From: Tejun Heo @ 2011-04-21 18:01 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Eric Dumazet, akpm, linux-mm, shaohua.li

Hello, Christoph.

On Thu, Apr 21, 2011 at 12:50:11PM -0500, Christoph Lameter wrote:
> Yes. But there was already a fuzzyiness coming with the
> __percpu_counter_sum() not seeing the percpu counters that are being
> updated due to unserialized access to the counters before this patch.
> There is no material difference here. The VM statistics counters work the
> same way and have to deal with similar fuzziness effects.

That basically amounts to "there's no difference between
percpu_counter_sum() and percpu_counter_read()", which is true in the
sense that both don't guarantee complete correctness.

The only difference between the two is the level of fuzziness.  The
former deviates only by the number of concurrent updaters (and maybe
cacheline update latencies) while the latter may deviate in multiples
of @batch.

If you wanna say that the difference in the level of fuzziness is
irrelevant, the first patch of this series should be removing
percpu_counter_sum() before making any other changes.

So, yeah, here, the level of fuzziness matters and that's exactly why
we have the hugely costly percpu_counter_sum().  Even with the
proposed change, percpu_counter_sum() would tend to be more accurate
because the number of batch deviations is limited by the number of
concurrent updaters but it would still be much worse than before.

> The local counter increment was already decoupled before. The shifting of
> the overflow into the global counter was also not serialized before.

No, it wasn't.

	...
	if (count >= batch || count <= -batch) {
		spin_lock(&fbc->lock);
		fbc->count += count;
		__this_cpu_write(*fbc->counters, 0);
		spin_unlock(&fbc->lock);
	} else {
	...

percpu_counter_sum() would see either both the percpu and global
counters updated or un-updated.  It will never see local counter reset
with global counter not updated yet.

> There was no total accuracy before either.

It's not about total accuracy.  It's about different levels of
fuzziness.  If it can be shown that the different levels of fuzziness
doesn't matter and thus percpu_counter_sum() can be removed, I'll be a
happy camper.

Thanks.

-- 
tejun


* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-21 18:01                             ` Tejun Heo
@ 2011-04-21 18:20                               ` Christoph Lameter
  2011-04-21 18:37                                 ` Tejun Heo
  0 siblings, 1 reply; 64+ messages in thread
From: Christoph Lameter @ 2011-04-21 18:20 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Eric Dumazet, akpm, linux-mm, shaohua.li

On Thu, 21 Apr 2011, Tejun Heo wrote:

> The only difference between the two is the level of fuziness.  The
> former deviates only by the number of concurrent updaters (and maybe
> cacheline update latencies) while the latter may deviate in multiples
> of @batch.

I don't think multiples of batch are such a concern. Either the per cpu
counter is high or the overflow has been folded into the global counter.

The interregnum is very short and since the counters are already fuzzy
this is tolerable. We do the same thing elsewhere for vmstats.

> If you wanna say that the difference in the level of fuzziness is
> irrelevant, the first patch of this series should be removing
> percpu_counter_sum() before making any other changes.

percpu_counter_sum() is more accurate since it considers the per cpu
counters. That is vastly different.

> > The local counter increment was already decoupled before. The shifting of
> > the overflow into the global counter was also not serialized before.
>
> No, it wasn't.

>
> 	...
> 	if (count >= batch || count <= -batch) {
> 		spin_lock(&fbc->lock);
> 		fbc->count += count;
> 		__this_cpu_write(*fbc->counters, 0);
> 		spin_unlock(&fbc->lock);
> 	} else {
> 	...
>
> percpu_counter_sum() would see either both the percpu and global
> counters updated or un-updated.  It will never see local counter reset
> with global counter not updated yet.

Sure there is a slight race there and there is no way to avoid that race
without a lock.

> > There was no total accuracy before either.
>
> It's not about total accuracy.  It's about different levels of
> fuzziness.  If it can be shown that the different levels of fuzziness
> doesn't matter and thus percpu_counter_sum() can be removed, I'll be a
> happy camper.

percpu_counter_sum() is a totally different animal since it considers the
per cpu differentials, even though those differentials can still be updated
while it runs. So the fuzziness is much lower than just looking at the global
counter, for which all sorts of counter differentials on multiple cpus can
be outstanding over long time periods.

Look at mm/vmstat.c. There is __inc_zone_state() which does an analogous
thing, and include/linux/vmstat.h:zone_page_state_snapshot() which is
analogous to percpu_counter_sum().

In fact as far as I can tell the percpu_counter stuff was cribbed from
that one. What I did is the same process as in mm/vmstat.c:mod_state.
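
For comparison, zone_page_state_snapshot() works roughly like this
(paraphrased from include/linux/vmstat.h of this era, not quoted verbatim):
read the global atomic, then fold in the not-yet-flushed per-cpu
differentials without taking any lock.

	static inline unsigned long zone_page_state_snapshot(struct zone *zone,
						enum zone_stat_item item)
	{
		long x = atomic_long_read(&zone->vm_stat[item]);
	#ifdef CONFIG_SMP
		int cpu;

		/* Fold in the not-yet-flushed per-cpu differentials. */
		for_each_online_cpu(cpu)
			x += per_cpu_ptr(zone->pageset, cpu)->vm_stat_diff[item];

		if (x < 0)
			x = 0;
	#endif
		return x;
	}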







* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-21 18:20                               ` Christoph Lameter
@ 2011-04-21 18:37                                 ` Tejun Heo
  2011-04-21 18:54                                   ` Christoph Lameter
  0 siblings, 1 reply; 64+ messages in thread
From: Tejun Heo @ 2011-04-21 18:37 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Eric Dumazet, akpm, linux-mm, shaohua.li

Hello, Christoph.

On Thu, Apr 21, 2011 at 01:20:39PM -0500, Christoph Lameter wrote:
> I dont think multiple times of batch is such a concern. Either the per cpu
> counter is high or the overflow has been folded into the global counter.
> 
> The interregnum is very short and since the counters are already fuzzy
> this is tolerable. We do the same thing elsewhere for vmstats.

We're talking about three different levels of fuzziness.

1. percpu_counter_sum() before the changes

	May deviate by the number of concurrent updaters and cacheline
	update latencies.

2. percpu_counter_sum() after the changes

	May deviate by multiples of @batch; however, the duration
	during which the deviation may be visible is brief (really?
	we're allowing preemption between local and global updates).

3. percpu_counter_read()

	May deviate by multiples of @batch.  Deviations are visible
	almost always.

You're arguing that change from #1 to #2 should be okay, which might
as well be true, but your change per-se doesn't require such
compromise and there's no reason to bundle the two changes together,
so, again, please update your patch to avoid the transition from #1 to
#2.

Shaohua's change requires transition from #1 to #2, which might or
might not be okay.  I really don't know.  You say it should be okay as
it came from vmstat and vmstat is updated the same way; however, no
matter where it came from, percpu_counter is now used in different
places which may or may not have different expectations regarding the
level of fuzziness in percpu_counter_sum(), so we would need more than
"but vmstat does that too" to make the change.

If you haven't noticed yet, I'm not feeling too enthusiastic about
cold path optimizations.  If cold path is kicking in too often, change
the code such that things don't happen that way instead of trying to
make cold paths go faster.  Leave cold paths robust and easy to
understand.

So, unless someone can show me that percpu_counter_sum() is
unnecessary (ie. the differences between not only #1 and #2 but also
between #1 and #3 are irrelevant), I don't think I'm gonna change the
slow path.  It's silly to micro optimize slow path to begin with and
I'm not gonna do that at the cost of subtle functionality change which
can bite us in the ass in twisted ways.

Thanks.

-- 
tejun


* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-21 18:37                                 ` Tejun Heo
@ 2011-04-21 18:54                                   ` Christoph Lameter
  2011-04-21 19:08                                     ` Tejun Heo
  0 siblings, 1 reply; 64+ messages in thread
From: Christoph Lameter @ 2011-04-21 18:54 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Eric Dumazet, akpm, linux-mm, shaohua.li

On Thu, 21 Apr 2011, Tejun Heo wrote:

> > The interregnum is very short and since the counters are already fuzzy
> > this is tolerable. We do the same thing elsewhere for vmstats.
>
> We're talking about three different levels of fuzziness.
>
> 1. percpu_counter_sum() before the changes
>
> 	May deviate by the number of concurrent updaters and cacheline
> 	update latencies.

Not true. Any of the updaters may have done multiple updates by the time
we are through cycling through the list.

> 2. percpu_counter_sum() after the changes
>
> 	May deviate by multiples of @batch; however, the duration
> 	during which the deviation may be visible is brief (really?
> 	we're allowing preemption between local and global updates).

Well again there is general fuzziness here and we are trying to make the
best of it without compromising performance too much. Shaohua's numbers
indicate that removing the lock is very advantageous. Moreover, we do the
same thing in other places.

> So, unless someone can show me that percpu_counter_sum() is
> unnecessary (ie. the differences between not only #1 and #2 but also
> between #1 and #3 are irrelevant), I don't think I'm gonna change the
> slow path.  It's silly to micro optimize slow path to begin with and
> I'm not gonna do that at the cost of subtle functionality change which
> can bite us in the ass in twisted ways.

Actually it's good to make the code paths for vmstats and percpu counters
similar. That is what this does too.

Preempt enable/disable in any function that is supposedly fast is
something bad that can be avoided with these patches as well.


* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-21 18:54                                   ` Christoph Lameter
@ 2011-04-21 19:08                                     ` Tejun Heo
  2011-04-22  2:33                                       ` Shaohua Li
  0 siblings, 1 reply; 64+ messages in thread
From: Tejun Heo @ 2011-04-21 19:08 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Eric Dumazet, akpm, linux-mm, shaohua.li

Hello,

On Thu, Apr 21, 2011 at 01:54:51PM -0500, Christoph Lameter wrote:
> Well again there is general fuzziness here and we are trying to make the
> best of it without compromising performance too much. Shaohua's numbers
> indicate that removing the lock is very advantageous. Moreover, we do the
> same thing in other places.

The problem with Shaohua's numbers is that it's a pessimistic test
case with too low a batch count.  If an optimization improves such
situations without compromising functionality or introducing too much
complexity, sure, why not?  But I'm not sure that's the case here.

> Actually its good to make the code paths for vmstats and percpu counters
> similar. That is what this does too.
> 
> Preempt enable/disable in any function that is supposedly fast is
> something bad that can be avoided with these patches as well.

If you really wanna push the _sum() fuzziness change, the only way to
do that would be auditing all the current users and making sure that
it won't affect any of them.  It really doesn't matter what vmstat is
doing.  They're different users.

And, no matter what, that's a separate issue from the this_cpu hot
path optimizations and should be done separately.  So, _please_ update
this_cpu patch so that it doesn't change the slow path semantics.

Thank you.

-- 
tejun


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-21 19:08                                     ` Tejun Heo
@ 2011-04-22  2:33                                       ` Shaohua Li
  2011-04-26 12:10                                         ` Tejun Heo
  0 siblings, 1 reply; 64+ messages in thread
From: Shaohua Li @ 2011-04-22  2:33 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Christoph Lameter, Eric Dumazet, akpm@linux-foundation.org,
	linux-mm@kvack.org

On Fri, 2011-04-22 at 03:08 +0800, Tejun Heo wrote:
> Hello,
> 
> On Thu, Apr 21, 2011 at 01:54:51PM -0500, Christoph Lameter wrote:
> > Well again there is general fuzziness here and we are trying to make the
> > best of it without compromising performance too much. Shaohua's numbers
> > indicate that removing the lock is very advantageous. Moreover, we do the
> > same thing in other places.
> 
> The problem with Shaohua's numbers is that it's a pessimistic test
> case with too low batch count.  If an optimization improves such
> situations without compromising functionality or introducing too much
> complexity, sure, why not?  But I'm not sure that's the case here.
> 
> > Actually its good to make the code paths for vmstats and percpu counters
> > similar. That is what this does too.
> > 
> > Preempt enable/disable in any function that is supposedly fast is
> > something bad that can be avoided with these patches as well.
> 
> If you really wanna push the _sum() fuzziness change, the only way to
> do that would be auditing all the current users and making sure that
> it won't affect any of them.  It really doesn't matter what vmstat is
> doing.  They're different users.
> 
> And, no matter what, that's a separate issue from the this_cpu hot
> path optimizations and should be done separately.  So, _please_ update
> this_cpu patch so that it doesn't change the slow path semantics.
In the original implementation, an updater can change the count several
times too: it can update the count from -(batch - 1) to (batch - 1)
without holding the lock, so we always have a deviation of
batch*num_cpus*2.

If we really worry about _sum deviating too much, can we do something
like this:
percpu_counter_sum
{
again:
	sum=0
	old = atomic64_read(&fbc->counter)
	for_each_online_cpu()
		sum += per cpu counter
	new = atomic64_read(&fbc->counter)
	if (new - old > batch * num_cpus || old - new > batch * num_cpus)
		goto again;
	return new + sum;
}
In this way we limit the deviation to the number of concurrent
updaters. This doesn't make _sum too slow either, because we have the
batch * num_cpus check.
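
For concreteness, a rough C sketch of the above (untested; it assumes
the proposed conversion of fbc->count to an atomic64_t, and the bound
name is made up):

s64 percpu_counter_sum_retry(struct percpu_counter *fbc)
{
	/* illustrative bound, roughly batch * num_cpus as above */
	s64 maxdelta = (s64)percpu_counter_batch * num_online_cpus();
	s64 old, new, sum;
	int cpu;

	do {
		old = atomic64_read(&fbc->count);
		sum = 0;
		for_each_online_cpu(cpu)
			sum += *per_cpu_ptr(fbc->counters, cpu);
		new = atomic64_read(&fbc->count);
		/* retry only if the global part moved too much meanwhile */
	} while (new - old > maxdelta || old - new > maxdelta);

	return new + sum;
}

The retry only triggers when many folds into the global counter race
with the walk, so the common case is still a single pass.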


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-22  2:33                                       ` Shaohua Li
@ 2011-04-26 12:10                                         ` Tejun Heo
  2011-04-26 19:02                                           ` Hugh Dickins
  2011-04-27  5:43                                           ` Shaohua Li
  0 siblings, 2 replies; 64+ messages in thread
From: Tejun Heo @ 2011-04-26 12:10 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Christoph Lameter, Eric Dumazet, akpm@linux-foundation.org,
	linux-mm@kvack.org

Hello, please pardon the delay (and probably bad temper).  I'm still sick
& slow.

On Fri, Apr 22, 2011 at 10:33:00AM +0800, Shaohua Li wrote:
> > And, no matter what, that's a separate issue from the this_cpu hot
> > path optimizations and should be done separately.  So, _please_ update
> > this_cpu patch so that it doesn't change the slow path semantics.
>
> In the original implementation, an updater can change the count several
> times too: it can update the count from -(batch - 1) to (batch - 1)
> without holding the lock, so we always have a deviation of batch*num_cpus*2.

That would be a pathological case but, even then, after the change the
number becomes much higher as it becomes a function of batch *
num_updaters, right?

I'll try to re-summarize my concerns as my communications don't seem
to be getting through very well these few days (likely my fault).

The biggest issue I have with the change is that with the suggested
changes, the deviation seen by _sum becomes much less predictable.
_sum can't be accurate.  It never was and never will be, but the
deviations have been quite predictable regardless of @batch.  It's
dependent only on the number and frequency of concurrent updaters.

If concurrent updates aren't very frequent and numerous, the caller is
guaranteed to get a result which deviates only by quite a small margin.
If concurrent updates are very frequent and numerous, the caller
naturally can't expect a very accurate result.

However, after the change, especially with high @batch count, the
result may deviate significantly even with low frequency concurrent
updates.  @batch deviations won't happen often but will happen once in
a while, which is just nasty and makes the API much less useful and
those occasional deviations can cause sporadic erratic behaviors -
e.g. filesystems use it for free block accounting.  It's actually used
for somewhat critical decision making.

If it were in the fast path, sure, we might do that and plan for slower
contingencies where accuracy is more important, but we're talking
about the slow path already - it's visiting each per-cpu area for $DEITY's
sake, so the tradeoff doesn't make a lot of sense to me.

> if we really worry about _sum deviates too much. can we do something
> like this:
> percpu_counter_sum
> {
> again:
> 	sum=0
> 	old = atomic64_read(&fbc->counter)
> 	for_each_online_cpu()
> 		sum += per cpu counter
> 	new = atomic64_read(&fbc->counter)
> 	if (new - old > batch * num_cpus || old - new > batch * num_cpus)
> 		goto again;
> 	return new + sum;
> }
> in this way we limited the deviate to number of concurrent updater. This
> doesn't make _sum too slow too, because we have the batch * num_cpus
> check.

I don't really worry about _sum performance.  It's quite a slow path
and most of the cost is from causing cacheline bounces anyway.  That
said, I don't see how the above would help the deviation problem.
Let's say an updater reset its per-cpu counter but got preempted before
updating the global counter.  What difference does it make to check
fbc->counter before & after like above?

Thanks.

-- 
tejun


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-26 12:10                                         ` Tejun Heo
@ 2011-04-26 19:02                                           ` Hugh Dickins
  2011-04-27 10:28                                             ` Tejun Heo
  2011-04-27  5:43                                           ` Shaohua Li
  1 sibling, 1 reply; 64+ messages in thread
From: Hugh Dickins @ 2011-04-26 19:02 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Shaohua Li, Christoph Lameter, Eric Dumazet,
	akpm@linux-foundation.org, linux-mm@kvack.org

On Tue, 26 Apr 2011, Tejun Heo wrote:
> 
> However, after the change, especially with high @batch count, the
> result may deviate significantly even with low frequency concurrent
> updates.  @batch deviations won't happen often but will happen once in
> a while, which is just nasty and makes the API much less useful and
> those occasional deviations can cause sporadic erratic behaviors -
> e.g. filesystems use it for free block accounting.  It's actually used
> for somewhat critical decision making.

This worried me a little when the percpu block counting went into tmpfs,
though it's not really critical there.

Would it be feasible, with these counters that are used against limits,
to have an adaptive batching scheme such that the batches get smaller
and smaller, down to 1 and to 0, as the total approaches the limit?
(Of course a single global percpu_counter_batch won't do for this.)

Perhaps it's a demonstrable logical impossibility, perhaps it would
slow down the fast (far from limit) path more than we can afford,
perhaps I haven't read enough of this thread and I'm taking it
off-topic.  Forgive me if so.
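
To make the idea a bit more concrete, something along these lines is
what I have in mind (purely illustrative; the limit and the helper name
are invented, and reading the global count on this path may already be
more than the fast path can afford):

static s32 adaptive_batch(struct percpu_counter *used, s64 limit)
{
	s64 headroom = limit - percpu_counter_read(used);

	/* far from the limit: keep the normal, cheap batching */
	if (headroom > 2 * (s64)percpu_counter_batch * num_online_cpus())
		return percpu_counter_batch;

	/* approaching the limit: shrink the batch towards 1 */
	return max_t(s64, 1, headroom / (2 * num_online_cpus()));
}

The updater would then pass adaptive_batch(used, limit) to
__percpu_counter_add() instead of the global percpu_counter_batch.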

Hugh


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-26 12:10                                         ` Tejun Heo
  2011-04-26 19:02                                           ` Hugh Dickins
@ 2011-04-27  5:43                                           ` Shaohua Li
  2011-04-27 10:20                                             ` Tejun Heo
  1 sibling, 1 reply; 64+ messages in thread
From: Shaohua Li @ 2011-04-27  5:43 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Christoph Lameter, Eric Dumazet, akpm@linux-foundation.org,
	linux-mm@kvack.org

On Tue, 2011-04-26 at 20:10 +0800, Tejun Heo wrote:
> Hello, please pardon delay (and probably bad temper).  I'm still sick
> & slow.
no problem.

> On Fri, Apr 22, 2011 at 10:33:00AM +0800, Shaohua Li wrote:
> > > And, no matter what, that's a separate issue from the this_cpu hot
> > > path optimizations and should be done separately.  So, _please_ update
> > > this_cpu patch so that it doesn't change the slow path semantics.
> >
> > In the original implementation, an updater can change the count several
> > times too: it can update the count from -(batch - 1) to (batch - 1)
> > without holding the lock, so we always have a deviation of batch*num_cpus*2.
> 
> That would be a pathological case but, even then, after the change the
> number becomes much higher as it becomes a function of batch *
> num_updaters, right?
I don't understand the difference between batch * num_updaters and batch
* num_cpus except preempt. So the only problem here is that _add should
have preempt disabled? I agree preempt can make deviation worse.
Except for the preempt issue, are there other concerns against the atomic
convert? In the preempt-disabled case, before/after the atomic convert
the deviation is the same (batch*num_cpus).

> I'll try to re-summarize my concerns as my communications don't seem
> to be getting through very well these few days (likely my fault).
> 
> The biggest issue I have with the change is that with the suggested
> changes, the deviation seen by _sum becomes much less predictable.
> _sum can't be accurate.  It never was and never will be, but the
> deviations have been quite predictable regardless of @batch.  It's
> dependent only on the number and frequency of concurrent updaters.
> 
> If concurrent updates aren't very frequent and numerous, the caller is
> guaranteed to get a result which deviates only by quite a small margin.
> If concurrent updates are very frequent and numerous, the caller
> naturally can't expect a very accurate result.
> 
> However, after the change, especially with high @batch count, the
> result may deviate significantly even with low frequency concurrent
> updates.  @batch deviations won't happen often but will happen once in
> a while, which is just nasty and makes the API much less useful and
> those occasional deviations can cause sporadic erratic behaviors -
> e.g. filesystems use it for free block accounting.  It's actually used
> for somewhat critical decision making.
> 
> If it were in the fast path, sure, we might and plan for slower
> contingencies where accuracy is more important, but we're talking
> about slow path already - it's visiting each per-cpu area for $DEITY's
> sake, so the tradeoff doesn't make a lot of sense to me.
> 
> > if we really worry about _sum deviates too much. can we do something
> > like this:
> > percpu_counter_sum
> > {
> > again:
> > 	sum=0
> > 	old = atomic64_read(&fbc->counter)
> > 	for_each_online_cpu()
> > 		sum += per cpu counter
> > 	new = atomic64_read(&fbc->counter)
> > 	if (new - old > batch * num_cpus || old - new > batch * num_cpus)
> > 		goto again;
> > 	return new + sum;
> > }
> > in this way we limited the deviate to number of concurrent updater. This
> > doesn't make _sum too slow too, because we have the batch * num_cpus
> > check.
> 
> I don't really worry about _sum performance.  It's a quite slow path
> and most of the cost is from causing cacheline bounces anyway.  That
> said, I don't see how the above would help the deviation problem.
> Let's say an updater reset per cpu counter but got preempted before
> updating the global counter.  What differences does it make to check
> fbc->counter before & after like above?
Yes, this is a problem. Again, I don't mind disabling preempt in _add.

Thanks,
Shaohua


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-27  5:43                                           ` Shaohua Li
@ 2011-04-27 10:20                                             ` Tejun Heo
  2011-04-28  3:28                                               ` Shaohua Li
  0 siblings, 1 reply; 64+ messages in thread
From: Tejun Heo @ 2011-04-27 10:20 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Christoph Lameter, Eric Dumazet, akpm@linux-foundation.org,
	linux-mm@kvack.org

Hello, Shaohua.

On Wed, Apr 27, 2011 at 01:43:29PM +0800, Shaohua Li wrote:
> > > That would be a pathological case but, even then, after the change the
> > number becomes much higher as it becomes a function of batch *
> > num_updaters, right?
>
> I don't understand the difference between batch * num_updaters and batch
> * num_cpus except preempt. So the only problem here is _add should have
> preempt disabled? I agree preempt can make deviation worse.
> except the preempt issue, are there other concerns against the atomic
> convert? in the preempt disabled case, before/after the atomic convert
> the deviation is the same (batch*num_cpus)

Yes, with preemption disabled, I think the pathological worst case
wouldn't be too different.

> > I don't really worry about _sum performance.  It's a quite slow path
> > and most of the cost is from causing cacheline bounces anyway.  That
> > said, I don't see how the above would help the deviation problem.
> > Let's say an updater reset per cpu counter but got preempted before
> > updating the global counter.  What differences does it make to check
> > fbc->counter before & after like above?
>
> yes, this is a problem. Again I don't mind to disable preempt in _add.

Okay, this communication failure isn't my fault.  Please re-read what
I wrote before; my concern wasn't primarily about the pathological worst
case - if that many concurrent updates are happening && the counter
needs to be accurate, it can't even use an atomic counter.  It should be
doing full exclusion around the counter and the associated operation
_together_.

I'm worried about sporadic erratic behavior happening regardless of
update frequency and preemption would contribute but isn't necessary
for that to happen.

Thanks.

-- 
tejun


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-26 19:02                                           ` Hugh Dickins
@ 2011-04-27 10:28                                             ` Tejun Heo
  0 siblings, 0 replies; 64+ messages in thread
From: Tejun Heo @ 2011-04-27 10:28 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Shaohua Li, Christoph Lameter, Eric Dumazet,
	akpm@linux-foundation.org, linux-mm@kvack.org

Hello, Hugh.

On Tue, Apr 26, 2011 at 12:02:15PM -0700, Hugh Dickins wrote:
> This worried me a little when the percpu block counting went into tmpfs,
> though it's not really critical there.
> 
> Would it be feasible, with these counters that are used against limits,
> to have an adaptive batching scheme such that the batches get smaller
> and smaller, down to 1 and to 0, as the total approaches the limit?
> (Of course a single global percpu_counter_batch won't do for this.)
> 
> Perhaps it's a demonstrable logical impossibility, perhaps it would
> slow down the fast (far from limit) path more than we can afford,
> perhaps I haven't read enough of this thread and I'm taking it
> off-topic.  Forgive me if so.

Hmmm... it seems like what you're suggesting is basically a simple
per-cpu slot allocator.

* The global counter is initialized with a predefined number of slots
  and the per-cpu buffers start at zero (or with initial chunks).

* Allocation first consults the per-cpu buffer and, if enough slots are
  there, they're consumed.  If not, the global counter is accessed,
  some portion of it is transferred to the per-cpu buffer and the
  allocation is retried.  The amount transferred is adjusted according
  to the number of slots available in the global pool.

* Free populates the per-cpu buffer.  If it meets certain criteria,
  part of the buffer is returned to the global pool.

The principles are easy but, as with any per-cpu allocation scheme,
balancing and reclaiming would be the complex part, i.e. what to do
when the global and some per-cpu buffers are draining while a few
still have some left.
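
In rough code it would be something like the below (nothing more than a
sketch, all names and thresholds invented, and the rebalancing question
above left unanswered):

struct slot_pool {
	spinlock_t	lock;
	long		global;		/* slots still in the global pool */
	long __percpu	*pcpu;		/* slots cached per cpu */
};

#define SLOT_REFILL	32		/* how much to pull from the pool */
#define SLOT_HIGH	64		/* start returning above this */

static bool slot_alloc(struct slot_pool *p)
{
	long *cache = get_cpu_ptr(p->pcpu);
	bool ok = false;

	if (!*cache) {
		spin_lock(&p->lock);
		/* take less as the global pool runs low */
		*cache = min_t(long, SLOT_REFILL, p->global);
		p->global -= *cache;
		spin_unlock(&p->lock);
	}
	if (*cache) {
		(*cache)--;
		ok = true;
	}
	put_cpu_ptr(p->pcpu);
	return ok;
}

static void slot_free(struct slot_pool *p)
{
	long *cache = get_cpu_ptr(p->pcpu);

	if (++(*cache) > SLOT_HIGH) {
		long back = *cache / 2;	/* return half to the global pool */

		*cache -= back;
		spin_lock(&p->lock);
		p->global += back;
		spin_unlock(&p->lock);
	}
	put_cpu_ptr(p->pcpu);
}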

Thanks.

-- 
tejun


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-27 10:20                                             ` Tejun Heo
@ 2011-04-28  3:28                                               ` Shaohua Li
  2011-04-28 10:09                                                 ` Tejun Heo
  0 siblings, 1 reply; 64+ messages in thread
From: Shaohua Li @ 2011-04-28  3:28 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Christoph Lameter, Eric Dumazet, akpm@linux-foundation.org,
	linux-mm@kvack.org

On Wed, 2011-04-27 at 18:20 +0800, Tejun Heo wrote:
> Hello, Shaohua.
> 
> On Wed, Apr 27, 2011 at 01:43:29PM +0800, Shaohua Li wrote:
> > > That would be a pathological case but, even then, after the change the
> > > number becomes much higher as it becomes a function of batch *
> > > num_updaters, right?
> >
> > I don't understand the difference between batch * num_updaters and batch
> > * num_cpus except preempt. So the only problem here is _add should have
> > preempt disabled? I agree preempt can make deviation worse.
> > except the preempt issue, are there other concerns against the atomic
> > convert? in the preempt disabled case, before/after the atomic convert
> > the deviation is the same (batch*num_cpus)
> 
> Yes, with preemption disabled, I think the pathological worst case
> wouldn't be too different.
> 
> > > I don't really worry about _sum performance.  It's a quite slow path
> > > and most of the cost is from causing cacheline bounces anyway.  That
> > > said, I don't see how the above would help the deviation problem.
> > > Let's say an updater reset per cpu counter but got preempted before
> > > updating the global counter.  What differences does it make to check
> > > fbc->counter before & after like above?
> >
> > yes, this is a problem. Again I don't mind to disable preempt in _add.
> 
> Okay, this communication failure isn't my fault.  Please re-read what
> I wrote before, my concern wasn't primarily about pathological worst
> case - if that many concurrent updates are happening && the counter
> needs to be accurate, it can't even use atomic counter.  It should be
> doing full exclusion around the counter and the associated operation
> _together_.
> 
> I'm worried about sporadic erratic behavior happening regardless of
> update frequency and preemption would contribute but isn't necessary
> for that to happen.
Ok, I misunderstood the mail you sent to Christoph, sorry. So you have
no problem with the atomic convert. I'll update the patch against the
base tree, given the preemptless patch has problems.

Thanks,
Shaohua


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-28  3:28                                               ` Shaohua Li
@ 2011-04-28 10:09                                                 ` Tejun Heo
  2011-04-28 14:11                                                   ` Christoph Lameter
  2011-04-29  8:19                                                   ` Shaohua Li
  0 siblings, 2 replies; 64+ messages in thread
From: Tejun Heo @ 2011-04-28 10:09 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Christoph Lameter, Eric Dumazet, akpm@linux-foundation.org,
	linux-mm@kvack.org

Hey,

On Thu, Apr 28, 2011 at 11:28:04AM +0800, Shaohua Li wrote:
> > Okay, this communication failure isn't my fault.  Please re-read what
> > I wrote before, my concern wasn't primarily about pathological worst
> > case - if that many concurrent updates are happening && the counter
> > needs to be accurate, it can't even use atomic counter.  It should be
> > doing full exclusion around the counter and the associated operation
> > _together_.
> > 
> > I'm worried about sporadic erratic behavior happening regardless of
> > update frequency and preemption would contribute but isn't necessary
> > for that to happen.
>
> Ok, I misunderstood the mail you sent to Christoph, sorry. So you have
> no problem about the atomic convert. I'll update the patch against base
> tree, given the preemptless patch has problem.

Hmm... we're now more lost than ever. :-( Can you please re-read my
message two replies ago?  The one where I talked about sporadic
erratic behaviors at length and why I was worried about it.

In your last reply, you talked about preemption and that you didn't
have problems with disabling preemption, which, unfortunately, doesn't
have much to do with my concern with the sporadic erratic behaviors
and that's what I pointed out in my previous reply.  So, it doesn't
feel like anything is resolved.

Thanks.

-- 
tejun


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-28 10:09                                                 ` Tejun Heo
@ 2011-04-28 14:11                                                   ` Christoph Lameter
  2011-04-28 14:23                                                     ` Tejun Heo
  2011-04-29  8:19                                                   ` Shaohua Li
  1 sibling, 1 reply; 64+ messages in thread
From: Christoph Lameter @ 2011-04-28 14:11 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Shaohua Li, Eric Dumazet, akpm@linux-foundation.org,
	linux-mm@kvack.org

On Thu, 28 Apr 2011, Tejun Heo wrote:

> Hmm... we're now more lost than ever. :-( Can you please re-read my
> message two replies ago?  The one where I talked about sporadic
> erratic behaviors in length and why I was worried about it.
>
> In your last reply, you talked about preemption and that you didn't
> have problems with disabling preemption, which, unfortunately, doesn't
> have much to do with my concern with the sporadic erratic behaviors
> and that's what I pointed out in my previous reply.  So, it doesn't
> feel like anything is resolved.

Sporadic erratic behavior exists today since any thread can add an
arbitrary number to its local counter while you are adding up all the per
cpu differentials. If this happens just after you picked up the value, then
a single cpu can cause a high deviation. If multiple cpus do this then a
high degree of deviation can even be had with today's implementation.

Can you show in some tests how the chance of deviations is increased? If
at all, then only in some special situations. Maybe others get better?

The counters were always designed to be racy for performance reasons.
Trying to serialize them goes against the design of these things. In order
to increase accuracy you have to decrease the allowable delta in the per
cpu differentials.

Looping over all differentials to get more accuracy is something that may
not work as we have seen recently with the VM counters issues that caused
bad behavior during reclaim.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-28 14:11                                                   ` Christoph Lameter
@ 2011-04-28 14:23                                                     ` Tejun Heo
  2011-04-28 14:30                                                       ` Tejun Heo
  2011-04-28 14:42                                                       ` Christoph Lameter
  0 siblings, 2 replies; 64+ messages in thread
From: Tejun Heo @ 2011-04-28 14:23 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Shaohua Li, Eric Dumazet, akpm@linux-foundation.org,
	linux-mm@kvack.org

Hello, Christoph.

On Thu, Apr 28, 2011 at 09:11:20AM -0500, Christoph Lameter wrote:
> Sporadic erratic behavior exists today since any thread can add an
> arbitrary number to its local counter while you are adding up all the per
> cpu differentials. If this happens just after you picked up the value, then
> a single cpu can cause a high deviation. If multiple cpus do this then a
> high degree of deviation can even be had with today's implementation.

Yeah, but that's still something which is expected.  If the user is
adding/subtracting large numbers concurrently, sure.  What I'm concerned
about is adding unexpected deviation.  Currently, the _sum interface
basically provides the same level of expectedness (is this
even a word?) as atomic_t - the deviations are solely caused and
limited by concurrent updates.  After the proposed changes, one
concurrent updater doing +1 can cause @batch deviation.

> Can you show in some tests how the chance of deviations is increased? If
> at all, then only in some special situations. Maybe others get better?

It's kinda obvious, isn't it?  Do relatively low freq (say, every
10ms) +1's and continuously do _sum().  Before, _sum() would never
deviate much from the real count.  After, there will be @batch jumps.
If you still need proof code, I would write it but please note that
I'm pretty backed up.
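
Roughly, something like the below (untested, module-style sketch;
counter init and kthread_run() setup omitted, and the small slack is
only there because the two reads in the checker aren't atomic):

static struct percpu_counter test_fbc;
static atomic64_t nr_incs;

static int incr_thread(void *unused)
{
	while (!kthread_should_stop()) {
		percpu_counter_add(&test_fbc, 1);	/* default batch */
		atomic64_inc(&nr_incs);
		msleep(10);
	}
	return 0;
}

static int sum_thread(void *unused)
{
	while (!kthread_should_stop()) {
		s64 sum = percpu_counter_sum(&test_fbc);
		s64 real = atomic64_read(&nr_incs);

		if (sum - real > 2 || real - sum > 2)
			pr_info("_sum off by %lld\n", sum - real);
		cond_resched();
	}
	return 0;
}

With the current code the message should basically never trigger; with
the proposed change, the window between folding into the global count
and clearing the percpu part can make it fire with an error of about
@batch.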

> The counters were always designed to be racy for performance reasons.
> Trying to serialize them goes against the design of these things. In order
> to increase accuracy you have to decrease the allowable delta in the per
> cpu differentials.
> 
> Looping over all differentials to get more accuracy is something that may
> not work as we have seen recently with the VM counters issues that caused
> bad behavior during reclaim.

Such users then shouldn't use _sum() - maybe rename it to
_very_slow_sum() if you're concerned about misusage.  percpu_counter()
is already used in filesystems to count free blocks and there are
times where atomic_t type accuracy is needed and _sum() achieves that.
The proposed changes break that.  Why do I need to say this over and
over again?

Thanks.

-- 
tejun


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-28 14:23                                                     ` Tejun Heo
@ 2011-04-28 14:30                                                       ` Tejun Heo
  2011-04-28 14:58                                                         ` Christoph Lameter
  2011-04-28 14:42                                                       ` Christoph Lameter
  1 sibling, 1 reply; 64+ messages in thread
From: Tejun Heo @ 2011-04-28 14:30 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Shaohua Li, Eric Dumazet, akpm@linux-foundation.org,
	linux-mm@kvack.org

On Thu, Apr 28, 2011 at 04:23:31PM +0200, Tejun Heo wrote:
> Such users then shouldn't use _sum() - maybe rename it to
> _very_slow_sum() if you're concerned about misusage.  percpu_counter()
> is already used in filesystems to count free blocks and there are
> times where atomic_t type accuracy is needed and _sum() achieves that.
> The proposed changes break that.  Why do I need to say this over and
> over again?

And I'm getting more and more frustrated.  THIS IS SLOW PATH.  If it's
showing up on your profile, bump up @batch.  It doesn't make any sense
to micro optimize slow path at the cost of introducing such nastiness.
Unless someone can show me such nastiness doesn't exist, I'm not gonna
take this change.

Thanks.

-- 
tejun


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-28 14:23                                                     ` Tejun Heo
  2011-04-28 14:30                                                       ` Tejun Heo
@ 2011-04-28 14:42                                                       ` Christoph Lameter
  2011-04-28 14:44                                                         ` Tejun Heo
  1 sibling, 1 reply; 64+ messages in thread
From: Christoph Lameter @ 2011-04-28 14:42 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Shaohua Li, Eric Dumazet, akpm@linux-foundation.org,
	linux-mm@kvack.org

On Thu, 28 Apr 2011, Tejun Heo wrote:

> On Thu, Apr 28, 2011 at 09:11:20AM -0500, Christoph Lameter wrote:
> > Sporadic erratic behavior exists today since any thread can add an
> > arbitrary number to its local counter while you are adding up all the per
> > cpu differentials. If this happens just after you picked up the value, then
> > a single cpu can cause a high deviation. If multiple cpus do this then a
> > high degree of deviation can even be had with today's implementation.
>
> Yeah, but that's still something which is expected.  If the user is
> adding/subtracting large numbers concurrently, sure.  What I'm concerned
> about is adding unexpected deviation.  Currently, the _sum interface
> basically provides basically the same level of expectedness (is this
> even a word?) as atomic_t - the deviations are solely caused and
> limited by concurrent updates.  After the proposed changes, one
> concurrent updater doing +1 can cause @batch deviation.

As I explained before, _sum does not provide the accuracy that you think
it does.

> > Can you show in some tests how the chance of deviations is increased? If
> > at all, then only in some special situations. Maybe others get better?
>
> It's kinda obvious, isn't it?  Do relatively low freq (say, every
> 10ms) +1's and continuously do _sum().  Before, _sum() would never
> deviate much from the real count.  After, there will be @batch jumps.
> If you still need proof code, I would write it but please note that
> I'm pretty backed up.

"Obvious" could mean that you are drawing conclusions without a proper
reasoning chain. Here you assume certain things about the users of the
counters. The same assumptions were made when we had the vm counter
issues. The behavior of counter increments is typically not a regular
stream but occurs in spurts.

> > Looping over all differentials to get more accuracy is something that may
> > not work as we have seen recently with the VM counters issues that caused
> > bad behavior during reclaim.
>
> Such users then shouldn't use _sum() - maybe rename it to
> _very_slow_sum() if you're concerned about misusage.  percpu_counter()
> is already used in filesystems to count free blocks and there are
> times where atomic_t type accuracy is needed and _sum() achieves that.
> The proposed changes break that.  Why do I need to say this over and
> over again?

Because you are making strange assumptions about "accuracy" of the
counters? There is no atomic_t type accuracy with per cpu counters as we
have shown multiple times. Why are you repeating the same nonsense again
and again? If you want atomic_t accuracy then you need to use atomic_t and
take the performance penalty. Per cpu counters involve fuzziness and
through that fuzziness we gain a performance advantage.




^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-28 14:42                                                       ` Christoph Lameter
@ 2011-04-28 14:44                                                         ` Tejun Heo
  2011-04-28 14:52                                                           ` Christoph Lameter
  0 siblings, 1 reply; 64+ messages in thread
From: Tejun Heo @ 2011-04-28 14:44 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Shaohua Li, Eric Dumazet, akpm@linux-foundation.org,
	linux-mm@kvack.org

On Thu, Apr 28, 2011 at 09:42:49AM -0500, Christoph Lameter wrote:
> > > Can you show in some tests how the chance of deviations is increased? If
> > > at all, then only in some special situations. Maybe others get better?
> >
> > It's kinda obvious, isn't it?  Do relatively low freq (say, every
> > 10ms) +1's and continuously do _sum().  Before, _sum() would never
> > deviate much from the real count.  After, there will be @batch jumps.
> > If you still need proof code, I would write it but please note that
> > I'm pretty backed up.
> 
> "Obvious" could mean that you are drawing conclusions without a proper
> reasoning chain. Here you assume certain things about the users of the
> counters. The same assumptions were made when we had the vm counter
> issues. The behavior of counter increments is typically not a regular
> stream but occurs in spurts.

Eh?  Are you saying the above can't happen or the above doesn't
matter?

-- 
tejun


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-28 14:44                                                         ` Tejun Heo
@ 2011-04-28 14:52                                                           ` Christoph Lameter
  2011-04-28 14:56                                                             ` Tejun Heo
  0 siblings, 1 reply; 64+ messages in thread
From: Christoph Lameter @ 2011-04-28 14:52 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Shaohua Li, Eric Dumazet, akpm@linux-foundation.org,
	linux-mm@kvack.org

On Thu, 28 Apr 2011, Tejun Heo wrote:

> Eh?  Are you saying the above can't happen or the above doesn't
> matter?

It's an artificial use case that does not reflect the realities of how
these counters are typically used.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-28 14:52                                                           ` Christoph Lameter
@ 2011-04-28 14:56                                                             ` Tejun Heo
  2011-04-28 15:05                                                               ` Christoph Lameter
  0 siblings, 1 reply; 64+ messages in thread
From: Tejun Heo @ 2011-04-28 14:56 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Shaohua Li, Eric Dumazet, akpm@linux-foundation.org,
	linux-mm@kvack.org

Hey,

On Thu, Apr 28, 2011 at 09:52:35AM -0500, Christoph Lameter wrote:
> On Thu, 28 Apr 2011, Tejun Heo wrote:
> 
> > Eh?  Are you saying the above can't happen or the above doesn't
> > matter?
> 
> Its an artificial use case that does not reflect the realities on how
> these counters are typically used.

Gees, Christoph.  That is a test case to show the issue prominently,
which is what a test case is supposed to do.  What it means is that
_any_ update can trigger @batch deviation on _sum() regardless of its
frequency or concurrency level and that's the nastiness I've been
talking about over and over again.

For fast path, sure.  For slow path, I don't think so.  If the
tradeoff still doesn't make sense to you, I don't know how to persuade
you guys.  I'm not gonna take it.  You and Shaohua are welcome to go
over my head and send the changes to Andrew or Linus, but please keep
me cc'd so that I can voice my objection.

Thank you.

-- 
tejun


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-28 14:30                                                       ` Tejun Heo
@ 2011-04-28 14:58                                                         ` Christoph Lameter
  0 siblings, 0 replies; 64+ messages in thread
From: Christoph Lameter @ 2011-04-28 14:58 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Shaohua Li, Eric Dumazet, akpm@linux-foundation.org,
	linux-mm@kvack.org

On Thu, 28 Apr 2011, Tejun Heo wrote:

> And I'm getting more and more frustrated.  THIS IS SLOW PATH.  If it's
> showing up on your profile, bump up @batch.  It doesn't make any sense
> to micro optimize slow path at the cost of introducing such nastiness.
> Unless someone can show me such nastiness doesn't exist, I'm not gonna
> take this change.

I think it's dangerous to think about these counters as comparable to
atomics. The user always has to keep in mind that these counters are
fuzzy and the coder has to consider that for any use of these.

Simplifying the slow path (and the rest of the code) by dropping the
spinlock is a worthy goal and may allow inlining the whole _add function
at some point since it then becomes quite small and no longer needs to do
function calls. As a side effect: Seeing the code without a spinlock will
make sure that no one will assume that he will get atomic_t type
consistency from these counters. Having a lock in there seems to cause
strange expectations.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-28 14:56                                                             ` Tejun Heo
@ 2011-04-28 15:05                                                               ` Christoph Lameter
  2011-04-28 15:12                                                                 ` Tejun Heo
  0 siblings, 1 reply; 64+ messages in thread
From: Christoph Lameter @ 2011-04-28 15:05 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Shaohua Li, Eric Dumazet, akpm@linux-foundation.org,
	linux-mm@kvack.org

On Thu, 28 Apr 2011, Tejun Heo wrote:

> Hey,
>
> On Thu, Apr 28, 2011 at 09:52:35AM -0500, Christoph Lameter wrote:
> > On Thu, 28 Apr 2011, Tejun Heo wrote:
> >
> > > Eh?  Are you saying the above can't happen or the above doesn't
> > > matter?
> >
> > Its an artificial use case that does not reflect the realities on how
> > these counters are typically used.
>
> Gees, Christoph.  That is a test case to show the issue prominently,
> which is what a test case is supposed to do.  What it means is that
> _any_ update can trigger @batch deviation on _sum() regardless of its
> frequency or concurrency level and that's the nastiness I've been
> talking about over and over again.

As far as I understand it: This is a test case where you want to show us
the atomic_t type behavior of _sum. This only works in such an artificial
test case. In reality batches of updates will modify any 'accurate' result
that you may have obtained from the _sum function.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-28 15:05                                                               ` Christoph Lameter
@ 2011-04-28 15:12                                                                 ` Tejun Heo
  2011-04-28 15:22                                                                   ` Christoph Lameter
  0 siblings, 1 reply; 64+ messages in thread
From: Tejun Heo @ 2011-04-28 15:12 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Shaohua Li, Eric Dumazet, akpm@linux-foundation.org,
	linux-mm@kvack.org

On Thu, Apr 28, 2011 at 10:05:09AM -0500, Christoph Lameter wrote:
> On Thu, 28 Apr 2011, Tejun Heo wrote:
> > Gees, Christoph.  That is a test case to show the issue prominently,
> > which is what a test case is supposed to do.  What it means is that
> > _any_ update can trigger @batch deviation on _sum() regardless of its
> > frequency or concurrency level and that's the nastiness I've been
> > talking about over and over again.
> 
> As far as I understand it: This is a test case where you want to show us
> the atomic_t type behavior of _sum. This only works in such an artificial
> test case. In reality batches of updates will modify any 'accurate' result
> that you may have obtained from the _sum function.

It seems like we can split hairs all day long about the similarities
and differences with atomics, so let's forget about atomics for now.

I don't like any update having the possibility of causing @batch jumps
in the _sum() result.  That severely limits the usefulness of the hugely
expensive _sum() and the ability to scale @batch.  Not everything in
the world is vmstat.  Think about other _CURRENT_ use cases in
filesystems.

-- 
tejun


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-28 15:12                                                                 ` Tejun Heo
@ 2011-04-28 15:22                                                                   ` Christoph Lameter
  2011-04-28 15:31                                                                     ` Tejun Heo
  2011-04-28 15:48                                                                     ` Eric Dumazet
  0 siblings, 2 replies; 64+ messages in thread
From: Christoph Lameter @ 2011-04-28 15:22 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Shaohua Li, Eric Dumazet, akpm@linux-foundation.org,
	linux-mm@kvack.org

On Thu, 28 Apr 2011, Tejun Heo wrote:

> It seems like we can split hairs all day long about the similarities
> and differences with atomics, so let's forget about atomics for now.

Ok good idea. So you accept that per cpu counters are inevitably fuzzy and
that the result cannot be relied upon in the same way as on atomics?

> I don't like any update having possibility of causing @batch jumps in
> _sum() result.  That severely limits the usefulness of hugely
> expensive _sum() and the ability to scale @batch.  Not everything in
> the world is vmstat.  Think about other _CURRENT_ use cases in
> filesystems.

Of course you can cause jumps > batch. You are looping over 8 cpus and
have done cpu 1 and 2 already. Then cpu 1 adds batch-1 to the local per
cpu counter. cpu 2 increments its per cpu counter. When _sum returns we
have deviation of batch.

In the worst case the deviation can increase to (NR_CPUS - 1) * (batch
-1). That is for the current code!!!

The hugely expensive _sum() is IMHO pretty useless given the above. It is
a function that is called with the *hope* of getting a more accurate
result.

The only way to reduce the fuzziness is by reducing batch. But then you
don't need the _sum function.

Either that or you need additional external synchronization by the
subsystem that prevents updates.



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-28 15:22                                                                   ` Christoph Lameter
@ 2011-04-28 15:31                                                                     ` Tejun Heo
  2011-04-28 15:40                                                                       ` Tejun Heo
  2011-04-28 15:48                                                                     ` Eric Dumazet
  1 sibling, 1 reply; 64+ messages in thread
From: Tejun Heo @ 2011-04-28 15:31 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Shaohua Li, Eric Dumazet, akpm@linux-foundation.org,
	linux-mm@kvack.org

On Thu, Apr 28, 2011 at 10:22:19AM -0500, Christoph Lameter wrote:
> On Thu, 28 Apr 2011, Tejun Heo wrote:
> 
> > It seems like we can split hairs all day long about the similarities
> > and differences with atomics, so let's forget about atomics for now.
> 
> Ok good idea. So you accept that per cpu counters are inevitably fuzzy and
> that the result cannot be relied upon in the same way as on atomics?

No, I don't, at all.

> > I don't like any update having possibility of causing @batch jumps in
> > _sum() result.  That severely limits the usefulness of hugely
> > expensive _sum() and the ability to scale @batch.  Not everything in
> > the world is vmstat.  Think about other _CURRENT_ use cases in
> > filesystems.
> 
> Of course you can cause jumps > batch. You are looping over 8 cpus and
> have done cpu 1 and 2 already. Then cpu 1 adds batch-1 to the local per
> cpu counter. cpu 2 increments its per cpu counter. When _sum returns we
> have deviation of batch.
> 
> In the worst case the deviation can increase to (NR_CPUS - 1) * (batch
> -1). That is for the current code!!!
> 
> The hugely expensive _sum() is IMHO pretty useless given the above. It is
> a function that is called with the *hope* of getting a more accurate
> result.

No, it isn't.  It provides meaningful enough accuracy for many use
cases.  It is not about extreme limits on pathological cases.  It's
being reasonably accurate so that it doesn't deviate too much from
expectations arising from a given level of update frequency and
concurrency.

Christoph, I'm sorry but I really can't explain it any better and am
out of this thread.  If you still wanna proceed, you'll need to route
the patches yourself.  Please keep me cc'd.

Thank you.

-- 
tejun


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-28 15:31                                                                     ` Tejun Heo
@ 2011-04-28 15:40                                                                       ` Tejun Heo
  2011-04-28 15:47                                                                         ` Christoph Lameter
  0 siblings, 1 reply; 64+ messages in thread
From: Tejun Heo @ 2011-04-28 15:40 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Shaohua Li, Eric Dumazet, akpm@linux-foundation.org,
	linux-mm@kvack.org

On Thu, Apr 28, 2011 at 05:31:01PM +0200, Tejun Heo wrote:
> > The hugely expensive _sum() is IMHO pretty useless given the above. It is
> > a function that is called with the *hope* of getting a more accurate
> > result.
...
> Christoph, I'm sorry but I really can't explain it any better and am
> out of this thread.  If you still wanna proceed, you'll need to route
> the patches yourself.  Please keep me cc'd.

Oh, another way to proceed would be, if _sum() is as useless as you
suggested above, remove _sum() first and see how that goes.  If that
flies, there will be no reason to argue over anything, right?

Thanks.

-- 
tejun


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-28 15:40                                                                       ` Tejun Heo
@ 2011-04-28 15:47                                                                         ` Christoph Lameter
  0 siblings, 0 replies; 64+ messages in thread
From: Christoph Lameter @ 2011-04-28 15:47 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Shaohua Li, Eric Dumazet, akpm@linux-foundation.org,
	linux-mm@kvack.org

On Thu, 28 Apr 2011, Tejun Heo wrote:

> On Thu, Apr 28, 2011 at 05:31:01PM +0200, Tejun Heo wrote:
> > > The hugely expensive _sum() is IMHO pretty useless given the above. It is
> > > a function that is called with the *hope* of getting a more accurate
> > > result.
> ...
> > Christoph, I'm sorry but I really can't explain it any better and am
> > out of this thread.  If you still wanna proceed, you'll need to route
> > the patches yourself.  Please keep me cc'd.
>
> Oh, another way to proceed would be, if _sum() is as useless as you
> suggested above, remove _sum() first and see how that goes.  If that
> flies, there will be no reason to argue over anything, right?

There is no point in arguing anymore since you do not accept that the
percpu counters do not have atomic_t consistency properties. Relaxing
the consistency requirements in order to increase performance was the
intent when these schemes were created.




^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-28 15:22                                                                   ` Christoph Lameter
  2011-04-28 15:31                                                                     ` Tejun Heo
@ 2011-04-28 15:48                                                                     ` Eric Dumazet
  2011-04-28 15:59                                                                       ` Eric Dumazet
  1 sibling, 1 reply; 64+ messages in thread
From: Eric Dumazet @ 2011-04-28 15:48 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Tejun Heo, Shaohua Li, akpm@linux-foundation.org,
	linux-mm@kvack.org

On Thursday 28 April 2011 at 10:22 -0500, Christoph Lameter wrote:
> On Thu, 28 Apr 2011, Tejun Heo wrote:
> 
> > It seems like we can split hairs all day long about the similarities
> > and differences with atomics, so let's forget about atomics for now.
> 
> Ok good idea. So you accept that per cpu counters are inevitably fuzzy and
> that the result cannot be relied upon in the same way as on atomics?
> 
> > I don't like any update having possibility of causing @batch jumps in
> > _sum() result.  That severely limits the usefulness of hugely
> > expensive _sum() and the ability to scale @batch.  Not everything in
> > the world is vmstat.  Think about other _CURRENT_ use cases in
> > filesystems.
> 
> Of course you can cause jumps > batch. You are looping over 8 cpus and
> have done cpu 1 and 2 already. Then cpu 1 adds batch-1 to the local per
> cpu counter. cpu 2 increments its per cpu counter. When _sum returns we
> have deviation of batch.
> 
> In the worst case the deviation can increase to (NR_CPUS - 1) * (batch
> -1). That is for the current code!!!
> 
> The hugely expensive _sum() is IMHO pretty useless given the above. It is
> a function that is called with the *hope* of getting a more accurate
> result.
> 
> The only way to reduce the fuzziness is by reducing batch. But then you
> dont need the _sum function.
> 
> Either that or you need additional external synchronization by the
> subsystem that prevents updates.
> 
> 

We could add a seqcount (shared), and increment it each time one cpu
changes the global count.

_sum() could get an additional parameter, the maximum number of allowed
changes during a _sum() run.

If _sum() notices the seqcount changed too much, restart the loop.

s64 __percpu_counter_sum(struct percpu_counter *fbc, unsigned maxfuzzy)
{
	s64 ret;
	unsigned int oldseq, newseq;
	int cpu;
restart:
	oldseq = fbc->seqcount;
	smp_rmb();
	ret = fbc->count;
	for_each_online_cpu(cpu) {
		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
		ret += *pcount;
	}
	smp_rmb();
	newseq = fbc->count;
	if (newseq - oldseq >= maxfuzzy)
		goto restart;
	return ret;
}




^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-28 15:48                                                                     ` Eric Dumazet
@ 2011-04-28 15:59                                                                       ` Eric Dumazet
  2011-04-28 16:17                                                                         ` Christoph Lameter
  0 siblings, 1 reply; 64+ messages in thread
From: Eric Dumazet @ 2011-04-28 15:59 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Tejun Heo, Shaohua Li, akpm@linux-foundation.org,
	linux-mm@kvack.org

On Thursday 28 April 2011 at 17:48 +0200, Eric Dumazet wrote:
> We could add a seqcount (shared), and increment it each time one cpu
> changes the global count.
> 
> _sum() could get an additional parameter, the maximum number of allowed
> changes during a _sum() run.
> 
> If _sum() notices the seqcount changed too much, restart the loop.
> 
> s64 __percpu_counter_sum(struct percpu_counter *fbc, unsigned maxfuzzy)
> {
> 	s64 ret;
> 	unsigned int oldseq, newseq;
> 	int cpu;
> restart:
> 	oldseq = fbc->seqcount;
> 	smp_rmb();
> 	ret = fbc->count;
> 	for_each_online_cpu(cpu) {
> 		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
> 		ret += *pcount;
> 	}
> 	smp_rmb();
> 	newseq = fbc->count;

Sorry, it should be : 
	newseq = fbc->seqcount

> 	if (newseq - oldseq >= maxfuzzy)
> 		goto restart;
> 	return ret;
> }
> 
> 



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-28 15:59                                                                       ` Eric Dumazet
@ 2011-04-28 16:17                                                                         ` Christoph Lameter
  2011-04-28 16:35                                                                           ` Eric Dumazet
  0 siblings, 1 reply; 64+ messages in thread
From: Christoph Lameter @ 2011-04-28 16:17 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Tejun Heo, Shaohua Li, akpm@linux-foundation.org,
	linux-mm@kvack.org

On Thu, 28 Apr 2011, Eric Dumazet wrote:

> > If _sum() notices seqcount was changed too much, restart the loop.

This does not address the issue of cpus adding batch -1 while the
loop is going on.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-28 16:17                                                                         ` Christoph Lameter
@ 2011-04-28 16:35                                                                           ` Eric Dumazet
  2011-04-28 16:52                                                                             ` Christoph Lameter
  2011-04-29  8:32                                                                             ` Shaohua Li
  0 siblings, 2 replies; 64+ messages in thread
From: Eric Dumazet @ 2011-04-28 16:35 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Tejun Heo, Shaohua Li, akpm@linux-foundation.org,
	linux-mm@kvack.org

On Thursday 28 April 2011 at 11:17 -0500, Christoph Lameter wrote:
> On Thu, 28 Apr 2011, Eric Dumazet wrote:
> 
> > > If _sum() notices seqcount was changed too much, restart the loop.
> 
> This does not address the issue of cpus adding batch -1 while the
> loop is going on.


Yes, it does; I left the needed changes to the write side as an exercise ;)


For example, based on current linux-2.6 code


void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch)
{
        s64 count;

        preempt_disable();
        count = __this_cpu_read(*fbc->counters) + amount;
        if (count >= batch || count <= -batch) {
                spin_lock(&fbc->lock);
		fbc->seqcount++;
                fbc->count += count;
                __this_cpu_write(*fbc->counters, 0);
                spin_unlock(&fbc->lock);
        } else {
                __this_cpu_write(*fbc->counters, count);
        }
        preempt_enable();
}



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-28 16:35                                                                           ` Eric Dumazet
@ 2011-04-28 16:52                                                                             ` Christoph Lameter
  2011-04-28 16:59                                                                               ` Eric Dumazet
  2011-04-29  8:32                                                                             ` Shaohua Li
  1 sibling, 1 reply; 64+ messages in thread
From: Christoph Lameter @ 2011-04-28 16:52 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Tejun Heo, Shaohua Li, akpm@linux-foundation.org,
	linux-mm@kvack.org

On Thu, 28 Apr 2011, Eric Dumazet wrote:

> On Thursday 28 April 2011 at 11:17 -0500, Christoph Lameter wrote:
> > On Thu, 28 Apr 2011, Eric Dumazet wrote:
> >
> > > > If _sum() notices seqcount was changed too much, restart the loop.
> >
> > This does not address the issue of cpus adding batch -1 while the
> > loop is going on.
>
>
> Yes, it does, I left the needed changes to write side as an exercice ;)
>
>
> For example, based on current linux-2.6 code
>
>
> void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch)
> {
>         s64 count;
>
>         preempt_disable();
>         count = __this_cpu_read(*fbc->counters) + amount;
>         if (count >= batch || count <= -batch) {
>                 spin_lock(&fbc->lock);
> 		fbc->seqcount++;
>                 fbc->count += count;
>                 __this_cpu_write(*fbc->counters, 0);
>                 spin_unlock(&fbc->lock);
>         } else {
>                 __this_cpu_write(*fbc->counters, count);
>         }
>         preempt_enable();
> }

I can still add (batch - 1) without causing the seqcount to be
incremented.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-28 16:52                                                                             ` Christoph Lameter
@ 2011-04-28 16:59                                                                               ` Eric Dumazet
  2011-04-29  8:52                                                                                 ` Tejun Heo
  0 siblings, 1 reply; 64+ messages in thread
From: Eric Dumazet @ 2011-04-28 16:59 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Tejun Heo, Shaohua Li, akpm@linux-foundation.org,
	linux-mm@kvack.org

On Thursday 28 April 2011 at 11:52 -0500, Christoph Lameter wrote:

> I can still add (batch - 1) without causing the seqcount to be
> incremented.

It has always been like that, from the very beginning.

The point was to try to remove the lock, and Tejun said it was going to
increase fuzziness.

I said: readers should specify what maximum fuzziness they allow.

Most users don't care at all, but we can provide an API.




^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-28 10:09                                                 ` Tejun Heo
  2011-04-28 14:11                                                   ` Christoph Lameter
@ 2011-04-29  8:19                                                   ` Shaohua Li
  2011-04-29  8:44                                                     ` Tejun Heo
  1 sibling, 1 reply; 64+ messages in thread
From: Shaohua Li @ 2011-04-29  8:19 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Christoph Lameter, Eric Dumazet, akpm@linux-foundation.org,
	linux-mm@kvack.org

Hi,
On Thu, 2011-04-28 at 18:09 +0800, Tejun Heo wrote:
> On Thu, Apr 28, 2011 at 11:28:04AM +0800, Shaohua Li wrote:
> > > Okay, this communication failure isn't my fault.  Please re-read what
> > > I wrote before, my concern wasn't primarily about pathological worst
> > > case - if that many concurrent updates are happening && the counter
> > > needs to be accurate, it can't even use atomic counter.  It should be
> > > doing full exclusion around the counter and the associated operation
> > > _together_.
> > > 
> > > I'm worried about sporadic erratic behavior happening regardless of
> > > update frequency and preemption would contribute but isn't necessary
> > > for that to happen.
> >
> > OK, I misunderstood the mail you sent to Christoph, sorry. So you have
> > no problem with the atomic conversion. I'll update the patch against the
> > base tree, given the preemptless patch has a problem.
> 
> Hmm... we're now more lost than ever. :-( Can you please re-read my
> message two replies ago?  The one where I talked about sporadic
> erratic behaviors at length and why I was worried about it.
> 
> In your last reply, you talked about preemption and that you didn't
> have problems with disabling preemption, which, unfortunately, doesn't
> have much to do with my concern with the sporadic erratic behaviors
> and that's what I pointed out in my previous reply.  So, it doesn't
> feel like anything is resolved.
OK, I got your point. I'd agree there are sporadic erratic behaviors, but
I expect there is no problem here. We all agree the worst case is the
same before and after the change. Any program should be able to handle the
worst case; otherwise the program itself is buggy. Discussing a buggy
program is meaningless. After the change, some behavior changes,
but the worst case doesn't. So I don't think this is a big problem.

Thanks,
Shaohua


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-28 16:35                                                                           ` Eric Dumazet
  2011-04-28 16:52                                                                             ` Christoph Lameter
@ 2011-04-29  8:32                                                                             ` Shaohua Li
  1 sibling, 0 replies; 64+ messages in thread
From: Shaohua Li @ 2011-04-29  8:32 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Christoph Lameter, Tejun Heo, akpm@linux-foundation.org,
	linux-mm@kvack.org

On Fri, 2011-04-29 at 00:35 +0800, Eric Dumazet wrote:
> On Thursday 28 April 2011 at 11:17 -0500, Christoph Lameter wrote:
> > On Thu, 28 Apr 2011, Eric Dumazet wrote:
> > 
> > > > If _sum() notices seqcount was changed too much, restart the loop.
> > 
> > This does not address the issue of cpus adding batch -1 while the
> > loop is going on.
> 
> 
> Yes, it does, I left the needed changes to write side as an exercice ;)
> 
> 
> For example, based on current linux-2.6 code
> 
> 
> void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch)
> {
>         s64 count;
> 
>         preempt_disable();
>         count = __this_cpu_read(*fbc->counters) + amount;
>         if (count >= batch || count <= -batch) {
>                 spin_lock(&fbc->lock);
> 		fbc->seqcount++;
>                 fbc->count += count;
>                 __this_cpu_write(*fbc->counters, 0);
>                 spin_unlock(&fbc->lock);
>         } else {
>                 __this_cpu_write(*fbc->counters, count);
>         }
>         preempt_enable();
> }
Yep, this can resolve Tejun's concern. The problem is how you
determine maxfuzzy. This can mislead the user too, because in the worst
case the deviation is num_cpus * batch. If a user passes a small maxfuzzy
and expects the maximum deviation to be maxfuzzy, he will actually get a
bigger deviation in the worst case.

Thanks,
Shaohua


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-29  8:19                                                   ` Shaohua Li
@ 2011-04-29  8:44                                                     ` Tejun Heo
  2011-04-29 14:02                                                       ` Christoph Lameter
  0 siblings, 1 reply; 64+ messages in thread
From: Tejun Heo @ 2011-04-29  8:44 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Christoph Lameter, Eric Dumazet, akpm@linux-foundation.org,
	linux-mm@kvack.org

On Fri, Apr 29, 2011 at 04:19:31PM +0800, Shaohua Li wrote:
> > In your last reply, you talked about preemption and that you didn't
> > have problems with disabling preemption, which, unfortunately, doesn't
> > have much to do with my concern with the sporadic erratic behaviors
> > and that's what I pointed out in my previous reply.  So, it doesn't
> > feel like anything is resolved.
>
> OK, I got your point. I'd agree there are sporadic erratic behaviors, but
> I expect there is no problem here. We all agree the worst case is the
> same before and after the change. Any program should be able to handle the
> worst case; otherwise the program itself is buggy. Discussing a buggy
> program is meaningless. After the change, some behavior changes,
> but the worst case doesn't. So I don't think this is a big problem.

If you really think that, go ahead and remove _sum(), really.  If you
still can't see the difference between "reasonably accurate unless
there's concurrent high frequency update" and "can jump on whatever",
I can't help you.  Worst case is important to consider but that's not
the only criterion you base your decisions on.

Think about it.  It becomes the difference between "oh yeah, while my
crazy concurrent FS benchmark is running, free block count is an
estimate but otherwise it's pretty accurate" and "holy shit, it jumped
while there's almost nothing going on in the filesystem".  It drastically
limits both the usefulness of _sum() and thus the percpu counter and
how much we can scale @batch on heavily loaded counters because it
ends up directly affecting the accuracy of _sum().

Thanks.

-- 
tejun


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-28 16:59                                                                               ` Eric Dumazet
@ 2011-04-29  8:52                                                                                 ` Tejun Heo
  0 siblings, 0 replies; 64+ messages in thread
From: Tejun Heo @ 2011-04-29  8:52 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Christoph Lameter, Shaohua Li, akpm@linux-foundation.org,
	linux-mm@kvack.org

On Thu, Apr 28, 2011 at 06:59:56PM +0200, Eric Dumazet wrote:
On Thursday 28 April 2011 at 11:52 -0500, Christoph Lameter wrote:
> 
> > I can still add (batch - 1) without causing the seqcount to be
> > incremented.
> 
> It always had been like that, from the very beginning.

This doesn't matter.  At this level, the order of concurrent
operations is not well defined.  You might as well say "oh well, then
the update happened after the sum is calculated".

The problem I have with the interface are two-folds.

1. Is it even necessary?  With concurrent updates, we don't and can't
   define a strict order of updates across multiple CPUs.  If we sweep
   the counters without being interrupted (IRQ or, well, NMI), it
   should be and has been acceptable enough.

2. Let's say we need this.  Then, @maxfuzzy.  Few people are gonna
   understand it well and use it properly.  Why can't you track the
   actual deviation introduced since the sum started instead of tracking
   the number of deviation events?
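
A minimal sketch of that alternative (not a patch): assume a hypothetical
atomic64_t fold_deviation field in struct percpu_counter, which the write
side bumps by the absolute amount it folds into fbc->count.  A bounded
sum could then look like this:

s64 __percpu_counter_sum_bounded(struct percpu_counter *fbc, s64 maxdev)
{
        s64 ret, dev_start;
        int cpu;

restart:
        /* Snapshot how much has been folded into fbc->count so far */
        dev_start = atomic64_read(&fbc->fold_deviation);
        smp_rmb();

        ret = fbc->count;
        for_each_online_cpu(cpu)
                ret += *per_cpu_ptr(fbc->counters, cpu);

        smp_rmb();
        /* Restart if concurrent folds moved the count by more than allowed */
        if (atomic64_read(&fbc->fold_deviation) - dev_start > maxdev)
                goto restart;

        return ret;
}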

Thanks.

-- 
tejun


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-29  8:44                                                     ` Tejun Heo
@ 2011-04-29 14:02                                                       ` Christoph Lameter
  2011-04-29 14:03                                                         ` Christoph Lameter
  2011-04-29 14:18                                                         ` Tejun Heo
  0 siblings, 2 replies; 64+ messages in thread
From: Christoph Lameter @ 2011-04-29 14:02 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Shaohua Li, Eric Dumazet, akpm@linux-foundation.org,
	linux-mm@kvack.org

On Fri, 29 Apr 2011, Tejun Heo wrote:

> > OK, I got your point. I'd agree there are sporadic erratic behaviors, but
> > I expect there is no problem here. We all agree the worst case is the
> > same before and after the change. Any program should be able to handle the
> > worst case; otherwise the program itself is buggy. Discussing a buggy
> > program is meaningless. After the change, some behavior changes,
> > but the worst case doesn't. So I don't think this is a big problem.
>
> If you really think that, go ahead and remove _sum(), really.  If you
> still can't see the difference between "reasonably accurate unless
> there's concurrent high frequency update" and "can jump on whatever",
> I can't help you.  Worst case is important to consider but that's not
> the only criterion you base your decisions on.

We agree, it seems, that _sum is an attempt to be more accurate but does
not really get you total accuracy (atomic_t). Our experiences with adding
a _sum

> Think about it.  It becomes the difference between "oh yeah, while my
> crazy concurrent FS benchmark is running, free block count is an
> estimate but otherwise it's pretty accurate" and "holy shit, it jumped
> while there's almost nothing going on in the filesystem".  It drastically
> limits both the usefulness of _sum() and thus the percpu counter and
> how much we can scale @batch on heavily loaded counters because it
> ends up directly affecting the accuracy of _sum().

The free block count is always an estimate if it only uses percpu_counter
without other serialization. "Pretty accurate" is saying you feel good
about it. In fact the potential deviations are not that much different.

The problem with quietness arises because per_cpu_counter_add does not
have the time boundary that e.g. the VM statistics have. Those fold the
differentials into the global sum every second or so.

It would be better to replace _sum() with a loop that adds the
differential and then zaps it. With the cmpxchg solution this is possible.
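
A rough sketch of such a folding pass (illustration only, the function
name is made up): it assumes fbc->count has been converted to an
atomic64_t and that updaters only modify their per-cpu slot with
this_cpu_cmpxchg() as proposed earlier in the thread, so a remote
cmpxchg() cannot lose a concurrent local update:

s64 percpu_counter_fold(struct percpu_counter *fbc)
{
        int cpu;

        for_each_online_cpu(cpu) {
                s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
                s32 old;

                /* Take the remote cpu's differential and zap it to zero */
                do {
                        old = *pcount;
                } while (cmpxchg(pcount, old, 0) != old);

                atomic64_add(old, &fbc->count);
        }

        return atomic64_read(&fbc->count);
}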


Could we do the handling here in the same way that the vm stat per-cpu
counters are done:

1. Bound the differential by time and size (we already have a batch notion
here but we need to add a time boundary. Run a function over all counters
every second or so that sums up the diffs).

2. Use lockless operations for differentials and atomics for globals to
make it scale well.

If someone wants more accuracy, then we need the ability to dynamically set
the batch limit, similar to what the vm statistics do.
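
A sketch of such a scaling rule, loosely modeled on how the VM statistics
pick their per-cpu thresholds (the function name and the factor of two are
only illustrations, not measured values):

static s32 compute_counter_batch(void)
{
        /* Let the fold threshold grow with the number of online cpus */
        return max_t(s32, 32, num_online_cpus() * 2);
}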


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-29 14:02                                                       ` Christoph Lameter
@ 2011-04-29 14:03                                                         ` Christoph Lameter
  2011-04-29 14:18                                                         ` Tejun Heo
  1 sibling, 0 replies; 64+ messages in thread
From: Christoph Lameter @ 2011-04-29 14:03 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Shaohua Li, Eric Dumazet, akpm@linux-foundation.org,
	linux-mm@kvack.org

On Fri, 29 Apr 2011, Christoph Lameter wrote:

> If someone wants more accuracy then we need the ability to dynamically set
> the batch limit similar to what the vm statistics do.

Forgot the key point: after these measures it should be possible to remove
_sum.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-29 14:02                                                       ` Christoph Lameter
  2011-04-29 14:03                                                         ` Christoph Lameter
@ 2011-04-29 14:18                                                         ` Tejun Heo
  2011-04-29 14:25                                                           ` Christoph Lameter
  1 sibling, 1 reply; 64+ messages in thread
From: Tejun Heo @ 2011-04-29 14:18 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Shaohua Li, Eric Dumazet, akpm@linux-foundation.org,
	linux-mm@kvack.org

On Fri, Apr 29, 2011 at 09:02:23AM -0500, Christoph Lameter wrote:
> On Fri, 29 Apr 2011, Tejun Heo wrote:
> We agree, it seems, that _sum is an attempt to be more accurate but does
> not really get you total accuracy (atomic_t). Our experiences with adding
> a _sum

With concurrent updates, what is total accuracy anyway?  The order of
operations isn't defined.  Sure, atomic_t is serialized and in that
sense it might be totally accurate but without outer synchronization
the concurrent events it is counting aren't ordered.  Let's forget
about total accuracy.  I don't care and I don't think anyone should
care, but I do want reasonable output without out-of-place hiccups.

> free block count is always an estimate if it only uses percpu_counter
> without other serialization. "Pretty accurate" is saying you feel good
> about it. In fact the potential deviations are not that much different.

Well, I feel good for a reason - it doesn't hiccup on a single
touch(1) going on somewhere.

> 1. Bound the differential by time and size (we already have a batch notion
> here but we need to add a time boundary. Run a function over all counters
> every second or so that sums up the diffs).
> 
> 2. Use lockless operations for differentials and atomics for globals to
> make it scale well.
> 
> If someone wants more accuracy then we need the ability to dynamically set
> the batch limit similar to what the vm statistics do.

So, if you can remove _sum() by doing the above without introducing
excessive complexity or penalizing use cases which might not have too
much commonality with vmstat, by all means, but please pay attention
to the current users.  Actually take a look at them.

Also, if someone is gonna put a considerable amount of effort into it,
please also invest some time into showing actual performance benefits,
or at least the possibility of actual benefits.

Thanks.

-- 
tejun


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-29 14:18                                                         ` Tejun Heo
@ 2011-04-29 14:25                                                           ` Christoph Lameter
  2011-04-29 14:43                                                             ` Tejun Heo
  0 siblings, 1 reply; 64+ messages in thread
From: Christoph Lameter @ 2011-04-29 14:25 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Shaohua Li, Eric Dumazet, akpm@linux-foundation.org,
	linux-mm@kvack.org

On Fri, 29 Apr 2011, Tejun Heo wrote:

> > If someone wants more accuracy then we need the ability to dynamically set
> > the batch limit similar to what the vm statistics do.
>
> So, if you can remove _sum() by doing the above without introducing
> excessive complexity or penalizing use cases which might not have too
> much commonality with vmstat, by all means, but please pay attention
> to the current users.  Actually take a look at them.

I am content to be maintaining the vm statistics.... But Shaohua may want
to have a look at it?

> Also, if someone is gonna put a considerable amount of effort into it,
> please also invest some time into showing actual performance benefits,
> or at least the possibility of actual benefits.

Yes. I think Shaohua already has some tests that could use some
improvement. So do we have kind of an agreement on how to move forward?


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-29 14:25                                                           ` Christoph Lameter
@ 2011-04-29 14:43                                                             ` Tejun Heo
  2011-04-29 14:55                                                               ` Christoph Lameter
  2011-05-05  4:08                                                               ` Shaohua Li
  0 siblings, 2 replies; 64+ messages in thread
From: Tejun Heo @ 2011-04-29 14:43 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Shaohua Li, Eric Dumazet, akpm@linux-foundation.org,
	linux-mm@kvack.org

Hello,

On Fri, Apr 29, 2011 at 09:25:09AM -0500, Christoph Lameter wrote:
> On Fri, 29 Apr 2011, Tejun Heo wrote:
> 
> > > If someone wants more accuracy then we need the ability to dynamically set
> > > the batch limit similar to what the vm statistics do.
> >
> > So, if you can remove _sum() by doing the above without introducing
> > excessive complexity or penalizing use cases which might not have too
> > much commonality with vmstat, by all means, but please pay attention
> > to the current users.  Actually take a look at them.
> 
> I am content to be maintaining the vm statistics.... But Shaohua may want
> to have a look at it?

It would be nice if vmstat could be merged with percpu counter, though, so
that the flushing can be done together.  If we do such piggybacking, the
flushing overhead becomes much easier to justify.

How does vmstat collect the percpu counters?  Does one cpu visit all
of them or each cpu flush local counter to global one periodically?

Thanks.

-- 
tejun


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-29 14:43                                                             ` Tejun Heo
@ 2011-04-29 14:55                                                               ` Christoph Lameter
  2011-05-05  4:08                                                               ` Shaohua Li
  1 sibling, 0 replies; 64+ messages in thread
From: Christoph Lameter @ 2011-04-29 14:55 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Shaohua Li, Eric Dumazet, akpm@linux-foundation.org,
	linux-mm@kvack.org

On Fri, 29 Apr 2011, Tejun Heo wrote:

> > I am content to be maintaining the vm statistics.... But Shaohua may want
> > to have a look at it?
>
> It would be nice if vmstat can be merged with percpu counter tho so
> that the flushing can be done together.  If we such piggybacking, the
> flushing overhead becomes much easier to justify.

Right. We could put the folding of the percpu counter diffs into the
vmstat function. See vmstat.c:refresh_cpu_vm_stats(). They would be
folded with the same logic as the VM stats. percpu counters are kept on a
linked list, though, and therefore it's expensive to scan that list. Maybe
we can convert that to a vmalloced table?

> How does vmstat collect the percpu counters?  Does one cpu visit all
> of them or each cpu flush local counter to global one periodically?

Each cpu folds its differentials into the per-zone and the global counters
every N seconds. The interval is configurable via
/proc/sys/vm/stat_interval.
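
Applied to a percpu_counter, the same scheme would look roughly like the
sketch below.  The counter, the per-cpu work item and the one-second
interval are made up for illustration; cpu hotplug handling is ignored.

static struct percpu_counter example_fbc;
static DEFINE_PER_CPU(struct delayed_work, fold_work);

static void percpu_counter_fold_local(struct work_struct *work)
{
        int cpu = get_cpu();    /* the work item is bound to this cpu */
        s32 *pcount = per_cpu_ptr(example_fbc.counters, cpu);

        /* Fold this cpu's differential into the global count */
        spin_lock(&example_fbc.lock);
        example_fbc.count += *pcount;
        *pcount = 0;
        spin_unlock(&example_fbc.lock);
        put_cpu();

        /* Re-arm on the same cpu, comparable to /proc/sys/vm/stat_interval */
        schedule_delayed_work_on(cpu, &per_cpu(fold_work, cpu), HZ);
}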


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH] percpu: preemptless __per_cpu_counter_add
  2011-04-29 14:43                                                             ` Tejun Heo
  2011-04-29 14:55                                                               ` Christoph Lameter
@ 2011-05-05  4:08                                                               ` Shaohua Li
  1 sibling, 0 replies; 64+ messages in thread
From: Shaohua Li @ 2011-05-05  4:08 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Christoph Lameter, Eric Dumazet, akpm@linux-foundation.org,
	linux-mm@kvack.org

On Fri, 2011-04-29 at 22:43 +0800, Tejun Heo wrote:
> Hello,
> 
> On Fri, Apr 29, 2011 at 09:25:09AM -0500, Christoph Lameter wrote:
> > On Fri, 29 Apr 2011, Tejun Heo wrote:
> > 
> > > > If someone wants more accuracy then we need the ability to dynamically set
> > > > the batch limit similar to what the vm statistics do.
> > >
> > > So, if you can remove _sum() by doing the above without introducing
> > > excessive complexity or penalizing use cases which might not have too
> > > much commonality with vmstat, by all means, but please pay attention
> > > to the current users.  Actually take a look at them.
> > 
> > I am content to be maintaining the vm statistics.... But Shaohua may want
> > to have a look at it?
> 
> It would be nice if vmstat could be merged with percpu counter, though, so
> that the flushing can be done together.  If we do such piggybacking, the
> flushing overhead becomes much easier to justify.
> 
> How does vmstat collect the percpu counters?  Does one cpu visit all
> of them or each cpu flush local counter to global one periodically?
I thought we could use lglock-like locking to address the issue: each
cpu holds its own lock to do an update, and _sum holds all cpus' locks to
get a consistent result. This will slow down _sum a little bit, but assume
that doesn't matter. I haven't checked whether we can still make the fast
path preemptless, but we use an atomic for the global count now. This
should still give a performance boost similar to the previous
implementation.
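
A rough sketch of that idea, assuming a hypothetical per-cpu lock next to
the per-cpu differentials and the global count kept in an atomic64_t
(lockdep annotations and cpu hotplug are ignored here):

struct lg_percpu_counter {
        atomic64_t count;               /* global part */
        s32 __percpu *counters;         /* per-cpu differentials */
        spinlock_t __percpu *locks;     /* hypothetical per-cpu locks */
};

void lg_counter_add(struct lg_percpu_counter *fbc, s64 amount, s32 batch)
{
        spinlock_t *lock;
        s64 count;

        /* The update side only takes its own cpu's lock */
        preempt_disable();
        lock = this_cpu_ptr(fbc->locks);
        spin_lock(lock);
        count = __this_cpu_read(*fbc->counters) + amount;
        if (count >= batch || count <= -batch) {
                atomic64_add(count, &fbc->count);
                __this_cpu_write(*fbc->counters, 0);
        } else {
                __this_cpu_write(*fbc->counters, count);
        }
        spin_unlock(lock);
        preempt_enable();
}

s64 lg_counter_sum(struct lg_percpu_counter *fbc)
{
        s64 ret;
        int cpu;

        /* _sum takes every cpu's lock, so no update can slip in while summing */
        for_each_online_cpu(cpu)
                spin_lock(per_cpu_ptr(fbc->locks, cpu));

        ret = atomic64_read(&fbc->count);
        for_each_online_cpu(cpu)
                ret += *per_cpu_ptr(fbc->counters, cpu);

        for_each_online_cpu(cpu)
                spin_unlock(per_cpu_ptr(fbc->locks, cpu));

        return ret;
}

The common update path still only touches cpu-local cachelines, while
_sum() pays the O(nr_cpus) locking cost, which is the trade-off described
above.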

Thanks,
Shaohua


^ permalink raw reply	[flat|nested] 64+ messages in thread

end of thread, other threads:[~2011-05-05  4:08 UTC | newest]

Thread overview: 64+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-04-13 14:45 percpu: preemptless __per_cpu_counter_add Christoph Lameter
2011-04-13 16:49 ` Christoph Lameter
2011-04-13 18:56   ` Tejun Heo
2011-04-13 20:22     ` [PATCH] " Christoph Lameter
2011-04-13 21:50       ` Tejun Heo
2011-04-13 22:17         ` Christoph Lameter
2011-04-13 22:23           ` Christoph Lameter
2011-04-13 23:55             ` Tejun Heo
2011-04-14  2:00               ` Eric Dumazet
2011-04-14  2:14       ` Eric Dumazet
2011-04-14 21:10         ` Christoph Lameter
2011-04-14 21:15           ` Tejun Heo
2011-04-15 17:37             ` Christoph Lameter
2011-04-15 18:27               ` Tejun Heo
2011-04-15 19:43                 ` Christoph Lameter
2011-04-15 23:52                   ` Tejun Heo
2011-04-18 14:38                     ` Christoph Lameter
2011-04-21 14:43                       ` Tejun Heo
2011-04-21 14:58                         ` Tejun Heo
2011-04-21 17:50                           ` Christoph Lameter
2011-04-21 18:01                             ` Tejun Heo
2011-04-21 18:20                               ` Christoph Lameter
2011-04-21 18:37                                 ` Tejun Heo
2011-04-21 18:54                                   ` Christoph Lameter
2011-04-21 19:08                                     ` Tejun Heo
2011-04-22  2:33                                       ` Shaohua Li
2011-04-26 12:10                                         ` Tejun Heo
2011-04-26 19:02                                           ` Hugh Dickins
2011-04-27 10:28                                             ` Tejun Heo
2011-04-27  5:43                                           ` Shaohua Li
2011-04-27 10:20                                             ` Tejun Heo
2011-04-28  3:28                                               ` Shaohua Li
2011-04-28 10:09                                                 ` Tejun Heo
2011-04-28 14:11                                                   ` Christoph Lameter
2011-04-28 14:23                                                     ` Tejun Heo
2011-04-28 14:30                                                       ` Tejun Heo
2011-04-28 14:58                                                         ` Christoph Lameter
2011-04-28 14:42                                                       ` Christoph Lameter
2011-04-28 14:44                                                         ` Tejun Heo
2011-04-28 14:52                                                           ` Christoph Lameter
2011-04-28 14:56                                                             ` Tejun Heo
2011-04-28 15:05                                                               ` Christoph Lameter
2011-04-28 15:12                                                                 ` Tejun Heo
2011-04-28 15:22                                                                   ` Christoph Lameter
2011-04-28 15:31                                                                     ` Tejun Heo
2011-04-28 15:40                                                                       ` Tejun Heo
2011-04-28 15:47                                                                         ` Christoph Lameter
2011-04-28 15:48                                                                     ` Eric Dumazet
2011-04-28 15:59                                                                       ` Eric Dumazet
2011-04-28 16:17                                                                         ` Christoph Lameter
2011-04-28 16:35                                                                           ` Eric Dumazet
2011-04-28 16:52                                                                             ` Christoph Lameter
2011-04-28 16:59                                                                               ` Eric Dumazet
2011-04-29  8:52                                                                                 ` Tejun Heo
2011-04-29  8:32                                                                             ` Shaohua Li
2011-04-29  8:19                                                   ` Shaohua Li
2011-04-29  8:44                                                     ` Tejun Heo
2011-04-29 14:02                                                       ` Christoph Lameter
2011-04-29 14:03                                                         ` Christoph Lameter
2011-04-29 14:18                                                         ` Tejun Heo
2011-04-29 14:25                                                           ` Christoph Lameter
2011-04-29 14:43                                                             ` Tejun Heo
2011-04-29 14:55                                                               ` Christoph Lameter
2011-05-05  4:08                                                               ` Shaohua Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).