aim7 scalability issue on 4 socket machine

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* aim7 scalability issue on 4 socket machine
@ 2009-09-17  9:31 Zhang, Yanmin
  2009-09-17  9:40 ` Peter Zijlstra
  2009-09-17 10:35 ` Ingo Molnar
  0 siblings, 2 replies; 18+ messages in thread
From: Zhang, Yanmin @ 2009-09-17  9:31 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Arjan van de Ven, Andrew Morton,
	Andi Kleen

Aim7 result is bad on my new Nehalem machines (4*8*2 logical cpu). Perf counter
shows spinlock consumes 70% cpu time on the machine. Lock_stat shows
anon_vma->lock causes most of the spinlock contention. Function tracer shows
below call chain creates the spinlock.

do_brk => vma_merge =>vma_adjust

Aim7 consists of lots of subtests. One test is to fork lots of processes and
every process calls sbrk for 1000 times to grow/shrink the heap. All the vma of
the heap of all sub-processes point to the same anon_vma and use the same
anon_vma->lock. When sbrk is called, kernel calls do_brk => vma_merge =>vma_adjust
and lock anon_vma->lock to create spinlock contentions.

There is a comment section in front of spin_lock(&anon_vma->lock. It says
anon_vma lock can be optimized when just changing vma->vm_end. As a matter
of fact, anon_vma->lock is used to protect anon_vma->list when an entry is
deleted/inserted or the list is accessed. There is no such deletion/insertion
if only vma->end is changed in function vma_adjust.

Below patch fixes it.

Test results with kernel 2.6.31-rc8. The improvement on the machine is about 150%.

Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com>

---

--- linux-2.6.31-rc8/mm/mmap.c	2009-09-03 10:03:57.000000000 +0800
+++ linux-2.6.31-rc8_aim7/mm/mmap.c	2009-09-17 19:11:20.000000000 +0800
@@ -512,6 +512,7 @@ void vma_adjust(struct vm_area_struct *v
 	struct anon_vma *anon_vma = NULL;
 	long adjust_next = 0;
 	int remove_next = 0;
+	int anon_vma_use_lock;

 	if (next && !insert) {
 		if (end >= next->vm_end) {
@@ -568,22 +569,32 @@ again:			remove_next = 1 + (end > next->
 		}
 	}

-	/*
-	 * When changing only vma->vm_end, we don't really need
-	 * anon_vma lock: but is that case worth optimizing out?
-	 */
 	if (vma->anon_vma)
 		anon_vma = vma->anon_vma;
+	anon_vma_use_lock = 0;
 	if (anon_vma) {
-		spin_lock(&anon_vma->lock);
 		/*
-		 * Easily overlooked: when mprotect shifts the boundary,
-		 * make sure the expanding vma has anon_vma set if the
-		 * shrinking vma had, to cover any anon pages imported.
+		 * When changing only vma->vm_end, we don't really need
+		 * anon_vma lock.
+		 * ana_vma->lock is to protect the access to the list
+		 * started from anon_vma->head. If we don't remove or
+		 * insert a vma to the list, and also don't access
+		 * the list, we don't need  ana_vma->lock.
 		 */
-		if (importer && !importer->anon_vma) {
-			importer->anon_vma = anon_vma;
-			__anon_vma_link(importer);
+		if (remove_next ||
+			insert ||
+			(importer && !importer->anon_vma)) {
+			anon_vma_use_lock = 1;
+			spin_lock(&anon_vma->lock);
+			/*
+			 * Easily overlooked: when mprotect shifts the boundary,
+			 * make sure the expanding vma has anon_vma set if the
+			 * shrinking vma had, to cover any anon pages imported.
+			 */
+			if (importer && !importer->anon_vma) {
+				importer->anon_vma = anon_vma;
+				__anon_vma_link(importer);
+			}
 		}
 	}

@@ -628,7 +639,7 @@ again:			remove_next = 1 + (end > next->
 		__insert_vm_struct(mm, insert);
 	}

-	if (anon_vma)
+	if (anon_vma_use_lock)
 		spin_unlock(&anon_vma->lock);
 	if (mapping)
 		spin_unlock(&mapping->i_mmap_lock);

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: aim7 scalability issue on 4 socket machine
  2009-09-17  9:31 aim7 scalability issue on 4 socket machine Zhang, Yanmin
@ 2009-09-17  9:40 ` Peter Zijlstra
  2009-09-17 10:35   ` Hugh Dickins
  2009-09-17 10:35 ` Ingo Molnar
  1 sibling, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2009-09-17  9:40 UTC (permalink / raw)
  To: Zhang, Yanmin
  Cc: LKML, Ingo Molnar, Arjan van de Ven, Andrew Morton, Andi Kleen,
	lee.schermerhorn@hp.com, hugh.dickins

On Thu, 2009-09-17 at 17:31 +0800, Zhang, Yanmin wrote:
> Aim7 result is bad on my new Nehalem machines (4*8*2 logical cpu). Perf counter
> shows spinlock consumes 70% cpu time on the machine. Lock_stat shows
> anon_vma->lock causes most of the spinlock contention. Function tracer shows
> below call chain creates the spinlock.
> 
> do_brk => vma_merge =>vma_adjust
> 
> Aim7 consists of lots of subtests. One test is to fork lots of processes and
> every process calls sbrk for 1000 times to grow/shrink the heap. All the vma of
> the heap of all sub-processes point to the same anon_vma and use the same
> anon_vma->lock. When sbrk is called, kernel calls do_brk => vma_merge =>vma_adjust
> and lock anon_vma->lock to create spinlock contentions.
> 
> There is a comment section in front of spin_lock(&anon_vma->lock. It says
> anon_vma lock can be optimized when just changing vma->vm_end. As a matter
> of fact, anon_vma->lock is used to protect anon_vma->list when an entry is
> deleted/inserted or the list is accessed. There is no such deletion/insertion
> if only vma->end is changed in function vma_adjust.
> 
> Below patch fixes it.
> 
> Test results with kernel 2.6.31-rc8. The improvement on the machine is about 150%.

Did you see Lee's patch?:

 http://lkml.org/lkml/2009/9/9/290

Added Lee and Hugh to CC, retained the below patch for them.

> Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com>
> 
> ---
> 
> --- linux-2.6.31-rc8/mm/mmap.c	2009-09-03 10:03:57.000000000 +0800
> +++ linux-2.6.31-rc8_aim7/mm/mmap.c	2009-09-17 19:11:20.000000000 +0800
> @@ -512,6 +512,7 @@ void vma_adjust(struct vm_area_struct *v
>  	struct anon_vma *anon_vma = NULL;
>  	long adjust_next = 0;
>  	int remove_next = 0;
> +	int anon_vma_use_lock;
>  
>  	if (next && !insert) {
>  		if (end >= next->vm_end) {
> @@ -568,22 +569,32 @@ again:			remove_next = 1 + (end > next->
>  		}
>  	}
>  
> -	/*
> -	 * When changing only vma->vm_end, we don't really need
> -	 * anon_vma lock: but is that case worth optimizing out?
> -	 */
>  	if (vma->anon_vma)
>  		anon_vma = vma->anon_vma;
> +	anon_vma_use_lock = 0;
>  	if (anon_vma) {
> -		spin_lock(&anon_vma->lock);
>  		/*
> -		 * Easily overlooked: when mprotect shifts the boundary,
> -		 * make sure the expanding vma has anon_vma set if the
> -		 * shrinking vma had, to cover any anon pages imported.
> +		 * When changing only vma->vm_end, we don't really need
> +		 * anon_vma lock.
> +		 * ana_vma->lock is to protect the access to the list
> +		 * started from anon_vma->head. If we don't remove or
> +		 * insert a vma to the list, and also don't access
> +		 * the list, we don't need  ana_vma->lock.
>  		 */
> -		if (importer && !importer->anon_vma) {
> -			importer->anon_vma = anon_vma;
> -			__anon_vma_link(importer);
> +		if (remove_next ||
> +			insert ||
> +			(importer && !importer->anon_vma)) {
> +			anon_vma_use_lock = 1;
> +			spin_lock(&anon_vma->lock);
> +			/*
> +			 * Easily overlooked: when mprotect shifts the boundary,
> +			 * make sure the expanding vma has anon_vma set if the
> +			 * shrinking vma had, to cover any anon pages imported.
> +			 */
> +			if (importer && !importer->anon_vma) {
> +				importer->anon_vma = anon_vma;
> +				__anon_vma_link(importer);
> +			}
>  		}
>  	}
>  
> @@ -628,7 +639,7 @@ again:			remove_next = 1 + (end > next->
>  		__insert_vm_struct(mm, insert);
>  	}
>  
> -	if (anon_vma)
> +	if (anon_vma_use_lock)
>  		spin_unlock(&anon_vma->lock);
>  	if (mapping)
>  		spin_unlock(&mapping->i_mmap_lock);
> 
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: aim7 scalability issue on 4 socket machine
  2009-09-17  9:31 aim7 scalability issue on 4 socket machine Zhang, Yanmin
  2009-09-17  9:40 ` Peter Zijlstra
@ 2009-09-17 10:35 ` Ingo Molnar
  1 sibling, 0 replies; 18+ messages in thread
From: Ingo Molnar @ 2009-09-17 10:35 UTC (permalink / raw)
  To: Zhang, Yanmin
  Cc: LKML, Peter Zijlstra, Arjan van de Ven, Andrew Morton, Andi Kleen


* Zhang, Yanmin <yanmin_zhang@linux.intel.com> wrote:

> ???Aim7 result is bad on my new Nehalem machines (4*8*2 logical cpu). 
> Perf counter shows spinlock consumes 70% cpu time on the machine. 
> Lock_stat shows anon_vma->lock causes most of the spinlock contention. 
> Function tracer shows below call chain creates the spinlock.
> 
> do_brk => vma_merge =>vma_adjust
> 
> Aim7 consists of lots of subtests. One test is to fork lots of 
> processes and every process calls sbrk for 1000 times to grow/shrink 
> the heap. All the vma of the heap of all sub-processes point to the 
> same anon_vma and use the same anon_vma->lock. When sbrk is called, 
> kernel calls do_brk => vma_merge =>vma_adjust and lock anon_vma->lock 
> to create spinlock contentions.
> 
> There is a comment section in front of spin_lock(&anon_vma->lock. It 
> says anon_vma lock can be optimized when just changing vma->vm_end. As 
> a matter of fact, anon_vma->lock is used to protect anon_vma->list 
> when an entry is deleted/inserted or the list is accessed. There is no 
> such deletion/insertion if only vma->end is changed in function 
> vma_adjust.
> 
> Below patch fixes it.
> 
> Test results with kernel 2.6.31-rc8. The improvement on the machine is 
> about 150%.

Impressive speedup!

[ Also, the array of tools you used to debug this is impressive as well
  ;-) ]

	Ingo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: aim7 scalability issue on 4 socket machine
  2009-09-17  9:40 ` Peter Zijlstra
@ 2009-09-17 10:35   ` Hugh Dickins
  2009-09-18  2:02     ` Zhang, Yanmin
  0 siblings, 1 reply; 18+ messages in thread
From: Hugh Dickins @ 2009-09-17 10:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Zhang, Yanmin, LKML, Ingo Molnar, Arjan van de Ven, Andrew Morton,
	Andi Kleen, lee.schermerhorn@hp.com

On Thu, 17 Sep 2009, Peter Zijlstra wrote:
> On Thu, 2009-09-17 at 17:31 +0800, Zhang, Yanmin wrote:
> > Aim7 result is bad on my new Nehalem machines (4*8*2 logical cpu). Perf counter
> > shows spinlock consumes 70% cpu time on the machine. Lock_stat shows
> > anon_vma->lock causes most of the spinlock contention. Function tracer shows
> > below call chain creates the spinlock.
> > 
> > do_brk => vma_merge =>vma_adjust
> > 
> > Aim7 consists of lots of subtests. One test is to fork lots of processes and
> > every process calls sbrk for 1000 times to grow/shrink the heap. All the vma of
> > the heap of all sub-processes point to the same anon_vma and use the same
> > anon_vma->lock. When sbrk is called, kernel calls do_brk => vma_merge =>vma_adjust
> > and lock anon_vma->lock to create spinlock contentions.
> > 
> > There is a comment section in front of spin_lock(&anon_vma->lock. It says
> > anon_vma lock can be optimized when just changing vma->vm_end. As a matter
> > of fact, anon_vma->lock is used to protect anon_vma->list when an entry is
> > deleted/inserted or the list is accessed. There is no such deletion/insertion
> > if only vma->end is changed in function vma_adjust.
> > 
> > Below patch fixes it.
> > 
> > Test results with kernel 2.6.31-rc8. The improvement on the machine is about 150%.
> 
> Did you see Lee's patch?:
> 
>  http://lkml.org/lkml/2009/9/9/290
> 
> Added Lee and Hugh to CC, retained the below patch for them.

Thanks a lot for the CC, Peter.
See my reply to that mail for the slightly corrected version.

Yes, Yanmin and Lee appear to be fixing exactly the same issue.
I haven't thought through Yanmin's version for correctness, but
it lacks the vm_start check I added to Lee's, and I do prefer
Lee's style - hey, nothing personal!

So, Yanmin, please retest with http://lkml.org/lkml/2009/9/13/25
and let us know if that works as well for you - thanks.

Hugh

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: aim7 scalability issue on 4 socket machine
  2009-09-17 10:35   ` Hugh Dickins
@ 2009-09-18  2:02     ` Zhang, Yanmin
  2009-09-18  2:59       ` Andrew Morton
  0 siblings, 1 reply; 18+ messages in thread
From: Zhang, Yanmin @ 2009-09-18  2:02 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Arjan van de Ven,
	Andrew Morton, Andi Kleen, lee.schermerhorn@hp.com

On Thu, 2009-09-17 at 11:35 +0100, Hugh Dickins wrote:
> On Thu, 17 Sep 2009, Peter Zijlstra wrote:
> > On Thu, 2009-09-17 at 17:31 +0800, Zhang, Yanmin wrote:
> > > Aim7 result is bad on my new Nehalem machines (4*8*2 logical cpu). Perf counter
> > > shows spinlock consumes 70% cpu time on the machine. Lock_stat shows
> > > anon_vma->lock causes most of the spinlock contention. Function tracer shows
> > > below call chain creates the spinlock.
> > > 
> > > do_brk => vma_merge =>vma_adjust
> > > 
> > > Aim7 consists of lots of subtests. One test is to fork lots of processes and
> > > every process calls sbrk for 1000 times to grow/shrink the heap. All the vma of
> > > the heap of all sub-processes point to the same anon_vma and use the same
> > > anon_vma->lock. When sbrk is called, kernel calls do_brk => vma_merge =>vma_adjust
> > > and lock anon_vma->lock to create spinlock contentions.
> > > 
> > > There is a comment section in front of spin_lock(&anon_vma->lock. It says
> > > anon_vma lock can be optimized when just changing vma->vm_end. As a matter
> > > of fact, anon_vma->lock is used to protect anon_vma->list when an entry is
> > > deleted/inserted or the list is accessed. There is no such deletion/insertion
> > > if only vma->end is changed in function vma_adjust.
> > > 
> > > Below patch fixes it.
> > > 
> > > Test results with kernel 2.6.31-rc8. The improvement on the machine is about 150%.
> > 
> > Did you see Lee's patch?:
> > 
> >  http://lkml.org/lkml/2009/9/9/290
> > 
> > Added Lee and Hugh to CC, retained the below patch for them.
> 
> Thanks a lot for the CC, Peter.
> See my reply to that mail for the slightly corrected version.
> 
> Yes, Yanmin and Lee appear to be fixing exactly the same issue.
> I haven't thought through Yanmin's version for correctness, but
> it lacks the vm_start check I added to Lee's, and I do prefer
> Lee's style - hey, nothing personal!
> 
> So, Yanmin, please retest with http://lkml.org/lkml/2009/9/13/25
> and let us know if that works as well for you - thanks.
I tested Lee's patch and it does fix the issue.

Yanmin



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: aim7 scalability issue on 4 socket machine
  2009-09-18  2:02     ` Zhang, Yanmin
@ 2009-09-18  2:59       ` Andrew Morton
  2009-09-18  3:17         ` Zhang, Yanmin
  2009-09-18  6:53         ` Hugh Dickins
  0 siblings, 2 replies; 18+ messages in thread
From: Andrew Morton @ 2009-09-18  2:59 UTC (permalink / raw)
  To: Zhang, Yanmin
  Cc: Hugh Dickins, Peter Zijlstra, LKML, Ingo Molnar, Arjan van de Ven,
	Andi Kleen, lee.schermerhorn@hp.com

On Fri, 18 Sep 2009 10:02:19 +0800 "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> wrote:

> > > Did you see Lee's patch?:
> > > 
> > >  http://lkml.org/lkml/2009/9/9/290
> > > 
> > > Added Lee and Hugh to CC, retained the below patch for them.
> > 
> > Thanks a lot for the CC, Peter.
> > See my reply to that mail for the slightly corrected version.
> > 
> > Yes, Yanmin and Lee appear to be fixing exactly the same issue.
> > I haven't thought through Yanmin's version for correctness, but
> > it lacks the vm_start check I added to Lee's, and I do prefer
> > Lee's style - hey, nothing personal!
> > 
> > So, Yanmin, please retest with http://lkml.org/lkml/2009/9/13/25
> > and let us know if that works as well for you - thanks.
> I tested Lee's patch and it does fix the issue.

Do we think we should cook up something for -stable?

Either this is a regression or the workload is particularly obscure. 

aim7 is sufficiently non-obscure to make me wonder what's happened here?


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: aim7 scalability issue on 4 socket machine
  2009-09-18  2:59       ` Andrew Morton
@ 2009-09-18  3:17         ` Zhang, Yanmin
  2009-09-18  6:53         ` Hugh Dickins
  1 sibling, 0 replies; 18+ messages in thread
From: Zhang, Yanmin @ 2009-09-18  3:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hugh Dickins, Peter Zijlstra, LKML, Ingo Molnar, Arjan van de Ven,
	Andi Kleen, lee.schermerhorn@hp.com

On Thu, 2009-09-17 at 19:59 -0700, Andrew Morton wrote:
> On Fri, 18 Sep 2009 10:02:19 +0800 "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> wrote:
> 
> > > > Did you see Lee's patch?:
> > > > 
> > > >  http://lkml.org/lkml/2009/9/9/290
> > > > 
> > > > Added Lee and Hugh to CC, retained the below patch for them.
> > > 
> > > Thanks a lot for the CC, Peter.
> > > See my reply to that mail for the slightly corrected version.
> > > 
> > > Yes, Yanmin and Lee appear to be fixing exactly the same issue.
> > > I haven't thought through Yanmin's version for correctness, but
> > > it lacks the vm_start check I added to Lee's, and I do prefer
> > > Lee's style - hey, nothing personal!
> > > 
> > > So, Yanmin, please retest with http://lkml.org/lkml/2009/9/13/25
> > > and let us know if that works as well for you - thanks.
> > I tested Lee's patch and it does fix the issue.
> 
> Do we think we should cook up something for -stable?
It's better to have a patch for -stable.

> 
> Either this is a regression or the workload is particularly obscure. 
This issue is not clear on dual socket Nehalem machine (2*4*2 cpu), but
is severe on large machine (4*8*2 cpu).

> 
> aim7 is sufficiently non-obscure to make me wonder what's happened here?
I copy previous content below:

Aim7 consists of lots of subtests. One test is to fork lots of processes and
every process calls sbrk for 1000 times to grow/shrink the heap. All the vma of
the heap of all sub-processes point to the same anon_vma and use the same
anon_vma->lock. When sbrk is called, kernel calls do_brk => vma_merge =>vma_adjust
and lock anon_vma->lock to create spinlock contentions.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: aim7 scalability issue on 4 socket machine
  2009-09-18  2:59       ` Andrew Morton
  2009-09-18  3:17         ` Zhang, Yanmin
@ 2009-09-18  6:53         ` Hugh Dickins
  2009-09-18  7:05           ` Andrew Morton
  2009-09-18  7:12           ` Andi Kleen
  1 sibling, 2 replies; 18+ messages in thread
From: Hugh Dickins @ 2009-09-18  6:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Zhang, Yanmin, Peter Zijlstra, LKML, Ingo Molnar,
	Arjan van de Ven, Andi Kleen, lee.schermerhorn@hp.com

On Thu, 17 Sep 2009, Andrew Morton wrote:
> On Fri, 18 Sep 2009 10:02:19 +0800 "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> wrote:
> > > 
> > > So, Yanmin, please retest with http://lkml.org/lkml/2009/9/13/25
> > > and let us know if that works as well for you - thanks.
> > I tested Lee's patch and it does fix the issue.

Thanks for checking and reporting back, Yanmin.

> 
> Do we think we should cook up something for -stable?

Gosh, I laughed at Lee (sorry!) for suggesting it for -stable:
is stable really for getting a better number out of a benchmark?

I'd have thought the next release is the right place for that; but
I've no problem if you guys and the stable guys agree it's appropriate.

> 
> Either this is a regression or the workload is particularly obscure. 

I've not cross-checked descriptions, but assume Lee was actually
testing on exactly the same kind of upcoming Nehalem as Yanmin, and
that machine happens to have characteristics which show up badly here.

> 
> aim7 is sufficiently non-obscure to make me wonder what's happened here?

Not a regression, just the onward march of new hardware, I think.
Could easily be other such things in other places with other tests.

Hugh

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: aim7 scalability issue on 4 socket machine
  2009-09-18  6:53         ` Hugh Dickins
@ 2009-09-18  7:05           ` Andrew Morton
  2009-12-06 20:08             ` Pavel Machek
  2009-09-18  7:12           ` Andi Kleen
  1 sibling, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2009-09-18  7:05 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Zhang, Yanmin, Peter Zijlstra, LKML, Ingo Molnar,
	Arjan van de Ven, Andi Kleen, lee.schermerhorn@hp.com, stable

On Fri, 18 Sep 2009 07:53:58 +0100 (BST) Hugh Dickins <hugh.dickins@tiscali.co.uk> wrote:

> On Thu, 17 Sep 2009, Andrew Morton wrote:
> > On Fri, 18 Sep 2009 10:02:19 +0800 "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> wrote:
> > > > 
> > > > So, Yanmin, please retest with http://lkml.org/lkml/2009/9/13/25
> > > > and let us know if that works as well for you - thanks.
> > > I tested Lee's patch and it does fix the issue.
> 
> Thanks for checking and reporting back, Yanmin.
> 
> > 
> > Do we think we should cook up something for -stable?
> 
> Gosh, I laughed at Lee (sorry!) for suggesting it for -stable:
> is stable really for getting a better number out of a benchmark?
> 
> I'd have thought the next release is the right place for that; but
> I've no problem if you guys and the stable guys agree it's appropriate.
> 
> > 
> > Either this is a regression or the workload is particularly obscure. 
> 
> I've not cross-checked descriptions, but assume Lee was actually
> testing on exactly the same kind of upcoming Nehalem as Yanmin, and
> that machine happens to have characteristics which show up badly here.
> 
> > 
> > aim7 is sufficiently non-obscure to make me wonder what's happened here?
> 
> Not a regression, just the onward march of new hardware, I think.
> Could easily be other such things in other places with other tests.
> 

Well, it comes down to the question "what is -stable for".

If you take it as "bugfixed version of the 2.6.x kernel" then no,
speedups aren't appropriate.

If you consider -stable to be "something distros, etc will use" then
yes, perhaps we serve those consumers better by including fairly major
efficiency improvements.

I suspect most consumers of -stable would prefer the latter approach,
as long as we don't go nuts.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: aim7 scalability issue on 4 socket machine
  2009-09-18  6:53         ` Hugh Dickins
  2009-09-18  7:05           ` Andrew Morton
@ 2009-09-18  7:12           ` Andi Kleen
  2009-09-18  7:29             ` Hugh Dickins
  2009-09-18 13:15             ` Lee Schermerhorn
  1 sibling, 2 replies; 18+ messages in thread
From: Andi Kleen @ 2009-09-18  7:12 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Zhang, Yanmin, Peter Zijlstra, LKML, Ingo Molnar,
	Arjan van de Ven, Andi Kleen, lee.schermerhorn@hp.com

On Fri, Sep 18, 2009 at 07:53:58AM +0100, Hugh Dickins wrote:
> On Thu, 17 Sep 2009, Andrew Morton wrote:
> > On Fri, 18 Sep 2009 10:02:19 +0800 "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> wrote:
> > > > 
> > > > So, Yanmin, please retest with http://lkml.org/lkml/2009/9/13/25
> > > > and let us know if that works as well for you - thanks.
> > > I tested Lee's patch and it does fix the issue.
> 
> Thanks for checking and reporting back, Yanmin.
> 
> > 
> > Do we think we should cook up something for -stable?
> 
> Gosh, I laughed at Lee (sorry!) for suggesting it for -stable:
> is stable really for getting a better number out of a benchmark?

When your system is large enough scalability problems (e.g.
lock contention) can be a serious bug. i.e. when your workload
is 150% slower than expected that can well be a show stopper.

Admittedly the workload in this case was a benchmark, but it's
not that far fetched to expect the same problem in a real application.

We had a similar problem with the accounting lock some time 
ago, I think that patch also went in.

So yes I think simple non intrusive fixes for serious scalability
problems should be stable candidates.

> > Either this is a regression or the workload is particularly obscure. 
> 
> I've not cross-checked descriptions, but assume Lee was actually
> testing on exactly the same kind of upcoming Nehalem as Yanmin, and
> that machine happens to have characteristics which show up badly here.

AFAIK Lee usually tests on large IA64 boxes.

> > aim7 is sufficiently non-obscure to make me wonder what's happened here?
> 
> Not a regression, just the onward march of new hardware, I think.
> Could easily be other such things in other places with other tests.

Yes, it's just a much larger machine, so old hidden scalability sins now
appear.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: aim7 scalability issue on 4 socket machine
  2009-09-18  7:12           ` Andi Kleen
@ 2009-09-18  7:29             ` Hugh Dickins
  2009-09-18 13:15             ` Lee Schermerhorn
  1 sibling, 0 replies; 18+ messages in thread
From: Hugh Dickins @ 2009-09-18  7:29 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrew Morton, Zhang, Yanmin, Peter Zijlstra, LKML, Ingo Molnar,
	Arjan van de Ven, lee.schermerhorn@hp.com

On Fri, 18 Sep 2009, Andi Kleen wrote:
> 
> So yes I think simple non intrusive fixes for serious scalability
> problems should be stable candidates.

This one is certainly non-intrusive, and you're all in agreement
that it better go in, so I won't dispute that.

(But I'm a little worried that this easily fixed issue will turn out
to be just the first in a string of not so easily fixed issues.)

Hugh

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: aim7 scalability issue on 4 socket machine
  2009-09-18  7:12           ` Andi Kleen
  2009-09-18  7:29             ` Hugh Dickins
@ 2009-09-18 13:15             ` Lee Schermerhorn
  2009-09-18 14:33               ` Andi Kleen
  1 sibling, 1 reply; 18+ messages in thread
From: Lee Schermerhorn @ 2009-09-18 13:15 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Hugh Dickins, Andrew Morton, Zhang, Yanmin, Peter Zijlstra, LKML,
	Ingo Molnar, Arjan van de Ven

On Fri, 2009-09-18 at 09:12 +0200, Andi Kleen wrote:
> On Fri, Sep 18, 2009 at 07:53:58AM +0100, Hugh Dickins wrote:
> > On Thu, 17 Sep 2009, Andrew Morton wrote:
> > > On Fri, 18 Sep 2009 10:02:19 +0800 "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> wrote:
> > > > > 
> > > > > So, Yanmin, please retest with http://lkml.org/lkml/2009/9/13/25
> > > > > and let us know if that works as well for you - thanks.
> > > > I tested Lee's patch and it does fix the issue.
> > 
> > Thanks for checking and reporting back, Yanmin.
> > 
> > > 
> > > Do we think we should cook up something for -stable?
> > 
> > Gosh, I laughed at Lee (sorry!) for suggesting it for -stable:
> > is stable really for getting a better number out of a benchmark?
> 
> When your system is large enough scalability problems (e.g.
> lock contention) can be a serious bug. i.e. when your workload
> is 150% slower than expected that can well be a show stopper.
> 
> Admittedly the workload in this case was a benchmark, but it's
> not that far fetched to expect the same problem in a real application.
> 
> We had a similar problem with the accounting lock some time 
> ago, I think that patch also went in.
> 
> So yes I think simple non intrusive fixes for serious scalability
> problems should be stable candidates.
> 
> > > Either this is a regression or the workload is particularly obscure. 
> > 
> > I've not cross-checked descriptions, but assume Lee was actually
> > testing on exactly the same kind of upcoming Nehalem as Yanmin, and
> > that machine happens to have characteristics which show up badly here.
> 
> AFAIK Lee usually tests on large IA64 boxes.

In this case, it's an x86_64 [DL785] platform--an 8 socket, 4 core
Shanghai in a glueless, "twisted ladder" config.

Lee



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: aim7 scalability issue on 4 socket machine
  2009-09-18 13:15             ` Lee Schermerhorn
@ 2009-09-18 14:33               ` Andi Kleen
  0 siblings, 0 replies; 18+ messages in thread
From: Andi Kleen @ 2009-09-18 14:33 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: Andi Kleen, Hugh Dickins, Andrew Morton, Zhang, Yanmin,
	Peter Zijlstra, LKML, Ingo Molnar, Arjan van de Ven

> > 
> > AFAIK Lee usually tests on large IA64 boxes.
> 
> In this case, it's an x86_64 [DL785] platform--an 8 socket, 4 core
> Shanghai in a glueless, "twisted ladder" config.

Thanks for the correction. However it's still a quite different system,
so I think it's clear it happens on multiple different systems.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: aim7 scalability issue on 4 socket machine
  2009-09-18  7:05           ` Andrew Morton
@ 2009-12-06 20:08             ` Pavel Machek
  2009-12-06 20:11               ` Andi Kleen
  0 siblings, 1 reply; 18+ messages in thread
From: Pavel Machek @ 2009-12-06 20:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Hugh Dickins, Zhang, Yanmin, Peter Zijlstra, LKML, Ingo Molnar,
	Arjan van de Ven, Andi Kleen, lee.schermerhorn@hp.com, stable

On Fri 2009-09-18 00:05:42, Andrew Morton wrote:
> On Fri, 18 Sep 2009 07:53:58 +0100 (BST) Hugh Dickins <hugh.dickins@tiscali.co.uk> wrote:
> 
> > On Thu, 17 Sep 2009, Andrew Morton wrote:
> > > On Fri, 18 Sep 2009 10:02:19 +0800 "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> wrote:
> > > > > 
> > > > > So, Yanmin, please retest with http://lkml.org/lkml/2009/9/13/25
> > > > > and let us know if that works as well for you - thanks.
> > > > I tested Lee's patch and it does fix the issue.
> > 
> > Thanks for checking and reporting back, Yanmin.
> > 
> > > 
> > > Do we think we should cook up something for -stable?
> > 
> > Gosh, I laughed at Lee (sorry!) for suggesting it for -stable:
> > is stable really for getting a better number out of a benchmark?
> > 
> > I'd have thought the next release is the right place for that; but
> > I've no problem if you guys and the stable guys agree it's appropriate.
> > 
> > > 
> > > Either this is a regression or the workload is particularly obscure. 
> > 
> > I've not cross-checked descriptions, but assume Lee was actually
> > testing on exactly the same kind of upcoming Nehalem as Yanmin, and
> > that machine happens to have characteristics which show up badly here.
> > 
> > > 
> > > aim7 is sufficiently non-obscure to make me wonder what's happened here?
> > 
> > Not a regression, just the onward march of new hardware, I think.
> > Could easily be other such things in other places with other tests.
> > 
> 
> Well, it comes down to the question "what is -stable for".
> 
> If you take it as "bugfixed version of the 2.6.x kernel" then no,
> speedups aren't appropriate.
> 
> If you consider -stable to be "something distros, etc will use" then
> yes, perhaps we serve those consumers better by including fairly major
> efficiency improvements.

Well, if speedups are ok, then someone should update stable_rules
file...?

...because I do not think it should be accepted based on that.

								Pavel
It currently says:
-------------------


Rules on what kind of patches are accepted, and which ones are not,
into the
"-stable" tree:

 - It must be obviously correct and tested.
 - It cannot be bigger than 100 lines, with context.
 - It must fix only one thing.
 - It must fix a real bug that bothers people (not a, "This could be a
   problem..." type thing).
 - It must fix a problem that causes a build error (but not for things
   marked CONFIG_BROKEN), an oops, a hang, data corruption, a real
   security issue, or some "oh, that's not good" issue.  In short,
something
   critical.
 - New device IDs and quirks are also accepted.
 - No "theoretical race condition" issues, unless an explanation of
how the
   race can be exploited is also provided.
 - It cannot contain any "trivial" fixes in it (spelling changes,
   whitespace cleanups, etc).
 - It must follow the Documentation/SubmittingPatches rules.
 - It or an equivalent fix must already exist in Linus' tree.  Quote
the
   respective commit ID in Linus' tree in your patch submission to
-stable.


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: aim7 scalability issue on 4 socket machine
  2009-12-06 20:08             ` Pavel Machek
@ 2009-12-06 20:11               ` Andi Kleen
  2009-12-06 21:11                 ` Pavel Machek
  0 siblings, 1 reply; 18+ messages in thread
From: Andi Kleen @ 2009-12-06 20:11 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Andrew Morton, Hugh Dickins, Zhang, Yanmin, Peter Zijlstra, LKML,
	Ingo Molnar, Arjan van de Ven, Andi Kleen,
	lee.schermerhorn@hp.com, stable

>  - It must be obviously correct and tested.
>  - It cannot be bigger than 100 lines, with context.
>  - It must fix only one thing.
>  - It must fix a real bug that bothers people (not a, "This could be a
>    problem..." type thing).

A significant slow down in a common situation is a "significant
bug that bothers people"

-Andi

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: aim7 scalability issue on 4 socket machine
  2009-12-06 20:11               ` Andi Kleen
@ 2009-12-06 21:11                 ` Pavel Machek
  2009-12-06 22:17                   ` [stable] " Greg KH
  0 siblings, 1 reply; 18+ messages in thread
From: Pavel Machek @ 2009-12-06 21:11 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrew Morton, Hugh Dickins, Zhang, Yanmin, Peter Zijlstra, LKML,
	Ingo Molnar, Arjan van de Ven, lee.schermerhorn@hp.com, stable

On Sun 2009-12-06 21:11:36, Andi Kleen wrote:
> >  - It must be obviously correct and tested.
> >  - It cannot be bigger than 100 lines, with context.
> >  - It must fix only one thing.
> >  - It must fix a real bug that bothers people (not a, "This could be a
> >    problem..." type thing).
> 
> A significant slow down in a common situation is a "significant
> bug that bothers people"

Well, IIRC it was benchmark that was slowed down, on huge system, so
it was exactly "this could be a problem".

Anyway, I don't care about -stable series too much (and sorry for
replying to such an old mail), but perhaps the docs should be updated?

Examples cited there are such as "data corruption" or "oops", which is
clearly different ballpark then "slowdown".
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [stable] aim7 scalability issue on 4 socket machine
  2009-12-06 21:11                 ` Pavel Machek
@ 2009-12-06 22:17                   ` Greg KH
  2009-12-06 22:23                     ` Pavel Machek
  0 siblings, 1 reply; 18+ messages in thread
From: Greg KH @ 2009-12-06 22:17 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Andi Kleen, lee.schermerhorn@hp.com, Zhang, Yanmin,
	Peter Zijlstra, LKML, stable, Hugh Dickins, Andrew Morton,
	Arjan van de Ven, Ingo Molnar

On Sun, Dec 06, 2009 at 10:11:05PM +0100, Pavel Machek wrote:
> On Sun 2009-12-06 21:11:36, Andi Kleen wrote:
> > >  - It must be obviously correct and tested.
> > >  - It cannot be bigger than 100 lines, with context.
> > >  - It must fix only one thing.
> > >  - It must fix a real bug that bothers people (not a, "This could be a
> > >    problem..." type thing).
> > 
> > A significant slow down in a common situation is a "significant
> > bug that bothers people"
> 
> Well, IIRC it was benchmark that was slowed down, on huge system, so
> it was exactly "this could be a problem".
> 
> Anyway, I don't care about -stable series too much (and sorry for
> replying to such an old mail), but perhaps the docs should be updated?
> 
> Examples cited there are such as "data corruption" or "oops", which is
> clearly different ballpark then "slowdown".

Why does it really matter if the intent is the thing that matters here.

Deal with specifics on a case-by-case basis, and if you don't really
care about this, then why dig up a many-month old thread?

strange,

greg k-h

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [stable] aim7 scalability issue on 4 socket machine
  2009-12-06 22:17                   ` [stable] " Greg KH
@ 2009-12-06 22:23                     ` Pavel Machek
  0 siblings, 0 replies; 18+ messages in thread
From: Pavel Machek @ 2009-12-06 22:23 UTC (permalink / raw)
  To: Greg KH
  Cc: Andi Kleen, lee.schermerhorn@hp.com, Zhang, Yanmin,
	Peter Zijlstra, LKML, stable, Hugh Dickins, Andrew Morton,
	Arjan van de Ven, Ingo Molnar


> Deal with specifics on a case-by-case basis, and if you don't really
> care about this, then why dig up a many-month old thread?

Digging it was a mistake, I was working on wrong machine. Sorry.
									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2009-12-06 22:23 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-09-17  9:31 aim7 scalability issue on 4 socket machine Zhang, Yanmin
2009-09-17  9:40 ` Peter Zijlstra
2009-09-17 10:35   ` Hugh Dickins
2009-09-18  2:02     ` Zhang, Yanmin
2009-09-18  2:59       ` Andrew Morton
2009-09-18  3:17         ` Zhang, Yanmin
2009-09-18  6:53         ` Hugh Dickins
2009-09-18  7:05           ` Andrew Morton
2009-12-06 20:08             ` Pavel Machek
2009-12-06 20:11               ` Andi Kleen
2009-12-06 21:11                 ` Pavel Machek
2009-12-06 22:17                   ` [stable] " Greg KH
2009-12-06 22:23                     ` Pavel Machek
2009-09-18  7:12           ` Andi Kleen
2009-09-18  7:29             ` Hugh Dickins
2009-09-18 13:15             ` Lee Schermerhorn
2009-09-18 14:33               ` Andi Kleen
2009-09-17 10:35 ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox