* IO CPU affinity test results
@ 2008-03-14 12:14 Alan D. Brunelle
From: Alan D. Brunelle @ 2008-03-14 12:14 UTC (permalink / raw)
  To: linux-kernel; +Cc: Jens Axboe, npiggin, dgc

Good morning Jens - 

I had two machines running the latest patches hang last night: 

o  2-way AMD64 - I inadvertently left the patched kernel running, and I was moving a ton of data (100+GB) back up over the net to this node. It hard hung (believe it or not) about 99% of the way through. Hard hang, wouldn't respond to anything.

o  4-way IA64 - I was performing a simple test: [mkfs / mount / untar linux sources / make allnoconfig / make -j 5 / umount] repeatedly switching rq_affinity to 0/1 between each run. After 22 passes it had a hard hang with rq_affinity set to 1.

Of course, there is no way of knowing if either hang had anything to do with the patches, but it seems a bit ominous as RQ=1 was set in both cases.

This same test worked fine for 30 passes on a 2-way AMD64 box, with the following results:

Part  RQ   MIN     AVG     MAX      Dev
----- --  ------  ------  ------  ------
 mkfs  0  41.656  41.862  42.086   0.141
 mkfs  1  41.618  41.909  42.270   0.192

untar  0  18.055  19.611  20.906   0.720
untar  1  18.523  19.905  21.988   0.738

 make  0  50.480  50.991  51.752   0.340
 make  1  49.819  50.442  51.000   0.292

 comb  0 110.433 112.464 114.176   0.932
 comb  1 110.694 112.256 114.683   0.948

 psys  0  10.28%  10.91%  11.29%   0.243
 psys  1  10.21%  11.05%  11.80%   0.350


All results are in seconds (as measured by Python's time.time()), except for the psys - which was the average of mpstat's %sys column over the life of the whole run. The mkfs part consisted of [mkfs -t ext2 ; sync ; sync], untar [mount; untar linux sources; umount; sync; sync], make [mount; make allnoconfig; make -j 3; umount; sync; sync], and comb is the combined times of the mkfs, untar and make parts. 
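
For reference, here is a rough C sketch of the shape of one pass (the real driver was a Python script using time.time(); the device node and the rq_affinity sysfs path below are placeholders/assumptions, not details from the actual setup):

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

/* Toggle request-completion affinity via sysfs (path assumed). */
static void set_rq_affinity(int on)
{
	FILE *f = fopen("/sys/block/sda/queue/rq_affinity", "w");

	if (!f) {
		perror("rq_affinity");
		exit(1);
	}
	fprintf(f, "%d\n", on);
	fclose(f);
}

/* Run one shell command and return its wall-clock time in seconds. */
static double timed(const char *cmd)
{
	struct timeval t0, t1;

	gettimeofday(&t0, NULL);
	if (system(cmd))
		fprintf(stderr, "command failed: %s\n", cmd);
	gettimeofday(&t1, NULL);
	return (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
}

int main(void)
{
	int pass;

	for (pass = 0; pass < 30; pass++) {
		int rq = pass & 1;	/* alternate rq_affinity 0/1 */

		set_rq_affinity(rq);
		/* /dev/sdb1 is a placeholder device */
		printf("pass %d rq=%d mkfs %.3f\n", pass, rq,
		       timed("mkfs -t ext2 -q /dev/sdb1 && sync && sync"));
		/* the untar and make parts follow the same timed() pattern */
	}
	return 0;
}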

So, in a nutshell, we saw slightly better overall performance, but not conclusively, and we saw slightly elevated %system time to accomplish the task. 

On the 4-way, results were much worse: the final data captured before the system hung showed the rq=1 passes taking significantly longer, albeit at lower %system. I'm going to try the runs again, but I have a feeling that the latest "clean" patch based upon Nick's single call mechanism is a step backwards.

Alan


* Re: IO CPU affinity test results
From: Alan D. Brunelle @ 2008-03-14 12:23 UTC (permalink / raw)
  To: Alan D. Brunelle; +Cc: linux-kernel, Jens Axboe, npiggin, dgc

Alan D. Brunelle wrote:
> Good morning Jens -
>
> I had two machines running the latest patches hang last night:
>
> o  2-way AMD64 - I inadvertently left the patched kernel running, and I was moving a ton of data (100+GB) back up over the net to this node. It hard hung (believe it or not) about 99% of the way through. Hard hang, wouldn't respond to anything.
>
> o  4-way IA64 - I was performing a simple test: [mkfs / mount / untar linux sources / make allnoconfig / make -j 5 / umount] repeatedly switching rq_affinity to 0/1 between each run. After 22 passes it had a hard hang with rq_affinity set to 1.
>
> Of course, there is no way of knowing if either hang had anything to do with the patches, but it seems a bit ominous as RQ=1 was set in both cases.
>
> This same test worked fine for 30 passes on a 2-way AMD64 box, with the following results:
>
> Part  RQ   MIN     AVG     MAX      Dev
> ----- --  ------  ------  ------  ------
>  mkfs  0  41.656  41.862  42.086   0.141
>  mkfs  1  41.618  41.909  42.270   0.192
>
> untar  0  18.055  19.611  20.906   0.720
> untar  1  18.523  19.905  21.988   0.738
>
>  make  0  50.480  50.991  51.752   0.340
>  make  1  49.819  50.442  51.000   0.292
>
>  comb  0 110.433 112.464 114.176   0.932
>  comb  1 110.694 112.256 114.683   0.948
>
>  psys  0  10.28%  10.91%  11.29%   0.243
>  psys  1  10.21%  11.05%  11.80%   0.350
>
>
> All results are in seconds (as measured by Python's time.time()), except for the psys - which was the average of mpstat's %sys column over the life of the whole run. The mkfs part consisted of [mkfs -t ext2 ; sync ; sync], untar [mount; untar linux sources; umount; sync; sync], make [mount; make allnoconfig; make -j 3; umount; sync; sync], and comb is the combined times of the mkfs, untar and make parts.
>
> So, in a nutshell, we saw slightly better overall performance, but not conclusively, and we saw slightly elevated %system time to accomplish the task.
>
> On the 4-way, results were much worse: the final data captured before the system hung showed the rq=1 passes taking significantly longer, albeit at lower %system. I'm going to try the runs again, but I have a feeling that the latest "clean" patch based upon Nick's single call mechanism is a step backwards.
>
> Alan

I was able to go back and capture the results after 17 passes on the 4-way IA64 box (before it hung). With rq=1 they show a huge increase in the time needed for the combined tests - almost 18% longer - though with about 24% less system time.

Part  RQ   MIN     AVG     MAX      Dev
----- --  ------  ------  ------  ------
 mkfs  0  18.543  19.055  19.514   0.285
 mkfs  1  18.730  19.217  19.812   0.316

untar  0  17.119  21.396  43.868   8.025
untar  1  16.987  28.155  44.637  10.175

 make  0  23.105  23.866  24.487   0.359
 make  1  24.015  28.384  37.829   3.598

 comb  0  59.610  64.317  86.733   7.896
 comb  1  63.181  75.755  94.079  10.489

 psys  0  10.35%  14.16%  16.28%   1.375
 psys  1   6.89%  10.73%  14.30%   2.368

I'll try to snag some profile data to see what's up.

Alan
PS. Besides the AMD64/IA64 architectural difference, the underlying storage was different as well: U320 SCSI on the AMD64 and Fibre Channel on the IA64. I don't know if that has anything to do with the different results seen on the two hosts.


* Re: IO CPU affinity test results
From: Jens Axboe @ 2008-03-14 12:36 UTC (permalink / raw)
  To: Alan D. Brunelle; +Cc: linux-kernel, npiggin, dgc

On Fri, Mar 14 2008, Alan D. Brunelle wrote:
> Good morning Jens - 
> 
> I had two machines running the latest patches hang last night: 
> 
> o  2-way AMD64 - I inadvertently left the patched kernel running, and
> I was moving a ton of data (100+GB) back up over the net to this node.
> It hard hung (believe it or not) about 99% of the way through. Hard
> hang, wouldn't respond to anything.
> 
> o  4-way IA64 - I was performing a simple test: [mkfs / mount / untar
> linux sources / make allnoconfig / make -j 5 / umount] repeatedly
> switching rq_affinity to 0/1 between each run. After 22 passes it had
> a hard hang with rq_affinity set to 1.
> 
> Of course, there is no way of knowing if either hang had anything to
> do with the patches, but it seems a bit ominous as RQ=1 was set in
> both cases.
> 
> This same test worked fine for 30 passes on a 2-way AMD64 box, with
> the following results:
> 
> Part  RQ   MIN     AVG     MAX      Dev
> ----- --  ------  ------  ------  ------
>  mkfs  0  41.656  41.862  42.086   0.141
>  mkfs  1  41.618  41.909  42.270   0.192
> 
> untar  0  18.055  19.611  20.906   0.720
> untar  1  18.523  19.905  21.988   0.738
> 
>  make  0  50.480  50.991  51.752   0.340
>  make  1  49.819  50.442  51.000   0.292
> 
>  comb  0 110.433 112.464 114.176   0.932
>  comb  1 110.694 112.256 114.683   0.948
> 
>  psys  0  10.28%  10.91%  11.29%   0.243
>  psys  1  10.21%  11.05%  11.80%   0.350
> 
> 
> All results are in seconds (as measured by Python's time.time()),
> except for the psys - which was the average of mpstat's %sys column
> over the life of the whole run. The mkfs part consisted of [mkfs -t
> ext2 ; sync ; sync], untar [mount; untar linux sources; umount; sync;
> sync], make [mount; make allnoconfig; make -j 3; umount; sync; sync],
> and comb is the combined times of the mkfs, untar and make parts. 
> 
> So, in a nutshell, we saw slightly better overall performance, but not
> conclusively, and we saw slightly elevated %system time to accomplish
> the task. 

I think that is encouraging for such a small setup. The make results
are particularly nice. The hangs are a bother; I have no good ideas on
why they occur. The fact that they happen on both archs indicates that
this is perhaps a generic problem, which is good. The code to support
this is relatively simple, so it should be possible to go over it with a
fine-toothed comb and see if anything shows up.

You didn't get any watchdog triggers on the serial console, or anything
like that?

> On the 4-way, results were much worse: the final data captured before the
> system hung showed the rq=1 passes taking significantly longer, albeit
> at lower %system. I'm going to try the runs again, but I have a
> feeling that the latest "clean" patch based upon Nick's single call
> mechanism is a step backwards.

As to the hangs, yes, it's definitely a step back. The code is cleaner
though, and it's something we can support, so I'm inclined to try and
make it work. The overhead of the cleaner approach should not be a
problem - it'll be a bit more costly than the IPI hack, but not by a
lot. With the tuning and tweaks we got rid of the allocations in the
path, so it should be about as fast as the original.
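
As a sketch of what getting rid of the allocations can look like - the call descriptor is preallocated and embedded rather than kmalloc'ed per IPI (struct layout and field names here are assumptions for illustration, not lifted from the patches):

/* Minimal stand-in so this sketch is self-contained outside the kernel. */
struct list_head {
	struct list_head *next, *prev;
};

/* Assumed shape of the cross-CPU call descriptor (not from this thread). */
struct call_single_data {
	struct list_head list;
	void (*func)(void *info);
	void *info;
	unsigned int flags;
};

/*
 * Embedding a preallocated descriptor in the object that needs the
 * IPI means raising a completion never allocates - the csd is simply
 * reused for each IO. Names below are illustrative only.
 */
struct completion_sketch {
	struct call_single_data csd;	/* reused per completion */
	int submit_cpu;			/* CPU to send the IPI to */
};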

So once the base is stable, we can go and profile and squeeze some more
performance out of it :-)

-- 
Jens Axboe



* Re: IO CPU affinity test results
From: Jens Axboe @ 2008-03-14 12:37 UTC (permalink / raw)
  To: Alan D. Brunelle; +Cc: linux-kernel, npiggin, dgc

On Fri, Mar 14 2008, Alan D. Brunelle wrote:
> Alan D. Brunelle wrote:
> > Good morning Jens -
> >
> > I had two machines running the latest patches hang last night:
> >
> > o  2-way AMD64 - I inadvertently left the patched kernel running, and I was moving a ton of data (100+GB) back up over the net to this node. It hard hung (believe it or not) about 99% of the way through. Hard hang, wouldn't respond to anything.
> >
> > o  4-way IA64 - I was performing a simple test: [mkfs / mount / untar linux sources / make allnoconfig / make -j 5 / umount] repeatedly switching rq_affinity to 0/1 between each run. After 22 passes it had a hard hang with rq_affinity set to 1.
> >
> > Of course, there is no way of knowing if either hang had anything to do with the patches, but it seems a bit ominous as RQ=1 was set in both cases.
> >
> > This same test worked fine for 30 passes on a 2-way AMD64 box, with the following results:
> >
> > Part  RQ   MIN     AVG     MAX      Dev
> > ----- --  ------  ------  ------  ------
> >  mkfs  0  41.656  41.862  42.086   0.141
> >  mkfs  1  41.618  41.909  42.270   0.192
> >
> > untar  0  18.055  19.611  20.906   0.720
> > untar  1  18.523  19.905  21.988   0.738
> >
> >  make  0  50.480  50.991  51.752   0.340
> >  make  1  49.819  50.442  51.000   0.292
> >
> >  comb  0 110.433 112.464 114.176   0.932
> >  comb  1 110.694 112.256 114.683   0.948
> >
> >  psys  0  10.28%  10.91%  11.29%   0.243
> >  psys  1  10.21%  11.05%  11.80%   0.350
> >
> >
> > All results are in seconds (as measured by Python's time.time()), except for the psys - which was the average of mpstat's %sys column over the life of the whole run. The mkfs part consisted of [mkfs -t ext2 ; sync ; sync], untar [mount; untar linux sources; umount; sync; sync], make [mount; make allnoconfig; make -j 3; umount; sync; sync], and comb is the combined times of the mkfs, untar and make parts.
> >
> > So, in a nutshell, we saw slightly better overall performance, but not conclusively, and we saw slightly elevated %system time to accomplish the task.
> >
> > On the 4-way, results were much worse: the final data captured before the system hung showed the rq=1 passes taking significantly longer, albeit at lower %system. I'm going to try the runs again, but I have a feeling that the latest "clean" patch based upon Nick's single call mechanism is a step backwards.
> >
> > Alan
> 
> I was able to go back and capture the results after 17 passes on the
> 4-way IA64 box (before it hung). With rq=1 they show a huge increase
> in the time needed for the combined tests - almost 18% longer - though
> with about 24% less system time.
> 
> Part  RQ   MIN     AVG     MAX      Dev
> ----- --  ------  ------  ------  ------
>  mkfs  0  18.543  19.055  19.514   0.285
>  mkfs  1  18.730  19.217  19.812   0.316
> 
> untar  0  17.119  21.396  43.868   8.025
> untar  1  16.987  28.155  44.637  10.175
> 
>  make  0  23.105  23.866  24.487   0.359
>  make  1  24.015  28.384  37.829   3.598
> 
>  comb  0  59.610  64.317  86.733   7.896
>  comb  1  63.181  75.755  94.079  10.489
> 
>  psys  0  10.35%  14.16%  16.28%   1.375
>  psys  1   6.89%  10.73%  14.30%   2.368
> 
> I'll try to snag some profile data to see what's up.

That definitely looks like a bug. One thing to try would be to drop the
test for whether to send the IPI or not, and just unconditionally do it.
For ia64, that would be in
arch/ia64/kernel/smp.c:__smp_call_function_single(), just change

        if (ipi)
                send_IPI_single(cpu, IPI_CALL_FUNC_SINGLE);

to

        send_IPI_single(cpu, IPI_CALL_FUNC_SINGLE);

so that it always gets notified.
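
For background, the reason the test is there at all - a generic sketch of the enqueue side (the exact 2008 code may differ):

        /*
         * The sender raises the IPI only when the destination queue was
         * empty: a non-empty queue means an IPI is already in flight and
         * its handler will drain the new entry too. Sending it
         * unconditionally trades spurious IPIs for ruling out a
         * lost-notification bug.
         */
        spin_lock_irqsave(&dst->lock, flags);
        ipi = list_empty(&dst->list);
        list_add_tail(&data->list, &dst->list);
        spin_unlock_irqrestore(&dst->lock, flags);

        if (ipi)
                send_IPI_single(cpu, IPI_CALL_FUNC_SINGLE);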

-- 
Jens Axboe



* Re: IO CPU affinity test results
From: Jens Axboe @ 2008-03-14 13:05 UTC (permalink / raw)
  To: Alan D. Brunelle; +Cc: linux-kernel, npiggin, dgc

On Fri, Mar 14 2008, Jens Axboe wrote:
> I think that is encouraging for such a small setup. The make results
> are particularly nice. The hangs are a bother; I have no good ideas on
> why they occur. The fact that they happen on both archs indicates that
> this is perhaps a generic problem, which is good. The code to support
> this is relatively simple, so it should be possible to go over it with a
> fine-toothed comb and see if anything shows up.
> 
> You didn't get any watchdog triggers on the serial console, or anything
> like that?

Here's something that may explain it - if interrupts aren't disabled
when generic_smp_call_function_single_interrupt() is called, we could
deadlock on the per-CPU queue lock (q->lock). I think the IPI handler
runs with interrupts disabled, but I'm not 100% certain.
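
To spell out the suspected window (a reconstructed illustration; only the function and lock names come from the code below):

/*
 * All on one CPU, with the handler entered with interrupts enabled:
 *
 *   generic_smp_call_function_single_interrupt()
 *     spin_lock(&q->lock);              <- takes own queue lock
 *       <another IPI_CALL_FUNC_SINGLE arrives - irqs still on>
 *       generic_smp_call_function_single_interrupt()
 *         spin_lock(&q->lock);          <- already held by the frame
 *                                          below it: deadlock
 *
 * Holding the lock with spin_lock_irqsave() keeps interrupts off for
 * the critical section, so the handler cannot be re-entered there.
 */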

Can you see if this passes muster?

diff --git a/kernel/smp.c b/kernel/smp.c
index 852abd3..65808df 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -24,12 +24,13 @@ void __cpuinit generic_init_call_single_data(void)
 void generic_smp_call_function_single_interrupt(void)
 {
 	struct call_single_queue *q;
+	unsigned long flags;
 	LIST_HEAD(list);
 
 	q = &__get_cpu_var(call_single_queue);
-	spin_lock(&q->lock);
+	spin_lock_irqsave(&q->lock, flags);
 	list_replace_init(&q->list, &list);
-	spin_unlock(&q->lock);
+	spin_unlock_irqrestore(&q->lock, flags);
 
 	while (!list_empty(&list)) {
 		struct call_single_data *data;

-- 
Jens Axboe



* Re: IO CPU affinity test results
From: Alan D. Brunelle @ 2008-03-14 13:41 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel, npiggin, dgc

Jens Axboe wrote:
> On Fri, Mar 14 2008, Jens Axboe wrote:
>> I think that is encouraging for such a small setup. The make results
>> are particularly nice. The hangs are a bother; I have no good ideas on
>> why they occur. The fact that they happen on both archs indicates that
>> this is perhaps a generic problem, which is good. The code to support
>> this is relatively simple, so it should be possible to go over it with a
>> fine-toothed comb and see if anything shows up.
>>
>> You didn't get any watchdog triggers on the serial console, or anything
>> like that?
>
> Here's something that may explain it - if interrupts aren't disabled
> when generic_smp_call_function_single_interrupt() is called, we could
> deadlock on the per-CPU queue lock (q->lock). I think the IPI handler
> runs with interrupts disabled, but I'm not 100% certain.
>
> Can you see if this passes muster?
>
> diff --git a/kernel/smp.c b/kernel/smp.c
> index 852abd3..65808df 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -24,12 +24,13 @@ void __cpuinit generic_init_call_single_data(void)
>  void generic_smp_call_function_single_interrupt(void)
>  {
>  	struct call_single_queue *q;
> +	unsigned long flags;
>  	LIST_HEAD(list);
> 
>  	q = &__get_cpu_var(call_single_queue);
> -	spin_lock(&q->lock);
> +	spin_lock_irqsave(&q->lock, flags);
>  	list_replace_init(&q->list, &list);
> -	spin_unlock(&q->lock);
> +	spin_unlock_irqrestore(&q->lock, flags);
> 
>  	while (!list_empty(&list)) {
>  		struct call_single_data *data;
>

Well, putting in these two small patches (/always/ invoke send_IPI_single, plus the irqsave change above), things look better after two runs (without profiling enabled) on the 4-way IA64 box:

Part  RQ   MIN     AVG     MAX      Dev
----- --  ------  ------  ------  ------
 mkfs  0  18.151  18.311  18.472   0.227
 mkfs  1  18.420  18.456  18.492   0.051

untar  0  18.260  18.420  18.581   0.228
untar  1  18.594  19.477  20.360   1.249

 make  0  23.730  24.149  24.567   0.592
 make  1  23.401  23.599  23.797   0.280

 comb  0  60.141  60.881  61.620   1.046
 comb  1  60.810  61.532  62.253   1.020

 psys  0   4.16%   4.17%   4.18%   0.011
 psys  1   3.80%   3.93%   4.06%   0.180

Will do a longer set of runs and report back on that. As you noticed earlier, Jens, the make results look good...

Alan


* Re: IO CPU affinity test results
From: Alan D. Brunelle @ 2008-03-14 15:55 UTC (permalink / raw)
  To: Alan D. Brunelle; +Cc: Jens Axboe, linux-kernel, npiggin, dgc

4-way IA64: better, but still not good after 30 runs (in single user mode):

Part  RQ   MIN     AVG     MAX      Dev
----- --  ------  ------  ------  ------
 mkfs  0  18.893  19.429  20.129   0.351
 mkfs  1  18.912  19.347  19.934   0.258

untar  0  17.007  17.538  18.829   0.416
untar  1  16.980  17.329  18.369   0.284

 make  0  22.050  23.000  23.546   0.400
 make  1  22.416  23.141  23.637   0.349

 comb  0  58.138  59.967  61.923   0.793
 comb  1  58.428  59.817  60.750   0.608

 psys  0   7.69%   8.77%   9.93%   0.443
 psys  1   8.47%  14.20%  16.33%   2.566

Better run time w/ rq=1 (only 0.25%) but at a huge cost in %sys. Will do some profiling to see what's up.

Alan

