* Re: Timeline of IPoIB performance
       [not found] ` <52br20lsei.fsf@cisco.com>
@ 2005-10-08  2:25   ` Matt Leininger
  2005-10-10 18:23     ` Roland Dreier
  0 siblings, 1 reply; 19+ messages in thread
From: Matt Leininger @ 2005-10-08  2:25 UTC (permalink / raw)
  To: Roland Dreier; +Cc: netdev, openib-general

I'm adding netdev to this thread to see if they can help.

I'm seeing an IPoIB (IP over InfiniBand) netperf performance drop-off
of up to 90 MB/s when using kernels newer than 2.6.11.  This doesn't
appear to be an OpenIB IPoIB issue, since the older in-kernel IB for
2.6.11 and a recent svn3687 snapshot both have the same performance (464
MB/s) with 2.6.11.  I used the same kernel config file as a starting
point for each of these kernel builds.  Have there been any changes in
Linux that would explain these results?

Here is the hardware setup and netperf results, using 'netperf -f M -c
-C -H IPoIB_ADDRESS':

All benchmarks are with RHEL4 x86_64 with HCA FW v4.7.0
dual EM64T 3.2 GHz PCIe IB HCA (memfull)

Kernel           OpenIB    msi_x  netperf (MB/s)  
2.6.14-rc3      in-kernel    1     374 
2.6.13.2        svn3627      1     386 
2.6.13.2        in-kernel    1     394 
2.6.12.5-lustre in-kernel    1     399  
2.6.12.5        in-kernel    1     402 
2.6.12          in-kernel    1     406 
2.6.12-rc6      in-kernel    1     407
2.6.12-rc5      in-kernel    1     405   <<<<<
2.6.12-rc4      in-kernel    1     470   <<<<<
2.6.12-rc3      in-kernel    1     466 
2.6.12-rc2      in-kernel    1     469 
2.6.12-rc1      in-kernel    1     466
2.6.11          in-kernel    1     464 
2.6.11          svn3687      1     464 
2.6.9-11.ELsmp  svn3513      1     425  (Woody's results, 3.6 GHz EM64T) 

 Thanks,

	- Matt


* Re: Timeline of IPoIB performance
  2005-10-08  2:25   ` Timeline of IPoIB performance Matt Leininger
@ 2005-10-10 18:23     ` Roland Dreier
  2005-10-10 20:03       ` Michael S. Tsirkin
                         ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Roland Dreier @ 2005-10-10 18:23 UTC (permalink / raw)
  To: Matt Leininger; +Cc: netdev, openib-general

     > 2.6.12-rc5      in-kernel    1     405   <<<<<
     > 2.6.12-rc4      in-kernel    1     470   <<<<<

I was optimistic when I saw this, because the changeover to git
occurred with 2.6.12-rc2, so I thought I could use git bisect to track
down exactly when the performance regression happened.

However, I haven't been able to get numbers that are stable enough to
track this down.  I have two systems, both HP DL145s with dual Opteron
875s and two-port mem-free PCI Express HCAs.  I use MSI-X with the
completion interrupt affinity set to CPU 0, and "taskset 2" to run
netserver and netperf on CPU 1.
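
For reference, that pinning can be reproduced roughly as follows; the
IRQ number here is a placeholder for whatever /proc/interrupts shows
for the mthca completion vector on a given boot:

    # find the completion interrupt (the IRQ number is machine-specific)
    grep mthca /proc/interrupts
    # steer it to CPU 0 (smp_affinity takes a hex CPU bitmask; 1 == CPU 0)
    echo 1 > /proc/irq/90/smp_affinity
    # on each box, run its netperf binary on CPU 1 (mask 2 == CPU 1)
    taskset 2 netserver              # on the remote box
    taskset 2 netperf -H otherguy    # on the local box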

With default netperf parameters (just "-H otherguy") I get numbers
between ~490 MB/sec and ~550 MB/sec for 2.6.12-rc4 and 2.6.12-rc5.
The numbers are quite consistent between reboots, but if I reboot the
system (even keeping the kernel identical), I see large performance
changes.  Presumably something is happening like the cache coloring of
some hot data structures changing semi-randomly depending on the
timing of various initializations.

Matt, how stable are your numbers?

 - R.


* Re: Timeline of IPoIB performance
  2005-10-10 18:23     ` Roland Dreier
@ 2005-10-10 20:03       ` Michael S. Tsirkin
  2005-10-10 21:03         ` Roland Dreier
  2005-10-10 20:17       ` Rick Jones
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2005-10-10 20:03 UTC (permalink / raw)
  To: Roland Dreier; +Cc: netdev, openib-general

Hi Roland,

Quoting Roland Dreier <rolandd@cisco.com>:
> However, I haven't been able to get numbers that are stable enough to
> track this down. 

Disabling irq balancing sometimes helps me make the numbers more stable.
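
On a RHEL4-style box that amounts to something like the following (the
IRQ number is again a placeholder):

    # stop the userspace IRQ balancing daemon for the duration of the run
    /etc/init.d/irqbalance stop
    # confirm the interrupt stayed where it was pinned
    cat /proc/irq/90/smp_affinity
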
Hope this helps,

-- 
MST


* Re: Timeline of IPoIB performance
  2005-10-10 18:23     ` Roland Dreier
  2005-10-10 20:03       ` Michael S. Tsirkin
@ 2005-10-10 20:17       ` Rick Jones
  2005-10-10 20:58         ` Roland Dreier
  2005-10-10 21:26       ` Grant Grundler
  2005-10-10 23:25       ` Matt Leininger
  3 siblings, 1 reply; 19+ messages in thread
From: Rick Jones @ 2005-10-10 20:17 UTC (permalink / raw)
  To: Roland Dreier; +Cc: netdev, openib-general

Roland Dreier wrote:
>      > 2.6.12-rc5      in-kernel    1     405   <<<<<
>      > 2.6.12-rc4      in-kernel    1     470   <<<<<
> 
> I was optimistic when I saw this, because the changeover to git
> occurred with 2.6.12-rc2, so I thought I could use git bisect to track
> down exactly when the performance regression happened.
> 
> However, I haven't been able to get numbers that are stable enough to
> track this down.  I have two systems, both HP DL145s with dual Opteron
> 875s and two-port mem-free PCI Express HCAs.  I use MSI-X with the
> completion interrupt affinity set to CPU 0, and "taskset 2" to run
> netserver and netperf on CPU 1.
> 
> With default netperf parameters (just "-H otherguy") I get numbers
> between ~490 MB/sec and ~550 MB/sec for 2.6.12-rc4 and 2.6.12-rc5.
> The numbers are quite consistent between reboots, but if I reboot the
> system (even keeping the kernel identical), I see large performance
> changes.  Presumably something is happening like the cache coloring of
> some hot data structures changing semi-randomly depending on the
> timing of various initializations.

Which rev of netperf are you using, and are you using the "confidence intervals" 
options (-i, -I)?  For a long time, the linux-unique behaviour of returning the 
overhead bytes for SO_[SND|RCV]BUF and them being 2X what one gives in 
setsockopt() gave netperf some trouble - the socket buffer would double in size 
each iteration on a confidence interval run.  Later netperf versions (late 2.3, 
and 2.4.X) have a kludge for this.
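
As a minimal sketch of that doubling behaviour: Linux reserves bookkeeping
overhead for socket buffers, so getsockopt() reports back twice the value
passed to setsockopt(), and a confidence-interval run that feeds the
reported size back in doubles the buffer each iteration.

    #include <stdio.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int s = socket(AF_INET, SOCK_STREAM, 0);
        int val = 131072;                /* ask for a 128K send buffer */
        socklen_t len = sizeof(val);

        setsockopt(s, SOL_SOCKET, SO_SNDBUF, &val, sizeof(val));
        getsockopt(s, SOL_SOCKET, SO_SNDBUF, &val, &len);
        printf("SO_SNDBUF is now %d\n", val);  /* prints 262144 on Linux */
        close(s);
        return 0;
    }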

Slightly related to that, IIRC, the linux receiver code adjusts the advertised 
window as the connection goes along - how far the receive code opens the window 
may change from run to run - might that have an effect?  If there is a way to 
get the linux receiver to simply advertise the full window from the beginning 
that might help minimize the number of variables.

Are there large changes in service demand along with the large performance changes?

FWIW, on later netperfs the -T option should allow you to specify the CPU on 
which netperf and/or netserver run, although I've had some trouble reliably 
detecting the right sched_setaffinity syntax among the releases.
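
With a 2.4.x netperf that would look something like this, pinning both
ends to CPU 1:

    netperf -T 1,1 -H otherguy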

rick jones


* Re: Timeline of IPoIB performance
  2005-10-10 20:17       ` Rick Jones
@ 2005-10-10 20:58         ` Roland Dreier
  2005-10-10 21:22           ` Rick Jones
  0 siblings, 1 reply; 19+ messages in thread
From: Roland Dreier @ 2005-10-10 20:58 UTC (permalink / raw)
  To: Rick Jones; +Cc: netdev, openib-general

    Rick> Which rev of netperf are you using, and are you using the
    Rick> "confidence intervals" options (-i, -I)?  For a long time,
    Rick> the linux-unique behaviour of returning the overhead bytes
    Rick> for SO_[SND|RCV]BUF and them being 2X what one gives in
    Rick> setsockopt() gave netperf some trouble - the socket buffer
    Rick> would double in size each iteration on a confidence interval
    Rick> run.  Later netperf versions (late 2.3, and 2.4.X) have a
    Rick> kludge for this.

I believe it's netperf 2.2.

I'm not using any confidence interval stuff.  However, the variation
is not between single runs of netperf -- if I do 5 runs of netperf in
a row, I get roughly the same number from each run.  For example, I
might see something like

    TCP STREAM TEST to 192.168.145.2 : histogram
    Recv   Send    Send
    Socket Socket  Message  Elapsed
    Size   Size    Size     Time     Throughput
    bytes  bytes   bytes    secs.    10^6bits/sec
    
     87380  16384  16384    10.00    3869.82
    
and then

    TCP STREAM TEST to 192.168.145.2 : histogram
    Recv   Send    Send
    Socket Socket  Message  Elapsed
    Size   Size    Size     Time     Throughput
    bytes  bytes   bytes    secs.    10^6bits/sec
    
     87380  16384  16384    10.00    3862.41

for two successive runs.  However, if I reboot the system into the
same kernel (ie everything set up exactly the same), the same
invocation of netperf might give

    TCP STREAM TEST to 192.168.145.2 : histogram
    Recv   Send    Send
    Socket Socket  Message  Elapsed
    Size   Size    Size     Time     Throughput
    bytes  bytes   bytes    secs.    10^6bits/sec
    
     87380  16384  16384    10.00    4389.20

    Rick> Are there large changes in service demand along with the
    Rick> large performance changes?

Not sure.  How do I have netperf report service demand?

 - R.


* Re: Timeline of IPoIB performance
  2005-10-10 20:03       ` Michael S. Tsirkin
@ 2005-10-10 21:03         ` Roland Dreier
  0 siblings, 0 replies; 19+ messages in thread
From: Roland Dreier @ 2005-10-10 21:03 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: netdev, openib-general

    Michael> Disabling irq balancing sometimes helps me make the
    Michael> numbers more stable.

I don't think that's an issue.  I'm running on x86_64, which I don't
think has the kernel irq balancer, and I'm not running a userspace IRQ
balancer.  I can see all the mthca interrupts going to the CPU I set
through the smp_affinity file.

 - R.


* Re: Timeline of IPoIB performance
  2005-10-10 20:58         ` Roland Dreier
@ 2005-10-10 21:22           ` Rick Jones
  0 siblings, 0 replies; 19+ messages in thread
From: Rick Jones @ 2005-10-10 21:22 UTC (permalink / raw)
  To: Roland Dreier; +Cc: netdev, openib-general

Roland Dreier wrote:
>     Rick> Which rev of netperf are you using, and are you using the
>     Rick> "confidence intervals" options (-i, -I)?  For a long time,
>     Rick> the linux-unique behaviour of returning the overhead bytes
>     Rick> for SO_[SND|RCV]BUF and them being 2X what one gives in
>     Rick> setsockopt() gave netperf some trouble - the socket buffer
>     Rick> would double in size each iteration on a confidence interval
>     Rick> run.  Later netperf versions (late 2.3, and 2.4.X) have a
>     Rick> kludge for this.
> 
> I believe it's netperf 2.2.

That's rather old.  I literally just put 2.4.1 out on ftp.cup.hp.com - probably 
better to use that if possible.  Not that it will change the variability, just 
that I like it when people are up-to-date on the versions :)  If nothing else, 
the 2.4.X version(s) have a much improved (hopefully) manual in doc/.


[If you are really masochistic, the very first release of the netperf 4.0.0 
source has happened.  I can make no guarantees as to its actually working at 
the moment, though :)  Netperf4 is going to be the stream for the 
multiple-connection, multiple-system tests, rather than the single-connection 
nature of netperf2.]

> I'm not using any confidence interval stuff.  However, the variation
> is not between single runs of netperf -- if I do 5 runs of netperf in
> a row, I get roughly the same number from each run.  For example, I
> might see something like
> 
>     TCP STREAM TEST to 192.168.145.2 : histogram
>     Recv   Send    Send
>     Socket Socket  Message  Elapsed
>     Size   Size    Size     Time     Throughput
>     bytes  bytes   bytes    secs.    10^6bits/sec
>     
>      87380  16384  16384    10.00    3869.82
>     
> and then
> 
>     TCP STREAM TEST to 192.168.145.2 : histogram
>     Recv   Send    Send
>     Socket Socket  Message  Elapsed
>     Size   Size    Size     Time     Throughput
>     bytes  bytes   bytes    secs.    10^6bits/sec
>     
>      87380  16384  16384    10.00    3862.41
> 
> for two successive runs.  However, if I reboot the system into the
> same kernel (ie everything set up exactly the same), the same
> invocation of netperf might give
> 
>     TCP STREAM TEST to 192.168.145.2 : histogram
>     Recv   Send    Send
>     Socket Socket  Message  Elapsed
>     Size   Size    Size     Time     Throughput
>     bytes  bytes   bytes    secs.    10^6bits/sec
>     
>      87380  16384  16384    10.00    4389.20
> 
>     Rick> Are there large changes in service demand along with the
>     Rick> large performance changes?
> 
> Not sure.  How do I have netperf report service demand?

Ask for CPU utilization with -c (local) and -C (remote).  The /proc/stat stuff 
used by Linux does not need calibration (IIRC) so you don't have to worry about 
that.

If cache effects are involved, you can make netperf "harder" or "easier" on the 
caches by altering the size of the send and/or recv buffer rings.  By default 
they are one more than the socket buffer size divided by the send size, but you 
can make them larger or smaller with the -W option.
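
For instance, a sketch of a deliberately small-ring run, assuming the
test-specific "-W send,recv" syntax:

    netperf -H otherguy -- -W 4,4 -m 32K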

These days I use a 128K socket buffer and 32K send for the "canonical" 
(although not default :) netperf TCP_STREAM test:

netperf -H remote -c -C -- -s 128K -S 128K -m 32K

In netperf-speak K == 1024, k == 1000, M == 2^20, m == 10^6, G == 2^30, g == 10^9...
rick jones


* Re: Timeline of IPoIB performance
  2005-10-10 18:23     ` Roland Dreier
  2005-10-10 20:03       ` Michael S. Tsirkin
  2005-10-10 20:17       ` Rick Jones
@ 2005-10-10 21:26       ` Grant Grundler
  2005-10-10 23:30         ` Grant Grundler
  2005-10-10 23:25       ` Matt Leininger
  3 siblings, 1 reply; 19+ messages in thread
From: Grant Grundler @ 2005-10-10 21:26 UTC (permalink / raw)
  To: Roland Dreier; +Cc: netdev, openib-general

On Mon, Oct 10, 2005 at 11:23:45AM -0700, Roland Dreier wrote:
>      > 2.6.12-rc5      in-kernel    1     405   <<<<<
>      > 2.6.12-rc4      in-kernel    1     470   <<<<<
> 
> I was optimistic when I saw this, because the changeover to git
> occurred with 2.6.12-rc2, so I thought I could use git bisect to track
> down exactly when the performance regression happened.
> 
> However, I haven't been able to get numbers that are stable enough to
> track this down.  I have two systems, both HP DL145s with dual Opteron
> 875s and two-port mem-free PCI Express HCAs.  I use MSI-X with the
> completion interrupt affinity set to CPU 0, and "taskset 2" to run
> netserver and netperf on CPU 1.

As you know, Opteron boxes are NUMA. I think you want the MSI-X interrupt
bound to the same CPU that's connected to the IO. Is CPU 0 closer to the IO?
I would bind netperf to CPU 0 and netserver to CPU 1 on each box respectively.
Or just try all 4 combinations to see which ones are CPU bound
vs memory/IO bound.

> With default netperf parameters (just "-H otherguy") I get numbers
> between ~490 MB/sec and ~550 MB/sec for 2.6.12-rc4 and 2.6.12-rc5.
> The numbers are quite consistent between reboots, but if I reboot the
> system (even keeping the kernel identical), I see large performance
> changes.

I gather you meant "tests" in the first phrase? (vs reboot).

> Presumably something is happening like the cache coloring of
> some hot data structures changing semi-randomly depending on the
> timing of various initializations.

My guess is based on the same premise.
The mem-free card will be very sensitive to where its control data
is allocated. Is either box configured to interleave memory from
both CPUs?

If it's interleaving, every other cacheline will be "local".
Can you disable interleave and try different netperf/server
bindings as suggested above?
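
The interleave setting itself lives in the BIOS; the four bindings could
be scripted along these lines ("otherguy" being the remote box, as
elsewhere in this thread):

    # masks: 1 == CPU 0, 2 == CPU 1; try every netperf/netserver pairing
    for lmask in 1 2; do
        for rmask in 1 2; do
            ssh otherguy "pkill netserver; taskset $rmask netserver"
            taskset $lmask netperf -H otherguy
        done
    done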

hth,
grant


* Re: Timeline of IPoIB performance
  2005-10-10 18:23     ` Roland Dreier
                         ` (2 preceding siblings ...)
  2005-10-10 21:26       ` Grant Grundler
@ 2005-10-10 23:25       ` Matt Leininger
  2005-10-10 23:38         ` Roland Dreier
  3 siblings, 1 reply; 19+ messages in thread
From: Matt Leininger @ 2005-10-10 23:25 UTC (permalink / raw)
  To: Roland Dreier; +Cc: netdev, openib-general

On Mon, 2005-10-10 at 11:23 -0700, Roland Dreier wrote:
>      > 2.6.12-rc5      in-kernel    1     405   <<<<<
>      > 2.6.12-rc4      in-kernel    1     470   <<<<<
> 
> I was optimistic when I saw this, because the changeover to git
> occurred with 2.6.12-rc2, so I thought I could use git bisect to track
> down exactly when the performance regression happened.
> 
> However, I haven't been able to get numbers that are stable enough to
> track this down.  I have two systems, both HP DL145s with dual Opteron
> 875s and two-port mem-free PCI Express HCAs.  I use MSI-X with the
> completion interrupt affinity set to CPU 0, and "taskset 2" to run
> netserver and netperf on CPU 1.
> 
> With default netperf parameters (just "-H otherguy") I get numbers
> between ~490 MB/sec and ~550 MB/sec for 2.6.12-rc4 and 2.6.12-rc5.
> The numbers are quite consistent between reboots, but if I reboot the
> system (even keeping the kernel identical), I see large performance
> changes.  Presumably something is happening like the cache coloring of
> some hot data structures changing semi-randomly depending on the
> timing of various initializations.
> 
> Matt, how stable are your numbers?


  Pretty consistent.  Here are a few runs with 2.6.12-rc5 with reboots
in between each run.  I'm using netperf-2.3pl1.

Run 1:
TCP STREAM TEST to 10.128.20.6
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    KBytes  /s  % T      % T      us/KB   us/KB

 87380  16384  16384    10.00      410302.39   99.89    92.09    4.869   4.489

Run 2: (after another reboot)
TCP STREAM TEST to 10.128.20.6
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    KBytes  /s  % T      % T      us/KB   us/KB

 87380  16384  16384    10.00      409510.33   99.89    91.59    4.879   4.473

Run 3: (after reboot)
TCP STREAM TEST to 10.128.20.6
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    KBytes  /s  % T      % T      us/KB   us/KB

 87380  16384  16384    10.00      404354.11   99.89    91.39    4.941   4.520


I see the same variance in netperf results if I don't reboot between
runs.  

  - Matt

* Re: Timeline of IPoIB performance
  2005-10-10 21:26       ` Grant Grundler
@ 2005-10-10 23:30         ` Grant Grundler
  2005-10-11  0:51           ` Andi Kleen
  0 siblings, 1 reply; 19+ messages in thread
From: Grant Grundler @ 2005-10-10 23:30 UTC (permalink / raw)
  To: Grant Grundler; +Cc: netdev, openib-general

On Mon, Oct 10, 2005 at 02:26:52PM -0700, Grant Grundler wrote:
...
> If it's interleaving, every other cacheline will be "local".

ISTR AMD64 was page-interleaved, but then I got confused by documents
describing "128-bit" 2-way interleave. I now realize the 128-bit
is referring to interleave between two "banks" of memory behind
each memory controller, i.e. 2 * 128-bit provides the 32-byte
cacheline size that most x86 programs expect.

Anyway, I'm hoping that we'll see a consistent result if node interleave
is turned off.

sorry for the confusion,
grant


* Re: Timeline of IPoIB performance
  2005-10-10 23:25       ` Matt Leininger
@ 2005-10-10 23:38         ` Roland Dreier
  2005-10-10 23:44           ` Matt Leininger
  0 siblings, 1 reply; 19+ messages in thread
From: Roland Dreier @ 2005-10-10 23:38 UTC (permalink / raw)
  To: Matt Leininger; +Cc: netdev, openib-general

    Matt>   Pretty consistent.  Here are a few runs with 2.6.12-rc5
    Matt> with reboots in between each run.  I'm using netperf-2.3pl1.

That's interesting.  I'm guessing you're using mem-full HCAs?

Given that your results are more stable than mine, if you're up for
it, you could install git, clone Linus's tree, and then do a git
bisect between 2.6.12-rc4 and 2.6.12-rc5 to narrow down the regression
to a single commit (if in fact that's possible).
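
Roughly, that bisect session would be (tree path as it stood at the time):

    git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    cd linux-2.6
    git bisect start
    git bisect bad v2.6.12-rc5      # the slow kernel
    git bisect good v2.6.12-rc4     # the fast kernel
    # then, for each revision git checks out: build, boot, run netperf,
    # and mark it "git bisect good" or "git bisect bad" until one commit remains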

 - R.


* Re: Timeline of IPoIB performance
  2005-10-10 23:38         ` Roland Dreier
@ 2005-10-10 23:44           ` Matt Leininger
  2005-10-10 23:53             ` Roland Dreier
  0 siblings, 1 reply; 19+ messages in thread
From: Matt Leininger @ 2005-10-10 23:44 UTC (permalink / raw)
  To: Roland Dreier; +Cc: netdev, openib-general

On Mon, 2005-10-10 at 16:38 -0700, Roland Dreier wrote:
>     Matt>   Pretty consistent.  Here are a few runs with 2.6.12-rc5
>     Matt> with reboots in between each run.  I'm using netperf-2.3pl1.
> 
> That's interesting.  I'm guessing you're using mem-full HCAs?

  Yes, I'm using mem-full HCAs.  I could try reflashing the firmware for
memfree if that's of interest.
> 
> Given that your results are more stable than mine, if you're up for
> it, you could install git, clone Linus's tree, and then do a git
> bisect between 2.6.12-rc4 and 2.6.12-rc5 to narrow down the regression
> to a single commit (if in fact that's possible).
  
 I was hoping someone else would do this.  :)
 
 I'll start working on it tomorrow if no one else gets to it.

  Thanks,

	- Matt


* Re: Timeline of IPoIB performance
  2005-10-10 23:44           ` Matt Leininger
@ 2005-10-10 23:53             ` Roland Dreier
  2005-10-11  4:03               ` Roland Dreier
  0 siblings, 1 reply; 19+ messages in thread
From: Roland Dreier @ 2005-10-10 23:53 UTC (permalink / raw)
  To: Matt Leininger; +Cc: netdev, openib-general

    Matt>   Yes, I'm using mem-full HCAs.  I could try reflashing the
    Matt> firmware for memfree if that's of interest.

No, probably not.  If I get a chance I'll do the opposite (flash
mem-free -> mem-full, since my HCAs do have memory) and see if it
makes my results stable.
  
    Matt>  I was hoping someone else would do this.  :) I'll start
    Matt> working on it tomorrow if no one else gets to it.

I might get a chance to do it tonight... I'll post if I do.

 - R.


* Re: Timeline of IPoIB performance
  2005-10-10 23:30         ` Grant Grundler
@ 2005-10-11  0:51           ` Andi Kleen
  0 siblings, 0 replies; 19+ messages in thread
From: Andi Kleen @ 2005-10-11  0:51 UTC (permalink / raw)
  To: Grant Grundler; +Cc: netdev, openib-general

On Tuesday 11 October 2005 01:30, Grant Grundler wrote:
> On Mon, Oct 10, 2005 at 02:26:52PM -0700, Grant Grundler wrote:
> ...
>
> > If it's interleaving, every other cacheline will be "local".
>
> ISTR AMD64 was page-interleaved, but then I got confused by documents
> describing "128-bit" 2-way interleave. I now realize the 128-bit
> is referring to interleave between two "banks" of memory behind
> each memory controller, i.e. 2 * 128-bit provides the 32-byte
> cacheline size that most x86 programs expect.

The cache line size on K7 and K8 is 64 bytes.

> Anyway, I'm hoping that we'll see a consistent result if node interleave
> is turned off.

Yes, usually a good idea.


-Andi


* Re: Timeline of IPoIB performance
  2005-10-10 23:53             ` Roland Dreier
@ 2005-10-11  4:03               ` Roland Dreier
  0 siblings, 0 replies; 19+ messages in thread
From: Roland Dreier @ 2005-10-11  4:03 UTC (permalink / raw)
  To: Matt Leininger; +Cc: netdev, openib-general

    Roland> I might get a chance to do it tonight... I'll post if I do.

I'm giving it a shot but I just can't reproduce this well on my
systems.  I do see a pretty big regression between 2.6.12-rc4 and
2.6.14-rc2, but 2.6.12-rc5 looks OK on my systems.

I reflashed to FW 4.7.0 (mem-full) and built netperf 2.4.1.

With 2.6.12-rc4 I've seen runs as slow as:

    TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.145.2 (192.168.145.2) port 0 AF_INET
    Recv   Send    Send                          Utilization       Service Demand
    Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
    Size   Size    Size     Time     Throughput  local    remote   local   remote
    bytes  bytes   bytes    secs.    MBytes  /s  % S      % U      us/KB   us/KB
    
     87380  16384  16384    10.00       553.71   37.46    -1.00    2.642   -1.000

and with 2.6.12-rc5 I've seen runs as fast as:

    TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.145.2 (192.168.145.2) port 0 AF_INET
    Recv   Send    Send                          Utilization       Service Demand
    Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
    Size   Size    Size     Time     Throughput  local    remote   local   remote
    bytes  bytes   bytes    secs.    MBytes  /s  % S      % U      us/KB   us/KB
    
     87380  16384  16384    10.00       581.82   39.58    -1.00    2.657   -1.000

so not much difference there.  With 2.6.14-rc2, the best of 10 runs was:

    TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.145.2 (192.168.145.2) port 0 AF_INET
    Recv   Send    Send                          Utilization       Service Demand
    Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
    Size   Size    Size     Time     Throughput  local    remote   local   remote
    bytes  bytes   bytes    secs.    MBytes  /s  % S      % U      us/KB   us/KB
    
     87380  16384  16384    10.01       497.00   39.71    -1.00    3.121   -1.000

so we've definitely lost something there.

Time to do some more bisecting...

 - R.


* Re: Timeline of IPoIB performance
       [not found] <E1EPIxF-0001Cp-00@gondolin.me.apana.org.au>
@ 2005-10-12 16:53 ` Roland Dreier
  2005-10-12 18:28   ` Matt Leininger
  0 siblings, 1 reply; 19+ messages in thread
From: Roland Dreier @ 2005-10-12 16:53 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev, openib-general

    Herbert> Try reverting the changeset

    Herbert> 314324121f9b94b2ca657a494cf2b9cb0e4a28cc

    Herbert> which lies between these two points and may be relevant.

Matt, I pulled this out of git for you.  I guess Herbert is suggesting
to patch -R the below against 2.6.12-rc5:

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 79835a6..5bad504 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4355,16 +4355,7 @@ int tcp_rcv_established(struct sock *sk,
 					goto no_ack;
 			}
 
-			if (eaten) {
-				if (tcp_in_quickack_mode(tp)) {
-					tcp_send_ack(sk);
-				} else {
-					tcp_send_delayed_ack(sk);
-				}
-			} else {
-				__tcp_ack_snd_check(sk, 0);
-			}
-
+			__tcp_ack_snd_check(sk, 0);
 no_ack:
 			if (eaten)
 				__kfree_skb(skb);
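
That is, saving the diff above as, say, quickack.patch (name arbitrary)
and running, from the top of a 2.6.12-rc5 tree:

    patch -p1 -R < quickack.patch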


* Re: Timeline of IPoIB performance
  2005-10-12 16:53 ` Roland Dreier
@ 2005-10-12 18:28   ` Matt Leininger
  2005-10-13  1:24     ` Matt Leininger
  0 siblings, 1 reply; 19+ messages in thread
From: Matt Leininger @ 2005-10-12 18:28 UTC (permalink / raw)
  To: Roland Dreier; +Cc: netdev, Herbert Xu, openib-general

On Wed, 2005-10-12 at 09:53 -0700, Roland Dreier wrote:
>     Herbert> Try reverting the changeset
> 
>     Herbert> 314324121f9b94b2ca657a494cf2b9cb0e4a28cc
> 
>     Herbert> which lies between these two points and may be relevant.
> 
> Matt, I pulled this out of git for you.  I guess Herbert is suggesting
> to patch -R the below against 2.6.12-rc5:

I applied your patch, suggested by Herbert:

http://www.mail-archive.com/openib-general%40openib.org/msg11415.html

to my 2.6.12-rc5 tree and IPoIB performance improved back to the ~475
MB/s range for my EM64T system.  The data is below.

I'm building/testing 2.6.14-rc4 with and without this patch now.


All benchmarks are with RHEL4 x86_64 with HCA FW v4.7.0
dual EM64T 3.2 GHz PCIe IB HCA (memfull)

Kernel           OpenIB    msi_x  netperf (MB/s)  
2.6.14-rc3      in-kernel    1     374 
2.6.13.2        svn3627      1     386 
2.6.13.2        in-kernel    1     394 
2.6.12.5-lustre in-kernel    1     399  
2.6.12.5        in-kernel    1     402 
2.6.12          in-kernel    1     406 
2.6.12-rc6      in-kernel    1     407
2.6.12-rc5      in-kernel    1     405                        <<<<
2.6.12-rc5                                                    <<<<
 - remove changeset 314324121f9b94b2ca657a494cf2b9cb0e4a28cc  <<<<
                in-kernel    1     474                        <<<<
2.6.12-rc4      in-kernel    1     470 
2.6.12-rc3      in-kernel    1     466 
2.6.12-rc2      in-kernel    1     469 
2.6.12-rc1      in-kernel    1     466
2.6.11          in-kernel    1     464 
2.6.11          svn3687      1     464 
2.6.9-11.ELsmp  svn3513      1     425  (Woody's results, 3.6 GHz EM64T) 

  - Matt


* Re: Timeline of IPoIB performance
  2005-10-12 18:28   ` Matt Leininger
@ 2005-10-13  1:24     ` Matt Leininger
  2005-10-13  1:48       ` Herbert Xu
  0 siblings, 1 reply; 19+ messages in thread
From: Matt Leininger @ 2005-10-13  1:24 UTC (permalink / raw)
  To: Roland Dreier; +Cc: netdev, Herbert Xu, openib-general

On Wed, 2005-10-12 at 11:28 -0700, Matt Leininger wrote:
> On Wed, 2005-10-12 at 09:53 -0700, Roland Dreier wrote:
> >     Herbert> Try reverting the changeset
> > 
> >     Herbert> 314324121f9b94b2ca657a494cf2b9cb0e4a28cc
> > 
> >     Herbert> which lies between these two points and may be relevant.
> > 
> > Matt, I pulled this out of git for you.  I guess Herbert is suggesting
> > to patch -R the below against 2.6.12-rc5:

> I applied your patch, suggested by Herbert:
> 
> http://www.mail-archive.com/openib-general%40openib.org/msg11415.html
> 
  I backed this patch out of a few other kernels and always see a
performance improvement.  This gets back ~50-60 MB/s of the 90-100 MB/s
drop-off in IPoIB performance.

  Is it still worth testing the TSO patches that Herbert suggested for
some of the 2.6.13-rc kernels?
 
  Thanks,

   - Matt



All benchmarks are with RHEL4 x86_64 with HCA FW v4.7.0
dual EM64T 3.2 GHz PCIe IB HCA (memfull)

Kernel           OpenIB    msi_x  netperf (MB/s)  
2.6.14-rc4      in-kernel    1     434  (backed out patch)
2.6.14-rc4      in-kernel    1     385 

2.6.13.2        svn3627      1     446  (backed out patch)
2.6.13.2        svn3627      1     386 
2.6.13.2        in-kernel    1     394 

2.6.12.5        in-kernel    1     464  (backed out patch)
2.6.12.5        in-kernel    1     402 

2.6.12-rc6      in-kernel    1     470  (backed out patch) 
2.6.12-rc6      in-kernel    1     407

2.6.12-rc5      in-kernel    1     474 (backed out patch)
2.6.12-rc5      in-kernel    1     405 


2.6.9-11.ELsmp  svn3513      1     425  (Woody's results, 3.6 GHz EM64T) 


* Re: Timeline of IPoIB performance
  2005-10-13  1:24     ` Matt Leininger
@ 2005-10-13  1:48       ` Herbert Xu
  0 siblings, 0 replies; 19+ messages in thread
From: Herbert Xu @ 2005-10-13  1:48 UTC (permalink / raw)
  To: Matt Leininger; +Cc: netdev, openib-general

On Wed, Oct 12, 2005 at 06:24:32PM -0700, Matt Leininger wrote:
> 
>   Is it still worth testing the TSO patches that Herbert suggested for
> some of the 2.6.13-rc kernels?

If you're still seeing a performance regression compared to 
2.6.12-rc4, then yes (according to the figures in your message
there does seem to be a bit of loss after the release of 2.6.12).

The patch you reverted may degrade the performance on the receiver.
The TSO patches may be causing some degradation on your sender.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

