public inbox for linux-ia64@vger.kernel.org
* Re: SDP perf drop with 2.6.15
@ 2006-01-12 19:50 Grant Grundler
  2006-01-12 19:58 ` Michael S. Tsirkin
  2006-01-31 21:55 ` [openib-general] " Grant Grundler
  0 siblings, 2 replies; 3+ messages in thread
From: Grant Grundler @ 2006-01-12 19:50 UTC (permalink / raw)
  To: linux-ia64

On Thu, Jan 12, 2006 at 09:18:03AM +0200, Michael S. Tsirkin wrote:
> > Makes me think it's a change in the interrupt code path.
> > Any clues on where I might start with this?
> 
> oprofile should show where we are spending the time.

Executive summary:
	Silly me. The interrupt bindings changed and I didn't check after rolling
	to 2.6.15 (from 2.6.14). That explains the increased CPU utilization
	with lower performance.
	I'm getting similar performance with -T 0,0 (2.6.15) vs -T 1,1 (2.6.14).

	I can't explain why q-syscollect *improves* perf by ~11 to 17%.
	I cc'd linux-ia64 hoping someone might explain it.

Details:
Since I prefer q-syscollect:
grundler@gsyprf3:~/openib-perf-2006/rx2600-r4929$ LD_PRELOAD=/usr/local/lib/libsdp.so q-syscollect /usr/local/bin/netperf -p 12866 -l 60 -H 10.0.0.30 -t TCP_RR -T 1,1 -c -C -- -r 1,1 -s 0 -S 0
libsdp.so: $LIBSDP_CONFIG_FILE not set. Using /usr/local/etc/libsdp.conf
libsdp.so: $LIBSDP_CONFIG_FILE not set. Using /usr/local/etc/libsdp.conf
bind_to_specific_processor: enter
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.30 (10.0.0.30) port 0 AF_INET
Local /Remote
Socket Size   Request Resp.  Elapsed Trans.   CPU    CPU    S.dem   S.dem
Send   Recv   Size    Size   Time    Rate     local  remote local   remote
bytes  bytes  bytes   bytes  secs.   per sec  % S    % S    us/Tr   us/Tr

16384  87380  1       1      60.00   16300.11  10.53  8.26   12.925  10.137

Weird. Performance jumps from 13900 to 16300 (+2400, or +17%).
Hrm...something got me to look at /proc/interrupts and I see that
mthca is interrupting on CPU0 now:
 70:  644084899          0       PCI-MSI-X  ib_mthca (comp)
 71:          8          0       PCI-MSI-X  ib_mthca (async)
 72:      27247          0       PCI-MSI-X  ib_mthca (cmd)
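The binding can be moved back without a reboot through the standard
/proc/irq/<n>/smp_affinity interface (IRQ 70 is the completion vector on this
particular box). A small sketch; cpu_mask is a hypothetical helper, not part
of any tool used above:

```shell
# cpu_mask: hypothetical helper computing the hex smp_affinity
# mask that selects a single CPU (bit N set for CPU N).
cpu_mask() {
    printf '%x\n' $(( 1 << $1 ))
}

cpu_mask 0    # -> 1 (mask for CPU0)
cpu_mask 1    # -> 2 (mask for CPU1)

# On the live system, rebind the mthca completion interrupt
# (IRQ 70 on this box) to CPU1 and confirm it took:
#   echo "$(cpu_mask 1)" > /proc/irq/70/smp_affinity
#   cat /proc/irq/70/smp_affinity
```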

Retest with -T 0,1 :

16384  87380  1       1      60.00   17557.94  6.06   10.88  6.909   12.390

And again -T 0,1 but without q-syscollect:
16384  87380  1       1      60.00   15891.41  6.13   7.61   7.713   9.571 

Now with -T 0,0:
16384  87380  1       1      60.00   20719.03  5.93   5.26   5.724   5.076 

with -T 0,0 and without q-syscollect:
16384  87380  1       1      60.00   18553.61  5.73   5.36   6.181   5.782 

That's +11% on the last set.
I'm stumped why q-syscollect would *improve* performance.
Maybe someone on linux-ia64 has a theory?
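Double-checking the arithmetic on the two jumps quoted above, with a
throwaway awk helper (the numbers are the transaction rates from the runs):

```shell
# gain: percentage change from rate b (without q-syscollect)
# to rate a (with q-syscollect).
gain() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) / b * 100 }'
}

gain 16300.11 13900       # -> 17.3 (first set, -T 1,1)
gain 20719.03 18553.61    # -> 11.7 (last set, -T 0,0)
```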

.q/ output from the runs above is available at:
http://gsyprf3.external.hp.com/~grundler/openib-perf-2006/rx2600-r4929/.q

The "interesting" output files from the last q-syscollect run:
	netperf-pid5693-cpu0.info#0
	q-syscollect-pid5691-cpu0.info#0
	unknown-cpu0.info#2


q-view output for last run of netperf/SDP is:
Command: /usr/local/bin/netperf -p 12866 -l 60 -H 10.0.0.30 -t TCP_RR -T 0 0 -c -C -- -r 1 1 -s 0 -S 0
Flat profile of CPU_CYCLES in netperf-pid5693-cpu0.hist#0:
 Each histogram sample counts as 1.00034m seconds
 % time      self     cumul     calls self/call  tot/call name
  27.18      1.52      1.52     21.8M     69.9n     69.9n _spin_unlock_irqrestore<kernel>
  16.08      0.90      2.42     1.18M      761n      846n schedule<kernel>
   6.24      0.35      2.77     1.29M      271n     2.56u sdp_inet_recv<kernel>
   4.33      0.24      3.01     2.59M     93.6n     93.6n __kernel_syscall_via_break<kernel>
   3.90      0.22      3.23     1.18M      185n      342n sdp_send_buff_post<kernel>
   3.51      0.20      3.43     1.24M      158n      933n sdp_inet_send<kernel>
   2.70      0.15      3.58     2.48M     60.8n     60.8n kmem_cache_free<kernel>
   2.11      0.12      3.69     4.82M     24.5n     24.5n kmem_cache_alloc<kernel>
   1.93      0.11      3.80     1.20M     90.1n      358n sdp_recv_flush<kernel>
   1.79      0.10      3.90     2.53M     39.5n     39.5n fget<kernel>
   1.75      0.10      4.00     1.22M     80.2n     2.95u sys_recv<kernel>
   1.65      0.09      4.09     1.22M     75.4n      414n sdp_send_data_queue_test<kernel>
   1.59      0.09      4.18     1.25M     71.3n     1.16u sys_send<kernel>
   1.54      0.09      4.27         -         -         - send_tcp_rr
   1.41      0.08      4.35     2.33M     33.9n     33.9n sdp_buff_q_put_tail<kernel>
   1.22      0.07      4.41     21.7M     3.14n     3.14n _spin_lock_irqsave<kernel>
   1.09      0.06      4.48     1.25M     48.7n     48.7n __divsi3<kernel>
   1.06      0.06      4.53     1.23M     48.1n     2.86u sys_recvfrom<kernel>
   1.02      0.06      4.59     2.29M     24.9n     24.9n sba_map_single<kernel>
   1.02      0.06      4.65     2.49M     22.9n     63.0n sockfd_lookup<kernel>
   0.98      0.06      4.70         -         -         - sdp_exit<kernel>
   0.98      0.06      4.76     1.26M     43.7n      967n sock_sendmsg<kernel>
   0.97      0.05      4.81     2.38M     22.7n     72.3n sdp_buff_pool_get<kernel>
   0.95      0.05      4.87     2.50M     21.2n     21.2n fput<kernel>
   0.91      0.05      4.92     1.23M     41.4n     2.72u sock_recvmsg<kernel>
   0.89      0.05      4.97     1.19M     42.1n     62.9n memcpy_toiovec<kernel>
   0.88      0.05      5.02     2.50M     19.6n     19.6n __copy_user<kernel>
   0.86      0.05      5.06     1.21M     39.8n      868n schedule_timeout<kernel>
   0.82      0.05      5.11     1.29M     35.7n      128n send
   0.79      0.04      5.15     1.25M     35.1n     35.1n sched_clock<kernel>
   0.64      0.04      5.19     1.29M     27.8n      123n recv
   0.59      0.03      5.22     1.23M     26.9n     26.9n __udivdi3<kernel>
   0.59      0.03      5.26     1.19M     27.8n     99.1n mthca_tavor_post_send<kernel>
   ....
                     

q-view output for the q-syscollect process:
Command: q-syscollect /usr/local/bin/netperf -p 12866 -l 60 -H 10.0.0.30 -t TCP_
RR -T 0,0 -c -C -- -r 1,1 -s 0 -S 0
Flat profile of CPU_CYCLES in q-syscollect-pid5691-cpu0.hist#0:
 Each histogram sample counts as 1.00034m seconds
 % time      self     cumul     calls self/call  tot/call name
  0.00      0.00      0.00       162      0.00      0.00 _spin_lock_irqsave<kernel>
  0.00      0.00      0.00       149      0.00      0.00 _spin_unlock_irqrestore<kernel>
  0.00      0.00      0.00       115      0.00      0.00 lock_timer_base<kernel>
  0.00      0.00      0.00       102      0.00      0.00 schedule<kernel>
  0.00      0.00      0.00      41.0      0.00      0.00 kfree<kernel>
  ...

The only notable thing here is that it calls "schedule" and (my guess)
triggers an additional ~160 interrupts.


unknown-cpu0.info#2 is probably interesting too:

Flat profile of CPU_CYCLES in unknown-cpu0.hist#2:
 Each histogram sample counts as 1.00034m seconds
 % time      self     cumul     calls self/call  tot/call name
  91.59     53.37     53.37      931k     57.3u     57.3u default_idle
   4.53      2.64     56.00     10.8M      244n      244n _spin_unlock_irqrestore
   1.78      1.04     57.04     1.27M      815n      885n schedule
   0.75      0.43     57.48     1.38M      315n     2.30u mthca_eq_int
   0.49      0.28     57.76     1.38M      206n     2.57u handle_IRQ_event
   0.19      0.11     57.87         -         -         - cpu_idle
   0.14      0.08     57.96     1.35M     62.1n     2.41u mthca_tavor_msi_x_interrupt
   0.10      0.06     58.02     1.32M     45.4n     2.07u mthca_cq_completion
   ...


thanks,
grant

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: SDP perf drop with 2.6.15
  2006-01-12 19:50 SDP perf drop with 2.6.15 Grant Grundler
@ 2006-01-12 19:58 ` Michael S. Tsirkin
  2006-01-31 21:55 ` [openib-general] " Grant Grundler
  1 sibling, 0 replies; 3+ messages in thread
From: Michael S. Tsirkin @ 2006-01-12 19:58 UTC (permalink / raw)
  To: linux-ia64

Quoting r. Grant Grundler <iod00d@hp.com>:
> Subject: Re: SDP perf drop with 2.6.15
> 
> On Thu, Jan 12, 2006 at 09:18:03AM +0200, Michael S. Tsirkin wrote:
> > > Makes me think it's a change in the interrupt code path.
> > > Any clues on where I might start with this?
> > 
> > oprofile should show where we are spending the time.
> 
> Executive summary:
> 	Silly me. The interrupt bindings changed and I didn't check after rolling
> 	to 2.6.15 (from 2.6.14). That explains the increased CPU utilization
> 	with lower performance.
> 	Getting similar performance with -T 0,0 (2.6.15) vs -T 1,1 (2.6.14).
> 

OK, so we are left with the ref count now.
I'm seeing this myself sometimes, I have some other stuff to attend to
next week, and then I plan to look into this.

-- 
MST


* Re: [openib-general] Re: SDP perf drop with 2.6.15
  2006-01-12 19:50 SDP perf drop with 2.6.15 Grant Grundler
  2006-01-12 19:58 ` Michael S. Tsirkin
@ 2006-01-31 21:55 ` Grant Grundler
  1 sibling, 0 replies; 3+ messages in thread
From: Grant Grundler @ 2006-01-31 21:55 UTC (permalink / raw)
  To: linux-ia64

On Thu, Jan 12, 2006 at 11:50:04AM -0800, Grant Grundler wrote:
...
> 	I can't explain why q-syscollect *improves* perf by ~11 to 17%.

Kudos to Stephane Eranian for sorting this out.

Executive summary:
	Rebooting with "nohalt" kernel option gets me the full performance.

Gory details:
By default, ia64-linux goes into a "low power state" (no jokes about
this please) in the idle loop. This is implemented with a form of
"halt" instruction.  The perfmon subsystem disables use of "halt" since
older (and possibly current) PAL support for "halt" was broken and
would corrupt the performance monitoring HW state. Please ask Stephane
offline if you need more details.
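For anyone trying to reproduce this: after adding "nohalt" to the boot
options, whether the running kernel actually got it can be checked from
/proc/cmdline. A sketch (the sample command line below is made up; on a live
system you'd substitute $(cat /proc/cmdline)):

```shell
# Detect the "nohalt" boot option. A made-up command line stands
# in for /proc/cmdline here.
cmdline="root=/dev/sda2 ro console=ttyS0 nohalt"   # e.g. $(cat /proc/cmdline)

# grep -w so "nohalt" must appear as a whole word on the command line.
if echo "$cmdline" | grep -qw nohalt; then
    echo "nohalt set: idle loop spins instead of calling PAL halt"
else
    echo "nohalt not set: idle loop may enter the low power state"
fi
```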

Stephane also commented that this might be an issue on other architectures
as well if they have a low power state. The transition from low power back
to "normal" will have a cost on every architecture, and every interrupt is
likely to incur that cost. This is a real problem for latency-sensitive
benchmarking (i.e. where we idle for very short periods of time), as I saw
with netperf TCP_RR.

Q to ia64-linux: since perfmon can enable/disable this on the fly, can
I add a /sys hook to do the same from userspace?
Where under /sys could this live?

cheers,
grant

> Details:
> Since I prefer q-syscollect:
> grundler@gsyprf3:~/openib-perf-2006/rx2600-r4929$ LD_PRELOAD=/usr/local/lib/libsdp.so q-syscollect /usr/local/bin/netperf -p 12866 -l 60 -H 10.0.0.30 -t TCP_RR -T 1,1 -c -C -- -r 1,1 -s 0 -S 0
> libsdp.so: $LIBSDP_CONFIG_FILE not set. Using /usr/local/etc/libsdp.conf
> libsdp.so: $LIBSDP_CONFIG_FILE not set. Using /usr/local/etc/libsdp.conf
> bind_to_specific_processor: enter
> TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.30 (10.0.0.30) port 0 AF_INET
> Local /Remote
> Socket Size   Request Resp.  Elapsed Trans.   CPU    CPU    S.dem   S.dem
> Send   Recv   Size    Size   Time    Rate     local  remote local   remote
> bytes  bytes  bytes   bytes  secs.   per sec  % S    % S    us/Tr   us/Tr
> 
> 16384  87380  1       1      60.00   16300.11  10.53  8.26   12.925  10.137
> 
> Weird. Performance jumps from 13900 to 16300 (+2400, or +17%).
> Hrm...something got me to look at /proc/interrupts and I see that
> mthca is interrupting on CPU0 now:
>  70:  644084899          0       PCI-MSI-X  ib_mthca (comp)
>  71:          8          0       PCI-MSI-X  ib_mthca (async)
>  72:      27247          0       PCI-MSI-X  ib_mthca (cmd)
> 
> Retest with -T 0,1 :
> 
> 16384  87380  1       1      60.00   17557.94  6.06   10.88  6.909   12.390
> 
> And again -T 0,1 but without q-syscollect:
> 16384  87380  1       1      60.00   15891.41  6.13   7.61   7.713   9.571 
> 
> Now with -T 0,0:
> 16384  87380  1       1      60.00   20719.03  5.93   5.26   5.724   5.076 
> 
> with -T 0,0 and without q-syscollect:
> 16384  87380  1       1      60.00   18553.61  5.73   5.36   6.181   5.782 
> 
> That's +11% on the last set.
> I'm stumped why q-syscollect would *improve* performance.
...

