* IPoIB performance benchmarking
From: Tom Ammon @ 2010-04-12 18:35 UTC
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: Brian Haymore
Hi,
I'm trying to do some performance benchmarking of IPoIB on a DDR IB
cluster, and I am having a hard time understanding what I am seeing.
When I do a simple netperf, I get results like these:
[root@gateway3 ~]# netperf -H 192.168.23.252
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.23.252
(192.168.23.252) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  65536  65536    10.01    4577.70
Which is disappointing, since these are simply two DDR IB-connected nodes
plugged into a DDR switch - I would expect much higher throughput than
that. When I do a test with ibv_srq_pingpong (using the same message
size reported above), here's what I get:
[root@gateway3 ~]# ibv_srq_pingpong 192.168.23.252 -m 4096 -s 65536
local address: LID 0x012b, QPN 0x000337, PSN 0x19cc85
local address: LID 0x012b, QPN 0x000338, PSN 0x956fc2
...
[output omitted]
...
remote address: LID 0x0129, QPN 0x00032e, PSN 0x891ce3
131072000 bytes in 0.08 seconds = 12763.08 Mbit/sec
1000 iters in 0.08 seconds = 82.16 usec/iter
Which is much closer to what I would expect with DDR.
The MTU on both of the QLogic DDR HCAs is set to 4096, as it is on the
QLogic switch.
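
For reference, one quick way to confirm the active link MTU and rate on each
HCA - this assumes the standard OFED diagnostic tools are installed, and the
grep pattern is only illustrative:

    ibv_devinfo | grep -i mtu        # active_mtu should report 4096
    ibstat                           # also shows port state, LID, and rate (20 for 4X DDR)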
I know the above is not completely apples-to-apples, since
ibv_srq_pingpong runs at layer 2 and uses 16 QPs. So I ran it again with
only a single QP, to make it roughly equivalent to my single-stream
netperf test, and I still get almost double the performance:
[root@gateway3 ~]# ibv_srq_pingpong 192.168.23.252 -m 4096 -s 65536 -q 1
local address: LID 0x012b, QPN 0x000347, PSN 0x65fb56
remote address: LID 0x0129, QPN 0x00032f, PSN 0x5e52f9
131072000 bytes in 0.13 seconds = 8323.22 Mbit/sec
1000 iters in 0.13 seconds = 125.98 usec/iter
Is there something that I am not understanding here? Is there any way
to make single-stream TCP IPoIB performance better than 4.5 Gb/s on a DDR
network? Am I just not using the benchmarking tools correctly?
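
For what it's worth, one variable worth ruling out on the netperf side is the
socket buffer size. A sketch of the same single-stream test with explicit
send/receive buffers; the -m/-s/-S values are only illustrative starting
points, not tuned numbers:

    # 64 KiB messages, 256 KiB socket buffers on both ends, 30-second run
    netperf -H 192.168.23.252 -t TCP_STREAM -l 30 -- -m 65536 -s 262144 -S 262144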
Thanks,
Tom
--
--------------------------------------------------------------------
Tom Ammon
Network Engineer
Office: 801.587.0976
Mobile: 801.674.9273
Center for High Performance Computing
University of Utah
http://www.chpc.utah.edu
* Re: IPoIB performance benchmarking
From: Dave Olson @ 2010-04-12 20:19 UTC
To: Tom Ammon
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Brian Haymore

On Mon, 12 Apr 2010, Tom Ammon wrote:

| I'm trying to do some performance benchmarking of IPoIB on a DDR IB
| cluster, and I am having a hard time understanding what I am seeing.
|
| When I do a simple netperf, I get results like these:
|
| [root@gateway3 ~]# netperf -H 192.168.23.252
| TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.23.252
| (192.168.23.252) port 0 AF_INET
| Recv   Send    Send
| Socket Socket  Message  Elapsed
| Size   Size    Size     Time     Throughput
| bytes  bytes   bytes    secs.    10^6bits/sec
|
|  87380  65536  65536    10.01    4577.70

Are you using connected mode, or UD? Since you say you have a 4K MTU,
I'm guessing you are using UD. Change to use connected mode (edit
/etc/infiniband/openib.conf), or as a quick test

    echo connected > /sys/class/net/ib0/mode

and then the mtu should show as 65520. That should help the bandwidth
a fair amount.

Dave Olson
dave.olson-h88ZbnxC6KDQT0dZR+AlfA@public.gmane.org
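
A minimal sketch of making that change persistent, assuming the OFED-style
/etc/infiniband/openib.conf that Dave mentions; SET_IPOIB_CM is the usual OFED
knob, but the exact file and variable may differ by distribution:

    # /etc/infiniband/openib.conf - applied by the openibd init script at boot
    SET_IPOIB_CM=yes

    # non-persistent switch for a quick test, then verify:
    echo connected > /sys/class/net/ib0/mode
    cat /sys/class/net/ib0/mode          # should print "connected"
    ifconfig ib0 mtu 65520               # raise the IP MTU to the connected-mode maximum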
* Re: IPoIB performance benchmarking
From: Tom Ammon @ 2010-04-12 20:52 UTC
To: Dave Olson
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Brian Dale Haymore

Dave,

Thanks for the pointer. I thought it was running in connected mode, and
looking at that variable that you mentioned confirms it:

[root@gateway3 ~]# cat /sys/class/net/ib0/mode
connected

And the IP MTU shows up as:

[root@gateway3 ~]# ifconfig ib0
ib0       Link encap:InfiniBand  HWaddr 80:00:00:02:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:192.168.23.253  Bcast:192.168.23.255  Mask:255.255.254.0
          inet6 addr: fe80::211:7500:ff:6edc/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:2319010 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4512605 errors:0 dropped:33011 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:5450805352 (5.0 GiB)  TX bytes:154353169896 (143.7 GiB)

This is partly why I'm stumped - I've seen threads about how connected
mode is supposed to improve IPoIB performance, but I'm not seeing as
much performance as I'd like.

Tom

On 04/12/2010 02:19 PM, Dave Olson wrote:
> On Mon, 12 Apr 2010, Tom Ammon wrote:
> | I'm trying to do some performance benchmarking of IPoIB on a DDR IB
> | cluster, and I am having a hard time understanding what I am seeing.
> |
> | When I do a simple netperf, I get results like these:
> |
> | [root@gateway3 ~]# netperf -H 192.168.23.252
> | TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.23.252
> | (192.168.23.252) port 0 AF_INET
> | Recv   Send    Send
> | Socket Socket  Message  Elapsed
> | Size   Size    Size     Time     Throughput
> | bytes  bytes   bytes    secs.    10^6bits/sec
> |
> |  87380  65536  65536    10.01    4577.70
>
> Are you using connected mode, or UD? Since you say you have a 4K MTU,
> I'm guessing you are using UD. Change to use connected mode (edit
> /etc/infiniband/openib.conf), or as a quick test
>
>     echo connected > /sys/class/net/ib0/mode
>
> and then the mtu should show as 65520. That should help the bandwidth
> a fair amount.
>
> Dave Olson
> dave.olson-h88ZbnxC6KDQT0dZR+AlfA@public.gmane.org

--
--------------------------------------------------------------------
Tom Ammon
Network Engineer
Office: 801.587.0976
Mobile: 801.674.9273
Center for High Performance Computing
University of Utah
http://www.chpc.utah.edu
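
A side note on the dropped TX packets visible in that ifconfig output: a simple
way to see whether the drops grow during a test is to sample the standard sysfs
counter around a run. The txqueuelen value below is only an illustrative
experiment, not a recommendation:

    cat /sys/class/net/ib0/statistics/tx_dropped    # before
    netperf -H 192.168.23.252
    cat /sys/class/net/ib0/statistics/tx_dropped    # after

    # if the counter climbs during the run, a larger send queue is one easy thing to try
    ifconfig ib0 txqueuelen 4096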
* Re: IPoIB performance benchmarking
From: Dave Olson @ 2010-04-12 22:25 UTC
To: Tom Ammon
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Brian Dale Haymore

On Mon, 12 Apr 2010, Tom Ammon wrote:

| Thanks for the pointer. I thought it was running in connected mode, and
| looking at that variable that you mentioned confirms it:
|
| [root@gateway3 ~]# ifconfig ib0
| ib0       Link encap:InfiniBand  HWaddr
| 80:00:00:02:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
|           inet addr:192.168.23.253  Bcast:192.168.23.255  Mask:255.255.254.0
|           RX packets:2319010 errors:0 dropped:0 overruns:0 frame:0
|           TX packets:4512605 errors:0 dropped:33011 overruns:0 carrier:0

That's a lot of packets dropped on the tx side. If you have the qlogic
software installed, running

    ipathstats -c1

while you are running the test would be useful, otherwise perfquery -r
at start and another perfquery at the end on both nodes might point to
something.

Oh, and depending on your tcp stack tuning, setting the receive and/or
send buffer size might help.

These are all ddr results, on a more or less OFED 1.5.1 stack
(completely unofficial, blah blah). And yes, multi-thread will bring
the results up (iperf, rather than netperf).

# netperf -H ib-host TCP_STREAM -- -m 65536
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to ib-host (172.29.9.46) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  65536  65536    10.03    5150.24

# netperf -H ib-host TCP_STREAM -- -m 65536 -S 131072
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to ib-host (172.29.9.46) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

262144  65536  65536    10.03    5401.83

# netperf -H ib-host TCP_STREAM -- -m 65536 -S 262144
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to ib-host (172.29.9.46) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

524288  65536  65536    10.01    5478.28

Dave Olson
dave.olson-h88ZbnxC6KDQT0dZR+AlfA@public.gmane.org
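
To make the last two suggestions concrete, here is a sketch of the TCP buffer
tuning and the multi-stream iperf run Dave alludes to. The sysctl names are the
standard Linux ones; the buffer sizes and stream count are illustrative
starting points, not tuned recommendations:

    # raise the allowed TCP send/receive buffer maxima (bytes)
    sysctl -w net.core.rmem_max=4194304
    sysctl -w net.core.wmem_max=4194304
    sysctl -w net.ipv4.tcp_rmem="4096 87380 4194304"
    sysctl -w net.ipv4.tcp_wmem="4096 65536 4194304"

    # multi-stream test: server on one node, 4 parallel TCP streams from the other
    iperf -s                          # on 192.168.23.252
    iperf -c 192.168.23.252 -P 4      # on gateway3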