* InfiniBand HCA loopback on a single host (subnet manager needed?)
@ 2011-03-09 8:04 Konstantin Boyanov
From: Konstantin Boyanov @ 2011-03-09 8:04 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Hello list,
I apologize if I am intruding on a development-only mailing list with my
questions, but this is the only mailing list covering InfiniBand and
Linux that I was able to find.
But let me tell you why I am writing this email: we have one dual-GPU
server with an InfiniBand HCA in it. In the future we would like to test
GPU-to-GPU communication between two or more hosts through the IB HCA, but
for now we just want to measure how much time a packet needs to travel
from system memory / GPU memory to the IB HCA.
I think this is achievable on a single host by using the loopback
capabilities of the InfiniBand HCA. The problem is that I was not able to
find a comprehensive description of how to set up such a loopback
operation on the HCA chip.
The only thing I have found in this regard is a snippet from a rather old
SunVTS 6.2 Test Reference Manual for x86:
<CITE>
The HCA supports internal loopback for packets transmitted between QPs
that are assigned to the same HCA port. If a packet is being transmitted
to a DLID that is equivalent to the Port LID with the LMC bits masked out
or the packet DLID is a multicast LID, the packet goes on the loopback
path. In this latter case, the packet also is transmitted to the fabric.
In the inbound direction, the ICRC and VCRC checks are blindly passed for
looped back packets. Note that internal loopback is supported only for
packets that are transmitted and received on the same port. Packets that
are transmitted on one port and received on another port are transmitted
to the fabric. The fabric directs these packets to the destination port.
<ENDCITE>
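The rule in the quoted manual can be sketched in a few lines. This is only an illustration of the quoted text, not of how the ConnectX firmware decides; the function names are hypothetical, and the multicast range 0xC000-0xFFFE is taken from the InfiniBand LID assignment, not from the manual.

```python
# Hypothetical helpers illustrating the loopback rule quoted above.
MCAST_LID_FIRST = 0xC000  # start of the InfiniBand multicast LID range
MCAST_LID_LAST = 0xFFFE   # 0xFFFF is the permissive LID, not multicast


def is_multicast_lid(dlid: int) -> bool:
    """True if the destination LID falls in the multicast range."""
    return MCAST_LID_FIRST <= dlid <= MCAST_LID_LAST


def takes_loopback_path(dlid: int, port_lid: int, lmc: int) -> bool:
    """True if the quoted rule would send the packet on the loopback path."""
    if is_multicast_lid(dlid):
        return True  # per the quote, such a packet also goes to the fabric
    mask = 0xFFFF & ~((1 << lmc) - 1)  # clear the low LMC bits
    return (dlid & mask) == (port_lid & mask)
```

With LMC = 2 and a port LID of 0x0010, for example, DLIDs 0x0010 through 0x0013 would match, while 0x0014 would not.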
I don't know whether this is still true (or true at all) for our HCA
(Mellanox ConnectX dual port QDR MT25408 chip). Can someone with
experience in setting up such a loopback shed some light on this?
Another question: must a subnet manager be running on the box so that the
port(s) get configured properly, or does the loopback operation of the
HCA not require one?
I have dug through the examples in OFED-1.4/src/perftest-1.2 and, with
their help, have so far managed to write a single-threaded program that
can successfully open the HCA and set up two different QPs. Unfortunately
the program crashes with a segmentation fault right at the beginning of
the data transmission between the two QPs, and I am wondering whether
this is due to the lack of a subnet manager, a wrong (or missing)
configuration, or just my awesome programming skills (see end of mail).
You can find the source of my program here:
http://www.ifh.de/~boyanov/gpeIBloopback.cc
http://www.ifh.de/~boyanov/gpeIBloopback.h
Any ideas, comments or suggestions regarding the questions described above
are highly appreciated! Please let me know if anything does not make sense
or you need more information on the subject.
With best regards,
Konstantin Boyanov
# uname -a
Linux gpu1.ifh.de 2.6.18-194.26.1.el5 #1 SMP Tue Nov 9 12:46:16 EST 2010
x86_64 x86_64 x86_64 GNU/Linux
# ibv_devinfo:
hca_id: mlx4_0
transport: InfiniBand (0)
fw_ver: 2.7.626
node_guid: 0002:c903:000b:e242
sys_image_guid: 0002:c903:000b:e245
vendor_id: 0x02c9
vendor_part_id: 26428
hw_ver: 0xB0
board_id: MT_0D90110009
phys_port_cnt: 1
port: 1
state: PORT_DOWN (1)
max_mtu: 2048 (4)
active_mtu: 2048 (4)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
Output from GDB:
################
Starting program: /user/b/boyanov/workspace/GPEubench/src/ibloop
--len-min=1024 --len-max=8192 --len-inc=1024 --nmeas=1 --npass=1 --conn=0
--txdpth=64 --port=1
[Thread debugging using libthread_db enabled]
optLenMin = 1024, optLenMax = 8192, optLenInc = 1024, optNmeas = 1,
optNpass = 1
# dev_name: uverbs0
# dev_path: /sys/class/infiniband_verbs/uverbs0
# ibdev_path: /sys/class/infiniband/mlx4_0
# name: mlx4_0
Data fields in ibv_device_attr:
atomic_cap = 1
device_cap_flags = 7117942
local_ca_ack_delay = 15
max_ah = 0
max_cq = 65408
max_cqe = 4194303
max_ee = 0
max_ee_init_rd_atom = 0
max_ee_rd_atom = 0
max_fmr = 0
max_map_per_fmr = 8191
max_map_per_fmr = 8192
max_mcast_qp_attach = 56
max_mcast_qp_attach = 524272
max_mr_size = 18446744073709551615
max_mw = 0
max_pd = 32764
max_pkeys = 128
max_qp = 261824
max_qp_init_rd_atom = 128
max_qp_rd_atom = 16
max_qp_wr = 16351
max_raw_ethy_qp = 1
max_raw_ipv6_qp = 0
max_rdd = 0
max_res_rd_atom = 4189184
max_sge = 32
max_sge_rd = 0
max_srq = 65472
max_srq_sge = 31
max_srq_wr = 16383
max_total_mcast_qp_attach = 458752
node_guid = 4819426645931262464
page_size_cap = 4294966784
phys_port_cnt = 1
sys_image_guid = 5035599428045046272
vendor_id = 713
vendor_part_id = 26428
qp_state = 1
path_mig_state = 0
qkey = 286331153
rq_psn = 0
sq_psn = 1441792
dest_qp_num = 0
qp_access_flags = 352
pkey_index = 0
alt_pkey_index = 0
en_sqd_async_notify = 55
sq_draining = 0
max_rd_atomic = 0
max_dest_rd_atomic = 0
min_rnr_timer = 0
port_num = 1
timeout = 0
retry_cnt = 0
rnr_retry = 0
alt_port_num = 0
alt_timeout = 0
qp_state = 1
path_mig_state = 0
qkey = 286331153
rq_psn = 0
sq_psn = 1441792
dest_qp_num = 0
qp_access_flags = 352
pkey_index = 0
alt_pkey_index = 0
en_sqd_async_notify = 170
sq_draining = 0
max_rd_atomic = 0
max_dest_rd_atomic = 0
min_rnr_timer = 0
port_num = 1
timeout = 0
retry_cnt = 0
rnr_retry = 0
alt_port_num = 0
alt_timeout = 0
QP number = 2097225
QP handle = 0
QP state = 1
QP type = 4
QP events completed = 0
QP number = 2097226
QP handle = 1
QP state = 1
QP type = 4
QP events completed = 0
set the send work request fields
set the receive work request fields
local address: LID 0000 QPN 0x200049 PSN 0x204a16 RKEY
0x000000b0041c24 VADDR 0x00000000606010
remote address: LID 0000 QPN 0x20004a PSN 0x442a26 RKEY 0x000000b0041c24
VADDR 0x00000000606010
PING
Program received signal SIGSEGV, Segmentation fault.
0x00002aaaab006037 in ibv_cmd_create_qp () from
/usr/lib64/libmlx4-rdmav2.so
(gdb) bt
#0 0x00002aaaab006037 in ibv_cmd_create_qp () from
/usr/lib64/libmlx4-rdmav2.so
#1 0x00000000004010ba in ibv_post_send (qp=0x605da0, wr=0x7fffffffdf10,
bad_wr=0x7fffffffe0b0) at /usr/include/infiniband/verbs.h:1000
#2 0x000000000040270b in main (argc=9, argv=0x7fffffffe1f8) at
gpeIBloopback.cc:557
Konstantin Boyanov
DESY Zeuthen, Platanenallee 6, 15738 Zeuthen
Tel.:+49(33762)77178
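A note on reading the attribute dump above: the decimal `node_guid` and `sys_image_guid` values look unrelated to the hex GUIDs shown by ibv_devinfo, but they are the same bytes. The sketch below assumes the test program printed the raw 64-bit field (stored in network byte order) as a host-order integer on a little-endian x86_64 machine; the function name is hypothetical.

```python
import struct


def guid_from_raw_decimal(value: int) -> str:
    """Format a GUID that was printed as a little-endian host integer."""
    raw = struct.pack("<Q", value)  # the bytes as they sit in memory on x86_64
    hexstr = raw.hex()              # memory order is network order here
    return ":".join(hexstr[i:i + 4] for i in range(0, 16, 4))


print(guid_from_raw_decimal(4819426645931262464))  # 0002:c903:000b:e242
print(guid_from_raw_decimal(5035599428045046272))  # 0002:c903:000b:e245
```

Both results match the `node_guid` and `sys_image_guid` lines of the ibv_devinfo output, and `vendor_id = 713` is likewise just 0x02c9 in decimal.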
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: InfiniBand HCA loopback on a single host (subnet manager needed?)
@ 2011-03-09 17:30 Jason Gunthorpe
From: Jason Gunthorpe @ 2011-03-09 17:30 UTC (permalink / raw)
To: Konstantin Boyanov; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
On Wed, Mar 09, 2011 at 09:04:39AM +0100, Konstantin Boyanov wrote:
> I think this is achievable on a single host by using the loopback
> capabilities of the InfiniBand HCA. The problem is, that I was not
> able to find a comprehensive description of how one sets up such
> loopback operation on the HCA chip.
To enable loopback I'm pretty sure you need an IB link on the
card, and a running SM to set the link to ACTIVE.
Also, be aware that in my experience loopback is a special case on IB
cards and the performance is not good, so testing in this manner may
not be valid.
If you have a dual port card you'd be better off connecting the ports
together and doing your test from port 1 to port 2.
Jason
* Re: InfiniBand HCA loopback on a single host (subnet manager needed?)
@ 2011-03-09 18:24 Hal Rosenstock
From: Hal Rosenstock @ 2011-03-09 18:24 UTC (permalink / raw)
To: Konstantin Boyanov; +Cc: Jason Gunthorpe, linux-rdma-u79uwXL29TY76Z2rM5mHXA
On Wed, Mar 9, 2011 at 12:30 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Wed, Mar 09, 2011 at 09:04:39AM +0100, Konstantin Boyanov wrote:
>
>> I think this is achievable on a single host by using the loopback
>> capabilities of the InfiniBand HCA. The problem is, that I was not
>> able to find a comprehensive description of how one sets up such
>> loopback operation on the HCA chip.
>
> To enable loopback I'm pretty sure you need an IB link on the
> card, and a running SM to set the link to ACTIVE.
Also, bringing the link to active, etc. can be done via ibportstate if
you want to avoid the SM for some reason. You will also need a loopback
plug, or to cable the port to something; otherwise the link won't come
up.
-- Hal
> Also, be aware that in my experience loopback is a special case on IB
> cards and the performance is not good, so testing in this manner may
> not be valid.
>
> If you have a dual port card you'd be better off connecting the ports
> together and doing your test from port 1 to port 2.
>
> Jason
* Re: InfiniBand HCA loopback on a single host (subnet manager needed?)
@ 2011-03-31 8:53 Konstantin Boyanov
From: Konstantin Boyanov @ 2011-03-31 8:53 UTC (permalink / raw)
To: Hal Rosenstock
Cc: Konstantin Boyanov, Jason Gunthorpe, linux-rdma-u79uwXL29TY76Z2rM5mHXA
Hello,
Thanks for the advice! I have gotten my hands on a QSFP loopback plug and
yesterday inserted it into the machine (single-slot IB card).
Unfortunately I am having problems when starting the Subnet Manager. I
believe I have installed and loaded all the necessary kernel modules.
# lsmod | grep ib
ib_ipoib               78893  0
ib_ucm                 12567  0
ib_uverbs              31293  6 rdma_ucm,ib_ucm
ib_umad                12147  4
ib_cm                  36419  3 ib_ipoib,ib_ucm,rdma_cm
ib_addr                 6089  1 rdma_cm
ib_sa                  22820  4 ib_ipoib,rdma_ucm,rdma_cm,ib_cm
mlx4_ib                52866  1
ib_mad                 40542  4 ib_umad,ib_cm,ib_sa,mlx4_ib
ib_core                66295  11 ib_ipoib,rdma_ucm,ib_ucm,ib_uverbs,ib_umad,rdma_cm,ib_cm,iw_cm,ib_sa,mlx4_ib,ib_mad
ipv6                  321509  72 ib_ipoib,ib_addr
mlx4_core              93453  2 mlx4_ib,mlx4_en
But when I start opensm via:
# /etc/init.d/opensm start
I see a lot of error messages at the end of /var/log/opensm.log:
Mar 30 12:50:05 622171 [1795B700] 0x80 -> SM port is down
Mar 30 12:50:05 622184 [1795B700] 0x01 -> sm_state_mgr_signal_error: ERR 3207: Invalid signal OSM_SM_SIGNAL_DISCOVER in state DISCOVERING
SM port is down
Mar 30 12:50:15 622345 [1795B700] 0x80 -> SM port is down
Mar 30 12:50:15 622356 [1795B700] 0x01 -> sm_state_mgr_signal_error: ERR 3207: Invalid signal OSM_SM_SIGNAL_DISCOVER in state DISCOVERING
Errors on subnet. Duplicate GUID found by link from a port to itself.
See verbose opensm.log for more details
Mar 30 12:50:25 622645 [1C963700] 0x80 -> Errors on subnet. Duplicate GUID found by link from a port to itself. See verbose opensm.log for more details
After that, the port state changes to PORT_INIT, but none of my loopback
test programs (nor those in the OFED examples) can find a valid LID and
operate properly.
# ibv_devinfo
hca_id: mlx4_0
transport: InfiniBand (0)
fw_ver: 2.7.626
node_guid: 0002:c903:000b:e242
sys_image_guid: 0002:c903:000b:e245
vendor_id: 0x02c9
vendor_part_id: 26428
hw_ver: 0xB0
board_id: MT_0D90110009
phys_port_cnt: 1
port: 1
state: PORT_INIT (2)
max_mtu: 2048 (4)
active_mtu: 2048 (4)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
I am using OFED drivers version 1.4 and the machine is as follows:
# uname -a
Linux myhost.domain.de 2.6.32-71.18.2.el6.x86_64 #1 SMP Tue Mar 8
15:00:52 CST 2011 x86_64 x86_64 x86_64 GNU/Linux
It seems to me that the loopback connector is somehow tricking OpenSM
into thinking that there is something wrong with the ports. Am I right?
Another thing: if I try to force the port into the ACTIVE state with
ibportstate, I get the following error:
# ibportstate -G 0x0002c903000be243 1 enable
ibwarn: [4824] mad_rpc_open_port: can't open UMAD port ((null):0)
ibportstate: iberror: failed: Failed to open '(null)' port '0'
I am really a greenhorn with all this InfiniBand stuff, so could someone
please decipher the above error messages from opensm.log? What should I
do in order to have a running OpenSM and a correctly configured port, so
that I can loop back messages? Is there any documentation out there that
describes the setup of a loopback on a single port, or at least the
initial setup of an InfiniBand network?
Thanks in advance for your time, and sorry if I am bothering you too much
with my lame questions.
Best regards,
Konstantin Boyanov
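The ibv_devinfo output in the message above shows why the test programs find no valid LID: the port is in PORT_INIT with `port_lid: 0`, and ordinary data traffic generally needs the port to be ACTIVE with a LID assigned by a subnet manager. A minimal sketch of that check, run against saved ibv_devinfo text, might look like this; `port_ready` is a hypothetical helper and it only looks at the first port listed.

```python
import re

# Port state codes as they appear in parentheses in ibv_devinfo output.
PORT_STATES = {1: "PORT_DOWN", 2: "PORT_INIT", 3: "PORT_ARMED", 4: "PORT_ACTIVE"}


def port_ready(devinfo_text: str) -> bool:
    """True only if the first port is ACTIVE and has a non-zero LID."""
    state = re.search(r"state:\s*\S+\s*\((\d+)\)", devinfo_text)
    lid = re.search(r"port_lid:\s*(\d+)", devinfo_text)
    if not state or not lid:
        return False  # output not in the expected format
    return int(state.group(1)) == 4 and int(lid.group(1)) != 0


sample = "port: 1\nstate: PORT_INIT (2)\nsm_lid: 0\nport_lid: 0\n"
print(port_ready(sample))  # False
```

Running such a check before posting any work requests makes it clear whether a failure comes from the link setup or from the program itself.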
* Re: InfiniBand HCA loopback on a single host (subnet manager needed?)
@ 2011-03-31 13:53 Hal Rosenstock
From: Hal Rosenstock @ 2011-03-31 13:53 UTC (permalink / raw)
To: Konstantin Boyanov
Cc: Hal Rosenstock, Konstantin Boyanov, Jason Gunthorpe, linux-rdma-u79uwXL29TY76Z2rM5mHXA
On 3/31/2011 4:53 AM, Konstantin Boyanov wrote:
> Hello,
>
> Thanks for the advice! I have gotten my hands on a QSFP loopback plug
> and yesterday inserted it into the machine (single-slot IB card).
>
> Unfortunately I am having problems when starting the Subnet Manager. I
> believe I have installed and loaded all the necessary kernel modules.
>
> # lsmod | grep ib
> ib_ipoib               78893  0
> ib_ucm                 12567  0
> ib_uverbs              31293  6 rdma_ucm,ib_ucm
> ib_umad                12147  4
> ib_cm                  36419  3 ib_ipoib,ib_ucm,rdma_cm
> ib_addr                 6089  1 rdma_cm
> ib_sa                  22820  4 ib_ipoib,rdma_ucm,rdma_cm,ib_cm
> mlx4_ib                52866  1
> ib_mad                 40542  4 ib_umad,ib_cm,ib_sa,mlx4_ib
> ib_core                66295  11 ib_ipoib,rdma_ucm,ib_ucm,ib_uverbs,ib_umad,rdma_cm,ib_cm,iw_cm,ib_sa,mlx4_ib,ib_mad
> ipv6                  321509  72 ib_ipoib,ib_addr
> mlx4_core              93453  2 mlx4_ib,mlx4_en
>
> But when I start opensm via:
>
> # /etc/init.d/opensm start
>
> I see a lot of error messages at the end of /var/log/opensm.log:
>
> Mar 30 12:50:05 622171 [1795B700] 0x80 -> SM port is down
> Mar 30 12:50:05 622184 [1795B700] 0x01 -> sm_state_mgr_signal_error: ERR 3207: Invalid signal OSM_SM_SIGNAL_DISCOVER in state DISCOVERING
> SM port is down
>
> Mar 30 12:50:15 622345 [1795B700] 0x80 -> SM port is down
> Mar 30 12:50:15 622356 [1795B700] 0x01 -> sm_state_mgr_signal_error: ERR 3207: Invalid signal OSM_SM_SIGNAL_DISCOVER in state DISCOVERING
> Errors on subnet. Duplicate GUID found by link from a port to itself.
> See verbose opensm.log for more details
>
> Mar 30 12:50:25 622645 [1C963700] 0x80 -> Errors on subnet. Duplicate GUID found by link from a port to itself. See verbose opensm.log for more details
My bad; can you cable this to some other IB port (either a switch or
another HCA port)? If this is a 2 port HCA, then it's simple.
> After that, the port state changes to PORT_INIT, but none of my
> loopback test programs (nor those in the OFED examples) can find a
> valid LID and operate properly.
>
> # ibv_devinfo
> hca_id: mlx4_0
> transport: InfiniBand (0)
> fw_ver: 2.7.626
> node_guid: 0002:c903:000b:e242
> sys_image_guid: 0002:c903:000b:e245
> vendor_id: 0x02c9
> vendor_part_id: 26428
> hw_ver: 0xB0
> board_id: MT_0D90110009
> phys_port_cnt: 1
> port: 1
> state: PORT_INIT (2)
> max_mtu: 2048 (4)
> active_mtu: 2048 (4)
> sm_lid: 0
> port_lid: 0
> port_lmc: 0x00
>
> I am using OFED drivers version 1.4 and the machine is as follows:
>
> # uname -a
> Linux myhost.domain.de 2.6.32-71.18.2.el6.x86_64 #1 SMP Tue Mar 8
> 15:00:52 CST 2011 x86_64 x86_64 x86_64 GNU/Linux
>
> It seems to me that the loopback connector is somehow tricking OpenSM
> into thinking that there is something wrong with the ports. Am I right?
It's making OpenSM think that the remote end of the port has a
duplicate GUID; OpenSM doesn't handle this case :-(
> Another thing: if I try to force the port into the ACTIVE state with
> ibportstate, I get the following error:
>
> # ibportstate -G 0x0002c903000be243 1 enable
> ibwarn: [4824] mad_rpc_open_port: can't open UMAD port ((null):0)
> ibportstate: iberror: failed: Failed to open '(null)' port '0'
Let's fix the problems one at a time. You shouldn't need to do this.
-- Hal
> I am really a greenhorn with all this InfiniBand stuff, so could
> someone please decipher the above error messages from opensm.log? What
> should I do in order to have a running OpenSM and a correctly
> configured port, so that I can loop back messages? Is there any
> documentation out there that describes the setup of a loopback on a
> single port, or at least the initial setup of an InfiniBand network?
>
> Thanks in advance for your time, and sorry if I am bothering you too
> much with my lame questions.
>
> Best regards,
> Konstantin Boyanov