public inbox for linux-rdma@vger.kernel.org
* InfiniBand HCA loopback on a single host (subnet manager needed?)
@ 2011-03-09  8:04 Konstantin Boyanov
       [not found] ` <alpine.LRH.2.00.1103090903140.15803-9mA5q7a405ob1SvskN2V4Q@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Konstantin Boyanov @ 2011-03-09  8:04 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hello list,

I apologize if I am intruding on a development-only mailing list 
with my questions, but this is the only mailing list concerning 
InfiniBand and Linux that I was able to find.

But let me tell you why I am writing this email: we have one dual-GPU 
server with an InfiniBand HCA in it. In the future we would like to test 
GPU-to-GPU communication between two or more hosts through the IB HCA, but 
for now we just want to measure how long a packet takes to travel from 
system memory / GPU memory to the IB HCA.
I think this is achievable on a single host by using the loopback 
capabilities of the InfiniBand HCA. The problem is that I was not able to 
find a comprehensive description of how to set up such a loopback 
operation on the HCA chip.

The only thing I have found in this regard is a snippet from the rather old 
SunVTS 6.2 Test Reference Manual for x86:

<CITE>
The HCA supports internal loopback for packets transmitted between QPs 
that are assigned to the same HCA port. If a packet is being transmitted 
to a DLID that is equivalent to the Port LID with the LMC bits masked out 
or the packet DLID is a multicast LID, the packet goes on the loopback 
path. In this latter case, the packet also is transmitted to the fabric. 
In the inbound direction, the ICRC and VCRC checks are blindly passed for 
looped back packets. Note that internal loopback is supported only for 
packets that are transmitted and received on the same port. Packets that 
are transmitted on one port and received on another port are transmitted 
to the fabric. The fabric directs these packets to the destination port.
<ENDCITE>
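
As a reading aid, the rule quoted above can be written down as a small
predicate. This is only a sketch of the manual's wording, not of what the
ConnectX hardware actually does. The function names are made up for
illustration, the multicast range 0xC000-0xFFFE is the standard InfiniBand
one, and treating LID 0 as never matching is an assumption made here
because LID 0 is reserved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standard InfiniBand multicast LID range. */
#define MCAST_LID_FIRST 0xC000u
#define MCAST_LID_LAST  0xFFFEu

/* Hypothetical helper: true if the LID lies in the multicast range. */
static bool is_multicast_lid(uint16_t lid)
{
    return lid >= MCAST_LID_FIRST && lid <= MCAST_LID_LAST;
}

/*
 * Hypothetical helper modelling the quoted rule: a packet takes the
 * internal loopback path if its DLID is a multicast LID, or if it equals
 * the port LID once the low LMC bits are masked out of both.
 * Per the quoted text, multicast packets are also sent to the fabric.
 */
static bool dlid_takes_loopback(uint16_t dlid, uint16_t port_lid, uint8_t lmc)
{
    assert(lmc <= 7);                 /* LMC is a 3-bit field */

    if (dlid == 0)                    /* assumption: reserved LID never matches */
        return false;
    if (is_multicast_lid(dlid))
        return true;
    if (port_lid == 0)                /* assumption: port has no LID assigned yet */
        return false;

    uint16_t mask = (uint16_t)~((1u << lmc) - 1u);
    return (dlid & mask) == (port_lid & mask);
}
```

With an LMC of 2, for example, a port with base LID 0x0010 answers to
0x0010 through 0x0013, so dlid_takes_loopback(0x0012, 0x0010, 2) is true
while dlid_takes_loopback(0x0014, 0x0010, 2) is false.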

I don't know whether or not this is still true (or true at all) for the 
case of our HCA (Mellanox ConnectX dual port QDR MT25408 chip). Can 
someone with experience in setting up such loopback shed some light on 
this?

Another question: must a subnet manager be running on the box so that 
the port(s) get configured properly, or does the loopback operation of 
the HCA not require one?

I have dug through the examples in OFED-1.4/src/perftest-1.2 and with their 
help have so far managed to create a single-threaded program which can 
successfully open the HCA and set up two different QPs. Unfortunately the 
program crashes with a segmentation fault right at the beginning of the 
data transmission between the two QPs, and I am wondering whether this is 
due to the lack of a subnet manager, a wrong (or missing) configuration, or 
just my awesome programming skills (see end of mail).

You can find the source of my program here:
http://www.ifh.de/~boyanov/gpeIBloopback.cc
http://www.ifh.de/~boyanov/gpeIBloopback.h

Any ideas, comments or suggestions regarding the questions described above 
are highly appreciated! Please let me know if anything does not make sense 
or you need more information on the subject.


With best regards,
Konstantin Boyanov



# uname -a
Linux gpu1.ifh.de 2.6.18-194.26.1.el5 #1 SMP Tue Nov 9 12:46:16 EST 2010 
x86_64 x86_64 x86_64 GNU/Linux

# ibv_devinfo:
hca_id:	mlx4_0
 	transport:			InfiniBand (0)
 	fw_ver:				2.7.626
 	node_guid:			0002:c903:000b:e242
 	sys_image_guid:			0002:c903:000b:e245
 	vendor_id:			0x02c9
 	vendor_part_id:			26428
 	hw_ver:				0xB0
 	board_id:			MT_0D90110009
 	phys_port_cnt:			1
 		port:	1
 			state:			PORT_DOWN (1)
 			max_mtu:		2048 (4)
 			active_mtu:		2048 (4)
 			sm_lid:			0
 			port_lid:		0
 			port_lmc:		0x00
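
A note on comparing the GUID above with the test program's dump further
down, which prints node_guid = 4819426645931262464. The verbs library
keeps GUIDs in network byte order, so printing the field as a plain
integer on little-endian x86_64 gives a byte-swapped number. The sketch
below (helper names are hypothetical) swaps the bytes back and formats the
result the way ibv_devinfo does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Reverse the byte order of a 64-bit value. */
static uint64_t swap_bytes64(uint64_t v)
{
    uint64_t out = 0;
    for (int i = 0; i < 8; i++) {
        out = (out << 8) | (v & 0xffu);  /* lowest byte becomes highest */
        v >>= 8;
    }
    return out;
}

/* Format a GUID as xxxx:xxxx:xxxx:xxxx; returns what snprintf returns. */
static int format_guid(uint64_t guid, char *buf, size_t len)
{
    assert(buf != NULL);
    return snprintf(buf, len, "%04x:%04x:%04x:%04x",
                    (unsigned)((guid >> 48) & 0xffffu),
                    (unsigned)((guid >> 32) & 0xffffu),
                    (unsigned)((guid >> 16) & 0xffffu),
                    (unsigned)(guid & 0xffffu));
}
```

Feeding swap_bytes64(4819426645931262464ULL) into format_guid() yields
0002:c903:000b:e242, the node_guid reported by ibv_devinfo, so the two
outputs agree once the byte order is accounted for.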

Output from GDB:
################
Starting program: /user/b/boyanov/workspace/GPEubench/src/ibloop 
--len-min=1024 --len-max=8192 --len-inc=1024 --nmeas=1 --npass=1 --conn=0 
--txdpth=64 --port=1
[Thread debugging using libthread_db enabled]
optLenMin = 1024, optLenMax = 8192, optLenInc = 1024, optNmeas = 1, 
optNpass = 1
# dev_name:   uverbs0
# dev_path:   /sys/class/infiniband_verbs/uverbs0
# ibdev_path: /sys/class/infiniband/mlx4_0
# name: 	  mlx4_0


Data fields in ibv_device_attr:
atomic_cap = 1
device_cap_flags = 7117942
local_ca_ack_delay = 15
max_ah = 0
max_cq = 65408
max_cqe = 4194303
max_ee = 0
max_ee_init_rd_atom = 0
max_ee_rd_atom = 0
max_fmr = 0
max_map_per_fmr = 8191
max_map_per_fmr = 8192
max_mcast_qp_attach = 56
max_mcast_qp_attach = 524272
max_mr_size = 18446744073709551615
max_mw = 0
max_pd = 32764
max_pkeys = 128
max_qp = 261824
max_qp_init_rd_atom = 128
max_qp_rd_atom = 16
max_qp_wr = 16351
max_raw_ethy_qp = 1
max_raw_ipv6_qp = 0
max_rdd = 0
max_res_rd_atom = 4189184
max_sge = 32
max_sge_rd = 0
max_srq = 65472
max_srq_sge = 31
max_srq_wr = 16383
max_total_mcast_qp_attach = 458752
node_guid = 4819426645931262464
page_size_cap = 4294966784
phys_port_cnt = 1
sys_image_guid = 5035599428045046272
vendor_id = 713
vendor_part_id = 26428


qp_state = 1
path_mig_state = 0
qkey = 286331153
rq_psn = 0
sq_psn = 1441792
dest_qp_num = 0
qp_access_flags = 352
pkey_index = 0
alt_pkey_index = 0
en_sqd_async_notify = 55
sq_draining = 0
max_rd_atomic = 0
max_dest_rd_atomic = 0
min_rnr_timer = 0
port_num = 1
timeout = 0
retry_cnt = 0
rnr_retry = 0
alt_port_num = 0
alt_timeout = 0


qp_state = 1
path_mig_state = 0
qkey = 286331153
rq_psn = 0
sq_psn = 1441792
dest_qp_num = 0
qp_access_flags = 352
pkey_index = 0
alt_pkey_index = 0
en_sqd_async_notify = 170
sq_draining = 0
max_rd_atomic = 0
max_dest_rd_atomic = 0
min_rnr_timer = 0
port_num = 1
timeout = 0
retry_cnt = 0
rnr_retry = 0
alt_port_num = 0
alt_timeout = 0

QP number = 2097225
QP handle = 0
QP state = 1
QP type = 4
QP events completed = 0

QP number = 2097226
QP handle = 1
QP state = 1
QP type = 4
QP events completed = 0

set the send work request fields
set the receive work request fields

      local address: LID 0000 QPN 0x200049 PSN 0x204a16 RKEY 
0x000000b0041c24 VADDR 0x00000000606010
   remote address: LID 0000 QPN 0x20004a PSN 0x442a26 RKEY 0x000000b0041c24 
VADDR 0x00000000606010

PING

Program received signal SIGSEGV, Segmentation fault.
0x00002aaaab006037 in ibv_cmd_create_qp () from 
/usr/lib64/libmlx4-rdmav2.so
(gdb) bt
#0  0x00002aaaab006037 in ibv_cmd_create_qp () from 
/usr/lib64/libmlx4-rdmav2.so
#1  0x00000000004010ba in ibv_post_send (qp=0x605da0, wr=0x7fffffffdf10, 
bad_wr=0x7fffffffe0b0) at /usr/include/infiniband/verbs.h:1000
#2  0x000000000040270b in main (argc=9, argv=0x7fffffffe1f8) at 
gpeIBloopback.cc:557
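
A side note on reading the numeric fields in the dump above, assuming they
are the raw values of enum ibv_qp_state and enum ibv_qp_type from verbs.h
(the program source is not reproduced here, so that is a guess). Under
that assumption "QP state = 1" is INIT and "QP type = 4" is UD. A send
queue only processes work requests once the QP has reached RTS, which may
or may not be related to the crash. The helper names below are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local mirror of the enum ibv_qp_state values used here. */
enum qp_state { QPS_RESET = 0, QPS_INIT = 1, QPS_RTR = 2, QPS_RTS = 3,
                QPS_SQD = 4, QPS_SQE = 5, QPS_ERR = 6 };

/* Map a raw state number to a readable name. */
static const char *qp_state_name(int state)
{
    static const char *const names[] = {
        "RESET", "INIT", "RTR", "RTS", "SQD", "SQE", "ERR"
    };
    if (state < 0 || (size_t)state >= sizeof names / sizeof names[0])
        return "UNKNOWN";
    return names[state];
}

/* Map a raw QP type number to a name (only RC, UC and UD are covered). */
static const char *qp_type_name(int type)
{
    switch (type) {
    case 2:  return "RC";
    case 3:  return "UC";
    case 4:  return "UD";
    default: return "OTHER";
    }
}

/* Send work requests are only processed in the RTS state. */
static bool qp_processes_sends(int state)
{
    return state == QPS_RTS;
}

/* Usage example with the values printed in the dump. */
static void check_dump_values(void)
{
    assert(qp_state_name(1)[0] == 'I');   /* "INIT" */
    assert(qp_type_name(4)[1] == 'D');    /* "UD"   */
    assert(!qp_processes_sends(1));       /* INIT is not RTS */
}
```

If the assumption holds, both QPs would still need the INIT -> RTR -> RTS
transitions (via ibv_modify_qp) before ibv_post_send is called.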




Konstantin Boyanov
DESY Zeuthen, Platanenallee 6, 15738 Zeuthen
Tel.:+49(33762)77178
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: InfiniBand HCA loopback on a single host (subnet manager needed?)
       [not found] ` <alpine.LRH.2.00.1103090903140.15803-9mA5q7a405ob1SvskN2V4Q@public.gmane.org>
@ 2011-03-09 17:30   ` Jason Gunthorpe
       [not found]     ` <20110309173005.GN22729-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Jason Gunthorpe @ 2011-03-09 17:30 UTC (permalink / raw)
  To: Konstantin Boyanov; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, Mar 09, 2011 at 09:04:39AM +0100, Konstantin Boyanov wrote:

> I think this is achievable on a single host by using the loopback
> capabilities of the InfiniBand HCA. The problem is, that I was not
> able to find a comprehensive description of how one sets up such
> loopback operation on the HCA chip.

To enable loopback I'm pretty sure you need an IB link on the
card, and a running SM to set the link to ACTIVE.
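
A rough sketch of the check a test program could make before posting any
work: the port must be ACTIVE and must have been given a unicast LID. The
enum below mirrors the numeric values ibv_devinfo prints (PORT_DOWN is 1,
PORT_INIT is 2, PORT_ACTIVE is 4); the helper names are hypothetical, and
a real program would take the state and LID from ibv_query_port().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local mirror of the port state values printed by ibv_devinfo. */
enum port_state {
    PORT_NOP = 0, PORT_DOWN = 1, PORT_INIT = 2,
    PORT_ARMED = 3, PORT_ACTIVE = 4, PORT_ACTIVE_DEFER = 5
};

/* Unicast LIDs occupy 0x0001-0xBFFF; 0 is reserved. */
static bool is_unicast_lid(uint16_t lid)
{
    return lid >= 0x0001u && lid <= 0xBFFFu;
}

/* True only if the SM has activated the port and assigned it a LID. */
static bool port_ready_for_traffic(int state, uint16_t lid)
{
    return state == PORT_ACTIVE && is_unicast_lid(lid);
}

/* Usage example: a port in PORT_DOWN with port_lid 0 is not ready. */
static void check_down_port(void)
{
    assert(!port_ready_for_traffic(PORT_DOWN, 0));
}
```

Bailing out early when this returns false gives a clearer failure than
letting the first send go wrong.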

Also, be aware that in my experience loopback is a special case on IB
cards and the performance is not good, so testing in this manner may
not be valid.

If you have a dual port card you'd be better off connecting the ports
together and doing your test from port 1 to port 2.

Jason

* Re: InfiniBand HCA loopback on a single host (subnet manager needed?)
       [not found]     ` <20110309173005.GN22729-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2011-03-09 18:24       ` Hal Rosenstock
       [not found]         ` <AANLkTimTm2UgUr4A_XJYGJpGBnRAFE74Eu6At+n9Xnfd-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Hal Rosenstock @ 2011-03-09 18:24 UTC (permalink / raw)
  To: Konstantin Boyanov; +Cc: Jason Gunthorpe, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, Mar 9, 2011 at 12:30 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Wed, Mar 09, 2011 at 09:04:39AM +0100, Konstantin Boyanov wrote:
>
>> I think this is achievable on a single host by using the loopback
>> capabilities of the InfiniBand HCA. The problem is, that I was not
>> able to find a comprehensive description of how one sets up such
>> loopback operation on the HCA chip.
>
> To enable loopback I'm pretty sure you need an IB link on the
> card, and a running SM to set the link to ACTIVE.

Also, bringing the link to active, etc. can be done via ibportstate if
you want to avoid an SM for some reason. You will also need a loopback
plug, or to cable the port to something; otherwise the link won't come
up.

-- Hal

>
> Also, be aware that in my experience loopback is a special case on IB
> cards and the performance is not good, so testing in this manner may
> not be valid.
>
> If you have a dual port card you'd be better off connecting the ports
> together and doing your test from port 1 to port 2.
>
> Jason

* Re: InfiniBand HCA loopback on a single host (subnet manager needed?)
       [not found]         ` <AANLkTimTm2UgUr4A_XJYGJpGBnRAFE74Eu6At+n9Xnfd-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-03-31  8:53           ` Konstantin Boyanov
       [not found]             ` <4D94411F.2080008-T5F83Mi6MZE@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Konstantin Boyanov @ 2011-03-31  8:53 UTC (permalink / raw)
  To: Hal Rosenstock
  Cc: Konstantin Boyanov, Jason Gunthorpe,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hello,

Thanks for the advice! I got my hands on a QSFP loopback plug and 
yesterday inserted it into the machine (single-slot IB card).

Unfortunately I am having problems when starting the Subnet Manager. I 
believe I have installed and loaded all the necessary kernel modules.

# lsmod | grep ib
ib_ipoib               78893  0
ib_ucm                 12567  0
ib_uverbs              31293  6 rdma_ucm,ib_ucm
ib_umad                12147  4
ib_cm                  36419  3 ib_ipoib,ib_ucm,rdma_cm
ib_addr                 6089  1 rdma_cm
ib_sa                  22820  4 ib_ipoib,rdma_ucm,rdma_cm,ib_cm
mlx4_ib                52866  1
ib_mad                 40542  4 ib_umad,ib_cm,ib_sa,mlx4_ib
ib_core                66295  11 ib_ipoib,rdma_ucm,ib_ucm,ib_uverbs,ib_umad,rdma_cm,ib_cm,iw_cm,ib_sa,mlx4_ib,ib_mad
ipv6                  321509  72 ib_ipoib,ib_addr
mlx4_core              93453  2 mlx4_ib,mlx4_en


But when I start opensm via:

# /etc/init.d/opensm start

I see a lot of error messages at the end of /var/log/opensm.log:

Mar 30 12:50:05 622171 [1795B700] 0x80 -> SM port is down
Mar 30 12:50:05 622184 [1795B700] 0x01 -> sm_state_mgr_signal_error: ERR 
3207: Invalid signal OSM_SM_SIGNAL_DISCOVER in state DISCOVERING
SM port is down

Mar 30 12:50:15 622345 [1795B700] 0x80 -> SM port is down
Mar 30 12:50:15 622356 [1795B700] 0x01 -> sm_state_mgr_signal_error: ERR 
3207: Invalid signal OSM_SM_SIGNAL_DISCOVER in state DISCOVERING
Errors on subnet. Duplicate GUID found by link from a port to itself. 
See verbose opensm.log for more details

Mar 30 12:50:25 622645 [1C963700] 0x80 -> Errors on subnet. Duplicate 
GUID found by link from a port to itself. See verbose opensm.log for 
more details

After that, the port state changes to PORT_INIT, but none of my loopback 
test programs (nor those in the OFED examples) can find a valid LID and 
operate properly.

# ibv_devinfo
hca_id: mlx4_0
       transport:                      InfiniBand (0)
       fw_ver:                         2.7.626
       node_guid:                      0002:c903:000b:e242
       sys_image_guid:                 0002:c903:000b:e245
       vendor_id:                      0x02c9
       vendor_part_id:                 26428
       hw_ver:                         0xB0
       board_id:                       MT_0D90110009
       phys_port_cnt:                  1
               port:   1
                       state:                  PORT_INIT (2)
                       max_mtu:                2048 (4)
                       active_mtu:             2048 (4)
                       sm_lid:                 0
                       port_lid:               0
                       port_lmc:               0x00

I am using OFED drivers version 1.4 and the machine is as follows:

# uname -a
Linux myhost.domain.de 2.6.32-71.18.2.el6.x86_64 #1 SMP Tue Mar 8 
15:00:52 CST 2011 x86_64 x86_64 x86_64 GNU/Linux

It seems to me that the loopback connector is somehow tricking OpenSM 
into thinking that there is something wrong with the ports. Am I right?

Another thing: if I try to force the port into the ACTIVE state with 
ibportstate, I get the following error:

# ibportstate -G 0x0002c903000be243 1 enable
ibwarn: [4824] mad_rpc_open_port: can't open UMAD port ((null):0)
ibportstate: iberror: failed: Failed to open '(null)' port '0'


I am really a greenhorn when it comes to InfiniBand, so could someone 
please decipher the above error messages from opensm.log? What should 
I do in order to have a running OpenSM and a correctly configured port, 
so that I can loop back messages? Is there any documentation out there 
that describes the setup of a loopback on a single port, or at least 
the initial setup of an InfiniBand network?

Thanks in advance for your time, and sorry if I am bothering you too much 
with my lame questions.

Best regards,
Konstantin Boyanov



* Re: InfiniBand HCA loopback on a single host (subnet manager needed?)
       [not found]             ` <4D94411F.2080008-T5F83Mi6MZE@public.gmane.org>
@ 2011-03-31 13:53               ` Hal Rosenstock
  0 siblings, 0 replies; 5+ messages in thread
From: Hal Rosenstock @ 2011-03-31 13:53 UTC (permalink / raw)
  To: Konstantin Boyanov
  Cc: Hal Rosenstock, Konstantin Boyanov, Jason Gunthorpe,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 3/31/2011 4:53 AM, Konstantin Boyanov wrote:
> Hello,
>
> Thanks for the advice! I got my hands on a QSFP loopback plug and
> yesterday inserted it into the machine (single-slot IB card).
>
> Unfortunately I am having problems when starting the Subnet Manager. I
> believe I have installed and loaded all the necessary kernel modules.
>
> # lsmod | grep ib
> ib_ipoib 78893 0
> ib_ucm 12567 0
> ib_uverbs 31293 6 rdma_ucm,ib_ucm
> ib_umad 12147 4
> ib_cm 36419 3 ib_ipoib,ib_ucm,rdma_cm
> ib_addr 6089 1 rdma_cm
> ib_sa 22820 4 ib_ipoib,rdma_ucm,rdma_cm,ib_cm
> mlx4_ib 52866 1
> ib_mad 40542 4 ib_umad,ib_cm,ib_sa,mlx4_ib
> ib_core 66295 11 ib_ipoib,rdma_ucm,ib_ucm,ib_uverbs,ib_umad,rdma_cm,ib_cm,iw_cm,ib_sa,mlx4_ib,ib_mad
> ipv6 321509 72 ib_ipoib,ib_addr
> mlx4_core 93453 2 mlx4_ib,mlx4_en
>
>
> But when I start opensm via:
>
> # /etc/init.d/opensm start
>
> I see a lot of error messages at the end of /var/log/opensm.log:
>
> Mar 30 12:50:05 622171 [1795B700] 0x80 -> SM port is down
> Mar 30 12:50:05 622184 [1795B700] 0x01 -> sm_state_mgr_signal_error: ERR
> 3207: Invalid signal OSM_SM_SIGNAL_DISCOVER in state DISCOVERING
> SM port is down
>
> Mar 30 12:50:15 622345 [1795B700] 0x80 -> SM port is down
> Mar 30 12:50:15 622356 [1795B700] 0x01 -> sm_state_mgr_signal_error: ERR
> 3207: Invalid signal OSM_SM_SIGNAL_DISCOVER in state DISCOVERING
> Errors on subnet. Duplicate GUID found by link from a port to itself.
> See verbose opensm.log for more details
>
> Mar 30 12:50:25 622645 [1C963700] 0x80 -> Errors on subnet. Duplicate
> GUID found by link from a port to itself. See verbose opensm.log for
> more details

My bad; can you cable this to some other IB port (either a switch port 
or another HCA port)? If this is a 2-port HCA, then it's simple.

> After that, the port state changes to PORT_INIT, but none of my loopback
> test programs (nor those in the OFED examples) can find a valid LID and
> operate properly.
>
> # ibv_devinfo
> hca_id: mlx4_0
> transport: InfiniBand (0)
> fw_ver: 2.7.626
> node_guid: 0002:c903:000b:e242
> sys_image_guid: 0002:c903:000b:e245
> vendor_id: 0x02c9
> vendor_part_id: 26428
> hw_ver: 0xB0
> board_id: MT_0D90110009
> phys_port_cnt: 1
> port: 1
> state: PORT_INIT (2)
> max_mtu: 2048 (4)
> active_mtu: 2048 (4)
> sm_lid: 0
> port_lid: 0
> port_lmc: 0x00
>
> I am using OFED drivers version 1.4 and the machine is as follows:
>
> # uname -a
> Linux myhost.domain.de 2.6.32-71.18.2.el6.x86_64 #1 SMP Tue Mar 8
> 15:00:52 CST 2011 x86_64 x86_64 x86_64 GNU/Linux
>
> It seems to me that the loopback connector is somehow tricking OpenSM
> into thinking that there is something wrong with the ports. Am I right?

It's making OpenSM think that the remote end of the port has a 
duplicate GUID; OpenSM doesn't handle this case :-(

> Another thing: if I try to force the port into the ACTIVE state with
> ibportstate, I get the following error:
>
> # ibportstate -G 0x0002c903000be243 1 enable
> ibwarn: [4824] mad_rpc_open_port: can't open UMAD port ((null):0)
> ibportstate: iberror: failed: Failed to open '(null)' port '0'

Let's fix the problems one at a time. You shouldn't need to do this.

-- Hal

>
> I am really a greenhorn when it comes to InfiniBand, so could someone
> please decipher the above error messages from opensm.log? What should
> I do in order to have a running OpenSM and a correctly configured port,
> so that I can loop back messages? Is there any documentation out there
> that describes the setup of a loopback on a single port, or at least
> the initial setup of an InfiniBand network?
>
> Thanks in advance for your time, and sorry if I am bothering you too much
> with my lame questions.
>
> Best regards,
> Konstantin Boyanov
>
>
