* Issue with RDMA_CM on systems with multiple IB HCA's.
@ 2010-07-22 0:54 Hari Subramoni
0 siblings, 1 reply; 12+ messages in thread
From: Hari Subramoni @ 2010-07-22 0:54 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Hi All,
I'm trying to run the 'ib_rdma_bw' test (part of the perftest suite) on a
cluster with two IB ConnectX DDR HCAs. The OFED version I'm using is
1.5.1. OpenSM is running on the network and the ports are up and active.
I see that whenever I use RDMA_CM to establish connections, the program
quits with the error given below (the test runs fine if we don't use
RDMA_CM).
I recall seeing a post mentioning some issues with RDMA_CM on systems
with multiple HCAs. I was wondering whether this has been resolved with
the latest OFED.
Any input on this issue would be greatly appreciated. Also, it would be
great if anyone could point me to any open bugs on this issue so that I
can track its status.
[subramon@amd6 perftest]$ ./ib_rdma_bw -c 172.16.1.5
11928: | port=18515 | ib_port=1 | size=65536 | tx_depth=100 | iters=1000 |
duplex=0 | cma=1 |
11928: Local address: LID 0000, QPN 000000, PSN 0x5bfbba RKey 0x90042602
VAddr 0x002b27feabe000
11928: Remote address: LID 0000, QPN 000000, PSN 0x392fe6, RKey 0xf8042605
VAddr 0x002b9d5c93b000
11928:pp_send_start: bad wc status 12
11928:main: Completion with error at client:
11928:main: Failed status 5: wr_id 3
11928:main: scnt=100, ccnt=0
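For reference, the numeric statuses in the output above correspond to the libibverbs `ibv_wc_status` enum; a minimal lookup sketch (the values assume the standard enum ordering from verbs.h, and only a subset is shown):

```python
# Subset of the ibv_wc_status enum from libibverbs (verbs.h).
# Values assume the standard enum ordering.
WC_STATUS = {
    0: "IBV_WC_SUCCESS",
    5: "IBV_WC_WR_FLUSH_ERR",     # work request flushed after the QP entered the error state
    12: "IBV_WC_RETRY_EXC_ERR",   # transport retry counter exceeded (peer QP unreachable/not ready)
    13: "IBV_WC_RNR_RETRY_EXC_ERR",
}

def describe(status: int) -> str:
    """Map a numeric completion status to its enum name."""
    return WC_STATUS.get(status, f"unknown status {status}")

print(describe(12))  # the 'bad wc status 12' above -> IBV_WC_RETRY_EXC_ERR
print(describe(5))   # the 'Failed status 5' above -> IBV_WC_WR_FLUSH_ERR
```

A retry-exceeded completion on the first send is consistent with the zeroed LID/QPN in the printout: the connection was never properly established.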
Thanks in advance,
Hari.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Issue with RDMA_CM on systems with multiple IB HCA's.
From: Or Gerlitz @ 2010-07-22 7:15 UTC (permalink / raw)
To: Hari Subramoni; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hari Subramoni wrote:
> [subramon@amd6 perftest]$ ./ib_rdma_bw -c 172.16.1.5
> 11928: | port=18515 | ib_port=1 | size=65536 | tx_depth=100 | iters=1000 | duplex=0 | cma=1 |
> 11928: Local address: LID 0000, QPN 000000, PSN 0x5bfbba RKey 0x90042602 VAddr 0x002b27feabe000
> 11928: Remote address: LID 0000, QPN 000000, PSN 0x392fe6, RKey 0xf8042605 VAddr 0x002b9d5c93b000

You can see the LID and QP numbers are zero; something is broken. When you use the rdma-cm, the address provided to the utility should be on an IPoIB subnet. Is that what you're doing?

Basically, I would suggest that you first use rping(1), provided by librdmacm-utils, to make sure things are working well in your configuration, and then move to the perftest utils.

Or.
* Re: Issue with RDMA_CM on systems with multiple IB HCA's.
From: Jonathan Perkins @ 2010-07-22 16:41 UTC (permalink / raw)
To: Or Gerlitz; +Cc: Hari Subramoni, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Thu, Jul 22, 2010 at 3:15 AM, Or Gerlitz <ogerlitz-smomgflXvOZWk0Htik3J/w@public.gmane.org> wrote:
> Hari Subramoni wrote:
>> [subramon@amd6 perftest]$ ./ib_rdma_bw -c 172.16.1.5
>> 11928: | port=18515 | ib_port=1 | size=65536 | tx_depth=100 | iters=1000 | duplex=0 | cma=1 |
>> 11928: Local address: LID 0000, QPN 000000, PSN 0x5bfbba RKey 0x90042602 VAddr 0x002b27feabe000
>> 11928: Remote address: LID 0000, QPN 000000, PSN 0x392fe6, RKey 0xf8042605 VAddr 0x002b9d5c93b000
>
> you can see the lid and qp numbers are zero, something is broken... when you use the rdma-cm,
> the address to be provided to the utility should be on an IPoIB subnet, is that what you're doing?
>
> Basically, I would suggest that you first use rping(1) provided by librdmacm-utils to make
> sure things are working well in your configuration and then move to the perftest utils.

Thanks for the response Or. I'm posting some information below.

Here is the output I get when running rping...

[perkinjo@amd5 ~]$ rping -v -s -a 172.16.1.5

[perkinjo@amd6 ~]$ rping -v -c -a 172.16.1.5
cq completion failed status 5
cma event RDMA_CM_EVENT_REJECTED, error 8
wait for CONNECTED state 10
connect error -1
[perkinjo@amd6 ~]$ ping 172.16.1.5
PING 172.16.1.5 (172.16.1.5) 56(84) bytes of data.
64 bytes from 172.16.1.5: icmp_seq=1 ttl=64 time=3.45 ms
64 bytes from 172.16.1.5: icmp_seq=2 ttl=64 time=1.00 ms

We are able to ping the addresses, but you can see that rping results in a failure.

We have two interfaces exposed on each machine, both on different subnets (172.16.1.0/24 and 172.16.2.0/24). We're using OFED-1.5.1 on these systems. Any idea of what could be going on?

--
Jonathan Perkins
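With two IPoIB interfaces on different subnets, one quick sanity check is confirming which interface the kernel will route a given target address over, so that rping/perftest traffic actually leaves through the port you intend. A small sketch using Python's standard `ipaddress` module (the route table below is taken from the configuration described in this thread; the helper itself is illustrative, not part of any OFED tool):

```python
import ipaddress

# IPoIB routes from the setup described above: each interface
# serves one /24 subnet, everything else falls back to eth0.
ROUTES = {
    "ib0": ipaddress.ip_network("172.16.1.0/24"),
    "ib2": ipaddress.ip_network("172.16.2.0/24"),
}

def egress_interface(addr: str) -> str:
    """Return the interface whose subnet contains the given address."""
    ip = ipaddress.ip_address(addr)
    for ifname, net in ROUTES.items():
        if ip in net:
            return ifname
    return "default (eth0)"

print(egress_interface("172.16.1.5"))  # ib0
print(egress_interface("172.16.2.5"))  # ib2
```

If the address you pass to rping resolves over one interface while the listener is bound on the other HCA, the connection request will be rejected even though plain ICMP ping succeeds.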
* Re: Issue with RDMA_CM on systems with multiple IB HCA's.
From: Hari Subramoni @ 2010-07-22 17:23 UTC (permalink / raw)
To: Or Gerlitz; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hi Or,

Thanks a lot for your quick response. The nodes have LIDs assigned to them and OpenSM is running fine. The test doesn't print the LIDs because it does not print those fields properly when using RDMA_CM to establish connections.

I've attached the configurations of the two hosts along with this e-mail. As Jonathan mentioned, we are able to ping between them.

The issue is intermittent: it happens at times, and at other times things work fine. Please let us know if you need any more information.

Thx,
Hari.

On Thu, 22 Jul 2010, Jonathan Perkins wrote:

> On Thu, Jul 22, 2010 at 3:15 AM, Or Gerlitz <ogerlitz-smomgflXvOZWk0Htik3J/w@public.gmane.org> wrote:
> > Hari Subramoni wrote:
> >> [subramon@amd6 perftest]$ ./ib_rdma_bw -c 172.16.1.5
> >> 11928: | port=18515 | ib_port=1 | size=65536 | tx_depth=100 | iters=1000 | duplex=0 | cma=1 |
> >> 11928: Local address: LID 0000, QPN 000000, PSN 0x5bfbba RKey 0x90042602 VAddr 0x002b27feabe000
> >> 11928: Remote address: LID 0000, QPN 000000, PSN 0x392fe6, RKey 0xf8042605 VAddr 0x002b9d5c93b000
> >
> > you can see the lid and qp numbers are zero, something is broken... when you use the rdma-cm,
> > the address to be provided to the utility should be on an IPoIB subnet, is that what you're doing?
> >
> > Basically, I would suggest that you first use rping(1) provided by librdmacm-utils to make
> > sure things are working well in your configuration and then move to the perftest utils.
>
> Thanks for the response Or. I'm posting some information below.
>
> Here is the output I get when running rping...
>
> [perkinjo@amd5 ~]$ rping -v -s -a 172.16.1.5
>
> [perkinjo@amd6 ~]$ rping -v -c -a 172.16.1.5
> cq completion failed status 5
> cma event RDMA_CM_EVENT_REJECTED, error 8
> wait for CONNECTED state 10
> connect error -1
> [perkinjo@amd6 ~]$ ping 172.16.1.5
> PING 172.16.1.5 (172.16.1.5) 56(84) bytes of data.
> 64 bytes from 172.16.1.5: icmp_seq=1 ttl=64 time=3.45 ms
> 64 bytes from 172.16.1.5: icmp_seq=2 ttl=64 time=1.00 ms
>
> We are able to ping the addresses but you can see that rping results
> in a failure.
>
> We have two interfaces exposed on each machine both on different
> subnets (172.16.1.0/24 and 172.16.2.0/24). We're using ofed-1.5.1 on
> these systems. Any idea of what could be going on?
>
> --
> Jonathan Perkins

[-- Attachment #2: amd6 host configuration (TEXT/PLAIN) --]

[subramon@amd6 ~]$ ibstat
CA 'mlx4_0'
        CA type: MT25418
        Number of ports: 2
        Firmware version: 2.6.0
        Hardware version: a0
        Node GUID: 0x0002c9030001e442
        System image GUID: 0x0002c9030001e445
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 20
                Base lid: 4
                LMC: 0
                SM lid: 1
                Capability mask: 0x02510868
                Port GUID: 0x0002c9030001e443
        Port 2:
                State: Down
                Physical state: Polling
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02510868
                Port GUID: 0x0002c9030001e444
CA 'mlx4_1'
        CA type: MT25418
        Number of ports: 2
        Firmware version: 2.6.0
        Hardware version: a0
        Node GUID: 0x0002c9030001e44e
        System image GUID: 0x0002c9030001e451
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 20
                Base lid: 6
                LMC: 0
                SM lid: 1
                Capability mask: 0x02510868
                Port GUID: 0x0002c9030001e44f
        Port 2:
                State: Down
                Physical state: Polling
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02510868
                Port GUID: 0x0002c9030001e450

[subramon@amd6 ~]$ ifconfig
eth0      Link encap:Ethernet  HWaddr 00:30:48:D0:19:CA
          inet addr:164.107.119.237  Bcast:164.107.119.255  Mask:255.255.255.0
          inet6 addr: fe80::230:48ff:fed0:19ca/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:132741 errors:0 dropped:0 overruns:0 frame:0
          TX packets:51091 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:25346771 (24.1 MiB)  TX bytes:18800740 (17.9 MiB)
          Base address:0xbc00 Memory:d7fe0000-d8000000

ib0       Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:172.16.1.6  Bcast:172.16.1.255  Mask:255.255.255.0
          inet6 addr: fe80::202:c903:1:e443/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:121 errors:0 dropped:0 overruns:0 frame:0
          TX packets:66 errors:0 dropped:10 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:34885 (34.0 KiB)  TX bytes:13913 (13.5 KiB)

ib2       Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:172.16.2.6  Bcast:172.16.2.255  Mask:255.255.255.0
          inet6 addr: fe80::202:c903:1:e44f/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:76 errors:0 dropped:0 overruns:0 frame:0
          TX packets:48 errors:0 dropped:10 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:24775 (24.1 KiB)  TX bytes:15327 (14.9 KiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:7870 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7870 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:855574 (835.5 KiB)  TX bytes:855574 (835.5 KiB)

virbr0    Link encap:Ethernet  HWaddr 00:00:00:00:00:00
          inet addr:192.168.122.1  Bcast:192.168.122.255  Mask:255.255.255.0
          inet6 addr: fe80::200:ff:fe00:0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:28 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:6175 (6.0 KiB)

[subramon@amd6 ~]$ netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
172.16.2.0      0.0.0.0         255.255.255.0   U         0 0          0 ib2
164.107.119.0   0.0.0.0         255.255.255.0   U         0 0          0 eth0
172.16.1.0      0.0.0.0         255.255.255.0   U         0 0          0 ib0
192.168.122.0   0.0.0.0         255.255.255.0   U         0 0          0 virbr0
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth0
0.0.0.0         164.107.119.1   0.0.0.0         UG        0 0          0 eth0

[subramon@amd6 ~]$ ping 172.16.1.5
PING 172.16.1.5 (172.16.1.5) 56(84) bytes of data.
64 bytes from 172.16.1.5: icmp_seq=1 ttl=64 time=2.31 ms
64 bytes from 172.16.1.5: icmp_seq=2 ttl=64 time=0.109 ms
64 bytes from 172.16.1.5: icmp_seq=3 ttl=64 time=0.078 ms

--- 172.16.1.5 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.078/0.834/2.315/1.047 ms

[subramon@amd6 ~]$ ping 172.16.1.6
PING 172.16.1.6 (172.16.1.6) 56(84) bytes of data.
64 bytes from 172.16.1.6: icmp_seq=1 ttl=64 time=0.046 ms
64 bytes from 172.16.1.6: icmp_seq=2 ttl=64 time=0.013 ms
64 bytes from 172.16.1.6: icmp_seq=3 ttl=64 time=0.014 ms

--- 172.16.1.6 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.013/0.024/0.046/0.015 ms

[-- Attachment #3: amd5 host configuration (TEXT/PLAIN) --]

[subramon@amd5 ~]$ ibstat
CA 'mlx4_0'
        CA type: MT25418
        Number of ports: 2
        Firmware version: 2.6.0
        Hardware version: a0
        Node GUID: 0x0002c9030001e386
        System image GUID: 0x0002c9030001e389
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 20
                Base lid: 12
                LMC: 0
                SM lid: 1
                Capability mask: 0x02510868
                Port GUID: 0x0002c9030001e387
        Port 2:
                State: Down
                Physical state: Polling
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02510868
                Port GUID: 0x0002c9030001e388
CA 'mlx4_1'
        CA type: MT25418
        Number of ports: 2
        Firmware version: 2.6.0
        Hardware version: a0
        Node GUID: 0x0002c9030001e452
        System image GUID: 0x0002c9030001e455
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 20
                Base lid: 7
                LMC: 0
                SM lid: 1
                Capability mask: 0x02510868
                Port GUID: 0x0002c9030001e453
        Port 2:
                State: Down
                Physical state: Polling
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02510868
                Port GUID: 0x0002c9030001e454

[subramon@amd5 ~]$ ifconfig
eth0      Link encap:Ethernet  HWaddr 00:30:48:D0:19:BE
          inet addr:164.107.119.236  Bcast:164.107.119.255  Mask:255.255.255.0
          inet6 addr: fe80::230:48ff:fed0:19be/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:238196 errors:0 dropped:0 overruns:0 frame:0
          TX packets:172491 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:62341710 (59.4 MiB)  TX bytes:94768875 (90.3 MiB)
          Base address:0xbc00 Memory:d7fe0000-d8000000

ib0       Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:172.16.1.5  Bcast:172.16.1.255  Mask:255.255.255.0
          inet6 addr: fe80::202:c903:1:e387/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:121 errors:0 dropped:0 overruns:0 frame:0
          TX packets:78 errors:0 dropped:13 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:47421 (46.3 KiB)  TX bytes:21533 (21.0 KiB)

ib2       Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:172.16.2.5  Bcast:172.16.2.255  Mask:255.255.255.0
          inet6 addr: fe80::202:c903:1:e453/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:120 errors:0 dropped:0 overruns:0 frame:0
          TX packets:45 errors:0 dropped:13 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:40365 (39.4 KiB)  TX bytes:13567 (13.2 KiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:8506 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8506 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:941478 (919.4 KiB)  TX bytes:941478 (919.4 KiB)

virbr0    Link encap:Ethernet  HWaddr 00:00:00:00:00:00
          inet addr:192.168.122.1  Bcast:192.168.122.255  Mask:255.255.255.0
          inet6 addr: fe80::200:ff:fe00:0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:38 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:7969 (7.7 KiB)

[subramon@amd5 ~]$ ping 172.16.1.6
PING 172.16.1.6 (172.16.1.6) 56(84) bytes of data.
64 bytes from 172.16.1.6: icmp_seq=1 ttl=64 time=2.23 ms
64 bytes from 172.16.1.6: icmp_seq=2 ttl=64 time=0.111 ms

--- 172.16.1.6 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.111/1.172/2.234/1.062 ms

[subramon@amd5 ~]$ ping 172.16.2.6
PING 172.16.2.6 (172.16.2.6) 56(84) bytes of data.
64 bytes from 172.16.2.6: icmp_seq=1 ttl=64 time=1.70 ms
64 bytes from 172.16.2.6: icmp_seq=2 ttl=64 time=0.104 ms
64 bytes from 172.16.2.6: icmp_seq=3 ttl=64 time=0.083 ms

--- 172.16.2.6 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.083/0.631/1.707/0.760 ms

[subramon@amd5 ~]$ netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
172.16.2.0      0.0.0.0         255.255.255.0   U         0 0          0 ib2
164.107.119.0   0.0.0.0         255.255.255.0   U         0 0          0 eth0
172.16.1.0      0.0.0.0         255.255.255.0   U         0 0          0 ib0
192.168.122.0   0.0.0.0         255.255.255.0   U         0 0          0 virbr0
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth0
0.0.0.0         164.107.119.1   0.0.0.0         UG        0 0          0 eth0
* mlx4_core unable to use MSI-x on our nodes
From: Meyer, Donald J @ 2010-07-22 17:56 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

In our dmesg log I see "mlx4_core 0000:02:00.0: Only 12 MSI-X vectors available, need 25. Not using MSI-X", which, if I read it correctly, says our mlx4_core is unable to use MSI-X, which, if I also understand correctly, means the mlx4_core can't spread interrupts among all processors.

If this is correct, do we want to try to fix this so the mlx4_core can use MSI-X and share the interrupts?

And if we do want to fix this, do you have any suggestions for how to fix it?

Thanks,
Don Meyer
Senior Network/System Engineer/Programmer
US+ (253) 371-9532  iNet 8-371-9532
*Other names and brands may be claimed as the property of others
* Re: mlx4_core unable to use MSI-x on our nodes
From: Jason Gunthorpe @ 2010-07-22 19:47 UTC (permalink / raw)
To: Meyer, Donald J; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Thu, Jul 22, 2010 at 10:56:55AM -0700, Meyer, Donald J wrote:
> In our dmesg log I see "mlx4_core 0000:02:00.0: Only 12 MSI-X
> vectors available, need 25. Not using MSI-X" which if I read it
> correctly says our mlx4_core is unable to use MSI-X which if I also
> understand correctly means the mlx4_core can't use the sharing of
> interrupts among all processors.
>
> If this is correct, do we want to try to fix this so the mlx4_core
> can use MSI-X and share the interrupts?

Yes, you will get better performance with MSI-X; fixing it is worthwhile.

> And if we do want to fix this, do you have any suggestions for how
> to fix it?

AFAIK, the only way to fix this is to use a newer kernel (or different build options?). Some kernels do not provide very many interrupt vectors, so they get exhausted easily. E.g., the 2.6.28 kernel I have here is using vector numbers like 2298 for MSI vectors; there are lots of interrupts available :)

Jason
* RE: mlx4_core unable to use MSI-x on our nodes
From: Yevgeny Petrilin @ 2010-07-25 7:02 UTC (permalink / raw)
To: Meyer, Donald J, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

This was already fixed. Even when the kernel is not able to provide the required number of MSI-X vectors, the driver requests the number that the kernel is ready to give (12 in the given case).

--Yevgeny

> -----Original Message-----
> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> [mailto:linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Meyer, Donald J
> Sent: Thursday, July 22, 2010 8:57 PM
> To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Subject: mlx4_core unable to use MSI-x on our nodes
>
> In our dmesg log I see "mlx4_core 0000:02:00.0: Only 12 MSI-X
> vectors available, need 25. Not using MSI-X" which if I read
> it correctly says our mlx4_core is unable to use MSI-X which
> if I also understand correctly means the mlx4_core can't use
> the sharing of interrupts among all processors.
>
> If this is correct, do we want to try to fix this so the
> mlx4_core can use MSI-X and share the interrupts?
>
> And if we do want to fix this, do you have any suggestions
> for how to fix it?
>
> Thanks,
> Don Meyer
> Senior Network/System Engineer/Programmer
> US+ (253) 371-9532  iNet 8-371-9532
> *Other names and brands may be claimed as the property of others
* Re: Issue with RDMA_CM on systems with multiple IB HCA's.
From: Or Gerlitz @ 2010-07-23 20:03 UTC (permalink / raw)
To: Hari Subramoni; +Cc: Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hari Subramoni <subramon-wPOY3OvGL++pAIv7I8X2sze48wsgrGvP@public.gmane.org> wrote:

> The nodes have LID's assigned to them and OpenSM is running fine.
> I've attached the configurations of the two hosts along with this e-mail.
> As Jonathan mentioned, we are able to ping between them.

Are the two HCAs on each of the nodes connected to the same IB subnet?

> The issue is intermittent. It happens at times and at other times, things
> work fine. Please let us know if you need any more information.

Let's focus on rping: please use both the -v and -d flags with rping, and when rping fails, please send the neighbour info (# ip neigh show) from host .5.

Or.
* Re: Issue with RDMA_CM on systems with multiple IB HCA's.
From: Hari Subramoni @ 2010-07-26 18:42 UTC (permalink / raw)
To: Or Gerlitz; +Cc: Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hi,

Yes, both cards are on the same IB subnet.

The machines are down for maintenance now. We will send out the information you requested as soon as they are up.

Thanks a lot,
Hari.

On Fri, 23 Jul 2010, Or Gerlitz wrote:

> Hari Subramoni <subramon-wPOY3OvGL++pAIv7I8X2sze48wsgrGvP@public.gmane.org> wrote:
>
> > The nodes have LID's assigned to them and OpenSM is running fine.
> > I've attached the configurations of the two hosts along with this e-mail.
> > As Jonathan mentioned, we are able to ping between them.
>
> are the two HCAs on each of the nodes connected to the same IB subnet?
>
> > The issue is intermittent. It happens at times and at other times, things
> > work fine. Please let us know if you need any more information.
>
> lets focus on rping, please use both -v -d flags with rping, also
> when rping fails, please send the neighbours info (#ip neigh show)
> from host .5
>
> Or.
* Re: Issue with RDMA_CM on systems with multiple IB HCA's. [not found] ` <Pine.GSO.4.40.1007261442050.20664-100000-3mrvs1K0uXjA85jc68Yv76kAi/sjxfazN7jzCyCsa88@public.gmane.org> @ 2010-07-30 0:20 ` Hari Subramoni 0 siblings, 0 replies; 12+ messages in thread From: Hari Subramoni @ 2010-07-30 0:20 UTC (permalink / raw) To: Or Gerlitz; +Cc: Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA On Mon, 26 Jul 2010, Hari Subramoni wrote: > Hi, > > Yes, both cards are on the same IB subnet. > > The machines are down for maintanence now. We will send out the > information you requested as soon as they are up. > > Thanks a lot, > Hari. > > On Fri, 23 Jul 2010, Or Gerlitz wrote: > > > Hari Subramoni <subramon-wPOY3OvGL++pAIv7I8X2sze48wsgrGvP@public.gmane.org> wrote: > > > > > The nodes have LID's assigned to them and OpenSM is running fine. > > > I've attached the configurations of the two hosts along with this e-mail. > > > As Jonathan mentioned, we are able to ping between them. > > > > are the two HCAs on each of the nodes connected to the same IB subnet? > > > > > The issue is intermittent. It happens at times and at other times, things > > > work fine. Please let us know if you need any more information. > > > > lets focus on rping, please use both -v -d flags with rping, also > > when rping fails, please send the neighbours info (#ip neigh show) > > from host .5 > > > > Or. > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Hi Or, Sorry for the delay. I as able to reproduce the error after a few attempts. The details are given below. The systems have OFED-1.5.1. OpenSM is running and the interfaces are up and active. 
Host 1 ====== [subramon@amd5 exp2-amd5-install]$ rping -vVd -s -C 1 -a 172.16.2.5 server count 1 created cm_id 0x5ed7550 rdma_bind_addr successful rdma_listen [subramon@amd5 exp2-amd5-install]$ ping 172.16.2.6 PING 172.16.2.6 (172.16.2.6) 56(84) bytes of data. 64 bytes from 172.16.2.6: icmp_seq=1 ttl=64 time=0.169 ms 64 bytes from 172.16.2.6: icmp_seq=2 ttl=64 time=0.104 ms --- 172.16.2.6 ping statistics --- 2 packets transmitted, 2 received, 0% packet loss, time 1000ms rtt min/avg/max/mdev = 0.104/0.136/0.169/0.034 ms [subramon@amd5 exp2-amd5-install]$ ip neigh show 172.16.2.6 dev ib2 lladdr 80:00:00:48:fe:80:00:00:00:00:00:00:00:02:c9:03:00:01:e4:4f REACHABLE 164.107.119.153 dev eth0 lladdr 00:15:17:0f:a8:28 REACHABLE 164.107.119.237 dev eth0 lladdr 00:30:48:d0:19:ca STALE 164.107.119.1 dev eth0 lladdr 00:21:59:85:b0:06 REACHABLE [subramon@amd5 exp2-amd5-install]$ [subramon@amd5 exp2-amd5-install]$ ifconfig ib0 ib0 Link encap:InfiniBand HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00 inet addr:172.16.1.5 Bcast:172.16.1.255 Mask:255.255.255.0 inet6 addr: fe80::202:c903:1:e387/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:65520 Metric:1 RX packets:75 errors:0 dropped:0 overruns:0 frame:0 TX packets:25 errors:0 dropped:9 overruns:0 carrier:0 collisions:0 txqueuelen:256 RX bytes:29059 (28.3 KiB) TX bytes:9176 (8.9 KiB) [subramon@amd5 exp2-amd5-install]$ ifconfig ib2 ib2 Link encap:InfiniBand HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00 inet addr:172.16.2.5 Bcast:172.16.2.255 Mask:255.255.255.0 inet6 addr: fe80::202:c903:1:e453/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:65520 Metric:1 RX packets:36 errors:0 dropped:0 overruns:0 frame:0 TX packets:23 errors:0 dropped:10 overruns:0 carrier:0 collisions:0 txqueuelen:256 RX bytes:11455 (11.1 KiB) TX bytes:8001 (7.8 KiB) [subramon@amd5 exp2-amd5-install]$ netstat -rn Kernel IP routing table Destination Gateway Genmask Flags MSS Window irtt Iface 172.16.4.0 0.0.0.0 
255.255.255.0   U         0 0          0 ib1
172.16.2.0      0.0.0.0         255.255.255.0   U         0 0          0 ib2
164.107.119.0   0.0.0.0         255.255.255.0   U         0 0          0 eth0
172.16.1.0      0.0.0.0         255.255.255.0   U         0 0          0 ib0
192.168.122.0   0.0.0.0         255.255.255.0   U         0 0          0 virbr0
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth0
0.0.0.0         164.107.119.1   0.0.0.0         UG        0 0          0 eth0

Host 2
======

[subramon@amd6 ~]$ rping -vVdc -a 172.16.2.5 -C 1
client
count 1
created cm_id 0xe017550
cma_event type RDMA_CM_EVENT_ADDR_RESOLVED cma_id 0xe017550 (parent)
cma_event type RDMA_CM_EVENT_ROUTE_RESOLVED cma_id 0xe017550 (parent)
rdma_resolve_addr - rdma_resolve_route successful
created pd 0xe017a10
created channel 0xe017a30
created cq 0xe017a50
created qp 0xe017b90
rping_setup_buffers called on cb 0xe011010
allocated & registered buffers...
cq_thread started.
cq completion failed status 5
cma_event type RDMA_CM_EVENT_REJECTED cma_id 0xe017550 (parent)
wait for CONNECTED state 10
connect error -1
rping_free_buffers called on cb 0xe011010
cma event RDMA_CM_EVENT_REJECTED, error 8

[subramon@amd6 ~]$ ping 172.16.2.5
PING 172.16.2.5 (172.16.2.5) 56(84) bytes of data.
64 bytes from 172.16.2.5: icmp_seq=1 ttl=64 time=3.04 ms
64 bytes from 172.16.2.5: icmp_seq=2 ttl=64 time=1.09 ms

--- 172.16.2.5 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 1.097/2.071/3.045/0.974 ms

[subramon@amd6 ~]$ ip neigh show
164.107.119.1 dev eth0 lladdr 00:21:59:85:b0:06 REACHABLE
172.16.2.5 dev ib2 lladdr 80:00:00:48:fe:80:00:00:00:00:00:00:00:02:c9:03:00:01:e4:53 REACHABLE
164.107.119.236 dev eth0 lladdr 00:30:48:d0:19:be STALE
172.16.2.5 dev ib0 lladdr 80:00:00:48:fe:80:00:00:00:00:00:00:00:02:c9:03:00:01:e4:53 STALE
172.16.1.5 dev ib0 lladdr 80:00:00:48:fe:80:00:00:00:00:00:00:00:02:c9:03:00:01:e3:87 STALE
164.107.119.153 dev eth0 lladdr 00:15:17:0f:a8:28 REACHABLE

[subramon@amd6 ~]$ ifconfig ib0
ib0       Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:172.16.1.6  Bcast:172.16.1.255  Mask:255.255.255.0
          inet6 addr: fe80::202:c903:1:e443/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:61 errors:0 dropped:0 overruns:0 frame:0
          TX packets:25 errors:0 dropped:9 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:20049 (19.5 KiB)  TX bytes:8263 (8.0 KiB)

[subramon@amd6 ~]$ ifconfig ib2
ib2       Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:172.16.2.6  Bcast:172.16.2.255  Mask:255.255.255.0
          inet6 addr: fe80::202:c903:1:e44f/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:14 errors:0 dropped:0 overruns:0 frame:0
          TX packets:23 errors:0 dropped:9 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:1740 (1.6 KiB)  TX bytes:8341 (8.1 KiB)

[subramon@amd6 ~]$ netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
172.16.4.0      0.0.0.0         255.255.255.0   U         0 0          0 ib1
172.16.2.0      0.0.0.0         255.255.255.0   U         0 0          0 ib2
164.107.119.0   0.0.0.0         255.255.255.0   U         0 0          0 eth0
172.16.1.0      0.0.0.0         255.255.255.0   U         0 0          0 ib0
192.168.122.0   0.0.0.0         255.255.255.0   U         0 0          0 virbr0
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth0
0.0.0.0         164.107.119.1   0.0.0.0         UG        0 0          0 eth0

Thx,
Hari.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread
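[Editorial aside] One detail worth pulling out of the diagnostics above: `ip neigh show` lists the peer 172.16.2.5 on both ib2 and ib0, i.e. the same IP was learned on two interfaces of this multi-HCA node. A quick way to spot that pattern is to scan the neighbor table for addresses that appear on more than one interface. The sketch below is a hypothetical helper (not part of perftest or librdmacm-utils) that does this on the text output of `ip neigh show`:

```python
from collections import defaultdict

def duplicate_neighbors(neigh_output):
    """Return IPs from `ip neigh show` output that were learned on
    more than one interface, mapped to the set of those interfaces."""
    seen = defaultdict(set)
    for line in neigh_output.splitlines():
        fields = line.split()
        # Expected line shape: <ip> dev <ifname> lladdr <lladdr> <state>
        if len(fields) >= 3 and fields[1] == "dev":
            seen[fields[0]].add(fields[2])
    return {ip: ifs for ip, ifs in seen.items() if len(ifs) > 1}

# Fed the neighbor table from this message, it reports 172.16.2.5
# as present on both ib2 and ib0.
```

On a live system you would pipe the output of `ip neigh show` into it (e.g. via subprocess).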
* Re: Issue with RDMA_CM on systems with multiple IB HCA's.
       [not found] ` <4C47EFFB.1040808-hKgKHo2Ms0FWk0Htik3J/w@public.gmane.org>
  2010-07-22 16:41   ` Jonathan Perkins
@ 2010-07-23  9:32   ` Larry
       [not found]     ` <AANLkTinPRAPDZi8xn5zOr5k26-=040N45vVWk1usf8H5-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 12+ messages in thread
From: Larry @ 2010-07-23  9:32 UTC (permalink / raw)
To: Or Gerlitz; +Cc: Hari Subramoni, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Thu, Jul 22, 2010 at 3:15 PM, Or Gerlitz <ogerlitz-smomgflXvOZWk0Htik3J/w@public.gmane.org> wrote:
> Hari Subramoni wrote:
>> [subramon@amd6 perftest]$ ./ib_rdma_bw -c 172.16.1.5
>> 11928: | port=18515 | ib_port=1 | size=65536 | tx_depth=100 | iters=1000 | duplex=0 | cma=1 |
>> 11928: Local address: LID 0000, QPN 000000, PSN 0x5bfbba RKey 0x90042602 VAddr 0x002b27feabe000
>> 11928: Remote address: LID 0000, QPN 000000, PSN 0x392fe6, RKey 0xf8042605 VAddr 0x002b9d5c93b000
>
> you can see the lid and qp numbers are zero, something is broken... when you use the rdma-cm,
> the address to be provided to the utility should be on an IPoIB subnet, is that what you're doing?

The address may not have to be an IPoIB interface address; any interface
which can find the other side will be OK.

> Basically, I would suggest that you first use rping(1) provided by librdmacm-utils to make
> sure things are working well in your configuration and then move to the perftest utils.
>
> Or.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread
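[Editorial aside] Larry's point — "any interface which can find the other side" — comes down to which routing-table entry matches the destination. As an illustration only (a sketch, not a reimplementation of the kernel's FIB lookup: it ignores metrics and policy routing), here is a longest-prefix match over the routes Hari posted, using Python's stdlib `ipaddress`:

```python
import ipaddress

# Routing entries transcribed from the `netstat -rn` output in this thread:
# (destination, genmask, iface)
ROUTES = [
    ("172.16.4.0", "255.255.255.0", "ib1"),
    ("172.16.2.0", "255.255.255.0", "ib2"),
    ("164.107.119.0", "255.255.255.0", "eth0"),
    ("172.16.1.0", "255.255.255.0", "ib0"),
    ("192.168.122.0", "255.255.255.0", "virbr0"),
    ("169.254.0.0", "255.255.0.0", "eth0"),
    ("0.0.0.0", "0.0.0.0", "eth0"),  # default route via 164.107.119.1
]

def route_iface(dest, routes=ROUTES):
    """Longest-prefix match: which interface would `dest` leave through?"""
    addr = ipaddress.ip_address(dest)
    best = None
    for net_s, mask_s, iface in routes:
        net = ipaddress.ip_network(f"{net_s}/{mask_s}")
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, iface)
    return best[1] if best else None
```

With this table, 172.16.2.5 matches the 172.16.2.0/24 entry and leaves through ib2 (an IPoIB interface), while a destination on the 164.107.119.0/24 subnet would leave through eth0, where no IB HCA is involved — which is why Sean's follow-up below distinguishes the two cases.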
[parent not found: <AANLkTinPRAPDZi8xn5zOr5k26-=040N45vVWk1usf8H5-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* RE: Issue with RDMA_CM on systems with multiple IB HCA's.
       [not found]     ` <AANLkTinPRAPDZi8xn5zOr5k26-=040N45vVWk1usf8H5-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-07-23 15:48       ` Hefty, Sean
  0 siblings, 0 replies; 12+ messages in thread
From: Hefty, Sean @ 2010-07-23 15:48 UTC (permalink / raw)
To: Larry, Or Gerlitz
Cc: Hari Subramoni, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

> > you can see the lid and qp numbers are zero, something is broken... when you use the rdma-cm,
> > the address to be provided to the utility should be on an IPoIB subnet, is that what you're doing?
> The address may not have to be IPoIB interface, any interface which
> can find the other side will be OK.

Use of the rdma_cm requires that the address be associated with an RDMA
device.  In the case of IB, the address should refer to an IP address
assigned to an ipoib device.

If you remove the -c option from the perftest, any IP address should work.

- Sean
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread
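[Editorial aside] Sean's requirement — the address passed to the rdma_cm must be associated with an RDMA device, i.e. an IPoIB interface on IB fabrics — can be sanity-checked from userspace: IPoIB interfaces report link type 32 (ARPHRD_INFINIBAND, from linux/if_arp.h) in /sys/class/net/<ifname>/type. The helper below is a hypothetical sketch, not an existing OFED tool:

```python
from pathlib import Path

ARPHRD_INFINIBAND = 32  # link type constant from linux/if_arp.h

def is_ipoib(ifname, sysfs_root="/sys/class/net"):
    """True if the named interface's link type is InfiniBand, i.e. an
    IPoIB device whose addresses the rdma_cm can map to an IB HCA port."""
    type_file = Path(sysfs_root) / ifname / "type"
    try:
        return int(type_file.read_text()) == ARPHRD_INFINIBAND
    except (OSError, ValueError):
        return False  # interface absent or unreadable
```

Checking that the IP given to `ib_rdma_bw -c` routes out an interface for which this returns True is a quick sanity step before running the perftest utilities, alongside the rping test Or suggested.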
end of thread, other threads:[~2010-07-30 0:20 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-07-22 0:54 Issue with RDMA_CM on systems with multiple IB HCA's Hari Subramoni
[not found] ` <Pine.GSO.4.40.1007212046022.17-100000-ItQMRKI8FOvVp4Hyp30HIZ9NZdITTVap@public.gmane.org>
2010-07-22 7:15 ` Or Gerlitz
[not found] ` <4C47EFFB.1040808-hKgKHo2Ms0FWk0Htik3J/w@public.gmane.org>
2010-07-22 16:41 ` Jonathan Perkins
[not found] ` <AANLkTinco6znrHkoE9vL1n7KrrZ68zPExPUPQeDHD64i-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-07-22 17:23 ` Hari Subramoni
[not found] ` <Pine.GSO.4.40.1007221320200.17-300000-ItQMRKI8FOvVp4Hyp30HIZ9NZdITTVap@public.gmane.org>
2010-07-22 17:56 ` mlx4_core unable to use MSI-x on our nodes Meyer, Donald J
[not found] ` <6203933669E90E4AB42B5BC4EDE38D350CF01EF379-qERRe+bbXDTTXloPLtfHfbfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2010-07-22 19:47 ` Jason Gunthorpe
2010-07-25 7:02 ` Yevgeny Petrilin
2010-07-23 20:03 ` Issue with RDMA_CM on systems with multiple IB HCA's Or Gerlitz
[not found] ` <AANLkTi=s4TWfji7Jkz7SipK6WABKGdB5CurhSqPMxXPw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-07-26 18:42 ` Hari Subramoni
[not found] ` <Pine.GSO.4.40.1007261442050.20664-100000-3mrvs1K0uXjA85jc68Yv76kAi/sjxfazN7jzCyCsa88@public.gmane.org>
2010-07-30 0:20 ` Hari Subramoni
2010-07-23 9:32 ` Larry
[not found] ` <AANLkTinPRAPDZi8xn5zOr5k26-=040N45vVWk1usf8H5-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-07-23 15:48 ` Hefty, Sean