Multicast joins failing on 1.5-rc1?

public inbox for linux-rdma@vger.kernel.org

* Multicast joins failing on 1.5-rc1?
@ 2009-10-20 18:16 stuarts
       [not found] ` <A5E1097A-DFEA-4508-A47F-FF07C34EA525-dK3M3PVJaX4iXRBKUn1UN0EOCMrvLtNR@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: stuarts @ 2009-10-20 18:16 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

We have a four-box cluster that we just upgraded to RHEL5.4. This
required an upgrade to the 1.5 version of OFED. We are using bonding
over two physical links and IPoIB. The final detail is that we are
using IPv4 multicast to push data from 1 box to the other 3.

Under 1.4, this worked. (Yeah!)
Under 1.5, it doesn't.

By "not working" I mean:
  o IB is able to see the mesh.
  o IPv4 over the bond is working (I can ping, scp files, and similar)
  o Multicast does NOT.

When I looked closer, I could see that I get an error -22 (EINVAL) on
the multicast joins (using a QLogic switch's SM) for everything
_except_ the broadcast join.  I switched over to opensm, since it has
far better debugging abilities, and see the same behavior, though
opensm logs a message with error 1B11 instead.

When I looked through the code, I found that error code is associated
with an invalid set of component masks:
Oct 20 12:40:05 824130 [44240940] 0x01 -> mcmr_rcv_join_mgrp: ERR  
1B11: method = SubnAdmSet, scope_state = 0x1, component mask =  
0x0000000000010083, expected comp mask = 0x00000000000130c7, MGID:  
ff12:601b:ffff::16 from port 0x0002c90300032431 (x3 HCA-1)
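
The two masks in that log line can be compared directly. What follows is a minimal sketch in plain C, not opensm code: the helper names are made up for illustration, and the bit positions are an assumption taken from the MCMemberRecord component mask layout as I understand it (MGID = bit 0, PortGID = bit 1, Q_Key = bit 2, TClass = bit 6, P_Key = bit 7, SL = bit 12, FlowLabel = bit 13, JoinState = bit 16).

```c
#include <stdint.h>
#include <stdio.h>

/* Bits the SM expected for a group-creating join but did not receive.
 * Hypothetical helper, not part of opensm or the kernel. */
static uint64_t missing_comp_bits(uint64_t got, uint64_t expected)
{
    return expected & ~got;
}

/* Print the name of each missing field. The table covers only the
 * bits that matter for this log line (assumed bit positions). */
static void print_missing(uint64_t missing)
{
    static const struct { unsigned bit; const char *name; } names[] = {
        { 2, "Q_Key" }, { 6, "TClass" }, { 12, "SL" }, { 13, "FlowLabel" },
    };
    size_t i;

    for (i = 0; i < sizeof(names) / sizeof(names[0]); i++)
        if (missing & ((uint64_t)1 << names[i].bit))
            printf("missing: %s\n", names[i].name);
}
```

With the values from the log, missing_comp_bits(0x10083, 0x130c7) is 0x3044. Under the assumed layout that is Q_Key, TClass, SL and FlowLabel: the request carried only MGID, PortGID, P_Key and JoinState, which is enough to join an existing group but not to create one.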

I looked through drivers/infiniband/ulp/ipoib/ipoib_multicast.c and  
found the following interesting bits:
  o The broadcast join is done with the presumption that the broadcast
groups already exist (and they do)
  o In the ipoib_mcast_send() data path, ipoib_mcast_sendonly_join() is
called directly (the multicast task is not used). This path, however,
does not set the component_mask bits required to pass the 1B11 check
(check_create_comp_mask())

I looked at the git log (from ofed_kernel_1_5) for ipoib_multicast.c  
and don't see any commits that would appear to be anywhere near this  
area.

Does anyone have any clue as to what is going on here?  Thank you, --stuart

p.s. the output from the debugfs:
[root@x3 ipoib]# pwd
/sys/kernel/debug/ipoib
[root@ce-x3 ipoib]# more ib0_mcg
GID: ff12:401b:ffff:0:0:0:0:3a01
   created: 4295351581
   queuelen:         0
   complete:        no
   send_only:      yes

GID: ff12:401b:ffff:0:0:0:ffff:ffff
   created: 4295326209
   queuelen:         0
   complete:       yes
   send_only:       no

--
Stuart Stanley
M: 952-457-3790
stuarts-dK3M3PVJaX4iXRBKUn1UN0EOCMrvLtNR@public.gmane.org
--
"The avalanche has started. It is too late for the pebbles to vote." -  
Kosh in Babylon 5:"Believers"

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Multicast joins failing on 1.5-rc1?
       [not found] ` <A5E1097A-DFEA-4508-A47F-FF07C34EA525-dK3M3PVJaX4iXRBKUn1UN0EOCMrvLtNR@public.gmane.org>
@ 2009-10-20 18:34   ` Jason Gunthorpe
  2009-10-20 18:52   ` Hal Rosenstock
  1 sibling, 0 replies; 10+ messages in thread
From: Jason Gunthorpe @ 2009-10-20 18:34 UTC (permalink / raw)
  To: stuarts; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Tue, Oct 20, 2009 at 01:16:07PM -0500, stuarts wrote:
> When I looked through the code, I found that error code is associated
> with an invalid set of component masks:
> Oct 20 12:40:05 824130 [44240940] 0x01 -> mcmr_rcv_join_mgrp: ERR 1B11: 
> method = SubnAdmSet, scope_state = 0x1, component mask =  
> 0x0000000000010083, expected comp mask = 0x00000000000130c7, MGID:  
> ff12:601b:ffff::16 from port 0x0002c90300032431 (x3 HCA-1)

I thought this was normal, expected behavior?

The IPoIB Send only join process doesn't create multicast groups. If
there is a listening group then it joins it, otherwise nothing.

Basically, you need listeners before you can send.

Jason

* Re: Multicast joins failing on 1.5-rc1?
       [not found] ` <A5E1097A-DFEA-4508-A47F-FF07C34EA525-dK3M3PVJaX4iXRBKUn1UN0EOCMrvLtNR@public.gmane.org>
  2009-10-20 18:34   ` Jason Gunthorpe
@ 2009-10-20 18:52   ` Hal Rosenstock
       [not found]     ` <f0e08f230910201152g476383ffp8e7392dc0c48e41-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 10+ messages in thread
From: Hal Rosenstock @ 2009-10-20 18:52 UTC (permalink / raw)
  To: stuarts; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Tue, Oct 20, 2009 at 2:16 PM, stuarts <stuarts-dK3M3PVJaX4iXRBKUn1UN0EOCMrvLtNR@public.gmane.org> wrote:

<snip...>

> When I looked closer, I could see that I get an error -22 on the multicast
> joins (using a QLogic switch's SM) for everything _except_ the broadcast
> join.  I switched over to opensm, since it has far better debugging
> abilities, and see the same behavior, though opensm logs a message with
> error 1B11 instead.
>
> When I looked through the code, I found that error code is associated with
> an invalid set of component masks:
> Oct 20 12:40:05 824130 [44240940] 0x01 -> mcmr_rcv_join_mgrp: ERR 1B11:
> method = SubnAdmSet, scope_state = 0x1, component mask = 0x0000000000010083,
> expected comp mask = 0x00000000000130c7, MGID: ff12:601b:ffff::16 from port
> 0x0002c90300032431 (x3 HCA-1)

This is join behavior when the group is not (previously) created (by
some full member). Any idea what was creating this group before ?

-- Hal

<snip...>

* Re: Multicast joins failing on 1.5-rc1?
       [not found]     ` <f0e08f230910201152g476383ffp8e7392dc0c48e41-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2009-10-21 19:16       ` stuarts
       [not found]         ` <C28CB83A-CF52-4603-91DF-D56865CBEA98-dK3M3PVJaX4iXRBKUn1UN0EOCMrvLtNR@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: stuarts @ 2009-10-21 19:16 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Oct 20, 2009, at 1:52 PM, Hal Rosenstock wrote:

> On Tue, Oct 20, 2009 at 2:16 PM, stuarts  
> <stuarts-dK3M3PVJaX4iXRBKUn1UN0EOCMrvLtNR@public.gmane.org> wrote:
>
> <snip...>
>
>> When I looked closer, I could see that I get an error -22 on the
>> multicast joins (using a QLogic switch's SM) for everything _except_
>> the broadcast join.  I switched over to opensm, since it has far
>> better debugging abilities, and see the same behavior, though opensm
>> logs a message with error 1B11 instead.
>>
>> When I looked through the code, I found that error code is
>> associated with an invalid set of component masks:
>> Oct 20 12:40:05 824130 [44240940] 0x01 -> mcmr_rcv_join_mgrp: ERR  
>> 1B11:
>> method = SubnAdmSet, scope_state = 0x1, component mask =  
>> 0x0000000000010083,
>> expected comp mask = 0x00000000000130c7, MGID: ff12:601b:ffff::16  
>> from port
>> 0x0002c90300032431 (x3 HCA-1)
>
> This is join behavior when the group is not (previously) created (by
> some full member). Any idea what was creating this group before ?

Hal (and Jason):

Thanks for the responses. Yes, it appears the problem is in the "other  
side" of the sender/receiver pairing.

I did a lot more tracing on the sender side. I think I see what is  
happening: The sender uses the IP_ADD_MEMBERSHIP socket op. The IP  
stack (via the dev->mc_list multicast list) tries to create the  
following MGIDs:
ff12:401b:ffff:0000:0000:0000:0100:0025
ff12:601b:ffff:0000:0000:0000:0000:00fb
ff12:601b:ffff:0000:0000:0001:ff03:2431
ff12:601b:ffff:0000:0000:0000:0000:0001
ff12:401b:ffff:0000:0000:0000:0000:0001
ff12:401b:ffff:0000:0000:0000:0000:00fb

The first one is mine, and the others are in the admin band (***1 is  
all-hosts, for example).

This looks like it is valid, BUT, the call to  
ipoib_mcast_addr_is_valid occurs BEFORE the pkey is folded in from the  
ipoib_dev_priv structure. Printing out the pre-fold-in values shows:
00ffffffff12601b0000000000000000000000fb

(This is the dev_mc_list -> dmi_addr value)

Oops, that pkey is "wrong" (0 vs ffff). Out this address goes!

When the broadcast mgid gets created, it is created with the pkey from
the ipoib_dev_priv structure and is thus ffff, not 0000. None of the
new groups ever make it past the bad mcast check, and my sender always
fails because the groups don't exist in the SM.
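
To make the mismatch concrete, here is a minimal userspace sketch, assuming the 20-byte IPoIB hardware address layout seen in the dump above (byte 0 reserved, bytes 1-3 the QPN, bytes 4-19 the MGID, so the P_Key sits in address bytes 8-9). The helper names are hypothetical, and pkey_matches() is only a simplified stand-in for the P_Key part of the check, not the real ipoib_mcast_addr_is_valid().

```c
#include <stdint.h>

#define IPOIB_HWADDR_LEN 20  /* same value as INFINIBAND_ALEN */

/* P_Key carried in a 20-byte IPoIB multicast hardware address:
 * MGID bytes 4-5, which are address bytes 8-9. */
static uint16_t hwaddr_pkey(const unsigned char addr[IPOIB_HWADDR_LEN])
{
    return (uint16_t)((addr[8] << 8) | addr[9]);
}

/* Simplified stand-in for the P_Key part of the validity check:
 * the entry must carry the same P_Key as the broadcast address. */
static int pkey_matches(const unsigned char addr[IPOIB_HWADDR_LEN],
                        const unsigned char bcast[IPOIB_HWADDR_LEN])
{
    return hwaddr_pkey(addr) == hwaddr_pkey(bcast);
}
```

Fed the pre-fold-in value 00ffffffff12601b0000000000000000000000fb and the broadcast address, hwaddr_pkey() returns 0x0000 for the former and 0xffff for the latter, so the entry is rejected before the later code ever gets to overwrite the P_Key bytes.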

If I disable the check so all of these "bad" addresses get added, I am  
up and running.

What is the best course of action at this point? Presuming I am not
missing something obvious, I am not seeing any way to do this cleanly
with what I know: folding in the pkey earlier would be exactly the same
thing as _not_ checking the pkey. And I don't think the stuff higher up
the stack knows the pkey value (I have not looked though, so that is
just gut feel). Open a bug? I'll be happy to do the work and provide a
patch, though I only have the RHEL5.4 system to test against (and only
just figured out how to build my own modules this morning).

Thanks, --stuart

Mcast transmitter and receiver example included below, cribbed from  
the example by Antony Courtney. Sorry for the hardcoded bond0  
addresses in the middle there.

/*
  * listener.c -- joins a multicast group and echoes all data it receives
  *		from the group to its stdout...
  *
  * Antony Courtney,	25/11/94
  * Modified by: Frédéric Bastien (25/03/04)
  * to compile without warning and work correctly
  */

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <time.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>     /* exit() */

#define HELLO_PORT 12345
#define HELLO_GROUP "225.0.0.37"
#define MSGBUFSIZE 256

int main(int argc, char *argv[])
{
      struct sockaddr_in addr;
      int fd, nbytes;
      socklen_t addrlen;
      struct ip_mreq mreq;
      char msgbuf[MSGBUFSIZE];

      u_int yes=1;            /*** MODIFICATION TO ORIGINAL */

      /* create what looks like an ordinary UDP socket */
      if ((fd=socket(AF_INET,SOCK_DGRAM,0)) < 0) {
          perror("socket");
          exit(1);
      }

/**** MODIFICATION TO ORIGINAL */
      /* allow multiple sockets to use the same PORT number */
      if (setsockopt(fd,SOL_SOCKET,SO_REUSEADDR,&yes,sizeof(yes)) < 0) {
          perror("Reusing ADDR failed");
          exit(1);
      }
/*** END OF MODIFICATION TO ORIGINAL */

      /* set up destination address */
      memset(&addr,0,sizeof(addr));
      addr.sin_family=AF_INET;
      addr.sin_addr.s_addr=htonl(INADDR_ANY); /* N.B.: differs from sender */
      //addr.sin_addr.s_addr = 0xa2fca8c0;
      addr.sin_port=htons(HELLO_PORT);

      /* bind to receive address */
      if (bind(fd,(struct sockaddr *) &addr,sizeof(addr)) < 0) {
          perror("bind");
          exit(1);
      }

      /* use setsockopt() to request that the kernel join a multicast group */
      mreq.imr_multiaddr.s_addr=inet_addr(HELLO_GROUP);
      /* hardcoded bond0 address: 192.168.252.162 on a little-endian host */
      mreq.imr_interface.s_addr=0xa2fca8c0;
      if (setsockopt(fd,IPPROTO_IP,IP_ADD_MEMBERSHIP,&mreq,sizeof(mreq)) < 0) {
          perror("setsockopt");
          exit(1);
      }

      /* now just enter a read-print loop */
      while (1) {
          addrlen=sizeof(addr);
          /* leave room for the terminating NUL that puts() needs */
          if ((nbytes=recvfrom(fd,msgbuf,MSGBUFSIZE-1,0,
                               (struct sockaddr *) &addr,&addrlen)) < 0) {
               perror("recvfrom");
               exit(1);
          }
          msgbuf[nbytes]='\0';
          puts(msgbuf);
      }
}

/*
  * sender.c -- multicasts "hello, world!" to a multicast group once a
  *		second
  *
  * Antony Courtney,	25/11/94
  */

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <time.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <unistd.h>     /* sleep() */

#define HELLO_PORT 12345
#define HELLO_GROUP "225.0.0.37"

int main(int argc, char *argv[])
{
      struct sockaddr_in addr;
      int fd;
      char *message="Hello, World!";
      struct in_addr interface_addr;
      int er;

      /* create what looks like an ordinary UDP socket */
      if ((fd=socket(AF_INET,SOCK_DGRAM,0)) < 0) {
          perror("socket");
          exit(1);
      }

      /* send out of the bond0 interface; hardcoded address:
         192.168.252.161 on a little-endian host */
      interface_addr.s_addr = 0xa1fca8c0;
      er=setsockopt(fd, IPPROTO_IP, IP_MULTICAST_IF, &interface_addr,
                    sizeof(interface_addr));
      printf("er=%d, errno=%d\n", er, errno);

      /* set up destination address */
      memset(&addr,0,sizeof(addr));
      addr.sin_family=AF_INET;
      addr.sin_addr.s_addr=inet_addr(HELLO_GROUP);
      addr.sin_port=htons(HELLO_PORT);

      /* now just sendto() our destination! */
      while (1) {
          /* strlen(), not sizeof: message is a pointer, not an array */
          if (sendto(fd,message,strlen(message),0,(struct sockaddr *) &addr,
                     sizeof(addr)) < 0) {
               perror("sendto");
               exit(1);
          }
          sleep(1);
      }
}

--
Stuart Stanley
M: 952-457-3790
stuarts-dK3M3PVJaX4iXRBKUn1UN0EOCMrvLtNR@public.gmane.org
--
"The avalanche has started. It is too late for the pebbles to vote." -  
Kosh in Babylon 5:"Believers"


* Re: Multicast joins failing on 1.5-rc1?
       [not found]         ` <C28CB83A-CF52-4603-91DF-D56865CBEA98-dK3M3PVJaX4iXRBKUn1UN0EOCMrvLtNR@public.gmane.org>
@ 2009-10-21 20:23           ` Jason Gunthorpe
       [not found]             ` <20091021202346.GO14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2009-10-21 20:23 UTC (permalink / raw)
  To: stuarts; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, Oct 21, 2009 at 02:16:47PM -0500, stuarts wrote:

> I did a lot more tracing on the sender side. I think I see what is  
> happening: The sender uses the IP_ADD_MEMBERSHIP socket op. The IP
> stack (via the dev->mc_list multicast list) tries to create the  
> following MGIDs:
> ff12:401b:ffff:0000:0000:0000:0100:0025
> ff12:601b:ffff:0000:0000:0000:0000:00fb
> ff12:601b:ffff:0000:0000:0001:ff03:2431
> ff12:601b:ffff:0000:0000:0000:0000:0001
> ff12:401b:ffff:0000:0000:0000:0000:0001
> ff12:401b:ffff:0000:0000:0000:0000:00fb
> 
> The first one is mine, and the others are in the admin band (***1 is  
> all-hosts, for example).
>
> This looks like it is valid, BUT, the call to  
> ipoib_mcast_addr_is_valid occurs BEFORE the pkey is folded in from the  
> ipoib_dev_priv structure. Printing out the pre-fold-in values shows:
> 00ffffffff12601b0000000000000000000000fb
> 
> (This is the dev_mc_list -> dmi_addr value)
> 
> Oops, that pkey is "wrong" (0 vs ffff). Out this address goes!

Hmm, I created the ipoib_mcast_addr_is_valid last month and it seemed
correct in my testing. I'm surprised to see this.

The intention was to catch groups that don't have the right pkey
set. Everything should be completely consistent by this point in the
code; the dmi_addr should have the pkey included in it. If this is not
true then the ip tools and other diagnostics will not function
properly.

What does IP say for your setup? Mine reports this:

$ ip link show dev ib0
4: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast state UP qlen 256
    link/infiniband 80:2e:00:48:fe:80:00:00:00:00:00:00:00:02:c9:03:00:00:14:a5 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff

$ ib1{jgg}~#~/work/iproute2.git/ip/ip maddr show dev ib0
4:      ib0
        link  33:33:ff:fe:f9:2d:00:00:00:00:00:00:00:00:00:e2:e4:f5:00:df static
        link  00:ff:ff:ff:ff:12:60:1b:ff:ff:00:00:00:00:00:01:ff:00:14:a5
        link  00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:00:00:00:01
        link  00:ff:ff:ff:ff:12:60:1b:ff:ff:00:00:00:00:00:00:00:00:00:01

So:
          brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
        link  00:ff:ff:ff:ff:12:60:1b:ff:ff:00:00:00:00:00:00:00:00:00:01
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Seems OK to me.

All mcast groups are created in the IP stack using this function:

static inline void ip_ib_mc_map(__be32 naddr, const unsigned char *broadcast, char *buf)
{
[..]
        buf[8]  = broadcast[8];         /* P_Key */
        buf[9]  = broadcast[9];
}

So I can't see how you can possibly get a mismatching pkey.
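
For readers without the kernel source at hand, the mapping can be sketched in userspace roughly as follows. This is an illustration under assumptions, not a copy of ip_ib_mc_map: only the two P_Key lines come from the function quoted above; the rest of the byte layout follows the IPoIB IPv4 multicast mapping as I understand it from RFC 4391, and map_ipv4_mcast is a made-up name.

```c
#include <stdint.h>
#include <string.h>

/* Userspace sketch of an IPv4 -> IPoIB multicast hardware address
 * mapping. ipv4 is in host byte order, bcast is the interface's
 * 20-byte broadcast address, out receives the 20-byte result. */
static void map_ipv4_mcast(uint32_t ipv4, const unsigned char *bcast,
                           unsigned char *out)
{
    memset(out, 0, 20);
    out[1] = 0xff;                       /* multicast QPN 0xffffff */
    out[2] = 0xff;
    out[3] = 0xff;
    out[4] = 0xff;                       /* MGID starts here */
    out[5] = 0x10 | (bcast[5] & 0x0f);   /* scope from broadcast address */
    out[6] = 0x40;                       /* IPv4 signature 0x401b */
    out[7] = 0x1b;
    out[8] = bcast[8];                   /* P_Key */
    out[9] = bcast[9];
    out[16] = (ipv4 >> 24) & 0x0f;       /* low 28 bits of the group */
    out[17] = (ipv4 >> 16) & 0xff;
    out[18] = (ipv4 >> 8) & 0xff;
    out[19] = ipv4 & 0xff;
}
```

For 225.0.0.37 (0xe1000025) and a broadcast address carrying P_Key ffff, the last four bytes come out as 01:00:00:25 and bytes 8-9 as ff:ff, which lines up with the ff12:401b:ffff:...:0100:0025 MGID reported earlier in the thread.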

Are you using an upstream kernel or a backport to some RH kernel? What
does your ip_ib_mc_map function look like? It is a bit of a problem
for backports because it is inlined and built into the main kernel
code. If the original RH source for their kernel does not include the
above then it is broken, and backporting ipoib_mcast_addr_is_valid
just catches a pre-existing bug (as it was intended to, actually).

Can you point me to where you see the 'pkey folding'? Is that present
in the mainline kernel?

Jason

* Re: Multicast joins failing on 1.5-rc1?
       [not found]             ` <20091021202346.GO14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2009-10-21 21:43               ` stuarts
       [not found]                 ` <A53D7B2B-EE41-4ABC-BC02-EE9A100C5DD8-dK3M3PVJaX4iXRBKUn1UN0EOCMrvLtNR@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: stuarts @ 2009-10-21 21:43 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA


On Oct 21, 2009, at 3:23 PM, Jason Gunthorpe wrote:

> On Wed, Oct 21, 2009 at 02:16:47PM -0500, stuarts wrote:
<snip>
> Hmm, I created the ipoib_mcast_addr_is_valid last month and it seemed
> correct in my testing. I'm surprised to see this.

Looks like you did it right.. see below!

>
> The intention was to catch groups that don't have the right pkey
> set. Everything should be completely consistent by this point in the
> code, the dmi_addr should have the pkey included in it. If this is not
> true then the ip tools and other diagnostics will not function
> properly.
>
> What does IP say for your setup? Mine reports this:
>
> $ ip link show dev ib0
> 4: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast state UP qlen 256
>    link/infiniband 80:2e:00:48:fe:80:00:00:00:00:00:00:00:02:c9:03:00:00:14:a5 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
>
> $ ib1{jgg}~#~/work/iproute2.git/ip/ip maddr show dev ib0
> 4:      ib0
>        link  33:33:ff:fe:f9:2d:00:00:00:00:00:00:00:00:00:e2:e4:f5:00:df static
>        link  00:ff:ff:ff:ff:12:60:1b:ff:ff:00:00:00:00:00:01:ff:00:14:a5
>        link  00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:00:00:00:01
>        link  00:ff:ff:ff:ff:12:60:1b:ff:ff:00:00:00:00:00:00:00:00:00:01
>
> So:
>          brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
>        link  00:ff:ff:ff:ff:12:60:1b:ff:ff:00:00:00:00:00:00:00:00:00:01
>              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Seems OK to me.

5:	ib0
	link  00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:00:00:00:00:fb
	link  00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:01:ff:03:24:31
	link  00:ff:ff:ff:ff:12:60:1b:00:00:00:00:00:00:00:00:00:00:00:01
	link  00:ff:ff:ff:ff:12:40:1b:00:00:00:00:00:00:00:00:00:00:00:01
	link  00:ff:ff:ff:ff:12:40:1b:00:00:00:00:00:00:00:00:00:00:00:fb

1b:00:00. oops!

>
> All mcast groups are created in the IP stack using this function:
>
> static inline void ip_ib_mc_map(__be32 naddr, const unsigned char  
> *broadcast, char *buf)
> {
> [..]
>        buf[8]  = broadcast[8];         /* P_Key */
>        buf[9]  = broadcast[9];
> }

And there we have it. I am stuck with the RHEL based kernel. The  
ip_ib_mc_map I have does not even have the broadcast parameter at all  
(naddr and buf only).

>
> So I can't see how you can possibly get a mismatching pkey.
>
> Are you using an upstream kernel or a backport to some RH kernel? What
> does your ip_ib_mc_map function look like? It is a bit of a problem
> for backports because it is inlined and built into the main kernel
> code, if the original RH source for their kernel does not include the
> above then it is broken and backporting the ipoib_mcast_addr_is_valid
> just catches a pre-existing bug (as it was intended, actually)
>
> Can you point me to where you see the 'pkey folding'? Is that present
> in the mainline kernel?

It's in ipoib_mcast_restart_task:
         /* Mark all of the entries that are found or don't exist */
         for (mclist = dev->mc_list; mclist; mclist = mclist->next) {
                 union ib_gid mgid;

                 if (!ipoib_mcast_addr_is_valid(mclist->dmi_addr,
                                                mclist->dmi_addrlen,
                                                dev->broadcast, priv)) {
                         ipoib_dbg_mcast(priv, "skipping invalid \n");
                         //      continue;
                 }
                 memcpy(mgid.raw, mclist->dmi_addr + 4, sizeof mgid);

                 /* Add in the P_Key */
                 mgid.raw[4] = (priv->pkey >> 8) & 0xff;
                 mgid.raw[5] = priv->pkey & 0xff;


Sorry for the extra goop in there. This is gone from the mainline  
kernel, so it is RHEL5.4 + backport that seems to be the problem.

I'll try to check out if these boxes are fully up to date tomorrow.  
Thank you again for the help.

--stuart
--
Stuart Stanley
M: 952-457-3790
stuarts-dK3M3PVJaX4iXRBKUn1UN0EOCMrvLtNR@public.gmane.org
--
"I can only conclude that I'm paying off karma at a vastly accelerated  
rate." - Susan Ivanova in Babylon 5:"Points of Departure"


* Re: Multicast joins failing on 1.5-rc1? (OFED BACKPORT BUG)
       [not found]                 ` <A53D7B2B-EE41-4ABC-BC02-EE9A100C5DD8-dK3M3PVJaX4iXRBKUn1UN0EOCMrvLtNR@public.gmane.org>
@ 2009-10-21 22:08                   ` Jason Gunthorpe
       [not found]                     ` <20091021220837.GP14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2009-10-21 22:08 UTC (permalink / raw)
  To: stuarts; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, Oct 21, 2009 at 04:43:53PM -0500, stuarts wrote:

>> All mcast groups are created in the IP stack using this function:
>>
>> static inline void ip_ib_mc_map(__be32 naddr, const unsigned char  
>> *broadcast, char *buf)
>> {
>> [..]
>>        buf[8]  = broadcast[8];         /* P_Key */
>>        buf[9]  = broadcast[9];
>> }
>
> And there we have it. I am stuck with the RHEL based kernel. The  
> ip_ib_mc_map I have does not even have the broadcast parameter at all  
> (naddr and buf only).

Ah, that is what I suspected, OK this makes sense now.

This is a bug in the backport. It is very serious, since multicast
does not work as it is.

The code you identified in ipoib_mcast_restart_task is part of
the backport, and is designed to work around the above limitations.

I would remove it, and do the following (untested) instead:

--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -814,6 +814,13 @@ void ipoib_mcast_restart_task(struct work_struct *work)
        for (mclist = dev->mc_list; mclist; mclist = mclist->next) {
                union ib_gid mgid;
 
+               /* Work around broken ip_ib_mc_map */
+               if (mclist->dmi_addrlen == INFINIBAND_ALEN) {
+                       mclist->dmi_addr[5] = 0x10 | (dev->broadcast[5] & 0xF);
+                       mclist->dmi_addr[8] = dev->broadcast[8];
+                       mclist->dmi_addr[9] = dev->broadcast[9];
+               }
+
                if (!ipoib_mcast_addr_is_valid(mclist->dmi_addr,
                                               mclist->dmi_addrlen,
                                               dev->broadcast))

This is better than the current stuff since it preserves the intent of
the ip_ib_mc_map patches, and it adjusts the dmi_addr directly so ip
maddr reports the correct address to aid in debugging.
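
The effect of the workaround can be tried outside the kernel. The following is a userspace rendering of the three assignments in the patch above, offered as a sketch for experimenting with addresses; fixup_mcast_addr is a hypothetical name and the function is not kernel code.

```c
#include <stddef.h>

#define INFINIBAND_ALEN 20

/* Copy the scope nibble and the P_Key from the broadcast address into
 * a multicast list entry, as the patch does for each mclist entry.
 * Entries that are not INFINIBAND_ALEN bytes long are left untouched. */
static void fixup_mcast_addr(unsigned char *addr, size_t len,
                             const unsigned char *bcast)
{
    if (len != INFINIBAND_ALEN)
        return;
    addr[5] = 0x10 | (bcast[5] & 0x0f);
    addr[8] = bcast[8];
    addr[9] = bcast[9];
}
```

Applied to the entry 00:ff:ff:ff:ff:12:60:1b:00:00:...:00:fb with a broadcast address of 00:ff:ff:ff:ff:12:40:1b:ff:ff:...:ff:ff:ff:ff, bytes 8-9 become ff:ff while the signature bytes (60:1b) and the group bytes stay as they were, so the entry then carries the same P_Key as the broadcast group.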

> Sorry for the extra goop in there. This is gone from the mainline  
> kernel, so it is RHEL5.4 + backport that seems to be the problem.

Correct. We need someone to pick up the above patch for the
backports. I don't know who that is (someone please speak up?)

If you can confirm the above does it for you then it would probably
help the backporter.

Jason

* Re: Multicast joins failing on 1.5-rc1? (OFED BACKPORT BUG)
       [not found]                     ` <20091021220837.GP14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2009-10-22  9:12                       ` Tziporet Koren
  2009-10-22 15:08                       ` stuarts
  1 sibling, 0 replies; 10+ messages in thread
From: Tziporet Koren @ 2009-10-22  9:12 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: stuarts, linux-rdma-u79uwXL29TY76Z2rM5mHXA

Jason Gunthorpe wrote:
>
> Correct. We need someone to pick up the above patch for the
> backports. I don't know who that is (someone please speak up?)
>
>   
Vlad will take it.
Please open a bug in Bugzilla too.

Tziporet


* Re: Multicast joins failing on 1.5-rc1? (OFED BACKPORT BUG)
       [not found]                     ` <20091021220837.GP14520-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2009-10-22  9:12                       ` Tziporet Koren
@ 2009-10-22 15:08                       ` stuarts
       [not found]                         ` <3BAE2C3C-9724-47C6-BF44-EF0CDD47612C-dK3M3PVJaX4iXRBKUn1UN0EOCMrvLtNR@public.gmane.org>
  1 sibling, 1 reply; 10+ messages in thread
From: stuarts @ 2009-10-22 15:08 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA


On Oct 21, 2009, at 5:08 PM, Jason Gunthorpe wrote:

> On Wed, Oct 21, 2009 at 04:43:53PM -0500, stuarts wrote:
>
>>>
<snip patch, etc>
>
> This is better than the current stuff since it preserves the intent of
> the ip_ib_mc_map patches, and it adjusts the dmi_addr directly so ip
> maddr reports the correct address to aid in debugging.
>
>> Sorry for the extra goop in there. This is gone from the mainline
>> kernel, so it is RHEL5.4 + backport that seems to be the problem.
>
> Correct. We need someone to pick up the above patch for the
> backports. I don't know who that is (someone please speak up?)
>
> If you can confirm the above does it for you then it would probably
> help the backporter.

Confirmed. I have multicast again!

Shall I open a bug, or has one already been done?

Thanks, --stuart


--
Stuart Stanley
M: 952-457-3790
stuarts-dK3M3PVJaX4iXRBKUn1UN0EOCMrvLtNR@public.gmane.org
--
"Not bad for a man in his jim jams. Very Aurther Dent. Now, there was  
a nice man." - The Doctor in Dr. Who:"The Christmas Invasion"


* Re: Multicast joins failing on 1.5-rc1? (OFED BACKPORT BUG)
       [not found]                         ` <3BAE2C3C-9724-47C6-BF44-EF0CDD47612C-dK3M3PVJaX4iXRBKUn1UN0EOCMrvLtNR@public.gmane.org>
@ 2009-10-22 16:39                           ` Jason Gunthorpe
  0 siblings, 0 replies; 10+ messages in thread
From: Jason Gunthorpe @ 2009-10-22 16:39 UTC (permalink / raw)
  To: stuarts; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Thu, Oct 22, 2009 at 10:08:20AM -0500, stuarts wrote:

> Confirmed. I have multicast again!
> 
> Shall I open a bug, or has one already been done?

Please open a bug, Tziporet said Vlad would handle it..

Jason
