public inbox for linux-rdma@vger.kernel.org
* Multicast joins failing on 1.5-rc1?
From: stuarts @ 2009-10-20 18:16 UTC
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA



We have a four-box cluster that we just upgraded to RHEL 5.4, which
required an upgrade to OFED 1.5. We are using bonding over two
physical links with IPoIB. The final detail is that we are using IPv4
multicast to push data from one box to the other three.

Under 1.4, this worked. (Yeah!)
Under 1.5, it doesn't.

By "not working" I mean:
  o IB is able to see the mesh.
  o IPv4 over the bond is working (I can ping, scp files, and similar)
  o Multicast does NOT.

When I looked closer, I could see that I get error -22 (-EINVAL) on the
multicast joins (using a QLogic switch's SM) for everything _except_
the broadcast join. I switched over to opensm, since it has far
better debugging abilities, and see the same behavior, though there
opensm logs a message with error 1B11.

When I searched the source for that error code, I found it associated
with an invalid component mask:
Oct 20 12:40:05 824130 [44240940] 0x01 -> mcmr_rcv_join_mgrp: ERR  
1B11: method = SubnAdmSet, scope_state = 0x1, component mask =  
0x0000000000010083, expected comp mask = 0x00000000000130c7, MGID:  
ff12:601b:ffff::16 from port 0x0002c90300032431 (x3 HCA-1)
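
The two masks in the log line above can be compared bit by bit. The following is a minimal sketch, not part of the original report. The field names are an assumption, taken from the IB_SA_MCMEMBER_REC_* bit order in include/rdma/ib_sa.h, and should be checked against the header in your tree. The helper names decode_mask and missing_fields are hypothetical.

```python
# Bit positions of the MCMemberRecord component-mask fields.
# ASSUMPTION: the order follows IB_SA_MCMEMBER_REC_* in include/rdma/ib_sa.h
# (bit 0 = MGID ... bit 17 = PROXY_JOIN); verify against your kernel tree.
MCMEMBER_FIELDS = [
    "MGID", "PORT_GID", "QKEY", "MLID", "MTU_SELECTOR", "MTU",
    "TRAFFIC_CLASS", "PKEY", "RATE_SELECTOR", "RATE",
    "PACKET_LIFE_TIME_SELECTOR", "PACKET_LIFE_TIME", "SL",
    "FLOW_LABEL", "HOP_LIMIT", "SCOPE", "JOIN_STATE", "PROXY_JOIN",
]


def decode_mask(mask):
    """Return the names of the fields whose bits are set in mask."""
    return [name for bit, name in enumerate(MCMEMBER_FIELDS)
            if mask & (1 << bit)]


def missing_fields(sent, expected):
    """Return the fields set in `expected` but not in `sent`."""
    return decode_mask(expected & ~sent)


if __name__ == "__main__":
    sent = 0x0000000000010083      # "component mask" from the log line
    expected = 0x00000000000130C7  # "expected comp mask" from the log line
    print("sent:   ", decode_mask(sent))
    print("missing:", missing_fields(sent, expected))
```

For the values in the log, the bits present in the expected mask but absent from the request are bits 2, 6, 12 and 13 (0x3044).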

I looked through drivers/infiniband/ulp/ipoib/ipoib_multicast.c and  
found the following interesting bits:
  o The broadcast join is done with the presumption that the broadcast
groups already exist (and they do).
  o In the ipoib_mcast_send() data path, ipoib_mcast_sendonly_join() is
called directly (the multicast task is not used). This path, however,
does not set the component_mask bits required to pass the check that
raises 1B11 (check_create_comp_mask()).

I looked at the git log (from ofed_kernel_1_5) for ipoib_multicast.c  
and don't see any commits that would appear to be anywhere near this  
area.

Does anyone have any clue as to what is going on here?  Thank you, --stuart

p.s. the output from the debugfs:
[root@x3 ipoib]# pwd
/sys/kernel/debug/ipoib
[root@ce-x3 ipoib]# more ib0_mcg
GID: ff12:401b:ffff:0:0:0:0:3a01
   created: 4295351581
   queuelen:         0
   complete:        no
   send_only:      yes

GID: ff12:401b:ffff:0:0:0:ffff:ffff
   created: 4295326209
   queuelen:         0
   complete:       yes
   send_only:       no
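
A quick way to spot joins that never completed is to scan this debugfs output for entries with "complete: no". This is a minimal sketch, assuming the file keeps exactly the "key: value" layout shown above. parse_mcg and incomplete_joins are hypothetical helper names, and the sample text is copied from the output above.

```python
SAMPLE = """\
GID: ff12:401b:ffff:0:0:0:0:3a01
   created: 4295351581
   queuelen:         0
   complete:        no
   send_only:      yes

GID: ff12:401b:ffff:0:0:0:ffff:ffff
   created: 4295326209
   queuelen:         0
   complete:       yes
   send_only:       no
"""


def parse_mcg(text):
    """Parse ib0_mcg-style text into a list of dicts, one per GID entry."""
    groups = []
    for line in text.splitlines():
        line = line.strip()
        if ":" not in line:
            continue  # skip blank lines and anything that is not key: value
        # Split on the first colon only, since the GID value contains colons.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "GID":
            groups.append({"GID": value})
        elif groups:
            groups[-1][key] = value
    return groups


def incomplete_joins(groups):
    """Return the GIDs of entries whose join has not completed."""
    return [g["GID"] for g in groups if g.get("complete") == "no"]


if __name__ == "__main__":
    for gid in incomplete_joins(parse_mcg(SAMPLE)):
        print("join not complete:", gid)
```

On a live system the same functions could be fed the contents of /sys/kernel/debug/ipoib/ib0_mcg instead of SAMPLE.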

--
Stuart Stanley
M: 952-457-3790
stuarts-dK3M3PVJaX4iXRBKUn1UN0EOCMrvLtNR@public.gmane.org
--
"The avalanche has started. It is too late for the pebbles to vote." -  
Kosh in Babylon 5:"Believers"
