netdev.vger.kernel.org archive mirror
* 3.10.0-rc2 mlx4 not receiving packets for some multicast groups
@ 2013-05-24 15:49 Shawn Bohrer
  2013-05-24 16:34 ` Shawn Bohrer
  2013-05-25  3:49 ` Or Gerlitz
  0 siblings, 2 replies; 16+ messages in thread
From: Shawn Bohrer @ 2013-05-24 15:49 UTC (permalink / raw)
  To: netdev; +Cc: Or Gerlitz, Hadar Hen Zion, Rony Efraim, Amir Vadai

I just started testing the 3.10 kernel.  We were previously on 3.4, so
this is a fairly large jump.  I've additionally applied the following
four patches to the 3.10.0-rc2 kernel that I'm testing:

https://patchwork.kernel.org/patch/2484651/
https://patchwork.kernel.org/patch/2484671/
https://patchwork.kernel.org/patch/2484681/
https://patchwork.kernel.org/patch/2484641/

I don't know whether those patches are related to my issue, but I
plan to try reproducing it without them soon.

The issue I'm seeing is that our applications listen on a number of
multicast addresses.  In this case I'm listening to about 350
different addresses per machine, across many different processes, with
usually one socket per address.  The problem is that some of the
sockets receive no data at all while others do, even though all of
them should.  If I put the device in promiscuous mode, I start
receiving data on all of my sockets.  Running netstat -g shows all of
my memberships, so it appears that the kernel and the switch both
think I've joined the groups, but the card may be filtering the data.
This is with:

05:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]

# ethtool -i eth4
driver: mlx4_en
version: 2.0 (Dec 2011)
firmware-version: 2.11.500
bus-info: 0000:05:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
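
For readers who want to reproduce the setup described above (one UDP
socket per multicast group), here is a minimal sketch of a single
receiver.  It is not the code our applications use; the helper names
build_mreq and open_multicast_receiver, the group 239.1.1.1 and the
port 5000 are hypothetical placeholders, and only standard Python
socket calls are assumed.

```python
import socket


def build_mreq(group: str, local_ip: str = "0.0.0.0") -> bytes:
    """Pack a struct ip_mreq: 4-byte group address, then 4-byte local address.

    A local address of 0.0.0.0 lets the kernel choose the interface.
    """
    return socket.inet_aton(group) + socket.inet_aton(local_ip)


def open_multicast_receiver(group: str, port: int,
                            local_ip: str = "0.0.0.0") -> socket.socket:
    """Open a UDP socket and join one IPv4 multicast group (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        # Allow several processes on the same host to bind the same port.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        # This join is what should show up as a membership in netstat -g.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        build_mreq(group, local_ip))
    except OSError:
        sock.close()
        raise
    return sock


# Example use (placeholder group and port, blocks until a datagram arrives):
#   sock = open_multicast_receiver("239.1.1.1", 5000)
#   data, sender = sock.recvfrom(65535)
```

Opening one such socket per group and noting which ones never return
from recvfrom() is one way to list the affected groups on a machine in
the bad state.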

The other strange part is that I've got multiple machines all running
the same kernel and not all of them are experiencing the issue.  At
one point they were all working fine, but the issue appeared after I
rebooted one of the machines, and multiple reboots later it is still
in this bad state.  Rebooting that machine back to 3.4 makes it work
as expected, but it stays broken under 3.10.  I've now got two
machines in this bad state, and on both the problem started
immediately after a reboot.

Does anyone have any ideas?

Thanks,
Shawn


Thread overview: 16+ messages
2013-05-24 15:49 3.10.0-rc2 mlx4 not receiving packets for some multicast groups Shawn Bohrer
2013-05-24 16:34 ` Shawn Bohrer
2013-05-24 16:58   ` Eric Dumazet
2013-05-25  3:41   ` Or Gerlitz
2013-05-25 15:13     ` Shawn Bohrer
2013-05-25 19:41       ` Or Gerlitz
2013-05-25 21:37         ` Shawn Bohrer
2013-05-28 20:15       ` Shawn Bohrer
2013-05-29 13:55         ` Or Gerlitz
2013-05-30 20:31           ` Shawn Bohrer
2013-05-30 20:42             ` Or Gerlitz
2013-05-30 20:57               ` Vlad Yasevich
2013-05-31  0:23                 ` Jay Vosburgh
2013-05-31 15:17                   ` Shawn Bohrer
2013-05-25  3:49 ` Or Gerlitz
2013-05-25 14:02   ` Shawn Bohrer
