* [Bridge] tg3 bridge problems
@ 2005-01-10 14:06 Gergely Madarasz
  2005-01-10 15:05 ` [Bridge] " Neil Horman
  0 siblings, 1 reply; 22+ messages in thread
From: Gergely Madarasz @ 2005-01-10 14:06 UTC (permalink / raw)
To: bridge, linux-net

Hello,

I've got a very strange problem. Lately I've been setting up my linux
servers for network (layer2) redundancy with a bridge interface containing
two ethernet interfaces connecting to two switches. So far I didn't have
any problems with it, but now a very strange thing happens with a new
server I'm installing. The server is an ibm x346 having two onboard
BCM5721 cards, the switches are cisco 3550, and I've tested with kernel
versions 2.6.10 and 2.4.28.

The bpdu's from the cisco switches simply cannot be seen on the server,
causing loops in l2 traffic. I've tested with sticking a hub between the
c3550 and the server, the switch sends out the bpdu's, but they are not
seen by linux (running tethereal). This happens only on eth0, on eth1
everything seems fine. Any IP traffic on eth0 goes through, no packet
loss, no errors.

And something even more strange: if I do an
ifconfig eth0 0 up; brctl addif br0 eth0;
it seems to be working fine, if I do it the other way
round, then the bpdu's sent by the switches are lost somewhere.

Considering all these, the problem seems to me a strange interaction
between the bridge driver, the tg3 driver and the hardware in question.

Any ideas?

Greg

^ permalink raw reply [flat|nested] 22+ messages in thread
* [Bridge] Re: tg3 bridge problems 2005-01-10 14:06 [Bridge] tg3 bridge problems Gergely Madarasz @ 2005-01-10 15:05 ` Neil Horman 2005-01-10 15:45 ` Gergely Madarasz 0 siblings, 1 reply; 22+ messages in thread From: Neil Horman @ 2005-01-10 15:05 UTC (permalink / raw) To: Gergely Madarasz; +Cc: linux-net, bridge Gergely Madarasz wrote: > Hello, > > I've got a very strange problem. Lately I've been setting up my linux > servers for network (layer2) redundancy with a bridge interface containing > two ethernet interfaces connecting to two switches. So far I didn't have > any problems with it, but now a very strange thing happens with a new > server I'm installing. The server is an ibm x346 having two onboard > BCM5721 cards, the switches are cisco 3550, and I've tested with kernel > versions 2.6.10 and 2.4.28. > > The bpdu's from the cisco switches simply cannot be seen on the server, > causing loops in l2 traffic. I've tested with sticking a hub between the > c3550 and the server, the switch sends out the bpdu's, but they are not > seen by linux (running tethereal). This happens only on eth0, on eth1 > everything seems fine. Any IP traffic on eth0 goes through, no packet > loss, no errors. > > And something even more strange: if I do an > ifconfig eth0 0 up; brctl addif br0 eth0; > it seems to be working fine, if I do it the other way > round, then the bpdu's sent by the switches are lost somewhere. > > Considering all these, the problem seems to me a strange interaction > between the bridge driver, the tg3 driver and the hardware in question. > > Any ideas? > > Greg It looks to me like either order should work just fine, as long as the IFF_PROMISC flag isn't cleared when you bring up the interface. 
Is IFF_PROMISC clear in ifconfig after you issue your ifconfig eth0 up command? Neil -- /*************************************************** *Neil Horman *Software Engineer *Red Hat, Inc. *nhorman@redhat.com *gpg keyid: 1024D / 0x92A74FA1 *http://pgp.mit.edu ***************************************************/ ^ permalink raw reply [flat|nested] 22+ messages in thread
* [Bridge] Re: tg3 bridge problems 2005-01-10 15:05 ` [Bridge] " Neil Horman @ 2005-01-10 15:45 ` Gergely Madarasz 2005-01-10 16:04 ` Neil Horman 2005-01-10 19:34 ` Stephen Hemminger 0 siblings, 2 replies; 22+ messages in thread From: Gergely Madarasz @ 2005-01-10 15:45 UTC (permalink / raw) To: Neil Horman; +Cc: linux-net, bridge On Mon, Jan 10, 2005 at 10:05:56AM -0500, Neil Horman wrote: > Gergely Madarasz wrote: > >Hello, > > > >I've got a very strange problem. Lately I've been setting up my linux > >servers for network (layer2) redundancy with a bridge interface containing > >two ethernet interfaces connecting to two switches. So far I didn't have > >any problems with it, but now a very strange thing happens with a new > >server I'm installing. The server is an ibm x346 having two onboard > >BCM5721 cards, the switches are cisco 3550, and I've tested with kernel > >versions 2.6.10 and 2.4.28. > > > >The bpdu's from the cisco switches simply cannot be seen on the server, > >causing loops in l2 traffic. I've tested with sticking a hub between the > >c3550 and the server, the switch sends out the bpdu's, but they are not > >seen by linux (running tethereal). This happens only on eth0, on eth1 > >everything seems fine. Any IP traffic on eth0 goes through, no packet > >loss, no errors. > > > >And something even more strange: if I do an > >ifconfig eth0 0 up; brctl addif br0 eth0; > >it seems to be working fine, if I do it the other way > >round, then the bpdu's sent by the switches are lost somewhere. > > > >Considering all these, the problem seems to me a strange interaction > >between the bridge driver, the tg3 driver and the hardware in question. > > It looks to me like either order should work just fine, as long as the > IFF_PROMISC flag isn't cleared when you bring up the interface. Is > IF_PROMISC clear in ifconfig after you issue your ifconfig eth0 up command? ifconfig has never showed PROMISC on either of my bridged servers. ip shows it though. 
I noticed this before but couldn't find the reason, and it didn't seem
important.

This is on a machine with a working bridge:

# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:09:6B:49:89:80
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          ...
# ip link list eth0
2: eth0: <BROADCAST,MULTICAST,PROMISC,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:09:6b:49:89:80 brd ff:ff:ff:ff:ff:ff

And this is on the problematic machine:

# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:0D:60:55:3B:02
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          ....
# ip addr list eth0
2: eth0: <BROADCAST,MULTICAST,PROMISC,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:0d:60:55:3b:02 brd ff:ff:ff:ff:ff:ff

And even if it didn't get promisc, tcpdump or tethereal would have made it
so when I was looking for the bpdu packets. Tethereal shows only this:

0.000000 00:0d:60:55:3b:02 -> 01:80:c2:00:00:00 STP Conf. Root = 65535/00:0d:60:55:3b:02 Cost = 0 Port = 0x8001

Greg

^ permalink raw reply [flat|nested] 22+ messages in thread
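Editor's sketch: the ifconfig/ip disagreement above comes down to which flag bits each tool reports. The snippet below decodes an interface flag bitmask into the names that ip prints, using the IFF_* values from the Linux headers. Reading the mask from /sys/class/net/<ifname>/flags is an assumption about the running system (a 2.6-era or later kernel with sysfs mounted); the helper names are made up for illustration and are not part of any tool mentioned in the thread.

```python
# IFF_* bit values as defined in the Linux <linux/if.h> header.
IFF_FLAGS = {
    0x1: "UP",
    0x2: "BROADCAST",
    0x8: "LOOPBACK",
    0x10: "POINTOPOINT",
    0x40: "RUNNING",
    0x80: "NOARP",
    0x100: "PROMISC",
    0x200: "ALLMULTI",
    0x1000: "MULTICAST",
}


def decode_flags(flags):
    """Return the names of the flag bits set in `flags`, lowest bit first."""
    return [name for bit, name in sorted(IFF_FLAGS.items()) if flags & bit]


def read_flags(ifname):
    """Read the kernel's flag word for `ifname` from sysfs (assumed path)."""
    with open("/sys/class/net/%s/flags" % ifname) as f:
        return int(f.read().strip(), 16)


if __name__ == "__main__":
    # 0x1103 = UP | BROADCAST | PROMISC | MULTICAST
    print(decode_flags(0x1103))
```

On a live box, decode_flags(read_flags("eth0")) would show whether the kernel's own flag word carries PROMISC, independent of what ifconfig chooses to print. It says nothing about whether the NIC hardware is actually in promiscuous mode, which is the open question in this thread.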
* [Bridge] Re: tg3 bridge problems 2005-01-10 15:45 ` Gergely Madarasz @ 2005-01-10 16:04 ` Neil Horman 2005-01-10 16:18 ` Gergely Madarasz 2005-01-10 19:34 ` Stephen Hemminger 1 sibling, 1 reply; 22+ messages in thread From: Neil Horman @ 2005-01-10 16:04 UTC (permalink / raw) To: Gergely Madarasz; +Cc: linux-net, bridge Gergely Madarasz wrote: > On Mon, Jan 10, 2005 at 10:05:56AM -0500, Neil Horman wrote: > >>Gergely Madarasz wrote: >> >>>Hello, >>> >>>I've got a very strange problem. Lately I've been setting up my linux >>>servers for network (layer2) redundancy with a bridge interface containing >>>two ethernet interfaces connecting to two switches. So far I didn't have >>>any problems with it, but now a very strange thing happens with a new >>>server I'm installing. The server is an ibm x346 having two onboard >>>BCM5721 cards, the switches are cisco 3550, and I've tested with kernel >>>versions 2.6.10 and 2.4.28. >>> >>>The bpdu's from the cisco switches simply cannot be seen on the server, >>>causing loops in l2 traffic. I've tested with sticking a hub between the >>>c3550 and the server, the switch sends out the bpdu's, but they are not >>>seen by linux (running tethereal). This happens only on eth0, on eth1 >>>everything seems fine. Any IP traffic on eth0 goes through, no packet >>>loss, no errors. >>> >>>And something even more strange: if I do an >>>ifconfig eth0 0 up; brctl addif br0 eth0; >>>it seems to be working fine, if I do it the other way >>>round, then the bpdu's sent by the switches are lost somewhere. >>> >>>Considering all these, the problem seems to me a strange interaction >>>between the bridge driver, the tg3 driver and the hardware in question. >> >>It looks to me like either order should work just fine, as long as the >>IFF_PROMISC flag isn't cleared when you bring up the interface. Is >>IF_PROMISC clear in ifconfig after you issue your ifconfig eth0 up command? > > > ifconfig has never showed PROMISC on either of my bridged servers. 
> ip shows it though. I noticed this before but couldn't find the reason and > didn't seem important. > > This is on a machine with working bridge: > > # ifconfig eth0 > eth0 Link encap:Ethernet HWaddr 00:09:6B:49:89:80 > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 > ... > # ip link list eth0 > 2: eth0: <BROADCAST,MULTICAST,PROMISC,UP> mtu 1500 qdisc pfifo_fast qlen 1000 > link/ether 00:09:6b:49:89:80 brd ff:ff:ff:ff:ff:ff > > And this is on the problematic machine: > > # ifconfig eth0 > eth0 Link encap:Ethernet HWaddr 00:0D:60:55:3B:02 > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 > .... > # ip addr list eth0 > 2: eth0: <BROADCAST,MULTICAST,PROMISC,UP> mtu 1500 qdisc pfifo_fast qlen 1000 > link/ether 00:0d:60:55:3b:02 brd ff:ff:ff:ff:ff:ff > Strange. My concern was that the tg3 interface has its hardware reset whenever it's set to be up, and part of that is a resetting of its receive mode. If for some reason IFF_PROMISC was cleared after you set it using brctl, the interface might be taken out of promisc mode. Do you have any iptables rules running that might drop bpdus? > And even if didn't get promisc, tcpdump or tethereal would have made it > so, when I was looking for the bpdu packets. Tethereal shows only this: > That's not entirely true. ethereal/tcpdump/etc use the PF_PACKET interface, which talks directly to the driver to set the receive list, and doesn't always go through the device layer, which is where IFF_PROMISC gets asserted. Arguably it should set promisc, but not doing so doesn't prevent ethereal and friends from capturing in promiscuous mode. > 0.000000 00:0d:60:55:3b:02 -> 01:80:c2:00:00:00 STP Conf. Root = 65535/00:0d:60:55:3b:02 Cost = 0 Port = 0x8001 > > Greg -- /*************************************************** *Neil Horman *Software Engineer *Red Hat, Inc. 
*nhorman@redhat.com *gpg keyid: 1024D / 0x92A74FA1 *http://pgp.mit.edu ***************************************************/ ^ permalink raw reply [flat|nested] 22+ messages in thread
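Editor's sketch: for readers unfamiliar with the PF_PACKET route mentioned above, this is one way a capture program can ask for promiscuous reception on a packet socket, via the PACKET_ADD_MEMBERSHIP option with a packet_mreq of type PACKET_MR_PROMISC (see packet(7)). It is not claimed to be what ethereal or tcpdump did in 2005; the constants are copied from the Linux headers because Python's socket module does not export all of them, and the function names are illustrative. Opening the socket needs root on Linux, so only the struct packing is exercised without privileges.

```python
import socket
import struct

# Values from <linux/if_packet.h> and <linux/if_ether.h>.
SOL_PACKET = 263
PACKET_ADD_MEMBERSHIP = 1
PACKET_MR_PROMISC = 1
ETH_P_ALL = 0x0003


def build_promisc_mreq(ifindex):
    """Pack a struct packet_mreq requesting promiscuous mode on `ifindex`.

    Layout: int mr_ifindex; unsigned short mr_type;
            unsigned short mr_alen; unsigned char mr_address[8];
    """
    return struct.pack("iHH8s", ifindex, PACKET_MR_PROMISC, 0, b"")


def open_promisc_socket(ifname):
    """Open a raw packet socket on `ifname` and request promiscuous mode.

    Linux only, requires CAP_NET_RAW. The membership is dropped
    automatically when the socket is closed.
    """
    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW,
                         socket.htons(ETH_P_ALL))
    sock.bind((ifname, 0))
    mreq = build_promisc_mreq(socket.if_nametoindex(ifname))
    sock.setsockopt(SOL_PACKET, PACKET_ADD_MEMBERSHIP, mreq)
    return sock
```

Either way the request ends up at the driver, so if the driver or hardware fails to apply the receive mode, a capture tool would be affected just like the bridge.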
* [Bridge] Re: tg3 bridge problems 2005-01-10 16:04 ` Neil Horman @ 2005-01-10 16:18 ` Gergely Madarasz 2005-01-10 17:40 ` Neil Horman 0 siblings, 1 reply; 22+ messages in thread From: Gergely Madarasz @ 2005-01-10 16:18 UTC (permalink / raw) To: Neil Horman; +Cc: linux-net, bridge On Mon, Jan 10, 2005 at 11:04:55AM -0500, Neil Horman wrote: > Gergely Madarasz wrote: > >On Mon, Jan 10, 2005 at 10:05:56AM -0500, Neil Horman wrote: > > > >>Gergely Madarasz wrote: > >> > >>>The bpdu's from the cisco switches simply cannot be seen on the server, > >>>causing loops in l2 traffic. I've tested with sticking a hub between the > >>>c3550 and the server, the switch sends out the bpdu's, but they are not > >>>seen by linux (running tethereal). This happens only on eth0, on eth1 > >>>everything seems fine. Any IP traffic on eth0 goes through, no packet > >>>loss, no errors. > >>> > >>>And something even more strange: if I do an > >>>ifconfig eth0 0 up; brctl addif br0 eth0; > >>>it seems to be working fine, if I do it the other way > >>>round, then the bpdu's sent by the switches are lost somewhere. > >>> > >>>Considering all these, the problem seems to me a strange interaction > >>>between the bridge driver, the tg3 driver and the hardware in question. > >> > >>It looks to me like either order should work just fine, as long as the > >>IFF_PROMISC flag isn't cleared when you bring up the interface. Is > >>IF_PROMISC clear in ifconfig after you issue your ifconfig eth0 up > >>command? > > > > > >ifconfig has never showed PROMISC on either of my bridged servers. > >ip shows it though. I noticed this before but couldn't find the reason and > >didn't seem important. > > > >This is on a machine with working bridge: > > > ># ifconfig eth0 > >eth0 Link encap:Ethernet HWaddr 00:09:6B:49:89:80 > > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 > > ... 
> ># ip link list eth0 > >2: eth0: <BROADCAST,MULTICAST,PROMISC,UP> mtu 1500 qdisc pfifo_fast qlen > >1000 > > link/ether 00:09:6b:49:89:80 brd ff:ff:ff:ff:ff:ff > > > >And this is on the problematic machine: > > > ># ifconfig eth0 > >eth0 Link encap:Ethernet HWaddr 00:0D:60:55:3B:02 > > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 > > .... > ># ip addr list eth0 > >2: eth0: <BROADCAST,MULTICAST,PROMISC,UP> mtu 1500 qdisc pfifo_fast qlen > >1000 > > link/ether 00:0d:60:55:3b:02 brd ff:ff:ff:ff:ff:ff > > > Strange. My concern was that the tg3 interface has its hardware reset > whenever its set to be up, and part of that is a resetting of its > receive mode. If for some reason IFF_PROMISC was cleared after you set > it using brctl, the interface might be taken out of promisc mode. Do > you have any iptables rules running that might drop bpdus? No iptables rules at all. Btw iptables wouldn't prevent tcpdump from seeing the packets, would it? Could it be that the driver perhaps has a problem setting promisc mode when resetting the hardware? > >And even if didn't get promisc, tcpdump or tethereal would have made it > >so, when I was looking for the bpdu packets. Tethereal shows only this: > > > Thats not entirely true. ethereal/tcpdump/etc use the PF_PACKET > interface, which talks directly to the driver to set the receive list, > and doesn't always go through the device layer, which is where > IFF_PROMISC gets asserted. Agruably it should set promisc, but not > doing so doesn't prevent ethereal and friends from capturing in > promiscuous mode. Well, my point was that not even ethereal or tcpdump show the packets, not even in promisc mode... :) Greg. ^ permalink raw reply [flat|nested] 22+ messages in thread
* [Bridge] Re: tg3 bridge problems 2005-01-10 16:18 ` Gergely Madarasz @ 2005-01-10 17:40 ` Neil Horman 2005-01-10 19:11 ` Gergely Madarasz 0 siblings, 1 reply; 22+ messages in thread From: Neil Horman @ 2005-01-10 17:40 UTC (permalink / raw) To: Gergely Madarasz; +Cc: linux-net, bridge Gergely Madarasz wrote: > On Mon, Jan 10, 2005 at 11:04:55AM -0500, Neil Horman wrote: > >>Gergely Madarasz wrote: >> >>>On Mon, Jan 10, 2005 at 10:05:56AM -0500, Neil Horman wrote: >>> >>> >>>>Gergely Madarasz wrote: >>>> >>>> >>>>>The bpdu's from the cisco switches simply cannot be seen on the server, >>>>>causing loops in l2 traffic. I've tested with sticking a hub between the >>>>>c3550 and the server, the switch sends out the bpdu's, but they are not >>>>>seen by linux (running tethereal). This happens only on eth0, on eth1 >>>>>everything seems fine. Any IP traffic on eth0 goes through, no packet >>>>>loss, no errors. >>>>> >>>>>And something even more strange: if I do an >>>>>ifconfig eth0 0 up; brctl addif br0 eth0; >>>>>it seems to be working fine, if I do it the other way >>>>>round, then the bpdu's sent by the switches are lost somewhere. >>>>> >>>>>Considering all these, the problem seems to me a strange interaction >>>>>between the bridge driver, the tg3 driver and the hardware in question. >>>> >>>>It looks to me like either order should work just fine, as long as the >>>>IFF_PROMISC flag isn't cleared when you bring up the interface. Is >>>>IF_PROMISC clear in ifconfig after you issue your ifconfig eth0 up >>>>command? >>> >>> >>>ifconfig has never showed PROMISC on either of my bridged servers. >>>ip shows it though. I noticed this before but couldn't find the reason and >>>didn't seem important. >>> >>>This is on a machine with working bridge: >>> >>># ifconfig eth0 >>>eth0 Link encap:Ethernet HWaddr 00:09:6B:49:89:80 >>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 >>> ... 
>>># ip link list eth0 >>>2: eth0: <BROADCAST,MULTICAST,PROMISC,UP> mtu 1500 qdisc pfifo_fast qlen >>>1000 >>> link/ether 00:09:6b:49:89:80 brd ff:ff:ff:ff:ff:ff >>> >>>And this is on the problematic machine: >>> >>># ifconfig eth0 >>>eth0 Link encap:Ethernet HWaddr 00:0D:60:55:3B:02 >>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 >>> .... >>># ip addr list eth0 >>>2: eth0: <BROADCAST,MULTICAST,PROMISC,UP> mtu 1500 qdisc pfifo_fast qlen >>>1000 >>> link/ether 00:0d:60:55:3b:02 brd ff:ff:ff:ff:ff:ff >>> >> >>Strange. My concern was that the tg3 interface has its hardware reset >>whenever its set to be up, and part of that is a resetting of its >>receive mode. If for some reason IFF_PROMISC was cleared after you set >>it using brctl, the interface might be taken out of promisc mode. Do >>you have any iptables rules running that might drop bpdus? > > > No iptables rules az all. Btw iptables wouldn't prevent tcpdump from > seeing the packets, would it? > Could it be that the driver perhaps has a problem setting promisc mode > when resetting the hardware? > Not really sure about this. One experiment is worth a thousand guesses I suppose....... I'll try and let you know. :) > >>>And even if didn't get promisc, tcpdump or tethereal would have made it >>>so, when I was looking for the bpdu packets. Tethereal shows only this: >>> >> >>Thats not entirely true. ethereal/tcpdump/etc use the PF_PACKET >>interface, which talks directly to the driver to set the receive list, >>and doesn't always go through the device layer, which is where >>IFF_PROMISC gets asserted. Agruably it should set promisc, but not >>doing so doesn't prevent ethereal and friends from capturing in >>promiscuous mode. > > > Well, my point was that not even ethereal or tcpdump show the packets, > not even in promisc mode... :) > Good point, hadn't thought about that. 
Although if you start tcpdump on the interface before you issue your brctl/ifconfig commands, you may be limiting what tcpdump will be able to see. It could be that if you are doing things in that order (tcpdump, brctl, ifconfig), you could still be resetting the hardware in such a way that you're still only going to get frames bound for your MAC address and broadcasts. Neil > Greg. -- /*************************************************** *Neil Horman *Software Engineer *Red Hat, Inc. *nhorman@redhat.com *gpg keyid: 1024D / 0x92A74FA1 *http://pgp.mit.edu ***************************************************/ ^ permalink raw reply [flat|nested] 22+ messages in thread
* [Bridge] Re: tg3 bridge problems 2005-01-10 17:40 ` Neil Horman @ 2005-01-10 19:11 ` Gergely Madarasz 2005-01-10 19:41 ` Neil Horman 0 siblings, 1 reply; 22+ messages in thread From: Gergely Madarasz @ 2005-01-10 19:11 UTC (permalink / raw) To: Neil Horman; +Cc: linux-net, bridge On Mon, Jan 10, 2005 at 12:40:57PM -0500, Neil Horman wrote: > Gergely Madarasz wrote: > >On Mon, Jan 10, 2005 at 11:04:55AM -0500, Neil Horman wrote: > > > >>Strange. My concern was that the tg3 interface has its hardware reset > >>whenever its set to be up, and part of that is a resetting of its > >>receive mode. If for some reason IFF_PROMISC was cleared after you set > >>it using brctl, the interface might be taken out of promisc mode. Do > >>you have any iptables rules running that might drop bpdus? > > > > > >No iptables rules az all. Btw iptables wouldn't prevent tcpdump from > >seeing the packets, would it? > >Could it be that the driver perhaps has a problem setting promisc mode > >when resetting the hardware? > > > Not really sure about this. One experiment is worth a thousand guesses > I suppose....... I'll try and let you know. :) I did some other checks, like adding an explicit ifconfig eth0 promisc, then looking at tcpdump output - I didn't see any stray packets like I usually do, just ethernet broadcasts and unicasts to my mac, this also points to a problem that the ethernet interface is actually not in promisc, while the driver thinks it is. And it is probably not a driver-only issue. I've got older machines with tg3 running fine with bridge (with an older tg3 driver), and eth1 on the same machine also runs fine. On another machine I tested today, an IBM x326, the same thing happens - eth0 broken, eth1 fine. Would access to one of these machines help? :) Greg ^ permalink raw reply [flat|nested] 22+ messages in thread
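Editor's sketch: the check described above (only broadcasts and unicasts to the local MAC show up, no stray frames) can be expressed as a small classifier over captured destination addresses. The category names and both function names are hypothetical; the only facts taken as given are the broadcast address, the 01:80:c2:00:00:00 destination seen in the tethereal output earlier in the thread, and the rule that the low bit of the first octet marks a group (multicast) address.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"
STP_BPDU_DST = "01:80:c2:00:00:00"  # destination seen in the STP capture


def classify_dst(dst, own_mac):
    """Classify a destination MAC relative to the capturing interface."""
    dst = dst.lower()
    if dst == own_mac.lower():
        return "unicast-to-me"
    if dst == BROADCAST:
        return "broadcast"
    if dst == STP_BPDU_DST:
        return "stp-bpdu"
    # Low bit of the first octet set means a group (multicast) address.
    if int(dst.split(":")[0], 16) & 1:
        return "multicast"
    return "foreign-unicast"


def looks_promiscuous(dsts, own_mac):
    """True if any captured frame was unicast to somebody else's MAC."""
    return any(classify_dst(d, own_mac) == "foreign-unicast" for d in dsts)


if __name__ == "__main__":
    own = "00:0d:60:55:3b:02"
    seen = ["ff:ff:ff:ff:ff:ff", "00:0D:60:55:3B:02"]
    print(looks_promiscuous(seen, own))
```

Feeding it the destination column of a tcpdump -e capture gives a rough signal: on a hub or a flooding switch port, a truly promiscuous NIC should sooner or later deliver some foreign unicast. A quiet segment can produce a false negative, so absence of stray frames is a hint rather than proof.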
* [Bridge] Re: tg3 bridge problems 2005-01-10 19:11 ` Gergely Madarasz @ 2005-01-10 19:41 ` Neil Horman 2005-01-10 19:49 ` Gergely Madarasz 0 siblings, 1 reply; 22+ messages in thread From: Neil Horman @ 2005-01-10 19:41 UTC (permalink / raw) To: Gergely Madarasz; +Cc: linux-net, bridge Gergely Madarasz wrote: > On Mon, Jan 10, 2005 at 12:40:57PM -0500, Neil Horman wrote: > >>Gergely Madarasz wrote: >> >>>On Mon, Jan 10, 2005 at 11:04:55AM -0500, Neil Horman wrote: >>> >>> >>>>Strange. My concern was that the tg3 interface has its hardware reset >>>>whenever its set to be up, and part of that is a resetting of its >>>>receive mode. If for some reason IFF_PROMISC was cleared after you set >>>>it using brctl, the interface might be taken out of promisc mode. Do >>>>you have any iptables rules running that might drop bpdus? >>> >>> >>>No iptables rules az all. Btw iptables wouldn't prevent tcpdump from >>>seeing the packets, would it? >>>Could it be that the driver perhaps has a problem setting promisc mode >>>when resetting the hardware? >>> >> >>Not really sure about this. One experiment is worth a thousand guesses >>I suppose....... I'll try and let you know. :) > > > I did some other checks, like adding an explicit ifconfig eth0 promisc, > then looking at tcpdump output - I didn't see any stray packets like I > usually do, just ethernet broadcasts and unicasts to my mac, this also > points to a problem that the ethernet interface is actually not in > promisc, while the driver thinks it is. > > And it is probably not a driver-only issue. I've got older machines with > tg3 running fine with bridge (with an older tg3 driver), and eth1 on the > same machine also runs fine. On another machine I tested today, an IBM > x326, the same thing happens - eth0 broken, eth1 fine. Would access to one > of these machines help? :) > > Greg I've got a tg3 card here. I'll try re-create it as soon as I have time. 
Just out of curiosity, are you running tcpdump against the physical interface or the bridged interface? I do recall some issues involving using PF_PACKET on virtual interfaces, that might cause you to not see some packets that the physical interface really does receive. Neil -- /*************************************************** *Neil Horman *Software Engineer *Red Hat, Inc. *nhorman@redhat.com *gpg keyid: 1024D / 0x92A74FA1 *http://pgp.mit.edu ***************************************************/ ^ permalink raw reply [flat|nested] 22+ messages in thread
* [Bridge] Re: tg3 bridge problems 2005-01-10 19:41 ` Neil Horman @ 2005-01-10 19:49 ` Gergely Madarasz 2005-01-10 19:53 ` Neil Horman 2005-01-11 4:06 ` Paul Schulz 0 siblings, 2 replies; 22+ messages in thread From: Gergely Madarasz @ 2005-01-10 19:49 UTC (permalink / raw) To: Neil Horman; +Cc: linux-net, bridge On Mon, Jan 10, 2005 at 02:41:34PM -0500, Neil Horman wrote: > Gergely Madarasz wrote: > >On Mon, Jan 10, 2005 at 12:40:57PM -0500, Neil Horman wrote: > > > >>Gergely Madarasz wrote: > >> > >>>On Mon, Jan 10, 2005 at 11:04:55AM -0500, Neil Horman wrote: > >>> > >>> > >>>>Strange. My concern was that the tg3 interface has its hardware reset > >>>>whenever its set to be up, and part of that is a resetting of its > >>>>receive mode. If for some reason IFF_PROMISC was cleared after you set > >>>>it using brctl, the interface might be taken out of promisc mode. Do > >>>>you have any iptables rules running that might drop bpdus? > >>> > >>> > >>>No iptables rules az all. Btw iptables wouldn't prevent tcpdump from > >>>seeing the packets, would it? > >>>Could it be that the driver perhaps has a problem setting promisc mode > >>>when resetting the hardware? > >>> > >> > >>Not really sure about this. One experiment is worth a thousand guesses > >>I suppose....... I'll try and let you know. :) > > > > > >I did some other checks, like adding an explicit ifconfig eth0 promisc, > >then looking at tcpdump output - I didn't see any stray packets like I > >usually do, just ethernet broadcasts and unicasts to my mac, this also > >points to a problem that the ethernet interface is actually not in > >promisc, while the driver thinks it is. > > > >And it is probably not a driver-only issue. I've got older machines with > >tg3 running fine with bridge (with an older tg3 driver), and eth1 on the > >same machine also runs fine. On another machine I tested today, an IBM > >x326, the same thing happens - eth0 broken, eth1 fine. Would access to one > >of these machines help? 
:) > > > >Greg > I've got a tg3 card here. I'll try re-create it as soon as I have time. Sounds great, but I expect it will not occur with a random tg3 card, explained above... > Just out of curiosity, are you running tcpdump against the physical > interface or the bridged interface? I do recall some issues involving > using PF_PACKET on virtual interfaces, that might cause you to not see > some packets that the physical interface really does receive. The physical interface of course. I wouldn't have a chance seeing the bpdu's on the bridge interface because they're handled by the bridge driver and not passed on :) Greg ^ permalink raw reply [flat|nested] 22+ messages in thread
* [Bridge] Re: tg3 bridge problems 2005-01-10 19:49 ` Gergely Madarasz @ 2005-01-10 19:53 ` Neil Horman 2005-01-10 20:09 ` Gergely Madarasz 2005-01-11 4:06 ` Paul Schulz 1 sibling, 1 reply; 22+ messages in thread From: Neil Horman @ 2005-01-10 19:53 UTC (permalink / raw) To: Gergely Madarasz; +Cc: linux-net, bridge Gergely Madarasz wrote: > On Mon, Jan 10, 2005 at 02:41:34PM -0500, Neil Horman wrote: > >>Gergely Madarasz wrote: >> >>>On Mon, Jan 10, 2005 at 12:40:57PM -0500, Neil Horman wrote: >>> >>> >>>>Gergely Madarasz wrote: >>>> >>>> >>>>>On Mon, Jan 10, 2005 at 11:04:55AM -0500, Neil Horman wrote: >>>>> >>>>> >>>>> >>>>>>Strange. My concern was that the tg3 interface has its hardware reset >>>>>>whenever its set to be up, and part of that is a resetting of its >>>>>>receive mode. If for some reason IFF_PROMISC was cleared after you set >>>>>>it using brctl, the interface might be taken out of promisc mode. Do >>>>>>you have any iptables rules running that might drop bpdus? >>>>> >>>>> >>>>>No iptables rules az all. Btw iptables wouldn't prevent tcpdump from >>>>>seeing the packets, would it? >>>>>Could it be that the driver perhaps has a problem setting promisc mode >>>>>when resetting the hardware? >>>>> >>>> >>>>Not really sure about this. One experiment is worth a thousand guesses >>>>I suppose....... I'll try and let you know. :) >>> >>> >>>I did some other checks, like adding an explicit ifconfig eth0 promisc, >>>then looking at tcpdump output - I didn't see any stray packets like I >>>usually do, just ethernet broadcasts and unicasts to my mac, this also >>>points to a problem that the ethernet interface is actually not in >>>promisc, while the driver thinks it is. >>> >>>And it is probably not a driver-only issue. I've got older machines with >>>tg3 running fine with bridge (with an older tg3 driver), and eth1 on the >>>same machine also runs fine. On another machine I tested today, an IBM >>>x326, the same thing happens - eth0 broken, eth1 fine. 
Would access to one >>>of these machines help? :) >>> >>>Greg >> >>I've got a tg3 card here. I'll try re-create it as soon as I have time. > > > Sounds great, but I expect it will not occur with a random tg3 card, > explained above... > Mmmmm....post your lspci -vvv entry for your broken tg3 card? > >> Just out of curiosity, are you running tcpdump against the physical >>interface or the bridged interface? I do recall some issues involving >>using PF_PACKET on virtual interfaces, that might cause you to not see >>some packets that the physical interface really does receive. > > > The physical interface of course. I wouldn't have a chance seeing the > bpdu's on the bridge interface because they're handled by the bridge > driver and not passed on :) > > Greg Just making sure.....When your car doesn't start, make sure you have gas in it before you replace the engine.... :) Neil -- /*************************************************** *Neil Horman *Software Engineer *Red Hat, Inc. *nhorman@redhat.com *gpg keyid: 1024D / 0x92A74FA1 *http://pgp.mit.edu ***************************************************/ ^ permalink raw reply [flat|nested] 22+ messages in thread
* [Bridge] Re: tg3 bridge problems 2005-01-10 19:53 ` Neil Horman @ 2005-01-10 20:09 ` Gergely Madarasz 2005-01-10 20:43 ` Neil Horman 0 siblings, 1 reply; 22+ messages in thread From: Gergely Madarasz @ 2005-01-10 20:09 UTC (permalink / raw) To: Neil Horman; +Cc: linux-net, bridge On Mon, Jan 10, 2005 at 02:53:10PM -0500, Neil Horman wrote: > Gergely Madarasz wrote: > >On Mon, Jan 10, 2005 at 02:41:34PM -0500, Neil Horman wrote: > > > >>Gergely Madarasz wrote: > >> > >>>On Mon, Jan 10, 2005 at 12:40:57PM -0500, Neil Horman wrote: > >>> > >>> > >>>>Gergely Madarasz wrote: > >>>> > >>>> > >>>>>On Mon, Jan 10, 2005 at 11:04:55AM -0500, Neil Horman wrote: > >>>>> > >>>>> > >>>>> > >>>>>>Strange. My concern was that the tg3 interface has its hardware > >>>>>>reset whenever its set to be up, and part of that is a resetting of > >>>>>>its receive mode. If for some reason IFF_PROMISC was cleared after > >>>>>>you set it using brctl, the interface might be taken out of promisc > >>>>>>mode. Do you have any iptables rules running that might drop bpdus? > >>>>> > >>>>> > >>>>>No iptables rules az all. Btw iptables wouldn't prevent tcpdump from > >>>>>seeing the packets, would it? > >>>>>Could it be that the driver perhaps has a problem setting promisc mode > >>>>>when resetting the hardware? > >>>>> > >>>> > >>>>Not really sure about this. One experiment is worth a thousand guesses > >>>>I suppose....... I'll try and let you know. :) > >>> > >>> > >>>I did some other checks, like adding an explicit ifconfig eth0 promisc, > >>>then looking at tcpdump output - I didn't see any stray packets like I > >>>usually do, just ethernet broadcasts and unicasts to my mac, this also > >>>points to a problem that the ethernet interface is actually not in > >>>promisc, while the driver thinks it is. > >>> > >>>And it is probably not a driver-only issue. 
I've got older machines with > >>>tg3 running fine with bridge (with an older tg3 driver), and eth1 on the > >>>same machine also runs fine. On another machine I tested today, an IBM > >>>x326, the same thing happens - eth0 broken, eth1 fine. Would access to > >>>one > >>>of these machines help? :) > >>> > >>>Greg > >> > >>I've got a tg3 card here. I'll try re-create it as soon as I have time. > > > > > >Sounds great, but I expect it will not occur with a random tg3 card, > >explained above... > > > Mmmmm....post your lspci -vvv entry for your broken tg3 card? on the ibm x346: 0000:05:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5721 Gigabit Ethernet PCI Express (rev 01) Subsystem: IBM: Unknown device 02c6 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0, Cache Line Size: 0x10 (64 bytes) Interrupt: pin A routed to IRQ 16 Region 0: Memory at cfff0000 (64-bit, non-prefetchable) [size=64K] Capabilities: [48] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+) Status: D0 PME-Enable- DSel=0 DScale=1 PME- Capabilities: [50] Vital Product Data Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable- Address: bfa4b67f4b6720bc Data: aadf Capabilities: [d0] #10 [0001] The other one looks the same, just bus 06 instead of 05 and different Region 0 and Address lines. 
Same problem, other machine (ibm x326):

0000:02:01.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)
        Subsystem: IBM: Unknown device 02a6
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 64 (16000ns min), Cache Line Size: 0x10 (64 bytes)
        Interrupt: pin A routed to IRQ 24
        Region 0: Memory at fe010000 (64-bit, non-prefetchable) [size=64K]
        Region 2: Memory at fe000000 (64-bit, non-prefetchable) [size=64K]
        Capabilities: [40]
        Capabilities: [48] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable+ DSel=0 DScale=1 PME-
        Capabilities: [50] Vital Product Data
        Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-
                Address: 3aaf3ffdeb65f8f4 Data: b0db

Greg
* [Bridge] Re: tg3 bridge problems 2005-01-10 20:09 ` Gergely Madarasz @ 2005-01-10 20:43 ` Neil Horman 2005-01-11 9:16 ` Gergely Madarasz 0 siblings, 1 reply; 22+ messages in thread From: Neil Horman @ 2005-01-10 20:43 UTC (permalink / raw) To: Gergely Madarasz; +Cc: linux-net, bridge Gergely Madarasz wrote: > On Mon, Jan 10, 2005 at 02:53:10PM -0500, Neil Horman wrote: > >>Gergely Madarasz wrote: >> >>>On Mon, Jan 10, 2005 at 02:41:34PM -0500, Neil Horman wrote: >>> >>> >>>>Gergely Madarasz wrote: >>>> >>>> >>>>>On Mon, Jan 10, 2005 at 12:40:57PM -0500, Neil Horman wrote: >>>>> >>>>> >>>>> >>>>>>Gergely Madarasz wrote: >>>>>> >>>>>> >>>>>> >>>>>>>On Mon, Jan 10, 2005 at 11:04:55AM -0500, Neil Horman wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>>Strange. My concern was that the tg3 interface has its hardware >>>>>>>>reset whenever its set to be up, and part of that is a resetting of >>>>>>>>its receive mode. If for some reason IFF_PROMISC was cleared after >>>>>>>>you set it using brctl, the interface might be taken out of promisc >>>>>>>>mode. Do you have any iptables rules running that might drop bpdus? >>>>>>> >>>>>>> >>>>>>>No iptables rules az all. Btw iptables wouldn't prevent tcpdump from >>>>>>>seeing the packets, would it? >>>>>>>Could it be that the driver perhaps has a problem setting promisc mode >>>>>>>when resetting the hardware? >>>>>>> >>>>>> >>>>>>Not really sure about this. One experiment is worth a thousand guesses >>>>>>I suppose....... I'll try and let you know. :) >>>>> >>>>> >>>>>I did some other checks, like adding an explicit ifconfig eth0 promisc, >>>>>then looking at tcpdump output - I didn't see any stray packets like I >>>>>usually do, just ethernet broadcasts and unicasts to my mac, this also >>>>>points to a problem that the ethernet interface is actually not in >>>>>promisc, while the driver thinks it is. >>>>> >>>>>And it is probably not a driver-only issue. 
I've got older machines with >>>>>tg3 running fine with bridge (with an older tg3 driver), and eth1 on the >>>>>same machine also runs fine. On another machine I tested today, an IBM >>>>>x326, the same thing happens - eth0 broken, eth1 fine. Would access to >>>>>one >>>>>of these machines help? :) >>>>> >>>>>Greg >>>> >>>>I've got a tg3 card here. I'll try re-create it as soon as I have time. >>> >>> >>>Sounds great, but I expect it will not occur with a random tg3 card, >>>explained above... >>> >> >>Mmmmm....post your lspci -vvv entry for your broken tg3 card? > > > on the ibm x346: > > 0000:05:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5721 Gigabit Ethernet PCI Express (rev 01) > Subsystem: IBM: Unknown device 02c6 > Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- > Latency: 0, Cache Line Size: 0x10 (64 bytes) > Interrupt: pin A routed to IRQ 16 > Region 0: Memory at cfff0000 (64-bit, non-prefetchable) [size=64K] > Capabilities: [48] Power Management version 2 > Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+) > Status: D0 PME-Enable- DSel=0 DScale=1 PME- > Capabilities: [50] Vital Product Data > Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable- > Address: bfa4b67f4b6720bc Data: aadf > Capabilities: [d0] #10 [0001] > > The other one looks the same, just bus 06 instead of 05 and different > Region 0 and Address lines. 
>
> Same problem, other machine (ibm x326):
>
> 0000:02:01.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)
>         Subsystem: IBM: Unknown device 02a6
>         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
>         Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
>         Latency: 64 (16000ns min), Cache Line Size: 0x10 (64 bytes)
>         Interrupt: pin A routed to IRQ 24
>         Region 0: Memory at fe010000 (64-bit, non-prefetchable) [size=64K]
>         Region 2: Memory at fe000000 (64-bit, non-prefetchable) [size=64K]
>         Capabilities: [40]
>         Capabilities: [48] Power Management version 2
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
>                 Status: D0 PME-Enable+ DSel=0 DScale=1 PME-
>         Capabilities: [50] Vital Product Data
>         Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-
>                 Address: 3aaf3ffdeb65f8f4 Data: b0db
>
> Greg

Crud, you're right, I've got a completely different tg3 card. I'll
still try to get a bridge set up to recreate, but it won't tell us much
if the problem doesn't re-create. Can you put a printk in right before
netif_receive_skb in tg3_rx to print out the destination MAC address of
any received frames? We should be able to search for any MACs ==
01-80-C2-00-00-00 and determine for sure whether the hardware is eating
the frame or something else is dropping it prematurely.

Neil

--
/***************************************************
 *Neil Horman
 *Software Engineer
 *Red Hat, Inc.
 *nhorman@redhat.com
 *gpg keyid: 1024D / 0x92A74FA1
 *http://pgp.mit.edu
 ***************************************************/
* [Bridge] Re: tg3 bridge problems 2005-01-10 20:43 ` Neil Horman @ 2005-01-11 9:16 ` Gergely Madarasz 0 siblings, 0 replies; 22+ messages in thread From: Gergely Madarasz @ 2005-01-11 9:16 UTC (permalink / raw) To: Neil Horman; +Cc: linux-net, bridge On Mon, Jan 10, 2005 at 03:43:56PM -0500, Neil Horman wrote: > Gergely Madarasz wrote: > >On Mon, Jan 10, 2005 at 02:53:10PM -0500, Neil Horman wrote: > > > >>Gergely Madarasz wrote: > >> > >>>On Mon, Jan 10, 2005 at 02:41:34PM -0500, Neil Horman wrote: > >>>>Gergely Madarasz wrote: > >>>> > >>>>I've got a tg3 card here. I'll try re-create it as soon as I have > >>>>time. > >>> > >>> > >>>Sounds great, but I expect it will not occur with a random tg3 card, > >>>explained above... > >>> > >> > >>Mmmmm....post your lspci -vvv entry for your broken tg3 card? > > > > > >on the ibm x346: > > > >0000:05:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5721 > >Gigabit Ethernet PCI Express (rev 01) > > ... > > > >The other one looks the same, just bus 06 instead of 05 and different > >Region 0 and Address lines. > > > >Same problem, other machine (ibm x326): > > > >0000:02:01.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 > >Gigabit Ethernet (rev 03) > > ... > > > >Greg > > Crud, you're right, I've got a completely different tg3 card. I'll > still try to get a bridge set up to recreate, but it won't tell us much > if the problem doesn't re-create. Can you put a printk in right before > netif_receive_skb in tg3_rx to print out the source mac address of any > received frames, we should be able to search for any MACs == > 01-80-C2-00-00-00, and determine for sure if the hardware is eating the > frame, or something else is dropping it prematurely. I went for an easier way (couldn't figure printing macs fast enough :), I'm just printing out if there is a packet received in tg3_rx, and running tethereal meanwhile. 
After checking the output for several minutes, I'm sure that every
packet received by tg3_rx is accounted for in tethereal, which means the
incoming BPDUs are being eaten by the hardware...

Greg
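The frame check Neil asked for can be sketched in user-space C. This is only a minimal sketch under stated assumptions: the frame is a raw Ethernet buffer whose first six bytes are the destination address, and spanning-tree BPDUs are sent to the bridge group address 01:80:C2:00:00:00, so it is the destination field that identifies them. The helper name is_stp_bpdu_dest and the ETH_HEADER_LEN constant are hypothetical and are not part of tg3.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Length of an Ethernet header: 6 (dest) + 6 (source) + 2 (type/length). */
#define ETH_HEADER_LEN 14

/* Bridge group address used as the destination MAC of STP BPDUs. */
static const uint8_t stp_group_addr[6] = { 0x01, 0x80, 0xC2, 0x00, 0x00, 0x00 };

/*
 * Hypothetical helper: returns 1 if the frame's destination MAC
 * (bytes 0..5 of the Ethernet header) is the STP group address,
 * 0 otherwise or if the buffer is too short to hold a header.
 */
static int is_stp_bpdu_dest(const uint8_t *frame, size_t len)
{
	assert(frame != NULL);

	if (len < ETH_HEADER_LEN)
		return 0;

	return memcmp(frame, stp_group_addr, sizeof(stp_group_addr)) == 0;
}
```

A check of this shape, called on each received buffer, would show whether any BPDU reaches the driver at all; in the kernel the same comparison would be made on the skb data with printk instead.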
* Re: [Bridge] Re: tg3 bridge problems 2005-01-10 19:49 ` Gergely Madarasz 2005-01-10 19:53 ` Neil Horman @ 2005-01-11 4:06 ` Paul Schulz 2005-01-11 8:52 ` Gergely Madarasz 1 sibling, 1 reply; 22+ messages in thread From: Paul Schulz @ 2005-01-11 4:06 UTC (permalink / raw) To: Gergely Madarasz; +Cc: linux-net, Neil Horman, bridge Greetings, This may be the problem that I have seen (and reported) previously... http://oss.sgi.com/projects/netdev/archive/2004-02/msg00442.html One suggestion.. do a packet dump on an outgoing bridge port, and a dump from a transmitting machine connected to the bridge. Compare the MD5 checksums. Regards, Paul On Mon, 2005-01-10 at 20:49 +0100, Gergely Madarasz wrote: > On Mon, Jan 10, 2005 at 02:41:34PM -0500, Neil Horman wrote: > > Gergely Madarasz wrote: > > >On Mon, Jan 10, 2005 at 12:40:57PM -0500, Neil Horman wrote: > > > > > >>Gergely Madarasz wrote: > > >> > > >>>On Mon, Jan 10, 2005 at 11:04:55AM -0500, Neil Horman wrote: > > >>> > > >>> > > >>>>Strange. My concern was that the tg3 interface has its hardware reset > > >>>>whenever its set to be up, and part of that is a resetting of its > > >>>>receive mode. If for some reason IFF_PROMISC was cleared after you set > > >>>>it using brctl, the interface might be taken out of promisc mode. Do > > >>>>you have any iptables rules running that might drop bpdus? > > >And it is probably not a driver-only issue. I've got older machines with > > >tg3 running fine with bridge (with an older tg3 driver), and eth1 on the > > >same machine also runs fine. On another machine I tested today, an IBM > > >x326, the same thing happens - eth0 broken, eth1 fine. Would access to one > > >of these machines help? :) > > > > > >Greg > > I've got a tg3 card here. I'll try re-create it as soon as I have time. > > Sounds great, but I expect it will not occur with a random tg3 card, > explained above... ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Bridge] Re: tg3 bridge problems 2005-01-11 4:06 ` Paul Schulz @ 2005-01-11 8:52 ` Gergely Madarasz 2005-01-11 12:36 ` Neil Horman 0 siblings, 1 reply; 22+ messages in thread From: Gergely Madarasz @ 2005-01-11 8:52 UTC (permalink / raw) To: Paul Schulz; +Cc: linux-net, Neil Horman, bridge On Tue, Jan 11, 2005 at 02:36:46PM +1030, Paul Schulz wrote: > Greetings, > > This may be the problem that I have seen (and reported) previously... > http://oss.sgi.com/projects/netdev/archive/2004-02/msg00442.html > > One suggestion.. do a packet dump on an outgoing bridge port, and a > dump from a transmitting machine connected to the bridge. Compare the > MD5 checksums. Thanks for the idea, but it doesn't seem to help. I've modified the patch to apply to my chipset revision (and added a debugging printk to make sure I've hit the right chipset :)), but nothing really changed. I didn't expect it to either, because this is not a transmit problem, but a promiscuous receive problem of the driver/card. Greg ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Bridge] Re: tg3 bridge problems
  2005-01-11 8:52 ` Gergely Madarasz
@ 2005-01-11 12:36 ` Neil Horman
  2005-01-11 12:58 ` Gergely Madarasz
  0 siblings, 1 reply; 22+ messages in thread
From: Neil Horman @ 2005-01-11 12:36 UTC (permalink / raw)
  To: Gergely Madarasz; +Cc: linux-net, bridge

Gergely Madarasz wrote:
> On Tue, Jan 11, 2005 at 02:36:46PM +1030, Paul Schulz wrote:
>
>>Greetings,
>>
>>This may be the problem that I have seen (and reported) previously...
>>http://oss.sgi.com/projects/netdev/archive/2004-02/msg00442.html
>>
>>One suggestion.. do a packet dump on an outgoing bridge port, and a
>>dump from a transmitting machine connected to the bridge. Compare the
>>MD5 checksums.
>
>
> Thanks for the idea, but it doesn't seem to help. I've modified the patch
> to apply to my chipset revision (and added a debugging printk to make sure
> I've hit the right chipset :)), but nothing really changed. I didn't
> expect it to either, because this is not a transmit problem, but a
> promiscuous receive problem of the driver/card.
>
> Greg
>
You know, there is a tg3_dump_state function that is #if 0'ed out at the
moment, which among other things dumps out the chip's RX_MODE. You could
re-enable that function and tie it to a private ioctl which you could
call from user space. That way you could compare the RX_MODE values in
a working and a failing environment. If they matched, you could be
reasonably sure it was a hardware issue; otherwise, you would know you're
looking for a driver bug.

Neil

--
/***************************************************
 *Neil Horman
 *Software Engineer
 *Red Hat, Inc.
 *nhorman@redhat.com
 *gpg keyid: 1024D / 0x92A74FA1
 *http://pgp.mit.edu
 ***************************************************/
* Re: [Bridge] Re: tg3 bridge problems 2005-01-11 12:36 ` Neil Horman @ 2005-01-11 12:58 ` Gergely Madarasz 2005-01-11 13:12 ` Neil Horman 0 siblings, 1 reply; 22+ messages in thread From: Gergely Madarasz @ 2005-01-11 12:58 UTC (permalink / raw) To: Neil Horman; +Cc: linux-net, bridge On Tue, Jan 11, 2005 at 07:36:56AM -0500, Neil Horman wrote: > Gergely Madarasz wrote: > >On Tue, Jan 11, 2005 at 02:36:46PM +1030, Paul Schulz wrote: > > > >>Greetings, > >> > >>This may be the problem that I have seen (and reported) previously... > >>http://oss.sgi.com/projects/netdev/archive/2004-02/msg00442.html > >> > >>One suggestion.. do a packet dump on an outgoing bridge port, and a > >>dump from a transmitting machine connected to the bridge. Compare the > >>MD5 checksums. > > > > > >Thanks for the idea, but it doesn't seem to help. I've modified the patch > >to apply to my chipset revision (and added a debugging printk to make sure > >I've hit the right chipset :)), but nothing really changed. I didn't > >expect it to either, because this is not a transmit problem, but a > >promiscuous receive problem of the driver/card. > > > >Greg > > > You know, there is a tg3_dump_state function that if 0-ed out at the > moment, which among other things dumps out the chips RX_MODE. You could > uncomment that function and tie it to a private ioctl which you could > call from user space. That way you could compare the RX_MODE values in > a working and a failing environment. If they matched, you could be > reasonably sure it was a hardware issue, otherwise, you would know your > looking for a driver bug. It seems they do not match: failing: MAC_RX_MODE[00000002] working: MAC_RX_MODE[00000102] So this would point to a driver bug. To search for that, I added a printk at each write to MAC_RX_MODE to see what is being set up. Every call was fine, the last always being 0x102. Would it be possible that the buggy hardware itself resets this register after a link change or something? 
The following workaround patch made the problem disappear:

--- tg3.c~	2005-01-11 12:30:21.000000000 +0100
+++ tg3.c	2005-01-11 12:30:21.000000000 +0100
@@ -2803,6 +2803,8 @@
 				sblk->status = SD_STATUS_UPDATED |
 					(sblk->status & ~SD_STATUS_LINK_CHG);
 				tg3_setup_phy(tp, 0);
+				tw32_f(MAC_RX_MODE, tp->rx_mode);
+				udelay(10);
 			}
 		}


So if I reset the rx_mode after the card has reported a link change,
promisc works fine. This workaround works on both machines, one having
rev 4001 cards, the other having rev 2003's.

Greg
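The two register dumps in this message differ in exactly one bit, which can be checked with a small stand-alone C sketch. The bit meanings are assumptions: 0x100 is taken to be the promiscuous bit because that is the only difference between the failing and working values reported here, and 0x002 is taken to be the receiver-enable bit as I recall it from tg3.h. The helper names rx_mode_diff, rx_mode_is_promisc and rx_mode_describe are hypothetical and do not exist in the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed bit values; see the note above about where they come from. */
#define RX_MODE_ENABLE  0x00000002u
#define RX_MODE_PROMISC 0x00000100u

/* Returns the bits that differ between two MAC_RX_MODE dumps. */
static uint32_t rx_mode_diff(uint32_t failing, uint32_t working)
{
	return failing ^ working;
}

/* Returns nonzero when the promiscuous bit is set in a dump. */
static int rx_mode_is_promisc(uint32_t rx_mode)
{
	return (rx_mode & RX_MODE_PROMISC) != 0;
}

/*
 * Formats a dump in the same style as the driver output quoted in the
 * thread, followed by the names of the two bits this sketch knows about.
 * Returns the snprintf() result (length of the full string).
 */
static int rx_mode_describe(uint32_t rx_mode, char *buf, size_t bufsz)
{
	assert(buf != NULL && bufsz > 0);

	return snprintf(buf, bufsz, "MAC_RX_MODE[%08x]%s%s",
			(unsigned int)rx_mode,
			(rx_mode & RX_MODE_ENABLE) ? " ENABLE" : "",
			(rx_mode & RX_MODE_PROMISC) ? " PROMISC" : "");
}
```

Calling rx_mode_diff(0x00000002, 0x00000102) yields 0x00000100, so under these assumptions the failing case is a receiver that is enabled but no longer promiscuous.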
* Re: [Bridge] Re: tg3 bridge problems 2005-01-11 12:58 ` Gergely Madarasz @ 2005-01-11 13:12 ` Neil Horman 2005-01-11 14:07 ` Gergely Madarasz 0 siblings, 1 reply; 22+ messages in thread From: Neil Horman @ 2005-01-11 13:12 UTC (permalink / raw) To: Gergely Madarasz; +Cc: linux-net, bridge Gergely Madarasz wrote: > On Tue, Jan 11, 2005 at 07:36:56AM -0500, Neil Horman wrote: > >>Gergely Madarasz wrote: >> >>>On Tue, Jan 11, 2005 at 02:36:46PM +1030, Paul Schulz wrote: >>> >>> >>>>Greetings, >>>> >>>>This may be the problem that I have seen (and reported) previously... >>>>http://oss.sgi.com/projects/netdev/archive/2004-02/msg00442.html >>>> >>>>One suggestion.. do a packet dump on an outgoing bridge port, and a >>>>dump from a transmitting machine connected to the bridge. Compare the >>>>MD5 checksums. >>> >>> >>>Thanks for the idea, but it doesn't seem to help. I've modified the patch >>>to apply to my chipset revision (and added a debugging printk to make sure >>>I've hit the right chipset :)), but nothing really changed. I didn't >>>expect it to either, because this is not a transmit problem, but a >>>promiscuous receive problem of the driver/card. >>> >>>Greg >>> >> >>You know, there is a tg3_dump_state function that if 0-ed out at the >>moment, which among other things dumps out the chips RX_MODE. You could >> uncomment that function and tie it to a private ioctl which you could >>call from user space. That way you could compare the RX_MODE values in >>a working and a failing environment. If they matched, you could be >>reasonably sure it was a hardware issue, otherwise, you would know your >>looking for a driver bug. > > > It seems they do not match: > failing: MAC_RX_MODE[00000002] > working: MAC_RX_MODE[00000102] > > So this would point to a driver bug. To search for that, I added a printk > at each write to MAC_RX_MODE to see what is being set up. Every call was > fine, the last always being 0x102. 
Would it be possible that the buggy > hardware itself resets this register after a link change or something? > > The following workaround patch made the problem disappear: > > --- tg3.c~ 2005-01-11 12:30:21.000000000 +0100 > +++ tg3.c 2005-01-11 12:30:21.000000000 +0100 > @@ -2803,6 +2803,8 @@ > sblk->status = SD_STATUS_UPDATED | > (sblk->status & ~SD_STATUS_LINK_CHG); > tg3_setup_phy(tp, 0); > + tw32_f(MAC_RX_MODE, tp->rx_mode); > + udelay(10); > } > } > > > So if I reset the rx_mode after the card has reported a link change, > promisc works fine. This workaround works on both machines, one having > rev 4001 cards, the other having rev 2003's. > > Greg I do believe that tg3 driven chips reset the promisc. bit on chip reset, so it may be possible that you have found a driver bug in which the appropriate promiscuous state isn't restored after a reset. Try adding a printk to tg3_reset to see if it gets called after you follow your non-working procedure, and check to see if the promisc bit in MAC_RX_MODE gets lost. If so, I'd say thats arguably your bug. Neil -- /*************************************************** *Neil Horman *Software Engineer *Red Hat, Inc. *nhorman@redhat.com *gpg keyid: 1024D / 0x92A74FA1 *http://pgp.mit.edu ***************************************************/ ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Bridge] Re: tg3 bridge problems 2005-01-11 13:12 ` Neil Horman @ 2005-01-11 14:07 ` Gergely Madarasz 2005-01-11 20:19 ` dave 0 siblings, 1 reply; 22+ messages in thread From: Gergely Madarasz @ 2005-01-11 14:07 UTC (permalink / raw) To: Neil Horman; +Cc: linux-net, bridge On Tue, Jan 11, 2005 at 08:12:11AM -0500, Neil Horman wrote: > Gergely Madarasz wrote: > >On Tue, Jan 11, 2005 at 07:36:56AM -0500, Neil Horman wrote: > > > >>You know, there is a tg3_dump_state function that if 0-ed out at the > >>moment, which among other things dumps out the chips RX_MODE. You could > >>uncomment that function and tie it to a private ioctl which you could > >>call from user space. That way you could compare the RX_MODE values in > >>a working and a failing environment. If they matched, you could be > >>reasonably sure it was a hardware issue, otherwise, you would know your > >>looking for a driver bug. > > > > > >It seems they do not match: > >failing: MAC_RX_MODE[00000002] > >working: MAC_RX_MODE[00000102] > > > >So this would point to a driver bug. To search for that, I added a printk > >at each write to MAC_RX_MODE to see what is being set up. Every call was > >fine, the last always being 0x102. Would it be possible that the buggy > >hardware itself resets this register after a link change or something? > > > >The following workaround patch made the problem disappear: > > > >--- tg3.c~ 2005-01-11 12:30:21.000000000 +0100 > >+++ tg3.c 2005-01-11 12:30:21.000000000 +0100 > >@@ -2803,6 +2803,8 @@ > > sblk->status = SD_STATUS_UPDATED | > > (sblk->status & ~SD_STATUS_LINK_CHG); > > tg3_setup_phy(tp, 0); > >+ tw32_f(MAC_RX_MODE, tp->rx_mode); > >+ udelay(10); > > } > > } > > > > > >So if I reset the rx_mode after the card has reported a link change, > >promisc works fine. This workaround works on both machines, one having > >rev 4001 cards, the other having rev 2003's. > > > >Greg > > I do believe that tg3 driven chips reset the promisc. 
bit on chip reset,
> so it may be possible that you have found a driver bug in which the
> appropriate promiscuous state isn't restored after a reset. Try adding
> a printk to tg3_reset to see if it gets called after you follow your
> non-working procedure, and check to see if the promisc bit in
> MAC_RX_MODE gets lost. If so, I'd say thats arguably your bug.

I have now added quite a lot of debugging printk's to tg3.c.
Here is what I see:

eth2: tg3.c(tg3_reset_hw,4946) MAC_RX_MODE: 0006
eth2: tg3.c(tg3_chip_reset,3786) MAC_RX_MODE: 0006
eth2: tg3.c(tg3_chip_reset,3948) MAC_RX_MODE: 0000
eth2: tg3.c(tg3_reset_hw,5413) MAC_RX_MODE: 0002
eth2: tg3.c(tg3_reset_hw,5436) MAC_RX_MODE: 0002
eth2: tg3.c(tg3_setup_phy,2434) MAC_RX_MODE: 0002
eth2: tg3.c(tg3_phy_reset,810) MAC_RX_MODE: 0002
eth2: tg3.c(tg3_phy_reset,868) MAC_RX_MODE: 0002
eth2: tg3.c(tg3_setup_phy,2464) MAC_RX_MODE: 0002
eth2: tg3.c(__tg3_set_rx_mode,6320) MAC_RX_MODE: 0102
eth2: tg3.c(tg3_reset_hw,5530) MAC_RX_MODE: 0102
eth2: tg3.c(tg3_poll,2816) MAC_RX_MODE: 0002

that is the promisc bit is lost after leaving tg3_reset_hw, and before
entering tg3_poll.

Greg
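The reading of the trace can be reproduced mechanically. The following is a minimal sketch, assuming 0x100 is the promiscuous bit (as the surrounding messages imply) and with the samples transcribed by hand from the printk output; struct rx_mode_sample and first_promisc_drop are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed promiscuous bit in MAC_RX_MODE. */
#define RX_MODE_PROMISC 0x00000100u

/* One debugging printk: where it fired and the register value it saw. */
struct rx_mode_sample {
	const char *func;
	int line;
	uint32_t value;
};

/* Samples transcribed from the trace in the message above. */
static const struct rx_mode_sample trace[] = {
	{ "tg3_reset_hw",      4946, 0x0006 },
	{ "tg3_chip_reset",    3786, 0x0006 },
	{ "tg3_chip_reset",    3948, 0x0000 },
	{ "tg3_reset_hw",      5413, 0x0002 },
	{ "tg3_reset_hw",      5436, 0x0002 },
	{ "tg3_setup_phy",     2434, 0x0002 },
	{ "tg3_phy_reset",      810, 0x0002 },
	{ "tg3_phy_reset",      868, 0x0002 },
	{ "tg3_setup_phy",     2464, 0x0002 },
	{ "__tg3_set_rx_mode", 6320, 0x0102 },
	{ "tg3_reset_hw",      5530, 0x0102 },
	{ "tg3_poll",          2816, 0x0002 },
};

/*
 * Returns the index of the first sample in which the promiscuous bit is
 * clear although it was set in the previous sample, or -1 if the bit is
 * never dropped.
 */
static int first_promisc_drop(const struct rx_mode_sample *s, size_t n)
{
	assert(s != NULL);

	for (size_t i = 1; i < n; i++) {
		if ((s[i - 1].value & RX_MODE_PROMISC) &&
		    !(s[i].value & RX_MODE_PROMISC))
			return (int)i;
	}
	return -1;
}
```

Run over the twelve samples, the helper points at the last entry (tg3_poll, line 2816), whose predecessor is the final sample inside tg3_reset_hw (line 5530). None of the logged driver writes falls between those two points, which is consistent with the suspicion that something other than the driver clears the bit.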
* Re: [Bridge] Re: tg3 bridge problems 2005-01-11 14:07 ` Gergely Madarasz @ 2005-01-11 20:19 ` dave 2005-01-12 9:21 ` Gergely Madarasz 0 siblings, 1 reply; 22+ messages in thread From: dave @ 2005-01-11 20:19 UTC (permalink / raw) To: gorgo; +Cc: linux-net, nhorman, bridge Greg, My understanding was that the x346 provides remote management support through an integrated IPMI system management processor (according to the x346 data sheet). I also know that the Broadcom controllers provide IPMI management support through firmware which runs on the controller. Perhaps the problem is that the IPMI firmware is resetting promiscuous mode on the MAC underneath the driver after the hardware reset occurs. Does the x346 provide any mechanisms to disable the IPMI system management processor (perhaps a BIOS setup option)? Does the problem go away when you do so? Dave > On Tue, Jan 11, 2005 at 08:12:11AM -0500, Neil Horman wrote: >> Gergely Madarasz wrote: >> >On Tue, Jan 11, 2005 at 07:36:56AM -0500, Neil Horman wrote: >> > >> >>You know, there is a tg3_dump_state function that if 0-ed out at the >> moment, which among other things dumps out the chips RX_MODE. You >> could uncomment that function and tie it to a private ioctl which >> you could call from user space. That way you could compare the >> RX_MODE values in a working and a failing environment. If they >> matched, you could be reasonably sure it was a hardware issue, >> otherwise, you would know your looking for a driver bug. >> > >> > >> >It seems they do not match: >> >failing: MAC_RX_MODE[00000002] >> >working: MAC_RX_MODE[00000102] >> > >> >So this would point to a driver bug. To search for that, I added a >> printk at each write to MAC_RX_MODE to see what is being set up. >> Every call was fine, the last always being 0x102. Would it be >> possible that the buggy hardware itself resets this register after a >> link change or something? 
>> > >> >The following workaround patch made the problem disappear: >> > >> >--- tg3.c~ 2005-01-11 12:30:21.000000000 +0100 >> >+++ tg3.c 2005-01-11 12:30:21.000000000 +0100 >> >@@ -2803,6 +2803,8 @@ >> > sblk->status = SD_STATUS_UPDATED | >> > (sblk->status & ~SD_STATUS_LINK_CHG); >> > tg3_setup_phy(tp, 0); >> >+ tw32_f(MAC_RX_MODE, tp->rx_mode); >> >+ udelay(10); >> > } >> > } >> > >> > >> >So if I reset the rx_mode after the card has reported a link change, >> promisc works fine. This workaround works on both machines, one >> having rev 4001 cards, the other having rev 2003's. >> > >> >Greg >> >> I do believe that tg3 driven chips reset the promisc. bit on chip >> reset, so it may be possible that you have found a driver bug in >> which the appropriate promiscuous state isn't restored after a reset. >> Try adding a printk to tg3_reset to see if it gets called after you >> follow your non-working procedure, and check to see if the promisc >> bit in >> MAC_RX_MODE gets lost. If so, I'd say thats arguably your bug. > > I have now added quite a lot of debugging printk's to tg3.c. > Here is what I see: > > eth2: tg3.c(tg3_reset_hw,4946) MAC_RX_MODE: 0006 > eth2: tg3.c(tg3_chip_reset,3786) MAC_RX_MODE: 0006 > eth2: tg3.c(tg3_chip_reset,3948) MAC_RX_MODE: 0000 > eth2: tg3.c(tg3_reset_hw,5413) MAC_RX_MODE: 0002 > eth2: tg3.c(tg3_reset_hw,5436) MAC_RX_MODE: 0002 > eth2: tg3.c(tg3_setup_phy,2434) MAC_RX_MODE: 0002 > eth2: tg3.c(tg3_phy_reset,810) MAC_RX_MODE: 0002 > eth2: tg3.c(tg3_phy_reset,868) MAC_RX_MODE: 0002 > eth2: tg3.c(tg3_setup_phy,2464) MAC_RX_MODE: 0002 > eth2: tg3.c(__tg3_set_rx_mode,6320) MAC_RX_MODE: 0102 > eth2: tg3.c(tg3_reset_hw,5530) MAC_RX_MODE: 0102 > eth2: tg3.c(tg3_poll,2816) MAC_RX_MODE: 0002 > > that is the promisc bit is lost after leaving tg3_reset_hw, and before > entering tg3_poll. > > Greg ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Bridge] Re: tg3 bridge problems
  2005-01-11 20:19 ` dave
@ 2005-01-12 9:21 ` Gergely Madarasz
  0 siblings, 0 replies; 22+ messages in thread
From: Gergely Madarasz @ 2005-01-12 9:21 UTC (permalink / raw)
  To: dave; +Cc: linux-net, nhorman, bridge

On Tue, Jan 11, 2005 at 12:19:12PM -0800, dave@randomparity.com wrote:
> Greg,
>
> My understanding was that the x346 provides remote management support
> through an integrated IPMI system management processor (according to the
> x346 data sheet). I also know that the Broadcom controllers provide IPMI
> management support through firmware which runs on the controller. Perhaps
> the problem is that the IPMI firmware is resetting promiscuous mode on the
> MAC underneath the driver after the hardware reset occurs.

Sounds possible, but why would it do that?

> Does the x346 provide any mechanisms to disable the IPMI system management
> processor (perhaps a BIOS setup option)? Does the problem go away when
> you do so?

No, unfortunately I didn't find any way to turn it off.

Greg
* Re: [Bridge] Re: tg3 bridge problems
  2005-01-10 15:45 ` Gergely Madarasz
  2005-01-10 16:04 ` Neil Horman
@ 2005-01-10 19:34 ` Stephen Hemminger
  1 sibling, 0 replies; 22+ messages in thread
From: Stephen Hemminger @ 2005-01-10 19:34 UTC (permalink / raw)
  To: Gergely Madarasz; +Cc: linux-net, Neil Horman, bridge

On Mon, 10 Jan 2005 16:45:06 +0100
Gergely Madarasz <gorgo@broadband.hu> wrote:

> On Mon, Jan 10, 2005 at 10:05:56AM -0500, Neil Horman wrote:
> > Gergely Madarasz wrote:
> > >Hello,
> > >
> > >I've got a very strange problem. Lately I've been setting up my linux
> > >servers for network (layer2) redundancy with a bridge interface containing
> > >two ethernet interfaces connecting to two switches. So far I didn't have
> > >any problems with it, but now a very strange thing happens with a new
> > >server I'm installing. The server is an ibm x346 having two onboard
> > >BCM5721 cards, the switches are cisco 3550, and I've tested with kernel
> > >versions 2.6.10 and 2.4.28.

Could it be the switch? Some switches require the port to be configured
in monitor mode (otherwise they do MAC filtering).

I have a machine with:

03:02.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5703X Gigabit Ethernet (rev 02)
        Subsystem: IBM: Unknown device 026f
        Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 28
        Memory at c0200000 (64-bit, non-prefetchable) [size=64K]
        Capabilities: [40] PCI-X non-bridge device.
        Capabilities: [48] Power Management version 2
        Capabilities: [50] Vital Product Data
        Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable

but haven't tried bridging with it.