netdev.vger.kernel.org archive mirror
* VLAN and ARP failure on tg3 drivers
From: Gertjan Hofman @ 2009-10-23  4:52 UTC
  To: netdev

Dear Kernel developers,

A couple of weeks ago we tried to migrate from a 2.6.24 kernel to a 2.6.29 kernel and noticed that our VLAN application no longer works. The problem is easy to replicate:

1. connect 2 PCs with a cross-over cable
2. assign a fixed IP address to each PC (say 192.168.0.[1,2])
3. create a VLAN:  vconfig add eth0 0
4. set IP addresses on the VLAN devices (say 192.168.1.[1,2])
5. try to ping one machine from the other.
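
The steps above can be sketched as a small dry-run script. It only prints the commands for one end of the link, so it is safe to run without root. The interface name eth0, the /24 netmasks, the device name eth0.<vid>, and the use of iproute2 in place of vconfig are assumptions made for illustration; repro_cmds is a hypothetical helper, not part of any tool.

```shell
#!/bin/sh
# Print (do not execute) the reproduction commands for one end of the link.
#   $1: host number, 1 or 2 (used as the last octet of both addresses)
#   $2: VLAN ID to test (the report used 0)
# Assumed for illustration: interface eth0, /24 netmasks, iproute2 syntax.
repro_cmds() {
    host=$1
    vid=$2
    dev="eth0.$vid"
    peer=$(( 3 - host ))    # host 1 pings host 2 and vice versa

    echo "ip addr add 192.168.0.$host/24 dev eth0"
    echo "ip link set eth0 up"
    echo "ip link add link eth0 name $dev type vlan id $vid"
    echo "ip addr add 192.168.1.$host/24 dev $dev"
    echo "ip link set $dev up"
    echo "ping -c 3 192.168.1.$peer"
}
```

Calling `repro_cmds 1 0` on one machine and `repro_cmds 2 0` on the other lists the commands for the failing case; passing a different VLAN ID as the second argument gives a comparison run.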

I tried to dig into the problem by using unpatched kernel.org kernels with Ubuntu .config files. Kernels up to 2.6.26 work fine; kernels from 2.6.27 onwards fail. The problem is that ARP messages are being dropped. If the ARP table is updated by hand on each machine, communication across the VLAN works fine.

At first I thought the kernel VLAN code was the problem (we had an earlier issue with a regression in 2.6.24), but it looks like the problem is actually with the tg3 driver. Our system uses Broadcom Ethernet chips. I tried the same experiments with combinations of boards that have Broadcom and non-Broadcom chips, and the only time I see it fail is with the tg3 driver loaded.

Snooping with Wireshark shows that an ARP request from the non-Broadcom machine is seen and even answered, but the reply never appears back on the network. If the Broadcom machine originates the ARP message, it never arrives at the destination. I tried lowering the MTU to 1492 as well as giving each VLAN device a different MAC address. No deal.

I tried to look at the tg3 changes from 2.6.26 to 2.6.27, but I am not familiar enough with Git to extract the appropriate changes. I am a bit surprised that I am not seeing any references to this on the web; the combination of >2.6.27 kernels, Broadcom and VLAN can't be that uncommon.

I would be happy to provide more information and to run tests if anyone can suggest them.

Sincerely,

Gertjan







      

* Re: VLAN and ARP failure on tg3 drivers
From: Eric Dumazet @ 2009-10-23  5:23 UTC
  To: Gertjan Hofman; +Cc: netdev

Hello Gertjan

I'll take a look at this problem and try to reproduce it, but I use VLAN + tg3 +
bonding and have not noticed a regression so far.

The only difference is that I use the "ip link add link" command to set up VLANs,
not vconfig, which is a bit deprecated.

Could you try something like this setup:


ip link set eth1 up

ip link add link eth1 vlan.103 type vlan id 103
ip addr add 192.168.20.110/24 dev vlan.103
ip link set vlan.103 up



* Re: VLAN and ARP failure on tg3 drivers
From: Benny Amorsen @ 2009-10-23  9:12 UTC
  To: Gertjan Hofman; +Cc: netdev

Gertjan Hofman <gertjan_hofman@yahoo.com> writes:

> Dear Kernel developers,
>
> A couple of weeks ago we tried to migrate from a 2.6.24  kernel to a
> 2.6.29 kernel and noticed our VLAN application no longer works.  The
> problem is easy to replicate:
>
> 1. connect 2 PC's with a cross-over cable
> 2. set up a fixed IP address to both PC's  (say 192.168.0.[1,2])
> 3. create a vlan:  vconfig  add eth0 0.

VLAN 0 is a special case ("priority tag only"). If you're just trying to
use VLANs, pick a different number. Also avoid 1001-1024 and 4095 if
you add switches to the mix, because some vendors have odd ideas.
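
As a rough illustration of the point above, the sketch below splits a 16-bit 802.1Q tag control field into its priority, DEI/CFI and VLAN ID parts and classifies the ID. The function names are hypothetical. The vendor-specific range mentioned above (1001-1024) is deliberately not encoded, since it is not part of the tag format itself.

```shell
#!/bin/sh
# Classify a 12-bit 802.1Q VLAN ID.
#   0       -> priority tag only (carries priority, no VLAN membership)
#   4095    -> reserved
#   1..4094 -> ordinary VLAN IDs
vlan_id_kind() {
    vid=$1
    case $vid in
        ''|*[!0-9]*) echo "invalid"; return 1 ;;
    esac
    if [ "$vid" -eq 0 ]; then
        echo "priority-tag"
    elif [ "$vid" -eq 4095 ]; then
        echo "reserved"
    elif [ "$vid" -le 4094 ]; then
        echo "vlan"
    else
        echo "invalid"; return 1
    fi
}

# Split a 16-bit TCI into priority (3 bits), DEI/CFI (1 bit), VLAN ID (12 bits).
# Accepts decimal or 0x-prefixed hexadecimal input.
tci_decode() {
    tci=$(( $1 ))
    echo "pcp=$(( (tci >> 13) & 7 )) dei=$(( (tci >> 12) & 1 )) vid=$(( tci & 0x0FFF ))"
}
```

For example, `vlan_id_kind 0` reports the priority-tag case, while `vlan_id_kind 103` reports an ordinary VLAN.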


/Benny


* Re: VLAN and ARP failure on tg3 drivers
From: Matt Carlson @ 2009-10-23 21:35 UTC
  To: Gertjan Hofman; +Cc: netdev@vger.kernel.org

I don't see any reason why your setup should fail, but it doesn't hurt
to gather more info about the problem.

What device are you experiencing this problem with?  Is management
firmware enabled?  (`ethtool -i ethx`)


* Re: VLAN and ARP failure on tg3 drivers
From: Gertjan Hofman @ 2009-10-26  4:30 UTC
  To: Matt Carlson; +Cc: netdev@vger.kernel.org, Eric Dumazet, Benny Amorsen

Dear Matt, Eric, Benny,

Sorry about the slow response to your fast replies. I think Benny is correct: the 'problem' lies in the fact that we were using a VLAN ID of 0 without knowing its special significance. User error.

I tested it with other VLAN IDs (>0) and it appears to work fine. We are not entirely sure we understand why it used to work with VLAN ID 0 on the Broadcom chips and still does with a number of different cards (with >2.6.27 kernels). What is the 'correct' behaviour for this incorrect usage? When a PC returns the ARP response to the machine with the Broadcom card, it will have the destination address of the VLAN device, but presumably the VLAN ID tag set to zero. Hmmm. I can live with not knowing the answer, I guess.


Thanks again,

Gertjan



 

* Re: VLAN and ARP failure on tg3 drivers
From: Benny Amorsen @ 2009-10-26  8:20 UTC
  To: Gertjan Hofman; +Cc: Matt Carlson, netdev@vger.kernel.org, Eric Dumazet

Gertjan Hofman <gertjan_hofman@yahoo.com> writes:

> Dear Matt, Eric, Benny,
>
> Sorry about the slow response to your fast replies. I think Benny is
> correct, the 'problem' lies in the fact that we were using a VLAN ID
> of 0, without knowing its special significance. User error.
>
> I tested it with other VLAN id's (>0) and it appears to work fine. We
> are not entirely sure we understand  why it used to work with VLAN ID
> 0 on the Broadcom chips and still does with a number of different
> cards (with >2.6.27 kernels).  What is the 'correct' behaviour for
> this incorrect usage ?

VLAN 0 isn't incorrect, it's just surprising. When you send a packet
tagged with VLAN 0, it means that the packet should be interpreted as
being the same VLAN as a completely untagged packet.

So in theory, if both ends are using VLAN 0 and you aren't using eth0
for anything, traffic should flow, at least if both ends are on the same
kernel version. Feel free to debug why that isn't the case for you, of
course...


/Benny


* Re: VLAN and ARP failure on tg3 drivers
From: Eric Dumazet @ 2009-10-26  8:54 UTC
  To: Benny Amorsen; +Cc: Gertjan Hofman, Matt Carlson, netdev@vger.kernel.org

Benny Amorsen wrote:
> Gertjan Hofman <gertjan_hofman@yahoo.com> writes:
> 
>> Dear Matt, Eric, Benny,
>>
>> Sorry about the slow response to your fast replies. I think Benny is
>> correct, the 'problem' lies in the fact that we were using a VLAN ID
>> of 0, without knowing its special significance. User error.
>>
>> I tested it with other VLAN id's (>0) and it appears to work fine. We
>> are not entirely sure we understand  why it used to work with VLAN ID
>> 0 on the Broadcom chips and still does with a number of different
>> cards (with >2.6.27 kernels).  What is the 'correct' behaviour for
>> this incorrect usage ?
> 
> VLAN 0 isn't incorrect, it's just surprising. When you send a packet
> tagged with VLAN 0, it means that the packet should be interpreted as
> being the same VLAN as a completely untagged packet.
> 
> So in theory, if both ends are using VLAN 0 and you aren't using eth0
> for anything, traffic should flow, at least if both ends are on the same
> kernel version. Feel free to debug why that isn't the case for you, of
> course...
> 

VLAN ID 0 is not usable on current kernels because we use 16 bits in the skb to
store vlan_tci, and vlan_tci = 0 means there is no VLAN tagging.

We could use the high-order bit (0x8000) to tell whether VLAN tagging is set or not.
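
The ambiguity, and the suggested way around it, can be shown with a toy model. This is only a sketch of the idea, not kernel code: TAG_PRESENT and the function names are hypothetical, and only the 12-bit VLAN ID is stored in the low bits. In the on-wire 802.1Q tag the upper three bits carry the priority, which a real implementation would have to account for.

```shell
#!/bin/sh
# Toy model of a 16-bit tag field where the value 0 doubles as "no tag".
TAG_PRESENT=$(( 0x8000 ))   # hypothetical flag bit, per the suggestion above

# Scheme as described: 0 means untagged, so a tag with VLAN ID 0
# (and nothing else set) looks exactly like "no tag".
is_tagged_plain() {
    [ $(( $1 )) -ne 0 ]
}

# Store a VLAN ID together with the presence flag.
tci_store() {
    echo $(( ($1 & 0x0FFF) | TAG_PRESENT ))
}

# With the flag, presence no longer depends on the VLAN ID value.
is_tagged_flagged() {
    [ $(( $1 & TAG_PRESENT )) -ne 0 ]
}

# Recover the VLAN ID from a stored value.
tci_vid() {
    echo $(( $1 & 0x0FFF ))
}
```

Under the plain scheme a stored value of 0 is reported as untagged even when VLAN ID 0 was intended; with the flag, `tci_store 0` yields a non-zero value that `is_tagged_flagged` recognises, and `tci_vid` still returns 0.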
