* VLAN & ARP requests fail for ARM EABI (2.6.24)
@ 2008-04-09 22:06 Gertjan Hofman
From: Gertjan Hofman @ 2008-04-09 22:06 UTC (permalink / raw)
To: netdev
Dear Sirs,
Since the VLAN mailing list is closed, its author suggested I post here.
We have an ARM920T-based system. When the 2.6.24 kernel is compiled for OABI (with the appropriate 4.1.1 cross toolchain), VLAN functionality is fine. When the CONFIG_AEABI option is set and the 4.2.2 toolchain (built by the OpenEmbedded project) is used, a VLAN device fails to respond.
When pinging through the ARM VLAN device to a (PC based) VLAN device, the following is seen in the vlan driver:
The ping request is sent out, followed by an ARP request. The PC returns the ARP reply, and it is seen by the VLAN driver (vlan_skb_recv), which calls netif_rx(). A couple of pings later the same sequence repeats, i.e. the ARP reply is not used or not received properly.
Similarly, when pinging from the PC, the ARP request is seen by vlan_skb_recv() but there is no ARP reply from the ARM cascading through the vlan driver.
It seems to me that either the code that handles the ARP request misbehaves when compiled for EABI, or VLAN doesn't process the frame properly and passes it on incorrectly. Recompiling the kernel with OABI makes everything work again.
Note that communication works fine on either OABI or EABI when using 'normal' devices (eth0 etc). This puts the suspicion back on vlan.
Since EABI changes structure packing and other things, I suspect the cause is some networking code that knows a bit too much about its size & packing.
I am happy to troubleshoot, but I am no kernel expert. Tips would be appreciated, e.g. how to dump the skb buffer in both cases.
Sincerely,
Gertjan
__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com
* Re: VLAN & ARP requests fail for ARM EABI (2.6.24)
@ 2008-04-10 0:40 ` Patrick McHardy
From: Patrick McHardy @ 2008-04-10 0:40 UTC (permalink / raw)
To: Gertjan Hofman; +Cc: netdev
Gertjan Hofman wrote:
> Dear Sirs,
>
> Since the VLAN mailing list is closed, its author suggested I post here.
> We have an ARM920T-based system. When the 2.6.24 kernel is compiled for OABI (with the appropriate 4.1.1 cross toolchain), VLAN functionality is fine. When the CONFIG_AEABI option is set and the 4.2.2 toolchain (built by the OpenEmbedded project) is used, a VLAN device fails to respond.
>
> When pinging through the ARM VLAN device to a (PC based) VLAN device, the following is seen in the vlan driver:
> The ping request is sent out, followed by an ARP request. The PC returns the ARP reply, and it is seen by the VLAN driver (vlan_skb_recv), which calls netif_rx(). A couple of pings later the same sequence repeats, i.e. the ARP reply is not used or not received properly.
>
> Similarly, when pinging from the PC, the ARP request is seen by vlan_skb_recv() but there is no ARP reply from the ARM cascading through the vlan driver.
>
> It seems to me that either the code that handles the ARP request misbehaves when compiled for EABI, or VLAN doesn't process the frame properly and passes it on incorrectly. Recompiling the kernel with OABI makes everything work again.
>
> Note that communication works fine on either OABI or EABI when using 'normal' devices (eth0 etc). This puts the suspicion back on vlan.
>
>
> Since EABI changes structure packing and other things, I suspect the cause is some networking code that knows a bit too much about its size & packing.
>
> I am happy to troubleshoot, but I am no kernel expert. Tips would be appreciated, e.g. how to dump the skb buffer in both cases.
I actually have no idea about the differences between
OABI and EABI, but I know a mix of both broke some
iptables setups (kernel EABI/userspace OABI or something
like that). Could you fetch the latest iproute and try
again with adding your VLANs using iproute?
The syntax is:
ip link add link <lowerdev> [name] <name> type vlan id VID
If that works the problem is most likely an inappropriate
ABI mix.
* Re: VLAN & ARP requests fail for ARM EABI (2.6.24)
@ 2008-04-12 16:58 Gertjan Hofman
From: Gertjan Hofman @ 2008-04-12 16:58 UTC (permalink / raw)
To: Patrick McHardy; +Cc: netdev
Patrick,
Ben mentioned you might be the person to talk to. Just to make sure I did what you suggested:
From http://devresources.linux-foundation.org/dev/iproute2/download/ I downloaded
iproute2-2.6.24-rc7.tar.bz2 (08-Jan-2008 09:06, 336K) and cross-compiled it for EABI.
I created the VLAN with:
    ./ip link add link eth0 eth0.0 type vlan id 0
(Did I get the syntax correct?)
/proc/net/vlan/ indicates that eth0.0 exists and looks fine.
Unfortunately, pinging through a VLAN to this VLAN fails as before, with the same symptoms: ARP requests are received but not answered.
About OABI/EABI incompatibilities: I didn't mention it explicitly, but when testing EABI the entire file system is EABI, and when testing OABI the entire file system is OABI as well, so an ABI mix should not be the problem.
We spent quite a bit of time tracking this problem deeper down the stack, but with limited results. The calling sequence appears to be:
    driver
      -> ?
      -> vlan.c
      -> netif_rx
      -> ?
      -> arp.c (arp_process)
      -> ip_route_input
      -> ip_route_input_slow
      -> fib_validate_source
It's in fib_validate_source that things go wrong.
In the EABI (faulty) kernel, we printed the values of the device pointers that fib_validate_source() compares:
    FIB_RES_DEV(res) : 0xC3C77000
    dev              : 0xC3E2E800
These are not the same, so the variable rpf is checked and the function bails out, returning -EINVAL. You can work around this by setting rpf to 0 with
    echo 0 > /proc/sys/net/ipv4/conf/eth2.0/rp_filter
after which pings from the foreign PC to the ARM work. Pings from the ARM to the PC still don't work: the ARP request goes out, but the response (which does reach arp.c) is ignored, presumably for a similar reason, i.e. some device pointer check fails.
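For readers following the reverse-path reasoning, the decision described above can be modeled as a small user-space sketch. It is a simplified model, not the kernel code: the functions rp_check() and rpfilter_enabled() are hypothetical, devices are represented by opaque pointers, and it assumes that, as in kernels of this era, IN_DEV_RPFILTER(in_dev) is the logical AND of the global conf/all/rp_filter value and the per-device value. Scope checks, multipath routes and the no-address path of the real fib_validate_source() are left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for IN_DEV_RPFILTER(in_dev). ASSUMPTION: the
 * filter is active only if both the global (conf/all) and the
 * per-device rp_filter values are non-zero. */
static bool rpfilter_enabled(int rp_filter_all, int rp_filter_dev)
{
	return rp_filter_all && rp_filter_dev;
}

/* Simplified model of the reverse-path decision.
 * route_dev: device the route back to the sender uses (FIB_RES_DEV(res))
 * in_dev:    device the packet actually arrived on (dev)
 * In this model 0 means "accept". */
static int rp_check(const void *route_dev, const void *in_dev,
		    int rp_filter_all, int rp_filter_dev)
{
	if (route_dev == in_dev)
		return 0;	/* reverse path matches the arrival device */
	if (rpfilter_enabled(rp_filter_all, rp_filter_dev))
		return -EINVAL;	/* mismatch with rp_filter active: reject */
	return 0;		/* mismatch tolerated when rp_filter is off */
}
```

With both settings at 1, a packet that arrives on eth0.0 while the route back to its sender points at a different device is rejected with -EINVAL; clearing either setting lets it through, which is consistent with the rp_filter workaround described above.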
My guess is that there is a problem with the dev pointer all the way back in the vlan.c code, which only manifests itself with the EABI compiler.
When running the working (OABI) kernel version, in fib_validate_source():
    if (in_dev) {
        no_addr = in_dev->ifa_list == NULL;
        /* rpf is 0 here even though
         * /proc/sys/net/ipv4/conf/eth2.0/rp_filter is set to 1 */
        rpf = IN_DEV_RPFILTER(in_dev);
        if (DEBUG_XXX == 0xDEADBEEF)
            printk(KERN_INFO "*********rpf = 0x%X\n", rpf);
    }
With EABI rpf = 1; with OABI rpf = 0. So there is something different about the in_dev pointer.
Do you know what IN_DEV_RPFILTER(in_dev) does exactly?
I think I need to check the validity of the device pointer already at the VLAN level, but I am not sure how to do this. Any tips?
Thanks
Gertjan
* Re: VLAN & ARP requests fail for ARM EABI (2.6.24)
@ 2008-09-23 16:34 Gertjan Hofman
From: Gertjan Hofman @ 2008-09-23 16:34 UTC (permalink / raw)
To: Patrick McHardy; +Cc: netdev
This e-mail is for completeness only, and to keep anyone from wrongly going down this debugging route.
The ARM EABI/OABI VLAN & ARP bug discussed was real - however, it was also resolved.
A new multicast address structure had been introduced without proper initialization. See
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=12aa343add3eced38a44bdb612b35fdf634d918c
I am not entirely sure why this caused problems only with EABI compilers, but it did.
Unfortunately, the three months during which this bug existed in 2.6.24 were exactly when we froze our kernel. Perhaps that was our fault: I should have applied patches as they came out.
Cheers
G