* Weird DHCP related problems with net-next
@ 2015-06-09 18:54 Florian Fainelli
2015-06-09 19:22 ` Andrew Lunn
0 siblings, 1 reply; 7+ messages in thread
From: Florian Fainelli @ 2015-06-09 18:54 UTC (permalink / raw)
To: Netdev, Eric Dumazet
Hi,
I am observing a strange problem on net-next (not observed with net,
bisection in progress) where the initial DHCP configuration using
busybox's udhcpc is able to configure the local interface address and
DNS server, but not the default gateway. Restarting udhcpc a second time
does not exhibit this problem.
This is a system using SYSTEMPORT and the DSA SF2 switch but I could
also observe it on systems using the BCMGENET and Asix USB 2.0 network
drivers (all in tree).
Are there any netlink/packet/rhashtable changes in net-next, but not in
net, that could explain that? Note that this is a system with very low
entropy on boot. Making the DHCP client more verbose or stracing it did
not expose the issue as frequently, so this might be some sort of race
condition.
Any hints appreciated!
--
Florian
* Re: Weird DHCP related problems with net-next
2015-06-09 18:54 Weird DHCP related problems with net-next Florian Fainelli
@ 2015-06-09 19:22 ` Andrew Lunn
2015-06-09 20:31 ` Florian Fainelli
0 siblings, 1 reply; 7+ messages in thread
From: Andrew Lunn @ 2015-06-09 19:22 UTC (permalink / raw)
To: Florian Fainelli; +Cc: Netdev, Eric Dumazet
On Tue, Jun 09, 2015 at 11:54:50AM -0700, Florian Fainelli wrote:
> Hi,
>
> I am observing a strange problem on net-next (not observed with net,
> bisection in progress) where the initial DHCP configuration using
> busybox's udhcpc is able to configure the local interface address and
> DNS server, but not the default gateway. Restarting udhcpc a second time
> does not exhibit this problem.
Hi Florian
I've seen something similar, but different, again with DSA involved,
on a WiFi Access point. I have debian, and i'm using isc dhcp. It gets
an address, sets the address on the interface, but does not add the
interface route to the routing table. Not sure about default route, i
would have to go check that.
Might be related, might not be.
Andrew
* Re: Weird DHCP related problems with net-next
2015-06-09 19:22 ` Andrew Lunn
@ 2015-06-09 20:31 ` Florian Fainelli
2015-06-09 21:42 ` Andrew Lunn
2015-06-10 0:12 ` Florian Fainelli
0 siblings, 2 replies; 7+ messages in thread
From: Florian Fainelli @ 2015-06-09 20:31 UTC (permalink / raw)
To: Andrew Lunn; +Cc: Netdev, Scott Feldman, Jiri Pirko
Hi Andrew,
On 09/06/15 12:22, Andrew Lunn wrote:
> On Tue, Jun 09, 2015 at 11:54:50AM -0700, Florian Fainelli wrote:
>> Hi,
>>
>> I am observing a strange problem on net-next (not observed with net,
>> bisection in progress) where the initial DHCP configuration using
>> busybox's udhcpc is able to configure the local interface address and
>> DNS server, but not the default gateway. Restarting udhcpc a second time
>> does not exhibit this problem.
>
> Hi Florian
>
> I've seen something similar, but different, again with DSA involved,
> on a WiFi Access point. I have debian, and i'm using isc dhcp. It gets
> an address, sets the address on the interface, but does not add the
> interface route to the routing table. Not sure about default route, i
> would have to go check that.
Interesting, did you also observe this with 'net', or just with 'net-next'?
Contrary to what I reported above, this is only an issue with
SYSTEMPORT/DSA/SF2. I could not reproduce this with GENET or the Asix
driver; I was just conflating two different systems here.
My bisection seems to point at this commit:
58c2cb16b116d7feace621bd6b647bbabacfa225 ("switchdev: convert
fib_ipv4_add/del over to switchdev_port_obj_add/del")
And indeed, hacking the kernel a bit to remove the SWITCHDEV/DSA
dependencies and leave just DSA makes things work again.
Scott, Jiri, any clues? I can instrument the kernel a bit more to help
find what is the problem here. Note that I am observing this on ARM
(Andrew probably is as well), where uninitialized stack variables are
potentially garbage.
--
Florian
* Re: Weird DHCP related problems with net-next
2015-06-09 20:31 ` Florian Fainelli
@ 2015-06-09 21:42 ` Andrew Lunn
2015-06-10 0:12 ` Florian Fainelli
1 sibling, 0 replies; 7+ messages in thread
From: Andrew Lunn @ 2015-06-09 21:42 UTC (permalink / raw)
To: Florian Fainelli; +Cc: Netdev, Scott Feldman, Jiri Pirko
On Tue, Jun 09, 2015 at 01:31:31PM -0700, Florian Fainelli wrote:
> Hi Andrew,
>
> On 09/06/15 12:22, Andrew Lunn wrote:
> > On Tue, Jun 09, 2015 at 11:54:50AM -0700, Florian Fainelli wrote:
> >> Hi,
> >>
> >> I am observing a strange problem on net-next (not observed with net,
> >> bisection in progress) where the initial DHCP configuration using
> >> busybox's udhcpc is able to configure the local interface address and
> >> DNS server, but not the default gateway. Restarting udhcpc a second time
> >> does not exhibit this problem.
> >
> > Hi Florian
> >
> > I've seen something similar, but different, again with DSA involved,
> > on a WiFi Access point. I have debian, and i'm using isc dhcp. It gets
> > an address, sets the address on the interface, but does not add the
> > interface route to the routing table. Not sure about default route, i
> > would have to go check that.
>
> Interesting, did you also observe this with 'net', or just with 'net-next'?
Just net-next, for some value of net-next. But I've not used net much,
since I see this problem while developing new features.
> Contrary to what I reported above, this is only an issue with
> SYSTEMPORT/DSA/SF2, I could not reproduce this GENET or the Asix driver,
> Note that I am observing this on ARM (Andrew probably is as well),
Yep.
Andrew
* Re: Weird DHCP related problems with net-next
2015-06-09 20:31 ` Florian Fainelli
2015-06-09 21:42 ` Andrew Lunn
@ 2015-06-10 0:12 ` Florian Fainelli
2015-06-10 21:44 ` Scott Feldman
1 sibling, 1 reply; 7+ messages in thread
From: Florian Fainelli @ 2015-06-10 0:12 UTC (permalink / raw)
To: Andrew Lunn; +Cc: Netdev, Scott Feldman, Jiri Pirko, davem
On 09/06/15 13:31, Florian Fainelli wrote:
> Hi Andrew,
>
> On 09/06/15 12:22, Andrew Lunn wrote:
>> On Tue, Jun 09, 2015 at 11:54:50AM -0700, Florian Fainelli wrote:
>>> Hi,
>>>
>>> I am observing a strange problem on net-next (not observed with net,
>>> bisection in progress) where the initial DHCP configuration using
>>> busybox's udhcpc is able to configure the local interface address and
>>> DNS server, but not the default gateway. Restarting udhcpc a second time
>>> does not exhibit this problem.
>>
>> Hi Florian
>>
>> I've seen something similar, but different, again with DSA involved,
>> on a WiFi Access point. I have debian, and i'm using isc dhcp. It gets
>> an address, sets the address on the interface, but does not add the
>> interface route to the routing table. Not sure about default route, i
>> would have to go check that.
>
> Interesting, did you also observe this with 'net', or just with 'net-next'?
>
> Contrary to what I reported above, this is only an issue with
> SYSTEMPORT/DSA/SF2, I could not reproduce this GENET or the Asix driver,
> I was just conflating two different systems here.
>
> My bisection seems to point at this commit:
>
> 58c2cb16b116d7feace621bd6b647bbabacfa225 ("switchdev: convert
> fib_ipv4_add/del over to switchdev_port_obj_add/del")
>
> And indeed, hacking a bit the kernel to remove the SWITCHDEV/DSA
> dependencies to leave just DSA makes thing work again.
>
> Scott, Jiri, any clues? I can instrument the kernel a bit more to help
> find what is the problem here. Note that I am observing this on ARM
> (Andrew probably is as well), where uninitialized stack variables are
> potentially garbage.
I see the problem now: DSA does not implement a port_obj_add callback,
so when fib_table_insert() in net/ipv4/fib_trie.c calls
switchdev_fib_ipv4_add(), which in turn calls switchdev_port_obj_add(),
we return -EOPNOTSUPP and take the error path in fib_table_insert(),
thus not inserting the route for this interface.
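The failure path described above can be modelled in a few lines of
user-space C. This is only a sketch under stated assumptions: every
mock_* name is hypothetical and none of it is kernel code. It just shows
how a missing callback turns into -EOPNOTSUPP and how an insert routine
that treats every offload error as fatal ends up without a route.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a switch port; not a kernel structure. */
struct mock_port {
	bool has_obj_add;	/* does the driver provide port_obj_add? */
};

/* Stand-in for the offload request: no callback means -EOPNOTSUPP. */
static int mock_offload_add(const struct mock_port *port)
{
	assert(port != NULL);
	return port->has_obj_add ? 0 : -EOPNOTSUPP;
}

/*
 * Stand-in for the route insert before the fix: any offload error,
 * including "not supported", aborts and the route is not installed.
 */
static int mock_route_insert(const struct mock_port *port, bool *installed)
{
	int err = mock_offload_add(port);

	if (err) {
		*installed = false;
		return err;
	}
	*installed = true;
	return 0;
}
```

Calling mock_route_insert() with has_obj_add set to false returns
-EOPNOTSUPP and leaves *installed false, which mirrors the missing route
seen with DSA.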
Now when I restart the DHCP client, we end up inserting the default
route, which is correct. I am still figuring out what is different
here; the DHCP client script deleting the routes first is probably what
changes the condition.
At any rate, since switchdev_fib_ipv4_add() returns something that makes
us take an error path in the fib_trie, something like this seems to fix
it for me, but I am not well versed enough in the IPv4 routing code to
be 100% confident this is the right fix. Also, there are other callers
of switchdev_port_obj_add(), but a quick look suggests they are safe, as
they are only called for "offloading" capable hardware.
It still looks like not being able to differentiate a hard failure from
-EOPNOTSUPP has side effects all over the place:
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index e008057dab46..b683e89b4caa 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -853,7 +853,7 @@ int switchdev_fib_ipv4_add(u32 dst, int dst_len, struct fib_info *fi,
 	if (!err)
 		fi->fib_flags |= RTNH_F_OFFLOAD;
-	return err;
+	return err == -EOPNOTSUPP ? 0 : err;
 }
 EXPORT_SYMBOL_GPL(switchdev_fib_ipv4_add);
@@ -898,7 +898,7 @@ int switchdev_fib_ipv4_del(u32 dst, int dst_len, struct fib_info *fi,
 	if (!err)
 		fi->fib_flags &= ~RTNH_F_OFFLOAD;
-	return err;
+	return err == -EOPNOTSUPP ? 0 : err;
 }
 EXPORT_SYMBOL_GPL(switchdev_fib_ipv4_del);
--
Florian
* Re: Weird DHCP related problems with net-next
2015-06-10 0:12 ` Florian Fainelli
@ 2015-06-10 21:44 ` Scott Feldman
2015-06-10 22:04 ` Florian Fainelli
0 siblings, 1 reply; 7+ messages in thread
From: Scott Feldman @ 2015-06-10 21:44 UTC (permalink / raw)
To: Florian Fainelli; +Cc: Andrew Lunn, Netdev, Jiri Pirko, David S. Miller
On Tue, Jun 9, 2015 at 5:12 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
> I see the problem now, DSA does not implement a port_obj_add callback,
> so when net/ipv4/fib_trie.c::switchdev_fib_ipv4_add() gets to call
> switchdev_port_obj_add, we return -EOPNOTSUPP, and take the error path
> in fib_table_insert thus not inserting the route for this interface.
Yup, that's the problem.
> Now when I restart the DHCP client, we end-up inserting the default
> route which is correct, still figuring out what is different here,
> probably the deletion of the routes by the DHCP client script first is
> the different condition.
After the first failure, ipv4.fib_offload_disabled is set, so the next
time switchdev_fib_ipv4_add() just returns 0 and the route is
installed. That explains the one-off behavior.
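The one-off behavior can be illustrated with a small user-space model.
This is a sketch, not kernel code: the mock_* names are hypothetical,
and the exact place where the kernel sets fib_offload_disabled is an
assumption here (the thread only says it is set after the first
failure).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-namespace IPv4 state. */
struct mock_net {
	bool fib_offload_disabled;
};

/* Model of the offload add: once offload is disabled it just returns 0. */
static int mock_fib_ipv4_add(const struct mock_net *net, bool drv_supports)
{
	assert(net != NULL);
	if (net->fib_offload_disabled)
		return 0;
	return drv_supports ? 0 : -EOPNOTSUPP;
}

/* Model of the insert: the first failure disables offload (assumed spot). */
static int mock_table_insert(struct mock_net *net, bool drv_supports,
			     int *nroutes)
{
	int err = mock_fib_ipv4_add(net, drv_supports);

	if (err) {
		net->fib_offload_disabled = true;
		return err;
	}
	(*nroutes)++;
	return 0;
}
```

With drv_supports false, the first mock_table_insert() call fails with
-EOPNOTSUPP and installs nothing; the second call succeeds because the
flag is now set, matching the "works after restarting udhcpc" symptom.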
> At any rate, since switchdev_fib_ipv4_add() returns something that make
> us take an error path in the fib_trie, something like this seems to fix
> it for me but I am not well versed enough into the IPv4 routing code to
> be 100% confident this is the right fix. Also, there are other callers
> of switchdev_port_obj_add() but a quick look seems to make them safe as
> they are only called for "offloading" capable hardware.
Your fix looks good to me. The other users of
switchdev_port_obj_add() want to return -EOPNOTSUPP to the user, so it's
just this one case for IPv4 fib insert/del where we'll want to treat
no support silently. Are you going to resend it as a patch for
net-next, or should I?
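As a rough illustration of "treat no support silently", the mapping used
in the proposed fix can be isolated in a helper. The helper and the
self-check below are hypothetical names for this sketch only; the actual
patch applies the same expression inline in switchdev_fib_ipv4_add() and
switchdev_fib_ipv4_del().

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper: "offload not supported" is reported as success
 * so the software route is still installed; every other error is
 * passed through unchanged.
 */
static int mock_squash_unsupported(int err)
{
	return err == -EOPNOTSUPP ? 0 : err;
}

/* Small self-check showing the intended behavior of the mapping. */
static void mock_squash_self_check(void)
{
	assert(mock_squash_unsupported(0) == 0);
	assert(mock_squash_unsupported(-EOPNOTSUPP) == 0);
	assert(mock_squash_unsupported(-ENOMEM) == -ENOMEM);
}
```

Hard failures such as -ENOMEM still reach the caller, so only the
"driver has no offload callback" case is hidden.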
* Re: Weird DHCP related problems with net-next
2015-06-10 21:44 ` Scott Feldman
@ 2015-06-10 22:04 ` Florian Fainelli
0 siblings, 0 replies; 7+ messages in thread
From: Florian Fainelli @ 2015-06-10 22:04 UTC (permalink / raw)
To: Scott Feldman; +Cc: Andrew Lunn, Netdev, Jiri Pirko, David S. Miller
On 10/06/15 14:44, Scott Feldman wrote:
> On Tue, Jun 9, 2015 at 5:12 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
>
>
>> I see the problem now, DSA does not implement a port_obj_add callback,
>> so when net/ipv4/fib_trie.c::switchdev_fib_ipv4_add() gets to call
>> switchdev_port_obj_add, we return -EOPNOTSUPP, and take the error path
>> in fib_table_insert thus not inserting the route for this interface.
>
> Yup, that's the problem.
>
>> Now when I restart the DHCP client, we end-up inserting the default
>> route which is correct, still figuring out what is different here,
>> probably the deletion of the routes by the DHCP client script first is
>> the different condition.
>
> After the first failure, ipv4.fib_offload_disabled is set, so the next
> time switchdev_fib_ipv4_add() just returns 0 and the route is
> installed. That explains the one-off behavior.
>
>> At any rate, since switchdev_fib_ipv4_add() returns something that make
>> us take an error path in the fib_trie, something like this seems to fix
>> it for me but I am not well versed enough into the IPv4 routing code to
>> be 100% confident this is the right fix. Also, there are other callers
>> of switchdev_port_obj_add() but a quick look seems to make them safe as
>> they are only called for "offloading" capable hardware.
>
> Your fix looks good to me. The other users of
> switchdev_port_obj_add() want to return -EOPNOTSUPP to user, so it's
> just this one case for IPv4 fib insert/del where we'll want to treat
> no support silently. Are you going to resend as patch for net-next,
> or should I?
I would prefer if you submitted it since you explained how things are
working and now everything makes sense. I will be happy to test it and
provide the magic tags ;)
--
Florian