* IPv6 problems (mtu is less than advmss?)
From: Jim Paris @ 2008-01-21 10:52 UTC (permalink / raw)
To: netdev
Hi,
When the IPv6 default route is first autoconfigured, it seems the
advertised MSS doesn't get set properly.
I have computer A connected to an external interface with IPv4 and
tunneling IPv6 using 6to4. It runs radvd on an internal interface to
provide connectivity to my LAN. radvd.conf contains:
interface brint {
AdvLinkMTU 1420; # advertise tiny MTU due to tunnel
AdvSendAdvert on;
prefix 0:0:0:1::/64
{ Base6to4Interface brext; };
};
Computer B is on the LAN and autoconfigures an IPv6 address. However,
I have problems with some specific hosts [1]. They're probably doing
broken ICMP/PMTU filtering or something, but I've tracked down some
weirdness in the kernel:
computerA# /etc/init.d/radvd start
computerB# ifconfig eth0 down
computerB# ifconfig eth0 up
computerB# ip -6 route
2002:125f:205:1::/64 dev eth0 proto kernel metric 256 expires 2592312sec mtu 1420 advmss 1360 hoplimit 4294967295
fe80::/64 dev eth0 metric 256 expires 21334361sec mtu 1420 advmss 1360 hoplimit 4294967295
ff00::/8 dev eth0 metric 256 expires 21334361sec mtu 1420 advmss 1360 hoplimit 4294967295
default via fe80::230:48ff:fe8b:5df4 dev eth0 proto kernel metric 1024 expires 1788sec mtu 1420 advmss 1440 hoplimit 64
At this point, I see the network problems. Note how the advmss in the
last line is not only larger than all the other lines, it's even
bigger than the MTU. I won't be able to download [1] until I manually
fix advmss, _or_ just restart radvd on the other computer:
computerA# /etc/init.d/radvd restart
computerB# ip -6 route
2002:125f:205:1::/64 dev eth0 proto kernel metric 256 expires 2592160sec mtu 1420 advmss 1360 hoplimit 4294967295
fe80::/64 dev eth0 metric 256 expires 21334355sec mtu 1420 advmss 1360 hoplimit 4294967295
ff00::/8 dev eth0 metric 256 expires 21334355sec mtu 1420 advmss 1360 hoplimit 4294967295
default via fe80::230:48ff:fe8b:5df4 dev eth0 proto kernel metric 1024 expires 1798sec mtu 1420 advmss 1360 hoplimit 64
Now the advmss is better and everything works fine. It seems that if
it's being created for the first time, the advmss is not correctly set
based on the MTU, but if there's already a route when it gets the RA,
it reconfigures it correctly.
Computer A is 2.6.23.1 and computer B is 2.6.24-rc8, both x86_64.
Any suggestions or other debugging I can do?
-jim
[1] wget -6 http://releases.mozilla.org/pub/mozilla.org/addons/12/all-in-one_gestures-0.18.0-fx.xpi
^ permalink raw reply
* Re: [PATCH 1/2] IPV6: ICMP6_MIB_OUTMSGS increment duplicated
From: David Miller @ 2008-01-21 11:06 UTC (permalink / raw)
To: wangchen; +Cc: dlstevens, netdev, herbert
In-Reply-To: <47947B1F.7080709@cn.fujitsu.com>
From: Wang Chen <wangchen@cn.fujitsu.com>
Date: Mon, 21 Jan 2008 18:59:43 +0800
> Dave, because the network of my working place can't clone git
> tree, I used 2.6.24-rc8 to make the patches.
Does someone know how to setup GIT to update the HTTP
fetch information as a post-push hook?
> Can you apply them to net-2.6 tree?
Done.
^ permalink raw reply
* Re: [PATCH 2/3][NET] gen_estimator: list_empty() check in est_timer() fixed
From: Jarek Poplawski @ 2008-01-21 11:19 UTC (permalink / raw)
To: David Miller; +Cc: shemminger, slavon, kaber, hadi, netdev
In-Reply-To: <20080121.023632.122420355.davem@davemloft.net>
On Mon, Jan 21, 2008 at 02:36:32AM -0800, David Miller wrote:
...
> FWIW I agree that double-negatives are confusing and we should
> avoid them.
Right! No more: CHECKSUM_NONE, SOCK_NOSPACE, IFF_NOARP or KERN_NOTICE!
Jarek P.
^ permalink raw reply
* Re: [PATCH 1/2] IPV6: ICMP6_MIB_OUTMSGS increment duplicated
From: Wang Chen @ 2008-01-21 11:12 UTC (permalink / raw)
To: David Miller; +Cc: dlstevens, netdev, herbert
In-Reply-To: <20080121.030637.148068457.davem@davemloft.net>
David Miller said the following on 2008-1-21 19:06:
> Does someone know how to setup GIT to update the HTTP
> fetch information as a post-push hook?
>
Seems Herbert knows how to do it.
Maybe run git-update-server-info every time when you rebase
your tree.
>> Can you apply them to net-2.6 tree?
>
> Done.
>
Thanks.
--
WCN
^ permalink raw reply
* Re: [PATCH] bluetooth : move children of connection device to NULL before connection down
From: David Miller @ 2008-01-21 11:14 UTC (permalink / raw)
To: hidave.darkstar
Cc: marcel, netdev, linux-kernel, bluez-devel, cornelia.huck, gombasg,
htejun, viro, kay.sievers, greg
In-Reply-To: <20080121045401.GB4162@darkstar.te-china.tietoenator.com>
From: Dave Young <hidave.darkstar@gmail.com>
Date: Mon, 21 Jan 2008 12:54:01 +0800
> Add people missed in cc-list.
Thanks Dave for your continued efforts on Bluetooth bugs like this.
Marcel, are you going to review/ACK/integrate/push-upstream/whatever
any of these Bluetooth patches?
It hasn't been getting much love from you as of late, you are one of
the listed maintainers, and I don't want to lose any of Dave's
valuable bug fixing work.
Or should I just handle it all directly?
Thanks.
^ permalink raw reply
* Re: [PATCH 2/3][NET] gen_estimator: list_empty() check in est_timer() fixed
From: David Miller @ 2008-01-21 11:15 UTC (permalink / raw)
To: jarkao2; +Cc: shemminger, slavon, kaber, hadi, netdev
In-Reply-To: <20080121111940.GF1789@ff.dom.local>
From: Jarek Poplawski <jarkao2@gmail.com>
Date: Mon, 21 Jan 2008 12:19:40 +0100
> On Mon, Jan 21, 2008 at 02:36:32AM -0800, David Miller wrote:
> ...
> > FWIW I agree that double-negatives are confusing and we should
> > avoid them.
>
> Right! No more: CHECKSUM_NONE, SOCK_NOSPACE, IFF_NOARP or KERN_NOTICE!
Life is difficult sometimes, but that is no excuse to further
the pain :-)
^ permalink raw reply
* Re: [WEXT 8/12]: Pull top-level ioctl dispatch logic into helper function.
From: Masakazu Mokuno @ 2008-01-21 11:16 UTC (permalink / raw)
To: David Miller; +Cc: linux-wireless, netdev
In-Reply-To: <20071221.205623.141035851.davem@davemloft.net>
Hi Dave,
On Fri, 21 Dec 2007 20:56:23 -0800 (PST)
David Miller <davem@davemloft.net> wrote:
>
> [WEXT]: Pull top-level ioctl dispatch logic into helper function.
>
> Signed-off-by: David S. Miller <davem@davemloft.net>
<snip>
> +int wext_handle_ioctl(struct net *net, struct ifreq *ifr, unsigned int cmd,
> + void __user *arg)
> +{
> + int ret = wext_ioctl_dispatch(net, ifr, cmd,
> + ioctl_standard_call,
> + ioctl_private_call);
> +
> + if (ret > 0 &&
As the return value 0 is legal, should we allow copybacking in the case
of 'ret == 0'?
Same issue exists in compat_wext_handle_ioctl() of the #9 patch.
> + IW_IS_GET(cmd) &&
> + copy_to_user(arg, ifr, sizeof(struct iwreq)))
> return -EFAULT;
> +
> return ret;
> }
--
Masakazu MOKUNO
^ permalink raw reply
* Re: [PATCH] ICMP: ICMP_MIB_OUTMSGS increment duplicated
From: Wang Chen @ 2008-01-21 11:15 UTC (permalink / raw)
To: David S. Miller, David L Stevens; +Cc: netdev, Herbert Xu
In-Reply-To: <478DD57B.6020503@cn.fujitsu.com>
Dave, how about this one.
It's like that one of IPV6.
Wang Chen said the following on 2008-1-16 17:59:
> In tree net-2.6.25, commit "96793b482540f3a26e2188eaf75cb56b7829d3e3"
> made a mistake.
>
> In that patch, David L added a icmp_out_count() in ip_push_pending_frames(),
> remove icmp_out_count() from icmp_reply(). But he forgot to remove
> icmp_out_count() from icmp_send() too.
> Since icmp_send and icmp_reply will call icmp_push_reply, which will call
> ip_push_pending_frames, a duplicated increment happened in icmp_send.
>
> This patch remove the icmp_out_count from icmp_send too.
>
> Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
> ---
> diff -Nurp linux-2.6.24.rc8.org/net/ipv4/icmp.c linux-2.6.24.rc8/net/ipv4/icmp.c
> --- linux-2.6.24.rc8.org/net/ipv4/icmp.c 2008-01-16 17:45:02.000000000 +0800
> +++ linux-2.6.24.rc8/net/ipv4/icmp.c 2008-01-16 17:52:13.000000000 +0800
> @@ -540,7 +540,6 @@ void icmp_send(struct sk_buff *skb_in, i
> icmp_param.data.icmph.checksum = 0;
> icmp_param.skb = skb_in;
> icmp_param.offset = skb_network_offset(skb_in);
> - icmp_out_count(icmp_param.data.icmph.type);
> inet_sk(icmp_socket->sk)->tos = tos;
> ipc.addr = iph->saddr;
> ipc.opt = &icmp_param.replyopts;
>
^ permalink raw reply
* Re: [PATCH] IPv4: Enable use of 240/4 address space
From: David Miller @ 2008-01-21 11:19 UTC (permalink / raw)
To: yoshfuji; +Cc: jengelh, netdev, linux-kernel, ak, vaf
In-Reply-To: <20080120.003019.119068925.yoshfuji@linux-ipv6.org>
From: YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org>
Date: Sun, 20 Jan 2008 00:30:19 +0900 (JST)
> In article <Pine.LNX.4.64.0801191443410.27831@fbirervta.pbzchgretzou.qr> (at Sat, 19 Jan 2008 14:44:13 +0100 (CET)), Jan Engelhardt <jengelh@computergmbh.de> says:
>
> > From 84bccef295aa9754ee662191e32ba1d64edce2ba Mon Sep 17 00:00:00 2001
> > From: Jan Engelhardt <jengelh@computergmbh.de>
> > Date: Fri, 18 Jan 2008 02:10:44 +0100
> > Subject: [PATCH] IPv4: enable use of 240/4 address space
> >
> > This short patch modifies the IPv4 networking to enable use of the
> > 240.0.0.0/4 (aka "class-E") address space as propsed in the internet
> > draft draft-fuller-240space-00.txt.
> >
> > Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
I've applied this to net-2.6.25, thanks everyone.
I know I said we should deploy this as fast as possible,
but we are really coming down the wire as far as releasing
2.6.24 is concerned and I don't want to put anything into
my pushes to Linus that he might not like and thus cause
the entire set of bug fixes to be rejected.
Thanks again.
^ permalink raw reply
* Re: [PATCH 2/3][NET] gen_estimator: list_empty() check in est_timer() fixed
From: Jarek Poplawski @ 2008-01-21 11:28 UTC (permalink / raw)
To: David Miller; +Cc: shemminger, slavon, kaber, hadi, netdev
In-Reply-To: <20080121.031553.224463409.davem@davemloft.net>
On Mon, Jan 21, 2008 at 03:15:53AM -0800, David Miller wrote:
...
> Life is difficult sometimes, but that is no excuse to further
> the pain :-)
YES! I've read somewhere about it too!
Jarek P.
^ permalink raw reply
* Re: : Emit event stream compat iw_point objects correctly.
From: Masakazu Mokuno @ 2008-01-21 11:23 UTC (permalink / raw)
To: David Miller
Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20080110.011602.74511551.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
Hi
Sorry for my intermittent posts.
On Thu, 10 Jan 2008 01:16:02 -0800 (PST)
David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org> wrote:
> From: Masakazu Mokuno <mokuno-DfbDroY8Xu1L9jVzuh4AOg@public.gmane.org>
> Date: Thu, 27 Dec 2007 18:24:40 +0900
>
> > On ppc64 (PS3), IW_EV_LCP_LEN is 8, not 4.
> >
> > include/linux/wireless.h:
> >
> > #define IW_EV_LCP_LEN (sizeof(struct iw_event) - sizeof(union iwreq_data))
> >
> > where sizeof(struct iw_event) == 24, sizeof(union iwreq_data) == 16 on
> > PS3.
>
> Here is a new version of the last patch (#12), it should handle
> all of these cases properly now.
>
> Let me know if you spot any more errors.
>
> Thanks!
>
> [WEXT]: Emit event stream entries correctly when compat.
>
> Three major portions to this change:
>
> 1) Add IW_EV_COMPAT_LCP_LEN, IW_EV_COMPAT_POINT_OFF,
> and IW_EV_COMPAT_POINT_LEN helper defines.
>
> 2) Delete iw_stream_check_add_*(), they are unused.
>
> 3) Add iw_request_info argument to iwe_stream_add_*(), and use it to
> size the event and pointer lengths correctly depending upon whether
> IW_REQUEST_FLAG_COMPAT is set or not.
>
> 4) The mechanical transformations to the drivers and wireless stack
> bits to get the iw_request_info passed down into the routines
> modified in #3.
>
> With help from Masakazu Mokuno
>
> Signed-off-by: David S. Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
> ---
> drivers/net/wireless/airo.c | 39 +++++---
> drivers/net/wireless/atmel.c | 24 ++++-
> drivers/net/wireless/hostap/hostap.h | 3 +-
> drivers/net/wireless/hostap/hostap_ap.c | 32 +++---
> drivers/net/wireless/hostap/hostap_ioctl.c | 54 ++++++-----
> drivers/net/wireless/libertas/scan.c | 35 ++++---
> drivers/net/wireless/orinoco.c | 30 ++++--
> drivers/net/wireless/prism54/isl_ioctl.c | 45 +++++----
> drivers/net/wireless/wl3501_cs.c | 10 +-
> drivers/net/wireless/zd1201.c | 21 +++--
> include/linux/wireless.h | 16 +++
> include/net/iw_handler.h | 150 ++++++++--------------------
> net/ieee80211/ieee80211_wx.c | 44 +++++----
> net/mac80211/ieee80211_i.h | 5 +-
> net/mac80211/ieee80211_ioctl.c | 2 +-
> net/mac80211/ieee80211_sta.c | 59 ++++++-----
> 16 files changed, 293 insertions(+), 276 deletions(-)
<snip>
> diff --git a/drivers/net/wireless/prism54/isl_ioctl.c b/drivers/net/wireless/prism54/isl_ioctl.c
> index 6d80ca4..4dc0b5e 100644
> --- a/drivers/net/wireless/prism54/isl_ioctl.c
> +++ b/drivers/net/wireless/prism54/isl_ioctl.c
> @@ -572,8 +572,9 @@ prism54_set_scan(struct net_device *dev, struct iw_request_info *info,
> */
>
> static char *
> -prism54_translate_bss(struct net_device *ndev, char *current_ev,
> - char *end_buf, struct obj_bss *bss, char noise)
> +prism54_translate_bss(struct net_device *ndev, struct iw_request_info *info,
> + char *current_ev, char *end_buf, struct obj_bss *bss,
> + char noise)
> {
> struct iw_event iwe; /* Temporary buffer */
> short cap;
<snip>
> @@ -2728,9 +2730,12 @@ prism2_ioctl_scan_req(struct net_device *ndev,
> rvalue |= mgt_get_request(priv, DOT11_OID_BSSLIST, 0, NULL, &r);
> bsslist = r.ptr;
>
> + info.cmd = PRISM54_HOSTAPD;
> + info.flags = 0;
> +
> /* ok now, scan the list and translate its info */
> for (i = 0; i < min(IW_MAX_AP, (int) bsslist->nr); i++)
> - current_ev = prism54_translate_bss(ndev, current_ev,
> + current_ev = prism54_translate_bss(ndev, current_ev, &info,
The order of the arguments is wrong.
current_ev = prism54_translate_bss(ndev, &info, current_ev,
> extra + IW_SCAN_MAX_DATA,
> &(bsslist->bsslist[i]),
> noise);
--
Masakazu MOKUNO
^ permalink raw reply
* Re: [PATCH] ICMP: ICMP_MIB_OUTMSGS increment duplicated
From: David Miller @ 2008-01-21 11:25 UTC (permalink / raw)
To: wangchen; +Cc: dlstevens, netdev, herbert
In-Reply-To: <47947EBB.2020507@cn.fujitsu.com>
From: Wang Chen <wangchen@cn.fujitsu.com>
Date: Mon, 21 Jan 2008 19:15:07 +0800
> Dave, how about this one.
> It's like that one of IPV6.
I am confused, this changelog for this patch mentions changes made
only in the net-2.6.25 tree.
Yet you just told me the two other patches should be applied to
net-2.6
What is the exact story here?
Thank you.
^ permalink raw reply
* Re: [PATCH] ICMP: ICMP_MIB_OUTMSGS increment duplicated
From: Wang Chen @ 2008-01-21 11:34 UTC (permalink / raw)
To: David Miller; +Cc: dlstevens, netdev, herbert
In-Reply-To: <20080121.032554.96053463.davem@davemloft.net>
David Miller said the following on 2008-1-21 19:25:
> I am confused, this changelog for this patch mentions changes made
> only in the net-2.6.25 tree.
>
Although I wrote that I find David.S's patch in net-2.6.25 was wrong,
the guilty patch also in net-2.6 tree. Here is the web link.
http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=commit;h=96793b482540f3a26e2188eaf75cb56b7829d3e3
So, I think apply my patches to net-2.6 is ok.
^ permalink raw reply
* Re: : Emit event stream compat iw_point objects correctly.
From: David Miller @ 2008-01-21 11:37 UTC (permalink / raw)
To: mokuno-DfbDroY8Xu1L9jVzuh4AOg
Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20080121194942.613C.40F06B3A-DfbDroY8Xu1L9jVzuh4AOg@public.gmane.org>
From: Masakazu Mokuno <mokuno-DfbDroY8Xu1L9jVzuh4AOg@public.gmane.org>
Date: Mon, 21 Jan 2008 20:23:15 +0900
> Sorry for my intermittent posts.
No problem.
I am sorry for being to busy to get back to active work
on this patch set.
> > -prism54_translate_bss(struct net_device *ndev, char *current_ev,
> > - char *end_buf, struct obj_bss *bss, char noise)
> > +prism54_translate_bss(struct net_device *ndev, struct iw_request_info *info,
> > + char *current_ev, char *end_buf, struct obj_bss *bss,
> > + char noise)
> > {
> > struct iw_event iwe; /* Temporary buffer */
> > short cap;
>
> <snip>
>
> > @@ -2728,9 +2730,12 @@ prism2_ioctl_scan_req(struct net_device *ndev,
> > rvalue |= mgt_get_request(priv, DOT11_OID_BSSLIST, 0, NULL, &r);
> > bsslist = r.ptr;
> >
> > + info.cmd = PRISM54_HOSTAPD;
> > + info.flags = 0;
> > +
> > /* ok now, scan the list and translate its info */
> > for (i = 0; i < min(IW_MAX_AP, (int) bsslist->nr); i++)
> > - current_ev = prism54_translate_bss(ndev, current_ev,
> > + current_ev = prism54_translate_bss(ndev, current_ev, &info,
>
> The order of the arguments is wrong.
>
> current_ev = prism54_translate_bss(ndev, &info, current_ev,
Indeed, I will fix this up in a future version.
I will also investigate why this escaped my build testing.
It is merely a PCI driver, so it should have been included
in the "make allmodconfig" test builds I do on sparc64.
Thank you.
^ permalink raw reply
* Re: [PATCH] ICMP: ICMP_MIB_OUTMSGS increment duplicated
From: David Miller @ 2008-01-21 11:40 UTC (permalink / raw)
To: wangchen; +Cc: dlstevens, netdev, herbert
In-Reply-To: <47948360.9090709@cn.fujitsu.com>
From: Wang Chen <wangchen@cn.fujitsu.com>
Date: Mon, 21 Jan 2008 19:34:56 +0800
> David Miller said the following on 2008-1-21 19:25:
> > I am confused, this changelog for this patch mentions changes made
> > only in the net-2.6.25 tree.
> >
>
> Although I wrote that I find David.S's patch in net-2.6.25 was wrong,
> the guilty patch also in net-2.6 tree. Here is the web link.
> http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=commit;h=96793b482540f3a26e2188eaf75cb56b7829d3e3
>
> So, I think apply my patches to net-2.6 is ok.
Thank you for explaining this.
Patch applied to net-2.6
^ permalink raw reply
* RE: [PATCH][v2] phylib: add module owner to the mii_bus structure
From: Nicu Ioan Petru @ 2008-01-21 12:13 UTC (permalink / raw)
To: Fleming Andy; +Cc: netdev, shemminger
In-Reply-To: <CFE6E252-4CB9-4F91-ABE6-330798408E50@freescale.com>
> -----Original Message-----
> From: Fleming Andy
> Sent: Monday, January 14, 2008 8:56 PM
> To: Nicu Ioan Petru
> Cc: netdev@vger.kernel.org; shemminger@linux-foundation.org
> Subject: Re: [PATCH][v2] phylib: add module owner to the
> mii_bus structure
>
> Any reason you didn't update the other drivers?
>
> > git grep mdiobus_register drivers/net/ // duplicates and
> mdio_bus.c edited out
> drivers/net/au1000_eth.c: mdiobus_register(&aup->mii_bus);
> drivers/net/bfin_mac.c: mdiobus_register(&lp->mii_bus);
> drivers/net/cpmac.c: res = mdiobus_register(&cpmac_mii);
> drivers/net/fec_mpc52xx_phy.c: err = mdiobus_register(bus);
> drivers/net/fs_enet/mii-bitbang.c: ret =
> mdiobus_register(new_bus);
> drivers/net/fs_enet/mii-fec.c: ret = mdiobus_register(new_bus);
> drivers/net/gianfar_mii.c: err = mdiobus_register(new_bus);
> drivers/net/macb.c: if (mdiobus_register(&bp->mii_bus))
> drivers/net/sb1250-mac.c: err = mdiobus_register(&sc->mii_bus);
> drivers/net/ucc_geth_mii.c: err = mdiobus_register(new_bus);
>
> I'm guessing this was only tested on the UEC, because unless
> I misunderstand the code, any other driver would now crash
> when you try to get the owner.
>
That's not true. If you look closely at the implementation of
try_module_get(), you'll see it returns 1 when module is NULL. So my
change should make no difference for the other drivers.
If that's really a concern for you, I can update the other drivers as
well. How do you propose to do this? Resubmit this patch or send a new
one?
Ionut.
^ permalink raw reply
* Re: [Bugme-new] [Bug 9778] New: unregister_netdevice: waiting for [device] to become free
From: Evgeniy Polyakov @ 2008-01-21 12:14 UTC (permalink / raw)
To: David Miller; +Cc: akpm, xemul, bugme-daemon, nigel, netdev
In-Reply-To: <20080120.023027.85710827.davem@davemloft.net>
On Sun, Jan 20, 2008 at 02:30:27AM -0800, David Miller (davem@davemloft.net) wrote:
> From: Andrew Morton <akpm@linux-foundation.org>
> Date: Sat, 19 Jan 2008 16:58:02 -0800
>
> > ouch.
>
> Yep, several people are hitting this it seems.
>
> If Pavel doesn't provide a fix or direction soon I'll just revert.
It looks like patch is still valid.
Here is a problem description as I undestood.
When new device (let's talk about ethernet, since that is what I tested)
is being turned on, it gets neigh_parms entry allocated for it via
inetdev_init(), which is called for NETDEV_REGISTER inetdev event.
This entry is stored in arp_tbl table and is in_dev->arp_parms.
When later new arp entry is created, device is provided into
arp_constructor(), which clones (increase reference counter) device's
in_dev->arp_parms and puts it into provided neighbour entry.
When later we remove device, its in_dev->arp_parms's reference counter
is high enough (it is equal to number of arp entries found on given
device plu one), so neigh_parms_destroy() is not called. Later all
neighbour entries are flushed by garbage collector and reference counter
for that parm hits zero and device can be removed.
I will think about how to fix the problem nicely or if this patch still
can be simplified/dropped, but so far it looks valid. Maybe this
analysis will help someone to fix problem first.
Here is debug dmesg:
[ 21.835595] inetdev_init: allocating parms.
[ 21.839829] neigh_parms_alloc: parms: ffff81003d8e8df0, dev: eth0, refcnt: 1, dev_refcnt: 2.
...
[ 30.251576] r8169: eth0: link up
[ 31.067079] NET: Registered protocol family 10
[ 31.072055] neigh_parms_alloc: parms: ffff81003efc72a8, dev: lo, refcnt: 1, dev_refcnt: 9.
[ 31.080891] neigh_alloc: parms: ffffffff8812afe8, dev: <NULL>, refcnt: 2.
[ 31.087816] neigh_parms_alloc: parms: ffff81003efc7210, dev: eth0, refcnt: 1, dev_refcnt: 9.
[ 31.097335] neigh_alloc: parms: ffffffff804deb88, dev: <NULL>, refcnt: 2.
[ 31.104172] arp_constructor: parms: ffff81003f8c3be8, dev: lo, refcnt: 2.
[ 31.500348] neigh_alloc: parms: ffffffff8812afe8, dev: <NULL>, refcnt: 2.
[ 32.499628] neigh_alloc: parms: ffffffff8812afe8, dev: <NULL>, refcnt: 2.
[ 102.827796] neigh_destroy: parms: ffff81003efc7210, dev: eth0, refcnt: 3, dev_refcnt: 13.
[ 106.828843] neigh_destroy: parms: ffff81003f8c3be8, dev: lo, refcnt: 2, dev_refcnt: 78.
[ 109.810987] neigh_alloc: parms: ffffffff804deb88, dev: <NULL>, refcnt: 2.
First arp entry for eth0 device, bump the counter:
[ 109.817827] arp_constructor: parms: ffff81003d8e8df0, dev: eth0, refcnt: 2.
[ 109.831811] neigh_alloc: parms: ffffffff804deb88, dev: <NULL>, refcnt: 2.
[ 109.838661] arp_constructor: parms: ffff81003f8c3be8, dev: lo, refcnt: 2.
[ 110.837894] neigh_destroy: parms: ffff81003efc7210, dev: eth0, refcnt: 2, dev_refcnt: 15.
Can not release that neigh parm:
[ 113.638228] neigh_parms_release: parms: ffff81003d8e8df0, dev: eth0, refcnt: 2, dev_refcnt: 5.
Can release some other (for ipv6):
[ 113.649380] neigh_parms_release: parms: ffff81003efc7210, dev: eth0, refcnt: 1, dev_refcnt: 5.
[ 113.671806] neigh_parms_destroy: parms: ffff81003efc7210, dev: eth0, dev_refcnt: 3.
[ 123.916250] unregister_netdevice: waiting for eth0 to become free. Usage count = 1
GC hits us:
[ 124.839572] neigh_destroy: parms: ffff81003d8e8df0, dev: eth0, refcnt: 1, dev_refcnt: 11.
[ 124.847813] neigh_parms_destroy: parms: ffff81003d8e8df0, dev: eth0, dev_refcnt: 1.
[ 124.952026] ACPI: PCI interrupt for device 0000:02:0d.0 disabled
--
Evgeniy Polyakov
^ permalink raw reply
* Re: [Bugme-new] [Bug 9778] New: unregister_netdevice: waiting for [device] to become free
From: David Miller @ 2008-01-21 12:36 UTC (permalink / raw)
To: johnpol; +Cc: akpm, xemul, bugme-daemon, nigel, netdev
In-Reply-To: <20080121121445.GA29459@2ka.mipt.ru>
From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Date: Mon, 21 Jan 2008 15:14:45 +0300
> I will think about how to fix the problem nicely or if this patch still
> can be simplified/dropped, but so far it looks valid. Maybe this
> analysis will help someone to fix problem first.
I have currently reverted Pavel's patch.
I have no doubt that his patch was in some aspects
correct, the regressions were worse than the disease
he initially intended to cure.
We can add the race fix back once we get to the
bottom of the reference count issue.
^ permalink raw reply
* Re: [PATCH 3/4] bonding: Fix work rearming
From: Jarek Poplawski @ 2008-01-21 13:33 UTC (permalink / raw)
To: Makito SHIOKAWA; +Cc: netdev
In-Reply-To: <479419B6.9000000@miraclelinux.com>
On Mon, Jan 21, 2008 at 01:04:06PM +0900, Makito SHIOKAWA wrote:
> > (But new_value = 0 seems needed - just like from module_param()?)
> Do you mean to initialize new_value before sscanf()? (There is a check 'if (sscanf(buf, "%d", &new_value) != 1)', so is it necesarry?)
No: you mentioned about treating new_value == 0 like new_value < 0
with 'if (new_value <= 0)', and I didn't understand this idea...
>
> > - maybe to test if the value has changed at all,
> Tested.
>
> For now, patch will be like below. Any slight comment for this will be helpful, regards.
I think cancelling of delayed works in bond_sysfs is OK now.
Alas I don't understand the reason of this change in bond_main()...
Some comment?
Thanks,
Jarek P.
>
>
> Signed-off-by: Makito SHIOKAWA <mshiokawa@miraclelinux.com>
> ---
> drivers/net/bonding/bond_main.c | 11 ++++-------
> drivers/net/bonding/bond_sysfs.c | 19 +++++++++++++++++--
> 2 files changed, 21 insertions(+), 9 deletions(-)
>
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -2699,7 +2699,7 @@ void bond_loadbalance_arp_mon(struct wor
>
> read_lock(&bond->lock);
>
> - delta_in_ticks = (bond->params.arp_interval * HZ) / 1000;
> + delta_in_ticks = ((bond->params.arp_interval * HZ) / 1000) ? : 1;
>
> if (bond->kill_timers) {
> goto out;
> @@ -2801,8 +2801,7 @@ void bond_loadbalance_arp_mon(struct wor
> }
>
> re_arm:
> - if (bond->params.arp_interval)
> - queue_delayed_work(bond->wq, &bond->lb_arp_work, delta_in_ticks);
> + queue_delayed_work(bond->wq, &bond->lb_arp_work, delta_in_ticks);
> out:
> read_unlock(&bond->lock);
> }
> @@ -2832,7 +2831,7 @@ void bond_activebackup_arp_mon(struct wo
>
> read_lock(&bond->lock);
>
> - delta_in_ticks = (bond->params.arp_interval * HZ) / 1000;
> + delta_in_ticks = ((bond->params.arp_interval * HZ) / 1000) ? : 1;
>
> if (bond->kill_timers) {
> goto out;
> @@ -3058,9 +3057,7 @@ void bond_activebackup_arp_mon(struct wo
> }
>
> re_arm:
> - if (bond->params.arp_interval) {
> - queue_delayed_work(bond->wq, &bond->ab_arp_work, delta_in_ticks);
> - }
> + queue_delayed_work(bond->wq, &bond->ab_arp_work, delta_in_ticks);
> out:
> read_unlock(&bond->lock);
> }
> --- a/drivers/net/bonding/bond_sysfs.c
> +++ b/drivers/net/bonding/bond_sysfs.c
> @@ -644,6 +644,15 @@ static ssize_t bonding_store_arp_interva
> ": %s: Setting ARP monitoring interval to %d.\n",
> bond->dev->name, new_value);
> bond->params.arp_interval = new_value;
> + if (bond->params.arp_interval == 0 && (bond->dev->flags & IFF_UP)) {
> + printk(KERN_INFO DRV_NAME
> + ": %s: Disabling ARP monitoring.\n",
> + bond->dev->name);
> + if (bond->params.mode == BOND_MODE_ACTIVEBACKUP)
> + cancel_delayed_work_sync(&bond->ab_arp_work);
> + else
> + cancel_delayed_work_sync(&bond->lb_arp_work);
> + }
> if (bond->params.miimon) {
> printk(KERN_INFO DRV_NAME
> ": %s: ARP monitoring cannot be used with MII monitoring. "
> @@ -658,7 +667,7 @@ static ssize_t bonding_store_arp_interva
> "but no ARP targets have been specified.\n",
> bond->dev->name);
> }
> - if (bond->dev->flags & IFF_UP) {
> + if (bond->params.arp_interval && (bond->dev->flags & IFF_UP)) {
> /* If the interface is up, we may need to fire off
> * the ARP timer. If the interface is down, the
> * timer will get fired off when the open function
> @@ -997,6 +1006,12 @@ static ssize_t bonding_store_miimon(stru
> ": %s: Setting MII monitoring interval to %d.\n",
> bond->dev->name, new_value);
> bond->params.miimon = new_value;
> + if (bond->params.miimon == 0 && (bond->dev->flags & IFF_UP)) {
> + printk(KERN_INFO DRV_NAME
> + ": %s: Disabling MII monitoring...\n",
> + bond->dev->name);
> + cancel_delayed_work_sync(&bond->mii_work);
> + }
> if(bond->params.updelay)
> printk(KERN_INFO DRV_NAME
> ": %s: Note: Updating updelay (to %d) "
> @@ -1026,7 +1041,7 @@ static ssize_t bonding_store_miimon(stru
> cancel_delayed_work_sync(&bond->lb_arp_work);
> }
>
> - if (bond->dev->flags & IFF_UP) {
> + if (bond->params.miimon && (bond->dev->flags & IFF_UP)) {
> /* If the interface is up, we may need to fire off
> * the MII timer. If the interface is down, the
> * timer will get fired off when the open function
>
>
> --
> Makito SHIOKAWA
> MIRACLE LINUX CORPORATION
^ permalink raw reply
* Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
From: Robert Olsson @ 2008-01-21 13:27 UTC (permalink / raw)
To: David Miller
Cc: Robert.Olsson, elendil, jesse.brandeburg, slavon, netdev,
linux-kernel
In-Reply-To: <20080118.041144.81957249.davem@davemloft.net>
David Miller writes:
> Yes, this semaphore thing is highly problematic. In the most crucial
> areas where network driver consistency matters the most for ease of
> understanding and debugging, the Intel drivers choose to be different
> :-(
>
> The way the napi_disable() logic breaks out from high packet load in
> net_rx_action() is it simply returns even leaving interrupts disabled
> when a pending napi_disable() is pending.
>
> This is what trips up the semaphore logic.
>
> Robert, give this patch a try.
Yes it works. e1000 tested for ~3 hours with high very high load and
interface up/down every 5:th sec. Without the patch the irq's gets
disabled within a couple of seconds
A resolute way of handling the semaphores. :)
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Cheers
--ro
> In the long term this semaphore should be completely eliminated,
> there is no justification for it.
>
> Signed-off-by: David S. Miller <davem@davemloft.net>
>
> diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
> index 0c9a6f7..76c0fa6 100644
> --- a/drivers/net/e1000/e1000_main.c
> +++ b/drivers/net/e1000/e1000_main.c
> @@ -632,6 +632,7 @@ e1000_down(struct e1000_adapter *adapter)
>
> #ifdef CONFIG_E1000_NAPI
> napi_disable(&adapter->napi);
> + atomic_set(&adapter->irq_sem, 0);
> #endif
> e1000_irq_disable(adapter);
>
> diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
> index 2ab3bfb..9cc5a6b 100644
> --- a/drivers/net/e1000e/netdev.c
> +++ b/drivers/net/e1000e/netdev.c
> @@ -2183,6 +2183,7 @@ void e1000e_down(struct e1000_adapter *adapter)
> msleep(10);
>
> napi_disable(&adapter->napi);
> + atomic_set(&adapter->irq_sem, 0);
> e1000_irq_disable(adapter);
>
> del_timer_sync(&adapter->watchdog_timer);
> diff --git a/drivers/net/ixgb/ixgb_main.c b/drivers/net/ixgb/ixgb_main.c
> index d2fb88d..4f63839 100644
> --- a/drivers/net/ixgb/ixgb_main.c
> +++ b/drivers/net/ixgb/ixgb_main.c
> @@ -296,6 +296,11 @@ ixgb_down(struct ixgb_adapter *adapter, boolean_t kill_watchdog)
> {
> struct net_device *netdev = adapter->netdev;
>
> +#ifdef CONFIG_IXGB_NAPI
> + napi_disable(&adapter->napi);
> + atomic_set(&adapter->irq_sem, 0);
> +#endif
> +
> ixgb_irq_disable(adapter);
> free_irq(adapter->pdev->irq, netdev);
>
> @@ -304,9 +309,7 @@ ixgb_down(struct ixgb_adapter *adapter, boolean_t kill_watchdog)
>
> if(kill_watchdog)
> del_timer_sync(&adapter->watchdog_timer);
> -#ifdef CONFIG_IXGB_NAPI
> - napi_disable(&adapter->napi);
> -#endif
> +
> adapter->link_speed = 0;
> adapter->link_duplex = 0;
> netif_carrier_off(netdev);
> diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
> index de3f45e..a4265bc 100644
> --- a/drivers/net/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ixgbe/ixgbe_main.c
> @@ -1409,9 +1409,11 @@ void ixgbe_down(struct ixgbe_adapter *adapter)
> IXGBE_WRITE_FLUSH(&adapter->hw);
> msleep(10);
>
> + napi_disable(&adapter->napi);
> + atomic_set(&adapter->irq_sem, 0);
> +
> ixgbe_irq_disable(adapter);
>
> - napi_disable(&adapter->napi);
> del_timer_sync(&adapter->watchdog_timer);
>
> netif_carrier_off(netdev);
^ permalink raw reply
* Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
From: David Miller @ 2008-01-21 13:29 UTC (permalink / raw)
To: Robert.Olsson; +Cc: elendil, jesse.brandeburg, slavon, netdev, linux-kernel
In-Reply-To: <18324.40369.984595.651675@robur.slu.se>
From: Robert Olsson <Robert.Olsson@data.slu.se>
Date: Mon, 21 Jan 2008 14:27:13 +0100
> Yes it works. e1000 tested for ~3 hours with high very high load and
> interface up/down every 5:th sec. Without the patch the irq's gets
> disabled within a couple of seconds
>
> A resolute way of handling the semaphores. :)
>
> Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Thanks for testing Robert.
I sent off that fix to Linus an hour or so ago, hopefully
he will pick it up some time today.
^ permalink raw reply
* Re: [Bugme-new] [Bug 9778] New: unregister_netdevice: waiting for [device] to become free
From: Evgeniy Polyakov @ 2008-01-21 14:36 UTC (permalink / raw)
To: David Miller; +Cc: akpm, xemul, netdev, bugme-daemon, nigel
In-Reply-To: <20080121121445.GA29459@2ka.mipt.ru>
On Mon, Jan 21, 2008 at 03:14:45PM +0300, Evgeniy Polyakov (johnpol@2ka.mipt.ru) wrote:
> It looks like patch is still valid.
> Here is a problem description as I undestood.
>
> When new device (let's talk about ethernet, since that is what I tested)
> is being turned on, it gets neigh_parms entry allocated for it via
> inetdev_init(), which is called for NETDEV_REGISTER inetdev event.
> This entry is stored in arp_tbl table and is in_dev->arp_parms.
>
> When later new arp entry is created, device is provided into
> arp_constructor(), which clones (increase reference counter) device's
> in_dev->arp_parms and puts it into provided neighbour entry.
>
> When later we remove device, its in_dev->arp_parms's reference counter
> is high enough (it is equal to number of arp entries found on given
> device plu one), so neigh_parms_destroy() is not called. Later all
> neighbour entries are flushed by garbage collector and reference counter
> for that parm hits zero and device can be removed.
>
> I will think about how to fix the problem nicely or if this patch still
> can be simplified/dropped, but so far it looks valid. Maybe this
> analysis will help someone to fix problem first.
Yes, patch is valid, and there is a (very noticeble) race between
neighbour processing and parm release - parm still can be accessed after
device was fully freed (as with old behaviour when dev_pu() was called
from neigh_parms_release()), although no one access it, so the simplest
solution is to move dev_put() under the table lock and allow to access
parms->dev only under table lock and always check if it is non-null.
So I propose a following patch as a simplest solution for the current
time.
Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index a4f2618..410b7e7 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -34,6 +34,11 @@ struct neighbour;
struct neigh_parms
{
+ /*
+ * This device is only allowed to be accessed under table lock (bh turned off)
+ * and while device is alive. After parm was released, it will be set to NULL
+ * and has to be always checked before accessed.
+ */
struct net_device *dev;
struct neigh_parms *next;
int (*neigh_setup)(struct neighbour *);
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index cc8a2f1..5076acd 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1315,7 +1315,12 @@ void neigh_parms_release(struct neigh_table *tbl, struct neigh_parms *parms)
if (*p == parms) {
*p = parms->next;
parms->dead = 1;
+ if (parms->dev) {
+ dev_put(parms->dev);
+ parms->dev = NULL;
+ }
write_unlock_bh(&tbl->lock);
+
call_rcu(&parms->rcu_head, neigh_rcu_free_parms);
return;
}
@@ -1326,8 +1331,6 @@ void neigh_parms_release(struct neigh_table *tbl, struct neigh_parms *parms)
void neigh_parms_destroy(struct neigh_parms *parms)
{
- if (parms->dev)
- dev_put(parms->dev);
kfree(parms);
}
--
Evgeniy Polyakov
^ permalink raw reply related
* Re: [PATCH 2/3][NET] gen_estimator: list_empty() check in est_timer() fixed
From: Jarek Poplawski @ 2008-01-21 14:43 UTC (permalink / raw)
To: David Miller; +Cc: shemminger, slavon, kaber, hadi, netdev
In-Reply-To: <20080121.031553.224463409.davem@davemloft.net>
On Mon, Jan 21, 2008 at 03:15:53AM -0800, David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Mon, 21 Jan 2008 12:19:40 +0100
>
> > On Mon, Jan 21, 2008 at 02:36:32AM -0800, David Miller wrote:
> > ...
> > > FWIW I agree that double-negatives are confusing and we should
> > > avoid them.
> >
> > Right! No more: CHECKSUM_NONE, SOCK_NOSPACE, IFF_NOARP or KERN_NOTICE!
>
> Life is difficult sometimes, but that is no excuse to further
> the pain :-)
BTW, maybe somebody else finds this interesting (because you seem to
know this very well), in some languages, like Polish, e.g.: "that is
no excuse" needs double-negative: "to nie jest zadne wytlumaczenie",
so literally: "that not is no excuse"...
Cheers,
Jarek P.
^ permalink raw reply
* [PATCH 0/6 net-2.6.25] Provide correct namespace on IPv4 packet input path.
From: Denis V. Lunev @ 2008-01-21 14:49 UTC (permalink / raw)
To: David Miller; +Cc: netdev, devel, Linux Containers
This patchset sequentially adds namespace parameter to fib_lookup and
inetdev_by_index. After that it is possible to pass network namespace
from input packet to routing engine.
Output path is much more intrusive and will be sent separately.
Signed-off-by: Denis V. Lunev <den@openvz.org>
^ permalink raw reply
* [PATCH 3/6 net-2.6.25] [NETNS] Pass correct namespace in fib_validate_source.
From: Denis V. Lunev @ 2008-01-21 14:50 UTC (permalink / raw)
To: davem; +Cc: netdev, devel, containers, Denis V. Lunev
In-Reply-To: <4794B10E.7010703@sw.ru>
Correct network namespace is available inside fib_validate_source. It can be
obtained from the device passed in. The device is not NULL as in_device is
obtained from it just above.
Signed-off-by: Denis V. Lunev <den@openvz.org>
---
net/ipv4/fib_frontend.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index dcd3a28..39b8b35 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -243,6 +243,7 @@ int fib_validate_source(__be32 src, __be32 dst, u8 tos, int oif,
struct fib_result res;
int no_addr, rpf;
int ret;
+ struct net *net;
no_addr = rpf = 0;
rcu_read_lock();
@@ -256,7 +257,8 @@ int fib_validate_source(__be32 src, __be32 dst, u8 tos, int oif,
if (in_dev == NULL)
goto e_inval;
- if (fib_lookup(&init_net, &fl, &res))
+ net = dev->nd_net;
+ if (fib_lookup(net, &fl, &res))
goto last_resort;
if (res.type != RTN_UNICAST)
goto e_inval_res;
@@ -280,7 +282,7 @@ int fib_validate_source(__be32 src, __be32 dst, u8 tos, int oif,
fl.oif = dev->ifindex;
ret = 0;
- if (fib_lookup(&init_net, &fl, &res) == 0) {
+ if (fib_lookup(net, &fl, &res) == 0) {
if (res.type == RTN_UNICAST) {
*spec_dst = FIB_RES_PREFSRC(res);
ret = FIB_RES_NH(res).nh_scope >= RT_SCOPE_HOST;
--
1.5.3.rc5
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox