* Re: 2.6.14-rc4-mm1
[not found] <fa.h4unqgj.l34e31@ifi.uio.no>
@ 2005-10-17 6:19 ` Reuben Farrelly
2005-10-23 7:30 ` [0/3] Fix timer bugs in neighbour cache Herbert Xu
0 siblings, 1 reply; 7+ messages in thread
From: Reuben Farrelly @ 2005-10-17 6:19 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, netdev
Hi,
On 16/10/2005 10:42 p.m., Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.14-rc4/2.6.14-rc4-mm1/
>
> - Lots of i2c, PCI and USB updates
>
> - Large input layer update to convert it all to dynamic input_dev allocation
>
> - Significant x86_64 updates
>
> - MD updates
>
> - Lots of core memory management scalability rework
Compiles and runs here, but noting these messages when ethernet link down:
Oct 17 18:49:40 tornado kernel: NEIGH: BUG, double timer add, state is 1
Oct 17 18:51:04 tornado last message repeated 3 times
Oct 17 18:52:05 tornado last message repeated 5 times
Oct 17 18:52:11 tornado last message repeated 2 times
net/core/neighbour.c has this:
static inline void neigh_add_timer(struct neighbour *n, unsigned long when)
{
if (unlikely(mod_timer(&n->timer, when))) {
printk("NEIGH: BUG, double timer add, state is %x\n",
n->nud_state);
}
}
Network guys?
reuben
^ permalink raw reply [flat|nested] 7+ messages in thread
* [0/3] Fix timer bugs in neighbour cache
2005-10-17 6:19 ` 2.6.14-rc4-mm1 Reuben Farrelly
@ 2005-10-23 7:30 ` Herbert Xu
2005-10-23 7:31 ` [1/3] [NEIGH] Print stack trace in neigh_add_timer Herbert Xu
` (3 more replies)
0 siblings, 4 replies; 7+ messages in thread
From: Herbert Xu @ 2005-10-23 7:30 UTC (permalink / raw)
To: Reuben Farrelly; +Cc: akpm, linux-kernel, netdev, acme, davem, greearb
Reuben Farrelly <reuben-lkml@reub.net> wrote:
>
> Oct 17 18:49:40 tornado kernel: NEIGH: BUG, double timer add, state is 1
> Oct 17 18:51:04 tornado last message repeated 3 times
> Oct 17 18:52:05 tornado last message repeated 5 times
> Oct 17 18:52:11 tornado last message repeated 2 times
Excellent. Looks like we actually caught something. Pity we don't have
a stack trace which means that there might be more bugs.
Anyway, here are three patches which should fix this. This should go
into 2.6.14.
Arnaldo, you can pull them from
master.kernel.org:/pub/scm/linux/kernel/git/herbert/net-2.6.git
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 7+ messages in thread
* [1/3] [NEIGH] Print stack trace in neigh_add_timer
2005-10-23 7:30 ` [0/3] Fix timer bugs in neighbour cache Herbert Xu
@ 2005-10-23 7:31 ` Herbert Xu
2005-10-23 7:32 ` [2/3] [NEIGH] Fix add_timer race " Herbert Xu
` (2 subsequent siblings)
3 siblings, 0 replies; 7+ messages in thread
From: Herbert Xu @ 2005-10-23 7:31 UTC (permalink / raw)
To: Reuben Farrelly; +Cc: akpm, linux-kernel, netdev, acme, davem, greearb
[-- Attachment #1: Type: text/plain, Size: 445 bytes --]
[NEIGH] Print stack trace in neigh_add_timer
Stack traces are very helpful in determining the exact nature of a bug.
So let's print a stack trace when the timer is added twice.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[-- Attachment #2: p1.patch --]
[-- Type: text/plain, Size: 326 bytes --]
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -732,6 +732,7 @@ static inline void neigh_add_timer(struc
if (unlikely(mod_timer(&n->timer, when))) {
printk("NEIGH: BUG, double timer add, state is %x\n",
n->nud_state);
+ dump_stack();
}
}
^ permalink raw reply [flat|nested] 7+ messages in thread
* [2/3] [NEIGH] Fix add_timer race in neigh_add_timer
2005-10-23 7:30 ` [0/3] Fix timer bugs in neighbour cache Herbert Xu
2005-10-23 7:31 ` [1/3] [NEIGH] Print stack trace in neigh_add_timer Herbert Xu
@ 2005-10-23 7:32 ` Herbert Xu
2005-10-23 7:33 ` [3/3] [NEIGH] Fix timer leak in neigh_changeaddr Herbert Xu
2005-10-23 16:16 ` [0/3] Fix timer bugs in neighbour cache Arnaldo Carvalho de Melo
3 siblings, 0 replies; 7+ messages in thread
From: Herbert Xu @ 2005-10-23 7:32 UTC (permalink / raw)
To: Reuben Farrelly; +Cc: akpm, linux-kernel, netdev, acme, davem, greearb
[-- Attachment #1: Type: text/plain, Size: 642 bytes --]
[NEIGH] Fix add_timer race in neigh_add_timer
neigh_add_timer cannot use add_timer unconditionally. The reason is that
by the time it has obtained the write lock someone else (e.g., neigh_update)
could have already added a new timer.
So it should only use mod_timer and deal with its return value accordingly.
This bug would have led to rare neighbour cache entry leaks.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[-- Attachment #2: p2.patch --]
[-- Type: text/plain, Size: 523 bytes --]
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -816,10 +816,10 @@ static void neigh_timer_handler(unsigned
}
if (neigh->nud_state & NUD_IN_TIMER) {
- neigh_hold(neigh);
if (time_before(next, jiffies + HZ/2))
next = jiffies + HZ/2;
- neigh_add_timer(neigh, next);
+ if (!mod_timer(&neigh->timer, next))
+ neigh_hold(neigh);
}
if (neigh->nud_state & (NUD_INCOMPLETE | NUD_PROBE)) {
struct sk_buff *skb = skb_peek(&neigh->arp_queue);
^ permalink raw reply [flat|nested] 7+ messages in thread
* [3/3] [NEIGH] Fix timer leak in neigh_changeaddr
2005-10-23 7:30 ` [0/3] Fix timer bugs in neighbour cache Herbert Xu
2005-10-23 7:31 ` [1/3] [NEIGH] Print stack trace in neigh_add_timer Herbert Xu
2005-10-23 7:32 ` [2/3] [NEIGH] Fix add_timer race " Herbert Xu
@ 2005-10-23 7:33 ` Herbert Xu
2005-10-23 18:00 ` Ben Greear
2005-10-23 16:16 ` [0/3] Fix timer bugs in neighbour cache Arnaldo Carvalho de Melo
3 siblings, 1 reply; 7+ messages in thread
From: Herbert Xu @ 2005-10-23 7:33 UTC (permalink / raw)
To: Reuben Farrelly; +Cc: akpm, linux-kernel, netdev, acme, davem, greearb
[-- Attachment #1: Type: text/plain, Size: 1058 bytes --]
[NEIGH] Fix timer leak in neigh_changeaddr
neigh_changeaddr attempts to delete neighbour timers without setting
nud_state. This doesn't work because the timer may have already fired
when we acquire the write lock in neigh_changeaddr. The result is that
the timer may keep firing for quite a while until the entry reaches
NEIGH_FAILED.
It should be setting the nud_state straight away so that if the timer
has already fired it can simply exit once we relinquish the lock.
In fact, this whole function is simply duplicating the logic in
neigh_ifdown which in turn is already doing the right thing when
it comes to deleting timers and setting nud_state.
So all we have to do is take that code out and put it into a common
function and make both neigh_changeaddr and neigh_ifdown call it.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[-- Attachment #2: p3.patch --]
[-- Type: text/plain, Size: 1446 bytes --]
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -175,39 +175,10 @@ static void pneigh_queue_purge(struct sk
}
}
-void neigh_changeaddr(struct neigh_table *tbl, struct net_device *dev)
+static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev)
{
int i;
- write_lock_bh(&tbl->lock);
-
- for (i=0; i <= tbl->hash_mask; i++) {
- struct neighbour *n, **np;
-
- np = &tbl->hash_buckets[i];
- while ((n = *np) != NULL) {
- if (dev && n->dev != dev) {
- np = &n->next;
- continue;
- }
- *np = n->next;
- write_lock_bh(&n->lock);
- n->dead = 1;
- neigh_del_timer(n);
- write_unlock_bh(&n->lock);
- neigh_release(n);
- }
- }
-
- write_unlock_bh(&tbl->lock);
-}
-
-int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev)
-{
- int i;
-
- write_lock_bh(&tbl->lock);
-
for (i = 0; i <= tbl->hash_mask; i++) {
struct neighbour *n, **np = &tbl->hash_buckets[i];
@@ -243,7 +214,19 @@ int neigh_ifdown(struct neigh_table *tbl
neigh_release(n);
}
}
+}
+void neigh_changeaddr(struct neigh_table *tbl, struct net_device *dev)
+{
+ write_lock_bh(&tbl->lock);
+ neigh_flush_dev(tbl, dev);
+ write_unlock_bh(&tbl->lock);
+}
+
+int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev)
+{
+ write_lock_bh(&tbl->lock);
+ neigh_flush_dev(tbl, dev);
pneigh_ifdown(tbl, dev);
write_unlock_bh(&tbl->lock);
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [0/3] Fix timer bugs in neighbour cache
2005-10-23 7:30 ` [0/3] Fix timer bugs in neighbour cache Herbert Xu
` (2 preceding siblings ...)
2005-10-23 7:33 ` [3/3] [NEIGH] Fix timer leak in neigh_changeaddr Herbert Xu
@ 2005-10-23 16:16 ` Arnaldo Carvalho de Melo
3 siblings, 0 replies; 7+ messages in thread
From: Arnaldo Carvalho de Melo @ 2005-10-23 16:16 UTC (permalink / raw)
To: Herbert Xu; +Cc: Reuben Farrelly, akpm, linux-kernel, netdev, davem, greearb
Em Sun, Oct 23, 2005 at 05:30:21PM +1000, Herbert Xu escreveu:
> Reuben Farrelly <reuben-lkml@reub.net> wrote:
> >
> > Oct 17 18:49:40 tornado kernel: NEIGH: BUG, double timer add, state is 1
> > Oct 17 18:51:04 tornado last message repeated 3 times
> > Oct 17 18:52:05 tornado last message repeated 5 times
> > Oct 17 18:52:11 tornado last message repeated 2 times
>
> Excellent. Looks like we actually caught something. Pity we don't have
> a stack trace which means that there might be more bugs.
>
> Anyway, here are three patches which should fix this. This should go
> into 2.6.14.
>
> Arnaldo, you can pull them from
>
> master.kernel.org:/pub/scm/linux/kernel/git/herbert/net-2.6.git
Thanks, pulled.
I guess at some point I'll try to make the neighbour code more like the
sock one where we have things like sk_reset_timer and sk_stop_timer for
this purpose.
- Arnaldo
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [3/3] [NEIGH] Fix timer leak in neigh_changeaddr
2005-10-23 7:33 ` [3/3] [NEIGH] Fix timer leak in neigh_changeaddr Herbert Xu
@ 2005-10-23 18:00 ` Ben Greear
0 siblings, 0 replies; 7+ messages in thread
From: Ben Greear @ 2005-10-23 18:00 UTC (permalink / raw)
To: Herbert Xu; +Cc: Reuben Farrelly, akpm, linux-kernel, netdev, acme, davem
Herbert Xu wrote:
> [NEIGH] Fix timer leak in neigh_changeaddr
>
> neigh_changeaddr attempts to delete neighbour timers without setting
> nud_state. This doesn't work because the timer may have already fired
> when we acquire the write lock in neigh_changeaddr. The result is that
> the timer may keep firing for quite a while until the entry reaches
> NEIGH_FAILED.
>
> It should be setting the nud_state straight away so that if the timer
> has already fired it can simply exit once we relinquish the lock.
>
> In fact, this whole function is simply duplicating the logic in
> neigh_ifdown which in turn is already doing the right thing when
> it comes to deleting timers and setting nud_state.
>
> So all we have to do is take that code out and put it into a common
> function and make both neigh_changeaddr and neigh_ifdown call it.
Thanks for all who reproduced and fixed this...I'm glad to know I wasn't
insane when I first tried to fix it and then couldn't reproduce
the problem anymore! :)
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2005-10-23 18:00 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <fa.h4unqgj.l34e31@ifi.uio.no>
2005-10-17 6:19 ` 2.6.14-rc4-mm1 Reuben Farrelly
2005-10-23 7:30 ` [0/3] Fix timer bugs in neighbour cache Herbert Xu
2005-10-23 7:31 ` [1/3] [NEIGH] Print stack trace in neigh_add_timer Herbert Xu
2005-10-23 7:32 ` [2/3] [NEIGH] Fix add_timer race " Herbert Xu
2005-10-23 7:33 ` [3/3] [NEIGH] Fix timer leak in neigh_changeaddr Herbert Xu
2005-10-23 18:00 ` Ben Greear
2005-10-23 16:16 ` [0/3] Fix timer bugs in neighbour cache Arnaldo Carvalho de Melo
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).