* [BUG] IPv6 recursive locking
@ 2008-02-17 14:12 Kristof Provost
2008-02-17 18:30 ` Daniel Lezcano
2008-02-18 2:41 ` David Miller
0 siblings, 2 replies; 5+ messages in thread
From: Kristof Provost @ 2008-02-17 14:12 UTC (permalink / raw)
To: netdev
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi,

I'm running the current git (1309d4e68497184d2fd87e892ddf14076c2bda98)
without problems. While I was toying with IPv6 on my local network I managed
to completely hang my machine whenever it receives or sends a neighbour
solicitation. At least, I think that's the cause. It started as soon as I
installed radvd on the router. The included trace seems to point in the same
direction.

The machine is a Dell Latitude D505 (so x86). Network interfaces are e100 and
ipw2200 (firmware not loaded). I'm currently using the e100.

I'll try to bisect it but here's the trace already. Let me know if
there's anything else you'd like to know.
[ 124.439831] =============================================
[ 124.443689] [ INFO: possible recursive locking detected ]
[ 124.443689] 2.6.25-rc2 #33
[ 124.443689] ---------------------------------------------
[ 124.443689] swapper/0 is trying to acquire lock:
[ 124.443689] (&n->lock){-+-+}, at: [<c0468d39>] neigh_resolve_output+0x139/0x290
[ 124.443689]
[ 124.443689] but task is already holding lock:
[ 124.443689] (&n->lock){-+-+}, at: [<c0468ea4>] neigh_timer_handler+0x14/0x280
[ 124.443689]
[ 124.443689] other info that might help us debug this:
[ 124.443689] 1 lock held by swapper/0:
[ 124.443689] #0: (&n->lock){-+-+}, at: [<c0468ea4>] neigh_timer_handler+0x14/0x280
[ 124.443689]
[ 124.443689] stack backtrace:
[ 124.443689] Pid: 0, comm: swapper Not tainted 2.6.25-rc2 #33
[ 124.443689] [<c014863a>] __lock_acquire+0xd3a/0xf40
[ 124.443689] [<c0137ec8>] __kernel_text_address+0x18/0x30
[ 124.443689] [<c01488a0>] lock_acquire+0x60/0x80
[ 124.443689] [<c0468d39>] neigh_resolve_output+0x139/0x290
[ 124.443689] [<c059287e>] _write_lock_bh+0x2e/0x40
[ 124.443689] [<c0468d39>] neigh_resolve_output+0x139/0x290
[ 124.443689] [<c0468d39>] neigh_resolve_output+0x139/0x290
[ 124.443689] [<c0148805>] __lock_acquire+0xf05/0xf40
[ 124.443689] [<c04e1650>] ndisc_dst_alloc+0xe0/0x170
[ 124.443689] [<c04d39f4>] ip6_output_finish+0xa4/0x110
[ 124.443689] [<c0147a1d>] __lock_acquire+0x11d/0xf40
[ 124.443689] [<c04d4759>] ip6_output+0x5b9/0xba0
[ 124.443689] [<c0456eb6>] sock_alloc_send_skb+0x176/0x1d0
[ 124.443689] [<c04e4eab>] __ndisc_send+0x33b/0x540
[ 124.443690] [<c04e4d6e>] __ndisc_send+0x1fe/0x540
[ 124.443690] [<c04e5b69>] ndisc_send_ns+0x69/0xa0
[ 124.443690] [<c04e6c8e>] ndisc_solicit+0xee/0x1b0
[ 124.443690] [<c01472b5>] mark_held_locks+0x35/0x80
[ 124.443690] [<c0592c65>] _spin_unlock_irqrestore+0x45/0x60
[ 124.443690] [<c01473f9>] trace_hardirqs_on+0x79/0x130
[ 124.443690] [<c012f99f>] __mod_timer+0x9f/0xb0
[ 124.443690] [<c0468fd3>] neigh_timer_handler+0x143/0x280
[ 124.443690] [<c012f2ca>] run_timer_softirq+0x14a/0x1c0
[ 124.443690] [<c0468e90>] neigh_timer_handler+0x0/0x280
[ 124.443690] [<c0468e90>] neigh_timer_handler+0x0/0x280
[ 124.443690] [<c012b4c4>] __do_softirq+0x84/0x100
[ 124.443690] [<c012b595>] do_softirq+0x55/0x60
[ 124.443690] [<c012b9e5>] irq_exit+0x65/0x80
[ 124.443690] [<c01073b0>] do_IRQ+0x40/0x70
[ 124.443690] [<c010585e>] common_interrupt+0x2e/0x34
[ 124.443690] [<c032007b>] acpi_power_on+0x3b/0x104
[ 124.443690] [<c0322af6>] acpi_idle_enter_simple+0x194/0x1fe
[ 124.443690] [<c0322727>] acpi_idle_enter_bm+0xc1/0x2fc
[ 124.443690] [<c03fff43>] cpuidle_idle_call+0x63/0xb0
[ 124.443690] [<c03ffee0>] cpuidle_idle_call+0x0/0xb0
[ 124.443690] [<c010380d>] cpu_idle+0x5d/0xf0
[ 124.443690] =======================
Kristof
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
iD8DBQFHuEDNUEZ9DhGwDugRAgHaAJ9L6i924sEqim1Ti+rZH2qmGESx6wCfWYIY
PI1kcoY3SWN/O9TOLgGQC20=
=cvKu
-----END PGP SIGNATURE-----
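For readers unfamiliar with this kind of report: lockdep keys its checks on lock classes rather than on individual lock instances, and it complains when a task tries to acquire a lock of a class it already holds. The user-space sketch below is only a toy model of that one check, not the kernel's implementation; `held_stack`, `toy_acquire` and `toy_release` are hypothetical names introduced for illustration.

```c
#include <stddef.h>
#include <string.h>

#define MAX_HELD 8

/* Toy per-task record of held lock classes (hypothetical; lockdep's real
 * bookkeeping is far more involved). */
struct held_stack {
	const char *class_name[MAX_HELD];
	size_t depth;
};

/* Returns 0 on success, -1 if a lock of the same class is already held
 * (the "possible recursive locking" case) or if the stack is full. */
static int toy_acquire(struct held_stack *hs, const char *class_name)
{
	size_t i;

	for (i = 0; i < hs->depth; i++)
		if (strcmp(hs->class_name[i], class_name) == 0)
			return -1;
	if (hs->depth == MAX_HELD)
		return -1;
	hs->class_name[hs->depth++] = class_name;
	return 0;
}

/* Releases the most recently acquired lock, if any. */
static void toy_release(struct held_stack *hs)
{
	if (hs->depth > 0)
		hs->depth--;
}
```

Starting from an empty stack, a first `toy_acquire(&hs, "&n->lock")` succeeds and a second one before any release fails. That mirrors the sequence in the trace: neigh_timer_handler takes `&n->lock`, and further down the same call chain neigh_resolve_output tries to take a lock of the same class.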
* Re: [BUG] IPv6 recursive locking
2008-02-17 14:12 [BUG] IPv6 recursive locking Kristof Provost
@ 2008-02-17 18:30 ` Daniel Lezcano
2008-02-17 20:37 ` Benjamin Thery
2008-02-18 2:43 ` David Miller
2008-02-18 2:41 ` David Miller
1 sibling, 2 replies; 5+ messages in thread
From: Daniel Lezcano @ 2008-02-17 18:30 UTC (permalink / raw)
To: Kristof Provost; +Cc: netdev
Kristof Provost wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi,
>
> I'm running the current git (1309d4e68497184d2fd87e892ddf14076c2bda98)
> without problems. While I was toying with IPv6 on my local network I managed
> to completely hang my machine whenever it receives or sends a neighbour
> solicitation. At least, I think that's the cause. It started as soon as I
> installed radvd on the router. The included trace seems to point in the same
> direction.
>
> The machine is a Dell Latitude D505 (so x86). Network interfaces are e100 and
> ipw2200 (firmware not loaded). I'm currently using the e100.
>
> I'll try to bisect it but here's the trace already. Let me know if
> there's anything else you'd like to know.
I think this bug was introduced by the commit:
69cc64d8d92bf852f933e90c888dfff083bd4fc9
"[NDISC]: Fix race in generic address resolution".
> [ 124.439831] =============================================
> [ 124.443689] [ INFO: possible recursive locking detected ]
> [ 124.443689] 2.6.25-rc2 #33
> [ 124.443689] ---------------------------------------------
> [ 124.443689] swapper/0 is trying to acquire lock:
> [ 124.443689] (&n->lock){-+-+}, at: [<c0468d39>] neigh_resolve_output+0x139/0x290
> [ 124.443689]
> [ 124.443689] but task is already holding lock:
> [ 124.443689] (&n->lock){-+-+}, at: [<c0468ea4>] neigh_timer_handler+0x14/0x280
> [ 124.443689]
> [ 124.443689] other info that might help us debug this:
> [ 124.443689] 1 lock held by swapper/0:
> [ 124.443689] #0: (&n->lock){-+-+}, at: [<c0468ea4>] neigh_timer_handler+0x14/0x280
> [ 124.443689]
> [ 124.443689] stack backtrace:
> [ 124.443689] Pid: 0, comm: swapper Not tainted 2.6.25-rc2 #33
> [ 124.443689] [<c014863a>] __lock_acquire+0xd3a/0xf40
> [ 124.443689] [<c0137ec8>] __kernel_text_address+0x18/0x30
> [ 124.443689] [<c01488a0>] lock_acquire+0x60/0x80
> [ 124.443689] [<c0468d39>] neigh_resolve_output+0x139/0x290
> [ 124.443689] [<c059287e>] _write_lock_bh+0x2e/0x40
> [ 124.443689] [<c0468d39>] neigh_resolve_output+0x139/0x290
> [ 124.443689] [<c0468d39>] neigh_resolve_output+0x139/0x290
> [ 124.443689] [<c0148805>] __lock_acquire+0xf05/0xf40
> [ 124.443689] [<c04e1650>] ndisc_dst_alloc+0xe0/0x170
> [ 124.443689] [<c04d39f4>] ip6_output_finish+0xa4/0x110
> [ 124.443689] [<c0147a1d>] __lock_acquire+0x11d/0xf40
> [ 124.443689] [<c04d4759>] ip6_output+0x5b9/0xba0
> [ 124.443689] [<c0456eb6>] sock_alloc_send_skb+0x176/0x1d0
> [ 124.443689] [<c04e4eab>] __ndisc_send+0x33b/0x540
> [ 124.443690] [<c04e4d6e>] __ndisc_send+0x1fe/0x540
> [ 124.443690] [<c04e5b69>] ndisc_send_ns+0x69/0xa0
> [ 124.443690] [<c04e6c8e>] ndisc_solicit+0xee/0x1b0
> [ 124.443690] [<c01472b5>] mark_held_locks+0x35/0x80
> [ 124.443690] [<c0592c65>] _spin_unlock_irqrestore+0x45/0x60
> [ 124.443690] [<c01473f9>] trace_hardirqs_on+0x79/0x130
> [ 124.443690] [<c012f99f>] __mod_timer+0x9f/0xb0
> [ 124.443690] [<c0468fd3>] neigh_timer_handler+0x143/0x280
> [ 124.443690] [<c012f2ca>] run_timer_softirq+0x14a/0x1c0
> [ 124.443690] [<c0468e90>] neigh_timer_handler+0x0/0x280
> [ 124.443690] [<c0468e90>] neigh_timer_handler+0x0/0x280
> [ 124.443690] [<c012b4c4>] __do_softirq+0x84/0x100
> [ 124.443690] [<c012b595>] do_softirq+0x55/0x60
> [ 124.443690] [<c012b9e5>] irq_exit+0x65/0x80
> [ 124.443690] [<c01073b0>] do_IRQ+0x40/0x70
> [ 124.443690] [<c010585e>] common_interrupt+0x2e/0x34
> [ 124.443690] [<c032007b>] acpi_power_on+0x3b/0x104
> [ 124.443690] [<c0322af6>] acpi_idle_enter_simple+0x194/0x1fe
> [ 124.443690] [<c0322727>] acpi_idle_enter_bm+0xc1/0x2fc
> [ 124.443690] [<c03fff43>] cpuidle_idle_call+0x63/0xb0
> [ 124.443690] [<c03ffee0>] cpuidle_idle_call+0x0/0xb0
> [ 124.443690] [<c010380d>] cpu_idle+0x5d/0xf0
> [ 124.443690] =======================
>
> Kristof
>
* Re: [BUG] IPv6 recursive locking
2008-02-17 18:30 ` Daniel Lezcano
@ 2008-02-17 20:37 ` Benjamin Thery
2008-02-18 2:43 ` David Miller
1 sibling, 0 replies; 5+ messages in thread
From: Benjamin Thery @ 2008-02-17 20:37 UTC (permalink / raw)
To: Daniel Lezcano; +Cc: Kristof Provost, netdev, davem
On Feb 17, 2008 7:30 PM, Daniel Lezcano <dlezcano@fr.ibm.com> wrote:
> Kristof Provost wrote:
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> >
> > Hi,
> >
> > I'm running the current git (1309d4e68497184d2fd87e892ddf14076c2bda98)
> > without problems. While I was toying with IPv6 on my local network I managed
> > to completely hang my machine whenever it receives or sends a neighbour
> > solicitation. At least, I think that's the cause. It started as soon as I
> > installed radvd on the router. The included trace seems to point in the same
> > direction.
> >
> > The machine is a Dell Latitude D505 (so x86). Network interfaces are e100 and
> > ipw2200 (firmware not loaded). I'm currently using the e100.
> >
> > I'll try to bisect it but here's the trace already. Let me know if
> > there's anything else you'd like to know.
>
> I think this bug was introduced by the commit:
>
> 69cc64d8d92bf852f933e90c888dfff083bd4fc9
> "[NDISC]: Fix race in generic address resolution".
I confirm this commit is the culprit.

I reported the same bug last Thursday, but I made the mistake of replying
to the original thread that led to this commit. As that thread was a bit
old, my report seems to have gone unnoticed.
See http://www.spinics.net/lists/netdev/msg55373.html

The lockup happens very quickly when you have IPv6 configured.
I think we should revert this commit for now.

Benjamin
>
>
>
> > [ 124.439831] =============================================
> > [ 124.443689] [ INFO: possible recursive locking detected ]
> > [ 124.443689] 2.6.25-rc2 #33
> > [ 124.443689] ---------------------------------------------
> > [ 124.443689] swapper/0 is trying to acquire lock:
> > [ 124.443689] (&n->lock){-+-+}, at: [<c0468d39>] neigh_resolve_output+0x139/0x290
> > [ 124.443689]
> > [ 124.443689] but task is already holding lock:
> > [ 124.443689] (&n->lock){-+-+}, at: [<c0468ea4>] neigh_timer_handler+0x14/0x280
> > [ 124.443689]
> > [ 124.443689] other info that might help us debug this:
> > [ 124.443689] 1 lock held by swapper/0:
> > [ 124.443689] #0: (&n->lock){-+-+}, at: [<c0468ea4>] neigh_timer_handler+0x14/0x280
> > [ 124.443689]
> > [ 124.443689] stack backtrace:
> > [ 124.443689] Pid: 0, comm: swapper Not tainted 2.6.25-rc2 #33
> > [ 124.443689] [<c014863a>] __lock_acquire+0xd3a/0xf40
> > [ 124.443689] [<c0137ec8>] __kernel_text_address+0x18/0x30
> > [ 124.443689] [<c01488a0>] lock_acquire+0x60/0x80
> > [ 124.443689] [<c0468d39>] neigh_resolve_output+0x139/0x290
> > [ 124.443689] [<c059287e>] _write_lock_bh+0x2e/0x40
> > [ 124.443689] [<c0468d39>] neigh_resolve_output+0x139/0x290
> > [ 124.443689] [<c0468d39>] neigh_resolve_output+0x139/0x290
> > [ 124.443689] [<c0148805>] __lock_acquire+0xf05/0xf40
> > [ 124.443689] [<c04e1650>] ndisc_dst_alloc+0xe0/0x170
> > [ 124.443689] [<c04d39f4>] ip6_output_finish+0xa4/0x110
> > [ 124.443689] [<c0147a1d>] __lock_acquire+0x11d/0xf40
> > [ 124.443689] [<c04d4759>] ip6_output+0x5b9/0xba0
> > [ 124.443689] [<c0456eb6>] sock_alloc_send_skb+0x176/0x1d0
> > [ 124.443689] [<c04e4eab>] __ndisc_send+0x33b/0x540
> > [ 124.443690] [<c04e4d6e>] __ndisc_send+0x1fe/0x540
> > [ 124.443690] [<c04e5b69>] ndisc_send_ns+0x69/0xa0
> > [ 124.443690] [<c04e6c8e>] ndisc_solicit+0xee/0x1b0
> > [ 124.443690] [<c01472b5>] mark_held_locks+0x35/0x80
> > [ 124.443690] [<c0592c65>] _spin_unlock_irqrestore+0x45/0x60
> > [ 124.443690] [<c01473f9>] trace_hardirqs_on+0x79/0x130
> > [ 124.443690] [<c012f99f>] __mod_timer+0x9f/0xb0
> > [ 124.443690] [<c0468fd3>] neigh_timer_handler+0x143/0x280
> > [ 124.443690] [<c012f2ca>] run_timer_softirq+0x14a/0x1c0
> > [ 124.443690] [<c0468e90>] neigh_timer_handler+0x0/0x280
> > [ 124.443690] [<c0468e90>] neigh_timer_handler+0x0/0x280
> > [ 124.443690] [<c012b4c4>] __do_softirq+0x84/0x100
> > [ 124.443690] [<c012b595>] do_softirq+0x55/0x60
> > [ 124.443690] [<c012b9e5>] irq_exit+0x65/0x80
> > [ 124.443690] [<c01073b0>] do_IRQ+0x40/0x70
> > [ 124.443690] [<c010585e>] common_interrupt+0x2e/0x34
> > [ 124.443690] [<c032007b>] acpi_power_on+0x3b/0x104
> > [ 124.443690] [<c0322af6>] acpi_idle_enter_simple+0x194/0x1fe
> > [ 124.443690] [<c0322727>] acpi_idle_enter_bm+0xc1/0x2fc
> > [ 124.443690] [<c03fff43>] cpuidle_idle_call+0x63/0xb0
> > [ 124.443690] [<c03ffee0>] cpuidle_idle_call+0x0/0xb0
> > [ 124.443690] [<c010380d>] cpu_idle+0x5d/0xf0
> > [ 124.443690] =======================
> >
> > Kristof
> >
* Re: [BUG] IPv6 recursive locking
2008-02-17 14:12 [BUG] IPv6 recursive locking Kristof Provost
2008-02-17 18:30 ` Daniel Lezcano
@ 2008-02-18 2:41 ` David Miller
1 sibling, 0 replies; 5+ messages in thread
From: David Miller @ 2008-02-18 2:41 UTC (permalink / raw)
To: Kristof; +Cc: netdev
From: Kristof Provost <Kristof@provost-engineering.be>
Date: Sun, 17 Feb 2008 14:12:29 +0000
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi,
>
> I'm running the current git (1309d4e68497184d2fd87e892ddf14076c2bda98)
> without problems. While I was toying with IPv6 on my local network I managed
> to completely hang my machine whenever it receives or sends a neighbour
> solicitation. At least, I think that's the cause. It started as soon as I
> installed radvd on the router. The included trace seems to point in the same
> direction.
>
> The machine is a Dell Latitude D505 (so x86). Network interfaces are e100 and
> ipw2200 (firmware not loaded). I'm currently using the e100.
>
> I'll try to bisect it but here's the trace already. Let me know if
> there's anything else you'd like to know.
I've committed the following revert to fix this. The race bug
will need another solution, perhaps the one which uses skb_copy().
commit 9ff566074689e3aed1488780b97714ec43ba361d
Author: David S. Miller <davem@davemloft.net>
Date:   Sun Feb 17 18:39:54 2008 -0800

    Revert "[NDISC]: Fix race in generic address resolution"

    This reverts commit 69cc64d8d92bf852f933e90c888dfff083bd4fc9.

    It causes recursive locking in IPV6 because unlike other
    neighbour layer clients, it even needs neighbour cache
    entries to send neighbour solicitation messages :-(

    We'll have to find another way to fix this race.

    Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 7bb6a9a..a16cf1e 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -834,12 +834,18 @@ static void neigh_timer_handler(unsigned long arg)
 	}
 	if (neigh->nud_state & (NUD_INCOMPLETE | NUD_PROBE)) {
 		struct sk_buff *skb = skb_peek(&neigh->arp_queue);
-
+		/* keep skb alive even if arp_queue overflows */
+		if (skb)
+			skb_get(skb);
+		write_unlock(&neigh->lock);
 		neigh->ops->solicit(neigh, skb);
 		atomic_inc(&neigh->probes);
-	}
+		if (skb)
+			kfree_skb(skb);
+	} else {
 out:
-	write_unlock(&neigh->lock);
+		write_unlock(&neigh->lock);
+	}
 
 	if (notify)
 		neigh_update_notify(neigh);
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index c663fa5..8e17f65 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -368,6 +368,7 @@ static void arp_solicit(struct neighbour *neigh, struct sk_buff *skb)
 		if (!(neigh->nud_state&NUD_VALID))
 			printk(KERN_DEBUG "trying to ucast probe in NUD_INVALID\n");
 		dst_ha = neigh->ha;
+		read_lock_bh(&neigh->lock);
 	} else if ((probes -= neigh->parms->app_probes) < 0) {
 #ifdef CONFIG_ARPD
 		neigh_app_ns(neigh);
@@ -377,6 +378,8 @@ static void arp_solicit(struct neighbour *neigh, struct sk_buff *skb)
 
 	arp_send(ARPOP_REQUEST, ETH_P_ARP, target, dev, saddr,
 		 dst_ha, dev->dev_addr, NULL);
+	if (dst_ha)
+		read_unlock_bh(&neigh->lock);
 }
 
 static int arp_ignore(struct in_device *in_dev, __be32 sip, __be32 tip)
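The locking change in this revert can be sketched in user space. What follows is a toy model under stated assumptions, not kernel code: `toy_lock`, `toy_buf`, `toy_neigh`, `solicit` and the two handler functions are hypothetical stand-ins for `neigh->lock`, the queued skb, the neighbour entry, `neigh->ops->solicit()` and `neigh_timer_handler()`. The toy lock also reports a re-acquire with an error instead of spinning, so the problem shows up as a return value rather than a hang.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy non-recursive lock: a second acquire fails instead of hanging. */
struct toy_lock { bool held; };

/* Toy reference-counted buffer standing in for the queued skb. */
struct toy_buf { int refs; };

struct toy_neigh {
	struct toy_lock lock;
	struct toy_buf *queued;	/* may be NULL */
};

static int toy_lock_acquire(struct toy_lock *l)
{
	if (l->held)
		return -1;	/* a real non-recursive lock would deadlock here */
	l->held = true;
	return 0;
}

static void toy_lock_release(struct toy_lock *l)
{
	l->held = false;
}

/* Stand-in for a solicit path that itself needs the entry lock, as the
 * IPv6 path in the trace does. Returns -1 if the lock is already held. */
static int solicit(struct toy_neigh *n)
{
	if (toy_lock_acquire(&n->lock) != 0)
		return -1;
	toy_lock_release(&n->lock);
	return 0;
}

/* Pattern of the reverted commit: solicit is called with the lock held. */
static int handler_lock_held(struct toy_neigh *n)
{
	int err;

	if (toy_lock_acquire(&n->lock) != 0)
		return -1;
	err = solicit(n);
	toy_lock_release(&n->lock);
	return err;
}

/* Pattern restored by the revert: pin the buffer, drop the lock, call
 * solicit, then drop the extra reference. */
static int handler_lock_dropped(struct toy_neigh *n)
{
	struct toy_buf *buf;
	int err;

	if (toy_lock_acquire(&n->lock) != 0)
		return -1;
	buf = n->queued;
	if (buf)
		buf->refs++;	/* keep it alive while the lock is not held */
	toy_lock_release(&n->lock);

	err = solicit(n);

	if (buf)
		buf->refs--;
	return err;
}
```

On a freshly initialised `toy_neigh`, `handler_lock_held()` returns -1 because `solicit()` finds the lock already taken, while `handler_lock_dropped()` returns 0 and leaves the buffer's reference count where it started. The extra reference plays the role of the `skb_get()`/`kfree_skb()` pair in the patch: once the lock is dropped, nothing else guarantees the buffer stays alive.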
* Re: [BUG] IPv6 recursive locking
2008-02-17 18:30 ` Daniel Lezcano
2008-02-17 20:37 ` Benjamin Thery
@ 2008-02-18 2:43 ` David Miller
1 sibling, 0 replies; 5+ messages in thread
From: David Miller @ 2008-02-18 2:43 UTC (permalink / raw)
To: dlezcano; +Cc: Kristof, netdev
From: Daniel Lezcano <dlezcano@fr.ibm.com>
Date: Sun, 17 Feb 2008 19:30:03 +0100
> I think this bug was introduced by the commit:
>
> 69cc64d8d92bf852f933e90c888dfff083bd4fc9
> "[NDISC]: Fix race in generic address resolution".
Yep and I'll revert this for now.