* [PATCH] Deadlock in af_packet/packet_rcv
@ 2004-11-25 20:55 Olaf Kirch
2004-11-30 4:02 ` David S. Miller
2004-11-30 9:32 ` Tommy Christensen
0 siblings, 2 replies; 9+ messages in thread
From: Olaf Kirch @ 2004-11-25 20:55 UTC (permalink / raw)
To: netdev
Before introduction of lock-less loopback, all of netdev_xmit_nit was
running with bottom halves disabled, and some code seems to rely on
this. One of them is af_packet's packet_rcv handler, which is called
from the receive path via netdev_xmit_nit. It takes a spin lock,
and if an interrupt occurs and calls netdev_xmit during this time,
the CPU deadlocks.
The patch below disables BHs while in the TX path for loopback and
similar devices.
An alternative, less conservative fix would be to just use spin_lock_bh
in af_packet.c.
Signed-off-by: Olaf Kirch <okir@suse.de>
Index: linux-2.6.9/net/core/dev.c
===================================================================
--- linux-2.6.9.orig/net/core/dev.c
+++ linux-2.6.9/net/core/dev.c
@@ -1222,6 +1222,7 @@ int __skb_linearize(struct sk_buff *skb,
}
#define HARD_TX_LOCK(dev, cpu) { \
+ local_disable_bh(); \
if ((dev->features & NETIF_F_LLTX) == 0) { \
spin_lock(&dev->xmit_lock); \
dev->xmit_lock_owner = cpu; \
@@ -1233,6 +1234,7 @@ int __skb_linearize(struct sk_buff *skb,
dev->xmit_lock_owner = -1; \
spin_unlock(&dev->xmit_lock); \
} \
+ local_enable_bh(); \
}
static inline void qdisc_run(struct net_device *dev)
--
Olaf Kirch | Things that make Monday morning interesting, #2:
okir@suse.de | "We have 8,000 NFS mount points, why do we keep
---------------+ running out of privileged ports?"
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] Deadlock in af_packet/packet_rcv
2004-11-25 20:55 [PATCH] Deadlock in af_packet/packet_rcv Olaf Kirch
@ 2004-11-30 4:02 ` David S. Miller
2004-11-30 10:48 ` Olaf Kirch
2004-11-30 9:32 ` Tommy Christensen
1 sibling, 1 reply; 9+ messages in thread
From: David S. Miller @ 2004-11-30 4:02 UTC (permalink / raw)
To: Olaf Kirch; +Cc: netdev
On Thu, 25 Nov 2004 21:55:03 +0100
Olaf Kirch <okir@suse.de> wrote:
> Before introduction of lock-less loopback, all of netdev_xmit_nit was
> running with bottom halves disabled, and some code seems to rely on
> this. One of them is af_packet's packet_rcv handler, which is called
> from the receive path via netdev_xmit_nit. It takes a spin lock,
> and if an interrupt occurs and calls netdev_xmit during this time,
> the CPU deadlocks.
>
> The patch below disables BHs while in the TX path for loopback and
> similar devices.
>
> An alternative, less conservative fix would be to just use spin_lock_bh
> in af_packet.c.
Good spotting.
If it's only the dev_queue_xmit_nit() handlers which all want
BHs disabled, why don't we just disable BHs in that function?
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] Deadlock in af_packet/packet_rcv
2004-11-25 20:55 [PATCH] Deadlock in af_packet/packet_rcv Olaf Kirch
2004-11-30 4:02 ` David S. Miller
@ 2004-11-30 9:32 ` Tommy Christensen
2004-11-30 11:01 ` Olaf Kirch
1 sibling, 1 reply; 9+ messages in thread
From: Tommy Christensen @ 2004-11-30 9:32 UTC (permalink / raw)
To: Olaf Kirch; +Cc: netdev
Olaf Kirch wrote:
> Before introduction of lock-less loopback, all of netdev_xmit_nit was
> running with bottom halves disabled, and some code seems to rely on
> this. One of them is af_packet's packet_rcv handler, which is called
> from the receive path via netdev_xmit_nit. It takes a spin lock,
> and if an interrupt occurs and calls netdev_xmit during this time,
> the CPU deadlocks.
An interrupt handler shouldn't call dev_queue_xmit() directly. If
this indeed happens, it needs to be fixed. Which handler is this?
> The patch below disables BHs while in the TX path for loopback and
> similar devices.
dev_queue_xmit() already have BHs disabled, so this won't do anything.
And weren't you talking about interrupts ??
Protocol handlers are called with bottom halves disabled, but IRQs
are enabled. This hasn't changed - and is assumed in lots of places.
-Tommy
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] Deadlock in af_packet/packet_rcv
2004-11-30 4:02 ` David S. Miller
@ 2004-11-30 10:48 ` Olaf Kirch
0 siblings, 0 replies; 9+ messages in thread
From: Olaf Kirch @ 2004-11-30 10:48 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev
[-- Attachment #1: Type: text/plain, Size: 534 bytes --]
On Mon, Nov 29, 2004 at 08:02:20PM -0800, David S. Miller wrote:
> If it's only the dev_queue_xmit_nit() handlers which all want
> BHs disabled, why don't we just disable BHs in that function?
That was my original fix (see attachment), but Andi Kleen suggested to
actually disable BHs at that level; I think as a matter of caution.
Olaf
--
Olaf Kirch | Things that make Monday morning interesting, #2:
okir@suse.de | "We have 8,000 NFS mount points, why do we keep
---------------+ running out of privileged ports?"
[-- Attachment #2: netdev-xmit --]
[-- Type: text/plain, Size: 626 bytes --]
Index: linux-2.6.9/net/core/dev.c
===================================================================
--- linux-2.6.9.orig/net/core/dev.c 2004-10-18 23:54:08.000000000 +0200
+++ linux-2.6.9/net/core/dev.c 2004-11-30 11:46:39.000000000 +0100
@@ -1094,9 +1094,15 @@ void dev_queue_xmit_nit(struct sk_buff *
skb2->nh.raw = skb2->data;
}
+ /* The packet handler may expect BHs to be disabled,
+ * so don't let them down */
+ local_bh_disable();
+
skb2->h.raw = skb2->nh.raw;
skb2->pkt_type = PACKET_OUTGOING;
ptype->func(skb2, skb->dev, ptype);
+
+ local_bh_enable();
}
}
rcu_read_unlock();
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] Deadlock in af_packet/packet_rcv
2004-11-30 9:32 ` Tommy Christensen
@ 2004-11-30 11:01 ` Olaf Kirch
2004-11-30 11:31 ` Tommy Christensen
0 siblings, 1 reply; 9+ messages in thread
From: Olaf Kirch @ 2004-11-30 11:01 UTC (permalink / raw)
To: Tommy Christensen; +Cc: netdev
On Tue, Nov 30, 2004 at 10:32:31AM +0100, Tommy Christensen wrote:
> An interrupt handler shouldn't call dev_queue_xmit() directly. If
> this indeed happens, it needs to be fixed. Which handler is this?
The call path according to KDB goes like this:
application does sendmsg()
udp_push_pending_frames
ip_push_pending_frames
ip_output
dev_queue_xmit
dev_queue_xmit_nit
calls ptype->func(skb2, skb->dev, ptype),
where func=packet_rcv
packet_rcv (and this runs with BHs enabled)
take the &sk->sk_receive_queue spinlock
*** timer interrupt
net_tx_action
take the dev->queue_lock spin lock
qdisc_run
qdisc_restart
dev_queue_xmit_nit
as above
packet_rcv
blocks on the &sk->sk_receive_queue spinlock
Before lockless-loopback this never triggered because we did a
spin_lock_bh(&dev->xmit_lock) around the call to dev_queue_xmit_nit.
Olaf
--
Olaf Kirch | Things that make Monday morning interesting, #2:
okir@suse.de | "We have 8,000 NFS mount points, why do we keep
---------------+ running out of privileged ports?"
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] Deadlock in af_packet/packet_rcv
2004-11-30 11:01 ` Olaf Kirch
@ 2004-11-30 11:31 ` Tommy Christensen
2004-11-30 11:45 ` Olaf Kirch
0 siblings, 1 reply; 9+ messages in thread
From: Tommy Christensen @ 2004-11-30 11:31 UTC (permalink / raw)
To: Olaf Kirch; +Cc: netdev
Olaf Kirch wrote:
> On Tue, Nov 30, 2004 at 10:32:31AM +0100, Tommy Christensen wrote:
>
>>An interrupt handler shouldn't call dev_queue_xmit() directly. If
>>this indeed happens, it needs to be fixed. Which handler is this?
>
>
> The call path according to KDB goes like this:
>
> application does sendmsg()
> udp_push_pending_frames
> ip_push_pending_frames
> ip_output
> dev_queue_xmit
> dev_queue_xmit_nit
> calls ptype->func(skb2, skb->dev, ptype),
> where func=packet_rcv
> packet_rcv (and this runs with BHs enabled)
> take the &sk->sk_receive_queue spinlock
> *** timer interrupt
> net_tx_action
> take the dev->queue_lock spin lock
> qdisc_run
> qdisc_restart
> dev_queue_xmit_nit
> as above
> packet_rcv
> blocks on the &sk->sk_receive_queue spinlock
>
> Before lockless-loopback this never triggered because we did a
> spin_lock_bh(&dev->xmit_lock) around the call to dev_queue_xmit_nit.
>
> Olaf
Ahh, back-traces are *so* nice to have.
I still don't agree with the conclusion, though. The spin_lock_bh()
is changed to a local_bh_disable() and an optional spin_lock().
That should not lead to what you are seeing!
I think perhaps your 'BH disabled count' has been corrupted.
There's a fix for that in 2.6.10-rc2.
-Tommy
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] Deadlock in af_packet/packet_rcv
2004-11-30 11:31 ` Tommy Christensen
@ 2004-11-30 11:45 ` Olaf Kirch
2004-11-30 11:56 ` Tommy Christensen
0 siblings, 1 reply; 9+ messages in thread
From: Olaf Kirch @ 2004-11-30 11:45 UTC (permalink / raw)
To: Tommy Christensen; +Cc: netdev
On Tue, Nov 30, 2004 at 12:31:50PM +0100, Tommy Christensen wrote:
> I still don't agree with the conclusion, though. The spin_lock_bh()
> is changed to a local_bh_disable() and an optional spin_lock().
> That should not lead to what you are seeing!
Well, the code in 2.6.9 has
#define HARD_TX_LOCK(dev, cpu) { \
if ((dev->features & NETIF_F_LLTX) == 0) { \
spin_lock(&dev->xmit_lock); \
dev->xmit_lock_owner = cpu; \
} \
}
i.e. there's no local_bh_disable at all - adding the local_bh_disable
was the whole point of my patch. Or did you refer to a different spinlock?
Olaf
--
Olaf Kirch | Things that make Monday morning interesting, #2:
okir@suse.de | "We have 8,000 NFS mount points, why do we keep
---------------+ running out of privileged ports?"
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] Deadlock in af_packet/packet_rcv
2004-11-30 11:45 ` Olaf Kirch
@ 2004-11-30 11:56 ` Tommy Christensen
2004-11-30 21:07 ` David S. Miller
0 siblings, 1 reply; 9+ messages in thread
From: Tommy Christensen @ 2004-11-30 11:56 UTC (permalink / raw)
To: Olaf Kirch; +Cc: netdev
Olaf Kirch wrote:
> On Tue, Nov 30, 2004 at 12:31:50PM +0100, Tommy Christensen wrote:
>
>>I still don't agree with the conclusion, though. The spin_lock_bh()
>>is changed to a local_bh_disable() and an optional spin_lock().
>>That should not lead to what you are seeing!
>
>
> Well, the code in 2.6.9 has
>
> #define HARD_TX_LOCK(dev, cpu) { \
> if ((dev->features & NETIF_F_LLTX) == 0) { \
> spin_lock(&dev->xmit_lock); \
> dev->xmit_lock_owner = cpu; \
> } \
> }
>
> i.e. there's no local_bh_disable at all - adding the local_bh_disable
> was the whole point of my patch. Or did you refer to a different spinlock?
The local_bh_disable() is called earlier in dev_queue_xmit(), and
is held across the whole HARD_TX_LOCK/dev_queue_xmit_nit/
hard_start_xmit/HARD_TX_UNLOCK sequence.
-Tommy
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] Deadlock in af_packet/packet_rcv
2004-11-30 11:56 ` Tommy Christensen
@ 2004-11-30 21:07 ` David S. Miller
0 siblings, 0 replies; 9+ messages in thread
From: David S. Miller @ 2004-11-30 21:07 UTC (permalink / raw)
To: Tommy Christensen; +Cc: okir, netdev
On Tue, 30 Nov 2004 12:56:30 +0100
Tommy Christensen <tommy.christensen@tpack.net> wrote:
> > i.e. there's no local_bh_disable at all - adding the local_bh_disable
> > was the whole point of my patch. Or did you refer to a different spinlock?
>
> The local_bh_disable() is called earlier in dev_queue_xmit(), and
> is held across the whole HARD_TX_LOCK/dev_queue_xmit_nit/
> hard_start_xmit/HARD_TX_UNLOCK sequence.
He's right. There must be something else happening in your
tree Olaf.
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2004-11-30 21:07 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-11-25 20:55 [PATCH] Deadlock in af_packet/packet_rcv Olaf Kirch
2004-11-30 4:02 ` David S. Miller
2004-11-30 10:48 ` Olaf Kirch
2004-11-30 9:32 ` Tommy Christensen
2004-11-30 11:01 ` Olaf Kirch
2004-11-30 11:31 ` Tommy Christensen
2004-11-30 11:45 ` Olaf Kirch
2004-11-30 11:56 ` Tommy Christensen
2004-11-30 21:07 ` David S. Miller
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).