public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [lock validator] drivers/net/8139too.c: deadlock?
@ 2006-01-26 22:43 Ingo Molnar
  2006-01-27  0:35 ` Francois Romieu
  0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2006-01-26 22:43 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-kernel, Andrew Morton, David S. Miller

the lock validator i'm working on found the following scenario in the 
rtl8139 driver, which it flagged as a deadlock:

  ---------------------------------------->
  NETDEV WATCHDOG: eth0: transmit timed out
  eth0: Transmit timeout, status 0d 0000 c07f media 80.
  eth0: Tx queue start entry 164281  dirty entry 164277.
  eth0:  Tx descriptor 0 is 000805ea.
  eth0:  Tx descriptor 1 is 00080042. (queue head)
  eth0:  Tx descriptor 2 is 00080042.
  eth0:  Tx descriptor 3 is 000805ea.
  
  ============================================
  [ BUG: circular locking deadlock detected! ]
  --------------------------------------------
  hackbench/9560 [2] is trying to acquire lock {&tp->rx_lock} at:
   [<c033e460>] rtl8139_tx_timeout+0x110/0x1f0
  but task is already holding lock {&dev->xmit_lock}, acquired at:
   [<c045294b>] dev_watchdog+0x1b/0xc0
  which lock already depends on the new lock,
  which could lead to circular deadlocks!
  
  the dependency chain (in reverse order) is:
  -> #4 {&dev->xmit_lock}: [<c045294b>] dev_watchdog+0x1b/0xc0
  -> #3 {&dev->queue_lock}: [<c0447154>] dev_queue_xmit+0x64/0x290
  -> #2 {&((sk)->sk_lock.slock)}: [<c043eb66>] sk_clone+0x66/0x200
  -> #1 {&((sk)->sk_lock.slock)}: [<c047a116>] tcp_v4_rcv+0x726/0x9d0
  -> #0 {&tp->rx_lock}: [<c033e460>] rtl8139_tx_timeout+0x110/0x1f0
  
  other info that might help us debug this:
  
  locks held by hackbench/9560:
   #0:  {net/unix/af_unix.c:&u->readlock} [<c0490e6f>] unix_stream_recvmsg+0xbf/0x4f0
   #1:  {&dev->xmit_lock} [<c045294b>] dev_watchdog+0x1b/0xc0
  
  stack backtrace:
   [<c010432d>] show_trace+0xd/0x10
   [<c0104347>] dump_stack+0x17/0x20
   [<c0137d60>] print_circular_bug_tail+0x40/0x50
   [<c013922f>] debug_lock_chain+0x74f/0xd40
   [<c013985d>] debug_lock_chain_spin+0x3d/0x60
   [<c0266add>] _raw_spin_lock+0x2d/0x90
   [<c04d9d18>] _spin_lock+0x8/0x10
   [<c033e460>] rtl8139_tx_timeout+0x110/0x1f0
   [<c04529e8>] dev_watchdog+0xb8/0xc0
   [<c0127615>] run_timer_softirq+0xf5/0x1f0
   [<c0122f67>] __do_softirq+0x97/0x130
   [<c0105519>] do_softirq+0x69/0x100
   =======================
   [<c0122c19>] irq_exit+0x39/0x50
   [<c010f4cc>] smp_apic_timer_interrupt+0x4c/0x50
   [<c010393b>] apic_timer_interrupt+0x27/0x2c
   [<c0441829>] skb_release_data+0x59/0xa0
   [<c0441b43>] kfree_skbmem+0x13/0xe0
   [<c0441c58>] __kfree_skb+0x48/0xc0
   [<c0490f7c>] unix_stream_recvmsg+0x1cc/0x4f0
   [<c043d2d5>] do_sock_read+0x95/0xd0
   [<c043d475>] sock_aio_read+0x75/0x80
   [<c016499b>] do_sync_read+0xbb/0x110
   [<c0164e88>] vfs_read+0x148/0x150
   [<c01658bd>] sys_read+0x3d/0x70
   [<c0102df7>] sysenter_past_esp+0x54/0x8d
  eth0: link up, 100Mbps, full-duplex, lpa 0xC5E1
  <----------------------------------------------

i'm wondering, is this a genuine deadlock, or a false positive? The 
dependency chain is quite complex, but looks realistic:

  -> #4 {&dev->xmit_lock}: [<c045294b>] dev_watchdog+0x1b/0xc0
  -> #3 {&dev->queue_lock}: [<c0447154>] dev_queue_xmit+0x64/0x290
  -> #2 {&((sk)->sk_lock.slock)}: [<c043eb66>] sk_clone+0x66/0x200
  -> #1 {&((sk)->sk_lock.slock)}: [<c047a116>] tcp_v4_rcv+0x726/0x9d0
  -> #0 {&tp->rx_lock}: [<c033e460>] rtl8139_tx_timeout+0x110/0x1f0

and rtl8139_tx_timeout() is rare enough to not cause real lockups in 
practice all that often.

explanation of the validator output: the above dependency chain does not 
mean it actually occured in one single call sequence - it is a 
comulative (and full) depdency graph the validator is maintaining, to 
prove locking correctness. So it can easily be multiple tasks, at 
distinct points in time, on different CPUs, which built this dependency 
chain. The first (#0) and the last (#4) entry is the current locking 
sequence's fingerprint - so do not understand the above to be an actual 
locking stack - it cannot possibly occur in this order.

	Ingo

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [lock validator] drivers/net/8139too.c: deadlock?
  2006-01-26 22:43 [lock validator] drivers/net/8139too.c: deadlock? Ingo Molnar
@ 2006-01-27  0:35 ` Francois Romieu
  2006-01-27  2:22   ` Herbert Xu
  0 siblings, 1 reply; 8+ messages in thread
From: Francois Romieu @ 2006-01-27  0:35 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Jeff Garzik, linux-kernel, Andrew Morton, David S. Miller

Ingo Molnar <mingo@elte.hu> :
[...]
> i'm wondering, is this a genuine deadlock, or a false positive? The 
> dependency chain is quite complex, but looks realistic:
> 
>   -> #4 {&dev->xmit_lock}: [<c045294b>] dev_watchdog+0x1b/0xc0
>   -> #3 {&dev->queue_lock}: [<c0447154>] dev_queue_xmit+0x64/0x290
>   -> #2 {&((sk)->sk_lock.slock)}: [<c043eb66>] sk_clone+0x66/0x200
>   -> #1 {&((sk)->sk_lock.slock)}: [<c047a116>] tcp_v4_rcv+0x726/0x9d0
>   -> #0 {&tp->rx_lock}: [<c033e460>] rtl8139_tx_timeout+0x110/0x1f0

It looks like watchdog racing against rx_poll (which should/can not
happen). Do you have something specific in mind ?

--
Ueimor

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [lock validator] drivers/net/8139too.c: deadlock?
  2006-01-27  0:35 ` Francois Romieu
@ 2006-01-27  2:22   ` Herbert Xu
  2006-01-27  2:44     ` Jeff Garzik
  0 siblings, 1 reply; 8+ messages in thread
From: Herbert Xu @ 2006-01-27  2:22 UTC (permalink / raw)
  To: Francois Romieu; +Cc: mingo, jgarzik, linux-kernel, akpm, davem

Francois Romieu <romieu@fr.zoreil.com> wrote:
> Ingo Molnar <mingo@elte.hu> :
> [...]
>> i'm wondering, is this a genuine deadlock, or a false positive? The 
>> dependency chain is quite complex, but looks realistic:
>> 
>>   -> #4 {&dev->xmit_lock}: [<c045294b>] dev_watchdog+0x1b/0xc0
>>   -> #3 {&dev->queue_lock}: [<c0447154>] dev_queue_xmit+0x64/0x290
>>   -> #2 {&((sk)->sk_lock.slock)}: [<c043eb66>] sk_clone+0x66/0x200
>>   -> #1 {&((sk)->sk_lock.slock)}: [<c047a116>] tcp_v4_rcv+0x726/0x9d0
>>   -> #0 {&tp->rx_lock}: [<c033e460>] rtl8139_tx_timeout+0x110/0x1f0
> 
> It looks like watchdog racing against rx_poll (which should/can not
> happen). Do you have something specific in mind ?

You've got it.

rx_poll => rtl8139_rx => netif_receive_skb => ... => tcp_v4_rcv

In fact once we're at netif_receive_skb it's easy to see how we'll grab
xmit_lock again.

Prescription: Move TX timeout handling into a work queue.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [lock validator] drivers/net/8139too.c: deadlock?
  2006-01-27  2:22   ` Herbert Xu
@ 2006-01-27  2:44     ` Jeff Garzik
  2006-01-31  0:24       ` [PATCH] 8139too: fix a TX timeout watchdog thread against NAPI softirq race Francois Romieu
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff Garzik @ 2006-01-27  2:44 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Francois Romieu, mingo, jgarzik, linux-kernel, akpm, davem

Herbert Xu wrote:
> You've got it.
> 
> rx_poll => rtl8139_rx => netif_receive_skb => ... => tcp_v4_rcv
> 
> In fact once we're at netif_receive_skb it's easy to see how we'll grab
> xmit_lock again.
> 
> Prescription: Move TX timeout handling into a work queue.


This was my recommendation as well.

Using a work queue not only solves this locking problem, but it also 
enables the possibility of sleeping, a common operation during the 
hardware resets that often occur during ->tx_timeout()

Anyone who wants to make this change in a driver, don't forget that this 
could race with other operations, notably dev->stop() or suspend.

	Jeff



^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] 8139too: fix a TX timeout watchdog thread against NAPI softirq race
  2006-01-27  2:44     ` Jeff Garzik
@ 2006-01-31  0:24       ` Francois Romieu
  2006-01-31 18:49         ` Ingo Molnar
  2006-02-01  0:55         ` Andrew Morton
  0 siblings, 2 replies; 8+ messages in thread
From: Francois Romieu @ 2006-01-31  0:24 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Herbert Xu, mingo, linux-kernel, akpm, davem

Ingo's stealth lock validator detected that both thread acquire
dev->xmit_lock and tp->rx_lock in reverse order.

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>

471819cf98a56751ae0387400542fe2997855327
diff --git a/drivers/net/8139too.c b/drivers/net/8139too.c
index adfba44..2beac55 100644
--- a/drivers/net/8139too.c
+++ b/drivers/net/8139too.c
@@ -586,6 +586,7 @@ struct rtl8139_private {
 	dma_addr_t tx_bufs_dma;
 	signed char phys[4];		/* MII device addresses. */
 	char twistie, twist_row, twist_col;	/* Twister tune state. */
+	unsigned int watchdog_fired : 1;
 	unsigned int default_port : 4;	/* Last dev->if_port value. */
 	unsigned int have_thread : 1;
 	spinlock_t lock;
@@ -638,6 +639,7 @@ static void rtl8139_set_rx_mode (struct 
 static void __set_rx_mode (struct net_device *dev);
 static void rtl8139_hw_start (struct net_device *dev);
 static void rtl8139_thread (void *_data);
+static void rtl8139_tx_timeout_task(void *_data);
 static struct ethtool_ops rtl8139_ethtool_ops;
 
 /* write MMIO register, with flush */
@@ -1598,13 +1600,14 @@ static void rtl8139_thread (void *_data)
 {
 	struct net_device *dev = _data;
 	struct rtl8139_private *tp = netdev_priv(dev);
-	unsigned long thr_delay;
+	unsigned long thr_delay = next_tick;
 
-	if (rtnl_shlock_nowait() == 0) {
+	if (tp->watchdog_fired) {
+		tp->watchdog_fired = 0;
+		rtl8139_tx_timeout_task(_data);
+	} else if (rtnl_shlock_nowait() == 0) {
 		rtl8139_thread_iter (dev, tp, tp->mmio_addr);
 		rtnl_unlock ();
-
-		thr_delay = next_tick;
 	} else {
 		/* unlikely race.  mitigate with fast poll. */
 		thr_delay = HZ / 2;
@@ -1631,7 +1634,8 @@ static void rtl8139_stop_thread(struct r
 	if (tp->have_thread) {
 		cancel_rearming_delayed_work(&tp->thread);
 		tp->have_thread = 0;
-	}
+	} else
+		flush_scheduled_work();
 }
 
 static inline void rtl8139_tx_clear (struct rtl8139_private *tp)
@@ -1642,14 +1646,13 @@ static inline void rtl8139_tx_clear (str
 	/* XXX account for unsent Tx packets in tp->stats.tx_dropped */
 }
 
-
-static void rtl8139_tx_timeout (struct net_device *dev)
+static void rtl8139_tx_timeout_task (void *_data)
 {
+	struct net_device *dev = _data;
 	struct rtl8139_private *tp = netdev_priv(dev);
 	void __iomem *ioaddr = tp->mmio_addr;
 	int i;
 	u8 tmp8;
-	unsigned long flags;
 
 	printk (KERN_DEBUG "%s: Transmit timeout, status %2.2x %4.4x %4.4x "
 		"media %2.2x.\n", dev->name, RTL_R8 (ChipCmd),
@@ -1670,23 +1673,34 @@ static void rtl8139_tx_timeout (struct n
 	if (tmp8 & CmdTxEnb)
 		RTL_W8 (ChipCmd, CmdRxEnb);
 
-	spin_lock(&tp->rx_lock);
+	spin_lock_bh(&tp->rx_lock);
 	/* Disable interrupts by clearing the interrupt mask. */
 	RTL_W16 (IntrMask, 0x0000);
 
 	/* Stop a shared interrupt from scavenging while we are. */
-	spin_lock_irqsave (&tp->lock, flags);
+	spin_lock_irq(&tp->lock);
 	rtl8139_tx_clear (tp);
-	spin_unlock_irqrestore (&tp->lock, flags);
+	spin_unlock_irq(&tp->lock);
 
 	/* ...and finally, reset everything */
 	if (netif_running(dev)) {
 		rtl8139_hw_start (dev);
 		netif_wake_queue (dev);
 	}
-	spin_unlock(&tp->rx_lock);
+	spin_unlock_bh(&tp->rx_lock);
 }
 
+static void rtl8139_tx_timeout (struct net_device *dev)
+{
+	struct rtl8139_private *tp = netdev_priv(dev);
+
+	if (!tp->have_thread) {
+		INIT_WORK(&tp->thread, rtl8139_tx_timeout_task, dev);
+		schedule_delayed_work(&tp->thread, next_tick);
+	} else
+		tp->watchdog_fired = 1;
+
+}
 
 static int rtl8139_start_xmit (struct sk_buff *skb, struct net_device *dev)
 {
-- 
1.1.3

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] 8139too: fix a TX timeout watchdog thread against NAPI softirq race
  2006-01-31  0:24       ` [PATCH] 8139too: fix a TX timeout watchdog thread against NAPI softirq race Francois Romieu
@ 2006-01-31 18:49         ` Ingo Molnar
  2006-01-31 18:57           ` Ingo Molnar
  2006-02-01  0:55         ` Andrew Morton
  1 sibling, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2006-01-31 18:49 UTC (permalink / raw)
  To: Francois Romieu; +Cc: Jeff Garzik, Herbert Xu, linux-kernel, akpm, davem

[-- Attachment #1: Type: text/plain, Size: 3907 bytes --]


* Francois Romieu <romieu@fr.zoreil.com> wrote:

> Ingo's stealth lock validator detected that both thread acquire
> dev->xmit_lock and tp->rx_lock in reverse order.
> 
> Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>

i've ported this to -mm4 (see the attached patch), but i'm also getting 
a new deadlock message. Seems to be a separate issue, not introduced by 
your patch - but still needs fixing i guess.

	Ingo

=========================================
[ BUG: irq lock inversion bug detected! ]
-----------------------------------------
rcu_torture_rea/263 just changed the state of lock:
 (&mc->mca_lock){-+}, at: [<c04b4c0f>] mld_ifc_timer_expire+0x17f/0x2c0
but this lock took another, softirq-unsafe lock in the past:
 (&tp->lock){--}, at: [<c03406d2>] rtl8139_set_rx_mode+0x22/0x180

and interrupts could create an inverse lock dependency between them,
which could lead to deadlocks!

other info that might help us debug this:
no locks held by rcu_torture_rea/263.
-> (&tp->lock){--} {
   used       at: [<c03413da>] rtl8139_interrupt+0x2a/0x5e0
   softirq-on at: [<c044b5fa>] dev_mc_upload+0x2a/0x40
   hardirq-on at: [<c0160a28>] kmem_cache_alloc+0x98/0xd0
 }
 ... key      at: [<c033fffb>] rtl8139_init_one+0x2ab/0x960
 ... acquired at: [<c03406d2>] rtl8139_set_rx_mode+0x22/0x180


the first lock's dependencies:
-> (&mc->mca_lock){-+} {
   used       at: [<c04b3440>] igmp6_group_added+0x20/0x170
   in-softirq at: [<c04b4c0f>] mld_ifc_timer_expire+0x17f/0x2c0
   hardirq-on at: [<c04dcaa5>] _spin_unlock_irqrestore+0x25/0x30
 }
 ... key      at: [<c04b3edd>] ipv6_dev_mc_inc+0x17d/0x3c0
 ... acquired at: [<c04b4c0f>] mld_ifc_timer_expire+0x17f/0x2c0

  -> (&dev->xmit_lock){-.} {
     used       at: [<c044b5ec>] dev_mc_upload+0x1c/0x40
     hardirq-on at: [<c04dba6a>] __mutex_lock_slowpath+0x27a/0x4e0
   }
   ... key      at: [<c0449fdd>] register_netdevice+0x5d/0x3a0
   ... acquired at: [<c044b724>] dev_mc_add+0x34/0x140

    -> (&tp->lock){--} {
       used       at: [<c03413da>] rtl8139_interrupt+0x2a/0x5e0
       softirq-on at: [<c044b5fa>] dev_mc_upload+0x2a/0x40
       hardirq-on at: [<c0160a28>] kmem_cache_alloc+0x98/0xd0
     }
     ... key      at: [<c033fffb>] rtl8139_init_one+0x2ab/0x960
     ... acquired at: [<c03406d2>] rtl8139_set_rx_mode+0x22/0x180

    -> (&base->t_base.lock){++} {
       used       at: [<c0126df3>] lock_timer_base+0x23/0x50
       in-hardirq at: [<c0126df3>] lock_timer_base+0x23/0x50
       in-softirq at: [<c01277f2>] run_timer_softirq+0x42/0x1f0
     }
     ... key      at: [<c1fd2ef0>] 0xc1fd2ef0
     ... acquired at: [<c0126df3>] lock_timer_base+0x23/0x50


the second lock's dependencies:
-> (&tp->lock){--} {
   used       at: [<c03413da>] rtl8139_interrupt+0x2a/0x5e0
   softirq-on at: [<c044b5fa>] dev_mc_upload+0x2a/0x40
   hardirq-on at: [<c0160a28>] kmem_cache_alloc+0x98/0xd0
 }
 ... key      at: [<c033fffb>] rtl8139_init_one+0x2ab/0x960
 ... acquired at: [<c03406d2>] rtl8139_set_rx_mode+0x22/0x180


stack backtrace:
 [<c010437d>] show_trace+0xd/0x10
 [<c0104397>] dump_stack+0x17/0x20
 [<c0137db2>] print_irq_inversion_bug+0x142/0x1b0
 [<c0137f91>] check_usage_forwards+0x41/0x50
 [<c0139520>] mark_lock+0x180/0x350
 [<c0139b83>] debug_lock_chain+0x493/0x1090
 [<c013a7bd>] debug_lock_chain_spin+0x3d/0x60
 [<c026777d>] _raw_spin_lock+0x2d/0x90
 [<c04dc942>] _spin_lock_bh+0x12/0x20
 [<c04b4c0f>] mld_ifc_timer_expire+0x17f/0x2c0
 [<c01278a5>] run_timer_softirq+0xf5/0x1f0
 [<c01233a7>] __do_softirq+0x97/0x130
 [<c01054d9>] do_softirq+0x69/0x100
 =======================
 [<c0123065>] irq_exit+0x45/0x50
 [<c010f534>] smp_apic_timer_interrupt+0x54/0x60
 [<c010395b>] apic_timer_interrupt+0x27/0x2c
 [<c0265f0c>] __delay+0xc/0x10
 [<c0265f71>] __udelay+0x31/0x40
 [<c0143333>] rcu_torture_reader+0x83/0x140
 [<c0131dea>] kthread+0xca/0xd0
 [<c0100ef5>] kernel_thread_helper+0x5/0x10
eth0: no IPv6 routers present


[-- Attachment #2: 8139too-tx-deadlock-fix.patch --]
[-- Type: text/plain, Size: 3523 bytes --]

Ingo's stealth lock validator detected that both thread acquire
dev->xmit_lock and tp->rx_lock in reverse order.

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

----

 drivers/net/8139too.c |   36 ++++++++++++++++++++++++++----------
 1 files changed, 26 insertions(+), 10 deletions(-)

Index: linux/drivers/net/8139too.c
===================================================================
--- linux.orig/drivers/net/8139too.c
+++ linux/drivers/net/8139too.c
@@ -586,6 +586,7 @@ struct rtl8139_private {
 	dma_addr_t tx_bufs_dma;
 	signed char phys[4];		/* MII device addresses. */
 	char twistie, twist_row, twist_col;	/* Twister tune state. */
+	unsigned int watchdog_fired : 1;
 	unsigned int default_port : 4;	/* Last dev->if_port value. */
 	unsigned int have_thread : 1;
 	spinlock_t lock;
@@ -638,6 +639,7 @@ static void rtl8139_set_rx_mode (struct 
 static void __set_rx_mode (struct net_device *dev);
 static void rtl8139_hw_start (struct net_device *dev);
 static void rtl8139_thread (void *_data);
+static void rtl8139_tx_timeout_task(void *_data);
 static struct ethtool_ops rtl8139_ethtool_ops;
 
 /* write MMIO register, with flush */
@@ -1598,9 +1600,12 @@ static void rtl8139_thread (void *_data)
 {
 	struct net_device *dev = _data;
 	struct rtl8139_private *tp = netdev_priv(dev);
-	unsigned long thr_delay;
+	unsigned long thr_delay = next_tick;
 
-	if (rtnl_shlock_nowait() != 0) {
+	if (tp->watchdog_fired) {
+		tp->watchdog_fired = 0;
+		rtl8139_tx_timeout_task(_data);
+	} else if (rtnl_shlock_nowait() != 0) {
 		rtl8139_thread_iter (dev, tp, tp->mmio_addr);
 		rtnl_unlock ();
 
@@ -1631,7 +1636,8 @@ static void rtl8139_stop_thread(struct r
 	if (tp->have_thread) {
 		cancel_rearming_delayed_work(&tp->thread);
 		tp->have_thread = 0;
-	}
+	} else
+		flush_scheduled_work();
 }
 
 static inline void rtl8139_tx_clear (struct rtl8139_private *tp)
@@ -1642,14 +1648,13 @@ static inline void rtl8139_tx_clear (str
 	/* XXX account for unsent Tx packets in tp->stats.tx_dropped */
 }
 
-
-static void rtl8139_tx_timeout (struct net_device *dev)
+static void rtl8139_tx_timeout_task (void *_data)
 {
+	struct net_device *dev = _data;
 	struct rtl8139_private *tp = netdev_priv(dev);
 	void __iomem *ioaddr = tp->mmio_addr;
 	int i;
 	u8 tmp8;
-	unsigned long flags;
 
 	printk (KERN_DEBUG "%s: Transmit timeout, status %2.2x %4.4x %4.4x "
 		"media %2.2x.\n", dev->name, RTL_R8 (ChipCmd),
@@ -1670,23 +1675,34 @@ static void rtl8139_tx_timeout (struct n
 	if (tmp8 & CmdTxEnb)
 		RTL_W8 (ChipCmd, CmdRxEnb);
 
-	spin_lock(&tp->rx_lock);
+	spin_lock_bh(&tp->rx_lock);
 	/* Disable interrupts by clearing the interrupt mask. */
 	RTL_W16 (IntrMask, 0x0000);
 
 	/* Stop a shared interrupt from scavenging while we are. */
-	spin_lock_irqsave (&tp->lock, flags);
+	spin_lock_irq(&tp->lock);
 	rtl8139_tx_clear (tp);
-	spin_unlock_irqrestore (&tp->lock, flags);
+	spin_unlock_irq(&tp->lock);
 
 	/* ...and finally, reset everything */
 	if (netif_running(dev)) {
 		rtl8139_hw_start (dev);
 		netif_wake_queue (dev);
 	}
-	spin_unlock(&tp->rx_lock);
+	spin_unlock_bh(&tp->rx_lock);
 }
 
+static void rtl8139_tx_timeout (struct net_device *dev)
+{
+	struct rtl8139_private *tp = netdev_priv(dev);
+
+	if (!tp->have_thread) {
+		INIT_WORK(&tp->thread, rtl8139_tx_timeout_task, dev);
+		schedule_delayed_work(&tp->thread, next_tick);
+	} else
+		tp->watchdog_fired = 1;
+
+}
 
 static int rtl8139_start_xmit (struct sk_buff *skb, struct net_device *dev)
 {

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] 8139too: fix a TX timeout watchdog thread against NAPI softirq race
  2006-01-31 18:49         ` Ingo Molnar
@ 2006-01-31 18:57           ` Ingo Molnar
  0 siblings, 0 replies; 8+ messages in thread
From: Ingo Molnar @ 2006-01-31 18:57 UTC (permalink / raw)
  To: Francois Romieu; +Cc: Jeff Garzik, Herbert Xu, linux-kernel, akpm, davem


* Ingo Molnar <mingo@elte.hu> wrote:

> > Ingo's stealth lock validator detected that both thread acquire
> > dev->xmit_lock and tp->rx_lock in reverse order.
> > 
> > Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
> 
> i've ported this to -mm4 (see the attached patch), but i'm also 
> getting a new deadlock message. Seems to be a separate issue, not 
> introduced by your patch - but still needs fixing i guess.

sorry - this is a false alarm. (caused by the lock validator not 
properly handling DEBUG_SHIRQ)

	Ingo

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] 8139too: fix a TX timeout watchdog thread against NAPI softirq race
  2006-01-31  0:24       ` [PATCH] 8139too: fix a TX timeout watchdog thread against NAPI softirq race Francois Romieu
  2006-01-31 18:49         ` Ingo Molnar
@ 2006-02-01  0:55         ` Andrew Morton
  1 sibling, 0 replies; 8+ messages in thread
From: Andrew Morton @ 2006-02-01  0:55 UTC (permalink / raw)
  To: Francois Romieu; +Cc: jgarzik, herbert, mingo, linux-kernel, davem

Francois Romieu <romieu@fr.zoreil.com> wrote:
>
> +	unsigned int watchdog_fired : 1;
>  	unsigned int default_port : 4;	/* Last dev->if_port value. */
>  	unsigned int have_thread : 1;

Bear in mind that the compiler will put these three fields into the same
word and will use non-atomic RMWs when they are modified.

In this particualr case it's hard to see how an SMP or IRQ race can occur,
but it's a trap.   Which won't show up on x86.


struct x {
	unsigned int a:1;
	unsigned int b:1;
};

void foo(struct x *a)
{
	a->a = 1;
}

void bar(struct x *a)
{
	a->b = 1;
}


On ppc:


foo:
        lwz 0,0(3)
        oris 0,0,0x8000
        stw 0,0(3)
        blr
        .size   foo, .-foo
        .align 2
        .globl bar
        .type   bar, @function
bar:
        lwz 0,0(3)
        oris 0,0,0x4000
        stw 0,0(3)
        blr

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2006-02-01  1:05 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-01-26 22:43 [lock validator] drivers/net/8139too.c: deadlock? Ingo Molnar
2006-01-27  0:35 ` Francois Romieu
2006-01-27  2:22   ` Herbert Xu
2006-01-27  2:44     ` Jeff Garzik
2006-01-31  0:24       ` [PATCH] 8139too: fix a TX timeout watchdog thread against NAPI softirq race Francois Romieu
2006-01-31 18:49         ` Ingo Molnar
2006-01-31 18:57           ` Ingo Molnar
2006-02-01  0:55         ` Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox