All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@elte.hu>
To: Francois Romieu <romieu@fr.zoreil.com>
Cc: Jeff Garzik <jgarzik@pobox.com>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	linux-kernel@vger.kernel.org, akpm@osdl.org, davem@redhat.com
Subject: Re: [PATCH] 8139too: fix a TX timeout watchdog thread against NAPI softirq race
Date: Tue, 31 Jan 2006 19:49:57 +0100	[thread overview]
Message-ID: <20060131184957.GA22660@elte.hu> (raw)
In-Reply-To: <20060131002418.GA917@electric-eye.fr.zoreil.com>

[-- Attachment #1: Type: text/plain, Size: 3907 bytes --]


* Francois Romieu <romieu@fr.zoreil.com> wrote:

> Ingo's stealth lock validator detected that both thread acquire
> dev->xmit_lock and tp->rx_lock in reverse order.
> 
> Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>

i've ported this to -mm4 (see the attached patch), but i'm also getting 
a new deadlock message. Seems to be a separate issue, not introduced by 
your patch - but still needs fixing i guess.

	Ingo

=========================================
[ BUG: irq lock inversion bug detected! ]
-----------------------------------------
rcu_torture_rea/263 just changed the state of lock:
 (&mc->mca_lock){-+}, at: [<c04b4c0f>] mld_ifc_timer_expire+0x17f/0x2c0
but this lock took another, softirq-unsafe lock in the past:
 (&tp->lock){--}, at: [<c03406d2>] rtl8139_set_rx_mode+0x22/0x180

and interrupts could create an inverse lock dependency between them,
which could lead to deadlocks!

other info that might help us debug this:
no locks held by rcu_torture_rea/263.
-> (&tp->lock){--} {
   used       at: [<c03413da>] rtl8139_interrupt+0x2a/0x5e0
   softirq-on at: [<c044b5fa>] dev_mc_upload+0x2a/0x40
   hardirq-on at: [<c0160a28>] kmem_cache_alloc+0x98/0xd0
 }
 ... key      at: [<c033fffb>] rtl8139_init_one+0x2ab/0x960
 ... acquired at: [<c03406d2>] rtl8139_set_rx_mode+0x22/0x180


the first lock's dependencies:
-> (&mc->mca_lock){-+} {
   used       at: [<c04b3440>] igmp6_group_added+0x20/0x170
   in-softirq at: [<c04b4c0f>] mld_ifc_timer_expire+0x17f/0x2c0
   hardirq-on at: [<c04dcaa5>] _spin_unlock_irqrestore+0x25/0x30
 }
 ... key      at: [<c04b3edd>] ipv6_dev_mc_inc+0x17d/0x3c0
 ... acquired at: [<c04b4c0f>] mld_ifc_timer_expire+0x17f/0x2c0

  -> (&dev->xmit_lock){-.} {
     used       at: [<c044b5ec>] dev_mc_upload+0x1c/0x40
     hardirq-on at: [<c04dba6a>] __mutex_lock_slowpath+0x27a/0x4e0
   }
   ... key      at: [<c0449fdd>] register_netdevice+0x5d/0x3a0
   ... acquired at: [<c044b724>] dev_mc_add+0x34/0x140

    -> (&tp->lock){--} {
       used       at: [<c03413da>] rtl8139_interrupt+0x2a/0x5e0
       softirq-on at: [<c044b5fa>] dev_mc_upload+0x2a/0x40
       hardirq-on at: [<c0160a28>] kmem_cache_alloc+0x98/0xd0
     }
     ... key      at: [<c033fffb>] rtl8139_init_one+0x2ab/0x960
     ... acquired at: [<c03406d2>] rtl8139_set_rx_mode+0x22/0x180

    -> (&base->t_base.lock){++} {
       used       at: [<c0126df3>] lock_timer_base+0x23/0x50
       in-hardirq at: [<c0126df3>] lock_timer_base+0x23/0x50
       in-softirq at: [<c01277f2>] run_timer_softirq+0x42/0x1f0
     }
     ... key      at: [<c1fd2ef0>] 0xc1fd2ef0
     ... acquired at: [<c0126df3>] lock_timer_base+0x23/0x50


the second lock's dependencies:
-> (&tp->lock){--} {
   used       at: [<c03413da>] rtl8139_interrupt+0x2a/0x5e0
   softirq-on at: [<c044b5fa>] dev_mc_upload+0x2a/0x40
   hardirq-on at: [<c0160a28>] kmem_cache_alloc+0x98/0xd0
 }
 ... key      at: [<c033fffb>] rtl8139_init_one+0x2ab/0x960
 ... acquired at: [<c03406d2>] rtl8139_set_rx_mode+0x22/0x180


stack backtrace:
 [<c010437d>] show_trace+0xd/0x10
 [<c0104397>] dump_stack+0x17/0x20
 [<c0137db2>] print_irq_inversion_bug+0x142/0x1b0
 [<c0137f91>] check_usage_forwards+0x41/0x50
 [<c0139520>] mark_lock+0x180/0x350
 [<c0139b83>] debug_lock_chain+0x493/0x1090
 [<c013a7bd>] debug_lock_chain_spin+0x3d/0x60
 [<c026777d>] _raw_spin_lock+0x2d/0x90
 [<c04dc942>] _spin_lock_bh+0x12/0x20
 [<c04b4c0f>] mld_ifc_timer_expire+0x17f/0x2c0
 [<c01278a5>] run_timer_softirq+0xf5/0x1f0
 [<c01233a7>] __do_softirq+0x97/0x130
 [<c01054d9>] do_softirq+0x69/0x100
 =======================
 [<c0123065>] irq_exit+0x45/0x50
 [<c010f534>] smp_apic_timer_interrupt+0x54/0x60
 [<c010395b>] apic_timer_interrupt+0x27/0x2c
 [<c0265f0c>] __delay+0xc/0x10
 [<c0265f71>] __udelay+0x31/0x40
 [<c0143333>] rcu_torture_reader+0x83/0x140
 [<c0131dea>] kthread+0xca/0xd0
 [<c0100ef5>] kernel_thread_helper+0x5/0x10
eth0: no IPv6 routers present


[-- Attachment #2: 8139too-tx-deadlock-fix.patch --]
[-- Type: text/plain, Size: 3523 bytes --]

Ingo's stealth lock validator detected that both thread acquire
dev->xmit_lock and tp->rx_lock in reverse order.

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

----

 drivers/net/8139too.c |   36 ++++++++++++++++++++++++++----------
 1 files changed, 26 insertions(+), 10 deletions(-)

Index: linux/drivers/net/8139too.c
===================================================================
--- linux.orig/drivers/net/8139too.c
+++ linux/drivers/net/8139too.c
@@ -586,6 +586,7 @@ struct rtl8139_private {
 	dma_addr_t tx_bufs_dma;
 	signed char phys[4];		/* MII device addresses. */
 	char twistie, twist_row, twist_col;	/* Twister tune state. */
+	unsigned int watchdog_fired : 1;
 	unsigned int default_port : 4;	/* Last dev->if_port value. */
 	unsigned int have_thread : 1;
 	spinlock_t lock;
@@ -638,6 +639,7 @@ static void rtl8139_set_rx_mode (struct 
 static void __set_rx_mode (struct net_device *dev);
 static void rtl8139_hw_start (struct net_device *dev);
 static void rtl8139_thread (void *_data);
+static void rtl8139_tx_timeout_task(void *_data);
 static struct ethtool_ops rtl8139_ethtool_ops;
 
 /* write MMIO register, with flush */
@@ -1598,9 +1600,12 @@ static void rtl8139_thread (void *_data)
 {
 	struct net_device *dev = _data;
 	struct rtl8139_private *tp = netdev_priv(dev);
-	unsigned long thr_delay;
+	unsigned long thr_delay = next_tick;
 
-	if (rtnl_shlock_nowait() != 0) {
+	if (tp->watchdog_fired) {
+		tp->watchdog_fired = 0;
+		rtl8139_tx_timeout_task(_data);
+	} else if (rtnl_shlock_nowait() != 0) {
 		rtl8139_thread_iter (dev, tp, tp->mmio_addr);
 		rtnl_unlock ();
 
@@ -1631,7 +1636,8 @@ static void rtl8139_stop_thread(struct r
 	if (tp->have_thread) {
 		cancel_rearming_delayed_work(&tp->thread);
 		tp->have_thread = 0;
-	}
+	} else
+		flush_scheduled_work();
 }
 
 static inline void rtl8139_tx_clear (struct rtl8139_private *tp)
@@ -1642,14 +1648,13 @@ static inline void rtl8139_tx_clear (str
 	/* XXX account for unsent Tx packets in tp->stats.tx_dropped */
 }
 
-
-static void rtl8139_tx_timeout (struct net_device *dev)
+static void rtl8139_tx_timeout_task (void *_data)
 {
+	struct net_device *dev = _data;
 	struct rtl8139_private *tp = netdev_priv(dev);
 	void __iomem *ioaddr = tp->mmio_addr;
 	int i;
 	u8 tmp8;
-	unsigned long flags;
 
 	printk (KERN_DEBUG "%s: Transmit timeout, status %2.2x %4.4x %4.4x "
 		"media %2.2x.\n", dev->name, RTL_R8 (ChipCmd),
@@ -1670,23 +1675,34 @@ static void rtl8139_tx_timeout (struct n
 	if (tmp8 & CmdTxEnb)
 		RTL_W8 (ChipCmd, CmdRxEnb);
 
-	spin_lock(&tp->rx_lock);
+	spin_lock_bh(&tp->rx_lock);
 	/* Disable interrupts by clearing the interrupt mask. */
 	RTL_W16 (IntrMask, 0x0000);
 
 	/* Stop a shared interrupt from scavenging while we are. */
-	spin_lock_irqsave (&tp->lock, flags);
+	spin_lock_irq(&tp->lock);
 	rtl8139_tx_clear (tp);
-	spin_unlock_irqrestore (&tp->lock, flags);
+	spin_unlock_irq(&tp->lock);
 
 	/* ...and finally, reset everything */
 	if (netif_running(dev)) {
 		rtl8139_hw_start (dev);
 		netif_wake_queue (dev);
 	}
-	spin_unlock(&tp->rx_lock);
+	spin_unlock_bh(&tp->rx_lock);
 }
 
+static void rtl8139_tx_timeout (struct net_device *dev)
+{
+	struct rtl8139_private *tp = netdev_priv(dev);
+
+	if (!tp->have_thread) {
+		INIT_WORK(&tp->thread, rtl8139_tx_timeout_task, dev);
+		schedule_delayed_work(&tp->thread, next_tick);
+	} else
+		tp->watchdog_fired = 1;
+
+}
 
 static int rtl8139_start_xmit (struct sk_buff *skb, struct net_device *dev)
 {

  reply	other threads:[~2006-01-31 18:55 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-01-26 22:43 [lock validator] drivers/net/8139too.c: deadlock? Ingo Molnar
2006-01-27  0:35 ` Francois Romieu
2006-01-27  2:22   ` Herbert Xu
2006-01-27  2:44     ` Jeff Garzik
2006-01-31  0:24       ` [PATCH] 8139too: fix a TX timeout watchdog thread against NAPI softirq race Francois Romieu
2006-01-31 18:49         ` Ingo Molnar [this message]
2006-01-31 18:57           ` Ingo Molnar
2006-02-01  0:55         ` Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060131184957.GA22660@elte.hu \
    --to=mingo@elte.hu \
    --cc=akpm@osdl.org \
    --cc=davem@redhat.com \
    --cc=herbert@gondor.apana.org.au \
    --cc=jgarzik@pobox.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=romieu@fr.zoreil.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.