BUG: bad unlock balance detected! e1000e

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* BUG: bad unlock balance detected! e1000e
@ 2008-12-08 11:24 Zdenek Kabelac
  2008-12-09 11:03 ` Frederik Deweerdt
  0 siblings, 1 reply; 6+ messages in thread
From: Zdenek Kabelac @ 2008-12-08 11:24 UTC (permalink / raw)
  To: Linux Kernel Mailing List

Hi

During occasional scan of message log - I've found out this BUG which
happened on Dec3 with the -rc7 from that day.
(So if it's now fixed in current git feel free to ignore :))

My machine T61 - C2D, 2GB, 64bit kernel - message appeared during
shutdown and was actually not noticed by me...


NetworkManager: <WARN>  nm_signal_handler(): Caught signal 15,
shutting down normally.
NetworkManager: <info>  (eth0): now unmanaged
NetworkManager: <info>  (eth0): device state change: 3 -> 1
NetworkManager: <info>  (eth0): cleaning up...
NetworkManager: <info>  (eth0): taking down device.

=====================================
[ BUG: bad unlock balance detected! ]
-------------------------------------
NetworkManager/2058 is trying to release lock (nvm_mutex) at:
[<ffffffff8052acb9>] mutex_unlock+0x9/0x10
but there are no more locks to release!

other info that might help us debug this:
1 lock held by NetworkManager/2058:
 #0:  (rtnl_mutex){--..}, at: [<ffffffff804b09ba>] rtnetlink_rcv+0x1a/0x40

stack backtrace:
Pid: 2058, comm: NetworkManager Not tainted 2.6.28-rc7 #90
Call Trace:
 [<ffffffff8026bf6b>] print_unlock_inbalance_bug+0xfb/0x120
 [<ffffffff8026db76>] ? mark_held_locks+0x56/0xa0
 [<ffffffff802500ba>] ? try_to_del_timer_sync+0x5a/0x70
 [<ffffffff8052ce23>] ? _spin_unlock_irqrestore+0x43/0x70
 [<ffffffff8026f468>] lock_release_non_nested+0x1a8/0x2a0
 [<ffffffff802500ba>] ? try_to_del_timer_sync+0x5a/0x70
 [<ffffffff8052acb9>] ? mutex_unlock+0x9/0x10
 [<ffffffff8026f61b>] lock_release+0xbb/0x200
 [<ffffffff8052abc6>] __mutex_unlock_slowpath+0x86/0x170
 [<ffffffff8052acb9>] mutex_unlock+0x9/0x10
 [<ffffffffa019cf45>] e1000_release_swflag_ich8lan+0x35/0x40 [e1000e]
 [<ffffffffa019de19>] e1000_reset_hw_ich8lan+0xa9/0x130 [e1000e]
 [<ffffffffa01ab140>] e1000e_reset+0xf0/0x260 [e1000e]
 [<ffffffff802500ea>] ? del_timer_sync+0x1a/0x30
 [<ffffffffa01ab4d8>] e1000e_down+0x178/0x180 [e1000e]
 [<ffffffffa01ad0b6>] e1000_close+0x26/0xd0 [e1000e]
 [<ffffffff804a59b6>] dev_close+0x76/0xc0
 [<ffffffff804a7bd6>] dev_change_flags+0x96/0x1e0
 [<ffffffff804b1b4c>] do_setlink+0x2ac/0x440
 [<ffffffff8052cd46>] ? _read_unlock+0x26/0x30
 [<ffffffff804b1ded>] rtnl_setlink+0x10d/0x150
 [<ffffffff8052aeb2>] ? mutex_lock_nested+0x1f2/0x300
 [<ffffffff804b09ba>] ? rtnetlink_rcv+0x1a/0x40
 [<ffffffff804b0b6d>] rtnetlink_rcv_msg+0x18d/0x240
 [<ffffffff804b09e0>] ? rtnetlink_rcv_msg+0x0/0x240
 [<ffffffff804bba29>] netlink_rcv_skb+0x89/0xb0
 [<ffffffff804b09c9>] rtnetlink_rcv+0x29/0x40
 [<ffffffff804bb74d>] netlink_unicast+0x2bd/0x2d0
 [<ffffffff8049e3ee>] ? __alloc_skb+0x6e/0x150
 [<ffffffff804bc6c4>] netlink_sendmsg+0x204/0x2f0
 [<ffffffff80499ed8>] ? sock_def_readable+0x68/0x70
 [<ffffffff80498077>] sock_sendmsg+0x107/0x130
 [<ffffffff8025c690>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff80213853>] ? native_sched_clock+0x13/0x60
 [<ffffffff802d60e6>] ? fget_light+0x106/0x110
 [<ffffffff80497987>] ? move_addr_to_kernel+0x57/0x60
 [<ffffffff804a087f>] ? verify_iovec+0x3f/0xe0
 [<ffffffff80498229>] sys_sendmsg+0x189/0x320
 [<ffffffff8049851d>] ? sys_sendto+0xfd/0x120
 [<ffffffff802e6650>] ? d_free+0x50/0x60
 [<ffffffff802d6691>] ? __fput+0x171/0x1e0
 [<ffffffff8026dda3>] ? trace_hardirqs_on_caller+0x133/0x190
 [<ffffffff8028e02f>] ? audit_syscall_entry+0x15f/0x190
 [<ffffffff8052c906>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff8020c57b>] system_call_fastpath+0x16/0x1b

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: BUG: bad unlock balance detected! e1000e
  2008-12-08 11:24 BUG: bad unlock balance detected! e1000e Zdenek Kabelac
@ 2008-12-09 11:03 ` Frederik Deweerdt
  2008-12-09 23:08   ` Andrew Morton
  0 siblings, 1 reply; 6+ messages in thread
From: Frederik Deweerdt @ 2008-12-09 11:03 UTC (permalink / raw)
  To: Zdenek Kabelac; +Cc: Linux Kernel Mailing List, tglx

Hello Zdenek,

This could be due to 717d438d1fde94decef874b9808379d1f4523453
    "e1000e: debug contention on NVM SWFLAG"
Error handling is missing from e1000_reset_hw_ich8lan so it may happen
that we don't acquire the nvm_mutex if the card times out.

Adding Thomas to CC.

Regards,
Frederik

It some error checking is missing in e1000e: debug contention on NVM
SWFLAG
On Mon, Dec 08, 2008 at 12:24:09PM +0100, Zdenek Kabelac wrote:
> Hi
> 
> During occasional scan of message log - I've found out this BUG which
> happened on Dec3 with the -rc7 from that day.
> (So if it's now fixed in current git feel free to ignore :))
> 
> My machine T61 - C2D, 2GB, 64bit kernel - message appeared during
> shutdown and was actually not noticed by me...
> 
> 
> NetworkManager: <WARN>  nm_signal_handler(): Caught signal 15,
> shutting down normally.
> NetworkManager: <info>  (eth0): now unmanaged
> NetworkManager: <info>  (eth0): device state change: 3 -> 1
> NetworkManager: <info>  (eth0): cleaning up...
> NetworkManager: <info>  (eth0): taking down device.
> 
> =====================================
> [ BUG: bad unlock balance detected! ]
> -------------------------------------
> NetworkManager/2058 is trying to release lock (nvm_mutex) at:
> [<ffffffff8052acb9>] mutex_unlock+0x9/0x10
> but there are no more locks to release!
> 
> other info that might help us debug this:
> 1 lock held by NetworkManager/2058:
>  #0:  (rtnl_mutex){--..}, at: [<ffffffff804b09ba>] rtnetlink_rcv+0x1a/0x40
> 
> stack backtrace:
> Pid: 2058, comm: NetworkManager Not tainted 2.6.28-rc7 #90
> Call Trace:
>  [<ffffffff8026bf6b>] print_unlock_inbalance_bug+0xfb/0x120
>  [<ffffffff8026db76>] ? mark_held_locks+0x56/0xa0
>  [<ffffffff802500ba>] ? try_to_del_timer_sync+0x5a/0x70
>  [<ffffffff8052ce23>] ? _spin_unlock_irqrestore+0x43/0x70
>  [<ffffffff8026f468>] lock_release_non_nested+0x1a8/0x2a0
>  [<ffffffff802500ba>] ? try_to_del_timer_sync+0x5a/0x70
>  [<ffffffff8052acb9>] ? mutex_unlock+0x9/0x10
>  [<ffffffff8026f61b>] lock_release+0xbb/0x200
>  [<ffffffff8052abc6>] __mutex_unlock_slowpath+0x86/0x170
>  [<ffffffff8052acb9>] mutex_unlock+0x9/0x10
>  [<ffffffffa019cf45>] e1000_release_swflag_ich8lan+0x35/0x40 [e1000e]
>  [<ffffffffa019de19>] e1000_reset_hw_ich8lan+0xa9/0x130 [e1000e]
>  [<ffffffffa01ab140>] e1000e_reset+0xf0/0x260 [e1000e]
>  [<ffffffff802500ea>] ? del_timer_sync+0x1a/0x30
>  [<ffffffffa01ab4d8>] e1000e_down+0x178/0x180 [e1000e]
>  [<ffffffffa01ad0b6>] e1000_close+0x26/0xd0 [e1000e]
>  [<ffffffff804a59b6>] dev_close+0x76/0xc0
>  [<ffffffff804a7bd6>] dev_change_flags+0x96/0x1e0
>  [<ffffffff804b1b4c>] do_setlink+0x2ac/0x440
>  [<ffffffff8052cd46>] ? _read_unlock+0x26/0x30
>  [<ffffffff804b1ded>] rtnl_setlink+0x10d/0x150
>  [<ffffffff8052aeb2>] ? mutex_lock_nested+0x1f2/0x300
>  [<ffffffff804b09ba>] ? rtnetlink_rcv+0x1a/0x40
>  [<ffffffff804b0b6d>] rtnetlink_rcv_msg+0x18d/0x240
>  [<ffffffff804b09e0>] ? rtnetlink_rcv_msg+0x0/0x240
>  [<ffffffff804bba29>] netlink_rcv_skb+0x89/0xb0
>  [<ffffffff804b09c9>] rtnetlink_rcv+0x29/0x40
>  [<ffffffff804bb74d>] netlink_unicast+0x2bd/0x2d0
>  [<ffffffff8049e3ee>] ? __alloc_skb+0x6e/0x150
>  [<ffffffff804bc6c4>] netlink_sendmsg+0x204/0x2f0
>  [<ffffffff80499ed8>] ? sock_def_readable+0x68/0x70
>  [<ffffffff80498077>] sock_sendmsg+0x107/0x130
>  [<ffffffff8025c690>] ? autoremove_wake_function+0x0/0x40
>  [<ffffffff80213853>] ? native_sched_clock+0x13/0x60
>  [<ffffffff802d60e6>] ? fget_light+0x106/0x110
>  [<ffffffff80497987>] ? move_addr_to_kernel+0x57/0x60
>  [<ffffffff804a087f>] ? verify_iovec+0x3f/0xe0
>  [<ffffffff80498229>] sys_sendmsg+0x189/0x320
>  [<ffffffff8049851d>] ? sys_sendto+0xfd/0x120
>  [<ffffffff802e6650>] ? d_free+0x50/0x60
>  [<ffffffff802d6691>] ? __fput+0x171/0x1e0
>  [<ffffffff8026dda3>] ? trace_hardirqs_on_caller+0x133/0x190
>  [<ffffffff8028e02f>] ? audit_syscall_entry+0x15f/0x190
>  [<ffffffff8052c906>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>  [<ffffffff8020c57b>] system_call_fastpath+0x16/0x1b
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: BUG: bad unlock balance detected! e1000e
  2008-12-09 11:03 ` Frederik Deweerdt
@ 2008-12-09 23:08   ` Andrew Morton
  2008-12-09 23:43     ` Frederik Deweerdt
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2008-12-09 23:08 UTC (permalink / raw)
  To: Frederik Deweerdt
  Cc: zdenek.kabelac, linux-kernel, tglx, netdev, Jesse Brandeburg,
	David S. Miller, stable

On Tue, 9 Dec 2008 12:03:37 +0100
Frederik Deweerdt <frederik.deweerdt@xprog.eu> wrote:

> It some error checking is missing in e1000e: debug contention on NVM
> SWFLAG
> On Mon, Dec 08, 2008 at 12:24:09PM +0100, Zdenek Kabelac wrote:
> > Hi
> > 
> > During occasional scan of message log - I've found out this BUG which
> > happened on Dec3 with the -rc7 from that day.
> > (So if it's now fixed in current git feel free to ignore :))
> > 
> > My machine T61 - C2D, 2GB, 64bit kernel - message appeared during
> > shutdown and was actually not noticed by me...
> > 
> > 
> > NetworkManager: <WARN>  nm_signal_handler(): Caught signal 15,
> > shutting down normally.
> > NetworkManager: <info>  (eth0): now unmanaged
> > NetworkManager: <info>  (eth0): device state change: 3 -> 1
> > NetworkManager: <info>  (eth0): cleaning up...
> > NetworkManager: <info>  (eth0): taking down device.
> > 
> > =====================================
> > [ BUG: bad unlock balance detected! ]
> > -------------------------------------

(top-posting repaired.  Please don't do that!!!).

> Hello Zdenek,
> 
> This could be due to 717d438d1fde94decef874b9808379d1f4523453
>     "e1000e: debug contention on NVM SWFLAG"
> Error handling is missing from e1000_reset_hw_ich8lan so it may happen
> that we don't acquire the nvm_mutex if the card times out.
> 
> Adding Thomas to CC.

yup.  2.6.27 needs fixing also.

Like this?

From: Andrew Morton <akpm@linux-foundation.org>

=====================================
[ BUG: bad unlock balance detected! ]
-------------------------------------

A regression added by 717d438d1fde94decef874b9808379d1f4523453 ("e1000e:
debug contention on NVM SWFLAG").

Reported-by: "Zdenek Kabelac" <zdenek.kabelac@gmail.com>
Cc: Frederik Deweerdt <frederik.deweerdt@xprog.eu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/net/e1000e/ich8lan.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff -puN drivers/net/e1000e/ich8lan.c~drivers-net-e1000e-ich8lanc-fix-locking drivers/net/e1000e/ich8lan.c
--- a/drivers/net/e1000e/ich8lan.c~drivers-net-e1000e-ich8lanc-fix-locking
+++ a/drivers/net/e1000e/ich8lan.c
@@ -400,6 +400,7 @@ static s32 e1000_acquire_swflag_ich8lan(
 {
 	u32 extcnf_ctrl;
 	u32 timeout = PHY_CFG_TIMEOUT;
+	s32 ret = 0;
 
 	might_sleep();
 
@@ -427,11 +428,11 @@ static s32 e1000_acquire_swflag_ich8lan(
 		extcnf_ctrl &= ~E1000_EXTCNF_CTRL_SWFLAG;
 		ew32(EXTCNF_CTRL, extcnf_ctrl);
 		nvm_owner = -1;
-		mutex_unlock(&nvm_mutex);
-		return -E1000_ERR_CONFIG;
+		ret = -E1000_ERR_CONFIG;
 	}
 
-	return 0;
+	mutex_unlock(&nvm_mutex);
+	return ret;
 }
 
 /**
_


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: BUG: bad unlock balance detected! e1000e
  2008-12-09 23:08   ` Andrew Morton
@ 2008-12-09 23:43     ` Frederik Deweerdt
  2008-12-09 23:56       ` Andrew Morton
  0 siblings, 1 reply; 6+ messages in thread
From: Frederik Deweerdt @ 2008-12-09 23:43 UTC (permalink / raw)
  To: Andrew Morton
  Cc: zdenek.kabelac, linux-kernel, tglx, netdev, Jesse Brandeburg,
	David S. Miller, stable

On Tue, Dec 09, 2008 at 03:08:01PM -0800, Andrew Morton wrote:
> On Tue, 9 Dec 2008 12:03:37 +0100
> Frederik Deweerdt <frederik.deweerdt@xprog.eu> wrote:
> 
> > It some error checking is missing in e1000e: debug contention on NVM
> > SWFLAG
> > On Mon, Dec 08, 2008 at 12:24:09PM +0100, Zdenek Kabelac wrote:
> > > Hi
> > > 
> > > During occasional scan of message log - I've found out this BUG which
> > > happened on Dec3 with the -rc7 from that day.
> > > (So if it's now fixed in current git feel free to ignore :))
> > > 
> > > My machine T61 - C2D, 2GB, 64bit kernel - message appeared during
> > > shutdown and was actually not noticed by me...
> > > 
> > > 
> > > NetworkManager: <WARN>  nm_signal_handler(): Caught signal 15,
> > > shutting down normally.
> > > NetworkManager: <info>  (eth0): now unmanaged
> > > NetworkManager: <info>  (eth0): device state change: 3 -> 1
> > > NetworkManager: <info>  (eth0): cleaning up...
> > > NetworkManager: <info>  (eth0): taking down device.
> > > 
> > > =====================================
> > > [ BUG: bad unlock balance detected! ]
> > > -------------------------------------
> 
> (top-posting repaired.  Please don't do that!!!).
Yep, sorry.
> 
> > Hello Zdenek,
> > 
> > This could be due to 717d438d1fde94decef874b9808379d1f4523453
> >     "e1000e: debug contention on NVM SWFLAG"
> > Error handling is missing from e1000_reset_hw_ich8lan so it may happen
> > that we don't acquire the nvm_mutex if the card times out.
> > 
> > Adding Thomas to CC.
> 
> yup.  2.6.27 needs fixing also.
> 
> Like this?
I don't think so, e1000_acquire_swflag_ich8lan() locks and
e1000_release_swflag_ich8lan() unlocks. I think it is more along the
lines of:


diff --git a/drivers/net/e1000e/ich8lan.c b/drivers/net/e1000e/ich8lan.c
index 523b971..f971b83 100644
--- a/drivers/net/e1000e/ich8lan.c
+++ b/drivers/net/e1000e/ich8lan.c
@@ -1892,7 +1892,13 @@ static s32 e1000_reset_hw_ich8lan(struct e1000_hw *hw)
 		 */
 		ctrl |= E1000_CTRL_PHY_RST;
 	}
+
 	ret_val = e1000_acquire_swflag_ich8lan(hw);
+	if (ret_val) {
+		hw_dbg(hw, "Failed to acquire NVM swflag");
+		return ret_val;
+	}
+
 	hw_dbg(hw, "Issuing a global reset to ich8lan");
 	ew32(CTRL, (ctrl | E1000_CTRL_RST));
 	msleep(20);


But I'm not sure we should cancel the ongoing reset if the card times
out... 


Regards,
Frederik

> 
> From: Andrew Morton <akpm@linux-foundation.org>
> 
> =====================================
> [ BUG: bad unlock balance detected! ]
> -------------------------------------
> 
> A regression added by 717d438d1fde94decef874b9808379d1f4523453 ("e1000e:
> debug contention on NVM SWFLAG").
> 
> Reported-by: "Zdenek Kabelac" <zdenek.kabelac@gmail.com>
> Cc: Frederik Deweerdt <frederik.deweerdt@xprog.eu>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: <stable@kernel.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> 
>  drivers/net/e1000e/ich8lan.c |    7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff -puN drivers/net/e1000e/ich8lan.c~drivers-net-e1000e-ich8lanc-fix-locking drivers/net/e1000e/ich8lan.c
> --- a/drivers/net/e1000e/ich8lan.c~drivers-net-e1000e-ich8lanc-fix-locking
> +++ a/drivers/net/e1000e/ich8lan.c
> @@ -400,6 +400,7 @@ static s32 e1000_acquire_swflag_ich8lan(
>  {
>  	u32 extcnf_ctrl;
>  	u32 timeout = PHY_CFG_TIMEOUT;
> +	s32 ret = 0;
>  
>  	might_sleep();
>  
> @@ -427,11 +428,11 @@ static s32 e1000_acquire_swflag_ich8lan(
>  		extcnf_ctrl &= ~E1000_EXTCNF_CTRL_SWFLAG;
>  		ew32(EXTCNF_CTRL, extcnf_ctrl);
>  		nvm_owner = -1;
> -		mutex_unlock(&nvm_mutex);
> -		return -E1000_ERR_CONFIG;
> +		ret = -E1000_ERR_CONFIG;
>  	}
>  
> -	return 0;
> +	mutex_unlock(&nvm_mutex);
> +	return ret;
>  }
>  
>  /**
> _
> 

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: BUG: bad unlock balance detected! e1000e
  2008-12-09 23:43     ` Frederik Deweerdt
@ 2008-12-09 23:56       ` Andrew Morton
  2008-12-11  0:37         ` [E1000-devel] " Jeff Kirsher
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2008-12-09 23:56 UTC (permalink / raw)
  To: Frederik Deweerdt
  Cc: zdenek.kabelac, linux-kernel, tglx, netdev, jesse.brandeburg,
	davem, stable, e1000-devel

On Wed, 10 Dec 2008 00:43:46 +0100
Frederik Deweerdt <frederik.deweerdt@xprog.eu> wrote:

> On Tue, Dec 09, 2008 at 03:08:01PM -0800, Andrew Morton wrote:
> > On Tue, 9 Dec 2008 12:03:37 +0100
> > Frederik Deweerdt <frederik.deweerdt@xprog.eu> wrote:
> > 
> > > It some error checking is missing in e1000e: debug contention on NVM
> > > SWFLAG
> > > On Mon, Dec 08, 2008 at 12:24:09PM +0100, Zdenek Kabelac wrote:
> > > > Hi
> > > > 
> > > > During occasional scan of message log - I've found out this BUG which
> > > > happened on Dec3 with the -rc7 from that day.
> > > > (So if it's now fixed in current git feel free to ignore :))
> > > > 
> > > > My machine T61 - C2D, 2GB, 64bit kernel - message appeared during
> > > > shutdown and was actually not noticed by me...
> > > > 
> > > > 
> > > > NetworkManager: <WARN>  nm_signal_handler(): Caught signal 15,
> > > > shutting down normally.
> > > > NetworkManager: <info>  (eth0): now unmanaged
> > > > NetworkManager: <info>  (eth0): device state change: 3 -> 1
> > > > NetworkManager: <info>  (eth0): cleaning up...
> > > > NetworkManager: <info>  (eth0): taking down device.
> > > > 
> > > > =====================================
> > > > [ BUG: bad unlock balance detected! ]
> > > > -------------------------------------
> > 
> > (top-posting repaired.  Please don't do that!!!).
> Yep, sorry.
> > 
> > > Hello Zdenek,
> > > 
> > > This could be due to 717d438d1fde94decef874b9808379d1f4523453
> > >     "e1000e: debug contention on NVM SWFLAG"
> > > Error handling is missing from e1000_reset_hw_ich8lan so it may happen
> > > that we don't acquire the nvm_mutex if the card times out.
> > > 
> > > Adding Thomas to CC.
> > 
> > yup.  2.6.27 needs fixing also.
> > 
> > Like this?
> I don't think so, e1000_acquire_swflag_ich8lan() locks and
> e1000_release_swflag_ich8lan() unlocks.

urgh, OK, I made the mistake of reading the comments.

> I think it is more along the
> lines of:
> 
> 
> diff --git a/drivers/net/e1000e/ich8lan.c b/drivers/net/e1000e/ich8lan.c
> index 523b971..f971b83 100644
> --- a/drivers/net/e1000e/ich8lan.c
> +++ b/drivers/net/e1000e/ich8lan.c
> @@ -1892,7 +1892,13 @@ static s32 e1000_reset_hw_ich8lan(struct e1000_hw *hw)
>  		 */
>  		ctrl |= E1000_CTRL_PHY_RST;
>  	}
> +
>  	ret_val = e1000_acquire_swflag_ich8lan(hw);
> +	if (ret_val) {
> +		hw_dbg(hw, "Failed to acquire NVM swflag");
> +		return ret_val;
> +	}
> +
>  	hw_dbg(hw, "Issuing a global reset to ich8lan");
>  	ew32(CTRL, (ctrl | E1000_CTRL_RST));
>  	msleep(20);
> 
> 
> But I'm not sure we should cancel the ongoing reset if the card times
> out... 
> 

Yes, something like that.  Or something like

--- a/drivers/net/e1000e/ich8lan.c~a
+++ a/drivers/net/e1000e/ich8lan.c
@@ -1940,12 +1940,14 @@ static s32 e1000_reset_hw_ich8lan(struct
 		ctrl |= E1000_CTRL_PHY_RST;
 	}
 	ret_val = e1000_acquire_swflag_ich8lan(hw);
-	hw_dbg(hw, "Issuing a global reset to ich8lan\n");
-	ew32(CTRL, (ctrl | E1000_CTRL_RST));
-	msleep(20);
+	if (!ret_val) {
+		hw_dbg(hw, "Issuing a global reset to ich8lan\n");
+		ew32(CTRL, (ctrl | E1000_CTRL_RST));
+		msleep(20);
 
-	/* release the swflag because it is not reset by hardware reset */
-	e1000_release_swflag_ich8lan(hw);
+		/* release the swflag because it is not reset by hardware reset */
+		e1000_release_swflag_ich8lan(hw);
+	}
 
 	ret_val = e1000e_get_auto_rd_done(hw);
 	if (ret_val) {
_


Dunno. It's e1000-developer-summoning-dance time.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [E1000-devel] BUG: bad unlock balance detected! e1000e
  2008-12-09 23:56       ` Andrew Morton
@ 2008-12-11  0:37         ` Jeff Kirsher
  0 siblings, 0 replies; 6+ messages in thread
From: Jeff Kirsher @ 2008-12-11  0:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Frederik Deweerdt, e1000-devel, netdev, jesse.brandeburg,
	linux-kernel, stable, tglx, zdenek.kabelac, davem

On Tue, Dec 9, 2008 at 3:56 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Wed, 10 Dec 2008 00:43:46 +0100
> Frederik Deweerdt <frederik.deweerdt@xprog.eu> wrote:
>
>> On Tue, Dec 09, 2008 at 03:08:01PM -0800, Andrew Morton wrote:
>> > On Tue, 9 Dec 2008 12:03:37 +0100
>> > Frederik Deweerdt <frederik.deweerdt@xprog.eu> wrote:
>> >
>> > > It some error checking is missing in e1000e: debug contention on NVM
>> > > SWFLAG
>> > > On Mon, Dec 08, 2008 at 12:24:09PM +0100, Zdenek Kabelac wrote:
>> > > > Hi
>> > > >
>> > > > During occasional scan of message log - I've found out this BUG which
>> > > > happened on Dec3 with the -rc7 from that day.
>> > > > (So if it's now fixed in current git feel free to ignore :))
>> > > >
>> > > > My machine T61 - C2D, 2GB, 64bit kernel - message appeared during
>> > > > shutdown and was actually not noticed by me...
>> > > >
>> > > >
>> > > > NetworkManager: <WARN>  nm_signal_handler(): Caught signal 15,
>> > > > shutting down normally.
>> > > > NetworkManager: <info>  (eth0): now unmanaged
>> > > > NetworkManager: <info>  (eth0): device state change: 3 -> 1
>> > > > NetworkManager: <info>  (eth0): cleaning up...
>> > > > NetworkManager: <info>  (eth0): taking down device.
>> > > >
>> > > > =====================================
>> > > > [ BUG: bad unlock balance detected! ]
>> > > > -------------------------------------
>> >
>> > (top-posting repaired.  Please don't do that!!!).
>> Yep, sorry.
>> >
>> > > Hello Zdenek,
>> > >
>> > > This could be due to 717d438d1fde94decef874b9808379d1f4523453
>> > >     "e1000e: debug contention on NVM SWFLAG"
>> > > Error handling is missing from e1000_reset_hw_ich8lan so it may happen
>> > > that we don't acquire the nvm_mutex if the card times out.
>> > >
>> > > Adding Thomas to CC.
>> >
>> > yup.  2.6.27 needs fixing also.
>> >
>> > Like this?
>> I don't think so, e1000_acquire_swflag_ich8lan() locks and
>> e1000_release_swflag_ich8lan() unlocks.
>
> urgh, OK, I made the mistake of reading the comments.
>
>> I think it is more along the
>> lines of:
>>
>>
>> diff --git a/drivers/net/e1000e/ich8lan.c b/drivers/net/e1000e/ich8lan.c
>> index 523b971..f971b83 100644
>> --- a/drivers/net/e1000e/ich8lan.c
>> +++ b/drivers/net/e1000e/ich8lan.c
>> @@ -1892,7 +1892,13 @@ static s32 e1000_reset_hw_ich8lan(struct e1000_hw *hw)
>>                */
>>               ctrl |= E1000_CTRL_PHY_RST;
>>       }
>> +
>>       ret_val = e1000_acquire_swflag_ich8lan(hw);
>> +     if (ret_val) {
>> +             hw_dbg(hw, "Failed to acquire NVM swflag");
>> +             return ret_val;
>> +     }
>> +
>>       hw_dbg(hw, "Issuing a global reset to ich8lan");
>>       ew32(CTRL, (ctrl | E1000_CTRL_RST));
>>       msleep(20);
>>
>>
>> But I'm not sure we should cancel the ongoing reset if the card times
>> out...
>>
>
> Yes, something like that.  Or something like
>
> --- a/drivers/net/e1000e/ich8lan.c~a
> +++ a/drivers/net/e1000e/ich8lan.c
> @@ -1940,12 +1940,14 @@ static s32 e1000_reset_hw_ich8lan(struct
>                ctrl |= E1000_CTRL_PHY_RST;
>        }
>        ret_val = e1000_acquire_swflag_ich8lan(hw);
> -       hw_dbg(hw, "Issuing a global reset to ich8lan\n");
> -       ew32(CTRL, (ctrl | E1000_CTRL_RST));
> -       msleep(20);
> +       if (!ret_val) {
> +               hw_dbg(hw, "Issuing a global reset to ich8lan\n");
> +               ew32(CTRL, (ctrl | E1000_CTRL_RST));
> +               msleep(20);
>
> -       /* release the swflag because it is not reset by hardware reset */
> -       e1000_release_swflag_ich8lan(hw);
> +               /* release the swflag because it is not reset by hardware reset */
> +               e1000_release_swflag_ich8lan(hw);
> +       }
>
>        ret_val = e1000e_get_auto_rd_done(hw);
>        if (ret_val) {
> _
>
>
> Dunno. It's e1000-developer-summoning-dance time.
>

Actually, if we time out trying to acquire the swflag, we still want
to reset the part because we are most likely in an unrecoverable
state.

So I would suggest the following
 --- a/drivers/net/e1000e/ich8lan.c~a
 +++ a/drivers/net/e1000e/ich8lan.c
 @@ -1940,9 +1940,10 @@ static s32 e1000_reset_hw_ich8lan(struct
                ctrl |= E1000_CTRL_PHY_RST;
        }
        ret_val = e1000_acquire_swflag_ich8lan(hw);
        hw_dbg(hw, "Issuing a global reset to ich8lan\n");
        ew32(CTRL, (ctrl | E1000_CTRL_RST));
        msleep(20);
 +       if (!ret_val) {
 -
 -       /* release the swflag because it is not reset by hardware reset */
 -       e1000_release_swflag_ich8lan(hw);
 +               /* release the swflag because it is not reset by
hardware reset */
 +               e1000_release_swflag_ich8lan(hw);
 +       }

Of course, we will want to add a comment to the fact that we still
want to reset the part, even if we have not acquired the lock because
we are in an unrecoverable state.

I can provide a patch in a few minutes.

-- 
Cheers,
Jeff

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2008-12-11  0:38 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-12-08 11:24 BUG: bad unlock balance detected! e1000e Zdenek Kabelac
2008-12-09 11:03 ` Frederik Deweerdt
2008-12-09 23:08   ` Andrew Morton
2008-12-09 23:43     ` Frederik Deweerdt
2008-12-09 23:56       ` Andrew Morton
2008-12-11  0:37         ` [E1000-devel] " Jeff Kirsher

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox