Re: [Bugme-new] [Bug 8736] New: New TC deadlock scenario

netdev.vger.kernel.org archive mirror

* Re: [Bugme-new] [Bug 8736] New: New TC deadlock scenario
From: Andrew Morton @ 2007-07-11 18:18 UTC (permalink / raw)
  To: netdev; +Cc: bugme-daemon@kernel-bugs.osdl.org, ranko

On Wed, 11 Jul 2007 08:45:12 -0700 (PDT)
bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=8736
> 
>            Summary: New TC deadlock scenario
>            Product: Networking
>            Version: 2.5
>      KernelVersion: 2.6.22
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>         AssignedTo: acme@ghostprotocols.net
>         ReportedBy: ranko@spidernet.net
> 
> 
> Most recent kernel where this bug did not occur: 
> Distribution:
> Hardware Environment:
> Software Environment:
> Problem Description:
> 
> Here is another scenario I bumped into - qdisc_watchdog_cancel() and
> qdisc_restart() deadlock.
> 
> CPU#0
> qdisc_watchdog() fires and gets dev->queue_lock
> qdisc_run()...qdisc_restart()... 
> -> releases dev->queue_lock and enters dev_hard_start_xmit()
> 
> CPU#1
> tc del qdisc dev ...
> qdisc_graft()...dev_graft_qdisc()...dev_deactivate()...
> -> grabs dev->queue_lock ... 
> qdisc_reset()...{cbq,hfsc,htb,netem,tbf}_reset()...qdisc_watchdog_cancel()...
> -> hrtimer_cancel() - waiting for the qdisc_watchdog() to exit, while still
> holding dev->queue_lock
> 
> CPU#0
> dev_hard_start_xmit() returns ... 
> -> wants to get dev->queue_lock(!)
> 
> DEADLOCK!
> 
> I did not manage to get a backtrace of the qdisc_watchdog stack to show both
> sides, but nevertheless the above looks like the only way qdisc_watchdog_cancel
> could be stuck there.
> 
> Regards,
> 
> Ranko
> 
> ---cut---
> SysRq : Show Regs
> 
> Pid: 12790, comm:                   tc
> EIP: 0060:[<c02f13f9>] CPU: 1
> EIP is at _spin_unlock_irqrestore+0x36/0x39
>  EFLAGS: 00000282    Not tainted  (2.6.22.SNET.Thors.htbpatch.6.lockdebug #1)
> EAX: 00000000 EBX: c1d119c0 ECX: 00000000 EDX: 00000000
> ESI: 00000282 EDI: c1d11a18 EBP: 00000000 DS: 007b ES: 007b FS: 00d8
> CR0: 80050033 CR2: 008ba828 CR3: 20dc2000 CR4: 000006d0
>  [<c0132ff4>] hrtimer_try_to_cancel+0x33/0x66
>  [<c013007b>] kthread+0xf/0x57
>  [<c0133035>] hrtimer_cancel+0xe/0x14
>  [<c029daaf>] qdisc_watchdog_cancel+0x8/0x11
>  [<f8b8541d>] htb_reset+0x9c/0x14b [sch_htb]
>  [<c029c2ad>] qdisc_reset+0x10/0x11
>  [<c029c3e7>] dev_deactivate+0x27/0xa5
>  [<c029d7a6>] dev_graft_qdisc+0x81/0xa5
>  [<c029d7f2>] qdisc_graft+0x28/0x88
>  [<c029df93>] tc_get_qdisc+0x15d/0x1e9
>  [<c029de36>] tc_get_qdisc+0x0/0x1e9
>  [<c0297038>] rtnetlink_rcv_msg+0x1c2/0x1f5
>  [<c02a1834>] netlink_run_queue+0x96/0xfd
>  [<c0296e76>] rtnetlink_rcv_msg+0x0/0x1f5
>  [<c0296e28>] rtnetlink_rcv+0x26/0x42
>  [<c02a1d5c>] netlink_data_ready+0x12/0x54
>  [<c02a0abe>] netlink_sendskb+0x1c/0x33
>  [<c02a1c6b>] netlink_sendmsg+0x1f3/0x2d2
>  [<c0284d06>] sock_sendmsg+0xe2/0xfd
>  [<c013030d>] autoremove_wake_function+0x0/0x37
>  [<c013030d>] autoremove_wake_function+0x0/0x37
>  [<c01d79c3>] copy_from_user+0x2d/0x59
>  [<c0284e4e>] sys_sendmsg+0x12d/0x243
>  [<c02f12fd>] _read_unlock_irq+0x20/0x23
>  [<c013a49e>] trace_hardirqs_on+0xac/0x149
>  [<c0148c78>] find_get_page+0x11/0x49
>  [<c015585e>] __handle_mm_fault+0x19d/0x947
>  [<c02f0f87>] _spin_unlock+0x14/0x1c
>  [<c01558e5>] __handle_mm_fault+0x224/0x947
>  [<c028608a>] sys_socketcall+0x24f/0x271
>  [<c0103370>] restore_nocheck+0x12/0x15
>  [<c010329e>] sysenter_past_esp+0x5f/0x99
>  =======================
> 
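
The race in the report can be reproduced outside the kernel with a small userspace model. This is a sketch under stated assumptions, not kernel code: a pthread mutex stands in for dev->queue_lock, a thread stands in for the running qdisc_watchdog() callback, and pthread_join() stands in for hrtimer_cancel() waiting for that callback to finish. The names queue_lock, watchdog_fn and simulate_cancel_under_lock are hypothetical. Because a real spin_lock() would spin forever, the callback re-takes the lock with pthread_mutex_timedlock() and a 200 ms timeout, so the model reports the deadlock instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <time.h>

/* Hypothetical userspace model: queue_lock ~ dev->queue_lock,
 * watchdog_fn ~ qdisc_watchdog() calling qdisc_run() directly,
 * pthread_join() ~ hrtimer_cancel() waiting for a running callback. */
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t in_xmit;          /* CPU#0 has dropped the lock for "xmit" */
static sem_t lock_held;        /* CPU#1 now holds the lock              */
static int watchdog_relock_rc; /* result of CPU#0's re-lock attempt     */

static void *watchdog_fn(void *arg)
{
    struct timespec deadline;
    (void)arg;

    pthread_mutex_lock(&queue_lock);   /* qdisc_watchdog() gets the lock    */
    pthread_mutex_unlock(&queue_lock); /* qdisc_restart() releases it ...   */
    sem_post(&in_xmit);                /* ... and enters "xmit"             */
    sem_wait(&lock_held);              /* meanwhile CPU#1 grabs the lock    */

    /* "dev_hard_start_xmit() returns" and wants the lock back. Time out
     * after 200 ms instead of spinning forever like spin_lock() would. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_nsec += 200 * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    watchdog_relock_rc = pthread_mutex_timedlock(&queue_lock, &deadline);
    if (watchdog_relock_rc == 0)
        pthread_mutex_unlock(&queue_lock);
    return NULL;
}

/* Returns 1 if the model deadlocks: the callback could not re-take the
 * lock while the canceller held it and waited for the callback. */
int simulate_cancel_under_lock(void)
{
    pthread_t watchdog;
    int rc;

    sem_init(&in_xmit, 0, 0);
    sem_init(&lock_held, 0, 0);
    rc = pthread_create(&watchdog, NULL, watchdog_fn, NULL);
    assert(rc == 0);

    sem_wait(&in_xmit);                /* CPU#0 is inside "xmit"            */
    pthread_mutex_lock(&queue_lock);   /* dev_deactivate() takes the lock   */
    sem_post(&lock_held);
    pthread_join(watchdog, NULL);      /* "hrtimer_cancel()" waits          */
    pthread_mutex_unlock(&queue_lock);

    sem_destroy(&in_xmit);
    sem_destroy(&lock_held);
    return watchdog_relock_rc == ETIMEDOUT;
}
```

Called from a test harness (built with -pthread), simulate_cancel_under_lock() should return 1: the callback times out on the lock that its canceller holds while waiting for it, which is the CPU#0/CPU#1 cycle described in the report.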



* Re: [Bugme-new] [Bug 8736] New: New TC deadlock scenario
From: Patrick McHardy @ 2007-07-11 18:31 UTC (permalink / raw)
  To: ranko; +Cc: Andrew Morton, netdev, bugme-daemon@kernel-bugs.osdl.org

Andrew Morton wrote:
>>http://bugzilla.kernel.org/show_bug.cgi?id=8736
>>
>>Here is another scenario I bumped into - qdisc_watchdog_cancel() and
>>qdisc_restart() deadlock.
>>
>>CPU#0
>>qdisc_watchdog() fires and gets dev->queue_lock
>>qdisc_run()...qdisc_restart()... 
>>-> releases dev->queue_lock and enters dev_hard_start_xmit()
>>
>>CPU#1
>>tc del qdisc dev ...
>>qdisc_graft()...dev_graft_qdisc()...dev_deactivate()...
>>-> grabs dev->queue_lock ... 
>>qdisc_reset()...{cbq,hfsc,htb,netem,tbf}_reset()...qdisc_watchdog_cancel()...
>>-> hrtimer_cancel() - waiting for the qdisc_watchdog() to exit, while still
>>holding dev->queue_lock
>>
>>CPU#0
>>dev_hard_start_xmit() returns ... 
>>-> wants to get dev->queue_lock(!)
>>
>>DEADLOCK!


Good catch.

Please try reverting commit 1936502d00ae6c2aa3931c42f6cf54afaba094f2,
which should fix it.



* Re: [Bugme-new] [Bug 8736] New: New TC deadlock scenario
From: Patrick McHardy @ 2007-07-14 15:43 UTC (permalink / raw)
  To: ranko; +Cc: Andrew Morton, netdev, bugme-daemon@kernel-bugs.osdl.org

[-- Attachment #1: Type: text/plain, Size: 455 bytes --]

Patrick McHardy wrote:
> Andrew Morton wrote:
> 
>>>http://bugzilla.kernel.org/show_bug.cgi?id=8736
>>>
>>>Here is another scenario I bumped into - qdisc_watchdog_cancel() and
>>>qdisc_restart() deadlock.
>>>
>>>[...]
>>>DEADLOCK!
> 
> 
> 
> Good catch.
> 
> Please try reverting commit 1936502d00ae6c2aa3931c42f6cf54afaba094f2,
> which should fix it.


Ranko, did you get a chance to test this? I've attached the patch
since it doesn't revert cleanly.


[-- Attachment #2: x --]
[-- Type: text/plain, Size: 1473 bytes --]

[NET_SCHED]: Revert "avoid transmit softirq on watchdog wakeup" optimization

As noticed by Ranko Zivojnovic <ranko@spidernet.net>, calling qdisc_run
from the timer handler can result in deadlock:

> CPU#0
>
> qdisc_watchdog() fires and gets dev->queue_lock
> qdisc_run()...qdisc_restart()...
> -> releases dev->queue_lock and enters dev_hard_start_xmit()
>
> CPU#1
>
> tc del qdisc dev ...
> qdisc_graft()...dev_graft_qdisc()...dev_deactivate()...
> -> grabs dev->queue_lock ...
>
> qdisc_reset()...{cbq,hfsc,htb,netem,tbf}_reset()...qdisc_watchdog_cancel()...
> -> hrtimer_cancel() - waiting for the qdisc_watchdog() to exit, while still
>    holding dev->queue_lock
>
> CPU#0
>
> dev_hard_start_xmit() returns ...
> -> wants to get dev->queue_lock(!)
>
> DEADLOCK!

The entire optimization is a bit questionable IMO: it moves potentially
large parts of the NET_TX_SOFTIRQ work to TIMER_SOFTIRQ/HRTIMER_SOFTIRQ,
which largely defeats the separation between them.

Signed-off-by: Patrick McHardy <kaber@trash.net>
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index d92ea26..4fd0bec 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -278,11 +278,7 @@ static enum hrtimer_restart qdisc_watchdog(struct hrtimer *timer)
 
 	wd->qdisc->flags &= ~TCQ_F_THROTTLED;
 	smp_wmb();
-	if (spin_trylock(&dev->queue_lock)) {
-		qdisc_run(dev);
-		spin_unlock(&dev->queue_lock);
-	} else
-		netif_schedule(dev);
+	netif_schedule(dev);
 
 	return HRTIMER_NORESTART;
 }
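
With the patch applied, the watchdog callback no longer calls qdisc_run(); it only calls netif_schedule(), which defers the work to NET_TX_SOFTIRQ. The removed spin_trylock() only guarded entry: qdisc_restart() still released and re-took dev->queue_lock around dev_hard_start_xmit(), which is where CPU#0 blocks in the report. A hypothetical userspace model (pthreads, not kernel code; queue_lock, watchdog_fn, tx_action and simulate_patched_cancel are illustrative names) shows why the new shape is safe: the callback touches only atomic flags, so a canceller may wait for it while holding the lock, and a separate tx_action() stand-in for the TX softirq does the queue run after the lock is released.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical model of the patched qdisc_watchdog(): it clears the
 * throttled flag and marks the device for a later TX pass (the analogue
 * of netif_schedule()), but never touches queue_lock. */
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int throttled = 1;
static atomic_int tx_scheduled = 0;
static int packets_sent = 0;            /* protected by queue_lock */

static void *watchdog_fn(void *arg)
{
    (void)arg;
    atomic_store(&throttled, 0);        /* flags &= ~TCQ_F_THROTTLED */
    atomic_store(&tx_scheduled, 1);     /* netif_schedule(dev)       */
    return NULL;                        /* HRTIMER_NORESTART         */
}

/* Stand-in for the NET_TX_SOFTIRQ handler: runs the queue later, in its
 * own context, taking the lock the normal way. */
static void tx_action(void)
{
    if (atomic_exchange(&tx_scheduled, 0)) {
        pthread_mutex_lock(&queue_lock);
        packets_sent++;                 /* qdisc_run(dev) */
        pthread_mutex_unlock(&queue_lock);
    }
}

/* Cancel the watchdog while holding queue_lock, as dev_deactivate() does;
 * returns how many deferred TX passes ran afterwards. */
int simulate_patched_cancel(void)
{
    pthread_t watchdog;
    int rc = pthread_create(&watchdog, NULL, watchdog_fn, NULL);
    assert(rc == 0);

    pthread_mutex_lock(&queue_lock);
    pthread_join(watchdog, NULL);       /* "hrtimer_cancel()": the callback
                                           never needs queue_lock now     */
    pthread_mutex_unlock(&queue_lock);

    tx_action();                        /* deferred work runs after unlock */
    return packets_sent;
}
```

Called once, simulate_patched_cancel() should return 1: the join completes regardless of whether the callback ran before or after the lock was taken, and the single deferred TX pass runs afterwards. This mirrors the commit message's argument that timer context should do the minimum and leave transmit work to the TX path.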


* Re: [Bugme-new] [Bug 8736] New: New TC deadlock scenario
From: Ranko Zivojnovic @ 2007-07-14 16:45 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Andrew Morton, netdev, bugme-daemon@kernel-bugs.osdl.org

On Sat, 2007-07-14 at 17:43 +0200, Patrick McHardy wrote:
> Patrick McHardy wrote:
> > Andrew Morton wrote:
> > 
> >>>http://bugzilla.kernel.org/show_bug.cgi?id=8736
> >>>
> >>>Here is another scenario I bumped into - qdisc_watchdog_cancel() and
> >>>qdisc_restart() deadlock.
> >>>
> >>>[...]
> >>>DEADLOCK!
> > 
> > 
> > 
> > Good catch.
> > 
> > Please try reverting commit 1936502d00ae6c2aa3931c42f6cf54afaba094f2,
> > which should fix it.
> 
> 
> Ranko, did you get a chance to test this? I've attached the patch
> since it doesn't revert cleanly.

I have the commit reverted in my tree and have been testing it along
with the other qdisc changes I'm currently working on; it does
rectify the problem.

Acked-by: Ranko Zivojnovic <ranko@spidernet.net>




* Re: [Bugme-new] [Bug 8736] New: New TC deadlock scenario
From: David Miller @ 2007-07-15  3:49 UTC (permalink / raw)
  To: ranko; +Cc: kaber, akpm, netdev, bugme-daemon

From: Ranko Zivojnovic <ranko@spidernet.net>
Date: Sat, 14 Jul 2007 19:45:35 +0300

> On Sat, 2007-07-14 at 17:43 +0200, Patrick McHardy wrote:
> > Patrick McHardy wrote:
> > > Andrew Morton wrote:
> > > 
> > >>>http://bugzilla.kernel.org/show_bug.cgi?id=8736
> > >>>
> > >>>Here is another scenario I bumped into - qdisc_watchdog_cancel() and
> > >>>qdisc_restart() deadlock.
> > >>>
> > >>>[...]
> > >>>DEADLOCK!
> > > 
> > > 
> > > 
> > > Good catch.
> > > 
> > > Please try reverting commit 1936502d00ae6c2aa3931c42f6cf54afaba094f2,
> > > which should fix it.
> > 
> > 
> > Ranko, did you get a chance to test this? I've attached the patch
> > since it doesn't revert cleanly.
> 
> I have the commit reverted in my tree and have been testing it along
> with the other qdisc changes I'm currently working on; it does
> rectify the problem.
> 
> Acked-by: Ranko Zivojnovic <ranko@spidernet.net>

Applied, thanks everyone.


* Re: [Bugme-new] [Bug 8736] New: New TC deadlock scenario
From: Patrick McHardy @ 2007-07-15 14:21 UTC (permalink / raw)
  To: David Miller; +Cc: ranko, akpm, netdev, bugme-daemon

David Miller wrote:
> From: Ranko Zivojnovic <ranko@spidernet.net>
> Date: Sat, 14 Jul 2007 19:45:35 +0300
> 
> 
>>>>Please try reverting commit 1936502d00ae6c2aa3931c42f6cf54afaba094f2,
>>>>which should fix it.
>>>
>>>
>>>Ranko, did you get a chance to test this? I've attached the patch
>>>since it doesn't revert cleanly.
>>
>>I have the commit reverted in my tree and have been testing it along
>>with the other qdisc changes I'm currently working on; it does
>>rectify the problem.
>>
>>Acked-by: Ranko Zivojnovic <ranko@spidernet.net>
> 
> 
> Applied, thanks everyone.


This one is a regression in 2.6.22, so the patch should also go
in -stable IMO.



* Re: [Bugme-new] [Bug 8736] New: New TC deadlock scenario
From: David Miller @ 2007-07-16  0:36 UTC (permalink / raw)
  To: kaber; +Cc: ranko, akpm, netdev, bugme-daemon

From: Patrick McHardy <kaber@trash.net>
Date: Sun, 15 Jul 2007 16:21:13 +0200

> This one is a regression in 2.6.22, so the patch should also go
> in -stable IMO.

I have it queued up for submission to -stable.

