* Re: [Bugme-new] [Bug 8736] New: New TC deadlock scenario [not found] <bug-8736-10286@http.bugzilla.kernel.org/> @ 2007-07-11 18:18 ` Andrew Morton 2007-07-11 18:31 ` Patrick McHardy 0 siblings, 1 reply; 7+ messages in thread From: Andrew Morton @ 2007-07-11 18:18 UTC (permalink / raw) To: netdev; +Cc: bugme-daemon@kernel-bugs.osdl.org, ranko On Wed, 11 Jul 2007 08:45:12 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=8736 > > Summary: New TC deadlock scenario > Product: Networking > Version: 2.5 > KernelVersion: 2.6.22 > Platform: All > OS/Version: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: Other > AssignedTo: acme@ghostprotocols.net > ReportedBy: ranko@spidernet.net > > > Most recent kernel where this bug did not occur: > Distribution: > Hardware Environment: > Software Environment: > Problem Description: > > Here is another scenario I bumped onto - qdisc_watchdog_cancel() and > qdisc_restart() deadlock. > > CPU#0 > qdisc_watchdog() fires and gets dev->queue_lock > qdisc_run()...qdisc_restart()... > -> releases dev->queue_lock and enters dev_hard_start_xmit() > > CPU#1 > tc del qdisc dev ... > qdisc_graft()...dev_graft_qdisc()...dev_deactivate()... > -> grabs dev->queue_lock ... > qdisc_reset()...{cbq,hfsc,htb,netem,tbf}_reset()...qdisc_watchdog_cancel()... > -> hrtimer_cancel() - waiting for the qdisc_watchdog() to exit, while still > holding dev->queue_lock > > CPU#0 > dev_hard_start_xmit() returns ... > -> wants to get dev->queue_lock(!) > > DEADLOCK! > > I did not manage to get a backtrace on qdisc_watchdog stack to show them both > but nevertheless - the above looks like the only way qdisc_watchdog_cancel > could be sitting there. > > Regards, > > Ranko > > ---cut--- > SysRq : Show Regs > > Pid: 12790, comm: tc > EIP: 0060:[<c02f13f9>] CPU: 1 > EIP is at _spin_unlock_irqrestore+0x36/0x39 > EFLAGS: 00000282 Not tainted (2.6.22.SNET.Thors.htbpatch.6.lockdebug #1) > EAX: 00000000 EBX: c1d119c0 ECX: 00000000 EDX: 00000000 > ESI: 00000282 EDI: c1d11a18 EBP: 00000000 DS: 007b ES: 007b FS: 00d8 > CR0: 80050033 CR2: 008ba828 CR3: 20dc2000 CR4: 000006d0 > [<c0132ff4>] hrtimer_try_to_cancel+0x33/0x66 > [<c013007b>] kthread+0xf/0x57 > [<c0133035>] hrtimer_cancel+0xe/0x14 > [<c029daaf>] qdisc_watchdog_cancel+0x8/0x11 > [<f8b8541d>] htb_reset+0x9c/0x14b [sch_htb] > [<c029c2ad>] qdisc_reset+0x10/0x11 > [<c029c3e7>] dev_deactivate+0x27/0xa5 > [<c029d7a6>] dev_graft_qdisc+0x81/0xa5 > [<c029d7f2>] qdisc_graft+0x28/0x88 > [<c029df93>] tc_get_qdisc+0x15d/0x1e9 > [<c029de36>] tc_get_qdisc+0x0/0x1e9 > [<c0297038>] rtnetlink_rcv_msg+0x1c2/0x1f5 > [<c02a1834>] netlink_run_queue+0x96/0xfd > [<c0296e76>] rtnetlink_rcv_msg+0x0/0x1f5 > [<c0296e28>] rtnetlink_rcv+0x26/0x42 > [<c02a1d5c>] netlink_data_ready+0x12/0x54 > [<c02a0abe>] netlink_sendskb+0x1c/0x33 > [<c02a1c6b>] netlink_sendmsg+0x1f3/0x2d2 > [<c0284d06>] sock_sendmsg+0xe2/0xfd > [<c013030d>] autoremove_wake_function+0x0/0x37 > [<c013030d>] autoremove_wake_function+0x0/0x37 > [<c01d79c3>] copy_from_user+0x2d/0x59 > [<c0284e4e>] sys_sendmsg+0x12d/0x243 > [<c02f12fd>] _read_unlock_irq+0x20/0x23 > [<c013a49e>] trace_hardirqs_on+0xac/0x149 > [<c0148c78>] find_get_page+0x11/0x49 > [<c015585e>] __handle_mm_fault+0x19d/0x947 > [<c02f0f87>] _spin_unlock+0x14/0x1c > [<c01558e5>] __handle_mm_fault+0x224/0x947 > [<c028608a>] sys_socketcall+0x24f/0x271 > [<c0103370>] restore_nocheck+0x12/0x15 > [<c010329e>] sysenter_past_esp+0x5f/0x99 > ======================= > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Bugme-new] [Bug 8736] New: New TC deadlock scenario 2007-07-11 18:18 ` [Bugme-new] [Bug 8736] New: New TC deadlock scenario Andrew Morton @ 2007-07-11 18:31 ` Patrick McHardy 2007-07-14 15:43 ` Patrick McHardy 0 siblings, 1 reply; 7+ messages in thread From: Patrick McHardy @ 2007-07-11 18:31 UTC (permalink / raw) To: ranko; +Cc: Andrew Morton, netdev, bugme-daemon@kernel-bugs.osdl.org Andrew Morton wrote: >>http://bugzilla.kernel.org/show_bug.cgi?id=8736 >> >>Here is another scenario I bumped onto - qdisc_watchdog_cancel() and >>qdisc_restart() deadlock. >> >>CPU#0 >>qdisc_watchdog() fires and gets dev->queue_lock >>qdisc_run()...qdisc_restart()... >>-> releases dev->queue_lock and enters dev_hard_start_xmit() >> >>CPU#1 >>tc del qdisc dev ... >>qdisc_graft()...dev_graft_qdisc()...dev_deactivate()... >>-> grabs dev->queue_lock ... >>qdisc_reset()...{cbq,hfsc,htb,netem,tbf}_reset()...qdisc_watchdog_cancel()... >>-> hrtimer_cancel() - waiting for the qdisc_watchdog() to exit, while still >>holding dev->queue_lock >> >>CPU#0 >>dev_hard_start_xmit() returns ... >>-> wants to get dev->queue_lock(!) >> >>DEADLOCK! Good catch. Please try reverting commit 1936502d00ae6c2aa3931c42f6cf54afaba094f2, that should fix it. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Bugme-new] [Bug 8736] New: New TC deadlock scenario 2007-07-11 18:31 ` Patrick McHardy @ 2007-07-14 15:43 ` Patrick McHardy 2007-07-14 16:45 ` Ranko Zivojnovic 0 siblings, 1 reply; 7+ messages in thread From: Patrick McHardy @ 2007-07-14 15:43 UTC (permalink / raw) To: ranko; +Cc: Andrew Morton, netdev, bugme-daemon@kernel-bugs.osdl.org [-- Attachment #1: Type: text/plain, Size: 455 bytes --] Patrick McHardy wrote: > Andrew Morton wrote: > >>>http://bugzilla.kernel.org/show_bug.cgi?id=8736 >>> >>>Here is another scenario I bumped onto - qdisc_watchdog_cancel() and >>>qdisc_restart() deadlock. >>> >>>[...] >>>DEADLOCK! > > > > Good catch. > > Please try reverting commit 1936502d00ae6c2aa3931c42f6cf54afaba094f2, > that should fix it. Ranko, did you get a chance to test this? I've attached the patch since it doesn't revert cleanly .. [-- Attachment #2: x --] [-- Type: text/plain, Size: 1473 bytes --] [NET_SCHED]: Revert "avoid transmit softirq on watchdog wakeup" optimization As noticed by Ranko Zivojnovic <ranko@spidernet.net>, calling qdisc_run from the timer handler can result in deadlock: > CPU#0 > > qdisc_watchdog() fires and gets dev->queue_lock > qdisc_run()...qdisc_restart()... > -> releases dev->queue_lock and enters dev_hard_start_xmit() > > CPU#1 > > tc del qdisc dev ... > qdisc_graft()...dev_graft_qdisc()...dev_deactivate()... > -> grabs dev->queue_lock ... > > qdisc_reset()...{cbq,hfsc,htb,netem,tbf}_reset()...qdisc_watchdog_cancel()... > -> hrtimer_cancel() - waiting for the qdisc_watchdog() to exit, while still > holding dev->queue_lock > > CPU#0 > > dev_hard_start_xmit() returns ... > -> wants to get dev->queue_lock(!) > > DEADLOCK! The entire optimization is a bit questionable IMO, it moves potentially large parts of NET_TX_SOFTIRQ work to TIMER_SOFTIRQ/HRTIMER_SOFTIRQ, which kind of defeats the separation of them. Signed-off-by: Patrick McHardy <kaber@trash.net> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index d92ea26..4fd0bec 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -278,11 +278,7 @@ static enum hrtimer_restart qdisc_watchdog(struct hrtimer *timer) wd->qdisc->flags &= ~TCQ_F_THROTTLED; smp_wmb(); - if (spin_trylock(&dev->queue_lock)) { - qdisc_run(dev); - spin_unlock(&dev->queue_lock); - } else - netif_schedule(dev); + netif_schedule(dev); return HRTIMER_NORESTART; } ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [Bugme-new] [Bug 8736] New: New TC deadlock scenario 2007-07-14 15:43 ` Patrick McHardy @ 2007-07-14 16:45 ` Ranko Zivojnovic 2007-07-15 3:49 ` David Miller 0 siblings, 1 reply; 7+ messages in thread From: Ranko Zivojnovic @ 2007-07-14 16:45 UTC (permalink / raw) To: Patrick McHardy; +Cc: Andrew Morton, netdev, bugme-daemon@kernel-bugs.osdl.org On Sat, 2007-07-14 at 17:43 +0200, Patrick McHardy wrote: > Patrick McHardy wrote: > > Andrew Morton wrote: > > > >>>http://bugzilla.kernel.org/show_bug.cgi?id=8736 > >>> > >>>Here is another scenario I bumped onto - qdisc_watchdog_cancel() and > >>>qdisc_restart() deadlock. > >>> > >>>[...] > >>>DEADLOCK! > > > > > > > > Good catch. > > > > Please try reverting commit 1936502d00ae6c2aa3931c42f6cf54afaba094f2, > > that should fix it. > > > Ranko, did you get a chance to test this? I've attached the patch > since it doesn't revert cleanly .. I have the commit reverted on my tree and have been testing it along with the other changes to qdisc code I'm currently doing and it does rectify the problem. Acked-by: Ranko Zivojnovic <ranko@spidernet.net> ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Bugme-new] [Bug 8736] New: New TC deadlock scenario 2007-07-14 16:45 ` Ranko Zivojnovic @ 2007-07-15 3:49 ` David Miller 2007-07-15 14:21 ` Patrick McHardy 0 siblings, 1 reply; 7+ messages in thread From: David Miller @ 2007-07-15 3:49 UTC (permalink / raw) To: ranko; +Cc: kaber, akpm, netdev, bugme-daemon From: Ranko Zivojnovic <ranko@spidernet.net> Date: Sat, 14 Jul 2007 19:45:35 +0300 > On Sat, 2007-07-14 at 17:43 +0200, Patrick McHardy wrote: > > Patrick McHardy wrote: > > > Andrew Morton wrote: > > > > > >>>http://bugzilla.kernel.org/show_bug.cgi?id=8736 > > >>> > > >>>Here is another scenario I bumped onto - qdisc_watchdog_cancel() and > > >>>qdisc_restart() deadlock. > > >>> > > >>>[...] > > >>>DEADLOCK! > > > > > > > > > > > > Good catch. > > > > > > Please try reverting commit 1936502d00ae6c2aa3931c42f6cf54afaba094f2, > > > that should fix it. > > > > > > Ranko, did you get a chance to test this? I've attached the patch > > since it doesn't revert cleanly .. > > I have the commit reverted on my tree and have been testing it along > with the other changes to qdisc code I'm currently doing and it does > rectify the problem. > > Acked-by: Ranko Zivojnovic <ranko@spidernet.net> Applied, thanks everyone. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Bugme-new] [Bug 8736] New: New TC deadlock scenario 2007-07-15 3:49 ` David Miller @ 2007-07-15 14:21 ` Patrick McHardy 2007-07-16 0:36 ` David Miller 0 siblings, 1 reply; 7+ messages in thread From: Patrick McHardy @ 2007-07-15 14:21 UTC (permalink / raw) To: David Miller; +Cc: ranko, akpm, netdev, bugme-daemon David Miller wrote: > From: Ranko Zivojnovic <ranko@spidernet.net> > Date: Sat, 14 Jul 2007 19:45:35 +0300 > > >>>>Please try reverting commit 1936502d00ae6c2aa3931c42f6cf54afaba094f2, >>>>that should fix it. >>> >>> >>>Ranko, did you get a chance to test this? I've attached the patch >>>since it doesn't revert cleanly .. >> >>I have the commit reverted on my tree and have been testing it along >>with the other changes to qdisc code I'm currently doing and it does >>rectify the problem. >> >>Acked-by: Ranko Zivojnovic <ranko@spidernet.net> > > > Applied, thanks everyone. This one is a regression in 2.6.22, so the patch should also go in -stable IMO. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Bugme-new] [Bug 8736] New: New TC deadlock scenario 2007-07-15 14:21 ` Patrick McHardy @ 2007-07-16 0:36 ` David Miller 0 siblings, 0 replies; 7+ messages in thread From: David Miller @ 2007-07-16 0:36 UTC (permalink / raw) To: kaber; +Cc: ranko, akpm, netdev, bugme-daemon From: Patrick McHardy <kaber@trash.net> Date: Sun, 15 Jul 2007 16:21:13 +0200 > This one is a regression in 2.6.22, so the patch should also go > in -stable IMO. I have it queued up for submission to -stable. ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2007-07-16 0:35 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <bug-8736-10286@http.bugzilla.kernel.org/>
2007-07-11 18:18 ` [Bugme-new] [Bug 8736] New: New TC deadlock scenario Andrew Morton
2007-07-11 18:31 ` Patrick McHardy
2007-07-14 15:43 ` Patrick McHardy
2007-07-14 16:45 ` Ranko Zivojnovic
2007-07-15 3:49 ` David Miller
2007-07-15 14:21 ` Patrick McHardy
2007-07-16 0:36 ` David Miller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).