* ath5k/mac80211: Reproducible deadlock with 64-stations.
@ 2010-11-10 23:02 Ben Greear
2010-11-11 0:57 ` Ben Greear
2010-11-11 1:02 ` Johannes Berg
0 siblings, 2 replies; 20+ messages in thread
From: Ben Greear @ 2010-11-10 23:02 UTC (permalink / raw)
To: linux-wireless@vger.kernel.org
We started up our strenuous wifi testing again, with this morning's wireless-testing
(plus some of our own non-wifi-related patches).
We have a reproducible kernel hang when attempting to quickly
create and configure 64 STA interfaces, no encryption, with
an ath5k NIC.
It appears to me that this may be a general mac80211 deadlock
from the 'sysrq d' output below.
We'll attempt to reproduce with a lesser-patched wireless-testing
in the meantime.
Nov 10 14:54:33 localhost kernel: SysRq : Show Locks Held
Nov 10 14:54:33 localhost kernel:
Nov 10 14:54:33 localhost kernel: Showing all locks held in the system:
Nov 10 14:54:33 localhost kernel: 3 locks held by kworker/0:0/4:
Nov 10 14:54:33 localhost kernel: #0: (events){+.+.+.}, at: [<c0446ded>] process_one_work+0x13e/0x2bf
Nov 10 14:54:33 localhost kernel: #1: ((linkwatch_work).work){+.+.+.}, at: [<c0446ded>] process_one_work+0x13e/0x2bf
Nov 10 14:54:33 localhost kernel: #2: (rtnl_mutex){+.+.+.}, at: [<c070220f>] rtnl_lock+0xf/0x11
Nov 10 14:54:33 localhost kernel: 1 lock held by ntpd/1675:
Nov 10 14:54:33 localhost kernel: #0: (rtnl_mutex){+.+.+.}, at: [<c070220f>] rtnl_lock+0xf/0x11
Nov 10 14:54:33 localhost kernel: 1 lock held by bash/2182:
Nov 10 14:54:33 localhost kernel: #0: (&tty->atomic_read_lock){+.+...}, at: [<c060f4a5>] n_tty_read+0x1d2/0x5ee
Nov 10 14:54:33 localhost kernel: 1 lock held by bash/2199:
Nov 10 14:54:33 localhost kernel: #0: (&tty->atomic_read_lock){+.+...}, at: [<c060f4a5>] n_tty_read+0x1d2/0x5ee
Nov 10 14:54:33 localhost kernel: 1 lock held by mingetty/2517:
Nov 10 14:54:33 localhost kernel: #0: (&tty->atomic_read_lock){+.+...}, at: [<c060f4a5>] n_tty_read+0x1d2/0x5ee
Nov 10 14:54:33 localhost kernel: 1 lock held by mingetty/2518:
Nov 10 14:54:33 localhost kernel: #0: (&tty->atomic_read_lock){+.+...}, at: [<c060f4a5>] n_tty_read+0x1d2/0x5ee
Nov 10 14:54:33 localhost kernel: 1 lock held by mingetty/2519:
Nov 10 14:54:33 localhost kernel: #0: (&tty->atomic_read_lock){+.+...}, at: [<c060f4a5>] n_tty_read+0x1d2/0x5ee
Nov 10 14:54:33 localhost kernel: 1 lock held by mingetty/2520:
Nov 10 14:54:33 localhost kernel: #0: (&tty->atomic_read_lock){+.+...}, at: [<c060f4a5>] n_tty_read+0x1d2/0x5ee
Nov 10 14:54:33 localhost kernel: 1 lock held by mingetty/2521:
Nov 10 14:54:33 localhost kernel: #0: (&tty->atomic_read_lock){+.+...}, at: [<c060f4a5>] n_tty_read+0x1d2/0x5ee
Nov 10 14:54:33 localhost kernel: 1 lock held by mingetty/2522:
Nov 10 14:54:33 localhost kernel: #0: (&tty->atomic_read_lock){+.+...}, at: [<c060f4a5>] n_tty_read+0x1d2/0x5ee
Nov 10 14:54:33 localhost kernel: 1 lock held by bash/2679:
Nov 10 14:54:33 localhost kernel: #0: (&tty->atomic_read_lock){+.+...}, at: [<c060f4a5>] n_tty_read+0x1d2/0x5ee
Nov 10 14:54:33 localhost kernel: 3 locks held by kworker/0:3/5698:
Nov 10 14:54:33 localhost kernel: #0: (events){+.+.+.}, at: [<c0446ded>] process_one_work+0x13e/0x2bf
Nov 10 14:54:33 localhost kernel: #1: (wireless_nlevent_work){+.+.+.}, at: [<c0446ded>] process_one_work+0x13e/0x2bf
Nov 10 14:54:33 localhost kernel: #2: (rtnl_mutex){+.+.+.}, at: [<c070220f>] rtnl_lock+0xf/0x11
Nov 10 14:54:33 localhost kernel: 1 lock held by ip/6438:
Nov 10 14:54:33 localhost kernel: #0: (rtnl_mutex){+.+.+.}, at: [<c070f78c>] netlink_dump+0x3a/0x16a
Nov 10 14:54:33 localhost kernel: 1 lock held by ip/6441:
Nov 10 14:54:33 localhost kernel: #0: (rtnl_mutex){+.+.+.}, at: [<c070220f>] rtnl_lock+0xf/0x11
Nov 10 14:54:33 localhost kernel: 1 lock held by ip/6442:
Nov 10 14:54:33 localhost kernel: #0: (rtnl_mutex){+.+.+.}, at: [<c070220f>] rtnl_lock+0xf/0x11
Nov 10 14:54:33 localhost kernel: 1 lock held by iwconfig/6443:
Nov 10 14:54:33 localhost kernel: #0: (rtnl_mutex){+.+.+.}, at: [<c070220f>] rtnl_lock+0xf/0x11
Nov 10 14:54:33 localhost kernel: 1 lock held by ip/6444:
Nov 10 14:54:33 localhost kernel: #0: (rtnl_mutex){+.+.+.}, at: [<c070220f>] rtnl_lock+0xf/0x11
Nov 10 14:54:33 localhost kernel: 4 locks held by sh/6449:
Nov 10 14:54:33 localhost kernel: #0: (&(&i->lock)->rlock){-.-...}, at: [<c0631760>] serial8250_interrupt+0x1e/0xe2
Nov 10 14:54:33 localhost kernel: #1: (&port_lock_key){-.-...}, at: [<c06314e6>] serial8250_handle_port+0x10/0x26c
Nov 10 14:54:33 localhost kernel: #2: (sysrq_key_table_lock){-.....}, at: [<c061e912>] __handle_sysrq+0x18/0x11d
Nov 10 14:54:33 localhost kernel: #3: (tasklist_lock){.+.+..}, at: [<c0459d53>] debug_show_all_locks+0x43/0x179
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
^ permalink raw reply [flat|nested] 20+ messages in thread* Re: ath5k/mac80211: Reproducible deadlock with 64-stations. 2010-11-10 23:02 ath5k/mac80211: Reproducible deadlock with 64-stations Ben Greear @ 2010-11-11 0:57 ` Ben Greear 2010-11-11 1:03 ` Johannes Berg 2010-11-11 1:02 ` Johannes Berg 1 sibling, 1 reply; 20+ messages in thread From: Ben Greear @ 2010-11-11 0:57 UTC (permalink / raw) To: linux-wireless@vger.kernel.org [-- Attachment #1: Type: text/plain, Size: 4894 bytes --] On 11/10/2010 03:02 PM, Ben Greear wrote: > We started up our strenuous wifi testing again, with this morning's > wireless-testing > (plus some of our own non-wifi-related patches). > > We have a reproducible kernel hang when attempting to quickly > create and configure 64 STA interfaces, no encryption, with > an ath5k NIC. > > It appears to me that this may be a general mac80211 deadlock > from the 'sysrq d' output below. > > We'll attempt to reproduce with a lesser-patched wireless-testing > in the meantime. We could reproduce it on the less hacked kernel. I think the attached trace might be more useful. It shows the blocked tasks spammage that we saw in an earlier run. Thanks, Ben > > > Nov 10 14:54:33 localhost kernel: SysRq : Show Locks Held > Nov 10 14:54:33 localhost kernel: > Nov 10 14:54:33 localhost kernel: Showing all locks held in the system: > Nov 10 14:54:33 localhost kernel: 3 locks held by kworker/0:0/4: > Nov 10 14:54:33 localhost kernel: #0: (events){+.+.+.}, at: [<c0446ded>] > process_one_work+0x13e/0x2bf > Nov 10 14:54:33 localhost kernel: #1: ((linkwatch_work).work){+.+.+.}, > at: [<c0446ded>] process_one_work+0x13e/0x2bf > Nov 10 14:54:33 localhost kernel: #2: (rtnl_mutex){+.+.+.}, at: > [<c070220f>] rtnl_lock+0xf/0x11 > Nov 10 14:54:33 localhost kernel: 1 lock held by ntpd/1675: > Nov 10 14:54:33 localhost kernel: #0: (rtnl_mutex){+.+.+.}, at: > [<c070220f>] rtnl_lock+0xf/0x11 > Nov 10 14:54:33 localhost kernel: 1 lock held by bash/2182: > Nov 10 14:54:33 localhost kernel: #0: (&tty->atomic_read_lock){+.+...}, > at: [<c060f4a5>] n_tty_read+0x1d2/0x5ee > Nov 10 14:54:33 localhost kernel: 1 lock held by bash/2199: > Nov 10 14:54:33 localhost kernel: #0: (&tty->atomic_read_lock){+.+...}, > at: [<c060f4a5>] n_tty_read+0x1d2/0x5ee > Nov 10 14:54:33 localhost kernel: 1 lock held by mingetty/2517: > Nov 10 14:54:33 localhost kernel: #0: (&tty->atomic_read_lock){+.+...}, > at: [<c060f4a5>] n_tty_read+0x1d2/0x5ee > Nov 10 14:54:33 localhost kernel: 1 lock held by mingetty/2518: > Nov 10 14:54:33 localhost kernel: #0: (&tty->atomic_read_lock){+.+...}, > at: [<c060f4a5>] n_tty_read+0x1d2/0x5ee > Nov 10 14:54:33 localhost kernel: 1 lock held by mingetty/2519: > Nov 10 14:54:33 localhost kernel: #0: (&tty->atomic_read_lock){+.+...}, > at: [<c060f4a5>] n_tty_read+0x1d2/0x5ee > Nov 10 14:54:33 localhost kernel: 1 lock held by mingetty/2520: > Nov 10 14:54:33 localhost kernel: #0: (&tty->atomic_read_lock){+.+...}, > at: [<c060f4a5>] n_tty_read+0x1d2/0x5ee > Nov 10 14:54:33 localhost kernel: 1 lock held by mingetty/2521: > Nov 10 14:54:33 localhost kernel: #0: (&tty->atomic_read_lock){+.+...}, > at: [<c060f4a5>] n_tty_read+0x1d2/0x5ee > Nov 10 14:54:33 localhost kernel: 1 lock held by mingetty/2522: > Nov 10 14:54:33 localhost kernel: #0: (&tty->atomic_read_lock){+.+...}, > at: [<c060f4a5>] n_tty_read+0x1d2/0x5ee > Nov 10 14:54:33 localhost kernel: 1 lock held by bash/2679: > Nov 10 14:54:33 localhost kernel: #0: (&tty->atomic_read_lock){+.+...}, > at: [<c060f4a5>] n_tty_read+0x1d2/0x5ee > Nov 10 
14:54:33 localhost kernel: 3 locks held by kworker/0:3/5698: > Nov 10 14:54:33 localhost kernel: #0: (events){+.+.+.}, at: [<c0446ded>] > process_one_work+0x13e/0x2bf > Nov 10 14:54:33 localhost kernel: #1: (wireless_nlevent_work){+.+.+.}, > at: [<c0446ded>] process_one_work+0x13e/0x2bf > Nov 10 14:54:33 localhost kernel: #2: (rtnl_mutex){+.+.+.}, at: > [<c070220f>] rtnl_lock+0xf/0x11 > Nov 10 14:54:33 localhost kernel: 1 lock held by ip/6438: > Nov 10 14:54:33 localhost kernel: #0: (rtnl_mutex){+.+.+.}, at: > [<c070f78c>] netlink_dump+0x3a/0x16a > Nov 10 14:54:33 localhost kernel: 1 lock held by ip/6441: > Nov 10 14:54:33 localhost kernel: #0: (rtnl_mutex){+.+.+.}, at: > [<c070220f>] rtnl_lock+0xf/0x11 > Nov 10 14:54:33 localhost kernel: 1 lock held by ip/6442: > Nov 10 14:54:33 localhost kernel: #0: (rtnl_mutex){+.+.+.}, at: > [<c070220f>] rtnl_lock+0xf/0x11 > Nov 10 14:54:33 localhost kernel: 1 lock held by iwconfig/6443: > Nov 10 14:54:33 localhost kernel: #0: (rtnl_mutex){+.+.+.}, at: > [<c070220f>] rtnl_lock+0xf/0x11 > Nov 10 14:54:33 localhost kernel: 1 lock held by ip/6444: > Nov 10 14:54:33 localhost kernel: #0: (rtnl_mutex){+.+.+.}, at: > [<c070220f>] rtnl_lock+0xf/0x11 > Nov 10 14:54:33 localhost kernel: 4 locks held by sh/6449: > Nov 10 14:54:33 localhost kernel: #0: (&(&i->lock)->rlock){-.-...}, at: > [<c0631760>] serial8250_interrupt+0x1e/0xe2 > Nov 10 14:54:33 localhost kernel: #1: (&port_lock_key){-.-...}, at: > [<c06314e6>] serial8250_handle_port+0x10/0x26c > Nov 10 14:54:33 localhost kernel: #2: (sysrq_key_table_lock){-.....}, > at: [<c061e912>] __handle_sysrq+0x18/0x11d > Nov 10 14:54:33 localhost kernel: #3: (tasklist_lock){.+.+..}, at: > [<c0459d53>] debug_show_all_locks+0x43/0x179 > -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com [-- Attachment #2: ath5k_crash.txt --] [-- Type: text/plain, Size: 27532 bytes --] Nov 10 13:26:23 localhost kernel: INFO: task kworker/0:1:19 blocked for more than 120 seconds. Nov 10 13:26:23 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Nov 10 13:26:23 localhost kernel: kworker/0:1 D c045a09d 0 19 2 0x00000000 Nov 10 13:26:23 localhost kernel: f512bebc 00000046 f512be8c c045a09d dc3302fb 00000043 c0a22380 f50b3ea0 Nov 10 13:26:23 localhost kernel: c0a22380 f50b411c f50b4118 f50b4118 c0a22380 c0a22380 00240083 00000000 Nov 10 13:26:23 localhost kernel: f146a000 00000043 f50b3ea0 c09a0db4 c09a0db4 f50b3ea0 c09a0db8 f512bf08 Nov 10 13:26:23 localhost kernel: Call Trace: Nov 10 13:26:23 localhost kernel: [<c045a09d>] ? mark_lock+0x1e/0x1eb Nov 10 13:26:23 localhost kernel: [<c078fba3>] __mutex_lock_common+0x1de/0x2eb Nov 10 13:26:23 localhost kernel: [<c070220f>] ? rtnl_lock+0xf/0x11 Nov 10 13:26:23 localhost kernel: [<c078fd5c>] mutex_lock_nested+0x36/0x3b Nov 10 13:26:23 localhost kernel: [<c070220f>] ? rtnl_lock+0xf/0x11 Nov 10 13:26:23 localhost kernel: [<c070220f>] rtnl_lock+0xf/0x11 Nov 10 13:26:23 localhost kernel: [<c0703840>] linkwatch_event+0x8/0x22 Nov 10 13:26:23 localhost kernel: [<c0446e5e>] process_one_work+0x1af/0x2bf Nov 10 13:26:23 localhost kernel: [<c0446ded>] ? process_one_work+0x13e/0x2bf Nov 10 13:26:23 localhost kernel: [<c0703838>] ? linkwatch_event+0x0/0x22 Nov 10 13:26:23 localhost kernel: [<c04485ac>] worker_thread+0xf9/0x1bf Nov 10 13:26:23 localhost kernel: [<c04484b3>] ? worker_thread+0x0/0x1bf Nov 10 13:26:23 localhost kernel: [<c044b0da>] kthread+0x62/0x67 Nov 10 13:26:23 localhost kernel: [<c044b078>] ? 
kthread+0x0/0x67 Nov 10 13:26:23 localhost kernel: [<c04036c6>] kernel_thread_helper+0x6/0x1a Nov 10 13:26:23 localhost kernel: 3 locks held by kworker/0:1/19: Nov 10 13:26:23 localhost kernel: #0: (events){+.+.+.}, at: [<c0446ded>] process_one_work+0x13e/0x2bf Nov 10 13:26:23 localhost kernel: #1: ((linkwatch_work).work){+.+.+.}, at: [<c0446ded>] process_one_work+0x13e/0x2bf Nov 10 13:26:23 localhost kernel: #2: (rtnl_mutex){+.+.+.}, at: [<c070220f>] rtnl_lock+0xf/0x11 Nov 10 13:26:23 localhost kernel: INFO: task kworker/0:2:1095 blocked for more than 120 seconds. Nov 10 13:26:23 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Nov 10 13:26:23 localhost kernel: kworker/0:2 D c045a09d 0 1095 2 0x00000000 Nov 10 13:26:23 localhost kernel: f3b0fe80 00000046 00000001 c045a09d c0d25960 00000043 c0a22380 f476e860 Nov 10 13:26:23 localhost kernel: c0a22380 f476eadc f476ead8 f476ead8 c0a22380 c0a22380 00000813 00000000 Nov 10 13:26:23 localhost kernel: 00000000 00000043 f476e860 c09a0db4 c09a0db4 f476e860 c09a0db8 f3b0fecc Nov 10 13:26:23 localhost kernel: Call Trace: Nov 10 13:26:23 localhost kernel: [<c045a09d>] ? mark_lock+0x1e/0x1eb Nov 10 13:26:23 localhost kernel: [<c078fba3>] __mutex_lock_common+0x1de/0x2eb Nov 10 13:26:23 localhost kernel: [<c070220f>] ? rtnl_lock+0xf/0x11 Nov 10 13:26:23 localhost kernel: [<c078fd5c>] mutex_lock_nested+0x36/0x3b Nov 10 13:26:23 localhost kernel: [<c070220f>] ? rtnl_lock+0xf/0x11 Nov 10 13:26:23 localhost kernel: [<c070220f>] rtnl_lock+0xf/0x11 Nov 10 13:26:23 localhost kernel: [<f8494079>] cfg80211_conn_work+0x16/0x114 [cfg80211] Nov 10 13:26:23 localhost kernel: [<c0446ded>] ? process_one_work+0x13e/0x2bf Nov 10 13:26:23 localhost kernel: [<c0446e5e>] process_one_work+0x1af/0x2bf Nov 10 13:26:23 localhost kernel: [<c0446ded>] ? process_one_work+0x13e/0x2bf Nov 10 13:26:23 localhost kernel: [<f8494063>] ? cfg80211_conn_work+0x0/0x114 [cfg80211] Nov 10 13:26:23 localhost kernel: [<c04485bd>] worker_thread+0x10a/0x1bf Nov 10 13:26:23 localhost kernel: [<c04484b3>] ? worker_thread+0x0/0x1bf Nov 10 13:26:23 localhost kernel: [<c044b0da>] kthread+0x62/0x67 Nov 10 13:26:23 localhost kernel: [<c044b078>] ? kthread+0x0/0x67 Nov 10 13:26:23 localhost kernel: [<c04036c6>] kernel_thread_helper+0x6/0x1a Nov 10 13:26:23 localhost kernel: 3 locks held by kworker/0:2/1095: Nov 10 13:26:23 localhost kernel: #0: (events){+.+.+.}, at: [<c0446ded>] process_one_work+0x13e/0x2bf Nov 10 13:26:23 localhost kernel: #1: ((&rdev->conn_work)){+.+.+.}, at: [<c0446ded>] process_one_work+0x13e/0x2bf Nov 10 13:26:23 localhost kernel: #2: (rtnl_mutex){+.+.+.}, at: [<c070220f>] rtnl_lock+0xf/0x11 Nov 10 13:26:23 localhost kernel: INFO: task ntpd:1553 blocked for more than 120 seconds. Nov 10 13:26:23 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Nov 10 13:26:23 localhost kernel: ntpd D c045a09d 0 1553 1 0x00000080 Nov 10 13:26:23 localhost kernel: f31a5e04 00200046 f31a5dd4 c045a09d 44b01aaf 00000044 c0a22380 f317c910 Nov 10 13:26:23 localhost kernel: c0a22380 f317cb8c f317cb88 f317cb88 c0a22380 c0a22380 00303df7 00000000 Nov 10 13:26:23 localhost kernel: f2013180 00000044 f317c910 c09a0db4 c09a0db4 f317c910 c09a0db8 f31a5e50 Nov 10 13:26:23 localhost kernel: Call Trace: Nov 10 13:26:23 localhost kernel: [<c045a09d>] ? mark_lock+0x1e/0x1eb Nov 10 13:26:23 localhost kernel: [<c078fba3>] __mutex_lock_common+0x1de/0x2eb Nov 10 13:26:23 localhost kernel: [<c070220f>] ? 
rtnl_lock+0xf/0x11 Nov 10 13:26:23 localhost kernel: [<c078fd5c>] mutex_lock_nested+0x36/0x3b Nov 10 13:26:23 localhost kernel: [<c070220f>] ? rtnl_lock+0xf/0x11 Nov 10 13:26:23 localhost kernel: [<c070220f>] rtnl_lock+0xf/0x11 Nov 10 13:26:23 localhost kernel: [<c06f9446>] dev_ioctl+0x21/0x53f Nov 10 13:26:23 localhost kernel: [<c045a539>] ? trace_hardirqs_on+0xb/0xd Nov 10 13:26:23 localhost kernel: [<c05a4986>] ? __raw_spin_lock_init+0x2d/0x4f Nov 10 13:26:23 localhost kernel: [<c073f765>] ? udp_ioctl+0x0/0x5f Nov 10 13:26:23 localhost kernel: [<c06e80d5>] sock_ioctl+0x1f6/0x202 Nov 10 13:26:23 localhost kernel: [<c04becf3>] ? fd_install+0x27/0xa9 Nov 10 13:26:23 localhost kernel: [<c06e7edf>] ? sock_ioctl+0x0/0x202 Nov 10 13:26:23 localhost kernel: [<c04cc0c1>] do_vfs_ioctl+0x56d/0x5c3 Nov 10 13:26:23 localhost kernel: [<c0790d14>] ? _raw_spin_unlock+0x1d/0x20 Nov 10 13:26:23 localhost kernel: [<c04bed6e>] ? fd_install+0xa2/0xa9 Nov 10 13:26:23 localhost kernel: [<c04c1365>] ? fcheck_files+0x9b/0xca Nov 10 13:26:23 localhost kernel: [<c04c13c1>] ? fget_light+0x2d/0xb0 Nov 10 13:26:23 localhost kernel: [<c04cc15a>] sys_ioctl+0x43/0x62 Nov 10 13:26:23 localhost kernel: [<c04030dc>] sysenter_do_call+0x12/0x38 Nov 10 13:26:23 localhost kernel: 1 lock held by ntpd/1553: Nov 10 13:26:23 localhost kernel: #0: (rtnl_mutex){+.+.+.}, at: [<c070220f>] rtnl_lock+0xf/0x11 Nov 10 13:26:23 localhost kernel: INFO: task btserver:6012 blocked for more than 120 seconds. Nov 10 13:26:23 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Nov 10 13:26:23 localhost kernel: btserver D c045a09d 0 6012 5903 0x00000080 Nov 10 13:26:23 localhost kernel: f1653ce4 00000046 f1653cb4 c045a09d 680a5a2f 00000044 c0a22380 f2069f50 Nov 10 13:26:23 localhost kernel: c0a22380 f206a1cc f206a1c8 f206a1c8 c0a22380 c0a22380 0022bd20 00000000 Nov 10 13:26:23 localhost kernel: f2013180 00000044 f2069f50 c09a0db4 c09a0db4 f2069f50 c09a0db8 f1653d30 Nov 10 13:26:23 localhost kernel: Call Trace: Nov 10 13:26:23 localhost kernel: [<c045a09d>] ? mark_lock+0x1e/0x1eb Nov 10 13:26:23 localhost kernel: [<c078fba3>] __mutex_lock_common+0x1de/0x2eb Nov 10 13:26:23 localhost kernel: [<c070220f>] ? rtnl_lock+0xf/0x11 Nov 10 13:26:23 localhost kernel: [<c078fd5c>] mutex_lock_nested+0x36/0x3b Nov 10 13:26:23 localhost kernel: [<c070220f>] ? rtnl_lock+0xf/0x11 Nov 10 13:26:23 localhost kernel: [<c070220f>] rtnl_lock+0xf/0x11 Nov 10 13:26:23 localhost kernel: [<c070221f>] rtnetlink_rcv+0xe/0x22 Nov 10 13:26:23 localhost kernel: [<c0710527>] netlink_unicast+0xbe/0x11a Nov 10 13:26:23 localhost kernel: [<c06f07fc>] ? copy_from_user+0x8/0xa Nov 10 13:26:23 localhost kernel: [<c07110f8>] netlink_sendmsg+0x23e/0x255 Nov 10 13:26:23 localhost kernel: [<c06eace8>] ? rcu_read_lock+0x0/0x35 Nov 10 13:26:23 localhost kernel: [<c06e81a4>] __sock_sendmsg+0x54/0x5b Nov 10 13:26:23 localhost kernel: [<c06e866e>] sock_sendmsg+0x95/0xac Nov 10 13:26:23 localhost kernel: [<c04a013a>] ? might_fault+0x4a/0x96 Nov 10 13:26:23 localhost kernel: [<c04a0180>] ? might_fault+0x90/0x96 Nov 10 13:26:23 localhost kernel: [<c04a013a>] ? might_fault+0x4a/0x96 Nov 10 13:26:23 localhost kernel: [<c0597708>] ? _copy_from_user+0x31/0x115 Nov 10 13:26:23 localhost kernel: [<c06e8429>] ? move_addr_to_kernel+0x3e/0x43 Nov 10 13:26:23 localhost kernel: [<c06e8c8c>] sys_sendto+0xaa/0xc6 Nov 10 13:26:23 localhost kernel: [<c045bcaa>] ? lock_release_non_nested+0x86/0x1ec Nov 10 13:26:23 localhost kernel: [<c04a013a>] ? 
might_fault+0x4a/0x96 Nov 10 13:26:23 localhost kernel: [<c04a013a>] ? might_fault+0x4a/0x96 Nov 10 13:26:23 localhost kernel: [<c06e9d58>] sys_socketcall+0x169/0x289 Nov 10 13:26:23 localhost kernel: [<c04030dc>] sysenter_do_call+0x12/0x38 Nov 10 13:26:23 localhost kernel: 1 lock held by btserver/6012: Nov 10 13:26:23 localhost kernel: #0: (rtnl_mutex){+.+.+.}, at: [<c070220f>] rtnl_lock+0xf/0x11 Nov 10 13:26:23 localhost kernel: INFO: task kworker/0:3:7184 blocked for more than 120 seconds. Nov 10 13:26:23 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Nov 10 13:26:23 localhost kernel: kworker/0:3 D c045a09d 0 7184 2 0x00000080 Nov 10 13:26:23 localhost kernel: f1021ea8 00000046 00000001 c045a09d c0d2795a 00000043 c0a22380 f1008a70 Nov 10 13:26:23 localhost kernel: c0a22380 f1008cec f1008ce8 f1008ce8 c0a22380 c0a22380 0000058b 00000000 Nov 10 13:26:23 localhost kernel: 00000000 00000043 f1008a70 c09a0db4 c09a0db4 f1008a70 c09a0db8 f1021ef4 Nov 10 13:26:23 localhost kernel: Call Trace: Nov 10 13:26:23 localhost kernel: [<c045a09d>] ? mark_lock+0x1e/0x1eb Nov 10 13:26:23 localhost kernel: [<c078fba3>] __mutex_lock_common+0x1de/0x2eb Nov 10 13:26:23 localhost kernel: [<c070220f>] ? rtnl_lock+0xf/0x11 Nov 10 13:26:23 localhost kernel: [<c078fd5c>] mutex_lock_nested+0x36/0x3b Nov 10 13:26:23 localhost kernel: [<c070220f>] ? rtnl_lock+0xf/0x11 Nov 10 13:26:23 localhost kernel: [<c070220f>] rtnl_lock+0xf/0x11 Nov 10 13:26:23 localhost kernel: [<c0770025>] wireless_nlevent_process+0xd/0x69 Nov 10 13:26:23 localhost kernel: [<c0446e5e>] process_one_work+0x1af/0x2bf Nov 10 13:26:23 localhost kernel: [<c0446ded>] ? process_one_work+0x13e/0x2bf Nov 10 13:26:23 localhost kernel: [<c0770018>] ? wireless_nlevent_process+0x0/0x69 Nov 10 13:26:23 localhost kernel: [<c04485ac>] worker_thread+0xf9/0x1bf Nov 10 13:26:23 localhost kernel: [<c04484b3>] ? worker_thread+0x0/0x1bf Nov 10 13:26:23 localhost kernel: [<c044b0da>] kthread+0x62/0x67 Nov 10 13:26:23 localhost kernel: [<c044b078>] ? kthread+0x0/0x67 Nov 10 13:26:23 localhost kernel: [<c04036c6>] kernel_thread_helper+0x6/0x1a Nov 10 13:26:23 localhost kernel: 3 locks held by kworker/0:3/7184: Nov 10 13:26:23 localhost kernel: #0: (events){+.+.+.}, at: [<c0446ded>] process_one_work+0x13e/0x2bf Nov 10 13:26:23 localhost kernel: #1: (wireless_nlevent_work){+.+.+.}, at: [<c0446ded>] process_one_work+0x13e/0x2bf Nov 10 13:26:23 localhost kernel: #2: (rtnl_mutex){+.+.+.}, at: [<c070220f>] rtnl_lock+0xf/0x11 Nov 10 13:26:23 localhost kernel: INFO: task ip:7601 blocked for more than 120 seconds. Nov 10 13:26:23 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Nov 10 13:26:23 localhost kernel: ip D c045a09d 0 7601 7563 0x00000080 Nov 10 13:26:23 localhost kernel: ece15c54 00000046 ece15c24 c045a09d bda03dd5 00000043 c0a22380 f1009f50 Nov 10 13:26:23 localhost kernel: c0a22380 f100a1cc f100a1c8 f100a1c8 c0a22380 c0a22380 00024c02 00000000 Nov 10 13:26:23 localhost kernel: f1716c80 00000043 f1009f50 c09a0db4 c09a0db4 f1009f50 c09a0db8 ece15ca0 Nov 10 13:26:23 localhost kernel: Call Trace: Nov 10 13:26:23 localhost kernel: [<c045a09d>] ? mark_lock+0x1e/0x1eb Nov 10 13:26:23 localhost kernel: [<c078fba3>] __mutex_lock_common+0x1de/0x2eb Nov 10 13:26:23 localhost kernel: [<c070f78c>] ? netlink_dump+0x3a/0x16a Nov 10 13:26:23 localhost kernel: [<c078fd5c>] mutex_lock_nested+0x36/0x3b Nov 10 13:26:23 localhost kernel: [<c070f78c>] ? 
netlink_dump+0x3a/0x16a Nov 10 13:26:23 localhost kernel: [<c070f78c>] netlink_dump+0x3a/0x16a Nov 10 13:26:23 localhost kernel: [<c06eee50>] ? consume_skb+0x2a/0x2c Nov 10 13:26:23 localhost kernel: [<c070fbfd>] netlink_recvmsg+0x16d/0x27d Nov 10 13:26:23 localhost kernel: [<c06eaeaf>] ? rcu_read_unlock+0x17/0x19 Nov 10 13:26:23 localhost kernel: [<c06e7b97>] __sock_recvmsg_nosec+0x4a/0x52 Nov 10 13:26:23 localhost kernel: [<c06e81e8>] __sock_recvmsg+0x3d/0x44 Nov 10 13:26:23 localhost kernel: [<c06e8580>] sock_recvmsg+0x9d/0xb5 Nov 10 13:26:23 localhost kernel: [<c04a013a>] ? might_fault+0x4a/0x96 Nov 10 13:26:23 localhost kernel: [<c04a013a>] ? might_fault+0x4a/0x96 Nov 10 13:26:23 localhost kernel: [<c04a0180>] ? might_fault+0x90/0x96 Nov 10 13:26:23 localhost kernel: [<c06f07fc>] ? copy_from_user+0x8/0xa Nov 10 13:26:23 localhost kernel: [<c06f0b47>] ? verify_iovec+0x3e/0x77 Nov 10 13:26:23 localhost kernel: [<c06e835c>] __sys_recvmsg+0xfb/0x18a Nov 10 13:26:23 localhost kernel: [<c06e84e3>] ? sock_recvmsg+0x0/0xb5 Nov 10 13:26:23 localhost kernel: [<c04a247d>] ? __do_fault+0x32d/0x359 Nov 10 13:26:23 localhost kernel: [<c045bcaa>] ? lock_release_non_nested+0x86/0x1ec Nov 10 13:26:23 localhost kernel: [<c04c1365>] ? fcheck_files+0x9b/0xca Nov 10 13:26:23 localhost kernel: [<c04c13c1>] ? fget_light+0x2d/0xb0 Nov 10 13:26:23 localhost kernel: [<c04a013a>] ? might_fault+0x4a/0x96 Nov 10 13:26:23 localhost kernel: [<c06e8933>] sys_recvmsg+0x36/0x4d Nov 10 13:26:23 localhost kernel: [<c06e9e2b>] sys_socketcall+0x23c/0x289 Nov 10 13:26:23 localhost kernel: [<c0790fae>] ? restore_all_notrace+0x0/0x18 Nov 10 13:26:23 localhost kernel: [<c04030dc>] sysenter_do_call+0x12/0x38 Nov 10 13:26:23 localhost kernel: 1 lock held by ip/7601: Nov 10 13:26:23 localhost kernel: #0: (rtnl_mutex){+.+.+.}, at: [<c070f78c>] netlink_dump+0x3a/0x16a Nov 10 13:26:23 localhost kernel: INFO: task ip:7602 blocked for more than 120 seconds. Nov 10 13:26:23 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Nov 10 13:26:23 localhost kernel: ip D c045a09d 0 7602 7533 0x00000080 Nov 10 13:26:23 localhost kernel: f17efcc4 00000046 f17efc94 c045a09d 533f2e05 00000045 c0a22380 f100c910 Nov 10 13:26:23 localhost kernel: c0a22380 f100cb8c f100cb88 f100cb88 c0a22380 c0a22380 0007379a 00000000 Nov 10 13:26:23 localhost kernel: f146a000 00000045 f100c910 c09a0db4 c09a0db4 f100c910 c09a0db8 f17efd10 Nov 10 13:26:23 localhost kernel: Call Trace: Nov 10 13:26:23 localhost kernel: [<c045a09d>] ? mark_lock+0x1e/0x1eb Nov 10 13:26:23 localhost kernel: [<c078fba3>] __mutex_lock_common+0x1de/0x2eb Nov 10 13:26:23 localhost kernel: [<c070220f>] ? rtnl_lock+0xf/0x11 Nov 10 13:26:23 localhost kernel: [<c078fd5c>] mutex_lock_nested+0x36/0x3b Nov 10 13:26:23 localhost kernel: [<c070220f>] ? rtnl_lock+0xf/0x11 Nov 10 13:26:23 localhost kernel: [<c070220f>] rtnl_lock+0xf/0x11 Nov 10 13:26:23 localhost kernel: [<c070221f>] rtnetlink_rcv+0xe/0x22 Nov 10 13:26:24 localhost kernel: [<c0710527>] netlink_unicast+0xbe/0x11a Nov 10 13:26:24 localhost kernel: [<c06f07fc>] ? copy_from_user+0x8/0xa Nov 10 13:26:24 localhost kernel: [<c07110f8>] netlink_sendmsg+0x23e/0x255 Nov 10 13:26:24 localhost kernel: [<c06eace8>] ? rcu_read_lock+0x0/0x35 Nov 10 13:26:24 localhost kernel: [<c06e81a4>] __sock_sendmsg+0x54/0x5b Nov 10 13:26:24 localhost kernel: [<c06e866e>] sock_sendmsg+0x95/0xac Nov 10 13:26:24 localhost kernel: [<c04c1365>] ? fcheck_files+0x9b/0xca Nov 10 13:26:24 localhost kernel: [<c04c13c1>] ? 
fget_light+0x2d/0xb0 Nov 10 13:26:24 localhost kernel: [<c045a09d>] ? mark_lock+0x1e/0x1eb Nov 10 13:26:24 localhost kernel: [<c06e86e0>] ? sockfd_lookup_light+0x16/0x46 Nov 10 13:26:24 localhost kernel: [<c06e8c8c>] sys_sendto+0xaa/0xc6 Nov 10 13:26:24 localhost kernel: [<c04a247d>] ? __do_fault+0x32d/0x359 Nov 10 13:26:24 localhost kernel: [<c045bcaa>] ? lock_release_non_nested+0x86/0x1ec Nov 10 13:26:24 localhost kernel: [<c04a2d74>] ? handle_mm_fault+0x322/0x825 Nov 10 13:26:24 localhost kernel: [<c04a013a>] ? might_fault+0x4a/0x96 Nov 10 13:26:24 localhost kernel: [<c04a013a>] ? might_fault+0x4a/0x96 Nov 10 13:26:24 localhost kernel: [<c06e8cde>] sys_send+0x36/0x38 Nov 10 13:26:24 localhost kernel: [<c06e9d2b>] sys_socketcall+0x13c/0x289 Nov 10 13:26:24 localhost kernel: [<c0790fae>] ? restore_all_notrace+0x0/0x18 Nov 10 13:26:24 localhost kernel: [<c04030dc>] sysenter_do_call+0x12/0x38 Nov 10 13:26:24 localhost kernel: 1 lock held by ip/7602: Nov 10 13:26:24 localhost kernel: #0: (rtnl_mutex){+.+.+.}, at: [<c070220f>] rtnl_lock+0xf/0x11 Nov 10 13:26:24 localhost kernel: INFO: task ip:7603 blocked for more than 120 seconds. Nov 10 13:26:24 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Nov 10 13:26:24 localhost kernel: ip D 00428314 0 7603 7548 0x00000080 Nov 10 13:26:24 localhost kernel: f106b9d0 00000046 f1009403 00428314 533f1a80 00000045 c0a22380 f10094e0 Nov 10 13:26:24 localhost kernel: c0a22380 f100975c f1009758 f1009758 c0a22380 c0a22380 0014a3a9 00000000 Nov 10 13:26:24 localhost kernel: f23a4780 00000045 f10094e0 f106ba94 f106ba80 7fffffff f10094e0 f106ba14 Nov 10 13:26:24 localhost kernel: Call Trace: Nov 10 13:26:24 localhost kernel: [<c078f4a3>] schedule_timeout+0x16/0x9f Nov 10 13:26:24 localhost kernel: [<c045a2b1>] ? mark_held_locks+0x47/0x5f Nov 10 13:26:24 localhost kernel: [<c0790ca6>] ? _raw_spin_unlock_irq+0x22/0x2b Nov 10 13:26:24 localhost kernel: [<c045a50d>] ? trace_hardirqs_on_caller+0x104/0x125 Nov 10 13:26:24 localhost kernel: [<c045a539>] ? trace_hardirqs_on+0xb/0xd Nov 10 13:26:24 localhost kernel: [<c078f378>] wait_for_common+0xbb/0x101 Nov 10 13:26:24 localhost kernel: [<c0432063>] ? default_wake_function+0x0/0xd Nov 10 13:26:24 localhost kernel: [<c078f441>] wait_for_completion+0x12/0x14 Nov 10 13:26:24 localhost kernel: [<c0447b6b>] flush_work+0x23/0x27 Nov 10 13:26:24 localhost kernel: [<c0446a5e>] ? wq_barrier_func+0x0/0xd Nov 10 13:26:24 localhost kernel: [<f8c9e1e1>] ieee80211_do_stop+0x252/0x3b7 [mac80211] Nov 10 13:26:24 localhost kernel: [<c043c184>] ? local_bh_enable_ip+0x8/0xa Nov 10 13:26:24 localhost kernel: [<c0790c81>] ? _raw_spin_unlock_bh+0x25/0x28 Nov 10 13:26:24 localhost kernel: [<c0709401>] ? dev_deactivate+0x18/0xd1 Nov 10 13:26:24 localhost kernel: [<f8c9e358>] ieee80211_stop+0x12/0x16 [mac80211] Nov 10 13:26:24 localhost kernel: [<c06f8131>] __dev_close+0x73/0x88 Nov 10 13:26:24 localhost kernel: [<c06f5c1e>] ? dev_set_rx_mode+0x22/0x26 Nov 10 13:26:24 localhost kernel: [<c06f5dc0>] __dev_change_flags+0xa5/0x11a Nov 10 13:26:24 localhost kernel: [<c06f7fdc>] dev_change_flags+0x13/0x3f Nov 10 13:26:24 localhost kernel: [<c0702855>] do_setlink+0x23a/0x525 Nov 10 13:26:24 localhost kernel: [<c045a09d>] ? mark_lock+0x1e/0x1eb Nov 10 13:26:24 localhost kernel: [<c0702e83>] rtnl_newlink+0x283/0x45c Nov 10 13:26:24 localhost kernel: [<c0702c7e>] ? rtnl_newlink+0x7e/0x45c Nov 10 13:26:24 localhost kernel: [<c045a09d>] ? mark_lock+0x1e/0x1eb Nov 10 13:26:24 localhost kernel: [<c04b5300>] ? 
slab_pad_check+0xb/0xf6 Nov 10 13:26:24 localhost kernel: [<c045a2b1>] ? mark_held_locks+0x47/0x5f Nov 10 13:26:24 localhost kernel: [<c078fc95>] ? __mutex_lock_common+0x2d0/0x2eb Nov 10 13:26:24 localhost kernel: [<c045a50d>] ? trace_hardirqs_on_caller+0x104/0x125 Nov 10 13:26:24 localhost kernel: [<c070220f>] ? rtnl_lock+0xf/0x11 Nov 10 13:26:24 localhost kernel: [<c0702c00>] ? rtnl_newlink+0x0/0x45c Nov 10 13:26:24 localhost kernel: [<c07023bb>] rtnetlink_rcv_msg+0x188/0x19e Nov 10 13:26:24 localhost kernel: [<c0702233>] ? rtnetlink_rcv_msg+0x0/0x19e Nov 10 13:26:24 localhost kernel: [<c0710708>] netlink_rcv_skb+0x30/0x77 Nov 10 13:26:24 localhost kernel: [<c070222c>] rtnetlink_rcv+0x1b/0x22 Nov 10 13:26:24 localhost kernel: [<c0710527>] netlink_unicast+0xbe/0x11a Nov 10 13:26:24 localhost kernel: [<c06f07fc>] ? copy_from_user+0x8/0xa Nov 10 13:26:24 localhost kernel: [<c07110f8>] netlink_sendmsg+0x23e/0x255 Nov 10 13:26:24 localhost kernel: [<c06eace8>] ? rcu_read_lock+0x0/0x35 Nov 10 13:26:24 localhost kernel: [<c06e81a4>] __sock_sendmsg+0x54/0x5b Nov 10 13:26:24 localhost kernel: [<c06e866e>] sock_sendmsg+0x95/0xac Nov 10 13:26:24 localhost kernel: [<c04a013a>] ? might_fault+0x4a/0x96 Nov 10 13:26:24 localhost kernel: [<c04a0180>] ? might_fault+0x90/0x96 Nov 10 13:26:24 localhost kernel: [<c04a013a>] ? might_fault+0x4a/0x96 Nov 10 13:26:24 localhost kernel: [<c0597708>] ? _copy_from_user+0x31/0x115 Nov 10 13:26:24 localhost kernel: [<c06f07fc>] ? copy_from_user+0x8/0xa Nov 10 13:26:24 localhost kernel: [<c06f0b47>] ? verify_iovec+0x3e/0x77 Nov 10 13:26:24 localhost kernel: [<c06e8a97>] sys_sendmsg+0x14d/0x19a Nov 10 13:26:24 localhost kernel: [<c045a09d>] ? mark_lock+0x1e/0x1eb Nov 10 13:26:24 localhost kernel: [<c04a247d>] ? __do_fault+0x32d/0x359 Nov 10 13:26:24 localhost kernel: [<c045bcaa>] ? lock_release_non_nested+0x86/0x1ec Nov 10 13:26:24 localhost kernel: [<c04c1365>] ? fcheck_files+0x9b/0xca Nov 10 13:26:24 localhost kernel: [<c04a013a>] ? might_fault+0x4a/0x96 Nov 10 13:26:24 localhost kernel: [<c04a013a>] ? might_fault+0x4a/0x96 Nov 10 13:26:24 localhost kernel: [<c04a0180>] ? might_fault+0x90/0x96 Nov 10 13:26:24 localhost kernel: [<c06e9e16>] sys_socketcall+0x227/0x289 Nov 10 13:26:24 localhost kernel: [<c0403111>] ? sysenter_exit+0xf/0x1a Nov 10 13:26:24 localhost kernel: [<c04030dc>] sysenter_do_call+0x12/0x38 Nov 10 13:26:24 localhost kernel: 1 lock held by ip/7603: Nov 10 13:26:24 localhost kernel: #0: (rtnl_mutex){+.+.+.}, at: [<c070220f>] rtnl_lock+0xf/0x11 Nov 10 13:26:24 localhost kernel: INFO: task ip:7610 blocked for more than 120 seconds. Nov 10 13:26:24 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Nov 10 13:26:24 localhost kernel: ip D c045a09d 0 7610 7527 0x00000080 Nov 10 13:26:24 localhost kernel: f1047cc4 00000046 f1047c94 c045a09d bddb9412 00000043 c0a22380 f14094e0 Nov 10 13:26:24 localhost kernel: c0a22380 f140975c f1409758 f1409758 c0a22380 c0a22380 006c2f4d 00000000 Nov 10 13:26:24 localhost kernel: f146a280 00000043 f14094e0 c09a0db4 c09a0db4 f14094e0 c09a0db8 f1047d10 Nov 10 13:26:24 localhost kernel: Call Trace: Nov 10 13:26:24 localhost kernel: [<c045a09d>] ? mark_lock+0x1e/0x1eb Nov 10 13:26:24 localhost kernel: [<c078fba3>] __mutex_lock_common+0x1de/0x2eb Nov 10 13:26:24 localhost kernel: [<c070220f>] ? rtnl_lock+0xf/0x11 Nov 10 13:26:24 localhost kernel: [<c078fd5c>] mutex_lock_nested+0x36/0x3b Nov 10 13:26:24 localhost kernel: [<c070220f>] ? 
rtnl_lock+0xf/0x11 Nov 10 13:26:24 localhost kernel: [<c070220f>] rtnl_lock+0xf/0x11 Nov 10 13:26:24 localhost kernel: [<c070221f>] rtnetlink_rcv+0xe/0x22 Nov 10 13:26:24 localhost kernel: [<c0710527>] netlink_unicast+0xbe/0x11a Nov 10 13:26:24 localhost kernel: [<c06f07fc>] ? copy_from_user+0x8/0xa Nov 10 13:26:24 localhost kernel: [<c07110f8>] netlink_sendmsg+0x23e/0x255 Nov 10 13:26:24 localhost kernel: [<c06eace8>] ? rcu_read_lock+0x0/0x35 Nov 10 13:26:24 localhost kernel: [<c06e81a4>] __sock_sendmsg+0x54/0x5b Nov 10 13:26:24 localhost kernel: [<c06e866e>] sock_sendmsg+0x95/0xac Nov 10 13:26:24 localhost kernel: [<c04c1365>] ? fcheck_files+0x9b/0xca Nov 10 13:26:24 localhost kernel: [<c04c13c1>] ? fget_light+0x2d/0xb0 Nov 10 13:26:24 localhost kernel: [<c045a09d>] ? mark_lock+0x1e/0x1eb Nov 10 13:26:24 localhost kernel: [<c06e86e0>] ? sockfd_lookup_light+0x16/0x46 Nov 10 13:26:24 localhost kernel: [<c06e8c8c>] sys_sendto+0xaa/0xc6 Nov 10 13:26:24 localhost kernel: [<c04a247d>] ? __do_fault+0x32d/0x359 Nov 10 13:26:24 localhost kernel: [<c045bcaa>] ? lock_release_non_nested+0x86/0x1ec Nov 10 13:26:24 localhost kernel: [<c04a2d74>] ? handle_mm_fault+0x322/0x825 Nov 10 13:26:24 localhost kernel: [<c04a013a>] ? might_fault+0x4a/0x96 Nov 10 13:26:24 localhost kernel: [<c04a013a>] ? might_fault+0x4a/0x96 Nov 10 13:26:24 localhost kernel: [<c06e8cde>] sys_send+0x36/0x38 Nov 10 13:26:24 localhost kernel: [<c06e9d2b>] sys_socketcall+0x13c/0x289 Nov 10 13:26:24 localhost kernel: [<c0790fae>] ? restore_all_notrace+0x0/0x18 Nov 10 13:26:24 localhost kernel: [<c04030dc>] sysenter_do_call+0x12/0x38 Nov 10 13:26:24 localhost kernel: [<c0790000>] ? __down_common+0x5/0xae Nov 10 13:26:24 localhost kernel: 1 lock held by ip/7610: Nov 10 13:26:24 localhost kernel: #0: (rtnl_mutex){+.+.+.}, at: [<c070220f>] rtnl_lock+0xf/0x11 Nov 10 13:26:24 localhost kernel: INFO: task ip:7611 blocked for more than 120 seconds. Nov 10 13:26:24 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Nov 10 13:26:24 localhost kernel: ip D c045a09d 0 7611 7579 0x00000080 Nov 10 13:26:24 localhost kernel: ea7cdcc4 00000046 ea7cdc94 c045a09d be641762 00000043 c0a22380 f140d380 Nov 10 13:26:24 localhost kernel: c0a22380 f140d5fc f140d5f8 f140d5f8 c0a22380 c0a22380 003d63d2 00000000 Nov 10 13:26:24 localhost kernel: 00000000 00000043 f140d380 c09a0db4 c09a0db4 f140d380 c09a0db8 ea7cdd10 Nov 10 13:26:24 localhost kernel: Call Trace: Nov 10 13:26:24 localhost kernel: [<c045a09d>] ? mark_lock+0x1e/0x1eb Nov 10 13:26:24 localhost kernel: [<c078fba3>] __mutex_lock_common+0x1de/0x2eb Nov 10 13:26:24 localhost kernel: [<c070220f>] ? rtnl_lock+0xf/0x11 Nov 10 13:26:24 localhost kernel: [<c078fd5c>] mutex_lock_nested+0x36/0x3b Nov 10 13:26:24 localhost kernel: [<c070220f>] ? rtnl_lock+0xf/0x11 Nov 10 13:26:24 localhost kernel: [<c070220f>] rtnl_lock+0xf/0x11 Nov 10 13:26:24 localhost kernel: [<c070221f>] rtnetlink_rcv+0xe/0x22 Nov 10 13:26:24 localhost kernel: [<c0710527>] netlink_unicast+0xbe/0x11a Nov 10 13:26:24 localhost kernel: [<c06f07fc>] ? copy_from_user+0x8/0xa Nov 10 13:26:24 localhost kernel: [<c07110f8>] netlink_sendmsg+0x23e/0x255 Nov 10 13:26:24 localhost kernel: [<c06eace8>] ? rcu_read_lock+0x0/0x35 Nov 10 13:26:24 localhost kernel: [<c06e81a4>] __sock_sendmsg+0x54/0x5b Nov 10 13:26:24 localhost kernel: [<c06e866e>] sock_sendmsg+0x95/0xac Nov 10 13:26:24 localhost kernel: [<c04c1365>] ? fcheck_files+0x9b/0xca Nov 10 13:26:24 localhost kernel: [<c04c13c1>] ? 
fget_light+0x2d/0xb0 Nov 10 13:26:24 localhost kernel: [<c045a09d>] ? mark_lock+0x1e/0x1eb Nov 10 13:26:24 localhost kernel: [<c06e86e0>] ? sockfd_lookup_light+0x16/0x46 Nov 10 13:26:24 localhost kernel: [<c06e8c8c>] sys_sendto+0xaa/0xc6 Nov 10 13:26:24 localhost kernel: [<c04a247d>] ? __do_fault+0x32d/0x359 Nov 10 13:26:24 localhost kernel: [<c045bcaa>] ? lock_release_non_nested+0x86/0x1ec Nov 10 13:26:24 localhost kernel: [<c04a2d74>] ? handle_mm_fault+0x322/0x825 Nov 10 13:26:24 localhost kernel: [<c04a013a>] ? might_fault+0x4a/0x96 Nov 10 13:26:24 localhost kernel: [<c04a013a>] ? might_fault+0x4a/0x96 Nov 10 13:26:24 localhost kernel: [<c06e8cde>] sys_send+0x36/0x38 Nov 10 13:26:24 localhost kernel: [<c06e9d2b>] sys_socketcall+0x13c/0x289 Nov 10 13:26:24 localhost kernel: [<c0790fae>] ? restore_all_notrace+0x0/0x18 Nov 10 13:26:24 localhost kernel: [<c04030dc>] sysenter_do_call+0x12/0x38 Nov 10 13:26:24 localhost kernel: 1 lock held by ip/7611: Nov 10 13:26:24 localhost kernel: #0: (rtnl_mutex){+.+.+.}, at: [<c070220f>] rtnl_lock+0xf/0x11 ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: ath5k/mac80211: Reproducible deadlock with 64-stations.
2010-11-11 0:57 ` Ben Greear
@ 2010-11-11 1:03 ` Johannes Berg
2010-11-11 5:51 ` Ben Greear
0 siblings, 1 reply; 20+ messages in thread
From: Johannes Berg @ 2010-11-11 1:03 UTC (permalink / raw)
To: Ben Greear; +Cc: linux-wireless@vger.kernel.org

On Wed, 2010-11-10 at 16:57 -0800, Ben Greear wrote:

> I think the attached trace might be more useful. It shows
> the blocked tasks spammage that we saw in an earlier
> run.

Yeah but we already knew that these tasks were blocking, and where ...
nothing really new here. I don't see it, yet anyway.

johannes

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: ath5k/mac80211: Reproducible deadlock with 64-stations.
2010-11-11 1:03 ` Johannes Berg
@ 2010-11-11 5:51 ` Ben Greear
0 siblings, 0 replies; 20+ messages in thread
From: Ben Greear @ 2010-11-11 5:51 UTC (permalink / raw)
To: Johannes Berg; +Cc: linux-wireless@vger.kernel.org

On 11/10/2010 05:03 PM, Johannes Berg wrote:
> On Wed, 2010-11-10 at 16:57 -0800, Ben Greear wrote:
>
>> I think the attached trace might be more useful. It shows
>> the blocked tasks spammage that we saw in an earlier
>> run.
>
> Yeah but we already knew that these tasks were blocking, and where ...
> nothing really new here. I don't see it, yet anyway.

I think at least one of the 'ip' processes has RTNL held, according to
its stack trace, and then it went and did something in mac80211. I
haven't tried to track everything down in the code yet, however.

The system starts being very sluggish, and then locks hard. It's
possible that those particular processes might eventually recover or
get a bit farther; perhaps the hard-lock is elsewhere.

We have lockdep enabled, and it never spits anything out, for what
that's worth.

I'll get some more traces tomorrow to see if I can find any
similarities.

Thanks,
Ben

--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: ath5k/mac80211: Reproducible deadlock with 64-stations.
2010-11-10 23:02 ath5k/mac80211: Reproducible deadlock with 64-stations Ben Greear
2010-11-11 0:57 ` Ben Greear
@ 2010-11-11 1:02 ` Johannes Berg
2010-11-11 9:27 ` Tejun Heo
1 sibling, 1 reply; 20+ messages in thread
From: Johannes Berg @ 2010-11-11 1:02 UTC (permalink / raw)
To: Ben Greear; +Cc: linux-wireless@vger.kernel.org, Tejun Heo

I don't really see any deadlock here... hmm. Tejun, do you see anything
wrong with the "locking" in workq stuff here?

Something is holding the RTNL, and a bunch of other things are trying to
acquire it. We don't really know who's holding it and who's acquiring it
though.

> Nov 10 14:54:33 localhost kernel: Showing all locks held in the system:
> Nov 10 14:54:33 localhost kernel: 3 locks held by kworker/0:0/4:
> Nov 10 14:54:33 localhost kernel: #0: (events){+.+.+.}, at: [<c0446ded>] process_one_work+0x13e/0x2bf
> Nov 10 14:54:33 localhost kernel: #1: ((linkwatch_work).work){+.+.+.}, at: [<c0446ded>] process_one_work+0x13e/0x2bf
> Nov 10 14:54:33 localhost kernel: #2: (rtnl_mutex){+.+.+.}, at: [<c070220f>] rtnl_lock+0xf/0x11

This comes in from schedule_work() IIRC.

> Nov 10 14:54:33 localhost kernel: 1 lock held by ntpd/1675:
> Nov 10 14:54:33 localhost kernel: #0: (rtnl_mutex){+.+.+.}, at: [<c070220f>] rtnl_lock+0xf/0x11

ntpd holding/acquiring rtnl ...

> Nov 10 14:54:33 localhost kernel: #0: (events){+.+.+.}, at: [<c0446ded>] process_one_work+0x13e/0x2bf
> Nov 10 14:54:33 localhost kernel: #1: (wireless_nlevent_work){+.+.+.}, at: [<c0446ded>] process_one_work+0x13e/0x2bf
> Nov 10 14:54:33 localhost kernel: #2: (rtnl_mutex){+.+.+.}, at: [<c070220f>] rtnl_lock+0xf/0x11

another schedule_work() acquiring/holding RTNL.

> Nov 10 14:54:33 localhost kernel: 1 lock held by ip/6438:
> Nov 10 14:54:33 localhost kernel: #0: (rtnl_mutex){+.+.+.}, at: [<c070f78c>] netlink_dump+0x3a/0x16a
> Nov 10 14:54:33 localhost kernel: 1 lock held by ip/6441:
> Nov 10 14:54:33 localhost kernel: #0: (rtnl_mutex){+.+.+.}, at: [<c070220f>] rtnl_lock+0xf/0x11
> Nov 10 14:54:33 localhost kernel: 1 lock held by ip/6442:
> Nov 10 14:54:33 localhost kernel: #0: (rtnl_mutex){+.+.+.}, at: [<c070220f>] rtnl_lock+0xf/0x11
> Nov 10 14:54:33 localhost kernel: 1 lock held by iwconfig/6443:
> Nov 10 14:54:33 localhost kernel: #0: (rtnl_mutex){+.+.+.}, at: [<c070220f>] rtnl_lock+0xf/0x11
> Nov 10 14:54:33 localhost kernel: 1 lock held by ip/6444:
> Nov 10 14:54:33 localhost kernel: #0: (rtnl_mutex){+.+.+.}, at: [<c070220f>] rtnl_lock+0xf/0x11

bunch of other things trying / holding ...

The other things I snipped seem completely orthogonal...

johannes

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: ath5k/mac80211: Reproducible deadlock with 64-stations.
2010-11-11 1:02 ` Johannes Berg
@ 2010-11-11 9:27 ` Tejun Heo
2010-11-11 16:55 ` Ben Greear
0 siblings, 1 reply; 20+ messages in thread
From: Tejun Heo @ 2010-11-11 9:27 UTC (permalink / raw)
To: Johannes Berg; +Cc: Ben Greear, linux-wireless@vger.kernel.org

Hello,

On 11/11/2010 02:02 AM, Johannes Berg wrote:
> I don't really see any deadlock here... hmm. Tejun, do you see anything
> wrong with the "locking" in workq stuff here?
>
> Something is holding the RTNL, and a bunch of other things are trying to
> acquire it. We don't really know who's holding it and who's acquiring it
> though.
>
>> Nov 10 14:54:33 localhost kernel: #2: (rtnl_mutex){+.+.+.}, at: [<c070220f>] rtnl_lock+0xf/0x11
...
>> Nov 10 14:54:33 localhost kernel: #0: (rtnl_mutex){+.+.+.}, at: [<c070220f>] rtnl_lock+0xf/0x11
...
>> Nov 10 14:54:33 localhost kernel: #2: (rtnl_mutex){+.+.+.}, at: [<c070220f>] rtnl_lock+0xf/0x11
...
>> Nov 10 14:54:33 localhost kernel: 1 lock held by ip/6438:
>> Nov 10 14:54:33 localhost kernel: #0: (rtnl_mutex){+.+.+.}, at: [<c070f78c>] netlink_dump+0x3a/0x16a
>> Nov 10 14:54:33 localhost kernel: 1 lock held by ip/6441:
>> Nov 10 14:54:33 localhost kernel: #0: (rtnl_mutex){+.+.+.}, at: [<c070220f>] rtnl_lock+0xf/0x11
>> Nov 10 14:54:33 localhost kernel: 1 lock held by ip/6442:
>> Nov 10 14:54:33 localhost kernel: #0: (rtnl_mutex){+.+.+.}, at: [<c070220f>] rtnl_lock+0xf/0x11
>> Nov 10 14:54:33 localhost kernel: 1 lock held by iwconfig/6443:
>> Nov 10 14:54:33 localhost kernel: #0: (rtnl_mutex){+.+.+.}, at: [<c070220f>] rtnl_lock+0xf/0x11
>> Nov 10 14:54:33 localhost kernel: 1 lock held by ip/6444:
>> Nov 10 14:54:33 localhost kernel: #0: (rtnl_mutex){+.+.+.}, at: [<c070220f>] rtnl_lock+0xf/0x11

Looks like everyone is stuck trying to get hold of rtnl_mutex.
Lockdep seems enabled, isn't there a sysrq which shows all held locks?
Yeah, it's 'd'. I don't think much can be found out by looking at the
above part. We need to be looking at who's holding the lock. Is the
problem reproducible?

Thank you.

--
tejun

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: ath5k/mac80211: Reproducible deadlock with 64-stations.
2010-11-11 9:27 ` Tejun Heo
@ 2010-11-11 16:55 ` Ben Greear
2010-11-11 18:26 ` Ben Greear
2010-11-12 0:48 ` Ben Greear
0 siblings, 2 replies; 20+ messages in thread
From: Ben Greear @ 2010-11-11 16:55 UTC (permalink / raw)
To: Tejun Heo; +Cc: Johannes Berg, linux-wireless@vger.kernel.org

On 11/11/2010 01:27 AM, Tejun Heo wrote:
> Hello,
>
> On 11/11/2010 02:02 AM, Johannes Berg wrote:
>> I don't really see any deadlock here... hmm. Tejun, do you see anything
>> wrong with the "locking" in workq stuff here?
>>
>> Something is holding the RTNL, and a bunch of other things are trying to
>> acquire it. We don't really know who's holding it and who's acquiring it
>> though.
>>
>>> Nov 10 14:54:33 localhost kernel: #2: (rtnl_mutex){+.+.+.}, at: [<c070220f>] rtnl_lock+0xf/0x11
> ...
>>> Nov 10 14:54:33 localhost kernel: #0: (rtnl_mutex){+.+.+.}, at: [<c070220f>] rtnl_lock+0xf/0x11
> ...
>>> Nov 10 14:54:33 localhost kernel: #2: (rtnl_mutex){+.+.+.}, at: [<c070220f>] rtnl_lock+0xf/0x11
> ...
>>> Nov 10 14:54:33 localhost kernel: 1 lock held by ip/6438:
>>> Nov 10 14:54:33 localhost kernel: #0: (rtnl_mutex){+.+.+.}, at: [<c070f78c>] netlink_dump+0x3a/0x16a
>>> Nov 10 14:54:33 localhost kernel: 1 lock held by ip/6441:
>>> Nov 10 14:54:33 localhost kernel: #0: (rtnl_mutex){+.+.+.}, at: [<c070220f>] rtnl_lock+0xf/0x11
>>> Nov 10 14:54:33 localhost kernel: 1 lock held by ip/6442:
>>> Nov 10 14:54:33 localhost kernel: #0: (rtnl_mutex){+.+.+.}, at: [<c070220f>] rtnl_lock+0xf/0x11
>>> Nov 10 14:54:33 localhost kernel: 1 lock held by iwconfig/6443:
>>> Nov 10 14:54:33 localhost kernel: #0: (rtnl_mutex){+.+.+.}, at: [<c070220f>] rtnl_lock+0xf/0x11
>>> Nov 10 14:54:33 localhost kernel: 1 lock held by ip/6444:
>>> Nov 10 14:54:33 localhost kernel: #0: (rtnl_mutex){+.+.+.}, at: [<c070220f>] rtnl_lock+0xf/0x11
>
> Looks like everyone is stuck trying to get hold of rtnl_mutex.
> Lockdep seems enabled, isn't there a sysrq which shows all held locks?
> Yeah, it's 'd'. I don't think much can be found out by looking at the
> above part. We need to be looking at who's holding the lock. Is the
> problem reproducible?

It is reproducible, and the output above is from sysrq-d.

In the dump of the processes hung for 120 seconds, there was this one.
I think it successfully got rtnl_lock, and maybe it's waiting on a
wait-queue or something?

Please note that the 'sysrq-d' and this capture below were from
different instances of the deadlock. For whatever reason, sysrq-d is
only printing useful data some of the time.

Nov 10 13:26:24 localhost kernel: INFO: task ip:7603 blocked for more than 120 seconds.
Nov 10 13:26:24 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 10 13:26:24 localhost kernel: ip D 00428314 0 7603 7548 0x00000080
Nov 10 13:26:24 localhost kernel: f106b9d0 00000046 f1009403 00428314 533f1a80 00000045 c0a22380 f10094e0
Nov 10 13:26:24 localhost kernel: c0a22380 f100975c f1009758 f1009758 c0a22380 c0a22380 0014a3a9 00000000
Nov 10 13:26:24 localhost kernel: f23a4780 00000045 f10094e0 f106ba94 f106ba80 7fffffff f10094e0 f106ba14
Nov 10 13:26:24 localhost kernel: Call Trace:
Nov 10 13:26:24 localhost kernel: [<c078f4a3>] schedule_timeout+0x16/0x9f
Nov 10 13:26:24 localhost kernel: [<c045a2b1>] ? mark_held_locks+0x47/0x5f
Nov 10 13:26:24 localhost kernel: [<c0790ca6>] ? _raw_spin_unlock_irq+0x22/0x2b
Nov 10 13:26:24 localhost kernel: [<c045a50d>] ? trace_hardirqs_on_caller+0x104/0x125
Nov 10 13:26:24 localhost kernel: [<c045a539>] ? trace_hardirqs_on+0xb/0xd
Nov 10 13:26:24 localhost kernel: [<c078f378>] wait_for_common+0xbb/0x101
Nov 10 13:26:24 localhost kernel: [<c0432063>] ? default_wake_function+0x0/0xd
Nov 10 13:26:24 localhost kernel: [<c078f441>] wait_for_completion+0x12/0x14
Nov 10 13:26:24 localhost kernel: [<c0447b6b>] flush_work+0x23/0x27
Nov 10 13:26:24 localhost kernel: [<c0446a5e>] ? wq_barrier_func+0x0/0xd
Nov 10 13:26:24 localhost kernel: [<f8c9e1e1>] ieee80211_do_stop+0x252/0x3b7 [mac80211]
Nov 10 13:26:24 localhost kernel: [<c043c184>] ? local_bh_enable_ip+0x8/0xa
Nov 10 13:26:24 localhost kernel: [<c0790c81>] ? _raw_spin_unlock_bh+0x25/0x28
Nov 10 13:26:24 localhost kernel: [<c0709401>] ? dev_deactivate+0x18/0xd1
Nov 10 13:26:24 localhost kernel: [<f8c9e358>] ieee80211_stop+0x12/0x16 [mac80211]
Nov 10 13:26:24 localhost kernel: [<c06f8131>] __dev_close+0x73/0x88
Nov 10 13:26:24 localhost kernel: [<c06f5c1e>] ? dev_set_rx_mode+0x22/0x26
Nov 10 13:26:24 localhost kernel: [<c06f5dc0>] __dev_change_flags+0xa5/0x11a
Nov 10 13:26:24 localhost kernel: [<c06f7fdc>] dev_change_flags+0x13/0x3f
Nov 10 13:26:24 localhost kernel: [<c0702855>] do_setlink+0x23a/0x525
Nov 10 13:26:24 localhost kernel: [<c045a09d>] ? mark_lock+0x1e/0x1eb
Nov 10 13:26:24 localhost kernel: [<c0702e83>] rtnl_newlink+0x283/0x45c
Nov 10 13:26:24 localhost kernel: [<c0702c7e>] ? rtnl_newlink+0x7e/0x45c
Nov 10 13:26:24 localhost kernel: [<c045a09d>] ? mark_lock+0x1e/0x1eb
Nov 10 13:26:24 localhost kernel: [<c04b5300>] ? slab_pad_check+0xb/0xf6
Nov 10 13:26:24 localhost kernel: [<c045a2b1>] ? mark_held_locks+0x47/0x5f
Nov 10 13:26:24 localhost kernel: [<c078fc95>] ? __mutex_lock_common+0x2d0/0x2eb
Nov 10 13:26:24 localhost kernel: [<c045a50d>] ? trace_hardirqs_on_caller+0x104/0x125
Nov 10 13:26:24 localhost kernel: [<c070220f>] ? rtnl_lock+0xf/0x11

rtnl successfully acquired here, I think. ^^

Nov 10 13:26:24 localhost kernel: [<c0702c00>] ? rtnl_newlink+0x0/0x45c
Nov 10 13:26:24 localhost kernel: [<c07023bb>] rtnetlink_rcv_msg+0x188/0x19e
Nov 10 13:26:24 localhost kernel: [<c0702233>] ? rtnetlink_rcv_msg+0x0/0x19e
Nov 10 13:26:24 localhost kernel: [<c0710708>] netlink_rcv_skb+0x30/0x77
Nov 10 13:26:24 localhost kernel: [<c070222c>] rtnetlink_rcv+0x1b/0x22
Nov 10 13:26:24 localhost kernel: [<c0710527>] netlink_unicast+0xbe/0x11a
Nov 10 13:26:24 localhost kernel: [<c06f07fc>] ? copy_from_user+0x8/0xa
Nov 10 13:26:24 localhost kernel: [<c07110f8>] netlink_sendmsg+0x23e/0x255
Nov 10 13:26:24 localhost kernel: [<c06eace8>] ? rcu_read_lock+0x0/0x35
Nov 10 13:26:24 localhost kernel: [<c06e81a4>] __sock_sendmsg+0x54/0x5b
Nov 10 13:26:24 localhost kernel: [<c06e866e>] sock_sendmsg+0x95/0xac
Nov 10 13:26:24 localhost kernel: [<c04a013a>] ? might_fault+0x4a/0x96
Nov 10 13:26:24 localhost kernel: [<c04a0180>] ? might_fault+0x90/0x96
Nov 10 13:26:24 localhost kernel: [<c04a013a>] ? might_fault+0x4a/0x96
Nov 10 13:26:24 localhost kernel: [<c0597708>] ? _copy_from_user+0x31/0x115
Nov 10 13:26:24 localhost kernel: [<c06f07fc>] ? copy_from_user+0x8/0xa
Nov 10 13:26:24 localhost kernel: [<c06f0b47>] ? verify_iovec+0x3e/0x77
Nov 10 13:26:24 localhost kernel: [<c06e8a97>] sys_sendmsg+0x14d/0x19a
Nov 10 13:26:24 localhost kernel: [<c045a09d>] ? mark_lock+0x1e/0x1eb
Nov 10 13:26:24 localhost kernel: [<c04a247d>] ? __do_fault+0x32d/0x359
Nov 10 13:26:24 localhost kernel: [<c045bcaa>] ? lock_release_non_nested+0x86/0x1ec
Nov 10 13:26:24 localhost kernel: [<c04c1365>] ? fcheck_files+0x9b/0xca
Nov 10 13:26:24 localhost kernel: [<c04a013a>] ? might_fault+0x4a/0x96
Nov 10 13:26:24 localhost kernel: [<c04a013a>] ? might_fault+0x4a/0x96
Nov 10 13:26:24 localhost kernel: [<c04a0180>] ? might_fault+0x90/0x96
Nov 10 13:26:24 localhost kernel: [<c06e9e16>] sys_socketcall+0x227/0x289
Nov 10 13:26:24 localhost kernel: [<c0403111>] ? sysenter_exit+0xf/0x1a
Nov 10 13:26:24 localhost kernel: [<c04030dc>] sysenter_do_call+0x12/0x38
Nov 10 13:26:24 localhost kernel: 1 lock held by ip/7603:
Nov 10 13:26:24 localhost kernel: #0: (rtnl_mutex){+.+.+.}, at: [<c070220f>] rtnl_lock+0xf/0x11

>
> Thank you.
>

--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com

^ permalink raw reply [flat|nested] 20+ messages in thread
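To make the contention in the ip/7603 trace concrete, here is a minimal C sketch using stand-in names (this is not the actual mac80211/ath5k code): a netlink-triggered stop path holds the RTNL and flushes a work item queued on the shared "events" workqueue, while other items on the same queue are asleep in rtnl_lock(). The flushed item is harmless by itself, but it can only run once the worker pool finds or forks a free worker, and, as discussed later in the thread, forking new workers can be delayed under memory pressure.

/*
 * Illustrative sketch only; stand-in names, not the real mac80211 code.
 */
#include <linux/workqueue.h>
#include <linux/rtnetlink.h>

static void harmless_fn(struct work_struct *w)
{
	/* would be a no-op once the interface is stopped */
}
static DECLARE_WORK(harmless_work, harmless_fn);

static void needs_rtnl_fn(struct work_struct *w)
{
	rtnl_lock();	/* sleeps: stop_path() below holds the RTNL */
	/* linkwatch_event, cfg80211_conn_work and wireless_nlevent
	 * all take the RTNL from a work item in roughly this way */
	rtnl_unlock();
}
static DECLARE_WORK(rtnl_work, needs_rtnl_fn);

static void elsewhere(void)
{
	schedule_work(&rtnl_work);	/* queued by other subsystems */
}

static void stop_path(void)
{
	rtnl_lock();
	schedule_work(&harmless_work);	/* same shared "events" queue */
	flush_work(&harmless_work);	/* waits for a worker to run it;
					 * meanwhile the RTNL stays held */
	rtnl_unlock();
}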
* Re: ath5k/mac80211: Reproducible deadlock with 64-stations.
2010-11-11 16:55 ` Ben Greear
@ 2010-11-11 18:26 ` Ben Greear
2010-11-11 23:12 ` Ben Greear
0 siblings, 1 reply; 20+ messages in thread
From: Ben Greear @ 2010-11-11 18:26 UTC (permalink / raw)
To: Tejun Heo; +Cc: Johannes Berg, linux-wireless@vger.kernel.org

On 11/11/2010 08:55 AM, Ben Greear wrote:
> On 11/11/2010 01:27 AM, Tejun Heo wrote:
>> Hello,
>>
>> On 11/11/2010 02:02 AM, Johannes Berg wrote:
>>> I don't really see any deadlock here... hmm. Tejun, do you see anything
>>> wrong with the "locking" in workq stuff here?
>>>
>>> Something is holding the RTNL, and a bunch of other things are trying to
>>> acquire it. We don't really know who's holding it and who's acquiring it
>>> though.

I notice that the system is consistently running OOM, even though
it has 2GB RAM. I'll try disabling some of the memory-poisoning
debugging that may be consuming excess amounts of RAM to see if that
helps any.

Mem-Info:
DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
Normal per-cpu:
CPU 0: hi: 186, btch: 31 usd: 0
HighMem per-cpu:
CPU 0: hi: 186, btch: 31 usd: 0
active_anon:13122 inactive_anon:4414 isolated_anon:0
active_file:2904 inactive_file:16491 isolated_file:13
unevictable:0 dirty:1 writeback:0 unstable:0
free:257367 slab_reclaimable:2005 slab_unreclaimable:210550
mapped:9108 shmem:521 pagetables:964 bounce:0
DMA free:3484kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isoo
lowmem_reserve[]: 0 865 2007 2007
Normal free:2296kB min:3728kB low:4660kB high:5592kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anono
lowmem_reserve[]: 0 0 9135 9135
HighMem free:1023688kB min:512kB low:1740kB high:2972kB active_anon:52488kB inactive_anon:17656kB active_file:11616kB inactive_file:65964kB unevictablo
lowmem_reserve[]: 0 0 0 0
DMA: 3*4kB 0*8kB 1*16kB 0*32kB 0*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3484kB
Normal: 402*4kB 64*8kB 1*16kB 1*32kB 0*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2296kB
HighMem: 694*4kB 868*8kB 463*16kB 277*32kB 175*64kB 25*128kB 7*256kB 3*512kB 3*1024kB 1*2048kB 238*4096kB = 1023688kB
19928 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap = 4112380kB
Total swap = 4112380kB
521952 pages RAM
294642 pages HighMem
10948 pages reserved
58052 pages shared
241328 pages non-shared

Thanks,
Ben

--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: ath5k/mac80211: Reproducible deadlock with 64-stations.
2010-11-11 18:26 ` Ben Greear
@ 2010-11-11 23:12 ` Ben Greear
2010-11-12 10:11 ` Tejun Heo
0 siblings, 1 reply; 20+ messages in thread
From: Ben Greear @ 2010-11-11 23:12 UTC (permalink / raw)
To: Tejun Heo; +Cc: Johannes Berg, linux-wireless@vger.kernel.org

On 11/11/2010 10:26 AM, Ben Greear wrote:
> On 11/11/2010 08:55 AM, Ben Greear wrote:
>> On 11/11/2010 01:27 AM, Tejun Heo wrote:
>>> Hello,
>>>
>>> On 11/11/2010 02:02 AM, Johannes Berg wrote:
>>>> I don't really see any deadlock here... hmm. Tejun, do you see anything
>>>> wrong with the "locking" in workq stuff here?
>>>>
>>>> Something is holding the RTNL, and a bunch of other things are
>>>> trying to
>>>> acquire it. We don't really know who's holding it and who's
>>>> acquiring it
>>>> though.
>
> I notice that the system is consistently running OOM, even though
> it has 2GB RAM. I'll try disabling some of the memory-poisoning
> debugging that may
> be consuming excess amounts of RAM to see if that helps any.

The lockup (or extreme slowdown?) happens before the
serious memory pressure.

One thing I noticed is that at one point near (at?) the beginning
of the slowdown, it took 36-seconds to complete the
flush_work() call in ieee80211_do_stop in iface.c

From some printk's I added:

Nov 11 14:58:13 localhost kernel: do_stop: sta14 flushing work: e51298b4
Nov 11 14:58:49 localhost kernel: do_stop: sta14 flushed.

It is holding RTNL for this entire time, which of course stops
a large number of other useful processes from making
progress.

Is there any good reason for the flush to take so long?

Thanks,
Ben

--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com

^ permalink raw reply [flat|nested] 20+ messages in thread
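A rough reconstruction of the timing printk's described in this message is sketched below; the structure name, field names, and exact placement inside ieee80211_do_stop() are assumptions based on the log lines, not the patch that was actually applied.

#include <linux/kernel.h>
#include <linux/workqueue.h>

/* 'struct sdata_like' stands in for mac80211's per-interface data; the
 * 'name' and 'work' members mirror what the log lines suggest. */
struct sdata_like {
	char name[16];
	struct work_struct work;
};

static void timed_flush(struct sdata_like *sdata)
{
	printk(KERN_DEBUG "do_stop: %s flushing work: %p\n",
	       sdata->name, &sdata->work);
	flush_work(&sdata->work);
	printk(KERN_DEBUG "do_stop: %s flushed.\n", sdata->name);
}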
* Re: ath5k/mac80211: Reproducible deadlock with 64-stations.
2010-11-11 23:12 ` Ben Greear
@ 2010-11-12 10:11 ` Tejun Heo
2010-11-12 10:15 ` Tejun Heo
2010-11-12 17:18 ` Ben Greear
0 siblings, 2 replies; 20+ messages in thread
From: Tejun Heo @ 2010-11-12 10:11 UTC (permalink / raw)
To: Ben Greear; +Cc: Johannes Berg, linux-wireless@vger.kernel.org

Hello,

On 11/12/2010 12:12 AM, Ben Greear wrote:
> The lockup (or extreme slowdown?) happens before the
> serious memory pressure.
>
> One thing I noticed is that at one point near (at?) the beginning
> of the slowdown, it took 36-seconds to complete the
> flush_work() call in ieee80211_do_stop in iface.c
>
> From some printk's I added:
>
> Nov 11 14:58:13 localhost kernel: do_stop: sta14 flushing work: e51298b4
> Nov 11 14:58:49 localhost kernel: do_stop: sta14 flushed.
>
> It is holding RTNL for this entire time, which of course stops
> a large number of other useful processes from making
> progress.
>
> Is there any good reason for the flush to take so long?

It depends on what the work being flushed was doing. Which one is it
trying to flush? Also, if the memory pressure is high enough, due to
the dynamic nature of workqueue, processing of works can be delayed
while trying to create new workers to process them.

Situations like that usually don't happen often as it's likely that
workers get freed up as other works finish; however, if workers are
piling up on rtnl_lock, there really isn't much it can do. If there's
a work user which can behave like that, it would be a good idea to
restrict its maximum concurrency using a separate workqueue.

Thanks.

--
tejun

^ permalink raw reply [flat|nested] 20+ messages in thread
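The "separate workqueue" suggestion could look roughly like the sketch below; the queue name, flags, and helper names are illustrative, and this is not a proposal for a specific mac80211 change.

/*
 * Sketch of restricting maximum concurrency with a dedicated
 * workqueue: the work user allocates its own queue with a small
 * max_active instead of piling items onto the shared system
 * workqueue through schedule_work().
 */
#include <linux/errno.h>
#include <linux/workqueue.h>

static struct workqueue_struct *my_wq;

static int my_subsys_init(void)
{
	/* max_active == 1: at most one of these items runs at a time */
	my_wq = alloc_workqueue("my_subsys", 0, 1);
	if (!my_wq)
		return -ENOMEM;
	return 0;
}

static void my_subsys_queue(struct work_struct *work)
{
	queue_work(my_wq, work);	/* instead of schedule_work(work) */
}

static void my_subsys_exit(void)
{
	destroy_workqueue(my_wq);
}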
* Re: ath5k/mac80211: Reproducible deadlock with 64-stations.
  2010-11-12 10:11 ` Tejun Heo
@ 2010-11-12 10:15 ` Tejun Heo
  2010-11-12 18:06   ` Ben Greear
  0 siblings, 1 reply; 20+ messages in thread
From: Tejun Heo @ 2010-11-12 10:15 UTC (permalink / raw)
To: Ben Greear; +Cc: Johannes Berg, linux-wireless@vger.kernel.org

Just a bit of addition.

On 11/12/2010 11:11 AM, Tejun Heo wrote:
> It depends on what the work being flushed was doing.  Which one is it
> trying to flush?  Also, if the memory pressure is high enough, due to
> the dynamic nature of workqueue, processing of works can be delayed
> while trying to create new workers to process them.

Please note that under those circumstances, what's guaranteed is
forward-progress for workqueues which are used during memory reclaim.
Continuously scheduling works which will in turn pile up on rtnl_lock
is akin to constantly allocating memory while something holding
rtnl_lock is blocked due to memory pressure.  Correctness-wise, it
isn't necessarily deadlock but the only possible recourse is OOM.

Thanks.

--
tejun

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: ath5k/mac80211: Reproducible deadlock with 64-stations.
  2010-11-12 10:15 ` Tejun Heo
@ 2010-11-12 18:06 ` Ben Greear
  2010-11-12 18:13   ` Tejun Heo
  0 siblings, 1 reply; 20+ messages in thread
From: Ben Greear @ 2010-11-12 18:06 UTC (permalink / raw)
To: Tejun Heo; +Cc: Johannes Berg, linux-wireless@vger.kernel.org

On 11/12/2010 02:15 AM, Tejun Heo wrote:
> Just a bit of addition.
>
> On 11/12/2010 11:11 AM, Tejun Heo wrote:
>> It depends on what the work being flushed was doing.  Which one is it
>> trying to flush?  Also, if the memory pressure is high enough, due to
>> the dynamic nature of workqueue, processing of works can be delayed
>> while trying to create new workers to process them.
>
> Please note that under those circumstances, what's guaranteed is
> forward-progress for workqueues which are used during memory reclaim.
> Continuously scheduling works which will in turn pile up on rtnl_lock
> is akin to constantly allocating memory while something holding
> rtnl_lock is blocked due to memory pressure.  Correctness-wise, it
> isn't necessarily deadlock but the only possible recourse is OOM.

From looking at the wireless code, since sdata is stopped, the
'work' isn't going to actually do anything anyway.

Is there a way to clear the work from the work-queue w/out
requiring any locks that a running worker thread might hold?
(So instead of flush_work, we could call something like "remove_all_work"
and not block on the worker thread that may currently be trying to
acquire rtnl?)

Thanks,
Ben

>
> Thanks.
>

--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: ath5k/mac80211: Reproducible deadlock with 64-stations.
  2010-11-12 18:06 ` Ben Greear
@ 2010-11-12 18:13 ` Tejun Heo
  2010-11-12 18:34   ` Ben Greear
  0 siblings, 1 reply; 20+ messages in thread
From: Tejun Heo @ 2010-11-12 18:13 UTC (permalink / raw)
To: Ben Greear; +Cc: Johannes Berg, linux-wireless@vger.kernel.org

Hello,

On 11/12/2010 07:06 PM, Ben Greear wrote:
> On 11/12/2010 02:15 AM, Tejun Heo wrote:
>> Please note that under those circumstances, what's guaranteed is
>> forward-progress for workqueues which are used during memory reclaim.
>> Continuously scheduling works which will in turn pile up on rtnl_lock
>> is akin to constantly allocating memory while something holding
>> rtnl_lock is blocked due to memory pressure.  Correctness-wise, it
>> isn't necessarily deadlock but the only possible recourse is OOM.
>
> From looking at the wireless code, since sdata is stopped, the
> 'work' isn't going to actually do anything anyway.
>
> Is there a way to clear the work from the work-queue w/out
> requiring any locks that a running worker thread might hold?
> (So instead of flush_work, we could call something like "remove_all_work"
> and not block on the worker thread that may currently be trying to
> acquire rtnl?)

Hmmm... there's cancel_work_sync().  It'll cancel if the work is
pending and wait for completion if it's already running.  BTW, which
part of code are we talking about?  Can you please attach full thread
dump at deadlock?

Thanks.

--
tejun

^ permalink raw reply	[flat|nested] 20+ messages in thread
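For what it's worth, the difference being described can be seen with a throwaway module like the one below (nothing mac80211-specific; the demo_* names are made up): cancel_work_sync() drops a work that is still pending without ever running it, but if the work has already started, it waits for it to finish, just like flush_work() would.

/*
 * Toy module contrasting flush_work() and cancel_work_sync().
 * All demo_* names are hypothetical.
 */
#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/delay.h>

static struct work_struct demo_work;

static void demo_fn(struct work_struct *w)
{
	msleep(5000);	/* stand-in for a slow or lock-contended work */
}

static int __init demo_init(void)
{
	INIT_WORK(&demo_work, demo_fn);

	schedule_work(&demo_work);
	flush_work(&demo_work);		/* waits until demo_fn() finishes,
					 * roughly five seconds here */

	schedule_work(&demo_work);
	cancel_work_sync(&demo_work);	/* if the work has not started yet it
					 * is dequeued and never runs; if it
					 * already started, this still waits
					 * for it to complete */
	return 0;
}

static void __exit demo_exit(void)
{
	cancel_work_sync(&demo_work);	/* harmless if already idle */
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");

So swapping flush_work() for cancel_work_sync() only helps when the work is still pending; a work that is already running and blocked on the RTNL would still be waited for.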
* Re: ath5k/mac80211: Reproducible deadlock with 64-stations.
  2010-11-12 18:13 ` Tejun Heo
@ 2010-11-12 18:34 ` Ben Greear
  0 siblings, 0 replies; 20+ messages in thread
From: Ben Greear @ 2010-11-12 18:34 UTC (permalink / raw)
To: Tejun Heo; +Cc: Johannes Berg, linux-wireless@vger.kernel.org

On 11/12/2010 10:13 AM, Tejun Heo wrote:
> Hello,
>
> On 11/12/2010 07:06 PM, Ben Greear wrote:
>> On 11/12/2010 02:15 AM, Tejun Heo wrote:
>>> Please note that under those circumstances, what's guaranteed is
>>> forward-progress for workqueues which are used during memory reclaim.
>>> Continuously scheduling works which will in turn pile up on rtnl_lock
>>> is akin to constantly allocating memory while something holding
>>> rtnl_lock is blocked due to memory pressure.  Correctness-wise, it
>>> isn't necessarily deadlock but the only possible recourse is OOM.
>>
>> From looking at the wireless code, since sdata is stopped, the
>> 'work' isn't going to actually do anything anyway.
>>
>> Is there a way to clear the work from the work-queue w/out
>> requiring any locks that a running worker thread might hold?
>> (So instead of flush_work, we could call something like "remove_all_work"
>> and not block on the worker thread that may currently be trying to
>> acquire rtnl?)
>
> Hmmm... there's cancel_work_sync().  It'll cancel if the work is
> pending and wait for completion if it's already running.  BTW, which

That would help, but it *might* be possible that the worker thread
is currently active.  That work shouldn't be asking for rtnl, as far
as I can tell, so maybe that's OK.  I'll give that a try in a bit.

> part of code are we talking about?  Can you please attach full thread
> dump at deadlock?

The problem code seems to be the flush_work() call in ieee80211_do_stop()
in net/mac80211/iface.c.  RTNL is held when the flush_work() method is
called.

I've seen this apparently deadlock with a worker process trying to call
the wireless_nlevent_process method in net/wireless/wext-core.c (it
acquires rtnl).

Please let me know if you saw the previous thread dumps I sent in this
thread.  Those were only for processes blocked > 120secs.  I can probably
get a full sysrq dump if you want that instead.

Thanks,
Ben

--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: ath5k/mac80211: Reproducible deadlock with 64-stations.
  2010-11-12 10:11 ` Tejun Heo
  2010-11-12 10:15   ` Tejun Heo
@ 2010-11-12 17:18   ` Ben Greear
  1 sibling, 0 replies; 20+ messages in thread
From: Ben Greear @ 2010-11-12 17:18 UTC (permalink / raw)
To: Tejun Heo; +Cc: Johannes Berg, linux-wireless@vger.kernel.org

On 11/12/2010 02:11 AM, Tejun Heo wrote:
> Hello,
>
> On 11/12/2010 12:12 AM, Ben Greear wrote:
>> The lockup (or extreme slowdown?) happens before the
>> serious memory pressure.
>>
>> One thing I noticed is that at one point near (at?) the beginning
>> of the slowdown, it took 36 seconds to complete the
>> flush_work() call in ieee80211_do_stop in iface.c
>>
>> From some printk's I added:
>>
>> Nov 11 14:58:13 localhost kernel: do_stop: sta14 flushing work: e51298b4
>> Nov 11 14:58:49 localhost kernel: do_stop: sta14 flushed.
>>
>> It is holding RTNL for this entire time, which of course stops
>> a large number of other useful processes from making
>> progress.
>>
>> Is there any good reason for the flush to take so long?
>
> It depends on what the work being flushed was doing.  Which one is it
> trying to flush?  Also, if the memory pressure is high enough, due to

It's trying to flush sdata->work.  I have no idea which worker... how
can I tell?

I can reproduce this every time, and I don't mind adding debugging code,
so please let me know if there is something I can do to get you better
information.

> the dynamic nature of workqueue, processing of works can be delayed
> while trying to create new workers to process them.  Situations like
> that usually don't happen often as it's likely that workers get freed
> up as other works finish; however, if workers are piling up on
> rtnl_lock, there really isn't much it can do.  If there's a work user
> which can behave like that, it would be a good idea to restrict its
> maximum concurrency using a separate workqueue.

In my case, it seems that memory is OK, but as the deadlock happens the
system runs slow for a bit and then goes OOM.  I compiled with 2G/2G
split so that I have extra low-memory and it runs a bit longer before
locking up completely (or just endlessly spewing allocation-failed
messages).

Thanks,
Ben

>
> Thanks.
>

--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: ath5k/mac80211: Reproducible deadlock with 64-stations.
  2010-11-11 16:55 ` Ben Greear
  2010-11-11 18:26   ` Ben Greear
@ 2010-11-12  0:48   ` Ben Greear
  2010-11-12  2:37     ` Johannes Berg
  1 sibling, 1 reply; 20+ messages in thread
From: Ben Greear @ 2010-11-12 0:48 UTC (permalink / raw)
To: Tejun Heo; +Cc: Johannes Berg, linux-wireless@vger.kernel.org

I have a potential scenario:

The ieee80211_do_stop logic is called under RTNL, and it
then calls flush_work().

What if the worker thread is currently blocked on something like
wireless_nlevent_process which tries to acquire rtnl?

Wouldn't that cause a deadlock?

Thanks,
Ben

--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 20+ messages in thread
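A minimal, self-contained sketch of the suspected dependency (a toy module, not the mac80211 code; all demo_* names are invented): one work item wants the RTNL while the thread that already holds the RTNL flushes a second work item queued behind it, on a queue that only runs one work at a time, so neither side can make progress. On the shared system workqueue the concurrency manager can normally bring in another worker, which is what the reply below points at; the single-slot queue here just makes the circular wait explicit. With lockdep enabled, the flush_work() annotations are designed to flag exactly this pattern.

/*
 * Toy module that wedges itself on purpose: do not load it on a box
 * you care about.  demo_rtnl_work stands in for a work that needs the
 * RTNL (like wireless_nlevent_work); demo_sdata_work stands in for the
 * work being flushed (like sdata->work).  All demo_* names are
 * hypothetical.
 */
#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/rtnetlink.h>

static struct workqueue_struct *demo_wq;
static struct work_struct demo_rtnl_work;
static struct work_struct demo_sdata_work;

static void demo_rtnl_fn(struct work_struct *w)
{
	rtnl_lock();		/* blocks: the RTNL is held in demo_init() */
	rtnl_unlock();
}

static void demo_sdata_fn(struct work_struct *w)
{
	/* needs no locks, but it is queued behind demo_rtnl_work */
}

static int __init demo_init(void)
{
	demo_wq = alloc_workqueue("demo_wq", 0, 1);	/* one work at a time */
	if (!demo_wq)
		return -ENOMEM;

	INIT_WORK(&demo_rtnl_work, demo_rtnl_fn);
	INIT_WORK(&demo_sdata_work, demo_sdata_fn);

	rtnl_lock();		/* like the net core around do_stop */

	queue_work(demo_wq, &demo_rtnl_work);	/* starts, then blocks on RTNL */
	queue_work(demo_wq, &demo_sdata_work);	/* stays pending behind it */

	flush_work(&demo_sdata_work);	/* never returns: the only worker is
					 * stuck in demo_rtnl_fn() waiting for
					 * the RTNL held right here */
	rtnl_unlock();
	return 0;
}

static void __exit demo_exit(void)
{
	destroy_workqueue(demo_wq);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");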
* Re: ath5k/mac80211: Reproducible deadlock with 64-stations.
  2010-11-12  0:48 ` Ben Greear
@ 2010-11-12  2:37 ` Johannes Berg
  2010-11-12 16:32   ` Ben Greear
  0 siblings, 1 reply; 20+ messages in thread
From: Johannes Berg @ 2010-11-12 2:37 UTC (permalink / raw)
To: Ben Greear; +Cc: Tejun Heo, linux-wireless@vger.kernel.org

On Thu, 2010-11-11 at 16:48 -0800, Ben Greear wrote:
> I have a potential scenario:
>
> The ieee80211_do_stop logic is called under RTNL, and it
> then calls flush_work().
>
> What if the worker thread is currently blocked on something like
> wireless_nlevent_process which tries to acquire rtnl?
>
> Wouldn't that cause a deadlock?

Only if Tejun's deadlock avoidance doesn't work -- we used to have a
separate kernel thread for mac80211 work including sdata->work which
never acquired the RTNL.  Also, we have lockdep annotations for exactly
this kind of thing ("events" and "(linkwatch_work).work" in your held
locks output) that should catch this.

johannes

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: ath5k/mac80211: Reproducible deadlock with 64-stations.
  2010-11-12  2:37 ` Johannes Berg
@ 2010-11-12 16:32 ` Ben Greear
  2010-11-12 16:45   ` Johannes Berg
  0 siblings, 1 reply; 20+ messages in thread
From: Ben Greear @ 2010-11-12 16:32 UTC (permalink / raw)
To: Johannes Berg; +Cc: Tejun Heo, linux-wireless@vger.kernel.org

On 11/11/2010 06:37 PM, Johannes Berg wrote:
> On Thu, 2010-11-11 at 16:48 -0800, Ben Greear wrote:
>> I have a potential scenario:
>>
>> The ieee80211_do_stop logic is called under RTNL, and it
>> then calls flush_work().
>>
>> What if the worker thread is currently blocked on something like
>> wireless_nlevent_process which tries to acquire rtnl?
>>
>> Wouldn't that cause a deadlock?
>
> Only if Tejun's deadlock avoidance doesn't work -- we used to have a
> separate kernel thread for mac80211 work including sdata->work which
> never acquired the RTNL.  Also, we have lockdep annotations for exactly
> this kind of thing ("events" and "(linkwatch_work).work" in your held
> locks output) that should catch this.

There is a warning early in the boot from lockdep, not related to wifi.
I fear that is disabling all further lockdep warnings.

Thanks,
Ben

>
> johannes

--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: ath5k/mac80211: Reproducible deadlock with 64-stations.
  2010-11-12 16:32 ` Ben Greear
@ 2010-11-12 16:45 ` Johannes Berg
  2010-11-12 17:37   ` Ben Greear
  0 siblings, 1 reply; 20+ messages in thread
From: Johannes Berg @ 2010-11-12 16:45 UTC (permalink / raw)
To: Ben Greear; +Cc: Tejun Heo, linux-wireless@vger.kernel.org

On Fri, 2010-11-12 at 08:32 -0800, Ben Greear wrote:

> There is a warning early in the boot from lockdep, not related to wifi.
> I fear that is disabling all further lockdep warnings.

That is indeed true -- try to get rid of it.  But even so, nobody else is
seeing lockdep warnings afaik, so ...

johannes

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: ath5k/mac80211: Reproducible deadlock with 64-stations.
  2010-11-12 16:45 ` Johannes Berg
@ 2010-11-12 17:37 ` Ben Greear
  0 siblings, 0 replies; 20+ messages in thread
From: Ben Greear @ 2010-11-12 17:37 UTC (permalink / raw)
To: Johannes Berg; +Cc: Tejun Heo, linux-wireless@vger.kernel.org

On 11/12/2010 08:45 AM, Johannes Berg wrote:
> On Fri, 2010-11-12 at 08:32 -0800, Ben Greear wrote:
>
>> There is a warning early in the boot from lockdep, not related to wifi.
>> I fear that is disabling all further lockdep warnings.
>
> That is indeed true -- try to get rid of it.  But even so, nobody else is
> seeing lockdep warnings afaik, so ...

I wonder if anyone else is running lockdep successfully on
wireless-testing?

The bug I hit is discussed here, but there doesn't seem to be a clean
patch for it yet:

http://www.acetylcholine.com/node/113957

Thanks,
Ben

>
> johannes

--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 20+ messages in thread
end of thread, other threads:[~2010-11-12 18:34 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-11-10 23:02 ath5k/mac80211: Reproducible deadlock with 64-stations Ben Greear
2010-11-11  0:57 ` Ben Greear
2010-11-11  1:03 ` Johannes Berg
2010-11-11  5:51 ` Ben Greear
2010-11-11  1:02 ` Johannes Berg
2010-11-11  9:27 ` Tejun Heo
2010-11-11 16:55 ` Ben Greear
2010-11-11 18:26 ` Ben Greear
2010-11-11 23:12 ` Ben Greear
2010-11-12 10:11 ` Tejun Heo
2010-11-12 10:15 ` Tejun Heo
2010-11-12 18:06 ` Ben Greear
2010-11-12 18:13 ` Tejun Heo
2010-11-12 18:34 ` Ben Greear
2010-11-12 17:18 ` Ben Greear
2010-11-12  0:48 ` Ben Greear
2010-11-12  2:37 ` Johannes Berg
2010-11-12 16:32 ` Ben Greear
2010-11-12 16:45 ` Johannes Berg
2010-11-12 17:37 ` Ben Greear