* [BUG] wireless : cpu stuck for 61s @ 2008-07-29 5:57 Dave Young 2008-07-29 12:32 ` Johannes Berg 0 siblings, 1 reply; 19+ messages in thread From: Dave Young @ 2008-07-29 5:57 UTC (permalink / raw) To: linux-kernel; +Cc: linux-wireless While set my wlan0 to ad-hoc mode and then set the essid to something, this bug triggered. I suspect it's related to debugfs or mutex? But I'm not sure. Jul 29 13:01:46 darkstar kernel: [ 8033.736493] BUG: soft lockup - CPU#0 stuck for 61s! [events/0:9] Jul 29 13:01:46 darkstar kernel: [ 8033.736493] Modules linked in: radeonfb fb_ddc i2c_algo_bit snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss rfcomm l2cap bluetooth vfat fuse rtc_cmos rtc_core psmouse intel_agp pl2303 rtc_lib ath5k e100 i2c_i801 serio_raw usbserial agpgart 3c59x mii snd_hda_intel snd_pcm mac80211 snd_timer snd_page_alloc cfg80211 snd_hwdep snd soundcore thermal processor button thermal_sys evdev sg Jul 29 13:01:46 darkstar kernel: [ 8033.736493] Jul 29 13:01:46 darkstar kernel: [ 8033.736493] Pid: 9, comm: events/0 Tainted: G W (2.6.26-smp #3) Jul 29 13:01:46 darkstar kernel: [ 8033.736493] EIP: 0060:[<c0450bc3>] EFLAGS: 00000246 CPU: 0 Jul 29 13:01:46 darkstar kernel: [ 8033.736493] EIP is at mutex_lock_nested+0x1b3/0x280 Jul 29 13:01:46 darkstar kernel: [ 8033.736493] EAX: 00000000 EBX: f7bf5284 ECX: f7c7e000 EDX: f7bf52a8 Jul 29 13:01:46 darkstar kernel: [ 8033.736493] ESI: f7bf5288 EDI: 00000246 EBP: f7c7fedc ESP: f7c7fe9c Jul 29 13:01:46 darkstar kernel: [ 8033.736493] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 Jul 29 13:01:46 darkstar kernel: [ 8033.736493] CR0: 8005003b CR2: b800c5ac CR3: 36e71000 CR4: 000006d0 Jul 29 13:01:46 darkstar kernel: [ 8033.736493] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 Jul 29 13:01:46 darkstar kernel: [ 8033.736493] DR6: ffff0ff0 DR7: 00000400 Jul 29 13:01:46 darkstar kernel: [ 8033.736493] [<c02375a6>] ? debugfs_create_file+0x46/0x210 Jul 29 13:01:46 darkstar kernel: [ 8033.736493] [<c02375a6>] ? debugfs_create_file+0x46/0x210 Jul 29 13:01:46 darkstar kernel: [ 8033.736493] [<c02375a6>] debugfs_create_file+0x46/0x210 Jul 29 13:01:46 darkstar kernel: [ 8033.736493] [<c02377f1>] debugfs_create_dir+0x21/0x30 Jul 29 13:01:46 darkstar kernel: [ 8033.736493] [<f8901f6d>] ieee80211_sta_debugfs_add+0x2d/0x150 [mac80211] Jul 29 13:01:46 darkstar kernel: [ 8033.736493] [<f88eba89>] sta_info_debugfs_add_work+0x89/0x130 [mac80211] Jul 29 13:01:46 darkstar kernel: [ 8033.736493] [<f890a170>] ? rate_control_pid_add_sta_debugfs+0x0/0x30 [mac80211] Jul 29 13:01:46 darkstar kernel: [ 8033.736493] [<c01394c6>] run_workqueue+0x156/0x1d0 Jul 29 13:01:46 darkstar kernel: [ 8033.736493] [<c0139472>] ? run_workqueue+0x102/0x1d0 Jul 29 13:01:46 darkstar kernel: [ 8033.736493] [<f88eba00>] ? sta_info_debugfs_add_work+0x0/0x130 [mac80211] Jul 29 13:01:46 darkstar kernel: [ 8033.736493] [<c0139f18>] worker_thread+0x88/0xe0 Jul 29 13:01:46 darkstar kernel: [ 8033.736493] [<c013cd20>] ? autoremove_wake_function+0x0/0x40 Jul 29 13:01:46 darkstar kernel: [ 8033.736493] [<c0139e90>] ? worker_thread+0x0/0xe0 Jul 29 13:01:46 darkstar kernel: [ 8033.736493] [<c013ca32>] kthread+0x42/0x70 Jul 29 13:01:46 darkstar kernel: [ 8033.736493] [<c013c9f0>] ? kthread+0x0/0x70 Jul 29 13:01:46 darkstar kernel: [ 8033.736493] [<c0104993>] kernel_thread_helper+0x7/0x14 Jul 29 13:01:46 darkstar kernel: [ 8033.736493] ======================= Jul 29 13:02:51 darkstar kernel: [ 8099.234898] BUG: soft lockup - CPU#0 stuck for 61s! [events/0:9] Jul 29 13:02:51 darkstar kernel: [ 8099.234898] Modules linked in: radeonfb fb_ddc i2c_algo_bit snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss rfcomm l2cap bluetooth vfat fuse rtc_cmos rtc_core psmouse intel_agp pl2303 rtc_lib ath5k e100 i2c_i801 serio_raw usbserial agpgart 3c59x mii snd_hda_intel snd_pcm mac80211 snd_timer snd_page_alloc cfg80211 snd_hwdep snd soundcore thermal processor button thermal_sys evdev sg Jul 29 13:02:51 darkstar kernel: [ 8099.234898] Jul 29 13:02:51 darkstar kernel: [ 8099.234898] Pid: 9, comm: events/0 Tainted: G W (2.6.26-smp #3) Jul 29 13:02:51 darkstar kernel: [ 8099.234898] EIP: 0060:[<c0450bc3>] EFLAGS: 00000246 CPU: 0 Jul 29 13:02:51 darkstar kernel: [ 8099.234898] EIP is at mutex_lock_nested+0x1b3/0x280 Jul 29 13:02:51 darkstar kernel: [ 8099.234898] EAX: 00000000 EBX: f7bf5284 ECX: f7c7e000 EDX: f7bf52a8 Jul 29 13:02:51 darkstar kernel: [ 8099.234898] ESI: f7bf5288 EDI: 00000246 EBP: f7c7fedc ESP: f7c7fe9c Jul 29 13:02:51 darkstar kernel: [ 8099.234898] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 Jul 29 13:02:51 darkstar kernel: [ 8099.234898] CR0: 8005003b CR2: b800c5ac CR3: 36e71000 CR4: 000006d0 Jul 29 13:02:51 darkstar kernel: [ 8099.234898] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 Jul 29 13:02:51 darkstar kernel: [ 8099.234898] DR6: ffff0ff0 DR7: 00000400 Jul 29 13:02:51 darkstar kernel: [ 8099.234898] [<c02375a6>] ? debugfs_create_file+0x46/0x210 Jul 29 13:02:51 darkstar kernel: [ 8099.234898] [<c02375a6>] ? debugfs_create_file+0x46/0x210 Jul 29 13:02:51 darkstar kernel: [ 8099.234898] [<c02375a6>] debugfs_create_file+0x46/0x210 Jul 29 13:02:51 darkstar kernel: [ 8099.234898] [<c02377f1>] debugfs_create_dir+0x21/0x30 Jul 29 13:02:51 darkstar kernel: [ 8099.234898] [<f8901f6d>] ieee80211_sta_debugfs_add+0x2d/0x150 [mac80211] Jul 29 13:02:51 darkstar kernel: [ 8099.234898] [<f88eba89>] sta_info_debugfs_add_work+0x89/0x130 [mac80211] Jul 29 13:02:51 darkstar kernel: [ 8099.234898] [<f890a170>] ? rate_control_pid_add_sta_debugfs+0x0/0x30 [mac80211] Jul 29 13:02:51 darkstar kernel: [ 8099.234898] [<c01394c6>] run_workqueue+0x156/0x1d0 Jul 29 13:02:51 darkstar kernel: [ 8099.234898] [<c0139472>] ? run_workqueue+0x102/0x1d0 Jul 29 13:02:51 darkstar kernel: [ 8099.234898] [<f88eba00>] ? sta_info_debugfs_add_work+0x0/0x130 [mac80211] Jul 29 13:02:51 darkstar kernel: [ 8099.234898] [<c0139f18>] worker_thread+0x88/0xe0 Jul 29 13:02:51 darkstar kernel: [ 8099.234898] [<c013cd20>] ? autoremove_wake_function+0x0/0x40 Jul 29 13:02:51 darkstar kernel: [ 8099.234898] [<c0139e90>] ? worker_thread+0x0/0xe0 Jul 29 13:02:51 darkstar kernel: [ 8099.234898] [<c013ca32>] kthread+0x42/0x70 Jul 29 13:02:51 darkstar kernel: [ 8099.234898] [<c013c9f0>] ? kthread+0x0/0x70 Jul 29 13:02:51 darkstar kernel: [ 8099.234898] [<c0104993>] kernel_thread_helper+0x7/0x14 Jul 29 13:02:51 darkstar kernel: [ 8099.234898] ======================= ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG] wireless : cpu stuck for 61s 2008-07-29 5:57 [BUG] wireless : cpu stuck for 61s Dave Young @ 2008-07-29 12:32 ` Johannes Berg 2008-07-30 9:08 ` Andrew Morton 0 siblings, 1 reply; 19+ messages in thread From: Johannes Berg @ 2008-07-29 12:32 UTC (permalink / raw) To: Dave Young; +Cc: linux-kernel, linux-wireless [-- Attachment #1: Type: text/plain, Size: 1548 bytes --] On Tue, 2008-07-29 at 13:57 +0800, Dave Young wrote: > While set my wlan0 to ad-hoc mode and then set the essid to something, > this bug triggered. > > I suspect it's related to debugfs or mutex? But I'm not sure. > > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] BUG: soft lockup - CPU#0 stuck for 61s! [events/0:9] > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] Pid: 9, comm: events/0 Tainted: G W (2.6.26-smp #3) > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] EIP: 0060:[<c0450bc3>] EFLAGS: 00000246 CPU: 0 > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] EIP is at mutex_lock_nested+0x1b3/0x280 > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] EAX: 00000000 EBX: f7bf5284 ECX: f7c7e000 EDX: f7bf52a8 > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] ESI: f7bf5288 EDI: 00000246 EBP: f7c7fedc ESP: f7c7fe9c > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] CR0: 8005003b CR2: b800c5ac CR3: 36e71000 CR4: 000006d0 > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] DR6: ffff0ff0 DR7: 00000400 > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] [<c02375a6>] ? debugfs_create_file+0x46/0x210 It looks like it's getting stuck on mutex_lock(&parent->d_inode->i_mutex); inside debugfs_create_by_name, but I have no idea why that would happen. johannes [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG] wireless : cpu stuck for 61s 2008-07-29 12:32 ` Johannes Berg @ 2008-07-30 9:08 ` Andrew Morton [not found] ` <a8e1da0807300302m35a0a7c0q2ff2e0ab92a69239@mail.gmail.com> 0 siblings, 1 reply; 19+ messages in thread From: Andrew Morton @ 2008-07-30 9:08 UTC (permalink / raw) To: Johannes Berg; +Cc: Dave Young, linux-kernel, linux-wireless On Tue, 29 Jul 2008 14:32:04 +0200 Johannes Berg <johannes@sipsolutions.net> wrote: > On Tue, 2008-07-29 at 13:57 +0800, Dave Young wrote: > > While set my wlan0 to ad-hoc mode and then set the essid to something, > > this bug triggered. > > > > I suspect it's related to debugfs or mutex? But I'm not sure. > > > > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] BUG: soft lockup - CPU#0 stuck for 61s! [events/0:9] > > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] > > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] Pid: 9, comm: events/0 Tainted: G W (2.6.26-smp #3) > > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] EIP: 0060:[<c0450bc3>] EFLAGS: 00000246 CPU: 0 > > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] EIP is at mutex_lock_nested+0x1b3/0x280 > > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] EAX: 00000000 EBX: f7bf5284 ECX: f7c7e000 EDX: f7bf52a8 > > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] ESI: f7bf5288 EDI: 00000246 EBP: f7c7fedc ESP: f7c7fe9c > > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 > > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] CR0: 8005003b CR2: b800c5ac CR3: 36e71000 CR4: 000006d0 > > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 > > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] DR6: ffff0ff0 DR7: 00000400 > > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] [<c02375a6>] ? debugfs_create_file+0x46/0x210 > > It looks like it's getting stuck on > mutex_lock(&parent->d_inode->i_mutex); > > inside debugfs_create_by_name, but I have no idea why that would happen. > Dave, please try sysrq-W or sysrq-T during that 60 seconds, see if we can find another task which is holding onto that lock. ^ permalink raw reply [flat|nested] 19+ messages in thread
[parent not found: <a8e1da0807300302m35a0a7c0q2ff2e0ab92a69239@mail.gmail.com>]
* Re: [BUG] wireless : cpu stuck for 61s [not found] ` <a8e1da0807300302m35a0a7c0q2ff2e0ab92a69239@mail.gmail.com> @ 2008-07-30 10:10 ` Andrew Morton 2008-07-31 2:06 ` Dave Young 0 siblings, 1 reply; 19+ messages in thread From: Andrew Morton @ 2008-07-30 10:10 UTC (permalink / raw) To: Dave Young; +Cc: Johannes Berg, linux-kernel, linux-wireless On Wed, 30 Jul 2008 18:02:03 +0800 "Dave Young" <hidave.darkstar@gmail.com> wrote: > On Wed, Jul 30, 2008 at 5:08 PM, Andrew Morton > <akpm@linux-foundation.org> wrote: > > On Tue, 29 Jul 2008 14:32:04 +0200 Johannes Berg <johannes@sipsolutions.net> wrote: > > > >> On Tue, 2008-07-29 at 13:57 +0800, Dave Young wrote: > >> > While set my wlan0 to ad-hoc mode and then set the essid to something, > >> > this bug triggered. > >> > > >> > I suspect it's related to debugfs or mutex? But I'm not sure. > >> > > >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] BUG: soft lockup - CPU#0 stuck for 61s! [events/0:9] > >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] > >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] Pid: 9, comm: events/0 Tainted: G W (2.6.26-smp #3) > >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] EIP: 0060:[<c0450bc3>] EFLAGS: 00000246 CPU: 0 > >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] EIP is at mutex_lock_nested+0x1b3/0x280 > >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] EAX: 00000000 EBX: f7bf5284 ECX: f7c7e000 EDX: f7bf52a8 > >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] ESI: f7bf5288 EDI: 00000246 EBP: f7c7fedc ESP: f7c7fe9c > >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 > >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] CR0: 8005003b CR2: b800c5ac CR3: 36e71000 CR4: 000006d0 > >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 > >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] DR6: ffff0ff0 DR7: 00000400 > >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] [<c02375a6>] ? debugfs_create_file+0x46/0x210 > >> > >> It looks like it's getting stuck on > >> mutex_lock(&parent->d_inode->i_mutex); > >> > >> inside debugfs_create_by_name, but I have no idea why that would happen. > >> > > > > Dave, please try sysrq-W or sysrq-T during that 60 seconds, see if we > > can find another task which is holding onto that lock. > > > > > > Andrew, the keyboard doesn't response after the bug triggered. > > So I just keep sysrq-W / sysrq-T every several seconds after I set the > wlan0 interface. > Attached please see the long log file. argh. Look: Jul 30 17:34:22 darkstar kernel: [ 145.111510] f703fbf8 00000082 00000002 f703fbe4 f703fbd8 00000000 f704ac40 f703fbcc Jul 30 17:34:22 darkstar kernel: [ 145.111510] 00000046 c0669060 c066c580 c066c580 c066c580 f703fbe8 f704ad94 c1eaf580 Jul 30 17:34:22 darkstar kernel: [ 145.111510] 00000001 f704ac40 00000000 f702ca00 c0452697 ffff55ba 00000000 c014b06d Jul 30 17:34:22 darkstar kernel: [ 145.111510] Call Trace: Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0452697>] ? _spin_unlock_irqrestore+0x47/0x60 Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c014b06d>] ? trace_hardirqs_on+0xbd/0x140 Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0450265>] schedule_timeout+0x75/0xc0 Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c01942c0>] ? __pollwait+0x60/0xd0 Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<f888b02e>] ? evdev_poll+0x2e/0x60 [evdev] Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0193564>] do_sys_poll+0x264/0x3a0 Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0194260>] ? __pollwait+0x0/0xd0 Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0122160>] ? default_wake_function+0x0/0x10 Jul 30 17:34:22 darkstar last message repeated 3 times Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c01220c5>] ? try_to_wake_up+0x75/0x110 Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c014aed0>] ? mark_held_locks+0x40/0x80 Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c014b77b>] ? __lock_acquire+0x24b/0x1040 Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c014b77b>] ? __lock_acquire+0x24b/0x1040 Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c03da639>] ? sock_def_readable+0x59/0x80 Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c014b77b>] ? __lock_acquire+0x24b/0x1040 Jul 30 17:34:22 darkstar last message repeated 5 times Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0109d0f>] ? native_sched_clock+0x7f/0xb0 Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0243b4f>] ? security_file_permission+0xf/0x20 Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0186387>] ? rw_verify_area+0x57/0xc0 Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0451f5d>] ? _spin_unlock+0x1d/0x20 Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0186b94>] ? vfs_read+0x94/0x130 Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<f888be60>] ? evdev_read+0x0/0x1b0 [evdev] Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0103dbf>] ? restore_nocheck+0x12/0x15 Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c01936cd>] sys_poll+0x2d/0x70 Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0103d5e>] syscall_call+0x7/0xb Jul 30 17:34:22 darkstar kernel: [ 145.111510] ======================= Jul 30 17:34:22 darkstar kernel: [ 145.111510] f63efd94 00000092 00000002 00000000 00000000 f5064028 f7049d80 00000001 Jul 30 17:34:22 darkstar kernel: [ 145.111510] 00000000 c0669060 c066c580 c066c580 c066c580 00000000 f7049ed4 c1e5d580 Jul 30 17:34:22 darkstar kernel: [ 145.111510] 00000000 00000000 00000135 f702e080 f7049d80 00000003 00000000 f7049d80 Jul 30 17:34:22 darkstar kernel: [ 145.111510] Call Trace: Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0450265>] schedule_timeout+0x75/0xc0 Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0436b98>] unix_stream_recvmsg+0x2a8/0x4e0 We're missing part of the sysrq traces. I noticed this a while back. We need this info to find the D-state processes. It's _possible_ that the kernel is actually trying to print this info, but it's doing it at a different prink facility level from the stack output, so we're only seeing part of the printing. Please try changing your logging level (dmesg -n 0, or dmesg -n 7 - I always forget which). Then retry the sysrq-T. If that successfully causes _all_ the output to be presented, then please rerun the above test. You might want to compress the file too - I don't know whether the vger servers will accept a 730kB mail. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG] wireless : cpu stuck for 61s 2008-07-30 10:10 ` Andrew Morton @ 2008-07-31 2:06 ` Dave Young 2008-07-31 2:56 ` Andrew Morton 0 siblings, 1 reply; 19+ messages in thread From: Dave Young @ 2008-07-31 2:06 UTC (permalink / raw) To: Andrew Morton; +Cc: Johannes Berg, linux-kernel, linux-wireless [-- Attachment #1: Type: text/plain, Size: 11117 bytes --] On Wed, Jul 30, 2008 at 6:10 PM, Andrew Morton <akpm@linux-foundation.org> wrote: > On Wed, 30 Jul 2008 18:02:03 +0800 "Dave Young" <hidave.darkstar@gmail.com> wrote: > >> On Wed, Jul 30, 2008 at 5:08 PM, Andrew Morton >> <akpm@linux-foundation.org> wrote: >> > On Tue, 29 Jul 2008 14:32:04 +0200 Johannes Berg <johannes@sipsolutions.net> wrote: >> > >> >> On Tue, 2008-07-29 at 13:57 +0800, Dave Young wrote: >> >> > While set my wlan0 to ad-hoc mode and then set the essid to something, >> >> > this bug triggered. >> >> > >> >> > I suspect it's related to debugfs or mutex? But I'm not sure. >> >> > >> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] BUG: soft lockup - CPU#0 stuck for 61s! [events/0:9] >> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] >> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] Pid: 9, comm: events/0 Tainted: G W (2.6.26-smp #3) >> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] EIP: 0060:[<c0450bc3>] EFLAGS: 00000246 CPU: 0 >> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] EIP is at mutex_lock_nested+0x1b3/0x280 >> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] EAX: 00000000 EBX: f7bf5284 ECX: f7c7e000 EDX: f7bf52a8 >> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] ESI: f7bf5288 EDI: 00000246 EBP: f7c7fedc ESP: f7c7fe9c >> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 >> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] CR0: 8005003b CR2: b800c5ac CR3: 36e71000 CR4: 000006d0 >> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 >> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] DR6: ffff0ff0 DR7: 00000400 >> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] [<c02375a6>] ? debugfs_create_file+0x46/0x210 >> >> >> >> It looks like it's getting stuck on >> >> mutex_lock(&parent->d_inode->i_mutex); >> >> >> >> inside debugfs_create_by_name, but I have no idea why that would happen. >> >> >> > >> > Dave, please try sysrq-W or sysrq-T during that 60 seconds, see if we >> > can find another task which is holding onto that lock. >> > >> > >> >> Andrew, the keyboard doesn't response after the bug triggered. >> >> So I just keep sysrq-W / sysrq-T every several seconds after I set the >> wlan0 interface. >> Attached please see the long log file. > > argh. Look: > > Jul 30 17:34:22 darkstar kernel: [ 145.111510] f703fbf8 00000082 00000002 f703fbe4 f703fbd8 00000000 f704ac40 f703fbcc > Jul 30 17:34:22 darkstar kernel: [ 145.111510] 00000046 c0669060 c066c580 c066c580 c066c580 f703fbe8 f704ad94 c1eaf580 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] 00000001 f704ac40 00000000 f702ca00 c0452697 ffff55ba 00000000 c014b06d > Jul 30 17:34:22 darkstar kernel: [ 145.111510] Call Trace: > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0452697>] ? _spin_unlock_irqrestore+0x47/0x60 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c014b06d>] ? trace_hardirqs_on+0xbd/0x140 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0450265>] schedule_timeout+0x75/0xc0 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c01942c0>] ? __pollwait+0x60/0xd0 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<f888b02e>] ? evdev_poll+0x2e/0x60 [evdev] > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0193564>] do_sys_poll+0x264/0x3a0 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0194260>] ? __pollwait+0x0/0xd0 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0122160>] ? default_wake_function+0x0/0x10 > Jul 30 17:34:22 darkstar last message repeated 3 times > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c01220c5>] ? try_to_wake_up+0x75/0x110 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c014aed0>] ? mark_held_locks+0x40/0x80 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c014b77b>] ? __lock_acquire+0x24b/0x1040 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c014b77b>] ? __lock_acquire+0x24b/0x1040 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c03da639>] ? sock_def_readable+0x59/0x80 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c014b77b>] ? __lock_acquire+0x24b/0x1040 > Jul 30 17:34:22 darkstar last message repeated 5 times > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0109d0f>] ? native_sched_clock+0x7f/0xb0 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0243b4f>] ? security_file_permission+0xf/0x20 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0186387>] ? rw_verify_area+0x57/0xc0 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0451f5d>] ? _spin_unlock+0x1d/0x20 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0186b94>] ? vfs_read+0x94/0x130 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<f888be60>] ? evdev_read+0x0/0x1b0 [evdev] > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0103dbf>] ? restore_nocheck+0x12/0x15 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c01936cd>] sys_poll+0x2d/0x70 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0103d5e>] syscall_call+0x7/0xb > Jul 30 17:34:22 darkstar kernel: [ 145.111510] ======================= > Jul 30 17:34:22 darkstar kernel: [ 145.111510] f63efd94 00000092 00000002 00000000 00000000 f5064028 f7049d80 00000001 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] 00000000 c0669060 c066c580 c066c580 c066c580 00000000 f7049ed4 c1e5d580 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] 00000000 00000000 00000135 f702e080 f7049d80 00000003 00000000 f7049d80 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] Call Trace: > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0450265>] schedule_timeout+0x75/0xc0 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0436b98>] unix_stream_recvmsg+0x2a8/0x4e0 > > > We're missing part of the sysrq traces. I noticed this a while back. > We need this info to find the D-state processes. > > It's _possible_ that the kernel is actually trying to print this info, > but it's doing it at a different prink facility level from the stack > output, so we're only seeing part of the printing. > > Please try changing your logging level (dmesg -n 0, or dmesg -n 7 - I > always forget which). Then retry the sysrq-T. If that successfully > causes _all_ the output to be presented, then please rerun the above > test. > > You might want to compress the file too - I don't know whether the vger > servers will accept a 730kB mail. > > Ok here it is. BTW, I run "klogd -c 7" after boot This time I get a kmalloc poison overwritten: Jul 31 09:51:11 darkstar kernel: [ 2024.960036] BUG kmalloc-4096: Poison overwritten Jul 31 09:51:11 darkstar kernel: [ 2024.960036] ----------------------------------------------------------------------------- Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Jul 31 09:51:11 darkstar kernel: [ 2024.960036] INFO: 0xf6f3a080-0xf6f3a0ef. First byte 0x80 instead of 0x6b Jul 31 09:51:11 darkstar kernel: [ 2024.960036] INFO: Allocated in dev_alloc_skb+0x1c/0x30 age=3642 cpu=0 pid=0 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] INFO: Freed in skb_release_data+0x57/0x80 age=3146 cpu=0 pid=2398 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] INFO: Slab 0xc1c05440 objects=7 used=3 fp=0xf6f3a060 flags=0x400020c3 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] INFO: Object 0xf6f3a060 @offset=8288 fp=0xf6f39030 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Bytes b4 0xf6f3a050: 5e 09 00 00 57 c9 05 00 5a 5a 5a 5a 5a 5a 5a 5a ^...WÉ..ZZZZZZZZ Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Object 0xf6f3a060: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Object 0xf6f3a070: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Object 0xf6f3a080: 80 00 00 00 ff ff ff ff ff ff 00 17 7b 00 46 40 ....ÿÿÿÿÿÿ..{.F@ Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Object 0xf6f3a090: 00 17 7b 00 46 40 30 09 81 21 08 7a 21 00 00 00 ..{.F@0..!.z!... Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Object 0xf6f3a0a0: 64 00 21 04 00 07 00 00 00 00 00 00 00 01 08 82 d.!............. Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Object 0xf6f3a0b0: 84 8b 0c 12 96 18 24 03 01 01 05 04 00 02 00 00 ......$......... Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Object 0xf6f3a0c0: 07 06 43 4e 20 01 0d 14 2a 01 00 32 04 30 48 60 ..CN....*..2.0H` Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Object 0xf6f3a0d0: 6c dd 18 00 17 7b 01 04 00 00 00 01 00 00 00 10 lÝ...{.......... Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Redzone 0xf6f3b060: bb bb bb bb »»»» Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Padding 0xf6f3b088: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Pid: 0, comm: swapper Tainted: G W 2.6.26-smp #2 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c0180f5d>] print_trailer+0xad/0xf0 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c018103b>] check_bytes_and_report+0x9b/0xc0 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c018145e>] check_object+0x19e/0x1e0 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c01821a4>] __slab_alloc+0x454/0x4f0 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c01834d6>] __kmalloc_track_caller+0xe6/0xf0 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c03dd1ec>] ? dev_alloc_skb+0x1c/0x30 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c03dd1ec>] ? dev_alloc_skb+0x1c/0x30 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c03dce79>] __alloc_skb+0x49/0x100 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c03dd1ec>] dev_alloc_skb+0x1c/0x30 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<f8a58599>] ath5k_rxbuf_setup+0x39/0x200 [ath5k] Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<f8a5a697>] ath5k_tasklet_rx+0x127/0x5c0 [ath5k] Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c014969a>] ? print_lock_contention_bug+0x1a/0xe0 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c012eafc>] tasklet_action+0x4c/0xc0 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c012e463>] __do_softirq+0x93/0x120 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c012e547>] do_softirq+0x57/0x60 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c012ea29>] irq_exit+0x69/0x80 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c0106b55>] do_IRQ+0x45/0x80 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c010a5d0>] ? mwait_idle+0x0/0x50 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c0104752>] common_interrupt+0x2e/0x34 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c010a5d0>] ? mwait_idle+0x0/0x50 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c010a609>] ? mwait_idle+0x39/0x50 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c01026e0>] cpu_idle+0x60/0xd0 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c043c8ce>] rest_init+0x4e/0x60 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] ======================= The syslog detail please see the attachment. -- Regards dave [-- Attachment #2: dmesg.gz --] [-- Type: application/x-gzip, Size: 17939 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG] wireless : cpu stuck for 61s 2008-07-31 2:06 ` Dave Young @ 2008-07-31 2:56 ` Andrew Morton 2008-07-31 3:01 ` Dave Young 2008-07-31 9:15 ` Pekka J Enberg 0 siblings, 2 replies; 19+ messages in thread From: Andrew Morton @ 2008-07-31 2:56 UTC (permalink / raw) To: Dave Young; +Cc: Johannes Berg, linux-kernel, linux-wireless, Pekka Enberg On Thu, 31 Jul 2008 10:06:31 +0800 "Dave Young" <hidave.darkstar@gmail.com> wrote: > > Ok here it is. > BTW, I run "klogd -c 7" after boot The sysrq output is still missing lots of stuff. I guess we broke it. > > This time I get a kmalloc poison overwritten: > <fixes wordwrapping, cleans stuff up> ============================================================================= BUG kmalloc-4096: Poison overwritten ----------------------------------------------------------------------------- INFO: 0xf6f3a080-0xf6f3a0ef. First byte 0x80 instead of 0x6b INFO: Allocated in dev_alloc_skb+0x1c/0x30 age=3642 cpu=0 pid=0 INFO: Freed in skb_release_data+0x57/0x80 age=3146 cpu=0 pid=2398 INFO: Slab 0xc1c05440 objects=7 used=3 fp=0xf6f3a060 flags=0x400020c3 INFO: Object 0xf6f3a060 @offset=8288 fp=0xf6f39030 Bytes b4 0xf6f3a050: 5e 09 00 00 57 c9 05 00 5a 5a 5a 5a 5a 5a 5a 5a ^...WÉ..ZZZZZZZZ Object 0xf6f3a060: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xf6f3a070: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 0xf6f3a080: 80 00 00 00 ff ff ff ff ff ff 00 17 7b 00 46 40 ....ÿÿÿÿÿÿ..{.F@ Object 0xf6f3a090: 00 17 7b 00 46 40 30 09 81 21 08 7a 21 00 00 00 ..{.F@0..!.z!... Object 0xf6f3a0a0: 64 00 21 04 00 07 00 00 00 00 00 00 00 01 08 82 d.!............. Object 0xf6f3a0b0: 84 8b 0c 12 96 18 24 03 01 01 05 04 00 02 00 00 ......$......... Object 0xf6f3a0c0: 07 06 43 4e 20 01 0d 14 2a 01 00 32 04 30 48 60 ..CN....*..2.0H` Object 0xf6f3a0d0: 6c dd 18 00 17 7b 01 04 00 00 00 01 00 00 00 10 lÝ...{.......... Redzone 0xf6f3b060: bb bb bb bb »»»» Padding 0xf6f3b088: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ Pid: 0, comm: swapper Tainted: G W 2.6.26-smp #2 [<c0180f5d>] print_trailer+0xad/0xf0 [<c018103b>] check_bytes_and_report+0x9b/0xc0 [<c018145e>] check_object+0x19e/0x1e0 [<c01821a4>] __slab_alloc+0x454/0x4f0 [<c01834d6>] __kmalloc_track_caller+0xe6/0xf0 [<c03dd1ec>] ? dev_alloc_skb+0x1c/0x30 [<c03dd1ec>] ? dev_alloc_skb+0x1c/0x30 [<c03dce79>] __alloc_skb+0x49/0x100 [<c03dd1ec>] dev_alloc_skb+0x1c/0x30 [<f8a58599>] ath5k_rxbuf_setup+0x39/0x200 [ath5k] [<f8a5a697>] ath5k_tasklet_rx+0x127/0x5c0 [ath5k] [<c014969a>] ? print_lock_contention_bug+0x1a/0xe0 [<c012eafc>] tasklet_action+0x4c/0xc0 [<c012e463>] __do_softirq+0x93/0x120 [<c012e547>] do_softirq+0x57/0x60 [<c012ea29>] irq_exit+0x69/0x80 [<c0106b55>] do_IRQ+0x45/0x80 [<c010a5d0>] ? mwait_idle+0x0/0x50 [<c0104752>] common_interrupt+0x2e/0x34 [<c010a5d0>] ? mwait_idle+0x0/0x50 [<c010a609>] ? mwait_idle+0x39/0x50 [<c01026e0>] cpu_idle+0x60/0xd0 [<c043c8ce>] rest_init+0x4e/0x60 ======================= FIX kmalloc-4096: Restoring 0xf6f3a080-0xf6f3a0ef=0x6b FIX kmalloc-4096: Marking all objects used [<c0243b4f>] ? security_file_permission+0xf/0x20 [<c019436f>] sys_select+0x3f/0x190 [<c01878e9>] ? fput+0x19/0x20 [<c0103dbf>] ? restore_nocheck+0x12/0x15 [<c014b06d>] ? trace_hardirqs_on+0xbd/0x140 [<c0103d5e>] syscall_call+0x7/0xb ======================= argh, that stuff hurts my brain. None of the numbers seem to make any sense for a 4k allocation :( Pekka, do you have time to decrypt this? Dave, could you please remind us which net driver was in use here? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG] wireless : cpu stuck for 61s 2008-07-31 2:56 ` Andrew Morton @ 2008-07-31 3:01 ` Dave Young 2008-07-31 9:15 ` Pekka J Enberg 1 sibling, 0 replies; 19+ messages in thread From: Dave Young @ 2008-07-31 3:01 UTC (permalink / raw) To: Andrew Morton; +Cc: Johannes Berg, linux-kernel, linux-wireless, Pekka Enberg [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain; charset=UTF-8, Size: 3653 bytes --] On Thu, Jul 31, 2008 at 10:56 AM, Andrew Morton<akpm@linux-foundation.org> wrote:> On Thu, 31 Jul 2008 10:06:31 +0800 "Dave Young" <hidave.darkstar@gmail.com> wrote:>>>>> Ok here it is.>> BTW, I run "klogd -c 7" after boot>> The sysrq output is still missing lots of stuff. I guess we broke it.>>>>> This time I get a kmalloc poison overwritten:>>>> <fixes wordwrapping, cleans stuff up> Thanks, sorry for that caused by webmail. >> =============================================================================> BUG kmalloc-4096: Poison overwritten> ----------------------------------------------------------------------------->> INFO: 0xf6f3a080-0xf6f3a0ef. First byte 0x80 instead of 0x6b> INFO: Allocated in dev_alloc_skb+0x1c/0x30 age=3642 cpu=0 pid=0> INFO: Freed in skb_release_data+0x57/0x80 age=3146 cpu=0 pid=2398> INFO: Slab 0xc1c05440 objects=7 used=3 fp=0xf6f3a060 flags=0x400020c3> INFO: Object 0xf6f3a060 @offset=8288 fp=0xf6f39030>> Bytes b4 0xf6f3a050: 5e 09 00 00 57 c9 05 00 5a 5a 5a 5a 5a 5a 5a 5a ^...WÃ..ZZZZZZZZ> Object 0xf6f3a060: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk> Object 0xf6f3a070: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk> Object 0xf6f3a080: 80 00 00 00 ff ff ff ff ff ff 00 17 7b 00 46 40 ....ÿÿÿÿÿÿ..{.F@> Object 0xf6f3a090: 00 17 7b 00 46 40 30 09 81 21 08 7a 21 00 00 00 ..{.F@0..!.z!...> Object 0xf6f3a0a0: 64 00 21 04 00 07 00 00 00 00 00 00 00 01 08 82 d.!.............> Object 0xf6f3a0b0: 84 8b 0c 12 96 18 24 03 01 01 05 04 00 02 00 00 ......$.........> Object 0xf6f3a0c0: 07 06 43 4e 20 01 0d 14 2a 01 00 32 04 30 48 60 ..CN....*..2.0H`> Object 0xf6f3a0d0: 6c dd 18 00 17 7b 01 04 00 00 00 01 00 00 00 10 lÃ...{..........> Redzone 0xf6f3b060: bb bb bb bb »»»»> Padding 0xf6f3b088: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ> Pid: 0, comm: swapper Tainted: G W 2.6.26-smp #2> [<c0180f5d>] print_trailer+0xad/0xf0> [<c018103b>] check_bytes_and_report+0x9b/0xc0> [<c018145e>] check_object+0x19e/0x1e0> [<c01821a4>] __slab_alloc+0x454/0x4f0> [<c01834d6>] __kmalloc_track_caller+0xe6/0xf0> [<c03dd1ec>] ? dev_alloc_skb+0x1c/0x30> [<c03dd1ec>] ? dev_alloc_skb+0x1c/0x30> [<c03dce79>] __alloc_skb+0x49/0x100> [<c03dd1ec>] dev_alloc_skb+0x1c/0x30> [<f8a58599>] ath5k_rxbuf_setup+0x39/0x200 [ath5k]> [<f8a5a697>] ath5k_tasklet_rx+0x127/0x5c0 [ath5k]> [<c014969a>] ? print_lock_contention_bug+0x1a/0xe0> [<c012eafc>] tasklet_action+0x4c/0xc0> [<c012e463>] __do_softirq+0x93/0x120> [<c012e547>] do_softirq+0x57/0x60> [<c012ea29>] irq_exit+0x69/0x80> [<c0106b55>] do_IRQ+0x45/0x80> [<c010a5d0>] ? mwait_idle+0x0/0x50> [<c0104752>] common_interrupt+0x2e/0x34> [<c010a5d0>] ? mwait_idle+0x0/0x50> [<c010a609>] ? mwait_idle+0x39/0x50> [<c01026e0>] cpu_idle+0x60/0xd0> [<c043c8ce>] rest_init+0x4e/0x60> =======================> FIX kmalloc-4096: Restoring 0xf6f3a080-0xf6f3a0ef=0x6b>> FIX kmalloc-4096: Marking all objects used> [<c0243b4f>] ? security_file_permission+0xf/0x20> [<c019436f>] sys_select+0x3f/0x190> [<c01878e9>] ? fput+0x19/0x20> [<c0103dbf>] ? restore_nocheck+0x12/0x15> [<c014b06d>] ? trace_hardirqs_on+0xbd/0x140> [<c0103d5e>] syscall_call+0x7/0xb> =======================>> argh, that stuff hurts my brain. None of the numbers seem to make any> sense for a 4k allocation :( Pekka, do you have time to decrypt this?>> Dave, could you please remind us which net driver was in use here?>> It's the ath5k -- Regardsdaveÿôèº{.nÇ+·®+%Ëÿ±éݶ\x17¥wÿº{.nÇ+·¥{±þG«éÿ{ayº\x1dÊÚë,j\a¢f£¢·hïêÿêçz_è®\x03(éÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?¨èÚ&£ø§~á¶iOæ¬z·vØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?I¥ ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG] wireless : cpu stuck for 61s 2008-07-31 2:56 ` Andrew Morton 2008-07-31 3:01 ` Dave Young @ 2008-07-31 9:15 ` Pekka J Enberg 2008-07-31 9:50 ` Tomas Winkler 2008-08-01 7:32 ` Dave Young 1 sibling, 2 replies; 19+ messages in thread From: Pekka J Enberg @ 2008-07-31 9:15 UTC (permalink / raw) To: Andrew Morton; +Cc: Dave Young, Johannes Berg, linux-kernel, linux-wireless [-- Attachment #1: Type: TEXT/PLAIN, Size: 4460 bytes --] Hi Andrew, On Wed, 30 Jul 2008, Andrew Morton wrote: > > Ok here it is. > > BTW, I run "klogd -c 7" after boot > > The sysrq output is still missing lots of stuff. I guess we broke it. > > > > > This time I get a kmalloc poison overwritten: > > > > argh, that stuff hurts my brain. None of the numbers seem to make any > sense for a 4k allocation :( Pekka, do you have time to decrypt this? Sure. Here goes: On Wed, 30 Jul 2008, Andrew Morton wrote: > <fixes wordwrapping, cleans stuff up> > > ============================================================================= > BUG kmalloc-4096: Poison overwritten > ----------------------------------------------------------------------------- > > INFO: 0xf6f3a080-0xf6f3a0ef. First byte 0x80 instead of 0x6b That's POISON_FREE ("0x6b") overwritten which means use-after-free for the range of 0xf6f3a080 - 0xf6f3a0ef (112 bytes). The rest of the object is okay but the SLUB debugging code only dumps the first 128 bytes of the object which is why we don't see the full corruption. 2.6.27-rc1 should dump the full object so I'm assuming this is pre -rc1? > INFO: Allocated in dev_alloc_skb+0x1c/0x30 age=3642 cpu=0 pid=0 > INFO: Freed in skb_release_data+0x57/0x80 age=3146 cpu=0 pid=2398 So the corrupted object was free'd by skb_release_data() so we need to look for a driver or the networking stack calling that function too early. > INFO: Slab 0xc1c05440 objects=7 used=3 fp=0xf6f3a060 flags=0x400020c3 > INFO: Object 0xf6f3a060 @offset=8288 fp=0xf6f39030 > > Bytes b4 0xf6f3a050: 5e 09 00 00 57 c9 05 00 5a 5a 5a 5a 5a 5a 5a 5a ^...WÉ..ZZZZZZZZ The object starts here (the poison values for first 32 bytes are okay): > Object 0xf6f3a060: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk > Object 0xf6f3a070: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk And here are the first 96 bytes of the corruption: > Object 0xf6f3a080: 80 00 00 00 ff ff ff ff ff ff 00 17 7b 00 46 40 ....ÿÿÿÿÿÿ..{.F@ > Object 0xf6f3a090: 00 17 7b 00 46 40 30 09 81 21 08 7a 21 00 00 00 ..{.F@0..!.z!... > Object 0xf6f3a0a0: 64 00 21 04 00 07 00 00 00 00 00 00 00 01 08 82 d.!............. > Object 0xf6f3a0b0: 84 8b 0c 12 96 18 24 03 01 01 05 04 00 02 00 00 ......$......... > Object 0xf6f3a0c0: 07 06 43 4e 20 01 0d 14 2a 01 00 32 04 30 48 60 ..CN....*..2.0H` > Object 0xf6f3a0d0: 6c dd 18 00 17 7b 01 04 00 00 00 01 00 00 00 10 lÝ...{.......... But I think that's just the payload of a SKB? > Redzone 0xf6f3b060: bb bb bb bb »»»» The red-zone has SLUB_RED_INACTIVE ("0xbb") which reinforces use-after-free. > Padding 0xf6f3b088: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ > Pid: 0, comm: swapper Tainted: G W 2.6.26-smp #2 > [<c0180f5d>] print_trailer+0xad/0xf0 > [<c018103b>] check_bytes_and_report+0x9b/0xc0 > [<c018145e>] check_object+0x19e/0x1e0 > [<c01821a4>] __slab_alloc+0x454/0x4f0 > [<c01834d6>] __kmalloc_track_caller+0xe6/0xf0 > [<c03dd1ec>] ? dev_alloc_skb+0x1c/0x30 > [<c03dd1ec>] ? dev_alloc_skb+0x1c/0x30 > [<c03dce79>] __alloc_skb+0x49/0x100 > [<c03dd1ec>] dev_alloc_skb+0x1c/0x30 > [<f8a58599>] ath5k_rxbuf_setup+0x39/0x200 [ath5k] > [<f8a5a697>] ath5k_tasklet_rx+0x127/0x5c0 [ath5k] > [<c014969a>] ? print_lock_contention_bug+0x1a/0xe0 > [<c012eafc>] tasklet_action+0x4c/0xc0 > [<c012e463>] __do_softirq+0x93/0x120 > [<c012e547>] do_softirq+0x57/0x60 > [<c012ea29>] irq_exit+0x69/0x80 > [<c0106b55>] do_IRQ+0x45/0x80 > [<c010a5d0>] ? mwait_idle+0x0/0x50 > [<c0104752>] common_interrupt+0x2e/0x34 > [<c010a5d0>] ? mwait_idle+0x0/0x50 > [<c010a609>] ? mwait_idle+0x39/0x50 > [<c01026e0>] cpu_idle+0x60/0xd0 > [<c043c8ce>] rest_init+0x4e/0x60 > ======================= > FIX kmalloc-4096: Restoring 0xf6f3a080-0xf6f3a0ef=0x6b > > FIX kmalloc-4096: Marking all objects used > [<c0243b4f>] ? security_file_permission+0xf/0x20 > [<c019436f>] sys_select+0x3f/0x190 > [<c01878e9>] ? fput+0x19/0x20 > [<c0103dbf>] ? restore_nocheck+0x12/0x15 > [<c014b06d>] ? trace_hardirqs_on+0xbd/0x140 > [<c0103d5e>] syscall_call+0x7/0xb > ======================= > > Dave, could you please remind us which net driver was in use here? There's ath5k in the stack trace but that, of course, doesn't automatically mean it's at fault here. It could have been just the poor bastard who was the next to allocate 4 KB with kmalloc() noticing the corruption. Hope this helps! Pekka ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG] wireless : cpu stuck for 61s 2008-07-31 9:15 ` Pekka J Enberg @ 2008-07-31 9:50 ` Tomas Winkler 2008-07-31 9:53 ` Pekka Enberg 2008-08-01 7:32 ` Dave Young 1 sibling, 1 reply; 19+ messages in thread From: Tomas Winkler @ 2008-07-31 9:50 UTC (permalink / raw) To: Pekka J Enberg Cc: Andrew Morton, Dave Young, Johannes Berg, linux-kernel, linux-wireless On Thu, Jul 31, 2008 at 12:15 PM, Pekka J Enberg <penberg@cs.helsinki.fi> wrote: > Hi Andrew, > > On Wed, 30 Jul 2008, Andrew Morton wrote: >> > Ok here it is. >> > BTW, I run "klogd -c 7" after boot >> >> The sysrq output is still missing lots of stuff. I guess we broke it. >> >> > >> > This time I get a kmalloc poison overwritten: >> > >> >> argh, that stuff hurts my brain. None of the numbers seem to make any >> sense for a 4k allocation :( Pekka, do you have time to decrypt this? > > Sure. Here goes: > > On Wed, 30 Jul 2008, Andrew Morton wrote: >> <fixes wordwrapping, cleans stuff up> >> >> ============================================================================= >> BUG kmalloc-4096: Poison overwritten >> ----------------------------------------------------------------------------- >> >> INFO: 0xf6f3a080-0xf6f3a0ef. First byte 0x80 instead of 0x6b > > That's POISON_FREE ("0x6b") overwritten which means use-after-free for > the range of 0xf6f3a080 - 0xf6f3a0ef (112 bytes). The rest of the > object is okay but the SLUB debugging code only dumps the first 128 bytes > of the object which is why we don't see the full corruption. > > 2.6.27-rc1 should dump the full object so I'm assuming this is pre -rc1? > >> INFO: Allocated in dev_alloc_skb+0x1c/0x30 age=3642 cpu=0 pid=0 >> INFO: Freed in skb_release_data+0x57/0x80 age=3146 cpu=0 pid=2398 > > So the corrupted object was free'd by skb_release_data() so we need to > look for a driver or the networking stack calling that function too early. > >> INFO: Slab 0xc1c05440 objects=7 used=3 fp=0xf6f3a060 flags=0x400020c3 >> INFO: Object 0xf6f3a060 @offset=8288 fp=0xf6f39030 >> >> Bytes b4 0xf6f3a050: 5e 09 00 00 57 c9 05 00 5a 5a 5a 5a 5a 5a 5a 5a ^...WÉ..ZZZZZZZZ > > The object starts here (the poison values for first 32 bytes are okay): > >> Object 0xf6f3a060: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk >> Object 0xf6f3a070: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk > > And here are the first 96 bytes of the corruption: > >> Object 0xf6f3a080: 80 00 00 00 ff ff ff ff ff ff 00 17 7b 00 46 40 ....ÿÿÿÿÿÿ..{.F@ >> Object 0xf6f3a090: 00 17 7b 00 46 40 30 09 81 21 08 7a 21 00 00 00 ..{.F@0..!.z!... >> Object 0xf6f3a0a0: 64 00 21 04 00 07 00 00 00 00 00 00 00 01 08 82 d.!............. >> Object 0xf6f3a0b0: 84 8b 0c 12 96 18 24 03 01 01 05 04 00 02 00 00 ......$......... >> Object 0xf6f3a0c0: 07 06 43 4e 20 01 0d 14 2a 01 00 32 04 30 48 60 ..CN....*..2.0H` >> Object 0xf6f3a0d0: 6c dd 18 00 17 7b 01 04 00 00 00 01 00 00 00 10 lÝ...{.......... > > But I think that's just the payload of a SKB? > >> Redzone 0xf6f3b060: bb bb bb bb »»»» > > The red-zone has SLUB_RED_INACTIVE ("0xbb") which reinforces > use-after-free. > >> Padding 0xf6f3b088: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ >> Pid: 0, comm: swapper Tainted: G W 2.6.26-smp #2 >> [<c0180f5d>] print_trailer+0xad/0xf0 >> [<c018103b>] check_bytes_and_report+0x9b/0xc0 >> [<c018145e>] check_object+0x19e/0x1e0 >> [<c01821a4>] __slab_alloc+0x454/0x4f0 >> [<c01834d6>] __kmalloc_track_caller+0xe6/0xf0 >> [<c03dd1ec>] ? dev_alloc_skb+0x1c/0x30 >> [<c03dd1ec>] ? dev_alloc_skb+0x1c/0x30 >> [<c03dce79>] __alloc_skb+0x49/0x100 >> [<c03dd1ec>] dev_alloc_skb+0x1c/0x30 >> [<f8a58599>] ath5k_rxbuf_setup+0x39/0x200 [ath5k] >> [<f8a5a697>] ath5k_tasklet_rx+0x127/0x5c0 [ath5k] >> [<c014969a>] ? print_lock_contention_bug+0x1a/0xe0 >> [<c012eafc>] tasklet_action+0x4c/0xc0 >> [<c012e463>] __do_softirq+0x93/0x120 >> [<c012e547>] do_softirq+0x57/0x60 >> [<c012ea29>] irq_exit+0x69/0x80 >> [<c0106b55>] do_IRQ+0x45/0x80 >> [<c010a5d0>] ? mwait_idle+0x0/0x50 >> [<c0104752>] common_interrupt+0x2e/0x34 >> [<c010a5d0>] ? mwait_idle+0x0/0x50 >> [<c010a609>] ? mwait_idle+0x39/0x50 >> [<c01026e0>] cpu_idle+0x60/0xd0 >> [<c043c8ce>] rest_init+0x4e/0x60 >> ======================= >> FIX kmalloc-4096: Restoring 0xf6f3a080-0xf6f3a0ef=0x6b >> >> FIX kmalloc-4096: Marking all objects used >> [<c0243b4f>] ? security_file_permission+0xf/0x20 >> [<c019436f>] sys_select+0x3f/0x190 >> [<c01878e9>] ? fput+0x19/0x20 >> [<c0103dbf>] ? restore_nocheck+0x12/0x15 >> [<c014b06d>] ? trace_hardirqs_on+0xbd/0x140 >> [<c0103d5e>] syscall_call+0x7/0xb >> ======================= >> >> Dave, could you please remind us which net driver was in use here? > > There's ath5k in the stack trace but that, of course, doesn't > automatically mean it's at fault here. It could have been just the poor > bastard who was the next to allocate 4 KB with kmalloc() noticing the > corruption. > > Hope this helps! I've seen something similar with fragmentation code in iwl4965 but I can reproduce it only when using SLAB. With SLUB it didn't shown up. Does anyone know what is difference between SLAB and SLUB in this context.? Tomas ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG] wireless : cpu stuck for 61s 2008-07-31 9:50 ` Tomas Winkler @ 2008-07-31 9:53 ` Pekka Enberg 2008-07-31 10:29 ` Tomas Winkler 0 siblings, 1 reply; 19+ messages in thread From: Pekka Enberg @ 2008-07-31 9:53 UTC (permalink / raw) To: Tomas Winkler Cc: Andrew Morton, Dave Young, Johannes Berg, linux-kernel, linux-wireless On Thu, 2008-07-31 at 12:50 +0300, Tomas Winkler wrote: > > There's ath5k in the stack trace but that, of course, doesn't > > automatically mean it's at fault here. It could have been just the poor > > bastard who was the next to allocate 4 KB with kmalloc() noticing the > > corruption. > > > > Hope this helps! > > I've seen something similar with fragmentation code in iwl4965 but I > can reproduce it only when using SLAB. With SLUB it didn't shown up. > Does anyone know what is difference between SLAB and SLUB in this context.? Do you have CONFIG_SLUB_DEBUG_ON enabled? (Note: if you only have CONFIG_SLUB_DEBUG enabled, then you need to enable debugging at run-time by passing slub_debug to kernel command line.) ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG] wireless : cpu stuck for 61s 2008-07-31 9:53 ` Pekka Enberg @ 2008-07-31 10:29 ` Tomas Winkler 0 siblings, 0 replies; 19+ messages in thread From: Tomas Winkler @ 2008-07-31 10:29 UTC (permalink / raw) To: Pekka Enberg Cc: Andrew Morton, Dave Young, Johannes Berg, linux-kernel, linux-wireless On Thu, Jul 31, 2008 at 12:53 PM, Pekka Enberg <penberg@cs.helsinki.fi> wrote: > On Thu, 2008-07-31 at 12:50 +0300, Tomas Winkler wrote: >> > There's ath5k in the stack trace but that, of course, doesn't >> > automatically mean it's at fault here. It could have been just the poor >> > bastard who was the next to allocate 4 KB with kmalloc() noticing the >> > corruption. >> > >> > Hope this helps! >> >> I've seen something similar with fragmentation code in iwl4965 but I >> can reproduce it only when using SLAB. With SLUB it didn't shown up. >> Does anyone know what is difference between SLAB and SLUB in this context.? > > Do you have CONFIG_SLUB_DEBUG_ON enabled? (Note: if you only have > CONFIG_SLUB_DEBUG enabled, then you need to enable debugging at run-time > by passing slub_debug to kernel command line.) Yes sure I've done that with slub_debug=FZPU Tomas ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG] wireless : cpu stuck for 61s 2008-07-31 9:15 ` Pekka J Enberg 2008-07-31 9:50 ` Tomas Winkler @ 2008-08-01 7:32 ` Dave Young 2008-08-04 9:22 ` Jiri Slaby 2008-08-04 10:00 ` Jiri Slaby 1 sibling, 2 replies; 19+ messages in thread From: Dave Young @ 2008-08-01 7:32 UTC (permalink / raw) To: Pekka J Enberg; +Cc: Andrew Morton, Johannes Berg, linux-kernel, linux-wireless [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain; charset=UTF-8, Size: 5848 bytes --] On Thu, Jul 31, 2008 at 5:15 PM, Pekka J Enberg <penberg@cs.helsinki.fi> wrote:> Hi Andrew,>> On Wed, 30 Jul 2008, Andrew Morton wrote:>> > Ok here it is.>> > BTW, I run "klogd -c 7" after boot>>>> The sysrq output is still missing lots of stuff. I guess we broke it.>>>> >>> > This time I get a kmalloc poison overwritten:>> >>>>> argh, that stuff hurts my brain. None of the numbers seem to make any>> sense for a 4k allocation :( Pekka, do you have time to decrypt this?>> Sure. Here goes:>> On Wed, 30 Jul 2008, Andrew Morton wrote:>> <fixes wordwrapping, cleans stuff up>>>>> =============================================================================>> BUG kmalloc-4096: Poison overwritten>> ----------------------------------------------------------------------------->>>> INFO: 0xf6f3a080-0xf6f3a0ef. First byte 0x80 instead of 0x6b>> That's POISON_FREE ("0x6b") overwritten which means use-after-free for> the range of 0xf6f3a080 - 0xf6f3a0ef (112 bytes). The rest of the> object is okay but the SLUB debugging code only dumps the first 128 bytes> of the object which is why we don't see the full corruption.>> 2.6.27-rc1 should dump the full object so I'm assuming this is pre -rc1? Yes, it's 2.6.26 >>> INFO: Allocated in dev_alloc_skb+0x1c/0x30 age=3642 cpu=0 pid=0>> INFO: Freed in skb_release_data+0x57/0x80 age=3146 cpu=0 pid=2398>> So the corrupted object was free'd by skb_release_data() so we need to> look for a driver or the networking stack calling that function too early.>>> INFO: Slab 0xc1c05440 objects=7 used=3 fp=0xf6f3a060 flags=0x400020c3>> INFO: Object 0xf6f3a060 @offset=8288 fp=0xf6f39030>>>> Bytes b4 0xf6f3a050: 5e 09 00 00 57 c9 05 00 5a 5a 5a 5a 5a 5a 5a 5a ^...WÃ..ZZZZZZZZ>> The object starts here (the poison values for first 32 bytes are okay):>>> Object 0xf6f3a060: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk>> Object 0xf6f3a070: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk>> And here are the first 96 bytes of the corruption:>>> Object 0xf6f3a080: 80 00 00 00 ff ff ff ff ff ff 00 17 7b 00 46 40 ....ÿÿÿÿÿÿ..{.F@>> Object 0xf6f3a090: 00 17 7b 00 46 40 30 09 81 21 08 7a 21 00 00 00 ..{.F@0..!.z!...>> Object 0xf6f3a0a0: 64 00 21 04 00 07 00 00 00 00 00 00 00 01 08 82 d.!.............>> Object 0xf6f3a0b0: 84 8b 0c 12 96 18 24 03 01 01 05 04 00 02 00 00 ......$.........>> Object 0xf6f3a0c0: 07 06 43 4e 20 01 0d 14 2a 01 00 32 04 30 48 60 ..CN....*..2.0H`>> Object 0xf6f3a0d0: 6c dd 18 00 17 7b 01 04 00 00 00 01 00 00 00 10 lÃ...{..........>> But I think that's just the payload of a SKB?>>> Redzone 0xf6f3b060: bb bb bb bb »»»»>> The red-zone has SLUB_RED_INACTIVE ("0xbb") which reinforces> use-after-free.>>> Padding 0xf6f3b088: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ>> Pid: 0, comm: swapper Tainted: G W 2.6.26-smp #2>> [<c0180f5d>] print_trailer+0xad/0xf0>> [<c018103b>] check_bytes_and_report+0x9b/0xc0>> [<c018145e>] check_object+0x19e/0x1e0>> [<c01821a4>] __slab_alloc+0x454/0x4f0>> [<c01834d6>] __kmalloc_track_caller+0xe6/0xf0>> [<c03dd1ec>] ? dev_alloc_skb+0x1c/0x30>> [<c03dd1ec>] ? dev_alloc_skb+0x1c/0x30>> [<c03dce79>] __alloc_skb+0x49/0x100>> [<c03dd1ec>] dev_alloc_skb+0x1c/0x30>> [<f8a58599>] ath5k_rxbuf_setup+0x39/0x200 [ath5k]>> [<f8a5a697>] ath5k_tasklet_rx+0x127/0x5c0 [ath5k]>> [<c014969a>] ? print_lock_contention_bug+0x1a/0xe0>> [<c012eafc>] tasklet_action+0x4c/0xc0>> [<c012e463>] __do_softirq+0x93/0x120>> [<c012e547>] do_softirq+0x57/0x60>> [<c012ea29>] irq_exit+0x69/0x80>> [<c0106b55>] do_IRQ+0x45/0x80>> [<c010a5d0>] ? mwait_idle+0x0/0x50>> [<c0104752>] common_interrupt+0x2e/0x34>> [<c010a5d0>] ? mwait_idle+0x0/0x50>> [<c010a609>] ? mwait_idle+0x39/0x50>> [<c01026e0>] cpu_idle+0x60/0xd0>> [<c043c8ce>] rest_init+0x4e/0x60>> =======================>> FIX kmalloc-4096: Restoring 0xf6f3a080-0xf6f3a0ef=0x6b>>>> FIX kmalloc-4096: Marking all objects used>> [<c0243b4f>] ? security_file_permission+0xf/0x20>> [<c019436f>] sys_select+0x3f/0x190>> [<c01878e9>] ? fput+0x19/0x20>> [<c0103dbf>] ? restore_nocheck+0x12/0x15>> [<c014b06d>] ? trace_hardirqs_on+0xbd/0x140>> [<c0103d5e>] syscall_call+0x7/0xb>> =======================>>>> Dave, could you please remind us which net driver was in use here?>> There's ath5k in the stack trace but that, of course, doesn't> automatically mean it's at fault here. It could have been just the poor> bastard who was the next to allocate 4 KB with kmalloc() noticing the> corruption.>> Hope this helps!>> Pekka Lockdep helped me, this morning I get a lockdep warning about this, [ 171.432140] [ INFO: possible recursive locking detected ][ 171.433113] 2.6.27-rc1-smp #4[ 171.434079] ---------------------------------------------[ 171.435039] ath5k_pci/2447 is trying to acquire lock:[ 171.435990] (&sc->lock){--..}, at: [<f89ee9b5>]ath5k_config_interface+0xd5/0x340 [ath5k][ 171.437046][ 171.437048] but task is already holding lock:[ 171.438903] (&sc->lock){--..}, at: [<f89ee91d>]ath5k_config_interface+0x3d/0x340 [ath5k][ 171.439953][ 171.439954] other info that might help us debug this:[ 171.441795] 3 locks held by ath5k_pci/2447:[ 171.442729] #0: ((name)){--..}, at: [<c013a122>] run_workqueue+0x102/0x1d0[ 171.443800] #1: (&(&local->scan_work)->work){--..}, at:[<c013a122>] run_workqueue+0x102/0x1d0[ 171.444859] #2: (&sc->lock){--..}, at: [<f89ee91d>]ath5k_config_interface+0x3d/0x340 [ath5k] Deadlock happen here, I remove the lock in the sub routine, tested andfixed the problem for me.I will send the patch after a while. But I still have no idea with the poison overwritten. -- Regardsdaveÿôèº{.nÇ+·®+%Ëÿ±éݶ\x17¥wÿº{.nÇ+·¥{±þG«éÿ{ayº\x1dÊÚë,j\a¢f£¢·hïêÿêçz_è®\x03(éÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?¨èÚ&£ø§~á¶iOæ¬z·vØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?I¥ ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG] wireless : cpu stuck for 61s 2008-08-01 7:32 ` Dave Young @ 2008-08-04 9:22 ` Jiri Slaby 2008-08-04 10:00 ` Jiri Slaby 1 sibling, 0 replies; 19+ messages in thread From: Jiri Slaby @ 2008-08-04 9:22 UTC (permalink / raw) To: Dave Young Cc: Pekka J Enberg, Andrew Morton, Johannes Berg, linux-kernel, linux-wireless Dave Young napsal(a): > On Thu, Jul 31, 2008 at 5:15 PM, Pekka J Enberg <penberg@cs.helsinki.fi> wrote: >> On Wed, 30 Jul 2008, Andrew Morton wrote: >>> INFO: Allocated in dev_alloc_skb+0x1c/0x30 age=3642 cpu=0 pid=0 >>> INFO: Freed in skb_release_data+0x57/0x80 age=3146 cpu=0 pid=2398 >> So the corrupted object was free'd by skb_release_data() so we need to >> look for a driver or the networking stack calling that function too early. >> >>> INFO: Slab 0xc1c05440 objects=7 used=3 fp=0xf6f3a060 flags=0x400020c3 >>> INFO: Object 0xf6f3a060 @offset=8288 fp=0xf6f39030 >>> >>> Bytes b4 0xf6f3a050: 5e 09 00 00 57 c9 05 00 5a 5a 5a 5a 5a 5a 5a 5a ^...WÉ..ZZZZZZZZ >> The object starts here (the poison values for first 32 bytes are okay): >> >>> Object 0xf6f3a060: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk >>> Object 0xf6f3a070: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk >> And here are the first 96 bytes of the corruption: >> >>> Object 0xf6f3a080: 80 00 00 00 ff ff ff ff ff ff 00 17 7b 00 46 40 ....ÿÿÿÿÿÿ..{.F@ >>> Object 0xf6f3a090: 00 17 7b 00 46 40 30 09 81 21 08 7a 21 00 00 00 ..{.F@0..!.z!... >>> Object 0xf6f3a0a0: 64 00 21 04 00 07 00 00 00 00 00 00 00 01 08 82 d.!............. >>> Object 0xf6f3a0b0: 84 8b 0c 12 96 18 24 03 01 01 05 04 00 02 00 00 ......$......... >>> Object 0xf6f3a0c0: 07 06 43 4e 20 01 0d 14 2a 01 00 32 04 30 48 60 ..CN....*..2.0H` >>> Object 0xf6f3a0d0: 6c dd 18 00 17 7b 01 04 00 00 00 01 00 00 00 10 lÝ...{.......... >> But I think that's just the payload of a SKB? It's a receive frame from ath5k, I suppose. 00:17:7b:00:46:40 is your AP? >>> Redzone 0xf6f3b060: bb bb bb bb »»»» >> The red-zone has SLUB_RED_INACTIVE ("0xbb") which reinforces >> use-after-free. >> >>> Padding 0xf6f3b088: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ >>> Pid: 0, comm: swapper Tainted: G W 2.6.26-smp #2 >>> [<c0180f5d>] print_trailer+0xad/0xf0 >>> [<c018103b>] check_bytes_and_report+0x9b/0xc0 >>> [<c018145e>] check_object+0x19e/0x1e0 >>> [<c01821a4>] __slab_alloc+0x454/0x4f0 >>> [<c01834d6>] __kmalloc_track_caller+0xe6/0xf0 >>> [<c03dd1ec>] ? dev_alloc_skb+0x1c/0x30 >>> [<c03dd1ec>] ? dev_alloc_skb+0x1c/0x30 >>> [<c03dce79>] __alloc_skb+0x49/0x100 >>> [<c03dd1ec>] dev_alloc_skb+0x1c/0x30 >>> [<f8a58599>] ath5k_rxbuf_setup+0x39/0x200 [ath5k] >>> [<f8a5a697>] ath5k_tasklet_rx+0x127/0x5c0 [ath5k] >>> [<c014969a>] ? print_lock_contention_bug+0x1a/0xe0 >>> [<c012eafc>] tasklet_action+0x4c/0xc0 [...] >>> ======================= >>> FIX kmalloc-4096: Restoring 0xf6f3a080-0xf6f3a0ef=0x6b >>> Dave, could you please remind us which net driver was in use here? >> There's ath5k in the stack trace but that, of course, doesn't >> automatically mean it's at fault here. It could have been just the poor >> bastard who was the next to allocate 4 KB with kmalloc() noticing the >> corruption. No, unfortunately ath5k *is* likely the culprit. Next time please Cc ath5k-devel@lists.ath5k.org even if it is only a suspicion. > But I still have no idea with the poison overwritten. Could you try patch from http://lkml.org/lkml/2008/7/15/276 ? (I have no idea how reproducible is this by you, it often happens on noisy channels and/or by lowering RX buffers, i.e. ATH_RXBUF). [It hit mainline few days ago, I'm going to fwd it to stable.] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG] wireless : cpu stuck for 61s 2008-08-01 7:32 ` Dave Young 2008-08-04 9:22 ` Jiri Slaby @ 2008-08-04 10:00 ` Jiri Slaby 2008-08-05 1:29 ` Dave Young 1 sibling, 1 reply; 19+ messages in thread From: Jiri Slaby @ 2008-08-04 10:00 UTC (permalink / raw) To: Dave Young Cc: Pekka J Enberg, Andrew Morton, Johannes Berg, linux-kernel, linux-wireless, Bob Copeland Dave Young napsal(a): > Lockdep helped me, this morning I get a lockdep warning about this, > > [ 171.432140] [ INFO: possible recursive locking detected ] > [ 171.433113] 2.6.27-rc1-smp #4 > [ 171.434079] --------------------------------------------- > [ 171.435039] ath5k_pci/2447 is trying to acquire lock: > [ 171.435990] (&sc->lock){--..}, at: [<f89ee9b5>] > ath5k_config_interface+0xd5/0x340 [ath5k] > [ 171.437046] > [ 171.437048] but task is already holding lock: > [ 171.438903] (&sc->lock){--..}, at: [<f89ee91d>] > ath5k_config_interface+0x3d/0x340 [ath5k] > [ 171.439953] > [ 171.439954] other info that might help us debug this: > [ 171.441795] 3 locks held by ath5k_pci/2447: > [ 171.442729] #0: ((name)){--..}, at: [<c013a122>] run_workqueue+0x102/0x1d0 > [ 171.443800] #1: (&(&local->scan_work)->work){--..}, at: > [<c013a122>] run_workqueue+0x102/0x1d0 > [ 171.444859] #2: (&sc->lock){--..}, at: [<f89ee91d>] > ath5k_config_interface+0x3d/0x340 [ath5k] > > Deadlock happen here, I remove the lock in the sub routine, tested and > fixed the problem for me. > I will send the patch after a while. That's some kind different. 2.6.26 can't suffer from the thing above (opposing to 27-rc1). Pasting the original issue below. Maybe it's been fixed too? I suppose you don't encounter the problem anymore in 27-rc1 with your/Bob's/mine (fix for the same thing in the same manner) patch applied? BUG: soft lockup - CPU#0 stuck for 61s! [events/0:9] Modules linked in: radeonfb fb_ddc i2c_algo_bit snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss rfcomm l2cap bluetooth vfat fuse rtc_cmos rtc_core psmouse intel_agp pl2303 rtc_lib ath5k e100 i2c_i801 serio_raw usbserial agpgart 3c59x mii snd_hda_intel snd_pcm mac80211 snd_timer snd_page_alloc cfg80211 snd_hwdep snd soundcore thermal processor button thermal_sys evdev sg Pid: 9, comm: events/0 Tainted: G W (2.6.26-smp #3) EIP: 0060:[<c0450bc3>] EFLAGS: 00000246 CPU: 0 EIP is at mutex_lock_nested+0x1b3/0x280 EAX: 00000000 EBX: f7bf5284 ECX: f7c7e000 EDX: f7bf52a8 ESI: f7bf5288 EDI: 00000246 EBP: f7c7fedc ESP: f7c7fe9c DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 CR0: 8005003b CR2: b800c5ac CR3: 36e71000 CR4: 000006d0 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 DR6: ffff0ff0 DR7: 00000400 [<c02375a6>] ? debugfs_create_file+0x46/0x210 [<c02375a6>] ? debugfs_create_file+0x46/0x210 [<c02375a6>] debugfs_create_file+0x46/0x210 [<c02377f1>] debugfs_create_dir+0x21/0x30 [<f8901f6d>] ieee80211_sta_debugfs_add+0x2d/0x150 [mac80211] [<f88eba89>] sta_info_debugfs_add_work+0x89/0x130 [mac80211] [<f890a170>] ? rate_control_pid_add_sta_debugfs+0x0/0x30 [mac80211] [<c01394c6>] run_workqueue+0x156/0x1d0 [<c0139472>] ? run_workqueue+0x102/0x1d0 [<f88eba00>] ? sta_info_debugfs_add_work+0x0/0x130 [mac80211] [<c0139f18>] worker_thread+0x88/0xe0 [<c013cd20>] ? autoremove_wake_function+0x0/0x40 [<c0139e90>] ? worker_thread+0x0/0xe0 [<c013ca32>] kthread+0x42/0x70 [<c013c9f0>] ? kthread+0x0/0x70 [<c0104993>] kernel_thread_helper+0x7/0x14 ======================= BUG: soft lockup - CPU#0 stuck for 61s! [events/0:9] Modules linked in: radeonfb fb_ddc i2c_algo_bit snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss rfcomm l2cap bluetooth vfat fuse rtc_cmos rtc_core psmouse intel_agp pl2303 rtc_lib ath5k e100 i2c_i801 serio_raw usbserial agpgart 3c59x mii snd_hda_intel snd_pcm mac80211 snd_timer snd_page_alloc cfg80211 snd_hwdep snd soundcore thermal processor button thermal_sys evdev sg Pid: 9, comm: events/0 Tainted: G W (2.6.26-smp #3) EIP: 0060:[<c0450bc3>] EFLAGS: 00000246 CPU: 0 EIP is at mutex_lock_nested+0x1b3/0x280 EAX: 00000000 EBX: f7bf5284 ECX: f7c7e000 EDX: f7bf52a8 ESI: f7bf5288 EDI: 00000246 EBP: f7c7fedc ESP: f7c7fe9c DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 CR0: 8005003b CR2: b800c5ac CR3: 36e71000 CR4: 000006d0 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 DR6: ffff0ff0 DR7: 00000400 [<c02375a6>] ? debugfs_create_file+0x46/0x210 [<c02375a6>] ? debugfs_create_file+0x46/0x210 [<c02375a6>] debugfs_create_file+0x46/0x210 [<c02377f1>] debugfs_create_dir+0x21/0x30 [<f8901f6d>] ieee80211_sta_debugfs_add+0x2d/0x150 [mac80211] [<f88eba89>] sta_info_debugfs_add_work+0x89/0x130 [mac80211] [<f890a170>] ? rate_control_pid_add_sta_debugfs+0x0/0x30 [mac80211] [<c01394c6>] run_workqueue+0x156/0x1d0 [<c0139472>] ? run_workqueue+0x102/0x1d0 [<f88eba00>] ? sta_info_debugfs_add_work+0x0/0x130 [mac80211] [<c0139f18>] worker_thread+0x88/0xe0 [<c013cd20>] ? autoremove_wake_function+0x0/0x40 [<c0139e90>] ? worker_thread+0x0/0xe0 [<c013ca32>] kthread+0x42/0x70 [<c013c9f0>] ? kthread+0x0/0x70 [<c0104993>] kernel_thread_helper+0x7/0x14 ======================= ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG] wireless : cpu stuck for 61s 2008-08-04 10:00 ` Jiri Slaby @ 2008-08-05 1:29 ` Dave Young 2008-08-05 12:24 ` Bob Copeland 0 siblings, 1 reply; 19+ messages in thread From: Dave Young @ 2008-08-05 1:29 UTC (permalink / raw) To: Jiri Slaby Cc: Pekka J Enberg, Andrew Morton, Johannes Berg, linux-kernel, linux-wireless, Bob Copeland On Mon, Aug 4, 2008 at 6:00 PM, Jiri Slaby <jirislaby@gmail.com> wrote: > Dave Young napsal(a): >> >> Lockdep helped me, this morning I get a lockdep warning about this, >> >> [ 171.432140] [ INFO: possible recursive locking detected ] >> [ 171.433113] 2.6.27-rc1-smp #4 >> [ 171.434079] --------------------------------------------- >> [ 171.435039] ath5k_pci/2447 is trying to acquire lock: >> [ 171.435990] (&sc->lock){--..}, at: [<f89ee9b5>] >> ath5k_config_interface+0xd5/0x340 [ath5k] >> [ 171.437046] >> [ 171.437048] but task is already holding lock: >> [ 171.438903] (&sc->lock){--..}, at: [<f89ee91d>] >> ath5k_config_interface+0x3d/0x340 [ath5k] >> [ 171.439953] >> [ 171.439954] other info that might help us debug this: >> [ 171.441795] 3 locks held by ath5k_pci/2447: >> [ 171.442729] #0: ((name)){--..}, at: [<c013a122>] >> run_workqueue+0x102/0x1d0 >> [ 171.443800] #1: (&(&local->scan_work)->work){--..}, at: >> [<c013a122>] run_workqueue+0x102/0x1d0 >> [ 171.444859] #2: (&sc->lock){--..}, at: [<f89ee91d>] >> ath5k_config_interface+0x3d/0x340 [ath5k] >> >> Deadlock happen here, I remove the lock in the sub routine, tested and >> fixed the problem for me. >> I will send the patch after a while. > > That's some kind different. 2.6.26 can't suffer from the thing above > (opposing to 27-rc1). Yes, it's got with 27-rc1 > > Pasting the original issue below. Maybe it's been fixed too? I suppose you > don't encounter the problem anymore in 27-rc1 with your/Bob's/mine (fix for > the same thing in the same manner) patch applied? With the patch I cann't reproduce the bug with 27-rc1 now. > > BUG: soft lockup - CPU#0 stuck for 61s! [events/0:9] > Modules linked in: radeonfb fb_ddc i2c_algo_bit snd_seq_dummy snd_seq_oss > snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss rfcomm > l2cap bluetooth vfat fuse rtc_cmos rtc_core psmouse intel_agp pl2303 rtc_lib > ath5k e100 i2c_i801 serio_raw usbserial agpgart 3c59x mii snd_hda_intel > snd_pcm mac80211 snd_timer snd_page_alloc cfg80211 snd_hwdep snd soundcore > thermal processor button thermal_sys evdev sg > > Pid: 9, comm: events/0 Tainted: G W (2.6.26-smp #3) > EIP: 0060:[<c0450bc3>] EFLAGS: 00000246 CPU: 0 > EIP is at mutex_lock_nested+0x1b3/0x280 > EAX: 00000000 EBX: f7bf5284 ECX: f7c7e000 EDX: f7bf52a8 > ESI: f7bf5288 EDI: 00000246 EBP: f7c7fedc ESP: f7c7fe9c > DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 > CR0: 8005003b CR2: b800c5ac CR3: 36e71000 CR4: 000006d0 > DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 > DR6: ffff0ff0 DR7: 00000400 > [<c02375a6>] ? debugfs_create_file+0x46/0x210 > [<c02375a6>] ? debugfs_create_file+0x46/0x210 > [<c02375a6>] debugfs_create_file+0x46/0x210 > [<c02377f1>] debugfs_create_dir+0x21/0x30 > [<f8901f6d>] ieee80211_sta_debugfs_add+0x2d/0x150 [mac80211] > [<f88eba89>] sta_info_debugfs_add_work+0x89/0x130 [mac80211] > [<f890a170>] ? rate_control_pid_add_sta_debugfs+0x0/0x30 [mac80211] > [<c01394c6>] run_workqueue+0x156/0x1d0 > [<c0139472>] ? run_workqueue+0x102/0x1d0 > [<f88eba00>] ? sta_info_debugfs_add_work+0x0/0x130 [mac80211] > [<c0139f18>] worker_thread+0x88/0xe0 > [<c013cd20>] ? autoremove_wake_function+0x0/0x40 > [<c0139e90>] ? worker_thread+0x0/0xe0 > [<c013ca32>] kthread+0x42/0x70 > [<c013c9f0>] ? kthread+0x0/0x70 > [<c0104993>] kernel_thread_helper+0x7/0x14 > ======================= > BUG: soft lockup - CPU#0 stuck for 61s! [events/0:9] > Modules linked in: radeonfb fb_ddc i2c_algo_bit snd_seq_dummy snd_seq_oss > snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss rfcomm > l2cap bluetooth vfat fuse rtc_cmos rtc_core psmouse intel_agp pl2303 rtc_lib > ath5k e100 i2c_i801 serio_raw usbserial agpgart 3c59x mii snd_hda_intel > snd_pcm mac80211 snd_timer snd_page_alloc cfg80211 snd_hwdep snd soundcore > thermal processor button thermal_sys evdev sg > > Pid: 9, comm: events/0 Tainted: G W (2.6.26-smp #3) > EIP: 0060:[<c0450bc3>] EFLAGS: 00000246 CPU: 0 > EIP is at mutex_lock_nested+0x1b3/0x280 > EAX: 00000000 EBX: f7bf5284 ECX: f7c7e000 EDX: f7bf52a8 > ESI: f7bf5288 EDI: 00000246 EBP: f7c7fedc ESP: f7c7fe9c > DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 > CR0: 8005003b CR2: b800c5ac CR3: 36e71000 CR4: 000006d0 > DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 > DR6: ffff0ff0 DR7: 00000400 > [<c02375a6>] ? debugfs_create_file+0x46/0x210 > [<c02375a6>] ? debugfs_create_file+0x46/0x210 > [<c02375a6>] debugfs_create_file+0x46/0x210 > [<c02377f1>] debugfs_create_dir+0x21/0x30 > [<f8901f6d>] ieee80211_sta_debugfs_add+0x2d/0x150 [mac80211] > [<f88eba89>] sta_info_debugfs_add_work+0x89/0x130 [mac80211] > [<f890a170>] ? rate_control_pid_add_sta_debugfs+0x0/0x30 [mac80211] > [<c01394c6>] run_workqueue+0x156/0x1d0 > [<c0139472>] ? run_workqueue+0x102/0x1d0 > [<f88eba00>] ? sta_info_debugfs_add_work+0x0/0x130 [mac80211] > [<c0139f18>] worker_thread+0x88/0xe0 > [<c013cd20>] ? autoremove_wake_function+0x0/0x40 > [<c0139e90>] ? worker_thread+0x0/0xe0 > [<c013ca32>] kthread+0x42/0x70 > [<c013c9f0>] ? kthread+0x0/0x70 > [<c0104993>] kernel_thread_helper+0x7/0x14 > ======================= > -- Regards dave ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG] wireless : cpu stuck for 61s 2008-08-05 1:29 ` Dave Young @ 2008-08-05 12:24 ` Bob Copeland 2008-08-06 1:51 ` Dave Young 0 siblings, 1 reply; 19+ messages in thread From: Bob Copeland @ 2008-08-05 12:24 UTC (permalink / raw) To: Dave Young Cc: Jiri Slaby, Pekka J Enberg, Andrew Morton, Johannes Berg, linux-kernel, linux-wireless On Tue, Aug 05, 2008 at 09:29:26AM +0800, Dave Young wrote: > With the patch I cann't reproduce the bug with 27-rc1 now. > > > [<c02375a6>] ? debugfs_create_file+0x46/0x210 > > [<c02375a6>] ? debugfs_create_file+0x46/0x210 > > [<c02375a6>] debugfs_create_file+0x46/0x210 > > [<c02377f1>] debugfs_create_dir+0x21/0x30 > > [<f8901f6d>] ieee80211_sta_debugfs_add+0x2d/0x150 [mac80211] > > [<f88eba89>] sta_info_debugfs_add_work+0x89/0x130 [mac80211] > > [<f890a170>] ? rate_control_pid_add_sta_debugfs+0x0/0x30 [mac80211] I wonder if there were two separate problems here. I looked into this with some detail yesterday and agree with Johannes that the above trace is on locking the parent directory's i_mutex, but I too couldn't see any problems with sta_info_debugfs_add_work. Other stuff could also modify the directory with or without rtnl_lock, but not in a way that to my untrained eyes would lead to deadlock. Or is the trace just wrong? -- Bob Copeland %% www.bobcopeland.com ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG] wireless : cpu stuck for 61s 2008-08-05 12:24 ` Bob Copeland @ 2008-08-06 1:51 ` Dave Young 2008-08-06 1:53 ` Dave Young 2008-08-12 4:19 ` Dave Young 0 siblings, 2 replies; 19+ messages in thread From: Dave Young @ 2008-08-06 1:51 UTC (permalink / raw) To: Bob Copeland Cc: Jiri Slaby, Pekka J Enberg, Andrew Morton, Johannes Berg, linux-kernel, linux-wireless On Tue, Aug 5, 2008 at 8:24 PM, Bob Copeland <me@bobcopeland.com> wrote: > On Tue, Aug 05, 2008 at 09:29:26AM +0800, Dave Young wrote: >> With the patch I cann't reproduce the bug with 27-rc1 now. >> >> > [<c02375a6>] ? debugfs_create_file+0x46/0x210 >> > [<c02375a6>] ? debugfs_create_file+0x46/0x210 >> > [<c02375a6>] debugfs_create_file+0x46/0x210 >> > [<c02377f1>] debugfs_create_dir+0x21/0x30 >> > [<f8901f6d>] ieee80211_sta_debugfs_add+0x2d/0x150 [mac80211] >> > [<f88eba89>] sta_info_debugfs_add_work+0x89/0x130 [mac80211] >> > [<f890a170>] ? rate_control_pid_add_sta_debugfs+0x0/0x30 [mac80211] > > I wonder if there were two separate problems here. I looked into > this with some detail yesterday and agree with Johannes that the above > trace is on locking the parent directory's i_mutex, but I too couldn't > see any problems with sta_info_debugfs_add_work. Other stuff could also > modify the directory with or without rtnl_lock, but not in a way that > to my untrained eyes would lead to deadlock. Yes,. I think so. It's the original bug for me, while testing I found the mutex deadlock problem. But this week I will have no time to trace it. so if I have time I will keep tracing the problem > > Or is the trace just wrong? > > -- > Bob Copeland %% www.bobcopeland.com > > -- Regards dave ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG] wireless : cpu stuck for 61s 2008-08-06 1:51 ` Dave Young @ 2008-08-06 1:53 ` Dave Young 2008-08-12 4:19 ` Dave Young 1 sibling, 0 replies; 19+ messages in thread From: Dave Young @ 2008-08-06 1:53 UTC (permalink / raw) To: Bob Copeland Cc: Jiri Slaby, Pekka J Enberg, Andrew Morton, Johannes Berg, linux-kernel, linux-wireless On Wed, Aug 6, 2008 at 9:51 AM, Dave Young <hidave.darkstar@gmail.com> wrote: > On Tue, Aug 5, 2008 at 8:24 PM, Bob Copeland <me@bobcopeland.com> wrote: >> On Tue, Aug 05, 2008 at 09:29:26AM +0800, Dave Young wrote: >>> With the patch I cann't reproduce the bug with 27-rc1 now. >>> >>> > [<c02375a6>] ? debugfs_create_file+0x46/0x210 >>> > [<c02375a6>] ? debugfs_create_file+0x46/0x210 >>> > [<c02375a6>] debugfs_create_file+0x46/0x210 >>> > [<c02377f1>] debugfs_create_dir+0x21/0x30 >>> > [<f8901f6d>] ieee80211_sta_debugfs_add+0x2d/0x150 [mac80211] >>> > [<f88eba89>] sta_info_debugfs_add_work+0x89/0x130 [mac80211] >>> > [<f890a170>] ? rate_control_pid_add_sta_debugfs+0x0/0x30 [mac80211] >> >> I wonder if there were two separate problems here. I looked into >> this with some detail yesterday and agree with Johannes that the above >> trace is on locking the parent directory's i_mutex, but I too couldn't >> see any problems with sta_info_debugfs_add_work. Other stuff could also >> modify the directory with or without rtnl_lock, but not in a way that >> to my untrained eyes would lead to deadlock. > > Yes,. I think so. It's the original bug for me, while testing I found > the mutex deadlock problem. > > But this week I will have no time to trace it. so if I have time I > will keep tracing the problem Additional info, With the mutex fix patch, in 2.6.27-rc1 I seems can not reproduce the debugfs_add bug, (maybe need more test) But with 2.6.26, the bug can be reproduced. (The mutex fix patch need not to be applied because there's no such deadlock bug) > >> >> Or is the trace just wrong? >> >> -- >> Bob Copeland %% www.bobcopeland.com >> >> > > > > -- > Regards > dave > -- Regards dave ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [BUG] wireless : cpu stuck for 61s 2008-08-06 1:51 ` Dave Young 2008-08-06 1:53 ` Dave Young @ 2008-08-12 4:19 ` Dave Young 1 sibling, 0 replies; 19+ messages in thread From: Dave Young @ 2008-08-12 4:19 UTC (permalink / raw) To: Bob Copeland Cc: Jiri Slaby, Pekka J Enberg, Andrew Morton, Johannes Berg, linux-kernel, linux-wireless On Wed, Aug 6, 2008 at 9:51 AM, Dave Young <hidave.darkstar@gmail.com> wrote: > On Tue, Aug 5, 2008 at 8:24 PM, Bob Copeland <me@bobcopeland.com> wrote: >> On Tue, Aug 05, 2008 at 09:29:26AM +0800, Dave Young wrote: >>> With the patch I cann't reproduce the bug with 27-rc1 now. >>> >>> > [<c02375a6>] ? debugfs_create_file+0x46/0x210 >>> > [<c02375a6>] ? debugfs_create_file+0x46/0x210 >>> > [<c02375a6>] debugfs_create_file+0x46/0x210 >>> > [<c02377f1>] debugfs_create_dir+0x21/0x30 >>> > [<f8901f6d>] ieee80211_sta_debugfs_add+0x2d/0x150 [mac80211] >>> > [<f88eba89>] sta_info_debugfs_add_work+0x89/0x130 [mac80211] >>> > [<f890a170>] ? rate_control_pid_add_sta_debugfs+0x0/0x30 [mac80211] >> >> I wonder if there were two separate problems here. I looked into >> this with some detail yesterday and agree with Johannes that the above >> trace is on locking the parent directory's i_mutex, but I too couldn't >> see any problems with sta_info_debugfs_add_work. Other stuff could also >> modify the directory with or without rtnl_lock, but not in a way that >> to my untrained eyes would lead to deadlock. > > Yes,. I think so. It's the original bug for me, while testing I found > the mutex deadlock problem. > > But this week I will have no time to trace it. so if I have time I > will keep tracing the problem Unfortunately I have no more time to do this these days. As I can't reproduce it with the latest git kernel I think that It might have been fixed already. > >> >> Or is the trace just wrong? >> >> -- >> Bob Copeland %% www.bobcopeland.com >> >> > > > > -- > Regards > dave > -- Regards dave ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2008-08-12 4:19 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-07-29 5:57 [BUG] wireless : cpu stuck for 61s Dave Young
2008-07-29 12:32 ` Johannes Berg
2008-07-30 9:08 ` Andrew Morton
[not found] ` <a8e1da0807300302m35a0a7c0q2ff2e0ab92a69239@mail.gmail.com>
2008-07-30 10:10 ` Andrew Morton
2008-07-31 2:06 ` Dave Young
2008-07-31 2:56 ` Andrew Morton
2008-07-31 3:01 ` Dave Young
2008-07-31 9:15 ` Pekka J Enberg
2008-07-31 9:50 ` Tomas Winkler
2008-07-31 9:53 ` Pekka Enberg
2008-07-31 10:29 ` Tomas Winkler
2008-08-01 7:32 ` Dave Young
2008-08-04 9:22 ` Jiri Slaby
2008-08-04 10:00 ` Jiri Slaby
2008-08-05 1:29 ` Dave Young
2008-08-05 12:24 ` Bob Copeland
2008-08-06 1:51 ` Dave Young
2008-08-06 1:53 ` Dave Young
2008-08-12 4:19 ` Dave Young
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox