[BUG] wireless : cpu stuck for 61s

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [BUG] wireless : cpu stuck for 61s
@ 2008-07-29  5:57 Dave Young
  2008-07-29 12:32 ` Johannes Berg
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Young @ 2008-07-29  5:57 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-wireless

While set my wlan0 to ad-hoc mode and then set the essid to something,
this bug triggered.

I suspect it's related to debugfs or mutex? But I'm not sure.

Jul 29 13:01:46 darkstar kernel: [ 8033.736493] BUG: soft lockup - CPU#0 stuck for 61s! [events/0:9]
Jul 29 13:01:46 darkstar kernel: [ 8033.736493] Modules linked in: radeonfb fb_ddc i2c_algo_bit snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss rfcomm l2cap bluetooth vfat fuse rtc_cmos rtc_core psmouse intel_agp pl2303 rtc_lib ath5k e100 i2c_i801 serio_raw usbserial agpgart 3c59x mii snd_hda_intel snd_pcm mac80211 snd_timer snd_page_alloc cfg80211 snd_hwdep snd soundcore thermal processor button thermal_sys evdev sg
Jul 29 13:01:46 darkstar kernel: [ 8033.736493] 
Jul 29 13:01:46 darkstar kernel: [ 8033.736493] Pid: 9, comm: events/0 Tainted: G        W (2.6.26-smp #3)
Jul 29 13:01:46 darkstar kernel: [ 8033.736493] EIP: 0060:[<c0450bc3>] EFLAGS: 00000246 CPU: 0
Jul 29 13:01:46 darkstar kernel: [ 8033.736493] EIP is at mutex_lock_nested+0x1b3/0x280
Jul 29 13:01:46 darkstar kernel: [ 8033.736493] EAX: 00000000 EBX: f7bf5284 ECX: f7c7e000 EDX: f7bf52a8
Jul 29 13:01:46 darkstar kernel: [ 8033.736493] ESI: f7bf5288 EDI: 00000246 EBP: f7c7fedc ESP: f7c7fe9c
Jul 29 13:01:46 darkstar kernel: [ 8033.736493]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Jul 29 13:01:46 darkstar kernel: [ 8033.736493] CR0: 8005003b CR2: b800c5ac CR3: 36e71000 CR4: 000006d0
Jul 29 13:01:46 darkstar kernel: [ 8033.736493] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Jul 29 13:01:46 darkstar kernel: [ 8033.736493] DR6: ffff0ff0 DR7: 00000400
Jul 29 13:01:46 darkstar kernel: [ 8033.736493]  [<c02375a6>] ? debugfs_create_file+0x46/0x210
Jul 29 13:01:46 darkstar kernel: [ 8033.736493]  [<c02375a6>] ? debugfs_create_file+0x46/0x210
Jul 29 13:01:46 darkstar kernel: [ 8033.736493]  [<c02375a6>] debugfs_create_file+0x46/0x210
Jul 29 13:01:46 darkstar kernel: [ 8033.736493]  [<c02377f1>] debugfs_create_dir+0x21/0x30
Jul 29 13:01:46 darkstar kernel: [ 8033.736493]  [<f8901f6d>] ieee80211_sta_debugfs_add+0x2d/0x150 [mac80211]
Jul 29 13:01:46 darkstar kernel: [ 8033.736493]  [<f88eba89>] sta_info_debugfs_add_work+0x89/0x130 [mac80211]
Jul 29 13:01:46 darkstar kernel: [ 8033.736493]  [<f890a170>] ? rate_control_pid_add_sta_debugfs+0x0/0x30 [mac80211]
Jul 29 13:01:46 darkstar kernel: [ 8033.736493]  [<c01394c6>] run_workqueue+0x156/0x1d0
Jul 29 13:01:46 darkstar kernel: [ 8033.736493]  [<c0139472>] ? run_workqueue+0x102/0x1d0
Jul 29 13:01:46 darkstar kernel: [ 8033.736493]  [<f88eba00>] ? sta_info_debugfs_add_work+0x0/0x130 [mac80211]
Jul 29 13:01:46 darkstar kernel: [ 8033.736493]  [<c0139f18>] worker_thread+0x88/0xe0
Jul 29 13:01:46 darkstar kernel: [ 8033.736493]  [<c013cd20>] ? autoremove_wake_function+0x0/0x40
Jul 29 13:01:46 darkstar kernel: [ 8033.736493]  [<c0139e90>] ? worker_thread+0x0/0xe0
Jul 29 13:01:46 darkstar kernel: [ 8033.736493]  [<c013ca32>] kthread+0x42/0x70
Jul 29 13:01:46 darkstar kernel: [ 8033.736493]  [<c013c9f0>] ? kthread+0x0/0x70
Jul 29 13:01:46 darkstar kernel: [ 8033.736493]  [<c0104993>] kernel_thread_helper+0x7/0x14
Jul 29 13:01:46 darkstar kernel: [ 8033.736493]  =======================
Jul 29 13:02:51 darkstar kernel: [ 8099.234898] BUG: soft lockup - CPU#0 stuck for 61s! [events/0:9]
Jul 29 13:02:51 darkstar kernel: [ 8099.234898] Modules linked in: radeonfb fb_ddc i2c_algo_bit snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss rfcomm l2cap bluetooth vfat fuse rtc_cmos rtc_core psmouse intel_agp pl2303 rtc_lib ath5k e100 i2c_i801 serio_raw usbserial agpgart 3c59x mii snd_hda_intel snd_pcm mac80211 snd_timer snd_page_alloc cfg80211 snd_hwdep snd soundcore thermal processor button thermal_sys evdev sg
Jul 29 13:02:51 darkstar kernel: [ 8099.234898] 
Jul 29 13:02:51 darkstar kernel: [ 8099.234898] Pid: 9, comm: events/0 Tainted: G        W (2.6.26-smp #3)
Jul 29 13:02:51 darkstar kernel: [ 8099.234898] EIP: 0060:[<c0450bc3>] EFLAGS: 00000246 CPU: 0
Jul 29 13:02:51 darkstar kernel: [ 8099.234898] EIP is at mutex_lock_nested+0x1b3/0x280
Jul 29 13:02:51 darkstar kernel: [ 8099.234898] EAX: 00000000 EBX: f7bf5284 ECX: f7c7e000 EDX: f7bf52a8
Jul 29 13:02:51 darkstar kernel: [ 8099.234898] ESI: f7bf5288 EDI: 00000246 EBP: f7c7fedc ESP: f7c7fe9c
Jul 29 13:02:51 darkstar kernel: [ 8099.234898]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Jul 29 13:02:51 darkstar kernel: [ 8099.234898] CR0: 8005003b CR2: b800c5ac CR3: 36e71000 CR4: 000006d0
Jul 29 13:02:51 darkstar kernel: [ 8099.234898] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Jul 29 13:02:51 darkstar kernel: [ 8099.234898] DR6: ffff0ff0 DR7: 00000400
Jul 29 13:02:51 darkstar kernel: [ 8099.234898]  [<c02375a6>] ? debugfs_create_file+0x46/0x210
Jul 29 13:02:51 darkstar kernel: [ 8099.234898]  [<c02375a6>] ? debugfs_create_file+0x46/0x210
Jul 29 13:02:51 darkstar kernel: [ 8099.234898]  [<c02375a6>] debugfs_create_file+0x46/0x210
Jul 29 13:02:51 darkstar kernel: [ 8099.234898]  [<c02377f1>] debugfs_create_dir+0x21/0x30
Jul 29 13:02:51 darkstar kernel: [ 8099.234898]  [<f8901f6d>] ieee80211_sta_debugfs_add+0x2d/0x150 [mac80211]
Jul 29 13:02:51 darkstar kernel: [ 8099.234898]  [<f88eba89>] sta_info_debugfs_add_work+0x89/0x130 [mac80211]
Jul 29 13:02:51 darkstar kernel: [ 8099.234898]  [<f890a170>] ? rate_control_pid_add_sta_debugfs+0x0/0x30 [mac80211]
Jul 29 13:02:51 darkstar kernel: [ 8099.234898]  [<c01394c6>] run_workqueue+0x156/0x1d0
Jul 29 13:02:51 darkstar kernel: [ 8099.234898]  [<c0139472>] ? run_workqueue+0x102/0x1d0
Jul 29 13:02:51 darkstar kernel: [ 8099.234898]  [<f88eba00>] ? sta_info_debugfs_add_work+0x0/0x130 [mac80211]
Jul 29 13:02:51 darkstar kernel: [ 8099.234898]  [<c0139f18>] worker_thread+0x88/0xe0
Jul 29 13:02:51 darkstar kernel: [ 8099.234898]  [<c013cd20>] ? autoremove_wake_function+0x0/0x40
Jul 29 13:02:51 darkstar kernel: [ 8099.234898]  [<c0139e90>] ? worker_thread+0x0/0xe0
Jul 29 13:02:51 darkstar kernel: [ 8099.234898]  [<c013ca32>] kthread+0x42/0x70
Jul 29 13:02:51 darkstar kernel: [ 8099.234898]  [<c013c9f0>] ? kthread+0x0/0x70
Jul 29 13:02:51 darkstar kernel: [ 8099.234898]  [<c0104993>] kernel_thread_helper+0x7/0x14
Jul 29 13:02:51 darkstar kernel: [ 8099.234898]  =======================

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [BUG] wireless : cpu stuck for 61s
  2008-07-29  5:57 [BUG] wireless : cpu stuck for 61s Dave Young
@ 2008-07-29 12:32 ` Johannes Berg
  2008-07-30  9:08   ` Andrew Morton
  0 siblings, 1 reply; 19+ messages in thread
From: Johannes Berg @ 2008-07-29 12:32 UTC (permalink / raw)
  To: Dave Young; +Cc: linux-kernel, linux-wireless

[-- Attachment #1: Type: text/plain, Size: 1548 bytes --]

On Tue, 2008-07-29 at 13:57 +0800, Dave Young wrote:
> While set my wlan0 to ad-hoc mode and then set the essid to something,
> this bug triggered.
> 
> I suspect it's related to debugfs or mutex? But I'm not sure.
> 
> Jul 29 13:01:46 darkstar kernel: [ 8033.736493] BUG: soft lockup - CPU#0 stuck for 61s! [events/0:9]
> Jul 29 13:01:46 darkstar kernel: [ 8033.736493] 
> Jul 29 13:01:46 darkstar kernel: [ 8033.736493] Pid: 9, comm: events/0 Tainted: G        W (2.6.26-smp #3)
> Jul 29 13:01:46 darkstar kernel: [ 8033.736493] EIP: 0060:[<c0450bc3>] EFLAGS: 00000246 CPU: 0
> Jul 29 13:01:46 darkstar kernel: [ 8033.736493] EIP is at mutex_lock_nested+0x1b3/0x280
> Jul 29 13:01:46 darkstar kernel: [ 8033.736493] EAX: 00000000 EBX: f7bf5284 ECX: f7c7e000 EDX: f7bf52a8
> Jul 29 13:01:46 darkstar kernel: [ 8033.736493] ESI: f7bf5288 EDI: 00000246 EBP: f7c7fedc ESP: f7c7fe9c
> Jul 29 13:01:46 darkstar kernel: [ 8033.736493]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> Jul 29 13:01:46 darkstar kernel: [ 8033.736493] CR0: 8005003b CR2: b800c5ac CR3: 36e71000 CR4: 000006d0
> Jul 29 13:01:46 darkstar kernel: [ 8033.736493] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> Jul 29 13:01:46 darkstar kernel: [ 8033.736493] DR6: ffff0ff0 DR7: 00000400
> Jul 29 13:01:46 darkstar kernel: [ 8033.736493]  [<c02375a6>] ? debugfs_create_file+0x46/0x210

It looks like it's getting stuck on
mutex_lock(&parent->d_inode->i_mutex);

inside debugfs_create_by_name, but I have no idea why that would happen.

johannes

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [BUG] wireless : cpu stuck for 61s
  2008-07-29 12:32 ` Johannes Berg
@ 2008-07-30  9:08   ` Andrew Morton
       [not found]     ` <a8e1da0807300302m35a0a7c0q2ff2e0ab92a69239@mail.gmail.com>
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2008-07-30  9:08 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Dave Young, linux-kernel, linux-wireless

On Tue, 29 Jul 2008 14:32:04 +0200 Johannes Berg <johannes@sipsolutions.net> wrote:

> On Tue, 2008-07-29 at 13:57 +0800, Dave Young wrote:
> > While set my wlan0 to ad-hoc mode and then set the essid to something,
> > this bug triggered.
> > 
> > I suspect it's related to debugfs or mutex? But I'm not sure.
> > 
> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] BUG: soft lockup - CPU#0 stuck for 61s! [events/0:9]
> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] 
> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] Pid: 9, comm: events/0 Tainted: G        W (2.6.26-smp #3)
> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] EIP: 0060:[<c0450bc3>] EFLAGS: 00000246 CPU: 0
> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] EIP is at mutex_lock_nested+0x1b3/0x280
> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] EAX: 00000000 EBX: f7bf5284 ECX: f7c7e000 EDX: f7bf52a8
> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] ESI: f7bf5288 EDI: 00000246 EBP: f7c7fedc ESP: f7c7fe9c
> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] CR0: 8005003b CR2: b800c5ac CR3: 36e71000 CR4: 000006d0
> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] DR6: ffff0ff0 DR7: 00000400
> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493]  [<c02375a6>] ? debugfs_create_file+0x46/0x210
> 
> It looks like it's getting stuck on
> mutex_lock(&parent->d_inode->i_mutex);
> 
> inside debugfs_create_by_name, but I have no idea why that would happen.
> 

Dave, please try sysrq-W or sysrq-T during that 60 seconds, see if we
can find another task which is holding onto that lock.


^ permalink raw reply	[flat|nested] 19+ messages in thread

[parent not found: <a8e1da0807300302m35a0a7c0q2ff2e0ab92a69239@mail.gmail.com>]

* Re: [BUG] wireless : cpu stuck for 61s
       [not found]     ` <a8e1da0807300302m35a0a7c0q2ff2e0ab92a69239@mail.gmail.com>
@ 2008-07-30 10:10       ` Andrew Morton
  2008-07-31  2:06         ` Dave Young
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2008-07-30 10:10 UTC (permalink / raw)
  To: Dave Young; +Cc: Johannes Berg, linux-kernel, linux-wireless

On Wed, 30 Jul 2008 18:02:03 +0800 "Dave Young" <hidave.darkstar@gmail.com> wrote:

> On Wed, Jul 30, 2008 at 5:08 PM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
> > On Tue, 29 Jul 2008 14:32:04 +0200 Johannes Berg <johannes@sipsolutions.net> wrote:
> >
> >> On Tue, 2008-07-29 at 13:57 +0800, Dave Young wrote:
> >> > While set my wlan0 to ad-hoc mode and then set the essid to something,
> >> > this bug triggered.
> >> >
> >> > I suspect it's related to debugfs or mutex? But I'm not sure.
> >> >
> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] BUG: soft lockup - CPU#0 stuck for 61s! [events/0:9]
> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493]
> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] Pid: 9, comm: events/0 Tainted: G        W (2.6.26-smp #3)
> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] EIP: 0060:[<c0450bc3>] EFLAGS: 00000246 CPU: 0
> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] EIP is at mutex_lock_nested+0x1b3/0x280
> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] EAX: 00000000 EBX: f7bf5284 ECX: f7c7e000 EDX: f7bf52a8
> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] ESI: f7bf5288 EDI: 00000246 EBP: f7c7fedc ESP: f7c7fe9c
> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] CR0: 8005003b CR2: b800c5ac CR3: 36e71000 CR4: 000006d0
> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] DR6: ffff0ff0 DR7: 00000400
> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493]  [<c02375a6>] ? debugfs_create_file+0x46/0x210
> >>
> >> It looks like it's getting stuck on
> >> mutex_lock(&parent->d_inode->i_mutex);
> >>
> >> inside debugfs_create_by_name, but I have no idea why that would happen.
> >>
> >
> > Dave, please try sysrq-W or sysrq-T during that 60 seconds, see if we
> > can find another task which is holding onto that lock.
> >
> >
> 
> Andrew, the keyboard doesn't response after the bug triggered.
> 
> So I just keep sysrq-W / sysrq-T every several seconds after I set the
> wlan0 interface.
> Attached please see the long log file.

argh.  Look:

Jul 30 17:34:22 darkstar kernel: [  145.111510]        f703fbf8 00000082 00000002 f703fbe4 f703fbd8 00000000 f704ac40 f703fbcc 
Jul 30 17:34:22 darkstar kernel: [  145.111510]        00000046 c0669060 c066c580 c066c580 c066c580 f703fbe8 f704ad94 c1eaf580 
Jul 30 17:34:22 darkstar kernel: [  145.111510]        00000001 f704ac40 00000000 f702ca00 c0452697 ffff55ba 00000000 c014b06d 
Jul 30 17:34:22 darkstar kernel: [  145.111510] Call Trace:
Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c0452697>] ? _spin_unlock_irqrestore+0x47/0x60
Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c014b06d>] ? trace_hardirqs_on+0xbd/0x140
Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c0450265>] schedule_timeout+0x75/0xc0
Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c01942c0>] ? __pollwait+0x60/0xd0
Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<f888b02e>] ? evdev_poll+0x2e/0x60 [evdev]
Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c0193564>] do_sys_poll+0x264/0x3a0
Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c0194260>] ? __pollwait+0x0/0xd0
Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c0122160>] ? default_wake_function+0x0/0x10
Jul 30 17:34:22 darkstar last message repeated 3 times
Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c01220c5>] ? try_to_wake_up+0x75/0x110
Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c014aed0>] ? mark_held_locks+0x40/0x80
Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c014b77b>] ? __lock_acquire+0x24b/0x1040
Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c014b77b>] ? __lock_acquire+0x24b/0x1040
Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c03da639>] ? sock_def_readable+0x59/0x80
Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c014b77b>] ? __lock_acquire+0x24b/0x1040
Jul 30 17:34:22 darkstar last message repeated 5 times
Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c0109d0f>] ? native_sched_clock+0x7f/0xb0
Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c0243b4f>] ? security_file_permission+0xf/0x20
Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c0186387>] ? rw_verify_area+0x57/0xc0
Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c0451f5d>] ? _spin_unlock+0x1d/0x20
Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c0186b94>] ? vfs_read+0x94/0x130
Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<f888be60>] ? evdev_read+0x0/0x1b0 [evdev]
Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c0103dbf>] ? restore_nocheck+0x12/0x15
Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c01936cd>] sys_poll+0x2d/0x70
Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c0103d5e>] syscall_call+0x7/0xb
Jul 30 17:34:22 darkstar kernel: [  145.111510]  =======================
Jul 30 17:34:22 darkstar kernel: [  145.111510]        f63efd94 00000092 00000002 00000000 00000000 f5064028 f7049d80 00000001 
Jul 30 17:34:22 darkstar kernel: [  145.111510]        00000000 c0669060 c066c580 c066c580 c066c580 00000000 f7049ed4 c1e5d580 
Jul 30 17:34:22 darkstar kernel: [  145.111510]        00000000 00000000 00000135 f702e080 f7049d80 00000003 00000000 f7049d80 
Jul 30 17:34:22 darkstar kernel: [  145.111510] Call Trace:
Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c0450265>] schedule_timeout+0x75/0xc0
Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c0436b98>] unix_stream_recvmsg+0x2a8/0x4e0


We're missing part of the sysrq traces.  I noticed this a while back. 
We need this info to find the D-state processes.

It's _possible_ that the kernel is actually trying to print this info,
but it's doing it at a different prink facility level from the stack
output, so we're only seeing part of the printing.

Please try changing your logging level (dmesg -n 0, or dmesg -n 7 - I
always forget which).  Then retry the sysrq-T.  If that successfully
causes _all_ the output to be presented, then please rerun the above
test.

You might want to compress the file too - I don't know whether the vger
servers will accept a 730kB mail.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [BUG] wireless : cpu stuck for 61s
  2008-07-30 10:10       ` Andrew Morton
@ 2008-07-31  2:06         ` Dave Young
  2008-07-31  2:56           ` Andrew Morton
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Young @ 2008-07-31  2:06 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Johannes Berg, linux-kernel, linux-wireless

[-- Attachment #1: Type: text/plain, Size: 11117 bytes --]

On Wed, Jul 30, 2008 at 6:10 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Wed, 30 Jul 2008 18:02:03 +0800 "Dave Young" <hidave.darkstar@gmail.com> wrote:
>
>> On Wed, Jul 30, 2008 at 5:08 PM, Andrew Morton
>> <akpm@linux-foundation.org> wrote:
>> > On Tue, 29 Jul 2008 14:32:04 +0200 Johannes Berg <johannes@sipsolutions.net> wrote:
>> >
>> >> On Tue, 2008-07-29 at 13:57 +0800, Dave Young wrote:
>> >> > While set my wlan0 to ad-hoc mode and then set the essid to something,
>> >> > this bug triggered.
>> >> >
>> >> > I suspect it's related to debugfs or mutex? But I'm not sure.
>> >> >
>> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] BUG: soft lockup - CPU#0 stuck for 61s! [events/0:9]
>> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493]
>> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] Pid: 9, comm: events/0 Tainted: G        W (2.6.26-smp #3)
>> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] EIP: 0060:[<c0450bc3>] EFLAGS: 00000246 CPU: 0
>> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] EIP is at mutex_lock_nested+0x1b3/0x280
>> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] EAX: 00000000 EBX: f7bf5284 ECX: f7c7e000 EDX: f7bf52a8
>> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] ESI: f7bf5288 EDI: 00000246 EBP: f7c7fedc ESP: f7c7fe9c
>> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
>> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] CR0: 8005003b CR2: b800c5ac CR3: 36e71000 CR4: 000006d0
>> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
>> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] DR6: ffff0ff0 DR7: 00000400
>> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493]  [<c02375a6>] ? debugfs_create_file+0x46/0x210
>> >>
>> >> It looks like it's getting stuck on
>> >> mutex_lock(&parent->d_inode->i_mutex);
>> >>
>> >> inside debugfs_create_by_name, but I have no idea why that would happen.
>> >>
>> >
>> > Dave, please try sysrq-W or sysrq-T during that 60 seconds, see if we
>> > can find another task which is holding onto that lock.
>> >
>> >
>>
>> Andrew, the keyboard doesn't response after the bug triggered.
>>
>> So I just keep sysrq-W / sysrq-T every several seconds after I set the
>> wlan0 interface.
>> Attached please see the long log file.
>
> argh.  Look:
>
> Jul 30 17:34:22 darkstar kernel: [  145.111510]        f703fbf8 00000082 00000002 f703fbe4 f703fbd8 00000000 f704ac40 f703fbcc
> Jul 30 17:34:22 darkstar kernel: [  145.111510]        00000046 c0669060 c066c580 c066c580 c066c580 f703fbe8 f704ad94 c1eaf580
> Jul 30 17:34:22 darkstar kernel: [  145.111510]        00000001 f704ac40 00000000 f702ca00 c0452697 ffff55ba 00000000 c014b06d
> Jul 30 17:34:22 darkstar kernel: [  145.111510] Call Trace:
> Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c0452697>] ? _spin_unlock_irqrestore+0x47/0x60
> Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c014b06d>] ? trace_hardirqs_on+0xbd/0x140
> Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c0450265>] schedule_timeout+0x75/0xc0
> Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c01942c0>] ? __pollwait+0x60/0xd0
> Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<f888b02e>] ? evdev_poll+0x2e/0x60 [evdev]
> Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c0193564>] do_sys_poll+0x264/0x3a0
> Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c0194260>] ? __pollwait+0x0/0xd0
> Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c0122160>] ? default_wake_function+0x0/0x10
> Jul 30 17:34:22 darkstar last message repeated 3 times
> Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c01220c5>] ? try_to_wake_up+0x75/0x110
> Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c014aed0>] ? mark_held_locks+0x40/0x80
> Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c014b77b>] ? __lock_acquire+0x24b/0x1040
> Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c014b77b>] ? __lock_acquire+0x24b/0x1040
> Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c03da639>] ? sock_def_readable+0x59/0x80
> Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c014b77b>] ? __lock_acquire+0x24b/0x1040
> Jul 30 17:34:22 darkstar last message repeated 5 times
> Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c0109d0f>] ? native_sched_clock+0x7f/0xb0
> Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c0243b4f>] ? security_file_permission+0xf/0x20
> Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c0186387>] ? rw_verify_area+0x57/0xc0
> Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c0451f5d>] ? _spin_unlock+0x1d/0x20
> Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c0186b94>] ? vfs_read+0x94/0x130
> Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<f888be60>] ? evdev_read+0x0/0x1b0 [evdev]
> Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c0103dbf>] ? restore_nocheck+0x12/0x15
> Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c01936cd>] sys_poll+0x2d/0x70
> Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c0103d5e>] syscall_call+0x7/0xb
> Jul 30 17:34:22 darkstar kernel: [  145.111510]  =======================
> Jul 30 17:34:22 darkstar kernel: [  145.111510]        f63efd94 00000092 00000002 00000000 00000000 f5064028 f7049d80 00000001
> Jul 30 17:34:22 darkstar kernel: [  145.111510]        00000000 c0669060 c066c580 c066c580 c066c580 00000000 f7049ed4 c1e5d580
> Jul 30 17:34:22 darkstar kernel: [  145.111510]        00000000 00000000 00000135 f702e080 f7049d80 00000003 00000000 f7049d80
> Jul 30 17:34:22 darkstar kernel: [  145.111510] Call Trace:
> Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c0450265>] schedule_timeout+0x75/0xc0
> Jul 30 17:34:22 darkstar kernel: [  145.111510]  [<c0436b98>] unix_stream_recvmsg+0x2a8/0x4e0
>
>
> We're missing part of the sysrq traces.  I noticed this a while back.
> We need this info to find the D-state processes.
>
> It's _possible_ that the kernel is actually trying to print this info,
> but it's doing it at a different prink facility level from the stack
> output, so we're only seeing part of the printing.
>
> Please try changing your logging level (dmesg -n 0, or dmesg -n 7 - I
> always forget which).  Then retry the sysrq-T.  If that successfully
> causes _all_ the output to be presented, then please rerun the above
> test.
>
> You might want to compress the file too - I don't know whether the vger
> servers will accept a 730kB mail.
>
>

Ok here it is.
BTW, I run "klogd -c 7" after boot

This time I get a kmalloc poison overwritten:

Jul 31 09:51:11 darkstar kernel: [ 2024.960036] BUG kmalloc-4096:
Poison overwritten
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]
-----------------------------------------------------------------------------
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] INFO:
0xf6f3a080-0xf6f3a0ef. First byte 0x80 instead of 0x6b
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] INFO: Allocated in
dev_alloc_skb+0x1c/0x30 age=3642 cpu=0 pid=0
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] INFO: Freed in
skb_release_data+0x57/0x80 age=3146 cpu=0 pid=2398
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] INFO: Slab 0xc1c05440
objects=7 used=3 fp=0xf6f3a060 flags=0x400020c3
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] INFO: Object
0xf6f3a060 @offset=8288 fp=0xf6f39030
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Bytes b4 0xf6f3a050:
5e 09 00 00 57 c9 05 00 5a 5a 5a 5a 5a 5a 5a 5a ^...WÉ..ZZZZZZZZ
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]   Object 0xf6f3a060:
6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]   Object 0xf6f3a070:
6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]   Object 0xf6f3a080:
80 00 00 00 ff ff ff ff ff ff 00 17 7b 00 46 40 ....ÿÿÿÿÿÿ..{.F@
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]   Object 0xf6f3a090:
00 17 7b 00 46 40 30 09 81 21 08 7a 21 00 00 00 ..{.F@0..!.z!...
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]   Object 0xf6f3a0a0:
64 00 21 04 00 07 00 00 00 00 00 00 00 01 08 82 d.!.............
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]   Object 0xf6f3a0b0:
84 8b 0c 12 96 18 24 03 01 01 05 04 00 02 00 00 ......$.........
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]   Object 0xf6f3a0c0:
07 06 43 4e 20 01 0d 14 2a 01 00 32 04 30 48 60 ..CN....*..2.0H`
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]   Object 0xf6f3a0d0:
6c dd 18 00 17 7b 01 04 00 00 00 01 00 00 00 10 lÝ...{..........
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]  Redzone 0xf6f3b060:
bb bb bb bb                                     »»»»
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]  Padding 0xf6f3b088:
5a 5a 5a 5a 5a 5a 5a 5a                         ZZZZZZZZ
Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Pid: 0, comm: swapper
Tainted: G        W 2.6.26-smp #2
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]  [<c0180f5d>]
print_trailer+0xad/0xf0
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]  [<c018103b>]
check_bytes_and_report+0x9b/0xc0
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]  [<c018145e>]
check_object+0x19e/0x1e0
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]  [<c01821a4>]
__slab_alloc+0x454/0x4f0
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]  [<c01834d6>]
__kmalloc_track_caller+0xe6/0xf0
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]  [<c03dd1ec>] ?
dev_alloc_skb+0x1c/0x30
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]  [<c03dd1ec>] ?
dev_alloc_skb+0x1c/0x30
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]  [<c03dce79>]
__alloc_skb+0x49/0x100
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]  [<c03dd1ec>]
dev_alloc_skb+0x1c/0x30
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]  [<f8a58599>]
ath5k_rxbuf_setup+0x39/0x200 [ath5k]
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]  [<f8a5a697>]
ath5k_tasklet_rx+0x127/0x5c0 [ath5k]
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]  [<c014969a>] ?
print_lock_contention_bug+0x1a/0xe0
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]  [<c012eafc>]
tasklet_action+0x4c/0xc0
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]  [<c012e463>]
__do_softirq+0x93/0x120
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]  [<c012e547>]
do_softirq+0x57/0x60
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]  [<c012ea29>] irq_exit+0x69/0x80
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]  [<c0106b55>] do_IRQ+0x45/0x80
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]  [<c010a5d0>] ?
mwait_idle+0x0/0x50
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]  [<c0104752>]
common_interrupt+0x2e/0x34
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]  [<c010a5d0>] ?
mwait_idle+0x0/0x50
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]  [<c010a609>] ?
mwait_idle+0x39/0x50
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]  [<c01026e0>] cpu_idle+0x60/0xd0
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]  [<c043c8ce>]
rest_init+0x4e/0x60
Jul 31 09:51:11 darkstar kernel: [ 2024.960036]  =======================


The syslog detail please see the attachment.

-- 
Regards
dave

[-- Attachment #2: dmesg.gz --]
[-- Type: application/x-gzip, Size: 17939 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [BUG] wireless : cpu stuck for 61s
  2008-07-31  2:06         ` Dave Young
@ 2008-07-31  2:56           ` Andrew Morton
  2008-07-31  3:01             ` Dave Young
  2008-07-31  9:15             ` Pekka J Enberg
  0 siblings, 2 replies; 19+ messages in thread
From: Andrew Morton @ 2008-07-31  2:56 UTC (permalink / raw)
  To: Dave Young; +Cc: Johannes Berg, linux-kernel, linux-wireless, Pekka Enberg

On Thu, 31 Jul 2008 10:06:31 +0800 "Dave Young" <hidave.darkstar@gmail.com> wrote:

> 
> Ok here it is.
> BTW, I run "klogd -c 7" after boot

The sysrq output is still missing lots of stuff.  I guess we broke it.

> 
> This time I get a kmalloc poison overwritten:
>

<fixes wordwrapping, cleans stuff up>

=============================================================================
BUG kmalloc-4096: Poison overwritten
-----------------------------------------------------------------------------

INFO: 0xf6f3a080-0xf6f3a0ef. First byte 0x80 instead of 0x6b
INFO: Allocated in dev_alloc_skb+0x1c/0x30 age=3642 cpu=0 pid=0
INFO: Freed in skb_release_data+0x57/0x80 age=3146 cpu=0 pid=2398
INFO: Slab 0xc1c05440 objects=7 used=3 fp=0xf6f3a060 flags=0x400020c3
INFO: Object 0xf6f3a060 @offset=8288 fp=0xf6f39030

Bytes b4 0xf6f3a050:  5e 09 00 00 57 c9 05 00 5a 5a 5a 5a 5a 5a 5a 5a ^...WÉ..ZZZZZZZZ
Object 0xf6f3a060:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xf6f3a070:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object 0xf6f3a080:  80 00 00 00 ff ff ff ff ff ff 00 17 7b 00 46 40 ....ÿÿÿÿÿÿ..{.F@
Object 0xf6f3a090:  00 17 7b 00 46 40 30 09 81 21 08 7a 21 00 00 00 ..{.F@0..!.z!...
Object 0xf6f3a0a0:  64 00 21 04 00 07 00 00 00 00 00 00 00 01 08 82 d.!.............
Object 0xf6f3a0b0:  84 8b 0c 12 96 18 24 03 01 01 05 04 00 02 00 00 ......$.........
Object 0xf6f3a0c0:  07 06 43 4e 20 01 0d 14 2a 01 00 32 04 30 48 60 ..CN....*..2.0H`
Object 0xf6f3a0d0:  6c dd 18 00 17 7b 01 04 00 00 00 01 00 00 00 10 lÝ...{..........
Redzone 0xf6f3b060:  bb bb bb bb                                     »»»»            
Padding 0xf6f3b088:  5a 5a 5a 5a 5a 5a 5a 5a                         ZZZZZZZZ        
Pid: 0, comm: swapper Tainted: G        W 2.6.26-smp #2
[<c0180f5d>] print_trailer+0xad/0xf0
[<c018103b>] check_bytes_and_report+0x9b/0xc0
[<c018145e>] check_object+0x19e/0x1e0
[<c01821a4>] __slab_alloc+0x454/0x4f0
[<c01834d6>] __kmalloc_track_caller+0xe6/0xf0
[<c03dd1ec>] ? dev_alloc_skb+0x1c/0x30
[<c03dd1ec>] ? dev_alloc_skb+0x1c/0x30
[<c03dce79>] __alloc_skb+0x49/0x100
[<c03dd1ec>] dev_alloc_skb+0x1c/0x30
[<f8a58599>] ath5k_rxbuf_setup+0x39/0x200 [ath5k]
[<f8a5a697>] ath5k_tasklet_rx+0x127/0x5c0 [ath5k]
[<c014969a>] ? print_lock_contention_bug+0x1a/0xe0
[<c012eafc>] tasklet_action+0x4c/0xc0
[<c012e463>] __do_softirq+0x93/0x120
[<c012e547>] do_softirq+0x57/0x60
[<c012ea29>] irq_exit+0x69/0x80
[<c0106b55>] do_IRQ+0x45/0x80
[<c010a5d0>] ? mwait_idle+0x0/0x50
[<c0104752>] common_interrupt+0x2e/0x34
[<c010a5d0>] ? mwait_idle+0x0/0x50
[<c010a609>] ? mwait_idle+0x39/0x50
[<c01026e0>] cpu_idle+0x60/0xd0
[<c043c8ce>] rest_init+0x4e/0x60
=======================
FIX kmalloc-4096: Restoring 0xf6f3a080-0xf6f3a0ef=0x6b

FIX kmalloc-4096: Marking all objects used
[<c0243b4f>] ? security_file_permission+0xf/0x20
[<c019436f>] sys_select+0x3f/0x190
[<c01878e9>] ? fput+0x19/0x20
[<c0103dbf>] ? restore_nocheck+0x12/0x15
[<c014b06d>] ? trace_hardirqs_on+0xbd/0x140
[<c0103d5e>] syscall_call+0x7/0xb
=======================

argh, that stuff hurts my brain.  None of the numbers seem to make any
sense for a 4k allocation :( Pekka, do you have time to decrypt this?

Dave, could you please remind us which net driver was in use here?


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [BUG] wireless : cpu stuck for 61s
  2008-07-31  2:56           ` Andrew Morton
@ 2008-07-31  3:01             ` Dave Young
  2008-07-31  9:15             ` Pekka J Enberg
  1 sibling, 0 replies; 19+ messages in thread
From: Dave Young @ 2008-07-31  3:01 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Johannes Berg, linux-kernel, linux-wireless, Pekka Enberg

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=UTF-8, Size: 3653 bytes --]

On Thu, Jul 31, 2008 at 10:56 AM, Andrew Morton<akpm@linux-foundation.org> wrote:> On Thu, 31 Jul 2008 10:06:31 +0800 "Dave Young" <hidave.darkstar@gmail.com> wrote:>>>>> Ok here it is.>> BTW, I run "klogd -c 7" after boot>> The sysrq output is still missing lots of stuff.  I guess we broke it.>>>>> This time I get a kmalloc poison overwritten:>>>> <fixes wordwrapping, cleans stuff up>
Thanks, sorry for that caused by webmail.
>> =============================================================================> BUG kmalloc-4096: Poison overwritten> ----------------------------------------------------------------------------->> INFO: 0xf6f3a080-0xf6f3a0ef. First byte 0x80 instead of 0x6b> INFO: Allocated in dev_alloc_skb+0x1c/0x30 age=3642 cpu=0 pid=0> INFO: Freed in skb_release_data+0x57/0x80 age=3146 cpu=0 pid=2398> INFO: Slab 0xc1c05440 objects=7 used=3 fp=0xf6f3a060 flags=0x400020c3> INFO: Object 0xf6f3a060 @offset=8288 fp=0xf6f39030>> Bytes b4 0xf6f3a050:  5e 09 00 00 57 c9 05 00 5a 5a 5a 5a 5a 5a 5a 5a ^...WÃ‰..ZZZZZZZZ> Object 0xf6f3a060:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk> Object 0xf6f3a070:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk> Object 0xf6f3a080:  80 00 00 00 ff ff ff ff ff ff 00 17 7b 00 46 40 ....Ã¿Ã¿Ã¿Ã¿Ã¿Ã¿..{.F@> Object 0xf6f3a090:  00 17 7b 00 46 40 30 09 81 21 08 7a 21 00 00 00 ..{.F@0..!.z!...> Object 0xf6f3a0a0:  64 00 21 04 00 07 00 00 00 00 00 00 00 01 08 82 d.!.............> Object 0xf6f3a0b0:  84 8b 0c 12 96 18 24 03 01 01 05 04 00 02 00 00 ......$.........> Object 0xf6f3a0c0:  07 06 43 4e 20 01 0d 14 2a 01 00 32 04 30 48 60 ..CN....*..2.0H`> Object 0xf6f3a0d0:  6c dd 18 00 17 7b 01 04 00 00 00 01 00 00 00 10 lÃ...{..........> Redzone 0xf6f3b060:  bb bb bb bb                                     Â»Â»Â»Â»> Padding 0xf6f3b088:  5a 5a 5a 5a 5a 5a 5a 5a                         ZZZZZZZZ> Pid: 0, comm: swapper Tainted: G        W 2.6.26-smp #2> [<c0180f5d>] print_trailer+0xad/0xf0> [<c018103b>] check_bytes_and_report+0x9b/0xc0> [<c018145e>] check_object+0x19e/0x1e0> [<c01821a4>] __slab_alloc+0x454/0x4f0> [<c01834d6>] __kmalloc_track_caller+0xe6/0xf0> [<c03dd1ec>] ? dev_alloc_skb+0x1c/0x30> [<c03dd1ec>] ? dev_alloc_skb+0x1c/0x30> [<c03dce79>] __alloc_skb+0x49/0x100> [<c03dd1ec>] dev_alloc_skb+0x1c/0x30> [<f8a58599>] ath5k_rxbuf_setup+0x39/0x200 [ath5k]> [<f8a5a697>] ath5k_tasklet_rx+0x127/0x5c0 [ath5k]> [<c014969a>] ? print_lock_contention_bug+0x1a/0xe0> [<c012eafc>] tasklet_action+0x4c/0xc0> [<c012e463>] __do_softirq+0x93/0x120> [<c012e547>] do_softirq+0x57/0x60> [<c012ea29>] irq_exit+0x69/0x80> [<c0106b55>] do_IRQ+0x45/0x80> [<c010a5d0>] ? mwait_idle+0x0/0x50> [<c0104752>] common_interrupt+0x2e/0x34> [<c010a5d0>] ? mwait_idle+0x0/0x50> [<c010a609>] ? mwait_idle+0x39/0x50> [<c01026e0>] cpu_idle+0x60/0xd0> [<c043c8ce>] rest_init+0x4e/0x60> =======================> FIX kmalloc-4096: Restoring 0xf6f3a080-0xf6f3a0ef=0x6b>> FIX kmalloc-4096: Marking all objects used> [<c0243b4f>] ? security_file_permission+0xf/0x20> [<c019436f>] sys_select+0x3f/0x190> [<c01878e9>] ? fput+0x19/0x20> [<c0103dbf>] ? restore_nocheck+0x12/0x15> [<c014b06d>] ? trace_hardirqs_on+0xbd/0x140> [<c0103d5e>] syscall_call+0x7/0xb> =======================>> argh, that stuff hurts my brain.  None of the numbers seem to make any> sense for a 4k allocation :( Pekka, do you have time to decrypt this?>> Dave, could you please remind us which net driver was in use here?>>
It's the ath5k
-- Regardsdaveÿôèº{.nÇ+‰·Ÿ®‰†+%ŠËÿ±éÝ¶\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dÊ‡Ú™ë,j\a¢f£¢·hšïêÿ‘êçz_è®\x03(éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨èÚ&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [BUG] wireless : cpu stuck for 61s
  2008-07-31  2:56           ` Andrew Morton
  2008-07-31  3:01             ` Dave Young
@ 2008-07-31  9:15             ` Pekka J Enberg
  2008-07-31  9:50               ` Tomas Winkler
  2008-08-01  7:32               ` Dave Young
  1 sibling, 2 replies; 19+ messages in thread
From: Pekka J Enberg @ 2008-07-31  9:15 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Dave Young, Johannes Berg, linux-kernel, linux-wireless

[-- Attachment #1: Type: TEXT/PLAIN, Size: 4460 bytes --]

Hi Andrew,

On Wed, 30 Jul 2008, Andrew Morton wrote:
> > Ok here it is.
> > BTW, I run "klogd -c 7" after boot
> 
> The sysrq output is still missing lots of stuff.  I guess we broke it.
> 
> > 
> > This time I get a kmalloc poison overwritten:
> >
> 
> argh, that stuff hurts my brain.  None of the numbers seem to make any
> sense for a 4k allocation :( Pekka, do you have time to decrypt this?

Sure. Here goes:

On Wed, 30 Jul 2008, Andrew Morton wrote:
> <fixes wordwrapping, cleans stuff up>
> 
> =============================================================================
> BUG kmalloc-4096: Poison overwritten
> -----------------------------------------------------------------------------
> 
> INFO: 0xf6f3a080-0xf6f3a0ef. First byte 0x80 instead of 0x6b

That's POISON_FREE ("0x6b") overwritten which means use-after-free for 
the range of 0xf6f3a080 - 0xf6f3a0ef (112 bytes). The rest of the 
object is okay but the SLUB debugging code only dumps the first 128 bytes 
of the object which is why we don't see the full corruption. 

2.6.27-rc1 should dump the full object so I'm assuming this is pre -rc1?

> INFO: Allocated in dev_alloc_skb+0x1c/0x30 age=3642 cpu=0 pid=0
> INFO: Freed in skb_release_data+0x57/0x80 age=3146 cpu=0 pid=2398

So the corrupted object was free'd by skb_release_data() so we need to 
look for a driver or the networking stack calling that function too early.

> INFO: Slab 0xc1c05440 objects=7 used=3 fp=0xf6f3a060 flags=0x400020c3
> INFO: Object 0xf6f3a060 @offset=8288 fp=0xf6f39030
> 
> Bytes b4 0xf6f3a050:  5e 09 00 00 57 c9 05 00 5a 5a 5a 5a 5a 5a 5a 5a ^...WÉ..ZZZZZZZZ

The object starts here (the poison values for first 32 bytes are okay):

> Object 0xf6f3a060:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> Object 0xf6f3a070:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk

And here are the first 96 bytes of the corruption:

> Object 0xf6f3a080:  80 00 00 00 ff ff ff ff ff ff 00 17 7b 00 46 40 ....ÿÿÿÿÿÿ..{.F@
> Object 0xf6f3a090:  00 17 7b 00 46 40 30 09 81 21 08 7a 21 00 00 00 ..{.F@0..!.z!...
> Object 0xf6f3a0a0:  64 00 21 04 00 07 00 00 00 00 00 00 00 01 08 82 d.!.............
> Object 0xf6f3a0b0:  84 8b 0c 12 96 18 24 03 01 01 05 04 00 02 00 00 ......$.........
> Object 0xf6f3a0c0:  07 06 43 4e 20 01 0d 14 2a 01 00 32 04 30 48 60 ..CN....*..2.0H`
> Object 0xf6f3a0d0:  6c dd 18 00 17 7b 01 04 00 00 00 01 00 00 00 10 lÝ...{..........

But I think that's just the payload of a SKB?

> Redzone 0xf6f3b060:  bb bb bb bb                                     »»»»            

The red-zone has SLUB_RED_INACTIVE ("0xbb") which reinforces 
use-after-free.

> Padding 0xf6f3b088:  5a 5a 5a 5a 5a 5a 5a 5a                         ZZZZZZZZ        
> Pid: 0, comm: swapper Tainted: G        W 2.6.26-smp #2
> [<c0180f5d>] print_trailer+0xad/0xf0
> [<c018103b>] check_bytes_and_report+0x9b/0xc0
> [<c018145e>] check_object+0x19e/0x1e0
> [<c01821a4>] __slab_alloc+0x454/0x4f0
> [<c01834d6>] __kmalloc_track_caller+0xe6/0xf0
> [<c03dd1ec>] ? dev_alloc_skb+0x1c/0x30
> [<c03dd1ec>] ? dev_alloc_skb+0x1c/0x30
> [<c03dce79>] __alloc_skb+0x49/0x100
> [<c03dd1ec>] dev_alloc_skb+0x1c/0x30
> [<f8a58599>] ath5k_rxbuf_setup+0x39/0x200 [ath5k]
> [<f8a5a697>] ath5k_tasklet_rx+0x127/0x5c0 [ath5k]
> [<c014969a>] ? print_lock_contention_bug+0x1a/0xe0
> [<c012eafc>] tasklet_action+0x4c/0xc0
> [<c012e463>] __do_softirq+0x93/0x120
> [<c012e547>] do_softirq+0x57/0x60
> [<c012ea29>] irq_exit+0x69/0x80
> [<c0106b55>] do_IRQ+0x45/0x80
> [<c010a5d0>] ? mwait_idle+0x0/0x50
> [<c0104752>] common_interrupt+0x2e/0x34
> [<c010a5d0>] ? mwait_idle+0x0/0x50
> [<c010a609>] ? mwait_idle+0x39/0x50
> [<c01026e0>] cpu_idle+0x60/0xd0
> [<c043c8ce>] rest_init+0x4e/0x60
> =======================
> FIX kmalloc-4096: Restoring 0xf6f3a080-0xf6f3a0ef=0x6b
> 
> FIX kmalloc-4096: Marking all objects used
> [<c0243b4f>] ? security_file_permission+0xf/0x20
> [<c019436f>] sys_select+0x3f/0x190
> [<c01878e9>] ? fput+0x19/0x20
> [<c0103dbf>] ? restore_nocheck+0x12/0x15
> [<c014b06d>] ? trace_hardirqs_on+0xbd/0x140
> [<c0103d5e>] syscall_call+0x7/0xb
> =======================
> 
> Dave, could you please remind us which net driver was in use here?

There's ath5k in the stack trace but that, of course, doesn't 
automatically mean it's at fault here. It could have been just the poor 
bastard who was the next to allocate 4 KB with kmalloc() noticing the 
corruption.

Hope this helps!

			Pekka

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [BUG] wireless : cpu stuck for 61s
  2008-07-31  9:15             ` Pekka J Enberg
@ 2008-07-31  9:50               ` Tomas Winkler
  2008-07-31  9:53                 ` Pekka Enberg
  2008-08-01  7:32               ` Dave Young
  1 sibling, 1 reply; 19+ messages in thread
From: Tomas Winkler @ 2008-07-31  9:50 UTC (permalink / raw)
  To: Pekka J Enberg
  Cc: Andrew Morton, Dave Young, Johannes Berg, linux-kernel,
	linux-wireless

On Thu, Jul 31, 2008 at 12:15 PM, Pekka J Enberg <penberg@cs.helsinki.fi> wrote:
> Hi Andrew,
>
> On Wed, 30 Jul 2008, Andrew Morton wrote:
>> > Ok here it is.
>> > BTW, I run "klogd -c 7" after boot
>>
>> The sysrq output is still missing lots of stuff.  I guess we broke it.
>>
>> >
>> > This time I get a kmalloc poison overwritten:
>> >
>>
>> argh, that stuff hurts my brain.  None of the numbers seem to make any
>> sense for a 4k allocation :( Pekka, do you have time to decrypt this?
>
> Sure. Here goes:
>
> On Wed, 30 Jul 2008, Andrew Morton wrote:
>> <fixes wordwrapping, cleans stuff up>
>>
>> =============================================================================
>> BUG kmalloc-4096: Poison overwritten
>> -----------------------------------------------------------------------------
>>
>> INFO: 0xf6f3a080-0xf6f3a0ef. First byte 0x80 instead of 0x6b
>
> That's POISON_FREE ("0x6b") overwritten which means use-after-free for
> the range of 0xf6f3a080 - 0xf6f3a0ef (112 bytes). The rest of the
> object is okay but the SLUB debugging code only dumps the first 128 bytes
> of the object which is why we don't see the full corruption.
>
> 2.6.27-rc1 should dump the full object so I'm assuming this is pre -rc1?
>
>> INFO: Allocated in dev_alloc_skb+0x1c/0x30 age=3642 cpu=0 pid=0
>> INFO: Freed in skb_release_data+0x57/0x80 age=3146 cpu=0 pid=2398
>
> So the corrupted object was free'd by skb_release_data() so we need to
> look for a driver or the networking stack calling that function too early.
>
>> INFO: Slab 0xc1c05440 objects=7 used=3 fp=0xf6f3a060 flags=0x400020c3
>> INFO: Object 0xf6f3a060 @offset=8288 fp=0xf6f39030
>>
>> Bytes b4 0xf6f3a050:  5e 09 00 00 57 c9 05 00 5a 5a 5a 5a 5a 5a 5a 5a ^...WÉ..ZZZZZZZZ
>
> The object starts here (the poison values for first 32 bytes are okay):
>
>> Object 0xf6f3a060:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>> Object 0xf6f3a070:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>
> And here are the first 96 bytes of the corruption:
>
>> Object 0xf6f3a080:  80 00 00 00 ff ff ff ff ff ff 00 17 7b 00 46 40 ....ÿÿÿÿÿÿ..{.F@
>> Object 0xf6f3a090:  00 17 7b 00 46 40 30 09 81 21 08 7a 21 00 00 00 ..{.F@0..!.z!...
>> Object 0xf6f3a0a0:  64 00 21 04 00 07 00 00 00 00 00 00 00 01 08 82 d.!.............
>> Object 0xf6f3a0b0:  84 8b 0c 12 96 18 24 03 01 01 05 04 00 02 00 00 ......$.........
>> Object 0xf6f3a0c0:  07 06 43 4e 20 01 0d 14 2a 01 00 32 04 30 48 60 ..CN....*..2.0H`
>> Object 0xf6f3a0d0:  6c dd 18 00 17 7b 01 04 00 00 00 01 00 00 00 10 lÝ...{..........
>
> But I think that's just the payload of a SKB?
>
>> Redzone 0xf6f3b060:  bb bb bb bb                                     »»»»
>
> The red-zone has SLUB_RED_INACTIVE ("0xbb") which reinforces
> use-after-free.
>
>> Padding 0xf6f3b088:  5a 5a 5a 5a 5a 5a 5a 5a                         ZZZZZZZZ
>> Pid: 0, comm: swapper Tainted: G        W 2.6.26-smp #2
>> [<c0180f5d>] print_trailer+0xad/0xf0
>> [<c018103b>] check_bytes_and_report+0x9b/0xc0
>> [<c018145e>] check_object+0x19e/0x1e0
>> [<c01821a4>] __slab_alloc+0x454/0x4f0
>> [<c01834d6>] __kmalloc_track_caller+0xe6/0xf0
>> [<c03dd1ec>] ? dev_alloc_skb+0x1c/0x30
>> [<c03dd1ec>] ? dev_alloc_skb+0x1c/0x30
>> [<c03dce79>] __alloc_skb+0x49/0x100
>> [<c03dd1ec>] dev_alloc_skb+0x1c/0x30
>> [<f8a58599>] ath5k_rxbuf_setup+0x39/0x200 [ath5k]
>> [<f8a5a697>] ath5k_tasklet_rx+0x127/0x5c0 [ath5k]
>> [<c014969a>] ? print_lock_contention_bug+0x1a/0xe0
>> [<c012eafc>] tasklet_action+0x4c/0xc0
>> [<c012e463>] __do_softirq+0x93/0x120
>> [<c012e547>] do_softirq+0x57/0x60
>> [<c012ea29>] irq_exit+0x69/0x80
>> [<c0106b55>] do_IRQ+0x45/0x80
>> [<c010a5d0>] ? mwait_idle+0x0/0x50
>> [<c0104752>] common_interrupt+0x2e/0x34
>> [<c010a5d0>] ? mwait_idle+0x0/0x50
>> [<c010a609>] ? mwait_idle+0x39/0x50
>> [<c01026e0>] cpu_idle+0x60/0xd0
>> [<c043c8ce>] rest_init+0x4e/0x60
>> =======================
>> FIX kmalloc-4096: Restoring 0xf6f3a080-0xf6f3a0ef=0x6b
>>
>> FIX kmalloc-4096: Marking all objects used
>> [<c0243b4f>] ? security_file_permission+0xf/0x20
>> [<c019436f>] sys_select+0x3f/0x190
>> [<c01878e9>] ? fput+0x19/0x20
>> [<c0103dbf>] ? restore_nocheck+0x12/0x15
>> [<c014b06d>] ? trace_hardirqs_on+0xbd/0x140
>> [<c0103d5e>] syscall_call+0x7/0xb
>> =======================
>>
>> Dave, could you please remind us which net driver was in use here?
>
> There's ath5k in the stack trace but that, of course, doesn't
> automatically mean it's at fault here. It could have been just the poor
> bastard who was the next to allocate 4 KB with kmalloc() noticing the
> corruption.
>
> Hope this helps!


I've seen something similar with fragmentation code in iwl4965 but I
can reproduce it only when using SLAB. With SLUB it didn't shown up.
Does anyone know what is difference between SLAB and SLUB in this context.?
Tomas

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [BUG] wireless : cpu stuck for 61s
  2008-07-31  9:50               ` Tomas Winkler
@ 2008-07-31  9:53                 ` Pekka Enberg
  2008-07-31 10:29                   ` Tomas Winkler
  0 siblings, 1 reply; 19+ messages in thread
From: Pekka Enberg @ 2008-07-31  9:53 UTC (permalink / raw)
  To: Tomas Winkler
  Cc: Andrew Morton, Dave Young, Johannes Berg, linux-kernel,
	linux-wireless

On Thu, 2008-07-31 at 12:50 +0300, Tomas Winkler wrote:
> > There's ath5k in the stack trace but that, of course, doesn't
> > automatically mean it's at fault here. It could have been just the poor
> > bastard who was the next to allocate 4 KB with kmalloc() noticing the
> > corruption.
> >
> > Hope this helps!
> 
> I've seen something similar with fragmentation code in iwl4965 but I
> can reproduce it only when using SLAB. With SLUB it didn't shown up.
> Does anyone know what is difference between SLAB and SLUB in this context.?

Do you have CONFIG_SLUB_DEBUG_ON enabled? (Note: if you only have
CONFIG_SLUB_DEBUG enabled, then you need to enable debugging at run-time
by passing slub_debug to kernel command line.)


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [BUG] wireless : cpu stuck for 61s
  2008-07-31  9:53                 ` Pekka Enberg
@ 2008-07-31 10:29                   ` Tomas Winkler
  0 siblings, 0 replies; 19+ messages in thread
From: Tomas Winkler @ 2008-07-31 10:29 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Andrew Morton, Dave Young, Johannes Berg, linux-kernel,
	linux-wireless

On Thu, Jul 31, 2008 at 12:53 PM, Pekka Enberg <penberg@cs.helsinki.fi> wrote:
> On Thu, 2008-07-31 at 12:50 +0300, Tomas Winkler wrote:
>> > There's ath5k in the stack trace but that, of course, doesn't
>> > automatically mean it's at fault here. It could have been just the poor
>> > bastard who was the next to allocate 4 KB with kmalloc() noticing the
>> > corruption.
>> >
>> > Hope this helps!
>>
>> I've seen something similar with fragmentation code in iwl4965 but I
>> can reproduce it only when using SLAB. With SLUB it didn't shown up.
>> Does anyone know what is difference between SLAB and SLUB in this context.?
>
> Do you have CONFIG_SLUB_DEBUG_ON enabled? (Note: if you only have
> CONFIG_SLUB_DEBUG enabled, then you need to enable debugging at run-time
> by passing slub_debug to kernel command line.)

Yes sure I've done that with slub_debug=FZPU
Tomas

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [BUG] wireless : cpu stuck for 61s
  2008-07-31  9:15             ` Pekka J Enberg
  2008-07-31  9:50               ` Tomas Winkler
@ 2008-08-01  7:32               ` Dave Young
  2008-08-04  9:22                 ` Jiri Slaby
  2008-08-04 10:00                 ` Jiri Slaby
  1 sibling, 2 replies; 19+ messages in thread
From: Dave Young @ 2008-08-01  7:32 UTC (permalink / raw)
  To: Pekka J Enberg; +Cc: Andrew Morton, Johannes Berg, linux-kernel, linux-wireless

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=UTF-8, Size: 5848 bytes --]

On Thu, Jul 31, 2008 at 5:15 PM, Pekka J Enberg <penberg@cs.helsinki.fi> wrote:> Hi Andrew,>> On Wed, 30 Jul 2008, Andrew Morton wrote:>> > Ok here it is.>> > BTW, I run "klogd -c 7" after boot>>>> The sysrq output is still missing lots of stuff.  I guess we broke it.>>>> >>> > This time I get a kmalloc poison overwritten:>> >>>>> argh, that stuff hurts my brain.  None of the numbers seem to make any>> sense for a 4k allocation :( Pekka, do you have time to decrypt this?>> Sure. Here goes:>> On Wed, 30 Jul 2008, Andrew Morton wrote:>> <fixes wordwrapping, cleans stuff up>>>>> =============================================================================>> BUG kmalloc-4096: Poison overwritten>> ----------------------------------------------------------------------------->>>> INFO: 0xf6f3a080-0xf6f3a0ef. First byte 0x80 instead of 0x6b>> That's POISON_FREE ("0x6b") overwritten which means use-after-free for> the range of 0xf6f3a080 - 0xf6f3a0ef (112 bytes). The rest of the> object is okay but the SLUB debugging code only dumps the first 128 bytes> of the object which is why we don't see the full corruption.>> 2.6.27-rc1 should dump the full object so I'm assuming this is pre -rc1?
Yes, it's 2.6.26
>>> INFO: Allocated in dev_alloc_skb+0x1c/0x30 age=3642 cpu=0 pid=0>> INFO: Freed in skb_release_data+0x57/0x80 age=3146 cpu=0 pid=2398>> So the corrupted object was free'd by skb_release_data() so we need to> look for a driver or the networking stack calling that function too early.>>> INFO: Slab 0xc1c05440 objects=7 used=3 fp=0xf6f3a060 flags=0x400020c3>> INFO: Object 0xf6f3a060 @offset=8288 fp=0xf6f39030>>>> Bytes b4 0xf6f3a050:  5e 09 00 00 57 c9 05 00 5a 5a 5a 5a 5a 5a 5a 5a ^...WÃ‰..ZZZZZZZZ>> The object starts here (the poison values for first 32 bytes are okay):>>> Object 0xf6f3a060:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk>> Object 0xf6f3a070:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk>> And here are the first 96 bytes of the corruption:>>> Object 0xf6f3a080:  80 00 00 00 ff ff ff ff ff ff 00 17 7b 00 46 40 ....Ã¿Ã¿Ã¿Ã¿Ã¿Ã¿..{.F@>> Object 0xf6f3a090:  00 17 7b 00 46 40 30 09 81 21 08 7a 21 00 00 00 ..{.F@0..!.z!...>> Object 0xf6f3a0a0:  64 00 21 04 00 07 00 00 00 00 00 00 00 01 08 82 d.!.............>> Object 0xf6f3a0b0:  84 8b 0c 12 96 18 24 03 01 01 05 04 00 02 00 00 ......$.........>> Object 0xf6f3a0c0:  07 06 43 4e 20 01 0d 14 2a 01 00 32 04 30 48 60 ..CN....*..2.0H`>> Object 0xf6f3a0d0:  6c dd 18 00 17 7b 01 04 00 00 00 01 00 00 00 10 lÃ...{..........>> But I think that's just the payload of a SKB?>>> Redzone 0xf6f3b060:  bb bb bb bb                                     Â»Â»Â»Â»>> The red-zone has SLUB_RED_INACTIVE ("0xbb") which reinforces> use-after-free.>>> Padding 0xf6f3b088:  5a 5a 5a 5a 5a 5a 5a 5a                         ZZZZZZZZ>> Pid: 0, comm: swapper Tainted: G        W 2.6.26-smp #2>> [<c0180f5d>] print_trailer+0xad/0xf0>> [<c018103b>] check_bytes_and_report+0x9b/0xc0>> [<c018145e>] check_object+0x19e/0x1e0>> [<c01821a4>] __slab_alloc+0x454/0x4f0>> [<c01834d6>] __kmalloc_track_caller+0xe6/0xf0>> [<c03dd1ec>] ? dev_alloc_skb+0x1c/0x30>> [<c03dd1ec>] ? dev_alloc_skb+0x1c/0x30>> [<c03dce79>] __alloc_skb+0x49/0x100>> [<c03dd1ec>] dev_alloc_skb+0x1c/0x30>> [<f8a58599>] ath5k_rxbuf_setup+0x39/0x200 [ath5k]>> [<f8a5a697>] ath5k_tasklet_rx+0x127/0x5c0 [ath5k]>> [<c014969a>] ? print_lock_contention_bug+0x1a/0xe0>> [<c012eafc>] tasklet_action+0x4c/0xc0>> [<c012e463>] __do_softirq+0x93/0x120>> [<c012e547>] do_softirq+0x57/0x60>> [<c012ea29>] irq_exit+0x69/0x80>> [<c0106b55>] do_IRQ+0x45/0x80>> [<c010a5d0>] ? mwait_idle+0x0/0x50>> [<c0104752>] common_interrupt+0x2e/0x34>> [<c010a5d0>] ? mwait_idle+0x0/0x50>> [<c010a609>] ? mwait_idle+0x39/0x50>> [<c01026e0>] cpu_idle+0x60/0xd0>> [<c043c8ce>] rest_init+0x4e/0x60>> =======================>> FIX kmalloc-4096: Restoring 0xf6f3a080-0xf6f3a0ef=0x6b>>>> FIX kmalloc-4096: Marking all objects used>> [<c0243b4f>] ? security_file_permission+0xf/0x20>> [<c019436f>] sys_select+0x3f/0x190>> [<c01878e9>] ? fput+0x19/0x20>> [<c0103dbf>] ? restore_nocheck+0x12/0x15>> [<c014b06d>] ? trace_hardirqs_on+0xbd/0x140>> [<c0103d5e>] syscall_call+0x7/0xb>> =======================>>>> Dave, could you please remind us which net driver was in use here?>> There's ath5k in the stack trace but that, of course, doesn't> automatically mean it's at fault here. It could have been just the poor> bastard who was the next to allocate 4 KB with kmalloc() noticing the> corruption.>> Hope this helps!>>                        Pekka
Lockdep helped me, this morning I get a lockdep warning about this,
[  171.432140] [ INFO: possible recursive locking detected ][  171.433113] 2.6.27-rc1-smp #4[  171.434079] ---------------------------------------------[  171.435039] ath5k_pci/2447 is trying to acquire lock:[  171.435990]  (&sc->lock){--..}, at: [<f89ee9b5>]ath5k_config_interface+0xd5/0x340 [ath5k][  171.437046][  171.437048] but task is already holding lock:[  171.438903]  (&sc->lock){--..}, at: [<f89ee91d>]ath5k_config_interface+0x3d/0x340 [ath5k][  171.439953][  171.439954] other info that might help us debug this:[  171.441795] 3 locks held by ath5k_pci/2447:[  171.442729]  #0:  ((name)){--..}, at: [<c013a122>] run_workqueue+0x102/0x1d0[  171.443800]  #1:  (&(&local->scan_work)->work){--..}, at:[<c013a122>] run_workqueue+0x102/0x1d0[  171.444859]  #2:  (&sc->lock){--..}, at: [<f89ee91d>]ath5k_config_interface+0x3d/0x340 [ath5k]
Deadlock happen here, I remove the lock in the sub routine, tested andfixed the problem for me.I will send the patch after a while.
But I still have no idea with the poison overwritten.
-- Regardsdaveÿôèº{.nÇ+‰·Ÿ®‰†+%ŠËÿ±éÝ¶\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dÊ‡Ú™ë,j\a¢f£¢·hšïêÿ‘êçz_è®\x03(éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨èÚ&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [BUG] wireless : cpu stuck for 61s
  2008-08-01  7:32               ` Dave Young
@ 2008-08-04  9:22                 ` Jiri Slaby
  2008-08-04 10:00                 ` Jiri Slaby
  1 sibling, 0 replies; 19+ messages in thread
From: Jiri Slaby @ 2008-08-04  9:22 UTC (permalink / raw)
  To: Dave Young
  Cc: Pekka J Enberg, Andrew Morton, Johannes Berg, linux-kernel,
	linux-wireless

Dave Young napsal(a):
> On Thu, Jul 31, 2008 at 5:15 PM, Pekka J Enberg <penberg@cs.helsinki.fi> wrote:
>> On Wed, 30 Jul 2008, Andrew Morton wrote:
>>> INFO: Allocated in dev_alloc_skb+0x1c/0x30 age=3642 cpu=0 pid=0
>>> INFO: Freed in skb_release_data+0x57/0x80 age=3146 cpu=0 pid=2398
>> So the corrupted object was free'd by skb_release_data() so we need to
>> look for a driver or the networking stack calling that function too early.
>>
>>> INFO: Slab 0xc1c05440 objects=7 used=3 fp=0xf6f3a060 flags=0x400020c3
>>> INFO: Object 0xf6f3a060 @offset=8288 fp=0xf6f39030
>>>
>>> Bytes b4 0xf6f3a050:  5e 09 00 00 57 c9 05 00 5a 5a 5a 5a 5a 5a 5a 5a ^...WÉ..ZZZZZZZZ
>> The object starts here (the poison values for first 32 bytes are okay):
>>
>>> Object 0xf6f3a060:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>>> Object 0xf6f3a070:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
>> And here are the first 96 bytes of the corruption:
>>
>>> Object 0xf6f3a080:  80 00 00 00 ff ff ff ff ff ff 00 17 7b 00 46 40 ....ÿÿÿÿÿÿ..{.F@
>>> Object 0xf6f3a090:  00 17 7b 00 46 40 30 09 81 21 08 7a 21 00 00 00 ..{.F@0..!.z!...
>>> Object 0xf6f3a0a0:  64 00 21 04 00 07 00 00 00 00 00 00 00 01 08 82 d.!.............
>>> Object 0xf6f3a0b0:  84 8b 0c 12 96 18 24 03 01 01 05 04 00 02 00 00 ......$.........
>>> Object 0xf6f3a0c0:  07 06 43 4e 20 01 0d 14 2a 01 00 32 04 30 48 60 ..CN....*..2.0H`
>>> Object 0xf6f3a0d0:  6c dd 18 00 17 7b 01 04 00 00 00 01 00 00 00 10 lÝ...{..........
>> But I think that's just the payload of a SKB?

It's a receive frame from ath5k, I suppose. 00:17:7b:00:46:40 is your AP?

>>> Redzone 0xf6f3b060:  bb bb bb bb                                     »»»»
>> The red-zone has SLUB_RED_INACTIVE ("0xbb") which reinforces
>> use-after-free.
>>
>>> Padding 0xf6f3b088:  5a 5a 5a 5a 5a 5a 5a 5a                         ZZZZZZZZ
>>> Pid: 0, comm: swapper Tainted: G        W 2.6.26-smp #2
>>> [<c0180f5d>] print_trailer+0xad/0xf0
>>> [<c018103b>] check_bytes_and_report+0x9b/0xc0
>>> [<c018145e>] check_object+0x19e/0x1e0
>>> [<c01821a4>] __slab_alloc+0x454/0x4f0
>>> [<c01834d6>] __kmalloc_track_caller+0xe6/0xf0
>>> [<c03dd1ec>] ? dev_alloc_skb+0x1c/0x30
>>> [<c03dd1ec>] ? dev_alloc_skb+0x1c/0x30
>>> [<c03dce79>] __alloc_skb+0x49/0x100
>>> [<c03dd1ec>] dev_alloc_skb+0x1c/0x30
>>> [<f8a58599>] ath5k_rxbuf_setup+0x39/0x200 [ath5k]
>>> [<f8a5a697>] ath5k_tasklet_rx+0x127/0x5c0 [ath5k]
>>> [<c014969a>] ? print_lock_contention_bug+0x1a/0xe0
>>> [<c012eafc>] tasklet_action+0x4c/0xc0
[...]
>>> =======================
>>> FIX kmalloc-4096: Restoring 0xf6f3a080-0xf6f3a0ef=0x6b

>>> Dave, could you please remind us which net driver was in use here?
>> There's ath5k in the stack trace but that, of course, doesn't
>> automatically mean it's at fault here. It could have been just the poor
>> bastard who was the next to allocate 4 KB with kmalloc() noticing the
>> corruption.

No, unfortunately ath5k *is* likely the culprit. Next time please Cc 
ath5k-devel@lists.ath5k.org even if it is only a suspicion.

> But I still have no idea with the poison overwritten.

Could you try patch from
http://lkml.org/lkml/2008/7/15/276
? (I have no idea how reproducible is this by you, it often happens on noisy 
channels and/or by lowering RX buffers, i.e. ATH_RXBUF).

[It hit mainline few days ago, I'm going to fwd it to stable.]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [BUG] wireless : cpu stuck for 61s
  2008-08-01  7:32               ` Dave Young
  2008-08-04  9:22                 ` Jiri Slaby
@ 2008-08-04 10:00                 ` Jiri Slaby
  2008-08-05  1:29                   ` Dave Young
  1 sibling, 1 reply; 19+ messages in thread
From: Jiri Slaby @ 2008-08-04 10:00 UTC (permalink / raw)
  To: Dave Young
  Cc: Pekka J Enberg, Andrew Morton, Johannes Berg, linux-kernel,
	linux-wireless, Bob Copeland

Dave Young napsal(a):
> Lockdep helped me, this morning I get a lockdep warning about this,
> 
> [  171.432140] [ INFO: possible recursive locking detected ]
> [  171.433113] 2.6.27-rc1-smp #4
> [  171.434079] ---------------------------------------------
> [  171.435039] ath5k_pci/2447 is trying to acquire lock:
> [  171.435990]  (&sc->lock){--..}, at: [<f89ee9b5>]
> ath5k_config_interface+0xd5/0x340 [ath5k]
> [  171.437046]
> [  171.437048] but task is already holding lock:
> [  171.438903]  (&sc->lock){--..}, at: [<f89ee91d>]
> ath5k_config_interface+0x3d/0x340 [ath5k]
> [  171.439953]
> [  171.439954] other info that might help us debug this:
> [  171.441795] 3 locks held by ath5k_pci/2447:
> [  171.442729]  #0:  ((name)){--..}, at: [<c013a122>] run_workqueue+0x102/0x1d0
> [  171.443800]  #1:  (&(&local->scan_work)->work){--..}, at:
> [<c013a122>] run_workqueue+0x102/0x1d0
> [  171.444859]  #2:  (&sc->lock){--..}, at: [<f89ee91d>]
> ath5k_config_interface+0x3d/0x340 [ath5k]
> 
> Deadlock happen here, I remove the lock in the sub routine, tested and
> fixed the problem for me.
> I will send the patch after a while.

That's some kind different. 2.6.26 can't suffer from the thing above 
(opposing to 27-rc1).

Pasting the original issue below. Maybe it's been fixed too? I suppose you 
don't encounter the problem anymore in 27-rc1 with your/Bob's/mine (fix for 
the same thing in the same manner) patch applied?

BUG: soft lockup - CPU#0 stuck for 61s! [events/0:9]
Modules linked in: radeonfb fb_ddc i2c_algo_bit snd_seq_dummy snd_seq_oss 
snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss rfcomm 
l2cap bluetooth vfat fuse rtc_cmos rtc_core psmouse intel_agp pl2303 rtc_lib 
ath5k e100 i2c_i801 serio_raw usbserial agpgart 3c59x mii snd_hda_intel 
snd_pcm mac80211 snd_timer snd_page_alloc cfg80211 snd_hwdep snd soundcore 
thermal processor button thermal_sys evdev sg

Pid: 9, comm: events/0 Tainted: G        W (2.6.26-smp #3)
EIP: 0060:[<c0450bc3>] EFLAGS: 00000246 CPU: 0
EIP is at mutex_lock_nested+0x1b3/0x280
EAX: 00000000 EBX: f7bf5284 ECX: f7c7e000 EDX: f7bf52a8
ESI: f7bf5288 EDI: 00000246 EBP: f7c7fedc ESP: f7c7fe9c
  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: b800c5ac CR3: 36e71000 CR4: 000006d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
  [<c02375a6>] ? debugfs_create_file+0x46/0x210
  [<c02375a6>] ? debugfs_create_file+0x46/0x210
  [<c02375a6>] debugfs_create_file+0x46/0x210
  [<c02377f1>] debugfs_create_dir+0x21/0x30
  [<f8901f6d>] ieee80211_sta_debugfs_add+0x2d/0x150 [mac80211]
  [<f88eba89>] sta_info_debugfs_add_work+0x89/0x130 [mac80211]
  [<f890a170>] ? rate_control_pid_add_sta_debugfs+0x0/0x30 [mac80211]
  [<c01394c6>] run_workqueue+0x156/0x1d0
  [<c0139472>] ? run_workqueue+0x102/0x1d0
  [<f88eba00>] ? sta_info_debugfs_add_work+0x0/0x130 [mac80211]
  [<c0139f18>] worker_thread+0x88/0xe0
  [<c013cd20>] ? autoremove_wake_function+0x0/0x40
  [<c0139e90>] ? worker_thread+0x0/0xe0
  [<c013ca32>] kthread+0x42/0x70
  [<c013c9f0>] ? kthread+0x0/0x70
  [<c0104993>] kernel_thread_helper+0x7/0x14
  =======================
BUG: soft lockup - CPU#0 stuck for 61s! [events/0:9]
Modules linked in: radeonfb fb_ddc i2c_algo_bit snd_seq_dummy snd_seq_oss 
snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss rfcomm 
l2cap bluetooth vfat fuse rtc_cmos rtc_core psmouse intel_agp pl2303 rtc_lib 
ath5k e100 i2c_i801 serio_raw usbserial agpgart 3c59x mii snd_hda_intel 
snd_pcm mac80211 snd_timer snd_page_alloc cfg80211 snd_hwdep snd soundcore 
thermal processor button thermal_sys evdev sg

Pid: 9, comm: events/0 Tainted: G        W (2.6.26-smp #3)
EIP: 0060:[<c0450bc3>] EFLAGS: 00000246 CPU: 0
EIP is at mutex_lock_nested+0x1b3/0x280
EAX: 00000000 EBX: f7bf5284 ECX: f7c7e000 EDX: f7bf52a8
ESI: f7bf5288 EDI: 00000246 EBP: f7c7fedc ESP: f7c7fe9c
  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: b800c5ac CR3: 36e71000 CR4: 000006d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
  [<c02375a6>] ? debugfs_create_file+0x46/0x210
  [<c02375a6>] ? debugfs_create_file+0x46/0x210
  [<c02375a6>] debugfs_create_file+0x46/0x210
  [<c02377f1>] debugfs_create_dir+0x21/0x30
  [<f8901f6d>] ieee80211_sta_debugfs_add+0x2d/0x150 [mac80211]
  [<f88eba89>] sta_info_debugfs_add_work+0x89/0x130 [mac80211]
  [<f890a170>] ? rate_control_pid_add_sta_debugfs+0x0/0x30 [mac80211]
  [<c01394c6>] run_workqueue+0x156/0x1d0
  [<c0139472>] ? run_workqueue+0x102/0x1d0
  [<f88eba00>] ? sta_info_debugfs_add_work+0x0/0x130 [mac80211]
  [<c0139f18>] worker_thread+0x88/0xe0
  [<c013cd20>] ? autoremove_wake_function+0x0/0x40
  [<c0139e90>] ? worker_thread+0x0/0xe0
  [<c013ca32>] kthread+0x42/0x70
  [<c013c9f0>] ? kthread+0x0/0x70
  [<c0104993>] kernel_thread_helper+0x7/0x14
  =======================

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [BUG] wireless : cpu stuck for 61s
  2008-08-04 10:00                 ` Jiri Slaby
@ 2008-08-05  1:29                   ` Dave Young
  2008-08-05 12:24                     ` Bob Copeland
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Young @ 2008-08-05  1:29 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Pekka J Enberg, Andrew Morton, Johannes Berg, linux-kernel,
	linux-wireless, Bob Copeland

On Mon, Aug 4, 2008 at 6:00 PM, Jiri Slaby <jirislaby@gmail.com> wrote:
> Dave Young napsal(a):
>>
>> Lockdep helped me, this morning I get a lockdep warning about this,
>>
>> [  171.432140] [ INFO: possible recursive locking detected ]
>> [  171.433113] 2.6.27-rc1-smp #4
>> [  171.434079] ---------------------------------------------
>> [  171.435039] ath5k_pci/2447 is trying to acquire lock:
>> [  171.435990]  (&sc->lock){--..}, at: [<f89ee9b5>]
>> ath5k_config_interface+0xd5/0x340 [ath5k]
>> [  171.437046]
>> [  171.437048] but task is already holding lock:
>> [  171.438903]  (&sc->lock){--..}, at: [<f89ee91d>]
>> ath5k_config_interface+0x3d/0x340 [ath5k]
>> [  171.439953]
>> [  171.439954] other info that might help us debug this:
>> [  171.441795] 3 locks held by ath5k_pci/2447:
>> [  171.442729]  #0:  ((name)){--..}, at: [<c013a122>]
>> run_workqueue+0x102/0x1d0
>> [  171.443800]  #1:  (&(&local->scan_work)->work){--..}, at:
>> [<c013a122>] run_workqueue+0x102/0x1d0
>> [  171.444859]  #2:  (&sc->lock){--..}, at: [<f89ee91d>]
>> ath5k_config_interface+0x3d/0x340 [ath5k]
>>
>> Deadlock happen here, I remove the lock in the sub routine, tested and
>> fixed the problem for me.
>> I will send the patch after a while.
>
> That's some kind different. 2.6.26 can't suffer from the thing above
> (opposing to 27-rc1).

Yes, it's got with 27-rc1

>
> Pasting the original issue below. Maybe it's been fixed too? I suppose you
> don't encounter the problem anymore in 27-rc1 with your/Bob's/mine (fix for
> the same thing in the same manner) patch applied?

With the patch I cann't reproduce the bug with 27-rc1 now.

>
> BUG: soft lockup - CPU#0 stuck for 61s! [events/0:9]
> Modules linked in: radeonfb fb_ddc i2c_algo_bit snd_seq_dummy snd_seq_oss
> snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss rfcomm
> l2cap bluetooth vfat fuse rtc_cmos rtc_core psmouse intel_agp pl2303 rtc_lib
> ath5k e100 i2c_i801 serio_raw usbserial agpgart 3c59x mii snd_hda_intel
> snd_pcm mac80211 snd_timer snd_page_alloc cfg80211 snd_hwdep snd soundcore
> thermal processor button thermal_sys evdev sg
>
> Pid: 9, comm: events/0 Tainted: G        W (2.6.26-smp #3)
> EIP: 0060:[<c0450bc3>] EFLAGS: 00000246 CPU: 0
> EIP is at mutex_lock_nested+0x1b3/0x280
> EAX: 00000000 EBX: f7bf5284 ECX: f7c7e000 EDX: f7bf52a8
> ESI: f7bf5288 EDI: 00000246 EBP: f7c7fedc ESP: f7c7fe9c
>  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> CR0: 8005003b CR2: b800c5ac CR3: 36e71000 CR4: 000006d0
> DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> DR6: ffff0ff0 DR7: 00000400
>  [<c02375a6>] ? debugfs_create_file+0x46/0x210
>  [<c02375a6>] ? debugfs_create_file+0x46/0x210
>  [<c02375a6>] debugfs_create_file+0x46/0x210
>  [<c02377f1>] debugfs_create_dir+0x21/0x30
>  [<f8901f6d>] ieee80211_sta_debugfs_add+0x2d/0x150 [mac80211]
>  [<f88eba89>] sta_info_debugfs_add_work+0x89/0x130 [mac80211]
>  [<f890a170>] ? rate_control_pid_add_sta_debugfs+0x0/0x30 [mac80211]
>  [<c01394c6>] run_workqueue+0x156/0x1d0
>  [<c0139472>] ? run_workqueue+0x102/0x1d0
>  [<f88eba00>] ? sta_info_debugfs_add_work+0x0/0x130 [mac80211]
>  [<c0139f18>] worker_thread+0x88/0xe0
>  [<c013cd20>] ? autoremove_wake_function+0x0/0x40
>  [<c0139e90>] ? worker_thread+0x0/0xe0
>  [<c013ca32>] kthread+0x42/0x70
>  [<c013c9f0>] ? kthread+0x0/0x70
>  [<c0104993>] kernel_thread_helper+0x7/0x14
>  =======================
> BUG: soft lockup - CPU#0 stuck for 61s! [events/0:9]
> Modules linked in: radeonfb fb_ddc i2c_algo_bit snd_seq_dummy snd_seq_oss
> snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss rfcomm
> l2cap bluetooth vfat fuse rtc_cmos rtc_core psmouse intel_agp pl2303 rtc_lib
> ath5k e100 i2c_i801 serio_raw usbserial agpgart 3c59x mii snd_hda_intel
> snd_pcm mac80211 snd_timer snd_page_alloc cfg80211 snd_hwdep snd soundcore
> thermal processor button thermal_sys evdev sg
>
> Pid: 9, comm: events/0 Tainted: G        W (2.6.26-smp #3)
> EIP: 0060:[<c0450bc3>] EFLAGS: 00000246 CPU: 0
> EIP is at mutex_lock_nested+0x1b3/0x280
> EAX: 00000000 EBX: f7bf5284 ECX: f7c7e000 EDX: f7bf52a8
> ESI: f7bf5288 EDI: 00000246 EBP: f7c7fedc ESP: f7c7fe9c
>  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> CR0: 8005003b CR2: b800c5ac CR3: 36e71000 CR4: 000006d0
> DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> DR6: ffff0ff0 DR7: 00000400
>  [<c02375a6>] ? debugfs_create_file+0x46/0x210
>  [<c02375a6>] ? debugfs_create_file+0x46/0x210
>  [<c02375a6>] debugfs_create_file+0x46/0x210
>  [<c02377f1>] debugfs_create_dir+0x21/0x30
>  [<f8901f6d>] ieee80211_sta_debugfs_add+0x2d/0x150 [mac80211]
>  [<f88eba89>] sta_info_debugfs_add_work+0x89/0x130 [mac80211]
>  [<f890a170>] ? rate_control_pid_add_sta_debugfs+0x0/0x30 [mac80211]
>  [<c01394c6>] run_workqueue+0x156/0x1d0
>  [<c0139472>] ? run_workqueue+0x102/0x1d0
>  [<f88eba00>] ? sta_info_debugfs_add_work+0x0/0x130 [mac80211]
>  [<c0139f18>] worker_thread+0x88/0xe0
>  [<c013cd20>] ? autoremove_wake_function+0x0/0x40
>  [<c0139e90>] ? worker_thread+0x0/0xe0
>  [<c013ca32>] kthread+0x42/0x70
>  [<c013c9f0>] ? kthread+0x0/0x70
>  [<c0104993>] kernel_thread_helper+0x7/0x14
>  =======================
>



-- 
Regards
dave

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [BUG] wireless : cpu stuck for 61s
  2008-08-05  1:29                   ` Dave Young
@ 2008-08-05 12:24                     ` Bob Copeland
  2008-08-06  1:51                       ` Dave Young
  0 siblings, 1 reply; 19+ messages in thread
From: Bob Copeland @ 2008-08-05 12:24 UTC (permalink / raw)
  To: Dave Young
  Cc: Jiri Slaby, Pekka J Enberg, Andrew Morton, Johannes Berg,
	linux-kernel, linux-wireless

On Tue, Aug 05, 2008 at 09:29:26AM +0800, Dave Young wrote:
> With the patch I cann't reproduce the bug with 27-rc1 now.
> 
> >  [<c02375a6>] ? debugfs_create_file+0x46/0x210
> >  [<c02375a6>] ? debugfs_create_file+0x46/0x210
> >  [<c02375a6>] debugfs_create_file+0x46/0x210
> >  [<c02377f1>] debugfs_create_dir+0x21/0x30
> >  [<f8901f6d>] ieee80211_sta_debugfs_add+0x2d/0x150 [mac80211]
> >  [<f88eba89>] sta_info_debugfs_add_work+0x89/0x130 [mac80211]
> >  [<f890a170>] ? rate_control_pid_add_sta_debugfs+0x0/0x30 [mac80211]

I wonder if there were two separate problems here.  I looked into
this with some detail yesterday and agree with Johannes that the above 
trace is on locking the parent directory's i_mutex, but I too couldn't
see any problems with sta_info_debugfs_add_work.  Other stuff could also 
modify the directory with or without rtnl_lock, but not in a way that 
to my untrained eyes would lead to deadlock.

Or is the trace just wrong?

-- 
Bob Copeland %% www.bobcopeland.com


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [BUG] wireless : cpu stuck for 61s
  2008-08-05 12:24                     ` Bob Copeland
@ 2008-08-06  1:51                       ` Dave Young
  2008-08-06  1:53                         ` Dave Young
  2008-08-12  4:19                         ` Dave Young
  0 siblings, 2 replies; 19+ messages in thread
From: Dave Young @ 2008-08-06  1:51 UTC (permalink / raw)
  To: Bob Copeland
  Cc: Jiri Slaby, Pekka J Enberg, Andrew Morton, Johannes Berg,
	linux-kernel, linux-wireless

On Tue, Aug 5, 2008 at 8:24 PM, Bob Copeland <me@bobcopeland.com> wrote:
> On Tue, Aug 05, 2008 at 09:29:26AM +0800, Dave Young wrote:
>> With the patch I cann't reproduce the bug with 27-rc1 now.
>>
>> >  [<c02375a6>] ? debugfs_create_file+0x46/0x210
>> >  [<c02375a6>] ? debugfs_create_file+0x46/0x210
>> >  [<c02375a6>] debugfs_create_file+0x46/0x210
>> >  [<c02377f1>] debugfs_create_dir+0x21/0x30
>> >  [<f8901f6d>] ieee80211_sta_debugfs_add+0x2d/0x150 [mac80211]
>> >  [<f88eba89>] sta_info_debugfs_add_work+0x89/0x130 [mac80211]
>> >  [<f890a170>] ? rate_control_pid_add_sta_debugfs+0x0/0x30 [mac80211]
>
> I wonder if there were two separate problems here.  I looked into
> this with some detail yesterday and agree with Johannes that the above
> trace is on locking the parent directory's i_mutex, but I too couldn't
> see any problems with sta_info_debugfs_add_work.  Other stuff could also
> modify the directory with or without rtnl_lock, but not in a way that
> to my untrained eyes would lead to deadlock.

Yes,. I think so. It's the original bug for me, while testing I found
the mutex deadlock problem.

But this week I will have no time to trace it. so if I have time I
will keep tracing the problem

>
> Or is the trace just wrong?
>
> --
> Bob Copeland %% www.bobcopeland.com
>
>



-- 
Regards
dave

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [BUG] wireless : cpu stuck for 61s
  2008-08-06  1:51                       ` Dave Young
@ 2008-08-06  1:53                         ` Dave Young
  2008-08-12  4:19                         ` Dave Young
  1 sibling, 0 replies; 19+ messages in thread
From: Dave Young @ 2008-08-06  1:53 UTC (permalink / raw)
  To: Bob Copeland
  Cc: Jiri Slaby, Pekka J Enberg, Andrew Morton, Johannes Berg,
	linux-kernel, linux-wireless

On Wed, Aug 6, 2008 at 9:51 AM, Dave Young <hidave.darkstar@gmail.com> wrote:
> On Tue, Aug 5, 2008 at 8:24 PM, Bob Copeland <me@bobcopeland.com> wrote:
>> On Tue, Aug 05, 2008 at 09:29:26AM +0800, Dave Young wrote:
>>> With the patch I cann't reproduce the bug with 27-rc1 now.
>>>
>>> >  [<c02375a6>] ? debugfs_create_file+0x46/0x210
>>> >  [<c02375a6>] ? debugfs_create_file+0x46/0x210
>>> >  [<c02375a6>] debugfs_create_file+0x46/0x210
>>> >  [<c02377f1>] debugfs_create_dir+0x21/0x30
>>> >  [<f8901f6d>] ieee80211_sta_debugfs_add+0x2d/0x150 [mac80211]
>>> >  [<f88eba89>] sta_info_debugfs_add_work+0x89/0x130 [mac80211]
>>> >  [<f890a170>] ? rate_control_pid_add_sta_debugfs+0x0/0x30 [mac80211]
>>
>> I wonder if there were two separate problems here.  I looked into
>> this with some detail yesterday and agree with Johannes that the above
>> trace is on locking the parent directory's i_mutex, but I too couldn't
>> see any problems with sta_info_debugfs_add_work.  Other stuff could also
>> modify the directory with or without rtnl_lock, but not in a way that
>> to my untrained eyes would lead to deadlock.
>
> Yes,. I think so. It's the original bug for me, while testing I found
> the mutex deadlock problem.
>
> But this week I will have no time to trace it. so if I have time I
> will keep tracing the problem

Additional info,

With the mutex fix patch, in 2.6.27-rc1 I seems can not reproduce the
debugfs_add bug, (maybe need more test)

But with 2.6.26, the bug can be reproduced. (The mutex fix patch need
not to be applied because there's no such deadlock bug)

>
>>
>> Or is the trace just wrong?
>>
>> --
>> Bob Copeland %% www.bobcopeland.com
>>
>>
>
>
>
> --
> Regards
> dave
>



-- 
Regards
dave

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [BUG] wireless : cpu stuck for 61s
  2008-08-06  1:51                       ` Dave Young
  2008-08-06  1:53                         ` Dave Young
@ 2008-08-12  4:19                         ` Dave Young
  1 sibling, 0 replies; 19+ messages in thread
From: Dave Young @ 2008-08-12  4:19 UTC (permalink / raw)
  To: Bob Copeland
  Cc: Jiri Slaby, Pekka J Enberg, Andrew Morton, Johannes Berg,
	linux-kernel, linux-wireless

On Wed, Aug 6, 2008 at 9:51 AM, Dave Young <hidave.darkstar@gmail.com> wrote:
> On Tue, Aug 5, 2008 at 8:24 PM, Bob Copeland <me@bobcopeland.com> wrote:
>> On Tue, Aug 05, 2008 at 09:29:26AM +0800, Dave Young wrote:
>>> With the patch I cann't reproduce the bug with 27-rc1 now.
>>>
>>> >  [<c02375a6>] ? debugfs_create_file+0x46/0x210
>>> >  [<c02375a6>] ? debugfs_create_file+0x46/0x210
>>> >  [<c02375a6>] debugfs_create_file+0x46/0x210
>>> >  [<c02377f1>] debugfs_create_dir+0x21/0x30
>>> >  [<f8901f6d>] ieee80211_sta_debugfs_add+0x2d/0x150 [mac80211]
>>> >  [<f88eba89>] sta_info_debugfs_add_work+0x89/0x130 [mac80211]
>>> >  [<f890a170>] ? rate_control_pid_add_sta_debugfs+0x0/0x30 [mac80211]
>>
>> I wonder if there were two separate problems here.  I looked into
>> this with some detail yesterday and agree with Johannes that the above
>> trace is on locking the parent directory's i_mutex, but I too couldn't
>> see any problems with sta_info_debugfs_add_work.  Other stuff could also
>> modify the directory with or without rtnl_lock, but not in a way that
>> to my untrained eyes would lead to deadlock.
>
> Yes,. I think so. It's the original bug for me, while testing I found
> the mutex deadlock problem.
>
> But this week I will have no time to trace it. so if I have time I
> will keep tracing the problem

Unfortunately I have no more time to do this these days.
As I can't reproduce it with the latest git kernel I think that It
might have been fixed already.

>
>>
>> Or is the trace just wrong?
>>
>> --
>> Bob Copeland %% www.bobcopeland.com
>>
>>
>
>
>
> --
> Regards
> dave
>



-- 
Regards
dave

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2008-08-12  4:19 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-07-29  5:57 [BUG] wireless : cpu stuck for 61s Dave Young
2008-07-29 12:32 ` Johannes Berg
2008-07-30  9:08   ` Andrew Morton
     [not found]     ` <a8e1da0807300302m35a0a7c0q2ff2e0ab92a69239@mail.gmail.com>
2008-07-30 10:10       ` Andrew Morton
2008-07-31  2:06         ` Dave Young
2008-07-31  2:56           ` Andrew Morton
2008-07-31  3:01             ` Dave Young
2008-07-31  9:15             ` Pekka J Enberg
2008-07-31  9:50               ` Tomas Winkler
2008-07-31  9:53                 ` Pekka Enberg
2008-07-31 10:29                   ` Tomas Winkler
2008-08-01  7:32               ` Dave Young
2008-08-04  9:22                 ` Jiri Slaby
2008-08-04 10:00                 ` Jiri Slaby
2008-08-05  1:29                   ` Dave Young
2008-08-05 12:24                     ` Bob Copeland
2008-08-06  1:51                       ` Dave Young
2008-08-06  1:53                         ` Dave Young
2008-08-12  4:19                         ` Dave Young

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox