public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* 3.0.3 kernel BUG at kernel/timer.c:1035
@ 2011-08-24 13:02 Frank van Maarseveen
  2011-08-30  9:08 ` Frank van Maarseveen
  2011-09-02  8:10 ` Andrew Morton
  0 siblings, 2 replies; 11+ messages in thread
From: Frank van Maarseveen @ 2011-08-24 13:02 UTC (permalink / raw)
  To: linux-kernel

Got several of these (logged via netconsole):

 kernel BUG at kernel/timer.c:1035!
 invalid opcode: 0000 [#1]
 PREEMPT
 SMP

 Modules linked in:
  [last unloaded: scsi_wait_scan]


 Pid: 0, comm: swapper Not tainted 3.0.3-x263 #1
  Dell Inc.                 OptiPlex GX620
 /0F8098

 EIP: 0060:[<c107adfe>] EFLAGS: 00010812 CPU: 0
 EIP is at cascade+0x6e/0x70
 EAX: 6b6b6b6a EBX: c1bbb480 ECX: c1ac2d50 EDX: f541335c
 ESI: c1ac2d50 EDI: f600bf60 EBP: f600bf74 ESP: f600bf5c
  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
 Process swapper (pid: 0, ti=f600a000 task=c1aba320 task.ti=c1a94000)
 Stack:
  00000034
  f541335c
  c1ac2d50
  c1bbb480
  00000000
  f600bfac
  f600bfc0
  c107af48

  00000004
  00000000
  f600bfb8
  c1069265
  00000000
  f600bfa8
  c1bbc29c
  c1bbc09c

  c1bbbe9c
  c106a28b
  00000100
  c1bbbc9c
  c106a28b
  00000100
  00000041
  c1a99a84

 Call Trace:
  [<c107af48>] run_timer_softirq+0x148/0x1e0
  [<c1069265>] ? rebalance_domains+0x135/0x160
  [<c106a28b>] ? get_parent_ip+0xb/0x40
  [<c106a28b>] ? get_parent_ip+0xb/0x40
  [<c1075098>] __do_softirq+0x78/0x100
  [<c1075020>] ? local_bh_enable+0xa0/0xa0
  <IRQ>

  [<c10753ad>] ? irq_exit+0x5d/0x70
  [<c104df53>] ? smp_apic_timer_interrupt+0x53/0x90
  [<c178fa22>] ? apic_timer_interrupt+0x2a/0x30
  [<c103d3ed>] ? mwait_idle+0x4d/0x80
  [<c1034b0a>] ? cpu_idle+0x3a/0x80
  [<c176dc3b>] ? rest_init+0x7b/0x80
  [<c1b1471b>] ? start_kernel+0x2e2/0x2e8
  [<c1b141c1>] ? loglevel+0x1a/0x1a
  [<c1b140b3>] ? i386_start_kernel+0xb3/0xbb


Got one stack trace on 64 bit:

 kernel BUG at kernel/timer.c:1035!
 invalid opcode: 0000 [#1]
 PREEMPT
 SMP

 CPU 1

 Modules linked in:
  vmthrottle
  radeon
  [last unloaded: scsi_wait_scan]


 Pid: 4312, comm: qemu Not tainted 3.0.3-x263lm #1
  Dell Inc.                 Dell DXP051
 /0FJ030

 RIP: 0010:[<ffffffff8109438b>]
  [<ffffffff8109438b>] cascade+0x9b/0xa0
 RSP: 0018:ffff8800dfc83e40  EFLAGS: 00210096
 RAX: 6b6b6b6b6b6b6b6a RBX: ffff8800dfc83e40 RCX: ffff8800df0ad080
 RDX: ffff8800dfc83e40 RSI: ffff8800daa7c838 RDI: ffff8800df0ac000
 RBP: ffff8800dfc83e70 R08: ffff8800dfc8c640 R09: ffff8800dfc90df8
 R10: 0000000000000001 R11: ffffffff8189c230 R12: ffff8800df0ac000
 R13: ffff8800dfc83e40 R14: 0000000000000005 R15: 0000000000000001
 FS:  0000000000000000(0000) GS:ffff8800dfc80000(0063) knlGS:00000000f760b770
 CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
 CR2: 00000000080582b8 CR3: 00000000d1b44000 CR4: 00000000000026e0
 DR0: 0000000000000001 DR1: 0000000000000002 DR2: 0000000000000001
 DR3: 000000000000000a DR6: 00000000ffff0ff0 DR7: 0000000000000400
 Process qemu (pid: 4312, threadinfo ffff8800c6052000, task ffff8800d1ae9c80)
 Stack:
  ffff8800daa7c838
  ffff8800daa7c838
  0000000000000000
  ffff8800df0ac000

  0000000000000101
  ffff8800dfc83eb0
  ffff8800dfc83ef0
  ffffffff81094653

  ffff8800c6053fd8
  ffff8800c6053fd8
  ffff8800df0adc30
  ffff8800df0ad830

 Call Trace:
  <IRQ>

  [<ffffffff81094653>] run_timer_softirq+0x183/0x250
  [<ffffffff81058398>] ? lapic_next_event+0x18/0x20
  [<ffffffff810b35f7>] ? clockevents_program_event+0x57/0xa0
  [<ffffffff8108d9da>] __do_softirq+0x9a/0x150
  [<ffffffff8188625c>] call_softirq+0x1c/0x30
  [<ffffffff8103ebe5>] do_softirq+0x65/0xa0
  [<ffffffff8108d72d>] irq_exit+0x7d/0xa0
  [<ffffffff81058c99>] smp_apic_timer_interrupt+0x69/0xa0
  [<ffffffff81885d13>] apic_timer_interrupt+0x13/0x20
  <EOI>

  [<ffffffff810a4b99>] ? add_wait_queue+0x49/0x60
  [<ffffffff81884914>] ? _raw_spin_unlock_irqrestore+0x44/0x50
  [<ffffffff810a4b99>] ? add_wait_queue+0x49/0x60
  [<ffffffff81132f2a>] __pollwait+0x7a/0x100
  [<ffffffff8115eb97>] eventfd_poll+0x27/0x70
  [<ffffffff81133ce6>] do_select+0x3d6/0x730
  [<ffffffff81132eb0>] ? poll_freewait+0xc0/0xc0
  [<ffffffff81132fb0>] ? __pollwait+0x100/0x100
 last message repeated 5 times
  [<ffffffff8108239d>] ? sub_preempt_count+0x9d/0xd0
  [<ffffffff81081111>] ? get_parent_ip+0x11/0x50
  [<ffffffff8108239d>] ? sub_preempt_count+0x9d/0xd0
  [<ffffffff81882d53>] ? __mutex_lock_slowpath+0x2a3/0x350
  [<ffffffff811663bc>] compat_core_sys_select+0x1fc/0x280
  [<ffffffff81120ce1>] ? do_sync_read+0xd1/0x120
  [<ffffffff81081111>] ? get_parent_ip+0x11/0x50
  [<ffffffff81043ef6>] ? read_tsc+0x16/0x40
  [<ffffffff810ae732>] ? ktime_get_ts+0xb2/0xe0
  [<ffffffff811666fa>] compat_sys_select+0x4a/0x120
  [<ffffffff810c382b>] ? compat_sys_gettimeofday+0xbb/0xd0
  [<ffffffff8188631c>] sysenter_dispatch+0x7/0x32


In all these cases the issue was triggered by unplugging a mounted ext3
USB stick + an automated umount -l -f afterwards by udev using something
like the script below. A few seconds after the unplug+umount the system
crashed with the above traces, followed by a secondary

  Kernel panic - not syncing: Fatal exception in interrupt

Unfortunately I'm unable to reproduce the issue right now so there must
be some unknown precondition or it is a race. Script:

--------
#!/bin/sh
#
# /etc/udev/rules.d/99-local.rules:
#	SUBSYSTEM=="block", ACTION=="add|remove",	RUN+="/usr/local/sbin/plugdev"

media_add()
{
	mkdir -p /media/$dev
	mount -t "$1" -o "$2" /dev/$dev /media/$dev
}

media_remove()
{
	umount -f -l /media/$dev
	rmdir /media/* 2>/dev/null
}

dev=`echo $DEVNAME|sed 's/.*\///'`
case "$ID_FS_TYPE.$ACTION.$dev" in
ext[234].add.?*)
	media_add $ID_FS_TYPE nodev,nosuid
	;;
vfat.add.?*)
	media_add vfat umask=0
	;;
*.remove.?*)
	media_remove
	;;
esac
--------

-- 
Frank

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: 3.0.3 kernel BUG at kernel/timer.c:1035
  2011-08-24 13:02 3.0.3 kernel BUG at kernel/timer.c:1035 Frank van Maarseveen
@ 2011-08-30  9:08 ` Frank van Maarseveen
  2011-09-02  8:10 ` Andrew Morton
  1 sibling, 0 replies; 11+ messages in thread
From: Frank van Maarseveen @ 2011-08-30  9:08 UTC (permalink / raw)
  To: linux-kernel

Still present in 3.0.4.

-- 
Frank

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: 3.0.3 kernel BUG at kernel/timer.c:1035
  2011-08-24 13:02 3.0.3 kernel BUG at kernel/timer.c:1035 Frank van Maarseveen
  2011-08-30  9:08 ` Frank van Maarseveen
@ 2011-09-02  8:10 ` Andrew Morton
  2011-09-05 12:38   ` Frank van Maarseveen
  1 sibling, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2011-09-02  8:10 UTC (permalink / raw)
  To: Frank van Maarseveen; +Cc: linux-kernel

On Wed, 24 Aug 2011 15:02:38 +0200 Frank van Maarseveen <frankvm@frankvm.com> wrote:

> Got several of these (logged via netconsole):
> 
>  kernel BUG at kernel/timer.c:1035!
>  invalid opcode: 0000 [#1]
>  PREEMPT
>  SMP
> 
> ...
>
>  Call Trace:
>   [<c107af48>] run_timer_softirq+0x148/0x1e0
>   [<c1069265>] ? rebalance_domains+0x135/0x160
>   [<c106a28b>] ? get_parent_ip+0xb/0x40
>   [<c106a28b>] ? get_parent_ip+0xb/0x40
>   [<c1075098>] __do_softirq+0x78/0x100
>   [<c1075020>] ? local_bh_enable+0xa0/0xa0
>   <IRQ>
> 
>   [<c10753ad>] ? irq_exit+0x5d/0x70
>   [<c104df53>] ? smp_apic_timer_interrupt+0x53/0x90
>   [<c178fa22>] ? apic_timer_interrupt+0x2a/0x30
>   [<c103d3ed>] ? mwait_idle+0x4d/0x80
>   [<c1034b0a>] ? cpu_idle+0x3a/0x80
>   [<c176dc3b>] ? rest_init+0x7b/0x80
>   [<c1b1471b>] ? start_kernel+0x2e2/0x2e8
>   [<c1b141c1>] ? loglevel+0x1a/0x1a
>   [<c1b140b3>] ? i386_start_kernel+0xb3/0xbb
> 
> 

Could be that a timer was freed while still running.

Please ensure that all kernel debugging options are enabled. 
Especially 

CONFIG_DEBUG_OBJECTS=y
CONFIG_DEBUG_OBJECTS_SELFTEST=y
CONFIG_DEBUG_OBJECTS_FREE=y
CONFIG_DEBUG_OBJECTS_TIMERS=y
CONFIG_DEBUG_OBJECTS_WORK=y
CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
CONFIG_DEBUG_OBJECTS_PERCPU_COUNTER=y
CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT=1
CONFIG_SLUB_DEBUG_ON=y

CONFIG_DEBUG_OBJECTS_TIMERS might catch this one.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: 3.0.3 kernel BUG at kernel/timer.c:1035
  2011-09-02  8:10 ` Andrew Morton
@ 2011-09-05 12:38   ` Frank van Maarseveen
       [not found]     ` <CAF1ivSYui_=tHbxHiy15a9wfiphHcpoY+J1MmAJ=dMQsAfEVLw@mail.gmail.com>
  0 siblings, 1 reply; 11+ messages in thread
From: Frank van Maarseveen @ 2011-09-05 12:38 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

On Fri, Sep 02, 2011 at 01:10:58AM -0700, Andrew Morton wrote:
> On Wed, 24 Aug 2011 15:02:38 +0200 Frank van Maarseveen <frankvm@frankvm.com> wrote:
> 
> > Got several of these (logged via netconsole):
> > 
> >  kernel BUG at kernel/timer.c:1035!
> >  invalid opcode: 0000 [#1]
> >  PREEMPT
> >  SMP
> > 
> > ...
> >
> >  Call Trace:
> >   [<c107af48>] run_timer_softirq+0x148/0x1e0
> >   [<c1069265>] ? rebalance_domains+0x135/0x160
> >   [<c106a28b>] ? get_parent_ip+0xb/0x40
> >   [<c106a28b>] ? get_parent_ip+0xb/0x40
> >   [<c1075098>] __do_softirq+0x78/0x100
> >   [<c1075020>] ? local_bh_enable+0xa0/0xa0
> >   <IRQ>
> > 
> >   [<c10753ad>] ? irq_exit+0x5d/0x70
> >   [<c104df53>] ? smp_apic_timer_interrupt+0x53/0x90
> >   [<c178fa22>] ? apic_timer_interrupt+0x2a/0x30
> >   [<c103d3ed>] ? mwait_idle+0x4d/0x80
> >   [<c1034b0a>] ? cpu_idle+0x3a/0x80
> >   [<c176dc3b>] ? rest_init+0x7b/0x80
> >   [<c1b1471b>] ? start_kernel+0x2e2/0x2e8
> >   [<c1b141c1>] ? loglevel+0x1a/0x1a
> >   [<c1b140b3>] ? i386_start_kernel+0xb3/0xbb
> > 
> > 
> 
> Could be that a timer was freed while still running.
> 
> Please ensure that all kernel debugging options are enabled. 
> Especially 
> 
> CONFIG_DEBUG_OBJECTS=y
> CONFIG_DEBUG_OBJECTS_SELFTEST=y
> CONFIG_DEBUG_OBJECTS_FREE=y
> CONFIG_DEBUG_OBJECTS_TIMERS=y
> CONFIG_DEBUG_OBJECTS_WORK=y
> CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
> CONFIG_DEBUG_OBJECTS_PERCPU_COUNTER=y
> CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT=1
> CONFIG_SLUB_DEBUG_ON=y
> 
> CONFIG_DEBUG_OBJECTS_TIMERS might catch this one.

Got something after enabling the above. Unplugging USB storage
before umount a couple of times produced the following trace:

	usb 1-7: USB disconnect, device number 10
	Buffer I/O error on device sdb1, logical block 0
	lost page write due to I/O error on sdb1
	------------[ cut here ]------------
	WARNING: at lib/debugobjects.c:262 debug_print_object+0x85/0xa0()
	Hardware name: OptiPlex GX620               
	ODEBUG: free active (active state 0) object type: timer_list hint: wakeup_timer_fn+0x0/0x50
	Modules linked in: [last unloaded: scsi_wait_scan]
	Pid: 5144, comm: umount Not tainted 3.0.4-y264 #1
	Call Trace:
	[<c106f59d>] warn_slowpath_common+0x6d/0xa0
	[<c12b5d75>] ? debug_print_object+0x85/0xa0
	[<c12b5d75>] ? debug_print_object+0x85/0xa0
	[<c106f64e>] warn_slowpath_fmt+0x2e/0x30
	[<c12b5d75>] debug_print_object+0x85/0xa0
	[<c10d4bf0>] ? bdi_init+0x150/0x150
	[<c12b5f0a>] __debug_check_no_obj_freed+0xda/0x180
	[<c106a2fb>] ? get_parent_ip+0xb/0x40
	[<c12b68c5>] debug_check_no_obj_freed+0x15/0x20
	[<c10f07ba>] kmem_cache_free+0x9a/0xc0
	[<c12a9f18>] ? prop_local_destroy_percpu+0x8/0x10
	[<c10d4e3b>] ? bdi_destroy+0xdb/0x110
	[<c1299d95>] ? blk_release_queue+0x45/0x50
	[<c1299d95>] blk_release_queue+0x45/0x50
	[<c12a756a>] kobject_release+0x3a/0x80
	[<c12a7530>] ? kobject_del+0x60/0x60
	[<c12a891d>] kref_put+0x2d/0x60
	[<c12a747d>] kobject_put+0x1d/0x50
	[<c1084568>] ? __cancel_work_timer+0x68/0x70
	[<c129707d>] blk_put_queue+0xd/0x10
	[<c1465953>] scsi_device_dev_release_usercontext+0xe3/0x120
	[<c1465870>] ? scsi_device_cls_release+0x10/0x10
	[<c108432c>] execute_in_process_context+0x5c/0x70
	[<c1465853>] scsi_device_dev_release+0x13/0x20
	[<c1440559>] device_release+0x19/0x80
	[<c10f06f9>] ? kfree+0xc9/0xd0
	[<c12a7575>] ? kobject_release+0x45/0x80
	[<c12a756a>] kobject_release+0x3a/0x80
	[<c12a7530>] ? kobject_del+0x60/0x60
	[<c12a891d>] kref_put+0x2d/0x60
	[<c12a747d>] kobject_put+0x1d/0x50
	[<c12a747d>] ? kobject_put+0x1d/0x50
	[<c14402ff>] put_device+0xf/0x20
	[<c145b2f3>] scsi_device_put+0x33/0x50
	[<c148375b>] scsi_disk_put+0x2b/0x40
	[<c1483daf>] sd_release+0x2f/0x60
	[<c11200f0>] __blkdev_put+0x120/0x160
	[<c11200cd>] __blkdev_put+0xfd/0x160
	[<c1120153>] blkdev_put+0x23/0x110
	[<c10f67a0>] kill_block_super+0x40/0x70
	[<c10f6c0d>] deactivate_locked_super+0x3d/0x60
	[<c10f72a9>] deactivate_super+0x49/0x70
	[<c110e1c6>] mntput_no_expire+0x86/0xc0
	[<c110f10d>] sys_umount+0x5d/0xb0
	[<c179141c>] sysenter_do_call+0x12/0x2c
	---[ end trace ad6863f336beb434 ]---


-- 
Frank

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: 3.0.3 kernel BUG at kernel/timer.c:1035
       [not found]     ` <CAF1ivSYui_=tHbxHiy15a9wfiphHcpoY+J1MmAJ=dMQsAfEVLw@mail.gmail.com>
@ 2011-09-06 14:48       ` Lin Ming
  2011-09-07 10:24         ` Frank van Maarseveen
  0 siblings, 1 reply; 11+ messages in thread
From: Lin Ming @ 2011-09-06 14:48 UTC (permalink / raw)
  To: Frank van Maarseveen; +Cc: Andrew Morton, linux-kernel

On Mon, Sep 5, 2011 at 8:38 PM, Frank van Maarseveen <frankvm@frankvm.com> wrote:
> On Fri, Sep 02, 2011 at 01:10:58AM -0700, Andrew Morton wrote:
>> On Wed, 24 Aug 2011 15:02:38 +0200 Frank van Maarseveen <frankvm@frankvm.com> wrote:
>>
>> > Got several of these (logged via netconsole):
>> >
>> >  kernel BUG at kernel/timer.c:1035!
>> >  invalid opcode: 0000 [#1]
>> >  PREEMPT
>> >  SMP
>> >
>> > ...
>> >
>> >  Call Trace:
>> >   [<c107af48>] run_timer_softirq+0x148/0x1e0
>> >   [<c1069265>] ? rebalance_domains+0x135/0x160
>> >   [<c106a28b>] ? get_parent_ip+0xb/0x40
>> >   [<c106a28b>] ? get_parent_ip+0xb/0x40
>> >   [<c1075098>] __do_softirq+0x78/0x100
>> >   [<c1075020>] ? local_bh_enable+0xa0/0xa0
>> >   <IRQ>
>> >
>> >   [<c10753ad>] ? irq_exit+0x5d/0x70
>> >   [<c104df53>] ? smp_apic_timer_interrupt+0x53/0x90
>> >   [<c178fa22>] ? apic_timer_interrupt+0x2a/0x30
>> >   [<c103d3ed>] ? mwait_idle+0x4d/0x80
>> >   [<c1034b0a>] ? cpu_idle+0x3a/0x80
>> >   [<c176dc3b>] ? rest_init+0x7b/0x80
>> >   [<c1b1471b>] ? start_kernel+0x2e2/0x2e8
>> >   [<c1b141c1>] ? loglevel+0x1a/0x1a
>> >   [<c1b140b3>] ? i386_start_kernel+0xb3/0xbb
>> >
>> >
>>
>> Could be that a timer was freed while still running.
>>
>> Please ensure that all kernel debugging options are enabled.
>> Especially
>>
>> CONFIG_DEBUG_OBJECTS=y
>> CONFIG_DEBUG_OBJECTS_SELFTEST=y
>> CONFIG_DEBUG_OBJECTS_FREE=y
>> CONFIG_DEBUG_OBJECTS_TIMERS=y
>> CONFIG_DEBUG_OBJECTS_WORK=y
>> CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
>> CONFIG_DEBUG_OBJECTS_PERCPU_COUNTER=y
>> CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT=1
>> CONFIG_SLUB_DEBUG_ON=y
>>
>> CONFIG_DEBUG_OBJECTS_TIMERS might catch this one.
>
> Got something after enabling the above. Unplugging USB storage
> before umount a couple of times produced the following trace:
>
>        usb 1-7: USB disconnect, device number 10
>        Buffer I/O error on device sdb1, logical block 0
>        lost page write due to I/O error on sdb1
>        ------------[ cut here ]------------
>        WARNING: at lib/debugobjects.c:262 debug_print_object+0x85/0xa0()
>        Hardware name: OptiPlex GX620
>        ODEBUG: free active (active state 0) object type: timer_list hint: wakeup_timer_fn+0x0/0x50

Does below patch help?

>From a98b874437f871d5ecc3f6fe409b2b474b1f2731 Mon Sep 17 00:00:00 2001
From: Lin Ming <ming.m.lin@intel.com>
Date: Tue, 6 Sep 2011 22:45:43 +0800
Subject: [PATCH] block: delete bdi writeback wakup_timer in blk_cleanup_queue()

Signed-off-by: Lin Ming <ming.m.lin@intel.com>
---
 block/blk-core.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 90e1ffd..22529a3 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -363,6 +363,7 @@ void blk_cleanup_queue(struct request_queue *q)
 	blk_sync_queue(q);
 
 	del_timer_sync(&q->backing_dev_info.laptop_mode_wb_timer);
+	del_timer_sync(&q->backing_dev_info.wb.wakeup_timer);
 	mutex_lock(&q->sysfs_lock);
 	queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
 	mutex_unlock(&q->sysfs_lock);
-- 
1.7.2.3

Regards,
Lin Ming

>        Modules linked in: [last unloaded: scsi_wait_scan]
>        Pid: 5144, comm: umount Not tainted 3.0.4-y264 #1
>        Call Trace:
>        [<c106f59d>] warn_slowpath_common+0x6d/0xa0
>        [<c12b5d75>] ? debug_print_object+0x85/0xa0
>        [<c12b5d75>] ? debug_print_object+0x85/0xa0
>        [<c106f64e>] warn_slowpath_fmt+0x2e/0x30
>        [<c12b5d75>] debug_print_object+0x85/0xa0
>        [<c10d4bf0>] ? bdi_init+0x150/0x150
>        [<c12b5f0a>] __debug_check_no_obj_freed+0xda/0x180
>        [<c106a2fb>] ? get_parent_ip+0xb/0x40
>        [<c12b68c5>] debug_check_no_obj_freed+0x15/0x20
>        [<c10f07ba>] kmem_cache_free+0x9a/0xc0
>        [<c12a9f18>] ? prop_local_destroy_percpu+0x8/0x10
>        [<c10d4e3b>] ? bdi_destroy+0xdb/0x110
>        [<c1299d95>] ? blk_release_queue+0x45/0x50
>        [<c1299d95>] blk_release_queue+0x45/0x50
>        [<c12a756a>] kobject_release+0x3a/0x80
>        [<c12a7530>] ? kobject_del+0x60/0x60
>        [<c12a891d>] kref_put+0x2d/0x60
>        [<c12a747d>] kobject_put+0x1d/0x50
>        [<c1084568>] ? __cancel_work_timer+0x68/0x70
>        [<c129707d>] blk_put_queue+0xd/0x10
>        [<c1465953>] scsi_device_dev_release_usercontext+0xe3/0x120
>        [<c1465870>] ? scsi_device_cls_release+0x10/0x10
>        [<c108432c>] execute_in_process_context+0x5c/0x70
>        [<c1465853>] scsi_device_dev_release+0x13/0x20
>        [<c1440559>] device_release+0x19/0x80
>        [<c10f06f9>] ? kfree+0xc9/0xd0
>        [<c12a7575>] ? kobject_release+0x45/0x80
>        [<c12a756a>] kobject_release+0x3a/0x80
>        [<c12a7530>] ? kobject_del+0x60/0x60
>        [<c12a891d>] kref_put+0x2d/0x60
>        [<c12a747d>] kobject_put+0x1d/0x50
>        [<c12a747d>] ? kobject_put+0x1d/0x50
>        [<c14402ff>] put_device+0xf/0x20
>        [<c145b2f3>] scsi_device_put+0x33/0x50
>        [<c148375b>] scsi_disk_put+0x2b/0x40
>        [<c1483daf>] sd_release+0x2f/0x60
>        [<c11200f0>] __blkdev_put+0x120/0x160
>        [<c11200cd>] __blkdev_put+0xfd/0x160
>        [<c1120153>] blkdev_put+0x23/0x110
>        [<c10f67a0>] kill_block_super+0x40/0x70
>        [<c10f6c0d>] deactivate_locked_super+0x3d/0x60
>        [<c10f72a9>] deactivate_super+0x49/0x70
>        [<c110e1c6>] mntput_no_expire+0x86/0xc0
>        [<c110f10d>] sys_umount+0x5d/0xb0
>        [<c179141c>] sysenter_do_call+0x12/0x2c
>        ---[ end trace ad6863f336beb434 ]---
>
>
> --
> Frank





^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: 3.0.3 kernel BUG at kernel/timer.c:1035
  2011-09-06 14:48       ` Lin Ming
@ 2011-09-07 10:24         ` Frank van Maarseveen
  2011-09-07 12:36           ` Lin Ming
  0 siblings, 1 reply; 11+ messages in thread
From: Frank van Maarseveen @ 2011-09-07 10:24 UTC (permalink / raw)
  To: Lin Ming; +Cc: Andrew Morton, linux-kernel

On Tue, Sep 06, 2011 at 10:48:38PM +0800, Lin Ming wrote:
> Does below patch help?
> 
> >From a98b874437f871d5ecc3f6fe409b2b474b1f2731 Mon Sep 17 00:00:00 2001
> From: Lin Ming <ming.m.lin@intel.com>
> Date: Tue, 6 Sep 2011 22:45:43 +0800
> Subject: [PATCH] block: delete bdi writeback wakup_timer in blk_cleanup_queue()
> 
> Signed-off-by: Lin Ming <ming.m.lin@intel.com>
> ---
>  block/blk-core.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 90e1ffd..22529a3 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -363,6 +363,7 @@ void blk_cleanup_queue(struct request_queue *q)
>  	blk_sync_queue(q);
>  
>  	del_timer_sync(&q->backing_dev_info.laptop_mode_wb_timer);
> +	del_timer_sync(&q->backing_dev_info.wb.wakeup_timer);
>  	mutex_lock(&q->sysfs_lock);
>  	queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
>  	mutex_unlock(&q->sysfs_lock);
> -- 
> 1.7.2.3
> 

No, bug still present. Stack trace is the same and I double checked that
it was the new kernel (this time with a lot more debug enabled).

-- 
Frank

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: 3.0.3 kernel BUG at kernel/timer.c:1035
  2011-09-07 10:24         ` Frank van Maarseveen
@ 2011-09-07 12:36           ` Lin Ming
  2011-09-07 21:30             ` Andrew Morton
  0 siblings, 1 reply; 11+ messages in thread
From: Lin Ming @ 2011-09-07 12:36 UTC (permalink / raw)
  To: Frank van Maarseveen; +Cc: Andrew Morton, linux-kernel@vger.kernel.org

On Wed, 2011-09-07 at 18:24 +0800, Frank van Maarseveen wrote:
> On Tue, Sep 06, 2011 at 10:48:38PM +0800, Lin Ming wrote:
> > Does below patch help?
> > 
> > >From a98b874437f871d5ecc3f6fe409b2b474b1f2731 Mon Sep 17 00:00:00 2001
> > From: Lin Ming <ming.m.lin@intel.com>
> > Date: Tue, 6 Sep 2011 22:45:43 +0800
> > Subject: [PATCH] block: delete bdi writeback wakup_timer in blk_cleanup_queue()
> > 
> > Signed-off-by: Lin Ming <ming.m.lin@intel.com>
> > ---
> >  block/blk-core.c |    1 +
> >  1 files changed, 1 insertions(+), 0 deletions(-)
> > 
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index 90e1ffd..22529a3 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -363,6 +363,7 @@ void blk_cleanup_queue(struct request_queue *q)
> >  	blk_sync_queue(q);
> >  
> >  	del_timer_sync(&q->backing_dev_info.laptop_mode_wb_timer);
> > +	del_timer_sync(&q->backing_dev_info.wb.wakeup_timer);
> >  	mutex_lock(&q->sysfs_lock);
> >  	queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
> >  	mutex_unlock(&q->sysfs_lock);
> > -- 
> > 1.7.2.3
> > 
> 
> No, bug still present. Stack trace is the same and I double checked that
> it was the new kernel (this time with a lot more debug enabled).

Thanks for test.
I'll try to reproduce this bug.

Lin Ming



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: 3.0.3 kernel BUG at kernel/timer.c:1035
  2011-09-07 12:36           ` Lin Ming
@ 2011-09-07 21:30             ` Andrew Morton
  2011-09-09 14:21               ` Lin Ming
  2011-09-20 13:16               ` Frank van Maarseveen
  0 siblings, 2 replies; 11+ messages in thread
From: Andrew Morton @ 2011-09-07 21:30 UTC (permalink / raw)
  To: Lin Ming; +Cc: Frank van Maarseveen, linux-kernel@vger.kernel.org, Jens Axboe

On Wed, 07 Sep 2011 20:36:19 +0800
Lin Ming <ming.m.lin@intel.com> wrote:

> On Wed, 2011-09-07 at 18:24 +0800, Frank van Maarseveen wrote:
> > On Tue, Sep 06, 2011 at 10:48:38PM +0800, Lin Ming wrote:
> > > Does below patch help?
> > > 
> > > >From a98b874437f871d5ecc3f6fe409b2b474b1f2731 Mon Sep 17 00:00:00 2001
> > > From: Lin Ming <ming.m.lin@intel.com>
> > > Date: Tue, 6 Sep 2011 22:45:43 +0800
> > > Subject: [PATCH] block: delete bdi writeback wakup_timer in blk_cleanup_queue()
> > > 
> > > Signed-off-by: Lin Ming <ming.m.lin@intel.com>
> > > ---
> > >  block/blk-core.c |    1 +
> > >  1 files changed, 1 insertions(+), 0 deletions(-)
> > > 
> > > diff --git a/block/blk-core.c b/block/blk-core.c
> > > index 90e1ffd..22529a3 100644
> > > --- a/block/blk-core.c
> > > +++ b/block/blk-core.c
> > > @@ -363,6 +363,7 @@ void blk_cleanup_queue(struct request_queue *q)
> > >  	blk_sync_queue(q);
> > >  
> > >  	del_timer_sync(&q->backing_dev_info.laptop_mode_wb_timer);
> > > +	del_timer_sync(&q->backing_dev_info.wb.wakeup_timer);
> > >  	mutex_lock(&q->sysfs_lock);
> > >  	queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
> > >  	mutex_unlock(&q->sysfs_lock);
> > > -- 
> > > 1.7.2.3
> > > 
> > 
> > No, bug still present. Stack trace is the same and I double checked that
> > it was the new kernel (this time with a lot more debug enabled).
> 
> Thanks for test.
> I'll try to reproduce this bug.

Probably this will "fix" it:

--- a/block/blk-sysfs.c~a
+++ a/block/blk-sysfs.c
@@ -4,6 +4,7 @@
 #include <linux/kernel.h>
 #include <linux/slab.h>
 #include <linux/module.h>
+#include <linux/timer.h>
 #include <linux/bio.h>
 #include <linux/blkdev.h>
 #include <linux/blktrace_api.h>
@@ -486,7 +487,7 @@ static void blk_release_queue(struct kob
 		__blk_queue_free_tags(q);
 
 	blk_trace_shutdown(q);
-
+	del_timer_sync(&q->backing_dev_info.wb.wakeup_timer);
 	bdi_destroy(&q->backing_dev_info);
 	kmem_cache_free(blk_requestq_cachep, q);
 }
_

Jens, can you please take a look at this regression? 
blk_release_queue() is freeing a pending timer.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: 3.0.3 kernel BUG at kernel/timer.c:1035
  2011-09-07 21:30             ` Andrew Morton
@ 2011-09-09 14:21               ` Lin Ming
  2011-09-10  8:41                 ` Frank van Maarseveen
  2011-09-20 13:16               ` Frank van Maarseveen
  1 sibling, 1 reply; 11+ messages in thread
From: Lin Ming @ 2011-09-09 14:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Lin Ming, Frank van Maarseveen, linux-kernel@vger.kernel.org,
	Jens Axboe

On Thu, Sep 8, 2011 at 5:30 AM, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Wed, 07 Sep 2011 20:36:19 +0800
> Lin Ming <ming.m.lin@intel.com> wrote:
>
>> On Wed, 2011-09-07 at 18:24 +0800, Frank van Maarseveen wrote:
>> > On Tue, Sep 06, 2011 at 10:48:38PM +0800, Lin Ming wrote:
>> > > Does below patch help?
>> > >
>> > > >From a98b874437f871d5ecc3f6fe409b2b474b1f2731 Mon Sep 17 00:00:00 2001
>> > > From: Lin Ming <ming.m.lin@intel.com>
>> > > Date: Tue, 6 Sep 2011 22:45:43 +0800
>> > > Subject: [PATCH] block: delete bdi writeback wakup_timer in blk_cleanup_queue()
>> > >
>> > > Signed-off-by: Lin Ming <ming.m.lin@intel.com>
>> > > ---
>> > >  block/blk-core.c |    1 +
>> > >  1 files changed, 1 insertions(+), 0 deletions(-)
>> > >
>> > > diff --git a/block/blk-core.c b/block/blk-core.c
>> > > index 90e1ffd..22529a3 100644
>> > > --- a/block/blk-core.c
>> > > +++ b/block/blk-core.c
>> > > @@ -363,6 +363,7 @@ void blk_cleanup_queue(struct request_queue *q)
>> > >   blk_sync_queue(q);
>> > >
>> > >   del_timer_sync(&q->backing_dev_info.laptop_mode_wb_timer);
>> > > + del_timer_sync(&q->backing_dev_info.wb.wakeup_timer);
>> > >   mutex_lock(&q->sysfs_lock);
>> > >   queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
>> > >   mutex_unlock(&q->sysfs_lock);
>> > > --
>> > > 1.7.2.3
>> > >
>> >
>> > No, bug still present. Stack trace is the same and I double checked that
>> > it was the new kernel (this time with a lot more debug enabled).
>>
>> Thanks for test.
>> I'll try to reproduce this bug.
>
> Probably this will "fix" it:

Frank,

Does Andrew's patch help?

Lin Ming

>
> --- a/block/blk-sysfs.c~a
> +++ a/block/blk-sysfs.c
> @@ -4,6 +4,7 @@
>  #include <linux/kernel.h>
>  #include <linux/slab.h>
>  #include <linux/module.h>
> +#include <linux/timer.h>
>  #include <linux/bio.h>
>  #include <linux/blkdev.h>
>  #include <linux/blktrace_api.h>
> @@ -486,7 +487,7 @@ static void blk_release_queue(struct kob
>                __blk_queue_free_tags(q);
>
>        blk_trace_shutdown(q);
> -
> +       del_timer_sync(&q->backing_dev_info.wb.wakeup_timer);
>        bdi_destroy(&q->backing_dev_info);
>        kmem_cache_free(blk_requestq_cachep, q);
>  }
> _
>
> Jens, can you please take a look at this regression?
> blk_release_queue() is freeing a pending timer.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: 3.0.3 kernel BUG at kernel/timer.c:1035
  2011-09-09 14:21               ` Lin Ming
@ 2011-09-10  8:41                 ` Frank van Maarseveen
  0 siblings, 0 replies; 11+ messages in thread
From: Frank van Maarseveen @ 2011-09-10  8:41 UTC (permalink / raw)
  To: Lin Ming
  Cc: Andrew Morton, Lin Ming, linux-kernel@vger.kernel.org, Jens Axboe

On Fri, Sep 09, 2011 at 10:21:36PM +0800, Lin Ming wrote:
> On Thu, Sep 8, 2011 at 5:30 AM, Andrew Morton <akpm@linux-foundation.org> wrote:

(...)

> >
> > Probably this will "fix" it:
> 
> Frank,
> 
> Does Andrew's patch help?
> 
> Lin Ming
> 
> >
> > --- a/block/blk-sysfs.c~a
> > +++ a/block/blk-sysfs.c
> > @@ -4,6 +4,7 @@
> >  #include <linux/kernel.h>
> >  #include <linux/slab.h>
> >  #include <linux/module.h>
> > +#include <linux/timer.h>
> >  #include <linux/bio.h>
> >  #include <linux/blkdev.h>
> >  #include <linux/blktrace_api.h>
> > @@ -486,7 +487,7 @@ static void blk_release_queue(struct kob
> >                __blk_queue_free_tags(q);
> >
> >        blk_trace_shutdown(q);
> > -
> > +       del_timer_sync(&q->backing_dev_info.wb.wakeup_timer);
> >        bdi_destroy(&q->backing_dev_info);
> >        kmem_cache_free(blk_requestq_cachep, q);
> >  }
> > _
> >

I have no physical access to the machines showing the issue for a day
or 10 and it appears to be not very reproducable on my own machines:
I got only one trace so far (with a pristine 3.0.4 tree). I try to make
it more reproducable before trying out the above patch. So, it will take
some time before I have conclusive information on this.

-- 
Frank

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: 3.0.3 kernel BUG at kernel/timer.c:1035
  2011-09-07 21:30             ` Andrew Morton
  2011-09-09 14:21               ` Lin Ming
@ 2011-09-20 13:16               ` Frank van Maarseveen
  1 sibling, 0 replies; 11+ messages in thread
From: Frank van Maarseveen @ 2011-09-20 13:16 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Lin Ming, linux-kernel@vger.kernel.org, Jens Axboe

On Wed, Sep 07, 2011 at 02:30:06PM -0700, Andrew Morton wrote:
> On Wed, 07 Sep 2011 20:36:19 +0800
> Lin Ming <ming.m.lin@intel.com> wrote:
> 
> > On Wed, 2011-09-07 at 18:24 +0800, Frank van Maarseveen wrote:
> > > On Tue, Sep 06, 2011 at 10:48:38PM +0800, Lin Ming wrote:
> > > > Does below patch help?
> > > > 
> > > > >From a98b874437f871d5ecc3f6fe409b2b474b1f2731 Mon Sep 17 00:00:00 2001
> > > > From: Lin Ming <ming.m.lin@intel.com>
> > > > Date: Tue, 6 Sep 2011 22:45:43 +0800
> > > > Subject: [PATCH] block: delete bdi writeback wakup_timer in blk_cleanup_queue()
> > > > 
> > > > Signed-off-by: Lin Ming <ming.m.lin@intel.com>
> > > > ---
> > > >  block/blk-core.c |    1 +
> > > >  1 files changed, 1 insertions(+), 0 deletions(-)
> > > > 
> > > > diff --git a/block/blk-core.c b/block/blk-core.c
> > > > index 90e1ffd..22529a3 100644
> > > > --- a/block/blk-core.c
> > > > +++ b/block/blk-core.c
> > > > @@ -363,6 +363,7 @@ void blk_cleanup_queue(struct request_queue *q)
> > > >  	blk_sync_queue(q);
> > > >  
> > > >  	del_timer_sync(&q->backing_dev_info.laptop_mode_wb_timer);
> > > > +	del_timer_sync(&q->backing_dev_info.wb.wakeup_timer);
> > > >  	mutex_lock(&q->sysfs_lock);
> > > >  	queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
> > > >  	mutex_unlock(&q->sysfs_lock);
> > > > -- 
> > > > 1.7.2.3
> > > > 
> > > 
> > > No, bug still present. Stack trace is the same and I double checked that
> > > it was the new kernel (this time with a lot more debug enabled).
> > 
> > Thanks for test.
> > I'll try to reproduce this bug.
> 
> Probably this will "fix" it:
> 
> --- a/block/blk-sysfs.c~a
> +++ a/block/blk-sysfs.c
> @@ -4,6 +4,7 @@
>  #include <linux/kernel.h>
>  #include <linux/slab.h>
>  #include <linux/module.h>
> +#include <linux/timer.h>
>  #include <linux/bio.h>
>  #include <linux/blkdev.h>
>  #include <linux/blktrace_api.h>
> @@ -486,7 +487,7 @@ static void blk_release_queue(struct kob
>  		__blk_queue_free_tags(q);
>  
>  	blk_trace_shutdown(q);
> -
> +	del_timer_sync(&q->backing_dev_info.wb.wakeup_timer);
>  	bdi_destroy(&q->backing_dev_info);
>  	kmem_cache_free(blk_requestq_cachep, q);
>  }
> _
> 
> Jens, can you please take a look at this regression? 
> blk_release_queue() is freeing a pending timer.

Yep, this fixes it. This is the recipe I used for triggering the issue
on 3.0.4 (it can probably be simplified):

-	mount an ext[34] formatted USB stick read-write on /mnt, preloaded
	with a (64k) file "bar" in subdirectory "foo".
-	cat /mnt/foo/bar >/dev/null
-	sleep 30	# or more
-	unplug USB stick
-	issue an "umount -l -f /mnt"

After playing with the 30 second delay parameter (to get more details) the
kernel somehow ended in a state where the bug was no longer reproducable.
A reboot made the recipe work again.


Thanks,

-- 
Frank

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2011-09-20 13:16 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-08-24 13:02 3.0.3 kernel BUG at kernel/timer.c:1035 Frank van Maarseveen
2011-08-30  9:08 ` Frank van Maarseveen
2011-09-02  8:10 ` Andrew Morton
2011-09-05 12:38   ` Frank van Maarseveen
     [not found]     ` <CAF1ivSYui_=tHbxHiy15a9wfiphHcpoY+J1MmAJ=dMQsAfEVLw@mail.gmail.com>
2011-09-06 14:48       ` Lin Ming
2011-09-07 10:24         ` Frank van Maarseveen
2011-09-07 12:36           ` Lin Ming
2011-09-07 21:30             ` Andrew Morton
2011-09-09 14:21               ` Lin Ming
2011-09-10  8:41                 ` Frank van Maarseveen
2011-09-20 13:16               ` Frank van Maarseveen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox