Hi everyone,

A POSIX timer race condition exists in the current kernel source tree. Jeremy has already reported the same problem.

I wrote a simple stress test program for the POSIX timer subsystem to reproduce the problem on the latest mainline kernel. The test program creates 200 threads, and each thread runs the following loop:

    while (1) {
        timer_create()
        timer_settime()
        sleep a while
        timer_delete()
    }

Please see the test program in the attachment "posix_timer_test.c". You can compile it with the following command line:

    gcc -static -o posix_timer_test posix_timer_test.c -lrt -lpthread

For my testing environment, please refer to the three attachments "dmesg.txt", "cpuinfo.txt" and "config.txt".

On pristine Linux-2.6.23-rc3, we get the following oops message:

slab error in cache_alloc_debugcheck_after(): cache `sigqueue': double free, or memory outside object was overwritten
 [] show_trace_log_lvl+0x1a/0x30
 [] show_trace+0x12/0x14
 [] dump_stack+0x16/0x18
 [] __slab_error+0x26/0x28
 [] cache_alloc_debugcheck_after+0x134/0x204
 [] kmem_cache_alloc+0x5a/0xac
 [] __sigqueue_alloc+0x25/0x62
 [] sigqueue_alloc+0x15/0x1f
 [] sys_timer_create+0x3d/0x2d8
 [] syscall_call+0x7/0xb
 =======================
dcdcd000: redzone 1:0xd84156c5635688c0, redzone 2:0xd84156c5635688c0
slab error in verify_redzone_free(): cache `sigqueue': double free detected
 [] show_trace_log_lvl+0x1a/0x30
 [] show_trace+0x12/0x14
 [] dump_stack+0x16/0x18
 [] __slab_error+0x26/0x28
 [] cache_free_debugcheck+0x1d9/0x298
 [] kmem_cache_free+0x66/0xb5
 [] __sigqueue_free+0x2f/0x32
 [] __dequeue_signal+0xdc/0x174
 [] dequeue_signal+0xbb/0x149
 [] sys_rt_sigtimedwait+0x7f/0x240
 [] syscall_call+0x7/0xb
 =======================
dd7f7000: redzone 1:0x9f911029d74e35b, redzone 2:0x9f911029d74e35b.
BUG: unable to handle kernel paging request at virtual address dd7f7f6c
 printing eip:
c012abdb
*pde = 00075067
*pte = 1d7f7000
Oops: 0002 [#1]
PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in:
CPU:    1
EIP:    0060:[]    Not tainted VLI
EFLAGS: 00010086   (2.6.23-rc3-g2a677896 #1)
EIP is at posix_timer_event+0x14/0xa0
eax: 00000000   ebx: dcc1e900   ecx: 00000020   edx: 00000003
esi: dcc1e938   edi: dd7f7f6c   ebp: dff5fe50   esp: dff5fe48
ds: 007b   es: 007b   fs: 00d8  gs: 0000  ss: 0068
Process swapper (pid: 0, ti=dff5e000 task=dff4eac0 task.ti=dff5e000)
Stack: c012b1a7 dcc1e900 dff5fe7c c012b1e0 00000001 c140ef50 dcc1e938 00000002
       00000217 dcc1e908 c012b1a7 dcc1e938 c140ef50 dff5febc c012e3a5 00000000
       dcc1e95c 00000000 e32a8edd ef84ee03 2e028890 000000f1 c140ef20 00000001
Call Trace:
 [] show_trace_log_lvl+0x1a/0x30
 [] show_stack_log_lvl+0xa5/0xca
 [] show_registers+0x21b/0x391
 [] die+0x121/0x25e
 [] do_page_fault+0x354/0x627
 [] error_code+0x72/0x78
 [] posix_timer_fn+0x39/0x94
 [] hrtimer_run_queues+0x150/0x181
 [] run_timer_softirq+0x1d/0x1a9
 [] __do_softirq+0x71/0xe0
 [] do_softirq+0x3f/0x41
 [] irq_exit+0x48/0x4a
 [] smp_apic_timer_interrupt+0x5d/0x89
 [] apic_timer_interrupt+0x28/0x30
 [] cpu_idle+0x67/0x90
 [] start_secondary+0x157/0x15e
 [<00000000>] _stext+0x3fefff50/0x19
 =======================
Code: 89 44 24 04 c7 04 24 bc 53 33 c0 e8 85 05 ff ff 83 c4 08 5e 5f 5d c3 55 89 e5 57 53 89 c3 31 c0 b9 20 00 00 00 8b 7b 34 83 c7 0c ab 8b 43 34 89 50 24 8b 53 34 8b 43 28 89 42 0c 8b 43 34 c7
EIP: [] posix_timer_event+0x14/0xa0 SS:ESP 0068:dff5fe48
Kernel panic - not syncing: Fatal exception in interrupt

I also applied the four patches from Oleg Nesterov posted on lkml:

http://lkm.org/lkml/2007/8/12/193
http://lkm.org/lkml/2007/8/12/194
http://lkm.org/lkml/2007/8/12/195
http://lkm.org/lkml/2007/8/12/196

With these patches applied, the kernel still panics after about ten hours of testing.
Here is the oops message from the patched kernel:

slab error in verify_redzone_free(): cache `sigqueue': double free detected
 [] show_trace_log_lvl+0x1a/0x30
 [] show_trace+0x12/0x14
 [] dump_stack+0x16/0x18
 [] __slab_error+0x26/0x28
 [] cache_free_debugcheck+0x1d9/0x298
 [] kmem_cache_free+0x66/0xb5
 [] __sigqueue_free+0x2f/0x32
 [] __dequeue_signal+0xdc/0x174
 [] dequeue_signal+0x25/0x156
 [] sys_rt_sigtimedwait+0x7f/0x240
 [] syscall_call+0x7/0xb
 =======================
df839000: redzone 1:0x9f911029d74e35b, redzone 2:0x9f911029d74e35b.
BUG: unable to handle kernel paging request at virtual address df839f68
 printing eip:
c0124af3
*pde = 0007e067
*pte = 1f839000
Oops: 0000 [#1]
PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in:
CPU:    1
EIP:    0060:[]    Not tainted VLI
EFLAGS: 00010246   (2.6.23-rc3-g2a677896-dirty #2)
EIP is at sigqueue_free+0x7/0x6f
eax: df839f60   ebx: df839f60   ecx: df3b0000   edx: 00000000
esi: dd3c7120   edi: 00000213   ebp: df3b1e24   esp: df3b1e1c
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process posix_timer_tes (pid: 1110, ti=df3b0000 task=dd77dac0 task.ti=df3b0000)
Stack: 00000213 dd3c7120 df3b1e34 c012a8c2 dd3c7120 dd3c7128 df3b1e50 c012a9ce
       df755e48 df755e88 dd77df9c dd77dac0 00000000 df3b1e9c c011dab0 00000000
       00000000 00000000 dd77df5c df3b1e8c c012307d 00000008 00000002 dd77df44
Call Trace:
 [] show_trace_log_lvl+0x1a/0x30
 [] show_stack_log_lvl+0xa5/0xca
 [] show_registers+0x21b/0x391
 [] die+0x121/0x25e
 [] do_page_fault+0x354/0x627
 [] error_code+0x72/0x78
 [] release_posix_timer+0x13/0x6c
 [] exit_itimers+0xb3/0xe7
 [] do_exit+0x579/0x7d4
 [] do_group_exit+0x29/0x70
 [] get_signal_to_deliver+0x282/0x43e
 [] do_notify_resume+0x8b/0x767
 [] work_notifysig+0x13/0x19
 =======================
Code: 1c 00 5b 5d c3 55 89 e5 64 a1 00 a0 3c c0 31 c9 ba d0 00 00 00 e8 8d e8 ff ff 85 c0 74 04 83 48 08 01 5d c3 55 89 e5 56 53 89 c3 40 08 01 74 13 3b 00 75 13 83 63 08 fe 89 d8 e8 16 e4 ff ff
EIP: [] sigqueue_free+0x7/0x6f SS:ESP 0068:df3b1e1c
Fixing recursive fault but reboot is needed!
Any help is appreciated.

Best regards,
yue.tao
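
For readers who cannot open the attachment, the per-thread loop described in this report (timer_create, timer_settime, a short wait, timer_delete) can be sketched roughly as below. This is not the attached posix_timer_test.c. It is a minimal illustration under several assumptions: the names run_stress, worker, struct worker_arg and MAX_THREADS are hypothetical, the loop is bounded so that it terminates, the timer signals SIGRTMIN, and the "sleep a while" step is replaced by sigtimedwait() with a 5 ms timeout because sys_rt_sigtimedwait appears in the traces. The real test may differ on every one of these points.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <time.h>

#define MAX_THREADS 200              /* the report uses 200 threads */

struct worker_arg {                  /* hypothetical helper type */
    int iterations;                  /* bounded here; the report loops forever */
    int completed;                   /* cycles where all three calls succeeded */
};

static void *worker(void *p)
{
    struct worker_arg *arg = p;
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, SIGRTMIN);

    for (int i = 0; i < arg->iterations; i++) {
        struct sigevent sev;
        struct itimerspec its;
        struct timespec wait = { 0, 5000000 };   /* wait at most 5 ms */
        timer_t tid;

        memset(&sev, 0, sizeof(sev));
        sev.sigev_notify = SIGEV_SIGNAL;         /* assumption: signal delivery */
        sev.sigev_signo = SIGRTMIN;
        if (timer_create(CLOCK_REALTIME, &sev, &tid) != 0)
            continue;

        memset(&its, 0, sizeof(its));
        its.it_value.tv_nsec = 1000000;          /* one-shot expiry after 1 ms */
        int armed = (timer_settime(tid, 0, &its, NULL) == 0);

        /* Stand-in for "sleep a while": dequeue a timer signal or time out. */
        sigtimedwait(&set, NULL, &wait);

        if (timer_delete(tid) == 0 && armed)
            arg->completed++;
    }
    return NULL;
}

/* Runs the loop in nthreads threads; returns the total of completed cycles. */
int run_stress(int nthreads, int iterations)
{
    pthread_t th[MAX_THREADS];
    struct worker_arg args[MAX_THREADS];
    struct timespec zero = { 0, 0 };
    sigset_t set, old;
    int started = 0, total = 0;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;

    /* Block the signal before creating threads so every thread inherits it. */
    sigemptyset(&set);
    sigaddset(&set, SIGRTMIN);
    pthread_sigmask(SIG_BLOCK, &set, &old);

    for (int i = 0; i < nthreads; i++) {
        args[started].iterations = iterations;
        args[started].completed = 0;
        if (pthread_create(&th[started], NULL, worker, &args[started]) == 0)
            started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(th[i], NULL);
        total += args[i].completed;
    }

    /* Drain anything still pending, then restore the caller's signal mask. */
    while (sigtimedwait(&set, NULL, &zero) > 0)
        ;
    pthread_sigmask(SIG_SETMASK, &old, NULL);
    return total;
}
```

Calling run_stress(200, n) from a small main() with a large n, linked with -lrt -lpthread as in the command above, would approximate the described workload. Whether this sketch triggers the same oops has not been verified.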