From: pv@relay.sgi.com (steiner@sgi.com)
Subject: BUG 881594 - Deadlock in perfmon - pfm_fetch_regs()
To: raybry@sgi.com, steiner@sgi.com
Status:
X-Mozilla-Status: 8001
X-Mozilla-Status2: 00000000
X-UIDL: 3bd895570000681c

View Incident: http://co-op.engr.sgi.com/BugWorks/code/bwxquery.cgi?search=Search&wlong=1&view_type=Bug&wi=881594

Submitter : steiner
Submitter Domain : sgi.com
Assigned Engineer : raybry
Assigned Engineer Domain : sgi.com
Assigned Group : linux-mckinley
Category : software
Reported by Customer : F
Priority : 2
Project : snlinux
Status : open

Description :

Ferarri hung this morning running a mixture of

    0xe000003068b40000 00024267 00024266 0 006 stop 0xe000003068b407d0 code3
    0xe00000305c608000 00024273 00024250 0 000 stop 0xe00000305c6087d0 pfmon
    0xe00000300c738000 00024275 00010688 0 000 stop 0xe00000300c7387d0 go.bottle
    0xe000003051018000 00024278 00010688 0 000 stop 0xe0000030510187d0 go.bottle
    0xe000003020058000 00024281 00010688 0 000 stop 0xe0000030200587d0 go.bottle
    0xe00000306c6e8000 00024284 00010688 0 006 stop 0xe00000306c6e87d0 go.bottle
    0xe000003049098000 00024287 00010688 0 000 stop 0xe0000030490987d0 go.bottle
    0xe000003021558000 00024274 00024273 0 000 stop 0xe0000030215587d0 code3
    0xe00001b030900000 00024295 00024284 0 006 stop 0xe00001b0309007d0 pfmon
    0xe00001b03ad30000 00024296 00024295 0 005 stop 0xe00001b03ad307d0 code3
    0xe00000302a240000 00024297 00024278 0 000 stop 0xe00000302a2407d0 pfmon
    0xe000003028980000 00024298 00024275 0 000 stop 0xe0000030289807d0 pfmon

From the leds, it appeared that cpu 2 & 3 were hard hung & not processing
interrupts. I nmi'ed the system. Cpu 2 was hung here (I think this is
right - 90% confidence. Someone reset the system before I finished digging
out the info I needed):

     1  99
     1  smp_call_function_single     7  ??
     2  pfm_fetch_regs               8  ??
     3  pfm_load_regs                9  ??
     4  ia64_load_extra             10  ??
     5  __switch_to                 11  ??
     6  switch_to                   12  ??
     7  context_switch              13  ??
     8  schedule

Cpu 2 was in the function smp_call_function_single, spinning with
interrupts disabled while trying to lock call_lock. Cpu 3 was hung the same
way that cpu 2 was hung.

Another cpu was holding the call_lock & was waiting for cpu 2 to respond to
an IPI. Since cpu 2 was spinning with interrupts disabled, it was not
responding. The cpu holding the lock was here:

    0xe002000000045af0 smp_call_function+0x470
    0xe002000000045350 smp_flush_tlb_all+0x30
    0xe002000000051550 flush_tlb_range+0x50
    0xe002000000125090 swap_out+0x9f0
    0xe002000000126350 shrink_cache+0xb70
    0xe0020000001269e0 shrink_caches+0x100
    0xe002000000126ad0 try_to_free_pages+0x70
    0xe002000000128b40 balance_classzone+0xe0
    0xe0020000001295c0 __alloc_pages+0x420
    0xe0020000001297c0 __get_free_pages+0xc0
    0xe002000000120260 kmem_cache_grow+0x280
    0xe002000000121580 kmem_cache_alloc+0x460
    0xe00200000033fc60 kmem_zone_zalloc+0xa0
    0xe0020000002d5980 xfs_efd_init+0x80
    0xe00200000030e030 xfs_trans_get_efd+0x30
    0xe00200000029cf40 xfs_bmap_finish+0x1a0
    0xe0020000002e5cb0 xfs_itruncate_finish+0x2d0
    0xe002000000318640 xfs_inactive+0x5e0
    0xe00200000033ea20 vn_rele+0x140
    0xe00200000033c9f0 linvfs_clear_inode+0x30
    0xe0020000001741d0 clear_inode+0x370
    0xe002000000175d70 iput+0x4b0
    0xe002000000170160 d_delete+0x180
    0xe00200000015c690 vfs_unlink+0x650
    0xe00200000015c930 sys_unlink+0x210
    0xe00200000000ea00 ia64_ret_from_syscall

I don't believe that pfm_fetch_regs should be calling
smp_call_function_single unless interrupts are enabled.
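
The lock-versus-IPI dependency described in this report can be sketched as a small user-space model in C. This is only a toy illustration, not kernel code and not a proposed patch: the struct, the function names (model_deadlocks, guarded_cross_cpu_call) and the step loop are all made up for the example. Only the ingredients come from the report: one cpu holds call_lock and waits for an IPI acknowledgement, while the target cpu spins on call_lock and can only acknowledge if its interrupts are enabled.

```c
#include <stdbool.h>

/* Toy state for the CPU that is spinning on call_lock.
 * Hypothetical model only; none of these names are kernel API. */
struct target_cpu {
    bool irqs_enabled; /* can this CPU service an incoming IPI? */
    bool ipi_pending;  /* IPI sent by the lock holder, not yet acknowledged */
    bool wants_lock;   /* still spinning on the modeled call_lock */
};

/*
 * Returns true if the modeled system stops making progress.
 * Starting state, as in the report: another CPU holds call_lock and has
 * sent an IPI to the target; the target is spinning on call_lock.
 */
static bool model_deadlocks(bool target_irqs_enabled)
{
    struct target_cpu target = { target_irqs_enabled, true, true };
    bool lock_held = true; /* held by the other CPU until the IPI is acked */

    for (int step = 0; step < 8; step++) {
        bool progress = false;

        /* The target acknowledges the IPI only with interrupts enabled. */
        if (target.ipi_pending && target.irqs_enabled) {
            target.ipi_pending = false;
            progress = true;
        }
        /* The holder drops call_lock once the acknowledgement arrives. */
        if (lock_held && !target.ipi_pending) {
            lock_held = false;
            progress = true;
        }
        /* The target gets call_lock as soon as it is free. */
        if (target.wants_lock && !lock_held) {
            target.wants_lock = false;
            return false; /* everyone made it through */
        }
        if (!progress)
            return true; /* both sides wait on each other forever */
    }
    return true;
}

/*
 * Hypothetical guard in the spirit of the last sentence of the report:
 * refuse the cross-CPU call outright when interrupts are disabled.
 * Returns 0 on success, -1 if the call was refused or would hang.
 */
static int guarded_cross_cpu_call(bool irqs_enabled)
{
    if (!irqs_enabled)
        return -1; /* do not spin on call_lock with interrupts off */
    return model_deadlocks(irqs_enabled) ? -1 : 0;
}
```

In this model, model_deadlocks(false) reports a hang because neither side can make the first move, while model_deadlocks(true) completes. The guard simply refuses the call in the interrupts-disabled case; whether the real fix should refuse, defer, or re-enable interrupts is a design decision this sketch does not settle.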