* 3.9-rc2 xfs panic
[not found] <1868681549.12603593.1363061997919.JavaMail.root@redhat.com>
@ 2013-03-12 4:32 ` CAI Qian
2013-03-12 6:07 ` Dave Chinner
0 siblings, 1 reply; 14+ messages in thread
From: CAI Qian @ 2013-03-12 4:32 UTC (permalink / raw)
To: xfs
Just came across this when running xfstests using a 3.9-rc2 kernel on a power7
box, with the addition of this patch which fixed a known issue:
http://people.redhat.com/qcai/stable/01-fix-double-fetch-hlist.patch
The log shows it happened around test case 370 with
TEST_PARAM_BLKSIZE = 2048
Some more information:
xfsprogs version = 3.1.10
number of CPUs = 32
Swap Size = 4047 MB
Mem Size = 4046 M
Still reproducing and bisecting, so this is just a heads-up to see if it
helps.
CAI Qian
[31797.113368] XFS (loop1): xfs_trans_ail_delete_bulk: attempting to delete a log item that is not in the AIL
[31797.113383] XFS (loop1): xfs_do_force_shutdown(0x2) called from line 743 of file fs/xfs/xfs_trans_ail.c. Return address = 0xd000000000f22838
[31797.113430] Buffer I/O error on device loop1, logical block 66965
[31797.113440] lost page write due to I/O error on loop1
[31797.113446] Buffer I/O error on device loop1, logical block 66966
[31797.113450] lost page write due to I/O error on loop1
[31797.113456] Buffer I/O error on device loop1, logical block 66967
[31797.113461] lost page write due to I/O error on loop1
[31797.113466] Buffer I/O error on device loop1, logical block 66968
[31797.113472] lost page write due to I/O error on loop1
[31797.113477] Buffer I/O error on device loop1, logical block 66969
[31797.113482] lost page write due to I/O error on loop1
[31797.113881] XFS (loop1): Log I/O Error Detected. Shutting down filesystem
[31797.113887] XFS (loop1): Please umount the filesystem and rectify the problem(s)
[31797.114000] XFS (loop1): metadata I/O error: block 0xc0002 ("xfs_trans_read_buf_map") error 5 numblks 1
[31797.114048] XFS (loop1): metadata I/O error: block 0xc0002 ("xfs_trans_read_buf_map") error 5 numblks 1
[31797.150364] Buffer I/O error on device loop1, logical block 33940
[31797.150380] lost page write due to I/O error on loop1
[31797.150389] Buffer I/O error on device loop1, logical block 33941
[31797.150395] lost page write due to I/O error on loop1
[31797.150403] Buffer I/O error on device loop1, logical block 33942
[31797.150408] lost page write due to I/O error on loop1
[31797.150415] Buffer I/O error on device loop1, logical block 33943
[31797.150421] lost page write due to I/O error on loop1
[31797.150429] Buffer I/O error on device loop1, logical block 33944
[31797.150436] lost page write due to I/O error on loop1
[-- MARK -- Mon Mar 11 14:15:00 2013]
[31817.159550] XFS (loop1): xfs_log_force: error 5 returned.
[31817.166204] XFS (loop1): xfs_log_force: error 5 returned.
[31817.166551] XFS (loop1): xfs_log_force: error 5 returned.
[31817.508411] XFS (loop0): Mounting Filesystem
[31817.566235] XFS (loop0): Ending clean mount
[31819.094713] XFS (loop0): Mounting Filesystem
[31819.152248] XFS (loop0): Ending clean mount
[31819.348238] XFS (loop1): Mounting Filesystem
[31819.349879] XFS (loop1): Ending clean mount
[31819.561366] XFS (loop0): Mounting Filesystem
[31819.616607] XFS (loop0): Ending clean mount
[31819.990833] XFS (loop1): Mounting Filesystem
[31819.992652] XFS (loop1): Ending clean mount
[31819.992768] XFS (loop1): Quotacheck needed: Please wait.
[31820.051134] XFS (loop1): Quotacheck: Done.
[31832.534868] Unable to handle kernel paging request for data at address 0x5841474900000001
[31832.534881] Faulting instruction address: 0xc0000000001f8070
[31832.534888] Oops: Kernel access of bad area, sig: 11 [#1]
[31832.534891] SMP NR_CPUS=1024 NUMA pSeries
[31832.534899] Modules linked in: tun(F) binfmt_misc(F) hidp(F) cmtp(F) kernelcapi(F) rfcomm(F) l2tp_ppp(F) l2tp_netlink(F) l2tp_core(F) bnep(F) nfc(F) af_802154(F) pppoe(F) pppox(F) ppp_generic(F) slhc(F) rds(F) af_key(F) atm(F) sctp(F) ip6table_filter(F) ip6_tables(F) iptable_filter(F) ip_tables(F) btrfs(F) raid6_pq(F) xor(F) vfat(F) fat(F) nfsv3(F) nfs_acl(F) nfnetlink_log(F) nfnetlink(F) bluetooth(F) rfkill(F) nfsv2(F) nfs(F) dns_resolver(F) lockd(F) sunrpc(F) fscache(F) nf_tproxy_core(F) nls_koi8_u(F) nls_cp932(F) ts_kmp(F) fuse(F) sg(F) ibmveth(F) xfs(F) libcrc32c(F) sd_mod(F) crc_t10dif(F) ibmvscsi(F) scsi_transport_srp(F) scsi_tgt(F) dm_mirror(F) dm_region_hash(F) dm_log(F) dm_mod(F) [last unloaded: ipt_REJECT]
[31832.534978] NIP: c0000000001f8070 LR: c000000000192f6c CTR: c000000000192f50
[31832.534984] REGS: c0000000f1c125f0 TRAP: 0300 Tainted: GF W (3.9.0-rc2+)
[31832.534989] MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI> CR: 24022024 XER: 20000001
[31832.535003] SOFTE: 0
[31832.535006] CFAR: c000000000005f1c
[31832.535009] DAR: 5841474900000001, DSISR: 40000000
[31832.535013] TASK = c00000003f0111c0[16795] 'loop1' THREAD: c0000000f1c10000 CPU: 30
GPR00: c000000000192f6c c0000000f1c12870 c0000000010f3a48 c0000000fe015a00
GPR04: 0000000000011220 0000000000000080 00000000000f3aaf c0000000018d5840
GPR08: 0000000000000000 0000000000000000 0000000000000000 c0000000004e3300
GPR12: 0000000044024024 c00000000f247800 c0000000010d01b0 0000000000000000
GPR16: 0000000000000001 0000000000000000 c0000000009d9020 c0000000009d9060
GPR20: c0000000009d9048 0000000000000020 000000000000007f 0000000000000000
GPR24: 0000000000000fe0 c0000000010d1020 c0000000fe015a00 0000000000000000
GPR28: c000000000192f6c 0000000000011220 5841474900000001 c0000000fe015a00
[31832.535086] NIP [c0000000001f8070] .kmem_cache_alloc+0xb0/0x2d0
[31832.535092] LR [c000000000192f6c] .mempool_alloc_slab+0x1c/0x30
[31832.535096] Call Trace:
[31832.535101] [c0000000f1c12870] [0000000000016ac3] 0x16ac3 (unreliable)
[31832.535108] [c0000000f1c12920] [c000000000192f6c] .mempool_alloc_slab+0x1c/0x30
[31832.535114] [c0000000f1c12990] [c000000000193108] .mempool_alloc+0x88/0x1c0
[31832.535122] [c0000000f1c12a80] [c0000000004e1824] .scsi_sg_alloc+0x64/0xc0
[31832.535129] [c0000000f1c12af0] [c0000000003e09f8] .__sg_alloc_table+0xa8/0x190
[31832.535135] [c0000000f1c12bc0] [c0000000004e15f0] .scsi_alloc_sgtable+0x40/0x90
[31832.535142] [c0000000f1c12c40] [c0000000004e1668] .scsi_init_sgtable+0x28/0x90
[31832.535148] [c0000000f1c12cc0] [c0000000004e19e0] .scsi_init_io+0x40/0x1a0
[31832.535157] [c0000000f1c12d60] [d000000000c02e78] .sd_prep_fn+0x128/0xac0 [sd_mod]
[31832.535164] [c0000000f1c12e20] [c0000000003a611c] .blk_peek_request+0xfc/0x2d0
[31832.535171] [c0000000f1c12eb0] [c0000000004e2c08] .scsi_request_fn+0xb8/0x6d0
[31832.535178] [c0000000f1c12fa0] [c00000000039d7c0] .__blk_run_queue+0x50/0x80
[31832.535184] [c0000000f1c13020] [c0000000003a2184] .queue_unplugged+0xe4/0x100
[31832.535190] [c0000000f1c130c0] [c0000000003a67d8] .blk_flush_plug_list+0x248/0x2e0
[31832.535197] [c0000000f1c13180] [c0000000003a6bcc] .blk_queue_bio+0x2fc/0x490
[31832.535203] [c0000000f1c13230] [c0000000003a436c] .generic_make_request+0x11c/0x180
[31832.535210] [c0000000f1c132c0] [c0000000003a4484] .submit_bio+0xb4/0x1e0
[31832.535245] [c0000000f1c13380] [d000000000eaffa0] .xfs_submit_ioend_bio.isra.10+0x70/0x90 [xfs]
[31832.535286] [c0000000f1c133f0] [d000000000eb00f0] .xfs_submit_ioend+0x130/0x190 [xfs]
[31832.535343] [c0000000f1c134a0] [d000000000eb045c] .xfs_vm_writepage+0x30c/0x670 [xfs]
[31832.535349] [c0000000f1c135d0] [c00000000019d050] .__writepage+0x30/0x90
[31832.535356] [c0000000f1c13650] [c00000000019d728] .write_cache_pages+0x208/0x4f0
[31832.535362] [c0000000f1c137e0] [c00000000019da5c] .generic_writepages+0x4c/0xa0
[31832.535395] [c0000000f1c138a0] [d000000000eaea10] .xfs_vm_writepages+0x60/0x90 [xfs]
[31832.535411] [c0000000f1c13930] [c00000000019ee7c] .do_writepages+0x3c/0x70
[31832.535424] [c0000000f1c139a0] [c0000000001914b8] .__filemap_fdatawrite_range+0x68/0x80
[31832.535430] [c0000000f1c13a40] [c000000000191610] .filemap_write_and_wait_range+0x70/0xc0
[31832.535463] [c0000000f1c13ad0] [d000000000eb7970] .xfs_file_fsync+0x60/0x250 [xfs]
[31832.535479] [c0000000f1c13b90] [c00000000024c278] .vfs_fsync+0x48/0x70
[31832.535497] [c0000000f1c13c00] [c0000000004d299c] .loop_thread+0x3ec/0x5b0
[31832.535503] [c0000000f1c13d30] [c0000000000b58c8] .kthread+0xe8/0xf0
[31832.535510] [c0000000f1c13e30] [c000000000009f64] .ret_from_kernel_thread+0x64/0x80
[31832.535516] Instruction dump:
[31832.535519] 4be17f11 60000000 e93f0000 e94d0040 7ce95214 e8c70008 7fc9502a 2fbe0000
[31832.535532] 41de01fc e95f0022 e93f0000 79290720 <7f3e502a> 0b090000 0b1b0000 39200000
[31832.535557] ---[ end trace 8bfd449aae38f917 ]---
[31832.536965]
[31892.635116] INFO: rcu_sched detected stalls on CPUs/tasks: { 30 31} (detected by 28, t=6002 jiffies, g=390021, c=390020, q=417)
[31892.635141] Task dump for CPU 30:
[31892.635146] loop1 R running task 0 16795 2 0x00000884
[31892.635155] Call Trace:
[31892.635176] [c0000000f1c12f10] [0000000000000001] 0x1 (unreliable)
[31892.635183] Task dump for CPU 31:
[31892.635187] swapper/31 R running task 0 0 1 0x00000804
[31892.635194] Call Trace:
[32072.684667] INFO: rcu_sched detected stalls on CPUs/tasks: { 30 31} (detected by 28, t=24007 jiffies, g=390021, c=390020, q=580)
[32072.684694] Task dump for CPU 30:
[32072.684700] loop1 R running task 0 16795 2 0x00000884
[32072.684708] Call Trace:
[32072.684729] [c0000000f1c12f10] [0000000000000001] 0x1 (unreliable)
[32072.684737] Task dump for CPU 31:
[32072.684741] swapper/31 R running task 0 0 1 0x00000804
[32072.684748] Call Trace:
[-- MARK -- Mon Mar 11 14:20:00 2013]
[32252.734223] INFO: rcu_sched detected stalls on CPUs/tasks: { 30 31} (detected by 22, t=42012 jiffies, g=390021, c=390020, q=619)
[32252.734248] Task dump for CPU 30:
[32252.734254] loop1 R running task 0 16795 2 0x00000884
[32252.734262] Call Trace:
[32252.734282] [c0000000f1c12f10] [0000000000000001] 0x1 (unreliable)
[32252.734289] Task dump for CPU 31:
[32252.734293] swapper/31 R running task 0 0 1 0x00000804
[32252.734300] Call Trace:
[-- MARK -- Mon Mar 11 14:25:00 2013]
[32432.783781] INFO: rcu_sched detected stalls on CPUs/tasks: { 30 31} (detected by 28, t=60017 jiffies, g=390021, c=390020, q=630)
[32432.783809] Task dump for CPU 30:
[32432.783815] loop1 R running task 0 16795 2 0x00000884
[32432.783823] Call Trace:
[32432.783845] [c0000000f1c12f10] [0000000000000001] 0x1 (unreliable)
[32432.783852] Task dump for CPU 31:
[32432.783856] swapper/31 R running task 0 0 1 0x00000804
[32432.783864] Call Trace:
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 3.9-rc2 xfs panic
2013-03-12 4:32 ` 3.9-rc2 xfs panic CAI Qian
@ 2013-03-12 6:07 ` Dave Chinner
2013-03-12 6:34 ` CAI Qian
0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2013-03-12 6:07 UTC (permalink / raw)
To: CAI Qian; +Cc: xfs
On Tue, Mar 12, 2013 at 12:32:28AM -0400, CAI Qian wrote:
> Just came across when running xfstests using 3.9-rc2 kernel on a power7
> box with addition of this patch which fixed a known issue,
> http://people.redhat.com/qcai/stable/01-fix-double-fetch-hlist.patch
>
> The log shows it was happened around test case 370 with
> TEST_PARAM_BLKSIZE = 2048
That doesn't sound like xfstests. It only has 305 tests, and no
parameters like TEST_PARAM_BLKSIZE....
> Some more information:
> xfsprogs version = 3.1.10
> number of CPUs = 32
> Swap Size = 4047 MB
> Mem Size = 4046 M
>
> Still reproducing and bisecting, so this is just a head-up to see if
> helps.
>
> CAI Qian
>
> [31797.113368] XFS (loop1): xfs_trans_ail_delete_bulk: attempting to delete a log item that is not in the AIL
> [31797.113383] XFS (loop1): xfs_do_force_shutdown(0x2) called from line 743 of file fs/xfs/xfs_trans_ail.c. Return address = 0xd000000000f22838
Shutdown for an in-memory problem of some kind....
> [31817.508411] XFS (loop0): Mounting Filesystem
> [31817.566235] XFS (loop0): Ending clean mount
> [31819.094713] XFS (loop0): Mounting Filesystem
> [31819.152248] XFS (loop0): Ending clean mount
> [31819.348238] XFS (loop1): Mounting Filesystem
> [31819.349879] XFS (loop1): Ending clean mount
> [31819.561366] XFS (loop0): Mounting Filesystem
> [31819.616607] XFS (loop0): Ending clean mount
> [31819.990833] XFS (loop1): Mounting Filesystem
> [31819.992652] XFS (loop1): Ending clean mount
> [31819.992768] XFS (loop1): Quotacheck needed: Please wait.
> [31820.051134] XFS (loop1): Quotacheck: Done.
> [31832.534868] Unable to handle kernel paging request for data at address 0x5841474900000001
And after remounting the filesystem a couple of times, it's tried
to follow an AGI buffer header (magic # XAGI, seqno = 1) as though
it were a pointer. I can't think of why that would be
executed....
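Dave's reading of the faulting address can be checked by hand. This is a small sketch; the field split assumes the AGI header layout he describes, with the 4-byte magic sitting in the high half of the big-endian 64-bit word:

```python
# Decode the faulting data address (DAR) from the oops above.
# On this big-endian ppc64 box, a 64-bit load from the start of an AGI
# buffer picks up the 4-byte magic followed by the next 4-byte field.
addr = 0x5841474900000001

magic = (addr >> 32).to_bytes(4, "big")   # high 32 bits: the on-disk magic
low = addr & 0xFFFFFFFF                   # low 32 bits: Dave reads this as seqno

print(magic.decode("ascii"), low)  # → XAGI 1
```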
> [31832.534881] Faulting instruction address: 0xc0000000001f8070
> [31832.534888] Oops: Kernel access of bad area, sig: 11 [#1]
> [31832.534891] SMP NR_CPUS=1024 NUMA pSeries
> [31832.534899] Modules linked in: tun(F) binfmt_misc(F) hidp(F) cmtp(F) kernelcapi(F) rfcomm(F) l2tp_ppp(F) l2tp_netlink(F) l2tp_core(F) bnep(F) nfc(F) af_802154(F) pppoe(F) pppox(F) ppp_generic(F) slhc(F) rds(F) af_key(F) atm(F) sctp(F) ip6table_filter(F) ip6_tables(F) iptable_filter(F) ip_tables(F) btrfs(F) raid6_pq(F) xor(F) vfat(F) fat(F) nfsv3(F) nfs_acl(F) nfnetlink_log(F) nfnetlink(F) bluetooth(F) rfkill(F) nfsv2(F) nfs(F) dns_resolver(F) lockd(F) sunrpc(F) fscache(F) nf_tproxy_core(F) nls_koi8_u(F) nls_cp932(F) ts_kmp(F) fuse(F) sg(F) ibmveth(F) xfs(F) libcrc32c(F) sd_mod(F) crc_t10dif(F) ibmvscsi(F) scsi_transport_srp(F) scsi_tgt(F) dm_mirror(F) dm_region_hash(F) dm_log(F) dm_mod(F) [last unloaded: ipt_REJECT]
> [31832.534978] NIP: c0000000001f8070 LR: c000000000192f6c CTR: c000000000192f50
> [31832.534984] REGS: c0000000f1c125f0 TRAP: 0300 Tainted: GF W (3.9.0-rc2+)
> [31832.534989] MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI> CR: 24022024 XER: 20000001
> [31832.535003] SOFTE: 0
> [31832.535006] CFAR: c000000000005f1c
> [31832.535009] DAR: 5841474900000001, DSISR: 40000000
> [31832.535013] TASK = c00000003f0111c0[16795] 'loop1' THREAD: c0000000f1c10000 CPU: 30
> GPR00: c000000000192f6c c0000000f1c12870 c0000000010f3a48 c0000000fe015a00
> GPR04: 0000000000011220 0000000000000080 00000000000f3aaf c0000000018d5840
> GPR08: 0000000000000000 0000000000000000 0000000000000000 c0000000004e3300
> GPR12: 0000000044024024 c00000000f247800 c0000000010d01b0 0000000000000000
> GPR16: 0000000000000001 0000000000000000 c0000000009d9020 c0000000009d9060
> GPR20: c0000000009d9048 0000000000000020 000000000000007f 0000000000000000
> GPR24: 0000000000000fe0 c0000000010d1020 c0000000fe015a00 0000000000000000
> GPR28: c000000000192f6c 0000000000011220 5841474900000001 c0000000fe015a00
> [31832.535086] NIP [c0000000001f8070] .kmem_cache_alloc+0xb0/0x2d0
> [31832.535092] LR [c000000000192f6c] .mempool_alloc_slab+0x1c/0x30
> [31832.535096] Call Trace:
> [31832.535101] [c0000000f1c12870] [0000000000016ac3] 0x16ac3 (unreliable)
> [31832.535108] [c0000000f1c12920] [c000000000192f6c] .mempool_alloc_slab+0x1c/0x30
> [31832.535114] [c0000000f1c12990] [c000000000193108] .mempool_alloc+0x88/0x1c0
> [31832.535122] [c0000000f1c12a80] [c0000000004e1824] .scsi_sg_alloc+0x64/0xc0
> [31832.535129] [c0000000f1c12af0] [c0000000003e09f8] .__sg_alloc_table+0xa8/0x190
> [31832.535135] [c0000000f1c12bc0] [c0000000004e15f0] .scsi_alloc_sgtable+0x40/0x90
> [31832.535142] [c0000000f1c12c40] [c0000000004e1668] .scsi_init_sgtable+0x28/0x90
> [31832.535148] [c0000000f1c12cc0] [c0000000004e19e0] .scsi_init_io+0x40/0x1a0
> [31832.535157] [c0000000f1c12d60] [d000000000c02e78] .sd_prep_fn+0x128/0xac0 [sd_mod]
> [31832.535164] [c0000000f1c12e20] [c0000000003a611c] .blk_peek_request+0xfc/0x2d0
> [31832.535171] [c0000000f1c12eb0] [c0000000004e2c08] .scsi_request_fn+0xb8/0x6d0
> [31832.535178] [c0000000f1c12fa0] [c00000000039d7c0] .__blk_run_queue+0x50/0x80
> [31832.535184] [c0000000f1c13020] [c0000000003a2184] .queue_unplugged+0xe4/0x100
> [31832.535190] [c0000000f1c130c0] [c0000000003a67d8] .blk_flush_plug_list+0x248/0x2e0
> [31832.535197] [c0000000f1c13180] [c0000000003a6bcc] .blk_queue_bio+0x2fc/0x490
> [31832.535203] [c0000000f1c13230] [c0000000003a436c] .generic_make_request+0x11c/0x180
> [31832.535210] [c0000000f1c132c0] [c0000000003a4484] .submit_bio+0xb4/0x1e0
> [31832.535245] [c0000000f1c13380] [d000000000eaffa0] .xfs_submit_ioend_bio.isra.10+0x70/0x90 [xfs]
> [31832.535286] [c0000000f1c133f0] [d000000000eb00f0] .xfs_submit_ioend+0x130/0x190 [xfs]
> [31832.535343] [c0000000f1c134a0] [d000000000eb045c] .xfs_vm_writepage+0x30c/0x670 [xfs]
> [31832.535349] [c0000000f1c135d0] [c00000000019d050] .__writepage+0x30/0x90
> [31832.535356] [c0000000f1c13650] [c00000000019d728] .write_cache_pages+0x208/0x4f0
> [31832.535362] [c0000000f1c137e0] [c00000000019da5c] .generic_writepages+0x4c/0xa0
> [31832.535395] [c0000000f1c138a0] [d000000000eaea10] .xfs_vm_writepages+0x60/0x90 [xfs]
> [31832.535411] [c0000000f1c13930] [c00000000019ee7c] .do_writepages+0x3c/0x70
> [31832.535424] [c0000000f1c139a0] [c0000000001914b8] .__filemap_fdatawrite_range+0x68/0x80
> [31832.535430] [c0000000f1c13a40] [c000000000191610] .filemap_write_and_wait_range+0x70/0xc0
> [31832.535463] [c0000000f1c13ad0] [d000000000eb7970] .xfs_file_fsync+0x60/0x250 [xfs]
> [31832.535479] [c0000000f1c13b90] [c00000000024c278] .vfs_fsync+0x48/0x70
> [31832.535497] [c0000000f1c13c00] [c0000000004d299c] .loop_thread+0x3ec/0x5b0
> [31832.535503] [c0000000f1c13d30] [c0000000000b58c8] .kthread+0xe8/0xf0
> [31832.535510] [c0000000f1c13e30] [c000000000009f64] .ret_from_kernel_thread+0x64/0x80
So, looks like memory corruption - a corrupted slab, perhaps? Can
you turn on memory poisoning, debugging, etc?
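For reference, the poisoning and debugging knobs Dave is suggesting would typically look something like this on a 3.9-era kernel (option names from that era; check them against the actual tree being built):

```
CONFIG_SLUB_DEBUG_ON=y     # slab poisoning/red-zoning enabled by default
CONFIG_DEBUG_PAGEALLOC=y   # unmap freed pages to catch use-after-free
CONFIG_XFS_DEBUG=y         # XFS internal assertions
```

Alternatively, booting an existing SLUB kernel with `slub_debug=FZPU` on the command line enables the same slab consistency checks, red zones, poisoning, and allocation tracking without a rebuild.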
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: 3.9-rc2 xfs panic
2013-03-12 6:07 ` Dave Chinner
@ 2013-03-12 6:34 ` CAI Qian
2013-03-12 7:46 ` Dave Chinner
0 siblings, 1 reply; 14+ messages in thread
From: CAI Qian @ 2013-03-12 6:34 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
----- Original Message -----
> From: "Dave Chinner" <david@fromorbit.com>
> To: "CAI Qian" <caiqian@redhat.com>
> Cc: xfs@oss.sgi.com
> Sent: Tuesday, March 12, 2013 2:07:01 PM
> Subject: Re: 3.9-rc2 xfs panic
>
> On Tue, Mar 12, 2013 at 12:32:28AM -0400, CAI Qian wrote:
> > Just came across when running xfstests using 3.9-rc2 kernel on a
> > power7
> > box with addition of this patch which fixed a known issue,
> > http://people.redhat.com/qcai/stable/01-fix-double-fetch-hlist.patch
> >
> > The log shows it was happened around test case 370 with
> > TEST_PARAM_BLKSIZE = 2048
>
> That doesn't sound like xfstests. it only has 305 tests, and no
> parameters like TEST_PARAM_BLKSIZE....
Sorry, it is a typo: test case 270, not 370. TEST_PARAM_BLKSIZE comes
from an internal wrapper used to create new filesystems, not from the
original xfstests. Apologies for that, Dave.
>
> > Some more information:
> > xfsprogs version = 3.1.10
> > number of CPUs = 32
> > Swap Size = 4047 MB
> > Mem Size = 4046 M
> >
> > Still reproducing and bisecting, so this is just a head-up to see
> > if
> > helps.
> >
> > CAI Qian
> >
> > [31797.113368] XFS (loop1): xfs_trans_ail_delete_bulk: attempting
> > to delete a log item that is not in the AIL
> > [31797.113383] XFS (loop1): xfs_do_force_shutdown(0x2) called from
> > line 743 of file fs/xfs/xfs_trans_ail.c. Return address =
> > 0xd000000000f22838
>
> Shutdown for an in-memory problem of some kind....
>
> > [31817.508411] XFS (loop0): Mounting Filesystem
> > [31817.566235] XFS (loop0): Ending clean mount
> > [31819.094713] XFS (loop0): Mounting Filesystem
> > [31819.152248] XFS (loop0): Ending clean mount
> > [31819.348238] XFS (loop1): Mounting Filesystem
> > [31819.349879] XFS (loop1): Ending clean mount
> > [31819.561366] XFS (loop0): Mounting Filesystem
> > [31819.616607] XFS (loop0): Ending clean mount
> > [31819.990833] XFS (loop1): Mounting Filesystem
> > [31819.992652] XFS (loop1): Ending clean mount
> > [31819.992768] XFS (loop1): Quotacheck needed: Please wait.
> > [31820.051134] XFS (loop1): Quotacheck: Done.
> > [31832.534868] Unable to handle kernel paging request for data at
> > address 0x5841474900000001
>
> And after remounting the filesystemi a couple of times, it's tried
> to follow an AGI buffer header (magic # XAGI, seqno = 1) as though
> it was a pointer. I can't think of why that would be
> executed....
>
> > [31832.534881] Faulting instruction address: 0xc0000000001f8070
> > [31832.534888] Oops: Kernel access of bad area, sig: 11 [#1]
> > [31832.534891] SMP NR_CPUS=1024 NUMA pSeries
> > [31832.534899] Modules linked in: tun(F) binfmt_misc(F) hidp(F)
> > cmtp(F) kernelcapi(F) rfcomm(F) l2tp_ppp(F) l2tp_netlink(F)
> > l2tp_core(F) bnep(F) nfc(F) af_802154(F) pppoe(F) pppox(F)
> > ppp_generic(F) slhc(F) rds(F) af_key(F) atm(F) sctp(F)
> > ip6table_filter(F) ip6_tables(F) iptable_filter(F) ip_tables(F)
> > btrfs(F) raid6_pq(F) xor(F) vfat(F) fat(F) nfsv3(F) nfs_acl(F)
> > nfnetlink_log(F) nfnetlink(F) bluetooth(F) rfkill(F) nfsv2(F)
> > nfs(F) dns_resolver(F) lockd(F) sunrpc(F) fscache(F)
> > nf_tproxy_core(F) nls_koi8_u(F) nls_cp932(F) ts_kmp(F) fuse(F)
> > sg(F) ibmveth(F) xfs(F) libcrc32c(F) sd_mod(F) crc_t10dif(F)
> > ibmvscsi(F) scsi_transport_srp(F) scsi_tgt(F) dm_mirror(F)
> > dm_region_hash(F) dm_log(F) dm_mod(F) [last unloaded: ipt_REJECT]
> > [31832.534978] NIP: c0000000001f8070 LR: c000000000192f6c CTR:
> > c000000000192f50
> > [31832.534984] REGS: c0000000f1c125f0 TRAP: 0300 Tainted: GF
> > W (3.9.0-rc2+)
> > [31832.534989] MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI> CR:
> > 24022024 XER: 20000001
> > [31832.535003] SOFTE: 0
> > [31832.535006] CFAR: c000000000005f1c
> > [31832.535009] DAR: 5841474900000001, DSISR: 40000000
> > [31832.535013] TASK = c00000003f0111c0[16795] 'loop1' THREAD:
> > c0000000f1c10000 CPU: 30
> > GPR00: c000000000192f6c c0000000f1c12870 c0000000010f3a48
> > c0000000fe015a00
> > GPR04: 0000000000011220 0000000000000080 00000000000f3aaf
> > c0000000018d5840
> > GPR08: 0000000000000000 0000000000000000 0000000000000000
> > c0000000004e3300
> > GPR12: 0000000044024024 c00000000f247800 c0000000010d01b0
> > 0000000000000000
> > GPR16: 0000000000000001 0000000000000000 c0000000009d9020
> > c0000000009d9060
> > GPR20: c0000000009d9048 0000000000000020 000000000000007f
> > 0000000000000000
> > GPR24: 0000000000000fe0 c0000000010d1020 c0000000fe015a00
> > 0000000000000000
> > GPR28: c000000000192f6c 0000000000011220 5841474900000001
> > c0000000fe015a00
> > [31832.535086] NIP [c0000000001f8070] .kmem_cache_alloc+0xb0/0x2d0
> > [31832.535092] LR [c000000000192f6c] .mempool_alloc_slab+0x1c/0x30
> > [31832.535096] Call Trace:
> > [31832.535101] [c0000000f1c12870] [0000000000016ac3] 0x16ac3
> > (unreliable)
> > [31832.535108] [c0000000f1c12920] [c000000000192f6c]
> > .mempool_alloc_slab+0x1c/0x30
> > [31832.535114] [c0000000f1c12990] [c000000000193108]
> > .mempool_alloc+0x88/0x1c0
> > [31832.535122] [c0000000f1c12a80] [c0000000004e1824]
> > .scsi_sg_alloc+0x64/0xc0
> > [31832.535129] [c0000000f1c12af0] [c0000000003e09f8]
> > .__sg_alloc_table+0xa8/0x190
> > [31832.535135] [c0000000f1c12bc0] [c0000000004e15f0]
> > .scsi_alloc_sgtable+0x40/0x90
> > [31832.535142] [c0000000f1c12c40] [c0000000004e1668]
> > .scsi_init_sgtable+0x28/0x90
> > [31832.535148] [c0000000f1c12cc0] [c0000000004e19e0]
> > .scsi_init_io+0x40/0x1a0
> > [31832.535157] [c0000000f1c12d60] [d000000000c02e78]
> > .sd_prep_fn+0x128/0xac0 [sd_mod]
> > [31832.535164] [c0000000f1c12e20] [c0000000003a611c]
> > .blk_peek_request+0xfc/0x2d0
> > [31832.535171] [c0000000f1c12eb0] [c0000000004e2c08]
> > .scsi_request_fn+0xb8/0x6d0
> > [31832.535178] [c0000000f1c12fa0] [c00000000039d7c0]
> > .__blk_run_queue+0x50/0x80
> > [31832.535184] [c0000000f1c13020] [c0000000003a2184]
> > .queue_unplugged+0xe4/0x100
> > [31832.535190] [c0000000f1c130c0] [c0000000003a67d8]
> > .blk_flush_plug_list+0x248/0x2e0
> > [31832.535197] [c0000000f1c13180] [c0000000003a6bcc]
> > .blk_queue_bio+0x2fc/0x490
> > [31832.535203] [c0000000f1c13230] [c0000000003a436c]
> > .generic_make_request+0x11c/0x180
> > [31832.535210] [c0000000f1c132c0] [c0000000003a4484]
> > .submit_bio+0xb4/0x1e0
> > [31832.535245] [c0000000f1c13380] [d000000000eaffa0]
> > .xfs_submit_ioend_bio.isra.10+0x70/0x90 [xfs]
> > [31832.535286] [c0000000f1c133f0] [d000000000eb00f0]
> > .xfs_submit_ioend+0x130/0x190 [xfs]
> > [31832.535343] [c0000000f1c134a0] [d000000000eb045c]
> > .xfs_vm_writepage+0x30c/0x670 [xfs]
> > [31832.535349] [c0000000f1c135d0] [c00000000019d050]
> > .__writepage+0x30/0x90
> > [31832.535356] [c0000000f1c13650] [c00000000019d728]
> > .write_cache_pages+0x208/0x4f0
> > [31832.535362] [c0000000f1c137e0] [c00000000019da5c]
> > .generic_writepages+0x4c/0xa0
> > [31832.535395] [c0000000f1c138a0] [d000000000eaea10]
> > .xfs_vm_writepages+0x60/0x90 [xfs]
> > [31832.535411] [c0000000f1c13930] [c00000000019ee7c]
> > .do_writepages+0x3c/0x70
> > [31832.535424] [c0000000f1c139a0] [c0000000001914b8]
> > .__filemap_fdatawrite_range+0x68/0x80
> > [31832.535430] [c0000000f1c13a40] [c000000000191610]
> > .filemap_write_and_wait_range+0x70/0xc0
> > [31832.535463] [c0000000f1c13ad0] [d000000000eb7970]
> > .xfs_file_fsync+0x60/0x250 [xfs]
> > [31832.535479] [c0000000f1c13b90] [c00000000024c278]
> > .vfs_fsync+0x48/0x70
> > [31832.535497] [c0000000f1c13c00] [c0000000004d299c]
> > .loop_thread+0x3ec/0x5b0
> > [31832.535503] [c0000000f1c13d30] [c0000000000b58c8]
> > .kthread+0xe8/0xf0
> > [31832.535510] [c0000000f1c13e30] [c000000000009f64]
> > .ret_from_kernel_thread+0x64/0x80
>
> So, looks like memory corruption - a corrupted slab, perhaps? Can
> you turn on memory poisoning, debugging, etc?
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
* Re: 3.9-rc2 xfs panic
2013-03-12 6:34 ` CAI Qian
@ 2013-03-12 7:46 ` Dave Chinner
2013-03-12 8:04 ` CAI Qian
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Dave Chinner @ 2013-03-12 7:46 UTC (permalink / raw)
To: CAI Qian; +Cc: xfs
On Tue, Mar 12, 2013 at 02:34:07AM -0400, CAI Qian wrote:
>
>
> ----- Original Message -----
> > From: "Dave Chinner" <david@fromorbit.com>
> > To: "CAI Qian" <caiqian@redhat.com>
> > Cc: xfs@oss.sgi.com
> > Sent: Tuesday, March 12, 2013 2:07:01 PM
> > Subject: Re: 3.9-rc2 xfs panic
> >
> > On Tue, Mar 12, 2013 at 12:32:28AM -0400, CAI Qian wrote:
> > > Just came across when running xfstests using 3.9-rc2 kernel on a
> > > power7
> > > box with addition of this patch which fixed a known issue,
> > > http://people.redhat.com/qcai/stable/01-fix-double-fetch-hlist.patch
> > >
> > > The log shows it was happened around test case 370 with
> > > TEST_PARAM_BLKSIZE = 2048
> >
> > That doesn't sound like xfstests. it only has 305 tests, and no
> > parameters like TEST_PARAM_BLKSIZE....
> Sorry, it is a typo, test case 270 not 370. TEST_PARAM_BLKSIZE was
> from an internal wrapper to be used to create new filessytem not from the
> original xfstests.
OK, so that means you're testing 2k filesystem block size on a 64k
page size machine? Are you running with CONFIG_XFS_DEBUG=y?
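For context, the mismatch Dave is asking about can be put in numbers. 64k is the common ppc64 page size; that value is an assumption here, not something stated in the log:

```python
# With the block size far below the page size, each page cache page spans
# many filesystem blocks, exercising the writepage/ioend paths differently
# from the common block-size == page-size case.
page_size = 64 * 1024      # typical ppc64 page size (assumption)
block_size = 2048          # TEST_PARAM_BLKSIZE from the report

print(page_size // block_size)  # → 32 filesystem blocks per page
```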
> > So, looks like memory corruption - a corrupted slab, perhaps? Can
> > you turn on memory poisoning, debugging, etc?
Does this turn anything up?
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: 3.9-rc2 xfs panic
2013-03-12 7:46 ` Dave Chinner
@ 2013-03-12 8:04 ` CAI Qian
2013-03-12 10:23 ` Dave Chinner
2013-03-13 2:44 ` CAI Qian
2013-03-28 8:39 ` CAI Qian
2 siblings, 1 reply; 14+ messages in thread
From: CAI Qian @ 2013-03-12 8:04 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
----- Original Message -----
> From: "Dave Chinner" <david@fromorbit.com>
> To: "CAI Qian" <caiqian@redhat.com>
> Cc: xfs@oss.sgi.com
> Sent: Tuesday, March 12, 2013 3:46:08 PM
> Subject: Re: 3.9-rc2 xfs panic
>
> On Tue, Mar 12, 2013 at 02:34:07AM -0400, CAI Qian wrote:
> >
> >
> > ----- Original Message -----
> > > From: "Dave Chinner" <david@fromorbit.com>
> > > To: "CAI Qian" <caiqian@redhat.com>
> > > Cc: xfs@oss.sgi.com
> > > Sent: Tuesday, March 12, 2013 2:07:01 PM
> > > Subject: Re: 3.9-rc2 xfs panic
> > >
> > > On Tue, Mar 12, 2013 at 12:32:28AM -0400, CAI Qian wrote:
> > > > Just came across when running xfstests using 3.9-rc2 kernel on
> > > > a
> > > > power7
> > > > box with addition of this patch which fixed a known issue,
> > > > http://people.redhat.com/qcai/stable/01-fix-double-fetch-hlist.patch
> > > >
> > > > The log shows it was happened around test case 370 with
> > > > TEST_PARAM_BLKSIZE = 2048
> > >
> > > That doesn't sound like xfstests. it only has 305 tests, and no
> > > parameters like TEST_PARAM_BLKSIZE....
> > Sorry, it is a typo, test case 270 not 370. TEST_PARAM_BLKSIZE was
> > from an internal wrapper to be used to create new filessytem not
> > from the
> > original xfstests.
>
> OK, so that means you're testing 2k filesystem block size on a 64k
> page size machine?
Looks like so. Would that be a problem?
TEST_PARAM_TEST_DEV not specified; using loopback file
TEST_PARAM_SCRATCH_DEV not specified; using loopback file
meta-data=/dev/loop0 isize=256 agcount=4, agsize=655360 blks
= sectsz=512 attr=2, projid32bit=0
data = bsize=2048 blocks=2621440, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal log bsize=2048 blocks=5120, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
TEST_DEV=/dev/loop0 # device containing TEST PARTITION
TEST_DIR=/mnt/testarea/test # mount point of TEST PARTITION
SCRATCH_DEV=/dev/loop1 # device containing SCRATCH PARTITION
SCRATCH_MNT=/mnt/testarea/scratch # mount point for SCRATCH PARTITION
SCRATCH_LOGDEV= # optional external log for SCRATCH PARTITION
SCRATCH_RTDEV= # optional realtime device for SCRATCH PARTITION
TMPFS_MOUNT_OPTIONS="" # scratch mount options for tmpfs
TEST_FS_MOUNT_OPTS="" # test mount options for tmpfs
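As a cross-check of the mkfs output above, the data-section numbers multiply out to a 5 GiB loopback filesystem:

```python
# Data-section size implied by the mkfs.xfs output quoted above.
blocks = 2621440           # data blocks (blocks= field)
bsize = 2048               # filesystem block size (bsize= field)

size_bytes = blocks * bsize
print(size_bytes // 2**30)  # → 5  (a 5 GiB loop device)
```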
> Are you running with CONFIG_XFS_DEBUG=y?
# CONFIG_XFS_DEBUG is not set
I can enable this if I can reproduce it.
>
> > > So, looks like memory corruption - a corrupted slab, perhaps? Can
> > > you turn on memory poisoning, debugging, etc?
>
> Does this turn anything up?
It is still running. Unsure if it is reproducible at this point.
CAI Qian
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
* Re: 3.9-rc2 xfs panic
2013-03-12 8:04 ` CAI Qian
@ 2013-03-12 10:23 ` Dave Chinner
0 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2013-03-12 10:23 UTC (permalink / raw)
To: CAI Qian; +Cc: xfs
On Tue, Mar 12, 2013 at 04:04:11AM -0400, CAI Qian wrote:
>
>
> ----- Original Message -----
> > From: "Dave Chinner" <david@fromorbit.com>
> > To: "CAI Qian" <caiqian@redhat.com>
> > Cc: xfs@oss.sgi.com
> > Sent: Tuesday, March 12, 2013 3:46:08 PM
> > Subject: Re: 3.9-rc2 xfs panic
> >
> > On Tue, Mar 12, 2013 at 02:34:07AM -0400, CAI Qian wrote:
> > >
> > >
> > > ----- Original Message -----
> > > > From: "Dave Chinner" <david@fromorbit.com>
> > > > To: "CAI Qian" <caiqian@redhat.com>
> > > > Cc: xfs@oss.sgi.com
> > > > Sent: Tuesday, March 12, 2013 2:07:01 PM
> > > > Subject: Re: 3.9-rc2 xfs panic
> > > >
> > > > On Tue, Mar 12, 2013 at 12:32:28AM -0400, CAI Qian wrote:
> > > > > Just came across when running xfstests using 3.9-rc2 kernel on
> > > > > a
> > > > > power7
> > > > > box with addition of this patch which fixed a known issue,
> > > > > http://people.redhat.com/qcai/stable/01-fix-double-fetch-hlist.patch
> > > > >
> > > > > The log shows it happened around test case 370 with
> > > > > TEST_PARAM_BLKSIZE = 2048
> > > >
> > > > That doesn't sound like xfstests. It only has 305 tests, and no
> > > > parameters like TEST_PARAM_BLKSIZE....
> > > Sorry, that was a typo: test case 270, not 370. TEST_PARAM_BLKSIZE
> > > came from an internal wrapper used to create new filesystems, not
> > > from the original xfstests.
> >
> > OK, so that means you're testing 2k filesystem block size on a 64k
> > page size machine?
> Looks like so. Would that be a problem?
It shouldn't be a problem, but nobody else is testing with that
config, so you could be seeing problems nobody else sees.
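For reference, a hedged sketch of how such a configuration is usually forced in an xfstests local config (MKFS_OPTIONS is the standard xfstests hook; the TEST_PARAM_BLKSIZE variable mentioned above belongs to the internal wrapper, so its exact translation here is an assumption):

```
# xfstests local config sketch (assumption: the wrapper maps
# TEST_PARAM_BLKSIZE onto the standard MKFS_OPTIONS hook)
MKFS_OPTIONS="-b size=2048"   # 2k filesystem blocks on a 64k-page machine
```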
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: 3.9-rc2 xfs panic
2013-03-12 7:46 ` Dave Chinner
2013-03-12 8:04 ` CAI Qian
@ 2013-03-13 2:44 ` CAI Qian
2013-03-13 4:43 ` Dave Chinner
2013-03-28 8:39 ` CAI Qian
2 siblings, 1 reply; 14+ messages in thread
From: CAI Qian @ 2013-03-13 2:44 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
Eek, got another NULL pointer, this time on an x64 system. It looks
like it came from xfstests case 110. Same user-space version as in the
ppc64 case. Still trying to reproduce it, without extra debugging
options enabled if possible.
Swap Size = 7983 MB
Mem Size = 7852 MB
Number of Processors = 16
meta-data=/dev/loop0 isize=256 agcount=4, agsize=655360 blks
= sectsz=512 attr=2, projid32bit=0
data = bsize=2048 blocks=2621440, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal log bsize=2048 blocks=5120, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
CAI Qian
[30706.240701] XFS (loop1): xfs_trans_ail_delete_bulk: attempting to delete a log item that is not in the AIL
[30706.242124] XFS (loop1): xfs_do_force_shutdown(0x2) called from line 743 of file fs/xfs/xfs_trans_ail.c. Return address = 0xffffffffa03c03ef
[30706.245280] XFS (loop1): Log I/O Error Detected. Shutting down filesystem
[30706.246311] XFS (loop1): Please umount the filesystem and rectify the problem(s)
[30707.279880] XFS (loop0): Mounting Filesystem
[30707.290512] XFS (loop0): Ending clean mount
[30708.966751] XFS (loop1): xfs_log_force: error 5 returned.
[30708.977075] XFS (loop1): xfs_log_force: error 5 returned.
[30708.978074] BUG: unable to handle kernel NULL pointer dereference at 0000000000000230
[30708.979629] IP: [<ffffffffa03655e7>] xfs_bdstrat_cb+0x27/0xd0 [xfs]
[30708.980846] PGD 0
[30708.981354] Oops: 0000 [#1] SMP
[30708.982012] Modules linked in: fuse(F) scsi_transport_iscsi(F) tun(F) ipt_ULOG(F) binfmt_misc(F) bnep(F) hidp(F) nfc(F) af_802154(F) pppoe(F) pppox(F) ppp_generic(F) slhc(F) rds(F) af_key(F) atm(F) ip6table_filter(F) ip6_tables(F) iptable_filter(F) ip_tables(F) btrfs(F) zlib_deflate(F) raid6_pq(F) xor(F) vfat(F) fat(F) nfsv3(F) nfs_acl(F) nfsv2(F) nfs(F) lockd(F) sunrpc(F) dns_resolver(F) fscache(F) nfnetlink_log(F) nfnetlink(F) bluetooth(F) rfkill(F) nf_tproxy_core(F) nls_koi8_u(F) nls_cp932(F) ts_kmp(F) sctp(F) sg(F) ipmi_si(F) kvm_amd(F) bnx2(F) hpwdt(F) ipmi_msghandler(F) hpilo(F) kvm(F) amd64_edac_mod(F) serio_raw(F) edac_mce_amd(F) pcspkr(F) shpchp(F) edac_core(F) microcode(F) k10temp(F) xfs(F) libcrc32c(F) hpsa(F) sr_mod(F) cdrom(F) ata_generic(F) radeon(F) i2c_algo_bit(F) pata_acpi(F) qla2xxx(F) drm_kms_helper(F) ttm(F) pata_amd(F) scsi_transport_fc(F) drm(F) scsi_tgt(F) libata(F) cciss(F) i2c_core(F) dm_mirror(F) dm_region_hash(F) dm_log(F) dm_mod(F) [last unloade
d: ipt_REJECT]
[30709.000425] CPU 11
[30709.000749] Pid: 29444, comm: xfsaild/loop1 Tainted: GF W 3.9.0-rc2+ #1 HP ProLiant DL585 G5
[30709.002464] RIP: 0010:[<ffffffffa03655e7>] [<ffffffffa03655e7>] xfs_bdstrat_cb+0x27/0xd0 [xfs]
[30709.004076] RSP: 0018:ffff8801f8d21d18 EFLAGS: 00010286
[30709.005089] RAX: 0000000000000000 RBX: ffff88017c672b80 RCX: dead000000200200
[30709.006415] RDX: ffff88017c670df8 RSI: 00000000802a0013 RDI: ffff88017c670d80
[30709.007791] RBP: ffff8801f8d21d38 R08: ffff88017c670df8 R09: 0000000000000001
[30709.009274] R10: ffffea0009e43c00 R11: ffffffffa037b055 R12: ffff88017c670d80
[30709.010597] R13: ffff8801f8d21dd8 R14: ffff88025cfba5e0 R15: ffff88017c672b80
[30709.011973] FS: 00007fdebffff700(0000) GS:ffff88027fb00000(0000) knlGS:00000000f75736c0
[30709.013575] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[30709.014760] CR2: 0000000000000230 CR3: 000000010ac7b000 CR4: 00000000000007e0
[30709.016188] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[30709.017571] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[30709.018957] Process xfsaild/loop1 (pid: 29444, threadinfo ffff8801f8d20000, task ffff8801e2ea65c0)
[30709.020766] Stack:
[30709.021198] 0000000000000000 ffff88017c672b80 0000000000000000 ffff8801f8d21dd8
[30709.022712] ffff8801f8d21dc8 ffffffffa0365801 ffff88017c670df8 ffff88025cfba658
[30709.024102] 00000000f8d21d78 ffffffffa03661b0 ffff88017c670d80 0000000091827364
[30709.025480] Call Trace:
[30709.025938] [<ffffffffa0365801>] __xfs_buf_delwri_submit+0x171/0x1e0 [xfs]
[30709.027202] [<ffffffffa03661b0>] ? xfs_buf_delwri_submit_nowait+0x20/0x30 [xfs]
[30709.028519] [<ffffffffa03bf800>] ? xfs_trans_ail_cursor_done+0x20/0x30 [xfs]
[30709.029824] [<ffffffffa03661b0>] xfs_buf_delwri_submit_nowait+0x20/0x30 [xfs]
[30709.031223] [<ffffffffa03bfae2>] xfsaild+0x222/0x5e0 [xfs]
[30709.032291] [<ffffffffa03bf8c0>] ? xfs_trans_ail_cursor_first+0xb0/0xb0 [xfs]
[30709.033661] [<ffffffff81086a50>] kthread+0xc0/0xd0
[30709.034644] [<ffffffff81086990>] ? kthread_create_on_node+0x120/0x120
[30709.035870] [<ffffffff8162fb6c>] ret_from_fork+0x7c/0xb0
[30709.036932] [<ffffffff81086990>] ? kthread_create_on_node+0x120/0x120
[30709.038184] Code: 1f 44 00 00 66 66 66 66 90 55 48 89 e5 48 83 ec 20 4c 89 65 f0 48 89 5d e8 49 89 fc 4c 89 6d f8 48 8b 87 90 00 00 00 48 8b 40 18 <f6> 80 30 02 00 00 10 74 30 4c 8b 6d 08 66 66 66 66 90 49 83 bc
[30709.042158] RIP [<ffffffffa03655e7>] xfs_bdstrat_cb+0x27/0xd0 [xfs]
[30709.043486] RSP <ffff8801f8d21d18>
[30709.044224] CR2: 0000000000000230
[30709.060200] ---[ end trace e0ed74f75ad92c73 ]---
[30734.047588] XFS (loop1): xfs_log_force: error 5 returned.
[30764.108856] XFS (loop1): xfs_log_force: error 5 returned.
[30794.170118] XFS (loop1): xfs_log_force: error 5 returned.
[30824.231359] XFS (loop1): xfs_log_force: error 5 returned.
[30854.292624] XFS (loop1): xfs_log_force: error 5 returned.
[-- MARK -- Tue Mar 12 09:30:00 2013]
[30884.353854] XFS (loop1): xfs_log_force: error 5 returned.
[30914.415138] XFS (loop1): xfs_log_force: error 5 returned.
[30944.476396] XFS (loop1): xfs_log_force: error 5 returned.
[30974.537638] XFS (loop1): xfs_log_force: error 5 returned.
[31004.598901] XFS (loop1): xfs_log_force: error 5 returned.
[31034.660150] XFS (loop1): xfs_log_force: error 5 returned.
[31064.721412] XFS (loop1): xfs_log_force: error 5 returned.
[31094.782674] XFS (loop1): xfs_log_force: error 5 returned.
[31124.843915] XFS (loop1): xfs_log_force: error 5 returned.
[31154.905174] XFS (loop1): xfs_log_force: error 5 returned.
[-- MARK -- Tue Mar 12 09:35:00 2013]
[31184.966422] XFS (loop1): xfs_log_force: error 5 returned.
[31215.027675] XFS (loop1): xfs_log_force: error 5 returned.
[31245.088943] XFS (loop1): xfs_log_force: error 5 returned.
[31275.150185] XFS (loop1): xfs_log_force: error 5 returned.
[31305.211448] XFS (loop1): xfs_log_force: error 5 returned.
[31335.272787] XFS (loop1): xfs_log_force: error 5 returned.
* Re: 3.9-rc2 xfs panic
2013-03-13 2:44 ` CAI Qian
@ 2013-03-13 4:43 ` Dave Chinner
2013-03-13 4:56 ` CAI Qian
2013-03-14 7:39 ` CAI Qian
0 siblings, 2 replies; 14+ messages in thread
From: Dave Chinner @ 2013-03-13 4:43 UTC (permalink / raw)
To: CAI Qian; +Cc: xfs
On Tue, Mar 12, 2013 at 10:44:36PM -0400, CAI Qian wrote:
> Eek, got another NULL pointer, this time on an x64 system. It looks
> like it came from xfstests case 110. Same user-space version as in
> the ppc64 case. Still trying to reproduce it, without extra debugging
> options enabled if possible.
>
> Swap Size = 7983 MB
> Mem Size = 7852 MB
> Number of Processors = 16
>
> meta-data=/dev/loop0 isize=256 agcount=4, agsize=655360 blks
> = sectsz=512 attr=2, projid32bit=0
> data = bsize=2048 blocks=2621440, imaxpct=25
> = sunit=0 swidth=0 blks
> naming =version 2 bsize=4096 ascii-ci=0
> log =internal log bsize=2048 blocks=5120, version=2
> = sectsz=512 sunit=0 blks, lazy-count=1
> realtime =none extsz=4096 blocks=0, rtextents=0
>
> CAI Qian
>
> [30706.240701] XFS (loop1): xfs_trans_ail_delete_bulk: attempting to delete a log item that is not in the AIL
What happens prior to this message? This is the first indication of
a problem....
> [30706.242124] XFS (loop1): xfs_do_force_shutdown(0x2) called from line 743 of file fs/xfs/xfs_trans_ail.c. Return address = 0xffffffffa03c03ef
> [30706.245280] XFS (loop1): Log I/O Error Detected. Shutting down filesystem
> [30706.246311] XFS (loop1): Please umount the filesystem and rectify the problem(s)
> [30707.279880] XFS (loop0): Mounting Filesystem
> [30707.290512] XFS (loop0): Ending clean mount
> [30708.966751] XFS (loop1): xfs_log_force: error 5 returned.
> [30708.977075] XFS (loop1): xfs_log_force: error 5 returned.
> [30708.978074] BUG: unable to handle kernel NULL pointer dereference at 0000000000000230
> [30708.979629] IP: [<ffffffffa03655e7>] xfs_bdstrat_cb+0x27/0xd0 [xfs]
And that indicates that the buftarg attached to the buffer has a
NULL xfs_mount pointer, so it's probably related to the above issue.
As it is, none of my machines see this problem, so I'm wondering if
this is related to the way you are using loop devices. Can you
reproduce it on a different type of storage device (like LVM or
physical disk partitions)?
Also, can you turn on CONFIG_XFS_DEBUG and all the memory
leak/poisoning checks and see if that catches anything.
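One plausible set of options for that (a sketch only; exact option names depend on the kernel version and which slab allocator is configured):

```
CONFIG_XFS_DEBUG=y
CONFIG_SLUB_DEBUG_ON=y      # slab poisoning/red-zoning (with SLUB)
CONFIG_DEBUG_PAGEALLOC=y    # catch use-after-free of whole pages
CONFIG_DEBUG_KMEMLEAK=y     # kernel memory leak detector
CONFIG_DEBUG_LIST=y         # corrupt list operations (relevant to AIL deletes)
```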
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: 3.9-rc2 xfs panic
2013-03-13 4:43 ` Dave Chinner
@ 2013-03-13 4:56 ` CAI Qian
2013-03-14 7:39 ` CAI Qian
1 sibling, 0 replies; 14+ messages in thread
From: CAI Qian @ 2013-03-13 4:56 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
----- Original Message -----
> From: "Dave Chinner" <david@fromorbit.com>
> To: "CAI Qian" <caiqian@redhat.com>
> Cc: xfs@oss.sgi.com
> Sent: Wednesday, March 13, 2013 12:43:07 PM
> Subject: Re: 3.9-rc2 xfs panic
>
> On Tue, Mar 12, 2013 at 10:44:36PM -0400, CAI Qian wrote:
> > Eek, got another NULL pointer, this time on an x64 system. It
> > looks like it came from xfstests case 110. Same user-space version
> > as in the ppc64 case. Still trying to reproduce it, without extra
> > debugging options enabled if possible.
> >
> > Swap Size = 7983 MB
> > Mem Size = 7852 MB
> > Number of Processors = 16
> >
> > meta-data=/dev/loop0 isize=256 agcount=4,
> > agsize=655360 blks
> > = sectsz=512 attr=2, projid32bit=0
> > data = bsize=2048 blocks=2621440,
> > imaxpct=25
> > = sunit=0 swidth=0 blks
> > naming =version 2 bsize=4096 ascii-ci=0
> > log =internal log bsize=2048 blocks=5120,
> > version=2
> > = sectsz=512 sunit=0 blks,
> > lazy-count=1
> > realtime =none extsz=4096 blocks=0, rtextents=0
> >
> > CAI Qian
> >
> > [30706.240701] XFS (loop1): xfs_trans_ail_delete_bulk: attempting
> > to delete a log item that is not in the AIL
>
> What happens prior to this message? This is the first indication of
> a problem....
Some messages that might be interesting before those were:
[30409.026043] XFS (loop1): xfs_log_force: error 5 returned.
[30409.027253] XFS (loop1): xfs_log_force: error 5 returned.
[30409.028900] XFS (loop1): xfs_log_force: error 5 returned.
...
[30453.156830] XFS (loop1): Quotacheck: Done.
[30454.006227] XFS (loop1): xfs_log_force: error 5 returned.
[30454.010698] XFS (loop1): xfs_log_force: error 5 returned.
[30454.011673] XFS (loop1): xfs_qm_dquot_logitem_push: push error 5 on dqp ffff8801f6785e48
[30454.019251] XFS (loop1): xfs_log_force: error 5 returned.
...
>
> > [30706.242124] XFS (loop1): xfs_do_force_shutdown(0x2) called from
> > line 743 of file fs/xfs/xfs_trans_ail.c. Return address =
> > 0xffffffffa03c03ef
> > [30706.245280] XFS (loop1): Log I/O Error Detected. Shutting down
> > filesystem
> > [30706.246311] XFS (loop1): Please umount the filesystem and
> > rectify the problem(s)
> > [30707.279880] XFS (loop0): Mounting Filesystem
> > [30707.290512] XFS (loop0): Ending clean mount
> > [30708.966751] XFS (loop1): xfs_log_force: error 5 returned.
> > [30708.977075] XFS (loop1): xfs_log_force: error 5 returned.
> > [30708.978074] BUG: unable to handle kernel NULL pointer
> > dereference at 0000000000000230
> > [30708.979629] IP: [<ffffffffa03655e7>] xfs_bdstrat_cb+0x27/0xd0 [xfs]
>
> And that indicates that the buftarg attached to the buffer has a
> NULL xfs_mount pointer, so it's probably related to the above issue.
>
> As it is, none of my machines see this problem, so I'm wondering if
> this is related to the way you are using loop devices. Can you
> reproduce it on a different type of storage device (like LVM or
> physical disk partitions)?
Will do.
>
> Also, can you turn on CONFIG_XFS_DEBUG and all the memory
> leak/poisoning checks and see if that catches anything.
Will do.
CAI Qian
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
* Re: 3.9-rc2 xfs panic
2013-03-13 4:43 ` Dave Chinner
2013-03-13 4:56 ` CAI Qian
@ 2013-03-14 7:39 ` CAI Qian
2013-03-14 8:06 ` CAI Qian
1 sibling, 1 reply; 14+ messages in thread
From: CAI Qian @ 2013-03-14 7:39 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
OK, this time I reproduced this panic on both x64 and ppc64 systems
with LVM partitions using the default block size, and with debugging
and memory-poisoning options enabled.
- ppc64 trace:
(nothing looks really interesting prior to this)
[ 2221.546337] XFS (dm-0): Corruption detected. Unmount and run xfs_repair
[ 2221.546345] XFS (dm-0): bad inode magic/vsn daddr 64 #8 (magic=5858)
[ 2221.546350] XFS: Assertion failed: 0, file: fs/xfs/xfs_inode.c, line: 416
[ 2221.546383] ------------[ cut here ]------------
[ 2221.546386] kernel BUG at fs/xfs/xfs_message.c:100!
[ 2221.546391] Oops: Exception in kernel mode, sig: 5 [#1]
[ 2221.546394] SMP NR_CPUS=1024 NUMA pSeries
[ 2221.546398] Modules linked in: btrfs raid6_pq xor lockd sunrpc nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables sg ibmveth xfs libcrc32c sd_mod crc_t10dif ibmvscsi scsi_transport_srp scsi_tgt dm_mirror dm_region_hash dm_log dm_mod
[ 2221.546445] NIP: d000000002758bbc LR: d000000002758bbc CTR: 0000000001766760
[ 2221.546449] REGS: c00000000a393780 TRAP: 0700 Not tainted (3.9.0-rc2+)
[ 2221.546452] MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI> CR: 286b4b24 XER: 0000000e
[ 2221.546463] SOFTE: 1
[ 2221.546465] CFAR: d00000000275881c
[ 2221.546468] TASK = c00000000a423dc0[415] 'kworker/16:1H' THREAD: c00000000a390000 CPU: 16
GPR00: d000000002758bbc c00000000a393a00 d00000000282f770 000000000000003d
GPR04: 0000000000000000 0000000000000000 0000000000010000 0000000000000000
GPR08: 0000000000000007 0000000000000000 0000000000000000 0000000000003fef
GPR12: 00000000286b4b22 c00000000f244000 c0000000000b57e0 c0000000fb6dba90
GPR16: 0000000000000000 0000000000000000 c0000000e76614b0 d00000000282f770
GPR20: d00000000282f770 d00000000282f770 d00000000282f770 d00000000282f770
GPR24: d00000000282f770 d000000002743554 0000000000000020 d000000002802988
GPR28: c00000005758e700 c0000000e7661290 0000000000000008 c0000000e7423dc8
[ 2221.546545] NIP [d000000002758bbc] .assfail+0x2c/0x30 [xfs]
[ 2221.546564] LR [d000000002758bbc] .assfail+0x2c/0x30 [xfs]
[ 2221.546568] Call Trace:
[ 2221.546586] [c00000000a393a00] [d000000002758bbc] .assfail+0x2c/0x30 [xfs] (unreliable)
[ 2221.546612] [c00000000a393a70] [d0000000027a8704] .xfs_inode_buf_verify+0x134/0x220 [xfs]
[ 2221.546632] [c00000000a393b50] [d000000002743554] .xfs_buf_iodone_work+0x64/0x150 [xfs]
[ 2221.546639] [c00000000a393bd0] [c0000000000ad830] .process_one_work+0x1b0/0x4c0
[ 2221.546644] [c00000000a393c70] [c0000000000ae078] .worker_thread+0x178/0x470
[ 2221.546649] [c00000000a393d30] [c0000000000b58c8] .kthread+0xe8/0xf0
[ 2221.546654] [c00000000a393e30] [c000000000009f64] .ret_from_kernel_thread+0x64/0x80
[ 2221.546658] Instruction dump:
[ 2221.546662] 60000000 7c0802a6 3d420000 7c691b78 7c862378 e88a90f0 7ca72b78 38600000
[ 2221.546672] 7d254b78 f8010010 f821ff91 4bfffc09 <0fe00000> 7c0802a6 3ce20000 3d420000
[ 2221.546685] ---[ end trace fd5756e02a75ba6a ]---
[ 2221.548469]
[ 2221.548542] Unable to handle kernel paging request for data at address 0xffffffffffffffd8
[ 2221.548552] Faulting instruction address: 0xc0000000000b5cc4
[ 2221.548561] Oops: Kernel access of bad area, sig: 11 [#2]
[ 2221.548577] SMP NR_CPUS=1024 NUMA pSeries
[ 2221.548592] Modules linked in: btrfs raid6_pq xor lockd sunrpc nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables sg ibmveth xfs libcrc32c sd_mod crc_t10dif ibmvscsi scsi_transport_srp scsi_tgt dm_mirror dm_region_hash dm_log dm_mod
[ 2221.548677] NIP: c0000000000b5cc4 LR: c0000000000aede8 CTR: c0000000000d3d80
[ 2221.548685] REGS: c00000000a392e40 TRAP: 0300 Tainted: G D (3.9.0-rc2+)
[ 2221.548692] MSR: 8000000002009032 <SF,VEC,EE,ME,IR,DR,RI> CR: 426b4b48 XER: 00000001
[ 2221.548714] SOFTE: 0
[ 2221.548718] CFAR: c000000000005f1c
[ 2221.548724] DAR: ffffffffffffffd8, DSISR: 40000000
[ 2221.548732] TASK = c00000000a423dc0[415] 'kworker/16:1H' THREAD: c00000000a390000 CPU: 16
GPR00: c000000000738fac c00000000a3930c0 c000000001114018 c00000000a423dc0
GPR04: 0000000000000010 c00000000a423e30 0000000000000000 0000000000000001
GPR08: 0000000000000001 0000000000000000 c000000001352f08 00000000161aad9e
GPR12: 00000000846b4b44 c00000000f244000 c000000001044018 c00000000008d470
GPR16: c000000000b04880 c0000000011b4018 c000000001074018 c000000000b04880
GPR20: c000000000b04880 0000000000c10000 0000000000000010 c000000000b04880
GPR24: c000000000b04880 0000000000000010 c00000000a424200 c00000000a390000
GPR28: c00000000117c9d8 c000000000b04880 c000000001714880 0000000000000010
[ 2221.548853] NIP [c0000000000b5cc4] .kthread_data+0x4/0x10
[ 2221.548862] LR [c0000000000aede8] .wq_worker_sleeping+0x18/0xd0
[ 2221.548868] Call Trace:
[ 2221.548874] [c00000000a3930c0] [c0000000007495a4] .__slab_free+0x84/0x310 (unreliable)
[ 2221.548883] [c00000000a393140] [c000000000738fac] .__schedule+0x6fc/0x940
[ 2221.548891] [c00000000a3933c0] [c00000000008d470] .do_exit+0x730/0xb40
[ 2221.548899] [c00000000a3934c0] [c00000000001e0d4] .die+0x2e4/0x440
[ 2221.548906] [c00000000a393570] [c00000000001e444] ._exception+0x1a4/0x1d0
[ 2221.548913] [c00000000a393710] [c0000000000063c8] program_check_common+0x148/0x180
[ 2221.548994] --- Exception: 700 at .assfail+0x2c/0x30 [xfs]
[ 2221.548994] LR = .assfail+0x2c/0x30 [xfs]
[ 2221.549043] [c00000000a393a70] [d0000000027a8704] .xfs_inode_buf_verify+0x134/0x220 [xfs]
[ 2221.549086] [c00000000a393b50] [d000000002743554] .xfs_buf_iodone_work+0x64/0x150 [xfs]
[ 2221.549094] [c00000000a393bd0] [c0000000000ad830] .process_one_work+0x1b0/0x4c0
[ 2221.549102] [c00000000a393c70] [c0000000000ae078] .worker_thread+0x178/0x470
[ 2221.549108] [c00000000a393d30] [c0000000000b58c8] .kthread+0xe8/0xf0
[ 2221.549115] [c00000000a393e30] [c000000000009f64] .ret_from_kernel_thread+0x64/0x80
[ 2221.549121] Instruction dump:
[ 2221.549125] ebe1fff8 7c0803a6 4bfffdf4 e92d0258 e92903f8 e869ffc8 7863f7e2 4e800020
[ 2221.549138] 60000000 60000000 60000000 e92303f8 <e869ffd8> 4e800020 60000000 e92d0258
[ 2221.549151] ---[ end trace fd5756e02a75ba6b ]---
[ 2221.552003]
[ 2221.552008] Fixing recursive fault but reboot is needed!
[ 2281.608342] INFO: rcu_sched detected stalls on CPUs/tasks: { 16} (detected by 17, t=6002 jiffies, g=23049, c=23048, q=135)
[ 2281.608362] Task dump for CPU 16:
[ 2281.608367] kworker/16:1H D 0000000000000000 0 415 2 0x00000800
[ 2281.608374] Call Trace:
[ 2281.608412] [c00000000a393770] [d000000002758bbc] .assfail+0x2c/0x30 [xfs] (unreliable)
[ 2353.528157] sd 0:0:1:0: aborting command. lun 0x8100000000000000, tag 0xc0000000fc142650
[ 2406.758570] sd 0:0:1:0: aborted task tag 0xc0000000fc142650 completed
[ 2461.657815] INFO: rcu_sched detected stalls on CPUs/tasks: { 16} (detected by 18, t=24007 jiffies, g=23049, c=23048, q=511)
[ 2461.657838] Task dump for CPU 16:
[ 2461.657843] kworker/16:1H D 0000000000000000 0 415 2 0x00000800
[ 2461.657851] Call Trace:
[ 2461.657900] [c00000000a393770] [d000000002758bbc] .assfail+0x2c/0x30 [xfs] (unreliable)
- x64 trace:
[18922.742140] XFS: Assertion failed: XFS_FORCED_SHUTDOWN(ip->i_mount) || ip->i_delayed_blks == 0, file: fs/xfs/xfs_super.c, line: 964
[18922.744338] ------------[ cut here ]------------
[18922.745267] kernel BUG at fs/xfs/xfs_message.c:100!
[18922.746294] invalid opcode: 0000 [#1] SMP
[18922.747261] Modules linked in: btrfs(F) zlib_deflate(F) raid6_pq(F) xor(F) lockd(F) sunrpc(F) nf_conntrack_netbios_ns(F) nf_conntrack_broadcast(F) ipt_MASQUERADE(F) ip6table_mangle(F) ip6t_REJECT(F) nf_conntrack_ipv6(F) nf_defrag_ipv6(F) iptable_nat(F) nf_nat_ipv4(F) nf_nat(F) iptable_mangle(F) ipt_REJECT(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) xt_conntrack(F) nf_conntrack(F) ebtable_filter(F) ebtables(F) ip6table_filter(F) ip6_tables(F) iptable_filter(F) ip_tables(F) sg(F) kvm_amd(F) kvm(F) bnx2(F) pcspkr(F) microcode(F) shpchp(F) amd64_edac_mod(F) serio_raw(F) ipmi_si(F) edac_mce_amd(F) hpwdt(F) ipmi_msghandler(F) edac_core(F) k10temp(F) hpilo(F) xfs(F) libcrc32c(F) sr_mod(F) cdrom(F) hpsa(F) radeon(F) ata_generic(F) qla2xxx(F) i2c_algo_bit(F) pata_acpi(F) drm_kms_helper(F) scsi_transport_fc(F) ttm(F) scsi_tgt(F) drm(F) pata_amd(F) cciss(F) libata(F) i2c_core(F) dm_mirror(F) dm_region_hash(F) dm_log(F) dm_mod(F)
[18922.764413] CPU 4
[18922.764815] Pid: 23999, comm: umount Tainted: GF 3.9.0-rc2+ #1 HP ProLiant DL585 G5
[18922.766619] RIP: 0010:[<ffffffffa03807b2>] [<ffffffffa03807b2>] assfail+0x22/0x30 [xfs]
[18922.768316] RSP: 0018:ffff8801c9575d38 EFLAGS: 00010292
[18922.769482] RAX: 0000000000000077 RBX: ffff88010da1bf08 RCX: ffff88007da8ffe8
[18922.770812] RDX: 0000000000000000 RSI: ffff88007da8e3b8 RDI: 0000000000000246
[18922.772117] RBP: ffff8801c9575d38 R08: ffffffff81a0f320 R09: 00000000000014ce
[18922.773429] R10: 0000000000000000 R11: 00000000000014cd R12: ffff88010da1bf08
[18922.774715] R13: ffff88010da1bdc0 R14: ffff880073e53ab8 R15: ffff880072bfef60
[18922.775998] FS: 00007fde048ed880(0000) GS:ffff88007da80000(0000) knlGS:0000000000000000
[18922.777433] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[18922.778446] CR2: 00007fde044c264f CR3: 00000000737c5000 CR4: 00000000000007e0
[18922.779730] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[18922.781019] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[18922.782305] Process umount (pid: 23999, threadinfo ffff8801c9574000, task ffff88019a24d040)
[18922.783836] Stack:
[18922.784236] ffff8801c9575d68 ffffffffa0381664 ffff8801c9575d68 ffff88010da1bf08
[18922.785773] ffff88010da1c008 ffffffffa03f3ca0 ffff8801c9575d88 ffffffff811bb37c
[18922.787466] ffff88010da1bf08 ffff88010da1bf08 ffff8801c9575db8 ffffffff811bb4bf
[18922.788982] Call Trace:
[18922.789541] [<ffffffffa0381664>] xfs_fs_destroy_inode+0x84/0x140 [xfs]
[18922.790878] [<ffffffff811bb37c>] destroy_inode+0x3c/0x70
[18922.792087] [<ffffffff811bb4bf>] evict+0x10f/0x1a0
[18922.792991] [<ffffffff811bb58e>] dispose_list+0x3e/0x60
[18922.793967] [<ffffffff811bc058>] evict_inodes+0xb8/0x100
[18922.794937] [<ffffffff811a3f53>] generic_shutdown_super+0x53/0xf0
[18922.796064] [<ffffffff811a4020>] kill_block_super+0x30/0x80
[18922.797091] [<ffffffff811a4407>] deactivate_locked_super+0x57/0x80
[18922.798198] [<ffffffff811a4f8e>] deactivate_super+0x4e/0x70
[18922.799235] [<ffffffff811bfb17>] mntput_no_expire+0xd7/0x130
[18922.800310] [<ffffffff811c09fc>] sys_umount+0x9c/0x3c0
[18922.801273] [<ffffffff81630399>] system_call_fastpath+0x16/0x1b
[18922.802393] Code: e8 f4 fb ff ff 0f 0b 66 90 66 66 66 66 90 55 48 89 f1 41 89 d0 48 c7 c6 b0 e0 3f a0 48 89 fa 31 c0 48 89 e5 31 ff e8 1e fc ff ff <0f> 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48
[18922.806375] RIP [<ffffffffa03807b2>] assfail+0x22/0x30 [xfs]
[18922.807459] RSP <ffff8801c9575d38>
[18922.824045] ---[ end trace 133202f8e58b0c3c ]---
CAI Qian
* Re: 3.9-rc2 xfs panic
2013-03-14 7:39 ` CAI Qian
@ 2013-03-14 8:06 ` CAI Qian
2013-03-14 13:17 ` Mark Tinguely
0 siblings, 1 reply; 14+ messages in thread
From: CAI Qian @ 2013-03-14 8:06 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
This is easy to reproduce here:
# ./check 111
Bisecting is under way...
----- Original Message -----
> From: "CAI Qian" <caiqian@redhat.com>
> To: "Dave Chinner" <david@fromorbit.com>
> Cc: xfs@oss.sgi.com
> Sent: Thursday, March 14, 2013 3:39:33 PM
> Subject: Re: 3.9-rc2 xfs panic
>
> OK, this time I reproduced this panic on both x64 and ppc64 systems
> with LVM partitions using the default block size, and with debugging
> and memory-poisoning options enabled.
>
> - ppc64 trace:
> (nothing looks really interesting prior to this)
> [ 2221.546337] XFS (dm-0): Corruption detected. Unmount and run
> xfs_repair
> [ 2221.546345] XFS (dm-0): bad inode magic/vsn daddr 64 #8
> (magic=5858)
> [ 2221.546350] XFS: Assertion failed: 0, file: fs/xfs/xfs_inode.c,
> line: 416
> [ 2221.546383] ------------[ cut here ]------------
> [ 2221.546386] kernel BUG at fs/xfs/xfs_message.c:100!
> [ 2221.546391] Oops: Exception in kernel mode, sig: 5 [#1]
> [ 2221.546394] SMP NR_CPUS=1024 NUMA pSeries
> [ 2221.546398] Modules linked in: btrfs raid6_pq xor lockd sunrpc
> nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE
> ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6
> iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ipt_REJECT
> nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
> ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter
> ip_tables sg ibmveth xfs libcrc32c sd_mod crc_t10dif ibmvscsi
> scsi_transport_srp scsi_tgt dm_mirror dm_region_hash dm_log dm_mod
> [ 2221.546445] NIP: d000000002758bbc LR: d000000002758bbc CTR:
> 0000000001766760
> [ 2221.546449] REGS: c00000000a393780 TRAP: 0700 Not tainted
> (3.9.0-rc2+)
> [ 2221.546452] MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI> CR:
> 286b4b24 XER: 0000000e
> [ 2221.546463] SOFTE: 1
> [ 2221.546465] CFAR: d00000000275881c
> [ 2221.546468] TASK = c00000000a423dc0[415] 'kworker/16:1H' THREAD:
> c00000000a390000 CPU: 16
> GPR00: d000000002758bbc c00000000a393a00 d00000000282f770
> 000000000000003d
> GPR04: 0000000000000000 0000000000000000 0000000000010000
> 0000000000000000
> GPR08: 0000000000000007 0000000000000000 0000000000000000
> 0000000000003fef
> GPR12: 00000000286b4b22 c00000000f244000 c0000000000b57e0
> c0000000fb6dba90
> GPR16: 0000000000000000 0000000000000000 c0000000e76614b0
> d00000000282f770
> GPR20: d00000000282f770 d00000000282f770 d00000000282f770
> d00000000282f770
> GPR24: d00000000282f770 d000000002743554 0000000000000020
> d000000002802988
> GPR28: c00000005758e700 c0000000e7661290 0000000000000008
> c0000000e7423dc8
> [ 2221.546545] NIP [d000000002758bbc] .assfail+0x2c/0x30 [xfs]
> [ 2221.546564] LR [d000000002758bbc] .assfail+0x2c/0x30 [xfs]
> [ 2221.546568] Call Trace:
> [ 2221.546586] [c00000000a393a00] [d000000002758bbc]
> .assfail+0x2c/0x30 [xfs] (unreliable)
> [ 2221.546612] [c00000000a393a70] [d0000000027a8704]
> .xfs_inode_buf_verify+0x134/0x220 [xfs]
> [ 2221.546632] [c00000000a393b50] [d000000002743554]
> .xfs_buf_iodone_work+0x64/0x150 [xfs]
> [ 2221.546639] [c00000000a393bd0] [c0000000000ad830]
> .process_one_work+0x1b0/0x4c0
> [ 2221.546644] [c00000000a393c70] [c0000000000ae078]
> .worker_thread+0x178/0x470
> [ 2221.546649] [c00000000a393d30] [c0000000000b58c8]
> .kthread+0xe8/0xf0
> [ 2221.546654] [c00000000a393e30] [c000000000009f64]
> .ret_from_kernel_thread+0x64/0x80
> [ 2221.546658] Instruction dump:
> [ 2221.546662] 60000000 7c0802a6 3d420000 7c691b78 7c862378 e88a90f0
> 7ca72b78 38600000
> [ 2221.546672] 7d254b78 f8010010 f821ff91 4bfffc09 <0fe00000>
> 7c0802a6 3ce20000 3d420000
> [ 2221.546685] ---[ end trace fd5756e02a75ba6a ]---
> [ 2221.548469]
> [ 2221.548542] Unable to handle kernel paging request for data at
> address 0xffffffffffffffd8
> [ 2221.548552] Faulting instruction address: 0xc0000000000b5cc4
> [ 2221.548561] Oops: Kernel access of bad area, sig: 11 [#2]
> [ 2221.548577] SMP NR_CPUS=1024 NUMA pSeries
> [ 2221.548592] Modules linked in: btrfs raid6_pq xor lockd sunrpc
> nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE
> ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6
> iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ipt_REJECT
> nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
> ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter
> ip_tables sg ibmveth xfs libcrc32c sd_mod crc_t10dif ibmvscsi
> scsi_transport_srp scsi_tgt dm_mirror dm_region_hash dm_log dm_mod
> [ 2221.548677] NIP: c0000000000b5cc4 LR: c0000000000aede8 CTR:
> c0000000000d3d80
> [ 2221.548685] REGS: c00000000a392e40 TRAP: 0300 Tainted: G D
> (3.9.0-rc2+)
> [ 2221.548692] MSR: 8000000002009032 <SF,VEC,EE,ME,IR,DR,RI> CR:
> 426b4b48 XER: 00000001
> [ 2221.548714] SOFTE: 0
> [ 2221.548718] CFAR: c000000000005f1c
> [ 2221.548724] DAR: ffffffffffffffd8, DSISR: 40000000
> [ 2221.548732] TASK = c00000000a423dc0[415] 'kworker/16:1H' THREAD:
> c00000000a390000 CPU: 16
> GPR00: c000000000738fac c00000000a3930c0 c000000001114018
> c00000000a423dc0
> GPR04: 0000000000000010 c00000000a423e30 0000000000000000
> 0000000000000001
> GPR08: 0000000000000001 0000000000000000 c000000001352f08
> 00000000161aad9e
> GPR12: 00000000846b4b44 c00000000f244000 c000000001044018
> c00000000008d470
> GPR16: c000000000b04880 c0000000011b4018 c000000001074018
> c000000000b04880
> GPR20: c000000000b04880 0000000000c10000 0000000000000010
> c000000000b04880
> GPR24: c000000000b04880 0000000000000010 c00000000a424200
> c00000000a390000
> GPR28: c00000000117c9d8 c000000000b04880 c000000001714880
> 0000000000000010
> [ 2221.548853] NIP [c0000000000b5cc4] .kthread_data+0x4/0x10
> [ 2221.548862] LR [c0000000000aede8] .wq_worker_sleeping+0x18/0xd0
> [ 2221.548868] Call Trace:
> [ 2221.548874] [c00000000a3930c0] [c0000000007495a4]
> .__slab_free+0x84/0x310 (unreliable)
> [ 2221.548883] [c00000000a393140] [c000000000738fac]
> .__schedule+0x6fc/0x940
> [ 2221.548891] [c00000000a3933c0] [c00000000008d470]
> .do_exit+0x730/0xb40
> [ 2221.548899] [c00000000a3934c0] [c00000000001e0d4] .die+0x2e4/0x440
> [ 2221.548906] [c00000000a393570] [c00000000001e444]
> ._exception+0x1a4/0x1d0
> [ 2221.548913] [c00000000a393710] [c0000000000063c8]
> program_check_common+0x148/0x180
> [ 2221.548994] --- Exception: 700 at .assfail+0x2c/0x30 [xfs]
> [ 2221.548994] LR = .assfail+0x2c/0x30 [xfs]
> [ 2221.549043] [c00000000a393a70] [d0000000027a8704]
> .xfs_inode_buf_verify+0x134/0x220 [xfs]
> [ 2221.549086] [c00000000a393b50] [d000000002743554]
> .xfs_buf_iodone_work+0x64/0x150 [xfs]
> [ 2221.549094] [c00000000a393bd0] [c0000000000ad830]
> .process_one_work+0x1b0/0x4c0
> [ 2221.549102] [c00000000a393c70] [c0000000000ae078]
> .worker_thread+0x178/0x470
> [ 2221.549108] [c00000000a393d30] [c0000000000b58c8]
> .kthread+0xe8/0xf0
> [ 2221.549115] [c00000000a393e30] [c000000000009f64]
> .ret_from_kernel_thread+0x64/0x80
> [ 2221.549121] Instruction dump:
> [ 2221.549125] ebe1fff8 7c0803a6 4bfffdf4 e92d0258 e92903f8 e869ffc8
> 7863f7e2 4e800020
> [ 2221.549138] 60000000 60000000 60000000 e92303f8 <e869ffd8>
> 4e800020 60000000 e92d0258
> [ 2221.549151] ---[ end trace fd5756e02a75ba6b ]---
> [ 2221.552003]
> [ 2221.552008] Fixing recursive fault but reboot is needed!
> [ 2281.608342] INFO: rcu_sched detected stalls on CPUs/tasks: { 16}
> (detected by 17, t=6002 jiffies, g=23049, c=23048, q=135)
> [ 2281.608362] Task dump for CPU 16:
> [ 2281.608367] kworker/16:1H D 0000000000000000 0 415 2
> 0x00000800
> [ 2281.608374] Call Trace:
> [ 2281.608412] [c00000000a393770] [d000000002758bbc]
> .assfail+0x2c/0x30 [xfs] (unreliable)
> [ 2353.528157] sd 0:0:1:0: aborting command. lun 0x8100000000000000,
> tag 0xc0000000fc142650
> [ 2406.758570] sd 0:0:1:0: aborted task tag 0xc0000000fc142650
> completed
> [ 2461.657815] INFO: rcu_sched detected stalls on CPUs/tasks: { 16}
> (detected by 18, t=24007 jiffies, g=23049, c=23048, q=511)
> [ 2461.657838] Task dump for CPU 16:
> [ 2461.657843] kworker/16:1H D 0000000000000000 0 415 2
> 0x00000800
> [ 2461.657851] Call Trace:
> [ 2461.657900] [c00000000a393770] [d000000002758bbc]
> .assfail+0x2c/0x30 [xfs] (unreliable)
>
> - x64 trace:
> [18922.742140] XFS: Assertion failed:
> XFS_FORCED_SHUTDOWN(ip->i_mount) || ip->i_delayed_blks == 0, file:
> fs/xfs/xfs_super.c, line: 964
> [18922.744338] ------------[ cut here ]------------
> [18922.745267] kernel BUG at fs/xfs/xfs_message.c:100!
> [18922.746294] invalid opcode: 0000 [#1] SMP
> [18922.747261] Modules linked in: btrfs(F) zlib_deflate(F)
> raid6_pq(F) xor(F) lockd(F) sunrpc(F) nf_conntrack_netbios_ns(F)
> nf_conntrack_broadcast(F) ipt_MASQUERADE(F) ip6table_mangle(F)
> ip6t_REJECT(F) nf_conntrack_ipv6(F) nf_defrag_ipv6(F) iptable_nat(F)
> nf_nat_ipv4(F) nf_nat(F) iptable_mangle(F) ipt_REJECT(F)
> nf_conntrack_ipv4(F) nf_defrag_ipv4(F) xt_conntrack(F)
> nf_conntrack(F) ebtable_filter(F) ebtables(F) ip6table_filter(F)
> ip6_tables(F) iptable_filter(F) ip_tables(F) sg(F) kvm_amd(F) kvm(F)
> bnx2(F) pcspkr(F) microcode(F) shpchp(F) amd64_edac_mod(F)
> serio_raw(F) ipmi_si(F) edac_mce_amd(F) hpwdt(F) ipmi_msghandler(F)
> edac_core(F) k10temp(F) hpilo(F) xfs(F) libcrc32c(F) sr_mod(F)
> cdrom(F) hpsa(F) radeon(F) ata_generic(F) qla2xxx(F) i2c_algo_bit(F)
> pata_acpi(F) drm_kms_helper(F) scsi_transport_fc(F) ttm(F)
> scsi_tgt(F) drm(F) pata_amd(F) cciss(F) libata(F) i2c_core(F)
> dm_mirror(F) dm_region_hash(F) dm_log(F) dm_mod(F)
> [18922.764413] CPU 4
> [18922.764815] Pid: 23999, comm: umount Tainted: GF
> 3.9.0-rc2+ #1 HP ProLiant DL585 G5
> [18922.766619] RIP: 0010:[<ffffffffa03807b2>] [<ffffffffa03807b2>]
> assfail+0x22/0x30 [xfs]
> [18922.768316] RSP: 0018:ffff8801c9575d38 EFLAGS: 00010292
> [18922.769482] RAX: 0000000000000077 RBX: ffff88010da1bf08 RCX:
> ffff88007da8ffe8
> [18922.770812] RDX: 0000000000000000 RSI: ffff88007da8e3b8 RDI:
> 0000000000000246
> [18922.772117] RBP: ffff8801c9575d38 R08: ffffffff81a0f320 R09:
> 00000000000014ce
> [18922.773429] R10: 0000000000000000 R11: 00000000000014cd R12:
> ffff88010da1bf08
> [18922.774715] R13: ffff88010da1bdc0 R14: ffff880073e53ab8 R15:
> ffff880072bfef60
> [18922.775998] FS: 00007fde048ed880(0000) GS:ffff88007da80000(0000)
> knlGS:0000000000000000
> [18922.777433] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [18922.778446] CR2: 00007fde044c264f CR3: 00000000737c5000 CR4:
> 00000000000007e0
> [18922.779730] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [18922.781019] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> [18922.782305] Process umount (pid: 23999, threadinfo
> ffff8801c9574000, task ffff88019a24d040)
> [18922.783836] Stack:
> [18922.784236] ffff8801c9575d68 ffffffffa0381664 ffff8801c9575d68
> ffff88010da1bf08
> [18922.785773] ffff88010da1c008 ffffffffa03f3ca0 ffff8801c9575d88
> ffffffff811bb37c
> [18922.787466] ffff88010da1bf08 ffff88010da1bf08 ffff8801c9575db8
> ffffffff811bb4bf
> [18922.788982] Call Trace:
> [18922.789541] [<ffffffffa0381664>] xfs_fs_destroy_inode+0x84/0x140
> [xfs]
> [18922.790878] [<ffffffff811bb37c>] destroy_inode+0x3c/0x70
> [18922.792087] [<ffffffff811bb4bf>] evict+0x10f/0x1a0
> [18922.792991] [<ffffffff811bb58e>] dispose_list+0x3e/0x60
> [18922.793967] [<ffffffff811bc058>] evict_inodes+0xb8/0x100
> [18922.794937] [<ffffffff811a3f53>] generic_shutdown_super+0x53/0xf0
> [18922.796064] [<ffffffff811a4020>] kill_block_super+0x30/0x80
> [18922.797091] [<ffffffff811a4407>]
> deactivate_locked_super+0x57/0x80
> [18922.798198] [<ffffffff811a4f8e>] deactivate_super+0x4e/0x70
> [18922.799235] [<ffffffff811bfb17>] mntput_no_expire+0xd7/0x130
> [18922.800310] [<ffffffff811c09fc>] sys_umount+0x9c/0x3c0
> [18922.801273] [<ffffffff81630399>] system_call_fastpath+0x16/0x1b
> [18922.802393] Code: e8 f4 fb ff ff 0f 0b 66 90 66 66 66 66 90 55 48
> 89 f1 41 89 d0 48 c7 c6 b0 e0 3f a0 48 89 fa 31 c0 48 89 e5 31 ff e8
> 1e fc ff ff <0f> 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66
> 90 55 48
> [18922.806375] RIP [<ffffffffa03807b2>] assfail+0x22/0x30 [xfs]
> [18922.807459] RSP <ffff8801c9575d38>
> [18922.824045] ---[ end trace 133202f8e58b0c3c ]---
>
> CAI Qian
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 3.9-rc2 xfs panic
2013-03-14 8:06 ` CAI Qian
@ 2013-03-14 13:17 ` Mark Tinguely
2013-03-14 23:39 ` Dave Chinner
0 siblings, 1 reply; 14+ messages in thread
From: Mark Tinguely @ 2013-03-14 13:17 UTC (permalink / raw)
To: CAI Qian; +Cc: xfs
On 03/14/13 03:06, CAI Qian wrote:
> This is easy to reproduce here,
>
> # ./check 111
>
> Bisecting is under way...
>
Right now, xfstests 111 will cause a panic. See thread:
http://oss.sgi.com/archives/xfs/2013-03/msg00322.html
--Mark.
* Re: 3.9-rc2 xfs panic
2013-03-14 13:17 ` Mark Tinguely
@ 2013-03-14 23:39 ` Dave Chinner
0 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2013-03-14 23:39 UTC (permalink / raw)
To: Mark Tinguely; +Cc: CAI Qian, xfs
On Thu, Mar 14, 2013 at 08:17:01AM -0500, Mark Tinguely wrote:
> On 03/14/13 03:06, CAI Qian wrote:
> >This is easy to reproduce here,
> >
> ># ./check 111
> >
> >Bisecting is under way...
> >
>
>
> Right now, xfstests 111 will cause a panic. See thread:
>
> http://oss.sgi.com/archives/xfs/2013-03/msg00322.html
On a debug kernel, yes. On a production kernel, no.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: 3.9-rc2 xfs panic
2013-03-12 7:46 ` Dave Chinner
2013-03-12 8:04 ` CAI Qian
2013-03-13 2:44 ` CAI Qian
@ 2013-03-28 8:39 ` CAI Qian
2 siblings, 0 replies; 14+ messages in thread
From: CAI Qian @ 2013-03-28 8:39 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
FYI, I have not seen any of these issues in testing since switching from
loopback devices to real storage devices. The tests have used real
devices ever since.
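For anyone replicating this, the switch away from loopback devices amounts to pointing xfstests at real block devices in its configuration. A sketch of the relevant local.config variables follows; the device paths and mount points are placeholders, not values from this thread.

```shell
# xfstests local.config -- example values, adjust to your hardware
export TEST_DEV=/dev/sdb1        # real block device, not /dev/loopN
export TEST_DIR=/mnt/test
export SCRATCH_DEV=/dev/sdc1
export SCRATCH_MNT=/mnt/scratch
```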
CAI Qian
Thread overview: 14+ messages
[not found] <1868681549.12603593.1363061997919.JavaMail.root@redhat.com>
2013-03-12 4:32 ` 3.9-rc2 xfs panic CAI Qian
2013-03-12 6:07 ` Dave Chinner
2013-03-12 6:34 ` CAI Qian
2013-03-12 7:46 ` Dave Chinner
2013-03-12 8:04 ` CAI Qian
2013-03-12 10:23 ` Dave Chinner
2013-03-13 2:44 ` CAI Qian
2013-03-13 4:43 ` Dave Chinner
2013-03-13 4:56 ` CAI Qian
2013-03-14 7:39 ` CAI Qian
2013-03-14 8:06 ` CAI Qian
2013-03-14 13:17 ` Mark Tinguely
2013-03-14 23:39 ` Dave Chinner
2013-03-28 8:39 ` CAI Qian