* [BUG report]xfs_btree_make_block_unfull generated an OOPS
@ 2009-12-09 1:58 hank peng
2009-12-09 2:57 ` Eric Sandeen
0 siblings, 1 reply; 12+ messages in thread
From: hank peng @ 2009-12-09 1:58 UTC (permalink / raw)
To: linux-xfs
Hi, all:
I think it is a BUG, so I report it here.
root@1234dahua:~# uname -a
Linux 1234dahua 2.6.31.6 #14 Tue Dec 8 16:48:40 CST 2009 ppc unknown
Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xc019ea28
Oops: Kernel access of bad area, sig: 11 [#1]
MPC85xx CDS
Modules linked in:
NIP: c019ea28 LR: c019ea00 CTR: 00000000
REGS: e233baf0 TRAP: 0300 Not tainted (2.6.31.6)
MSR: 00029000 <EE,ME,CE> CR: 22008484 XER: 00000000
DEAR: 00000000, ESR: 00800000
TASK = e8add2c0[21249] 'SS_Server' THREAD: e233a000
GPR00: 000001a4 e233bba0 e8add2c0 00000000 00000000 00000000 00000001 00000000
GPR08: c0e22478 e20137b8 c0e2247c 000001a4 22008422 1016d410 3fff5400 100a0000
GPR16: 100ce108 00000000 006398bb e20137b8 c019c58c 00029000 e233bc5c c0186bd0
GPR24: c019c568 00000000 22008424 e233bc08 00000000 e233bc58 00000000 e20137b8
NIP [c019ea28] xfs_btree_make_block_unfull+0xc4/0x1b0
LR [c019ea00] xfs_btree_make_block_unfull+0x9c/0x1b0
Call Trace:
[e233bba0] [c019ea00] xfs_btree_make_block_unfull+0x9c/0x1b0 (unreliable)
[e233bbe0] [c019ee88] xfs_btree_insrec+0x374/0x4b0
[e233bc50] [c019f040] xfs_btree_insert+0x7c/0x1c0
[e233bcb0] [c0185ad0] xfs_free_ag_extent+0x34c/0x810
[e233bd20] [c0186668] xfs_free_extent+0xdc/0x104
[e233bdb0] [c018f350] xfs_bmap_finish+0x154/0x1a0
[e233bde0] [c01b5e34] xfs_itruncate_finish+0x254/0x3b8
[e233be60] [c01d033c] xfs_free_eofblocks+0x254/0x29c
[e233bee0] [c01d9ba8] xfs_file_release+0x14/0x28
[e233bef0] [c009574c] __fput+0xe8/0x1dc
[e233bf10] [c0092048] filp_close+0x70/0xb0
[e233bf30] [c009211c] sys_close+0x94/0xc0
[e233bf40] [c000f784] ret_from_syscall+0x0/0x3c
Instruction dump:
7fa5eb78 4bffdf59 7c7c1b79 40a2ffd4 801d0000 2f800000 419e0064 57c9103a
7f83e378 7d29fa14 80090050 90170000 <90190000> 80010044 bae1001c 38210040
---[ end trace 069fbb7d042289d2 ]---
According to the above call trace, I checked the source code and found
that it may be invoked by xfs_btree_make_block_unfull function in
fs/xfs/xfs_btree.c:
2641
2642 if (*stat) {
2643 *oindex = *index = cur->bc_ptrs[level];
2644 return 0;
2645 }
Here, oindex is NULL, so the oops occurred. I am not an XFS hacker; I hope
someone can help me fix this bug. If you need more information, let me
know.
--
The simplest is not all best but the best is surely the simplest!
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG report]xfs_btree_make_block_unfull generated an OOPS
2009-12-09 1:58 [BUG report]xfs_btree_make_block_unfull generated an OOPS hank peng
@ 2009-12-09 2:57 ` Eric Sandeen
2009-12-09 3:18 ` hank peng
0 siblings, 1 reply; 12+ messages in thread
From: Eric Sandeen @ 2009-12-09 2:57 UTC (permalink / raw)
To: hank peng; +Cc: linux-xfs

hank peng wrote:
> Hi, all:
> I think it is a BUG, so I report it here.
> root@1234dahua:~# uname -a
> Linux 1234dahua 2.6.31.6 #14 Tue Dec 8 16:48:40 CST 2009 ppc unknown
>
> Unable to handle kernel paging request for data at address 0x00000000
> Faulting instruction address: 0xc019ea28
> Oops: Kernel access of bad area, sig: 11 [#1]
> MPC85xx CDS
> Modules linked in:
> NIP: c019ea28 LR: c019ea00 CTR: 00000000
> REGS: e233baf0 TRAP: 0300 Not tainted (2.6.31.6)
> MSR: 00029000 <EE,ME,CE> CR: 22008484 XER: 00000000
> DEAR: 00000000, ESR: 00800000
> TASK = e8add2c0[21249] 'SS_Server' THREAD: e233a000
> GPR00: 000001a4 e233bba0 e8add2c0 00000000 00000000 00000000 00000001 00000000
> GPR08: c0e22478 e20137b8 c0e2247c 000001a4 22008422 1016d410 3fff5400 100a0000
> GPR16: 100ce108 00000000 006398bb e20137b8 c019c58c 00029000 e233bc5c c0186bd0
> GPR24: c019c568 00000000 22008424 e233bc08 00000000 e233bc58 00000000 e20137b8
> NIP [c019ea28] xfs_btree_make_block_unfull+0xc4/0x1b0

huh, I don't think I've ever seen an oops here, nor has kerneloops.org.

I wonder how you managed this ... :)

> LR [c019ea00] xfs_btree_make_block_unfull+0x9c/0x1b0
> Call Trace:
> [e233bba0] [c019ea00] xfs_btree_make_block_unfull+0x9c/0x1b0 (unreliable)
> [e233bbe0] [c019ee88] xfs_btree_insrec+0x374/0x4b0
> [e233bc50] [c019f040] xfs_btree_insert+0x7c/0x1c0
> [e233bcb0] [c0185ad0] xfs_free_ag_extent+0x34c/0x810

so this is freeing blocks and adding them to the freespace btrees;
it needs to move entries out of a block to make room for the new one.
Not a terribly unusual operation, I think.

> [e233bd20] [c0186668] xfs_free_extent+0xdc/0x104
> [e233bdb0] [c018f350] xfs_bmap_finish+0x154/0x1a0
> [e233bde0] [c01b5e34] xfs_itruncate_finish+0x254/0x3b8
> [e233be60] [c01d033c] xfs_free_eofblocks+0x254/0x29c
> [e233bee0] [c01d9ba8] xfs_file_release+0x14/0x28
> [e233bef0] [c009574c] __fput+0xe8/0x1dc
> [e233bf10] [c0092048] filp_close+0x70/0xb0
> [e233bf30] [c009211c] sys_close+0x94/0xc0
> [e233bf40] [c000f784] ret_from_syscall+0x0/0x3c
> Instruction dump:
> 7fa5eb78 4bffdf59 7c7c1b79 40a2ffd4 801d0000 2f800000 419e0064 57c9103a
> 7f83e378 7d29fa14 80090050 90170000 <90190000> 80010044 bae1001c 38210040
> ---[ end trace 069fbb7d042289d2 ]---
>
> According to the above call trace, I checked the source code and found
> that it may be invoked by xfs_btree_make_block_unfull function in
> fs/xfs/xfs_btree.c:
>
> 2641
> 2642 if (*stat) {
> 2643 *oindex = *index = cur->bc_ptrs[level];
> 2644 return 0;
> 2645 }
>
> Here, oindex is NULL, so the oops occurred. I am not an XFS hacker; I hope
> someone can help me fix this bug. If you need more information, let me
> know.

Is the above from gdb? You're quite certain that this is the case,
or is this a guess?

It seems a little unlikely because in the calling function:

	int optr; /* old key/record index */
	int ptr; /* key/record index */

	// .... code code code ...

	if (numrecs == cur->bc_ops->get_maxrecs(cur, level)) {
		error = xfs_btree_make_block_unfull(cur, level, numrecs,
				&optr, &ptr, &nptr, &ncur, &nrec, stat);

We're just sending in the addresses of these local variables;
I don't see how these pointers could be NULL.

Thanks,
-Eric
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 12+ messages in thread
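The calling convention described in the reply above can be sketched in user space. This is a minimal illustration under stated assumptions, not the kernel code: insrec_sketch() and make_block_unfull_sketch() are hypothetical stand-ins for xfs_btree_insrec() and xfs_btree_make_block_unfull(), and the NULL check exists only for the illustration. Because the caller passes the addresses of its own local variables, the callee cannot receive NULL for either pointer at entry.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for xfs_btree_make_block_unfull(): stores one value
 * through both output pointers, like the statement at xfs_btree.c line 2643.
 * The NULL check is an illustration only; it is not in the kernel source.
 */
static int make_block_unfull_sketch(int cur_ptr, int *oindex, int *index)
{
	if (oindex == NULL || index == NULL)
		return -1;
	*oindex = *index = cur_ptr;
	return 0;
}

/*
 * Hypothetical stand-in for the caller. It passes the addresses of two
 * automatic variables, which are never NULL.
 */
static int insrec_sketch(int cur_ptr, int *optr_out, int *ptr_out)
{
	int optr = 0;	/* old key/record index */
	int ptr = 0;	/* key/record index */
	int error;

	assert(optr_out != NULL && ptr_out != NULL);
	error = make_block_unfull_sketch(cur_ptr, &optr, &ptr);
	if (error)
		return error;
	*optr_out = optr;
	*ptr_out = ptr;
	return 0;
}
```

In this sketch the only way to reach the error path is to call make_block_unfull_sketch() directly with a NULL argument; going through insrec_sketch() always succeeds.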
* Re: [BUG report]xfs_btree_make_block_unfull generated an OOPS
2009-12-09 2:57 ` Eric Sandeen
@ 2009-12-09 3:18 ` hank peng
[not found] ` <4B1F18C4.3060704@sandeen.net>
0 siblings, 1 reply; 12+ messages in thread
From: hank peng @ 2009-12-09 3:18 UTC (permalink / raw)
To: Eric Sandeen; +Cc: linux-xfs

2009/12/9 Eric Sandeen <sandeen@sandeen.net>:
> hank peng wrote:
>> Hi, all:
>> I think it is a BUG, so I report it here.
>> root@1234dahua:~# uname -a
>> Linux 1234dahua 2.6.31.6 #14 Tue Dec 8 16:48:40 CST 2009 ppc unknown
>>
>> Unable to handle kernel paging request for data at address 0x00000000
>> Faulting instruction address: 0xc019ea28
>> Oops: Kernel access of bad area, sig: 11 [#1]
>> MPC85xx CDS
>> Modules linked in:
>> NIP: c019ea28 LR: c019ea00 CTR: 00000000
>> REGS: e233baf0 TRAP: 0300 Not tainted (2.6.31.6)
>> MSR: 00029000 <EE,ME,CE> CR: 22008484 XER: 00000000
>> DEAR: 00000000, ESR: 00800000
>> TASK = e8add2c0[21249] 'SS_Server' THREAD: e233a000
>> GPR00: 000001a4 e233bba0 e8add2c0 00000000 00000000 00000000 00000001 00000000
>> GPR08: c0e22478 e20137b8 c0e2247c 000001a4 22008422 1016d410 3fff5400 100a0000
>> GPR16: 100ce108 00000000 006398bb e20137b8 c019c58c 00029000 e233bc5c c0186bd0
>> GPR24: c019c568 00000000 22008424 e233bc08 00000000 e233bc58 00000000 e20137b8
>> NIP [c019ea28] xfs_btree_make_block_unfull+0xc4/0x1b0
>
> huh, I don't think I've ever seen an oops here, nor has kerneloops.org.
>
> I wonder how you managed this ... :)
>
>> LR [c019ea00] xfs_btree_make_block_unfull+0x9c/0x1b0
>> Call Trace:
>> [e233bba0] [c019ea00] xfs_btree_make_block_unfull+0x9c/0x1b0 (unreliable)
>> [e233bbe0] [c019ee88] xfs_btree_insrec+0x374/0x4b0
>> [e233bc50] [c019f040] xfs_btree_insert+0x7c/0x1c0
>> [e233bcb0] [c0185ad0] xfs_free_ag_extent+0x34c/0x810
>
> so this is freeing blocks and adding them to the freespace btrees;
> it needs to move entries out of a block to make room for the new one.
> Not a terribly unusual operation, I think.
>
>> [e233bd20] [c0186668] xfs_free_extent+0xdc/0x104
>> [e233bdb0] [c018f350] xfs_bmap_finish+0x154/0x1a0
>> [e233bde0] [c01b5e34] xfs_itruncate_finish+0x254/0x3b8
>> [e233be60] [c01d033c] xfs_free_eofblocks+0x254/0x29c
>> [e233bee0] [c01d9ba8] xfs_file_release+0x14/0x28
>> [e233bef0] [c009574c] __fput+0xe8/0x1dc
>> [e233bf10] [c0092048] filp_close+0x70/0xb0
>> [e233bf30] [c009211c] sys_close+0x94/0xc0
>> [e233bf40] [c000f784] ret_from_syscall+0x0/0x3c
>> Instruction dump:
>> 7fa5eb78 4bffdf59 7c7c1b79 40a2ffd4 801d0000 2f800000 419e0064 57c9103a
>> 7f83e378 7d29fa14 80090050 90170000 <90190000> 80010044 bae1001c 38210040
>> ---[ end trace 069fbb7d042289d2 ]---
>>
>> According to the above call trace, I checked the source code and found
>> that it may be invoked by xfs_btree_make_block_unfull function in
>> fs/xfs/xfs_btree.c:
>>
>> 2641
>> 2642 if (*stat) {
>> 2643 *oindex = *index = cur->bc_ptrs[level];
>> 2644 return 0;
>> 2645 }
>>
>> Here, oindex is NULL, so the oops occurred. I am not an XFS hacker; I hope
>> someone can help me fix this bug. If you need more information, let me
>> know.
>
> Is the above from gdb? You're quite certain that this is the case,
> or is this a guess?
>
> It seems a little unlikely because in the calling function:
>
> int optr; /* old key/record index */
> int ptr; /* key/record index */
>
> // .... code code code ...
>
> if (numrecs == cur->bc_ops->get_maxrecs(cur, level)) {
> error = xfs_btree_make_block_unfull(cur, level, numrecs,
> &optr, &ptr, &nptr, &ncur, &nrec, stat);
>
> We're just sending in the addresses of these local variables;
> I don't see how these pointers could be NULL.
>
Thanks for your reply.

I made this conclusion from assembly code, correct me if I am wrong.
# powerpc-linux-gnuspe-objdump -d vmlinux | less
<snip>
c019e964 <xfs_btree_make_block_unfull>:
c019e964: 94 21 ff c0 stwu r1,-64(r1)
c019e968: 7c 08 02 a6 mflr r0
c019e96c: be e1 00 1c stmw r23,28(r1)
c019e970: 7c 7f 1b 78 mr r31,r3
c019e974: 90 01 00 44 stw r0,68(r1)
c019e978: 7c bc 2b 78 mr r28,r5
c019e97c: 7c d9 33 78 mr r25,r6 <I think r6 holds the value of oindex here>
c019e980: 83 a1 00 48 lwz r29,72(r1)
c019e984: 7c f7 3b 78 mr r23,r7
c019e988: 80 03 00 0c lwz r0,12(r3)
c019e98c: 7d 1b 43 78 mr r27,r8
c019e990: 7d 3a 4b 78 mr r26,r9
c019e994: 7d 58 53 78 mr r24,r10
c019e998: 7c 9e 23 78 mr r30,r4
c019e99c: 70 0b 00 02 andi. r11,r0,2
c019e9a0: 41 82 00 14 beq- c019e9b4 <xfs_btree_make_block_unfull+0x50>
c019e9a4: 89 23 00 78 lbz r9,120(r3)
c019e9a8: 39 29 ff ff addi r9,r9,-1
c019e9ac: 7f 84 48 00 cmpw cr7,r4,r9
c019e9b0: 41 9e 00 90 beq- cr7,c019ea40 <xfs_btree_make_block_unfull+0xdc>
c019e9b4: 7f e3 fb 78 mr r3,r31
c019e9b8: 7f c4 f3 78 mr r4,r30
c019e9bc: 7f a5 eb 78 mr r5,r29
c019e9c0: 4b ff db 39 bl c019c4f8 <xfs_btree_rshift>
c019e9c4: 7c 7c 1b 79 mr. r28,r3
c019e9c8: 40 82 00 10 bne- c019e9d8 <xfs_btree_make_block_unfull+0x74>
c019e9cc: 80 1d 00 00 lwz r0,0(r29)
c019e9d0: 2f 80 00 00 cmpwi cr7,r0,0
c019e9d4: 41 9e 00 1c beq- cr7,c019e9f0 <xfs_btree_make_block_unfull+0x8c>
c019e9d8: 80 01 00 44 lwz r0,68(r1)
c019e9dc: 7f 83 e3 78 mr r3,r28
c019e9e0: ba e1 00 1c lmw r23,28(r1)
c019e9e4: 38 21 00 40 addi r1,r1,64
c019e9e8: 7c 08 03 a6 mtlr r0
c019e9ec: 4e 80 00 20 blr
c019e9f0: 7f e3 fb 78 mr r3,r31
c019e9f4: 7f c4 f3 78 mr r4,r30
c019e9f8: 7f a5 eb 78 mr r5,r29
c019e9fc: 4b ff df 59 bl c019c954 <xfs_btree_lshift>
c019ea00: 7c 7c 1b 79 mr. r28,r3
c019ea04: 40 a2 ff d4 bne- c019e9d8 <xfs_btree_make_block_unfull+0x74>
c019ea08: 80 1d 00 00 lwz r0,0(r29)
c019ea0c: 2f 80 00 00 cmpwi cr7,r0,0
c019ea10: 41 9e 00 64 beq- cr7,c019ea74 <xfs_btree_make_block_unfull+0x110>
c019ea14: 57 c9 10 3a rlwinm r9,r30,2,0,29
c019ea18: 7f 83 e3 78 mr r3,r28
c019ea1c: 7d 29 fa 14 add r9,r9,r31
c019ea20: 80 09 00 50 lwz r0,80(r9)
c019ea24: 90 17 00 00 stw r0,0(r23)
c019ea28: 90 19 00 00 stw r0,0(r25) <oops occurred here>
c019ea2c: 80 01 00 44 lwz r0,68(r1)
c019ea30: ba e1 00 1c lmw r23,28(r1)
c019ea34: 38 21 00 40 addi r1,r1,64
c019ea38: 7c 08 03 a6 mtlr r0
> Thanks,
> -Eric
>

--
The simplest is not all best but the best is surely the simplest!
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 12+ messages in thread
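The three instruction words around the faulting address can be cross-checked against the disassembly by decoding them by hand. The following is a small user-space sketch, assuming only the standard PowerPC D-form layout (6-bit primary opcode, 5-bit RS/RT, 5-bit RA, 16-bit signed displacement); the struct and function names are made up for this illustration. The words decoded are the ones shown in the oops instruction dump: 80090050, 90170000 and the bracketed 90190000.

```c
#include <assert.h>
#include <stdint.h>

/* Primary opcodes of the two D-form instructions of interest */
#define PPC_OP_LWZ 32u
#define PPC_OP_STW 36u

/* Decoded fields of a PowerPC D-form instruction (names are hypothetical) */
struct dform {
	unsigned opcode;	/* primary opcode, top 6 bits */
	unsigned rs;		/* source (stw) or target (lwz) register */
	unsigned ra;		/* base address register */
	int disp;		/* sign-extended 16-bit displacement */
};

static struct dform decode_dform(uint32_t insn)
{
	struct dform d;
	int disp = (int)(insn & 0xffffu);

	if (disp & 0x8000)
		disp -= 0x10000;	/* sign-extend the displacement */
	d.opcode = insn >> 26;
	d.rs = (insn >> 21) & 0x1fu;
	d.ra = (insn >> 16) & 0x1fu;
	d.disp = disp;
	return d;
}

/* Decode the three words from the oops and compare with the objdump text */
static void check_oops_words(void)
{
	struct dform load = decode_dform(0x80090050u);	/* lwz r0,80(r9) */
	struct dform st1 = decode_dform(0x90170000u);	/* stw r0,0(r23) */
	struct dform st2 = decode_dform(0x90190000u);	/* stw r0,0(r25) */

	assert(load.opcode == PPC_OP_LWZ && load.rs == 0 && load.ra == 9);
	assert(load.disp == 80);
	assert(st1.opcode == PPC_OP_STW && st1.rs == 0 && st1.ra == 23);
	assert(st2.opcode == PPC_OP_STW && st2.rs == 0 && st2.ra == 25);
	assert(st1.disp == 0 && st2.disp == 0);
}
```

The decode agrees with the listing: the faulting word stores r0 through r25, and the register dump in the first report shows r25 (the second value on the GPR24 line) as 00000000.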
[parent not found: <4B1F18C4.3060704@sandeen.net>]
[parent not found: <389deec70912082053v4310057dg479f6d4b6c4b46f7@mail.gmail.com>]
[parent not found: <4B1F31FD.3020705@sandeen.net>]
[parent not found: <389deec70912082220pcb3b5d1q516ac197d31502c5@mail.gmail.com>]
[parent not found: <389deec70912082230g38987576pc48d7699f23844c5@mail.gmail.com>]
[parent not found: <389deec70912140119q40ed91cao62fe9c9ebdf13601@mail.gmail.com>]
* Re: [BUG report]xfs_btree_make_block_unfull generated an OOPS
[not found] ` <389deec70912140119q40ed91cao62fe9c9ebdf13601@mail.gmail.com>
@ 2009-12-14 15:56 ` Eric Sandeen
2009-12-15 0:49 ` hank peng
0 siblings, 1 reply; 12+ messages in thread
From: Eric Sandeen @ 2009-12-14 15:56 UTC (permalink / raw)
To: hank peng, xfs-oss

hank peng wrote:
> Hi, Eric:
> I think I have found the reason for this problem, but I need a little help from you.
> We have tested it again, and the same OOPS occurred again:

Ok, let's keep this on the list please ...

> Unable to handle kernel paging request for data at address 0x00000000
> Faulting instruction address: 0xc019f4b8
> Oops: Kernel access of bad area, sig: 11 [#1]
> MPC85xx CDS
> Modules linked in:
> NIP: c019f4b8 LR: c019f490 CTR: 00000000
> REGS: ef965af0 TRAP: 0300 Not tainted (2.6.31.6-svn40)
> MSR: 00029000 <EE,ME,CE> CR: 22008284 XER: 00000000
> DEAR: 00000000, ESR: 00800000
> TASK = e8a56580[3450] 'SS_Server' THREAD: ef964000
> GPR00: 000001fd ef965ba0 e8a56580 00000000 00000000 00000001 00000001 00000001
> GPR08: e8fa10e8 e8fa1a18 e8fa10f0 000001fd 22008222 1016d410 3fff5400 100a0000
> GPR16: 100d2408 00000000 00000000 d42b12f8 c019d01c 00029000 ef965c5c c0187660
> GPR24: c019cff8 00000000 22008224 ef965c08 00000000 ef965c58 00000000 e8fa1a18
> NIP [c019f4b8] xfs_btree_make_block_unfull+0xc4/0x1b0
> LR [c019f490] xfs_btree_make_block_unfull+0x9c/0x1b0
> Call Trace:
> [ef965ba0] [c019f490] xfs_btree_make_block_unfull+0x9c/0x1b0 (unreliable)
> [ef965be0] [c019f918] xfs_btree_insrec+0x374/0x4b0
> [ef965c50] [c019fad0] xfs_btree_insert+0x7c/0x1c0
> [ef965cb0] [c018661c] xfs_free_ag_extent+0x408/0x810
> [ef965d20] [c01870f8] xfs_free_extent+0xdc/0x104
> [ef965db0] [c018fde0] xfs_bmap_finish+0x154/0x1a0
> [ef965de0] [c01b68c4] xfs_itruncate_finish+0x254/0x3b8
> [ef965e60] [c01d0dcc] xfs_free_eofblocks+0x254/0x29c
> [ef965ee0] [c01da638] xfs_file_release+0x14/0x28
> [ef965ef0] [c009574c] __fput+0xe8/0x1dc
> [ef965f10] [c0092048] filp_close+0x70/0xb0
> [ef965f30] [c009211c] sys_close+0x94/0xc0
> [ef965f40] [c000f784] ret_from_syscall+0x0/0x3c
> Instruction dump:
> 7fa5eb78 4bffdf59 7c7c1b79 40a2ffd4 801d0000 2f800000 419e0064 57c9103a
> 7f83e378 7d29fa14 80090050 90170000 <90190000> 80010044 bae1001c 38210040
> ---[ end trace 356726176eeecd9c ]---
> Oops: Exception in kernel mode, sig: 4 [#2]
> MPC85xx CDS
> Modules linked in:
> NIP: c0187660 LR: c019b26c CTR: c0187660
> REGS: d42076a0 TRAP: 0700 Tainted: G D (2.6.31.6-svn40)
> MSR: 00029000 <EE,ME,CE> CR: 22222082 XER: 00000000
> TASK = e08a6ee0[8533] 'pdflush' THREAD: d4206000
> GPR00: 00000004 d4207750 e08a6ee0 d42b1098 00000001 00000001 e8e97d80 00000003
> GPR08: c2c65300 c0187660 41425443 41425443 00001000 1001a1c4 c01842f8 00000001
> GPR16: d4207880 d42077e0 00000002 d42077d8 d42077e0 d42077e8 d42b10ec 00000001
> GPR24: c0486be0 00000000 d42b1098 09c40000 c019b4f0 00000011 e88bf000 d4207750
> NIP [c0187660] xfs_allocbt_get_maxrecs+0x0/0x20
> LR [c019b26c] xfs_btree_check_sblock+0xb0/0xf8
> Call Trace:
> [d4207770] [c019b4f0] xfs_btree_read_buf_block+0x8c/0xb8
> [d42077a0] [c019b5a8] xfs_btree_lookup_get_block+0x8c/0xfc
> [d42077d0] [c019c638] xfs_btree_lookup+0x124/0x3fc
> [d4207850] [c01842f8] xfs_alloc_lookup_ge+0x20/0x30
> [d4207860] [c0185828] xfs_alloc_ag_vextent_near+0x60/0xa4c
> [d42078e0] [c0186af4] xfs_alloc_ag_vextent+0xd0/0x168
> [d4207900] [c01873f0] xfs_alloc_vextent+0x2d0/0x524
> [d4207940] [c01940fc] xfs_bmap_btalloc+0x274/0xa60
> [d4207a00] [c01988bc] xfs_bmapi+0xb30/0x10dc
> [d4207b40] [c01bb190] xfs_iomap_write_allocate+0x11c/0x450
> [d4207c00] [c01bc2e8] xfs_iomap+0x320/0x35c
> [d4207c80] [c01d5d5c] xfs_map_blocks+0x2c/0x40
> [d4207ca0] [c01d6dc0] xfs_page_state_convert+0x2e8/0x744
> [d4207d60] [c01d7384] xfs_vm_writepage+0x7c/0x128
> [d4207d90] [c006d740] __writepage+0x24/0x80
> [d4207da0] [c006db44] write_cache_pages+0x1e4/0x3a0
> [d4207e50] [c01d5e14] xfs_vm_writepages+0x24/0x34
> [d4207e60] [c006dd70] do_writepages+0x48/0x7c
> [d4207e70] [c00b2120] writeback_single_inode+0xf8/0x2e4
> [d4207ec0] [c00b2788] generic_sync_sb_inodes+0x280/0x398
> [d4207ef0] [c00b295c] writeback_inodes+0xb8/0xd4
> [d4207f10] [c006ece0] wb_kupdate+0xd4/0x154
> [d4207f70] [c006f3bc] pdflush+0xd4/0x1c4
> [d4207fc0] [c004c750] kthread+0x78/0x7c
> <...>
>
>
> There was another OOPS which followed the first one.

After the first oops I think the rest is not interesting, things
are in bad shape by now.

> Please note that
> in the second OOPS, a SIGILL was raised and the address of the illegal
> instruction is 0xc0187660.
> In the first OOPS, look at the following registers:
>
> GPR00: 000001fd ef965ba0 e8a56580 00000000 00000000 00000001 00000001 00000001
> GPR08: e8fa10e8 e8fa1a18 e8fa10f0 000001fd 22008222 1016d410 3fff5400 100a0000
> GPR16: 100d2408 00000000 00000000 d42b12f8 c019d01c 00029000 ef965c5c c0187660
> GPR24: c019cff8 00000000 22008224 ef965c08 00000000 ef965c58 00000000 e8fa1a18
>
> I noticed that the value of r23 is also 0xc0187660. I have a little
> PowerPC assembly knowledge; if I am not wrong,
> '*oindex = *index = cur->bc_ptrs[level];' in fs/xfs/xfs_btree.c was
> compiled into the following asm code, which I sent to you earlier:
> 80 09 00 50 lwz r0,80(r9)
> 90 17 00 00 stw r0,0(r23)
> 90 19 00 00 stw r0,0(r25) <oops occurred here>
>
> So, r23 should have pointed to the address of index and never had a chance
> to point to a code address, but it did. What's worse, the code at
> 0xc0187660 had been changed and the second OOPS happened immediately.
>
> Could you correct my analysis if I am wrong?
> In addition, I think the problem may be caused by stack overflow, what
> are your comments?
>
>
Perhaps, but if this is the 2nd oops I think it is not worth investigating;
we need to figure out why the first one happened, and from that stack trace
I don't think you are close to overflowing...

-eric

>
> 2009/12/9 hank peng <pengxihan@gmail.com>:
>> 2009/12/9 hank peng <pengxihan@gmail.com>:
>>> 2009/12/9 Eric Sandeen <sandeen@sandeen.net>:
>>>> hank peng wrote:
>>>>> 2009/12/9 Eric Sandeen <sandeen@sandeen.net>:
>>>>>> hank peng wrote:
>>>>>>
>>>>>>> Thanks for your reply.
>>>>>>>
>>>>>>> I made this conclusion from assembly code, correct me if I am wrong.
>>>>>>> # powerpc-linux-gnuspe-objdump -d vmlinux | less
>>>>>>> <snip>
>>>>>> (off list; if this works maybe you can reply on-list?)
>>>>>>
>>>>>> Could you use gdb to look? Maybe:
>>>>>>
>>>>>> (gdb) list *xfs_btree_make_block_unfull+0xc4
>>>>>>
>>>>> I use gdb on my PC and get this:
>>>>>
>>>>> [root@localhost linux-2.6.31.6]# gdb vmlinux
>>>>> GNU gdb Red Hat Linux (6.5-37.el5rh)
>>>>> Copyright (C) 2006 Free Software Foundation, Inc.
>>>>> GDB is free software, covered by the GNU General Public License, and you are
>>>>> welcome to change it and/or distribute copies of it under certain conditions.
>>>>> Type "show copying" to see the conditions.
>>>>> There is absolutely no warranty for GDB. Type "show warranty" for details.
>>>>> This GDB was configured as "i386-redhat-linux-gnu"...Using host
>>>>> libthread_db library "/lib/libthread_db.so.1".
>>>>>
>>>>> (gdb) list *xfs_btree_make_block_unfull+0xc4
>>>>> No source file for address 0xc019ea28.
>>>>> (gdb)
>>>>>
>>>>>> -Eric
>>>> so I guess it is not built with debugging symbols perhaps?
>>>>
>>>> Try rebuilding it with CONFIG_DEBUG_INFO on maybe?
>>>>
>>> yes, you are right, now I get the result:
>>> (gdb) l *xfs_btree_make_block_unfull+0xc4
>>> 0xc019ea30 is in xfs_btree_make_block_unfull (fs/xfs/xfs_btree.c:2643).
>>> 2638 error = xfs_btree_lshift(cur, level, stat);
>>> 2639 if (error)
>>> 2640 return error;
>>> 2641
>>> 2642 if (*stat) {
>>> 2643 *oindex = *index = cur->bc_ptrs[level];
>>> 2644 return 0;
>>> 2645 }
>>> 2646
>>> 2647 /*
>>>
>>> It indeed points to "*oindex = *index = cur->bc_ptrs[level];"
>>>
>> Very strange, as you said, xfs_btree_insrec passes the addresses of local
>> variables to xfs_btree_make_block_unfull, so it is impossible for
>> oindex to be NULL.
>> Do you think it may be memory corruption?
>>>> -Eric
>>>>
>>>
>>>
>>> --
>>> The simplest is not all best but the best is surely the simplest!
>>>
>>
>>
>> --
>> The simplest is not all best but the best is surely the simplest!
>>
>
>
>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 12+ messages in thread
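The register observation in the message above (r23 holding 0xc0187660, the same address as the NIP of the second oops) can be checked mechanically rather than by counting columns. The following is a user-space sketch only; parse_gpr_line() is a hypothetical helper written for this illustration, and it assumes the "GPRnn:" line format shown in the oops, with eight hexadecimal values per line.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Parse one "GPRnn: v0 v1 ... v7" line from an oops into gpr[nn..nn+7].
 * Returns 0 on success, -1 if the line does not match the expected format.
 * (Hypothetical helper, not part of any kernel tool.)
 */
static int parse_gpr_line(const char *line, uint32_t gpr[32])
{
	unsigned first;
	unsigned v[8];
	int i;
	int n = sscanf(line, "GPR%u: %x %x %x %x %x %x %x %x",
		       &first, &v[0], &v[1], &v[2], &v[3],
		       &v[4], &v[5], &v[6], &v[7]);

	if (n != 9 || first > 24 || first % 8 != 0)
		return -1;
	for (i = 0; i < 8; i++)
		gpr[first + i] = v[i];
	return 0;
}

/* Feed in the GPR16 and GPR24 lines of the first oops in the second report */
static void check_second_report(void)
{
	uint32_t gpr[32] = { 0 };

	assert(parse_gpr_line("GPR16: 100d2408 00000000 00000000 d42b12f8 "
			      "c019d01c 00029000 ef965c5c c0187660", gpr) == 0);
	assert(parse_gpr_line("GPR24: c019cff8 00000000 22008224 ef965c08 "
			      "00000000 ef965c58 00000000 e8fa1a18", gpr) == 0);

	assert(gpr[23] == 0xc0187660u);	/* base register of the first store */
	assert(gpr[25] == 0x00000000u);	/* base register of the faulting store */
}
```

With the values indexed this way, r23 is 0xc0187660 and r25 is zero, which matches the reading of the dump given in the message.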
* Re: [BUG report]xfs_btree_make_block_unfull generated an OOPS
2009-12-14 15:56 ` Eric Sandeen
@ 2009-12-15 0:49 ` hank peng
2009-12-15 0:58 ` hank peng
2009-12-15 1:26 ` Dave Chinner
0 siblings, 2 replies; 12+ messages in thread
From: hank peng @ 2009-12-15 0:49 UTC (permalink / raw)
To: Eric Sandeen; +Cc: xfs-oss

Hi, Eric:
I added some code like this:
	if (*stat) {
		printk("*stat = 0x%08x, oindex = %p, index = %p\n",
			*stat, oindex, index);
		if (oindex == NULL || index == NULL) {
			printk("BUG occurred!\n");
			printk("oindex = %p, index = %p\n", oindex, index);
			BUG();
		}
		*oindex = *index = cur->bc_ptrs[level];
		return 0;
	}

And the same OOPS happened again, but a little differently; the kernel messages are:

<snip>
*stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
*stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
*stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
*stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
*stat = 0x00000001, oindex = 00000501, index = 22008424
Unable to handle kernel paging request for data at address 0x22008424
Faulting instruction address: 0xc019f568
Oops: Kernel access of bad area, sig: 11 [#1]
MPC85xx CDS
Modules linked in:
NIP: c019f568 LR: c019f54c CTR: c023f9f4
REGS: e87d7af0 TRAP: 0300 Not tainted (2.6.31.6-svn40)
MSR: 00029000 <EE,ME,CE> CR: 22008424 XER: 20000000
DEAR: 22008424, ESR: 00800000
TASK = efb03390[17279] 'SS_Server' THREAD: e87d6000
GPR00: 000001fd e87d7ba0 efb03390 0000003b 00031d91 ffffffff c023cfa4 00031d91
GPR08: c04a7c40 e84511c8 00031d91 00004000 20008482 1016d410 3fff5400 100a0000
GPR16: 100d0408 00000000 00000000 e8fa3558 c019d0ac 00029000 e87d7c5c c01876f0
GPR24: c019d088 00000000 22008424 00000000 00000501 e87d7c58 00000000 e84511c8
NIP [c019f568] xfs_btree_make_block_unfull+0xe4/0x1f4
LR [c019f54c] xfs_btree_make_block_unfull+0xc8/0x1f4
Call Trace:
[e87d7ba0] [c019f54c] xfs_btree_make_block_unfull+0xc8/0x1f4 (unreliable)
[e87d7be0] [c019f9ec] xfs_btree_insrec+0x374/0x4b0
[e87d7c50] [c019fba4] xfs_btree_insert+0x7c/0x1c0
[e87d7cb0] [c01866ac] xfs_free_ag_extent+0x408/0x810
[e87d7d20] [c0187188] xfs_free_extent+0xdc/0x104
[e87d7db0] [c018fe70] xfs_bmap_finish+0x154/0x1a0
[e87d7de0] [c01b6998] xfs_itruncate_finish+0x254/0x3b8
[e87d7e60] [c01d0ea0] xfs_free_eofblocks+0x254/0x29c
[e87d7ee0] [c01da70c] xfs_file_release+0x14/0x28
[e87d7ef0] [c00957dc] __fput+0xe8/0x1dc
[e87d7f10] [c00920d8] filp_close+0x70/0xb0
[e87d7f30] [c00921ac] sys_close+0x94/0xc0
[e87d7f40] [c000f7cc] ret_from_syscall+0x0/0x3c
Instruction dump:
7f85e378 3863ed7c 7f46d378 4cc63182 4be97ea1 2f9c0000 419e00f8 2f9a0000
419e00f0 57c9103a 7d29fa14 80090050 <901a0000> 901c0000 4bffff88 3b810010
---[ end trace f245b6a670339d8f ]---
</snip>

As you see, after printing "*stat = 0x00000001, oindex = 00000501,
index = 22008424", the OOPS happened. Although my BUG() was not invoked,
it did access a bad area.

2009/12/14 Eric Sandeen <sandeen@sandeen.net>:
> hank peng wrote:
>> Hi, Eric:
>> I think I have found the reason for this problem, but I need a little help from you.
>> We have tested it again, and the same OOPS occurred again:
>
> Ok, let's keep this on the list please ...
>
>> Unable to handle kernel paging request for data at address 0x00000000
>> Faulting instruction address: 0xc019f4b8
>> Oops: Kernel access of bad area, sig: 11 [#1]
>> MPC85xx CDS
>> Modules linked in:
>> NIP: c019f4b8 LR: c019f490 CTR: 00000000
>> REGS: ef965af0 TRAP: 0300 Not tainted (2.6.31.6-svn40)
>> MSR: 00029000 <EE,ME,CE> CR: 22008284 XER: 00000000
>> DEAR: 00000000, ESR: 00800000
>> TASK = e8a56580[3450] 'SS_Server' THREAD: ef964000
>> GPR00: 000001fd ef965ba0 e8a56580 00000000 00000000 00000001 00000001 00000001
>> GPR08: e8fa10e8 e8fa1a18 e8fa10f0 000001fd 22008222 1016d410 3fff5400 100a0000
>> GPR16: 100d2408 00000000 00000000 d42b12f8 c019d01c 00029000 ef965c5c c0187660
>> GPR24: c019cff8 00000000 22008224 ef965c08 00000000 ef965c58 00000000 e8fa1a18
>> NIP [c019f4b8] xfs_btree_make_block_unfull+0xc4/0x1b0
>> LR [c019f490] xfs_btree_make_block_unfull+0x9c/0x1b0
>> Call Trace:
>> [ef965ba0] [c019f490] xfs_btree_make_block_unfull+0x9c/0x1b0 (unreliable)
>> [ef965be0] [c019f918] xfs_btree_insrec+0x374/0x4b0
>> [ef965c50] [c019fad0] xfs_btree_insert+0x7c/0x1c0
>> [ef965cb0] [c018661c] xfs_free_ag_extent+0x408/0x810
>> [ef965d20] [c01870f8] xfs_free_extent+0xdc/0x104
>> [ef965db0] [c018fde0] xfs_bmap_finish+0x154/0x1a0
>> [ef965de0] [c01b68c4] xfs_itruncate_finish+0x254/0x3b8
>> [ef965e60] [c01d0dcc] xfs_free_eofblocks+0x254/0x29c
>> [ef965ee0] [c01da638] xfs_file_release+0x14/0x28
>> [ef965ef0] [c009574c] __fput+0xe8/0x1dc
>> [ef965f10] [c0092048] filp_close+0x70/0xb0
>> [ef965f30] [c009211c] sys_close+0x94/0xc0
>> [ef965f40] [c000f784] ret_from_syscall+0x0/0x3c
>> Instruction dump:
>> 7fa5eb78 4bffdf59 7c7c1b79 40a2ffd4 801d0000 2f800000 419e0064 57c9103a
>> 7f83e378 7d29fa14 80090050 90170000 <90190000> 80010044 bae1001c 38210040
>> ---[ end trace 356726176eeecd9c ]---
>> Oops: Exception in kernel mode, sig: 4 [#2]
>> MPC85xx CDS
>> Modules linked in:
>> NIP: c0187660 LR: c019b26c CTR: c0187660
>> REGS: d42076a0 TRAP: 0700 Tainted: G D (2.6.31.6-svn40)
>> MSR: 00029000 <EE,ME,CE> CR: 22222082 XER: 00000000
>> TASK = e08a6ee0[8533] 'pdflush' THREAD: d4206000
>> GPR00: 00000004 d4207750 e08a6ee0 d42b1098 00000001 00000001 e8e97d80 00000003
>> GPR08: c2c65300 c0187660 41425443 41425443 00001000 1001a1c4 c01842f8 00000001
>> GPR16: d4207880 d42077e0 00000002 d42077d8 d42077e0 d42077e8 d42b10ec 00000001
>> GPR24: c0486be0 00000000 d42b1098 09c40000 c019b4f0 00000011 e88bf000 d4207750
>> NIP [c0187660] xfs_allocbt_get_maxrecs+0x0/0x20
>> LR [c019b26c] xfs_btree_check_sblock+0xb0/0xf8
>> Call Trace:
>> [d4207770] [c019b4f0] xfs_btree_read_buf_block+0x8c/0xb8
>> [d42077a0] [c019b5a8] xfs_btree_lookup_get_block+0x8c/0xfc
>> [d42077d0] [c019c638] xfs_btree_lookup+0x124/0x3fc
>> [d4207850] [c01842f8] xfs_alloc_lookup_ge+0x20/0x30
>> [d4207860] [c0185828] xfs_alloc_ag_vextent_near+0x60/0xa4c
>> [d42078e0] [c0186af4] xfs_alloc_ag_vextent+0xd0/0x168
>> [d4207900] [c01873f0] xfs_alloc_vextent+0x2d0/0x524
>> [d4207940] [c01940fc] xfs_bmap_btalloc+0x274/0xa60
>> [d4207a00] [c01988bc] xfs_bmapi+0xb30/0x10dc
>> [d4207b40] [c01bb190] xfs_iomap_write_allocate+0x11c/0x450
>> [d4207c00] [c01bc2e8] xfs_iomap+0x320/0x35c
>> [d4207c80] [c01d5d5c] xfs_map_blocks+0x2c/0x40
>> [d4207ca0] [c01d6dc0] xfs_page_state_convert+0x2e8/0x744
>> [d4207d60] [c01d7384] xfs_vm_writepage+0x7c/0x128
>> [d4207d90] [c006d740] __writepage+0x24/0x80
>> [d4207da0] [c006db44] write_cache_pages+0x1e4/0x3a0
>> [d4207e50] [c01d5e14] xfs_vm_writepages+0x24/0x34
>> [d4207e60] [c006dd70] do_writepages+0x48/0x7c
>> [d4207e70] [c00b2120] writeback_single_inode+0xf8/0x2e4
>> [d4207ec0] [c00b2788] generic_sync_sb_inodes+0x280/0x398
>> [d4207ef0] [c00b295c] writeback_inodes+0xb8/0xd4
>> [d4207f10] [c006ece0] wb_kupdate+0xd4/0x154
>> [d4207f70] [c006f3bc] pdflush+0xd4/0x1c4
>> [d4207fc0] [c004c750] kthread+0x78/0x7c
>> <...>
>>
>>
>> There was another OOPS which followed the first one.
>
> After the first oops I think the rest is not interesting, things
> are in bad shape by now.
>
>> Please note that
>> in the second OOPS, a SIGILL was raised and the address of the illegal
>> instruction is 0xc0187660.
>> In the first OOPS, look at the following registers:
>>
>> GPR00: 000001fd ef965ba0 e8a56580 00000000 00000000 00000001 00000001 00000001
>> GPR08: e8fa10e8 e8fa1a18 e8fa10f0 000001fd 22008222 1016d410 3fff5400 100a0000
>> GPR16: 100d2408 00000000 00000000 d42b12f8 c019d01c 00029000 ef965c5c c0187660
>> GPR24: c019cff8 00000000 22008224 ef965c08 00000000 ef965c58 00000000 e8fa1a18
>>
>> I noticed that the value of r23 is also 0xc0187660. I have a little
>> PowerPC assembly knowledge; if I am not wrong,
>> '*oindex = *index = cur->bc_ptrs[level];' in fs/xfs/xfs_btree.c was
>> compiled into the following asm code, which I sent to you earlier:
>> 80 09 00 50 lwz r0,80(r9)
>> 90 17 00 00 stw r0,0(r23)
>> 90 19 00 00 stw r0,0(r25) <oops occurred here>
>>
>> So, r23 should have pointed to the address of index and never had a chance
>> to point to a code address, but it did. What's worse, the code at
>> 0xc0187660 had been changed and the second OOPS happened immediately.
>>
>> Could you correct my analysis if I am wrong?
>> In addition, I think the problem may be caused by stack overflow, what
>> are your comments?
>>
>>
> Perhaps, but if this is the 2nd oops I think it is not worth investigating;
> we need to figure out why the first one happened, and from that stack trace
> I don't think you are close to overflowing...
>
> -eric
>
>>
>> 2009/12/9 hank peng <pengxihan@gmail.com>:
>>> 2009/12/9 hank peng <pengxihan@gmail.com>:
>>>> 2009/12/9 Eric Sandeen <sandeen@sandeen.net>:
>>>>> hank peng wrote:
>>>>>> 2009/12/9 Eric Sandeen <sandeen@sandeen.net>:
>>>>>>> hank peng wrote:
>>>>>>>
>>>>>>>> Thanks for your reply.
>>>>>>>>
>>>>>>>> I made this conclusion from assembly code, correct me if I am wrong.
>>>>>>>> # powerpc-linux-gnuspe-objdump -d vmlinux | less
>>>>>>>> <snip>
>>>>>>> (off list; if this works maybe you can reply on-list?)
>>>>>>>
>>>>>>> Could you use gdb to look? Maybe:
>>>>>>>
>>>>>>> (gdb) list *xfs_btree_make_block_unfull+0xc4
>>>>>>>
>>>>>> I use gdb on my PC and get this:
>>>>>>
>>>>>> [root@localhost linux-2.6.31.6]# gdb vmlinux
>>>>>> GNU gdb Red Hat Linux (6.5-37.el5rh)
>>>>>> Copyright (C) 2006 Free Software Foundation, Inc.
>>>>>> GDB is free software, covered by the GNU General Public License, and you are
>>>>>> welcome to change it and/or distribute copies of it under certain conditions.
>>>>>> Type "show copying" to see the conditions.
>>>>>> There is absolutely no warranty for GDB. Type "show warranty" for details.
>>>>>> This GDB was configured as "i386-redhat-linux-gnu"...Using host
>>>>>> libthread_db library "/lib/libthread_db.so.1".
>>>>>>
>>>>>> (gdb) list *xfs_btree_make_block_unfull+0xc4
>>>>>> No source file for address 0xc019ea28.
>>>>>> (gdb)
>>>>>>
>>>>>>> -Eric
>>>>> so I guess it is not built with debugging symbols perhaps?
>>>>>
>>>>> Try rebuilding it with CONFIG_DEBUG_INFO on maybe?
>>>>>
>>>> yes, you are right, now I get the result:
>>>> (gdb) l *xfs_btree_make_block_unfull+0xc4
>>>> 0xc019ea30 is in xfs_btree_make_block_unfull (fs/xfs/xfs_btree.c:2643).
>>>> 2638 error = xfs_btree_lshift(cur, level, stat);
>>>> 2639 if (error)
>>>> 2640 return error;
>>>> 2641
>>>> 2642 if (*stat) {
>>>> 2643 *oindex = *index = cur->bc_ptrs[level];
>>>> 2644 return 0;
>>>> 2645 }
>>>> 2646
>>>> 2647 /*
>>>>
>>>> It indeed points to "*oindex = *index = cur->bc_ptrs[level];"
>>>>
>>> Very strange, as you said, xfs_btree_insrec passes the addresses of local
>>> variables to xfs_btree_make_block_unfull, so it is impossible for
>>> oindex to be NULL.
>>> Do you think it may be memory corruption?
>>>>> -Eric
>>>>>
>>>>
>>>>
>>>> --
>>>> The simplest is not all best but the best is surely the simplest!
>>>>
>>>
>>>
>>> --
>>> The simplest is not all best but the best is surely the simplest!
>>>
>>
>>
>>
>
>

--
The simplest is not all best but the best is surely the simplest!
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG report]xfs_btree_make_block_unfull generated an OOPS 2009-12-15 0:49 ` hank peng @ 2009-12-15 0:58 ` hank peng 2009-12-15 1:26 ` Dave Chinner 1 sibling, 0 replies; 12+ messages in thread From: hank peng @ 2009-12-15 0:58 UTC (permalink / raw) To: Eric Sandeen; +Cc: xfs-oss 2009/12/15 hank peng <pengxihan@gmail.com>: > Hi, Eric: > I add some code like this: > if (*stat) { > printk("*stat = 0x%08x, oindex = %p, index = %p\n", > *stat, oindex, index); > if (oindex == NULL || index == NULL) { > printk("BUG occured!\n"); > printk("oindex = %p, index = %p\n", oindex, index); > BUG(); > } > *oindex = *index = cur->bc_ptrs[level]; > return 0; > } > > And the same OOPS happened again but a little different, kernel messages are: > > <snip> > *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc > *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc > *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc > *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc > *stat = 0x00000001, oindex = 00000501, index = 22008424 > Unable to handle kernel paging request for data at address 0x22008424 > Faulting instruction address: 0xc019f568 > Oops: Kernel access of bad area, sig: 11 [#1] > MPC85xx CDS > Modules linked in: > NIP: c019f568 LR: c019f54c CTR: c023f9f4 > REGS: e87d7af0 TRAP: 0300 Not tainted (2.6.31.6-svn40) > MSR: 00029000 <EE,ME,CE> CR: 22008424 XER: 20000000 > DEAR: 22008424, ESR: 00800000 > TASK = efb03390[17279] 'SS_Server' THREAD: e87d6000 > GPR00: 000001fd e87d7ba0 efb03390 0000003b 00031d91 ffffffff c023cfa4 00031d91 > GPR08: c04a7c40 e84511c8 00031d91 00004000 20008482 1016d410 3fff5400 100a0000 > GPR16: 100d0408 00000000 00000000 e8fa3558 c019d0ac 00029000 e87d7c5c c01876f0 > GPR24: c019d088 00000000 22008424 00000000 00000501 e87d7c58 00000000 e84511c8 > NIP [c019f568] xfs_btree_make_block_unfull+0xe4/0x1f4 > LR [c019f54c] xfs_btree_make_block_unfull+0xc8/0x1f4 > Call Trace: > [e87d7ba0] [c019f54c] xfs_btree_make_block_unfull+0xc8/0x1f4 
(unreliable)
> [e87d7be0] [c019f9ec] xfs_btree_insrec+0x374/0x4b0
> [e87d7c50] [c019fba4] xfs_btree_insert+0x7c/0x1c0
> [e87d7cb0] [c01866ac] xfs_free_ag_extent+0x408/0x810
> [e87d7d20] [c0187188] xfs_free_extent+0xdc/0x104
> [e87d7db0] [c018fe70] xfs_bmap_finish+0x154/0x1a0
> [e87d7de0] [c01b6998] xfs_itruncate_finish+0x254/0x3b8
> [e87d7e60] [c01d0ea0] xfs_free_eofblocks+0x254/0x29c
> [e87d7ee0] [c01da70c] xfs_file_release+0x14/0x28
> [e87d7ef0] [c00957dc] __fput+0xe8/0x1dc
> [e87d7f10] [c00920d8] filp_close+0x70/0xb0
> [e87d7f30] [c00921ac] sys_close+0x94/0xc0
> [e87d7f40] [c000f7cc] ret_from_syscall+0x0/0x3c
> Instruction dump:
> 7f85e378 3863ed7c 7f46d378 4cc63182 4be97ea1 2f9c0000 419e00f8 2f9a0000
> 419e00f0 57c9103a 7d29fa14 80090050 <901a0000> 901c0000 4bffff88 3b810010
> ---[ end trace f245b6a670339d8f ]---
> </snip>
>
> As you see, after printing "*stat = 0x00000001, oindex = 00000501,
> index = 22008424", OOPS happened.
> Although my BUG() was not invoked, it did access bad area.
>
This is what gdb shows:

(gdb) list *xfs_btree_make_block_unfull+0xe4
0xc019f568 is in xfs_btree_make_block_unfull (fs/xfs/xfs_btree.c:2650).
2645			if (oindex == NULL || index == NULL) {
2646				printk("BUG occured!\n");
2647				printk("oindex = %p, index = %p\n", oindex, index);
2648				BUG();
2649			}
2650			*oindex = *index = cur->bc_ptrs[level]; /* why alaways here????? */
2651			return 0;
2652		}
2653
2654		/*
(gdb)

Why did it suddenly become abnormal? Is it memory corruption? If so, why
does this OOPS always occur at the same place?

>
>
> 2009/12/14 Eric Sandeen <sandeen@sandeen.net>:
>> hank peng wrote:
>>> Hi,Eric:
>>> I think I have found the reason to this problem, but I need you a little help.
>>> We have tested it again, and the same OOPS occured again:
>>
>> Ok, let's keep this on the list please ...
>> >>> Unable to handle kernel paging request for data at address 0x00000000 >>> Faulting instruction address: 0xc019f4b8 >>> Oops: Kernel access of bad area, sig: 11 [#1] >>> MPC85xx CDS >>> Modules linked in: >>> NIP: c019f4b8 LR: c019f490 CTR: 00000000 >>> REGS: ef965af0 TRAP: 0300 Not tainted (2.6.31.6-svn40) >>> MSR: 00029000 <EE,ME,CE> CR: 22008284 XER: 00000000 >>> DEAR: 00000000, ESR: 00800000 >>> TASK = e8a56580[3450] 'SS_Server' THREAD: ef964000 >>> GPR00: 000001fd ef965ba0 e8a56580 00000000 00000000 00000001 00000001 00000001 >>> GPR08: e8fa10e8 e8fa1a18 e8fa10f0 000001fd 22008222 1016d410 3fff5400 100a0000 >>> GPR16: 100d2408 00000000 00000000 d42b12f8 c019d01c 00029000 ef965c5c c0187660 >>> GPR24: c019cff8 00000000 22008224 ef965c08 00000000 ef965c58 00000000 e8fa1a18 >>> NIP [c019f4b8] xfs_btree_make_block_unfull+0xc4/0x1b0 >>> LR [c019f490] xfs_btree_make_block_unfull+0x9c/0x1b0 >>> Call Trace: >>> [ef965ba0] [c019f490] xfs_btree_make_block_unfull+0x9c/0x1b0 (unreliable) >>> [ef965be0] [c019f918] xfs_btree_insrec+0x374/0x4b0 >>> [ef965c50] [c019fad0] xfs_btree_insert+0x7c/0x1c0 >>> [ef965cb0] [c018661c] xfs_free_ag_extent+0x408/0x810 >>> [ef965d20] [c01870f8] xfs_free_extent+0xdc/0x104 >>> [ef965db0] [c018fde0] xfs_bmap_finish+0x154/0x1a0 >>> [ef965de0] [c01b68c4] xfs_itruncate_finish+0x254/0x3b8 >>> [ef965e60] [c01d0dcc] xfs_free_eofblocks+0x254/0x29c >>> [ef965ee0] [c01da638] xfs_file_release+0x14/0x28 >>> [ef965ef0] [c009574c] __fput+0xe8/0x1dc >>> [ef965f10] [c0092048] filp_close+0x70/0xb0 >>> [ef965f30] [c009211c] sys_close+0x94/0xc0 >>> [ef965f40] [c000f784] ret_from_syscall+0x0/0x3c >>> Instruction dump: >>> 7fa5eb78 4bffdf59 7c7c1b79 40a2ffd4 801d0000 2f800000 419e0064 57c9103a >>> 7f83e378 7d29fa14 80090050 90170000 <90190000> 80010044 bae1001c 38210040 >>> ---[ end trace 356726176eeecd9c ]--- >>> Oops: Exception in kernel mode, sig: 4 [#2] >>> MPC85xx CDS >>> Modules linked in: >>> NIP: c0187660 LR: c019b26c CTR: c0187660 >>> REGS: d42076a0 
TRAP: 0700 Tainted: G D (2.6.31.6-svn40) >>> MSR: 00029000 <EE,ME,CE> CR: 22222082 XER: 00000000 >>> TASK = e08a6ee0[8533] 'pdflush' THREAD: d4206000 >>> GPR00: 00000004 d4207750 e08a6ee0 d42b1098 00000001 00000001 e8e97d80 00000003 >>> GPR08: c2c65300 c0187660 41425443 41425443 00001000 1001a1c4 c01842f8 00000001 >>> GPR16: d4207880 d42077e0 00000002 d42077d8 d42077e0 d42077e8 d42b10ec 00000001 >>> GPR24: c0486be0 00000000 d42b1098 09c40000 c019b4f0 00000011 e88bf000 d4207750 >>> NIP [c0187660] xfs_allocbt_get_maxrecs+0x0/0x20 >>> LR [c019b26c] xfs_btree_check_sblock+0xb0/0xf8 >>> Call Trace: >>> [d4207770] [c019b4f0] xfs_btree_read_buf_block+0x8c/0xb8 >>> [d42077a0] [c019b5a8] xfs_btree_lookup_get_block+0x8c/0xfc >>> [d42077d0] [c019c638] xfs_btree_lookup+0x124/0x3fc >>> [d4207850] [c01842f8] xfs_alloc_lookup_ge+0x20/0x30 >>> [d4207860] [c0185828] xfs_alloc_ag_vextent_near+0x60/0xa4c >>> [d42078e0] [c0186af4] xfs_alloc_ag_vextent+0xd0/0x168 >>> [d4207900] [c01873f0] xfs_alloc_vextent+0x2d0/0x524 >>> [d4207940] [c01940fc] xfs_bmap_btalloc+0x274/0xa60 >>> [d4207a00] [c01988bc] xfs_bmapi+0xb30/0x10dc >>> [d4207b40] [c01bb190] xfs_iomap_write_allocate+0x11c/0x450 >>> [d4207c00] [c01bc2e8] xfs_iomap+0x320/0x35c >>> [d4207c80] [c01d5d5c] xfs_map_blocks+0x2c/0x40 >>> [d4207ca0] [c01d6dc0] xfs_page_state_convert+0x2e8/0x744 >>> [d4207d60] [c01d7384] xfs_vm_writepage+0x7c/0x128 >>> [d4207d90] [c006d740] __writepage+0x24/0x80 >>> [d4207da0] [c006db44] write_cache_pages+0x1e4/0x3a0 >>> [d4207e50] [c01d5e14] xfs_vm_writepages+0x24/0x34 >>> [d4207e60] [c006dd70] do_writepages+0x48/0x7c >>> [d4207e70] [c00b2120] writeback_single_inode+0xf8/0x2e4 >>> [d4207ec0] [c00b2788] generic_sync_sb_inodes+0x280/0x398 >>> [d4207ef0] [c00b295c] writeback_inodes+0xb8/0xd4 >>> [d4207f10] [c006ece0] wb_kupdate+0xd4/0x154 >>> [d4207f70] [c006f3bc] pdflush+0xd4/0x1c4 >>> [d4207fc0] [c004c750] kthread+0x78/0x7c >>> <...> >>> >>> >>> There were another OOPS which followed the first one. 
>> >> After the first oops I think the rest is not interesting, things >> are in bad shape by now. >> >>> Please note that >>> in the second OOPS, a SIGILL has been invoked and address of illegal >>> instrucion is 0xc0187660. >>> In the first OOPS, look at the following registers: >>> >>> GPR00: 000001fd ef965ba0 e8a56580 00000000 00000000 00000001 00000001 00000001 >>> GPR08: e8fa10e8 e8fa1a18 e8fa10f0 000001fd 22008222 1016d410 3fff5400 100a0000 >>> GPR16: 100d2408 00000000 00000000 d42b12f8 c019d01c 00029000 ef965c5c c0187660 >>> GPR24: c019cff8 00000000 22008224 ef965c08 00000000 ef965c58 00000000 e8fa1a18 >>> >>> I noticed that the value of r23 is also 0xc0187660. I have a little >>> powerpc assembly code knowledge, if I am not wrong, >>> *oindex = *index = cur->bc_ptrs[level];' in fs/xfs/xfs_btree.c was >>> built into the following asm code which I send it to you ealier: >>> 80 09 00 50 lwz r0,80(r9) >>> 90 17 00 00 stw r0,0(r23) >>> 90 19 00 00 stw r0,0(r25) <OOPs occured here> >>> >>> So, r23 should have pointed to address of index and never had a chace >>> to point to a code adress, but it did. What's worse, the code at >>> 0xc0187660 had been changed and the second OOPS happened imediately. >>> >>> Could you correct my analysis if I am wrong? >>> In addition, I think the problem may be caused by stack overflow, what >>> is your comments? >>> >>> >> Perhaps, but if this is the 2nd oops I think it is not worth investigating; >> we need to figure out why the first one happened, and from that stack trace >> I don't think you are close to overflowing... >> >> -eric >> >>> >>> 2009/12/9 hank peng <pengxihan@gmail.com>: >>>> 2009/12/9 hank peng <pengxihan@gmail.com>: >>>>> 2009/12/9 Eric Sandeen <sandeen@sandeen.net>: >>>>>> hank peng wrote: >>>>>>> 2009/12/9 Eric Sandeen <sandeen@sandeen.net>: >>>>>>>> hank peng wrote: >>>>>>>> >>>>>>>>> Thanks for your replay. >>>>>>>>> >>>>>>>>> I made this conclusion from assembly code, correct me if I am wrong. 
>>>>>>>>> #powerpc-linux-gnuspe-objdump vmlinux | less >>>>>>>>> <snip> >>>>>>>> (off list; if this works maybe you can reply on-list?) >>>>>>>> >>>>>>>> Could you use gdb to look? Maybe: >>>>>>>> >>>>>>>> (gdb) list *xfs_btree_make_block_unfull+0xc4 >>>>>>>> >>>>>>> I use gdb on my PC and get this: >>>>>>> >>>>>>> [root@localhost linux-2.6.31.6]# gdb vmlinux >>>>>>> GNU gdb Red Hat Linux (6.5-37.el5rh) >>>>>>> Copyright (C) 2006 Free Software Foundation, Inc. >>>>>>> GDB is free software, covered by the GNU General Public License, and you are >>>>>>> welcome to change it and/or distribute copies of it under certain conditions. >>>>>>> Type "show copying" to see the conditions. >>>>>>> There is absolutely no warranty for GDB. Type "show warranty" for details. >>>>>>> This GDB was configured as "i386-redhat-linux-gnu"...Using host >>>>>>> libthread_db library "/lib/libthread_db.so.1". >>>>>>> >>>>>>> (gdb) list *xfs_btree_make_block_unfull+0xc4 >>>>>>> No source file for address 0xc019ea28. >>>>>>> (gdb) >>>>>>> >>>>>>>> -Eric >>>>>> so I guess it is not built with debugging symbols perhaps? >>>>>> >>>>>> Try rebuilding it with CONFIG_DEBUG_INFO on maybe? >>>>>> >>>>> yes, you are right, now I get the result: >>>>> (gdb) l *xfs_btree_make_block_unfull+0xc4 >>>>> 0xc019ea30 is in xfs_btree_make_block_unfull (fs/xfs/xfs_btree.c:2643). >>>>> 2638 error = xfs_btree_lshift(cur, level, stat); >>>>> 2639 if (error) >>>>> 2640 return error; >>>>> 2641 >>>>> 2642 if (*stat) { >>>>> 2643 *oindex = *index = cur->bc_ptrs[level]; >>>>> 2644 return 0; >>>>> 2645 } >>>>> 2646 >>>>> 2647 /* >>>>> >>>>> It indeed points to "*oindex = *index = cur->bc_ptrs[level];" >>>>> >>>> Very strange, as you said, xfs_btree_insrec passes address local >>>> variable to xfs_btree_make_block_unfull, so it is impossible for >>>> oindex to be NULL. >>>> Do you think it may be an memory corrupt? 
>>>>>> -Eric
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> The simplest is not all best but the best is surely the simplest!
>>>>>
>>>>
>>>>
>>>> --
>>>> The simplest is not all best but the best is surely the simplest!
>>>>
>>>
>>>
>>>
>>
>>
>
>
>
> --
> The simplest is not all best but the best is surely the simplest!
>

-- 
The simplest is not all best but the best is surely the simplest!
* Re: [BUG report]xfs_btree_make_block_unfull generated an OOPS
2009-12-15 0:49 ` hank peng
2009-12-15 0:58 ` hank peng
@ 2009-12-15 1:26 ` Dave Chinner
2009-12-15 1:56 ` hank peng
` (2 more replies)
1 sibling, 3 replies; 12+ messages in thread
From: Dave Chinner @ 2009-12-15 1:26 UTC (permalink / raw)
To: hank peng; +Cc: Eric Sandeen, xfs-oss

On Tue, Dec 15, 2009 at 08:49:37AM +0800, hank peng wrote:
> Hi, Eric:
> I add some code like this:
> 	if (*stat) {
> 		printk("*stat = 0x%08x, oindex = %p, index = %p\n",
> 			*stat, oindex, index);
> 		if (oindex == NULL || index == NULL) {

This won't catch bad non-NULL pointers like you are seeing.

> 			printk("BUG occured!\n");
> 			printk("oindex = %p, index = %p\n", oindex, index);
> 			BUG();
> 		}
> 		*oindex = *index = cur->bc_ptrs[level];
> 		return 0;
> 	}
>
> And the same OOPS happened again but a little different, kernel messages are:
>
> <snip>
> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
> *stat = 0x00000001, oindex = 00000501, index = 22008424
> Unable to handle kernel paging request for data at address 0x22008424

Given that oindex and index are stack variables, this indicates
something is probably smashing the stack. Possibly a buffer overrun. To
narrow down the possible cause, can you add the debug:

	printk("%s:%d: oindex = %p, index = %p\n",
		__func__, __LINE__, oindex, index);

throughout the xfs_btree_make_block_unfull() function? i.e. at
first entry, before the xfs_btree_rshift() call, before the
xfs_btree_lshift() call, etc, to see if any of the parameters
are being modified during execution of the function?

If the variables being passed into xfs_btree_make_block_unfull() are
already bad, then do the same thing for the caller
xfs_btree_insert(). This may help narrow down where the problem
is coming from....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
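
The tracing suggested above can be tried out in userspace first. What follows is only a minimal sketch under stated assumptions: TRACE_PTRS, ptr_in_range() and check_index_ptrs() are hypothetical names invented for this example, not XFS or kernel functions, and printf stands in for printk. Because __LINE__ expands to an integer, the sketch formats it with %d. The range test illustrates why a NULL comparison cannot catch a value such as 0x00000501: the pointer is compared against the bounds of the region it is supposed to live in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: true if p lies inside [base, base + size). */
static int ptr_in_range(const void *p, const void *base, size_t size)
{
	uintptr_t a = (uintptr_t)p;
	uintptr_t lo = (uintptr_t)base;

	return a >= lo && a - lo < size;
}

/* Hypothetical trace macro; __LINE__ is an int, hence %d and not %s. */
#define TRACE_PTRS(oindex, index) \
	printf("%s:%d: oindex = %p, index = %p\n", \
	       __func__, __LINE__, (void *)(oindex), (void *)(index))

/*
 * Stand-in for the sanity check discussed in the thread.  Returns 0 when
 * both pointers fall inside the region described by (stack, stack_size),
 * and -1 otherwise.  NULL and small bogus values are both rejected.
 */
static int check_index_ptrs(int *oindex, int *index,
			    const void *stack, size_t stack_size)
{
	assert(stack != NULL);

	TRACE_PTRS(oindex, index);
	if (!ptr_in_range(oindex, stack, stack_size) ||
	    !ptr_in_range(index, stack, stack_size))
		return -1;
	return 0;
}
```

In the kernel the bounds would have to come from the current task's stack rather than from a caller-supplied array; that part is deliberately left out here because it depends on the kernel version.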
* Re: [BUG report]xfs_btree_make_block_unfull generated an OOPS 2009-12-15 1:26 ` Dave Chinner @ 2009-12-15 1:56 ` hank peng 2009-12-15 3:15 ` Eric Sandeen 2009-12-15 5:36 ` hank peng 2010-01-13 1:11 ` hank peng 2 siblings, 1 reply; 12+ messages in thread From: hank peng @ 2009-12-15 1:56 UTC (permalink / raw) To: Dave Chinner; +Cc: Eric Sandeen, xfs-oss 2009/12/15 Dave Chinner <david@fromorbit.com>: > On Tue, Dec 15, 2009 at 08:49:37AM +0800, hank peng wrote: >> Hi, Eric: >> I add some code like this: >> if (*stat) { >> printk("*stat = 0x%08x, oindex = %p, index = %p\n", >> *stat, oindex, index); >> if (oindex == NULL || index == NULL) { > > This won't catch bad non-NULL pointers like you are seeing. > >> printk("BUG occured!\n"); >> printk("oindex = %p, index = %p\n", oindex, index); >> BUG(); >> } >> *oindex = *index = cur->bc_ptrs[level]; >> return 0; >> } >> >> And the same OOPS happened again but a little different, kernel messages are: >> >> <snip> >> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc >> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc >> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc >> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc >> *stat = 0x00000001, oindex = 00000501, index = 22008424 >> Unable to handle kernel paging request for data at address 0x22008424 > > Given that oindex and index are stack varibles, this indicates some > thing is probably smashing the stack. Possibly a buffer overrun. To > narrow down the possible cause, can you add the debug: > > printk("%s:%s: oindex = %p, index = %p\n", > __func__, __LINE__, oindex, index); > > throughout the xfs_btree_make_block_unfull() function? i.e. at > first entry, before the xfs_btree_rshift() call, before the > xfs_btree_lshift() call, etc, to see if any of the parameters > are being modified during execution of the function? 
>
> If the variables being passed into xfs_btree_make_block_unfull() are
> already bad, then do the same thing for the caller
> xfs_btree_insert(). This may help narrow down where the problem
> is coming from....
>
Thanks for your reply!
As you said, I added some code like this:

	/* First, try shifting an entry to the right neighbor. */
	printk("%s: before xfs_btree_rshift, oindex = %p, index = %p\n",
		__func__, oindex, index);
	error = xfs_btree_rshift(cur, level, stat);
	if (error || *stat)
		return error;

	/* Next, try shifting an entry to the left neighbor. */
	printk("%s: before xfs_btree_lshift, oindex = %p, index = %p\n",
		__func__, oindex, index);
	error = xfs_btree_lshift(cur, level, stat);
	if (error)
		return error;

	if (*stat) {
		printk("*stat = 0x%08x, oindex = %p, index = %p\n",
			*stat, oindex, index);
		if (oindex == NULL || index == NULL) {
			printk("BUG occured!\n");
			printk("oindex = %p, index = %p\n", oindex, index);
			BUG();
		}
		*oindex = *index = cur->bc_ptrs[level];
		return 0;
	}


	xfs_btree_set_ptr_null(cur, &nptr);
	if (numrecs == cur->bc_ops->get_maxrecs(cur, level)) {
		printk("%s: before calling xfs_btree_make_block_unfull, &optr = %p, &ptr = %p\n",
			__func__, &optr, &ptr);
		error = xfs_btree_make_block_unfull(cur, level, numrecs,
					&optr, &ptr, &nptr, &ncur, &nrec, stat);
		if (error || *stat == 0)
			goto error0;
	}


We are waiting for the OOPS to happen.

I hope it never turns out to be a memory corruption problem, which would
be a nightmare for me to debug.

> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>

-- 
The simplest is not all best but the best is surely the simplest!
* Re: [BUG report]xfs_btree_make_block_unfull generated an OOPS 2009-12-15 1:56 ` hank peng @ 2009-12-15 3:15 ` Eric Sandeen 2009-12-15 3:22 ` hank peng 0 siblings, 1 reply; 12+ messages in thread From: Eric Sandeen @ 2009-12-15 3:15 UTC (permalink / raw) To: hank peng; +Cc: xfs-oss hank peng wrote: > 2009/12/15 Dave Chinner <david@fromorbit.com>: >> On Tue, Dec 15, 2009 at 08:49:37AM +0800, hank peng wrote: >>> Hi, Eric: >>> I add some code like this: >>> if (*stat) { >>> printk("*stat = 0x%08x, oindex = %p, index = %p\n", >>> *stat, oindex, index); >>> if (oindex == NULL || index == NULL) { >> This won't catch bad non-NULL pointers like you are seeing. >> >>> printk("BUG occured!\n"); >>> printk("oindex = %p, index = %p\n", oindex, index); >>> BUG(); >>> } >>> *oindex = *index = cur->bc_ptrs[level]; >>> return 0; >>> } >>> >>> And the same OOPS happened again but a little different, kernel messages are: >>> >>> <snip> >>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc >>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc >>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc >>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc >>> *stat = 0x00000001, oindex = 00000501, index = 22008424 >>> Unable to handle kernel paging request for data at address 0x22008424 Are you using any of the xfs userspace prior to this error, or is it a fresh boot and just normal IO? I ask because libxfs calls sys_ustat() which at one point was corrupting userspace, at least, with 32-bit userspace on a 64-bit kernel: https://bugzilla.redhat.com/show_bug.cgi?id=472795 Even with that fixed there were still some reports of odd behavior on ppc... I don't know if things might be going wrong in kernelspace as well... https://bugzilla.redhat.com/show_bug.cgi?id=517994 and I haven't gotten to the bottom of that yet ... Very few things actually use sys_ustat, but xfs userspace does... just a random thought. 
-eric >> Given that oindex and index are stack varibles, this indicates some >> thing is probably smashing the stack. Possibly a buffer overrun. To >> narrow down the possible cause, can you add the debug: >> >> printk("%s:%s: oindex = %p, index = %p\n", >> __func__, __LINE__, oindex, index); >> >> throughout the xfs_btree_make_block_unfull() function? i.e. at >> first entry, before the xfs_btree_rshift() call, before the >> xfs_btree_lshift() call, etc, to see if any of the parameters >> are being modified during execution of the function? >> >> If the variables being passed into xfs_btree_make_block_unfull() are >> already bad, then do the same thing for the caller >> xfs_btree_insert(). This may help narrow down where the problem >> is coming from.... >> > Thanks for your reply! > As you said, I added some code like this: > /* First, try shifting an entry to the right neighbor. */ > printk("%s: before xfs_btree_rshift, oindex = %p, index = %p\n", > __func__, oindex, index); > error = xfs_btree_rshift(cur, level, stat); > if (error || *stat) > return error; > > /* Next, try shifting an entry to the left neighbor. */ > printk("%s: before xfs_btree_lshift, oindex = %p, index = %p\n", > __func__, oindex, index); > error = xfs_btree_lshift(cur, level, stat); > if (error) > return error; > > if (*stat) { > printk("*stat = 0x%08x, oindex = %p, index = %p\n", > *stat, oindex, index); > if (oindex == NULL || index == NULL) { > printk("BUG occured!\n"); > printk("oindex = %p, index = %p\n", oindex, index); > BUG(); > } > *oindex = *index = cur->bc_ptrs[level]; > return 0; > } > > > xfs_btree_set_ptr_null(cur, &nptr); > if (numrecs == cur->bc_ops->get_maxrecs(cur, level)) { > printk("%s: before calling > xfs_btree_make_block_unfull, &optr = %p, &ptr = %p\n", > __func__, &optr, &ptr); > error = xfs_btree_make_block_unfull(cur, level, numrecs, > &optr, &ptr, &nptr, &ncur, &nrec, stat); > if (error || *stat == 0) > goto error0; > } > > > We are waiting for OOPS to happen. 
>
> I hope it will nerver be memory corrupt problem which is nightmare for
> me to debug.
>
>> Cheers,
>>
>> Dave.
>> --
>> Dave Chinner
>> david@fromorbit.com
>>
>
>
>
* Re: [BUG report]xfs_btree_make_block_unfull generated an OOPS 2009-12-15 3:15 ` Eric Sandeen @ 2009-12-15 3:22 ` hank peng 0 siblings, 0 replies; 12+ messages in thread From: hank peng @ 2009-12-15 3:22 UTC (permalink / raw) To: Eric Sandeen; +Cc: xfs-oss 2009/12/15 Eric Sandeen <sandeen@sandeen.net>: > hank peng wrote: >> 2009/12/15 Dave Chinner <david@fromorbit.com>: >>> On Tue, Dec 15, 2009 at 08:49:37AM +0800, hank peng wrote: >>>> Hi, Eric: >>>> I add some code like this: >>>> if (*stat) { >>>> printk("*stat = 0x%08x, oindex = %p, index = %p\n", >>>> *stat, oindex, index); >>>> if (oindex == NULL || index == NULL) { >>> This won't catch bad non-NULL pointers like you are seeing. >>> >>>> printk("BUG occured!\n"); >>>> printk("oindex = %p, index = %p\n", oindex, index); >>>> BUG(); >>>> } >>>> *oindex = *index = cur->bc_ptrs[level]; >>>> return 0; >>>> } >>>> >>>> And the same OOPS happened again but a little different, kernel messages are: >>>> >>>> <snip> >>>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc >>>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc >>>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc >>>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc >>>> *stat = 0x00000001, oindex = 00000501, index = 22008424 >>>> Unable to handle kernel paging request for data at address 0x22008424 > > Are you using any of the xfs userspace prior to this error, or is it a > fresh boot and just normal IO? > no xfs userspace prior to this error, just normal IO. Besides, it need some time to produce the OOPS. > I ask because libxfs calls sys_ustat() which at one point was corrupting > userspace, at least, with 32-bit userspace on a 64-bit kernel: > https://bugzilla.redhat.com/show_bug.cgi?id=472795 > Forgot to say, I use "-o inode64" when mount. 
# uname -a Linux Storage 2.6.31.6-svn40 #30 Tue Dec 15 09:50:02 CST 2009 ppc unknown # mount rootfs on / type rootfs (rw) /dev/root on / type ext2 (rw,relatime,errors=continue) /dev/mtdblock2 on /mnt/sys_data type jffs2 (rw,relatime) proc on /proc type proc (rw,relatime) sysfs on /sys type sysfs (rw,relatime) tmpfs on /opt/upgrade type tmpfs (rw,relatime) devpts on /dev/pts type devpts (rw,relatime,gid=5,mode=620) /dev/Pool_md2/ss1 on /mnt/Pool_md2/ss1 type xfs (rw,relatime,attr2,inode64,noquota) > Even with that fixed there were still some reports of odd behavior > on ppc... I don't know if things might be going wrong in kernelspace > as well... > > https://bugzilla.redhat.com/show_bug.cgi?id=517994 > and I haven't gotten to the bottom of that yet ... > > Very few things actually use sys_ustat, but xfs userspace does... > just a random thought. > > -eric > >>> Given that oindex and index are stack varibles, this indicates some >>> thing is probably smashing the stack. Possibly a buffer overrun. To >>> narrow down the possible cause, can you add the debug: >>> >>> printk("%s:%s: oindex = %p, index = %p\n", >>> __func__, __LINE__, oindex, index); >>> >>> throughout the xfs_btree_make_block_unfull() function? i.e. at >>> first entry, before the xfs_btree_rshift() call, before the >>> xfs_btree_lshift() call, etc, to see if any of the parameters >>> are being modified during execution of the function? >>> >>> If the variables being passed into xfs_btree_make_block_unfull() are >>> already bad, then do the same thing for the caller >>> xfs_btree_insert(). This may help narrow down where the problem >>> is coming from.... >>> >> Thanks for your reply! >> As you said, I added some code like this: >> /* First, try shifting an entry to the right neighbor. 
*/ >> printk("%s: before xfs_btree_rshift, oindex = %p, index = %p\n", >> __func__, oindex, index); >> error = xfs_btree_rshift(cur, level, stat); >> if (error || *stat) >> return error; >> >> /* Next, try shifting an entry to the left neighbor. */ >> printk("%s: before xfs_btree_lshift, oindex = %p, index = %p\n", >> __func__, oindex, index); >> error = xfs_btree_lshift(cur, level, stat); >> if (error) >> return error; >> >> if (*stat) { >> printk("*stat = 0x%08x, oindex = %p, index = %p\n", >> *stat, oindex, index); >> if (oindex == NULL || index == NULL) { >> printk("BUG occured!\n"); >> printk("oindex = %p, index = %p\n", oindex, index); >> BUG(); >> } >> *oindex = *index = cur->bc_ptrs[level]; >> return 0; >> } >> >> >> xfs_btree_set_ptr_null(cur, &nptr); >> if (numrecs == cur->bc_ops->get_maxrecs(cur, level)) { >> printk("%s: before calling >> xfs_btree_make_block_unfull, &optr = %p, &ptr = %p\n", >> __func__, &optr, &ptr); >> error = xfs_btree_make_block_unfull(cur, level, numrecs, >> &optr, &ptr, &nptr, &ncur, &nrec, stat); >> if (error || *stat == 0) >> goto error0; >> } >> >> >> We are waiting for OOPS to happen. >> >> I hope it will nerver be memory corrupt problem which is nightmare for >> me to debug. >> >>> Cheers, >>> >>> Dave. >>> -- >>> Dave Chinner >>> david@fromorbit.com >>> >> >> >> > > -- The simplest is not all best but the best is surely the simplest! _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG report]xfs_btree_make_block_unfull generated an OOPS
2009-12-15 1:26 ` Dave Chinner
2009-12-15 1:56 ` hank peng
@ 2009-12-15 5:36 ` hank peng
2010-01-13 1:11 ` hank peng
2 siblings, 0 replies; 12+ messages in thread
From: hank peng @ 2009-12-15 5:36 UTC (permalink / raw)
To: Dave Chinner; +Cc: Eric Sandeen, xfs-oss

2009/12/15 Dave Chinner <david@fromorbit.com>:
> On Tue, Dec 15, 2009 at 08:49:37AM +0800, hank peng wrote:
>> Hi, Eric:
>> I add some code like this:
>> 	if (*stat) {
>> 		printk("*stat = 0x%08x, oindex = %p, index = %p\n",
>> 			*stat, oindex, index);
>> 		if (oindex == NULL || index == NULL) {
>
> This won't catch bad non-NULL pointers like you are seeing.
>
>> 			printk("BUG occured!\n");
>> 			printk("oindex = %p, index = %p\n", oindex, index);
>> 			BUG();
>> 		}
>> 		*oindex = *index = cur->bc_ptrs[level];
>> 		return 0;
>> 	}
>>
>> And the same OOPS happened again but a little different, kernel messages are:
>>
>> <snip>
>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>> *stat = 0x00000001, oindex = 00000501, index = 22008424
>> Unable to handle kernel paging request for data at address 0x22008424
>
> Given that oindex and index are stack varibles, this indicates some

In xfs_btree_make_block_unfull, it seems that oindex and index have been
optimised into register variables, so this becomes even more odd.

> thing is probably smashing the stack. Possibly a buffer overrun. To
> narrow down the possible cause, can you add the debug:
>
> 	printk("%s:%s: oindex = %p, index = %p\n",
> 		__func__, __LINE__, oindex, index);
>
> throughout the xfs_btree_make_block_unfull() function? i.e. at
> first entry, before the xfs_btree_rshift() call, before the
> xfs_btree_lshift() call, etc, to see if any of the parameters
> are being modified during execution of the function?
>
> If the variables being passed into xfs_btree_make_block_unfull() are
> already bad, then do the same thing for the caller
> xfs_btree_insert(). This may help narrow down where the problem
> is coming from....
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>

-- 
The simplest is not all best but the best is surely the simplest!
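
A pointer that the compiler keeps in a register is not automatically safe from stack corruption. Under the usual 32-bit PowerPC ABI, r14 to r31 are nonvolatile, so a callee that uses one of them stores the caller's value in its own stack frame on entry and reloads it before returning. The toy userspace model below only illustrates that mechanism; struct toy_frame, toy_callee() and the overrun flag are invented for this sketch and do not correspond to any kernel code, and nothing here shows that this is what actually happened on the reporter's system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Toy model of a callee's stack frame: a local buffer followed by the
 * slot where the caller's nonvolatile register value is saved.
 */
struct toy_frame {
	unsigned char local_buf[8];
	uint32_t saved_reg;
};

/*
 * Hypothetical callee.  It "saves" the caller's register on entry and
 * "restores" it on exit.  With overrun != 0 the fill runs past local_buf,
 * modelling a buffer overrun that lands on the save slot.
 * Returns the value the caller would see in the register afterwards.
 */
static uint32_t toy_callee(uint32_t caller_reg, int overrun)
{
	struct toy_frame f;
	unsigned char *raw = (unsigned char *)&f;
	size_t n = overrun ? sizeof(f) : sizeof(f.local_buf);

	assert(n <= sizeof(f));

	f.saved_reg = caller_reg;	/* prologue: spill to the frame */
	memset(raw, 0xAA, n);		/* body: fill the "buffer" */
	return f.saved_reg;		/* epilogue: reload from the frame */
}
```

With overrun set to 0 the caller gets its value back unchanged; with overrun set to 1 the caller resumes with 0xAAAAAAAA in the register even though it never wrote to it itself.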
* Re: [BUG report]xfs_btree_make_block_unfull generated an OOPS 2009-12-15 1:26 ` Dave Chinner 2009-12-15 1:56 ` hank peng 2009-12-15 5:36 ` hank peng @ 2010-01-13 1:11 ` hank peng 2 siblings, 0 replies; 12+ messages in thread From: hank peng @ 2010-01-13 1:11 UTC (permalink / raw) To: Dave Chinner; +Cc: Eric Sandeen, xfs-oss 2009/12/15 Dave Chinner <david@fromorbit.com>: > On Tue, Dec 15, 2009 at 08:49:37AM +0800, hank peng wrote: >> Hi, Eric: >> I add some code like this: >> if (*stat) { >> printk("*stat = 0x%08x, oindex = %p, index = %p\n", >> *stat, oindex, index); >> if (oindex == NULL || index == NULL) { > > This won't catch bad non-NULL pointers like you are seeing. > >> printk("BUG occured!\n"); >> printk("oindex = %p, index = %p\n", oindex, index); >> BUG(); >> } >> *oindex = *index = cur->bc_ptrs[level]; >> return 0; >> } >> >> And the same OOPS happened again but a little different, kernel messages are: >> >> <snip> >> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc >> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc >> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc >> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc >> *stat = 0x00000001, oindex = 00000501, index = 22008424 >> Unable to handle kernel paging request for data at address 0x22008424 > > Given that oindex and index are stack varibles, this indicates some > thing is probably smashing the stack. Possibly a buffer overrun. To > narrow down the possible cause, can you add the debug: > > printk("%s:%s: oindex = %p, index = %p\n", > __func__, __LINE__, oindex, index); > > throughout the xfs_btree_make_block_unfull() function? i.e. at > first entry, before the xfs_btree_rshift() call, before the > xfs_btree_lshift() call, etc, to see if any of the parameters > are being modified during execution of the function? > > If the variables being passed into xfs_btree_make_block_unfull() are > already bad, then do the same thing for the caller > xfs_btree_insert(). 
> This may help narrow down where the problem
> is coming from....
>
I added the following debug code as you said:
<code>
printk("%s: before xfs_btree_rshift, oindex = %p, index = %p\n",
       __func__, oindex, index);
error = xfs_btree_rshift(cur, level, stat);
if (error || *stat)
        return error;

/* Next, try shifting an entry to the left neighbor. */
printk("%s: before xfs_btree_lshift, oindex = %p, index = %p\n",
       __func__, oindex, index);
error = xfs_btree_lshift(cur, level, stat);
if (error)
        return error;
if (*stat) {
        printk("%s: oindex = %p, index = %p, *stat = %d\n",
               __func__, oindex, index, *stat);
        *oindex = *index = cur->bc_ptrs[level];
        return 0;
}
</code>

It ran fine for about 36 hours without a problem, but this morning an
odd OOPS appeared:

xfs_btree_make_block_unfull: before xfs_btree_rshift, oindex = d3a27bd8, index = d3a27bdc
xfs_btree_make_block_unfull: before xfs_btree_lshift, oindex = d3a27bd8, index = d3a27bdc
xfs_btree_make_block_unfull: oindex = d3a27bd8, index = d3a27bdc, *stat = 1
xfs_btree_make_block_unfull: before xfs_btree_rshift, oindex = d3a27bd8, index = d3a27bdc
Unable to handle kernel paging request for data at address 0x00000501
Faulting instruction address: 0xc019f4f0
Oops: Kernel access of bad area, sig: 11 [#2]
MPC85xx CDS
Modules linked in:
NIP: c019f4f0 LR: c019f4e8 CTR: c023fabc
REGS: d3a27ad0 TRAP: 0300 Tainted: G D (2.6.31.6-svn45)
MSR: 00029000 <EE,ME,CE> CR: 22008424 XER: 20000000
DEAR: 00000501, ESR: 00000000
TASK = efb46a30[20273] 'cp' THREAD: d3a26000
GPR00: c019f4e8 d3a27b80 efb46a30 00000000 d3a27b38 d3a27b38 00000010 007f0f26
GPR08: c04a7c40 ffffffff e8517850 d3a27b80 20008422 100eb39c 3fff5400 100a0000
GPR16: 100d5ac8 00000000 016d30f3 e8517850 c019d08c 00029000 d3a27bf0 c023fabc
GPR24: c019d068 00000000 22008424 d3a27bdc 00000501 d3a27bd8 00000000 e8517850
NIP [c019f4f0] xfs_btree_make_block_unfull+0x8c/0x1f8
LR [c019f4e8] xfs_btree_make_block_unfull+0x84/0x1f8
Call Trace:
[d3a27b80] [c019f4e8] xfs_btree_make_block_unfull+0x84/0x1f8 (unreliable)
[d3a27bc0] [c019f9d0] xfs_btree_insrec+0x374/0x4b0
[d3a27c30] [c019fb88] xfs_btree_insert+0x7c/0x1c0
[d3a27c90] [c01865d0] xfs_free_ag_extent+0x34c/0x810
[d3a27d00] [c0187168] xfs_free_extent+0xdc/0x104
[d3a27d90] [c018fe50] xfs_bmap_finish+0x154/0x1a0
[d3a27dc0] [c01b697c] xfs_itruncate_finish+0x254/0x3b8
[d3a27e40] [c01d2134] xfs_inactive+0x2c4/0x450
[d3a27e80] [c01e193c] xfs_fs_clear_inode+0x40/0x50
[d3a27e90] [c00a84bc] clear_inode+0x6c/0x108
[d3a27ea0] [c00a87d0] generic_delete_inode+0x114/0x118
[d3a27eb0] [c00a7ff8] iput+0x74/0x94
[d3a27ec0] [c00a003c] do_unlinkat+0x114/0x198
[d3a27f40] [c000f7ac] ret_from_syscall+0x0/0x3c
Instruction dump:
7f66db78 7f44d378 7fa5eb78 3863eca4 4cc63182 4be97ef5 7fe3fb78 7fc4f378
7f85e378 4bffdb15 7c791b79 40820010 <801c0000> 2f800000 419e001c 80010044
---[ end trace 95e2c49eb5a34f9a ]---

(gdb) list *(xfs_btree_make_block_unfull+0x8c)
0xc019f4f0 is in xfs_btree_make_block_unfull (fs/xfs/xfs_btree.c:2636).
2631
2632            /* First, try shifting an entry to the right neighbor. */
2633            printk("%s: before xfs_btree_rshift, oindex = %p, index = %p\n",
2634                   __func__, oindex, index);
2635            error = xfs_btree_rshift(cur, level, stat);
2636            if (error || *stat)
2637                    return error;
2638
2639            /* Next, try shifting an entry to the left neighbor. */
2640            printk("%s: before xfs_btree_lshift, oindex = %p, index = %p\n",

It seems that after the call to xfs_btree_rshift, the value of 'stat'
has been changed. How is that possible, since it is a local variable?
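
The instrumentation above prints oindex and index but not stat, while the
faulting line is the `*stat` test at line 2636 and GPR28 in the register
dump equals the DEAR value 00000501. A minimal user-space sketch of a
checkpoint helper that watches all three pointers might look like the
following. The names `ptr_watch`, `ptr_watch_check` and `PTR_WATCH_CHECK`
are hypothetical and not part of XFS; in the kernel the `fprintf` calls
would become `printk`. Since `__LINE__` is an integer, it is printed with
`%d` here.

```c
#include <assert.h>
#include <stdio.h>

/* Snapshot of the pointer arguments, taken once at function entry.
 * Hypothetical helper type, not an XFS structure. */
struct ptr_watch {
	const void *oindex;
	const void *index;
	const void *stat;
};

/* Compare the current pointer values against the entry snapshot and
 * report every pointer that differs. Returns the number of mismatches. */
static int ptr_watch_check(const struct ptr_watch *w, const void *oindex,
			   const void *index, const void *stat,
			   const char *func, int line)
{
	int bad = 0;

	assert(w != NULL);
	if (oindex != w->oindex) {
		fprintf(stderr, "%s:%d: oindex changed %p -> %p\n",
			func, line, w->oindex, oindex);
		bad++;
	}
	if (index != w->index) {
		fprintf(stderr, "%s:%d: index changed %p -> %p\n",
			func, line, w->index, index);
		bad++;
	}
	if (stat != w->stat) {
		fprintf(stderr, "%s:%d: stat changed %p -> %p\n",
			func, line, w->stat, stat);
		bad++;
	}
	return bad;
}

/* Records the call site automatically; use after each suspect call. */
#define PTR_WATCH_CHECK(w, oindex, index, stat) \
	ptr_watch_check((w), (oindex), (index), (stat), __func__, __LINE__)
```

Placing one check immediately after each of the rshift and lshift calls
would show which call a pointer first changes across, before it is
dereferenced.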
# uname -a
Linux Storage 2.6.31.6-svn45 #87 Mon Jan 11 13:22:14 CST 2010 ppc unknown
# mount
rootfs on / type rootfs (rw)
/dev/root on / type ext2 (rw,relatime,errors=continue)
/dev/mtdblock2 on /mnt/sys_data type jffs2 (rw,relatime)
proc on /proc type proc (rw,relatime)
sysfs on /sys type sysfs (rw,relatime)
tmpfs on /opt/upgrade type tmpfs (rw,relatime)
devpts on /dev/pts type devpts (rw,relatime,gid=5,mode=620)
/dev/vg_log/lv_log on /var/log type reiserfs (rw,relatime)
/dev/Pool_md1/SS1 on /mnt/Pool_md1/SS1 type xfs (rw,relatime,attr2,inode64,noquota)
/dev/Pool_md2/SS2 on /mnt/Pool_md2/SS2 type xfs (rw,relatime,attr2,inode64,noquota)
root@Storage:/var/log# df -h
Filesystem                Size      Used Available Use% Mounted on
/dev/root               124.0M     72.6M     51.4M  59% /
/dev/mtdblock2            1.0M    408.0K    616.0K  40% /mnt/sys_data
tmpfs                   505.3M         0    505.3M   0% /opt/upgrade
/dev/vg_log/lv_log       10.0G     32.4M     10.0G   0% /var/log
/dev/Pool_md1/SS1         2.7T    270.2G      2.5T  10% /mnt/Pool_md1/SS1
/dev/Pool_md2/SS2         2.7T    344.0G      2.4T  12% /mnt/Pool_md2/SS2

From the assembly code, I noticed that the local variable 'stat' does
not have real space on the stack; it is optimised into a register (r28).
According to the PowerPC ABI, some registers are saved on the stack when
xfs_btree_rshift is called and restored before it returns. Is it
possible that the smash occurred at this point?

BTW, I noticed that my cross-compiler "powerpc-linux-gnuspe-gcc"
defaults to 8-byte alignment rather than 4-byte alignment.

> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
--
The simplest is not all best but the best is surely the simplest!
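
The question raised in the message above, whether a value kept in a
register can come back changed after a call because its spill slot on
the stack was overwritten, can be modelled in user space. The sketch
below is only a model under stated assumptions: `struct model_frame` and
`model_call` are hypothetical, the layout is not the real PowerPC stack
frame, and nothing here identifies the actual cause of the OOPS. It only
shows that a write running past a local buffer inside the callee's frame
changes the value the caller gets back in the "restored" register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a callee's stack frame: a local buffer followed
 * by the slot where the callee spills the caller's r28 on entry. */
struct model_frame {
	unsigned char local_buf[8];
	uint32_t saved_r28;
};

/* Model of a call: spill r28 into the frame (prologue), let the body
 * write n bytes starting at local_buf, then reload r28 (epilogue).
 * Returns the value the caller would see in r28 afterwards. */
static uint32_t model_call(uint32_t r28, const unsigned char *src, size_t n)
{
	struct model_frame frame;

	assert(n <= sizeof(frame));	/* stay inside the modelled frame */

	frame.saved_r28 = r28;		/* prologue: save the register */
	memset(frame.local_buf, 0, sizeof(frame.local_buf));

	/* Body: local_buf is the first member, so it starts at offset 0.
	 * Any n above sizeof(local_buf) spills into saved_r28. */
	memcpy((unsigned char *)&frame, src, n);

	return frame.saved_r28;		/* epilogue: restore the register */
}
```

With an 8-byte write the modelled register survives the call; with a
12-byte write it comes back holding the overrun data, even though the
caller never stored the value on its own stack.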
end of thread, other threads:[~2010-01-13 1:10 UTC | newest]
Thread overview: 12+ messages
2009-12-09 1:58 [BUG report]xfs_btree_make_block_unfull generated an OOPS hank peng
2009-12-09 2:57 ` Eric Sandeen
2009-12-09 3:18 ` hank peng
[not found] ` <4B1F18C4.3060704@sandeen.net>
[not found] ` <389deec70912082053v4310057dg479f6d4b6c4b46f7@mail.gmail.com>
[not found] ` <4B1F31FD.3020705@sandeen.net>
[not found] ` <389deec70912082220pcb3b5d1q516ac197d31502c5@mail.gmail.com>
[not found] ` <389deec70912082230g38987576pc48d7699f23844c5@mail.gmail.com>
[not found] ` <389deec70912140119q40ed91cao62fe9c9ebdf13601@mail.gmail.com>
2009-12-14 15:56 ` Eric Sandeen
2009-12-15 0:49 ` hank peng
2009-12-15 0:58 ` hank peng
2009-12-15 1:26 ` Dave Chinner
2009-12-15 1:56 ` hank peng
2009-12-15 3:15 ` Eric Sandeen
2009-12-15 3:22 ` hank peng
2009-12-15 5:36 ` hank peng
2010-01-13 1:11 ` hank peng