* [BUG report]xfs_btree_make_block_unfull generated an OOPS
@ 2009-12-09 1:58 hank peng
2009-12-09 2:57 ` Eric Sandeen
0 siblings, 1 reply; 12+ messages in thread
From: hank peng @ 2009-12-09 1:58 UTC (permalink / raw)
To: linux-xfs
Hi all,
I think this is a bug, so I am reporting it here.
root@1234dahua:~# uname -a
Linux 1234dahua 2.6.31.6 #14 Tue Dec 8 16:48:40 CST 2009 ppc unknown
Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xc019ea28
Oops: Kernel access of bad area, sig: 11 [#1]
MPC85xx CDS
Modules linked in:
NIP: c019ea28 LR: c019ea00 CTR: 00000000
REGS: e233baf0 TRAP: 0300 Not tainted (2.6.31.6)
MSR: 00029000 <EE,ME,CE> CR: 22008484 XER: 00000000
DEAR: 00000000, ESR: 00800000
TASK = e8add2c0[21249] 'SS_Server' THREAD: e233a000
GPR00: 000001a4 e233bba0 e8add2c0 00000000 00000000 00000000 00000001 00000000
GPR08: c0e22478 e20137b8 c0e2247c 000001a4 22008422 1016d410 3fff5400 100a0000
GPR16: 100ce108 00000000 006398bb e20137b8 c019c58c 00029000 e233bc5c c0186bd0
GPR24: c019c568 00000000 22008424 e233bc08 00000000 e233bc58 00000000 e20137b8
NIP [c019ea28] xfs_btree_make_block_unfull+0xc4/0x1b0
LR [c019ea00] xfs_btree_make_block_unfull+0x9c/0x1b0
Call Trace:
[e233bba0] [c019ea00] xfs_btree_make_block_unfull+0x9c/0x1b0 (unreliable)
[e233bbe0] [c019ee88] xfs_btree_insrec+0x374/0x4b0
[e233bc50] [c019f040] xfs_btree_insert+0x7c/0x1c0
[e233bcb0] [c0185ad0] xfs_free_ag_extent+0x34c/0x810
[e233bd20] [c0186668] xfs_free_extent+0xdc/0x104
[e233bdb0] [c018f350] xfs_bmap_finish+0x154/0x1a0
[e233bde0] [c01b5e34] xfs_itruncate_finish+0x254/0x3b8
[e233be60] [c01d033c] xfs_free_eofblocks+0x254/0x29c
[e233bee0] [c01d9ba8] xfs_file_release+0x14/0x28
[e233bef0] [c009574c] __fput+0xe8/0x1dc
[e233bf10] [c0092048] filp_close+0x70/0xb0
[e233bf30] [c009211c] sys_close+0x94/0xc0
[e233bf40] [c000f784] ret_from_syscall+0x0/0x3c
Instruction dump:
7fa5eb78 4bffdf59 7c7c1b79 40a2ffd4 801d0000 2f800000 419e0064 57c9103a
7f83e378 7d29fa14 80090050 90170000 <90190000> 80010044 bae1001c 38210040
---[ end trace 069fbb7d042289d2 ]---
Following the call trace above, I checked the source code and found
that the fault appears to be triggered in the xfs_btree_make_block_unfull
function in fs/xfs/xfs_btree.c:
2641
2642 if (*stat) {
2643 *oindex = *index = cur->bc_ptrs[level];
2644 return 0;
2645 }
Here oindex is NULL, so the oops occurred. I am not an XFS hacker, so I hope
someone can help me fix this bug. If you need more information, let me
know.
--
The simplest is not all best but the best is surely the simplest!
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: [BUG report]xfs_btree_make_block_unfull generated an OOPS
2009-12-09 1:58 [BUG report]xfs_btree_make_block_unfull generated an OOPS hank peng
@ 2009-12-09 2:57 ` Eric Sandeen
2009-12-09 3:18 ` hank peng
0 siblings, 1 reply; 12+ messages in thread
From: Eric Sandeen @ 2009-12-09 2:57 UTC (permalink / raw)
To: hank peng; +Cc: linux-xfs
hank peng wrote:
> Hi all,
> I think this is a bug, so I am reporting it here.
> root@1234dahua:~# uname -a
> Linux 1234dahua 2.6.31.6 #14 Tue Dec 8 16:48:40 CST 2009 ppc unknown
>
> Unable to handle kernel paging request for data at address 0x00000000
> Faulting instruction address: 0xc019ea28
> Oops: Kernel access of bad area, sig: 11 [#1]
> MPC85xx CDS
> Modules linked in:
> NIP: c019ea28 LR: c019ea00 CTR: 00000000
> REGS: e233baf0 TRAP: 0300 Not tainted (2.6.31.6)
> MSR: 00029000 <EE,ME,CE> CR: 22008484 XER: 00000000
> DEAR: 00000000, ESR: 00800000
> TASK = e8add2c0[21249] 'SS_Server' THREAD: e233a000
> GPR00: 000001a4 e233bba0 e8add2c0 00000000 00000000 00000000 00000001 00000000
> GPR08: c0e22478 e20137b8 c0e2247c 000001a4 22008422 1016d410 3fff5400 100a0000
> GPR16: 100ce108 00000000 006398bb e20137b8 c019c58c 00029000 e233bc5c c0186bd0
> GPR24: c019c568 00000000 22008424 e233bc08 00000000 e233bc58 00000000 e20137b8
> NIP [c019ea28] xfs_btree_make_block_unfull+0xc4/0x1b0
huh, I don't think I've ever seen an oops here, nor has kerneloops.org.
I wonder how you managed this ... :)
> LR [c019ea00] xfs_btree_make_block_unfull+0x9c/0x1b0
> Call Trace:
> [e233bba0] [c019ea00] xfs_btree_make_block_unfull+0x9c/0x1b0 (unreliable)
> [e233bbe0] [c019ee88] xfs_btree_insrec+0x374/0x4b0
> [e233bc50] [c019f040] xfs_btree_insert+0x7c/0x1c0
> [e233bcb0] [c0185ad0] xfs_free_ag_extent+0x34c/0x810
so this is freeing blocks and adding them to the freespace btrees;
it needs to move entries out of a block to make room for the new one.
Not a terribly unusual operation, I think.
> [e233bd20] [c0186668] xfs_free_extent+0xdc/0x104
> [e233bdb0] [c018f350] xfs_bmap_finish+0x154/0x1a0
> [e233bde0] [c01b5e34] xfs_itruncate_finish+0x254/0x3b8
> [e233be60] [c01d033c] xfs_free_eofblocks+0x254/0x29c
> [e233bee0] [c01d9ba8] xfs_file_release+0x14/0x28
> [e233bef0] [c009574c] __fput+0xe8/0x1dc
> [e233bf10] [c0092048] filp_close+0x70/0xb0
> [e233bf30] [c009211c] sys_close+0x94/0xc0
> [e233bf40] [c000f784] ret_from_syscall+0x0/0x3c
> Instruction dump:
> 7fa5eb78 4bffdf59 7c7c1b79 40a2ffd4 801d0000 2f800000 419e0064 57c9103a
> 7f83e378 7d29fa14 80090050 90170000 <90190000> 80010044 bae1001c 38210040
> ---[ end trace 069fbb7d042289d2 ]---
>
> Following the call trace above, I checked the source code and found
> that the fault appears to be triggered in the xfs_btree_make_block_unfull
> function in fs/xfs/xfs_btree.c:
>
> 2641
> 2642 if (*stat) {
> 2643 *oindex = *index = cur->bc_ptrs[level];
> 2644 return 0;
> 2645 }
>
> Here oindex is NULL, so the oops occurred. I am not an XFS hacker, so I hope
> someone can help me fix this bug. If you need more information, let me
> know.
Is the above from gdb? You're quite certain that this is the case,
or is this a guess?
It seems a little unlikely because in the calling function:
	int			optr;	/* old key/record index */
	int			ptr;	/* key/record index */
	// .... code code code ...
	if (numrecs == cur->bc_ops->get_maxrecs(cur, level)) {
		error = xfs_btree_make_block_unfull(cur, level, numrecs,
					&optr, &ptr, &nptr, &ncur, &nrec, stat);
We're just sending in the addresses of these local variables;
I don't see how these pointers could be NULL.
Thanks,
-Eric
* Re: [BUG report]xfs_btree_make_block_unfull generated an OOPS
2009-12-09 2:57 ` Eric Sandeen
@ 2009-12-09 3:18 ` hank peng
[not found] ` <4B1F18C4.3060704@sandeen.net>
0 siblings, 1 reply; 12+ messages in thread
From: hank peng @ 2009-12-09 3:18 UTC (permalink / raw)
To: Eric Sandeen; +Cc: linux-xfs
2009/12/9 Eric Sandeen <sandeen@sandeen.net>:
> Is the above from gdb? You're quite certain that this is the case,
> or is this a guess?
>
> It seems a little unlikely because in the calling function:
>
> int optr; /* old key/record index */
> int ptr; /* key/record index */
>
> // .... code code code ...
>
> if (numrecs == cur->bc_ops->get_maxrecs(cur, level)) {
> error = xfs_btree_make_block_unfull(cur, level, numrecs,
> &optr, &ptr, &nptr, &ncur, &nrec, stat);
>
> We're just sending in the addresses of these local variables;
> I don't see how these pointers could be NULL.
>
Thanks for your reply.
I drew this conclusion from the disassembly; correct me if I am wrong.
#powerpc-linux-gnuspe-objdump vmlinux | less
<snip>
c019e964 <xfs_btree_make_block_unfull>:
c019e964: 94 21 ff c0 stwu r1,-64(r1)
c019e968: 7c 08 02 a6 mflr r0
c019e96c: be e1 00 1c stmw r23,28(r1)
c019e970: 7c 7f 1b 78 mr r31,r3
c019e974: 90 01 00 44 stw r0,68(r1)
c019e978: 7c bc 2b 78 mr r28,r5
c019e97c: 7c d9 33 78 mr r25,r6 <I think r6 holds the value of oindex here>
c019e980: 83 a1 00 48 lwz r29,72(r1)
c019e984: 7c f7 3b 78 mr r23,r7
c019e988: 80 03 00 0c lwz r0,12(r3)
c019e98c: 7d 1b 43 78 mr r27,r8
c019e990: 7d 3a 4b 78 mr r26,r9
c019e994: 7d 58 53 78 mr r24,r10
c019e998: 7c 9e 23 78 mr r30,r4
c019e99c: 70 0b 00 02 andi. r11,r0,2
c019e9a0: 41 82 00 14 beq- c019e9b4 <xfs_btree_make_block_unfull+0x50>
c019e9a4: 89 23 00 78 lbz r9,120(r3)
c019e9a8: 39 29 ff ff addi r9,r9,-1
c019e9ac: 7f 84 48 00 cmpw cr7,r4,r9
c019e9b0: 41 9e 00 90 beq- cr7,c019ea40 <xfs_btree_make_block_unfull+0xdc>
c019e9b4: 7f e3 fb 78 mr r3,r31
c019e9b8: 7f c4 f3 78 mr r4,r30
c019e9bc: 7f a5 eb 78 mr r5,r29
c019e9c0: 4b ff db 39 bl c019c4f8 <xfs_btree_rshift>
c019e9c4: 7c 7c 1b 79 mr. r28,r3
c019e9c8: 40 82 00 10 bne- c019e9d8 <xfs_btree_make_block_unfull+0x74>
c019e9cc: 80 1d 00 00 lwz r0,0(r29)
c019e9d0: 2f 80 00 00 cmpwi cr7,r0,0
c019e9d4: 41 9e 00 1c beq- cr7,c019e9f0 <xfs_btree_make_block_unfull+0x8c>
c019e9d8: 80 01 00 44 lwz r0,68(r1)
c019e9dc: 7f 83 e3 78 mr r3,r28
c019e9e0: ba e1 00 1c lmw r23,28(r1)
c019e9e4: 38 21 00 40 addi r1,r1,64
c019e9e8: 7c 08 03 a6 mtlr r0
c019e9ec: 4e 80 00 20 blr
c019e9f0: 7f e3 fb 78 mr r3,r31
c019e9f4: 7f c4 f3 78 mr r4,r30
c019e9f8: 7f a5 eb 78 mr r5,r29
c019e9fc: 4b ff df 59 bl c019c954 <xfs_btree_lshift>
c019ea00: 7c 7c 1b 79 mr. r28,r3
c019ea04: 40 a2 ff d4 bne- c019e9d8 <xfs_btree_make_block_unfull+0x74>
c019ea08: 80 1d 00 00 lwz r0,0(r29)
c019ea0c: 2f 80 00 00 cmpwi cr7,r0,0
c019ea10: 41 9e 00 64 beq- cr7,c019ea74 <xfs_btree_make_block_unfull+0x110>
c019ea14: 57 c9 10 3a rlwinm r9,r30,2,0,29
c019ea18: 7f 83 e3 78 mr r3,r28
c019ea1c: 7d 29 fa 14 add r9,r9,r31
c019ea20: 80 09 00 50 lwz r0,80(r9)
c019ea24: 90 17 00 00 stw r0,0(r23)
c019ea28: 90 19 00 00 stw r0,0(r25) <oops occurred here>
c019ea2c: 80 01 00 44 lwz r0,68(r1)
c019ea30: ba e1 00 1c lmw r23,28(r1)
c019ea34: 38 21 00 40 addi r1,r1,64
c019ea38: 7c 08 03 a6 mtlr r0
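As a cross-check on this reading of the listing, the instruction words
around the fault can be decoded by hand. This is only a minimal sketch,
assuming the standard PowerPC D-form layout (6-bit primary opcode, 5-bit
RT/RS, 5-bit RA, 16-bit signed displacement). The struct and function
names are made up for illustration and are not part of the kernel or
binutils.

```c
#include <stdint.h>

/* Fields of a PowerPC D-form instruction (hypothetical helper type). */
struct dform {
	unsigned opcode;	/* primary opcode: 32 = lwz, 36 = stw, 37 = stwu */
	unsigned rt;		/* RT for loads, RS for stores */
	unsigned ra;		/* base register */
	int disp;		/* sign-extended 16-bit displacement */
};

/* Split a 32-bit instruction word into its D-form fields. */
static struct dform decode_dform(uint32_t insn)
{
	struct dform d;

	d.opcode = insn >> 26;
	d.rt = (insn >> 21) & 0x1f;
	d.ra = (insn >> 16) & 0x1f;
	d.disp = (int16_t)(insn & 0xffff);
	return d;
}
```

Decoding 0x90190000, the word marked <90190000> in the instruction dump,
gives opcode 36 with RS = 0, RA = 25 and displacement 0, that is
stw r0,0(r25). With r25 = 00000000 in the register dump, the store
targets address 0, which matches the DEAR value.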
> Thanks,
> -Eric
>
--
The simplest is not all best but the best is surely the simplest!
* Re: [BUG report]xfs_btree_make_block_unfull generated an OOPS
[not found] ` <389deec70912140119q40ed91cao62fe9c9ebdf13601@mail.gmail.com>
@ 2009-12-14 15:56 ` Eric Sandeen
2009-12-15 0:49 ` hank peng
0 siblings, 1 reply; 12+ messages in thread
From: Eric Sandeen @ 2009-12-14 15:56 UTC (permalink / raw)
To: hank peng, xfs-oss
hank peng wrote:
> Hi Eric,
> I think I have found the cause of this problem, but I need a little help from you.
> We tested it again and the same oops occurred:
Ok, let's keep this on the list please ...
> Unable to handle kernel paging request for data at address 0x00000000
> Faulting instruction address: 0xc019f4b8
> Oops: Kernel access of bad area, sig: 11 [#1]
> MPC85xx CDS
> Modules linked in:
> NIP: c019f4b8 LR: c019f490 CTR: 00000000
> REGS: ef965af0 TRAP: 0300 Not tainted (2.6.31.6-svn40)
> MSR: 00029000 <EE,ME,CE> CR: 22008284 XER: 00000000
> DEAR: 00000000, ESR: 00800000
> TASK = e8a56580[3450] 'SS_Server' THREAD: ef964000
> GPR00: 000001fd ef965ba0 e8a56580 00000000 00000000 00000001 00000001 00000001
> GPR08: e8fa10e8 e8fa1a18 e8fa10f0 000001fd 22008222 1016d410 3fff5400 100a0000
> GPR16: 100d2408 00000000 00000000 d42b12f8 c019d01c 00029000 ef965c5c c0187660
> GPR24: c019cff8 00000000 22008224 ef965c08 00000000 ef965c58 00000000 e8fa1a18
> NIP [c019f4b8] xfs_btree_make_block_unfull+0xc4/0x1b0
> LR [c019f490] xfs_btree_make_block_unfull+0x9c/0x1b0
> Call Trace:
> [ef965ba0] [c019f490] xfs_btree_make_block_unfull+0x9c/0x1b0 (unreliable)
> [ef965be0] [c019f918] xfs_btree_insrec+0x374/0x4b0
> [ef965c50] [c019fad0] xfs_btree_insert+0x7c/0x1c0
> [ef965cb0] [c018661c] xfs_free_ag_extent+0x408/0x810
> [ef965d20] [c01870f8] xfs_free_extent+0xdc/0x104
> [ef965db0] [c018fde0] xfs_bmap_finish+0x154/0x1a0
> [ef965de0] [c01b68c4] xfs_itruncate_finish+0x254/0x3b8
> [ef965e60] [c01d0dcc] xfs_free_eofblocks+0x254/0x29c
> [ef965ee0] [c01da638] xfs_file_release+0x14/0x28
> [ef965ef0] [c009574c] __fput+0xe8/0x1dc
> [ef965f10] [c0092048] filp_close+0x70/0xb0
> [ef965f30] [c009211c] sys_close+0x94/0xc0
> [ef965f40] [c000f784] ret_from_syscall+0x0/0x3c
> Instruction dump:
> 7fa5eb78 4bffdf59 7c7c1b79 40a2ffd4 801d0000 2f800000 419e0064 57c9103a
> 7f83e378 7d29fa14 80090050 90170000 <90190000> 80010044 bae1001c 38210040
> ---[ end trace 356726176eeecd9c ]---
> Oops: Exception in kernel mode, sig: 4 [#2]
> MPC85xx CDS
> Modules linked in:
> NIP: c0187660 LR: c019b26c CTR: c0187660
> REGS: d42076a0 TRAP: 0700 Tainted: G D (2.6.31.6-svn40)
> MSR: 00029000 <EE,ME,CE> CR: 22222082 XER: 00000000
> TASK = e08a6ee0[8533] 'pdflush' THREAD: d4206000
> GPR00: 00000004 d4207750 e08a6ee0 d42b1098 00000001 00000001 e8e97d80 00000003
> GPR08: c2c65300 c0187660 41425443 41425443 00001000 1001a1c4 c01842f8 00000001
> GPR16: d4207880 d42077e0 00000002 d42077d8 d42077e0 d42077e8 d42b10ec 00000001
> GPR24: c0486be0 00000000 d42b1098 09c40000 c019b4f0 00000011 e88bf000 d4207750
> NIP [c0187660] xfs_allocbt_get_maxrecs+0x0/0x20
> LR [c019b26c] xfs_btree_check_sblock+0xb0/0xf8
> Call Trace:
> [d4207770] [c019b4f0] xfs_btree_read_buf_block+0x8c/0xb8
> [d42077a0] [c019b5a8] xfs_btree_lookup_get_block+0x8c/0xfc
> [d42077d0] [c019c638] xfs_btree_lookup+0x124/0x3fc
> [d4207850] [c01842f8] xfs_alloc_lookup_ge+0x20/0x30
> [d4207860] [c0185828] xfs_alloc_ag_vextent_near+0x60/0xa4c
> [d42078e0] [c0186af4] xfs_alloc_ag_vextent+0xd0/0x168
> [d4207900] [c01873f0] xfs_alloc_vextent+0x2d0/0x524
> [d4207940] [c01940fc] xfs_bmap_btalloc+0x274/0xa60
> [d4207a00] [c01988bc] xfs_bmapi+0xb30/0x10dc
> [d4207b40] [c01bb190] xfs_iomap_write_allocate+0x11c/0x450
> [d4207c00] [c01bc2e8] xfs_iomap+0x320/0x35c
> [d4207c80] [c01d5d5c] xfs_map_blocks+0x2c/0x40
> [d4207ca0] [c01d6dc0] xfs_page_state_convert+0x2e8/0x744
> [d4207d60] [c01d7384] xfs_vm_writepage+0x7c/0x128
> [d4207d90] [c006d740] __writepage+0x24/0x80
> [d4207da0] [c006db44] write_cache_pages+0x1e4/0x3a0
> [d4207e50] [c01d5e14] xfs_vm_writepages+0x24/0x34
> [d4207e60] [c006dd70] do_writepages+0x48/0x7c
> [d4207e70] [c00b2120] writeback_single_inode+0xf8/0x2e4
> [d4207ec0] [c00b2788] generic_sync_sb_inodes+0x280/0x398
> [d4207ef0] [c00b295c] writeback_inodes+0xb8/0xd4
> [d4207f10] [c006ece0] wb_kupdate+0xd4/0x154
> [d4207f70] [c006f3bc] pdflush+0xd4/0x1c4
> [d4207fc0] [c004c750] kthread+0x78/0x7c
> <...>
>
>
> There was another oops, which followed the first one.
After the first oops I think the rest is not interesting, things
are in bad shape by now.
> Please note that
> in the second oops a SIGILL was raised, and the address of the illegal
> instruction is 0xc0187660.
> In the first oops, look at the following registers:
>
> GPR00: 000001fd ef965ba0 e8a56580 00000000 00000000 00000001 00000001 00000001
> GPR08: e8fa10e8 e8fa1a18 e8fa10f0 000001fd 22008222 1016d410 3fff5400 100a0000
> GPR16: 100d2408 00000000 00000000 d42b12f8 c019d01c 00029000 ef965c5c c0187660
> GPR24: c019cff8 00000000 22008224 ef965c08 00000000 ef965c58 00000000 e8fa1a18
>
> I noticed that the value of r23 is also 0xc0187660. I know a little
> PowerPC assembly; if I am not wrong,
> '*oindex = *index = cur->bc_ptrs[level];' in fs/xfs/xfs_btree.c was
> compiled into the following code, which I sent you earlier:
> 80 09 00 50 lwz r0,80(r9)
> 90 17 00 00 stw r0,0(r23)
> 90 19 00 00 stw r0,0(r25) <oops occurred here>
>
> So r23 should have held the address of index and should never have
> pointed to a code address, but it did. What's worse, the store through
> r23 overwrote the code at 0xc0187660, and the second oops followed immediately.
>
> Please correct my analysis if I am wrong.
> In addition, I think the problem may be caused by a stack overflow. What
> do you think?
>
>
Perhaps, but if this is the 2nd oops I think it is not worth investigating;
we need to figure out why the first one happened, and from that stack trace
I don't think you are close to overflowing...
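For reference, the headroom can be estimated from the first oops: THREAD
is ef964000 and the stack pointer (GPR01) is ef965ba0. The sketch below
assumes an 8 KiB kernel stack (THREAD_SHIFT = 13), which is an assumption
about this configuration and not something shown in the report. The
helper names are hypothetical, and the figures ignore the thread_info
structure that sits at the base of the stack.

```c
#include <stdint.h>

/* Assumed kernel stack size: 8 KiB (THREAD_SHIFT = 13); check your config. */
#define ASSUMED_THREAD_SIZE 0x2000u

/*
 * Bytes between the stack base (the THREAD value) and the stack pointer.
 * The stack grows down, so this is the space still unused at oops time.
 */
static uint32_t stack_headroom(uint32_t thread_base, uint32_t sp)
{
	return sp - thread_base;
}

/* Bytes already consumed, counted down from the top of the stack area. */
static uint32_t stack_used(uint32_t thread_base, uint32_t sp)
{
	return thread_base + ASSUMED_THREAD_SIZE - sp;
}
```

With these inputs the stack has used 0x460 (1120) bytes and still has
0x1ba0 (7072) bytes free. That only describes the moment of the oops; a
deeper call chain earlier in the same task would not show up here.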
-eric
>
> 2009/12/9 hank peng <pengxihan@gmail.com>:
>> 2009/12/9 hank peng <pengxihan@gmail.com>:
>>> 2009/12/9 Eric Sandeen <sandeen@sandeen.net>:
>>>> hank peng wrote:
>>>>> 2009/12/9 Eric Sandeen <sandeen@sandeen.net>:
>>>>>> hank peng wrote:
>>>>>>
>>>>>>> Thanks for your reply.
>>>>>>>
>>>>>>> I drew this conclusion from the disassembly; correct me if I am wrong.
>>>>>>> #powerpc-linux-gnuspe-objdump vmlinux | less
>>>>>>> <snip>
>>>>>> (off list; if this works maybe you can reply on-list?)
>>>>>>
>>>>>> Could you use gdb to look? Maybe:
>>>>>>
>>>>>> (gdb) list *xfs_btree_make_block_unfull+0xc4
>>>>>>
>>>>> I used gdb on my PC and got this:
>>>>>
>>>>> [root@localhost linux-2.6.31.6]# gdb vmlinux
>>>>> GNU gdb Red Hat Linux (6.5-37.el5rh)
>>>>> Copyright (C) 2006 Free Software Foundation, Inc.
>>>>> GDB is free software, covered by the GNU General Public License, and you are
>>>>> welcome to change it and/or distribute copies of it under certain conditions.
>>>>> Type "show copying" to see the conditions.
>>>>> There is absolutely no warranty for GDB. Type "show warranty" for details.
>>>>> This GDB was configured as "i386-redhat-linux-gnu"...Using host
>>>>> libthread_db library "/lib/libthread_db.so.1".
>>>>>
>>>>> (gdb) list *xfs_btree_make_block_unfull+0xc4
>>>>> No source file for address 0xc019ea28.
>>>>> (gdb)
>>>>>
>>>>>> -Eric
>>>> so I guess it is not built with debugging symbols perhaps?
>>>>
>>>> Try rebuilding it with CONFIG_DEBUG_INFO on maybe?
>>>>
>>> Yes, you are right. Now I get this result:
>>> (gdb) l *xfs_btree_make_block_unfull+0xc4
>>> 0xc019ea30 is in xfs_btree_make_block_unfull (fs/xfs/xfs_btree.c:2643).
>>> 2638 error = xfs_btree_lshift(cur, level, stat);
>>> 2639 if (error)
>>> 2640 return error;
>>> 2641
>>> 2642 if (*stat) {
>>> 2643 *oindex = *index = cur->bc_ptrs[level];
>>> 2644 return 0;
>>> 2645 }
>>> 2646
>>> 2647 /*
>>>
>>> It indeed points to "*oindex = *index = cur->bc_ptrs[level];"
>>>
>> Very strange. As you said, xfs_btree_insrec passes the addresses of local
>> variables to xfs_btree_make_block_unfull, so it should be impossible for
>> oindex to be NULL.
>> Do you think it may be memory corruption?
>>>> -Eric
>>>>
>>>
>>>
>>> --
>>> The simplest is not all best but the best is surely the simplest!
>>>
>>
>>
>> --
>> The simplest is not all best but the best is surely the simplest!
>>
>
>
>
* Re: [BUG report]xfs_btree_make_block_unfull generated an OOPS
2009-12-14 15:56 ` Eric Sandeen
@ 2009-12-15 0:49 ` hank peng
2009-12-15 0:58 ` hank peng
2009-12-15 1:26 ` Dave Chinner
0 siblings, 2 replies; 12+ messages in thread
From: hank peng @ 2009-12-15 0:49 UTC (permalink / raw)
To: Eric Sandeen; +Cc: xfs-oss
Hi, Eric:
I added some debugging code like this:
	if (*stat) {
		printk("*stat = 0x%08x, oindex = %p, index = %p\n",
			*stat, oindex, index);
		if (oindex == NULL || index == NULL) {
			printk("BUG occured!\n");
			printk("oindex = %p, index = %p\n", oindex, index);
			BUG();
		}
		*oindex = *index = cur->bc_ptrs[level];
		return 0;
	}
The same oops happened again, with slightly different details. The kernel messages are:
<snip>
*stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
*stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
*stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
*stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
*stat = 0x00000001, oindex = 00000501, index = 22008424
Unable to handle kernel paging request for data at address 0x22008424
Faulting instruction address: 0xc019f568
Oops: Kernel access of bad area, sig: 11 [#1]
MPC85xx CDS
Modules linked in:
NIP: c019f568 LR: c019f54c CTR: c023f9f4
REGS: e87d7af0 TRAP: 0300 Not tainted (2.6.31.6-svn40)
MSR: 00029000 <EE,ME,CE> CR: 22008424 XER: 20000000
DEAR: 22008424, ESR: 00800000
TASK = efb03390[17279] 'SS_Server' THREAD: e87d6000
GPR00: 000001fd e87d7ba0 efb03390 0000003b 00031d91 ffffffff c023cfa4 00031d91
GPR08: c04a7c40 e84511c8 00031d91 00004000 20008482 1016d410 3fff5400 100a0000
GPR16: 100d0408 00000000 00000000 e8fa3558 c019d0ac 00029000 e87d7c5c c01876f0
GPR24: c019d088 00000000 22008424 00000000 00000501 e87d7c58 00000000 e84511c8
NIP [c019f568] xfs_btree_make_block_unfull+0xe4/0x1f4
LR [c019f54c] xfs_btree_make_block_unfull+0xc8/0x1f4
Call Trace:
[e87d7ba0] [c019f54c] xfs_btree_make_block_unfull+0xc8/0x1f4 (unreliable)
[e87d7be0] [c019f9ec] xfs_btree_insrec+0x374/0x4b0
[e87d7c50] [c019fba4] xfs_btree_insert+0x7c/0x1c0
[e87d7cb0] [c01866ac] xfs_free_ag_extent+0x408/0x810
[e87d7d20] [c0187188] xfs_free_extent+0xdc/0x104
[e87d7db0] [c018fe70] xfs_bmap_finish+0x154/0x1a0
[e87d7de0] [c01b6998] xfs_itruncate_finish+0x254/0x3b8
[e87d7e60] [c01d0ea0] xfs_free_eofblocks+0x254/0x29c
[e87d7ee0] [c01da70c] xfs_file_release+0x14/0x28
[e87d7ef0] [c00957dc] __fput+0xe8/0x1dc
[e87d7f10] [c00920d8] filp_close+0x70/0xb0
[e87d7f30] [c00921ac] sys_close+0x94/0xc0
[e87d7f40] [c000f7cc] ret_from_syscall+0x0/0x3c
Instruction dump:
7f85e378 3863ed7c 7f46d378 4cc63182 4be97ea1 2f9c0000 419e00f8 2f9a0000
419e00f0 57c9103a 7d29fa14 80090050 <901a0000> 901c0000 4bffff88 3b810010
---[ end trace f245b6a670339d8f ]---
</snip>
As you can see, the oops happened right after printing "*stat = 0x00000001,
oindex = 00000501, index = 22008424".
My BUG() was not triggered because neither pointer was NULL, but both held
garbage values, so the store still accessed a bad area.
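A NULL test cannot catch values like 00000501 or 22008424. A stricter
check, sketched here in plain C, would test the address range instead. It
assumes that both out-parameters must point into the current task's
kernel stack (they are addresses of locals in xfs_btree_insrec) and that
the stack is 8 KiB starting at the THREAD value. Both the helper name and
the stack size are assumptions for illustration, not existing kernel
code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed kernel stack size: 8 KiB (THREAD_SHIFT = 13); check your config. */
#define ASSUMED_THREAD_SIZE 0x2000u

/*
 * Return true if addr is 4-byte aligned and lies inside
 * [stack_base, stack_base + ASSUMED_THREAD_SIZE).
 */
static bool plausible_stack_ptr(uint32_t stack_base, uint32_t addr)
{
	if (addr & 3u)
		return false;
	return addr >= stack_base && addr - stack_base < ASSUMED_THREAD_SIZE;
}
```

With THREAD = e87d6000 from the oops above, the good values e87d7bf8 and
e87d7bfc pass this test, while 00000501 and 22008424 both fail it.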
--
The simplest is not all best but the best is surely the simplest!
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: [BUG report]xfs_btree_make_block_unfull generated an OOPS
2009-12-15 0:49 ` hank peng
@ 2009-12-15 0:58 ` hank peng
2009-12-15 1:26 ` Dave Chinner
1 sibling, 0 replies; 12+ messages in thread
From: hank peng @ 2009-12-15 0:58 UTC (permalink / raw)
To: Eric Sandeen; +Cc: xfs-oss
2009/12/15 hank peng <pengxihan@gmail.com>:
> Hi, Eric:
> I add some code like this:
> if (*stat) {
> printk("*stat = 0x%08x, oindex = %p, index = %p\n",
> *stat, oindex, index);
> if (oindex == NULL || index == NULL) {
> printk("BUG occured!\n");
> printk("oindex = %p, index = %p\n", oindex, index);
> BUG();
> }
> *oindex = *index = cur->bc_ptrs[level];
> return 0;
> }
>
> And the same OOPS happened again but a little different, kernel messages are:
>
> <snip>
> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
> *stat = 0x00000001, oindex = 00000501, index = 22008424
> Unable to handle kernel paging request for data at address 0x22008424
> Faulting instruction address: 0xc019f568
> Oops: Kernel access of bad area, sig: 11 [#1]
> MPC85xx CDS
> Modules linked in:
> NIP: c019f568 LR: c019f54c CTR: c023f9f4
> REGS: e87d7af0 TRAP: 0300 Not tainted (2.6.31.6-svn40)
> MSR: 00029000 <EE,ME,CE> CR: 22008424 XER: 20000000
> DEAR: 22008424, ESR: 00800000
> TASK = efb03390[17279] 'SS_Server' THREAD: e87d6000
> GPR00: 000001fd e87d7ba0 efb03390 0000003b 00031d91 ffffffff c023cfa4 00031d91
> GPR08: c04a7c40 e84511c8 00031d91 00004000 20008482 1016d410 3fff5400 100a0000
> GPR16: 100d0408 00000000 00000000 e8fa3558 c019d0ac 00029000 e87d7c5c c01876f0
> GPR24: c019d088 00000000 22008424 00000000 00000501 e87d7c58 00000000 e84511c8
> NIP [c019f568] xfs_btree_make_block_unfull+0xe4/0x1f4
> LR [c019f54c] xfs_btree_make_block_unfull+0xc8/0x1f4
> Call Trace:
> [e87d7ba0] [c019f54c] xfs_btree_make_block_unfull+0xc8/0x1f4 (unreliable)
> [e87d7be0] [c019f9ec] xfs_btree_insrec+0x374/0x4b0
> [e87d7c50] [c019fba4] xfs_btree_insert+0x7c/0x1c0
> [e87d7cb0] [c01866ac] xfs_free_ag_extent+0x408/0x810
> [e87d7d20] [c0187188] xfs_free_extent+0xdc/0x104
> [e87d7db0] [c018fe70] xfs_bmap_finish+0x154/0x1a0
> [e87d7de0] [c01b6998] xfs_itruncate_finish+0x254/0x3b8
> [e87d7e60] [c01d0ea0] xfs_free_eofblocks+0x254/0x29c
> [e87d7ee0] [c01da70c] xfs_file_release+0x14/0x28
> [e87d7ef0] [c00957dc] __fput+0xe8/0x1dc
> [e87d7f10] [c00920d8] filp_close+0x70/0xb0
> [e87d7f30] [c00921ac] sys_close+0x94/0xc0
> [e87d7f40] [c000f7cc] ret_from_syscall+0x0/0x3c
> Instruction dump:
> 7f85e378 3863ed7c 7f46d378 4cc63182 4be97ea1 2f9c0000 419e00f8 2f9a0000
> 419e00f0 57c9103a 7d29fa14 80090050 <901a0000> 901c0000 4bffff88 3b810010
> ---[ end trace f245b6a670339d8f ]---
> </snip>
>
> As you see, after printing "*stat = 0x00000001, oindex = 00000501,
> index = 22008424", OOPS happened.
> Although my BUG() was not invoked, it did access bad area.
>
This is what gdb shows:
(gdb) list *xfs_btree_make_block_unfull+0xe4
0xc019f568 is in xfs_btree_make_block_unfull (fs/xfs/xfs_btree.c:2650).
2645 if (oindex == NULL || index == NULL) {
2646 printk("BUG occured!\n");
2647 printk("oindex = %p, index = %p\n",
oindex, index);
2648 BUG();
2649 }
2650 *oindex = *index = cur->bc_ptrs[level];
/* why is it always here? */
2651 return 0;
2652 }
2653
2654 /*
(gdb)
Why do the pointers suddenly become abnormal? Is it memory corruption?
If so, why does this OOPS always occur at the same place?
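For anyone following the register analysis: the bracketed word in the instruction dump above is the faulting instruction, and it can be decoded by hand. The sketch below is a minimal userspace C decoder for PowerPC D-form loads and stores. The struct and function names are made up for this illustration and are not part of the kernel or of any toolchain, and only lwz and stw are given names. It assumes the standard D-form layout: a 6-bit primary opcode, two 5-bit register fields and a 16-bit signed displacement.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Fields of a PowerPC D-form instruction word (hypothetical helper type). */
struct dform {
	unsigned opcode;	/* primary opcode, the 6 most significant bits */
	unsigned rt;		/* target (load) or source (store) register */
	unsigned ra;		/* base address register */
	int16_t d;		/* signed 16-bit displacement */
};

/* Split a 32-bit instruction word into its D-form fields. */
static struct dform decode_dform(uint32_t insn)
{
	struct dform f;

	f.opcode = insn >> 26;
	f.rt = (insn >> 21) & 0x1f;
	f.ra = (insn >> 16) & 0x1f;
	f.d = (int16_t)(insn & 0xffff);
	return f;
}

/* Render the word as text; opcodes other than lwz (32) and stw (36) print as "?". */
static const char *format_dform(uint32_t insn, char *buf, size_t len)
{
	struct dform f = decode_dform(insn);
	const char *name = f.opcode == 32 ? "lwz" :
			   f.opcode == 36 ? "stw" : "?";

	assert(buf != NULL && len > 0);
	snprintf(buf, len, "%s r%u,%d(r%u)", name, f.rt, (int)f.d, f.ra);
	return buf;
}
```

Fed the words 80090050, 901a0000 and 901c0000 from the dump, this gives lwz r0,80(r9), stw r0,0(r26) and stw r0,0(r28). In the register dump r26 holds 22008424 and r28 holds 00000501, the same values that printk reported for index and oindex, and DEAR matches r26.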
>
>
> 2009/12/14 Eric Sandeen <sandeen@sandeen.net>:
>> hank peng wrote:
>>> Hi,Eric:
>>> I think I have found the reason for this problem, but I need a little help from you.
>>> We have tested it again, and the same OOPS occurred again:
>>
>> Ok, let's keep this on the list please ...
>>
>>> Unable to handle kernel paging request for data at address 0x00000000
>>> Faulting instruction address: 0xc019f4b8
>>> Oops: Kernel access of bad area, sig: 11 [#1]
>>> MPC85xx CDS
>>> Modules linked in:
>>> NIP: c019f4b8 LR: c019f490 CTR: 00000000
>>> REGS: ef965af0 TRAP: 0300 Not tainted (2.6.31.6-svn40)
>>> MSR: 00029000 <EE,ME,CE> CR: 22008284 XER: 00000000
>>> DEAR: 00000000, ESR: 00800000
>>> TASK = e8a56580[3450] 'SS_Server' THREAD: ef964000
>>> GPR00: 000001fd ef965ba0 e8a56580 00000000 00000000 00000001 00000001 00000001
>>> GPR08: e8fa10e8 e8fa1a18 e8fa10f0 000001fd 22008222 1016d410 3fff5400 100a0000
>>> GPR16: 100d2408 00000000 00000000 d42b12f8 c019d01c 00029000 ef965c5c c0187660
>>> GPR24: c019cff8 00000000 22008224 ef965c08 00000000 ef965c58 00000000 e8fa1a18
>>> NIP [c019f4b8] xfs_btree_make_block_unfull+0xc4/0x1b0
>>> LR [c019f490] xfs_btree_make_block_unfull+0x9c/0x1b0
>>> Call Trace:
>>> [ef965ba0] [c019f490] xfs_btree_make_block_unfull+0x9c/0x1b0 (unreliable)
>>> [ef965be0] [c019f918] xfs_btree_insrec+0x374/0x4b0
>>> [ef965c50] [c019fad0] xfs_btree_insert+0x7c/0x1c0
>>> [ef965cb0] [c018661c] xfs_free_ag_extent+0x408/0x810
>>> [ef965d20] [c01870f8] xfs_free_extent+0xdc/0x104
>>> [ef965db0] [c018fde0] xfs_bmap_finish+0x154/0x1a0
>>> [ef965de0] [c01b68c4] xfs_itruncate_finish+0x254/0x3b8
>>> [ef965e60] [c01d0dcc] xfs_free_eofblocks+0x254/0x29c
>>> [ef965ee0] [c01da638] xfs_file_release+0x14/0x28
>>> [ef965ef0] [c009574c] __fput+0xe8/0x1dc
>>> [ef965f10] [c0092048] filp_close+0x70/0xb0
>>> [ef965f30] [c009211c] sys_close+0x94/0xc0
>>> [ef965f40] [c000f784] ret_from_syscall+0x0/0x3c
>>> Instruction dump:
>>> 7fa5eb78 4bffdf59 7c7c1b79 40a2ffd4 801d0000 2f800000 419e0064 57c9103a
>>> 7f83e378 7d29fa14 80090050 90170000 <90190000> 80010044 bae1001c 38210040
>>> ---[ end trace 356726176eeecd9c ]---
>>> Oops: Exception in kernel mode, sig: 4 [#2]
>>> MPC85xx CDS
>>> Modules linked in:
>>> NIP: c0187660 LR: c019b26c CTR: c0187660
>>> REGS: d42076a0 TRAP: 0700 Tainted: G D (2.6.31.6-svn40)
>>> MSR: 00029000 <EE,ME,CE> CR: 22222082 XER: 00000000
>>> TASK = e08a6ee0[8533] 'pdflush' THREAD: d4206000
>>> GPR00: 00000004 d4207750 e08a6ee0 d42b1098 00000001 00000001 e8e97d80 00000003
>>> GPR08: c2c65300 c0187660 41425443 41425443 00001000 1001a1c4 c01842f8 00000001
>>> GPR16: d4207880 d42077e0 00000002 d42077d8 d42077e0 d42077e8 d42b10ec 00000001
>>> GPR24: c0486be0 00000000 d42b1098 09c40000 c019b4f0 00000011 e88bf000 d4207750
>>> NIP [c0187660] xfs_allocbt_get_maxrecs+0x0/0x20
>>> LR [c019b26c] xfs_btree_check_sblock+0xb0/0xf8
>>> Call Trace:
>>> [d4207770] [c019b4f0] xfs_btree_read_buf_block+0x8c/0xb8
>>> [d42077a0] [c019b5a8] xfs_btree_lookup_get_block+0x8c/0xfc
>>> [d42077d0] [c019c638] xfs_btree_lookup+0x124/0x3fc
>>> [d4207850] [c01842f8] xfs_alloc_lookup_ge+0x20/0x30
>>> [d4207860] [c0185828] xfs_alloc_ag_vextent_near+0x60/0xa4c
>>> [d42078e0] [c0186af4] xfs_alloc_ag_vextent+0xd0/0x168
>>> [d4207900] [c01873f0] xfs_alloc_vextent+0x2d0/0x524
>>> [d4207940] [c01940fc] xfs_bmap_btalloc+0x274/0xa60
>>> [d4207a00] [c01988bc] xfs_bmapi+0xb30/0x10dc
>>> [d4207b40] [c01bb190] xfs_iomap_write_allocate+0x11c/0x450
>>> [d4207c00] [c01bc2e8] xfs_iomap+0x320/0x35c
>>> [d4207c80] [c01d5d5c] xfs_map_blocks+0x2c/0x40
>>> [d4207ca0] [c01d6dc0] xfs_page_state_convert+0x2e8/0x744
>>> [d4207d60] [c01d7384] xfs_vm_writepage+0x7c/0x128
>>> [d4207d90] [c006d740] __writepage+0x24/0x80
>>> [d4207da0] [c006db44] write_cache_pages+0x1e4/0x3a0
>>> [d4207e50] [c01d5e14] xfs_vm_writepages+0x24/0x34
>>> [d4207e60] [c006dd70] do_writepages+0x48/0x7c
>>> [d4207e70] [c00b2120] writeback_single_inode+0xf8/0x2e4
>>> [d4207ec0] [c00b2788] generic_sync_sb_inodes+0x280/0x398
>>> [d4207ef0] [c00b295c] writeback_inodes+0xb8/0xd4
>>> [d4207f10] [c006ece0] wb_kupdate+0xd4/0x154
>>> [d4207f70] [c006f3bc] pdflush+0xd4/0x1c4
>>> [d4207fc0] [c004c750] kthread+0x78/0x7c
>>> <...>
>>>
>>>
>>> There was another OOPS, which followed the first one.
>>
>> After the first oops I think the rest is not interesting, things
>> are in bad shape by now.
>>
>>> Please note that in the second OOPS, a SIGILL was raised and the
>>> address of the illegal instruction is 0xc0187660.
>>> In the first OOPS, look at the following registers:
>>>
>>> GPR00: 000001fd ef965ba0 e8a56580 00000000 00000000 00000001 00000001 00000001
>>> GPR08: e8fa10e8 e8fa1a18 e8fa10f0 000001fd 22008222 1016d410 3fff5400 100a0000
>>> GPR16: 100d2408 00000000 00000000 d42b12f8 c019d01c 00029000 ef965c5c c0187660
>>> GPR24: c019cff8 00000000 22008224 ef965c08 00000000 ef965c58 00000000 e8fa1a18
>>>
>>> I noticed that the value of r23 is also 0xc0187660. I know a little
>>> powerpc assembly; if I am not wrong,
>>> '*oindex = *index = cur->bc_ptrs[level];' in fs/xfs/xfs_btree.c was
>>> compiled into the following asm code, which I sent to you earlier:
>>> 80 09 00 50 lwz r0,80(r9)
>>> 90 17 00 00 stw r0,0(r23)
>>> 90 19 00 00 stw r0,0(r25) <OOPs occured here>
>>>
>>> So, r23 should have pointed to the address of index and should never
>>> have had a chance to point to a code address, but it did. What's worse,
>>> the code at 0xc0187660 had been changed and the second OOPS happened
>>> immediately.
>>>
>>> Could you correct my analysis if I am wrong?
>>> In addition, I think the problem may be caused by a stack overflow.
>>> What are your comments?
>>>
>>>
>> Perhaps, but if this is the 2nd oops I think it is not worth investigating;
>> we need to figure out why the first one happened, and from that stack trace
>> I don't think you are close to overflowing...
>>
>> -eric
>>
* Re: [BUG report]xfs_btree_make_block_unfull generated an OOPS
2009-12-15 0:49 ` hank peng
2009-12-15 0:58 ` hank peng
@ 2009-12-15 1:26 ` Dave Chinner
2009-12-15 1:56 ` hank peng
` (2 more replies)
1 sibling, 3 replies; 12+ messages in thread
From: Dave Chinner @ 2009-12-15 1:26 UTC (permalink / raw)
To: hank peng; +Cc: Eric Sandeen, xfs-oss
On Tue, Dec 15, 2009 at 08:49:37AM +0800, hank peng wrote:
> Hi, Eric:
> I add some code like this:
> if (*stat) {
> printk("*stat = 0x%08x, oindex = %p, index = %p\n",
> *stat, oindex, index);
> if (oindex == NULL || index == NULL) {
This won't catch bad non-NULL pointers like you are seeing.
> printk("BUG occured!\n");
> printk("oindex = %p, index = %p\n", oindex, index);
> BUG();
> }
> *oindex = *index = cur->bc_ptrs[level];
> return 0;
> }
>
> And the same OOPS happened again but a little different, kernel messages are:
>
> <snip>
> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
> *stat = 0x00000001, oindex = 00000501, index = 22008424
> Unable to handle kernel paging request for data at address 0x22008424
Given that oindex and index are stack variables, this indicates
something is probably smashing the stack. Possibly a buffer overrun. To
narrow down the possible cause, can you add the debug:
	printk("%s:%d: oindex = %p, index = %p\n",
		__func__, __LINE__, oindex, index);
throughout the xfs_btree_make_block_unfull() function? i.e. at
first entry, before the xfs_btree_rshift() call, before the
xfs_btree_lshift() call, etc, to see if any of the parameters
are being modified during execution of the function?
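A minimal sketch of such a trace helper follows. It is written as userspace C so that it can be compiled and tried outside the kernel: format_trace() and TRACE_PTRS() are hypothetical names, and fputs() to stderr stands in for printk(). Note that __LINE__ expands to an integer, so it is printed with %d.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format one trace line with the function name, line number and both
 * pointers. Returns what snprintf() returns. Hypothetical helper.
 */
static int format_trace(char *buf, size_t len, const char *func, int line,
			const void *oindex, const void *index)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%s:%d: oindex = %p, index = %p\n",
			func, line, (void *)oindex, (void *)index);
}

/*
 * Drop this at function entry and before each call of interest.
 * fputs() is a userspace stand-in; kernel code would call printk().
 */
#define TRACE_PTRS(oindex, index) do {					\
	char trace_buf_[128];						\
	format_trace(trace_buf_, sizeof(trace_buf_), __func__,		\
		     __LINE__, (oindex), (index));			\
	fputs(trace_buf_, stderr);					\
} while (0)
```

Because every call site prints its own line number, the first line in the log that shows a changed pointer value points at the call that ran just before it.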
If the variables being passed into xfs_btree_make_block_unfull() are
already bad, then do the same thing for the caller,
xfs_btree_insrec(). This may help narrow down where the problem
is coming from.
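Since the NULL test quoted above cannot catch bad non-NULL pointers, a range check is one alternative. The sketch below is userspace C that works on the raw numbers from an oops, not kernel code. The function names are hypothetical, and the 8 KiB stack size is an assumption (the usual default for 32-bit powerpc); adjust it if the kernel is configured differently. Inside the kernel, a helper such as object_is_on_stack() may serve the same purpose if it is available in this tree; that has not been checked against 2.6.31.6.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 8 KiB kernel stacks; not confirmed for this configuration. */
#define ASSUMED_THREAD_SIZE 0x2000u

/* True if addr lies inside the stack area that starts at thread_base. */
static int on_thread_stack(uint32_t addr, uint32_t thread_base)
{
	return addr >= thread_base &&
	       addr - thread_base < ASSUMED_THREAD_SIZE;
}

/*
 * Bytes between the stack pointer and the base of the stack area.
 * The stack grows down, so this is what is left before an overflow.
 */
static uint32_t stack_headroom(uint32_t sp, uint32_t thread_base)
{
	assert(on_thread_stack(sp, thread_base));
	return sp - thread_base;
}
```

With THREAD: e87d6000 and GPR01 (the stack pointer) e87d7ba0 from the oops in the previous message, the good pointers e87d7bf8 and e87d7bfc fall inside the assumed stack area, while 00000501 and 22008424 do not. About 0x1ba0 (7072) bytes remain below the stack pointer, less whatever thread_info occupies at the base.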
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [BUG report]xfs_btree_make_block_unfull generated an OOPS
2009-12-15 1:26 ` Dave Chinner
@ 2009-12-15 1:56 ` hank peng
2009-12-15 3:15 ` Eric Sandeen
2009-12-15 5:36 ` hank peng
2010-01-13 1:11 ` hank peng
2 siblings, 1 reply; 12+ messages in thread
From: hank peng @ 2009-12-15 1:56 UTC (permalink / raw)
To: Dave Chinner; +Cc: Eric Sandeen, xfs-oss
2009/12/15 Dave Chinner <david@fromorbit.com>:
> On Tue, Dec 15, 2009 at 08:49:37AM +0800, hank peng wrote:
>> Hi, Eric:
>> I add some code like this:
>> if (*stat) {
>> printk("*stat = 0x%08x, oindex = %p, index = %p\n",
>> *stat, oindex, index);
>> if (oindex == NULL || index == NULL) {
>
> This won't catch bad non-NULL pointers like you are seeing.
>
>> printk("BUG occured!\n");
>> printk("oindex = %p, index = %p\n", oindex, index);
>> BUG();
>> }
>> *oindex = *index = cur->bc_ptrs[level];
>> return 0;
>> }
>>
>> And the same OOPS happened again but a little different, kernel messages are:
>>
>> <snip>
>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>> *stat = 0x00000001, oindex = 00000501, index = 22008424
>> Unable to handle kernel paging request for data at address 0x22008424
>
> Given that oindex and index are stack variables, this indicates
> something is probably smashing the stack. Possibly a buffer overrun. To
> narrow down the possible cause, can you add the debug:
>
> printk("%s:%d: oindex = %p, index = %p\n",
> __func__, __LINE__, oindex, index);
>
> throughout the xfs_btree_make_block_unfull() function? i.e. at
> first entry, before the xfs_btree_rshift() call, before the
> xfs_btree_lshift() call, etc, to see if any of the parameters
> are being modified during execution of the function?
>
> If the variables being passed into xfs_btree_make_block_unfull() are
> already bad, then do the same thing for the caller
> xfs_btree_insert(). This may help narrow down where the problem
> is coming from....
>
Thanks for your reply!
As you said, I added some code like this.

In xfs_btree_make_block_unfull():

	/* First, try shifting an entry to the right neighbor. */
	printk("%s: before xfs_btree_rshift, oindex = %p, index = %p\n",
	       __func__, oindex, index);
	error = xfs_btree_rshift(cur, level, stat);
	if (error || *stat)
		return error;

	/* Next, try shifting an entry to the left neighbor. */
	printk("%s: before xfs_btree_lshift, oindex = %p, index = %p\n",
	       __func__, oindex, index);
	error = xfs_btree_lshift(cur, level, stat);
	if (error)
		return error;

	if (*stat) {
		printk("*stat = 0x%08x, oindex = %p, index = %p\n",
		       *stat, oindex, index);
		if (oindex == NULL || index == NULL) {
			printk("BUG occured!\n");
			printk("oindex = %p, index = %p\n", oindex, index);
			BUG();
		}
		*oindex = *index = cur->bc_ptrs[level];
		return 0;
	}

And in the caller, xfs_btree_insrec():

	xfs_btree_set_ptr_null(cur, &nptr);
	if (numrecs == cur->bc_ops->get_maxrecs(cur, level)) {
		printk("%s: before calling xfs_btree_make_block_unfull, &optr = %p, &ptr = %p\n",
		       __func__, &optr, &ptr);
		error = xfs_btree_make_block_unfull(cur, level, numrecs,
				&optr, &ptr, &nptr, &ncur, &nrec, stat);
		if (error || *stat == 0)
			goto error0;
	}

We are now waiting for the OOPS to happen.
I hope it does not turn out to be a memory corruption problem, which
would be a nightmare for me to debug.
* Re: [BUG report]xfs_btree_make_block_unfull generated an OOPS
2009-12-15 1:56 ` hank peng
@ 2009-12-15 3:15 ` Eric Sandeen
2009-12-15 3:22 ` hank peng
0 siblings, 1 reply; 12+ messages in thread
From: Eric Sandeen @ 2009-12-15 3:15 UTC (permalink / raw)
To: hank peng; +Cc: xfs-oss
hank peng wrote:
> 2009/12/15 Dave Chinner <david@fromorbit.com>:
>> On Tue, Dec 15, 2009 at 08:49:37AM +0800, hank peng wrote:
>>> Hi, Eric:
>>> I add some code like this:
>>> if (*stat) {
>>> printk("*stat = 0x%08x, oindex = %p, index = %p\n",
>>> *stat, oindex, index);
>>> if (oindex == NULL || index == NULL) {
>> This won't catch bad non-NULL pointers like you are seeing.
>>
>>> printk("BUG occured!\n");
>>> printk("oindex = %p, index = %p\n", oindex, index);
>>> BUG();
>>> }
>>> *oindex = *index = cur->bc_ptrs[level];
>>> return 0;
>>> }
>>>
>>> And the same OOPS happened again but a little different, kernel messages are:
>>>
>>> <snip>
>>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>>> *stat = 0x00000001, oindex = 00000501, index = 22008424
>>> Unable to handle kernel paging request for data at address 0x22008424
Are you using any of the xfs userspace prior to this error, or is it a
fresh boot and just normal IO?
I ask because libxfs calls sys_ustat() which at one point was corrupting
userspace, at least, with 32-bit userspace on a 64-bit kernel:
https://bugzilla.redhat.com/show_bug.cgi?id=472795
Even with that fixed there were still some reports of odd behavior
on ppc... I don't know if things might be going wrong in kernelspace
as well...
https://bugzilla.redhat.com/show_bug.cgi?id=517994
and I haven't gotten to the bottom of that yet ...
Very few things actually use sys_ustat, but xfs userspace does...
just a random thought.
-eric
* Re: [BUG report]xfs_btree_make_block_unfull generated an OOPS
2009-12-15 3:15 ` Eric Sandeen
@ 2009-12-15 3:22 ` hank peng
0 siblings, 0 replies; 12+ messages in thread
From: hank peng @ 2009-12-15 3:22 UTC (permalink / raw)
To: Eric Sandeen; +Cc: xfs-oss
2009/12/15 Eric Sandeen <sandeen@sandeen.net>:
> hank peng wrote:
>> 2009/12/15 Dave Chinner <david@fromorbit.com>:
>>> On Tue, Dec 15, 2009 at 08:49:37AM +0800, hank peng wrote:
>>>> Hi, Eric:
>>>> I add some code like this:
>>>> if (*stat) {
>>>> printk("*stat = 0x%08x, oindex = %p, index = %p\n",
>>>> *stat, oindex, index);
>>>> if (oindex == NULL || index == NULL) {
>>> This won't catch bad non-NULL pointers like you are seeing.
>>>
>>>> printk("BUG occured!\n");
>>>> printk("oindex = %p, index = %p\n", oindex, index);
>>>> BUG();
>>>> }
>>>> *oindex = *index = cur->bc_ptrs[level];
>>>> return 0;
>>>> }
>>>>
>>>> And the same OOPS happened again but a little different, kernel messages are:
>>>>
>>>> <snip>
>>>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>>>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>>>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>>>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>>>> *stat = 0x00000001, oindex = 00000501, index = 22008424
>>>> Unable to handle kernel paging request for data at address 0x22008424
>
> Are you using any of the xfs userspace prior to this error, or is it a
> fresh boot and just normal IO?
>
No xfs userspace tools were used prior to this error, just normal IO.
Besides, it takes some time to reproduce the OOPS.
> I ask because libxfs calls sys_ustat() which at one point was corrupting
> userspace, at least, with 32-bit userspace on a 64-bit kernel:
> https://bugzilla.redhat.com/show_bug.cgi?id=472795
>
I forgot to mention that I mount with "-o inode64".
# uname -a
Linux Storage 2.6.31.6-svn40 #30 Tue Dec 15 09:50:02 CST 2009 ppc unknown
# mount
rootfs on / type rootfs (rw)
/dev/root on / type ext2 (rw,relatime,errors=continue)
/dev/mtdblock2 on /mnt/sys_data type jffs2 (rw,relatime)
proc on /proc type proc (rw,relatime)
sysfs on /sys type sysfs (rw,relatime)
tmpfs on /opt/upgrade type tmpfs (rw,relatime)
devpts on /dev/pts type devpts (rw,relatime,gid=5,mode=620)
/dev/Pool_md2/ss1 on /mnt/Pool_md2/ss1 type xfs (rw,relatime,attr2,inode64,noquota)
* Re: [BUG report]xfs_btree_make_block_unfull generated an OOPS
2009-12-15 1:26 ` Dave Chinner
2009-12-15 1:56 ` hank peng
@ 2009-12-15 5:36 ` hank peng
2010-01-13 1:11 ` hank peng
2 siblings, 0 replies; 12+ messages in thread
From: hank peng @ 2009-12-15 5:36 UTC (permalink / raw)
To: Dave Chinner; +Cc: Eric Sandeen, xfs-oss
2009/12/15 Dave Chinner <david@fromorbit.com>:
> On Tue, Dec 15, 2009 at 08:49:37AM +0800, hank peng wrote:
>> Hi, Eric:
>> I add some code like this:
>> if (*stat) {
>> printk("*stat = 0x%08x, oindex = %p, index = %p\n",
>> *stat, oindex, index);
>> if (oindex == NULL || index == NULL) {
>
> This won't catch bad non-NULL pointers like you are seeing.
>
>> printk("BUG occured!\n");
>> printk("oindex = %p, index = %p\n", oindex, index);
>> BUG();
>> }
>> *oindex = *index = cur->bc_ptrs[level];
>> return 0;
>> }
>>
>> And the same OOPS happened again but a little different, kernel messages are:
>>
>> <snip>
>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>> *stat = 0x00000001, oindex = 00000501, index = 22008424
>> Unable to handle kernel paging request for data at address 0x22008424
>
> Given that oindex and index are stack variables, this indicates
In xfs_btree_make_block_unfull, it seems that oindex and index are
kept in registers by the compiler, which makes this even more odd.
> something is probably smashing the stack. Possibly a buffer overrun. To
> narrow down the possible cause, can you add the debug:
>
> printk("%s:%d: oindex = %p, index = %p\n",
> __func__, __LINE__, oindex, index);
>
> throughout the xfs_btree_make_block_unfull() function? i.e. at
> first entry, before the xfs_btree_rshift() call, before the
> xfs_btree_lshift() call, etc, to see if any of the parameters
> are being modified during execution of the function?
>
> If the variables being passed into xfs_btree_make_block_unfull() are
> already bad, then do the same thing for the caller
> xfs_btree_insert(). This may help narrow down where the problem
> is coming from....
>
* Re: [BUG report]xfs_btree_make_block_unfull generated an OOPS
2009-12-15 1:26 ` Dave Chinner
2009-12-15 1:56 ` hank peng
2009-12-15 5:36 ` hank peng
@ 2010-01-13 1:11 ` hank peng
2 siblings, 0 replies; 12+ messages in thread
From: hank peng @ 2010-01-13 1:11 UTC (permalink / raw)
To: Dave Chinner; +Cc: Eric Sandeen, xfs-oss
2009/12/15 Dave Chinner <david@fromorbit.com>:
> On Tue, Dec 15, 2009 at 08:49:37AM +0800, hank peng wrote:
>> Hi, Eric:
>> I add some code like this:
>> if (*stat) {
>> printk("*stat = 0x%08x, oindex = %p, index = %p\n",
>> *stat, oindex, index);
>> if (oindex == NULL || index == NULL) {
>
> This won't catch bad non-NULL pointers like you are seeing.
>
>> printk("BUG occured!\n");
>> printk("oindex = %p, index = %p\n", oindex, index);
>> BUG();
>> }
>> *oindex = *index = cur->bc_ptrs[level];
>> return 0;
>> }
>>
>> And the same OOPS happened again but a little different, kernel messages are:
>>
>> <snip>
>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>> *stat = 0x00000001, oindex = 00000501, index = 22008424
>> Unable to handle kernel paging request for data at address 0x22008424
>
> Given that oindex and index are stack variables, this indicates
> something is probably smashing the stack. Possibly a buffer overrun. To
> narrow down the possible cause, can you add the debug:
>
> printk("%s:%d: oindex = %p, index = %p\n",
> __func__, __LINE__, oindex, index);
>
> throughout the xfs_btree_make_block_unfull() function? i.e. at
> first entry, before the xfs_btree_rshift() call, before the
> xfs_btree_lshift() call, etc, to see if any of the parameters
> are being modified during execution of the function?
>
> If the variables being passed into xfs_btree_make_block_unfull() are
> already bad, then do the same thing for the caller
> xfs_btree_insert(). This may help narrow down where the problem
> is coming from....
>
I added the following debug code as you said:
<code>
        printk("%s: before xfs_btree_rshift, oindex = %p, index = %p\n",
               __func__, oindex, index);
        error = xfs_btree_rshift(cur, level, stat);
        if (error || *stat)
                return error;
        /* Next, try shifting an entry to the left neighbor. */
        printk("%s: before xfs_btree_lshift, oindex = %p, index = %p\n",
               __func__, oindex, index);
        error = xfs_btree_lshift(cur, level, stat);
        if (error)
                return error;
        if (*stat) {
                printk("%s: oindex = %p, index = %p, *stat = %d\n",
                       __func__, oindex, index, *stat);
                *oindex = *index = cur->bc_ptrs[level];
                return 0;
        }
</code>
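For reference, a variant that also records the line number, as in the original suggestion, could look roughly like the sketch below. This is only an illustration: format_trace() and TRACE_PTRS() are hypothetical names, not anything in XFS, and snprintf()/puts() stand in for printk() so that it builds in user space. Since __LINE__ expands to an int, it is printed with %d.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of XFS): format one trace record.
 * __LINE__ expands to an int, so it is printed with %d, not %s.
 * Returns the number of characters snprintf() would have written.
 */
static int format_trace(char *buf, size_t len, const char *func, int line,
                        const void *oindex, const void *index)
{
    return snprintf(buf, len, "%s:%d: oindex = %p, index = %p",
                    func, line, oindex, index);
}

/* User-space stand-in; in the kernel the puts() would be a printk(). */
#define TRACE_PTRS(oindex, index)                                   \
    do {                                                            \
        char trace_buf[128];                                        \
        format_trace(trace_buf, sizeof(trace_buf), __func__,        \
                     __LINE__, (oindex), (index));                  \
        puts(trace_buf);                                            \
    } while (0)
```

Dropping TRACE_PTRS(oindex, index) at function entry and before each shift call would tag every record with the exact source line it came from.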
It ran fine for about 36 hours without any problem, but this
morning an odd OOPS appeared:
xfs_btree_make_block_unfull: before xfs_btree_rshift, oindex = d3a27bd8, index = d3a27bdc
xfs_btree_make_block_unfull: before xfs_btree_lshift, oindex = d3a27bd8, index = d3a27bdc
xfs_btree_make_block_unfull: oindex = d3a27bd8, index = d3a27bdc, *stat = 1
xfs_btree_make_block_unfull: before xfs_btree_rshift, oindex = d3a27bd8, index = d3a27bdc
Unable to handle kernel paging request for data at address 0x00000501
Faulting instruction address: 0xc019f4f0
Oops: Kernel access of bad area, sig: 11 [#2]
MPC85xx CDS
Modules linked in:
NIP: c019f4f0 LR: c019f4e8 CTR: c023fabc
REGS: d3a27ad0 TRAP: 0300 Tainted: G D (2.6.31.6-svn45)
MSR: 00029000 <EE,ME,CE> CR: 22008424 XER: 20000000
DEAR: 00000501, ESR: 00000000
TASK = efb46a30[20273] 'cp' THREAD: d3a26000
GPR00: c019f4e8 d3a27b80 efb46a30 00000000 d3a27b38 d3a27b38 00000010 007f0f26
GPR08: c04a7c40 ffffffff e8517850 d3a27b80 20008422 100eb39c 3fff5400 100a0000
GPR16: 100d5ac8 00000000 016d30f3 e8517850 c019d08c 00029000 d3a27bf0 c023fabc
GPR24: c019d068 00000000 22008424 d3a27bdc 00000501 d3a27bd8 00000000 e8517850
NIP [c019f4f0] xfs_btree_make_block_unfull+0x8c/0x1f8
LR [c019f4e8] xfs_btree_make_block_unfull+0x84/0x1f8
Call Trace:
[d3a27b80] [c019f4e8] xfs_btree_make_block_unfull+0x84/0x1f8 (unreliable)
[d3a27bc0] [c019f9d0] xfs_btree_insrec+0x374/0x4b0
[d3a27c30] [c019fb88] xfs_btree_insert+0x7c/0x1c0
[d3a27c90] [c01865d0] xfs_free_ag_extent+0x34c/0x810
[d3a27d00] [c0187168] xfs_free_extent+0xdc/0x104
[d3a27d90] [c018fe50] xfs_bmap_finish+0x154/0x1a0
[d3a27dc0] [c01b697c] xfs_itruncate_finish+0x254/0x3b8
[d3a27e40] [c01d2134] xfs_inactive+0x2c4/0x450
[d3a27e80] [c01e193c] xfs_fs_clear_inode+0x40/0x50
[d3a27e90] [c00a84bc] clear_inode+0x6c/0x108
[d3a27ea0] [c00a87d0] generic_delete_inode+0x114/0x118
[d3a27eb0] [c00a7ff8] iput+0x74/0x94
[d3a27ec0] [c00a003c] do_unlinkat+0x114/0x198
[d3a27f40] [c000f7ac] ret_from_syscall+0x0/0x3c
Instruction dump:
7f66db78 7f44d378 7fa5eb78 3863eca4 4cc63182 4be97ef5 7fe3fb78 7fc4f378
7f85e378 4bffdb15 7c791b79 40820010 <801c0000> 2f800000 419e001c 80010044
---[ end trace 95e2c49eb5a34f9a ]---
(gdb) list *(xfs_btree_make_block_unfull+0x8c)
0xc019f4f0 is in xfs_btree_make_block_unfull (fs/xfs/xfs_btree.c:2636).
2631
2632 /* First, try shifting an entry to the right neighbor. */
2633 printk("%s: before xfs_btree_rshift, oindex = %p, index = %p\n",
2634 __func__, oindex, index);
2635 error = xfs_btree_rshift(cur, level, stat);
2636 if (error || *stat)
2637 return error;
2638
2639 /* Next, try shifting an entry to the left neighbor. */
2640 printk("%s: before xfs_btree_lshift, oindex = %p, index = %p\n",
It seems that after the call to xfs_btree_rshift, the value of 'stat'
has changed. How is that possible, since it is a local variable?
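As a cross-check, the faulting word in the instruction dump above is marked <801c0000>. A small user-space sketch that decodes it as a PowerPC D-form instruction (primary opcode 32 is lwz) suggests it is a load through r28, and r28 holds 00000501 in the register dump, the same value as DEAR. The struct and function names below are made up for illustration.

```c
#include <stdint.h>

/* Fields of a PowerPC D-form instruction (hypothetical helper type). */
struct ppc_dform {
    unsigned opcode;   /* primary opcode, top 6 bits (32 = lwz, 36 = stw) */
    unsigned rt;       /* target/source register */
    unsigned ra;       /* base address register */
    int disp;          /* sign-extended 16-bit displacement */
};

/* Split a 32-bit instruction word into its D-form fields. */
static struct ppc_dform decode_dform(uint32_t insn)
{
    struct ppc_dform d;
    unsigned raw = insn & 0xffffu;

    d.opcode = insn >> 26;
    d.rt = (insn >> 21) & 0x1fu;
    d.ra = (insn >> 16) & 0x1fu;
    /* Sign-extend the low 16 bits without relying on narrowing casts. */
    d.disp = raw >= 0x8000u ? (int)raw - 0x10000 : (int)raw;
    return d;
}
```

Decoding 0x801c0000 this way gives opcode 32, rt 0, ra 28 and displacement 0, i.e. lwz r0,0(r28), which fits line 2636 reading *stat through a bad pointer.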
# uname -a
Linux Storage 2.6.31.6-svn45 #87 Mon Jan 11 13:22:14 CST 2010 ppc unknown
# mount
rootfs on / type rootfs (rw)
/dev/root on / type ext2 (rw,relatime,errors=continue)
/dev/mtdblock2 on /mnt/sys_data type jffs2 (rw,relatime)
proc on /proc type proc (rw,relatime)
sysfs on /sys type sysfs (rw,relatime)
tmpfs on /opt/upgrade type tmpfs (rw,relatime)
devpts on /dev/pts type devpts (rw,relatime,gid=5,mode=620)
/dev/vg_log/lv_log on /var/log type reiserfs (rw,relatime)
/dev/Pool_md1/SS1 on /mnt/Pool_md1/SS1 type xfs (rw,relatime,attr2,inode64,noquota)
/dev/Pool_md2/SS2 on /mnt/Pool_md2/SS2 type xfs (rw,relatime,attr2,inode64,noquota)
root@Storage:/var/log# df -h
Filesystem              Size      Used Available Use% Mounted on
/dev/root             124.0M     72.6M     51.4M  59% /
/dev/mtdblock2          1.0M    408.0K    616.0K  40% /mnt/sys_data
tmpfs                 505.3M         0    505.3M   0% /opt/upgrade
/dev/vg_log/lv_log     10.0G     32.4M     10.0G   0% /var/log
/dev/Pool_md1/SS1       2.7T    270.2G      2.5T  10% /mnt/Pool_md1/SS1
/dev/Pool_md2/SS2       2.7T    344.0G      2.4T  12% /mnt/Pool_md2/SS2
From the assembly code, I noticed that the local variable 'stat' has
no real space on the stack. It is optimised into a register (r28).
According to the PowerPC ABI, some registers are saved on the stack
when xfs_btree_rshift is called and restored before it returns.
Is it possible that the stack was smashed during that window?
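To make the question concrete, here is a toy user-space model of that scenario. It is only a sketch under assumptions: the frame layout, the names toy_frame and toy_call, and the register value are invented for illustration and do not describe the real xfs_btree_rshift frame. It shows how an overrun of a callee's local buffer could replace the saved copy of r28, so that the caller gets a different value back after the "restore".

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Toy model of a callee's stack frame (layout is an assumption):
 * a local buffer followed by the slot holding the caller's r28.
 */
struct toy_frame {
    uint8_t  local_buf[8];
    uint32_t saved_r28;
};

/*
 * Simulate one call: save r28 in the frame, write n bytes starting at
 * local_buf without checking against its size, then restore r28 from
 * the save slot and return it.
 */
static uint32_t toy_call(uint32_t r28, const uint8_t *data, size_t n)
{
    struct toy_frame frame;
    uint8_t raw[sizeof(struct toy_frame)];
    size_t off = offsetof(struct toy_frame, local_buf);

    memset(&frame, 0, sizeof(frame));
    frame.saved_r28 = r28;               /* prologue: save caller's register */

    memcpy(raw, &frame, sizeof(raw));    /* work on a byte image of the frame */
    if (n > sizeof(raw) - off)
        n = sizeof(raw) - off;           /* keep the toy itself memory-safe */
    memcpy(raw + off, data, n);          /* body: overrun past local_buf */
    memcpy(&frame, raw, sizeof(frame));

    return frame.saved_r28;              /* epilogue: restore from the slot */
}
```

With 8 bytes of data the caller's value comes back unchanged; with 12 bytes, the last four land in the save slot and come back in place of the original pointer.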
BTW, I noticed that my cross-compiler "powerpc-linux-gnuspe-gcc"
defaults to 8-byte alignment rather than 4-byte alignment.
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
--
The simplest is not all best but the best is surely the simplest!
Thread overview: 12+ messages
2009-12-09 1:58 [BUG report]xfs_btree_make_block_unfull generated an OOPS hank peng
2009-12-09 2:57 ` Eric Sandeen
2009-12-09 3:18 ` hank peng
[not found] ` <4B1F18C4.3060704@sandeen.net>
[not found] ` <389deec70912082053v4310057dg479f6d4b6c4b46f7@mail.gmail.com>
[not found] ` <4B1F31FD.3020705@sandeen.net>
[not found] ` <389deec70912082220pcb3b5d1q516ac197d31502c5@mail.gmail.com>
[not found] ` <389deec70912082230g38987576pc48d7699f23844c5@mail.gmail.com>
[not found] ` <389deec70912140119q40ed91cao62fe9c9ebdf13601@mail.gmail.com>
2009-12-14 15:56 ` Eric Sandeen
2009-12-15 0:49 ` hank peng
2009-12-15 0:58 ` hank peng
2009-12-15 1:26 ` Dave Chinner
2009-12-15 1:56 ` hank peng
2009-12-15 3:15 ` Eric Sandeen
2009-12-15 3:22 ` hank peng
2009-12-15 5:36 ` hank peng
2010-01-13 1:11 ` hank peng