public inbox for linux-xfs@vger.kernel.org
* [BUG report]xfs_btree_make_block_unfull generated an OOPS
@ 2009-12-09  1:58 hank peng
  2009-12-09  2:57 ` Eric Sandeen
  0 siblings, 1 reply; 12+ messages in thread
From: hank peng @ 2009-12-09  1:58 UTC
  To: linux-xfs

Hi all,
I think this is a bug, so I am reporting it here.
root@1234dahua:~# uname -a
Linux 1234dahua 2.6.31.6 #14 Tue Dec 8 16:48:40 CST 2009 ppc unknown

Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xc019ea28
Oops: Kernel access of bad area, sig: 11 [#1]
MPC85xx CDS
Modules linked in:
NIP: c019ea28 LR: c019ea00 CTR: 00000000
REGS: e233baf0 TRAP: 0300   Not tainted  (2.6.31.6)
MSR: 00029000 <EE,ME,CE>  CR: 22008484  XER: 00000000
DEAR: 00000000, ESR: 00800000
TASK = e8add2c0[21249] 'SS_Server' THREAD: e233a000
GPR00: 000001a4 e233bba0 e8add2c0 00000000 00000000 00000000 00000001 00000000
GPR08: c0e22478 e20137b8 c0e2247c 000001a4 22008422 1016d410 3fff5400 100a0000
GPR16: 100ce108 00000000 006398bb e20137b8 c019c58c 00029000 e233bc5c c0186bd0
GPR24: c019c568 00000000 22008424 e233bc08 00000000 e233bc58 00000000 e20137b8
NIP [c019ea28] xfs_btree_make_block_unfull+0xc4/0x1b0
LR [c019ea00] xfs_btree_make_block_unfull+0x9c/0x1b0
Call Trace:
[e233bba0] [c019ea00] xfs_btree_make_block_unfull+0x9c/0x1b0 (unreliable)
[e233bbe0] [c019ee88] xfs_btree_insrec+0x374/0x4b0
[e233bc50] [c019f040] xfs_btree_insert+0x7c/0x1c0
[e233bcb0] [c0185ad0] xfs_free_ag_extent+0x34c/0x810
[e233bd20] [c0186668] xfs_free_extent+0xdc/0x104
[e233bdb0] [c018f350] xfs_bmap_finish+0x154/0x1a0
[e233bde0] [c01b5e34] xfs_itruncate_finish+0x254/0x3b8
[e233be60] [c01d033c] xfs_free_eofblocks+0x254/0x29c
[e233bee0] [c01d9ba8] xfs_file_release+0x14/0x28
[e233bef0] [c009574c] __fput+0xe8/0x1dc
[e233bf10] [c0092048] filp_close+0x70/0xb0
[e233bf30] [c009211c] sys_close+0x94/0xc0
[e233bf40] [c000f784] ret_from_syscall+0x0/0x3c
Instruction dump:
7fa5eb78 4bffdf59 7c7c1b79 40a2ffd4 801d0000 2f800000 419e0064 57c9103a
7f83e378 7d29fa14 80090050 90170000 <90190000> 80010044 bae1001c 38210040
---[ end trace 069fbb7d042289d2 ]---

Following the call trace above, I checked the source code and found
that the oops appears to be triggered in the xfs_btree_make_block_unfull
function in fs/xfs/xfs_btree.c:

        if (*stat) {
                *oindex = *index = cur->bc_ptrs[level];
                return 0;
        }

Here oindex is NULL, so the oops occurred. I am not an XFS hacker, so I
hope someone can help me fix this bug. If you need more information, let
me know.
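
For anyone who wants to double-check the faulting instruction, the
bracketed word in the instruction dump (<90190000>) can be decoded by
hand. Below is a minimal user-space sketch, not kernel code: the names
ppc_dform, ppc_decode_dform and ppc_format are made up for illustration,
and it only handles the two D-form opcodes that matter here (lwz = 32,
stw = 36).

```c
#include <stdint.h>
#include <stdio.h>

/* Fields of a PowerPC D-form instruction (lwz, stw, ...). */
struct ppc_dform {
	unsigned opcode;   /* primary opcode, top 6 bits      */
	unsigned rs;       /* source (stw) or target (lwz)    */
	unsigned ra;       /* base address register           */
	int16_t  d;        /* signed 16-bit displacement      */
};

/* Split a 32-bit instruction word into its D-form fields. */
struct ppc_dform ppc_decode_dform(uint32_t insn)
{
	struct ppc_dform f;

	f.opcode = insn >> 26;
	f.rs     = (insn >> 21) & 0x1f;
	f.ra     = (insn >> 16) & 0x1f;
	f.d      = (int16_t)(insn & 0xffff);
	return f;
}

/* Format lwz (opcode 32) and stw (opcode 36); anything else is left raw. */
void ppc_format(uint32_t insn, char *buf, size_t len)
{
	struct ppc_dform f = ppc_decode_dform(insn);
	const char *name = f.opcode == 32 ? "lwz" :
			   f.opcode == 36 ? "stw" : NULL;

	if (name)
		snprintf(buf, len, "%s r%u,%d(r%u)", name, f.rs, f.d, f.ra);
	else
		snprintf(buf, len, ".long 0x%08x", (unsigned)insn);
}
```

Decoding 0x90190000 this way gives a store of r0 through r25 with
displacement 0. In the register dump r25 (the second value on the GPR24
line) is 00000000, which matches the faulting data address.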

-- 
The simplest is not all best but the best is surely the simplest!

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

* Re: [BUG report]xfs_btree_make_block_unfull generated an OOPS
  2009-12-09  1:58 [BUG report]xfs_btree_make_block_unfull generated an OOPS hank peng
@ 2009-12-09  2:57 ` Eric Sandeen
  2009-12-09  3:18   ` hank peng
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Sandeen @ 2009-12-09  2:57 UTC
  To: hank peng; +Cc: linux-xfs

hank peng wrote:
> Hi, all:
> I think it is a BUG, so I report it here.
> root@1234dahua:~# uname -a
> Linux 1234dahua 2.6.31.6 #14 Tue Dec 8 16:48:40 CST 2009 ppc unknown
> 
> Unable to handle kernel paging request for data at address 0x00000000
> Faulting instruction address: 0xc019ea28
> Oops: Kernel access of bad area, sig: 11 [#1]
> MPC85xx CDS
> Modules linked in:
> NIP: c019ea28 LR: c019ea00 CTR: 00000000
> REGS: e233baf0 TRAP: 0300   Not tainted  (2.6.31.6)
> MSR: 00029000 <EE,ME,CE>  CR: 22008484  XER: 00000000
> DEAR: 00000000, ESR: 00800000
> TASK = e8add2c0[21249] 'SS_Server' THREAD: e233a000
> GPR00: 000001a4 e233bba0 e8add2c0 00000000 00000000 00000000 00000001 00000000
> GPR08: c0e22478 e20137b8 c0e2247c 000001a4 22008422 1016d410 3fff5400 100a0000
> GPR16: 100ce108 00000000 006398bb e20137b8 c019c58c 00029000 e233bc5c c0186bd0
> GPR24: c019c568 00000000 22008424 e233bc08 00000000 e233bc58 00000000 e20137b8
> NIP [c019ea28] xfs_btree_make_block_unfull+0xc4/0x1b0

huh, I don't think I've ever seen an oops here, nor has kerneloops.org.

I wonder how you managed this ... :)

> LR [c019ea00] xfs_btree_make_block_unfull+0x9c/0x1b0
> Call Trace:
> [e233bba0] [c019ea00] xfs_btree_make_block_unfull+0x9c/0x1b0 (unreliable)
> [e233bbe0] [c019ee88] xfs_btree_insrec+0x374/0x4b0
> [e233bc50] [c019f040] xfs_btree_insert+0x7c/0x1c0
> [e233bcb0] [c0185ad0] xfs_free_ag_extent+0x34c/0x810

so this is freeing blocks and adding them to the freespace btrees;
it needs to move entries out of a block to make room for the new one.
Not a terribly unusual operation, I think.
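
Roughly, and only as a toy user-space model: the block is full, so try
to push a record into the right sibling, then the left sibling, and, as
far as I remember, split the block if neither has room. The struct and
function names below are invented for illustration, the real code works
on btree blocks through a cursor, and the split step in particular is an
assumption rather than something visible in this trace.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a btree block: only the record counts matter here. */
struct toy_block {
	int nrecs;    /* records currently stored */
	int maxrecs;  /* capacity of the block    */
};

enum toy_result { TOY_RSHIFT, TOY_LSHIFT, TOY_SPLIT };

/*
 * Make room in the full block 'cur'. 'left' and 'right' may be NULL when
 * there is no sibling on that side. Order: right shift, left shift, then
 * (assumed) a split as the last resort.
 */
enum toy_result toy_make_block_unfull(struct toy_block *left,
				      struct toy_block *cur,
				      struct toy_block *right)
{
	assert(cur != NULL && cur->nrecs == cur->maxrecs);

	if (right != NULL && right->nrecs < right->maxrecs) {
		cur->nrecs--;
		right->nrecs++;
		return TOY_RSHIFT;
	}
	if (left != NULL && left->nrecs < left->maxrecs) {
		cur->nrecs--;
		left->nrecs++;
		return TOY_LSHIFT;
	}
	/* No sibling has room: keep half here, the rest goes to a new block. */
	cur->nrecs -= cur->nrecs / 2;
	return TOY_SPLIT;
}
```

In every case the caller ends up with at least one free slot in the
block it wanted to insert into.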

> [e233bd20] [c0186668] xfs_free_extent+0xdc/0x104
> [e233bdb0] [c018f350] xfs_bmap_finish+0x154/0x1a0
> [e233bde0] [c01b5e34] xfs_itruncate_finish+0x254/0x3b8
> [e233be60] [c01d033c] xfs_free_eofblocks+0x254/0x29c
> [e233bee0] [c01d9ba8] xfs_file_release+0x14/0x28
> [e233bef0] [c009574c] __fput+0xe8/0x1dc
> [e233bf10] [c0092048] filp_close+0x70/0xb0
> [e233bf30] [c009211c] sys_close+0x94/0xc0
> [e233bf40] [c000f784] ret_from_syscall+0x0/0x3c
> Instruction dump:
> 7fa5eb78 4bffdf59 7c7c1b79 40a2ffd4 801d0000 2f800000 419e0064 57c9103a
> 7f83e378 7d29fa14 80090050 90170000 <90190000> 80010044 bae1001c 38210040
> ---[ end trace 069fbb7d042289d2 ]---
> 
> According to the above call trace, I checked the source code and found
> that it may be invoked by xfs_btree_make_block_unfull function in
> fs/xfs/xfs_btree.c:
> 
> 2641
> 2642         if (*stat) {
> 2643                 *oindex = *index = cur->bc_ptrs[level];
> 2644                 return 0;
> 2645         }
> 
> here, oindex is NULL so OOPs occured. I am not a xfs hacker, I hope
> someone can help me fix this BUG, if you need more information, let me
> know.

Is the above from gdb?  You're quite certain that this is the case,
or is this a guess?

It seems a little unlikely because in the calling function:

        int                     optr;   /* old key/record index */
        int                     ptr;    /* key/record index */

// .... code code code ...

        if (numrecs == cur->bc_ops->get_maxrecs(cur, level)) {
                error = xfs_btree_make_block_unfull(cur, level, numrecs,
                                        &optr, &ptr, &nptr, &ncur, &nrec, stat);

We're just sending in the addresses of these local variables;
I don't see how these pointers could be NULL.

Thanks,
-Eric

* Re: [BUG report]xfs_btree_make_block_unfull generated an OOPS
  2009-12-09  2:57 ` Eric Sandeen
@ 2009-12-09  3:18   ` hank peng
       [not found]     ` <4B1F18C4.3060704@sandeen.net>
  0 siblings, 1 reply; 12+ messages in thread
From: hank peng @ 2009-12-09  3:18 UTC
  To: Eric Sandeen; +Cc: linux-xfs

2009/12/9 Eric Sandeen <sandeen@sandeen.net>:
> hank peng wrote:
>> Hi, all:
>> I think it is a BUG, so I report it here.
>> root@1234dahua:~# uname -a
>> Linux 1234dahua 2.6.31.6 #14 Tue Dec 8 16:48:40 CST 2009 ppc unknown
>>
>> Unable to handle kernel paging request for data at address 0x00000000
>> Faulting instruction address: 0xc019ea28
>> Oops: Kernel access of bad area, sig: 11 [#1]
>> MPC85xx CDS
>> Modules linked in:
>> NIP: c019ea28 LR: c019ea00 CTR: 00000000
>> REGS: e233baf0 TRAP: 0300   Not tainted  (2.6.31.6)
>> MSR: 00029000 <EE,ME,CE>  CR: 22008484  XER: 00000000
>> DEAR: 00000000, ESR: 00800000
>> TASK = e8add2c0[21249] 'SS_Server' THREAD: e233a000
>> GPR00: 000001a4 e233bba0 e8add2c0 00000000 00000000 00000000 00000001 00000000
>> GPR08: c0e22478 e20137b8 c0e2247c 000001a4 22008422 1016d410 3fff5400 100a0000
>> GPR16: 100ce108 00000000 006398bb e20137b8 c019c58c 00029000 e233bc5c c0186bd0
>> GPR24: c019c568 00000000 22008424 e233bc08 00000000 e233bc58 00000000 e20137b8
>> NIP [c019ea28] xfs_btree_make_block_unfull+0xc4/0x1b0
>
> huh, I don't think I've ever seen an oops here, nor has kerneloops.org.
>
> I wonder how you managed this ... :)
>
>> LR [c019ea00] xfs_btree_make_block_unfull+0x9c/0x1b0
>> Call Trace:
>> [e233bba0] [c019ea00] xfs_btree_make_block_unfull+0x9c/0x1b0 (unreliable)
>> [e233bbe0] [c019ee88] xfs_btree_insrec+0x374/0x4b0
>> [e233bc50] [c019f040] xfs_btree_insert+0x7c/0x1c0
>> [e233bcb0] [c0185ad0] xfs_free_ag_extent+0x34c/0x810
>
> so this is freeing blocks and adding them to the freespace btrees;
> it needs to move entries out of a block to make room for the new one.
> Not a terribly unusual operation, I think.
>
>> [e233bd20] [c0186668] xfs_free_extent+0xdc/0x104
>> [e233bdb0] [c018f350] xfs_bmap_finish+0x154/0x1a0
>> [e233bde0] [c01b5e34] xfs_itruncate_finish+0x254/0x3b8
>> [e233be60] [c01d033c] xfs_free_eofblocks+0x254/0x29c
>> [e233bee0] [c01d9ba8] xfs_file_release+0x14/0x28
>> [e233bef0] [c009574c] __fput+0xe8/0x1dc
>> [e233bf10] [c0092048] filp_close+0x70/0xb0
>> [e233bf30] [c009211c] sys_close+0x94/0xc0
>> [e233bf40] [c000f784] ret_from_syscall+0x0/0x3c
>> Instruction dump:
>> 7fa5eb78 4bffdf59 7c7c1b79 40a2ffd4 801d0000 2f800000 419e0064 57c9103a
>> 7f83e378 7d29fa14 80090050 90170000 <90190000> 80010044 bae1001c 38210040
>> ---[ end trace 069fbb7d042289d2 ]---
>>
>> According to the above call trace, I checked the source code and found
>> that it may be invoked by xfs_btree_make_block_unfull function in
>> fs/xfs/xfs_btree.c:
>>
>> 2641
>> 2642         if (*stat) {
>> 2643                 *oindex = *index = cur->bc_ptrs[level];
>> 2644                 return 0;
>> 2645         }
>>
>> here, oindex is NULL so OOPs occured. I am not a xfs hacker, I hope
>> someone can help me fix this BUG, if you need more information, let me
>> know.
>
> Is the above from gdb?  You're quite certain that this is the case,
> or is this a guess?
>
> It seems a little unlikely because in the calling function:
>
>        int                     optr;   /* old key/record index */
>        int                     ptr;    /* key/record index */
>
> // .... code code code ...
>
>        if (numrecs == cur->bc_ops->get_maxrecs(cur, level)) {
>                error = xfs_btree_make_block_unfull(cur, level, numrecs,
>                                        &optr, &ptr, &nptr, &ncur, &nrec, stat);
>
> We're just sending in the addresses of these local variables;
> I don't see how these pointers could be NULL.
>
Thanks for your reply.

I drew this conclusion from the disassembly; correct me if I am wrong.
# powerpc-linux-gnuspe-objdump -d vmlinux | less
<snip>
c019e964 <xfs_btree_make_block_unfull>:
c019e964:       94 21 ff c0     stwu    r1,-64(r1)
c019e968:       7c 08 02 a6     mflr    r0
c019e96c:       be e1 00 1c     stmw    r23,28(r1)
c019e970:       7c 7f 1b 78     mr      r31,r3
c019e974:       90 01 00 44     stw     r0,68(r1)
c019e978:       7c bc 2b 78     mr      r28,r5
c019e97c:       7c d9 33 78     mr      r25,r6              <I think r6 holds the value of oindex here>
c019e980:       83 a1 00 48     lwz     r29,72(r1)
c019e984:       7c f7 3b 78     mr      r23,r7
c019e988:       80 03 00 0c     lwz     r0,12(r3)
c019e98c:       7d 1b 43 78     mr      r27,r8
c019e990:       7d 3a 4b 78     mr      r26,r9
c019e994:       7d 58 53 78     mr      r24,r10
c019e998:       7c 9e 23 78     mr      r30,r4
c019e99c:       70 0b 00 02     andi.   r11,r0,2
c019e9a0:       41 82 00 14     beq-    c019e9b4 <xfs_btree_make_block_unfull+0x50>
c019e9a4:       89 23 00 78     lbz     r9,120(r3)
c019e9a8:       39 29 ff ff     addi    r9,r9,-1
c019e9ac:       7f 84 48 00     cmpw    cr7,r4,r9
c019e9b0:       41 9e 00 90     beq-    cr7,c019ea40 <xfs_btree_make_block_unfull+0xdc>
c019e9b4:       7f e3 fb 78     mr      r3,r31
c019e9b8:       7f c4 f3 78     mr      r4,r30
c019e9bc:       7f a5 eb 78     mr      r5,r29
c019e9c0:       4b ff db 39     bl      c019c4f8 <xfs_btree_rshift>
c019e9c4:       7c 7c 1b 79     mr.     r28,r3
c019e9c8:       40 82 00 10     bne-    c019e9d8 <xfs_btree_make_block_unfull+0x74>
c019e9cc:       80 1d 00 00     lwz     r0,0(r29)
c019e9d0:       2f 80 00 00     cmpwi   cr7,r0,0
c019e9d4:       41 9e 00 1c     beq-    cr7,c019e9f0 <xfs_btree_make_block_unfull+0x8c>
c019e9d8:       80 01 00 44     lwz     r0,68(r1)
c019e9dc:       7f 83 e3 78     mr      r3,r28
c019e9e0:       ba e1 00 1c     lmw     r23,28(r1)
c019e9e4:       38 21 00 40     addi    r1,r1,64
c019e9e8:       7c 08 03 a6     mtlr    r0
c019e9ec:       4e 80 00 20     blr
c019e9f0:       7f e3 fb 78     mr      r3,r31
c019e9f4:       7f c4 f3 78     mr      r4,r30
c019e9f8:       7f a5 eb 78     mr      r5,r29
c019e9fc:       4b ff df 59     bl      c019c954 <xfs_btree_lshift>
c019ea00:       7c 7c 1b 79     mr.     r28,r3
c019ea04:       40 a2 ff d4     bne-    c019e9d8 <xfs_btree_make_block_unfull+0x74>
c019ea08:       80 1d 00 00     lwz     r0,0(r29)
c019ea0c:       2f 80 00 00     cmpwi   cr7,r0,0
c019ea10:       41 9e 00 64     beq-    cr7,c019ea74 <xfs_btree_make_block_unfull+0x110>
c019ea14:       57 c9 10 3a     rlwinm  r9,r30,2,0,29
c019ea18:       7f 83 e3 78     mr      r3,r28
c019ea1c:       7d 29 fa 14     add     r9,r9,r31
c019ea20:       80 09 00 50     lwz     r0,80(r9)
c019ea24:       90 17 00 00     stw     r0,0(r23)
c019ea28:       90 19 00 00     stw     r0,0(r25)              <oops occurred here>
c019ea2c:       80 01 00 44     lwz     r0,68(r1)
c019ea30:       ba e1 00 1c     lmw     r23,28(r1)
c019ea34:       38 21 00 40     addi    r1,r1,64
c019ea38:       7c 08 03 a6     mtlr    r0
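
The register assignment I marked in the listing follows from the 32-bit
PowerPC SysV calling convention, where the first eight integer or
pointer arguments travel in r3 to r10 and later ones go on the stack.
Here is a small sketch of that mapping, assuming only word-sized
arguments (true for this call). ppc32_arg_location is a hypothetical
helper, not a kernel or toolchain function.

```c
#include <assert.h>

/* Where the n-th word-sized argument lives at function entry. */
struct ppc32_arg_loc {
	int in_register;   /* 1: passed in a GPR, 0: passed on the stack   */
	int gpr;           /* register number when in_register is 1        */
	int caller_sp_off; /* offset from the caller's r1 when on the stack */
};

/* 'index' is zero-based: 0 is the first argument. */
struct ppc32_arg_loc ppc32_arg_location(int index)
{
	struct ppc32_arg_loc loc = { 0, -1, -1 };

	assert(index >= 0);
	if (index < 8) {
		loc.in_register = 1;
		loc.gpr = 3 + index;    /* r3 .. r10 */
	} else {
		/* Parameter area starts 8 bytes above the caller's r1. */
		loc.caller_sp_off = 8 + 4 * (index - 8);
	}
	return loc;
}
```

With the argument order from the call in xfs_btree_insrec (cur, level,
numrecs, &optr, &ptr, ...), oindex is argument 3 counting from zero and
lands in r6, index lands in r7, and the ninth argument, stat, sits at
offset 8 from the caller's r1. After the function's own
stwu r1,-64(r1) that is 72(r1), which agrees with the lwz r29,72(r1) in
the listing.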





> Thanks,
> -Eric
>



-- 
The simplest is not all best but the best is surely the simplest!

* Re: [BUG report]xfs_btree_make_block_unfull generated an OOPS
       [not found]               ` <389deec70912140119q40ed91cao62fe9c9ebdf13601@mail.gmail.com>
@ 2009-12-14 15:56                 ` Eric Sandeen
  2009-12-15  0:49                   ` hank peng
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Sandeen @ 2009-12-14 15:56 UTC
  To: hank peng, xfs-oss

hank peng wrote:
> Hi,Eric:
> I think I have found the reason to this problem, but I need you a little help.
> We have tested it again, and the same OOPS occured again:

Ok, let's keep this on the list please ...

> Unable to handle kernel paging request for data at address 0x00000000
> Faulting instruction address: 0xc019f4b8
> Oops: Kernel access of bad area, sig: 11 [#1]
> MPC85xx CDS
> Modules linked in:
> NIP: c019f4b8 LR: c019f490 CTR: 00000000
> REGS: ef965af0 TRAP: 0300   Not tainted  (2.6.31.6-svn40)
> MSR: 00029000 <EE,ME,CE>  CR: 22008284  XER: 00000000
> DEAR: 00000000, ESR: 00800000
> TASK = e8a56580[3450] 'SS_Server' THREAD: ef964000
> GPR00: 000001fd ef965ba0 e8a56580 00000000 00000000 00000001 00000001 00000001
> GPR08: e8fa10e8 e8fa1a18 e8fa10f0 000001fd 22008222 1016d410 3fff5400 100a0000
> GPR16: 100d2408 00000000 00000000 d42b12f8 c019d01c 00029000 ef965c5c c0187660
> GPR24: c019cff8 00000000 22008224 ef965c08 00000000 ef965c58 00000000 e8fa1a18
> NIP [c019f4b8] xfs_btree_make_block_unfull+0xc4/0x1b0
> LR [c019f490] xfs_btree_make_block_unfull+0x9c/0x1b0
> Call Trace:
> [ef965ba0] [c019f490] xfs_btree_make_block_unfull+0x9c/0x1b0 (unreliable)
> [ef965be0] [c019f918] xfs_btree_insrec+0x374/0x4b0
> [ef965c50] [c019fad0] xfs_btree_insert+0x7c/0x1c0
> [ef965cb0] [c018661c] xfs_free_ag_extent+0x408/0x810
> [ef965d20] [c01870f8] xfs_free_extent+0xdc/0x104
> [ef965db0] [c018fde0] xfs_bmap_finish+0x154/0x1a0
> [ef965de0] [c01b68c4] xfs_itruncate_finish+0x254/0x3b8
> [ef965e60] [c01d0dcc] xfs_free_eofblocks+0x254/0x29c
> [ef965ee0] [c01da638] xfs_file_release+0x14/0x28
> [ef965ef0] [c009574c] __fput+0xe8/0x1dc
> [ef965f10] [c0092048] filp_close+0x70/0xb0
> [ef965f30] [c009211c] sys_close+0x94/0xc0
> [ef965f40] [c000f784] ret_from_syscall+0x0/0x3c
> Instruction dump:
> 7fa5eb78 4bffdf59 7c7c1b79 40a2ffd4 801d0000 2f800000 419e0064 57c9103a
> 7f83e378 7d29fa14 80090050 90170000 <90190000> 80010044 bae1001c 38210040
> ---[ end trace 356726176eeecd9c ]---
> Oops: Exception in kernel mode, sig: 4 [#2]
> MPC85xx CDS
> Modules linked in:
> NIP: c0187660 LR: c019b26c CTR: c0187660
> REGS: d42076a0 TRAP: 0700   Tainted: G      D     (2.6.31.6-svn40)
> MSR: 00029000 <EE,ME,CE>  CR: 22222082  XER: 00000000
> TASK = e08a6ee0[8533] 'pdflush' THREAD: d4206000
> GPR00: 00000004 d4207750 e08a6ee0 d42b1098 00000001 00000001 e8e97d80 00000003
> GPR08: c2c65300 c0187660 41425443 41425443 00001000 1001a1c4 c01842f8 00000001
> GPR16: d4207880 d42077e0 00000002 d42077d8 d42077e0 d42077e8 d42b10ec 00000001
> GPR24: c0486be0 00000000 d42b1098 09c40000 c019b4f0 00000011 e88bf000 d4207750
> NIP [c0187660] xfs_allocbt_get_maxrecs+0x0/0x20
> LR [c019b26c] xfs_btree_check_sblock+0xb0/0xf8
> Call Trace:
> [d4207770] [c019b4f0] xfs_btree_read_buf_block+0x8c/0xb8
> [d42077a0] [c019b5a8] xfs_btree_lookup_get_block+0x8c/0xfc
> [d42077d0] [c019c638] xfs_btree_lookup+0x124/0x3fc
> [d4207850] [c01842f8] xfs_alloc_lookup_ge+0x20/0x30
> [d4207860] [c0185828] xfs_alloc_ag_vextent_near+0x60/0xa4c
> [d42078e0] [c0186af4] xfs_alloc_ag_vextent+0xd0/0x168
> [d4207900] [c01873f0] xfs_alloc_vextent+0x2d0/0x524
> [d4207940] [c01940fc] xfs_bmap_btalloc+0x274/0xa60
> [d4207a00] [c01988bc] xfs_bmapi+0xb30/0x10dc
> [d4207b40] [c01bb190] xfs_iomap_write_allocate+0x11c/0x450
> [d4207c00] [c01bc2e8] xfs_iomap+0x320/0x35c
> [d4207c80] [c01d5d5c] xfs_map_blocks+0x2c/0x40
> [d4207ca0] [c01d6dc0] xfs_page_state_convert+0x2e8/0x744
> [d4207d60] [c01d7384] xfs_vm_writepage+0x7c/0x128
> [d4207d90] [c006d740] __writepage+0x24/0x80
> [d4207da0] [c006db44] write_cache_pages+0x1e4/0x3a0
> [d4207e50] [c01d5e14] xfs_vm_writepages+0x24/0x34
> [d4207e60] [c006dd70] do_writepages+0x48/0x7c
> [d4207e70] [c00b2120] writeback_single_inode+0xf8/0x2e4
> [d4207ec0] [c00b2788] generic_sync_sb_inodes+0x280/0x398
> [d4207ef0] [c00b295c] writeback_inodes+0xb8/0xd4
> [d4207f10] [c006ece0] wb_kupdate+0xd4/0x154
> [d4207f70] [c006f3bc] pdflush+0xd4/0x1c4
> [d4207fc0] [c004c750] kthread+0x78/0x7c
> <...>
> 
> 
> There were another OOPS which followed the first one. 

After the first oops I think the rest is not interesting, things
are in bad shape by now.

> Please note that
> in the second OOPS, a SIGILL has been invoked and address of illegal
> instrucion is 0xc0187660.
> In the first OOPS, look at the following registers:
> 
> GPR00: 000001fd ef965ba0 e8a56580 00000000 00000000 00000001 00000001 00000001
> GPR08: e8fa10e8 e8fa1a18 e8fa10f0 000001fd 22008222 1016d410 3fff5400 100a0000
> GPR16: 100d2408 00000000 00000000 d42b12f8 c019d01c 00029000 ef965c5c c0187660
> GPR24: c019cff8 00000000 22008224 ef965c08 00000000 ef965c58 00000000 e8fa1a18
> 
> I noticed that the value of r23 is also 0xc0187660. I have a little
> powerpc assembly code knowledge, if I am not wrong,
> *oindex = *index = cur->bc_ptrs[level];' in fs/xfs/xfs_btree.c was
> built into the following asm code which I send it to you ealier:
> 80 09 00 50     lwz     r0,80(r9)
> 90 17 00 00     stw     r0,0(r23)
> 90 19 00 00     stw     r0,0(r25)              <OOPs occured here>
> 
> So, r23 should have pointed to address of index and never had a chace
> to point to a code adress, but it did. What's worse, the code at
> 0xc0187660 had been changed and the second OOPS happened imediately.
> 
> Could you correct my analysis if I am wrong?
> In addition, I think the problem may be caused by stack overflow, what
> is your comments?
> 
> 
Perhaps, but if this is the 2nd oops I think it is not worth investigating;
we need to figure out why the first one happened, and from that stack trace
I don't think you are close to overflowing...
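
As a rough check of the stack depth: the THREAD value in the oops header
is the base of the kernel stack and GPR01 is the stack pointer. Here is a
minimal sketch of the arithmetic, assuming 8 KiB kernel stacks
(ASSUMED_THREAD_SIZE is an assumption, check THREAD_SIZE for the actual
config) and a made-up helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 8 KiB kernel stacks; adjust to the kernel's THREAD_SIZE. */
#define ASSUMED_THREAD_SIZE 0x2000u

/*
 * Bytes of kernel stack in use, given the THREAD base and r1 (GPR01)
 * from an oops header. The stack grows down from base + THREAD_SIZE.
 */
uint32_t stack_bytes_used(uint32_t thread_base, uint32_t sp)
{
	assert(sp >= thread_base);
	assert(sp - thread_base < ASSUMED_THREAD_SIZE);
	return thread_base + ASSUMED_THREAD_SIZE - sp;
}
```

For the first oops above (THREAD: ef964000, r1 = ef965ba0) this gives
0x460, i.e. 1120 bytes in use at the time of the fault. It says nothing
about how deep the stack went earlier.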

-eric

> 
> 2009/12/9 hank peng <pengxihan@gmail.com>:
>> 2009/12/9 hank peng <pengxihan@gmail.com>:
>>> 2009/12/9 Eric Sandeen <sandeen@sandeen.net>:
>>>> hank peng wrote:
>>>>> 2009/12/9 Eric Sandeen <sandeen@sandeen.net>:
>>>>>> hank peng wrote:
>>>>>>
>>>>>>> Thanks for your replay.
>>>>>>>
>>>>>>> I made this conclusion from assembly code, correct me if I am wrong.
>>>>>>> #powerpc-linux-gnuspe-objdump vmlinux | less
>>>>>>> <snip>
>>>>>> (off list; if this works maybe you can reply on-list?)
>>>>>>
>>>>>> Could you use gdb to look?  Maybe:
>>>>>>
>>>>>> (gdb) list *xfs_btree_make_block_unfull+0xc4
>>>>>>
>>>>> I use gdb on my PC and get this:
>>>>>
>>>>> [root@localhost linux-2.6.31.6]# gdb vmlinux
>>>>> GNU gdb Red Hat Linux (6.5-37.el5rh)
>>>>> Copyright (C) 2006 Free Software Foundation, Inc.
>>>>> GDB is free software, covered by the GNU General Public License, and you are
>>>>> welcome to change it and/or distribute copies of it under certain conditions.
>>>>> Type "show copying" to see the conditions.
>>>>> There is absolutely no warranty for GDB.  Type "show warranty" for details.
>>>>> This GDB was configured as "i386-redhat-linux-gnu"...Using host
>>>>> libthread_db library "/lib/libthread_db.so.1".
>>>>>
>>>>> (gdb) list *xfs_btree_make_block_unfull+0xc4
>>>>> No source file for address 0xc019ea28.
>>>>> (gdb)
>>>>>
>>>>>> -Eric
>>>> so I guess it is not built with debugging symbols perhaps?
>>>>
>>>> Try rebuilding it with CONFIG_DEBUG_INFO on maybe?
>>>>
>>> yes, you are right, now I get the result:
>>> (gdb) l *xfs_btree_make_block_unfull+0xc4
>>> 0xc019ea30 is in xfs_btree_make_block_unfull (fs/xfs/xfs_btree.c:2643).
>>> 2638            error = xfs_btree_lshift(cur, level, stat);
>>> 2639            if (error)
>>> 2640                    return error;
>>> 2641
>>> 2642            if (*stat) {
>>> 2643                    *oindex = *index = cur->bc_ptrs[level];
>>> 2644                    return 0;
>>> 2645            }
>>> 2646
>>> 2647            /*
>>>
>>> It indeed points to "*oindex = *index = cur->bc_ptrs[level];"
>>>
>> Very strange, as you said, xfs_btree_insrec passes address local
>> variable to xfs_btree_make_block_unfull, so it is impossible for
>> oindex to be NULL.
>> Do you think it may be an memory corrupt?
>>>> -Eric
>>>>
>>>
>>>
>>> --
>>> The simplest is not all best but the best is surely the simplest!
>>>
>>
>>
>> --
>> The simplest is not all best but the best is surely the simplest!
>>
> 
> 
> 

* Re: [BUG report]xfs_btree_make_block_unfull generated an OOPS
  2009-12-14 15:56                 ` Eric Sandeen
@ 2009-12-15  0:49                   ` hank peng
  2009-12-15  0:58                     ` hank peng
  2009-12-15  1:26                     ` Dave Chinner
  0 siblings, 2 replies; 12+ messages in thread
From: hank peng @ 2009-12-15  0:49 UTC
  To: Eric Sandeen; +Cc: xfs-oss

Hi, Eric:
I added some debugging code like this:
        if (*stat) {
                printk("*stat = 0x%08x, oindex = %p, index = %p\n",
                                *stat, oindex, index);
                if (oindex == NULL || index == NULL) {
                        printk("BUG occurred!\n");
                        printk("oindex = %p, index = %p\n", oindex, index);
                        BUG();
                }
                *oindex = *index = cur->bc_ptrs[level];
                return 0;
        }

The same oops happened again, though slightly differently. The kernel messages are:

<snip>
*stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
*stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
*stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
*stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
*stat = 0x00000001, oindex = 00000501, index = 22008424
Unable to handle kernel paging request for data at address 0x22008424
Faulting instruction address: 0xc019f568
Oops: Kernel access of bad area, sig: 11 [#1]
MPC85xx CDS
Modules linked in:
NIP: c019f568 LR: c019f54c CTR: c023f9f4
REGS: e87d7af0 TRAP: 0300   Not tainted  (2.6.31.6-svn40)
MSR: 00029000 <EE,ME,CE>  CR: 22008424  XER: 20000000
DEAR: 22008424, ESR: 00800000
TASK = efb03390[17279] 'SS_Server' THREAD: e87d6000
GPR00: 000001fd e87d7ba0 efb03390 0000003b 00031d91 ffffffff c023cfa4 00031d91
GPR08: c04a7c40 e84511c8 00031d91 00004000 20008482 1016d410 3fff5400 100a0000
GPR16: 100d0408 00000000 00000000 e8fa3558 c019d0ac 00029000 e87d7c5c c01876f0
GPR24: c019d088 00000000 22008424 00000000 00000501 e87d7c58 00000000 e84511c8
NIP [c019f568] xfs_btree_make_block_unfull+0xe4/0x1f4
LR [c019f54c] xfs_btree_make_block_unfull+0xc8/0x1f4
Call Trace:
[e87d7ba0] [c019f54c] xfs_btree_make_block_unfull+0xc8/0x1f4 (unreliable)
[e87d7be0] [c019f9ec] xfs_btree_insrec+0x374/0x4b0
[e87d7c50] [c019fba4] xfs_btree_insert+0x7c/0x1c0
[e87d7cb0] [c01866ac] xfs_free_ag_extent+0x408/0x810
[e87d7d20] [c0187188] xfs_free_extent+0xdc/0x104
[e87d7db0] [c018fe70] xfs_bmap_finish+0x154/0x1a0
[e87d7de0] [c01b6998] xfs_itruncate_finish+0x254/0x3b8
[e87d7e60] [c01d0ea0] xfs_free_eofblocks+0x254/0x29c
[e87d7ee0] [c01da70c] xfs_file_release+0x14/0x28
[e87d7ef0] [c00957dc] __fput+0xe8/0x1dc
[e87d7f10] [c00920d8] filp_close+0x70/0xb0
[e87d7f30] [c00921ac] sys_close+0x94/0xc0
[e87d7f40] [c000f7cc] ret_from_syscall+0x0/0x3c
Instruction dump:
7f85e378 3863ed7c 7f46d378 4cc63182 4be97ea1 2f9c0000 419e00f8 2f9a0000
419e00f0 57c9103a 7d29fa14 80090050 <901a0000> 901c0000 4bffff88 3b810010
---[ end trace f245b6a670339d8f ]---
</snip>

As you can see, the oops happened right after printing "*stat = 0x00000001,
oindex = 00000501, index = 22008424".
My BUG() did not fire because neither pointer was NULL, but both held
garbage values, and the store through index faulted (DEAR is 22008424).
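
Since a NULL test cannot catch garbage like this, a check that the
pointer actually lies on the current kernel stack would be stricter.
Here is a minimal user-space sketch of that range test, assuming 8 KiB
stacks and using the THREAD value from the oops as the base.
addr_on_stack and ASSUMED_STACK_SIZE are hypothetical names, not
existing kernel symbols.

```c
#include <stdint.h>

/* Assumption: 8 KiB kernel stack starting at the THREAD address. */
#define ASSUMED_STACK_SIZE 0x2000u

/*
 * Return 1 if a 4-byte object at 'addr' fits inside
 * [thread_base, thread_base + ASSUMED_STACK_SIZE), else 0.
 */
int addr_on_stack(uint32_t addr, uint32_t thread_base)
{
	return addr >= thread_base &&
	       addr - thread_base <= ASSUMED_STACK_SIZE - 4;
}
```

With THREAD: e87d6000, the good values (e87d7bf8, e87d7bfc) pass, while
00000501 and 22008424 both fail.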



2009/12/14 Eric Sandeen <sandeen@sandeen.net>:
> hank peng wrote:
>> Hi,Eric:
>> I think I have found the reason to this problem, but I need you a little help.
>> We have tested it again, and the same OOPS occured again:
>
> Ok, let's keep this on the list please ...
>
>> Unable to handle kernel paging request for data at address 0x00000000
>> Faulting instruction address: 0xc019f4b8
>> Oops: Kernel access of bad area, sig: 11 [#1]
>> MPC85xx CDS
>> Modules linked in:
>> NIP: c019f4b8 LR: c019f490 CTR: 00000000
>> REGS: ef965af0 TRAP: 0300   Not tainted  (2.6.31.6-svn40)
>> MSR: 00029000 <EE,ME,CE>  CR: 22008284  XER: 00000000
>> DEAR: 00000000, ESR: 00800000
>> TASK = e8a56580[3450] 'SS_Server' THREAD: ef964000
>> GPR00: 000001fd ef965ba0 e8a56580 00000000 00000000 00000001 00000001 00000001
>> GPR08: e8fa10e8 e8fa1a18 e8fa10f0 000001fd 22008222 1016d410 3fff5400 100a0000
>> GPR16: 100d2408 00000000 00000000 d42b12f8 c019d01c 00029000 ef965c5c c0187660
>> GPR24: c019cff8 00000000 22008224 ef965c08 00000000 ef965c58 00000000 e8fa1a18
>> NIP [c019f4b8] xfs_btree_make_block_unfull+0xc4/0x1b0
>> LR [c019f490] xfs_btree_make_block_unfull+0x9c/0x1b0
>> Call Trace:
>> [ef965ba0] [c019f490] xfs_btree_make_block_unfull+0x9c/0x1b0 (unreliable)
>> [ef965be0] [c019f918] xfs_btree_insrec+0x374/0x4b0
>> [ef965c50] [c019fad0] xfs_btree_insert+0x7c/0x1c0
>> [ef965cb0] [c018661c] xfs_free_ag_extent+0x408/0x810
>> [ef965d20] [c01870f8] xfs_free_extent+0xdc/0x104
>> [ef965db0] [c018fde0] xfs_bmap_finish+0x154/0x1a0
>> [ef965de0] [c01b68c4] xfs_itruncate_finish+0x254/0x3b8
>> [ef965e60] [c01d0dcc] xfs_free_eofblocks+0x254/0x29c
>> [ef965ee0] [c01da638] xfs_file_release+0x14/0x28
>> [ef965ef0] [c009574c] __fput+0xe8/0x1dc
>> [ef965f10] [c0092048] filp_close+0x70/0xb0
>> [ef965f30] [c009211c] sys_close+0x94/0xc0
>> [ef965f40] [c000f784] ret_from_syscall+0x0/0x3c
>> Instruction dump:
>> 7fa5eb78 4bffdf59 7c7c1b79 40a2ffd4 801d0000 2f800000 419e0064 57c9103a
>> 7f83e378 7d29fa14 80090050 90170000 <90190000> 80010044 bae1001c 38210040
>> ---[ end trace 356726176eeecd9c ]---
>> Oops: Exception in kernel mode, sig: 4 [#2]
>> MPC85xx CDS
>> Modules linked in:
>> NIP: c0187660 LR: c019b26c CTR: c0187660
>> REGS: d42076a0 TRAP: 0700   Tainted: G      D     (2.6.31.6-svn40)
>> MSR: 00029000 <EE,ME,CE>  CR: 22222082  XER: 00000000
>> TASK = e08a6ee0[8533] 'pdflush' THREAD: d4206000
>> GPR00: 00000004 d4207750 e08a6ee0 d42b1098 00000001 00000001 e8e97d80 00000003
>> GPR08: c2c65300 c0187660 41425443 41425443 00001000 1001a1c4 c01842f8 00000001
>> GPR16: d4207880 d42077e0 00000002 d42077d8 d42077e0 d42077e8 d42b10ec 00000001
>> GPR24: c0486be0 00000000 d42b1098 09c40000 c019b4f0 00000011 e88bf000 d4207750
>> NIP [c0187660] xfs_allocbt_get_maxrecs+0x0/0x20
>> LR [c019b26c] xfs_btree_check_sblock+0xb0/0xf8
>> Call Trace:
>> [d4207770] [c019b4f0] xfs_btree_read_buf_block+0x8c/0xb8
>> [d42077a0] [c019b5a8] xfs_btree_lookup_get_block+0x8c/0xfc
>> [d42077d0] [c019c638] xfs_btree_lookup+0x124/0x3fc
>> [d4207850] [c01842f8] xfs_alloc_lookup_ge+0x20/0x30
>> [d4207860] [c0185828] xfs_alloc_ag_vextent_near+0x60/0xa4c
>> [d42078e0] [c0186af4] xfs_alloc_ag_vextent+0xd0/0x168
>> [d4207900] [c01873f0] xfs_alloc_vextent+0x2d0/0x524
>> [d4207940] [c01940fc] xfs_bmap_btalloc+0x274/0xa60
>> [d4207a00] [c01988bc] xfs_bmapi+0xb30/0x10dc
>> [d4207b40] [c01bb190] xfs_iomap_write_allocate+0x11c/0x450
>> [d4207c00] [c01bc2e8] xfs_iomap+0x320/0x35c
>> [d4207c80] [c01d5d5c] xfs_map_blocks+0x2c/0x40
>> [d4207ca0] [c01d6dc0] xfs_page_state_convert+0x2e8/0x744
>> [d4207d60] [c01d7384] xfs_vm_writepage+0x7c/0x128
>> [d4207d90] [c006d740] __writepage+0x24/0x80
>> [d4207da0] [c006db44] write_cache_pages+0x1e4/0x3a0
>> [d4207e50] [c01d5e14] xfs_vm_writepages+0x24/0x34
>> [d4207e60] [c006dd70] do_writepages+0x48/0x7c
>> [d4207e70] [c00b2120] writeback_single_inode+0xf8/0x2e4
>> [d4207ec0] [c00b2788] generic_sync_sb_inodes+0x280/0x398
>> [d4207ef0] [c00b295c] writeback_inodes+0xb8/0xd4
>> [d4207f10] [c006ece0] wb_kupdate+0xd4/0x154
>> [d4207f70] [c006f3bc] pdflush+0xd4/0x1c4
>> [d4207fc0] [c004c750] kthread+0x78/0x7c
>> <...>
>>
>>
>> There were another OOPS which followed the first one.
>
> After the first oops I think the rest is not interesting, things
> are in bad shape by now.
>
>> Please note that
>> in the second OOPS, a SIGILL has been invoked and address of illegal
>> instrucion is 0xc0187660.
>> In the first OOPS, look at the following registers:
>>
>> GPR00: 000001fd ef965ba0 e8a56580 00000000 00000000 00000001 00000001 00000001
>> GPR08: e8fa10e8 e8fa1a18 e8fa10f0 000001fd 22008222 1016d410 3fff5400 100a0000
>> GPR16: 100d2408 00000000 00000000 d42b12f8 c019d01c 00029000 ef965c5c c0187660
>> GPR24: c019cff8 00000000 22008224 ef965c08 00000000 ef965c58 00000000 e8fa1a18
>>
>> I noticed that the value of r23 is also 0xc0187660. I have a little
>> powerpc assembly code knowledge, if I am not wrong,
>> *oindex = *index = cur->bc_ptrs[level];' in fs/xfs/xfs_btree.c was
>> built into the following asm code which I send it to you ealier:
>> 80 09 00 50     lwz     r0,80(r9)
>> 90 17 00 00     stw     r0,0(r23)
>> 90 19 00 00     stw     r0,0(r25)              <OOPs occured here>
>>
>> So, r23 should have pointed to address of index and never had a chace
>> to point to a code adress, but it did. What's worse, the code at
>> 0xc0187660 had been changed and the second OOPS happened imediately.
>>
>> Could you correct my analysis if I am wrong?
>> In addition, I think the problem may be caused by stack overflow, what
>> is your comments?
>>
>>
> Perhaps, but if this is the 2nd oops I think it is not worth investigating;
> we need to figure out why the first one happened, and from that stack trace
> I don't think you are close to overflowing...
>
> -eric
>
>>
>> 2009/12/9 hank peng <pengxihan@gmail.com>:
>>> 2009/12/9 hank peng <pengxihan@gmail.com>:
>>>> 2009/12/9 Eric Sandeen <sandeen@sandeen.net>:
>>>>> hank peng wrote:
>>>>>> 2009/12/9 Eric Sandeen <sandeen@sandeen.net>:
>>>>>>> hank peng wrote:
>>>>>>>
>>>>>>>> Thanks for your replay.
>>>>>>>>
>>>>>>>> I made this conclusion from assembly code, correct me if I am wrong.
>>>>>>>> #powerpc-linux-gnuspe-objdump vmlinux | less
>>>>>>>> <snip>
>>>>>>> (off list; if this works maybe you can reply on-list?)
>>>>>>>
>>>>>>> Could you use gdb to look?  Maybe:
>>>>>>>
>>>>>>> (gdb) list *xfs_btree_make_block_unfull+0xc4
>>>>>>>
>>>>>> I use gdb on my PC and get this:
>>>>>>
>>>>>> [root@localhost linux-2.6.31.6]# gdb vmlinux
>>>>>> GNU gdb Red Hat Linux (6.5-37.el5rh)
>>>>>> Copyright (C) 2006 Free Software Foundation, Inc.
>>>>>> GDB is free software, covered by the GNU General Public License, and you are
>>>>>> welcome to change it and/or distribute copies of it under certain conditions.
>>>>>> Type "show copying" to see the conditions.
>>>>>> There is absolutely no warranty for GDB.  Type "show warranty" for details.
>>>>>> This GDB was configured as "i386-redhat-linux-gnu"...Using host
>>>>>> libthread_db library "/lib/libthread_db.so.1".
>>>>>>
>>>>>> (gdb) list *xfs_btree_make_block_unfull+0xc4
>>>>>> No source file for address 0xc019ea28.
>>>>>> (gdb)
>>>>>>
>>>>>>> -Eric
>>>>> so I guess it is not built with debugging symbols perhaps?
>>>>>
>>>>> Try rebuilding it with CONFIG_DEBUG_INFO on maybe?
>>>>>
>>>> yes, you are right, now I get the result:
>>>> (gdb) l *xfs_btree_make_block_unfull+0xc4
>>>> 0xc019ea30 is in xfs_btree_make_block_unfull (fs/xfs/xfs_btree.c:2643).
>>>> 2638            error = xfs_btree_lshift(cur, level, stat);
>>>> 2639            if (error)
>>>> 2640                    return error;
>>>> 2641
>>>> 2642            if (*stat) {
>>>> 2643                    *oindex = *index = cur->bc_ptrs[level];
>>>> 2644                    return 0;
>>>> 2645            }
>>>> 2646
>>>> 2647            /*
>>>>
>>>> It indeed points to "*oindex = *index = cur->bc_ptrs[level];"
>>>>
>>> Very strange. As you said, xfs_btree_insrec passes the addresses of local
>>> variables to xfs_btree_make_block_unfull, so it should be impossible for
>>> oindex to be NULL.
>>> Do you think it may be memory corruption?
>>>>> -Eric
>>>>>
>>>>
>>>>
>>>> --
>>>> The simplest is not all best but the best is surely the simplest!
>>>>
>>>
>>>
>>> --
>>> The simplest is not all best but the best is surely the simplest!
>>>
>>
>>
>>
>
>



-- 
The simplest is not all best but the best is surely the simplest!

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [BUG report]xfs_btree_make_block_unfull generated an OOPS
  2009-12-15  0:49                   ` hank peng
@ 2009-12-15  0:58                     ` hank peng
  2009-12-15  1:26                     ` Dave Chinner
  1 sibling, 0 replies; 12+ messages in thread
From: hank peng @ 2009-12-15  0:58 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs-oss

2009/12/15 hank peng <pengxihan@gmail.com>:
> Hi, Eric:
> I add some code like this:
> if (*stat) {
>                printk("*stat = 0x%08x, oindex = %p, index = %p\n",
>                                *stat, oindex, index);
>                if (oindex == NULL || index == NULL) {
>                        printk("BUG occured!\n");
>                        printk("oindex = %p, index = %p\n", oindex, index);
>                        BUG();
>                }
>                *oindex = *index = cur->bc_ptrs[level];
>                return 0;
>        }
>
> And the same OOPS happened again but a little different, kernel messages are:
>
> <snip>
> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
> *stat = 0x00000001, oindex = 00000501, index = 22008424
> Unable to handle kernel paging request for data at address 0x22008424
> Faulting instruction address: 0xc019f568
> Oops: Kernel access of bad area, sig: 11 [#1]
> MPC85xx CDS
> Modules linked in:
> NIP: c019f568 LR: c019f54c CTR: c023f9f4
> REGS: e87d7af0 TRAP: 0300   Not tainted  (2.6.31.6-svn40)
> MSR: 00029000 <EE,ME,CE>  CR: 22008424  XER: 20000000
> DEAR: 22008424, ESR: 00800000
> TASK = efb03390[17279] 'SS_Server' THREAD: e87d6000
> GPR00: 000001fd e87d7ba0 efb03390 0000003b 00031d91 ffffffff c023cfa4 00031d91
> GPR08: c04a7c40 e84511c8 00031d91 00004000 20008482 1016d410 3fff5400 100a0000
> GPR16: 100d0408 00000000 00000000 e8fa3558 c019d0ac 00029000 e87d7c5c c01876f0
> GPR24: c019d088 00000000 22008424 00000000 00000501 e87d7c58 00000000 e84511c8
> NIP [c019f568] xfs_btree_make_block_unfull+0xe4/0x1f4
> LR [c019f54c] xfs_btree_make_block_unfull+0xc8/0x1f4
> Call Trace:
> [e87d7ba0] [c019f54c] xfs_btree_make_block_unfull+0xc8/0x1f4 (unreliable)
> [e87d7be0] [c019f9ec] xfs_btree_insrec+0x374/0x4b0
> [e87d7c50] [c019fba4] xfs_btree_insert+0x7c/0x1c0
> [e87d7cb0] [c01866ac] xfs_free_ag_extent+0x408/0x810
> [e87d7d20] [c0187188] xfs_free_extent+0xdc/0x104
> [e87d7db0] [c018fe70] xfs_bmap_finish+0x154/0x1a0
> [e87d7de0] [c01b6998] xfs_itruncate_finish+0x254/0x3b8
> [e87d7e60] [c01d0ea0] xfs_free_eofblocks+0x254/0x29c
> [e87d7ee0] [c01da70c] xfs_file_release+0x14/0x28
> [e87d7ef0] [c00957dc] __fput+0xe8/0x1dc
> [e87d7f10] [c00920d8] filp_close+0x70/0xb0
> [e87d7f30] [c00921ac] sys_close+0x94/0xc0
> [e87d7f40] [c000f7cc] ret_from_syscall+0x0/0x3c
> Instruction dump:
> 7f85e378 3863ed7c 7f46d378 4cc63182 4be97ea1 2f9c0000 419e00f8 2f9a0000
> 419e00f0 57c9103a 7d29fa14 80090050 <901a0000> 901c0000 4bffff88 3b810010
> ---[ end trace f245b6a670339d8f ]---
> </snip>
>
> As you see, after printing "*stat = 0x00000001, oindex = 00000501,
> index = 22008424", the OOPS happened.
> Although my BUG() was not invoked, it did access a bad area.
>
This is what gdb shows:

(gdb) list *xfs_btree_make_block_unfull+0xe4
0xc019f568 is in xfs_btree_make_block_unfull (fs/xfs/xfs_btree.c:2650).
2645                    if (oindex == NULL || index == NULL) {
2646                            printk("BUG occured!\n");
2647                            printk("oindex = %p, index = %p\n", oindex, index);
2648                            BUG();
2649                    }
2650                    *oindex = *index = cur->bc_ptrs[level]; /* why always here? */
2651                    return 0;
2652            }
2653
2654            /*
(gdb)
Why do the pointers suddenly become abnormal? Is it memory corruption? If so,
why does this OOPS always occur at the same place?

>
>
> 2009/12/14 Eric Sandeen <sandeen@sandeen.net>:
>> hank peng wrote:
>>> Hi, Eric:
>>> I think I have found the reason for this problem, but I need a little help from you.
>>> We have tested it again, and the same OOPS occurred again:
>>
>> Ok, let's keep this on the list please ...
>>
>>> Unable to handle kernel paging request for data at address 0x00000000
>>> Faulting instruction address: 0xc019f4b8
>>> Oops: Kernel access of bad area, sig: 11 [#1]
>>> MPC85xx CDS
>>> Modules linked in:
>>> NIP: c019f4b8 LR: c019f490 CTR: 00000000
>>> REGS: ef965af0 TRAP: 0300   Not tainted  (2.6.31.6-svn40)
>>> MSR: 00029000 <EE,ME,CE>  CR: 22008284  XER: 00000000
>>> DEAR: 00000000, ESR: 00800000
>>> TASK = e8a56580[3450] 'SS_Server' THREAD: ef964000
>>> GPR00: 000001fd ef965ba0 e8a56580 00000000 00000000 00000001 00000001 00000001
>>> GPR08: e8fa10e8 e8fa1a18 e8fa10f0 000001fd 22008222 1016d410 3fff5400 100a0000
>>> GPR16: 100d2408 00000000 00000000 d42b12f8 c019d01c 00029000 ef965c5c c0187660
>>> GPR24: c019cff8 00000000 22008224 ef965c08 00000000 ef965c58 00000000 e8fa1a18
>>> NIP [c019f4b8] xfs_btree_make_block_unfull+0xc4/0x1b0
>>> LR [c019f490] xfs_btree_make_block_unfull+0x9c/0x1b0
>>> Call Trace:
>>> [ef965ba0] [c019f490] xfs_btree_make_block_unfull+0x9c/0x1b0 (unreliable)
>>> [ef965be0] [c019f918] xfs_btree_insrec+0x374/0x4b0
>>> [ef965c50] [c019fad0] xfs_btree_insert+0x7c/0x1c0
>>> [ef965cb0] [c018661c] xfs_free_ag_extent+0x408/0x810
>>> [ef965d20] [c01870f8] xfs_free_extent+0xdc/0x104
>>> [ef965db0] [c018fde0] xfs_bmap_finish+0x154/0x1a0
>>> [ef965de0] [c01b68c4] xfs_itruncate_finish+0x254/0x3b8
>>> [ef965e60] [c01d0dcc] xfs_free_eofblocks+0x254/0x29c
>>> [ef965ee0] [c01da638] xfs_file_release+0x14/0x28
>>> [ef965ef0] [c009574c] __fput+0xe8/0x1dc
>>> [ef965f10] [c0092048] filp_close+0x70/0xb0
>>> [ef965f30] [c009211c] sys_close+0x94/0xc0
>>> [ef965f40] [c000f784] ret_from_syscall+0x0/0x3c
>>> Instruction dump:
>>> 7fa5eb78 4bffdf59 7c7c1b79 40a2ffd4 801d0000 2f800000 419e0064 57c9103a
>>> 7f83e378 7d29fa14 80090050 90170000 <90190000> 80010044 bae1001c 38210040
>>> ---[ end trace 356726176eeecd9c ]---
>>> Oops: Exception in kernel mode, sig: 4 [#2]
>>> MPC85xx CDS
>>> Modules linked in:
>>> NIP: c0187660 LR: c019b26c CTR: c0187660
>>> REGS: d42076a0 TRAP: 0700   Tainted: G      D     (2.6.31.6-svn40)
>>> MSR: 00029000 <EE,ME,CE>  CR: 22222082  XER: 00000000
>>> TASK = e08a6ee0[8533] 'pdflush' THREAD: d4206000
>>> GPR00: 00000004 d4207750 e08a6ee0 d42b1098 00000001 00000001 e8e97d80 00000003
>>> GPR08: c2c65300 c0187660 41425443 41425443 00001000 1001a1c4 c01842f8 00000001
>>> GPR16: d4207880 d42077e0 00000002 d42077d8 d42077e0 d42077e8 d42b10ec 00000001
>>> GPR24: c0486be0 00000000 d42b1098 09c40000 c019b4f0 00000011 e88bf000 d4207750
>>> NIP [c0187660] xfs_allocbt_get_maxrecs+0x0/0x20
>>> LR [c019b26c] xfs_btree_check_sblock+0xb0/0xf8
>>> Call Trace:
>>> [d4207770] [c019b4f0] xfs_btree_read_buf_block+0x8c/0xb8
>>> [d42077a0] [c019b5a8] xfs_btree_lookup_get_block+0x8c/0xfc
>>> [d42077d0] [c019c638] xfs_btree_lookup+0x124/0x3fc
>>> [d4207850] [c01842f8] xfs_alloc_lookup_ge+0x20/0x30
>>> [d4207860] [c0185828] xfs_alloc_ag_vextent_near+0x60/0xa4c
>>> [d42078e0] [c0186af4] xfs_alloc_ag_vextent+0xd0/0x168
>>> [d4207900] [c01873f0] xfs_alloc_vextent+0x2d0/0x524
>>> [d4207940] [c01940fc] xfs_bmap_btalloc+0x274/0xa60
>>> [d4207a00] [c01988bc] xfs_bmapi+0xb30/0x10dc
>>> [d4207b40] [c01bb190] xfs_iomap_write_allocate+0x11c/0x450
>>> [d4207c00] [c01bc2e8] xfs_iomap+0x320/0x35c
>>> [d4207c80] [c01d5d5c] xfs_map_blocks+0x2c/0x40
>>> [d4207ca0] [c01d6dc0] xfs_page_state_convert+0x2e8/0x744
>>> [d4207d60] [c01d7384] xfs_vm_writepage+0x7c/0x128
>>> [d4207d90] [c006d740] __writepage+0x24/0x80
>>> [d4207da0] [c006db44] write_cache_pages+0x1e4/0x3a0
>>> [d4207e50] [c01d5e14] xfs_vm_writepages+0x24/0x34
>>> [d4207e60] [c006dd70] do_writepages+0x48/0x7c
>>> [d4207e70] [c00b2120] writeback_single_inode+0xf8/0x2e4
>>> [d4207ec0] [c00b2788] generic_sync_sb_inodes+0x280/0x398
>>> [d4207ef0] [c00b295c] writeback_inodes+0xb8/0xd4
>>> [d4207f10] [c006ece0] wb_kupdate+0xd4/0x154
>>> [d4207f70] [c006f3bc] pdflush+0xd4/0x1c4
>>> [d4207fc0] [c004c750] kthread+0x78/0x7c
>>> <...>
>>>
>>>



-- 
The simplest is not all best but the best is surely the simplest!

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [BUG report]xfs_btree_make_block_unfull generated an OOPS
  2009-12-15  0:49                   ` hank peng
  2009-12-15  0:58                     ` hank peng
@ 2009-12-15  1:26                     ` Dave Chinner
  2009-12-15  1:56                       ` hank peng
                                         ` (2 more replies)
  1 sibling, 3 replies; 12+ messages in thread
From: Dave Chinner @ 2009-12-15  1:26 UTC (permalink / raw)
  To: hank peng; +Cc: Eric Sandeen, xfs-oss

On Tue, Dec 15, 2009 at 08:49:37AM +0800, hank peng wrote:
> Hi, Eric:
> I add some code like this:
> if (*stat) {
>                 printk("*stat = 0x%08x, oindex = %p, index = %p\n",
>                                 *stat, oindex, index);
>                 if (oindex == NULL || index == NULL) {

This won't catch bad non-NULL pointers like you are seeing.

>                         printk("BUG occured!\n");
>                         printk("oindex = %p, index = %p\n", oindex, index);
>                         BUG();
>                 }
>                 *oindex = *index = cur->bc_ptrs[level];
>                 return 0;
>         }
> 
> And the same OOPS happened again but a little different, kernel messages are:
> 
> <snip>
> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
> *stat = 0x00000001, oindex = 00000501, index = 22008424
> Unable to handle kernel paging request for data at address 0x22008424

Given that oindex and index are stack variables, this indicates
something is probably smashing the stack. Possibly a buffer overrun. To
narrow down the possible cause, can you add this debug:

	printk("%s:%d: oindex = %p, index = %p\n",
			__func__, __LINE__, oindex, index);

throughout the xfs_btree_make_block_unfull() function? i.e. at
first entry, before the xfs_btree_rshift() call, before the
xfs_btree_lshift() call, etc, to see if any of the parameters
are being modified during execution of the function?

If the variables being passed into xfs_btree_make_block_unfull() are
already bad, then do the same thing for the caller
xfs_btree_insrec(). This may help narrow down where the problem
is coming from....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [BUG report]xfs_btree_make_block_unfull generated an OOPS
  2009-12-15  1:26                     ` Dave Chinner
@ 2009-12-15  1:56                       ` hank peng
  2009-12-15  3:15                         ` Eric Sandeen
  2009-12-15  5:36                       ` hank peng
  2010-01-13  1:11                       ` hank peng
  2 siblings, 1 reply; 12+ messages in thread
From: hank peng @ 2009-12-15  1:56 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Eric Sandeen, xfs-oss

2009/12/15 Dave Chinner <david@fromorbit.com>:
> On Tue, Dec 15, 2009 at 08:49:37AM +0800, hank peng wrote:
>> Hi, Eric:
>> I add some code like this:
>> if (*stat) {
>>                 printk("*stat = 0x%08x, oindex = %p, index = %p\n",
>>                                 *stat, oindex, index);
>>                 if (oindex == NULL || index == NULL) {
>
> This won't catch bad non-NULL pointers like you are seeing.
>
>>                         printk("BUG occured!\n");
>>                         printk("oindex = %p, index = %p\n", oindex, index);
>>                         BUG();
>>                 }
>>                 *oindex = *index = cur->bc_ptrs[level];
>>                 return 0;
>>         }
>>
>> And the same OOPS happened again but a little different, kernel messages are:
>>
>> <snip>
>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>> *stat = 0x00000001, oindex = 00000501, index = 22008424
>> Unable to handle kernel paging request for data at address 0x22008424
>
> Given that oindex and index are stack variables, this indicates
> something is probably smashing the stack. Possibly a buffer overrun. To
> narrow down the possible cause, can you add this debug:
>
>        printk("%s:%d: oindex = %p, index = %p\n",
>                        __func__, __LINE__, oindex, index);
>
> throughout the xfs_btree_make_block_unfull() function? i.e. at
> first entry, before the xfs_btree_rshift() call, before the
> xfs_btree_lshift() call, etc, to see if any of the parameters
> are being modified during execution of the function?
>
> If the variables being passed into xfs_btree_make_block_unfull() are
> already bad, then do the same thing for the caller
> xfs_btree_insrec(). This may help narrow down where the problem
> is coming from....
>
Thanks for your reply!
As you said, I added some code like this:
/* First, try shifting an entry to the right neighbor. */
        printk("%s: before xfs_btree_rshift, oindex = %p, index = %p\n",
                        __func__, oindex, index);
        error = xfs_btree_rshift(cur, level, stat);
        if (error || *stat)
                return error;

        /* Next, try shifting an entry to the left neighbor. */
        printk("%s: before xfs_btree_lshift, oindex = %p, index = %p\n",
                        __func__, oindex, index);
        error = xfs_btree_lshift(cur, level, stat);
        if (error)
                return error;

        if (*stat) {
                printk("*stat = 0x%08x, oindex = %p, index = %p\n",
                                *stat, oindex, index);
                if (oindex == NULL || index == NULL) {
                        printk("BUG occured!\n");
                        printk("oindex = %p, index = %p\n", oindex, index);
                        BUG();
                }
                *oindex = *index = cur->bc_ptrs[level];
                return 0;
        }


        xfs_btree_set_ptr_null(cur, &nptr);
        if (numrecs == cur->bc_ops->get_maxrecs(cur, level)) {
                printk("%s: before calling xfs_btree_make_block_unfull, &optr = %p, &ptr = %p\n",
                                __func__, &optr, &ptr);
                error = xfs_btree_make_block_unfull(cur, level, numrecs,
                                        &optr, &ptr, &nptr, &ncur, &nrec, stat);
                if (error || *stat == 0)
                        goto error0;
        }


We are now waiting for the OOPS to happen.

I hope it is not a memory corruption problem, which would be a nightmare
for me to debug.

> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>



-- 
The simplest is not all best but the best is surely the simplest!

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [BUG report]xfs_btree_make_block_unfull generated an OOPS
  2009-12-15  1:56                       ` hank peng
@ 2009-12-15  3:15                         ` Eric Sandeen
  2009-12-15  3:22                           ` hank peng
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Sandeen @ 2009-12-15  3:15 UTC (permalink / raw)
  To: hank peng; +Cc: xfs-oss

hank peng wrote:
> 2009/12/15 Dave Chinner <david@fromorbit.com>:
>> On Tue, Dec 15, 2009 at 08:49:37AM +0800, hank peng wrote:
>>> Hi, Eric:
>>> I add some code like this:
>>> if (*stat) {
>>>                 printk("*stat = 0x%08x, oindex = %p, index = %p\n",
>>>                                 *stat, oindex, index);
>>>                 if (oindex == NULL || index == NULL) {
>> This won't catch bad non-NULL pointers like you are seeing.
>>
>>>                         printk("BUG occured!\n");
>>>                         printk("oindex = %p, index = %p\n", oindex, index);
>>>                         BUG();
>>>                 }
>>>                 *oindex = *index = cur->bc_ptrs[level];
>>>                 return 0;
>>>         }
>>>
>>> And the same OOPS happened again but a little different, kernel messages are:
>>>
>>> <snip>
>>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>>> *stat = 0x00000001, oindex = 00000501, index = 22008424
>>> Unable to handle kernel paging request for data at address 0x22008424

Are you using any of the xfs userspace prior to this error, or is it a
fresh boot and just normal IO?

I ask because libxfs calls sys_ustat() which at one point was corrupting
userspace, at least, with 32-bit userspace on a 64-bit kernel:
https://bugzilla.redhat.com/show_bug.cgi?id=472795

Even with that fixed there were still some reports of odd behavior
on ppc... I don't know if things might be going wrong in kernelspace
as well...

https://bugzilla.redhat.com/show_bug.cgi?id=517994
and I haven't gotten to the bottom of that yet ...

Very few things actually use sys_ustat, but xfs userspace does...
just a random thought.

-eric


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [BUG report]xfs_btree_make_block_unfull generated an OOPS
  2009-12-15  3:15                         ` Eric Sandeen
@ 2009-12-15  3:22                           ` hank peng
  0 siblings, 0 replies; 12+ messages in thread
From: hank peng @ 2009-12-15  3:22 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs-oss

2009/12/15 Eric Sandeen <sandeen@sandeen.net>:
> hank peng wrote:
>> 2009/12/15 Dave Chinner <david@fromorbit.com>:
>>> On Tue, Dec 15, 2009 at 08:49:37AM +0800, hank peng wrote:
>>>> Hi, Eric:
>>>> I add some code like this:
>>>> if (*stat) {
>>>>                 printk("*stat = 0x%08x, oindex = %p, index = %p\n",
>>>>                                 *stat, oindex, index);
>>>>                 if (oindex == NULL || index == NULL) {
>>> This won't catch bad non-NULL pointers like you are seeing.
>>>
>>>>                         printk("BUG occured!\n");
>>>>                         printk("oindex = %p, index = %p\n", oindex, index);
>>>>                         BUG();
>>>>                 }
>>>>                 *oindex = *index = cur->bc_ptrs[level];
>>>>                 return 0;
>>>>         }
>>>>
>>>> And the same OOPS happened again but a little different, kernel messages are:
>>>>
>>>> <snip>
>>>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>>>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>>>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>>>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>>>> *stat = 0x00000001, oindex = 00000501, index = 22008424
>>>> Unable to handle kernel paging request for data at address 0x22008424
>
> Are you using any of the xfs userspace prior to this error, or is it a
> fresh boot and just normal IO?
>
No xfs userspace was used prior to this error, just normal IO. Also, it
takes some time to trigger the OOPS.

> I ask because libxfs calls sys_ustat() which at one point was corrupting
> userspace, at least, with 32-bit userspace on a 64-bit kernel:
> https://bugzilla.redhat.com/show_bug.cgi?id=472795
>
I forgot to say, I use "-o inode64" when mounting.

# uname -a
Linux Storage 2.6.31.6-svn40 #30 Tue Dec 15 09:50:02 CST 2009 ppc unknown
# mount
rootfs on / type rootfs (rw)
/dev/root on / type ext2 (rw,relatime,errors=continue)
/dev/mtdblock2 on /mnt/sys_data type jffs2 (rw,relatime)
proc on /proc type proc (rw,relatime)
sysfs on /sys type sysfs (rw,relatime)
tmpfs on /opt/upgrade type tmpfs (rw,relatime)
devpts on /dev/pts type devpts (rw,relatime,gid=5,mode=620)
/dev/Pool_md2/ss1 on /mnt/Pool_md2/ss1 type xfs (rw,relatime,attr2,inode64,noquota)



> Even with that fixed there were still some reports of odd behavior
> on ppc... I don't know if things might be going wrong in kernelspace
> as well...
>
> https://bugzilla.redhat.com/show_bug.cgi?id=517994
> and I haven't gotten to the bottom of that yet ...
>
> Very few things actually use sys_ustat, but xfs userspace does...
> just a random thought.
>
> -eric
>
>>> Given that oindex and index are stack varibles, this indicates some
>>> thing is probably smashing the stack. Possibly a buffer overrun. To
>>> narrow down the possible cause, can you add the debug:
>>>
>>>        printk("%s:%s: oindex = %p, index = %p\n",
>>>                        __func__, __LINE__, oindex, index);
>>>
>>> throughout the xfs_btree_make_block_unfull() function? i.e. at
>>> first entry, before the xfs_btree_rshift() call, before the
>>> xfs_btree_lshift() call, etc, to see if any of the parameters
>>> are being modified during execution of the function?
>>>
>>> If the variables being passed into xfs_btree_make_block_unfull() are
>>> already bad, then do the same thing for the caller
>>> xfs_btree_insert(). This may help narrow down where the problem
>>> is coming from....
>>>
>> Thanks for your reply!
>> As you said, I added some code like this:
>> /* First, try shifting an entry to the right neighbor. */
>>         printk("%s: before xfs_btree_rshift, oindex = %p, index = %p\n",
>>                         __func__, oindex, index);
>>         error = xfs_btree_rshift(cur, level, stat);
>>         if (error || *stat)
>>                 return error;
>>
>>         /* Next, try shifting an entry to the left neighbor. */
>>         printk("%s: before xfs_btree_lshift, oindex = %p, index = %p\n",
>>                         __func__, oindex, index);
>>         error = xfs_btree_lshift(cur, level, stat);
>>         if (error)
>>                 return error;
>>
>>         if (*stat) {
>>                 printk("*stat = 0x%08x, oindex = %p, index = %p\n",
>>                                 *stat, oindex, index);
>>                 if (oindex == NULL || index == NULL) {
>>                         printk("BUG occurred!\n");
>>                         printk("oindex = %p, index = %p\n", oindex, index);
>>                         BUG();
>>                 }
>>                 *oindex = *index = cur->bc_ptrs[level];
>>                 return 0;
>>         }
>>
>>
>> xfs_btree_set_ptr_null(cur, &nptr);
>>         if (numrecs == cur->bc_ops->get_maxrecs(cur, level)) {
>>                 printk("%s: before calling xfs_btree_make_block_unfull, "
>>                                 "&optr = %p, &ptr = %p\n",
>>                                 __func__, &optr, &ptr);
>>                 error = xfs_btree_make_block_unfull(cur, level, numrecs,
>>                                         &optr, &ptr, &nptr, &ncur, &nrec, stat);
>>                 if (error || *stat == 0)
>>                         goto error0;
>>         }
>>
>>
>> We are waiting for the OOPS to happen.
>>
>> I hope it is not a memory corruption problem, which would be a nightmare
>> for me to debug.
>>
>>> Cheers,
>>>
>>> Dave.
>>> --
>>> Dave Chinner
>>> david@fromorbit.com
>>>
>>
>>
>>
>
>



-- 
The simplest is not all best but the best is surely the simplest!

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: [BUG report]xfs_btree_make_block_unfull generated an OOPS
  2009-12-15  1:26                     ` Dave Chinner
  2009-12-15  1:56                       ` hank peng
@ 2009-12-15  5:36                       ` hank peng
  2010-01-13  1:11                       ` hank peng
  2 siblings, 0 replies; 12+ messages in thread
From: hank peng @ 2009-12-15  5:36 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Eric Sandeen, xfs-oss

2009/12/15 Dave Chinner <david@fromorbit.com>:
> On Tue, Dec 15, 2009 at 08:49:37AM +0800, hank peng wrote:
>> Hi, Eric:
>> I added some code like this:
>> if (*stat) {
>>                 printk("*stat = 0x%08x, oindex = %p, index = %p\n",
>>                                 *stat, oindex, index);
>>                 if (oindex == NULL || index == NULL) {
>
> This won't catch bad non-NULL pointers like you are seeing.
>
>>                         printk("BUG occurred!\n");
>>                         printk("oindex = %p, index = %p\n", oindex, index);
>>                         BUG();
>>                 }
>>                 *oindex = *index = cur->bc_ptrs[level];
>>                 return 0;
>>         }
>>
>> The same OOPS happened again, but slightly differently. The kernel messages are:
>>
>> <snip>
>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>> *stat = 0x00000001, oindex = 00000501, index = 22008424
>> Unable to handle kernel paging request for data at address 0x22008424
>
> Given that oindex and index are stack variables, this indicates some

In xfs_btree_make_block_unfull, it seems that oindex and index are
optimised into register variables, which makes this even odder.

> thing is probably smashing the stack. Possibly a buffer overrun. To
> narrow down the possible cause, can you add the debug:
>
>        printk("%s:%d: oindex = %p, index = %p\n",
>                        __func__, __LINE__, oindex, index);
>
> throughout the xfs_btree_make_block_unfull() function? i.e. at
> first entry, before the xfs_btree_rshift() call, before the
> xfs_btree_lshift() call, etc, to see if any of the parameters
> are being modified during execution of the function?
>
> If the variables being passed into xfs_btree_make_block_unfull() are
> already bad, then do the same thing for the caller
> xfs_btree_insert(). This may help narrow down where the problem
> is coming from....
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>



-- 
The simplest is not all best but the best is surely the simplest!


* Re: [BUG report]xfs_btree_make_block_unfull generated an OOPS
  2009-12-15  1:26                     ` Dave Chinner
  2009-12-15  1:56                       ` hank peng
  2009-12-15  5:36                       ` hank peng
@ 2010-01-13  1:11                       ` hank peng
  2 siblings, 0 replies; 12+ messages in thread
From: hank peng @ 2010-01-13  1:11 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Eric Sandeen, xfs-oss

2009/12/15 Dave Chinner <david@fromorbit.com>:
> On Tue, Dec 15, 2009 at 08:49:37AM +0800, hank peng wrote:
>> Hi, Eric:
>> I added some code like this:
>> if (*stat) {
>>                 printk("*stat = 0x%08x, oindex = %p, index = %p\n",
>>                                 *stat, oindex, index);
>>                 if (oindex == NULL || index == NULL) {
>
> This won't catch bad non-NULL pointers like you are seeing.
>
>>                         printk("BUG occurred!\n");
>>                         printk("oindex = %p, index = %p\n", oindex, index);
>>                         BUG();
>>                 }
>>                 *oindex = *index = cur->bc_ptrs[level];
>>                 return 0;
>>         }
>>
>> The same OOPS happened again, but slightly differently. The kernel messages are:
>>
>> <snip>
>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
>> *stat = 0x00000001, oindex = 00000501, index = 22008424
>> Unable to handle kernel paging request for data at address 0x22008424
>
> Given that oindex and index are stack variables, this indicates
> something is probably smashing the stack. Possibly a buffer overrun. To
> narrow down the possible cause, can you add the debug:
>
>        printk("%s:%d: oindex = %p, index = %p\n",
>                        __func__, __LINE__, oindex, index);
>
> throughout the xfs_btree_make_block_unfull() function? i.e. at
> first entry, before the xfs_btree_rshift() call, before the
> xfs_btree_lshift() call, etc, to see if any of the parameters
> are being modified during execution of the function?
>
> If the variables being passed into xfs_btree_make_block_unfull() are
> already bad, then do the same thing for the caller
> xfs_btree_insert(). This may help narrow down where the problem
> is coming from....
>
I added the following debug code as you said:
<code>
        printk("%s: before xfs_btree_rshift, oindex = %p, index = %p\n",
                        __func__, oindex, index);
        error = xfs_btree_rshift(cur, level, stat);
        if (error || *stat)
                return error;

        /* Next, try shifting an entry to the left neighbor. */
        printk("%s: before xfs_btree_lshift, oindex = %p, index = %p\n",
                        __func__, oindex, index);
        error = xfs_btree_lshift(cur, level, stat);
        if (error)
                return error;

        if (*stat) {
                printk("%s: oindex = %p, index = %p, *stat = %d\n",
                                __func__, oindex, index, *stat);
                *oindex = *index = cur->bc_ptrs[level];
                return 0;
        }
</code>
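
The tracing above could also be wrapped in a macro that records the line
number at each call site, as Dave suggested. What follows is only a
minimal userspace sketch, with snprintf standing in for printk;
TRACE_PTRS and trace_example are hypothetical names that do not exist in
the XFS source.

```c
#include <stdio.h>

/*
 * Userspace stand-in for the suggested printk tracing (assumption:
 * snprintf replaces printk so the sketch builds outside the kernel).
 * __LINE__ is an integer, so it is printed with %d; the pointers are
 * cast to void * as %p requires.
 */
#define TRACE_PTRS(buf, len, oindex, index)                          \
	snprintf((buf), (len), "%s:%d: oindex = %p, index = %p",      \
		 __func__, __LINE__, (void *)(oindex), (void *)(index))

/*
 * Hypothetical helper: formats one trace line into buf and returns the
 * length snprintf reports.
 */
static int trace_example(char *buf, size_t len, int *oindex, int *index)
{
	return TRACE_PTRS(buf, len, oindex, index);
}
```

Because __func__ and __LINE__ expand where the macro is used, the same
macro can be dropped in at function entry and before each shift call
without editing the message text.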

It ran fine for about 36 hours without a problem, but this morning an
odd OOPS appeared:

xfs_btree_make_block_unfull: before xfs_btree_rshift, oindex = d3a27bd8, index = d3a27bdc
xfs_btree_make_block_unfull: before xfs_btree_lshift, oindex = d3a27bd8, index = d3a27bdc
xfs_btree_make_block_unfull: oindex = d3a27bd8, index = d3a27bdc, *stat = 1
xfs_btree_make_block_unfull: before xfs_btree_rshift, oindex = d3a27bd8, index = d3a27bdc
Unable to handle kernel paging request for data at address 0x00000501
Faulting instruction address: 0xc019f4f0
Oops: Kernel access of bad area, sig: 11 [#2]
MPC85xx CDS
Modules linked in:
NIP: c019f4f0 LR: c019f4e8 CTR: c023fabc
REGS: d3a27ad0 TRAP: 0300   Tainted: G      D     (2.6.31.6-svn45)
MSR: 00029000 <EE,ME,CE>  CR: 22008424  XER: 20000000
DEAR: 00000501, ESR: 00000000
TASK = efb46a30[20273] 'cp' THREAD: d3a26000
GPR00: c019f4e8 d3a27b80 efb46a30 00000000 d3a27b38 d3a27b38 00000010 007f0f26
GPR08: c04a7c40 ffffffff e8517850 d3a27b80 20008422 100eb39c 3fff5400 100a0000
GPR16: 100d5ac8 00000000 016d30f3 e8517850 c019d08c 00029000 d3a27bf0 c023fabc
GPR24: c019d068 00000000 22008424 d3a27bdc 00000501 d3a27bd8 00000000 e8517850
NIP [c019f4f0] xfs_btree_make_block_unfull+0x8c/0x1f8
LR [c019f4e8] xfs_btree_make_block_unfull+0x84/0x1f8
Call Trace:
[d3a27b80] [c019f4e8] xfs_btree_make_block_unfull+0x84/0x1f8 (unreliable)
[d3a27bc0] [c019f9d0] xfs_btree_insrec+0x374/0x4b0
[d3a27c30] [c019fb88] xfs_btree_insert+0x7c/0x1c0
[d3a27c90] [c01865d0] xfs_free_ag_extent+0x34c/0x810
[d3a27d00] [c0187168] xfs_free_extent+0xdc/0x104
[d3a27d90] [c018fe50] xfs_bmap_finish+0x154/0x1a0
[d3a27dc0] [c01b697c] xfs_itruncate_finish+0x254/0x3b8
[d3a27e40] [c01d2134] xfs_inactive+0x2c4/0x450
[d3a27e80] [c01e193c] xfs_fs_clear_inode+0x40/0x50
[d3a27e90] [c00a84bc] clear_inode+0x6c/0x108
[d3a27ea0] [c00a87d0] generic_delete_inode+0x114/0x118
[d3a27eb0] [c00a7ff8] iput+0x74/0x94
[d3a27ec0] [c00a003c] do_unlinkat+0x114/0x198
[d3a27f40] [c000f7ac] ret_from_syscall+0x0/0x3c
Instruction dump:
7f66db78 7f44d378 7fa5eb78 3863eca4 4cc63182 4be97ef5 7fe3fb78 7fc4f378
7f85e378 4bffdb15 7c791b79 40820010 <801c0000> 2f800000 419e001c 80010044
---[ end trace 95e2c49eb5a34f9a ]---

(gdb) list *(xfs_btree_make_block_unfull+0x8c)
0xc019f4f0 is in xfs_btree_make_block_unfull (fs/xfs/xfs_btree.c:2636).
2631
2632            /* First, try shifting an entry to the right neighbor. */
2633            printk("%s: before xfs_btree_rshift, oindex = %p, index = %p\n",
2634                            __func__, oindex, index);
2635            error = xfs_btree_rshift(cur, level, stat);
2636            if (error || *stat)
2637                    return error;
2638
2639            /* Next, try shifting an entry to the left neighbor. */
2640            printk("%s: before xfs_btree_lshift, oindex = %p, index = %p\n",

It seems that after the call to xfs_btree_rshift(), the value of 'stat'
has changed (the faulting address is 0x00000501). How is that possible,
since it is a local variable?
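
One way to check for this from C would be to keep a second copy of the
pointer in memory and compare the two after the call returns. This is a
rough userspace sketch of that idea, not kernel code; call_and_check,
callee_fn and sample_callee are hypothetical names, and sample_callee
only stands in for a function such as xfs_btree_rshift().

```c
#include <stddef.h>

/* Hypothetical stand-in type for a callee such as xfs_btree_rshift(). */
typedef int (*callee_fn)(int *stat);

/* A well-behaved callee: writes through the pointer, nothing else. */
static int sample_callee(int *stat)
{
	*stat = 1;
	return 0;
}

/*
 * Call fn(stat) and report whether the pointer itself survived the call.
 * The snapshot is volatile so the compiler keeps a separate copy in this
 * function's own frame instead of reusing the register that holds 'stat'.
 * Returns 0 if the two copies still match, -1 if they differ.
 */
static int call_and_check(callee_fn fn, int *stat)
{
	int * volatile snapshot = stat;

	(void)fn(stat);
	return (stat == snapshot) ? 0 : -1;
}
```

In correct C the two copies can never differ, so a -1 here would point
at corruption outside the language's rules. The check is not
conclusive, since a large enough overrun could damage both copies.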

# uname -a
Linux Storage 2.6.31.6-svn45 #87 Mon Jan 11 13:22:14 CST 2010 ppc unknown
# mount
rootfs on / type rootfs (rw)
/dev/root on / type ext2 (rw,relatime,errors=continue)
/dev/mtdblock2 on /mnt/sys_data type jffs2 (rw,relatime)
proc on /proc type proc (rw,relatime)
sysfs on /sys type sysfs (rw,relatime)
tmpfs on /opt/upgrade type tmpfs (rw,relatime)
devpts on /dev/pts type devpts (rw,relatime,gid=5,mode=620)
/dev/vg_log/lv_log on /var/log type reiserfs (rw,relatime)
/dev/Pool_md1/SS1 on /mnt/Pool_md1/SS1 type xfs (rw,relatime,attr2,inode64,noquota)
/dev/Pool_md2/SS2 on /mnt/Pool_md2/SS2 type xfs (rw,relatime,attr2,inode64,noquota)
root@Storage:/var/log# df -h
Filesystem                Size      Used Available Use% Mounted on
/dev/root               124.0M     72.6M     51.4M  59% /
/dev/mtdblock2            1.0M    408.0K    616.0K  40% /mnt/sys_data
tmpfs                   505.3M         0    505.3M   0% /opt/upgrade
/dev/vg_log/lv_log       10.0G     32.4M     10.0G   0% /var/log
/dev/Pool_md1/SS1         2.7T    270.2G      2.5T  10% /mnt/Pool_md1/SS1
/dev/Pool_md2/SS2         2.7T    344.0G      2.4T  12% /mnt/Pool_md2/SS2

From the assembly code, I noticed that the local variable 'stat' has no
real space on the stack; it is optimised into a register (r28).
According to the PowerPC ABI, some registers are saved on the stack
before xfs_btree_rshift runs and restored before it returns. Is it
possible that the smash occurred while they were saved there?
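
To make that scenario concrete, here is a simplified userspace model of
it. It is only a sketch under assumptions: struct frame_model and
call_with_write are hypothetical names, and the layout is illustrative,
not the real PowerPC stack frame layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical model of a callee's stack frame. */
struct frame_model {
	unsigned char local_buf[8];  /* a local buffer inside the callee */
	uint32_t saved_r28;          /* slot holding the caller's r28 */
};

/*
 * Model one call: save r28 into the frame, copy 'len' bytes into the
 * frame starting at local_buf (len > 8 models a buffer overrun), then
 * return the value that would be restored into r28 on exit.
 */
static uint32_t call_with_write(uint32_t r28, const unsigned char *data,
				size_t len)
{
	struct frame_model f;

	memset(&f, 0, sizeof(f));
	f.saved_r28 = r28;                       /* prologue: save r28 */
	if (len > sizeof(f))
		len = sizeof(f);                 /* keep the model in bounds */
	memcpy((unsigned char *)&f, data, len);  /* body: possible overrun */
	return f.saved_r28;                      /* epilogue: restore r28 */
}
```

With len at 8 or less the saved value comes back unchanged. With a
longer write, the bytes that land in the save slot are what the caller
sees in r28 afterwards, even though the caller never touched memory
itself.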

BTW, I noticed that my cross-compiler "powerpc-linux-gnuspe-gcc"
defaults to 8-byte alignment rather than 4-byte alignment.



> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>



-- 
The simplest is not all best but the best is surely the simplest!


Thread overview: 12+ messages
2009-12-09  1:58 [BUG report]xfs_btree_make_block_unfull generated an OOPS hank peng
2009-12-09  2:57 ` Eric Sandeen
2009-12-09  3:18   ` hank peng
     [not found]     ` <4B1F18C4.3060704@sandeen.net>
     [not found]       ` <389deec70912082053v4310057dg479f6d4b6c4b46f7@mail.gmail.com>
     [not found]         ` <4B1F31FD.3020705@sandeen.net>
     [not found]           ` <389deec70912082220pcb3b5d1q516ac197d31502c5@mail.gmail.com>
     [not found]             ` <389deec70912082230g38987576pc48d7699f23844c5@mail.gmail.com>
     [not found]               ` <389deec70912140119q40ed91cao62fe9c9ebdf13601@mail.gmail.com>
2009-12-14 15:56                 ` Eric Sandeen
2009-12-15  0:49                   ` hank peng
2009-12-15  0:58                     ` hank peng
2009-12-15  1:26                     ` Dave Chinner
2009-12-15  1:56                       ` hank peng
2009-12-15  3:15                         ` Eric Sandeen
2009-12-15  3:22                           ` hank peng
2009-12-15  5:36                       ` hank peng
2010-01-13  1:11                       ` hank peng
