public inbox for linux-xfs@vger.kernel.org
* Xfs Access to block zero  exception and system crash
@ 2008-06-24  7:03 Sagar Borikar
  2008-06-25  6:48 ` Sagar Borikar
  2008-06-25  8:49 ` Dave Chinner
  0 siblings, 2 replies; 48+ messages in thread
From: Sagar Borikar @ 2008-06-24  7:03 UTC (permalink / raw)
  To: xfs; +Cc: Sagar Borikar


Hello,

I hope this is the right list to address this issue. If not, please point me to the right one.

We are facing a strange issue with XFS under heavy load. It's a NAS box with a 2.6.18 kernel, 128 MB of RAM, a MIPS CPU and XFS version 2.8.11. 
The NAS allows creating RAID1 and RAID5 volumes formatted with XFS. The system is stable in general without any stress; we don't see any issues in day-to-day use. 
But when it is exposed to stress from multiple iozone clients, it starts showing weird problems. 
The iozone stress test is run with 15 CIFS clients pumping data over a 1 Gbps network continuously for 48 hours, as part of calculating the MTBF of the system. The system crashes at different stages under different stimuli, but always inside XFS. 


A. Initially it used to hit the "access to block zero" exception and the system would crash, so I applied Nathan Scott's patch, which removes the kernel panic when this situation is hit: http://oss.sgi.com/archives/xfs/2006-08/msg00073.html
After backporting this patch, the system no longer crashes, but the warning messages still appear, and after some time the system goes into a soft lockup and becomes unresponsive. 

I couldn't run xfs_db or xfs_repair to check the state of the inode, as the console was unreachable after hitting the lockup. 
Here is the log 
"
Filesystem "dm-0": Access to block zero in inode 33554565 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 7
af
Filesystem "dm-0": Access to block zero in inode 33554565 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 7
af
Filesystem "dm-0": Access to block zero in inode 33554565 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 7
af
Filesystem "dm-0": Access to block zero in inode 33554565 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 7
b0
Filesystem "dm-0": Access to block zero in inode 33554565 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 7
b0
Filesystem "dm-0": Access to block zero in inode 33554565 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 7
e7
Filesystem "dm-0": Access to block zero in inode 33554565 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 7
e7
Filesystem "dm-0": Access to block zero in inode 33554565 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 7
e7
Filesystem "dm-0": Access to block zero in inode 33554565 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 8
20
Filesystem "dm-0": Access to block zero in inode 33554565 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 8
20
Filesystem "dm-0": Access to block zero in inode 33554565 start_block: 0 start_off: 3ffffffe000000 blkcnt: 68 extent-sta
te: 1 lastx: 88d
Filesystem "dm-0": Access to block zero in inode 33554565 start_block: 0 start_off: 3ffffffe000000 blkcnt: 68 extent-sta
te: 1 lastx: 88d
Filesystem "dm-0": Access to block zero in inode 33554565 start_block: 0 start_off: 3ffffffe000000 blkcnt: 68 extent-sta
te: 1 lastx: 88d
Filesystem "dm-0": Access to block zero in inode 33554565 start_block: 0 start_off: 3ffffffc000000 blkcnt: 180 extent-st
ate: 1 lastx: 88f
Filesystem "dm-0": Access to block zero in inode 33554565 start_block: 0 start_off: 3ffffffc000000 blkcnt: 180 extent-st
ate: 1 lastx: 88f
Filesystem "dm-0": Access to block zero in inode 33554565 start_block: 0 start_off: 3ffffffc000000 blkcnt: 180 extent-st
ate: 1 lastx: 88f
Filesystem "dm-0": Access to block zero in inode 33554565 start_block: 0 start_off: 3ffffffe000000 blkcnt: 1a0 extent-st
ate: 1 lastx: 891
Filesystem "dm-0": Access to block zero in inode 33554565 start_block: 0 start_off: 3ffffffe000000 blkcnt: 1a0 extent-st
"

Once we hit the soft lockup, the system has to be rebooted as it is completely stalled and we can't even check which processes are running. I could be wrong, but it was surprising to me that the same inode was reported with different offsets and block counts. It took 48 hours to reach this state, and the system had to be rebooted.
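Once the box is reachable again, the plan is to dump that inode's extent map read-only with xfs_db, roughly along these lines (the device node for "dm-0" is an assumption here; substitute the real volume path):

xfs_db -r -c "inode 33554565" -c "print" -c "bmap" /dev/dm-0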


B. On another DUT, the system simply rebooted after displaying a couple of warning messages, without entering a soft lockup.


"
Filesystem "dm-1": Access to block zero in inode 2097283 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 46
d
Filesystem "dm-1": Access to block zero in inode 2097283 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 46
d
Filesystem "dm-1": Access to block zero in inode 2097283 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 46
d
Filesystem "dm-1": Access to block zero in inode 2097283 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 46
d
Filesystem "dm-1": Access to block zero in inode 2097283 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 46
d
Filesystem "dm-1": Access to block zero in inï



PMON2000 MIPS Initializing. Standby...
ERRORPC=bfc00004 CONFIG=0042e4bb STATUS=00400000
CPU PRID 000034c1, MaskID 00001320
Initializing caches...done (CONFIG=0042e4bb)
Switching to runtime address map...done
Setting up SDRAM controller: Manual SDRAM setup
  drive strength 0x000073c7
  output timing  0x00000fca
  general config 0x80010000
master clock 100 Mhz, MulFundBIU 0x02, DivXSDRAM 0x02
sdram freq 0x09ef21aa hz, sdram period: 0x06 nsec

"


It took 43 hours to come to this state. 

C. Under another stimulus, the block device driver reported that it couldn't access the requested block, which suggests the filesystem got corrupted: it was trying to read a block that does not exist on the disk. After some time it recovered and started giving weird errors again, and finally it hit a memory access exception and the system crashed.

"
Filesystem "dm-1": Access to block zero in inode 2097284 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1a
8
Filesystem "dm-1": Access to block zero in inode 2097284 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1a
8
Filesystem "dm-1": Access to block zero in inode 2097284 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1a
1
Filesystem "dm-1": Access to block zero in inode 2097284 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1a
1
Filesystem "dm-1": Access to block zero in inode 2097284 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1a
1
attempt to access beyond end of device
dm-1: rw=0, want=1003118956380168, limit=8388608
I/O error in filesystem ("dm-1") meta-data dev dm-1 block 0x39054d5100001       ("xfs_trans_read_buf") error 5 buf count
 512
attempt to access beyond end of device
dm-1: rw=0, want=1003118956380168, limit=8388608
I/O error in filesystem ("dm-1") meta-data dev dm-1 block 0x39054d5100001       ("xfs_trans_read_buf") error 5 buf count
 512
attempt to access beyond end of device
dm-1: rw=0, want=1003118956380168, limit=8388608
I/O error in filesystem ("dm-1") meta-data dev dm-1 block 0x39054d5100001       ("xfs_trans_read_buf") error 5 buf count
 512
attempt to access beyond end of device
dm-1: rw=0, want=1003118956380168, limit=8388608
I/O error in filesystem ("dm-1") meta-data dev dm-1 block 0x39054d5100001       ("xfs_trans_read_buf") error 5 buf count
 512
attempt to access beyond end of device



Filesystem "dm-1": Access to block zero in inode 2097284 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1f
e
Filesystem "dm-1": Access to block zero in inode 2097284 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1f
e
Filesystem "dm-1": Access to block zero in inode 2097284 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1f
f
Filesystem "dm-1": Access to block zero in inode 2097284 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1f
f
Filesystem "dm-1": Access to block zero in inode 2097284 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1f
f
CPU 0 Unable to handle kernel paging request at virtual address 04e81080, epc == 802a90ac, ra == 802a9094
Oops[#1]:
Cpu 0
$ 0   : 00000000 9000a001 84e81080 80000000
$ 4   : 82ce6dd0 00000000 ffffffff ffffffff
$ 8   : 00086800 00000000 00086800 00000001
$12   : 00000004 34000000 82ce6c00 00000001
$16   : ffffffff 04e81080 34000000 81213978
$20   : 82ce6c00 82ce6dd0 00000000 34000000
$24   : 00086800 00000000                  
$28   : 81212000 81213878 00000000 802a9094
Hi    : 00000000
Lo    : 00036a20
epc   : 802a90ac xfs_bmap_btalloc+0x33c/0x950     Not tainted
ra    : 802a9094 xfs_bmap_btalloc+0x324/0x950
Status: 9000a003    KERNEL EXL IE 
Cause : 00000008
BadVA : 04e81080
PrId  : 000034c1
Modules linked in: aes autofs4
Process pdflush (pid: 66, threadinfo=81212000, task=8120b138)
Stack : 81213880 811c9074 00000003 863af000 00000000 00000001 000000cb 805c1f90
        812139b8 8616ece0 8538e6f8 82ce6c00 812139fc ffffffff 00086800 00000000
        802aad9c 802aad80 8616ed30 00000001 8173c6f4 813cf200 812138d8 00000001
        00000200 00000000 812139b8 00000004 81213a00 00000000 ffffffff ffffffff
        00000000 00000000 00000001 00000000 00000000 81213a00 000002a3 81213ac0
        ...
Call Trace:
[<802a90ac>] xfs_bmap_btalloc+0x33c/0x950
[<802a9700>] xfs_bmap_alloc+0x40/0x4c
[<802acc9c>] xfs_bmapi+0x8d8/0x13e4
[<802d42d4>] xfs_iomap_write_allocate+0x3c0/0x5f4
[<802d2b28>] xfs_iomap+0x408/0x4dc
[<802fe90c>] xfs_bmap+0x30/0x3c
[<802f3cfc>] xfs_map_blocks+0x50/0x84
[<802f512c>] xfs_page_state_convert+0x3f4/0x840
[<802f565c>] xfs_vm_writepage+0xe4/0x140
[<80198758>] mpage_writepages+0x24c/0x45c
[<802f56e8>] xfs_vm_writepages+0x30/0x3c
[<801507b4>] do_writepages+0x44/0x84
[<80196628>] __sync_single_inode+0x68/0x234
[<80196980>] __writeback_single_inode+0x18c/0x1ac
[<80196ba8>] sync_sb_inodes+0x208/0x2f0
[<80196d14>] writeback_inodes+0x84/0xd0
[<801503e0>] background_writeout+0xac/0xfc
[<80151330>] __pdflush+0x130/0x228
[<80151458>] pdflush+0x30/0x3c
[<801398bc>] kthread+0x98/0xe0
[<80104c38>] kernel_thread_helper+0x10/0x18

"


In all three cases, when I ran the slower tests, i.e. with 6 clients but the same stimulus, there were no exceptions and the system was stable for 5 days. 

[root@Cousteau6 ~]# df -k
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/scsibd2              41664     41664         0 100% /
udev                      62044         4     62040   0% /dev
tmpfs                      5120      3468      1652  68% /var
tmpfs                     62044        24     62020   0% /tmp
tmpfs                       128         4       124   3% /mnt
/dev/mtdblock1             1664       436      1228  26% /linuxrwfs
/dev/RAIDA/Volume1     10475520       624  10474896   0% /mnt/RAIDA/Volume1
/dev/RAIDA/Volume1     10475520       624  10474896   0% /mnt/ftp_dir/homes
/dev/RAIDA/IOZONETEST   4184064   2479044   1705020  59% /mnt/RAIDA/IOZONETEST
/dev/RAIDA/IOZONETEST   4184064   2479044   1705020  59% /mnt/ftp_dir/share1
/dev/RAIDA/Volume1     10475520       624  10474896   0% /mnt/ftp_dir/share2

Can anyone let me know what the probable cause of this issue could be?

Thanks in advance
Sagar

^ permalink raw reply	[flat|nested] 48+ messages in thread

* RE: Xfs Access to block zero  exception and system crash
  2008-06-24  7:03 Xfs Access to block zero exception and system crash Sagar Borikar
@ 2008-06-25  6:48 ` Sagar Borikar
  2008-06-25  8:49 ` Dave Chinner
  1 sibling, 0 replies; 48+ messages in thread
From: Sagar Borikar @ 2008-06-25  6:48 UTC (permalink / raw)
  To: xfs; +Cc: linux-xfs

 
Hello,

Can anyone help me out here?

Thanks
Sagar


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Xfs Access to block zero  exception and system crash
  2008-06-24  7:03 Xfs Access to block zero exception and system crash Sagar Borikar
  2008-06-25  6:48 ` Sagar Borikar
@ 2008-06-25  8:49 ` Dave Chinner
  2008-06-26  6:46   ` Sagar Borikar
  1 sibling, 1 reply; 48+ messages in thread
From: Dave Chinner @ 2008-06-25  8:49 UTC (permalink / raw)
  To: Sagar Borikar; +Cc: xfs

On Tue, Jun 24, 2008 at 12:03:16AM -0700, Sagar Borikar wrote:
> 
> Hello,
> 
> I hope this is the right list to address this issue. If not please divert me to the right list. 
> 
> We are facing strange issue with xfs under heavy load. It's a NAS
> box with 2.6.18 kernel,128 MB of RAM, MIPS architecture and XFS
> version 2.8.11.

[...]

> Can anyone let me know what could be the probable cause of this issue.

They are all from corrupted extent btrees.

There are many possible causes of this that we've fixed over the
past years since 2.6.18 was released. Indeed, we are currently
discussing fixes for a bunch of problems that lead to corrupted
extent btrees and problems like this. I'd suggest that you should
probably start with a more recent kernel, make sure you have a
serial console and set the xfs_error_level to 11 so that it gives as
much information as possible on the console when the error is hit.
If that doesn't give a stack trace, then you need to set the
xfs_panic_mask to crash the machine on block zero accesses and
report the stack traces that it outputs...
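Something like the following should do it (0x80 for the panic mask
is what I'd expect for XFS_PTAG_FSBLOCK_ZERO, but check
fs/xfs/xfs_error.h in your tree for the exact value):

echo 11  > /proc/sys/fs/xfs/error_level
echo 128 > /proc/sys/fs/xfs/panic_mask   # 0x80 == XFS_PTAG_FSBLOCK_ZERO (assumed)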

Cheers,

Dave.
-- 
Dave Chinner
dchinner@agami.com

^ permalink raw reply	[flat|nested] 48+ messages in thread

* RE: Xfs Access to block zero  exception and system crash
  2008-06-25  8:49 ` Dave Chinner
@ 2008-06-26  6:46   ` Sagar Borikar
  2008-06-26  7:02     ` Dave Chinner
  0 siblings, 1 reply; 48+ messages in thread
From: Sagar Borikar @ 2008-06-26  6:46 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

 

Thanks Dave.

>> with 2.6.18 kernel,128 MB of RAM, MIPS architecture and XFS version 
>> 2.8.11.

> [...]

>> Can anyone let me know what could be the probable cause of this issue.

> they are all from  corrupted extent btrees.
> There are many possible causes of this that we've fixed over the past years since 2.6.18 was released. Indeed, we are currently discussing fixes for a 
> bunch of problems that lead to corrupted extent btrees and problems like this. I'd suggest that you should probably start with a more recent kernel, 
> make sure you have a serial console and set the xfs_error_level to 11 so that it gives as much information as possible on the console when the error is hit.
> If that doesn't give a stack trace, then you need to set the xfs_panic_mask to crash the machine on block zero accesses and report the stack traces 
> that it outputs...
 

Yes, I went through the changes between 2.6.18 and 2.6.24 and there are quite a few. But as this is a production system already in the field, it's not viable to upgrade the kernel. I do understand that there could be many places which can cause the corruption; unfortunately, three different systems have shown corruption in three different places, as described. For now I am sleeping and rescheduling in the access-to-block-zero exception path so that it won't stall the system and I can monitor the state of the filesystem. As the error only shows up about once in 2.5 days under extreme stress, if you could point me to the probable place to look, I could narrow down the debugging path.

Thanks in advance
Sagar  

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Xfs Access to block zero  exception and system crash
  2008-06-26  6:46   ` Sagar Borikar
@ 2008-06-26  7:02     ` Dave Chinner
  2008-06-27 10:13       ` Sagar Borikar
  0 siblings, 1 reply; 48+ messages in thread
From: Dave Chinner @ 2008-06-26  7:02 UTC (permalink / raw)
  To: Sagar Borikar; +Cc: xfs

[please wrap your replies at 72 columns]

On Wed, Jun 25, 2008 at 11:46:59PM -0700, Sagar Borikar wrote:
> >> with 2.6.18 kernel,128 MB of RAM, MIPS architecture and XFS version 
> >> 2.8.11.
> 
> > [...]
> 
> >> Can anyone let me know what could be the probable cause of this issue.
> 
> > they are all from  corrupted extent btrees.  There are many
> > possible causes of this that we've fixed over the past years
> > since 2.6.18 was released. Indeed, we are currently discussing
> > fixes for a bunch of problems that lead to corrupted extent
> > btrees and problems like this. I'd suggest that you should
> > probably start with a more recent kernel, make sure you have a
> > serial console and set the xfs_error_level to 11 so that it
> > gives as much information as possible on the console when the
> > error is hit.  If that doesn't give a stack trace, then you
> > need to set the xfs_panic_mask to crash the machine on block
> > zero accesses and report the stack traces that it outputs...
>  
> Yes, I went through the changes between 2.6.18 and 2.6.24 and there
> are quite a few. But as this is a production system already in the
> field, it's not viable to upgrade the kernel.

Well, you're pretty much on your own then :/

> I do understand that there
> could be many places which can cause the corruption;
> unfortunately, three different systems have shown corruption in
> three different places, as described.

Yes, but all the same pattern of corruption, so it is likely
that it is one problem.

> For now I am sleeping and rescheduling in the
> access-to-block-zero exception path so that it won't stall the
> system and I can monitor the state of the filesystem. As the error
> only shows up about once in 2.5 days under extreme stress, if you
> could point me to the probable place to look, I could narrow down
> the debugging path.

Like I said - it's a corrupt bmap btree. It could be a bug in the
bmap btree code, the alloc btree code, the inode data fork
manipulation code, it could be a block device bug returning bad data
to XFS on a cancelled btree readahead, etc. IOWs, there are so many
possible causes of a corrupted btree that a bug report by itself is
mostly useless.

All I can suggest is working out a reproducible test case in your
development environment, attaching a debugger and start digging around
in memory when the problem is hit and try to find out exactly what
is corrupted. If you can't reproduce it or work out what is
occurring to trigger the problem, then we're not going to be able to
find the cause...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Xfs Access to block zero  exception and system crash
  2008-06-26  7:02     ` Dave Chinner
@ 2008-06-27 10:13       ` Sagar Borikar
  2008-06-27 10:25         ` Sagar Borikar
  2008-06-28  0:02         ` Dave Chinner
  0 siblings, 2 replies; 48+ messages in thread
From: Sagar Borikar @ 2008-06-27 10:13 UTC (permalink / raw)
  To: xfs


Dave Chinner wrote:
> [please wrap your replies at 72 columns]
>
> On Wed, Jun 25, 2008 at 11:46:59PM -0700, Sagar Borikar wrote:
>   
>
> Yes, but all the same pattern of corruption, so it is likely
> that it is one problem.
>
>   
> All I can suggest is working out a reproducible test case in your
> development environment, attaching a debugger and start digging around
> in memory when the problem is hit and try to find out exactly what
> is corrupted. If you can't reproduce it or work out what is
> occurring to trigger the problem, then we're not going to be able to
> find the cause...
>
> Cheers,
>
> Dave.
>   
Thanks Dave.
I did some experiments today with the corrupted filesystem.
Setup: the NAS box contains one volume, /share, with 10 subdirectories.
In the first subdirectory, sh1, I kept a 512MB file. Through a script I 
continuously and simultaneously copy this file into the sh2 to sh10 
subdirectories.
The script looks like
....
while [ 1 ]
do
cp $1 $2
done
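
(One instance of that loop is started in the background per target 
directory, roughly as sketched below; the script name and file name 
here are only illustrative.)

for d in /share/sh2 /share/sh3 /share/sh4 /share/sh5 /share/sh6 \
         /share/sh7 /share/sh8 /share/sh9 /share/sh10
do
    # copyloop.sh is the while-loop above, bigfile.dat the 512MB file in sh1
    sh copyloop.sh /share/sh1/bigfile.dat $d/ &
done
wait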


And when I check the process status using top, almost all the cp 
processes are continuously in uninterruptible sleep. I ran xfs_repair 
with the -n option on the filesystem mounted on the JBOD.
Here is the output:



Fri Jun 27 02:13:01 2008
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
bad nblocks 8788 for inode 33554562, would reset to 15461
bad nextents 18 for inode 33554562, would reset to 32
        - agno = 2
entry "iozone_68.tst" in shortform directory 67108993 references free 
inode 67108995
would have junked entry "iozone_68.tst" in directory inode 67108993
data fork in ino 67108995 claims dup extent, off - 252, start - 
14711445, cnt 576
bad data fork in inode 67108995
would have cleared inode 67108995
        - agno = 3
entry "iozone_68.tst" in shortform directory 100663425 references free 
inode 100663427
would have junked entry "iozone_68.tst" in directory inode 100663425
inode 100663427 - bad extent starting block number 906006917242880, 
offset 2533274882670609
bad data fork in inode 100663427
would have cleared inode 100663427
        - agno = 4
bad nblocks 10214 for inode 134217859, would reset to 16761
bad nextents 22 for inode 134217859, would reset to 34
        - agno = 5
bad nblocks 23581 for inode 167772290, would reset to 27557
bad nextents 39 for inode 167772290, would reset to 45
        - agno = 6
bad nblocks 14527 for inode 201326722, would reset to 15697
bad nextents 31 for inode 201326722, would reset to 34
bad nblocks 12633 for inode 201326723, would reset to 16647
bad nextents 23 for inode 201326723, would reset to 35
        - agno = 7
bad nblocks 26638 for inode 234881154, would reset to 27557
bad nextents 53 for inode 234881154, would reset to 54
bad nblocks 85653 for inode 234881155, would reset to 85664
bad nextents 310 for inode 234881155, would reset to 311
        - agno = 8
bad nblocks 23241 for inode 268640387, would reset to 27565
bad nextents 32 for inode 268640387, would reset to 42
bad nblocks 81766 for inode 268640388, would reset to 86012
bad nextents 332 for inode 268640388, would reset to 344
        - agno = 9
entry "iozone_68.tst" in shortform directory 301990016 references free 
inode 301990019
would have junked entry "iozone_68.tst" in directory inode 301990016
data fork in ino 301990019 claims dup extent, off - 26402, start - 
19129002, cnt 450
bad data fork in inode 301990019
would have cleared inode 301990019
bad nblocks 70282 for inode 301990020, would reset to 71793
bad nextents 281 for inode 301990020, would reset to 294
        - agno = 10
entry "iozone_68.tst" in shortform directory 335544448 references free 
inode 335544451
would have junked entry "iozone_68.tst" in directory inode 335544448
bad nblocks 11261 for inode 335544451, would reset to 19853
bad nextents 24 for inode 335544451, would reset to 41
imap claims in-use inode 335544451 is free, correcting imap
bad nblocks 119952 for inode 335544452, would reset to 121178
bad nextents 301 for inode 335544452, would reset to 312
        - agno = 11
bad nblocks 24361 for inode 369098883, would reset to 29553
bad nextents 51 for inode 369098883, would reset to 57
bad nblocks 3173 for inode 369098884, would reset to 5851
bad nextents 10 for inode 369098884, would reset to 18
        - agno = 12
entry "iozone_68.tst" in shortform directory 402653313 references free 
inode 402653318
would have junked entry "iozone_68.tst" in directory inode 402653313
bad nblocks 16348 for inode 402653317, would reset to 21485
bad nextents 28 for inode 402653317, would reset to 37
data fork in ino 402653318 claims dup extent, off - 124142, start - 
29379669, cnt 2
bad data fork in inode 402653318
would have cleared inode 402653318
        - agno = 13
bad nblocks 18374 for inode 436207747, would reset to 19991
bad nextents 43 for inode 436207747, would reset to 47
bad nblocks 38390 for inode 436207748, would reset to 38914
bad nextents 300 for inode 436207748, would reset to 304
        - agno = 14
bad nblocks 20267 for inode 469762178, would reset to 23089
bad nextents 41 for inode 469762178, would reset to 45
        - agno = 15
entry "iozone_68.tst" in shortform directory 503316608 references free 
inode 503316609
would have junked entry "iozone_68.tst" in directory inode 503316608
imap claims in-use inode 503316609 is free, correcting imap
libxfs_bcache: 0x100020b0
Max supported entries = 524288
Max utilized entries = 562
Active entries = 562
Hash table size = 65536
Hits = 1009
Misses = 564
Hit ratio = 64.00
Hash buckets with   0 entries 65116 (  0%)
Hash buckets with   1 entries   391 ( 69%)
Hash buckets with   2 entries    20 (  7%)
Hash buckets with   3 entries     1 (  0%)
Hash buckets with  15 entries     1 (  2%)
Hash buckets with  16 entries     6 ( 17%)
Hash buckets with  17 entries     1 (  3%)
Fri Jun 27 02:13:08 2008
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem starting at / ...
        - agno = 0
        - agno = 1
        - agno = 2
entry "iozone_68.tst" in shortform directory inode 67108993 points to 
free inode 67108995
would junk entry "iozone_68.tst"
        - agno = 3
entry "iozone_68.tst" in shortform directory inode 100663425 points to 
free inode 100663427
would junk entry "iozone_68.tst"
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
entry "iozone_68.tst" in shortform directory inode 301990016 points to 
free inode 301990019
would junk entry "iozone_68.tst"
        - agno = 10
        - agno = 11
        - agno = 12
entry "iozone_68.tst" in shortform directory inode 402653313 points to 
free inode 402653318
would junk entry "iozone_68.tst"
        - agno = 13
        - agno = 14
        - agno = 15
        - traversal finished ...
        - traversing all unattached subtrees ...
        - traversals finished ...
        - moving disconnected inodes to lost+found ...
libxfs_icache: 0x10002050
Max supported entries = 524288
Max utilized entries = 42
Active entries = 42
Hash table size = 65536
Hits = 0
Misses = 42
Hit ratio =  0.00
Hash buckets with   0 entries 65524 (  0%)
Hash buckets with   1 entries     9 ( 21%)
Hash buckets with   6 entries     1 ( 14%)
Hash buckets with  12 entries     1 ( 28%)
Hash buckets with  15 entries     1 ( 35%)
libxfs_bcache: 0x100020b0
Max supported entries = 524288
Max utilized entries = 562
Active entries = 17
Hash table size = 65536
Hits = 1035
Misses = 581
Hit ratio = 64.00
Hash buckets with   0 entries 65533 (  0%)
Hash buckets with   1 entries     2 ( 11%)
Hash buckets with  15 entries     1 ( 88%)
Fri Jun 27 02:13:10 2008
Phase 7 - verify link counts...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
libxfs_icache: 0x10002050
Max supported entries = 524288
Max utilized entries = 42
Active entries = 42
Hash table size = 65536
Hits = 0
Misses = 42
Hit ratio =  0.00
Hash buckets with   0 entries 65524 (  0%)
Hash buckets with   1 entries     9 ( 21%)
Hash buckets with   6 entries     1 ( 14%)
Hash buckets with  12 entries     1 ( 28%)
Hash buckets with  15 entries     1 ( 35%)
libxfs_bcache: 0x100020b0
Max supported entries = 524288
Max utilized entries = 562
Active entries = 16
Hash table size = 65536
Hits = 1051
Misses = 597
Hit ratio = 63.00
Hash buckets with   0 entries 65534 (  0%)
Hash buckets with   1 entries     1 (  6%)
Hash buckets with  15 entries     1 ( 93%)
Fri Jun 27 02:13:17 2008
No modify flag set, skipping filesystem flush and exiting.

So there are several inodes with bad block counts, bad extent counts and 
corrupt data forks in every AG, and these are causing the problem.
The top output reveals that all the cp processes are in D state:
  PID USER     STATUS   RSS  PPID %CPU %MEM COMMAND
 7455 root     R        984  1892  7.4  0.7 top
 6100 root     D        524  1973  2.9  0.4 cp
 6799 root     R        524  1983  2.9  0.4 cp
 6796 root     D        524  2125  2.9  0.4 cp
 6074 root     D        524  2109  1.4  0.4 cp
 6097 root     D        524  1979  1.4  0.4 cp
 6076 root     D        524  1975  1.4  0.4 cp
 6738 root     D        524  2123  1.4  0.4 cp
 6759 root     D        524  2115  1.4  0.4 cp
 7035 root     D        524  1977  1.4  0.4 cp
 7440 root     D        520  1985  1.4  0.4 cp
   73 root     SW<        0     6  1.4  0.0 xfsdatad/0
   67 root     SW         0     6  1.4  0.0 pdflush
...
..
This means they are waiting for I/O, sleeping in a system call but 
unable to come out because several inodes are corrupted, and hence the 
script never completes.
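
To see exactly where the stuck cp and pdflush threads are sleeping, 
their kernel stacks can be dumped to the console roughly like this 
(assuming CONFIG_MAGIC_SYSRQ is enabled; pid 6100 is one of the stuck 
cp processes from the top output above):

echo 1 > /proc/sys/kernel/sysrq
echo t > /proc/sysrq-trigger      # dump all task states and stack traces
cat /proc/6100/wchan; echo        # or just the wait channel of one stuck pid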

Thanks
Sagar

   

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Xfs Access to block zero  exception and system crash
  2008-06-27 10:13       ` Sagar Borikar
@ 2008-06-27 10:25         ` Sagar Borikar
  2008-06-28  0:05           ` Dave Chinner
  2008-06-28  0:02         ` Dave Chinner
  1 sibling, 1 reply; 48+ messages in thread
From: Sagar Borikar @ 2008-06-27 10:25 UTC (permalink / raw)
  To: xfs


Dave,

I also got continuous exceptions


XFS internal error XFS_WANT_CORRUPTED_RETURN at line 296 of file 
fs/xfs/xfs_alloc.c.  Caller 0x802962c0
Call Trace:
[<80109888>] dump_stack+0x18/0x44
[<802c3550>] xfs_error_report+0x58/0x64
[<802965a0>] xfs_alloc_fixup_trees+0x39c/0x3dc
[<80297850>] xfs_alloc_ag_vextent_size+0x3ec/0x4f4
[<80296708>] xfs_alloc_ag_vextent+0x5c/0x15c
[<8029910c>] xfs_alloc_vextent+0x430/0x604
[<802a9420>] xfs_bmap_btalloc+0x6b0/0x950
[<802a9700>] xfs_bmap_alloc+0x40/0x4c
[<802acc80>] xfs_bmapi+0x8d8/0x13e4
[<802d4230>] xfs_iomap_write_allocate+0x340/0x5d8
[<802d2b18>] xfs_iomap+0x408/0x4dc
[<802fe8bc>] xfs_bmap+0x30/0x3c
[<802f3cac>] xfs_map_blocks+0x50/0x84
[<802f50dc>] xfs_page_state_convert+0x3f4/0x840
[<802f560c>] xfs_vm_writepage+0xe4/0x140
[<80198758>] mpage_writepages+0x24c/0x45c
[<802f5698>] xfs_vm_writepages+0x30/0x3c
[<801507b4>] do_writepages+0x44/0x84
[<80196628>] __sync_single_inode+0x68/0x234
[<80196980>] __writeback_single_inode+0x18c/0x1ac
[<80196ba8>] sync_sb_inodes+0x208/0x2f0
[<80196d14>] writeback_inodes+0x84/0xd0
[<80150160>] balance_dirty_pages+0xd8/0x1d4
[<801502a8>] balance_dirty_pages_ratelimited_nr+0x4c/0x58
[<8014c0a4>] generic_file_buffered_write+0x534/0x650
[<802fe4c8>] xfs_write+0x768/0xaac
[<802f8c30>] xfs_file_aio_write+0x88/0x94
[<8016d8d4>] do_sync_write+0xcc/0x124
[<8016d9e4>] vfs_write+0xb8/0x1a0
[<8016dbb8>] sys_write+0x54/0x98
[<8010c180>] stack_done+0x20/0x3c


So it looked as though memory was also not available for the pdflush 
threads to flush the data back to disk. But when I checked the memory 
stats, around 260KB of buffers were available, with sufficient free 
memory. We are running 8k kernel stacks on the MIPS architecture. The 
pdflush threads were also stalled in uninterruptible state.
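
For reference, I checked the memory state roughly like this (exact 
/proc/meminfo fields vary a little between kernels):

grep -E 'MemFree|Buffers|Cached|Dirty|Writeback' /proc/meminfo
cat /proc/buddyinfo               # per-order free pages, to rule out fragmentation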

Do you see any issues in the available memory as well?

Thanks
Sagar



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Xfs Access to block zero  exception and system crash
  2008-06-27 10:13       ` Sagar Borikar
  2008-06-27 10:25         ` Sagar Borikar
@ 2008-06-28  0:02         ` Dave Chinner
  1 sibling, 0 replies; 48+ messages in thread
From: Dave Chinner @ 2008-06-28  0:02 UTC (permalink / raw)
  To: Sagar Borikar; +Cc: xfs

On Fri, Jun 27, 2008 at 03:43:49PM +0530, Sagar Borikar wrote:
> Dave Chinner wrote:
>> Yes, but all the same pattern of corruption, so it is likely
>> that it is one problem.
>>
>>   All I can suggest is working out a reproducable test case in your
>> development environment, attaching a debugger and start digging around
>> in memory when the problem is hit and try to find out exactly what
>> is corrupted. If you can't reproduce it or work out what is
>> occurring to trigger the problem, then we're not going to be able to
>> find the cause...
>>
> Thanks Dave
> I did some experiments today with the corrupted filesystem.
> setup : NAS box contains one volume /share and 10 subdirectories.
> In first subdirectory sh1, I kept 512MB file. Through a script I  
> continuously copy this file
> simultaneously from sh2 to sh10 subdirectories.
> The script looks like
> ....
> while [ 1 ]
> do
> cp $1 $2
> done
....
> uninterruptible sleep state continuously.  Ran xfs_repair with -n option  
> on filesystem mounted on JBOD
> Here is the output :
....
> entry "iozone_68.tst" in shortform directory 67108993 references free  
> inode 67108995
....
> entry "iozone_68.tst" in shortform directory 100663425 references free  
> inode 100663427
....
> entry "iozone_68.tst" in shortform directory 301990016 references free  
> inode 301990019
....
> entry "iozone_68.tst" in shortform directory 335544448 references free  
> inode 335544451
....
> entry "iozone_68.tst" in shortform directory 402653313 references free  
> inode 402653318
....

And so on. There's a pattern here. Can you try to find out what
part of your workload is producing these errors?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Xfs Access to block zero  exception and system crash
  2008-06-27 10:25         ` Sagar Borikar
@ 2008-06-28  0:05           ` Dave Chinner
  2008-06-28 16:47             ` Sagar Borikar
  0 siblings, 1 reply; 48+ messages in thread
From: Dave Chinner @ 2008-06-28  0:05 UTC (permalink / raw)
  To: Sagar Borikar; +Cc: xfs

On Fri, Jun 27, 2008 at 03:55:05PM +0530, Sagar Borikar wrote:
>
> Dave,
>
> I also got continuous exceptions
>
>
> XFS internal error XFS_WANT_CORRUPTED_RETURN at line 296 of file  
> fs/xfs/xfs_alloc.c.  Caller 0x802962c0

Corrupt alloc btree. xfs_repair won't report errors in this btree;
it simply rebuilds it. xfs_check will report errors in it, though.
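Something like the following, run against the unmounted (or at least
frozen) device, would show whether the freespace btrees are sane
(substitute your real mount point and device):

umount <mntpt>          # or: xfs_freeze -f <mntpt>
xfs_check <dev>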

> So memory was also not available for pdflush threads to flush the data  
> back to disks. But when

Nothing to do with memory availability, I think.

FWIW, can you send the output of xfs_growfs -n <mntpt> and details
of the partitioning and volume config?
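i.e. something along these lines, assuming the XFS volume sits on LVM
over MD RAID (which the dm-*/RAIDA names suggest):

cat /proc/mdstat
mdadm --detail /dev/md0     # for each md array backing the volume
lvdisplay                   # or: dmsetup table
fdisk -l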

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 48+ messages in thread

* RE: Xfs Access to block zero  exception and system crash
  2008-06-28  0:05           ` Dave Chinner
@ 2008-06-28 16:47             ` Sagar Borikar
  2008-06-29 21:56               ` Dave Chinner
  0 siblings, 1 reply; 48+ messages in thread
From: Sagar Borikar @ 2008-06-28 16:47 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs


Dave,

Attaching required information:

> nothing to do with memory availabilty, I think.

> FWIW, can you send the output of xfs_growfs -n <mntpt> and details
> of the partitioning and volume config?

[root@NAS001ee5ab9c85 ~]# xfs_growfs -n /mnt/RAIDA/vol/
meta-data=/dev/RAIDA/vol         isize=256    agcount=16, agsize=1638400 blks
         =                       sectsz=512   attr=1
data     =                       bsize=4096   blocks=26214400, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=12800, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=65536  blocks=0, rtextents=0

[root@NAS001ee5ab9c85 ~]# cat /etc/fstab
/dev/root       /              ext2     rw,noauto         0      1
proc            /proc          proc     defaults          0      0
devpts          /dev/pts       devpts   defaults,gid=5,mode=620   0      0
tmpfs           /tmp           tmpfs    defaults          0      0
/dev/RAIDA/vol  /mnt/RAIDA/vol  xfs     defaults,usrquota,grpquota  0 0
/mnt/RAIDA/vol/sh       /mnt/ftp_dir/sh none    rw,bind 0 0
/mnt/RAIDA/vol/.autohome/       /mnt/ftp_dir/homes      none    rw,bind 0 0

[root@NAS001ee5ab9c85 ~]# fdisk -l

Disk /dev/scsibd: 257 MB, 257425408 bytes
8 heads, 32 sectors/track, 1964 cylinders
Units = cylinders of 256 * 512 = 131072 bytes

      Device Boot      Start         End      Blocks   Id  System
/dev/scsibd1             126         286       20608   83  Linux
/dev/scsibd2             287        1023       94336   83  Linux
/dev/scsibd3            1149        1309       20608   83  Linux
/dev/scsibd4            1310        2046       94336   83  Linux

Disk /dev/md0: 251.0 GB, 251000160256 bytes
2 heads, 4 sectors/track, 61279336 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md0 doesn't contain a valid partition table

Disk /dev/dm-0: 107.3 GB, 107374182400 bytes
255 heads, 63 sectors/track, 13054 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

But the question remains: why doesn't it happen every time, and under lighter stress?

I am surprised that it happens almost immediately when the number of
subdirectories goes above 30; otherwise it develops slowly.

Thanks
Sagar


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Xfs Access to block zero  exception and system crash
  2008-06-28 16:47             ` Sagar Borikar
@ 2008-06-29 21:56               ` Dave Chinner
  2008-06-30  3:37                 ` Sagar Borikar
       [not found]                 ` <20080630034112.055CF18904C4@bby1mta01.pmc-sierra.bc.ca>
  0 siblings, 2 replies; 48+ messages in thread
From: Dave Chinner @ 2008-06-29 21:56 UTC (permalink / raw)
  To: Sagar Borikar; +Cc: xfs

On Sat, Jun 28, 2008 at 09:47:44AM -0700, Sagar Borikar wrote:
> > FWIW, can you send the output of xfs_growfs -n <mntpt> and details
> > of the partitioning and volume config?
....
> [root@NAS001ee5ab9c85 ~]# cat /etc/fstab
> /dev/root       /              ext2     rw,noauto         0      1
> proc            /proc          proc     defaults          0      0
> devpts          /dev/pts       devpts   defaults,gid=5,mode=620   0
> 0
> tmpfs           /tmp           tmpfs    defaults          0      0
> /dev/RAIDA/vol  /mnt/RAIDA/vol  xfs     defaults,usrquota,grpquota
> 0 0
> /mnt/RAIDA/vol/sh       /mnt/ftp_dir/sh none    rw,bind 0 0
> /mnt/RAIDA/vol/.autohome/       /mnt/ftp_dir/homes      none    rw,bind
> 0 0
> 
> [root@NAS001ee5ab9c85 ~]# fdisk -l
> 
> Disk /dev/scsibd: 257 MB, 257425408 bytes
> 8 heads, 32 sectors/track, 1964 cylinders
> Units = cylinders of 256 * 512 = 131072 bytes
> 
>       Device Boot      Start         End      Blocks   Id  System
> /dev/scsibd1             126         286       20608   83  Linux
> /dev/scsibd2             287        1023       94336   83  Linux
> /dev/scsibd3            1149        1309       20608   83  Linux
> /dev/scsibd4            1310        2046       94336   83  Linux

I'd have to assume that's a flash-based root drive, right?

> Disk /dev/md0: 251.0 GB, 251000160256 bytes
> 2 heads, 4 sectors/track, 61279336 cylinders
> Units = cylinders of 8 * 512 = 4096 bytes
> 
> Disk /dev/md0 doesn't contain a valid partition table
> 
> Disk /dev/dm-0: 107.3 GB, 107374182400 bytes
> 255 heads, 63 sectors/track, 13054 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes

Neither of these tell me what /dev/RAIDA/vol is....

> But still the issue is why doesn't it happen every time and less stress?
> 
> I am surprised to see to let this happen immediately when the
> subdirectories increase more than 30. Else it decays slowly.

So it happens when you get more than 30 entries in a directory
under a certain load? That might be an extent->btree format
conversion bug or vice versa. I'd suggest setting up a test based
around this to try to narrow down the problem.
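
For example, something along these lines (paths and sizes are only
illustrative) would grow a directory past 30 entries while keeping the
same copy load running, which should tell us whether the entry count is
really the trigger:

mkdir -p /mnt/test/sh1
dd if=/dev/zero of=/mnt/test/sh1/big bs=1M count=512
for i in $(seq 2 40); do
    mkdir -p /mnt/test/sh$i
    # same infinite-copy loop as your script, one instance per directory
    ( while [ 1 ]; do cp -f /mnt/test/sh1/big /mnt/test/sh$i/ ; done ) &
done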

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Xfs Access to block zero  exception and system crash
  2008-06-29 21:56               ` Dave Chinner
@ 2008-06-30  3:37                 ` Sagar Borikar
       [not found]                 ` <20080630034112.055CF18904C4@bby1mta01.pmc-sierra.bc.ca>
  1 sibling, 0 replies; 48+ messages in thread
From: Sagar Borikar @ 2008-06-30  3:37 UTC (permalink / raw)
  To: xfs

Dave Chinner wrote:
> On Sat, Jun 28, 2008 at 09:47:44AM -0700, Sagar Borikar wrote:
>   
> Device Boot Start End Blocks Id System
>> /dev/scsibd1             126         286       20608   83  Linux
>> /dev/scsibd2             287        1023       94336   83  Linux
>> /dev/scsibd3            1149        1309       20608   83  Linux
>> /dev/scsibd4            1310        2046       94336   83  Linux
>>     
>
> I'd have to assume thats a flash based root drive, right?
>
>   
That's right,
>> Disk /dev/md0: 251.0 GB, 251000160256 bytes
>> 2 heads, 4 sectors/track, 61279336 cylinders
>> Units = cylinders of 8 * 512 = 4096 bytes
>>
>> Disk /dev/md0 doesn't contain a valid partition table
>>
>> Disk /dev/dm-0: 107.3 GB, 107374182400 bytes
>> 255 heads, 63 sectors/track, 13054 cylinders
>> Units = cylinders of 16065 * 512 = 8225280 bytes
>>     
>
> Neither of these tell me what /dev/RAIDA/vol is....
> It is the device node to which /mnt/RAIDA/vol is mapped to. Its a JBOD with 233 GB size.
>   
>> But still the issue is why doesn't it happen every time and less stress?
>>
>> I am surprised to see to let this happen immediately when the
>> subdirectories increase more than 30. Else it decays slowly.
>>     
>
> So it happens when you get more than 30 entries in a directory
> under a certain load? That might be an extent->btree format
> conversion bug or vice versa. I'd suggest setting up a test based
> around this to try to narrow down the problem.
>
> Cheers,
>
> Dave.
>   
Thanks for all your help. Shall keep you posted with the progress on 
debugging.

Regards
Sagar

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Xfs Access to block zero  exception and system crash
       [not found]                 ` <20080630034112.055CF18904C4@bby1mta01.pmc-sierra.bc.ca>
@ 2008-06-30  6:07                   ` Sagar Borikar
  2008-06-30 10:24                   ` Sagar Borikar
  1 sibling, 0 replies; 48+ messages in thread
From: Sagar Borikar @ 2008-06-30  6:07 UTC (permalink / raw)
  To: xfs



Sagar Borikar wrote:
> Dave Chinner wrote:
>> On Sat, Jun 28, 2008 at 09:47:44AM -0700, Sagar Borikar wrote:
>>   Device Boot Start End Blocks Id System
>>> /dev/scsibd1             126         286       20608   83  Linux
>>> /dev/scsibd2             287        1023       94336   83  Linux
>>> /dev/scsibd3            1149        1309       20608   83  Linux
>>> /dev/scsibd4            1310        2046       94336   83  Linux
>>>     
>>
>> I'd have to assume thats a flash based root drive, right?
>>
>>   
> That's right,
>>> Disk /dev/md0: 251.0 GB, 251000160256 bytes
>>> 2 heads, 4 sectors/track, 61279336 cylinders
>>> Units = cylinders of 8 * 512 = 4096 bytes
>>>
>>> Disk /dev/md0 doesn't contain a valid partition table
>>>
>>> Disk /dev/dm-0: 107.3 GB, 107374182400 bytes
>>> 255 heads, 63 sectors/track, 13054 cylinders
>>> Units = cylinders of 16065 * 512 = 8225280 bytes
>>>     
>>
>> Neither of these tell me what /dev/RAIDA/vol is....
>> It is the device node to which /mnt/RAIDA/vol is mapped to. Its a 
>> JBOD with 233 GB size.
>>  
>>> But still the issue is why doesn't it happen every time and less 
>>> stress?
>>>
>>> I am surprised to see to let this happen immediately when the
>>> subdirectories increase more than 30. Else it decays slowly.
>>>     
>>
>> So it happens when you get more than 30 entries in a directory
>> under a certain load? That might be an extent->btree format
>> conversion bug or vice versa. I'd suggest setting up a test based
>> around this to try to narrow down the problem.
>>
>> Cheers,
>>
>> Dave.
>>   
> Thanks for all your help. Shall keep you posted with the progress on 
> debugging.
>
> Regards
> Sagar
>
>
Sorry if I was not clear.  As I mentioned, the frequency of finding bad 
extents is much higher when I increase the number of simultaneous copies 
to 30 ( say in 5 min ), whereas if I run only two copies in an infinite 
loop the issue crops up in roughly 2-3 hours. All the copies plus pdflush 
stay in uninterruptible sleep continuously, and the state is plain 
uninterruptible sleep ( D ), not uninterruptible sleep with wait ( DW ).
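
For reference, something like the following lists which processes are in
D state and what they are waiting in (purely illustrative):

# STAT beginning with "D" is uninterruptible sleep; WCHAN shows the
# kernel function the task is blocked in
ps axo pid,stat,wchan:30,comm | awk '$2 ~ /^D/'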

Thanks
Sagar

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Xfs Access to block zero  exception and system crash
       [not found]                 ` <20080630034112.055CF18904C4@bby1mta01.pmc-sierra.bc.ca>
  2008-06-30  6:07                   ` Sagar Borikar
@ 2008-06-30 10:24                   ` Sagar Borikar
  2008-07-01  6:44                     ` Dave Chinner
  1 sibling, 1 reply; 48+ messages in thread
From: Sagar Borikar @ 2008-06-30 10:24 UTC (permalink / raw)
  To: xfs

Hi Dave,

Sagar Borikar wrote:
> Dave Chinner wrote:
>> On Sat, Jun 28, 2008 at 09:47:44AM -0700, Sagar Borikar wrote:
>>   Device Boot Start End Blocks Id System
>>> /dev/scsibd1             126         286       20608   83  Linux
>>> /dev/scsibd2             287        1023       94336   83  Linux
>>> /dev/scsibd3            1149        1309       20608   83  Linux
>>> /dev/scsibd4            1310        2046       94336   83  Linux
>>>     
>>
>> I'd have to assume thats a flash based root drive, right?
>>
>>   
> That's right,
>>> Disk /dev/md0: 251.0 GB, 251000160256 bytes
>>> 2 heads, 4 sectors/track, 61279336 cylinders
>>> Units = cylinders of 8 * 512 = 4096 bytes
>>>
>>> Disk /dev/md0 doesn't contain a valid partition table
>>>
>>> Disk /dev/dm-0: 107.3 GB, 107374182400 bytes
>>> 255 heads, 63 sectors/track, 13054 cylinders
>>> Units = cylinders of 16065 * 512 = 8225280 bytes
>>>     
>>
>> Neither of these tell me what /dev/RAIDA/vol is....
>> It is the device node to which /mnt/RAIDA/vol is mapped to. Its a 
>> JBOD with 233 GB size.
>>  
>>> But still the issue is why doesn't it happen every time and less 
>>> stress?
>>>
>>> I am surprised to see to let this happen immediately when the
>>> subdirectories increase more than 30. Else it decays slowly.
>>>     
>>
>> So it happens when you get more than 30 entries in a directory
>> under a certain load? That might be an extent->btree format
>> conversion bug or vice versa. I'd suggest setting up a test based
>> around this to try to narrow down the problem.
>>
>> Cheers,
>>
>> Dave.
>>   
> Thanks for all your help. Shall keep you posted with the progress on 
> debugging.
>
> Regards
> Sagar
>
After running my test for 20 minutes, when I check the fragmentation 
status of the file system, I observe that it is severely fragmented.

[root@NAS001ee5ab9c85 ~]# xfs_db -c frag -r /dev/RAIDA/vol
actual 94343, ideal 107, fragmentation factor 99.89%

Do you think, this can cause the issue?

Thanks
Sagar

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Xfs Access to block zero  exception and system crash
  2008-06-30 10:24                   ` Sagar Borikar
@ 2008-07-01  6:44                     ` Dave Chinner
  2008-07-02  4:18                       ` Sagar Borikar
  0 siblings, 1 reply; 48+ messages in thread
From: Dave Chinner @ 2008-07-01  6:44 UTC (permalink / raw)
  To: Sagar Borikar; +Cc: xfs

On Mon, Jun 30, 2008 at 03:54:44PM +0530, Sagar Borikar wrote:
> After running my test for 20 min, when I check the fragmentation status  
> of file system, I observe that it
> is severely fragmented.

Depends on your definition of fragmentation....

> [root@NAS001ee5ab9c85 ~]# xfs_db -c frag -r /dev/RAIDA/vol
> actual 94343, ideal 107, fragmentation factor 99.89%

And that one is a bad one ;)

Still, there are a lot of extents - ~1000 to a file - which
will be stressing the btree extent format code.

> Do you think, this can cause the issue?

Sure - just like any other workload that generates enough
extents. Like I said originally, we've fixed so many problems
in this code since 2.6.18 I'd suggest that your only sane
hope for us to help you track down the problem is to upgrade
to a current kernel and go from there....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Xfs Access to block zero  exception and system crash
  2008-07-01  6:44                     ` Dave Chinner
@ 2008-07-02  4:18                       ` Sagar Borikar
  2008-07-02  5:13                         ` Dave Chinner
  0 siblings, 1 reply; 48+ messages in thread
From: Sagar Borikar @ 2008-07-02  4:18 UTC (permalink / raw)
  To: Sagar Borikar, xfs



Dave Chinner wrote:
> On Mon, Jun 30, 2008 at 03:54:44PM +0530, Sagar Borikar wrote:
>   
>> After running my test for 20 min, when I check the fragmentation status  
>> of file system, I observe that it
>> is severely fragmented.
>>     
>
> Depends on your definition of fragmentation....
>
>   
>> [root@NAS001ee5ab9c85 ~]# xfs_db -c frag -r /dev/RAIDA/vol
>> actual 94343, ideal 107, fragmentation factor 99.89%
>>     
>
> And that one is a bad one ;)
>
> Still, there are a lot of extents - ~1000 to a file - which
> will be stressing the btree extent format code.
>
>   
>> Do you think, this can cause the issue?
>>     
>
> Sure - just like any other workload that generates enough
> extents. Like I said originally, we've fixed so many problems
> in this code since 2.6.18 I'd suggest that your only sane
> hope for us to help you track done the problem is to upgrade
> to a current kernel and go from there....
>
> Cheers,,
>
> Dave.
>   
Thanks again Dave. But we can't upgrade the kernel as it is already in 
production and in the field.
So do you think periodic cleaning of the file system using xfs_fsr can 
solve the issue? If not, could you kindly point me at the patches that 
fixed similar problems? I can try backporting them.

Thanks
Sagar

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Xfs Access to block zero  exception and system crash
  2008-07-02  4:18                       ` Sagar Borikar
@ 2008-07-02  5:13                         ` Dave Chinner
  2008-07-02  5:35                           ` Sagar Borikar
  0 siblings, 1 reply; 48+ messages in thread
From: Dave Chinner @ 2008-07-02  5:13 UTC (permalink / raw)
  To: Sagar Borikar; +Cc: xfs

On Wed, Jul 02, 2008 at 09:48:46AM +0530, Sagar Borikar wrote:
> Dave Chinner wrote:
>> On Mon, Jun 30, 2008 at 03:54:44PM +0530, Sagar Borikar wrote:
>> Sure - just like any other workload that generates enough
>> extents. Like I said originally, we've fixed so many problems
>> in this code since 2.6.18 I'd suggest that your only sane
>> hope for us to help you track done the problem is to upgrade
>> to a current kernel and go from there....
>>   
> Thanks again Dave. But we can't upgrade the kernel as it is already in  
> production and on field.

Yes, but you can run it in your test environment where you are
reproducing this problem, right?

> So do you think, periodic cleaning of file system using xfs_fsr can  
> solve the issue?

No, at best it would only delay the problem (whatever it is).

> If not, could you
> kindly direct me what all patches were fixing similar problem? I can try  
> back porting them.

I don't have time to try to identify some set of changes from the
past 3-4 years that might fix your problem. There may not even be a
patch that fixes your problem, which is one of the reasons why I've
asked if you can reproduce it on a current kernel....

I pointed you the files that the bug could lie in earlier in the
thread. You can find the history of changes to those files via the
mainline git repository or via the XFS CVS repository. You'd
probably do best to look at the git tree because all the changes are
well described in the commit logs and you should be able to isolate
ones that fix btree problems fairly easily...
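
Something along these lines against a mainline clone would list the
candidate commits (the version range and file list are only a starting
point - adjust them to the files pointed out earlier):

# one-line summaries of every change to the extent/btree code between
# the kernel you ship and a recent one
git log --pretty=oneline v2.6.18..v2.6.25 -- \
    fs/xfs/xfs_bmap.c fs/xfs/xfs_bmap_btree.c fs/xfs/xfs_alloc.c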

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Xfs Access to block zero  exception and system crash
  2008-07-02  5:13                         ` Dave Chinner
@ 2008-07-02  5:35                           ` Sagar Borikar
  2008-07-02  6:13                             ` Nathan Scott
  0 siblings, 1 reply; 48+ messages in thread
From: Sagar Borikar @ 2008-07-02  5:35 UTC (permalink / raw)
  To: Sagar Borikar, xfs



Dave Chinner wrote:
> On Wed, Jul 02, 2008 at 09:48:46AM +0530, Sagar Borikar wrote:
>   
>> Dave Chinner wrote:
>>     
>>> On Mon, Jun 30, 2008 at 03:54:44PM +0530, Sagar Borikar wrote:
>>> Sure - just like any other workload that generates enough
>>> extents. Like I said originally, we've fixed so many problems
>>> in this code since 2.6.18 I'd suggest that your only sane
>>> hope for us to help you track done the problem is to upgrade
>>> to a current kernel and go from there....
>>>   
>>>       
>> Thanks again Dave. But we can't upgrade the kernel as it is already in  
>> production and on field.
>>     
>
> Yes, but you can run it in your test environment where you are
> reproducing this problem, right?
>
>   
Unfortunately the architecture is a customized MIPS for which a standard 
kernel port is not available, so we would have to port the new kernel 
ourselves in order to try this, which is why I was hesitant to do it.
>> So do you think, periodic cleaning of file system using xfs_fsr can  
>> solve the issue?
>>     
>
> No, at best it would only delay the problem (whatever it is).
>
>   
>> If not, could you
>> kindly direct me what all patches were fixing similar problem? I can try  
>> back porting them.
>>     
>
> I don't have time to try to identify some set of changes from the
> past 3-4 years that might fix your problem. There may not even be a
> patch that fixes your problem, which is one of the reasons why I've
> asked if you can reproduce it on a current kernel....
>
> I pointed you the files that the bug could lie in earlier in the
> thread. You can find the history of changes to those files via the
> mainline git repository or via the XFS CVS repository. You'd
> probably do best to look at the git tree because all the changes are
> well described in the commit logs and you should be able to isolate
> ones that fix btree problems fairly easily...
>
> Cheers,
>
> Dave.
>   
Sure, I'll go through these changelogs.  Thanks for all your help; I 
really appreciate your time. I hope you don't mind helping me in future 
if I find something new :)

Regards,
Sagar

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Xfs Access to block zero  exception and system crash
  2008-07-02  5:35                           ` Sagar Borikar
@ 2008-07-02  6:13                             ` Nathan Scott
  2008-07-02  6:56                               ` Dave Chinner
  0 siblings, 1 reply; 48+ messages in thread
From: Nathan Scott @ 2008-07-02  6:13 UTC (permalink / raw)
  To: Sagar Borikar; +Cc: xfs

On Wed, 2008-07-02 at 11:05 +0530, Sagar Borikar wrote:
> 
> Unfortunately the architecture is customized mips for which the
> standard 
> kernel port is
> not available  and we have to port the new kernel in order to try
> this 
> which is why I was
> hesitating to do this.

You can always try the reverse - replace fs/xfs from your mips build
tree with the one from the current/a recent kernel.  There are very few
changes in the surrounding kernel code that xfs needs.
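
In other words, roughly (tree paths illustrative):

# drop the newer XFS code into the older tree, then rebuild and fix up
# whatever fails to compile against the 2.6.18 core headers
rm -rf linux-2.6.18-mips/fs/xfs
cp -a linux-2.6.25/fs/xfs linux-2.6.18-mips/fs/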

cheers.

--
Nathan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Xfs Access to block zero  exception and system crash
  2008-07-02  6:13                             ` Nathan Scott
@ 2008-07-02  6:56                               ` Dave Chinner
  2008-07-02 11:02                                 ` Sagar Borikar
  0 siblings, 1 reply; 48+ messages in thread
From: Dave Chinner @ 2008-07-02  6:56 UTC (permalink / raw)
  To: Nathan Scott; +Cc: Sagar Borikar, xfs, sandeen

On Wed, Jul 02, 2008 at 04:13:11PM +1000, Nathan Scott wrote:
> On Wed, 2008-07-02 at 11:05 +0530, Sagar Borikar wrote:
> > 
> > Unfortunately the architecture is customized mips for which the
> > standard 
> > kernel port is
> > not available  and we have to port the new kernel in order to try
> > this 
> > which is why I was
> > hesitating to do this.
> 
> You can always try the reverse - replace fs/xfs from your mips build
> tree with the one from the current/a recent kernel.  Theres very few
> changes in the surrounding kernel code that xfs needs.

Eric should be able to comment on the pitfalls in doing this having
tried to backport a 2.6.25 fs/xfs to a 2.6.18 RHEL kernel. Eric -
any comments?

Cheers,

Dave.
-- 
Dave Chinner
dchinner@agami.com

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Xfs Access to block zero  exception and system crash
  2008-07-02  6:56                               ` Dave Chinner
@ 2008-07-02 11:02                                 ` Sagar Borikar
  2008-07-03  4:03                                   ` Eric Sandeen
  0 siblings, 1 reply; 48+ messages in thread
From: Sagar Borikar @ 2008-07-02 11:02 UTC (permalink / raw)
  To: Nathan Scott, Sagar Borikar, xfs, sandeen



Dave Chinner wrote:
> On Wed, Jul 02, 2008 at 04:13:11PM +1000, Nathan Scott wrote:
>   
>> On Wed, 2008-07-02 at 11:05 +0530, Sagar Borikar wrote:
>>     
>>> Unfortunately the architecture is customized mips for which the
>>> standard 
>>> kernel port is
>>> not available  and we have to port the new kernel in order to try
>>> this 
>>> which is why I was
>>> hesitating to do this.
>>>       
>> You can always try the reverse - replace fs/xfs from your mips build
>> tree with the one from the current/a recent kernel.  Theres very few
>> changes in the surrounding kernel code that xfs needs.
>>     
>
> Eric should be able to comment on the pitfalls in doing this having
> tried to backport a 2.6.25 fs/xfs to a 2.6.18 RHEL kernel. Eric -
> any comments?
>
> Cheers,
>
> Dave.
>   
Eric, could you please let me know about the bits and pieces that we need 
to keep in mind while backporting xfs to 2.6.18?
If you could share patches which take care of it, that would be great.

Thanks
Sagar

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Xfs Access to block zero  exception and system crash
  2008-07-02 11:02                                 ` Sagar Borikar
@ 2008-07-03  4:03                                   ` Eric Sandeen
  2008-07-03  5:14                                     ` Sagar Borikar
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Sandeen @ 2008-07-03  4:03 UTC (permalink / raw)
  To: Sagar Borikar; +Cc: Nathan Scott, xfs

Sagar Borikar wrote:
> 
> Dave Chinner wrote:
>> On Wed, Jul 02, 2008 at 04:13:11PM +1000, Nathan Scott wrote:


>>> You can always try the reverse - replace fs/xfs from your mips build
>>> tree with the one from the current/a recent kernel.  Theres very few
>>> changes in the surrounding kernel code that xfs needs.
>>>     
>> Eric should be able to comment on the pitfalls in doing this having
>> tried to backport a 2.6.25 fs/xfs to a 2.6.18 RHEL kernel. Eric -
>> any comments?
>>
>> Cheers,
>>
>> Dave.
>>   
> Eric, Could you please let me know about bits and pieces that we need to 
> remember while back porting xfs to 2.6.18?
> If you share patches which takes care of it, that would be great.

http://sandeen.net/rhel5_xfs/xfs-2.6.25-for-rhel5-testing.tar.bz2

should be pretty close.  It was quick 'n' dirty and it has some warts
but would give an idea of what backporting was done (see patches/ and
the associated quilt series; quilt push -a to apply them all)
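
Roughly speaking, something like this (assuming quilt is installed and
the tarball has been unpacked next to your 2.6.18 tree - the paths are
illustrative):

tar xjf xfs-2.6.25-for-rhel5-testing.tar.bz2
cd <your-2.6.18-tree>
# point quilt at the series shipped in the tarball, then apply it all
QUILT_PATCHES=/path/to/xfs-2.6.25-for-rhel5-testing/patches quilt push -a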

-Eric

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Xfs Access to block zero  exception and system crash
  2008-07-03  4:03                                   ` Eric Sandeen
@ 2008-07-03  5:14                                     ` Sagar Borikar
  2008-07-03 15:02                                       ` Eric Sandeen
  0 siblings, 1 reply; 48+ messages in thread
From: Sagar Borikar @ 2008-07-03  5:14 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Nathan Scott, xfs



Eric Sandeen wrote:
> Sagar Borikar wrote:
>   
>> Dave Chinner wrote:
>>     
>>> On Wed, Jul 02, 2008 at 04:13:11PM +1000, Nathan Scott wrote:
>>>       
>
>
>   
>>>> You can always try the reverse - replace fs/xfs from your mips build
>>>> tree with the one from the current/a recent kernel.  Theres very few
>>>> changes in the surrounding kernel code that xfs needs.
>>>>     
>>>>         
>>> Eric should be able to comment on the pitfalls in doing this having
>>> tried to backport a 2.6.25 fs/xfs to a 2.6.18 RHEL kernel. Eric -
>>> any comments?
>>>
>>> Cheers,
>>>
>>> Dave.
>>>   
>>>       
>> Eric, Could you please let me know about bits and pieces that we need to 
>> remember while back porting xfs to 2.6.18?
>> If you share patches which takes care of it, that would be great.
>>     
>
> http://sandeen.net/rhel5_xfs/xfs-2.6.25-for-rhel5-testing.tar.bz2
>
> should be pretty close.  It was quick 'n' dirty and it has some warts
> but would give an idea of what backporting was done (see patches/ and
> the associated quilt series; quilt push -a to apply them all)
>   
Thanks a lot Eric. I'll go through it. I am actually trying another 
option of regularly defragmenting the file system under stress.
I wanted to understand a couple of things about using the xfs_fsr utility:

1. What should the state of the filesystem be when I am running xfs_fsr? 
Ideally we would stop all I/O before running defragmentation.
2. How effective is the utility when run on a highly fragmented file 
system? I saw that if the filesystem is 99.89% fragmented, the recovery is 
very slow. It took around 25 min to clean up a 100GB JBOD volume, and 
after that the filesystem was still 82% fragmented. So I was confused 
about how exactly the defragmentation works.
Any pointers on the optimum use of xfs_fsr?
3. Any precautions I need to take from a data consistency and robustness 
point of view? Any disadvantages?
4. Any threshold for starting defragmentation on xfs?

Thanks
Sagar
> -Eric
>   

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Xfs Access to block zero  exception and system crash
  2008-07-03  5:14                                     ` Sagar Borikar
@ 2008-07-03 15:02                                       ` Eric Sandeen
  2008-07-04 10:18                                         ` Sagar Borikar
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Sandeen @ 2008-07-03 15:02 UTC (permalink / raw)
  To: Sagar Borikar; +Cc: Nathan Scott, xfs

Sagar Borikar wrote:
> 
> Eric Sandeen wrote:


>>> Eric, Could you please let me know about bits and pieces that we need to 
>>> remember while back porting xfs to 2.6.18?
>>> If you share patches which takes care of it, that would be great.
>>>     
>> http://sandeen.net/rhel5_xfs/xfs-2.6.25-for-rhel5-testing.tar.bz2
>>
>> should be pretty close.  It was quick 'n' dirty and it has some warts
>> but would give an idea of what backporting was done (see patches/ and
>> the associated quilt series; quilt push -a to apply them all)
>>   
> Thanks a lot Eric. I'll go through it .I am actually trying another 
> option of regularly defragmenting the file system under stress.

Ok, but that won't get to the bottom of the problem.  It might alleviate
it at best, but if I were shipping a product using xfs I'd want to know
that it was properly solved.  :)

The tarball above should give you almost everything you need to run your
testcase with current xfs code on your older kernel to see if the bug
persists or if it's been fixed upstream, in which case you have a
relatively easy path to an actual solution that your customers can
depend on.

> I wanted to understand couple of things for using xfs_fsr utility:
> 
> 1. What should be the state of filesystem when I am running xfs_fsr. 
> Ideally we should stop all io before running defragmentation.

you can run in any state.  Some files will not get defragmented due to
busy-ness or other conditions; look at the xfs_swap_extents() function
in the kernel which is very well documented; some cases return EBUSY.
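
In practice that means you can just point it at the mounted filesystem,
e.g. (mount point from your setup, time limit arbitrary):

# reorganise files on the mounted volume for at most 10 minutes,
# verbosely; busy files are skipped and picked up on a later pass
xfs_fsr -v -t 600 /mnt/RAIDA/vol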

> 2. How effective is the utility when ran on highly fragmented file 
> system? I saw that if filesystem is 99.89% fragmented, the recovery is 
> very slow. It took around 25 min to clean up 100GB JBOD volume and after 
> that system was fragmented to 82%. So I was confused on how exactly the 
> fragmentation works.

Again read the code, but basically it tries to preallocate as much space
as the file is currently using, then checks that it is more contiguous
space than the file currently has and if so, it copies the data from old
to new and swaps the new allocation for the old.  Note, this involves a
fair amount of IO.

Also don't get hung up on that fragmentation factor, at least not until
you've read xfs_db code to see how it's reported, and you've thought
about what that means.  For example: a 100G filesystem with 10 10G files
each with 5x2G extents will report 80% fragmentation.  Now, ask
yourself, is a 10G file in 5x2G extents "bad" fragmentation?
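
For what it's worth, the factor appears to be just (actual - ideal) /
actual extents, which lines up with the numbers quoted in this thread:

awk 'BEGIN { printf "%.2f%%\n", (50 - 10) / 50 * 100 }'        # 80.00%  (the 10x10G example)
awk 'BEGIN { printf "%.2f%%\n", (94343 - 107) / 94343 * 100 }' # 99.89%  (the RAIDA volume)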

> Any pointers on probable optimum use of xfs_fsr?
> 3. Any precautions I need to take when working with that from data 
> consistency, robustness point of view? Any disadvantages?

Anything which corrupts data is a bug, and I'm not aware of any such
bugs in the defragmentation process.

> 4. Any threshold for starting the defragmentation on xfs?

Pretty well determined by your individual use case and requirements, I
think.

-Eric

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Xfs Access to block zero  exception and system crash
  2008-07-03 15:02                                       ` Eric Sandeen
@ 2008-07-04 10:18                                         ` Sagar Borikar
  2008-07-04 12:27                                           ` Dave Chinner
  2008-07-04 15:33                                           ` Eric Sandeen
  0 siblings, 2 replies; 48+ messages in thread
From: Sagar Borikar @ 2008-07-04 10:18 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Nathan Scott, xfs

[-- Attachment #1: Type: text/plain, Size: 4198 bytes --]



Eric Sandeen wrote:
> Sagar Borikar wrote:
>   
>> Eric Sandeen wrote:
>>     
>
>
>   
>>>> Eric, Could you please let me know about bits and pieces that we need to 
>>>> remember while back porting xfs to 2.6.18?
>>>> If you share patches which takes care of it, that would be great.
>>>>     
>>>>         
>>> http://sandeen.net/rhel5_xfs/xfs-2.6.25-for-rhel5-testing.tar.bz2
>>>
>>> should be pretty close.  It was quick 'n' dirty and it has some warts
>>> but would give an idea of what backporting was done (see patches/ and
>>> the associated quilt series; quilt push -a to apply them all)
>>>   
>>>       
>> Thanks a lot Eric. I'll go through it .I am actually trying another 
>> option of regularly defragmenting the file system under stress.
>>     
>
> Ok, but that won't get to the bottom of the problem.  It might alleviate
> it at best, but if I were shipping a product using xfs I'd want to know
> that it was properly solved.  :)
>
>   
We too don't want to leave it as it is.  I am still working on 
backporting the latest xfs code; your patches are helping a lot.
Just to check whether the issue lies with 2.6.18 or the MIPS port, I 
tested it on a 2.6.24 x86 platform.
Here we created a loopback device of 10 GB and mounted xfs on that 
(a rough sketch of the setup is below).
What I observe is that xfs_repair reports quite a few bad blocks and bad 
extents here as well.
So is developing bad blocks and extents normal behavior in xfs which 
would be recovered in the background, or is it a bug? I still didn't see 
the exception, but the bad blocks and extents are generated within 10 
minutes of running the tests.
Attaching the log.
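
Roughly, the loopback filesystem was set up along these lines (sizes and
paths illustrative):

# 10 GB sparse backing file, XFS on top, mounted via loop
dd if=/dev/zero of=/mnt/xfstest bs=1M count=1 seek=10239
mkfs.xfs -f /mnt/xfstest
mount -o loop /mnt/xfstest /root/test_partition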
> The tarball above should give you almost everything you need to run your
> testcase with current xfs code on your older kernel to see if the bug
> persists or if it's been fixed upstream, in which case you have a
> relatively easy path to an actual solution that your customers can
> depend on.
>
>   
>> I wanted to understand couple of things for using xfs_fsr utility:
>>
>> 1. What should be the state of filesystem when I am running xfs_fsr. 
>> Ideally we should stop all io before running defragmentation.
>>     
>
> you can run in any state.  Some files will not get defragmented due to
> busy-ness or other conditions; look at the xfs_swap_extents() function
> in the kernel which is very well documented; some cases return EBUSY.
>   

>   
>> 2. How effective is the utility when ran on highly fragmented file 
>> system? I saw that if filesystem is 99.89% fragmented, the recovery is 
>> very slow. It took around 25 min to clean up 100GB JBOD volume and after 
>> that system was fragmented to 82%. So I was confused on how exactly the 
>> fragmentation works.
>>     
>
> Again read the code, but basically it tries to preallocate as much space
> as the file is currently using, then checks that it is more contiguous
> space than the file currently has and if so, it copies the data from old
> to new and swaps the new allocation for the old.  Note, this involves a
> fair amount of IO.
>
> Also don't get hung up on that fragmentation factor, at least not until
> you've read xfs_db code to see how it's reported, and you've thought
> about what that means.  For example: a 100G filesystem with 10 10G files
> each with 5x2G extents will report 80% fragmentation.  Now, ask
> yourself, is a 10G file in 5x2G extents "bad" fragmentation?
>
>   
Agreed - on x86 too I see 99.12% fragmentation when I run the 
above-mentioned test, and xfs_fsr
doesn't help much even after freezing the file system.
>> Any pointers on probable optimum use of xfs_fsr?
>> 3. Any precautions I need to take when working with that from data 
>> consistency, robustness point of view? Any disadvantages?
>>     
>
> Anything which corrupts data is a bug, and I'm not aware of any such
> bugs in the defragmentation process.
>
>   
Assuming that we get some improvement by running xfs_fsr, is it safe 
to run the defragmentation utility regularly at some periodic interval?
>> 4. Any threshold for starting the defragmentation on xfs?
>>     
>
> Pretty well determined by your individual use case and requirements, I
> think.
>
> -Eric
>   
Thanks for the detailed response Eric.

Sagar

[-- Attachment #2: xfs_repair_log --]
[-- Type: text/plain, Size: 4444 bytes --]

bad nblocks 13345 for inode 50331785, would reset to 19431
bad nextents 156 for inode 50331785, would reset to 251
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 0
entry "testfile" in shortform directory 132 references free inode 142
would have junked entry "testfile" in directory inode 132
entry "testfile" in shortform directory 138 references free inode 143
would have junked entry "testfile" in directory inode 138
entry "testfile" in shortform directory 140 references free inode 144
would have junked entry "testfile" in directory inode 140
bad nblocks 15848 for inode 141, would reset to 18634
bad nextents 269 for inode 141, would reset to 306
bad nblocks 18888 for inode 16777350, would reset to 19144
bad nextents 303 for inode 16777350, would reset to 309
bad nblocks 18704 for inode 16777351, would reset to 19144
bad nextents 291 for inode 16777351, would reset to 299
bad fwd (right) sibling pointer (saw 107678 should be NULLDFSBNO)
        in inode 142 ((null) fork) bmap btree block 236077307437232
would have cleared inode 142
bad fwd (right) sibling pointer (saw 1139882 should be NULLDFSBNO)
        in inode 143 ((null) fork) bmap btree block 4556402090352816
would have cleared inode 143
bad fwd (right) sibling pointer (saw 1138473 should be NULLDFSBNO)
        in inode 144 ((null) fork) bmap btree block 4564279060373680
would have cleared inode 144
bad nblocks 13825 for inode 145, would reset to 18503
bad nextents 221 for inode 145, would reset to 222
        - agno = 2
entry "testfile" in shortform directory 33595588 references free inode 33595593
would have junked entry "testfile" in directory inode 33595588
bad nblocks 18704 for inode 33595589, would reset to 19121
bad nextents 306 for inode 33595589, would reset to 314
bad nblocks 18704 for inode 33595590, would reset to 19432
bad nextents 302 for inode 33595590, would reset to 313
bad nblocks 18640 for inode 33595591, would reset to 19432
bad nextents 311 for inode 33595591, would reset to 317
bad nblocks 18888 for inode 33595592, would reset to 19432
bad nextents 312 for inode 33595592, would reset to 322
bad fwd (right) sibling pointer (saw 104113 should be NULLDFSBNO)
        in inode 33595593 ((null) fork) bmap btree block 9041060911947952
would have cleared inode 33595593
        - agno = 3
bad nblocks 18888 for inode 50331781, would reset to 19432
bad nextents 315 for inode 50331781, would reset to 324
bad nblocks 18888 for inode 50331782, would reset to 19432
bad nextents 326 for inode 50331782, would reset to 333
bad nblocks 18888 for inode 50331783, would reset to 19432
bad nblocks 18428 for inode 50331784, would reset to 19784
bad nextents 285 for inode 50331784, would reset to 306
bad nblocks 18704 for inode 16777352, would reset to 19144
bad nextents 311 for inode 16777352, would reset to 315
bad nblocks 13345 for inode 50331785, would reset to 19431
bad nextents 156 for inode 50331785, would reset to 251
bad nblocks 18888 for inode 16777353, would reset to 19144
bad nextents 318 for inode 16777353, would reset to 321
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - agno = 0
entry "testfile" in shortform directory inode 132 points to free inode 142would junk entry
entry "testfile" in shortform directory inode 138 points to free inode 143would junk entry
entry "testfile" in shortform directory inode 140 points to free inode 144would junk entry
        - agno = 1
        - agno = 2
entry "testfile" in shortform directory inode 33595588 points to free inode 33595593would junk entry
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

        XFS_REPAIR Summary    Fri Jul  4 15:34:47 2008

Phase           Start           End             Duration
Phase 1:        07/04 15:34:00  07/04 15:34:04  4 seconds
Phase 2:        07/04 15:34:04  07/04 15:34:31  27 seconds
Phase 3:        07/04 15:34:31  07/04 15:34:47  16 seconds
Phase 4:        07/04 15:34:47  07/04 15:34:47
Phase 5:        Skipped
Phase 6:        07/04 15:34:47  07/04 15:34:47
Phase 7:        07/04 15:34:47  07/04 15:34:47

Total run time: 47 seconds

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Xfs Access to block zero  exception and system crash
  2008-07-04 10:18                                         ` Sagar Borikar
@ 2008-07-04 12:27                                           ` Dave Chinner
  2008-07-04 17:30                                             ` Sagar Borikar
  2008-07-04 15:33                                           ` Eric Sandeen
  1 sibling, 1 reply; 48+ messages in thread
From: Dave Chinner @ 2008-07-04 12:27 UTC (permalink / raw)
  To: Sagar Borikar; +Cc: Eric Sandeen, Nathan Scott, xfs

On Fri, Jul 04, 2008 at 03:48:24PM +0530, Sagar Borikar wrote:
> Even we too don't want to leave it as it is.  I still am working on back  
> porting the latest xfs code.
> Your patches are helping a lot .
> Just to check whether that issue lies with 2.6.18 or MIPS port, I tested  
> it on 2.6.24 x86 platform.
> Here we created a loop back device of 10 GB and mounted xfs on that.

And the script that generates the workload can be found where?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Xfs Access to block zero  exception and system crash
  2008-07-04 10:18                                         ` Sagar Borikar
  2008-07-04 12:27                                           ` Dave Chinner
@ 2008-07-04 15:33                                           ` Eric Sandeen
  1 sibling, 0 replies; 48+ messages in thread
From: Eric Sandeen @ 2008-07-04 15:33 UTC (permalink / raw)
  To: Sagar Borikar; +Cc: Nathan Scott, xfs

Sagar Borikar wrote:

>>   
> Even we too don't want to leave it as it is.  I still am working on back 
> porting the latest xfs code.
> Your patches are helping a lot .
> Just to check whether that issue lies with 2.6.18 or MIPS port, I tested 
> it on 2.6.24 x86 platform.
> Here we created a loop back device of 10 GB and mounted xfs on that.
> What I observe that xfs_repair reports quite a few bad blocks and bad 
> extents here as well.
> So is developing bad blocks and extents  normal behavior in xfs which 
> would be recovered
> in background or is it a bug? I still didn't see the exception but the 
> bad blocks and extents are
> generated within 10 minutes or running the tests.
> Attaching the log .

Repair finding corruption indicates a bug (or hardware problem) somewhere.

As a long shot you might re-test with this patch in place:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff_plain;h=6ab455eeaff6893cd06da33843e840d888cdc04a
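
One way to pull that change onto the test tree would be something like
(tree path illustrative; the URL is the one above):

cd linux-2.6.24
wget -O fix-6ab455ee.patch \
  'http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff_plain;h=6ab455eeaff6893cd06da33843e840d888cdc04a'
# dry run first, then apply for real
patch -p1 --dry-run < fix-6ab455ee.patch && patch -p1 < fix-6ab455ee.patch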

But, as Dave said, please also provide the testcase.

-Eric

^ permalink raw reply	[flat|nested] 48+ messages in thread

* RE: Xfs Access to block zero  exception and system crash
  2008-07-04 12:27                                           ` Dave Chinner
@ 2008-07-04 17:30                                             ` Sagar Borikar
  2008-07-04 17:35                                               ` Eric Sandeen
  0 siblings, 1 reply; 48+ messages in thread
From: Sagar Borikar @ 2008-07-04 17:30 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Eric Sandeen, Nathan Scott, xfs


The script is pretty straightforward:

while [ 1 ]
do
cp -f $1 $2
done

where I pass the first parameter as the 300+ MB file in one directory
and $2 is the other directory. I run 30 instances of the script in
parallel.

Thanks
Sagar


On Fri, Jul 04, 2008 at 03:48:24PM +0530, Sagar Borikar wrote:
> Even we too don't want to leave it as it is.  I still am working on
back  
> porting the latest xfs code.
> Your patches are helping a lot .
> Just to check whether that issue lies with 2.6.18 or MIPS port, I
tested  
> it on 2.6.24 x86 platform.
> Here we created a loop back device of 10 GB and mounted xfs on that.

And the script that generates the workload can be found where?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Xfs Access to block zero  exception and system crash
  2008-07-04 17:30                                             ` Sagar Borikar
@ 2008-07-04 17:35                                               ` Eric Sandeen
  2008-07-04 17:51                                                 ` Sagar Borikar
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Sandeen @ 2008-07-04 17:35 UTC (permalink / raw)
  To: Sagar Borikar; +Cc: Dave Chinner, Nathan Scott, xfs

Sagar Borikar wrote:
> The script is pretty straight forward:
> 
> While [ 1 ]
> Do
> Cp -f $1 $2
> Done
> 
> Where I pass the first parameter as the 300+ MB file in one directory
> and $2 are is other directory. I run 30 instances of the script in
> parallel.

Copying the same file to the same directory, or 30 different files to 30
different directories?  Or the same file to 30 different directories?  If
different directories, what is the layout of the target directories?  Etc...

-Eric

^ permalink raw reply	[flat|nested] 48+ messages in thread

* RE: Xfs Access to block zero  exception and system crash
  2008-07-04 17:35                                               ` Eric Sandeen
@ 2008-07-04 17:51                                                 ` Sagar Borikar
  2008-07-05 16:25                                                   ` Eric Sandeen
  2008-07-06  4:19                                                   ` Dave Chinner
  0 siblings, 2 replies; 48+ messages in thread
From: Sagar Borikar @ 2008-07-04 17:51 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Dave Chinner, Nathan Scott, xfs


The copy is of the same file to 30 different directories, and it is
basically an overwrite.

Here is the setup:

It's a JBOD with a volume size of 20 GB. The directories are empty and
this is basically a continuous copy of the file into all thirty
directories. But surprisingly none of the copies succeeds. All the copy
processes are in uninterruptible sleep state, and I have already attached
the xfs_repair log earlier in the thread. As mentioned, it is with the
2.6.24 Fedora kernel.

Thanks
Sagar




-----Original Message-----
From: Eric Sandeen [mailto:sandeen@sandeen.net] 
Sent: Friday, July 04, 2008 11:05 PM
To: Sagar Borikar
Cc: Dave Chinner; Nathan Scott; xfs@oss.sgi.com
Subject: Re: Xfs Access to block zero exception and system crash

Sagar Borikar wrote:
> The script is pretty straight forward:
> 
> While [ 1 ]
> Do
> Cp -f $1 $2
> Done
> 
> Where I pass the first parameter as the 300+ MB file in one directory
> and $2 are is other directory. I run 30 instances of the script in
> parallel.

Copying the same file to the same directory, or 30 different files to 30
different directories?  Or the ame file to 30 different directories?  If
different directories what is the layout of the target directories?
Etc...

-Eric

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Xfs Access to block zero  exception and system crash
  2008-07-04 17:51                                                 ` Sagar Borikar
@ 2008-07-05 16:25                                                   ` Eric Sandeen
  2008-07-06 17:24                                                     ` Sagar Borikar
  2008-07-06  4:19                                                   ` Dave Chinner
  1 sibling, 1 reply; 48+ messages in thread
From: Eric Sandeen @ 2008-07-05 16:25 UTC (permalink / raw)
  To: Sagar Borikar; +Cc: Dave Chinner, Nathan Scott, xfs

Sagar Borikar wrote:
> Copy is of the same file to 30 different directories and it is basically
> overwrite.
> 
> Here is the setup:
> 
> It's a JBOD with Volume size 20 GB. The directories are empty and this
> is basically continuous copy of the file on all thirty directories. But
> surprisingly none of the copy succeeds. All the copy processes are in 
> Uninterruptible sleep state and xfs_repair log I have already attached 
> With the prep. As mentioned it is with 2.6.24 Fedora kernel.

It would probably be best to try a 2.6.26 kernel from rawhide to be sure
you're closest to the bleeding edge.

I tested on 2.6.24.7-92.fc8 on x86_64, and I did this, specifically, in
the root of a 30G xfs fs:

# for I in `seq 1 30`; do mkdir dir$I; done
# vi copyit.sh (your script)
# chmod +x copyit.sh
# dd if=/dev/zero of=300mbfile bs=1M count=300
# for I in `seq 1 30`; do ./copyit.sh 300mbfile dir$I & done

I got no errors or corruption after several iterations.

Might also be worth checking dmesg for any errors when you run.

-Eric

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Xfs Access to block zero  exception and system crash
  2008-07-04 17:51                                                 ` Sagar Borikar
  2008-07-05 16:25                                                   ` Eric Sandeen
@ 2008-07-06  4:19                                                   ` Dave Chinner
  1 sibling, 0 replies; 48+ messages in thread
From: Dave Chinner @ 2008-07-06  4:19 UTC (permalink / raw)
  To: Sagar Borikar; +Cc: Eric Sandeen, Nathan Scott, xfs

On Fri, Jul 04, 2008 at 10:51:47AM -0700, Sagar Borikar wrote:
> 
> Copy is of the same file to 30 different directories and it is basically
> overwrite.

Not an overwrite - cp truncates the destination file first:

# cp t.t fred
# strace cp -f t.t fred
.....
stat("fred", {st_mode=S_IFREG|0644, st_size=5, ...}) = 0
stat("t.t", {st_mode=S_IFREG|0644, st_size=5, ...}) = 0
stat("fred", {st_mode=S_IFREG|0644, st_size=5, ...}) = 0
open("t.t", O_RDONLY)                   = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=5, ...}) = 0
open("fred", O_WRONLY|O_TRUNC)          = 4
             ^^^^^^^^^^^^^^^^
fstat(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
read(3, "fred\n", 4096)                 = 5
write(4, "fred\n", 5)                   = 5
close(4)                                = 0
close(3)                                = 0
.....


That being said, I can't reproduce it on a 2.6.24 (debian)
kernel, either.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 48+ messages in thread

* RE: Xfs Access to block zero  exception and system crash
  2008-07-05 16:25                                                   ` Eric Sandeen
@ 2008-07-06 17:24                                                     ` Sagar Borikar
  2008-07-06 19:07                                                       ` Eric Sandeen
  0 siblings, 1 reply; 48+ messages in thread
From: Sagar Borikar @ 2008-07-06 17:24 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Dave Chinner, Nathan Scott, xfs


Sagar Borikar wrote:
> Copy is of the same file to 30 different directories and it is
basically
> overwrite.
> 
> Here is the setup:
> 
> It's a JBOD with Volume size 20 GB. The directories are empty and this
> is basically continuous copy of the file on all thirty directories.
But
> surprisingly none of the copy succeeds. All the copy processes are in 
> Uninterruptible sleep state and xfs_repair log I have already attached

> With the prep. As mentioned it is with 2.6.24 Fedora kernel.

It would probably be best to try a 2.6.26 kernel from rawhide to be sure
you're closest to the bleeding edge.

<Sagar> Sure Eric, but I reran the test and got similar errors with the
2.6.24 kernel on x86. I am still confused by the results that I see on
the 2.6.24 x86 machine: the used space reported by df is far larger than
the total file size shown by ls. Here is the log of the system:
[root@lab00 ~/test_partition]# ls -lSah
total 202M
-rw-r--r--  1 root root 202M Jul  4 14:06 original ---> this is the file
which I copy.
drwxr-x--- 65 root root  12K Jul  6 21:57 ..
-rwxr-xr-x  1 root root  189 Jul  4 16:31 runall
-rwxr-xr-x  1 root root   50 Jul  4 16:32 copy
drwxr-xr-x  2 root root   45 Jul  6 22:07 .

 -------> Total size is roughly 202MB. 

[root@lab00 ~/test_partition]# df -lh .
Filesystem            Size  Used Avail Use% Mounted on
/mnt/xfstest          9.6G  7.7G  2.0G  80% /root/test_partition

The used size reported by df is 7.7G, which is a complete anomaly here.
This is a 10GB loopback partition and it says that only 2.0G is available.

[root@lab00 ~/test_partition]# cat /etc/mtab
/dev/mapper/VolGroup00-LogVol00 / ext3 rw 0 0
proc /proc proc rw 0 0
sysfs /sys sysfs rw 0 0
devpts /dev/pts devpts rw,gid=5,mode=620 0 0
/dev/sda1 /boot ext3 rw 0 0
tmpfs /dev/shm tmpfs rw 0 0
automount(pid3151) /net autofs rw,fd=4,pgrp=3151,minproto=2,maxproto=4 0
0
/mnt/xfstest /root/test_partition xfs rw,loop=/dev/loop0 0 0 ---> XFS
partition.


Here is the fragmentation result
[root@lab00 ~/test_partition]# xfs_db  -c frag -r /mnt/xfstest
actual 7781, ideal 32, fragmentation factor 99.59%

Here is the kernel version:

[root@lab00 ~/test_partition]# uname -a
Linux lab00 2.6.24 #1 SMP Fri Jul 4 12:20:56 IST 2008 i686 i686 i386
GNU/Linux


I tested on 2.6.24.7-92.fc8 on x86_64, and I did this, specifically, in
the root of a 30G xfs fs:



# for I in `seq 1 30`; do mkdir dir$I; done
# vi copyit.sh (your script)
# chmod +x copyit.sh
# dd if=/dev/zero of=300mbfile bs=1M count=300
# for I in `seq 1 30`; do ./copyit.sh 300mbfile dir$I & done

I got no errors or corruption after several iterations.

<Sagar> Surprising. I see it every time. I do it on a 20 GB and a 10GB
partition on a loopback device. When I looked for the bad inode, 

Might also be worth checking dmesg for any errors when you run.

<Sagar> The dmesg log doesn't give any information. Here is the
XFS-related info:

XFS mounting filesystem loop0
Ending clean XFS mount for filesystem: loop0

which is just the clean XFS mount. But there is no exception reported
by XFS.


The filesystem has become completely sluggish and the response time has
increased to 3-4 minutes for every command.  Not a single copy has
completed, and all the copy processes are sleeping continuously.
xfs_repair starts reporting severe corruption:

        - agno = 1
entry "testfile" in shortform directory 16777472 references free inode
16777473
would have junked entry "testfile" in directory inode 16777472
        - agno = 0
entry "testfile_3" at block 0 offset 664 in directory inode 128
references free inode 138
        would clear inode number in entry at offset 664...
entry "testfile_4" at block 0 offset 712 in directory inode 128
references free inode 140
        would clear inode number in entry at offset 712...
entry "testfile_5" at block 0 offset 760 in directory inode 128
references free inode 142
        would clear inode number in entry at offset 760...
entry "testfile_6" at block 0 offset 808 in directory inode 128
references free inode 143
        would clear inode number in entry at offset 808...
entry "testfile_7" at block 0 offset 856 in directory inode 128
references free inode 144
        would clear inode number in entry at offset 856...
entry "testfile_8" at block 0 offset 904 in directory inode 128
references free inode 146
        would clear inode number in entry at offset 904...
entry "testfile_9" at block 0 offset 952 in directory inode 128
references free inode 148
        would clear inode number in entry at offset 952...
entry "testfile_10" at block 0 offset 976 in directory inode 128
references free inode 149
        would clear inode number in entry at offset 976...
entry "testfile_12" at block 0 offset 1048 in directory inode 128
references free inode 150
        would clear inode number in entry at offset 1048...
entry "testfile_11" at block 0 offset 1072 in directory inode 128
references free inode 151
        would clear inode number in entry at offset 1072...
entry "testfile_13" at block 0 offset 1144 in directory inode 128
references free inode 154
data fork in ino 16777473 claims dup extent, off - 5266, start -
2164956, cnt 192
bad data fork in inode 16777473
would have cleared inode 16777473
entry "testfile" in shortform directory 16777474 references free inode
16777475
would have junked entry "testfile" in directory inode 16777474
        would clear inode number in entry at offset 1144...
entry "testfile_14" at block 0 offset 1168 in directory inode 128
references free inode 155
        would clear inode number in entry at offset 1168...
entry "testfile_15" at block 0 offset 1240 in directory inode 128
references free inode 156
        would clear inode number in entry at offset 1240...
entry "testfile_16" at block 0 offset 1264 in directory inode 128
references free inode 157
        would clear inode number in entry at offset 1264...
entry "testfile_17" at block 0 offset 1336 in directory inode 128
references free inode 160
        would clear inode number in entry at offset 1336...
entry "testfile_18" at block 0 offset 1360 in directory inode 128
references free inode 161
        would clear inode number in entry at offset 1360...
entry "testfile_19" at block 0 offset 1432 in directory inode 128
references free inode 162
        would clear inode number in entry at offset 1432...
entry "testfile_20" at block 0 offset 1456 in directory inode 128
references free inode 163
        would clear inode number in entry at offset 1456...
entry "testfile_2" at block 0 offset 3032 in directory inode 128
references free inode 137
        would clear inode number in entry at offset 3032...
data fork in ino 16777475 claims dup extent, off - 8178, start -
3200553, cnt 104
bad data fork in inode 16777475
would have cleared inode 16777475
entry "testfile" in shortform directory 16777476 references free inode
16777477
would have junked entry "testfile" in directory inode 16777476
data fork in ino 16777477 claims dup extent, off - 9402, start -
3221565, cnt 56
bad data fork in inode 16777477
would have cleared inode 16777477
entry "testfile" in shortform directory 16777478 references free inode
16777479
would have junked entry "testfile" in directory inode 16777478
data fork in ino 16777479 claims dup extent, off - 9586, start - 170361,
cnt 96
bad data fork in inode 16777479
would have cleared inode 16777479
entry "testfile" in shortform directory 16777480 references free inode
16777481
would have junked entry "testfile" in directory inode 16777480
data fork in ino 16777481 claims dup extent, off - 8338, start -
3203018, cnt 128
bad data fork in inode 16777481
would have cleared inode 16777481
        - agno = 2
entry "testfile" in shortform directory 33595712 references free inode
33595713
would have junked entry "testfile" in directory inode 33595712
bad data fork in inode 33595713
would have cleared inode 33595713
entry "testfile" in shortform directory 33595714 references free inode
33595715
would have junked entry "testfile" in directory inode 33595714
imap claims in-use inode 33595715 is free, correcting imap
entry "testfile" in shortform directory 33595716 references free inode
33595717
would have junked entry "testfile" in directory inode 33595716
data fork in ino 33595717 claims dup extent, off - 0, start - 3281880,
cnt 6180
bad data fork in inode 33595717
would have cleared inode 33595717
entry "testfile" in shortform directory 33595718 references free inode
33595719
would have junked entry "testfile" in directory inode 33595718
bad data fork in inode 33595719
would have cleared inode 33595719
entry "testfile" in shortform directory 33595720 references free inode
33595721
would have junked entry "testfile" in directory inode 33595720
bad data fork in inode 33595721
would have cleared inode 33595721
        - agno = 3
entry "testfile" in shortform directory 50331904 references free inode
50331905
would have junked entry "testfile" in directory inode 50331904
bad data fork in inode 50331905
would have cleared inode 50331905
entry "testfile" in shortform directory 50331906 references free inode
50331907
would have junked entry "testfile" in directory inode 50331906
data fork in ino 50331907 claims dup extent, off - 609, start - 3151886,
cnt 311
bad data fork in inode 50331907
would have cleared inode 50331907
entry "testfile" in shortform directory 50331908 references free inode
50331909
would have junked entry "testfile" in directory inode 50331908
imap claims in-use inode 50331909 is free, correcting imap
entry "testfile" in shortform directory 50331910 references free inode
50331911
would have junked entry "testfile" in directory inode 50331910
bad data fork in inode 50331911
would have cleared inode 50331911
entry "testfile" in shortform directory 50331912 references free inode
50331913
would have junked entry "testfile" in directory inode 50331912
data fork in ino 50331913 claims dup extent, off - 6358, start -
3224389, cnt 469
bad data fork in inode 50331913
would have cleared inode 50331913
data fork in regular inode 133 claims used block 1075592
would have cleared inode 133
data fork in regular inode 136 claims used block 1075930
would have cleared inode 136
data fork in regular inode 137 claims used block 2162044
would have cleared inode 137
data fork in regular inode 138 claims used block 1075938
would have cleared inode 138
entry "testfile" in shortform directory 139 references free inode 141
would have junked entry "testfile" in directory inode 139
data fork in ino 140 claims dup extent, off - 12298, start - 202587, cnt
30
bad data fork in inode 140
would have cleared inode 140
data fork in ino 141 claims dup extent, off - 8562, start - 160071, cnt
384
bad data fork in inode 141
would have cleared inode 141
data fork in ino 142 claims dup extent, off - 1458, start - 80521, cnt
32
bad data fork in inode 142
would have cleared inode 142
data fork in ino 143 claims dup extent, off - 13770, start - 235117, cnt
96
bad data fork in inode 143
would have cleared inode 143
bad magic # 0 in inode 144 (data fork) bmbt block 3262925
bad data fork in inode 144
would have cleared inode 144
entry "testfile" in shortform directory 145 references free inode 147
would have junked entry "testfile" in directory inode 145
data fork in ino 146 claims dup extent, off - 8082, start - 138272, cnt
32
bad data fork in inode 146
would have cleared inode 146
data fork in regular inode 147 claims used block 1075759
would have cleared inode 147
data fork in regular inode 148 claims used block 3231076
would have cleared inode 148
data fork in ino 149 claims dup extent, off - 9426, start - 168635, cnt
8
bad data fork in inode 149
would have cleared inode 149
data fork in ino 150 claims dup extent, off - 3607, start - 105990, cnt
59
bad data fork in inode 150
would have cleared inode 150
data fork in regular inode 151 claims used block 1076476
would have cleared inode 151
entry "testfile" in shortform directory 152 references free inode 153
would have junked entry "testfile" in directory inode 152
bad magic # 0 in inode 153 (data fork) bmbt block 3271407
bad data fork in inode 153
would have cleared inode 153
data fork in regular inode 154 claims used block 1076388
would have cleared inode 154
data fork in regular inode 155 claims used block 1076068
would have cleared inode 155
data fork in regular inode 156 claims used block 3224002
would have cleared inode 156
data fork in ino 157 claims dup extent, off - 9554, start - 170265, cnt
96
bad data fork in inode 157
would have cleared inode 157
entry "testfile" in shortform directory 158 references free inode 159
would have junked entry "testfile" in directory inode 158
data fork in regular inode 159 claims used block 1076564
would have cleared inode 159
data fork in ino 160 claims dup extent, off - 9394, start - 168489, cnt
8
bad data fork in inode 160
would have cleared inode 160
data fork in ino 161 claims dup extent, off - 14662, start - 253175, cnt
32
bad data fork in inode 161
would have cleared inode 161
data fork in regular inode 162 claims used block 2209542
would have cleared inode 162
bad magic # 0 in inode 163 (data fork) bmbt block 3270098
bad data fork in inode 163
would have cleared inode 163
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - agno = 0
entry "testfile_3" in directory inode 128 points to free inode 138,
would junk entry
entry "testfile_4" in directory inode 128 points to free inode 140,
would junk entry
entry "testfile_5" in directory inode 128 points to free inode 142,
would junk entry
entry "testfile_6" in directory inode 128 points to free inode 143,
would junk entry
entry "testfile_7" in directory inode 128 points to free inode 144,
would junk entry
entry "testfile_8" in directory inode 128 points to free inode 146,
would junk entry
entry "testfile_9" in directory inode 128 points to free inode 148,
would junk entry
entry "testfile_10" in directory inode 128 points to free inode 149,
would junk entry
entry "testfile_12" in directory inode 128 points to free inode 150,
would junk entry
entry "testfile_11" in directory inode 128 points to free inode 151,
would junk entry
entry "testfile_13" in directory inode 128 points to free inode 154,
would junk entry
entry "testfile_14" in directory inode 128 points to free inode 155,
would junk entry
entry "testfile_15" in directory inode 128 points to free inode 156,
would junk entry
entry "testfile_16" in directory inode 128 points to free inode 157,
would junk entry
entry "testfile_17" in directory inode 128 points to free inode 160,
would junk entry
entry "testfile_18" in directory inode 128 points to free inode 161,
would junk entry
entry "testfile_19" in directory inode 128 points to free inode 162,
would junk entry
entry "testfile_20" in directory inode 128 points to free inode 163,
would junk entry
entry "testfile_1" in directory inode 128 points to free inode 136,
would junk entry
entry "testfile_2" in directory inode 128 points to free inode 137,
would junk entry
bad hash table for directory inode 128 (no data entry): would rebuild
entry "testfile" in shortform directory inode 132 points to free inode
133would junk entry
entry "testfile" in shortform directory inode 139 points to free inode
141would junk entry
entry "testfile" in shortform directory inode 145 points to free inode
147would junk entry
entry "testfile" in shortform directory inode 152 points to free inode
153would junk entry
entry "testfile" in shortform directory inode 158 points to free inode
159would junk entry
        - agno = 1
entry "testfile" in shortform directory inode 16777472 points to free
inode 16777473would junk entry
entry "testfile" in shortform directory inode 16777474 points to free
inode 16777475would junk entry
entry "testfile" in shortform directory inode 16777476 points to free
inode 16777477would junk entry
entry "testfile" in shortform directory inode 16777478 points to free
inode 16777479would junk entry
entry "testfile" in shortform directory inode 16777480 points to free
inode 16777481would junk entry
        - agno = 2
entry "testfile" in shortform directory inode 33595712 points to free
inode 33595713would junk entry
entry "testfile" in shortform directory inode 33595716 points to free
inode 33595717would junk entry
entry "testfile" in shortform directory inode 33595718 points to free
inode 33595719would junk entry
entry "testfile" in shortform directory inode 33595720 points to free
inode 33595721would junk entry
        - agno = 3
entry "testfile" in shortform directory inode 50331904 points to free
inode 50331905would junk entry
entry "testfile" in shortform directory inode 50331906 points to free
inode 50331907would junk entry
entry "testfile" in shortform directory inode 50331910 points to free
inode 50331911would junk entry
entry "testfile" in shortform directory inode 50331912 points to free
inode 50331913would junk entry
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

        XFS_REPAIR Summary    Sun Jul  6 22:43:36 2008

Phase           Start           End             Duration
Phase 1:        07/06 22:39:18  07/06 22:39:33  15 seconds
Phase 2:        07/06 22:39:33  07/06 22:41:47  2 minutes, 14 seconds
Phase 3:        07/06 22:41:47  07/06 22:43:15  1 minute, 28 seconds
Phase 4:        07/06 22:43:15  07/06 22:43:36  21 seconds
Phase 5:        Skipped
Phase 6:        07/06 22:43:36  07/06 22:43:36
Phase 7:        07/06 22:43:36  07/06 22:43:36

Total run time: 4 minutes, 18 seconds


When I checked the bad inode in xfs_db, the parent inode was shown as -1.
I presume it should point to the right parent directory inode.
1:
        byte offset 2560065792, length 256
        buffer block 5000128 (fsbno 1048592), 8 bbs
        inode 16777473, dir inode -1, type inode

I don't know what I am doing wrong here.
Sagar
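
(For reference, a minimal xfs_db invocation to dump that inode looks roughly
like the sketch below; the device path is an assumption, and the exact fields
printed depend on the xfsprogs version.)

# read-only, non-interactive dump of the suspect inode (device path assumed)
xfs_db -r -c "inode 16777473" -c "print" /dev/dm-0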

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Xfs Access to block zero  exception and system crash
  2008-07-06 17:24                                                     ` Sagar Borikar
@ 2008-07-06 19:07                                                       ` Eric Sandeen
  2008-07-07  3:02                                                         ` Sagar Borikar
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Sandeen @ 2008-07-06 19:07 UTC (permalink / raw)
  To: Sagar Borikar; +Cc: Dave Chinner, Nathan Scott, xfs

Sagar Borikar wrote:
> Sagar Borikar wrote:
>> Copy is of the same file to 30 different directories and it is
> basically
>> overwrite.
>>
>> Here is the setup:
>>
>> It's a JBOD with Volume size 20 GB. The directories are empty and this
>> is basically continuous copy of the file on all thirty directories.
> But
>> surprisingly none of the copy succeeds. All the copy processes are in 
>> Uninterruptible sleep state and xfs_repair log I have already attached
> 
>> With the prep. As mentioned it is with 2.6.24 Fedora kernel.
> 
> It would probably be best to try a 2.6.26 kernel from rawhide to be sure
> you're closest to the bleeding edge.
> 
> <Sagar> Sure Eric but I reran the test and I got similar errors with
> 2.6.24 kernel on x86. I am still confused with the results that I see on
> 2.6.24 kernel on x86 machine. I see that the used size shown by ls is
> way too huge than the actual size. Here is the log of the system
> 
> [root@lab00 ~/test_partition]# ls -lSah
> total 202M
> -rw-r--r--  1 root root 202M Jul  4 14:06 original ---> this is the file
> which I copy.
> drwxr-x--- 65 root root  12K Jul  6 21:57 ..
> -rwxr-xr-x  1 root root  189 Jul  4 16:31 runall
> -rwxr-xr-x  1 root root   50 Jul  4 16:32 copy
> drwxr-xr-x  2 root root   45 Jul  6 22:07 .

It'd be great if you provided these actual scripts so we don't have to
guess at what you're doing or work backwards from the repair output :)

> dmesg log doesn't give any information. Here is XFS related
> info:
> 
> XFS mounting filesystem loop0
> Ending clean XFS mount for filesystem: loop0
> Which is basically for mounting XFS cleanly. But there is no exception
> in XFS. 

and nothing else of interest either?

> Filesystem has become completely sluggish and response time is increased
> to 
> 3-4 minutes for every command.  Not a single copy is complete and all
> the copy processes are sleeping continuously. 

And how did you recover from this; did you power-cycle the box?

-Eric

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Xfs Access to block zero  exception and system crash
  2008-07-06 19:07                                                       ` Eric Sandeen
@ 2008-07-07  3:02                                                         ` Sagar Borikar
  2008-07-07  3:04                                                           ` Eric Sandeen
  0 siblings, 1 reply; 48+ messages in thread
From: Sagar Borikar @ 2008-07-07  3:02 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Dave Chinner, Nathan Scott, xfs

[-- Attachment #1: Type: text/plain, Size: 2363 bytes --]



Eric Sandeen wrote:
> Sagar Borikar wrote:
>   
>> Sagar Borikar wrote:
>>     
>>> Copy is of the same file to 30 different directories and it is
>>>       
>> basically
>>     
>>> overwrite.
>>>
>>> Here is the setup:
>>>
>>> It's a JBOD with Volume size 20 GB. The directories are empty and this
>>> is basically continuous copy of the file on all thirty directories.
>>>       
>> But
>>     
>>> surprisingly none of the copy succeeds. All the copy processes are in 
>>> Uninterruptible sleep state and xfs_repair log I have already attached
>>>       
>>> With the prep. As mentioned it is with 2.6.24 Fedora kernel.
>>>       
>> It would probably be best to try a 2.6.26 kernel from rawhide to be sure
>> you're closest to the bleeding edge.
>>
>> <Sagar> Sure Eric but I reran the test and I got similar errors with
>> 2.6.24 kernel on x86. I am still confused with the results that I see on
>> 2.6.24 kernel on x86 machine. I see that the used size shown by ls is
>> way too huge than the actual size. Here is the log of the system
>>
>> [root@lab00 ~/test_partition]# ls -lSah
>> total 202M
>> -rw-r--r--  1 root root 202M Jul  4 14:06 original ---> this is the file
>> which I copy.
>> drwxr-x--- 65 root root  12K Jul  6 21:57 ..
>> -rwxr-xr-x  1 root root  189 Jul  4 16:31 runall
>> -rwxr-xr-x  1 root root   50 Jul  4 16:32 copy
>> drwxr-xr-x  2 root root   45 Jul  6 22:07 .
>>     
>
> It'd be great if you provided these actual scripts so we don't have to
> guess at what you're doing or work backwards from the repair output :)
>   
Attaching the scripts with this mail.
>   
>> dmesg log doesn't give any information. Here is XFS related
>> info:
>>
>> XFS mounting filesystem loop0
>> Ending clean XFS mount for filesystem: loop0
>> Which is basically for mounting XFS cleanly. But there is no exception
>> in XFS. 
>>     
>
> and nothing else of interest either?
>   
Not really. That's why it was surprising, even after setting the
error_level to 11.
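
(For context, the error_level referred to here is the XFS error reporting
sysctl; a rough sketch of how it is raised, assuming procfs/sysctl are
available:)

echo 11 > /proc/sys/fs/xfs/error_level   # maximum XFS error verbosity
# or equivalently:
sysctl -w fs.xfs.error_level=11
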
>   
>> Filesystem has become completely sluggish and response time is increased
>> to 
>> 3-4 minutes for every command.  Not a single copy is complete and all
>> the copy processes are sleeping continuously. 
>>     
>
> And how did you recover from this; did you power-cycle the box?
>   
There was no failure. Only the processes were stalled. System was 
operative.
> -Eric
>   

[-- Attachment #2: copy --]
[-- Type: text/plain, Size: 50 bytes --]

#! /bin/sh

while [ 1 ]

do
cp -f $1 $2
done






[-- Attachment #3: runall --]
[-- Type: text/plain, Size: 189 bytes --]

#! /bin/sh

for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
do
	
mkdir -p testdir_$i	
./copy testfile testdir_$i &
rm -Rf testdir_$1/testfile
./copy testfile testfile_$i &
done

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Xfs Access to block zero  exception and system crash
  2008-07-07  3:02                                                         ` Sagar Borikar
@ 2008-07-07  3:04                                                           ` Eric Sandeen
  2008-07-07  3:07                                                             ` Sagar Borikar
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Sandeen @ 2008-07-07  3:04 UTC (permalink / raw)
  To: Sagar Borikar; +Cc: Dave Chinner, Nathan Scott, xfs

Sagar Borikar wrote:


> There was no failure. Only the processes were stalled. System was 
> operative.


I'm curious, if the processes were stalled, how did you unmount the
filesystem to run repair on it?

-Eric

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Xfs Access to block zero  exception and system crash
  2008-07-07  3:04                                                           ` Eric Sandeen
@ 2008-07-07  3:07                                                             ` Sagar Borikar
  2008-07-07  3:11                                                               ` Eric Sandeen
  0 siblings, 1 reply; 48+ messages in thread
From: Sagar Borikar @ 2008-07-07  3:07 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Dave Chinner, Nathan Scott, xfs



Eric Sandeen wrote:
> Sagar Borikar wrote:
>
>
>   
>> There was no failure. Only the processes were stalled. System was 
>> operative.
>>     
>
>
> I'm curious, if the processes were stalled, how did you unmount the
> filesystem to run repair on it?
>
> -Eric
>   
I ran with -n option.

xfs_repair -fvn /root/test_partition

Sagar

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Xfs Access to block zero  exception and system crash
  2008-07-07  3:07                                                             ` Sagar Borikar
@ 2008-07-07  3:11                                                               ` Eric Sandeen
  2008-07-07  3:17                                                                 ` Sagar Borikar
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Sandeen @ 2008-07-07  3:11 UTC (permalink / raw)
  To: Sagar Borikar; +Cc: Dave Chinner, Nathan Scott, xfs

Sagar Borikar wrote:
> 
> Eric Sandeen wrote:
>> Sagar Borikar wrote:
>>
>>
>>   
>>> There was no failure. Only the processes were stalled. System was 
>>> operative.
>>>     
>>
>> I'm curious, if the processes were stalled, how did you unmount the
>> filesystem to run repair on it?
>>
>> -Eric
>>   
> I ran with -n option.
> 
> xfs_repair -fvn /root/test_partition

oh....

So, you basically ran repair on a live, mounted filesystem; it's
expected that it would not be consistent at this point.

So, the errors you are seeing on this x86 are likely not related to
those you see on mips.  (the D state process might be interesting and
worth looking into, but probably not related to the problem you're
trying to solve.)

-Eric
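
(For reference, a consistent read-only check has to be run against the
unmounted filesystem; a minimal sketch, with the image and mount-point paths
as assumptions based on the setup described earlier:)

umount /root/test_partition        # repair needs the filesystem unmounted
xfs_repair -n -f /root/xfs.img     # -n: no-modify check, -f: target is an image file
mount -o loop /root/xfs.img /root/test_partition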

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Xfs Access to block zero  exception and system crash
  2008-07-07  3:11                                                               ` Eric Sandeen
@ 2008-07-07  3:17                                                                 ` Sagar Borikar
  2008-07-07  3:22                                                                   ` Eric Sandeen
  0 siblings, 1 reply; 48+ messages in thread
From: Sagar Borikar @ 2008-07-07  3:17 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Dave Chinner, Nathan Scott, xfs



Eric Sandeen wrote:
> Sagar Borikar wrote:
>   
>> Eric Sandeen wrote:
>>     
>>> Sagar Borikar wrote:
>>>
>>>
>>>   
>>>       
>>>> There was no failure. Only the processes were stalled. System was 
>>>> operative.
>>>>     
>>>>         
>>> I'm curious, if the processes were stalled, how did you unmount the
>>> filesystem to run repair on it?
>>>
>>> -Eric
>>>   
>>>       
>> I ran with -n option.
>>
>> xfs_repair -fvn /root/test_partition
>>     
>
> oh....
>
> So, you basically ran repair on a live, mounted filesystem; it's
> expected that it would not be consistent at this point.
>
> So, the errors you are seeing on this x86 are likely not related to
> those you see on mips.  (the D state process might be interesting and
> worth looking into, but probably not related to the problem you're
> trying to solve.)
>
> -Eric
>   
Ok. But then I was surprised as to why the copy was not successful. Here is
the ps output:

root     29200  0.0  0.1   2088   652 ?        D    01:41   0:00 cp -f 
testfile testdir_16
root     29201  0.0  0.1   2088   648 ?        D    01:41   0:00 cp -f 
testfile testfile_16
root     29202  0.0  0.1   2088   648 ?        D    01:41   0:00 cp -f 
testfile testfile_14
root     29203  0.0  0.1   2088   648 ?        D    01:41   0:00 cp -f 
testfile testfile_2
root     29204  0.0  0.1   2088   652 ?        D    01:41   0:00 cp -f 
testfile testdir_9
root     29205  0.0  0.1   2088   648 ?        D    01:41   0:00 cp -f 
testfile testfile_5
root     29206  0.0  0.1   2088   652 ?        D    01:41   0:00 cp -f 
testfile testdir_3
root     29207  0.0  0.1   2088   648 ?        D    01:41   0:00 cp -f 
testfile testfile_15
root     29208  0.0  0.1   2088   648 ?        D    01:41   0:00 cp -f 
testfile testdir_2
root     29209  0.0  0.1   2088   652 ?        D    01:41   0:00 cp -f 
testfile testdir_12
root     29210  0.0  0.1   2088   644 ?        D    01:41   0:00 cp -f 
testfile testfile_10
root     29211  0.0  0.1   2088   648 ?        D    01:41   0:00 cp -f 
testfile testfile_4
root     29212  0.0  0.1   2088   652 ?        D    01:41   0:00 cp -f 
testfile testdir_13
root     29213  0.0  0.1   2088   648 ?        D    01:41   0:00 cp -f 
testfile testfile_20
root     29214  0.0  0.1   2088   648 ?        D    01:41   0:00 cp -f 
testfile testdir_20
root     29215  0.0  0.1   2088   656 ?        D    01:41   0:00 cp -f 
testfile testdir_18
root     29216  0.0  0.1   2088   644 ?        D    01:41   0:00 cp -f 
testfile testfile_13
root     29217  0.0  0.1   2088   648 ?        D    01:41   0:00 cp -f 
testfile testdir_1
root     29218  0.0  0.1   2088   652 ?        D    01:41   0:00 cp -f 
testfile testdir_8
root     29219  0.0  0.1   2088   648 ?        D    01:41   0:00 cp -f 
testfile testfile_11
root     29220  0.0  0.1   2088   652 ?        D    01:41   0:00 cp -f 
testfile testdir_6
root     29221  0.0  0.1   2088   644 ?        D    01:41   0:00 cp -f 
testfile testfile_6
root     29222  0.0  0.1   2088   652 ?        D    01:41   0:00 cp -f 
testfile testdir_10
root     29223  0.0  0.1   2088   652 ?        D    01:41   0:00 cp -f 
testfile testdir_14
root     29224  0.0  0.1   2088   648 ?        D    01:41   0:00 cp -f 
testfile testfile_19
root     29225  0.0  0.1   2088   644 ?        D    01:41   0:00 cp -f 
testfile testfile_12
root     29226  0.0  0.1   2088   652 ?        D    01:41   0:00 cp -f 
testfile testdir_5
root     29227  0.0  0.1   2088   648 ?        D    01:41   0:00 cp -f 
testfile testdir_11
root     29228  0.0  0.1   2088   648 ?        D    01:41   0:00 cp -f 
testfile testfile_8
root     29229  0.0  0.1   2088   652 ?        D    01:41   0:00 cp -f 
testfile testdir_4
root     29230  0.0  0.1   2088   652 ?        D    01:41   0:00 cp -f 
testfile testdir_17
root     29231  0.0  0.1   2088   644 ?        D    01:41   0:00 cp -f 
testfile testfile_18
root     29232  0.0  0.1   2088   648 ?        D    01:41   0:00 cp -f 
testfile testdir_15
root     29233  0.0  0.1   2088   648 ?        D    01:41   0:00 cp -f 
testfile testfile_7
root     29234  0.0  0.1   2088   644 ?        D    01:41   0:00 cp -f 
testfile testfile_3
root     29235  0.0  0.1   2088   644 ?        D    01:41   0:00 cp -f 
testfile testfile_1
root     29236  0.0  0.1   2088   648 ?        D    01:41   0:00 cp -f 
testfile testfile_17
root     29237  0.0  0.1   2088   652 ?        D    01:41   0:00 cp -f 
testfile testdir_7
root     29238  0.0  0.1   2088   648 ?        D    01:41   0:00 cp -f 
testfile testdir_19
root     29239  0.0  0.1   2088   648 ?        D    01:41   0:00 cp -f 
testfile testfile_9

All the copies are pending and the file size in those directories is
constant; it is not increasing.
And as the processes are in D state, the filesystem is marked as busy
and I can't unmount it.

Thanks
Sagar
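
(For reference, a quick way to see where D-state processes are blocked,
short of a full sysrq-t dump; purely a sketch using standard procps tools:)

# list uninterruptible-sleep tasks and the kernel symbol they are waiting in
ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'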

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Xfs Access to block zero  exception and system crash
  2008-07-07  3:17                                                                 ` Sagar Borikar
@ 2008-07-07  3:22                                                                   ` Eric Sandeen
  2008-07-07  3:42                                                                     ` Sagar Borikar
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Sandeen @ 2008-07-07  3:22 UTC (permalink / raw)
  To: Sagar Borikar; +Cc: Dave Chinner, Nathan Scott, xfs

Sagar Borikar wrote:

> All the copies are pending and the file size in those directories is
> constant; it is not increasing.
> And as the processes are in D state, the filesystem is marked as busy
> and I can't unmount it.

Understood.  It looks like you've deadlocked somewhere.  But, this is
not the problem you are really trying to solve, right?  You just were
trying to recreate the mips problem on x86?

If you want, do a sysrq-t to get traces of all those cp's to see where
they're stuck, but this probably isn't getting you much closer to
solving the original problem.

(BTW: is this the exact same testcase that led to the block 0 access on
mips which started this thread?)

-Eric
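
(For anyone reproducing this, the sysrq-t traces can be captured roughly as
below, assuming the kernel was built with magic-sysrq support:)

echo 1 > /proc/sys/kernel/sysrq    # enable the magic sysrq interface
echo t > /proc/sysrq-trigger       # dump the state and stack of every task
dmesg > sysrq-t.txt                # the traces end up in the kernel log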

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Xfs Access to block zero  exception and system crash
  2008-07-07  3:22                                                                   ` Eric Sandeen
@ 2008-07-07  3:42                                                                     ` Sagar Borikar
       [not found]                                                                       ` <487191C2.6090803@sandeen.net>
  2008-07-07  3:47                                                                       ` Eric Sandeen
  0 siblings, 2 replies; 48+ messages in thread
From: Sagar Borikar @ 2008-07-07  3:42 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Dave Chinner, Nathan Scott, xfs



Eric Sandeen wrote:
> Sagar Borikar wrote:
>
>   
>> All the copies are pending and the file size in those directories is
>> constant; it is not increasing.
>> And as the processes are in D state, the filesystem is marked as busy
>> and I can't unmount it.
>>     
>
> Understood.  It looks like you've deadlocked somewhere.  But, this is
> not the problem you are really trying to solve, right?  You just were
> trying to recreate the mips problem on x86?
>   
That's right. The intention behind testing on 2.6.24 was to check
whether we could reproduce the failure on x86, which is considered to be
more robust. If we replicate the failure there, then there could be some
issue in XFS; if the test passes, then we can backport this kernel to
MIPS (which I am doing anyway, with your patches). But I faced a similar
deadlock, with exceptions, on MIPS, which I posted earlier.

> If you want, do a sysrq-t to get traces of all those cp's to see where
> they're stuck, but this probably isn't getting you much closer to
> solving the original problem.
>
>   
I'll keep you posted with it.
> (BTW: is this the exact same testcase that led to the block 0 access on
> mips which started this thread?)
>
> -Eric
>   
Ok. So initially our multi-client iozone stress test used to fail. But
as it took 2-3 days to replicate the issue, I tried the test standalone
on MIPS and observed failures similar to those I used to get in the
multi-client test. The test is exactly the same as what I do with
multi-client iozone over the network. Hence I came to the conclusion
that if we fix the system to pass my test case, then we can try the
iozone test with that fix. And now on x86 with 2.6.24, I am finding a
similar deadlock, but the system is responsive and there are no lockups
or exceptions. Do you observe similar failures on x86 in your setup?
Also, do you think the issues which I am seeing on x86 and MIPS are
coming from the same source?

Thanks
Sagar

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Xfs Access to block zero  exception and system crash
  2008-07-07  3:42                                                                     ` Sagar Borikar
       [not found]                                                                       ` <487191C2.6090803@sandeen.net>
@ 2008-07-07  3:47                                                                       ` Eric Sandeen
  2008-07-07  3:58                                                                         ` Sagar Borikar
  1 sibling, 1 reply; 48+ messages in thread
From: Eric Sandeen @ 2008-07-07  3:47 UTC (permalink / raw)
  To: Sagar Borikar; +Cc: Dave Chinner, Nathan Scott, xfs

Sagar Borikar wrote:


> Ok. So initially our multi client iozone stress test used to fail. 

Are these multiple nfs clients?

> But 
> as it took 2-3 days
> to replicate the issue, I tried the test, standalone on MIPS and 

the iozone test again?

> observed similar failures which
> I used to get in multi client test. The test is exactly same what I do 
> in multi client
> iozone over network. Hence I came to conclusion that if we fix system to 
> pass my test case
> then we can try iozone test with that fix.  And now on x86 with 2.6.24, 
> I am finding similar deadlock but
> the system is responsive and there are no lockups or exceptions. Do you 
> observe similar failures on x86
> at your setup? 

So far I've not seen the deadlocks.

> Also do you think the issues which I am seeing on x86 and 
> MIPS are coming from the
> same sources?

hard to say at this point, I think.

-Eric

> Thanks
> Sagar
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Xfs Access to block zero  exception and system crash
  2008-07-07  3:47                                                                       ` Eric Sandeen
@ 2008-07-07  3:58                                                                         ` Sagar Borikar
  2008-07-07  5:19                                                                           ` Eric Sandeen
  0 siblings, 1 reply; 48+ messages in thread
From: Sagar Borikar @ 2008-07-07  3:58 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Dave Chinner, Nathan Scott, xfs



Eric Sandeen wrote:
> Sagar Borikar wrote:
>
>
>   
>> Ok. So initially our multi client iozone stress test used to fail. 
>>     
>
> Are these multiple nfs clients?
>   
Actually a mix of them: 15 CIFS clients and 4 NFS clients (19 iozone
clients in all), plus 2 FTP clients and 4 HTTP transfers (25 simultaneous
transactions in total).
>   
>> But 
>> as it took 2-3 days
>> to replicate the issue, I tried the test, standalone on MIPS and 
>>     
>
> the iozone test again?
>   
The iozone test continuously gives the access-to-block-zero exception
and XFS shutdown errors, with transaction cancel exceptions plus the
alloc btree corruption exception which I reported earlier. And my test
gives the transaction cancel exception and the block zero exception,
with the processes under test in a deadlocked state, on MIPS; but on x86
there are no exceptions, only incomplete copies due to the
uninterruptible sleep state and deadlock.
>   
>> observed similar failures which
>> I used to get in multi client test. The test is exactly same what I do 
>> in multi client
>> iozone over network. Hence I came to conclusion that if we fix system to 
>> pass my test case
>> then we can try iozone test with that fix.  And now on x86 with 2.6.24, 
>> I am finding similar deadlock but
>> the system is responsive and there are no lockups or exceptions. Do you 
>> observe similar failures on x86
>> at your setup? 
>>     
>
> So far I've not seen the deadlocks.
>   
Could you kindly try my test? I presume you should see the failure
soon. I tried this on 2 different x86 systems, 2 times (after rebooting
the system), and I saw it every time.
>   
>> Also do you think the issues which I am seeing on x86 and 
>> MIPS are coming from the
>> same sources?
>>     
>
> hard to say at this point, I think.
>
> -Eric
>
>   
>> Thanks
>> Sagar
>>
>>     
>
>   

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Xfs Access to block zero  exception and system crash
  2008-07-07  3:58                                                                         ` Sagar Borikar
@ 2008-07-07  5:19                                                                           ` Eric Sandeen
  2008-07-07  5:58                                                                             ` Sagar Borikar
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Sandeen @ 2008-07-07  5:19 UTC (permalink / raw)
  To: Sagar Borikar; +Cc: xfs

Sagar Borikar wrote:


> Could you kindly try with my test? I presume you should see failure 
> soon. I tried this on
> 2 different x86 systems 2 times ( after rebooting the system ) and I saw 
> it every time.


Sure.  Is there a reason you're doing this on a loopback file?  That
probably stresses the vm a bit more, and might get even trickier if the
loopback file is sparse...

But anyway, on an x86_64 machine with 2G of memory and a non-sparse 10G
loopback file on 2.6.24.7-92.fc8, your test runs w/o problems for me,
though the system does get sluggish.  I let it run a bit then ran repair
and it found no problems, I'll run it overnight to see if anything else
turns up.

-Eric
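
(For reference, a non-sparse loopback XFS setup along these lines can be
built roughly as sketched below; the sizes and paths are assumptions:)

dd if=/dev/zero of=/root/xfs.img bs=1M count=10240     # fully-allocated 10G image
mkfs.xfs -f /root/xfs.img                              # create an XFS filesystem in it
mount -o loop /root/xfs.img /root/test_partition       # mount it via the loop driver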

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Xfs Access to block zero  exception and system crash
  2008-07-07  5:19                                                                           ` Eric Sandeen
@ 2008-07-07  5:58                                                                             ` Sagar Borikar
  0 siblings, 0 replies; 48+ messages in thread
From: Sagar Borikar @ 2008-07-07  5:58 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs



Eric Sandeen wrote:
> Sagar Borikar wrote:
>
>
>   
>> Could you kindly try with my test? I presume you should see failure 
>> soon. I tried this on
>> 2 different x86 systems 2 times ( after rebooting the system ) and I saw 
>> it every time.
>>     
>
>
> Sure.  Is there a reason you're doing this on a loopback file?  That
> probably stresses the vm a bit more, and might get even trickier if the
> loopback file is sparse...
>   
Initially I thought of doing that because, given the type of test case
I had, I didn't want a strict allocation limit but instead wanted
allocations to grow as needed until the backing filesystem ran out of
free space. But then I dropped that plan and created a non-sparse
loopback device. There was no specific reason to use loopback other than
that it was the simplest option.
> But anyway, on an x86_64 machine with 2G of memory and a non-sparse 10G
> loopback file on 2.6.24.7-92.fc8, your test runs w/o problems for me,
> though the system does get sluggish.  I let it run a bit then ran repair
> and it found no problems, I'll run it overnight to see if anything else
> turns up.
>   
That will be great.  Thanks indeed.
Sagar

> -Eric
>   

^ permalink raw reply	[flat|nested] 48+ messages in thread

* RE: Xfs Access to block zero  exception and system crash
       [not found] ` <4872E33E.3090107@sandeen.net>
@ 2008-07-08  5:03   ` Sagar Borikar
  2008-07-09 16:57   ` Sagar Borikar
  1 sibling, 0 replies; 48+ messages in thread
From: Sagar Borikar @ 2008-07-08  5:03 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Raj Palani, xfs

Sure Eric, I'll keep you posted with the results without the loopback file.
When you say that the deadlock could be due to the vm, do you mean due to
lack of memory? I checked meminfo and found that sufficient buffers and
committed_as were present while XFS was stalled.

Thanks
Sagar


Sagar Borikar wrote:
> That's right Eric but I am still surprised that why should we get a 
> dead lock in this scenario as it is a plain copy of file in multiple 
> directories.  Our customer is reporting similar kind of lockup in our 
> platform.

ok, I guess I had missed that, sorry.

> I do understand that we are chasing the access to block zero exception

> and XFS forced shutdown which I mentioned earlier.  But we also see 
> quite a few smbd processes which are writing data to XFS are in 
> uninterruptible sleep state and the system locks up too.

Ok; then the next step is probably to do sysrq-t and see where things
are stuck.  It might be better to see if you can reproduce w/o the
loopback file, too, since that's just another layer to go through that
might be changing things.

> So I thought
> the test which I am running could be pointing to similar issue which 
> we are observing on our platform. But does this indicate that the 
> problem lies with x86 XFS too ?

or maybe the vm ...

> Also I presume in enterprise market such kind of simultaneous write 
> situation may happen.  Has anybody reported similar issues to you? As 
> you observed it over x86 and 2.6.24 kernel, could you say what would 
> be root cause of this?

Haven't really seen it before that I recall, and at this point can't say
for sure what it might be.

-Eric

>     Sorry for lots of questions at same time :) But I am happy that 
> you were able to see the deadlock in x86 on your setup with 2.6.24
> 
> Thanks
> Sagar
> 
> 
> Eric Sandeen wrote:
>> Sagar Borikar wrote:
>>   
>>> Hi Eric,
>>>
>>> Did you see any issues in your test? 
>>>     
>> I got a deadlock but that's it; I don't think that's the bug you want

>> to chase...
>>
>>
>> -Eric
>>
>>   
>>> Thanks
>>> Sagar
>>>
>>>
>>> Sagar Borikar wrote:
>>>     
>>>> Eric Sandeen wrote:
>>>>       
>>>>> Sagar Borikar wrote:
>>>>>
>>>>>
>>>>>  
>>>>>         
>>>>>> Could you kindly try with my test? I presume you should see 
>>>>>> failure soon. I tried this on
>>>>>> 2 different x86 systems 2 times ( after rebooting the system ) 
>>>>>> and I saw it every time.
>>>>>>     
>>>>>>           
>>>>> Sure.  Is there a reason you're doing this on a loopback file?  
>>>>> That probably stresses the vm a bit more, and might get even 
>>>>> trickier if the loopback file is sparse...
>>>>>   
>>>>>         
>>>> Initially I thought to do that since I didn't want to have a strict

>>>> allocation limit but allowing allocations to  grow as needed until 
>>>> the backing filesystem runs out of free space due to type of the 
>>>> test case I had. But then I dropped the plan and created a 
>>>> non-sparse loopback device. There was no specific reason to create 
>>>> loopback but as it was simplest option to do it.
>>>>       
>>>>> But anyway, on an x86_64 machine with 2G of memory and a 
>>>>> non-sparse 10G loopback file on 2.6.24.7-92.fc8, your test runs 
>>>>> w/o problems for me, though the system does get sluggish.  I let 
>>>>> it run a bit then ran repair and it found no problems, I'll run it

>>>>> overnight to see if anything else turns up.
>>>>>   
>>>>>         
>>>> That will be great.  Thanks indeed.
>>>> Sagar
>>>>
>>>>       
>>>>> -Eric
>>>>>   
>>>>>         
>>   
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* RE: Xfs Access to block zero  exception and system crash
       [not found] ` <4872E33E.3090107@sandeen.net>
  2008-07-08  5:03   ` Sagar Borikar
@ 2008-07-09 16:57   ` Sagar Borikar
  2008-07-10  5:12     ` Sagar Borikar
  1 sibling, 1 reply; 48+ messages in thread
From: Sagar Borikar @ 2008-07-09 16:57 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs

Sagar Borikar wrote:
> That's right Eric but I am still surprised that why should we get a
dead 
> lock in this scenario as it is a plain copy of file in multiple 
> directories.  Our customer is reporting similar kind of lockup in our 
> platform. 

ok, I guess I had missed that, sorry.

> I do understand that we are chasing the access to block zero 
> exception and XFS forced shutdown which I mentioned earlier.  But we 
> also see quite a few smbd processes which are writing data to XFS are
in 
> uninterruptible sleep state and the system locks up too. 

Ok; then the next step is probably to do sysrq-t and see where things
are stuck.  It might be better to see if you can reproduce w/o the
loopback file, too, since that's just another layer to go through that
might be changing things.

<Sagar> I ran it on the actual device, without the loopback file, and even
there observed XFS transactions going into uninterruptible sleep state
and the copies stalling. I had to hard-reboot the system to bring
XFS out of that state since a soft reboot didn't work; it was waiting for
the filesystem to get unmounted. I shall provide the sysrq-t update later.

> So I thought 
> the test which I am running could be pointing to similar issue which
we 
> are observing on our platform. But does this indicate that the problem

> lies with x86 XFS too ?  

or maybe the vm ...

> Also I presume in enterprise market such kind 
> of simultaneous write situation may happen.  Has anybody reported 
> similar issues to you? As you observed it over x86 and 2.6.24 kernel, 
> could you say what would be root cause of this?

Haven't really seen it before that I recall, and at this point can't say
for sure what it might be.

-Eric

>     Sorry for lots of questions at same time :) But I am happy that
you 
> were able to see the deadlock in x86 on your setup with 2.6.24
> 
> Thanks
> Sagar
> 
> 
> Eric Sandeen wrote:
>> Sagar Borikar wrote:
>>   
>>> Hi Eric,
>>>
>>> Did you see any issues in your test? 
>>>     
>> I got a deadlock but that's it; I don't think that's the bug you want
to
>> chase...
>>
>>
>> -Eric
>>
>>   
>>> Thanks
>>> Sagar
>>>
>>>
>>> Sagar Borikar wrote:
>>>     
>>>> Eric Sandeen wrote:
>>>>       
>>>>> Sagar Borikar wrote:
>>>>>
>>>>>
>>>>>  
>>>>>         
>>>>>> Could you kindly try with my test? I presume you should see
failure 
>>>>>> soon. I tried this on
>>>>>> 2 different x86 systems 2 times ( after rebooting the system )
and I 
>>>>>> saw it every time.
>>>>>>     
>>>>>>           
>>>>> Sure.  Is there a reason you're doing this on a loopback file?
That
>>>>> probably stresses the vm a bit more, and might get even trickier
if the
>>>>> loopback file is sparse...
>>>>>   
>>>>>         
>>>> Initially I thought to do that since I didn't want to have a strict

>>>> allocation limit but
>>>> allowing allocations to  grow as needed until the backing
filesystem 
>>>> runs out of free space
>>>> due to type of the test case I had. But then I dropped the plan and

>>>> created a non-sparse
>>>> loopback device. There was no specific reason to create loopback
but 
>>>> as it was
>>>> simplest option to do it.
>>>>       
>>>>> But anyway, on an x86_64 machine with 2G of memory and a
non-sparse 10G
>>>>> loopback file on 2.6.24.7-92.fc8, your test runs w/o problems for
me,
>>>>> though the system does get sluggish.  I let it run a bit then ran
repair
>>>>> and it found no problems, I'll run it overnight to see if anything
else
>>>>> turns up.
>>>>>   
>>>>>         
>>>> That will be great.  Thanks indeed.
>>>> Sagar
>>>>
>>>>       
>>>>> -Eric
>>>>>   
>>>>>         
>>   
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* RE: Xfs Access to block zero  exception and system crash
  2008-07-09 16:57   ` Sagar Borikar
@ 2008-07-10  5:12     ` Sagar Borikar
  0 siblings, 0 replies; 48+ messages in thread
From: Sagar Borikar @ 2008-07-10  5:12 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs

Eric,

This could be a slight digression, but can you let me know why the
fragmentation factor goes to 99% immediately? I observed this on both
the x86 and MIPS platforms. Also, to alleviate this issue, if I specify
allocsize=512m, what would the consequences be? The default allocsize
is 64k, right? Also, we are currently mounting the filesystem with the
default options.

Thanks
Sagar
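
(For reference, the fragmentation factor mentioned here is usually read with
xfs_db's "frag" command, and allocsize is a mount option; a minimal sketch,
with the device and mount point as assumptions:)

xfs_db -r -c frag /dev/sdb1                   # report the file fragmentation factor
mount -o allocsize=512m /dev/sdb1 /mnt/nas    # use larger speculative preallocation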



-----Original Message-----
From: xfs-bounce@oss.sgi.com [mailto:xfs-bounce@oss.sgi.com] On Behalf
Of Sagar Borikar
Sent: Wednesday, July 09, 2008 10:28 PM
To: Eric Sandeen
Cc: xfs@oss.sgi.com
Subject: RE: Xfs Access to block zero exception and system crash

Sagar Borikar wrote:
> That's right Eric but I am still surprised that why should we get a
dead 
> lock in this scenario as it is a plain copy of file in multiple 
> directories.  Our customer is reporting similar kind of lockup in our 
> platform. 

ok, I guess I had missed that, sorry.

> I do understand that we are chasing the access to block zero 
> exception and XFS forced shutdown which I mentioned earlier.  But we 
> also see quite a few smbd processes which are writing data to XFS are
in 
> uninterruptible sleep state and the system locks up too. 

Ok; then the next step is probably to do sysrq-t and see where things
are stuck.  It might be better to see if you can reproduce w/o the
loopback file, too, since that's just another layer to go through that
might be changing things.

<Sagar> I ran it on the actual device, without the loopback file, and even
there observed XFS transactions going into uninterruptible sleep state
and the copies stalling. I had to hard-reboot the system to bring
XFS out of that state since a soft reboot didn't work; it was waiting for
the filesystem to get unmounted. I shall provide the sysrq-t update later.

> So I thought 
> the test which I am running could be pointing to similar issue which
we 
> are observing on our platform. But does this indicate that the problem

> lies with x86 XFS too ?  

or maybe the vm ...

> Also I presume in enterprise market such kind 
> of simultaneous write situation may happen.  Has anybody reported 
> similar issues to you? As you observed it over x86 and 2.6.24 kernel, 
> could you say what would be root cause of this?

Haven't really seen it before that I recall, and at this point can't say
for sure what it might be.

-Eric

>     Sorry for lots of questions at same time :) But I am happy that
you 
> were able to see the deadlock in x86 on your setup with 2.6.24
> 
> Thanks
> Sagar
> 
> 
> Eric Sandeen wrote:
>> Sagar Borikar wrote:
>>   
>>> Hi Eric,
>>>
>>> Did you see any issues in your test? 
>>>     
>> I got a deadlock but that's it; I don't think that's the bug you want
to
>> chase...
>>
>>
>> -Eric
>>
>>   
>>> Thanks
>>> Sagar
>>>
>>>
>>> Sagar Borikar wrote:
>>>     
>>>> Eric Sandeen wrote:
>>>>       
>>>>> Sagar Borikar wrote:
>>>>>
>>>>>
>>>>>  
>>>>>         
>>>>>> Could you kindly try with my test? I presume you should see
failure 
>>>>>> soon. I tried this on
>>>>>> 2 different x86 systems 2 times ( after rebooting the system )
and I 
>>>>>> saw it every time.
>>>>>>     
>>>>>>           
>>>>> Sure.  Is there a reason you're doing this on a loopback file?
That
>>>>> probably stresses the vm a bit more, and might get even trickier
if the
>>>>> loopback file is sparse...
>>>>>   
>>>>>         
>>>> Initially I thought to do that since I didn't want to have a strict

>>>> allocation limit but
>>>> allowing allocations to  grow as needed until the backing
filesystem 
>>>> runs out of free space
>>>> due to type of the test case I had. But then I dropped the plan and

>>>> created a non-sparse
>>>> loopback device. There was no specific reason to create loopback
but 
>>>> as it was
>>>> simplest option to do it.
>>>>       
>>>>> But anyway, on an x86_64 machine with 2G of memory and a
non-sparse 10G
>>>>> loopback file on 2.6.24.7-92.fc8, your test runs w/o problems for
me,
>>>>> though the system does get sluggish.  I let it run a bit then ran
repair
>>>>> and it found no problems, I'll run it overnight to see if anything
else
>>>>> turns up.
>>>>>   
>>>>>         
>>>> That will be great.  Thanks indeed.
>>>> Sagar
>>>>
>>>>       
>>>>> -Eric
>>>>>   
>>>>>         
>>   
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2008-07-10  5:11 UTC | newest]

Thread overview: 48+ messages
2008-06-24  7:03 Xfs Access to block zero exception and system crash Sagar Borikar
2008-06-25  6:48 ` Sagar Borikar
2008-06-25  8:49 ` Dave Chinner
2008-06-26  6:46   ` Sagar Borikar
2008-06-26  7:02     ` Dave Chinner
2008-06-27 10:13       ` Sagar Borikar
2008-06-27 10:25         ` Sagar Borikar
2008-06-28  0:05           ` Dave Chinner
2008-06-28 16:47             ` Sagar Borikar
2008-06-29 21:56               ` Dave Chinner
2008-06-30  3:37                 ` Sagar Borikar
     [not found]                 ` <20080630034112.055CF18904C4@bby1mta01.pmc-sierra.bc.ca>
2008-06-30  6:07                   ` Sagar Borikar
2008-06-30 10:24                   ` Sagar Borikar
2008-07-01  6:44                     ` Dave Chinner
2008-07-02  4:18                       ` Sagar Borikar
2008-07-02  5:13                         ` Dave Chinner
2008-07-02  5:35                           ` Sagar Borikar
2008-07-02  6:13                             ` Nathan Scott
2008-07-02  6:56                               ` Dave Chinner
2008-07-02 11:02                                 ` Sagar Borikar
2008-07-03  4:03                                   ` Eric Sandeen
2008-07-03  5:14                                     ` Sagar Borikar
2008-07-03 15:02                                       ` Eric Sandeen
2008-07-04 10:18                                         ` Sagar Borikar
2008-07-04 12:27                                           ` Dave Chinner
2008-07-04 17:30                                             ` Sagar Borikar
2008-07-04 17:35                                               ` Eric Sandeen
2008-07-04 17:51                                                 ` Sagar Borikar
2008-07-05 16:25                                                   ` Eric Sandeen
2008-07-06 17:24                                                     ` Sagar Borikar
2008-07-06 19:07                                                       ` Eric Sandeen
2008-07-07  3:02                                                         ` Sagar Borikar
2008-07-07  3:04                                                           ` Eric Sandeen
2008-07-07  3:07                                                             ` Sagar Borikar
2008-07-07  3:11                                                               ` Eric Sandeen
2008-07-07  3:17                                                                 ` Sagar Borikar
2008-07-07  3:22                                                                   ` Eric Sandeen
2008-07-07  3:42                                                                     ` Sagar Borikar
     [not found]                                                                       ` <487191C2.6090803@sandeen.net>
     [not found]                                                                         ` <4871947D.2090701@pmc-sierra.com>
2008-07-07  3:47                                                                       ` Eric Sandeen
2008-07-07  3:58                                                                         ` Sagar Borikar
2008-07-07  5:19                                                                           ` Eric Sandeen
2008-07-07  5:58                                                                             ` Sagar Borikar
2008-07-06  4:19                                                   ` Dave Chinner
2008-07-04 15:33                                           ` Eric Sandeen
2008-06-28  0:02         ` Dave Chinner
     [not found] <4872E0BC.6070400@pmc-sierra.com>
     [not found] ` <4872E33E.3090107@sandeen.net>
2008-07-08  5:03   ` Sagar Borikar
2008-07-09 16:57   ` Sagar Borikar
2008-07-10  5:12     ` Sagar Borikar
