* Xfs Access to block zero exception and system crash
@ 2008-06-24 7:03 Sagar Borikar
2008-06-25 6:48 ` Sagar Borikar
2008-06-25 8:49 ` Dave Chinner
0 siblings, 2 replies; 48+ messages in thread
From: Sagar Borikar @ 2008-06-24 7:03 UTC (permalink / raw)
To: xfs; +Cc: Sagar Borikar
Hello,
I hope this is the right list to address this issue. If not, please redirect me to the right one.
We are facing a strange issue with XFS under heavy load. It's a NAS box with a 2.6.18 kernel, 128 MB of RAM, a MIPS CPU, and XFS version 2.8.11.
The NAS allows creating RAID1 and RAID5 volumes with XFS on top. The system is stable in general without any stress; we don't see any issues in day-to-day activities.
But when it is exposed to stress from multiple iozone clients, it starts showing weird problems.
The iozone stress test is run with 15 CIFS clients pumping data over a 1 Gbps network continuously for 48 hours, as part of calculating the MTBF of the system. The box crashes at different stages under different stimuli, but always in XFS.
A. Initially it would hit the "access to block zero" exception and the system would crash, so I applied Nathan Scott's patch, which removes the kernel panic when this situation is hit: http://oss.sgi.com/archives/xfs/2006-08/msg00073.html
After backporting this patch, we observed that the system no longer crashes, but the warning messages still appear. After some time the system goes into a soft lockup and becomes unresponsive.
I couldn't run xfs_db or xfs_repair to check the state of the inode, as the console was not reachable after hitting the lockup.
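For reference, this is roughly what I would have run, read-only and with the volume unmounted, had the console been reachable (the inode number is taken from the warnings below; treat this as a sketch rather than a verified procedure):
xfs_db -r -c 'inode 33554565' -c 'print' -c 'bmap' /dev/dm-0
The 'print' command dumps the inode core and 'bmap' walks the data fork extent map, which should show whether the on-disk extent list already contains the bogus zero start block or whether the corruption is only in memory.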
Here is the log
"
Filesystem "dm-0": Access to block zero in inode 33554565 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 7
af
Filesystem "dm-0": Access to block zero in inode 33554565 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 7
af
Filesystem "dm-0": Access to block zero in inode 33554565 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 7
af
Filesystem "dm-0": Access to block zero in inode 33554565 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 7
b0
Filesystem "dm-0": Access to block zero in inode 33554565 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 7
b0
Filesystem "dm-0": Access to block zero in inode 33554565 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 7
e7
Filesystem "dm-0": Access to block zero in inode 33554565 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 7
e7
Filesystem "dm-0": Access to block zero in inode 33554565 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 7
e7
Filesystem "dm-0": Access to block zero in inode 33554565 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 8
20
Filesystem "dm-0": Access to block zero in inode 33554565 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 8
20
Filesystem "dm-0": Access to block zero in inode 33554565 start_block: 0 start_off: 3ffffffe000000 blkcnt: 68 extent-sta
te: 1 lastx: 88d
Filesystem "dm-0": Access to block zero in inode 33554565 start_block: 0 start_off: 3ffffffe000000 blkcnt: 68 extent-sta
te: 1 lastx: 88d
Filesystem "dm-0": Access to block zero in inode 33554565 start_block: 0 start_off: 3ffffffe000000 blkcnt: 68 extent-sta
te: 1 lastx: 88d
Filesystem "dm-0": Access to block zero in inode 33554565 start_block: 0 start_off: 3ffffffc000000 blkcnt: 180 extent-st
ate: 1 lastx: 88f
Filesystem "dm-0": Access to block zero in inode 33554565 start_block: 0 start_off: 3ffffffc000000 blkcnt: 180 extent-st
ate: 1 lastx: 88f
Filesystem "dm-0": Access to block zero in inode 33554565 start_block: 0 start_off: 3ffffffc000000 blkcnt: 180 extent-st
ate: 1 lastx: 88f
Filesystem "dm-0": Access to block zero in inode 33554565 start_block: 0 start_off: 3ffffffe000000 blkcnt: 1a0 extent-st
ate: 1 lastx: 891
Filesystem "dm-0": Access to block zero in inode 33554565 start_block: 0 start_off: 3ffffffe000000 blkcnt: 1a0 extent-st
"
Once we hit the soft lockup the system has to be rebooted, as it is completely stalled and we can't even check which processes are running. I could be wrong, but it was surprising to me that the same inode was being reported with different offsets and block counts. It took 48 hours to reach this state and the system had to be rebooted.
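One thing that may still work in that state, assuming CONFIG_MAGIC_SYSRQ is enabled in this kernel build, is a SysRq task dump so the stuck processes at least show up in the console log; a sketch:
echo 1 > /proc/sys/kernel/sysrq      # enable SysRq while a shell is still alive
echo t > /proc/sysrq-trigger         # dump the state and kernel stack of every task to the console
Over a serial console the same 't' can usually be sent with the serial break sequence even after the shell stops responding.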
B. On another DUT, the system simply rebooted after displaying a couple of warning messages, without entering the soft lockup state.
"
Filesystem "dm-1": Access to block zero in inode 2097283 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 46
d
Filesystem "dm-1": Access to block zero in inode 2097283 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 46
d
Filesystem "dm-1": Access to block zero in inode 2097283 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 46
d
Filesystem "dm-1": Access to block zero in inode 2097283 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 46
d
Filesystem "dm-1": Access to block zero in inode 2097283 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 46
d
Filesystem "dm-1": Access to block zero in inï
PMON2000 MIPS Initializing. Standby...
ERRORPC=bfc00004 CONFIG=0042e4bb STATUS=00400000
CPU PRID 000034c1, MaskID 00001320
Initializing caches...done (CONFIG=0042e4bb)
Switching to runtime address map...done
Setting up SDRAM controller: Manual SDRAM setup
drive strength 0x000073c7
output timing 0x00000fca
general config 0x80010000
master clock 100 Mhz, MulFundBIU 0x02, DivXSDRAM 0x02
sdram freq 0x09ef21aa hz, sdram period: 0x06 nsec
"
It took 43 hours to come to this state.
C. Under another stimulus, the block layer reported that it could not access the requested block, which means the filesystem metadata was corrupted: the filesystem was trying to read a block that does not exist on the disk. After some time it recovered and then started misbehaving again; finally it hit a memory access exception and the system crashed.
"
Filesystem "dm-1": Access to block zero in inode 2097284 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1a
8
Filesystem "dm-1": Access to block zero in inode 2097284 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1a
8
Filesystem "dm-1": Access to block zero in inode 2097284 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1a
1
Filesystem "dm-1": Access to block zero in inode 2097284 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1a
1
Filesystem "dm-1": Access to block zero in inode 2097284 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1a
1
attempt to access beyond end of device
dm-1: rw=0, want=1003118956380168, limit=8388608
I/O error in filesystem ("dm-1") meta-data dev dm-1 block 0x39054d5100001 ("xfs_trans_read_buf") error 5 buf count
512
attempt to access beyond end of device
dm-1: rw=0, want=1003118956380168, limit=8388608
I/O error in filesystem ("dm-1") meta-data dev dm-1 block 0x39054d5100001 ("xfs_trans_read_buf") error 5 buf count
512
attempt to access beyond end of device
dm-1: rw=0, want=1003118956380168, limit=8388608
I/O error in filesystem ("dm-1") meta-data dev dm-1 block 0x39054d5100001 ("xfs_trans_read_buf") error 5 buf count
512
attempt to access beyond end of device
dm-1: rw=0, want=1003118956380168, limit=8388608
I/O error in filesystem ("dm-1") meta-data dev dm-1 block 0x39054d5100001 ("xfs_trans_read_buf") error 5 buf count
512
attempt to access beyond end of device
Filesystem "dm-1": Access to block zero in inode 2097284 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1f
e
Filesystem "dm-1": Access to block zero in inode 2097284 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1f
e
Filesystem "dm-1": Access to block zero in inode 2097284 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1f
f
Filesystem "dm-1": Access to block zero in inode 2097284 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1f
f
Filesystem "dm-1": Access to block zero in inode 2097284 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1f
f
CPU 0 Unable to handle kernel paging request at virtual address 04e81080, epc == 802a90ac, ra == 802a9094
Oops[#1]:
Cpu 0
$ 0 : 00000000 9000a001 84e81080 80000000
$ 4 : 82ce6dd0 00000000 ffffffff ffffffff
$ 8 : 00086800 00000000 00086800 00000001
$12 : 00000004 34000000 82ce6c00 00000001
$16 : ffffffff 04e81080 34000000 81213978
$20 : 82ce6c00 82ce6dd0 00000000 34000000
$24 : 00086800 00000000
$28 : 81212000 81213878 00000000 802a9094
Hi : 00000000
Lo : 00036a20
epc : 802a90ac xfs_bmap_btalloc+0x33c/0x950 Not tainted
ra : 802a9094 xfs_bmap_btalloc+0x324/0x950
Status: 9000a003 KERNEL EXL IE
Cause : 00000008
BadVA : 04e81080
PrId : 000034c1
Modules linked in: aes autofs4
Process pdflush (pid: 66, threadinfo=81212000, task=8120b138)
Stack : 81213880 811c9074 00000003 863af000 00000000 00000001 000000cb 805c1f90
812139b8 8616ece0 8538e6f8 82ce6c00 812139fc ffffffff 00086800 00000000
802aad9c 802aad80 8616ed30 00000001 8173c6f4 813cf200 812138d8 00000001
00000200 00000000 812139b8 00000004 81213a00 00000000 ffffffff ffffffff
00000000 00000000 00000001 00000000 00000000 81213a00 000002a3 81213ac0
...
Call Trace:
[<802a90ac>] xfs_bmap_btalloc+0x33c/0x950
[<802a9700>] xfs_bmap_alloc+0x40/0x4c
[<802acc9c>] xfs_bmapi+0x8d8/0x13e4
[<802d42d4>] xfs_iomap_write_allocate+0x3c0/0x5f4
[<802d2b28>] xfs_iomap+0x408/0x4dc
[<802fe90c>] xfs_bmap+0x30/0x3c
[<802f3cfc>] xfs_map_blocks+0x50/0x84
[<802f512c>] xfs_page_state_convert+0x3f4/0x840
[<802f565c>] xfs_vm_writepage+0xe4/0x140
[<80198758>] mpage_writepages+0x24c/0x45c
[<802f56e8>] xfs_vm_writepages+0x30/0x3c
[<801507b4>] do_writepages+0x44/0x84
[<80196628>] __sync_single_inode+0x68/0x234
[<80196980>] __writeback_single_inode+0x18c/0x1ac
[<80196ba8>] sync_sb_inodes+0x208/0x2f0
[<80196d14>] writeback_inodes+0x84/0xd0
[<801503e0>] background_writeout+0xac/0xfc
[<80151330>] __pdflush+0x130/0x228
[<80151458>] pdflush+0x30/0x3c
[<801398bc>] kthread+0x98/0xe0
[<80104c38>] kernel_thread_helper+0x10/0x18
"
In all three cases, when I ran the slower tests, i.e. with 6 clients but the same stimulus, there were no exceptions and the system was stable for 5 days.
[root@Cousteau6 ~]# df -k
Filesystem 1k-blocks Used Available Use% Mounted on
/dev/scsibd2 41664 41664 0 100% /
udev 62044 4 62040 0% /dev
tmpfs 5120 3468 1652 68% /var
tmpfs 62044 24 62020 0% /tmp
tmpfs 128 4 124 3% /mnt
/dev/mtdblock1 1664 436 1228 26% /linuxrwfs
/dev/RAIDA/Volume1 10475520 624 10474896 0% /mnt/RAIDA/Volume1
/dev/RAIDA/Volume1 10475520 624 10474896 0% /mnt/ftp_dir/homes
/dev/RAIDA/IOZONETEST 4184064 2479044 1705020 59% /mnt/RAIDA/IOZONETEST
/dev/RAIDA/IOZONETEST 4184064 2479044 1705020 59% /mnt/ftp_dir/share1
/dev/RAIDA/Volume1 10475520 624 10474896 0% /mnt/ftp_dir/share2
Can anyone let me know what the probable cause of this issue could be?
Thanks in advance
Sagar
^ permalink raw reply [flat|nested] 48+ messages in thread

* RE: Xfs Access to block zero exception and system crash
2008-06-24 7:03 Xfs Access to block zero exception and system crash Sagar Borikar
@ 2008-06-25 6:48 ` Sagar Borikar
2008-06-25 8:49 ` Dave Chinner
1 sibling, 0 replies; 48+ messages in thread
From: Sagar Borikar @ 2008-06-25 6:48 UTC (permalink / raw)
To: xfs; +Cc: linux-xfs
Hello,
Can anyone help me out here?
Thanks
Sagar
-----Original Message-----
From: Sagar Borikar
Sent: Tuesday, June 24, 2008 12:33 PM
To: 'xfs@oss.sgi.com'
Cc: Sagar Borikar
Subject: Xfs Access to block zero exception and system crash
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Xfs Access to block zero exception and system crash
2008-06-24 7:03 Xfs Access to block zero exception and system crash Sagar Borikar
2008-06-25 6:48 ` Sagar Borikar
@ 2008-06-25 8:49 ` Dave Chinner
2008-06-26 6:46 ` Sagar Borikar
1 sibling, 1 reply; 48+ messages in thread
From: Dave Chinner @ 2008-06-25 8:49 UTC (permalink / raw)
To: Sagar Borikar; +Cc: xfs
On Tue, Jun 24, 2008 at 12:03:16AM -0700, Sagar Borikar wrote:
> Hello,
>
> I hope this is the right list to address this issue. If not, please redirect me to the right one.
>
> We are facing a strange issue with XFS under heavy load. It's a NAS
> box with a 2.6.18 kernel, 128 MB of RAM, a MIPS CPU, and XFS
> version 2.8.11.
[...]
> Can anyone let me know what the probable cause of this issue could be?
They are all from corrupted extent btrees. There are many possible
causes of this that we've fixed over the years since 2.6.18 was
released. Indeed, we are currently discussing fixes for a bunch of
problems that lead to corrupted extent btrees and problems like this.
I'd suggest that you start with a more recent kernel, make sure you
have a serial console, and set xfs_error_level to 11 so that it gives
as much information as possible on the console when the error is hit.
If that doesn't give a stack trace, then you need to set
xfs_panic_mask to crash the machine on block zero accesses and report
the stack traces that it outputs...
Cheers,
Dave.
--
Dave Chinner
dchinner@agami.com
^ permalink raw reply [flat|nested] 48+ messages in thread
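Assuming this 2.6.18 tree exposes the usual XFS sysctls under /proc/sys/fs/xfs/, Dave's suggestion translates into something like the sketch below. The panic_mask bit for block-zero accesses is documented as XFS_PTAG_FSBLOCK_ZERO (0x80) in Documentation/filesystems/xfs.txt; it should be verified against the running kernel before relying on it.
echo 11 > /proc/sys/fs/xfs/error_level    # maximum verbosity for XFS error reports
echo 128 > /proc/sys/fs/xfs/panic_mask    # 0x80 = XFS_PTAG_FSBLOCK_ZERO: panic on "access to block zero" so a full trace is captured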
* RE: Xfs Access to block zero exception and system crash
2008-06-25 8:49 ` Dave Chinner
@ 2008-06-26 6:46 ` Sagar Borikar
2008-06-26 7:02 ` Dave Chinner
0 siblings, 1 reply; 48+ messages in thread
From: Sagar Borikar @ 2008-06-26 6:46 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
Thanks Dave.
>> box with a 2.6.18 kernel, 128 MB of RAM, a MIPS CPU, and XFS
>> version 2.8.11.
> [...]
>> Can anyone let me know what the probable cause of this issue could be?
> They are all from corrupted extent btrees. There are many possible
> causes of this that we've fixed over the years since 2.6.18 was
> released. Indeed, we are currently discussing fixes for a bunch of
> problems that lead to corrupted extent btrees and problems like this.
> I'd suggest that you start with a more recent kernel, make sure you
> have a serial console, and set xfs_error_level to 11 so that it gives
> as much information as possible on the console when the error is hit.
> If that doesn't give a stack trace, then you need to set
> xfs_panic_mask to crash the machine on block zero accesses and report
> the stack traces that it outputs...
Yes, I went through the changes between 2.6.24 and 2.6.18 and there are
quite a few. But as this is a production system already in the field, it
is not viable to upgrade the kernel. I do understand that there could be
many places that can cause the corruption. Unfortunately, three
different systems have shown corruption in three different places, as
described. For now I am sleeping and rescheduling in the
access-to-block-zero exception so that it won't stall the system and I
can monitor the state of the filesystem. As the error only shows up
about once in 2.5 days under extreme stress, if you could point me to
the most probable places to look at, I could narrow down the debugging
path.
Thanks in advance
Sagar
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Xfs Access to block zero exception and system crash
2008-06-26 6:46 ` Sagar Borikar
@ 2008-06-26 7:02 ` Dave Chinner
2008-06-27 10:13 ` Sagar Borikar
0 siblings, 1 reply; 48+ messages in thread
From: Dave Chinner @ 2008-06-26 7:02 UTC (permalink / raw)
To: Sagar Borikar; +Cc: xfs
[please wrap your replies at 72 columns]
On Wed, Jun 25, 2008 at 11:46:59PM -0700, Sagar Borikar wrote:
> Yes, I went through the changes between 2.6.24 and 2.6.18 and there are
> quite a few. But as this is a production system already in the field, it
> is not viable to upgrade the kernel.
Well, you're pretty much on your own then :/
> I do understand that there could be many places that can cause the
> corruption. Unfortunately, three different systems have shown
> corruption in three different places, as described.
Yes, but all the same pattern of corruption, so it is likely that it
is one problem.
> For now I am sleeping and rescheduling in the access-to-block-zero
> exception so that it won't stall the system and I can monitor the
> state of the filesystem. As the error only shows up about once in 2.5
> days under extreme stress, if you could point me to the most probable
> places to look at, I could narrow down the debugging path.
Like I said - it's a corrupt bmap btree. It could be a bug in the bmap
btree code, the alloc btree code, the inode data fork manipulation
code, or it could be a block device bug returning bad data to XFS on a
cancelled btree readahead, etc. IOWs, there are so many possible causes
of a corrupted btree that a bug report by itself is mostly useless.
All I can suggest is working out a reproducible test case in your
development environment, attaching a debugger and digging around in
memory when the problem is hit to find out exactly what is corrupted.
If you can't reproduce it or work out what is occurring to trigger the
problem, then we're not going to be able to find the cause...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Xfs Access to block zero exception and system crash
2008-06-26 7:02 ` Dave Chinner
@ 2008-06-27 10:13 ` Sagar Borikar
2008-06-27 10:25 ` Sagar Borikar
2008-06-28 0:02 ` Dave Chinner
0 siblings, 2 replies; 48+ messages in thread
From: Sagar Borikar @ 2008-06-27 10:13 UTC (permalink / raw)
To: xfs
Dave Chinner wrote:
> [please wrap your replies at 72 columns]
>
> Yes, but all the same pattern of corruption, so it is likely
> that it is one problem.
>
> All I can suggest is working out a reproducible test case in your
> development environment, attaching a debugger and digging around in
> memory when the problem is hit to find out exactly what is corrupted.
> If you can't reproduce it or work out what is occurring to trigger the
> problem, then we're not going to be able to find the cause...
>
> Cheers,
>
> Dave.
Thanks Dave.
I did some experiments today with the corrupted filesystem.
Setup: the NAS box contains one volume, /share, with 10 subdirectories.
In the first subdirectory, sh1, I kept a 512 MB file. Through a script I
continuously copy this file simultaneously into the sh2 through sh10
subdirectories. The script looks like:
while [ 1 ]
do
cp $1 $2
done
When I check the process status with top, almost all of the cp processes
are continuously in uninterruptible sleep. I ran xfs_repair with the -n
option on the filesystem, which sits on a JBOD. Here is the output:
Fri Jun 27 02:13:01 2008
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
bad nblocks 8788 for inode 33554562, would reset to 15461
bad nextents 18 for inode 33554562, would reset to 32
- agno = 2
entry "iozone_68.tst" in shortform directory 67108993 references free inode 67108995
would have junked entry "iozone_68.tst" in directory inode 67108993
data fork in ino 67108995 claims dup extent, off - 252, start - 14711445, cnt 576
bad data fork in inode 67108995
would have cleared inode 67108995
- agno = 3
entry "iozone_68.tst" in shortform directory 100663425 references free inode 100663427
would have junked entry "iozone_68.tst" in directory inode 100663425
inode 100663427 - bad extent starting block number 906006917242880, offset 2533274882670609
bad data fork in inode 100663427
would have cleared inode 100663427
- agno = 4
bad nblocks 10214 for inode 134217859, would reset to 16761
bad nextents 22 for inode 134217859, would reset to 34
- agno = 5
bad nblocks 23581 for inode 167772290, would reset to 27557
bad nextents 39 for inode 167772290, would reset to 45
- agno = 6
bad nblocks 14527 for inode 201326722, would reset to 15697
bad nextents 31 for inode 201326722, would reset to 34
bad nblocks 12633 for inode 201326723, would reset to 16647
bad nextents 23 for inode 201326723, would reset to 35
- agno = 7
bad nblocks 26638 for inode 234881154, would reset to 27557
bad nextents 53 for inode 234881154, would reset to 54
bad nblocks 85653 for inode 234881155, would reset to 85664
bad nextents 310 for inode 234881155, would reset to 311
- agno = 8
bad nblocks 23241 for inode 268640387, would reset to 27565
bad nextents 32 for inode 268640387, would reset to 42
bad nblocks 81766 for inode 268640388, would reset to 86012
bad nextents 332 for inode 268640388, would reset to 344
- agno = 9
entry "iozone_68.tst" in shortform directory 301990016 references free inode 301990019
would have junked entry "iozone_68.tst" in directory inode 301990016
data fork in ino 301990019 claims dup extent, off - 26402, start - 19129002, cnt 450
bad data fork in inode 301990019
would have cleared inode 301990019
bad nblocks 70282 for inode 301990020, would reset to 71793
bad nextents 281 for inode 301990020, would reset to 294
- agno = 10
entry "iozone_68.tst" in shortform directory 335544448 references free inode 335544451
would have junked entry "iozone_68.tst" in directory inode 335544448
bad nblocks 11261 for inode 335544451, would reset to 19853
bad nextents 24 for inode 335544451, would reset to 41
imap claims in-use inode 335544451 is free, correcting imap
bad nblocks 119952 for inode 335544452, would reset to 121178
bad nextents 301 for inode 335544452, would reset to 312
- agno = 11
bad nblocks 24361 for inode 369098883, would reset to 29553
bad nextents 51 for inode 369098883, would reset to 57
bad nblocks 3173 for inode 369098884, would reset to 5851
bad nextents 10 for inode 369098884, would reset to 18
- agno = 12
entry "iozone_68.tst" in shortform directory 402653313 references free inode 402653318
would have junked entry "iozone_68.tst" in directory inode 402653313
bad nblocks 16348 for inode 402653317, would reset to 21485
bad nextents 28 for inode 402653317, would reset to 37
data fork in ino 402653318 claims dup extent, off - 124142, start - 29379669, cnt 2
bad data fork in inode 402653318
would have cleared inode 402653318
- agno = 13
bad nblocks 18374 for inode 436207747, would reset to 19991
bad nextents 43 for inode 436207747, would reset to 47
bad nblocks 38390 for inode 436207748, would reset to 38914
bad nextents 300 for inode 436207748, would reset to 304
- agno = 14
bad nblocks 20267 for inode 469762178, would reset to 23089
bad nextents 41 for inode 469762178, would reset to 45
- agno = 15
entry "iozone_68.tst" in shortform directory 503316608 references free inode 503316609
would have junked entry "iozone_68.tst" in directory inode 503316608
imap claims in-use inode 503316609 is free, correcting imap
libxfs_bcache: 0x100020b0
Max supported entries = 524288
Max utilized entries = 562
Active entries = 562
Hash table size = 65536
Hits = 1009
Misses = 564
Hit ratio = 64.00
Hash buckets with 0 entries 65116 ( 0%)
Hash buckets with 1 entries 391 ( 69%)
Hash buckets with 2 entries 20 ( 7%)
Hash buckets with 3 entries 1 ( 0%)
Hash buckets with 15 entries 1 ( 2%)
Hash buckets with 16 entries 6 ( 17%)
Hash buckets with 17 entries 1 ( 3%)
Fri Jun 27 02:13:08 2008
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
- traversing filesystem starting at / ...
- agno = 0
- agno = 1
- agno = 2
entry "iozone_68.tst" in shortform directory inode 67108993 points to free inode 67108995
would junk entry "iozone_68.tst"
- agno = 3
entry "iozone_68.tst" in shortform directory inode 100663425 points to free inode 100663427
would junk entry "iozone_68.tst"
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- agno = 8
- agno = 9
entry "iozone_68.tst" in shortform directory inode 301990016 points to free inode 301990019
would junk entry "iozone_68.tst"
- agno = 10
- agno = 11
- agno = 12
entry "iozone_68.tst" in shortform directory inode 402653313 points to free inode 402653318
would junk entry "iozone_68.tst"
- agno = 13
- agno = 14
- agno = 15
- traversal finished ...
- traversing all unattached subtrees ...
- traversals finished ...
- moving disconnected inodes to lost+found ...
libxfs_icache: 0x10002050
Max supported entries = 524288
Max utilized entries = 42
Active entries = 42
Hash table size = 65536
Hits = 0
Misses = 42
Hit ratio = 0.00
Hash buckets with 0 entries 65524 ( 0%)
Hash buckets with 1 entries 9 ( 21%)
Hash buckets with 6 entries 1 ( 14%)
Hash buckets with 12 entries 1 ( 28%)
Hash buckets with 15 entries 1 ( 35%)
libxfs_bcache: 0x100020b0
Max supported entries = 524288
Max utilized entries = 562
Active entries = 17
Hash table size = 65536
Hits = 1035
Misses = 581
Hit ratio = 64.00
Hash buckets with 0 entries 65533 ( 0%)
Hash buckets with 1 entries 2 ( 11%)
Hash buckets with 15 entries 1 ( 88%)
Fri Jun 27 02:13:10 2008
Phase 7 - verify link counts...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- agno = 8
- agno = 9
- agno = 10
- agno = 11
- agno = 12
- agno = 13
- agno = 14
- agno = 15
libxfs_icache: 0x10002050
Max supported entries = 524288
Max utilized entries = 42
Active entries = 42
Hash table size = 65536
Hits = 0
Misses = 42
Hit ratio = 0.00
Hash buckets with 0 entries 65524 ( 0%)
Hash buckets with 1 entries 9 ( 21%)
Hash buckets with 6 entries 1 ( 14%)
Hash buckets with 12 entries 1 ( 28%)
Hash buckets with 15 entries 1 ( 35%)
libxfs_bcache: 0x100020b0
Max supported entries = 524288
Max utilized entries = 562
Active entries = 16
Hash table size = 65536
Hits = 1051
Misses = 597
Hit ratio = 63.00
Hash buckets with 0 entries 65534 ( 0%)
Hash buckets with 1 entries 1 ( 6%)
Hash buckets with 15 entries 1 ( 93%)
Fri Jun 27 02:13:17 2008
No modify flag set, skipping filesystem flush and exiting.
So there are bad block counts and bad extents in nearly every AG, which
is what is causing the problem. The top output shows that all the cp
processes are in the D state:
PID USER STATUS RSS PPID %CPU %MEM COMMAND
7455 root R 984 1892 7.4 0.7 top
6100 root D 524 1973 2.9 0.4 cp
6799 root R 524 1983 2.9 0.4 cp
6796 root D 524 2125 2.9 0.4 cp
6074 root D 524 2109 1.4 0.4 cp
6097 root D 524 1979 1.4 0.4 cp
6076 root D 524 1975 1.4 0.4 cp
6738 root D 524 2123 1.4 0.4 cp
6759 root D 524 2115 1.4 0.4 cp
7035 root D 524 1977 1.4 0.4 cp
7440 root D 520 1985 1.4 0.4 cp
73 root SW< 0 6 1.4 0.0 xfsdatad/0
67 root SW 0 6 1.4 0.0 pdflush
...
..
This means they are waiting for I/O, sleeping inside a system call, and
never coming back out because several inodes are corrupted. Hence the
script never completes.
Thanks
Sagar
^ permalink raw reply [flat|nested] 48+ messages in thread
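A quick way to see where those D-state processes are actually blocked in the kernel, assuming pgrep and a readable /proc are available on the box (on 2.6.18 there is no /proc/<pid>/stack, but wchan gives the symbol each task is sleeping in), is a sketch like:
for pid in $(pgrep -x cp); do
    echo "$pid $(cat /proc/$pid/wchan)"    # prints the kernel function each cp is blocked in
done
If they are all parked in the same xfs_* or log-reservation function, that narrows down which path is wedged.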
* Re: Xfs Access to block zero exception and system crash
2008-06-27 10:13 ` Sagar Borikar
@ 2008-06-27 10:25 ` Sagar Borikar
2008-06-28 0:05 ` Dave Chinner
0 siblings, 1 reply; 48+ messages in thread
From: Sagar Borikar @ 2008-06-27 10:25 UTC (permalink / raw)
To: xfs
Dave,
I also got continuous exceptions:
XFS internal error XFS_WANT_CORRUPTED_RETURN at line 296 of file fs/xfs/xfs_alloc.c. Caller 0x802962c0
Call Trace:
[<80109888>] dump_stack+0x18/0x44
[<802c3550>] xfs_error_report+0x58/0x64
[<802965a0>] xfs_alloc_fixup_trees+0x39c/0x3dc
[<80297850>] xfs_alloc_ag_vextent_size+0x3ec/0x4f4
[<80296708>] xfs_alloc_ag_vextent+0x5c/0x15c
[<8029910c>] xfs_alloc_vextent+0x430/0x604
[<802a9420>] xfs_bmap_btalloc+0x6b0/0x950
[<802a9700>] xfs_bmap_alloc+0x40/0x4c
[<802acc80>] xfs_bmapi+0x8d8/0x13e4
[<802d4230>] xfs_iomap_write_allocate+0x340/0x5d8
[<802d2b18>] xfs_iomap+0x408/0x4dc
[<802fe8bc>] xfs_bmap+0x30/0x3c
[<802f3cac>] xfs_map_blocks+0x50/0x84
[<802f50dc>] xfs_page_state_convert+0x3f4/0x840
[<802f560c>] xfs_vm_writepage+0xe4/0x140
[<80198758>] mpage_writepages+0x24c/0x45c
[<802f5698>] xfs_vm_writepages+0x30/0x3c
[<801507b4>] do_writepages+0x44/0x84
[<80196628>] __sync_single_inode+0x68/0x234
[<80196980>] __writeback_single_inode+0x18c/0x1ac
[<80196ba8>] sync_sb_inodes+0x208/0x2f0
[<80196d14>] writeback_inodes+0x84/0xd0
[<80150160>] balance_dirty_pages+0xd8/0x1d4
[<801502a8>] balance_dirty_pages_ratelimited_nr+0x4c/0x58
[<8014c0a4>] generic_file_buffered_write+0x534/0x650
[<802fe4c8>] xfs_write+0x768/0xaac
[<802f8c30>] xfs_file_aio_write+0x88/0x94
[<8016d8d4>] do_sync_write+0xcc/0x124
[<8016d9e4>] vfs_write+0xb8/0x1a0
[<8016dbb8>] sys_write+0x54/0x98
[<8010c180>] stack_done+0x20/0x3c
This suggests memory was also not available for the pdflush threads to
flush data back to disk, but when I checked the memory stats, around
260 KB of buffers were available along with sufficient free memory. We
are running an 8k kernel stack on the MIPS architecture. The pdflush
threads were also stalled in uninterruptible state. Do you see any
issues on the memory side as well?
Thanks
Sagar
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Xfs Access to block zero exception and system crash
2008-06-27 10:25 ` Sagar Borikar
@ 2008-06-28 0:05 ` Dave Chinner
2008-06-28 16:47 ` Sagar Borikar
0 siblings, 1 reply; 48+ messages in thread
From: Dave Chinner @ 2008-06-28 0:05 UTC (permalink / raw)
To: Sagar Borikar; +Cc: xfs
On Fri, Jun 27, 2008 at 03:55:05PM +0530, Sagar Borikar wrote:
> Dave,
>
> I also got continuous exceptions:
>
> XFS internal error XFS_WANT_CORRUPTED_RETURN at line 296 of file
> fs/xfs/xfs_alloc.c. Caller 0x802962c0
Corrupt alloc btree. xfs_repair won't report errors in this btree; it
simply rebuilds it. xfs_check will report errors in it, though.
> This suggests memory was also not available for the pdflush threads to
> flush data back to disk, but when
Nothing to do with memory availability, I think.
FWIW, can you send the output of xfs_growfs -n <mntpt> and details
of the partitioning and volume config?
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 48+ messages in thread
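For completeness, xfs_check takes the block device directly and should be run with the filesystem unmounted (or otherwise quiesced); on this box the invocation would look something like the line below, although xfs_check can be slow and memory hungry on a badly damaged filesystem, so a 128 MB machine may struggle with it:
xfs_check /dev/RAIDA/vol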
* RE: Xfs Access to block zero exception and system crash
2008-06-28 0:05 ` Dave Chinner
@ 2008-06-28 16:47 ` Sagar Borikar
2008-06-29 21:56 ` Dave Chinner
0 siblings, 1 reply; 48+ messages in thread
From: Sagar Borikar @ 2008-06-28 16:47 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
Dave,
Attaching the required information:
> Nothing to do with memory availability, I think.
> FWIW, can you send the output of xfs_growfs -n <mntpt> and details
> of the partitioning and volume config?
[root@NAS001ee5ab9c85 ~]# xfs_growfs -n /mnt/RAIDA/vol/
meta-data=/dev/RAIDA/vol isize=256 agcount=16, agsize=1638400 blks
         = sectsz=512 attr=1
data     = bsize=4096 blocks=26214400, imaxpct=25
         = sunit=0 swidth=0 blks, unwritten=1
naming   =version 2 bsize=4096
log      =internal bsize=4096 blocks=12800, version=1
         = sectsz=512 sunit=0 blks
realtime =none extsz=65536 blocks=0, rtextents=0
[root@NAS001ee5ab9c85 ~]# cat /etc/fstab
/dev/root / ext2 rw,noauto 0 1
proc /proc proc defaults 0 0
devpts /dev/pts devpts defaults,gid=5,mode=620 0 0
tmpfs /tmp tmpfs defaults 0 0
/dev/RAIDA/vol /mnt/RAIDA/vol xfs defaults,usrquota,grpquota 0 0
/mnt/RAIDA/vol/sh /mnt/ftp_dir/sh none rw,bind 0 0
/mnt/RAIDA/vol/.autohome/ /mnt/ftp_dir/homes none rw,bind 0 0
[root@NAS001ee5ab9c85 ~]# fdisk -l
Disk /dev/scsibd: 257 MB, 257425408 bytes
8 heads, 32 sectors/track, 1964 cylinders
Units = cylinders of 256 * 512 = 131072 bytes
Device Boot Start End Blocks Id System
/dev/scsibd1 126 286 20608 83 Linux
/dev/scsibd2 287 1023 94336 83 Linux
/dev/scsibd3 1149 1309 20608 83 Linux
/dev/scsibd4 1310 2046 94336 83 Linux
Disk /dev/md0: 251.0 GB, 251000160256 bytes
2 heads, 4 sectors/track, 61279336 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk /dev/md0 doesn't contain a valid partition table
Disk /dev/dm-0: 107.3 GB, 107374182400 bytes
255 heads, 63 sectors/track, 13054 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
But still the question is: why doesn't it happen every time, and why not
under less stress?
I am surprised to see this happen immediately when the subdirectories
increase to more than 30; otherwise it decays slowly.
Thanks
Sagar
^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Xfs Access to block zero exception and system crash
2008-06-28 16:47 ` Sagar Borikar
@ 2008-06-29 21:56 ` Dave Chinner
2008-06-30 3:37 ` Sagar Borikar
[not found] ` <20080630034112.055CF18904C4@bby1mta01.pmc-sierra.bc.ca>
0 siblings, 2 replies; 48+ messages in thread
From: Dave Chinner @ 2008-06-29 21:56 UTC (permalink / raw)
To: Sagar Borikar; +Cc: xfs
On Sat, Jun 28, 2008 at 09:47:44AM -0700, Sagar Borikar wrote:
> > FWIW, can you send the output of xfs_growfs -n <mntpt> and details
> > of the partitioning and volume config?
....
> [root@NAS001ee5ab9c85 ~]# cat /etc/fstab
> /dev/root / ext2 rw,noauto 0 1
> proc /proc proc defaults 0 0
> devpts /dev/pts devpts defaults,gid=5,mode=620 0 0
> tmpfs /tmp tmpfs defaults 0 0
> /dev/RAIDA/vol /mnt/RAIDA/vol xfs defaults,usrquota,grpquota 0 0
> /mnt/RAIDA/vol/sh /mnt/ftp_dir/sh none rw,bind 0 0
> /mnt/RAIDA/vol/.autohome/ /mnt/ftp_dir/homes none rw,bind 0 0
>
> [root@NAS001ee5ab9c85 ~]# fdisk -l
>
> Disk /dev/scsibd: 257 MB, 257425408 bytes
> 8 heads, 32 sectors/track, 1964 cylinders
> Units = cylinders of 256 * 512 = 131072 bytes
>
> Device Boot Start End Blocks Id System
> /dev/scsibd1 126 286 20608 83 Linux
> /dev/scsibd2 287 1023 94336 83 Linux
> /dev/scsibd3 1149 1309 20608 83 Linux
> /dev/scsibd4 1310 2046 94336 83 Linux
I'd have to assume that's a flash based root drive, right?
> Disk /dev/md0: 251.0 GB, 251000160256 bytes
> 2 heads, 4 sectors/track, 61279336 cylinders
> Units = cylinders of 8 * 512 = 4096 bytes
>
> Disk /dev/md0 doesn't contain a valid partition table
>
> Disk /dev/dm-0: 107.3 GB, 107374182400 bytes
> 255 heads, 63 sectors/track, 13054 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
Neither of these tell me what /dev/RAIDA/vol is....
> But still the question is: why doesn't it happen every time, and why not
> under less stress?
>
> I am surprised to see this happen immediately when the subdirectories
> increase to more than 30; otherwise it decays slowly.
So it happens when you get more than 30 entries in a directory
under a certain load? That might be an extent->btree format
conversion bug or vice versa. I'd suggest setting up a test based
around this to try to narrow down the problem.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 48+ messages in thread
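A rough sketch of a targeted test along the lines Dave suggests, aimed at the extent-to-btree conversion of the data fork, could look like the script below. The directory, file names and counts are made up for illustration; the threshold at which the fork flips from extent to btree format depends on inode size and attribute fork usage, so with 256-byte inodes only a handful of extents should be enough. Writing 4k blocks with gaps keeps each write in its own extent.
#!/bin/sh
# grow 40 files one sparse 4k block at a time so each data fork is
# forced through the extent -> btree conversion under writeback
dir=/mnt/RAIDA/vol/btree-test
mkdir -p "$dir"
f=0
while [ $f -lt 40 ]; do
    e=0
    while [ $e -lt 64 ]; do
        dd if=/dev/zero of="$dir/file$f" bs=4k count=1 seek=$((e * 16)) conv=notrunc 2>/dev/null
        e=$((e + 1))
    done
    f=$((f + 1))
done
Running this in parallel with the usual copy load on a scratch filesystem, then checking it with xfs_repair -n, would show whether the btree conversion path alone is enough to reproduce the corruption.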
* Re: Xfs Access to block zero exception and system crash
2008-06-29 21:56 ` Dave Chinner
@ 2008-06-30 3:37 ` Sagar Borikar
[not found] ` <20080630034112.055CF18904C4@bby1mta01.pmc-sierra.bc.ca>
1 sibling, 0 replies; 48+ messages in thread
From: Sagar Borikar @ 2008-06-30 3:37 UTC (permalink / raw)
To: xfs
Dave Chinner wrote:
> On Sat, Jun 28, 2008 at 09:47:44AM -0700, Sagar Borikar wrote:
> [...]
> I'd have to assume that's a flash based root drive, right?
That's right.
> Neither of these tell me what /dev/RAIDA/vol is....
It is the device node to which /mnt/RAIDA/vol is mapped. It's a JBOD of 233 GB.
> So it happens when you get more than 30 entries in a directory
> under a certain load? That might be an extent->btree format
> conversion bug or vice versa. I'd suggest setting up a test based
> around this to try to narrow down the problem.
Thanks for all your help. I shall keep you posted on the progress of the debugging.
Regards
Sagar
^ permalink raw reply [flat|nested] 48+ messages in thread
[parent not found: <20080630034112.055CF18904C4@bby1mta01.pmc-sierra.bc.ca>]
* Re: Xfs Access to block zero exception and system crash [not found] ` <20080630034112.055CF18904C4@bby1mta01.pmc-sierra.bc.ca> @ 2008-06-30 6:07 ` Sagar Borikar 2008-06-30 10:24 ` Sagar Borikar 1 sibling, 0 replies; 48+ messages in thread From: Sagar Borikar @ 2008-06-30 6:07 UTC (permalink / raw) To: xfs Sagar Borikar wrote: > Dave Chinner wrote: >> On Sat, Jun 28, 2008 at 09:47:44AM -0700, Sagar Borikar wrote: >> Device Boot Start End Blocks Id System >>> /dev/scsibd1 126 286 20608 83 Linux >>> /dev/scsibd2 287 1023 94336 83 Linux >>> /dev/scsibd3 1149 1309 20608 83 Linux >>> /dev/scsibd4 1310 2046 94336 83 Linux >>> >> >> I'd have to assume thats a flash based root drive, right? >> >> > That's right, >>> Disk /dev/md0: 251.0 GB, 251000160256 bytes >>> 2 heads, 4 sectors/track, 61279336 cylinders >>> Units = cylinders of 8 * 512 = 4096 bytes >>> >>> Disk /dev/md0 doesn't contain a valid partition table >>> >>> Disk /dev/dm-0: 107.3 GB, 107374182400 bytes >>> 255 heads, 63 sectors/track, 13054 cylinders >>> Units = cylinders of 16065 * 512 = 8225280 bytes >>> >> >> Neither of these tell me what /dev/RAIDA/vol is.... >> It is the device node to which /mnt/RAIDA/vol is mapped to. Its a >> JBOD with 233 GB size. >> >>> But still the issue is why doesn't it happen every time and less >>> stress? >>> >>> I am surprised to see to let this happen immediately when the >>> subdirectories increase more than 30. Else it decays slowly. >>> >> >> So it happens when you get more than 30 entries in a directory >> under a certain load? That might be an extent->btree format >> conversion bug or vice versa. I'd suggest setting up a test based >> around this to try to narrow down the problem. >> >> Cheers, >> >> Dave. >> > Thanks for all your help. Shall keep you posted with the progress on > debugging. > > Regards > Sagar > > Sorry if I was not clear. As I mentioned the frequency of finding bad extents is much higher when I increase simultaneous transactions to 30 ( say in 5 min ) but if I run only two copies in infinite loop, the issue crops up in 2-3 hours roughly. And all the copies plus pdflush are in uninterruptible sleep state continuously. And it is not uninterruptible sleep and waiting state ( DW ) but just uninterruptible ( D ). Thanks Sagar ^ permalink raw reply [flat|nested] 48+ messages in thread
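A quick way to see what those D-state processes are actually blocked on is to sample their wait channels; this is plain procps usage, nothing specific to this setup.

#! /bin/sh
# List uninterruptible (D-state) tasks and the kernel function each is
# sleeping in; repeated samples show whether they ever move.
ps axo pid,stat,wchan:40,comm | awk '$2 ~ /^D/'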
* Re: Xfs Access to block zero exception and system crash [not found] ` <20080630034112.055CF18904C4@bby1mta01.pmc-sierra.bc.ca> 2008-06-30 6:07 ` Sagar Borikar @ 2008-06-30 10:24 ` Sagar Borikar 2008-07-01 6:44 ` Dave Chinner 1 sibling, 1 reply; 48+ messages in thread From: Sagar Borikar @ 2008-06-30 10:24 UTC (permalink / raw) To: xfs Hi Dave, Sagar Borikar wrote: > Dave Chinner wrote: >> On Sat, Jun 28, 2008 at 09:47:44AM -0700, Sagar Borikar wrote: >> Device Boot Start End Blocks Id System >>> /dev/scsibd1 126 286 20608 83 Linux >>> /dev/scsibd2 287 1023 94336 83 Linux >>> /dev/scsibd3 1149 1309 20608 83 Linux >>> /dev/scsibd4 1310 2046 94336 83 Linux >>> >> >> I'd have to assume thats a flash based root drive, right? >> >> > That's right, >>> Disk /dev/md0: 251.0 GB, 251000160256 bytes >>> 2 heads, 4 sectors/track, 61279336 cylinders >>> Units = cylinders of 8 * 512 = 4096 bytes >>> >>> Disk /dev/md0 doesn't contain a valid partition table >>> >>> Disk /dev/dm-0: 107.3 GB, 107374182400 bytes >>> 255 heads, 63 sectors/track, 13054 cylinders >>> Units = cylinders of 16065 * 512 = 8225280 bytes >>> >> >> Neither of these tell me what /dev/RAIDA/vol is.... >> It is the device node to which /mnt/RAIDA/vol is mapped to. Its a >> JBOD with 233 GB size. >> >>> But still the issue is why doesn't it happen every time and less >>> stress? >>> >>> I am surprised to see to let this happen immediately when the >>> subdirectories increase more than 30. Else it decays slowly. >>> >> >> So it happens when you get more than 30 entries in a directory >> under a certain load? That might be an extent->btree format >> conversion bug or vice versa. I'd suggest setting up a test based >> around this to try to narrow down the problem. >> >> Cheers, >> >> Dave. >> > Thanks for all your help. Shall keep you posted with the progress on > debugging. > > Regards > Sagar > After running my test for 20 min, when I check the fragmentation status of file system, I observe that it is severely fragmented. [root@NAS001ee5ab9c85 ~]# xfs_db -c frag -r /dev/RAIDA/vol actual 94343, ideal 107, fragmentation factor 99.89% Do you think, this can cause the issue? Thanks Sagar ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Xfs Access to block zero exception and system crash 2008-06-30 10:24 ` Sagar Borikar @ 2008-07-01 6:44 ` Dave Chinner 2008-07-02 4:18 ` Sagar Borikar 0 siblings, 1 reply; 48+ messages in thread From: Dave Chinner @ 2008-07-01 6:44 UTC (permalink / raw) To: Sagar Borikar; +Cc: xfs On Mon, Jun 30, 2008 at 03:54:44PM +0530, Sagar Borikar wrote: > After running my test for 20 min, when I check the fragmentation status > of file system, I observe that it > is severely fragmented. Depends on your definition of fragmentation.... > [root@NAS001ee5ab9c85 ~]# xfs_db -c frag -r /dev/RAIDA/vol > actual 94343, ideal 107, fragmentation factor 99.89% And that one is a bad one ;) Still, there are a lot of extents - ~1000 to a file - which will be stressing the btree extent format code. > Do you think, this can cause the issue? Sure - just like any other workload that generates enough extents. Like I said originally, we've fixed so many problems in this code since 2.6.18, I'd suggest that your only sane hope for us to help you track down the problem is to upgrade to a current kernel and go from there.... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 48+ messages in thread
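To see where those extents live, the per-file extent maps can be dumped with xfs_bmap; the sketch below assumes the test files sit under /mnt/RAIDA/vol, which is a guess at the layout rather than something stated here.

#! /bin/sh
# Approximate extent count per test file: xfs_bmap prints a header line
# followed by one line per extent (or hole), so line count minus one is
# a rough extent count.
for f in /mnt/RAIDA/vol/*/testfile; do
    n=`xfs_bmap "$f" | wc -l`
    echo "`expr $n - 1` extents: $f"
done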
* Re: Xfs Access to block zero exception and system crash 2008-07-01 6:44 ` Dave Chinner @ 2008-07-02 4:18 ` Sagar Borikar 2008-07-02 5:13 ` Dave Chinner 0 siblings, 1 reply; 48+ messages in thread From: Sagar Borikar @ 2008-07-02 4:18 UTC (permalink / raw) To: Sagar Borikar, xfs Dave Chinner wrote: > On Mon, Jun 30, 2008 at 03:54:44PM +0530, Sagar Borikar wrote: > >> After running my test for 20 min, when I check the fragmentation status >> of file system, I observe that it >> is severely fragmented. >> > > Depends on your definition of fragmentation.... > > >> [root@NAS001ee5ab9c85 ~]# xfs_db -c frag -r /dev/RAIDA/vol >> actual 94343, ideal 107, fragmentation factor 99.89% >> > > And that one is a bad one ;) > > Still, there are a lot of extents - ~1000 to a file - which > will be stressing the btree extent format code. > > >> Do you think, this can cause the issue? >> > > Sure - just like any other workload that generates enough > extents. Like I said originally, we've fixed so many problems > in this code since 2.6.18 I'd suggest that your only sane > hope for us to help you track done the problem is to upgrade > to a current kernel and go from there.... > > Cheers,, > > Dave. > Thanks again Dave. But we can't upgrade the kernel as it is already in production and on field. So do you think, periodic cleaning of file system using xfs_fsr can solve the issue? If not, could you kindly direct me what all patches were fixing similar problem? I can try back porting them. Thanks Sagar ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Xfs Access to block zero exception and system crash 2008-07-02 4:18 ` Sagar Borikar @ 2008-07-02 5:13 ` Dave Chinner 2008-07-02 5:35 ` Sagar Borikar 0 siblings, 1 reply; 48+ messages in thread From: Dave Chinner @ 2008-07-02 5:13 UTC (permalink / raw) To: Sagar Borikar; +Cc: xfs On Wed, Jul 02, 2008 at 09:48:46AM +0530, Sagar Borikar wrote: > Dave Chinner wrote: >> On Mon, Jun 30, 2008 at 03:54:44PM +0530, Sagar Borikar wrote: >> Sure - just like any other workload that generates enough >> extents. Like I said originally, we've fixed so many problems >> in this code since 2.6.18 I'd suggest that your only sane >> hope for us to help you track done the problem is to upgrade >> to a current kernel and go from there.... >> > Thanks again Dave. But we can't upgrade the kernel as it is already in > production and on field. Yes, but you can run it in your test environment where you are reproducing this problem, right? > So do you think, periodic cleaning of file system using xfs_fsr can > solve the issue? No, at best it would only delay the problem (whatever it is). > If not, could you > kindly direct me what all patches were fixing similar problem? I can try > back porting them. I don't have time to try to identify some set of changes from the past 3-4 years that might fix your problem. There may not even be a patch that fixes your problem, which is one of the reasons why I've asked if you can reproduce it on a current kernel.... I pointed you the files that the bug could lie in earlier in the thread. You can find the history of changes to those files via the mainline git repository or via the XFS CVS repository. You'd probably do best to look at the git tree because all the changes are well described in the commit logs and you should be able to isolate ones that fix btree problems fairly easily... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 48+ messages in thread
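Concretely, mining that history with git could look like the following; the file list is only a guess at the bmap/btree sources being referred to, and v2.6.25 is just an example endpoint.

#! /bin/sh
# Walk mainline changes to the extent/btree code between the shipped
# kernel and a more recent release, looking for candidate fixes.
cd linux-2.6                     # assumed clone of the mainline tree
git log --pretty=oneline v2.6.18..v2.6.25 -- \
    fs/xfs/xfs_bmap.c fs/xfs/xfs_bmap_btree.c fs/xfs/xfs_inode.c
# then inspect any promising commit in full with: git show <commit id>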
* Re: Xfs Access to block zero exception and system crash 2008-07-02 5:13 ` Dave Chinner @ 2008-07-02 5:35 ` Sagar Borikar 2008-07-02 6:13 ` Nathan Scott 0 siblings, 1 reply; 48+ messages in thread From: Sagar Borikar @ 2008-07-02 5:35 UTC (permalink / raw) To: Sagar Borikar, xfs Dave Chinner wrote: > On Wed, Jul 02, 2008 at 09:48:46AM +0530, Sagar Borikar wrote: > >> Dave Chinner wrote: >> >>> On Mon, Jun 30, 2008 at 03:54:44PM +0530, Sagar Borikar wrote: >>> Sure - just like any other workload that generates enough >>> extents. Like I said originally, we've fixed so many problems >>> in this code since 2.6.18 I'd suggest that your only sane >>> hope for us to help you track done the problem is to upgrade >>> to a current kernel and go from there.... >>> >>> >> Thanks again Dave. But we can't upgrade the kernel as it is already in >> production and on field. >> > > Yes, but you can run it in your test environment where you are > reproducing this problem, right? > > Unfortunately the architecture is customized mips for which the standard kernel port is not available and we have to port the new kernel in order to try this which is why I was hesitating to do this. >> So do you think, periodic cleaning of file system using xfs_fsr can >> solve the issue? >> > > No, at best it would only delay the problem (whatever it is). > > >> If not, could you >> kindly direct me what all patches were fixing similar problem? I can try >> back porting them. >> > > I don't have time to try to identify some set of changes from the > past 3-4 years that might fix your problem. There may not even be a > patch that fixes your problem, which is one of the reasons why I've > asked if you can reproduce it on a current kernel.... > > I pointed you the files that the bug could lie in earlier in the > thread. You can find the history of changes to those files via the > mainline git repository or via the XFS CVS repository. You'd > probably do best to look at the git tree because all the changes are > well described in the commit logs and you should be able to isolate > ones that fix btree problems fairly easily... > > Cheers, > > Dave. > Sure I'll go through these changelogs. Thanks for all your help and really appreciate your time. I hope you don't mind to help me in future if I find something new :) Regards, Sagar ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Xfs Access to block zero exception and system crash 2008-07-02 5:35 ` Sagar Borikar @ 2008-07-02 6:13 ` Nathan Scott 2008-07-02 6:56 ` Dave Chinner 0 siblings, 1 reply; 48+ messages in thread From: Nathan Scott @ 2008-07-02 6:13 UTC (permalink / raw) To: Sagar Borikar; +Cc: xfs On Wed, 2008-07-02 at 11:05 +0530, Sagar Borikar wrote: > > Unfortunately the architecture is customized mips for which the > standard > kernel port is > not available and we have to port the new kernel in order to try > this > which is why I was > hesitating to do this. You can always try the reverse - replace fs/xfs from your mips build tree with the one from the current/a recent kernel. Theres very few changes in the surrounding kernel code that xfs needs. cheers. -- Nathan ^ permalink raw reply [flat|nested] 48+ messages in thread
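Mechanically, Nathan's suggestion amounts to something like the steps below; the tree locations are assumptions, and some compile fixes should be expected where the surrounding 2.6.18 APIs differ.

#! /bin/sh
# Swap the 2.6.18 MIPS tree's fs/xfs for the copy from a recent kernel.
OLD=$HOME/nas-2.6.18             # assumed location of the product kernel tree
NEW=$HOME/linux-2.6.25           # assumed location of a recent kernel tree
mv $OLD/fs/xfs $OLD/fs/xfs.orig
cp -a $NEW/fs/xfs $OLD/fs/xfs
cd $OLD && make fs/xfs/          # rebuild just that directory and fix fallout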
* Re: Xfs Access to block zero exception and system crash 2008-07-02 6:13 ` Nathan Scott @ 2008-07-02 6:56 ` Dave Chinner 2008-07-02 11:02 ` Sagar Borikar 0 siblings, 1 reply; 48+ messages in thread From: Dave Chinner @ 2008-07-02 6:56 UTC (permalink / raw) To: Nathan Scott; +Cc: Sagar Borikar, xfs, sandeen On Wed, Jul 02, 2008 at 04:13:11PM +1000, Nathan Scott wrote: > On Wed, 2008-07-02 at 11:05 +0530, Sagar Borikar wrote: > > > > Unfortunately the architecture is customized mips for which the > > standard > > kernel port is > > not available and we have to port the new kernel in order to try > > this > > which is why I was > > hesitating to do this. > > You can always try the reverse - replace fs/xfs from your mips build > tree with the one from the current/a recent kernel. Theres very few > changes in the surrounding kernel code that xfs needs. Eric should be able to comment on the pitfalls in doing this having tried to backport a 2.6.25 fs/xfs to a 2.6.18 RHEL kernel. Eric - any comments? Cheers, Dave. -- Dave Chinner dchinner@agami.com ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Xfs Access to block zero exception and system crash 2008-07-02 6:56 ` Dave Chinner @ 2008-07-02 11:02 ` Sagar Borikar 2008-07-03 4:03 ` Eric Sandeen 0 siblings, 1 reply; 48+ messages in thread From: Sagar Borikar @ 2008-07-02 11:02 UTC (permalink / raw) To: Nathan Scott, Sagar Borikar, xfs, sandeen Dave Chinner wrote: > On Wed, Jul 02, 2008 at 04:13:11PM +1000, Nathan Scott wrote: > >> On Wed, 2008-07-02 at 11:05 +0530, Sagar Borikar wrote: >> >>> Unfortunately the architecture is customized mips for which the >>> standard >>> kernel port is >>> not available and we have to port the new kernel in order to try >>> this >>> which is why I was >>> hesitating to do this. >>> >> You can always try the reverse - replace fs/xfs from your mips build >> tree with the one from the current/a recent kernel. Theres very few >> changes in the surrounding kernel code that xfs needs. >> > > Eric should be able to comment on the pitfalls in doing this having > tried to backport a 2.6.25 fs/xfs to a 2.6.18 RHEL kernel. Eric - > any comments? > > Cheers, > > Dave. > Eric, Could you please let me know about bits and pieces that we need to remember while back porting xfs to 2.6.18? If you share patches which takes care of it, that would be great. Thanks Sagar ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Xfs Access to block zero exception and system crash 2008-07-02 11:02 ` Sagar Borikar @ 2008-07-03 4:03 ` Eric Sandeen 2008-07-03 5:14 ` Sagar Borikar 0 siblings, 1 reply; 48+ messages in thread From: Eric Sandeen @ 2008-07-03 4:03 UTC (permalink / raw) To: Sagar Borikar; +Cc: Nathan Scott, xfs Sagar Borikar wrote: > > Dave Chinner wrote: >> On Wed, Jul 02, 2008 at 04:13:11PM +1000, Nathan Scott wrote: >>> You can always try the reverse - replace fs/xfs from your mips build >>> tree with the one from the current/a recent kernel. Theres very few >>> changes in the surrounding kernel code that xfs needs. >>> >> Eric should be able to comment on the pitfalls in doing this having >> tried to backport a 2.6.25 fs/xfs to a 2.6.18 RHEL kernel. Eric - >> any comments? >> >> Cheers, >> >> Dave. >> > Eric, Could you please let me know about bits and pieces that we need to > remember while back porting xfs to 2.6.18? > If you share patches which takes care of it, that would be great. http://sandeen.net/rhel5_xfs/xfs-2.6.25-for-rhel5-testing.tar.bz2 should be pretty close. It was quick 'n' dirty and it has some warts but would give an idea of what backporting was done (see patches/ and the associated quilt series; quilt push -a to apply them all) -Eric ^ permalink raw reply [flat|nested] 48+ messages in thread
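Applying that series would look roughly like this; the top-level directory name inside the tarball is an assumption based on Eric's description.

#! /bin/sh
# Fetch Eric's backport tree and apply its quilt series.
wget http://sandeen.net/rhel5_xfs/xfs-2.6.25-for-rhel5-testing.tar.bz2
tar xjf xfs-2.6.25-for-rhel5-testing.tar.bz2
cd xfs-2.6.25-for-rhel5-testing  # assumed top-level directory name
quilt push -a                    # apply every patch in the series
quilt applied                    # confirm what ended up applied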
* Re: Xfs Access to block zero exception and system crash 2008-07-03 4:03 ` Eric Sandeen @ 2008-07-03 5:14 ` Sagar Borikar 2008-07-03 15:02 ` Eric Sandeen 0 siblings, 1 reply; 48+ messages in thread From: Sagar Borikar @ 2008-07-03 5:14 UTC (permalink / raw) To: Eric Sandeen; +Cc: Nathan Scott, xfs Eric Sandeen wrote: > Sagar Borikar wrote: > >> Dave Chinner wrote: >> >>> On Wed, Jul 02, 2008 at 04:13:11PM +1000, Nathan Scott wrote: >>> > > > >>>> You can always try the reverse - replace fs/xfs from your mips build >>>> tree with the one from the current/a recent kernel. Theres very few >>>> changes in the surrounding kernel code that xfs needs. >>>> >>>> >>> Eric should be able to comment on the pitfalls in doing this having >>> tried to backport a 2.6.25 fs/xfs to a 2.6.18 RHEL kernel. Eric - >>> any comments? >>> >>> Cheers, >>> >>> Dave. >>> >>> >> Eric, Could you please let me know about bits and pieces that we need to >> remember while back porting xfs to 2.6.18? >> If you share patches which takes care of it, that would be great. >> > > http://sandeen.net/rhel5_xfs/xfs-2.6.25-for-rhel5-testing.tar.bz2 > > should be pretty close. It was quick 'n' dirty and it has some warts > but would give an idea of what backporting was done (see patches/ and > the associated quilt series; quilt push -a to apply them all) > Thanks a lot Eric. I'll go through it .I am actually trying another option of regularly defragmenting the file system under stress. I wanted to understand couple of things for using xfs_fsr utility: 1. What should be the state of filesystem when I am running xfs_fsr. Ideally we should stop all io before running defragmentation. 2. How effective is the utility when ran on highly fragmented file system? I saw that if filesystem is 99.89% fragmented, the recovery is very slow. It took around 25 min to clean up 100GB JBOD volume and after that system was fragmented to 82%. So I was confused on how exactly the fragmentation works. Any pointers on probable optimum use of xfs_fsr? 3. Any precautions I need to take when working with that from data consistency, robustness point of view? Any disadvantages? 4. Any threshold for starting the defragmentation on xfs? Thanks Sagar > -Eric > ^ permalink raw reply [flat|nested] 48+ messages in thread
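For reference while reading the answers below, a typical xfs_fsr run looks something like this; the mount point and the two-hour time budget are example values only.

#! /bin/sh
# Defragment one mounted XFS filesystem, verbosely, for at most two hours;
# files that cannot be locked or improved are skipped.
xfs_fsr -v -t 7200 /mnt/RAIDA/vol
# or target a single badly fragmented file (example path)
xfs_fsr -v /mnt/RAIDA/vol/some/large/file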
* Re: Xfs Access to block zero exception and system crash 2008-07-03 5:14 ` Sagar Borikar @ 2008-07-03 15:02 ` Eric Sandeen 2008-07-04 10:18 ` Sagar Borikar 0 siblings, 1 reply; 48+ messages in thread From: Eric Sandeen @ 2008-07-03 15:02 UTC (permalink / raw) To: Sagar Borikar; +Cc: Nathan Scott, xfs Sagar Borikar wrote: > > Eric Sandeen wrote: >>> Eric, Could you please let me know about bits and pieces that we need to >>> remember while back porting xfs to 2.6.18? >>> If you share patches which takes care of it, that would be great. >>> >> http://sandeen.net/rhel5_xfs/xfs-2.6.25-for-rhel5-testing.tar.bz2 >> >> should be pretty close. It was quick 'n' dirty and it has some warts >> but would give an idea of what backporting was done (see patches/ and >> the associated quilt series; quilt push -a to apply them all) >> > Thanks a lot Eric. I'll go through it .I am actually trying another > option of regularly defragmenting the file system under stress. Ok, but that won't get to the bottom of the problem. It might alleviate it at best, but if I were shipping a product using xfs I'd want to know that it was properly solved. :) The tarball above should give you almost everything you need to run your testcase with current xfs code on your older kernel to see if the bug persists or if it's been fixed upstream, in which case you have a relatively easy path to an actual solution that your customers can depend on. > I wanted to understand couple of things for using xfs_fsr utility: > > 1. What should be the state of filesystem when I am running xfs_fsr. > Ideally we should stop all io before running defragmentation. you can run in any state. Some files will not get defragmented due to busy-ness or other conditions; look at the xfs_swap_extents() function in the kernel which is very well documented; some cases return EBUSY. > 2. How effective is the utility when ran on highly fragmented file > system? I saw that if filesystem is 99.89% fragmented, the recovery is > very slow. It took around 25 min to clean up 100GB JBOD volume and after > that system was fragmented to 82%. So I was confused on how exactly the > fragmentation works. Again read the code, but basically it tries to preallocate as much space as the file is currently using, then checks that it is more contiguous space than the file currently has and if so, it copies the data from old to new and swaps the new allocation for the old. Note, this involves a fair amount of IO. Also don't get hung up on that fragmentation factor, at least not until you've read xfs_db code to see how it's reported, and you've thought about what that means. For example: a 100G filesystem with 10 10G files each with 5x2G extents will report 80% fragmentation. Now, ask yourself, is a 10G file in 5x2G extents "bad" fragmentation? > Any pointers on probable optimum use of xfs_fsr? > 3. Any precautions I need to take when working with that from data > consistency, robustness point of view? Any disadvantages? Anything which corrupts data is a bug, and I'm not aware of any such bugs in the defragmentation process. > 4. Any threshold for starting the defragmentation on xfs? Pretty well determined by your individual use case and requirements, I think. -Eric ^ permalink raw reply [flat|nested] 48+ messages in thread
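The factor xfs_db prints works out to (actual - ideal) / actual over extent counts; both Eric's worked example and the number measured on /dev/RAIDA/vol drop out of that arithmetic, as the small helper below shows.

#! /bin/sh
# fragmentation factor = (actual - ideal) / actual, as a percentage
frag() {
    awk -v a=$1 -v i=$2 'BEGIN { printf "%.2f%%\n", (a - i) * 100 / a }'
}
frag 50 10       # 10 files x 5 extents each vs 10 ideal -> 80.00%
frag 94343 107   # the /dev/RAIDA/vol measurement        -> 99.89%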
* Re: Xfs Access to block zero exception and system crash 2008-07-03 15:02 ` Eric Sandeen @ 2008-07-04 10:18 ` Sagar Borikar 2008-07-04 12:27 ` Dave Chinner 2008-07-04 15:33 ` Eric Sandeen 0 siblings, 2 replies; 48+ messages in thread From: Sagar Borikar @ 2008-07-04 10:18 UTC (permalink / raw) To: Eric Sandeen; +Cc: Nathan Scott, xfs [-- Attachment #1: Type: text/plain, Size: 4198 bytes --] Eric Sandeen wrote: > Sagar Borikar wrote: > >> Eric Sandeen wrote: >> > > > >>>> Eric, Could you please let me know about bits and pieces that we need to >>>> remember while back porting xfs to 2.6.18? >>>> If you share patches which takes care of it, that would be great. >>>> >>>> >>> http://sandeen.net/rhel5_xfs/xfs-2.6.25-for-rhel5-testing.tar.bz2 >>> >>> should be pretty close. It was quick 'n' dirty and it has some warts >>> but would give an idea of what backporting was done (see patches/ and >>> the associated quilt series; quilt push -a to apply them all) >>> >>> >> Thanks a lot Eric. I'll go through it .I am actually trying another >> option of regularly defragmenting the file system under stress. >> > > Ok, but that won't get to the bottom of the problem. It might alleviate > it at best, but if I were shipping a product using xfs I'd want to know > that it was properly solved. :) > > Even we too don't want to leave it as it is. I still am working on back porting the latest xfs code. Your patches are helping a lot . Just to check whether that issue lies with 2.6.18 or MIPS port, I tested it on 2.6.24 x86 platform. Here we created a loop back device of 10 GB and mounted xfs on that. What I observe that xfs_repair reports quite a few bad blocks and bad extents here as well. So is developing bad blocks and extents normal behavior in xfs which would be recovered in background or is it a bug? I still didn't see the exception but the bad blocks and extents are generated within 10 minutes or running the tests. Attaching the log . > The tarball above should give you almost everything you need to run your > testcase with current xfs code on your older kernel to see if the bug > persists or if it's been fixed upstream, in which case you have a > relatively easy path to an actual solution that your customers can > depend on. > > >> I wanted to understand couple of things for using xfs_fsr utility: >> >> 1. What should be the state of filesystem when I am running xfs_fsr. >> Ideally we should stop all io before running defragmentation. >> > > you can run in any state. Some files will not get defragmented due to > busy-ness or other conditions; look at the xfs_swap_extents() function > in the kernel which is very well documented; some cases return EBUSY. > > >> 2. How effective is the utility when ran on highly fragmented file >> system? I saw that if filesystem is 99.89% fragmented, the recovery is >> very slow. It took around 25 min to clean up 100GB JBOD volume and after >> that system was fragmented to 82%. So I was confused on how exactly the >> fragmentation works. >> > > Again read the code, but basically it tries to preallocate as much space > as the file is currently using, then checks that it is more contiguous > space than the file currently has and if so, it copies the data from old > to new and swaps the new allocation for the old. Note, this involves a > fair amount of IO. > > Also don't get hung up on that fragmentation factor, at least not until > you've read xfs_db code to see how it's reported, and you've thought > about what that means. 
For example: a 100G filesystem with 10 10G files > each with 5x2G extents will report 80% fragmentation. Now, ask > yourself, is a 10G file in 5x2G extents "bad" fragmentation? > > Agreed as in x86 too I see 99.12% fragmentation when I run above mentioned test. and xfs_fsr doesn't help much even after freezing the file system. >> Any pointers on probable optimum use of xfs_fsr? >> 3. Any precautions I need to take when working with that from data >> consistency, robustness point of view? Any disadvantages? >> > > Anything which corrupts data is a bug, and I'm not aware of any such > bugs in the defragmentation process. > > Assuming that we get some improvement by running xfs_fsr, is it safe to run regularly in some periodic interval the defragmentation utility? >> 4. Any threshold for starting the defragmentation on xfs? >> > > Pretty well determined by your individual use case and requirements, I > think. > > -Eric > Thanks for the detailed response Eric. Sagar [-- Attachment #2: xfs_repair_log --] [-- Type: text/plain, Size: 4444 bytes --] bad nblocks 13345 for inode 50331785, would reset to 19431 bad nextents 156 for inode 50331785, would reset to 251 - process newly discovered inodes... Phase 4 - check for duplicate blocks... - setting up duplicate extent list... - check for inodes claiming duplicate blocks... - agno = 1 - agno = 0 entry "testfile" in shortform directory 132 references free inode 142 would have junked entry "testfile" in directory inode 132 entry "testfile" in shortform directory 138 references free inode 143 would have junked entry "testfile" in directory inode 138 entry "testfile" in shortform directory 140 references free inode 144 would have junked entry "testfile" in directory inode 140 bad nblocks 15848 for inode 141, would reset to 18634 bad nextents 269 for inode 141, would reset to 306 bad nblocks 18888 for inode 16777350, would reset to 19144 bad nextents 303 for inode 16777350, would reset to 309 bad nblocks 18704 for inode 16777351, would reset to 19144 bad nextents 291 for inode 16777351, would reset to 299 bad fwd (right) sibling pointer (saw 107678 should be NULLDFSBNO) in inode 142 ((null) fork) bmap btree block 236077307437232 would have cleared inode 142 bad fwd (right) sibling pointer (saw 1139882 should be NULLDFSBNO) in inode 143 ((null) fork) bmap btree block 4556402090352816 would have cleared inode 143 bad fwd (right) sibling pointer (saw 1138473 should be NULLDFSBNO) in inode 144 ((null) fork) bmap btree block 4564279060373680 would have cleared inode 144 bad nblocks 13825 for inode 145, would reset to 18503 bad nextents 221 for inode 145, would reset to 222 - agno = 2 entry "testfile" in shortform directory 33595588 references free inode 33595593 would have junked entry "testfile" in directory inode 33595588 bad nblocks 18704 for inode 33595589, would reset to 19121 bad nextents 306 for inode 33595589, would reset to 314 bad nblocks 18704 for inode 33595590, would reset to 19432 bad nextents 302 for inode 33595590, would reset to 313 bad nblocks 18640 for inode 33595591, would reset to 19432 bad nextents 311 for inode 33595591, would reset to 317 bad nblocks 18888 for inode 33595592, would reset to 19432 bad nextents 312 for inode 33595592, would reset to 322 bad fwd (right) sibling pointer (saw 104113 should be NULLDFSBNO) in inode 33595593 ((null) fork) bmap btree block 9041060911947952 would have cleared inode 33595593 - agno = 3 bad nblocks 18888 for inode 50331781, would reset to 19432 bad nextents 315 for inode 50331781, would reset to 
324 bad nblocks 18888 for inode 50331782, would reset to 19432 bad nextents 326 for inode 50331782, would reset to 333 bad nblocks 18888 for inode 50331783, would reset to 19432 bad nblocks 18428 for inode 50331784, would reset to 19784 bad nextents 285 for inode 50331784, would reset to 306 bad nblocks 18704 for inode 16777352, would reset to 19144 bad nextents 311 for inode 16777352, would reset to 315 bad nblocks 13345 for inode 50331785, would reset to 19431 bad nextents 156 for inode 50331785, would reset to 251 bad nblocks 18888 for inode 16777353, would reset to 19144 bad nextents 318 for inode 16777353, would reset to 321 No modify flag set, skipping phase 5 Phase 6 - check inode connectivity... - traversing filesystem ... - agno = 0 entry "testfile" in shortform directory inode 132 points to free inode 142would junk entry entry "testfile" in shortform directory inode 138 points to free inode 143would junk entry entry "testfile" in shortform directory inode 140 points to free inode 144would junk entry - agno = 1 - agno = 2 entry "testfile" in shortform directory inode 33595588 points to free inode 33595593would junk entry - agno = 3 - traversal finished ... - moving disconnected inodes to lost+found ... Phase 7 - verify link counts... No modify flag set, skipping filesystem flush and exiting. XFS_REPAIR Summary Fri Jul 4 15:34:47 2008 Phase Start End Duration Phase 1: 07/04 15:34:00 07/04 15:34:04 4 seconds Phase 2: 07/04 15:34:04 07/04 15:34:31 27 seconds Phase 3: 07/04 15:34:31 07/04 15:34:47 16 seconds Phase 4: 07/04 15:34:47 07/04 15:34:47 Phase 5: Skipped Phase 6: 07/04 15:34:47 07/04 15:34:47 Phase 7: 07/04 15:34:47 07/04 15:34:47 Total run time: 47 seconds ^ permalink raw reply [flat|nested] 48+ messages in thread
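For anyone wanting to recreate the x86 setup described above, a 10 GB loop-mounted XFS filesystem can be put together as sketched here; the backing-file and mount-point names match the mtab shown later in the thread, the rest is assumption.

#! /bin/sh
# Build a ~10 GB XFS image backed by a sparse file and loop-mount it.
dd if=/dev/zero of=/mnt/xfstest bs=1M count=1 seek=10239
mkfs.xfs -f /mnt/xfstest
mkdir -p /root/test_partition
mount -o loop /mnt/xfstest /root/test_partition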
* Re: Xfs Access to block zero exception and system crash 2008-07-04 10:18 ` Sagar Borikar @ 2008-07-04 12:27 ` Dave Chinner 2008-07-04 17:30 ` Sagar Borikar 2008-07-04 15:33 ` Eric Sandeen 1 sibling, 1 reply; 48+ messages in thread From: Dave Chinner @ 2008-07-04 12:27 UTC (permalink / raw) To: Sagar Borikar; +Cc: Eric Sandeen, Nathan Scott, xfs On Fri, Jul 04, 2008 at 03:48:24PM +0530, Sagar Borikar wrote: > Even we too don't want to leave it as it is. I still am working on back > porting the latest xfs code. > Your patches are helping a lot . > Just to check whether that issue lies with 2.6.18 or MIPS port, I tested > it on 2.6.24 x86 platform. > Here we created a loop back device of 10 GB and mounted xfs on that. And the script that generates the workload can be found where? Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 48+ messages in thread
* RE: Xfs Access to block zero exception and system crash 2008-07-04 12:27 ` Dave Chinner @ 2008-07-04 17:30 ` Sagar Borikar 2008-07-04 17:35 ` Eric Sandeen 0 siblings, 1 reply; 48+ messages in thread From: Sagar Borikar @ 2008-07-04 17:30 UTC (permalink / raw) To: Dave Chinner; +Cc: Eric Sandeen, Nathan Scott, xfs The script is pretty straightforward: while [ 1 ] do cp -f $1 $2 done Where I pass the first parameter as the 300+ MB file in one directory and $2 is the other directory. I run 30 instances of the script in parallel. Thanks Sagar On Fri, Jul 04, 2008 at 03:48:24PM +0530, Sagar Borikar wrote: > Even we too don't want to leave it as it is. I still am working on back > porting the latest xfs code. > Your patches are helping a lot . > Just to check whether that issue lies with 2.6.18 or MIPS port, I tested > it on 2.6.24 x86 platform. > Here we created a loop back device of 10 GB and mounted xfs on that. And the script that generates the workload can be found where? Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Xfs Access to block zero exception and system crash 2008-07-04 17:30 ` Sagar Borikar @ 2008-07-04 17:35 ` Eric Sandeen 2008-07-04 17:51 ` Sagar Borikar 0 siblings, 1 reply; 48+ messages in thread From: Eric Sandeen @ 2008-07-04 17:35 UTC (permalink / raw) To: Sagar Borikar; +Cc: Dave Chinner, Nathan Scott, xfs Sagar Borikar wrote: > The script is pretty straightforward: > > while [ 1 ] > do > cp -f $1 $2 > done > > Where I pass the first parameter as the 300+ MB file in one directory > and $2 is the other directory. I run 30 instances of the script in > parallel. Copying the same file to the same directory, or 30 different files to 30 different directories? Or the same file to 30 different directories? If different directories, what is the layout of the target directories? Etc... -Eric ^ permalink raw reply [flat|nested] 48+ messages in thread
* RE: Xfs Access to block zero exception and system crash 2008-07-04 17:35 ` Eric Sandeen @ 2008-07-04 17:51 ` Sagar Borikar 2008-07-05 16:25 ` Eric Sandeen 2008-07-06 4:19 ` Dave Chinner 0 siblings, 2 replies; 48+ messages in thread From: Sagar Borikar @ 2008-07-04 17:51 UTC (permalink / raw) To: Eric Sandeen; +Cc: Dave Chinner, Nathan Scott, xfs Copy is of the same file to 30 different directories and it is basically overwrite. Here is the setup: It's a JBOD with Volume size 20 GB. The directories are empty and this is basically continuous copy of the file on all thirty directories. But surprisingly none of the copy succeeds. All the copy processes are in Uninterruptible sleep state and xfs_repair log I have already attached With the prep. As mentioned it is with 2.6.24 Fedora kernel. Thanks Sagar -----Original Message----- From: Eric Sandeen [mailto:sandeen@sandeen.net] Sent: Friday, July 04, 2008 11:05 PM To: Sagar Borikar Cc: Dave Chinner; Nathan Scott; xfs@oss.sgi.com Subject: Re: Xfs Access to block zero exception and system crash Sagar Borikar wrote: > The script is pretty straight forward: > > While [ 1 ] > Do > Cp -f $1 $2 > Done > > Where I pass the first parameter as the 300+ MB file in one directory > and $2 are is other directory. I run 30 instances of the script in > parallel. Copying the same file to the same directory, or 30 different files to 30 different directories? Or the ame file to 30 different directories? If different directories what is the layout of the target directories? Etc... -Eric ^ permalink raw reply [flat|nested] 48+ messages in thread
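When the copies wedge in uninterruptible sleep like this, dumping the blocked tasks' kernel stacks usually shows where in XFS they are waiting; on the 2.6.24 test box that can be done through sysrq, assuming sysrq is enabled (older kernels may only have the 't' key, which dumps every task instead).

#! /bin/sh
# Dump kernel stacks of blocked (D-state) tasks into the kernel log.
echo 1 > /proc/sys/kernel/sysrq     # make sure sysrq is available
echo w > /proc/sysrq-trigger        # 'w' = show blocked tasks
dmesg | tail -n 200                 # traces land in the kernel log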
* Re: Xfs Access to block zero exception and system crash 2008-07-04 17:51 ` Sagar Borikar @ 2008-07-05 16:25 ` Eric Sandeen 2008-07-06 17:24 ` Sagar Borikar 2008-07-06 4:19 ` Dave Chinner 1 sibling, 1 reply; 48+ messages in thread From: Eric Sandeen @ 2008-07-05 16:25 UTC (permalink / raw) To: Sagar Borikar; +Cc: Dave Chinner, Nathan Scott, xfs Sagar Borikar wrote: > Copy is of the same file to 30 different directories and it is basically > overwrite. > > Here is the setup: > > It's a JBOD with Volume size 20 GB. The directories are empty and this > is basically continuous copy of the file on all thirty directories. But > surprisingly none of the copy succeeds. All the copy processes are in > Uninterruptible sleep state and xfs_repair log I have already attached > With the prep. As mentioned it is with 2.6.24 Fedora kernel. It would probably be best to try a 2.6.26 kernel from rawhide to be sure you're closest to the bleeding edge. I tested on 2.6.24.7-92.fc8 on x86_64, and I did this, specifically, in the root of a 30G xfs fs: # for I in `seq 1 30`; do mkdir dir$I; done # vi copyit.sh (your script) # chmod +x copyit.sh # dd if=/dev/zero of=300mbfile bs=1M count=300 # for I in `seq 1 30`; do ./copyit.sh 300mbfile dir$I & done I got no errors or corruption after several iterations. Might also be worth checking dmesg for any errors when you run. -Eric ^ permalink raw reply [flat|nested] 48+ messages in thread
* RE: Xfs Access to block zero exception and system crash 2008-07-05 16:25 ` Eric Sandeen @ 2008-07-06 17:24 ` Sagar Borikar 2008-07-06 19:07 ` Eric Sandeen 0 siblings, 1 reply; 48+ messages in thread From: Sagar Borikar @ 2008-07-06 17:24 UTC (permalink / raw) To: Eric Sandeen; +Cc: Dave Chinner, Nathan Scott, xfs Sagar Borikar wrote: > Copy is of the same file to 30 different directories and it is basically > overwrite. > > Here is the setup: > > It's a JBOD with Volume size 20 GB. The directories are empty and this > is basically continuous copy of the file on all thirty directories. But > surprisingly none of the copy succeeds. All the copy processes are in > Uninterruptible sleep state and xfs_repair log I have already attached > With the prep. As mentioned it is with 2.6.24 Fedora kernel. It would probably be best to try a 2.6.26 kernel from rawhide to be sure you're closest to the bleeding edge. <Sagar> Sure Eric but I reran the test and I got similar errors with 2.6.24 kernel on x86. I am still confused with the results that I see on 2.6.24 kernel on x86 machine. I see that the used size shown by ls is way too huge than the actual size. Here is the log of the system [root@lab00 ~/test_partition]# ls -lSah total 202M -rw-r--r-- 1 root root 202M Jul 4 14:06 original ---> this I sthe file Which I copy. drwxr-x--- 65 root root 12K Jul 6 21:57 .. -rwxr-xr-x 1 root root 189 Jul 4 16:31 runall -rwxr-xr-x 1 root root 50 Jul 4 16:32 copy drwxr-xr-x 2 root root 45 Jul 6 22:07 . -------> Total size is roughly 202MB. [root@lab00 ~/test_partition]# df -lh . Filesystem Size Used Avail Use% Mounted on /mnt/xfstest 9.6G 7.7G 2.0G 80% /root/test_partition Size reported by df is 7.7G which is complete anomaly here. This is 10GB loopback partition and it mentions that only 2 GB is available. [root@lab00 ~/test_partition]# cat /etc/mtab /dev/mapper/VolGroup00-LogVol00 / ext3 rw 0 0 proc /proc proc rw 0 0 sysfs /sys sysfs rw 0 0 devpts /dev/pts devpts rw,gid=5,mode=620 0 0 /dev/sda1 /boot ext3 rw 0 0 tmpfs /dev/shm tmpfs rw 0 0 automount(pid3151) /net autofs rw,fd=4,pgrp=3151,minproto=2,maxproto=4 0 0 /mnt/xfstest /root/test_partition xfs rw,loop=/dev/loop0 0 0 ---> XFS partition. Here is the fragmentation result [root@lab00 ~/test_partition]# xfs_db -c frag -r /mnt/xfstest actual 7781, ideal 32, fragmentation factor 99.59% Here is the kernel version: [root@lab00 ~/test_partition]# uname -a Linux lab00 2.6.24 #1 SMP Fri Jul 4 12:20:56 IST 2008 i686 i686 i386 GNU/Linux I tested on 2.6.24.7-92.fc8 on x86_64, and I did this, specifically, in the root of a 30G xfs fs: # for I in `seq 1 30`; do mkdir dir$I; done # vi copyit.sh (your script) # chmod +x copyit.sh # dd if=/dev/zero of=300mbfile bs=1M count=300 # for I in `seq 1 30`; do ./copyit.sh 300mbfile dir$I & done I got no errors or corruption after several iterations. <Sagar> Surprising. I see it every time. I do it on 20 GB and 10GB partition on loopback device. When looked for the bad inode, Might also be worth checking dmesg for any errors when you run. <Sagar> dmesg log doesn't give any information. Here is XFS related info: XFS mounting filesystem loop0 Ending clean XFS mount for filesystem: loop0 Which is basically for mounting XFS cleanly. But there is no exception in XFS. Filesystem has become completely sluggish and response time is increased to 3-4 minutes for every command. Not a single copy is complete and all the copy processes are sleeping continuously. 
Xfs_repair starts reporting severe bugs: - agno = 1 entry "testfile" in shortform directory 16777472 references free inode 16777473 would have junked entry "testfile" in directory inode 16777472 - agno = 0 entry "testfile_3" at block 0 offset 664 in directory inode 128 references free inode 138 would clear inode number in entry at offset 664... entry "testfile_4" at block 0 offset 712 in directory inode 128 references free inode 140 would clear inode number in entry at offset 712... entry "testfile_5" at block 0 offset 760 in directory inode 128 references free inode 142 would clear inode number in entry at offset 760... entry "testfile_6" at block 0 offset 808 in directory inode 128 references free inode 143 would clear inode number in entry at offset 808... entry "testfile_7" at block 0 offset 856 in directory inode 128 references free inode 144 would clear inode number in entry at offset 856... entry "testfile_8" at block 0 offset 904 in directory inode 128 references free inode 146 would clear inode number in entry at offset 904... entry "testfile_9" at block 0 offset 952 in directory inode 128 references free inode 148 would clear inode number in entry at offset 952... entry "testfile_10" at block 0 offset 976 in directory inode 128 references free inode 149 would clear inode number in entry at offset 976... entry "testfile_12" at block 0 offset 1048 in directory inode 128 references free inode 150 would clear inode number in entry at offset 1048... entry "testfile_11" at block 0 offset 1072 in directory inode 128 references free inode 151 would clear inode number in entry at offset 1072... entry "testfile_13" at block 0 offset 1144 in directory inode 128 references free inode 154 data fork in ino 16777473 claims dup extent, off - 5266, start - 2164956, cnt 192 bad data fork in inode 16777473 would have cleared inode 16777473 entry "testfile" in shortform directory 16777474 references free inode 16777475 would have junked entry "testfile" in directory inode 16777474 would clear inode number in entry at offset 1144... entry "testfile_14" at block 0 offset 1168 in directory inode 128 references free inode 155 would clear inode number in entry at offset 1168... entry "testfile_15" at block 0 offset 1240 in directory inode 128 references free inode 156 would clear inode number in entry at offset 1240... entry "testfile_16" at block 0 offset 1264 in directory inode 128 references free inode 157 would clear inode number in entry at offset 1264... entry "testfile_17" at block 0 offset 1336 in directory inode 128 references free inode 160 would clear inode number in entry at offset 1336... entry "testfile_18" at block 0 offset 1360 in directory inode 128 references free inode 161 would clear inode number in entry at offset 1360... entry "testfile_19" at block 0 offset 1432 in directory inode 128 references free inode 162 would clear inode number in entry at offset 1432... entry "testfile_20" at block 0 offset 1456 in directory inode 128 references free inode 163 would clear inode number in entry at offset 1456... entry "testfile_2" at block 0 offset 3032 in directory inode 128 references free inode 137 would clear inode number in entry at offset 3032... 
data fork in ino 16777475 claims dup extent, off - 8178, start - 3200553, cnt 104 bad data fork in inode 16777475 would have cleared inode 16777475 entry "testfile" in shortform directory 16777476 references free inode 16777477 would have junked entry "testfile" in directory inode 16777476 data fork in ino 16777477 claims dup extent, off - 9402, start - 3221565, cnt 56 bad data fork in inode 16777477 would have cleared inode 16777477 entry "testfile" in shortform directory 16777478 references free inode 16777479 would have junked entry "testfile" in directory inode 16777478 data fork in ino 16777479 claims dup extent, off - 9586, start - 170361, cnt 96 bad data fork in inode 16777479 would have cleared inode 16777479 entry "testfile" in shortform directory 16777480 references free inode 16777481 would have junked entry "testfile" in directory inode 16777480 data fork in ino 16777481 claims dup extent, off - 8338, start - 3203018, cnt 128 bad data fork in inode 16777481 would have cleared inode 16777481 - agno = 2 entry "testfile" in shortform directory 33595712 references free inode 33595713 would have junked entry "testfile" in directory inode 33595712 bad data fork in inode 33595713 would have cleared inode 33595713 entry "testfile" in shortform directory 33595714 references free inode 33595715 would have junked entry "testfile" in directory inode 33595714 imap claims in-use inode 33595715 is free, correcting imap entry "testfile" in shortform directory 33595716 references free inode 33595717 would have junked entry "testfile" in directory inode 33595716 data fork in ino 33595717 claims dup extent, off - 0, start - 3281880, cnt 6180 bad data fork in inode 33595717 would have cleared inode 33595717 entry "testfile" in shortform directory 33595718 references free inode 33595719 would have junked entry "testfile" in directory inode 33595718 bad data fork in inode 33595719 would have cleared inode 33595719 entry "testfile" in shortform directory 33595720 references free inode 33595721 would have junked entry "testfile" in directory inode 33595720 bad data fork in inode 33595721 would have cleared inode 33595721 - agno = 3 entry "testfile" in shortform directory 50331904 references free inode 50331905 would have junked entry "testfile" in directory inode 50331904 bad data fork in inode 50331905 would have cleared inode 50331905 entry "testfile" in shortform directory 50331906 references free inode 50331907 would have junked entry "testfile" in directory inode 50331906 data fork in ino 50331907 claims dup extent, off - 609, start - 3151886, cnt 311 bad data fork in inode 50331907 would have cleared inode 50331907 entry "testfile" in shortform directory 50331908 references free inode 50331909 would have junked entry "testfile" in directory inode 50331908 imap claims in-use inode 50331909 is free, correcting imap entry "testfile" in shortform directory 50331910 references free inode 50331911 would have junked entry "testfile" in directory inode 50331910 bad data fork in inode 50331911 would have cleared inode 50331911 entry "testfile" in shortform directory 50331912 references free inode 50331913 would have junked entry "testfile" in directory inode 50331912 data fork in ino 50331913 claims dup extent, off - 6358, start - 3224389, cnt 469 bad data fork in inode 50331913 would have cleared inode 50331913 data fork in regular inode 133 claims used block 1075592 would have cleared inode 133 data fork in regular inode 136 claims used block 1075930 would have cleared inode 136 data fork in regular 
inode 137 claims used block 2162044 would have cleared inode 137 data fork in regular inode 138 claims used block 1075938 would have cleared inode 138 entry "testfile" in shortform directory 139 references free inode 141 would have junked entry "testfile" in directory inode 139 data fork in ino 140 claims dup extent, off - 12298, start - 202587, cnt 30 bad data fork in inode 140 would have cleared inode 140 data fork in ino 141 claims dup extent, off - 8562, start - 160071, cnt 384 bad data fork in inode 141 would have cleared inode 141 data fork in ino 142 claims dup extent, off - 1458, start - 80521, cnt 32 bad data fork in inode 142 would have cleared inode 142 data fork in ino 143 claims dup extent, off - 13770, start - 235117, cnt 96 bad data fork in inode 143 would have cleared inode 143 bad magic # 0 in inode 144 (data fork) bmbt block 3262925 bad data fork in inode 144 would have cleared inode 144 entry "testfile" in shortform directory 145 references free inode 147 would have junked entry "testfile" in directory inode 145 data fork in ino 146 claims dup extent, off - 8082, start - 138272, cnt 32 bad data fork in inode 146 would have cleared inode 146 data fork in regular inode 147 claims used block 1075759 would have cleared inode 147 data fork in regular inode 148 claims used block 3231076 would have cleared inode 148 data fork in ino 149 claims dup extent, off - 9426, start - 168635, cnt 8 bad data fork in inode 149 would have cleared inode 149 data fork in ino 150 claims dup extent, off - 3607, start - 105990, cnt 59 bad data fork in inode 150 would have cleared inode 150 data fork in regular inode 151 claims used block 1076476 would have cleared inode 151 entry "testfile" in shortform directory 152 references free inode 153 would have junked entry "testfile" in directory inode 152 bad magic # 0 in inode 153 (data fork) bmbt block 3271407 bad data fork in inode 153 would have cleared inode 153 data fork in regular inode 154 claims used block 1076388 would have cleared inode 154 data fork in regular inode 155 claims used block 1076068 would have cleared inode 155 data fork in regular inode 156 claims used block 3224002 would have cleared inode 156 data fork in ino 157 claims dup extent, off - 9554, start - 170265, cnt 96 bad data fork in inode 157 would have cleared inode 157 entry "testfile" in shortform directory 158 references free inode 159 would have junked entry "testfile" in directory inode 158 data fork in regular inode 159 claims used block 1076564 would have cleared inode 159 data fork in ino 160 claims dup extent, off - 9394, start - 168489, cnt 8 bad data fork in inode 160 would have cleared inode 160 data fork in ino 161 claims dup extent, off - 14662, start - 253175, cnt 32 bad data fork in inode 161 would have cleared inode 161 data fork in regular inode 162 claims used block 2209542 would have cleared inode 162 bad magic # 0 in inode 163 (data fork) bmbt block 3270098 bad data fork in inode 163 would have cleared inode 163 No modify flag set, skipping phase 5 Phase 6 - check inode connectivity... - traversing filesystem ... 
- agno = 0 entry "testfile_3" in directory inode 128 points to free inode 138, would junk entry entry "testfile_4" in directory inode 128 points to free inode 140, would junk entry entry "testfile_5" in directory inode 128 points to free inode 142, would junk entry entry "testfile_6" in directory inode 128 points to free inode 143, would junk entry entry "testfile_7" in directory inode 128 points to free inode 144, would junk entry entry "testfile_8" in directory inode 128 points to free inode 146, would junk entry entry "testfile_9" in directory inode 128 points to free inode 148, would junk entry entry "testfile_10" in directory inode 128 points to free inode 149, would junk entry entry "testfile_12" in directory inode 128 points to free inode 150, would junk entry entry "testfile_11" in directory inode 128 points to free inode 151, would junk entry entry "testfile_13" in directory inode 128 points to free inode 154, would junk entry entry "testfile_14" in directory inode 128 points to free inode 155, would junk entry entry "testfile_15" in directory inode 128 points to free inode 156, would junk entry entry "testfile_16" in directory inode 128 points to free inode 157, would junk entry entry "testfile_17" in directory inode 128 points to free inode 160, would junk entry entry "testfile_18" in directory inode 128 points to free inode 161, would junk entry entry "testfile_19" in directory inode 128 points to free inode 162, would junk entry entry "testfile_20" in directory inode 128 points to free inode 163, would junk entry entry "testfile_1" in directory inode 128 points to free inode 136, would junk entry entry "testfile_2" in directory inode 128 points to free inode 137, would junk entry bad hash table for directory inode 128 (no data entry): would rebuild entry "testfile" in shortform directory inode 132 points to free inode 133would junk entry entry "testfile" in shortform directory inode 139 points to free inode 141would junk entry entry "testfile" in shortform directory inode 145 points to free inode 147would junk entry entry "testfile" in shortform directory inode 152 points to free inode 153would junk entry entry "testfile" in shortform directory inode 158 points to free inode 159would junk entry - agno = 1 entry "testfile" in shortform directory inode 16777472 points to free inode 16777473would junk entry entry "testfile" in shortform directory inode 16777474 points to free inode 16777475would junk entry entry "testfile" in shortform directory inode 16777476 points to free inode 16777477would junk entry entry "testfile" in shortform directory inode 16777478 points to free inode 16777479would junk entry entry "testfile" in shortform directory inode 16777480 points to free inode 16777481would junk entry - agno = 2 entry "testfile" in shortform directory inode 33595712 points to free inode 33595713would junk entry entry "testfile" in shortform directory inode 33595716 points to free inode 33595717would junk entry entry "testfile" in shortform directory inode 33595718 points to free inode 33595719would junk entry entry "testfile" in shortform directory inode 33595720 points to free inode 33595721would junk entry - agno = 3 entry "testfile" in shortform directory inode 50331904 points to free inode 50331905would junk entry entry "testfile" in shortform directory inode 50331906 points to free inode 50331907would junk entry entry "testfile" in shortform directory inode 50331910 points to free inode 50331911would junk entry entry "testfile" in shortform directory inode 50331912 points 
to free inode 50331913would junk entry - traversal finished ... - moving disconnected inodes to lost+found ... Phase 7 - verify link counts... No modify flag set, skipping filesystem flush and exiting. XFS_REPAIR Summary Sun Jul 6 22:43:36 2008 Phase Start End Duration Phase 1: 07/06 22:39:18 07/06 22:39:33 15 seconds Phase 2: 07/06 22:39:33 07/06 22:41:47 2 minutes, 14 seconds Phase 3: 07/06 22:41:47 07/06 22:43:15 1 minute, 28 seconds Phase 4: 07/06 22:43:15 07/06 22:43:36 21 seconds Phase 5: Skipped Phase 6: 07/06 22:43:36 07/06 22:43:36 Phase 7: 07/06 22:43:36 07/06 22:43:36 Total run time: 4 minutes, 18 seconds When checked for bad inode in xfs_db, then the parent inode was shown as -1 I presume it should point to right parent directory inode. 1: byte offset 2560065792, length 256 buffer block 5000128 (fsbno 1048592), 8 bbs inode 16777473, dir inode -1, type inode I don't know what I am doing wrong here. Sagar ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Xfs Access to block zero exception and system crash 2008-07-06 17:24 ` Sagar Borikar @ 2008-07-06 19:07 ` Eric Sandeen 2008-07-07 3:02 ` Sagar Borikar 0 siblings, 1 reply; 48+ messages in thread From: Eric Sandeen @ 2008-07-06 19:07 UTC (permalink / raw) To: Sagar Borikar; +Cc: Dave Chinner, Nathan Scott, xfs Sagar Borikar wrote: > Sagar Borikar wrote: >> Copy is of the same file to 30 different directories and it is > basically >> overwrite. >> >> Here is the setup: >> >> It's a JBOD with Volume size 20 GB. The directories are empty and this >> is basically continuous copy of the file on all thirty directories. > But >> surprisingly none of the copy succeeds. All the copy processes are in >> Uninterruptible sleep state and xfs_repair log I have already attached > >> With the prep. As mentioned it is with 2.6.24 Fedora kernel. > > It would probably be best to try a 2.6.26 kernel from rawhide to be sure > you're closest to the bleeding edge. > > <Sagar> Sure Eric but I reran the test and I got similar errors with > 2.6.24 kernel on x86. I am still confused with the results that I see on > 2.6.24 kernel on x86 machine. I see that the used size shown by ls is > way too huge than the actual size. Here is the log of the system > > [root@lab00 ~/test_partition]# ls -lSah > total 202M > -rw-r--r-- 1 root root 202M Jul 4 14:06 original ---> this I sthe file > Which I copy. > drwxr-x--- 65 root root 12K Jul 6 21:57 .. > -rwxr-xr-x 1 root root 189 Jul 4 16:31 runall > -rwxr-xr-x 1 root root 50 Jul 4 16:32 copy > drwxr-xr-x 2 root root 45 Jul 6 22:07 . It'd be great if you provided these actual scripts so we don't have to guess at what you're doing or work backwards from the repair output :) > dmesg log doesn't give any information. Here is XFS related > info: > > XFS mounting filesystem loop0 > Ending clean XFS mount for filesystem: loop0 > Which is basically for mounting XFS cleanly. But there is no exception > in XFS. and nothing else of interest either? > Filesystem has become completely sluggish and response time is increased > to > 3-4 minutes for every command. Not a single copy is complete and all > the copy processes are sleeping continuously. And how did you recover from this; did you power-cycle the box? -Eric ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Xfs Access to block zero exception and system crash 2008-07-06 19:07 ` Eric Sandeen @ 2008-07-07 3:02 ` Sagar Borikar 2008-07-07 3:04 ` Eric Sandeen 0 siblings, 1 reply; 48+ messages in thread From: Sagar Borikar @ 2008-07-07 3:02 UTC (permalink / raw) To: Eric Sandeen; +Cc: Dave Chinner, Nathan Scott, xfs [-- Attachment #1: Type: text/plain, Size: 2363 bytes --] Eric Sandeen wrote: > Sagar Borikar wrote: > >> Sagar Borikar wrote: >> >>> Copy is of the same file to 30 different directories and it is >>> >> basically >> >>> overwrite. >>> >>> Here is the setup: >>> >>> It's a JBOD with Volume size 20 GB. The directories are empty and this >>> is basically continuous copy of the file on all thirty directories. >>> >> But >> >>> surprisingly none of the copy succeeds. All the copy processes are in >>> Uninterruptible sleep state and xfs_repair log I have already attached >>> >>> With the prep. As mentioned it is with 2.6.24 Fedora kernel. >>> >> It would probably be best to try a 2.6.26 kernel from rawhide to be sure >> you're closest to the bleeding edge. >> >> <Sagar> Sure Eric but I reran the test and I got similar errors with >> 2.6.24 kernel on x86. I am still confused with the results that I see on >> 2.6.24 kernel on x86 machine. I see that the used size shown by ls is >> way too huge than the actual size. Here is the log of the system >> >> [root@lab00 ~/test_partition]# ls -lSah >> total 202M >> -rw-r--r-- 1 root root 202M Jul 4 14:06 original ---> this I sthe file >> Which I copy. >> drwxr-x--- 65 root root 12K Jul 6 21:57 .. >> -rwxr-xr-x 1 root root 189 Jul 4 16:31 runall >> -rwxr-xr-x 1 root root 50 Jul 4 16:32 copy >> drwxr-xr-x 2 root root 45 Jul 6 22:07 . >> > > It'd be great if you provided these actual scripts so we don't have to > guess at what you're doing or work backwards from the repair output :) > Attaching the scripts with this mail. > >> dmesg log doesn't give any information. Here is XFS related >> info: >> >> XFS mounting filesystem loop0 >> Ending clean XFS mount for filesystem: loop0 >> Which is basically for mounting XFS cleanly. But there is no exception >> in XFS. >> > > and nothing else of interest either? > Not really. That's why it was surprising. Even after setting the error_level to 11 > >> Filesystem has become completely sluggish and response time is increased >> to >> 3-4 minutes for every command. Not a single copy is complete and all >> the copy processes are sleeping continuously. >> > > And how did you recover from this; did you power-cycle the box? > There was no failure. Only the processes were stalled. System was operative. > -Eric > [-- Attachment #2: copy --] [-- Type: text/plain, Size: 50 bytes --] #! /bin/sh while [ 1 ] do cp -f $1 $2 done [-- Attachment #3: runall --] [-- Type: text/plain, Size: 189 bytes --] #! /bin/sh for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 do mkdir -p testdir_$i ./copy testfile testdir_$i & rm -Rf testdir_$1/testfile ./copy testfile testfile_$i & done ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Xfs Access to block zero exception and system crash 2008-07-07 3:02 ` Sagar Borikar @ 2008-07-07 3:04 ` Eric Sandeen 2008-07-07 3:07 ` Sagar Borikar 0 siblings, 1 reply; 48+ messages in thread From: Eric Sandeen @ 2008-07-07 3:04 UTC (permalink / raw) To: Sagar Borikar; +Cc: Dave Chinner, Nathan Scott, xfs Sagar Borikar wrote: > There was no failure. Only the processes were stalled. System was > operative. I'm curious, if the processes were stalled, how did you unmount the filesystem to run repair on it? -Eric ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Xfs Access to block zero exception and system crash 2008-07-07 3:04 ` Eric Sandeen @ 2008-07-07 3:07 ` Sagar Borikar 2008-07-07 3:11 ` Eric Sandeen 0 siblings, 1 reply; 48+ messages in thread From: Sagar Borikar @ 2008-07-07 3:07 UTC (permalink / raw) To: Eric Sandeen; +Cc: Dave Chinner, Nathan Scott, xfs Eric Sandeen wrote: > Sagar Borikar wrote: > > > >> There was no failure. Only the processes were stalled. System was >> operative. >> > > > I'm curious, if the processes were stalled, how did you unmount the > filesystem to run repair on it? > > -Eric > I ran with -n option. xfs_repair -fvn /root/test_partition Sagar ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Xfs Access to block zero exception and system crash 2008-07-07 3:07 ` Sagar Borikar @ 2008-07-07 3:11 ` Eric Sandeen 2008-07-07 3:17 ` Sagar Borikar 0 siblings, 1 reply; 48+ messages in thread From: Eric Sandeen @ 2008-07-07 3:11 UTC (permalink / raw) To: Sagar Borikar; +Cc: Dave Chinner, Nathan Scott, xfs Sagar Borikar wrote: > > Eric Sandeen wrote: >> Sagar Borikar wrote: >> >> >> >>> There was no failure. Only the processes were stalled. System was >>> operative. >>> >> >> I'm curious, if the processes were stalled, how did you unmount the >> filesystem to run repair on it? >> >> -Eric >> > I ran with -n option. > > xfs_repair -fvn /root/test_partition oh.... So, you basically ran repair on a live, mounted filesystem; it's expected that it would not be consistent at this point. So, the errors you are seeing on this x86 are likely not related to those you see on mips. (the D state process might be interesting and worth looking into, but probably not related to the problem you're trying to solve.) -Eric ^ permalink raw reply [flat|nested] 48+ messages in thread
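As a minimal sketch of the point Eric is making here, a dry-run check that can actually be trusted would normally be done against the unmounted device; the backing-file name is a placeholder, and the unmount will of course hang or fail while processes are still stuck in D state on the filesystem:

    umount /root/test_partition                        # fails/hangs while the cp's hold it busy
    xfs_repair -n /dev/loop0                           # dry run against the (now idle) block device
    mount -o loop /root/xfs.img /root/test_partition   # remount afterwards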
* Re: Xfs Access to block zero exception and system crash
  2008-07-07  3:11 ` Eric Sandeen
@ 2008-07-07  3:17   ` Sagar Borikar
  2008-07-07  3:22     ` Eric Sandeen
  0 siblings, 1 reply; 48+ messages in thread
From: Sagar Borikar @ 2008-07-07 3:17 UTC (permalink / raw)
To: Eric Sandeen; +Cc: Dave Chinner, Nathan Scott, xfs

Eric Sandeen wrote:
> Sagar Borikar wrote:
>> Eric Sandeen wrote:
>>> Sagar Borikar wrote:
>>>> There was no failure. Only the processes were stalled. System was
>>>> operative.
>>>
>>> I'm curious, if the processes were stalled, how did you unmount the
>>> filesystem to run repair on it?
>>>
>>> -Eric
>>
>> I ran with -n option.
>>
>> xfs_repair -fvn /root/test_partition
>
> oh....
>
> So, you basically ran repair on a live, mounted filesystem; it's
> expected that it would not be consistent at this point.
>
> So, the errors you are seeing on this x86 are likely not related to
> those you see on mips. (the D state process might be interesting and
> worth looking into, but probably not related to the problem you're
> trying to solve.)
>
> -Eric

Ok. But then I was surprised as to why the copy is not successful. Here is
the ps output:

root 29200 0.0 0.1 2088 652 ? D 01:41 0:00 cp -f testfile testdir_16
root 29201 0.0 0.1 2088 648 ? D 01:41 0:00 cp -f testfile testfile_16
root 29202 0.0 0.1 2088 648 ? D 01:41 0:00 cp -f testfile testfile_14
root 29203 0.0 0.1 2088 648 ? D 01:41 0:00 cp -f testfile testfile_2
root 29204 0.0 0.1 2088 652 ? D 01:41 0:00 cp -f testfile testdir_9
root 29205 0.0 0.1 2088 648 ? D 01:41 0:00 cp -f testfile testfile_5
root 29206 0.0 0.1 2088 652 ? D 01:41 0:00 cp -f testfile testdir_3
root 29207 0.0 0.1 2088 648 ? D 01:41 0:00 cp -f testfile testfile_15
root 29208 0.0 0.1 2088 648 ? D 01:41 0:00 cp -f testfile testdir_2
root 29209 0.0 0.1 2088 652 ? D 01:41 0:00 cp -f testfile testdir_12
root 29210 0.0 0.1 2088 644 ? D 01:41 0:00 cp -f testfile testfile_10
root 29211 0.0 0.1 2088 648 ? D 01:41 0:00 cp -f testfile testfile_4
root 29212 0.0 0.1 2088 652 ? D 01:41 0:00 cp -f testfile testdir_13
root 29213 0.0 0.1 2088 648 ? D 01:41 0:00 cp -f testfile testfile_20
root 29214 0.0 0.1 2088 648 ? D 01:41 0:00 cp -f testfile testdir_20
root 29215 0.0 0.1 2088 656 ? D 01:41 0:00 cp -f testfile testdir_18
root 29216 0.0 0.1 2088 644 ? D 01:41 0:00 cp -f testfile testfile_13
root 29217 0.0 0.1 2088 648 ? D 01:41 0:00 cp -f testfile testdir_1
root 29218 0.0 0.1 2088 652 ? D 01:41 0:00 cp -f testfile testdir_8
root 29219 0.0 0.1 2088 648 ? D 01:41 0:00 cp -f testfile testfile_11
root 29220 0.0 0.1 2088 652 ? D 01:41 0:00 cp -f testfile testdir_6
root 29221 0.0 0.1 2088 644 ? D 01:41 0:00 cp -f testfile testfile_6
root 29222 0.0 0.1 2088 652 ? D 01:41 0:00 cp -f testfile testdir_10
root 29223 0.0 0.1 2088 652 ? D 01:41 0:00 cp -f testfile testdir_14
root 29224 0.0 0.1 2088 648 ? D 01:41 0:00 cp -f testfile testfile_19
root 29225 0.0 0.1 2088 644 ? D 01:41 0:00 cp -f testfile testfile_12
root 29226 0.0 0.1 2088 652 ? D 01:41 0:00 cp -f testfile testdir_5
root 29227 0.0 0.1 2088 648 ? D 01:41 0:00 cp -f testfile testdir_11
root 29228 0.0 0.1 2088 648 ? D 01:41 0:00 cp -f testfile testfile_8
root 29229 0.0 0.1 2088 652 ? D 01:41 0:00 cp -f testfile testdir_4
root 29230 0.0 0.1 2088 652 ? D 01:41 0:00 cp -f testfile testdir_17
root 29231 0.0 0.1 2088 644 ? D 01:41 0:00 cp -f testfile testfile_18
root 29232 0.0 0.1 2088 648 ? D 01:41 0:00 cp -f testfile testdir_15
root 29233 0.0 0.1 2088 648 ? D 01:41 0:00 cp -f testfile testfile_7
root 29234 0.0 0.1 2088 644 ? D 01:41 0:00 cp -f testfile testfile_3
root 29235 0.0 0.1 2088 644 ? D 01:41 0:00 cp -f testfile testfile_1
root 29236 0.0 0.1 2088 648 ? D 01:41 0:00 cp -f testfile testfile_17
root 29237 0.0 0.1 2088 652 ? D 01:41 0:00 cp -f testfile testdir_7
root 29238 0.0 0.1 2088 648 ? D 01:41 0:00 cp -f testfile testdir_19
root 29239 0.0 0.1 2088 648 ? D 01:41 0:00 cp -f testfile testfile_9

All the copies are pending and the file size in those directories is
constant. It is not increasing. And as the processes are in D state, the
filesystem is marked as busy and I can't unmount it.

Thanks
Sagar

^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Xfs Access to block zero exception and system crash 2008-07-07 3:17 ` Sagar Borikar @ 2008-07-07 3:22 ` Eric Sandeen 2008-07-07 3:42 ` Sagar Borikar 0 siblings, 1 reply; 48+ messages in thread From: Eric Sandeen @ 2008-07-07 3:22 UTC (permalink / raw) To: Sagar Borikar; +Cc: Dave Chinner, Nathan Scott, xfs Sagar Borikar wrote: > All the the copies are pending and file size in those directories is > constant. It is not > increasing. > And as the processes are in D state, the file system is marked as busy > and I can't unmount > it. Understood. It looks like you've deadlocked somewhere. But, this is not the problem you are really trying to solve, right? You just were trying to recreate the mips problem on x86? If you want, do a sysrq-t to get traces of all those cp's to see where they're stuck, but this probably isn't getting you much closer to solving the original problem. (BTW: is this the exact same testcase that led to the block 0 access on mips which started this thread?) -Eric ^ permalink raw reply [flat|nested] 48+ messages in thread
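For reference, a minimal way to capture the sysrq-t traces Eric asks for, assuming the kernel was built with CONFIG_MAGIC_SYSRQ, is something like:

    echo 1 > /proc/sys/kernel/sysrq    # make sure the sysrq interface is enabled
    echo t > /proc/sysrq-trigger       # dump every task's stack trace to the kernel log
    dmesg > /tmp/sysrq-t.txt           # save the log; the cp tasks' traces show where they sleep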
* Re: Xfs Access to block zero exception and system crash
  2008-07-07  3:22 ` Eric Sandeen
@ 2008-07-07  3:42   ` Sagar Borikar
  [not found]           ` <487191C2.6090803@sandeen.net>
  2008-07-07  3:47       ` Eric Sandeen
  0 siblings, 2 replies; 48+ messages in thread
From: Sagar Borikar @ 2008-07-07 3:42 UTC (permalink / raw)
To: Eric Sandeen; +Cc: Dave Chinner, Nathan Scott, xfs

Eric Sandeen wrote:
> Sagar Borikar wrote:
>> All the copies are pending and the file size in those directories is
>> constant. It is not increasing.
>> And as the processes are in D state, the filesystem is marked as busy
>> and I can't unmount it.
>
> Understood.  It looks like you've deadlocked somewhere.  But, this is
> not the problem you are really trying to solve, right?  You just were
> trying to recreate the mips problem on x86?

That's right. The intention behind testing on 2.6.24 was to check whether
we could reproduce the failure on x86, which is considered to be more
robust. If we replicate the failure, then there could be some issue in
XFS; and if the test passes, then we can back-port this kernel to MIPS
(which in any case I am doing with your patches). But I faced a similar
deadlock on MIPS, with the exceptions which I posted earlier.

> If you want, do a sysrq-t to get traces of all those cp's to see where
> they're stuck, but this probably isn't getting you much closer to
> solving the original problem.

I'll keep you posted with it.

> (BTW: is this the exact same testcase that led to the block 0 access on
> mips which started this thread?)
>
> -Eric

Ok. So initially our multi-client iozone stress test used to fail. But as
it took 2-3 days to replicate the issue, I tried the test standalone on
MIPS and observed failures similar to the ones I used to get in the
multi-client test. The test is exactly the same as what I do in the
multi-client iozone run over the network. Hence I came to the conclusion
that if we fix the system so that it passes my test case, then we can try
the iozone test with that fix. And now on x86 with 2.6.24, I am finding a
similar deadlock, but the system is responsive and there are no lockups or
exceptions. Do you observe similar failures on x86 in your setup? Also, do
you think the issues which I am seeing on x86 and MIPS are coming from the
same source?

Thanks
Sagar

^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Xfs Access to block zero exception and system crash 2008-07-07 3:42 ` Sagar Borikar [not found] ` <487191C2.6090803@sandeen .net> @ 2008-07-07 3:47 ` Eric Sandeen 2008-07-07 3:58 ` Sagar Borikar 1 sibling, 1 reply; 48+ messages in thread From: Eric Sandeen @ 2008-07-07 3:47 UTC (permalink / raw) To: Sagar Borikar; +Cc: Dave Chinner, Nathan Scott, xfs Sagar Borikar wrote: > Ok. So initially our multi client iozone stress test used to fail. Are these multiple nfs clients? > But > as it took 2-3 days > to replicate the issue, I tried the test, standalone on MIPS and the iozone test again? > observed similar failures which > I used to get in multi client test. The test is exactly same what I do > in mutli client > iozoen over network. Hence I came to conclusion that if we fix system to > pass my test case > then we can try iozone test with that fix. And now on x86 with 2.6.24, > I am finding similar deadlock but > the system is responsive and there are no lockups or exceptions. Do you > observe similar failures on x86 > at your setup? So far I've not seen the deadlocks. > Also do you think the issues which I am seeing on x86 and > MIPS are coming from the > same sources? hard to say at this point, I think. -Eric > Thanks > Sagar > ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Xfs Access to block zero exception and system crash
  2008-07-07  3:47 ` Eric Sandeen
@ 2008-07-07  3:58   ` Sagar Borikar
  2008-07-07  5:19     ` Eric Sandeen
  0 siblings, 1 reply; 48+ messages in thread
From: Sagar Borikar @ 2008-07-07 3:58 UTC (permalink / raw)
To: Eric Sandeen; +Cc: Dave Chinner, Nathan Scott, xfs

Eric Sandeen wrote:
> Sagar Borikar wrote:
>> Ok. So initially our multi-client iozone stress test used to fail.
>
> Are these multiple nfs clients?

Actually a mix of them: 15 CIFS clients and 4 NFS clients (19 iozone
clients), plus 2 FTP clients and 4 HTTP transfers (25 transactions in
total, simultaneously).

>> But as it took 2-3 days to replicate the issue, I tried the test
>> standalone on MIPS and
>
> the iozone test again?

The iozone test continuously gives the access-to-block-zero exception and
XFS shutdown errors, with transaction cancel exceptions plus the alloc
btree corruption exception which I reported earlier. And my test gives the
transaction cancel exception and block zero exception with the processes
under test in deadlock state on MIPS; but on x86 there are no exceptions,
only incomplete copies due to uninterruptible sleep state and deadlock.

>> observed failures similar to the ones I used to get in the multi-client
>> test. The test is exactly the same as what I do in the multi-client
>> iozone run over the network. Hence I came to the conclusion that if we
>> fix the system so that it passes my test case, then we can try the
>> iozone test with that fix. And now on x86 with 2.6.24, I am finding a
>> similar deadlock, but the system is responsive and there are no lockups
>> or exceptions. Do you observe similar failures on x86 in your setup?
>
> So far I've not seen the deadlocks.

Could you kindly try with my test? I presume you should see the failure
soon. I tried this on 2 different x86 systems, 2 times each (after
rebooting the system), and I saw it every time.

>> Also, do you think the issues which I am seeing on x86 and MIPS are
>> coming from the same source?
>
> hard to say at this point, I think.
>
> -Eric

>> Thanks
>> Sagar

^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Xfs Access to block zero exception and system crash 2008-07-07 3:58 ` Sagar Borikar @ 2008-07-07 5:19 ` Eric Sandeen 2008-07-07 5:58 ` Sagar Borikar 0 siblings, 1 reply; 48+ messages in thread From: Eric Sandeen @ 2008-07-07 5:19 UTC (permalink / raw) To: Sagar Borikar; +Cc: xfs Sagar Borikar wrote: > Could you kindly try with my test? I presume you should see failure > soon. I tried this on > 2 different x86 systems 2 times ( after rebooting the system ) and I saw > it every time. Sure. Is there a reason you're doing this on a loopback file? That probably stresses the vm a bit more, and might get even trickier if the loopback file is sparse... But anyway, on an x86_64 machine with 2G of memory and a non-sparse 10G loopback file on 2.6.24.7-92.fc8, your test runs w/o problems for me, though the system does get sluggish. I let it run a bit then ran repair and it found no problems, I'll run it overnight to see if anything else turns up. -Eric ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Xfs Access to block zero exception and system crash 2008-07-07 5:19 ` Eric Sandeen @ 2008-07-07 5:58 ` Sagar Borikar 0 siblings, 0 replies; 48+ messages in thread From: Sagar Borikar @ 2008-07-07 5:58 UTC (permalink / raw) To: Eric Sandeen; +Cc: xfs Eric Sandeen wrote: > Sagar Borikar wrote: > > > >> Could you kindly try with my test? I presume you should see failure >> soon. I tried this on >> 2 different x86 systems 2 times ( after rebooting the system ) and I saw >> it every time. >> > > > Sure. Is there a reason you're doing this on a loopback file? That > probably stresses the vm a bit more, and might get even trickier if the > loopback file is sparse... > Initially I thought to do that since I didn't want to have a strict allocation limit but allowing allocations to grow as needed until the backing filesystem runs out of free space due to type of the test case I had. But then I dropped the plan and created a non-sparse loopback device. There was no specific reason to create loopback but as it was simplest option to do it. > But anyway, on an x86_64 machine with 2G of memory and a non-sparse 10G > loopback file on 2.6.24.7-92.fc8, your test runs w/o problems for me, > though the system does get sluggish. I let it run a bit then ran repair > and it found no problems, I'll run it overnight to see if anything else > turns up. > That will be great. Thanks indeed. Sagar > -Eric > ^ permalink raw reply [flat|nested] 48+ messages in thread
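For anyone reproducing this, a setup along the lines Eric describes (a non-sparse backing file with XFS mounted over loop) might look like the following; the file name, size and mount point are illustrative only:

    dd if=/dev/zero of=/root/xfs.img bs=1M count=10240   # non-sparse 10G backing file
    mkfs.xfs -f /root/xfs.img
    mkdir -p /root/test_partition
    mount -o loop /root/xfs.img /root/test_partition
    # then run the attached runall script from inside /root/test_partition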
* Re: Xfs Access to block zero exception and system crash
  2008-07-04 17:51 ` Sagar Borikar
  2008-07-05 16:25   ` Eric Sandeen
@ 2008-07-06  4:19     ` Dave Chinner
  1 sibling, 0 replies; 48+ messages in thread
From: Dave Chinner @ 2008-07-06 4:19 UTC (permalink / raw)
To: Sagar Borikar; +Cc: Eric Sandeen, Nathan Scott, xfs

On Fri, Jul 04, 2008 at 10:51:47AM -0700, Sagar Borikar wrote:
> Copy is of the same file to 30 different directories and it is basically
> overwrite.

Not an overwrite - cp truncates the destination file first:

# cp t.t fred
# strace cp -f t.t fred
.....
stat("fred", {st_mode=S_IFREG|0644, st_size=5, ...}) = 0
stat("t.t", {st_mode=S_IFREG|0644, st_size=5, ...}) = 0
stat("fred", {st_mode=S_IFREG|0644, st_size=5, ...}) = 0
open("t.t", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=5, ...}) = 0
open("fred", O_WRONLY|O_TRUNC) = 4
               ^^^^^^^^^^^^^^^^
fstat(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
read(3, "fred\n", 4096) = 5
write(4, "fred\n", 5) = 5
close(4) = 0
close(3) = 0
.....

That being said, I can't reproduce it on a 2.6.24 (debian) kernel, either.

Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com

^ permalink raw reply [flat|nested] 48+ messages in thread
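To make Dave's distinction concrete, one illustrative way to compare cp (which truncates and reallocates the whole file on every pass) with a true in-place overwrite is; the file names are just examples from the test:

    cp -f testfile testdir_1/testfile                   # open(..., O_WRONLY|O_TRUNC): frees and reallocates the extents
    dd if=testfile of=testdir_1/testfile conv=notrunc   # rewrites the existing blocks without truncating first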
* Re: Xfs Access to block zero exception and system crash 2008-07-04 10:18 ` Sagar Borikar 2008-07-04 12:27 ` Dave Chinner @ 2008-07-04 15:33 ` Eric Sandeen 1 sibling, 0 replies; 48+ messages in thread From: Eric Sandeen @ 2008-07-04 15:33 UTC (permalink / raw) To: Sagar Borikar; +Cc: Nathan Scott, xfs Sagar Borikar wrote: >> > Even we too don't want to leave it as it is. I still am working on back > porting the latest xfs code. > Your patches are helping a lot . > Just to check whether that issue lies with 2.6.18 or MIPS port, I tested > it on 2.6.24 x86 platform. > Here we created a loop back device of 10 GB and mounted xfs on that. > What I observe that xfs_repair reports quite a few bad blocks and bad > extents here as well. > So is developing bad blocks and extents normal behavior in xfs which > would be recovered > in background or is it a bug? I still didn't see the exception but the > bad blocks and extents are > generated within 10 minutes or running the tests. > Attaching the log . Repair finding corruption indicates a bug (or hardware problem) somewhere. As a long shot you might re-test with this patch in place: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff_plain;h=6ab455eeaff6893cd06da33843e840d888cdc04a But, as Dave said, please also provide the testcase. -Eric ^ permalink raw reply [flat|nested] 48+ messages in thread
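One way the commit Eric points at could be pulled into a local tree for that re-test is sketched below; the tree path and patch file name are made up, and the patch may need adjustment before it applies to an older 2.6.18/2.6.24 tree:

    cd /usr/src/linux-2.6.24
    wget -O xfs-retest.patch \
      'http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff_plain;h=6ab455eeaff6893cd06da33843e840d888cdc04a'
    patch -p1 --dry-run < xfs-retest.patch   # check it applies cleanly first
    patch -p1 < xfs-retest.patch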
* Re: Xfs Access to block zero exception and system crash
  2008-06-27 10:13 ` Sagar Borikar
  2008-06-27 10:25   ` Sagar Borikar
@ 2008-06-28  0:02     ` Dave Chinner
  1 sibling, 0 replies; 48+ messages in thread
From: Dave Chinner @ 2008-06-28 0:02 UTC (permalink / raw)
To: Sagar Borikar; +Cc: xfs

On Fri, Jun 27, 2008 at 03:43:49PM +0530, Sagar Borikar wrote:
> Dave Chinner wrote:
>> Yes, but all the same pattern of corruption, so it is likely
>> that it is one problem.
>>
>> All I can suggest is working out a reproducible test case in your
>> development environment, attaching a debugger and digging around
>> in memory when the problem is hit to find out exactly what
>> is corrupted. If you can't reproduce it or work out what is
>> occurring to trigger the problem, then we're not going to be able to
>> find the cause...
>
> Thanks Dave
> I did some experiments today with the corrupted filesystem.
> Setup: the NAS box contains one volume, /share, and 10 subdirectories.
> In the first subdirectory, sh1, I kept a 512 MB file. Through a script I
> continuously copy this file, simultaneously, into the sh2 to sh10
> subdirectories.
> The script looks like:
....
> while [ 1 ]
> do
> cp $1 $2
> done
....
> uninterruptible sleep state continuously. Ran xfs_repair with the -n option
> on the filesystem mounted on the JBOD.
> Here is the output:
....
> entry "iozone_68.tst" in shortform directory 67108993 references free
> inode 67108995
....
> entry "iozone_68.tst" in shortform directory 100663425 references free
> inode 100663427
....
> entry "iozone_68.tst" in shortform directory 301990016 references free
> inode 301990019
....
> entry "iozone_68.tst" in shortform directory 335544448 references free
> inode 335544451
....
> entry "iozone_68.tst" in shortform directory 402653313 references free
> inode 402653318
....

And so on. There's a pattern here. Can you try to find out what part of
your workload is producing these errors?

Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com

^ permalink raw reply [flat|nested] 48+ messages in thread
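A hypothetical starting point for the digging Dave suggests is to inspect one of the reported directory/inode pairs with xfs_db against the unmounted device; the device name is a placeholder, and the inode numbers are the ones from the repair output above:

    xfs_db -r -c "inode 67108993" -c "print" /dev/sdX   # the shortform directory repair complained about
    xfs_db -r -c "inode 67108995" -c "print" /dev/sdX   # the "free" inode it still references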
* RE: Xfs Access to block zero exception and system crash [not found] ` <4872E33E.3090107@sandeen.net> @ 2008-07-08 5:03 ` Sagar Borikar 2008-07-09 16:57 ` Sagar Borikar 1 sibling, 0 replies; 48+ messages in thread From: Sagar Borikar @ 2008-07-08 5:03 UTC (permalink / raw) To: Eric Sandeen; +Cc: Raj Palani, xfs Sure Eric, I'll keep you posted with the results w/o loop back file. When you say that the deadlock could be due to vm, is it due to lack of memory? I checked meminfo and I found that sufficient buffers and committed_as were persent when xfs is stalled. Thanks Sagar Sagar Borikar wrote: > That's right Eric but I am still surprised that why should we get a > dead lock in this scenario as it is a plain copy of file in multiple > directories. Our customer is reporting similar kind of lockup in our > platform. ok, I guess I had missed that, sorry. > I do understand that we are chasing the access to block zero exception > and XFS forced shutdown which I mentioned earlier. But we also see > quite a few smbd processes which are writing data to XFS are in > uninterruptible sleep state and the system locks up too. Ok; then the next step is probably to do sysrq-t and see where things are stuck. It might be better to see if you can reproduce w/o the loopback file, too, since that's just another layer to go through that might be changing things. > So I thought > the test which I am running could be pointing to similar issue which > we are observing on our platform. But does this indicate that the > problem lies with x86 XFS too ? or maybe the vm ... > Also I presume in enterprise market such kind of simultaneous write > situation may happen. Has anybody reported similar issues to you? As > you observed it over x86 and 2.6.24 kernel, could you say what would > be root cause of this? Haven't really seen it before that I recall, and at this point can't say for sure what it might be. -Eric > Sorry for lots of questions at same time :) But I am happy that > you were able to see the deadlock in x86 on your setup with 2.6.24 > > Thanks > Sagar > > > Eric Sandeen wrote: >> Sagar Borikar wrote: >> >>> Hi Eric, >>> >>> Did you see any issues in your test? >>> >> I got a deadlock but that's it; I don't think that's the bug you want >> to chase... >> >> >> -Eric >> >> >>> Thanks >>> Sagar >>> >>> >>> Sagar Borikar wrote: >>> >>>> Eric Sandeen wrote: >>>> >>>>> Sagar Borikar wrote: >>>>> >>>>> >>>>> >>>>> >>>>>> Could you kindly try with my test? I presume you should see >>>>>> failure soon. I tried this on >>>>>> 2 different x86 systems 2 times ( after rebooting the system ) >>>>>> and I saw it every time. >>>>>> >>>>>> >>>>> Sure. Is there a reason you're doing this on a loopback file? >>>>> That probably stresses the vm a bit more, and might get even >>>>> trickier if the loopback file is sparse... >>>>> >>>>> >>>> Initially I thought to do that since I didn't want to have a strict >>>> allocation limit but allowing allocations to grow as needed until >>>> the backing filesystem runs out of free space due to type of the >>>> test case I had. But then I dropped the plan and created a >>>> non-sparse loopback device. There was no specific reason to create >>>> loopback but as it was simplest option to do it. >>>> >>>>> But anyway, on an x86_64 machine with 2G of memory and a >>>>> non-sparse 10G loopback file on 2.6.24.7-92.fc8, your test runs >>>>> w/o problems for me, though the system does get sluggish. 
I let >>>>> it run a bit then ran repair and it found no problems, I'll run it >>>>> overnight to see if anything else turns up. >>>>> >>>>> >>>> That will be great. Thanks indeed. >>>> Sagar >>>> >>>> >>>>> -Eric >>>>> >>>>> >> > ^ permalink raw reply [flat|nested] 48+ messages in thread
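A simple way to watch the fields Sagar mentions while the copies are stalled, assuming nothing more than /proc is available on the box, is a loop like the following; the field list and interval are just an example:

    while true
    do
        grep -E 'MemFree|Buffers|Cached|Dirty|Writeback|Committed_AS' /proc/meminfo
        echo ----
        sleep 5
    done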
* RE: Xfs Access to block zero exception and system crash [not found] ` <4872E33E.3090107@sandeen.net> 2008-07-08 5:03 ` Sagar Borikar @ 2008-07-09 16:57 ` Sagar Borikar 2008-07-10 5:12 ` Sagar Borikar 1 sibling, 1 reply; 48+ messages in thread From: Sagar Borikar @ 2008-07-09 16:57 UTC (permalink / raw) To: Eric Sandeen; +Cc: xfs Sagar Borikar wrote: > That's right Eric but I am still surprised that why should we get a dead > lock in this scenario as it is a plain copy of file in multiple > directories. Our customer is reporting similar kind of lockup in our > platform. ok, I guess I had missed that, sorry. > I do understand that we are chasing the access to block zero > exception and XFS forced shutdown which I mentioned earlier. But we > also see quite a few smbd processes which are writing data to XFS are in > uninterruptible sleep state and the system locks up too. Ok; then the next step is probably to do sysrq-t and see where things are stuck. It might be better to see if you can reproduce w/o the loopback file, too, since that's just another layer to go through that might be changing things. <Sagar> I ran it on actual device w/o loopback file and even there observed that XFS transactions going into uninterruptible sleep state and the copies were stalled. I had to hard reboot the system to bring XFS out of that state since soft reboot didn't work, it was waiting for file system to get unmounted. I shall provide the sysrq-t update later. > So I thought > the test which I am running could be pointing to similar issue which we > are observing on our platform. But does this indicate that the problem > lies with x86 XFS too ? or maybe the vm ... > Also I presume in enterprise market such kind > of simultaneous write situation may happen. Has anybody reported > similar issues to you? As you observed it over x86 and 2.6.24 kernel, > could you say what would be root cause of this? Haven't really seen it before that I recall, and at this point can't say for sure what it might be. -Eric > Sorry for lots of questions at same time :) But I am happy that you > were able to see the deadlock in x86 on your setup with 2.6.24 > > Thanks > Sagar > > > Eric Sandeen wrote: >> Sagar Borikar wrote: >> >>> Hi Eric, >>> >>> Did you see any issues in your test? >>> >> I got a deadlock but that's it; I don't think that's the bug you want to >> chase... >> >> >> -Eric >> >> >>> Thanks >>> Sagar >>> >>> >>> Sagar Borikar wrote: >>> >>>> Eric Sandeen wrote: >>>> >>>>> Sagar Borikar wrote: >>>>> >>>>> >>>>> >>>>> >>>>>> Could you kindly try with my test? I presume you should see failure >>>>>> soon. I tried this on >>>>>> 2 different x86 systems 2 times ( after rebooting the system ) and I >>>>>> saw it every time. >>>>>> >>>>>> >>>>> Sure. Is there a reason you're doing this on a loopback file? That >>>>> probably stresses the vm a bit more, and might get even trickier if the >>>>> loopback file is sparse... >>>>> >>>>> >>>> Initially I thought to do that since I didn't want to have a strict >>>> allocation limit but >>>> allowing allocations to grow as needed until the backing filesystem >>>> runs out of free space >>>> due to type of the test case I had. But then I dropped the plan and >>>> created a non-sparse >>>> loopback device. There was no specific reason to create loopback but >>>> as it was >>>> simplest option to do it. 
>>>> >>>>> But anyway, on an x86_64 machine with 2G of memory and a non-sparse 10G >>>>> loopback file on 2.6.24.7-92.fc8, your test runs w/o problems for me, >>>>> though the system does get sluggish. I let it run a bit then ran repair >>>>> and it found no problems, I'll run it overnight to see if anything else >>>>> turns up. >>>>> >>>>> >>>> That will be great. Thanks indeed. >>>> Sagar >>>> >>>> >>>>> -Eric >>>>> >>>>> >> > ^ permalink raw reply [flat|nested] 48+ messages in thread
* RE: Xfs Access to block zero exception and system crash 2008-07-09 16:57 ` Sagar Borikar @ 2008-07-10 5:12 ` Sagar Borikar 0 siblings, 0 replies; 48+ messages in thread From: Sagar Borikar @ 2008-07-10 5:12 UTC (permalink / raw) To: Eric Sandeen; +Cc: xfs Eric, Could be a slight digression but can you let me know why the fragmentation factor is going to 99% immediately? I observed this on both x86 and MIPS platform. Also to alleviate this issue, if I specify allocsize=512m what would be the consequences? Since default allocsize is 64k right? Also while mounting we are setting up default option for mounting file system. Thanks Sagar -----Original Message----- From: xfs-bounce@oss.sgi.com [mailto:xfs-bounce@oss.sgi.com] On Behalf Of Sagar Borikar Sent: Wednesday, July 09, 2008 10:28 PM To: Eric Sandeen Cc: xfs@oss.sgi.com Subject: RE: Xfs Access to block zero exception and system crash Sagar Borikar wrote: > That's right Eric but I am still surprised that why should we get a dead > lock in this scenario as it is a plain copy of file in multiple > directories. Our customer is reporting similar kind of lockup in our > platform. ok, I guess I had missed that, sorry. > I do understand that we are chasing the access to block zero > exception and XFS forced shutdown which I mentioned earlier. But we > also see quite a few smbd processes which are writing data to XFS are in > uninterruptible sleep state and the system locks up too. Ok; then the next step is probably to do sysrq-t and see where things are stuck. It might be better to see if you can reproduce w/o the loopback file, too, since that's just another layer to go through that might be changing things. <Sagar> I ran it on actual device w/o loopback file and even there observed that XFS transactions going into uninterruptible sleep state and the copies were stalled. I had to hard reboot the system to bring XFS out of that state since soft reboot didn't work, it was waiting for file system to get unmounted. I shall provide the sysrq-t update later. > So I thought > the test which I am running could be pointing to similar issue which we > are observing on our platform. But does this indicate that the problem > lies with x86 XFS too ? or maybe the vm ... > Also I presume in enterprise market such kind > of simultaneous write situation may happen. Has anybody reported > similar issues to you? As you observed it over x86 and 2.6.24 kernel, > could you say what would be root cause of this? Haven't really seen it before that I recall, and at this point can't say for sure what it might be. -Eric > Sorry for lots of questions at same time :) But I am happy that you > were able to see the deadlock in x86 on your setup with 2.6.24 > > Thanks > Sagar > > > Eric Sandeen wrote: >> Sagar Borikar wrote: >> >>> Hi Eric, >>> >>> Did you see any issues in your test? >>> >> I got a deadlock but that's it; I don't think that's the bug you want to >> chase... >> >> >> -Eric >> >> >>> Thanks >>> Sagar >>> >>> >>> Sagar Borikar wrote: >>> >>>> Eric Sandeen wrote: >>>> >>>>> Sagar Borikar wrote: >>>>> >>>>> >>>>> >>>>> >>>>>> Could you kindly try with my test? I presume you should see failure >>>>>> soon. I tried this on >>>>>> 2 different x86 systems 2 times ( after rebooting the system ) and I >>>>>> saw it every time. >>>>>> >>>>>> >>>>> Sure. Is there a reason you're doing this on a loopback file? That >>>>> probably stresses the vm a bit more, and might get even trickier if the >>>>> loopback file is sparse... 
>>>>> >>>>> >>>> Initially I thought to do that since I didn't want to have a strict >>>> allocation limit but >>>> allowing allocations to grow as needed until the backing filesystem >>>> runs out of free space >>>> due to type of the test case I had. But then I dropped the plan and >>>> created a non-sparse >>>> loopback device. There was no specific reason to create loopback but >>>> as it was >>>> simplest option to do it. >>>> >>>>> But anyway, on an x86_64 machine with 2G of memory and a non-sparse 10G >>>>> loopback file on 2.6.24.7-92.fc8, your test runs w/o problems for me, >>>>> though the system does get sluggish. I let it run a bit then ran repair >>>>> and it found no problems, I'll run it overnight to see if anything else >>>>> turns up. >>>>> >>>>> >>>> That will be great. Thanks indeed. >>>> Sagar >>>> >>>> >>>>> -Eric >>>>> >>>>> >> > ^ permalink raw reply [flat|nested] 48+ messages in thread
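As a rough sketch of how the fragmentation factor and the allocsize option from Sagar's question can be inspected, the commands below may help; the device and mount point names are placeholders, and the filesystem must be idle or unmounted for the xfs_db check to be meaningful:

    xfs_db -r -c frag /dev/sdX                  # report the file fragmentation factor
    umount /mnt/xfs
    mount -o allocsize=512m /dev/sdX /mnt/xfs   # larger speculative preallocation per file
    grep xfs /proc/mounts                       # confirm the filesystem came back up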
Thread overview: 48+ messages
2008-06-24 7:03 Xfs Access to block zero exception and system crash Sagar Borikar
2008-06-25 6:48 ` Sagar Borikar
2008-06-25 8:49 ` Dave Chinner
2008-06-26 6:46 ` Sagar Borikar
2008-06-26 7:02 ` Dave Chinner
2008-06-27 10:13 ` Sagar Borikar
2008-06-27 10:25 ` Sagar Borikar
2008-06-28 0:05 ` Dave Chinner
2008-06-28 16:47 ` Sagar Borikar
2008-06-29 21:56 ` Dave Chinner
2008-06-30 3:37 ` Sagar Borikar
[not found] ` <20080630034112.055CF18904C4@bby1mta01.pmc-sierra.bc.ca>
2008-06-30 6:07 ` Sagar Borikar
2008-06-30 10:24 ` Sagar Borikar
2008-07-01 6:44 ` Dave Chinner
2008-07-02 4:18 ` Sagar Borikar
2008-07-02 5:13 ` Dave Chinner
2008-07-02 5:35 ` Sagar Borikar
2008-07-02 6:13 ` Nathan Scott
2008-07-02 6:56 ` Dave Chinner
2008-07-02 11:02 ` Sagar Borikar
2008-07-03 4:03 ` Eric Sandeen
2008-07-03 5:14 ` Sagar Borikar
2008-07-03 15:02 ` Eric Sandeen
2008-07-04 10:18 ` Sagar Borikar
2008-07-04 12:27 ` Dave Chinner
2008-07-04 17:30 ` Sagar Borikar
2008-07-04 17:35 ` Eric Sandeen
2008-07-04 17:51 ` Sagar Borikar
2008-07-05 16:25 ` Eric Sandeen
2008-07-06 17:24 ` Sagar Borikar
2008-07-06 19:07 ` Eric Sandeen
2008-07-07 3:02 ` Sagar Borikar
2008-07-07 3:04 ` Eric Sandeen
2008-07-07 3:07 ` Sagar Borikar
2008-07-07 3:11 ` Eric Sandeen
2008-07-07 3:17 ` Sagar Borikar
2008-07-07 3:22 ` Eric Sandeen
2008-07-07 3:42 ` Sagar Borikar
[not found] ` <487191C2.6090803@sandeen.net>
[not found] ` <4871947D.2090701@pmc-sierra.com>
2008-07-07 3:47 ` Eric Sandeen
2008-07-07 3:58 ` Sagar Borikar
2008-07-07 5:19 ` Eric Sandeen
2008-07-07 5:58 ` Sagar Borikar
2008-07-06 4:19 ` Dave Chinner
2008-07-04 15:33 ` Eric Sandeen
2008-06-28 0:02 ` Dave Chinner
[not found] <4872E0BC.6070400@pmc-sierra.com>
[not found] ` <4872E33E.3090107@sandeen.net>
2008-07-08 5:03 ` Sagar Borikar
2008-07-09 16:57 ` Sagar Borikar
2008-07-10 5:12 ` Sagar Borikar