From mboxrd@z Thu Jan 1 00:00:00 1970
From: Tao Ma
Date: Fri, 14 Jan 2011 16:03:47 +0800
Subject: [Ocfs2-devel] Problems with fsck
In-Reply-To: <4D2DD3E0.7080509@oracle.com>
References: <4D2DCCF1.4080303@navynet.it> <4D2DD3E0.7080509@oracle.com>
Message-ID: <4D300363.6080004@tao.ma>
List-Id:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: ocfs2-devel@oss.oracle.com

On 01/13/2011 12:16 AM, Sunil Mushran wrote:
> fsck is failing because it is encountering block(s) with incorrect
> checksums. An easy solution is to disable checksums and rerun
> fsck. Checksums can be re-enabled later.
>
> The problem started with the segfault when activating indexed-dirs.
> Do you have the coredump?
I ran into segfaults when enabling indexed-dirs several months ago. The
fixes for them are still pending review and integration:
http://oss.oracle.com/pipermail/ocfs2-tools-devel/2010-September/003574.html

Regards,
Tao
>
> On 01/12/2011 07:46 AM, Massimo Cetra wrote:
>> Hi List,
>>
>> I'd like to share with you what happened yesterday.
>>
>> Kernel 2.6.36.1
>> ocfs2-tools 1.6.3 (latest).
>>
>> I had an old OCFS2 partition created with a 2.6.32 kernel and ocfs2
>> tools 1.4.5.
>>
>> I unmounted all partitions on all nodes in order to enable discontig-bg.
>>
>> I then used tunefs to add discontig-bg, inline-data and indexed-dirs.
>>
>> During indexed-dirs tunefs segfaulted and since then, fsck didn't work
>> anymore.
>>
>> I managed to mount the partition again but after some errors like the
>> following
>>
>> Jan 11 23:11:56 www1 kernel: [ 2339.642683]
>> (mc,3305,0):ocfs2_block_check_validate:443 ERROR: CRC32 failed: stored:
>> 0x76176db1, computed 0x9e4c2434. Applying ECC.
>> Jan 11 23:11:56 www1 kernel: [ 2339.645074]
>> (mc,3305,0):ocfs2_block_check_validate:457 ERROR: Fixed CRC32 failed:
>> stored: 0x76176db1, computed 0x91119fb2
>> Jan 11 23:11:56 www1 kernel: [ 2339.647196]
>> (mc,3305,0):ocfs2_validate_extent_block:903 ERROR: Checksum failed for
>> extent block 6924877
>> Jan 11 23:11:56 www1 kernel: [ 2339.649212]
>> (mc,3305,0):__ocfs2_find_path:1837 ERROR: status = -5
>> Jan 11 23:11:56 www1 kernel: [ 2339.650409]
>> (mc,3305,0):ocfs2_remove_rightmost_path:3090 ERROR: status = -5
>> Jan 11 23:11:56 www1 kernel: [ 2339.651719]
>> (mc,3305,0):ocfs2_rotate_tree_left:3225 ERROR: status = -5
>> Jan 11 23:11:56 www1 kernel: [ 2339.653076]
>> (mc,3305,0):ocfs2_truncate_rec:5442 ERROR: status = -5
>> Jan 11 23:11:56 www1 kernel: [ 2339.654272]
>> (mc,3305,0):ocfs2_remove_extent:5526 ERROR: status = -5
>> Jan 11 23:11:56 www1 kernel: [ 2339.655531]
>> (mc,3305,0):ocfs2_remove_btree_range:5717 ERROR: status = -5
>> Jan 11 23:11:56 www1 kernel: [ 2339.656908]
>> (mc,3305,0):ocfs2_commit_truncate:7117 ERROR: status = -5
>> Jan 11 23:11:56 www1 kernel: [ 2339.658152]
>> (mc,3305,0):ocfs2_truncate_for_delete:622 ERROR: status = -5
>> Jan 11 23:11:56 www1 kernel: [ 2339.659423]
>> (mc,3305,0):ocfs2_wipe_inode:793 ERROR: status = -5
>> Jan 11 23:11:56 www1 kernel: [ 2339.660700]
>> (mc,3305,0):ocfs2_delete_inode:1085 ERROR: status = -5
>>
>>
>> Jan 11 23:15:41 www1 kernel: [ 2565.101905] OCFS2: ERROR (device drbd1):
>> ocfs2_commit_truncate: Inode 7418891 has an empty extent record, depth 2
>> Jan 11 23:15:41 www1 kernel: [ 2565.101908].
>> Jan 11 23:15:41 www1 kernel: [ 2565.105104] File system is now read-only
>> due to the potential of on-disk corruption. Please run fsck.ocfs2 once
>> the file system is unmounted.
>> Jan 11 23:15:41 www1 kernel: [ 2565.108155]
>> (kworker/u:3,3361,0):ocfs2_truncate_for_delete:622 ERROR: status = -30
>> Jan 11 23:15:41 www1 kernel: [ 2565.110190]
>> (kworker/u:3,3361,0):ocfs2_wipe_inode:793 ERROR: status = -30
>> Jan 11 23:15:41 www1 kernel: [ 2565.111772]
>> (kworker/u:3,3361,0):ocfs2_delete_inode:1085 ERROR: status = -30
>> Jan 11 23:15:41 www1 kernel: [ 2565.134131] OCFS2: ERROR (device drbd1):
>> ocfs2_commit_truncate: Inode 7418889 has an empty extent record, depth 2
>> Jan 11 23:15:41 www1 kernel: [ 2565.134133].
>>
>> I wasn't able to mount the filesystem anymore in RW.
>> I could mount only in RO.
>>
>> fsck was failing like this:
>>
>> www1:~# fsck.ocfs2 -f /dev/drbd1
>> fsck.ocfs2 1.6.3
>> Checking OCFS2 filesystem in /dev/drbd1:
>> Label: www-code
>> UUID: 03F008AFA8BA458E9C8614A9B4A3E6E8
>> Number of blocks: 26213582
>> Block size: 2048
>> Number of clusters: 13106791
>> Cluster size: 4096
>> Number of slots: 8
>>
>> /dev/drbd1 was run with -f, check forced.
>> Pass 0a: Checking cluster allocation chains
>> Pass 0b: Checking inode allocation chains
>> Pass 0c: Checking extent block allocation chains
>> Pass 1: Checking inodes and blocks.
>> extent.c: I/O error on channel reading extent block at 9590812 in owner
>> 3231503 for verification
>> extent.c: I/O error on channel reading extent block at 6924320 in owner
>> 3231503 for verification
>> pass1: I/O error on channel while iterating over the blocks for inode
>> 3231503
>> fsck.ocfs2: I/O error on channel while performing pass 1
>> www1:~#
>>
>> -----------------------------------------------
>>
>> It was late and I didn't have time to investigate more on a production
>> server so I did a complete backup, used mkfs to wipe everything and
>> restored the backup.
>>
>> I'm sorry I can't provide more data on the problem. I tried to google
>> and search the mailing list archives but I didn't find anything interesting.
>>
>> Obviously I was quite disappointed by this problem and I hope this
>> information may, in some way, help identify and fix the problem.
>>
>> Thanks for your work,
>>
>> Massimo
>>
>> _______________________________________________
>> Ocfs2-devel mailing list
>> Ocfs2-devel at oss.oracle.com
>> http://oss.oracle.com/mailman/listinfo/ocfs2-devel