* e2fsck not fixing deleted inode referenced errors?
From: Zlatko Calusic @ 2014-09-30 17:56 UTC
To: linux-ext4

Hope this is the right list to ask this question.

I have an ext4 filesystem that has a few errors like this:

Sep 30 19:14:09 atlas kernel: EXT4-fs error (device md2): ext4_lookup:1448: inode #7913865: comm find: deleted inode referenced: 7912058
Sep 30 19:14:09 atlas kernel: EXT4-fs error (device md2): ext4_lookup:1448: inode #7913865: comm find: deleted inode referenced: 7912055

Yet when I run e2fsck -fy on it, I get a clean run; no errors are found or fixed. Is this the expected behaviour? What am I supposed to do to get rid of errors like the above?

The filesystem is on an md mirror device, the kernel is 3.17.0-rc7, e2fsprogs 1.42.12-1 (Debian sid). Could the md device somehow interfere? I ran an md check yesterday, but there were no errors.

BTW, this all started when I got an "ata2.00: failed command: FLUSH CACHE EXT" error yesterday morning. I did several runs of e2fsck before the filesystem came up clean, yet errors like the above keep popping up.

Thanks for any info. [and please Cc:, I'm not subscribed]

--
Zlatko
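The two kernel messages above each name a directory inode (after "inode #") and the deleted inode that one of its entries points at. As a rough sketch, not something posted in the thread, the following Python collects those pairs from a saved kernel log. The function name `collect_deleted_references` is made up for illustration, and the pattern assumes the exact message layout quoted above, including a command name without spaces.

```python
import re
from collections import defaultdict

# Matches the "deleted inode referenced" message layout quoted in the post.
# Assumption: the "comm" field contains no whitespace.
LOOKUP_ERROR = re.compile(
    r"EXT4-fs error \(device (?P<dev>\S+)\): ext4_lookup:\d+: "
    r"inode #(?P<dir>\d+): comm (?P<comm>\S+): "
    r"deleted inode referenced: (?P<target>\d+)"
)


def collect_deleted_references(log_text):
    """Return {(device, directory inode): sorted list of referenced deleted inodes}."""
    # Collapse all whitespace so that messages wrapped by a mail client still match.
    flat = " ".join(log_text.split())
    found = defaultdict(set)
    for m in LOOKUP_ERROR.finditer(flat):
        found[(m["dev"], int(m["dir"]))].add(int(m["target"]))
    return {key: sorted(inodes) for key, inodes in found.items()}


if __name__ == "__main__":
    sample = """
    Sep 30 19:14:09 atlas kernel: EXT4-fs error (device md2): ext4_lookup:1448: inode #7913865: comm find: deleted inode referenced: 7912058
    Sep 30 19:14:09 atlas kernel: EXT4-fs error (device md2): ext4_lookup:1448: inode #7913865: comm find: deleted inode referenced: 7912055
    """
    for (dev, directory), inodes in collect_deleted_references(sample).items():
        print(dev, directory, inodes)
```

Grouping by directory inode makes it easier to see whether the damage is confined to one directory, which is what the rest of the thread goes on to check.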
* Re: e2fsck not fixing deleted inode referenced errors? 2014-09-30 17:56 e2fsck not fixing deleted inode referenced errors? Zlatko Calusic @ 2014-09-30 18:30 ` Darrick J. Wong 2014-09-30 18:43 ` Zlatko Calusic 0 siblings, 1 reply; 10+ messages in thread From: Darrick J. Wong @ 2014-09-30 18:30 UTC (permalink / raw) To: Zlatko Calusic; +Cc: linux-ext4 On Tue, Sep 30, 2014 at 07:56:36PM +0200, Zlatko Calusic wrote: > Hope this is the right list to ask this question. > > I have an ext4 filesystem that has a few errors like this: > > Sep 30 19:14:09 atlas kernel: EXT4-fs error (device md2): > ext4_lookup:1448: inode #7913865: comm find: deleted inode > referenced: 7912058 > Sep 30 19:14:09 atlas kernel: EXT4-fs error (device md2): > ext4_lookup:1448: inode #7913865: comm find: deleted inode > referenced: 7912055 > > Yet, when I run e2fsck -fy on it, I have a clean run, no errors are > found and/or fixed. Is this the expected behaviour? What am I > supposed to do to get rid of errors like the above? [I should hope not.] > The filesystem is on a md mirror device, the kernel is 3.17.0-rc7, > e2progs 1.42.12-1 (Debian sid). Could md device somehow interfere? I > ran md check yesterday, but there were no errors. > > BTW, this all started when I got ata2.00: failed command: FLUSH > CACHE EXT error yesterday morning. I did several runs of e2fsck > before the filesystem came up clean, yet errors like the above are > popping constantly. Normally that kernel message only happens if a dir refers to an inode with link_count and mode set to 0. Is the disk attached to ata2.00 one of the RAID1 mirrors? What was the full error message, and does smartctl -a report anything? It would be interesting to see what "debugfs -R 'stat <7912058>' /dev/md2" returns. --D > > Thanks for any info. 
[and please Cc:, I'm not subscribed] > > -- > Zlatko > > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: e2fsck not fixing deleted inode referenced errors? 2014-09-30 18:30 ` Darrick J. Wong @ 2014-09-30 18:43 ` Zlatko Calusic 2014-09-30 19:29 ` Darrick J. Wong 2014-09-30 19:54 ` Theodore Ts'o 0 siblings, 2 replies; 10+ messages in thread From: Zlatko Calusic @ 2014-09-30 18:43 UTC (permalink / raw) To: Darrick J. Wong; +Cc: linux-ext4 On 30.09.2014 20:30, Darrick J. Wong wrote: > On Tue, Sep 30, 2014 at 07:56:36PM +0200, Zlatko Calusic wrote: >> Hope this is the right list to ask this question. >> >> I have an ext4 filesystem that has a few errors like this: >> >> Sep 30 19:14:09 atlas kernel: EXT4-fs error (device md2): >> ext4_lookup:1448: inode #7913865: comm find: deleted inode >> referenced: 7912058 >> Sep 30 19:14:09 atlas kernel: EXT4-fs error (device md2): >> ext4_lookup:1448: inode #7913865: comm find: deleted inode >> referenced: 7912055 >> >> Yet, when I run e2fsck -fy on it, I have a clean run, no errors are >> found and/or fixed. Is this the expected behaviour? What am I >> supposed to do to get rid of errors like the above? > > [I should hope not.] > >> The filesystem is on a md mirror device, the kernel is 3.17.0-rc7, >> e2progs 1.42.12-1 (Debian sid). Could md device somehow interfere? I >> ran md check yesterday, but there were no errors. >> >> BTW, this all started when I got ata2.00: failed command: FLUSH >> CACHE EXT error yesterday morning. I did several runs of e2fsck >> before the filesystem came up clean, yet errors like the above are >> popping constantly. > > Normally that kernel message only happens if a dir refers to an inode with > link_count and mode set to 0. > > Is the disk attached to ata2.00 one of the RAID1 mirrors? What was the full > error message, and does smartctl -a report anything? 
Yes, it is part of the mirror: ata2.00: ATA-8: WDC WD1002FBYS-02A6B0, 03.00C06, max UDMA/133 ata2.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA ata2.00: configured for UDMA/133 md2 : active raid1 sdb2[0] sda2[1] 976229760 blocks [2/2] [UU] bitmap: 0/8 pages [0KB], 65536KB chunk Full error message from the kernel log, together with data check I did in the evening: Sep 29 05:07:51 atlas kernel: ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen Sep 29 05:07:51 atlas kernel: ata2.00: irq_stat 0x00400040, connection status changed Sep 29 05:07:51 atlas kernel: ata2: SError: { PHYRdyChg DevExch } Sep 29 05:07:51 atlas kernel: ata2.00: failed command: FLUSH CACHE EXT Sep 29 05:07:51 atlas kernel: ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0\x0a res 40/00:f4:e2:7f:14/00:00:3a:00:00/40 Emask 0x10 (ATA bus error) Sep 29 05:07:51 atlas kernel: ata2.00: status: { DRDY } Sep 29 05:07:51 atlas kernel: ata2: hard resetting link Sep 29 05:07:57 atlas kernel: ata2: link is slow to respond, please be patient (ready=0) Sep 29 05:08:00 atlas kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Sep 29 05:08:00 atlas kernel: ata2.00: configured for UDMA/133 Sep 29 05:08:00 atlas kernel: ata2.00: retrying FLUSH 0xea Emask 0x10 Sep 29 05:08:00 atlas kernel: ata2: EH complete Sep 29 05:37:36 atlas kernel: EXT4-fs error (device md2): ext4_mb_generate_buddy:757: group 1783, block bitmap and bg descriptor inconsistent: 8218 vs 9292 free clusters Sep 29 05:37:36 atlas kernel: JBD2: Spotted dirty metadata buffer (dev = md2, blocknr = 0). There's a risk of filesystem corruption in case of system crash. 
Sep 29 16:03:43 atlas kernel: EXT4-fs error (device md2): ext4_mb_generate_buddy:757: group 995, block bitmap and bg descriptor inconsistent: 15932 vs 15939 free clusters Sep 29 16:03:43 atlas kernel: EXT4-fs error (device md2): ext4_mb_generate_buddy:757: group 1732, block bitmap and bg descriptor inconsistent: 5055 vs 5705 free clusters Sep 29 19:24:01 atlas kernel: md: data-check of RAID array md2 Sep 29 19:24:01 atlas kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk. Sep 29 19:24:01 atlas kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check. Sep 29 19:24:01 atlas kernel: md: using 128k window, over a total of 976229760k. Sep 29 22:37:53 atlas kernel: md: md2: data-check done. Later on I did several (at least 3) e2fsck runs until the filesystem finally was clean of errors. Only to stumble upon new errors today that can't be fixed with e2fsck anymore. :( > > It would be interesting to see what "debugfs -R 'stat <7912058>' /dev/md2" > returns. Inode: 7912058 Type: regular Mode: 0644 Flags: 0x80000 Generation: 252726504 Version: 0x00000000:00000001 User: 0 Group: 0 Size: 0 File ACL: 0 Directory ACL: 0 Links: 0 Blockcount: 0 Fragment: Address: 0 Number: 0 Size: 0 ctime: 0x5428ccf9:667449f0 -- Mon Sep 29 05:07:37 2014 atime: 0x5428ccf9:65fa3740 -- Mon Sep 29 05:07:37 2014 mtime: 0x5428ccf9:667449f0 -- Mon Sep 29 05:07:37 2014 crtime: 0x53451666:d35246b0 -- Wed Apr 9 11:44:06 2014 dtime: 0x5428ccf9 -- Mon Sep 29 05:07:37 2014 Size of extra inode fields: 28 EXTENTS: At this time there seems to be 7 such files. 
Here's what it looks like:

{atlas} [/ext/backup/atlas/usr/lib/x86_64-linux-gnu/imlib2/filters]# ls -la
ls: cannot access colormod.so: Input/output error
ls: cannot access bumpmap.so: Input/output error
ls: cannot access bumpmap.la: Input/output error
ls: cannot access testfilter.la: Input/output error
ls: cannot access testfilter.so: Input/output error
ls: cannot access colormod.la: Input/output error
total 8
drwxr-xr-x 2 root root 4096 Sep 28 11:10 .
drwxr-xr-x 4 root root 4096 Sep 14 2013 ..
-????????? ? ? ? ? ? bumpmap.la
-????????? ? ? ? ? ? bumpmap.so
-????????? ? ? ? ? ? colormod.la
-????????? ? ? ? ? ? colormod.so
-????????? ? ? ? ? ? testfilter.la
-????????? ? ? ? ? ? testfilter.so
{atlas} [/ext/backup/atlas/usr/lib/x86_64-linux-gnu/imlib2/filters]# cd
{atlas} [~]# umount /ext
{atlas} [~]# time e2fsck -fy /dev/md2
e2fsck 1.42.12 (29-Aug-2014)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/md2: 3863428/61022208 files (0.7% non-contiguous), 231256220/244057440 blocks
e2fsck -fy /dev/md2 9.57s user 2.05s system 5% cpu 3:14.40 total

Tried to delete that directory - impossible, i/o errors. I'll try to reboot now to see if anything changes...

Thanks for your help.

--
Zlatko
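The `debugfs stat` output in this message carries the two fields that matter for the "deleted inode referenced" check: the link count and the deletion time. The sketch below pulls both out of such output and converts the hexadecimal dtime to a UTC timestamp. It is an illustration only: `summarize_inode` is a made-up helper, and the regular expressions assume the field labels exactly as printed in the message (`Links:` and `dtime: 0x...`).

```python
import re
from datetime import datetime, timezone


def summarize_inode(stat_text):
    """Extract link count and deletion time from `debugfs stat` output text."""
    links = re.search(r"Links:\s*(\d+)", stat_text)
    # dtime is printed as a bare hex seconds value, e.g. "dtime: 0x5428ccf9"
    dtime = re.search(r"\bdtime:\s*0x([0-9a-fA-F]+)", stat_text)

    link_count = int(links.group(1)) if links else None
    deleted_at = None
    if dtime:
        seconds = int(dtime.group(1), 16)
        deleted_at = datetime.fromtimestamp(seconds, tz=timezone.utc)

    return {
        "links": link_count,
        "deleted_at": deleted_at,
        # Zero links plus a deletion time is what a normally deleted file looks like.
        "looks_deleted": link_count == 0 and deleted_at is not None,
    }


if __name__ == "__main__":
    sample = (
        "Inode: 7912058 Type: regular Mode: 0644 Flags: 0x80000 "
        "Links: 0 Blockcount: 0 "
        "dtime: 0x5428ccf9 -- Mon Sep 29 05:07:37 2014"
    )
    print(summarize_inode(sample))
```

For inode 7912058 the value 0x5428ccf9 decodes to 2014-09-29 03:07:37 UTC, which agrees with the 05:07:37 local time printed by debugfs if the machine runs at UTC+2 (an assumption). That is 14 seconds before the first ata2.00 exception in the quoted kernel log; the proximity is suggestive but does not by itself establish what went wrong.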
* Re: e2fsck not fixing deleted inode referenced errors? 2014-09-30 18:43 ` Zlatko Calusic @ 2014-09-30 19:29 ` Darrick J. Wong 2014-09-30 20:10 ` Zlatko Calusic 2014-09-30 19:54 ` Theodore Ts'o 1 sibling, 1 reply; 10+ messages in thread From: Darrick J. Wong @ 2014-09-30 19:29 UTC (permalink / raw) To: Zlatko Calusic; +Cc: linux-ext4 On Tue, Sep 30, 2014 at 08:43:04PM +0200, Zlatko Calusic wrote: > On 30.09.2014 20:30, Darrick J. Wong wrote: > >On Tue, Sep 30, 2014 at 07:56:36PM +0200, Zlatko Calusic wrote: > >>Hope this is the right list to ask this question. > >> > >>I have an ext4 filesystem that has a few errors like this: > >> > >>Sep 30 19:14:09 atlas kernel: EXT4-fs error (device md2): > >>ext4_lookup:1448: inode #7913865: comm find: deleted inode > >>referenced: 7912058 > >>Sep 30 19:14:09 atlas kernel: EXT4-fs error (device md2): > >>ext4_lookup:1448: inode #7913865: comm find: deleted inode > >>referenced: 7912055 > >> > >>Yet, when I run e2fsck -fy on it, I have a clean run, no errors are > >>found and/or fixed. Is this the expected behaviour? What am I > >>supposed to do to get rid of errors like the above? > > > >[I should hope not.] > > > >>The filesystem is on a md mirror device, the kernel is 3.17.0-rc7, > >>e2progs 1.42.12-1 (Debian sid). Could md device somehow interfere? I > >>ran md check yesterday, but there were no errors. > >> > >>BTW, this all started when I got ata2.00: failed command: FLUSH > >>CACHE EXT error yesterday morning. I did several runs of e2fsck > >>before the filesystem came up clean, yet errors like the above are > >>popping constantly. > > > >Normally that kernel message only happens if a dir refers to an inode with > >link_count and mode set to 0. > > > >Is the disk attached to ata2.00 one of the RAID1 mirrors? What was the full > >error message, and does smartctl -a report anything? 
> > Yes, it is part of the mirror: > > ata2.00: ATA-8: WDC WD1002FBYS-02A6B0, 03.00C06, max UDMA/133 > ata2.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA > ata2.00: configured for UDMA/133 > > md2 : active raid1 sdb2[0] sda2[1] > 976229760 blocks [2/2] [UU] > bitmap: 0/8 pages [0KB], 65536KB chunk > > Full error message from the kernel log, together with data check I > did in the evening: > > Sep 29 05:07:51 atlas kernel: ata2.00: exception Emask 0x10 SAct 0x0 > SErr 0x4010000 action 0xe frozen > Sep 29 05:07:51 atlas kernel: ata2.00: irq_stat 0x00400040, > connection status changed > Sep 29 05:07:51 atlas kernel: ata2: SError: { PHYRdyChg DevExch } > Sep 29 05:07:51 atlas kernel: ata2.00: failed command: FLUSH CACHE EXT > Sep 29 05:07:51 atlas kernel: ata2.00: cmd > ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0\x0a res > 40/00:f4:e2:7f:14/00:00:3a:00:00/40 Emask 0x10 (ATA bus error) > Sep 29 05:07:51 atlas kernel: ata2.00: status: { DRDY } > Sep 29 05:07:51 atlas kernel: ata2: hard resetting link > Sep 29 05:07:57 atlas kernel: ata2: link is slow to respond, please > be patient (ready=0) > Sep 29 05:08:00 atlas kernel: ata2: SATA link up 3.0 Gbps (SStatus > 123 SControl 300) > Sep 29 05:08:00 atlas kernel: ata2.00: configured for UDMA/133 > Sep 29 05:08:00 atlas kernel: ata2.00: retrying FLUSH 0xea Emask 0x10 > Sep 29 05:08:00 atlas kernel: ata2: EH complete > Sep 29 05:37:36 atlas kernel: EXT4-fs error (device md2): > ext4_mb_generate_buddy:757: group 1783, block bitmap and bg > descriptor inconsistent: 8218 vs 9292 free clusters > Sep 29 05:37:36 atlas kernel: JBD2: Spotted dirty metadata buffer > (dev = md2, blocknr = 0). There's a risk of filesystem corruption in > case of system crash. 
> Sep 29 16:03:43 atlas kernel: EXT4-fs error (device md2): > ext4_mb_generate_buddy:757: group 995, block bitmap and bg > descriptor inconsistent: 15932 vs 15939 free clusters > Sep 29 16:03:43 atlas kernel: EXT4-fs error (device md2): > ext4_mb_generate_buddy:757: group 1732, block bitmap and bg > descriptor inconsistent: 5055 vs 5705 free clusters > Sep 29 19:24:01 atlas kernel: md: data-check of RAID array md2 > Sep 29 19:24:01 atlas kernel: md: minimum _guaranteed_ speed: 1000 > KB/sec/disk. > Sep 29 19:24:01 atlas kernel: md: using maximum available idle IO > bandwidth (but not more than 200000 KB/sec) for data-check. > Sep 29 19:24:01 atlas kernel: md: using 128k window, over a total of > 976229760k. > Sep 29 22:37:53 atlas kernel: md: md2: data-check done. > > > Later on I did several (at least 3) e2fsck runs until the filesystem > finally was clean of errors. Only to stumble upon new errors today > that can't be fixed with e2fsck anymore. :( > > > > >It would be interesting to see what "debugfs -R 'stat <7912058>' /dev/md2" > >returns. > > Inode: 7912058 Type: regular Mode: 0644 Flags: 0x80000 > Generation: 252726504 Version: 0x00000000:00000001 > User: 0 Group: 0 Size: 0 > File ACL: 0 Directory ACL: 0 > Links: 0 Blockcount: 0 > Fragment: Address: 0 Number: 0 Size: 0 > ctime: 0x5428ccf9:667449f0 -- Mon Sep 29 05:07:37 2014 > atime: 0x5428ccf9:65fa3740 -- Mon Sep 29 05:07:37 2014 > mtime: 0x5428ccf9:667449f0 -- Mon Sep 29 05:07:37 2014 > crtime: 0x53451666:d35246b0 -- Wed Apr 9 11:44:06 2014 > dtime: 0x5428ccf9 -- Mon Sep 29 05:07:37 2014 > Size of extra inode fields: 28 > EXTENTS: Huh. This looks like a normal deleted file... just to ensure we're sane, what's the output of: debugfs -R 'ls <7913865>' /dev/md2 debugfs -R 'ncheck 7913865' /dev/md2 Hoping 7913865 -> /ext/backup/atlas/usr/lib/x86_64-linux-gnu/imlib2/filters > At this time there seems to be 7 such files. 
Here's what it looks like: > > {atlas} [/ext/backup/atlas/usr/lib/x86_64-linux-gnu/imlib2/filters]# ls -la > ls: cannot access colormod.so: Input/output error > ls: cannot access bumpmap.so: Input/output error > ls: cannot access bumpmap.la: Input/output error > ls: cannot access testfilter.la: Input/output error > ls: cannot access testfilter.so: Input/output error > ls: cannot access colormod.la: Input/output error > total 8 > drwxr-xr-x 2 root root 4096 Sep 28 11:10 . > drwxr-xr-x 4 root root 4096 Sep 14 2013 .. > -????????? ? ? ? ? ? bumpmap.la > -????????? ? ? ? ? ? bumpmap.so > -????????? ? ? ? ? ? colormod.la > -????????? ? ? ? ? ? colormod.so > -????????? ? ? ? ? ? testfilter.la > -????????? ? ? ? ? ? testfilter.so > {atlas} [/ext/backup/atlas/usr/lib/x86_64-linux-gnu/imlib2/filters]# cd > {atlas} [~]# umount /ext > tim{atlas} [~]# time e2fsck -fy /dev/md2 > e2fsck 1.42.12 (29-Aug-2014) > Pass 1: Checking inodes, blocks, and sizes > Pass 2: Checking directory structure > Pass 3: Checking directory connectivity > Pass 4: Checking reference counts > Pass 5: Checking group summary information > /dev/md2: 3863428/61022208 files (0.7% non-contiguous), > 231256220/244057440 blocks > e2fsck -fy /dev/md2 9.57s user 2.05s system 5% cpu 3:14.40 total By any chance did you save the e2fsck logs? Digging through the e2fsck source code, the only way an inode gets marked used is if i_link_count > 0 or ... badblocks thinks the inode table block is bad. What does this say? debugfs -R 'stat <1>' /dev/md2 > Tried to delete that directory - impossible, i/o errors. I'll try to > reboot now to see if anything changes... In theory we can use debugfs to clear the directory and then run e2fsck to clean up, but let's sanity-check the world before we resort to that. :) --D > > Thanks for your help. 
> -- > Zlatko
* Re: e2fsck not fixing deleted inode referenced errors? 2014-09-30 19:29 ` Darrick J. Wong @ 2014-09-30 20:10 ` Zlatko Calusic 0 siblings, 0 replies; 10+ messages in thread From: Zlatko Calusic @ 2014-09-30 20:10 UTC (permalink / raw) To: Darrick J. Wong; +Cc: linux-ext4 On 30.09.2014 21:29, Darrick J. Wong wrote: > Huh. This looks like a normal deleted file... just to ensure we're sane, > what's the output of: > > debugfs -R 'ls <7913865>' /dev/md2 7913865 (12) . 7913864 (12) .. 7912053 (20) bumpmap.la 7912054 (20) bumpmap.so 7912055 (20) colormod.la 7912056 (20) colormod.so 7912057 (24) testfilter.la 7912058 (3968) testfilter.so > debugfs -R 'ncheck 7913865' /dev/md2 Inode Pathname 7913865 /backup/atlas/usr/lib/x86_64-linux-gnu/imlib2/filters > > Hoping 7913865 -> /ext/backup/atlas/usr/lib/x86_64-linux-gnu/imlib2/filters It is. > > By any chance did you save the e2fsck logs? I'm afraid not. I was in single-user mode, it scrolled much. And later the log in /var/log/fsck was overwritten with clean fsck run. :( Always thought that history of fsck logs should be kept... something like /var/log/syslog. > > Digging through the e2fsck source code, the only way an inode gets marked used > is if i_link_count > 0 or ... badblocks thinks the inode table block is bad. > What does this say? > > debugfs -R 'stat <1>' /dev/md2 Inode: 1 Type: bad type Mode: 0000 Flags: 0x0 Generation: 0 Version: 0x00000000 User: 0 Group: 0 Size: 0 File ACL: 0 Directory ACL: 0 Links: 0 Blockcount: 0 Fragment: Address: 0 Number: 0 Size: 0 ctime: 0x4ddb7efa -- Tue May 24 11:48:42 2011 atime: 0x4ddb7efa -- Tue May 24 11:48:42 2011 mtime: 0x4ddb7efa -- Tue May 24 11:48:42 2011 Size of extra inode fields: 0 BLOCKS: > >> Tried to delete that directory - impossible, i/o errors. I'll try to >> reboot now to see if anything changes... > > In theory we can use debugfs to clear the directory and then run e2fsck to > clean up, but let's sanity-check the world before we resort to that. 
:) > My thought exactly. Now that you reminded me that debugfs exists, I'm pretty convinced I could just unlink one directory and one file with it, run e2fsck and maybe be done. But, if you have any other ideas in the meantime, let's test them first. -- Zlatko ^ permalink raw reply [flat|nested] 10+ messages in thread
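The `debugfs ls` listing in this message shows each directory entry as an inode number, a record length in parentheses, and a name. A small sketch of how that listing could be cross-checked against the inode numbers from the kernel errors follows. Both function names are hypothetical, and the parser assumes entry names without whitespace, which holds for the six files listed here but not in general.

```python
import re

# One directory entry as printed by `debugfs ls`: "<inode> (<rec_len>) <name>"
DIR_ENTRY = re.compile(r"(\d+)\s+\((\d+)\)\s+(\S+)")


def parse_debugfs_ls(ls_text):
    """Return {entry name: inode number} for a `debugfs ls` listing."""
    return {name: int(ino) for ino, _rec_len, name in DIR_ENTRY.findall(ls_text)}


def entries_pointing_at(ls_text, inodes):
    """Return the sorted entry names whose inode number is in `inodes`."""
    wanted = set(inodes)
    entries = parse_debugfs_ls(ls_text)
    return sorted(name for name, ino in entries.items() if ino in wanted)


if __name__ == "__main__":
    listing = (
        "7913865 (12) . 7913864 (12) .. 7912053 (20) bumpmap.la "
        "7912054 (20) bumpmap.so 7912055 (20) colormod.la "
        "7912056 (20) colormod.so 7912057 (24) testfilter.la "
        "7912058 (3968) testfilter.so"
    )
    # Inode numbers taken from the two kernel messages in the first post.
    print(entries_pointing_at(listing, [7912058, 7912055]))
```

With the listing above, the two inodes named in the first post resolve to `colormod.la` and `testfilter.so`, both of which also appear in the `ls -la` output with I/O errors.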
* Re: e2fsck not fixing deleted inode referenced errors?
From: Theodore Ts'o @ 2014-09-30 19:54 UTC
To: Zlatko Calusic; +Cc: Darrick J. Wong, linux-ext4

On Tue, Sep 30, 2014 at 08:43:04PM +0200, Zlatko Calusic wrote:
> Full error message from the kernel log, together with data check I did in
> the evening:
>
> Sep 29 05:07:51 atlas kernel: ata2.00: exception Emask 0x10 SAct 0x0 SErr
> 0x4010000 action 0xe frozen
> Sep 29 05:07:51 atlas kernel: ata2.00: irq_stat 0x00400040, connection
> status changed
> Sep 29 05:07:51 atlas kernel: ata2: SError: { PHYRdyChg DevExch }
> Sep 29 05:07:51 atlas kernel: ata2.00: failed command: FLUSH CACHE EXT
> Sep 29 05:07:51 atlas kernel: ata2.00: cmd
> ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0\x0a res
> 40/00:f4:e2:7f:14/00:00:3a:00:00/40 Emask 0x10 (ATA bus error)
> Sep 29 05:07:51 atlas kernel: ata2.00: status: { DRDY }
> Sep 29 05:07:51 atlas kernel: ata2: hard resetting link
> Sep 29 05:07:57 atlas kernel: ata2: link is slow to respond, please be
> patient (ready=0)
> Sep 29 05:08:00 atlas kernel: ata2: SATA link up 3.0 Gbps (SStatus 123
> SControl 300)
> Sep 29 05:08:00 atlas kernel: ata2.00: configured for UDMA/133
> Sep 29 05:08:00 atlas kernel: ata2.00: retrying FLUSH 0xea Emask 0x10
> Sep 29 05:08:00 atlas kernel: ata2: EH complete

That looks really bad; it sounds like you have a hardware error on at least one of your disks. Have you tried running badblocks on both disks to make sure the disk isn't flagging more bad blocks, and then resynchronizing the RAID 1 array? Then try running e2fsck again.

- Ted
* Re: e2fsck not fixing deleted inode referenced errors? 2014-09-30 19:54 ` Theodore Ts'o @ 2014-09-30 20:27 ` Zlatko Calusic 2014-09-30 20:36 ` Darrick J. Wong 2014-10-01 6:44 ` Zlatko Calusic 0 siblings, 2 replies; 10+ messages in thread From: Zlatko Calusic @ 2014-09-30 20:27 UTC (permalink / raw) To: Theodore Ts'o; +Cc: Darrick J. Wong, linux-ext4 On 30.09.2014 21:54, Theodore Ts'o wrote: > On Tue, Sep 30, 2014 at 08:43:04PM +0200, Zlatko Calusic wrote: >> Full error message from the kernel log, together with data check I did in >> the evening: >> >> Sep 29 05:07:51 atlas kernel: ata2.00: exception Emask 0x10 SAct 0x0 SErr >> 0x4010000 action 0xe frozen >> Sep 29 05:07:51 atlas kernel: ata2.00: irq_stat 0x00400040, connection >> status changed >> Sep 29 05:07:51 atlas kernel: ata2: SError: { PHYRdyChg DevExch } >> Sep 29 05:07:51 atlas kernel: ata2.00: failed command: FLUSH CACHE EXT >> Sep 29 05:07:51 atlas kernel: ata2.00: cmd >> ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0\x0a res >> 40/00:f4:e2:7f:14/00:00:3a:00:00/40 Emask 0x10 (ATA bus error) >> Sep 29 05:07:51 atlas kernel: ata2.00: status: { DRDY } >> Sep 29 05:07:51 atlas kernel: ata2: hard resetting link >> Sep 29 05:07:57 atlas kernel: ata2: link is slow to respond, please be >> patient (ready=0) >> Sep 29 05:08:00 atlas kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 >> SControl 300) >> Sep 29 05:08:00 atlas kernel: ata2.00: configured for UDMA/133 >> Sep 29 05:08:00 atlas kernel: ata2.00: retrying FLUSH 0xea Emask 0x10 >> Sep 29 05:08:00 atlas kernel: ata2: EH complete > > That looks really bad; it sounds like you have a hardware error on at > least one of your disks. Have you tried running running badblocks on > both disks to make sure the disk isn't flagging more bad blocks, and > then resynchronizing the RAID 1 array? Then try running e2fsck again. > Yep, both disks are pretty old, somewhere at the end of warranty. 
Yet the interesting thing is that exactly that error (FLUSH CACHE EXT) has happened from time to time, say once a year, but never before did I get into such trouble that e2fsck wouldn't save the day after one quick run.

I now remember Darrick also asked for smartctl data. Here it is:

/dev/sda
========
Power_On_Hours 40984

and only 2 SMART READ/WRITE LOG errors in the log from a long time ago...

ATA Error Count: 2
Error 1 occurred at disk power-on lifetime: 14493 hours (603 days + 21 hours)
Error 2 occurred at disk power-on lifetime: 14493 hours (603 days + 21 hours)

Full: http://pastebin.com/GnQhACXf

/dev/sdb (I believe the disk responsible for the problem)
========
Power_On_Hours 40978

No Errors Logged

Full: http://pastebin.com/nUB2q0Tk

Unless you have other ideas, I will run badblocks. Although, as the ext4 fs is on /dev/md2, I think I should run it on /dev/md2 only? Do you really mean to run it on /dev/sda2 and /dev/sdb2, the underlying devices? I'm not sure how MD would cope with it.

But I'm pretty sure that it will come out clean. The md check I did last night would surely have detected bad blocks if there were any. Or not?

Thanks for your help!

--
Zlatko
* Re: e2fsck not fixing deleted inode referenced errors? 2014-09-30 20:27 ` Zlatko Calusic @ 2014-09-30 20:36 ` Darrick J. Wong 2014-09-30 21:34 ` Zlatko Calusic 2014-10-01 6:44 ` Zlatko Calusic 1 sibling, 1 reply; 10+ messages in thread From: Darrick J. Wong @ 2014-09-30 20:36 UTC (permalink / raw) To: Zlatko Calusic; +Cc: Theodore Ts'o, linux-ext4 On Tue, Sep 30, 2014 at 10:27:12PM +0200, Zlatko Calusic wrote: > On 30.09.2014 21:54, Theodore Ts'o wrote: > >On Tue, Sep 30, 2014 at 08:43:04PM +0200, Zlatko Calusic wrote: > >>Full error message from the kernel log, together with data check I did in > >>the evening: > >> > >>Sep 29 05:07:51 atlas kernel: ata2.00: exception Emask 0x10 SAct 0x0 SErr > >>0x4010000 action 0xe frozen > >>Sep 29 05:07:51 atlas kernel: ata2.00: irq_stat 0x00400040, connection > >>status changed > >>Sep 29 05:07:51 atlas kernel: ata2: SError: { PHYRdyChg DevExch } > >>Sep 29 05:07:51 atlas kernel: ata2.00: failed command: FLUSH CACHE EXT > >>Sep 29 05:07:51 atlas kernel: ata2.00: cmd > >>ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0\x0a res > >>40/00:f4:e2:7f:14/00:00:3a:00:00/40 Emask 0x10 (ATA bus error) > >>Sep 29 05:07:51 atlas kernel: ata2.00: status: { DRDY } > >>Sep 29 05:07:51 atlas kernel: ata2: hard resetting link > >>Sep 29 05:07:57 atlas kernel: ata2: link is slow to respond, please be > >>patient (ready=0) > >>Sep 29 05:08:00 atlas kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 > >>SControl 300) > >>Sep 29 05:08:00 atlas kernel: ata2.00: configured for UDMA/133 > >>Sep 29 05:08:00 atlas kernel: ata2.00: retrying FLUSH 0xea Emask 0x10 > >>Sep 29 05:08:00 atlas kernel: ata2: EH complete > > > >That looks really bad; it sounds like you have a hardware error on at > >least one of your disks. Have you tried running running badblocks on > >both disks to make sure the disk isn't flagging more bad blocks, and > >then resynchronizing the RAID 1 array? Then try running e2fsck again. 
> > > > Yep, both disks are pretty old, somewhere at the end of warranty. > Yet the interesting thing is that exactly that error (FLUSH CACHE > EXT) happened from time to time, say once a year, but never before I > got in such trouble that e2fsck wouldn't save the day after one > quick run. > > I now remember Darrick also asked for smartctl data. Here it is: > > /dev/sda > ======== > Power_On_Hours 40984 > > and only 2 SMART READ/WRITE LOG errors in the log from long time ago... > > ATA Error Count: 2 > Error 1 occurred at disk power-on lifetime: 14493 hours (603 days + > 21 hours) > Error 2 occurred at disk power-on lifetime: 14493 hours (603 days + > 21 hours) Yes, that looks like SMART commands colliding, i.e. not a media error. > Full: http://pastebin.com/GnQhACXf > > /dev/sdb (I believe the disk responsible for the problem) sysfs will tell you: $ ls /sys/block/sd* lrwxrwxrwx 1 root root 0 Sep 30 13:31 /sys/block/sda -> ../devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda/ ^^^^ (So on this system, sda is ata1.00.) > ======== > Power_On_Hours 40978 > > No Errors Logged > > Full: http://pastebin.com/nUB2q0Tk > > Unless you have other ideas, I will run badblocks. Although, as ext4 > fs is on /dev/md2, I think I should run it on /dev/md2 only? Do you > really mean to run it on /dev/sda2, /dev/sdb2 - underlying devices? > I'm not sure how MD would cope with it. Both drives, individually. md should be fine with that, since you're only running the read test. It would be fun to be able to poke around with an 'e2image /dev/md2' file, but given the 1TB volume that might be unmanageable. > But, I'm pretty sure that it will come out clean. The md check I did > last night would surely detected bad blocks if there were any. Or > not? Let's hope so, unless it's some weird transient error... --D > > Thanks for your help! 
> -- > Zlatko
* Re: e2fsck not fixing deleted inode referenced errors? 2014-09-30 20:36 ` Darrick J. Wong @ 2014-09-30 21:34 ` Zlatko Calusic 0 siblings, 0 replies; 10+ messages in thread From: Zlatko Calusic @ 2014-09-30 21:34 UTC (permalink / raw) To: Darrick J. Wong; +Cc: Theodore Ts'o, linux-ext4 On 30.09.2014 22:36, Darrick J. Wong wrote: > > sysfs will tell you: > > $ ls /sys/block/sd* > lrwxrwxrwx 1 root root 0 Sep 30 13:31 /sys/block/sda -> > ../devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda/ > > ^^^^ > Nice, thanks. The error came from /dev/sdb definitely. > > Both drives, individually. > > md should be fine with that, since you're only running the read test. > You're right, of course, what was I thinking. :) OK, 2 x badblocks already running. Will report tomorrow how it ended. > It would be fun to be able to poke around with an 'e2image /dev/md2' file, but > given the 1TB volume that might be unmanageable. > Yeah. Together with the fact that fs has millions of files + not enough space on other disk, I'll have to pass on that one. >> But, I'm pretty sure that it will come out clean. The md check I did >> last night would surely detected bad blocks if there were any. Or >> not? > > Let's hope so, unless it's some weird transient error... > The worst kind. :( Dunno. I guess tomorrow I'll try unlink from debugfs. Then e2fsck. First, we'll see if there are any bad blocks around. Regards, -- Zlatko ^ permalink raw reply [flat|nested] 10+ messages in thread
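The sysfs hint quoted in this message works because the symlink target of `/sys/block/sdX` contains the libata port name as one of its path components. A minimal sketch of that lookup is below. The helper name `ata_port_from_sysfs_target` is invented for illustration, and it assumes the port appears as an `ataN` path component, as in the example link Darrick posted.

```python
import re


def ata_port_from_sysfs_target(link_target):
    """Return the 'ataN' path component of a /sys/block/sdX symlink target, or None."""
    m = re.search(r"/(ata\d+)/", link_target)
    return m.group(1) if m else None


if __name__ == "__main__":
    # Symlink target copied from the example in the thread (Darrick's system).
    target = (
        "../devices/pci0000:00/0000:00:1f.2/ata1/host0/"
        "target0:0:0/0:0:0:0/block/sda/"
    )
    print(ata_port_from_sysfs_target(target))
```

On a live system the target string could be obtained with `os.readlink("/sys/block/sdb")` and the result compared with the port in the kernel message; `ata2.00` would correspond to port `ata2`, device 0 on that port.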
* Re: e2fsck not fixing deleted inode referenced errors?
From: Zlatko Calusic @ 2014-10-01 6:44 UTC
To: Theodore Ts'o; +Cc: Darrick J. Wong, linux-ext4

On 30.09.2014 22:27, Zlatko Calusic wrote:
>
> Unless you have other ideas, I will run badblocks. Although, as ext4 fs
> is on /dev/md2, I think I should run it on /dev/md2 only? Do you really
> mean to run it on /dev/sda2, /dev/sdb2 - underlying devices? I'm not
> sure how MD would cope with it.
>
> But, I'm pretty sure that it will come out clean. The md check I did
> last night would surely detected bad blocks if there were any. Or not?

So, I ran badblocks on both underlying devices, and both came up clean, as I suspected would happen.

Then I ran debugfs -w /dev/md2 and unlinked the 7 problematic inodes. Afterwards e2fsck found those inodes and connected them to lost+found. A second e2fsck run finished clean, and at this time I see no errors on the fs.

This is definitely a case where e2fsck can't see or repair fs errors that are real.

Regards,
--
Zlatko
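The message above says the bad entries were removed with `debugfs -w` but does not show the requests that were typed. The sketch below only builds candidate command lines for review and never runs them. It assumes the `unlink` request of debugfs was used, that the filesystem is unmounted, and that paths are given relative to the filesystem root as in the `ncheck` output earlier in the thread. The function name and the choice of six file names are illustrative; the thread mentions seven inodes, and the seventh is not identified.

```python
import shlex


def debugfs_unlink_commands(device, directory, names):
    """Build (but do not run) one `debugfs -w -R 'unlink <path>' <device>` per name.

    Assumption: names contain no whitespace or quote characters, since debugfs
    parses the request string itself.
    """
    base = directory.rstrip("/")
    commands = []
    for name in names:
        request = f"unlink {base}/{name}"
        commands.append(["debugfs", "-w", "-R", request, device])
    return commands


if __name__ == "__main__":
    # Directory path as reported by `debugfs ncheck 7913865` in the thread.
    directory = "/backup/atlas/usr/lib/x86_64-linux-gnu/imlib2/filters"
    names = [
        "bumpmap.la", "bumpmap.so", "colormod.la",
        "colormod.so", "testfilter.la", "testfilter.so",
    ]
    for argv in debugfs_unlink_commands("/dev/md2", directory, names):
        # shlex.join (Python 3.8+) shows the command with shell quoting applied.
        print(shlex.join(argv))
```

Printing the commands instead of executing them leaves room to inspect each path first. As the message describes, running `e2fsck -fy` afterwards is what picked the inodes up again and moved them to lost+found.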