2nd Attempt - FSCK Errors

linux-ext4.vger.kernel.org archive mirror

* 2nd Attempt - FSCK Errors
@ 2013-04-30 14:00 Stephen Elliott
  2013-04-30 16:14 ` Andreas Dilger
  0 siblings, 1 reply; 12+ messages in thread
From: Stephen Elliott @ 2013-04-30 14:00 UTC (permalink / raw)
  To: linux-ext4, linux-fsdevel

2nd Attempt…

From: Stephen Elliott [mailto:techweb@ntlworld.com] 
Sent: 26 April 2013 22:37
To: 'linux-ext4@vger.kernel.org'
Subject: FSCK Errors

Hi all,

I rebooted my box today after 200 days of uptime and requested a volume scan, which found errors! I've never had a power outage or anything similar, so I'm keen to know what could have caused this file system corruption. Any ideas?

I'm running 4.2.21 on a ReadyNAS Pro6, but underneath it is ultimately a Linux (Debian) 2.6.37.6-based system.

***** File system check forced at Fri Apr 26 20:08:38 WEST 2013 *****
fsck 1.41.14 (22-Dec-2010)
e2fsck 1.42.3 (14-May-2012)
Pass 1: Checking inodes, blocks, and sizes
Inode 4195619, i_blocks is 3135728, should be 3135904. Fix? yes

Running additional passes to resolve blocks claimed by more than one inode...
Pass 1B: Rescanning for multiply-claimed blocks
Multiply-claimed block(s) in inode 4195619: 167904376 167904377 167904378 167904379 167904380 167904381 167904382 167904383 167904384 167904385 167904386 167949296 167949297 167949298 167949299 167949300 167949301 167949302 167949303 167949304 167949305 167949306
Pass 1C: Scanning directories for inodes with multiply-claimed blocks
Pass 1D: Reconciling multiply-claimed blocks
(There are 1 inodes containing multiply-claimed blocks.)

File /PREMIER/Premier Automation Purchase OrdersApp V18.5.mdb (inode #4195619, mod time Fri Apr 26 20:07:42 2013)
has 22 multiply-claimed block(s), shared with 0 file(s):
Multiply-claimed blocks already reassigned or cloned.

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/dev/c/c: ***** FILE SYSTEM WAS MODIFIED *****
/dev/c/c: 615898/30212096 files (13.6% non-contiguous), 62353456/483393536 blocks

I'm curious to know what this part is all about (shared with 0 files!):

File /PREMIER/Premier Automation Purchase OrdersApp V18.5.mdb (inode #4195619, mod time Fri Apr 26 20:07:42 2013)
has 22 multiply-claimed block(s), shared with 0 file(s):
Multiply-claimed blocks already reassigned or cloned.


Every subsequent boot with a forced fsck yields the following:


***** File system check forced at Fri Apr 26 20:21:42 WEST 2013 *****
fsck 1.41.14 (22-Dec-2010)
e2fsck 1.42.3 (14-May-2012)
Pass 1: Checking inodes, blocks, and sizes

Running additional passes to resolve blocks claimed by more than one inode...
Pass 1B: Rescanning for multiply-claimed blocks
Multiply-claimed block(s) in inode 4195619: 167904376 167904377 167904378 167904379 167904380 167904381 167904382 167904383 167904384 167904385 167904386 167949296 167949297 167949298 167949299 167949300 167949301 167949302 167949303 167949304 167949305 167949306
Pass 1C: Scanning directories for inodes with multiply-claimed blocks
Pass 1D: Reconciling multiply-claimed blocks
(There are 1 inodes containing multiply-claimed blocks.)

File /PREMIER/Premier Automation Purchase OrdersApp V18.5.mdb (inode #4195619, mod time Fri Apr 26 20:07:42 2013)
  has 22 multiply-claimed block(s), shared with 0 file(s):
Multiply-claimed blocks already reassigned or cloned.

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/c/c: 615898/30212096 files (13.6% non-contiguous), 62353456/483393536 blocks

Still the same thing with multiply-claimed blocks :(

Any ideas?

Many Thanks
Stephen Elliott 


* Re: 2nd Attempt - FSCK Errors
  2013-04-30 14:00 2nd Attempt - FSCK Errors Stephen Elliott
@ 2013-04-30 16:14 ` Andreas Dilger
  2013-04-30 16:25   ` Stephen Elliott
                     ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Andreas Dilger @ 2013-04-30 16:14 UTC (permalink / raw)
  To: <techweb@ntlworld.com>; +Cc: <linux-ext4@vger.kernel.org>

On 2013-04-30, at 8:00, "Stephen Elliott" <techweb@ntlworld.com> wrote:
> I rebooted my box today after 200 days of uptime and requested a volume scan, which found errors! I've never had a power outage or anything similar, so I'm keen to know what could have caused this file system corruption. Any ideas?
>
> I'm running 4.2.21 on a ReadyNAS Pro6, but underneath it is ultimately a Linux (Debian) 2.6.37.6-based system.
>
> ***** File system check forced at Fri Apr 26 20:08:38 WEST 2013 *****
> fsck 1.41.14 (22-Dec-2010)
> e2fsck 1.42.3 (14-May-2012)
> Pass 1: Checking inodes, blocks, and sizes
> Inode 4195619, i_blocks is 3135728, should be 3135904. Fix? yes

This is because the inode records 176 sectors (= 22 filesystem blocks)
fewer than e2fsck counted as allocated. Is this perhaps an extent format
file? Try "lsattr {filename}" and look for "e" in the file flags.
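
As a rough check of that arithmetic, here is a minimal sketch. It assumes i_blocks is counted in 512-byte sectors and that the filesystem uses 4096-byte blocks; the block size is not stated in the thread, and the function name is hypothetical.

```python
SECTOR_SIZE = 512   # i_blocks is assumed to be counted in 512-byte units
BLOCK_SIZE = 4096   # assumed filesystem block size (not stated in the thread)


def i_blocks_discrepancy(recorded, expected,
                         sector_size=SECTOR_SIZE, block_size=BLOCK_SIZE):
    """Return the i_blocks gap as (sectors, filesystem blocks)."""
    sectors = expected - recorded
    return sectors, sectors * sector_size // block_size


# Values taken from the e2fsck Pass 1 message quoted above.
sectors, blocks = i_blocks_discrepancy(3135728, 3135904)
print(sectors, blocks)  # 176 22
```

Under those assumptions the gap works out to the same 22 blocks that e2fsck later lists as multiply-claimed.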

> Running additional passes to resolve blocks claimed by more than one inode...
> Pass 1B: Rescanning for multiply-claimed blocks
> Multiply-claimed block(s) in inode 4195619: 167904376 167904377 167904378 167904379 167904380 167904381 167904382 167904383 167904384 167904385 167904386 167949296 167949297 167949298 167949299 167949300 167949301 167949302 167949303 167949304 167949305 167949306
> Pass 1C: Scanning directories for inodes with multiply-claimed blocks
> Pass 1D: Reconciling multiply-claimed blocks
> (There are 1 inodes containing multiply-claimed blocks.)

This is consistent with the one inode suddenly growing 22 blocks longer. 
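
The 22 block numbers in the Pass 1B output fall into two contiguous runs, which a short sketch makes easier to see. The helper name `contiguous_runs` is hypothetical; the block numbers are the ones listed in the e2fsck output quoted above.

```python
def contiguous_runs(blocks):
    """Group block numbers into (first, last, length) contiguous runs."""
    runs = []
    for b in sorted(blocks):
        if runs and b == runs[-1][1] + 1:
            runs[-1][1] = b          # extend the current run
        else:
            runs.append([b, b])      # start a new run
    return [(first, last, last - first + 1) for first, last in runs]


# Block numbers reported by e2fsck for inode 4195619.
reported = (list(range(167904376, 167904387)) +
            list(range(167949296, 167949307)))

print(len(reported))
for run in contiguous_runs(reported):
    print(run)
```

Two runs of equal length would fit the idea that a pair of extents is involved, though the thread does not confirm this.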

> File /PREMIER/Premier Automation Purchase OrdersApp V18.5.mdb (inode #4195619, mod time Fri Apr 26 20:07:42 2013)
> has 22 multiply-claimed block(s), shared with 0 file(s):
> Multiply-claimed blocks already reassigned or cloned.

This could be failing because the duplicate blocks are inside the same
file. I don't know whether that is something e2fsck expects. I wonder if
the extent tree is corrupted in some manner that isn't being detected
during the duplicate block scan.

This file looks big and important, so the first thing I would suggest is to
make a backup copy of it ASAP if you haven't already (having a backup is
always a good idea). Then I'd suggest updating to the latest e2fsprogs
(1.42.7) and trying again, since there was a bug fixed in the e2fsck extent
handling.

If that doesn't fix it, please dump the allocated file blocks with
"debugfs -c -R 'stat <4195619>' /dev/c/c" so we can see what it looks like
(probably gzipped and sent as an attachment, since it will be pretty large).

Cheers, Andreas


* RE: 2nd Attempt - FSCK Errors
  2013-04-30 16:14 ` Andreas Dilger
@ 2013-04-30 16:25   ` Stephen Elliott
  2013-04-30 17:58   ` Stephen Elliott
  2013-05-03 10:55   ` Stephen Elliott
  2 siblings, 0 replies; 12+ messages in thread
From: Stephen Elliott @ 2013-04-30 16:25 UTC (permalink / raw)
  To: 'Andreas Dilger'; +Cc: linux-ext4

Hi,

Appreciate the help...

As requested:

despair:/c/PREMIER# lsattr "Premier Automation Purchase OrdersApp V18.5.mdb"
-------------e- Premier Automation Purchase OrdersApp V18.5.mdb

What does this mean???

The file is important as it is an Orders database and is backed up daily :)
So far no actual issues experienced with using the file (MS Access DB)

I can perhaps run that debugfs command but to be clear I assume there is no
risk providing I unmount the FS beforehand? How big is the file generated
likely to be, ball park? 

As for the other options of updating, I am somewhat bound by the FW versions
the ReadyNAS units use, unless I start messing with it, which would likely
void any warranty anyway.

Many Thanks
Stephen Elliott


* RE: 2nd Attempt - FSCK Errors
  2013-04-30 16:14 ` Andreas Dilger
  2013-04-30 16:25   ` Stephen Elliott
@ 2013-04-30 17:58   ` Stephen Elliott
  2013-05-03 10:55   ` Stephen Elliott
  2 siblings, 0 replies; 12+ messages in thread
From: Stephen Elliott @ 2013-04-30 17:58 UTC (permalink / raw)
  To: 'Andreas Dilger'; +Cc: linux-ext4

Hi,

To further add, I ran the debugfs on the backup server but the output was
strange:

chaos:~# ls -i  /c/PREMIER
11141250 Premier Automation Purchase OrdersApp V18.5_Backup.mdb
11141790 Premier Automation Purchase OrdersApp V18.5.ldb
11142379 Premier Automation Purchase OrdersApp V18.5.mdb
11142332 PremierData.ldb
11142439 PremierData.mdb
11141128 Premier Y
11141129 Robots Bakups DO NOT USE SEE PDW
11141130 UpdateDrivers
11141252 Wkb.xwb
11141253 Workbench.xwb

chaos:~# ^[[3Pmount /dev/c/c

chaos:~# debugfs -c -R 'stat <11142379>' /dev/c/c
debugfs 1.41.14 (22-Dec-2010)
/dev/c/c: catastrophic mode - not reading inode or group bitmaps
Inode: 11142379   Type: bad type    Mode:  0000   Flags: 0x0
Generation: 0    Version: 0x00000000
User:     0   Group:     0   Size: 0
File ACL: 0    Directory ACL: 0
Links: 0   Blockcount: 0
Fragment:  Address: 0    Number: 0    Size: 0
ctime: 0x00000000 -- Thu Jan  1 00:00:00 1970
atime: 0x00000000 -- Thu Jan  1 00:00:00 1970
mtime: 0x00000000 -- Thu Jan  1 00:00:00 1970
Size of extra inode fields: 0
BLOCKS:

I suspect there may be another issue here with this tool on the ReadyNAS
devices.

Many Thanks
Stephen Elliott


* RE: 2nd Attempt - FSCK Errors
  2013-04-30 16:14 ` Andreas Dilger
  2013-04-30 16:25   ` Stephen Elliott
  2013-04-30 17:58   ` Stephen Elliott
@ 2013-05-03 10:55   ` Stephen Elliott
  2013-05-03 13:14     ` Theodore Ts'o
  2 siblings, 1 reply; 12+ messages in thread
From: Stephen Elliott @ 2013-05-03 10:55 UTC (permalink / raw)
  To: 'Andreas Dilger'; +Cc: linux-ext4

Any feedback to this folks???


* Re: 2nd Attempt - FSCK Errors
  2013-05-03 10:55   ` Stephen Elliott
@ 2013-05-03 13:14     ` Theodore Ts'o
  2013-05-03 13:31       ` Stephen Elliott
  0 siblings, 1 reply; 12+ messages in thread
From: Theodore Ts'o @ 2013-05-03 13:14 UTC (permalink / raw)
  To: Stephen Elliott; +Cc: 'Andreas Dilger', linux-ext4

What you've shown us makes me suspicious about whether the hardware
device is sane or not.  In the previous e2fsck run, it set i_size to a
non-zero value.  Yet when debugfs tries to read the same inode, it's
now seeing all zeros.

So that implies the disk (or software RAID device; you haven't been
clear what the underlying storage is for this file system) is not
returning the same information for a particular block as was
previously written.
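
One way to probe that hypothesis is to read the same block several times and compare checksums. This is only a minimal sketch: the function names are hypothetical, the 4096-byte block size is an assumption, and because the kernel page cache can satisfy repeated reads, a real check would need to bypass or drop the cache between reads.

```python
import hashlib
import os


def block_digest(path, block_no, block_size=4096):
    """Return the SHA-256 hex digest of one block read from path."""
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.pread(fd, block_size, block_no * block_size)
    finally:
        os.close(fd)
    return hashlib.sha256(data).hexdigest()


def reads_are_stable(path, block_no, attempts=3, block_size=4096):
    """True if every read of the block produced the same digest."""
    digests = {block_digest(path, block_no, block_size)
               for _ in range(attempts)}
    return len(digests) == 1
```

For example, `reads_are_stable("/dev/c/c", 167904376)` would re-read the first block e2fsck flagged; it needs read access to the device, and a stable result does not by itself prove the storage is healthy.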

If the underlying block device is not stable, there really is nothing
for e2fsck to do.  You might want to check /var/log/messages for any
error messages relating to the underlying storage device(s).  If
you're seeing I/O errors in the log files, that would be another hint.

At this point, my recommendation to you is to find a separate disk (or
RAID array if necessary) which is as big as the underlying disk, and
do an image copy (via dd or ddrescue) to a known-good storage device,
and then retry the e2fsck on this copy of the file system.

Regards,

						- Ted


* RE: 2nd Attempt - FSCK Errors
  2013-05-03 13:14     ` Theodore Ts'o
@ 2013-05-03 13:31       ` Stephen Elliott
  2013-05-03 15:29         ` Theodore Ts'o
  0 siblings, 1 reply; 12+ messages in thread
From: Stephen Elliott @ 2013-05-03 13:31 UTC (permalink / raw)
  To: 'Theodore Ts'o'; +Cc: 'Andreas Dilger', linux-ext4

Well... Funny enough, the device which I ran debugfs on was the other
ReadyNAS device, not the one with the issue anyway. I wanted to test it
first.

I suspect the underlying architecture supporting RAID in these devices
screws with the debugfs interface.

I just find it bizarre that I get the same message about multiply-claimed
blocks shared with 0 files on every fsck, as in it never gets resolved.
But there are no issues with file access and no bad logs. I do have a case
open with Netgear support, since this is basically an appliance.


* Re: 2nd Attempt - FSCK Errors
  2013-05-03 13:31       ` Stephen Elliott
@ 2013-05-03 15:29         ` Theodore Ts'o
  2013-05-03 18:42           ` Stephen Elliott
  0 siblings, 1 reply; 12+ messages in thread
From: Theodore Ts'o @ 2013-05-03 15:29 UTC (permalink / raw)
  To: Stephen Elliott; +Cc: 'Andreas Dilger', linux-ext4

On Fri, May 03, 2013 at 02:31:41PM +0100, Stephen Elliott wrote:
> Well... Funny enough, the device which I ran debugfs on was the other
> ReadyNAS device, not the one with the issue anyway. I wanted to test it
> first.

Was the inode number from the other device as well?  Sorry, this is
the first I've heard that there are two ReadyNAS devices in play.
It's one of the reasons why it's really painful to try to be a help
desk for these sorts of questions over e-mail....

> I suspect the underlying architecture supporting RAID in these devices
> screws with the debugfs interface.

Debugfs uses the standard Linux block device interface.  If that's not
sane, e2fsck isn't going to be sane either, and it's a kernel bug.  At
that point, you'll have to talk to ReadyNAS folks since they are
providing the kernel you are using (or the wonky non-standard
hardware, or both....)

> I just find it bizarre that I get the same message about multiply-claimed
> blocks shared with 0 files on every fsck, as in it never gets resolved.
> But there are no issues with file access and no bad logs. I do have a case
> open with Netgear support, since this is basically an appliance.

There have been NAS boxes out there which have used non-standard, out
of tree kernel patches that have resulted in their devices using
non-standard file system formats.  So that's yet another thing which
can't be ruled out...

						- Ted


* RE: 2nd Attempt - FSCK Errors
  2013-05-03 15:29         ` Theodore Ts'o
@ 2013-05-03 18:42           ` Stephen Elliott
  2013-05-03 21:31             ` Theodore Ts'o
  0 siblings, 1 reply; 12+ messages in thread
From: Stephen Elliott @ 2013-05-03 18:42 UTC (permalink / raw)
  To: 'Theodore Ts'o'; +Cc: 'Andreas Dilger', linux-ext4

Thanks very much... I do appreciate you guys are developers and not support
people :) The only thing is that it is difficult to build an understanding
when troubleshooting FS problems, since in my experience (Slackware Linux
user for 10+ years) they generally just resolve themselves.

We have another NAS which acts as a warm standby. To be on the safe side, I
tested debugfs on that just to ensure it didn't crash the box etc. The
standby box is running totally clean.

One thing maybe you could explain (and Andreas gave me his take too) is how
you can have multiply-claimed blocks shared with "0" files. Andreas offered
the suggestion that they may be in the same file. If this were really the
case, I would expect there to be some file corruption issues etc...

I think I am going to move the file off the device, run FSCK and then put it
back on and monitor.


* Re: 2nd Attempt - FSCK Errors
  2013-05-03 18:42           ` Stephen Elliott
@ 2013-05-03 21:31             ` Theodore Ts'o
  2013-05-05 11:19               ` Stephen Elliott
  2013-05-08 14:45               ` Stephen Elliott
  0 siblings, 2 replies; 12+ messages in thread
From: Theodore Ts'o @ 2013-05-03 21:31 UTC (permalink / raw)
  To: Stephen Elliott; +Cc: 'Andreas Dilger', linux-ext4

On Fri, May 03, 2013 at 07:42:55PM +0100, Stephen Elliott wrote:
> One thing maybe you could explain (and Andreas gave me his take too) is how
> you can have multiply-claimed blocks shared with "0" files. Andreas offered
> the suggestion that they may be in the same file. If this were really the
> case, I would expect there to be some file corruption issues etc...

It could be that the blocks appear multiple times in the file; or it
could be the storage system is returning different data on subsequent
reads from the device.

What we need is a debugfs dump of the inode on the file system that is
having trouble.  A debugfs dump of the inode from a completely
unrelated file system is not useful....

						- Ted


* RE: 2nd Attempt - FSCK Errors
  2013-05-03 21:31             ` Theodore Ts'o
@ 2013-05-05 11:19               ` Stephen Elliott
  2013-05-08 14:45               ` Stephen Elliott
  1 sibling, 0 replies; 12+ messages in thread
From: Stephen Elliott @ 2013-05-05 11:19 UTC (permalink / raw)
  To: 'Theodore Ts'o'; +Cc: 'Andreas Dilger', linux-ext4

Understood... When I can get a debugfs dump from the device (following
advice from Netgear) I will post but this may be in 2 or 3 weeks time.


* RE: 2nd Attempt - FSCK Errors
  2013-05-03 21:31             ` Theodore Ts'o
  2013-05-05 11:19               ` Stephen Elliott
@ 2013-05-08 14:45               ` Stephen Elliott
  1 sibling, 0 replies; 12+ messages in thread
From: Stephen Elliott @ 2013-05-08 14:45 UTC (permalink / raw)
  To: 'Theodore Ts'o'; +Cc: 'Andreas Dilger', linux-ext4

Hi,

Well... I have some new and interesting information. 

After deleting the file and re-creating it from backup, the file system was
remounted read-only and the following errors were logged:

May  8 14:58:15 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5124block 167904376:freeing already freed block (bit 1144)
May  8 14:58:15 despair kernel: Aborting journal on device dm-0-8.
May  8 14:58:15 despair kernel: EXT4-fs (dm-0): Remounting filesystem read-only
May  8 14:58:15 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5124block 167904377:freeing already freed block (bit 1145)
May  8 14:58:15 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5124block 167904378:freeing already freed block (bit 1146)
May  8 14:58:15 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5124block 167904379:freeing already freed block (bit 1147)
May  8 14:58:15 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5124block 167904380:freeing already freed block (bit 1148)
May  8 14:58:15 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5124block 167904381:freeing already freed block (bit 1149)
May  8 14:58:15 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5124block 167904382:freeing already freed block (bit 1150)
May  8 14:58:16 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5124block 167904383:freeing already freed block (bit 1151)
May  8 14:58:16 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5124block 167904384:freeing already freed block (bit 1152)
May  8 14:58:16 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5124block 167904385:freeing already freed block (bit 1153)
May  8 14:58:16 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5124block 167904386:freeing already freed block (bit 1154)
May  8 14:58:16 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5125block 167949296:freeing already freed block (bit 13296)
May  8 14:58:16 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5125block 167949297:freeing already freed block (bit 13297)
May  8 14:58:16 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5125block 167949298:freeing already freed block (bit 13298)
May  8 14:58:16 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5125block 167949299:freeing already freed block (bit 13299)
May  8 14:58:17 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5125block 167949300:freeing already freed block (bit 13300)
May  8 14:58:17 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5125block 167949301:freeing already freed block (bit 13301)
May  8 14:58:17 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5125block 167949302:freeing already freed block (bit 13302)
May  8 14:58:17 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5125block 167949303:freeing already freed block (bit 13303)
May  8 14:58:17 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5125block 167949304:freeing already freed block (bit 13304)
May  8 14:58:17 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5125block 167949305:freeing already freed block (bit 13305)
May  8 14:58:17 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5125block 167949306:freeing already freed block (bit 13306)

These are the same blocks that the earlier fsck reported as multiply claimed :)
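As a cross-check on the kernel messages, the group and bit numbers can be
derived from the block numbers. This is only a sketch: it assumes 32768
blocks per group and a first data block of 0 (the usual layout with 4 KiB
blocks), neither of which is stated explicitly in this thread, although
both agree with the numbers logged above.

```python
BLOCKS_PER_GROUP = 32768  # assumption: default for 4 KiB blocks
FIRST_DATA_BLOCK = 0      # assumption: 0 when the block size is > 1 KiB


def locate_block(block):
    """Return (group, bit) for a physical block number."""
    return divmod(block - FIRST_DATA_BLOCK, BLOCKS_PER_GROUP)


# First and last block of each run reported in the log.
for block in (167904376, 167904386, 167949296, 167949306):
    group, bit = locate_block(block)
    print(f"block {block}: group {group}, bit {bit}")
```

Under those assumptions, block 167904376 lands in group 5124 at bit 1144 and
block 167949296 in group 5125 at bit 13296, matching the log.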

Running an fsck afterwards, we got the following:

***** File system check forced at Wed May  8 15:16:50 WEST 2013 *****
fsck 1.41.14 (22-Dec-2010)
e2fsck 1.42.3 (14-May-2012)
/dev/c/c: recovering journal
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong for group #5124 (28170, counted=28159).
Fix? yes

Free blocks count wrong for group #5125 (25861, counted=25850).
Fix? yes

Free blocks count wrong (420683133, counted=420644972).
Fix? yes

Free inodes count wrong (29595347, counted=29595271).
Fix? yes

/dev/c/c: ***** FILE SYSTEM WAS MODIFIED *****
/dev/c/c: 616825/30212096 files (13.6% non-contiguous), 62748564/483393536 blocks
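A quick bit of arithmetic on the per-group numbers above, as a sketch only:
the figures are copied from the fsck output and from the kernel log earlier
in this message, and the variable names are just for illustration. It
compares the free-block discrepancy in each group with the number of blocks
the kernel complained about in that group.

```python
# Free block counts per group as printed by e2fsck: (recorded, counted).
fsck_counts = {5124: (28170, 28159), 5125: (25861, 25850)}

# First and last block per group from the "freeing already freed block" log.
logged_ranges = {5124: (167904376, 167904386), 5125: (167949296, 167949306)}


def discrepancy(group):
    """Recorded free count minus the count e2fsck derived from the bitmap."""
    recorded, counted = fsck_counts[group]
    return recorded - counted


def logged_blocks(group):
    """Number of blocks in the logged run for this group (inclusive range)."""
    first, last = logged_ranges[group]
    return last - first + 1


for group in sorted(fsck_counts):
    print(group, discrepancy(group), logged_blocks(group))
```

Both groups are off by 11, which is the number of blocks logged for each
group, so the count errors are at least consistent with those 22 blocks.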

Volume seems clean now after reboot and keeps coming back clean.

Any ideas based on this? The old database was over 1.6 GB in size. The one
re-created from backup is much smaller, at 33 MB (it was compacted before
restoring). Once these DB files grow, they don't get smaller until compacted,
although I'm sure file size should not have a bearing on this.

Many Thanks
Stephen Elliott


Thread overview: 12+ messages
2013-04-30 14:00 2nd Attempt - FSCK Errors Stephen Elliott
2013-04-30 16:14 ` Andreas Dilger
2013-04-30 16:25   ` Stephen Elliott
2013-04-30 17:58   ` Stephen Elliott
2013-05-03 10:55   ` Stephen Elliott
2013-05-03 13:14     ` Theodore Ts'o
2013-05-03 13:31       ` Stephen Elliott
2013-05-03 15:29         ` Theodore Ts'o
2013-05-03 18:42           ` Stephen Elliott
2013-05-03 21:31             ` Theodore Ts'o
2013-05-05 11:19               ` Stephen Elliott
2013-05-08 14:45               ` Stephen Elliott
