* e2fsck exit codes
@ 2017-02-14  0:59 thanumalayan mad
  2017-02-14 17:01 ` Theodore Ts'o
  0 siblings, 1 reply; 5+ messages in thread
From: thanumalayan mad @ 2017-02-14  0:59 UTC (permalink / raw)
  To: linux-ext4

Hi all,

I'm trying to figure out a way to tell fsck to inform me and abort if
it suspects disk corruption, but continue otherwise. I thought I knew
how to do this, but I don't.

I did the following experiment: I created a new file system using
mkfs.ext4, mounted it, ran a simple workload on it, and hard-rebooted
the machine during the middle of the workload. After rebooting the
machine, I ran "fsck.ext4 -fy" on the partition. Fsck complains about
wrong inode counts and block counts, and exits with code 1.

Is this the expected behavior? I retried the experiment on a couple of
machines, so I know it's not an actual corrupted-drive issue. The
following is an example output of fsck:

root@lappy:/home/madthanu# fsck.ext4 -f /dev/sda8
e2fsck 1.42.12 (29-Aug-2014)
/dev/sda8: recovering journal
Clearing orphaned inode 6063 (uid=0, gid=0, mode=0100664, size=2377)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (2710767, counted=2681703).
Fix<y>? no
Free inodes count wrong (698998, counted=684135).
Fix<y>? no
/dev/sda8: ***** FILE SYSTEM WAS MODIFIED *****
/dev/sda8: 10/699008 files (80.0% non-contiguous), 83473/2794240 blocks

root@lappy:/home/madthanu# echo $?
1

root@lappy:/home/madthanu# fsck.ext4 -fy /dev/sda8
e2fsck 1.42.12 (29-Aug-2014)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (2710767, counted=2681703).
Fix? yes
Free inodes count wrong (698998, counted=684135).
Fix? yes
/dev/sda8: ***** FILE SYSTEM WAS MODIFIED *****
/dev/sda8: 14873/699008 files (0.1% non-contiguous), 112537/2794240 blocks

root@lappy:/home/madthanu# echo $?
1

Is there a correct way to do that? "fsck.ext4 -p" does continue
without complaining much, but as I understand it, it might continue
even if there was disk corruption (so long as fsck has enough
confidence that it restored the correct version of corrupted metadata,
it'll continue).

Thanks,
Thanu

---
(Thanumalayan Sankaranarayana Pillai)


* Re: e2fsck exit codes
  2017-02-14  0:59 e2fsck exit codes thanumalayan mad
@ 2017-02-14 17:01 ` Theodore Ts'o
  2017-02-14 19:00   ` thanumalayan mad
  0 siblings, 1 reply; 5+ messages in thread
From: Theodore Ts'o @ 2017-02-14 17:01 UTC (permalink / raw)
  To: thanumalayan mad; +Cc: linux-ext4

On Mon, Feb 13, 2017 at 04:59:49PM -0800, thanumalayan mad wrote:
> root@lappy:/home/madthanu# fsck.ext4 -f /dev/sda8
> e2fsck 1.42.12 (29-Aug-2014)
> /dev/sda8: recovering journal
> Clearing orphaned inode 6063 (uid=0, gid=0, mode=0100664, size=2377)
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> Free blocks count wrong (2710767, counted=2681703).
> Fix<y>? no
> Free inodes count wrong (698998, counted=684135).
> Fix<y>? no
> /dev/sda8: ***** FILE SYSTEM WAS MODIFIED *****
> /dev/sda8: 10/699008 files (80.0% non-contiguous), 83473/2794240 blocks

This is expected; ext4 doesn't use the overall number of free blocks
and free inodes summarized in the superblock, so to improve
scalability, quite a while back we stopped updating those fields except
on a clean shutdown.  This means that if you crash your system or
otherwise leave the file system with an unclean mount, the total
number of free blocks and free inodes will not be updated.

When the kernel mounts the file system, it will tally up the free
block/inode counts from all of the block group descriptors, and use
that value, and on the next clean shutdown, they will be updated
appropriately.  E2fsck is doing the same thing here.  The difference
is that we're asking the user's permission before we fix things, so it
looks like something more serious than it is.

The way you can tell whether or not the file system has errors is to
run e2fsck -fn:

% e2fsck -fn /tmp/foo.img
e2fsck 1.43.4 (31-Jan-2017)
Warning: skipping journal recovery because doing a read-only filesystem check.
Pass 1: Checking inodes, blocks, and sizes
Inode 15 extent tree (at level 1) could be shorter.  Fix? no

Inode 115 extent tree (at level 1) could be shorter.  Fix? no

Inode 365 extent tree (at level 1) could be shorter.  Fix? no

Inode 741 extent tree (at level 1) could be shorter.  Fix? no

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (6830, counted=163).
Fix? no

Free inodes count wrong (2037, counted=0).
Fix? no

/tmp/foo.img: 11/2048 files (900.0% non-contiguous), 1362/8192 blocks
% echo $?
0

If the file system has an error that needs to be fixed, it will return
an exit status of 4:

% e2fsck -fn /tmp/foo.img
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Entry 'p0' in /fs (12) has deleted/unused inode 14.  Clear? no

Entry 'p0' in /fs (12) has an incorrect filetype (was 2, should be 0).
Fix? no

Entry '..' in <14>/<17> (17) has deleted/unused inode 14.  Clear? no

Entry '..' in <14>/<17> (17) has an incorrect filetype (was 2, should be 0).
Fix? no

Pass 3: Checking directory connectivity
Unconnected directory inode 17 (<14>/<17>)
Connect to /lost+found? no

'..' in ... (17) is <14> (14), should be <The NULL inode> (0).
Fix? no

Pass 4: Checking reference counts
Inode 12 ref count is 7, should be 6.  Fix? no

Inode 17 ref count is 5, should be 4.  Fix? no

Pass 5: Checking group summary information
Block bitmap differences:  -1363
Fix? no

Inode bitmap differences:  -14
Fix? no

Directories count wrong for group #0 (79, counted=78).
Fix? no


/tmp/foo.img: ********** WARNING: Filesystem still has errors **********

/tmp/foo.img: 2047/2048 files (4.8% non-contiguous), 8025/8192 blocks

% echo $?
4

See the fsck.ext4 man page, the "EXIT CODE" section for an explanation
for how the exit codes work.

					- Ted
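
The exit status described in the man page's "EXIT CODE" section is a sum of conditions (1 = errors corrected, 2 = errors corrected and a reboot is needed, 4 = errors left uncorrected, 8 = operational error, 16 = usage or syntax error, 32 = canceled by user request, 128 = shared library error). A minimal sketch of decoding it follows; the function name describe_e2fsck_status is made up for illustration and is not part of e2fsprogs.

```shell
# Decode an e2fsck exit status into the conditions listed in the
# EXIT CODE section of the e2fsck(8) man page. The status is a sum
# of the bit values below, so several conditions can be set at once.
# describe_e2fsck_status is a hypothetical helper name.
describe_e2fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    out=""
    for entry in \
        "1:errors corrected" \
        "2:errors corrected, reboot needed" \
        "4:errors left uncorrected" \
        "8:operational error" \
        "16:usage or syntax error" \
        "32:canceled by user request" \
        "128:shared library error"
    do
        bit=${entry%%:*}     # number before the first colon
        text=${entry#*:}     # description after the first colon
        if [ $(( status & bit )) -ne 0 ]; then
            out="${out:+$out; }$text"
        fi
    done
    # Fall back to a generic message if no known bit was set.
    echo "${out:-unknown status $status}"
}
```

For the two runs shown above, "describe_e2fsck_status 0" prints "no errors" and "describe_e2fsck_status 4" prints "errors left uncorrected".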
					


* Re: e2fsck exit codes
  2017-02-14 17:01 ` Theodore Ts'o
@ 2017-02-14 19:00   ` thanumalayan mad
  2017-02-15 15:44     ` Theodore Ts'o
  0 siblings, 1 reply; 5+ messages in thread
From: thanumalayan mad @ 2017-02-14 19:00 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-ext4

Thank you for the quick reply! (I was curious about the free-blocks
count too, so thanks for explaining that.) Responses inlined:

On Tue, Feb 14, 2017 at 9:01 AM, Theodore Ts'o <tytso@mit.edu> wrote:
> The way you can tell whether or not the file system has errors is to
> run e2fsck -fn:
>
> % e2fsck -fn /tmp/foo.img
> e2fsck 1.43.4 (31-Jan-2017)
> Warning: skipping journal recovery because doing a read-only filesystem check.
> Pass 1: Checking inodes, blocks, and sizes
> Inode 15 extent tree (at level 1) could be shorter.  Fix? no
>
> Inode 115 extent tree (at level 1) could be shorter.  Fix? no
>
> Inode 365 extent tree (at level 1) could be shorter.  Fix? no
>
> Inode 741 extent tree (at level 1) could be shorter.  Fix? no
>
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> Free blocks count wrong (6830, counted=163).
> Fix? no
>
> Free inodes count wrong (2037, counted=0).
> Fix? no
>
> /tmp/foo.img: 11/2048 files (900.0% non-contiguous), 1362/8192 blocks
> % echo $?
> 0
>
> If the file system has an error that needs to be fixed, it will return
> an exit status of 4:

I also tested "fsck.ext4 -fn", and it returns 4 even when there is no
disk corruption. That said, doing a "fsck -E journal_only" followed by
a "fsck -fn" does seem to work:

=====

desky madthanu # fsck.ext4 -fn /dev/sdb
e2fsck 1.42.13 (17-May-2015)
Warning: skipping journal recovery because doing a read-only filesystem check.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Entry '1' in / (2) references inode 524289 in group 64 where
_INODE_UNINIT is set.
Fix? no
Entry '1' in / (2) has deleted/unused inode 524289.  Clear? no
Entry '2' in / (2) references inode 131073 in group 16 where
_INODE_UNINIT is set.
Fix? no
Entry '2' in / (2) has deleted/unused inode 131073.  Clear? no
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Inode 2 ref count is 5, should be 3.  Fix? no
Pass 5: Checking group summary information
Block bitmap differences:  -(8871--9768) -(2105376--2106287)
Fix? no
Free blocks count wrong for group #0 (23897, counted=22999).
Fix? no
Free blocks count wrong for group #64 (24544, counted=23632).
Fix? no
Free blocks count wrong (2541777, counted=2539967).
Fix? no
Inode bitmap differences:  -(12--8192)
Fix? no
Free inodes count wrong for group #0 (8181, counted=0).
Fix? no
Free inodes count wrong (655349, counted=647168).
Fix? no
/dev/sdb: ********** WARNING: Filesystem still has errors **********
/dev/sdb: 11/655360 files (0.0% non-contiguous), 79663/2621440 blocks

desky madthanu # echo $?
4

desky madthanu # fsck.ext4 -E journal_only /dev/sdb
e2fsck 1.42.13 (17-May-2015)
/dev/sdb: recovering journal

desky madthanu # echo $?
0

desky madthanu # fsck.ext4 -fn /dev/sdb
e2fsck 1.42.13 (17-May-2015)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (2541777, counted=2412400).
Fix? no
Free inodes count wrong (655349, counted=595281).
Fix? no
/dev/sdb: 11/655360 files (136.4% non-contiguous), 79663/2621440 blocks

desky madthanu # echo $?
0

=====
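
As a cross-check of the numbers in the first "fsck.ext4 -fn" run above (before the journal was replayed), the file-system-wide discrepancies equal the sum of the per-group discrepancies, and both match the sizes of the bitmap ranges that e2fsck listed. This is only arithmetic on the values already printed in that output, treating the ranges as inclusive:

```shell
# Free blocks: (stored - counted) per group and for the whole file system.
group0=$(( 23897 - 22999 ))             # group #0
group64=$(( 24544 - 23632 ))            # group #64
total=$(( 2541777 - 2539967 ))          # file-system-wide count

# Block bitmap differences, treated as inclusive ranges.
range0=$(( 9768 - 8871 + 1 ))           # -(8871--9768)
range64=$(( 2106287 - 2105376 + 1 ))    # -(2105376--2106287)

# Free inodes: group #0, file-system-wide, and the bitmap range -(12--8192).
inodes_group0=$(( 8181 - 0 ))
inodes_total=$(( 655349 - 647168 ))
inodes_range=$(( 8192 - 12 + 1 ))

echo "blocks: groups=$(( group0 + group64 )) total=$total bitmap=$(( range0 + range64 ))"
echo "inodes: group0=$inodes_group0 total=$inodes_total bitmap=$inodes_range"
```

This prints "blocks: groups=1810 total=1810 bitmap=1810" and "inodes: group0=8181 total=8181 bitmap=8181".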

Would this be the way to proceed? I am hoping "-E journal_only" also
takes care of clearing orphan inodes and still exits with a 0. (Note:
The output I pasted above is derived from a file system created with
default options, and does not switch off uninit_bg or
lazy_itable_init.)

> See the fsck.ext4 man page, the "EXIT CODE" section for an explanation
> for how the exit codes work.

Sorry, I did read through that, but got confused about the combination
of the exit codes and command line options.

Thanks,
Thanu
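
The two-step procedure from this message (replay the journal, then run a forced read-only check) could be wrapped roughly as follows. This is a sketch under assumptions: the function name check_after_crash and the E2FSCK override variable are invented here, and it treats any non-zero status from either step as a reason to stop.

```shell
# E2FSCK is a hypothetical override hook (for example, a stub when
# testing); by default the real e2fsck binary is used.
: "${E2FSCK:=e2fsck}"

# Replay the journal, then run a forced read-only check.
# Returns 0 only if both steps exit with 0; otherwise returns the
# exit status of the step that failed.
check_after_crash() {
    dev=$1

    # Step 1: replay the journal only, with no further checks.
    "$E2FSCK" -E journal_only "$dev"
    rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "journal replay on $dev exited with status $rc" >&2
        return "$rc"
    fi

    # Step 2: forced check that answers "no" to every question.
    "$E2FSCK" -fn "$dev"
    rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "read-only check of $dev exited with status $rc" >&2
    fi
    return "$rc"
}
```

It would be called on an unmounted device, as in the transcript above, for example "check_after_crash /dev/sdb".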


* Re: e2fsck exit codes
  2017-02-14 19:00   ` thanumalayan mad
@ 2017-02-15 15:44     ` Theodore Ts'o
  2017-02-15 21:44       ` thanumalayan mad
  0 siblings, 1 reply; 5+ messages in thread
From: Theodore Ts'o @ 2017-02-15 15:44 UTC (permalink / raw)
  To: thanumalayan mad; +Cc: linux-ext4

On Tue, Feb 14, 2017 at 11:00:51AM -0800, thanumalayan mad wrote:
> Thank you for the quick reply! (I was curious about the free-blocks
> 
> Would this be the way to proceed? I am hoping "-E journal_only" also
> takes care of clearing orphan inodes and still exits with a 0. (Note:
> The output I pasted above is derived from a file system created with
> default options, and does not switch off uninit_bg or
> lazy_itable_init.)

What are you trying to _do_, at a high level?

It should do what you want, but at this point I'm wondering _why_ you
want to do it, and whether or not it is something you _should_ be
doing.

						- Ted


* Re: e2fsck exit codes
  2017-02-15 15:44     ` Theodore Ts'o
@ 2017-02-15 21:44       ` thanumalayan mad
  0 siblings, 0 replies; 5+ messages in thread
From: thanumalayan mad @ 2017-02-15 21:44 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-ext4

Hi Ted,

Thank you for replying, again.

On Wed, Feb 15, 2017 at 7:44 AM, Theodore Ts'o <tytso@mit.edu> wrote:
> What are you trying to _do_, at a high level?
>
> It should do what you want, but at this point I'm wondering _why_ you
> want to do it, and whether or not it is something you _should_ be
> doing.

I probably have an odd use case. I am trying to evaluate whether a new
storage stack (i.e., a virtual disk) is good enough for running an
application service; the service can frequently encounter crash
reboots, and the storage stack should be reliable enough. After
testing the service out on the new stack for a while with induced
crash-reboots, I began to suspect corruption in the stack, probably
caused during crashes. I was hoping I could catch the corruption
early on, and make the test case more repeatable, if I used "fsck -f"
during the testing. The other ideas I tried out to detect corruption
(like adding data checksums within the service) were funnily even
harder to get right.

Thanks,
Thanu
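
One way the testing loop described here might be structured is to stop at the first trial whose file system check fails, so the failing trial number is known. The sketch below is only illustrative: TRIAL_CMD and CHECK_CMD are hypothetical hooks that the tester would supply (one crash-reboot cycle, and one file system check, respectively), and nothing in this thread specifies how they are implemented.

```shell
# Run up to $1 crash trials and stop at the first failed check.
# TRIAL_CMD and CHECK_CMD are hypothetical caller-supplied hooks:
#   TRIAL_CMD <n>  induces one crash-reboot cycle for trial n
#   CHECK_CMD <n>  checks the file system, returning 0 if it is clean
# On return, FAILED_TRIAL holds the failing trial number, or 0 if
# every trial passed.
run_crash_trials() {
    max=$1
    i=1
    FAILED_TRIAL=0
    while [ "$i" -le "$max" ]; do
        # A trial that cannot even be run is a harness problem.
        "$TRIAL_CMD" "$i" || return 2
        if ! "$CHECK_CMD" "$i"; then
            echo "check failed after trial $i" >&2
            FAILED_TRIAL=$i
            return 1
        fi
        i=$(( i + 1 ))
    done
    return 0
}
```

Recording the trial number at the first failure keeps the state of the virtual disk close to the point where the problem first appeared.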
