* [PATCH] default max mount count to unused
From: Eric Sandeen @ 2010-01-20 22:37 UTC
To: ext4 development
Cc: Bill Nottingham

From: Bill Nottingham <notting@redhat.com>

Anaconda has been setting the max mount count on the root fs to -1
(unused) for ages.

I (Eric) tend to agree that using mount count as a proxy for potential
for corruption seems odd.  And waiting for fsck on a reboot just because
it's mount number 20 (or so) is painful.  Can we just turn it off by
default?

I wouldn't mind killing the periodic check as well, but consider this a
trial balloon. :)

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---

diff --git a/doc/libext2fs.texinfo b/doc/libext2fs.texinfo
index 19899bc..a0439bf 100644
--- a/doc/libext2fs.texinfo
+++ b/doc/libext2fs.texinfo
@@ -319,7 +319,8 @@
 skip the filesystem check if the number of times that the filesystem
 has been mounted is less than @code{s_max_mnt_count} and if the interval
 between the last time a filesystem check was performed and the current
 time is less than @code{s_checkinterval} (see below).  The default value
-of @code{s_max_mnt_count} is 20.
+of @code{s_max_mnt_count} is -1 (which means that this check is not
+done).
 
 @item s_checkinterval
 This field defines the minimal interval between filesystem checks.  See

diff --git a/lib/ext2fs/ext2_fs.h b/lib/ext2fs/ext2_fs.h
index 114b001..b98d6e8 100644
--- a/lib/ext2fs/ext2_fs.h
+++ b/lib/ext2fs/ext2_fs.h
@@ -484,7 +484,7 @@ struct ext2_inode_large {
 /*
  * Maximal mount counts between two filesystem checks
  */
-#define EXT2_DFL_MAX_MNT_COUNT	20	/* Allow 20 mounts */
+#define EXT2_DFL_MAX_MNT_COUNT	-1	/* Don't use mount check */
 #define EXT2_DFL_CHECKINTERVAL	0	/* Don't use interval check */
 
 /*
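Note that on an existing filesystem the same defaults can be applied
directly with tune2fs, no rebuilt e2fsprogs required; a minimal example
(the device name is illustrative):

    # disable both the mount-count check and the time-based check
    tune2fs -c -1 -i 0 /dev/sda1

    # verify the new settings
    dumpe2fs -h /dev/sda1 | grep -E 'Maximum mount count|Check interval'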
* Re: [PATCH] default max mount count to unused
From: Andreas Dilger @ 2010-01-22  0:22 UTC
To: Eric Sandeen
Cc: ext4 development, Bill Nottingham, Alasdair G Kergon,
    LVM Mailing List, Theodore Ts'o

[-- Attachment #1: Type: text/plain, Size: 2043 bytes --]

On 2010-01-20, at 15:37, Eric Sandeen wrote:
> From: Bill Nottingham <notting@redhat.com>
>
> Anaconda has been setting the max mount count on the root fs
> to -1 (unused) for ages.
>
> I (Eric) tend to agree that using mount count as a proxy for potential
> for corruption seems odd.  And waiting for fsck on a reboot just
> because it's mount number 20 (or so) is painful.  Can we just turn it
> off by default?
>
> I wouldn't mind killing the periodic check as well, but consider
> this a trial balloon. :)

Rather than disabling the mount-count check, it would make a lot of
sense to instead enable background checking via LVM snapshots, as
described in:
https://www.redhat.com/archives/ext3-users/2008-February/msg00004.html

I've attached an updated version of this script and its config file.
I've run a fair amount of testing on the script and it seems to do the
right thing, and I've started running it from my /etc/cron.weekly to
give it some further ongoing testing.

Since virtually all new distros use LVM devices, this makes a lot of
sense to configure by default, rather than leaving filesystems to
bit-rot in silence by turning off the periodic checking.  This also
avoids the "all devices check after 6 months" problem for servers that
reboot only rarely, because the filesystems get a periodic check and
reset the check timestamp/interval, so they will never need checking at
boot time unless there is an error.

Alasdair, any chance you can include this script into the LVM package?

Ted, this should really be added to e2fsprogs, and the e2croncheck
script removed.  The existing e2croncheck script is broken in a number
of ways (e.g. the force-check timestamp 19000101 is invalid, the email
reporting doesn't work because "$RPT-EMAIL" is never set) and is less
functional in other ways (it doesn't remove stale snapshots in case of
an interrupted script, it doesn't check multiple LVs, etc.).

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

[-- Attachment #2: lvcheck --]
[-- Type: application/octet-stream, Size: 10833 bytes --]

#!/bin/bash
#
# lvcheck, version 1.1
# Maintainer: Bryan Kadzban <bryan.kadzban@is-a-geek.net>
#
# Other credits:
#   Concept and original script by Theodore Ts'o <tyt...@mit.edu>
#   on_ac_power is mostly from Debian's powermgmt-base package
#   Lots of help (ideas, initial XFS/JFS support, etc.) from
#     Andreas Dilger <adilger@sun.com>
#   Better XFS support from Eric Sandeen <sandeen@redhat.com>
#
# Released under the GNU General Public License, either version 2 or
# (at your option) any later version.
#
# Overview:
#
# Run this from cron periodically (e.g. once per week).  If the machine
# is on AC power, it will run the checks; otherwise they will all be
# skipped.  (If the script can't tell whether the machine is on AC power,
# it will use a setting in the configuration file (/etc/lvcheck.conf) to
# decide whether to continue with the checks, or abort.)
#
# The script will then decide which logical volumes are active, and can
# therefore be checked via an LVM snapshot.  Each of these LVs will be
# queried to find its last-check day, and if that was more than $INTERVAL
# days ago (where INTERVAL is set in the configuration file as well), or
# if the last-check day can't be determined, then the script will take an
# LVM snapshot of that LV and run fsck on the snapshot.  The snapshot will
# be set to use 1/500 the space of the source LV.  After fsck finishes,
# the snapshot is destroyed.  (Snapshots are checked serially.)
#
# Any LV that passes fsck should have its last-check time updated (in
# the real superblock, not the snapshot's superblock); any LV whose
# fsck fails will send an email notification to a configurable user
# ($EMAIL).  This $EMAIL setting is optional, but its use is highly
# recommended, since if any LV fails, it will need to be checked
# manually, offline.  Relevant messages are also sent to syslog.

# Set default values for configuration params.  Changes to these values will
# be overwritten on an upgrade!  To change these values, edit /etc/lvcheck.conf.
EMAIL='root'
INTERVAL=30
AC_UNKNOWN="CONTINUE"
MINSNAP=256
MINFREE=0

[ "$1" == "-n" ] && NOCHECK="echo"

# pull in configuration -- overwrite the defaults above if the file exists
[ -r /etc/lvcheck.conf ] && . /etc/lvcheck.conf
CHECKPATH=$(dirname "$0" | sed -e 's:/s*bin::')
[ -r $CHECKPATH/etc/lvcheck.conf ] && . $CHECKPATH/etc/lvcheck.conf

# send $2 to syslog, with severity $1
# severities are emerg/alert/crit/err/warning/notice/info/debug
function log() {
    local sev="$1"
    local msg="$2"
    local arg=

    # log warning-or-higher messages to stderr as well
    case $sev in
    emerg|alert|crit|err|warning)
        arg=-s
        ;;
    info|debug)
        :
        ;;
    *)
        echo "error: unknown log severity '$sev'"
        ;;
    esac

    [ "$NOCHECK" ] || logger -t lvcheck $arg -p user."$sev" -- "$msg"
}

# determine whether the machine is on AC power
function on_ac_power() {
    local any_known=no

    # try sysfs power class first
    if [ -d /sys/class/power_supply ]; then
        for psu in /sys/class/power_supply/*; do
            if [ -r "$psu/type" ]; then
                type=$(cat "$psu/type")

                # ignore batteries
                [ "$type" = "Battery" ] && continue

                online=$(cat "$psu/online")

                [ "$online" = 1 ] && return 0
                [ "$online" = 0 ] && any_known=yes
            fi
        done

        [ "$any_known" = "yes" ] && return 1
    fi

    # else fall back to AC adapters in /proc
    if [ -d /proc/acpi/ac_adapter ]; then
        for ac in /proc/acpi/ac_adapter/*; do
            if [ -r "$ac/state" ]; then
                grep -q on-line "$ac/state" && return 0
                grep -q off-line "$ac/state" && any_known=yes
            elif [ -r "$ac/status" ]; then
                grep -q on-line "$ac/status" && return 0
                grep -q off-line "$ac/status" && any_known=yes
            fi
        done

        [ "$any_known" = "yes" ] && return 1
    fi

    if [ "$AC_UNKNOWN" == "CONTINUE" ]; then
        return 0        # assume on AC power
    elif [ "$AC_UNKNOWN" == "ABORT" ]; then
        return 1        # assume on battery
    else
        log err "Invalid value for AC_UNKNOWN in the config file"
        exit 1
    fi
}

# attempt to force a check of $1 on the next reboot
function try_force_check() {
    local dev="$1"
    local fstype="$2"

    case "$fstype" in
    ext2|ext3|ext4)
        tune2fs -C 16000 "$dev"
        ;;
    xfs)
        # XFS does not enforce check intervals; let email suffice.
        ;;
    *)
        log warning "$dev: don't know how to force a check on $fstype."
        ;;
    esac
}

# attempt to set the last-check time on $1 to now, and the mount count to 0.
function try_delay_checks() {
    local dev="$1"
    local fstype="$2"

    case "$fstype" in
    ext2|ext3|ext4)
        tune2fs -C 0 -T now "$dev"
        ;;
    xfs)
        # XFS does not enforce check intervals; nothing to delay
        ;;
    *)
        log info "$dev: don't know how to delay check on $fstype."
        ;;
    esac
}

# print the date that $1 was last checked, in a format that date(1) will
# accept, or "Unknown" if we don't know how to find that date.
function try_get_check_date() {
    local dev="$1"
    local fstype="$2"

    case "$fstype" in
    ext2|ext3|ext4)
        dumpe2fs -h "$dev" 2>/dev/null | grep 'Last checked:' |
            sed -e 's/Last checked:[[:space:]]*//'
        ;;
    *)
        # XFS does not save the last-checked date
        # TODO: add support for various other FSes
        echo "Unknown"
        ;;
    esac
}

# do any extra checks for filesystem type $2, on device $1
function should_still_check() {
    local dev="$1"
    local fstype="$2"

    case "$fstype" in
    ext2|ext3|ext4)
        if tune2fs -l "$dev" | grep -q "Journal device"; then
            log warning "skip $dev: using external journal."
            return 1
        fi
        ;;
    jbd*)
        log debug "skip $dev: using external journal."
        return 1
        ;;
    swap)
        log debug "skip $dev: is a swap device."
        return 1
        ;;
    *)
        log warning "skip $dev: can't check $fstype passively: assuming OK."
        ;;
    esac

    return 0
}

# check the FS on $1 passively, saving output to $3.
function perform_check() {
    local dev="$1"
    local fstype="$2"
    local errlog="$3"

    case "$fstype" in
    ext2|ext3|ext4)
        # first clear the orphaned-inode list, to avoid unnecessary FS
        # changes in the next step (which would cause an "error" exit
        # from e2fsck).  -C 0 is present for cases where the script is
        # run interactively (logsave -s strips out the progress bar).
        # ignore the return status of this e2fsck, as it doesn't matter.
        $NOCHECK nice logsave -as "$errlog" e2fsck -p -C 0 "$dev"

        # then do the real check; -y is here to give more info on any
        # errors that may be present on the FS, in the log file.  the
        # snapshot is writable, so it shouldn't break anything if
        # e2fsck changes it.
        $NOCHECK nice logsave -as "$errlog" e2fsck -fy -C 0 "$dev"
        return $?
        ;;
    reiserfs)
        echo Yes | $NOCHECK nice logsave -as "$errlog" fsck.reiserfs --check "$dev"
        # apparently can't fail?  let's hope not...
        return 0
        ;;
    xfs)
        $NOCHECK nice logsave -as "$errlog" xfs_repair -n "$dev"
        return $?
        ;;
    jfs)
        $NOCHECK nice logsave -as "$errlog" fsck.jfs -fn "$dev"
        return $?
        ;;
    esac
}

# do everything needed to check and reset dates and counters on /dev/$1/$2.
function check_fs() {
    local vg="$1"
    local lv="$2"
    local fstype="$3"
    local snapsize="$4"        # in units of MB

    local lvdev="/dev/$vg/$lv"
    local errlog="/var/log/lvcheck/$vg-$lv-$(date +%Y%m%d)"
    local snaplvbase="$lv-lvcheck-temp"
    local snaplv="$snaplvbase-$(date +'%Y%m%d')"

    # clean up any left-over snapshot LVs
    for lvtemp in /dev/$vg/$snaplvbase*; do
        if [ -e "$lvtemp" ]; then
            # Assume script won't run more than one at a time?
            log warning "stale $lvtemp: trying to remove old snapshot."
            if ! lvremove -f "$lvtemp"; then
                log err "error $lvtemp: could not delete."
                return 1
            fi
        fi
    done

    # see whether FS needs any extra checks that might disqualify it
    should_still_check "$lvdev" "$fstype" || return 0

    # get the last check time
    check_date=$(try_get_check_date "$lvdev" "$fstype")

    # if the date is unknown, run fsck every time the script runs.  sigh.
    if [ "$check_date" != "Unknown" ]; then
        # add $INTERVAL days, and throw away the time portion
        check_day=$(date --date="$check_date $INTERVAL days" +'%Y%m%d')

        # get today's date, and skip the check if it's not within the interval
        today=$(date +'%Y%m%d')
        if [ $check_day -gt $today ]; then
            log debug "skip $lvdev: just checked on $check_date."
            return 0
        fi
    fi

    # create new snapshot LV
    lvcreate -s -L "$snapsize"M -n "$snaplv" "$vg/$lv"

    if perform_check "/dev/$vg/$snaplv" "$fstype" "$errlog"; then
        log info "$lvdev: Background check succeeded."
        [ -z "$NOCHECK" ] && try_delay_checks "$lvdev" "$fstype"
    else
        log err "error $lvdev: Background check failed!  Run offline!"
        [ -z "$NOCHECK" ] && try_force_check "$lvdev" "$fstype"

        if [ "$EMAIL" -a -z "$NOCHECK" ]; then
            mail -s "Fsck $lvdev failed" $EMAIL < $errlog
        fi
    fi

    lvremove -f "$vg/$snaplv"
}

# check whether the machine is on AC power: if not, skip fsck
on_ac_power || exit 0

# parse up lvscan output
lvscan 2>&1 | grep ACTIVE | awk '{print $2;}' | while read DEV; do
    # remove the single quotes around the device name
    DEV=$(echo "$DEV" | tr -d \')

    if [ ! -b "$DEV" ]; then
        if [ ! -e "$DEV" ]; then
            log info "skip $DEV: no longer exists."
        else
            log info "skip $DEV: not a block device."
        fi
        continue
    fi

    # get the FS type: blkid prints TYPE="blah"
    FSTYPE=$(blkid -s TYPE "$DEV" | cut -d'=' -f2 | tr -d \"\ )
    if [ -z "$FSTYPE" ]; then
        log info "skip $DEV: can't determine device type."
        continue
    fi

    # get the volume group and logical volume names
    VG=$(echo $(lvs --noheadings -o vg_name "$DEV"))
    LV=$(echo $(lvs --noheadings -o lv_name "$DEV"))

    # get the free space and LV size (in megs), guess at the snapshot size,
    # and see how much the admin will let us use (keeping MINFREE available)
    SPACE=$(lvs --noheadings --units M -o vg_free "$DEV" | cut -d. -f1)
    SIZE=$(lvs --noheadings --units M -o lv_size "$DEV" | cut -d. -f1)
    SNAPSIZE=$(($SIZE / 500))
    AVAIL=$(($SPACE - $MINFREE))

    # if we don't even have MINSNAP space available, skip the LV
    if [ "$MINSNAP" -gt "$AVAIL" -o "$AVAIL" -le 0 ]; then
        log warning "skip $DEV: need ${MINSNAP}M free space in volume group."
        continue
    fi

    # make snapshot large enough to handle e.g. journal and other updates
    [ "$SNAPSIZE" -lt "$MINSNAP" ] && SNAPSIZE="$MINSNAP"

    # limit snapshot to available space (VG space minus min-free)
    [ "$SNAPSIZE" -gt "$AVAIL" ] && SNAPSIZE="$AVAIL"

    # don't need to check SNAPSIZE again: MINSNAP <= AVAIL, MINSNAP <= SNAPSIZE,
    # and SNAPSIZE <= AVAIL, combined, means SNAPSIZE must be between MINSNAP
    # and AVAIL, which is what we need -- assuming AVAIL > 0

    check_fs "$VG" "$LV" "$FSTYPE" "$SNAPSIZE"
done

[-- Attachment #3: lvcheck.conf --]
[-- Type: application/octet-stream, Size: 1242 bytes --]

#!/bin/sh
# lvcheck configuration file
#
# This file follows the pattern of sshd_config:
# default values are shown here, commented-out.

# EMAIL: Address to send failure notifications to.  If empty, failure
# notifications will not be sent.
#EMAIL='root'

# INTERVAL: Days to wait between checks.  All LVs use the same INTERVAL,
# but the "days since last check" value can be different per LV, since
# that value is stored in the filesystem superblock.
#INTERVAL=30

# AC_UNKNOWN: Whether to run the fsck.* checks if the script can't determine
# whether the machine is on AC power.  Laptop users will want to set this to
# ABORT, while server and desktop users will probably want to set this to
# CONTINUE.  Those are the only two valid values.
#AC_UNKNOWN="CONTINUE"

# MINSNAP: Minimum snapshot size to take, in megabytes.  The default size is
# 1/500 the size of the logical volume, but if that is less than MINSNAP, the
# script will use MINSNAP instead.  This should be large enough to handle e.g.
# journal updates, and other disk changes that require (semi-)constant space.
#MINSNAP=256

# MINFREE: Minimum amount of space (in megabytes) to keep free in each volume
# group when creating snapshots.
#MINFREE=0
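For those wanting to try the attached script as described, a typical
deployment sketch (paths are illustrative; only /etc/lvcheck.conf and
/var/log/lvcheck are assumed by the script itself):

    # install the script where cron will find it, plus its config
    install -m 755 lvcheck /etc/cron.weekly/lvcheck
    install -m 644 lvcheck.conf /etc/lvcheck.conf
    mkdir -p /var/log/lvcheck

    # dry run: -n makes the script echo commands instead of running them
    /etc/cron.weekly/lvcheck -n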
* Re: [PATCH] default max mount count to unused
From: tytso @ 2010-01-22  1:37 UTC
To: Andreas Dilger
Cc: Eric Sandeen, ext4 development, Bill Nottingham, Alasdair G Kergon,
    LVM Mailing List

On Thu, Jan 21, 2010 at 05:22:55PM -0700, Andreas Dilger wrote:
>
> Alasdair, any chance you can include this script into the LVM package?
>
> Ted, this should really be added to e2fsprogs, and the e2croncheck
> script removed.  The existing e2croncheck script is broken in a number
> of ways (e.g. the force-check timestamp 19000101 is invalid, the email
> reporting doesn't work because "$RPT-EMAIL" is never set) and is less
> functional in other ways (it doesn't remove stale snapshots in case of
> an interrupted script, it doesn't check multiple LVs, etc.).

Sure, I'd be happy to include this in e2fsprogs.  I'm not sure which
distro package should be installing it, but we can leave that up to the
distro maintainers.

	- Ted
* Re: [PATCH] default max mount count to unused
From: Eric Sandeen @ 2010-01-22 17:42 UTC
To: tytso
Cc: Andreas Dilger, ext4 development, Bill Nottingham, Alasdair G Kergon,
    LVM Mailing List

tytso@mit.edu wrote:
> On Thu, Jan 21, 2010 at 05:22:55PM -0700, Andreas Dilger wrote:
>> Alasdair, any chance you can include this script into the LVM package?
>> ...
> Sure, I'd be happy to include this in e2fsprogs.  I'm not sure which
> distro package should be installing it, but we can leave that up to the
> distro maintainers.
>
> - Ted

Last time around, we all seemed to think it should be in the lvm tools
(though I don't remember exactly why - probably because it's really not
ext*-specific at all).

It got forwarded to the LVM list, agk asked if anyone wanted to clean it
up & take ownership of it, and that was the end. :(

-Eric
* Re: [PATCH] default max mount count to unused
From: Andreas Dilger @ 2010-01-22 18:35 UTC
To: Eric Sandeen
Cc: tytso, ext4 development, Bill Nottingham, Alasdair G Kergon,
    LVM Mailing List

On 2010-01-22, at 10:42, Eric Sandeen wrote:
> Last time around, we all seemed to think it should be in the lvm tools
> (though I don't remember exactly why - probably because it's really
> not ext*-specific at all).

Sure, but since there is only a semi-functional version in e2fsprogs we
may as well replace it with a working one.

> It got forwarded to the LVM list, agk asked if anyone wanted to clean
> it up & take ownership of it, and that was the end. :(

I guess I wasn't on that email.  I've gone ahead and fixed up the
current script to be functional (including testing of failed-fsck
email, force-fsck on next boot, stale snapshot cleanup, etc.) so you
may as well chalk me up as the maintainer for now.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
* Re: [PATCH] default max mount count to unused
From: tytso @ 2010-01-22  1:29 UTC
To: Eric Sandeen
Cc: ext4 development, Bill Nottingham

On Wed, Jan 20, 2010 at 04:37:25PM -0600, Eric Sandeen wrote:
> From: Bill Nottingham <notting@redhat.com>
>
> Anaconda has been setting the max mount count on the root fs
> to -1 (unused) for ages.
> ...
> I wouldn't mind killing the periodic check as well, but consider
> this a trial balloon. :)

I think it would be better to make this something tunable via
mke2fs.conf.  And as a profile option, maybe we would want this to be
something where we periodically force a full fsck check and then send
TRIM commands down to the SSD.  Given the size and speed of SSDs, doing
periodic TRIMs every N mounts might actually be a good thing.  (It's
dangerous to do a TRIM without doing a full fsck first, since if the
block allocation bitmap isn't quite right, the user could lose data.)

	- Ted
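Later e2fsprogs releases grew an option along these lines; a hedged
sketch of the check-then-TRIM sequence Ted describes, assuming an
e2fsck new enough to support -E discard (device and mount point are
illustrative):

    # full check first, then discard blocks that the freshly verified
    # bitmaps say are free; only safe because the check ran first
    umount /mnt/data
    e2fsck -f -E discard /dev/vg0/data
    mount /dev/vg0/data /mnt/data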
* Re: [PATCH] default max mount count to unused
From: Eric Sandeen @ 2010-01-22  3:37 UTC
To: tytso
Cc: ext4 development, Bill Nottingham

tytso@mit.edu wrote:
> I think it would be better to make this something tunable via
> mke2fs.conf.

And defaulting it to unused, right ;)

> And as a profile option, maybe we would want this to be something
> where we periodically force a full fsck check and then send TRIM
> commands down to the SSD.  Given the size and speed of SSDs, doing
> periodic TRIMs every N mounts might actually be a good thing.  (It's
> dangerous to do a TRIM without doing a full fsck first, since if the
> block allocation bitmap isn't quite right, the user could lose data.)

That sounds fine, as do mke2fs.conf hooks, as does a nice shipped
script to do background checking of snapshots.

But I still don't know why "you mounted your fs 20 times" is a good
proxy for "you had better check for corruption now."  Have we so little
faith? :)

-Eric
* Re: [PATCH] default max mount count to unused
From: Andreas Dilger @ 2010-01-22  8:09 UTC
To: Eric Sandeen
Cc: tytso, ext4 development, Bill Nottingham

On 2010-01-21, at 20:37, Eric Sandeen wrote:
> That sounds fine, as do mke2fs.conf hooks, as does a nice shipped
> script to do background checking of snapshots.
>
> But I still don't know why "you mounted your fs 20 times" is a good
> proxy for "you had better check for corruption now."  Have we so
> little faith? :)

I've thought for quite a while that 20 mounts is too often, but I'm
reluctant to turn it off completely.  I wouldn't object to increasing
it to 60 or 80.

At one time there was a patch that checked the state of the filesystem
at mount time and incremented the mount count only 1/5 of the time
(randomly) if it was unmounted cleanly (not dirty, or not in recovery),
but every time if it crashed.  The reasoning was that systems which
crashed are more likely to have memory corruption or software bugs, and
ones that shut down cleanly are less likely to have such problems.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
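The policy Andreas describes is easy to sketch.  This is an
illustration only, not the actual patch; fs_unmounted_cleanly and
bump_mount_count are hypothetical helpers standing in for the
superblock state test and counter update:

    if fs_unmounted_cleanly "$dev"; then
        # a clean mount counts toward the check threshold only ~1 time in 5
        [ $((RANDOM % 5)) -eq 0 ] && bump_mount_count "$dev"
    else
        # a mount that follows a crash/journal recovery always counts
        bump_mount_count "$dev"
    fi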
* Re: [PATCH] default max mount count to unused
From: Ric Wheeler @ 2010-01-22 17:02 UTC
To: Andreas Dilger
Cc: Eric Sandeen, tytso, ext4 development, Bill Nottingham

On 01/22/2010 03:09 AM, Andreas Dilger wrote:
> I've thought for quite a while that 20 mounts is too often, but I'm
> reluctant to turn it off completely.  I wouldn't object to increasing
> it to 60 or 80.
>
> At one time there was a patch that checked the state of the filesystem
> at mount time and incremented the mount count only 1/5 of the time
> (randomly) if it was unmounted cleanly (not dirty, or not in recovery),
> but every time if it crashed.  The reasoning was that systems which
> crashed are more likely to have memory corruption or software bugs, and
> ones that shut down cleanly are less likely to have such problems.

I do like the snapshot idea, but also think that we need something that
will not introduce random (potentially multi-hour or multi-day) fsck
runs after an otherwise clean reboot.

If we hit this with a combination of:

At reboot time:
(1) try to mount the file system
(2) on mount failure, fsck the failed file system

While up and running, do a periodic check with the snapshot trick.

I think that would balance the fear that we have of creeping corruption
(or at least severe corruption) against the need to be speedy when
rebooting...

ric
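A hedged shell sketch of that boot-time policy (device and mount point
are illustrative; a real init script would iterate over fstab entries):

    # mount first; fall back to a repairing fsck only on failure
    if ! mount /dev/vg0/root /mnt/root; then
        fsck -y /dev/vg0/root && mount /dev/vg0/root /mnt/root
    fi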
* Re: [PATCH] default max mount count to unused
From: Andreas Dilger @ 2010-01-22 18:40 UTC
To: Ric Wheeler
Cc: Eric Sandeen, tytso, ext4 development, Bill Nottingham

On 2010-01-22, at 10:02, Ric Wheeler wrote:
> I do like the snapshot idea, but also think that we need something
> that will not introduce random (potentially multi-hour or multi-day)
> fsck runs after an otherwise clean reboot.
>
> At reboot time:
> (1) try to mount the file system
> (2) on mount failure, fsck the failed file system

Well, this is essentially what already happens with e2fsck today,
though it correctly checks the filesystem for errors _first_, and
_then_ mounts the filesystem.  Otherwise it isn't possible to fix the
filesystem after mount, and mounting a filesystem with errors is a
recipe for further corruption and/or a crash/reboot cycle.

> While up and running, do a periodic check with the snapshot trick.

Yes, this is intended to reset the periodic mount/time counter to avoid
the non-error boot-time check.  If that is not running correctly then
the periodic check would still be done as a fail-safe measure.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
* Re: [PATCH] default max mount count to unused
From: Ric Wheeler @ 2010-01-22 18:57 UTC
To: Andreas Dilger
Cc: Eric Sandeen, tytso, ext4 development, Bill Nottingham

On 01/22/2010 01:40 PM, Andreas Dilger wrote:
> Well, this is essentially what already happens with e2fsck today,
> though it correctly checks the filesystem for errors _first_, and
> _then_ mounts the filesystem.  Otherwise it isn't possible to fix the
> filesystem after mount, and mounting a filesystem with errors is a
> recipe for further corruption and/or a crash/reboot cycle.

I think that we have to move toward an assumption that our journalling
code actually works - the goal should be that we can *always* mount
after a crash or clean reboot.  That should be the basic test case -
pound on a file system, drop power to the storage (and/or server), and
then on reboot try to remount.  Verification would be in the QA test
case: unmount and fsck to make sure our journal was robust.

Note that this is a technique I have used in the past (with reiserfs)
at large scale, in actual deployments of hundreds of thousands of file
systems.  It does work pretty well in practice.

The key here is that any fsck can be a huge delay, pretty much
unacceptable in production shops, where they might have multiple file
systems per box.
* Re: [PATCH] default max mount count to unused
From: Andreas Dilger @ 2010-01-22 19:06 UTC
To: Ric Wheeler
Cc: Eric Sandeen, tytso, ext4 development, Bill Nottingham

On 2010-01-22, at 11:57, Ric Wheeler wrote:
> I think that we have to move toward an assumption that our journalling
> code actually works - the goal should be that we can *always* mount
> after a crash or clean reboot.  That should be the basic test case -
> pound on a file system, drop power to the storage (and/or server), and
> then on reboot try to remount.  Verification would be in the QA test
> case: unmount and fsck to make sure our journal was robust.

I think you are missing an important fact here.  While e2fsck _always_
runs on a filesystem at boot time (or at least this is the recommended
configuration), this initial e2fsck run is only doing a very minimal
amount of work (i.e. it is NOT a full "e2fsck -f" run).  It checks that
the superblock is sane, it recovers the journal, and it looks for error
flags written to the journal and/or superblock.  If all of those tests
pass (i.e. less than a second of work) then the e2fsck run passes
(excluding periodic checking, which IMHO is the only issue under
discussion here).

> Note that this is a technique I have used in the past (with reiserfs)
> at large scale, in actual deployments of hundreds of thousands of
> file systems.  It does work pretty well in practice.
>
> The key here is that any fsck can be a huge delay, pretty much
> unacceptable in production shops, where they might have multiple file
> systems per box.

No, there is no delay if the filesystem does not have any errors.  I
consider the lack of ANY minimal boot-time sanity checking a serious
problem with reiserfs, and I advised Hans many times to have minimal
sanity checks at boot.

The problem is that if the kernel (or a background snapshot e2fsck)
detects an error, then the only way it can force a full check to
correct it is to do this on the next boot, by storing some information
in the superblock.  If the filesystem is mounted at boot time without
even a minimal check for such error flags in the superblock, then the
error may never be corrected, and in fact may cause cascading
corruption elsewhere in the filesystem (e.g. corrupt bitmaps, bad
indirect block pointers, etc.).

If the filesystem is mounted with "errors=panic" (often done with HA
configs to allow failover in case of node/cable/HBA errors), then any
corruption on disk will just result in the node getting stuck in a
reboot cycle with no automated e2fsck being run.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
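The error flag Andreas refers to is visible from userspace; a quick way
to see what the fast boot-time pass keys off (device name is
illustrative):

    # "clean" lets the boot-time pass skip the full check; "clean with
    # errors" or "not clean" forces a real e2fsck -f on the next boot
    dumpe2fs -h /dev/vg0/root | grep 'Filesystem state'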
* Re: [PATCH] default max mount count to unused
From: Eric Sandeen @ 2010-01-22 19:59 UTC
To: Andreas Dilger
Cc: Ric Wheeler, tytso, ext4 development, Bill Nottingham

Andreas Dilger wrote:
> I think you are missing an important fact here.  While e2fsck _always_
> runs on a filesystem at boot time (or at least this is the recommended
> configuration), this initial e2fsck run is only doing a very minimal
> amount of work (i.e. it is NOT a full "e2fsck -f" run).
> ...
> No, there is no delay if the filesystem does not have any errors.

Well, there is a delay if it's the magical Nth time or the magical Nth
hour, right?  Which is what we're trying to avoid.

> I consider the lack of ANY minimal boot-time sanity checking a serious
> problem with reiserfs, and I advised Hans many times to have minimal
> sanity checks at boot.

I have no problem with checking an fs marked with errors...

> The problem is that if the kernel (or a background snapshot e2fsck)
> detects an error, then the only way it can force a full check to
> correct it is to do this on the next boot, by storing some information
> in the superblock.  If the filesystem is mounted at boot time without
> even a minimal check for such error flags in the superblock, then the
> error may never be corrected, and in fact may cause cascading
> corruption elsewhere in the filesystem (e.g. corrupt bitmaps, bad
> indirect block pointers, etc.).

Mmmhm, so if we mark it with the error and the next boot fscks... I can
live with that.

I just want to avoid the "we scheduled a brief window to upgrade the
kernel, and the next time we booted we got a 3-hour fsck that we didn't
expect, and we were afraid to stop it, but oh well it was clean anyway"
scenario.

I guess the higher-level discussion to have is:

a) what are the errors and the root causes that the forced periodic
   checks are intended to catch

and

b) what are the pros and cons of periodic checking for those errors,
   vs. catching them at runtime and scheduling a fsck as a result.

Or maybe it's "how much of a nanny-state do we want to be?" :)

-Eric
* Re: [PATCH] default max mount count to unused
From: Valerie Aurora @ 2010-01-22 20:58 UTC
To: Eric Sandeen
Cc: Andreas Dilger, Ric Wheeler, tytso, ext4 development, Bill Nottingham

On Fri, Jan 22, 2010 at 01:59:26PM -0600, Eric Sandeen wrote:
> > I consider the lack of ANY minimal boot-time sanity checking a
> > serious problem with reiserfs ...
>
> I have no problem with checking an fs marked with errors...

Yes, I think we are all in violent agreement on this.

> Mmmhm, so if we mark it with the error and the next boot fscks... I
> can live with that.
>
> I just want to avoid the "we scheduled a brief window to upgrade the
> kernel, and the next time we booted we got a 3-hour fsck that we
> didn't expect, and we were afraid to stop it, but oh well it was clean
> anyway" scenario.
>
> I guess the higher-level discussion to have is:
>
> a) what are the errors and the root causes that the forced periodic
>    checks are intended to catch
>
> and
>
> b) what are the pros and cons of periodic checking for those errors,
>    vs. catching them at runtime and scheduling a fsck as a result.
>
> Or maybe it's "how much of a nanny-state do we want to be?" :)

Do any other file systems have this "fsck on N reboots / N days up"
behavior?  Is ext3/ext4 the odd one out?

-VAL
* Re: [PATCH] default max mount count to unused
From: tytso @ 2010-01-22 23:18 UTC
To: Ric Wheeler
Cc: Andreas Dilger, Eric Sandeen, ext4 development, Bill Nottingham

On Fri, Jan 22, 2010 at 01:57:16PM -0500, Ric Wheeler wrote:
>
> I think that we have to move toward an assumption that our journalling
> code actually works - the goal should be that we can *always* mount
> after a crash or clean reboot.  That should be the basic test case -
> pound on a file system, drop power to the storage (and/or server), and
> then on reboot try to remount.  Verification would be in the QA test
> case: unmount and fsck to make sure our journal was robust.

The original reason for the periodic fsck was not a fear that the
journalling system might not work --- it was a concern that the
hardware might not be reliable.  (Ted's law of PC-class hardware: it's
crap. :-)  That was the reason for the periodic fsck in the BSD days,
and it's the same now.

That being said, I agree that 20-40 reboots (it's actually randomized
by mke2fs these days; the setting in libext2fs isn't the whole story)
is a very rough metric.  I'd much rather do the checking periodically
via snapshots in cron, at which point the reboot counter becomes moot
(the snapshot check zeros the mount count and sets the last-checked
time correctly).

	- Ted
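The reset Ted mentions is the same tune2fs call the attached lvcheck
script issues from try_delay_checks after a clean snapshot check
(device name is illustrative):

    # zero the mount count and stamp the last-checked time as now
    tune2fs -C 0 -T now /dev/vg0/home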
End of thread (newest: 2010-01-22 23:18 UTC)

Thread overview: 15+ messages
2010-01-20 22:37 [PATCH] default max mount count to unused  Eric Sandeen
2010-01-22  0:22 ` Andreas Dilger
2010-01-22  1:37   ` tytso
2010-01-22 17:42     ` Eric Sandeen
2010-01-22 18:35       ` Andreas Dilger
2010-01-22  1:29 ` tytso
2010-01-22  3:37   ` Eric Sandeen
2010-01-22  8:09     ` Andreas Dilger
2010-01-22 17:02       ` Ric Wheeler
2010-01-22 18:40         ` Andreas Dilger
2010-01-22 18:57           ` Ric Wheeler
2010-01-22 19:06             ` Andreas Dilger
2010-01-22 19:59               ` Eric Sandeen
2010-01-22 20:58                 ` Valerie Aurora
2010-01-22 23:18           ` tytso