From: Bryan Kadzban <bryan@kadzban.is-a-geek.net>
To: linux-lvm@redhat.com
Subject: [linux-lvm] Offline fsck (checking snapshots)
Date: Thu, 24 Apr 2008 18:39:41 -0400
Message-ID: <48110C2D.5020800@kadzban.is-a-geek.net>
[-- Attachment #1: Type: text/plain, Size: 1944 bytes --]
-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160
There was some discussion on the ext3-users list a few months ago about
how long e2fsck takes to run, and how it gets forced at boot because the
filesystem tracks two counters that can trigger it (days since the last
full fsck, and mounts since the last full fsck).
(The thread starts at [1], the script development started at [2], and
the most recent version is at [3]. The one extra thing I've added is
skipping ext2/3 FSes that have an external journal.)
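For reference, both counters show up in the superblock summary that
"dumpe2fs -h" prints. The following is only a sketch and is not part of the
attached script: fsck_counters is a hypothetical helper that reads that
summary on stdin and prints the mount count, the maximum mount count, and
the last-checked date.

```shell
#!/bin/sh
# Hypothetical helper (not part of lvcheck): summarize the ext2/3 counters
# that can trigger a forced fsck. Reads `dumpe2fs -h` output on stdin.
fsck_counters() {
    awk '
        # strip the "Field name:   " prefix and keep the value
        /^Mount count:/         { sub(/^[^:]*:[ \t]*/, ""); mounts = $0 }
        /^Maximum mount count:/ { sub(/^[^:]*:[ \t]*/, ""); max = $0 }
        /^Last checked:/        { sub(/^[^:]*:[ \t]*/, ""); last = $0 }
        END { printf "mounts=%s max=%s last=%s\n", mounts, max, last }
    '
}
```

Usage would look like "dumpe2fs -h /dev/vg0/home 2>/dev/null | fsck_counters",
where /dev/vg0/home is a placeholder device name.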
The suggestion was made that if the user is using LVM, a temporary
snapshot could be taken, and then fsck could be run on that. If it
succeeds, then it's possible to set the last-fsck-time and mount-count
while the real FS is mounted.
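As a rough illustration of that cycle for a single ext2/3 LV (a simplified
sketch, not the attached script): snapshot_fsck and run are hypothetical
names, the snapshot name and size are placeholders, and by default the
commands are only printed rather than executed.

```shell
#!/bin/sh
# Sketch of the snapshot-then-fsck cycle for one ext2/3 LV.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0
# to execute them as root.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = 1 ] ; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# snapshot_fsck VG LV SNAPSHOT_SIZE_IN_MB
snapshot_fsck() {
    vg="$1"
    lv="$2"
    size_mb="$3"
    snap="${lv}-fsck-snap"

    # 1. take a point-in-time copy of the (possibly mounted) filesystem
    run lvcreate -s -L "${size_mb}M" -n "$snap" "${vg}/${lv}" || return 1
    # 2. clear the orphaned-inode list on the copy; its status is ignored
    run e2fsck -p "/dev/${vg}/${snap}" || true
    # 3. run the full check on the copy, never on the live device
    if run e2fsck -fy "/dev/${vg}/${snap}" ; then
        # 4. clean: reset mount count and last-check time on the real FS
        run tune2fs -C 0 -T now "/dev/${vg}/${lv}"
        rc=0
    else
        rc=1
    fi
    # 5. drop the snapshot either way
    run lvremove -f "${vg}/${snap}"
    return "$rc"
}
```

Calling "snapshot_fsck vg0 home 256" in dry-run mode prints the five
commands in order, which makes it easy to review them before running
anything as root.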
I now have a script that handles this, and I think it's reasonable.
With some help from others, it now works with XFS as well as ext2/3, and
it's supposed to also work with JFS. Since it requires LVM, I think it
might make sense to put something like it into the LVM userspace tools.
(There is one issue: it also requires blkid and logsave from e2fsprogs.
I could work around the requirement for logsave (using tee -a), but
blkid would be harder.)
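A minimal sketch of what that tee -a workaround could look like (an
assumption on my part, not what the attached script does): run_logged is a
hypothetical stand-in for "logsave -a", and the only subtle part is getting
the command's exit status back out of the pipeline.

```shell
#!/bin/sh
# Hypothetical stand-in for `logsave -a LOGFILE CMD...` built on tee -a.
# Prints CMD's output, appends it to LOGFILE, and returns CMD's exit status.
run_logged() {
    logfile="$1"
    shift
    statusfile=$(mktemp) || return 1
    # A pipeline reports the status of its last command (tee here), so the
    # status of CMD is passed back through a temporary file instead.
    { "$@" 2>&1 ; echo "$?" > "$statusfile" ; } | tee -a "$logfile"
    rc=$(cat "$statusfile")
    rm -f "$statusfile"
    return "$rc"
}
```

In the script this would replace calls such as
"logsave -as "${tmpfile}" e2fsck -fy -C 0 "$dev"" with
"run_logged "${tmpfile}" e2fsck -fy "$dev"". Unlike logsave -s, it does not
strip e2fsck's progress bar from the log, so -C 0 would have to go.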
The idea behind the script is, you run it at night from cron; it will
check each LV on the system and mail a user if there are any problems.
It also logs to syslog.
I've attached the script and its configuration file to this message.
Comments?
[1] https://www.redhat.com/archives/ext3-users/2008-January/msg00027.html
[2] https://www.redhat.com/archives/ext3-users/2008-January/msg00032.html
[3] https://www.redhat.com/archives/ext3-users/2008-February/msg00004.html
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFIEQwpS5vET1Wea5wRA43cAKDWGJVFgV6fmJKeQUgcPH6Ebd1aygCfb4a9
TbiWVGUYFnPeQSWiJVl0x2k=
=Juq0
-----END PGP SIGNATURE-----
[-- Attachment #2: lvcheck --]
[-- Type: text/plain, Size: 10196 bytes --]
#!/bin/bash
#
# lvcheck, version 1.0
# Maintainer: Bryan Kadzban <bryan@kadzban.is-a-geek.net>
# Other credits:
# Concept and original script by Theodore Tso <tytso@mit.edu>
# on_ac_power is mostly from Debian's powermgmt-base package
# Lots of help (ideas, initial XFS/JFS support, etc.) from
# Andreas Dilger <adilger@sun.com>
# Better XFS support from Eric Sandeen <sandeen@redhat.com>
# Released under the GNU General Public License, either version 2 or
# (at your option) any later version.
# Overview:
#
# Run this from cron periodically (e.g. once per week). If the
# machine is on AC power, it will run the checks; otherwise they will
# all be skipped. (If the script can't tell whether the machine is
# on AC power, it will use a setting in the configuration file
# (/etc/lvcheck.conf) to decide whether to continue with the checks,
# or abort.)
#
# The script will then decide which logical volumes are active, and
# can therefore be checked via an LVM snapshot. Each of these LVs
# will be queried to find its last-check day, and if that was more
# than $INTERVAL days ago (where INTERVAL is set in the configuration
# file as well), or if the last-check day can't be determined, then
# the script will take an LVM snapshot of that LV and run fsck on the
# snapshot. The snapshot will be set to use 1/500 the space of the
# source LV. After fsck finishes, the snapshot is destroyed.
# (Snapshots are checked serially.)
#
# Any LV that passes fsck should have its last-check time updated (in
# the real superblock, not the snapshot's superblock); any LV whose
# fsck fails will send an email notification to a configurable user
# ($EMAIL). This $EMAIL setting is optional, but its use is highly
# recommended, since if any LV fails, it will need to be checked
# manually, offline. Relevant messages are also sent to syslog.
# Set default values for configuration params. Changes to these values
# will be overwritten on an upgrade! To change these values, use
# /etc/lvcheck.conf.
EMAIL='root'
INTERVAL=30
AC_UNKNOWN="CONTINUE"
MINSNAP=256
MINFREE=0
# send $2 to syslog, with severity $1
# severities are emerg/alert/crit/err/warning/notice/info/debug
function log() {
local sev="$1"
local msg="$2"
local arg=
# log warning-or-higher messages to stderr as well
case "$sev" in
emerg|alert|crit|err|warning) arg=-s ;;
esac
logger -t lvcheck $arg -p user."$sev" -- "$msg"
}
# determine whether the machine is on AC power
function on_ac_power() {
local any_known=no
# try sysfs power class first
if [ -d /sys/class/power_supply ] ; then
for psu in /sys/class/power_supply/* ; do
if [ -r "${psu}/type" ] ; then
type="`cat "${psu}/type"`"
# ignore batteries
[ "${type}" = "Battery" ] && continue
online="`cat "${psu}/online"`"
[ "${online}" = 1 ] && return 0
[ "${online}" = 0 ] && any_known=yes
fi
done
[ "${any_known}" = "yes" ] && return 1
fi
# else fall back to AC adapters in /proc
if [ -d /proc/acpi/ac_adapter ] ; then
for ac in /proc/acpi/ac_adapter/* ; do
if [ -r "${ac}/state" ] ; then
grep -q on-line "${ac}/state" && return 0
grep -q off-line "${ac}/state" && any_known=yes
elif [ -r "${ac}/status" ] ; then
grep -q on-line "${ac}/status" && return 0
grep -q off-line "${ac}/status" && any_known=yes
fi
done
[ "${any_known}" = "yes" ] && return 1
fi
if [ "$AC_UNKNOWN" == "CONTINUE" ] ; then
return 0 # assume on AC power
elif [ "$AC_UNKNOWN" == "ABORT" ] ; then
return 1 # assume on battery
else
log "err" "Invalid value for AC_UNKNOWN in the config file"
exit 1
fi
}
# attempt to force a check of $1 on the next reboot
function try_force_check() {
local dev="$1"
local fstype="$2"
case "$fstype" in
ext2|ext3)
tune2fs -C 16000 "$dev"
;;
xfs)
# XFS does not enforce check intervals; let email suffice.
;;
*)
log "warning" "Don't know how to force a check on $fstype..."
;;
esac
}
# attempt to set the last-check time on $1 to now, and the mount count to 0.
function try_delay_checks() {
local dev="$1"
local fstype="$2"
case "$fstype" in
ext2|ext3)
tune2fs -C 0 -T now "$dev"
;;
xfs)
# XFS does not enforce check intervals; nothing to delay
;;
*)
log "warning" "Don't know how to delay checks on $fstype..."
;;
esac
}
# print the date that $1 was last checked, in a format that date(1) will
# accept, or "Unknown" if we don't know how to find that date.
function try_get_check_date() {
local dev="$1"
local fstype="$2"
case "$fstype" in
ext2|ext3)
dumpe2fs -h "$dev" 2>/dev/null | grep 'Last checked:' | \
sed -e 's/Last checked:[[:space:]]*//'
;;
*)
# XFS does not save the last-checked date
# TODO: add support for various other FSes
echo "Unknown"
;;
esac
}
# do any extra checks for filesystem type $2, on device $1
function should_still_check() {
local dev="$1"
local fstype="$2"
case "$fstype" in
ext2|ext3)
if tune2fs -l "$dev" | grep -q "Journal device" ; then
log "warning" "Cowardly refusing to check $dev, which has an external journal."
return 1
fi
esac
return 0
}
# check the FS on $1 passively, saving output to $3.
function perform_check() {
local dev="$1"
local fstype="$2"
local tmpfile="$3"
case "$fstype" in
ext2|ext3)
# first clear the orphaned-inode list, to avoid unnecessary FS changes
# in the next step (which would cause an "error" exit from e2fsck).
# -C 0 is present for cases where the script is run interactively
# (logsave -s strips out the progress bar). ignore the return status
# of this e2fsck, as it doesn't matter.
nice logsave -as "${tmpfile}" e2fsck -p -C 0 "$dev"
# then do the real check; -y is here to give more info on any errors
# that may be present on the FS, in the log file. the snapshot is
# writable, so it shouldn't break anything if e2fsck changes it.
nice logsave -as "${tmpfile}" e2fsck -fy -C 0 "$dev"
return $?
;;
reiserfs)
echo Yes | nice logsave -as "${tmpfile}" fsck.reiserfs --check "$dev"
# apparently can't fail? let's hope not...
return 0
;;
xfs)
nice logsave -as "${tmpfile}" xfs_repair -n "$dev"
return $?
;;
jfs)
nice logsave -as "${tmpfile}" fsck.jfs -fn "$dev"
return $?
;;
*)
log "warning" "Don't know how to check $fstype filesystems passively: assuming OK."
;;
esac
}
# do everything needed to check and reset dates and counters on /dev/$1/$2.
function check_fs() {
local vg="$1"
local lv="$2"
local fstype="$3"
local snapsize="$4"
local tmpfile=`mktemp -t lvcheck.log.XXXXXXXXXX`
local errlog="/var/log/lvcheck-${vg}@${lv}"
local snaplvbase="${lv}-lvcheck-temp"
local snaplv="${snaplvbase}-`date +'%Y%m%d'`"
# clean up any left-over snapshot LVs
for lvtemp in /dev/${vg}/${snaplvbase}* ; do
if [ -e "$lvtemp" ] ; then
# Assume the script won't run more than one instance at a time?
log "warning" "Found stale snapshot $lvtemp: attempting to remove."
if ! lvremove -f "${lvtemp#/dev/}" ; then
log "err" "Could not delete stale snapshot $lvtemp"
return 1
fi
fi
done
# and create this one
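# snapsize is in megabytes, so use -L (size), not -l (extents)
if ! lvcreate -s -L "${snapsize}M" -n "${snaplv}" "${vg}/${lv}" ; then
log "err" "Could not create snapshot of /dev/${vg}/${lv}: skipping check."
rm -f "$tmpfile"
return 1
fi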
if perform_check "/dev/${vg}/${snaplv}" "${fstype}" "${tmpfile}" ; then
log "info" "Background scrubbing of /dev/${vg}/${lv} succeeded."
try_delay_checks "/dev/${vg}/${lv}" "$fstype"
else
log "err" "Background scrubbing of /dev/${vg}/${lv} failed: run fsck offline soon!"
try_force_check "/dev/${vg}/${lv}" "$fstype"
if test -n "$EMAIL"; then
mail -s "Fsck of /dev/${vg}/${lv} failed!" $EMAIL < "$tmpfile"
fi
# save the log file in /var/log in case mail is disabled
(
echo ""
echo -n " Check on " ; date +'%Y-%m-%d'
echo "======================="
cat "$tmpfile"
) >>"$errlog"
fi
rm -f "$tmpfile"
lvremove -f "${vg}/${snaplv}"
}
# pull in configuration -- overwrite the defaults above if the file exists
[ -r /etc/lvcheck.conf ] && . /etc/lvcheck.conf
# check whether the machine is on AC power: if not, skip fsck
on_ac_power || exit 0
# parse up lvscan output
lvscan 2>&1 | grep ACTIVE | awk '{print $2;}' | \
while read DEV ; do
# remove the single quotes around the device name
DEV="`echo "$DEV" | tr -d \'`"
# get the FS type; assign it directly so that a failed blkid can't leave
# the previous device's TYPE in place
TYPE="`blkid -s TYPE -o value "$DEV"`"
# see whether this FS needs any extra checks that might disqualify this device
should_still_check "$DEV" "$TYPE" || continue
# get the last-check time
check_date=`try_get_check_date "$DEV" "$TYPE"`
# if the date is unknown, run fsck every time the script runs. sigh.
if [ "$check_date" != "Unknown" ] ; then
# add $INTERVAL days, and throw away the time portion
check_day=`date --date="$check_date $INTERVAL days" +'%Y%m%d'`
# get today's date, and skip the check if it's not within the interval
today=`date +'%Y%m%d'`
[ $check_day -gt $today ] && continue
fi
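# get the volume group and logical volume names (lvs pads its output
# with spaces, so strip them)
VG="`lvs --noheadings -o vg_name "$DEV" | tr -d ' '`"
LV="`lvs --noheadings -o lv_name "$DEV" | tr -d ' '`"
# get the free space and LV size (in megs), guess at the snapshot
# size, and see how much the admin will let us use (keeping MINFREE
# available). lvs prints padded decimals (e.g. "  1024.00"), which expr
# can't handle, so keep only the integer part.
SPACE="`lvs --noheadings --units m --nosuffix -o vg_free "$DEV" | sed -e 's/^ *//' -e 's/[.,].*//'`"
SIZE="`lvs --noheadings --units m --nosuffix -o lv_size "$DEV" | sed -e 's/^ *//' -e 's/[.,].*//'`"
SNAPSIZE="`expr "$SIZE" / 500`"
AVAIL="`expr "$SPACE" - "$MINFREE"`"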
# if we don't even have MINSNAP space available, skip the LV
if [ "$MINSNAP" -gt "$AVAIL" -o "$AVAIL" -le 0 ] ; then
log "warning" "Not enough free space on volume group for ${DEV}; skipping"
continue
fi
# make snapshot large enough to handle e.g. journal and other updates
[ "$SNAPSIZE" -lt "$MINSNAP" ] && SNAPSIZE="$MINSNAP"
# limit snapshot to available space (VG space minus min-free)
[ "$SNAPSIZE" -gt "$AVAIL" ] && SNAPSIZE="$AVAIL"
# don't need to check SNAPSIZE again: MINSNAP <= AVAIL, MINSNAP <= SNAPSIZE,
# and SNAPSIZE <= AVAIL, combined, means SNAPSIZE must be between MINSNAP
# and AVAIL, which is what we need -- assuming AVAIL > 0
# check it
check_fs "$VG" "$LV" "$TYPE" "$SNAPSIZE"
done
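
The interval test in the main loop above can be tried out on its own. The
following is a sketch under the assumption that GNU date is available (it
relies on --date); check_due is a hypothetical name, and the optional third
argument exists only so that "today" can be pinned when experimenting.

```shell
#!/bin/sh
# Sketch of the "is a check due?" test; assumes GNU date (for --date).
# check_due LAST_CHECKED INTERVAL_DAYS [TODAY_YYYYMMDD]
# Returns 0 if a check is due, 1 if it can still be skipped.
check_due() {
    last="$1"
    interval="$2"
    today="${3:-$(date +%Y%m%d)}"
    # no usable date: check every time, as the script does
    [ "$last" = "Unknown" ] && return 0
    # add the interval and throw away the time portion
    due=$(date --date="$last $interval days" +%Y%m%d) || return 0
    [ "$due" -le "$today" ]
}
```

For example, "check_due "2008-04-24" 30 20080501" returns 1 (not due yet),
because the next check only falls due on 2008-05-24.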
[-- Attachment #3: lvcheck.conf --]
[-- Type: text/plain, Size: 1289 bytes --]
#!/bin/sh
# lvcheck configuration file
# This file follows the pattern of sshd_config: default
# values are shown here, commented-out.
# EMAIL
# Address to send failure notifications to. If empty,
# failure notifications will not be sent.
#EMAIL='root'
# INTERVAL
# Days to wait between checks. All LVs use the same
# INTERVAL, but the "days since last check" value can
# be different per LV, since that value is stored in
# the filesystem superblock.
#INTERVAL=30
# AC_UNKNOWN
# Whether to run the *fsck checks if the script can't
# determine whether the machine is on AC power. Laptop
# users will want to set this to ABORT, while server and
# desktop users will probably want to set this to
# CONTINUE. Those are the only two valid values.
#AC_UNKNOWN="CONTINUE"
# MINSNAP
# Minimum snapshot size to take, in megabytes. The
# default snapshot size is 1/500 the size of the logical
# volume, but if that size is less than MINSNAP, the
# script will use MINSNAP instead. This should be large
# enough to handle e.g. journal updates, and other disk
# changes that require (semi-)constant space.
#MINSNAP=256
# MINFREE
# Minimum amount of space (in megabytes) to keep free in
# each volume group when creating snapshots.
#MINFREE=0
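
To make the interaction between MINSNAP and MINFREE concrete, here is a
small sketch of the sizing rule described above. snap_size is a hypothetical
helper, not a function in lvcheck, and it assumes all sizes are whole
megabytes.

```shell
#!/bin/sh
# Sketch of the snapshot sizing rule; all sizes are whole megabytes.
# snap_size LV_SIZE VG_FREE MINSNAP MINFREE
# Prints the snapshot size to use, or returns 1 if the VG is too full.
snap_size() {
    lv_size="$1"
    vg_free="$2"
    minsnap="$3"
    minfree="$4"
    avail=$((vg_free - minfree))
    # not even MINSNAP megabytes left once MINFREE is reserved: skip the LV
    if [ "$avail" -le 0 ] || [ "$minsnap" -gt "$avail" ] ; then
        return 1
    fi
    # start from 1/500 of the LV, then clamp to [MINSNAP, available space]
    size=$((lv_size / 500))
    [ "$size" -lt "$minsnap" ] && size="$minsnap"
    [ "$size" -gt "$avail" ] && size="$avail"
    echo "$size"
}
```

With the defaults (MINSNAP=256, MINFREE=0), a 10240 MB LV in a VG with
10000 MB free gets a 256 MB snapshot ("snap_size 10240 10000 256 0" prints
256), while a 204800 MB LV gets 409 MB.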