* [linux-lvm] Offline fsck (checking snapshots)
From: Bryan Kadzban @ 2008-04-24 22:39 UTC (permalink / raw)
To: linux-lvm
[-- Attachment #1: Type: text/plain, Size: 1944 bytes --]
There was some discussion on the ext3-users list a few months ago about
how long e2fsck takes to run, and about full checks being forced by a
couple of counters that e2fsck keeps track of (days since the last full
fsck, and mounts since the last full fsck).
(The thread starts at [1], the script development started at [2], and
the most recent version is at [3]. The one extra thing I've added is
skipping ext2/3 FSes that have an external journal.)
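To make the two counters concrete, here is a rough sketch of the decision
they drive. The helper name fsck_due is hypothetical (it is not part of
e2fsprogs or lvcheck), and it assumes the times are plain epoch seconds
and that a limit of zero or less means "disabled"; e2fsck itself handles
more cases than this.

```shell
#!/bin/bash
# Hypothetical helper (not part of e2fsprogs or lvcheck): decide whether
# a periodic full fsck would be due, given the two counters.
#   $1 last_check  - time of the last full check, in epoch seconds
#   $2 interval    - maximum seconds between checks (<= 0 means disabled)
#   $3 mount_count - mounts since the last full check
#   $4 max_mounts  - maximum mounts between checks (<= 0 means disabled)
#   $5 now         - current time, in epoch seconds
# Returns 0 if a check is due, 1 otherwise.
fsck_due() {
    local last_check="$1" interval="$2" mount_count="$3"
    local max_mounts="$4" now="$5"

    # mount-count criterion
    if [ "$max_mounts" -gt 0 ] && [ "$mount_count" -ge "$max_mounts" ]; then
        return 0
    fi
    # time criterion
    if [ "$interval" -gt 0 ] && [ "$now" -ge $(( last_check + interval )) ]; then
        return 0
    fi
    return 1
}
```

On ext2/ext3 the inputs correspond to the "Last checked", "Check interval",
"Mount count" and "Maximum mount count" fields printed by dumpe2fs -h.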
The suggestion was made that if the user is using LVM, a temporary
snapshot could be taken, and then fsck could be run on that. If it
succeeds, then it's possible to set the last-fsck-time and mount-count
while the real FS is mounted.
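A minimal sketch of that sequence for a single ext2/ext3 LV follows. The
snapshot name suffix (-fscktmp) and the function name are hypothetical;
the tune2fs -C 0 -T now call is the same one lvcheck's try_delay_checks
uses. The real script also runs a preliminary e2fsck -p pass to clear the
orphan-inode list, which is left out here. RUN defaults to echo, so the
function only prints the commands it would run.

```shell
#!/bin/bash
# Sketch: check one mounted ext2/ext3 LV via a temporary snapshot.
# RUN defaults to "echo", so commands are only printed (a dry run);
# set RUN to an empty string in the environment to execute them (as root).
RUN="${RUN-echo}"

# $1 = volume group, $2 = logical volume, $3 = snapshot size in megabytes
snapshot_check() {
    local vg="$1" lv="$2" size_mb="$3"
    local snap="${lv}-fscktmp"    # hypothetical snapshot name
    local status=0

    # 1. take a copy-on-write snapshot of the live LV
    $RUN lvcreate -s -L "${size_mb}M" -n "$snap" "$vg/$lv" || return 1
    # 2. check the snapshot rather than the mounted filesystem
    if $RUN e2fsck -fy "/dev/$vg/$snap"; then
        # 3. clean: reset mount count and last-check time on the real LV
        $RUN tune2fs -C 0 -T now "/dev/$vg/$lv"
    else
        status=1
    fi
    # 4. always drop the snapshot
    $RUN lvremove -f "$vg/$snap"
    return "$status"
}
```

For example, snapshot_check vg0 home 256 prints the four commands for a
256 MB snapshot of the (hypothetical) LV vg0/home.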
I've gotten a script that I think is reasonable, that handles this.
With some help from others, it now works with XFS as well as ext2/3, and
it's supposed to also work with JFS. Since it requires LVM, I think it
might make sense to put something like it into the LVM userspace tools.
(There is one issue: it also requires blkid and logsave from e2fsprogs.
I could work around the requirement for logsave (using tee -a), but
blkid would be harder.)
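The tee -a workaround could look roughly like this. log_run is a
hypothetical name; it appends the command's combined output to the log
and uses bash's PIPESTATUS to return the command's exit status rather
than tee's. Unlike logsave -s, it does not keep e2fsck's progress bar
(-C 0) out of the log file.

```shell
#!/bin/bash
# Hypothetical logsave substitute built on "tee -a": run a command,
# show its combined stdout/stderr, append the same text to a log file,
# and return the command's exit status (not tee's).
#   $1 = log file, remaining arguments = command to run
log_run() {
    local logfile="$1"
    shift
    "$@" 2>&1 | tee -a "$logfile"
    # PIPESTATUS[0] is the exit status of "$@", the first pipeline member
    return "${PIPESTATUS[0]}"
}
```

Usage would be along the lines of log_run "$tmpfile" nice e2fsck -fy "$dev"
in place of nice logsave -as "$tmpfile" e2fsck -fy "$dev".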
The idea behind the script is, you run it at night from cron; it will
check each LV on the system and mail a user if there are any problems.
It also logs to syslog.
I've attached the script and its configuration file to this message.
Comments?
[1] https://www.redhat.com/archives/ext3-users/2008-January/msg00027.html
[2] https://www.redhat.com/archives/ext3-users/2008-January/msg00032.html
[3] https://www.redhat.com/archives/ext3-users/2008-February/msg00004.html
[-- Attachment #2: lvcheck --]
[-- Type: text/plain, Size: 10196 bytes --]
#!/bin/bash
#
# lvcheck, version 1.0
# Maintainer: Bryan Kadzban <bryan@kadzban.is-a-geek.net>
# Other credits:
# Concept and original script by Theodore Tso <tytso@mit.edu>
# on_ac_power is mostly from Debian's powermgmt-base package
# Lots of help (ideas, initial XFS/JFS support, etc.) from
# Andreas Dilger <adilger@sun.com>
# Better XFS support from Eric Sandeen <sandeen@redhat.com>
# Released under the GNU General Public License, either version 2 or
# (at your option) any later version.
# Overview:
#
# Run this from cron periodically (e.g. once per week). If the
# machine is on AC power, it will run the checks; otherwise they will
# all be skipped. (If the script can't tell whether the machine is
# on AC power, it will use a setting in the configuration file
# (/etc/lvcheck.conf) to decide whether to continue with the checks,
# or abort.)
#
# The script will then decide which logical volumes are active, and
# can therefore be checked via an LVM snapshot. Each of these LVs
# will be queried to find its last-check day, and if that was more
# than $INTERVAL days ago (where INTERVAL is set in the configuration
# file as well), or if the last-check day can't be determined, then
# the script will take an LVM snapshot of that LV and run fsck on the
# snapshot. The snapshot will be 1/500 the size of the source LV, but
# no smaller than MINSNAP and no larger than the VG's free space minus
# MINFREE (all in megabytes). After fsck finishes, the snapshot is
# destroyed. (Snapshots are checked serially.)
#
# Any LV that passes fsck should have its last-check time updated (in
# the real superblock, not the snapshot's superblock); any LV whose
# fsck fails will send an email notification to a configurable user
# ($EMAIL). This $EMAIL setting is optional, but its use is highly
# recommended, since if any LV fails, it will need to be checked
# manually, offline. Relevant messages are also sent to syslog.
# Set default values for configuration params. Changes to these values
# will be overwritten on an upgrade! To change these values, use
# /etc/lvcheck.conf.
EMAIL='root'
INTERVAL=30
AC_UNKNOWN="CONTINUE"
MINSNAP=256
MINFREE=0
# send $2 to syslog, with severity $1
# severities are emerg/alert/crit/err/warning/notice/info/debug
function log() {
local sev="$1"
local msg="$2"
local arg=
# log warning-or-higher messages to stderr as well
case "$sev" in
emerg|alert|crit|err|warning) arg=-s ;;
esac
logger -t lvcheck $arg -p user."$sev" -- "$msg"
}
# determine whether the machine is on AC power
function on_ac_power() {
local any_known=no
# try sysfs power class first
if [ -d /sys/class/power_supply ] ; then
for psu in /sys/class/power_supply/* ; do
if [ -r "${psu}/type" ] ; then
type="`cat "${psu}/type"`"
# ignore batteries
[ "${type}" = "Battery" ] && continue
online="`cat "${psu}/online"`"
[ "${online}" = 1 ] && return 0
[ "${online}" = 0 ] && any_known=yes
fi
done
[ "${any_known}" = "yes" ] && return 1
fi
# else fall back to AC adapters in /proc
if [ -d /proc/acpi/ac_adapter ] ; then
for ac in /proc/acpi/ac_adapter/* ; do
if [ -r "${ac}/state" ] ; then
grep -q on-line "${ac}/state" && return 0
grep -q off-line "${ac}/state" && any_known=yes
elif [ -r "${ac}/status" ] ; then
grep -q on-line "${ac}/status" && return 0
grep -q off-line "${ac}/status" && any_known=yes
fi
done
[ "${any_known}" = "yes" ] && return 1
fi
if [ "$AC_UNKNOWN" == "CONTINUE" ] ; then
return 0 # assume on AC power
elif [ "$AC_UNKNOWN" == "ABORT" ] ; then
return 1 # assume on battery
else
log "err" "Invalid value for AC_UNKNOWN in the config file"
exit 1
fi
}
# attempt to force a check of $1 on the next reboot
function try_force_check() {
local dev="$1"
local fstype="$2"
case "$fstype" in
ext2|ext3)
tune2fs -C 16000 "$dev"
;;
xfs)
# XFS does not enforce check intervals; let email suffice.
;;
*)
log "warning" "Don't know how to force a check on $fstype..."
;;
esac
}
# attempt to set the last-check time on $1 to now, and the mount count to 0.
function try_delay_checks() {
local dev="$1"
local fstype="$2"
case "$fstype" in
ext2|ext3)
tune2fs -C 0 -T now "$dev"
;;
xfs)
# XFS does not enforce check intervals; nothing to delay
;;
*)
log "warning" "Don't know how to delay checks on $fstype..."
;;
esac
}
# print the date that $1 was last checked, in a format that date(1) will
# accept, or "Unknown" if we don't know how to find that date.
function try_get_check_date() {
local dev="$1"
local fstype="$2"
case "$fstype" in
ext2|ext3)
dumpe2fs -h "$dev" 2>/dev/null | grep 'Last checked:' | \
sed -e 's/Last checked:[[:space:]]*//'
;;
*)
# XFS does not save the last-checked date
# TODO: add support for various other FSes
echo "Unknown"
;;
esac
}
# do any extra checks for filesystem type $2, on device $1
function should_still_check() {
local dev="$1"
local fstype="$2"
case "$fstype" in
ext2|ext3)
if tune2fs -l "$dev" | grep -q "Journal device" ; then
log "warning" "Cowardly refusing to check $dev, which has an external journal."
return 1
fi
esac
return 0
}
# check the FS on $1 passively, saving output to $3.
function perform_check() {
local dev="$1"
local fstype="$2"
local tmpfile="$3"
case "$fstype" in
ext2|ext3)
# first clear the orphaned-inode list, to avoid unnecessary FS changes
# in the next step (which would cause an "error" exit from e2fsck).
# -C 0 is present for cases where the script is run interactively
# (logsave -s strips out the progress bar). ignore the return status
# of this e2fsck, as it doesn't matter.
nice logsave -as "${tmpfile}" e2fsck -p -C 0 "$dev"
# then do the real check; -y is here to give more info on any errors
# that may be present on the FS, in the log file. the snapshot is
# writable, so it shouldn't break anything if e2fsck changes it.
nice logsave -as "${tmpfile}" e2fsck -fy -C 0 "$dev"
return $?
;;
reiserfs)
echo Yes | nice logsave -as "${tmpfile}" fsck.reiserfs --check "$dev"
# apparently can't fail? let's hope not...
return 0
;;
xfs)
nice logsave -as "${tmpfile}" xfs_repair -n "$dev"
return $?
;;
jfs)
nice logsave -as "${tmpfile}" fsck.jfs -fn "$dev"
return $?
;;
*)
log "warning" "Don't know how to check $fstype filesystems passively: assuming OK."
return 0
;;
esac
}
# do everything needed to check and reset dates and counters on /dev/$1/$2.
function check_fs() {
local vg="$1"
local lv="$2"
local fstype="$3"
local snapsize="$4"
local tmpfile=`mktemp -t lvcheck.log.XXXXXXXXXX`
local errlog="/var/log/lvcheck-${vg}@${lv}"
local snaplvbase="${lv}-lvcheck-temp"
local snaplv="${snaplvbase}-`date +'%Y%m%d'`"
# clean up any left-over snapshot LVs
for lvtemp in /dev/${vg}/${snaplvbase}* ; do
if [ -e "$lvtemp" ] ; then
# Assume the script won't run more than one instance at a time?
log "warning" "Found stale snapshot $lvtemp: attempting to remove."
if ! lvremove -f "$lvtemp" ; then
log "err" "Could not delete stale snapshot $lvtemp"
return 1
fi
fi
done
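# and create this one (snapsize is in megabytes, so use -L with an M
# suffix; -l would count extents instead)
if ! lvcreate -s -L "${snapsize}M" -n "${snaplv}" "${vg}/${lv}" ; then
log "err" "Could not create snapshot of /dev/${vg}/${lv}"
rm -f "$tmpfile"
return 1
fi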
if perform_check "/dev/${vg}/${snaplv}" "${fstype}" "${tmpfile}" ; then
log "info" "Background scrubbing of /dev/${vg}/${lv} succeeded."
try_delay_checks "/dev/${vg}/${lv}" "$fstype"
else
log "err" "Background scrubbing of /dev/${vg}/${lv} failed: run fsck offline soon!"
try_force_check "/dev/${vg}/${lv}" "$fstype"
if test -n "$EMAIL"; then
mail -s "Fsck of /dev/${vg}/${lv} failed!" $EMAIL < $tmpfile
fi
# save the log file in /var/log in case mail is disabled
(
echo ""
echo -n " Check on " ; date +'%Y-%m-%d'
echo "======================="
cat "$tmpfile"
) >>"$errlog"
fi
rm -f "$tmpfile"
lvremove -f "${vg}/${snaplv}"
}
# pull in configuration -- overwrite the defaults above if the file exists
[ -r /etc/lvcheck.conf ] && . /etc/lvcheck.conf
# check whether the machine is on AC power: if not, skip fsck
on_ac_power || exit 0
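# parse up lvscan output: print the quoted device path from each ACTIVE
# line, without its single quotes (the path isn't always the second
# field: snapshot origins have an extra "Original" word before it)
lvscan 2>&1 | sed -n "s/^[[:space:]]*ACTIVE[^']*'\([^']*\)'.*/\1/p" | \
while read DEV ; do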
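# get the FS type: blkid prints TYPE="blah". Reset TYPE first so that an
# LV with no recognizable filesystem doesn't inherit the previous LV's type.
TYPE=
eval `blkid -s TYPE "$DEV" | cut -d' ' -f2`
# skip LVs that don't hold a recognizable filesystem
[ -n "$TYPE" ] || continue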
# see whether this FS needs any extra checks that might disqualify this device
should_still_check "$DEV" "$TYPE" || continue
# get the last-check time
check_date=`try_get_check_date "$DEV" "$TYPE"`
# if the date is unknown (or couldn't be read), run fsck every time the
# script runs. sigh.
if [ -n "$check_date" ] && [ "$check_date" != "Unknown" ] ; then
# add $INTERVAL days, and throw away the time portion
check_day=`date --date="$check_date $INTERVAL days" +'%Y%m%d'`
# get today's date, and skip the check if it's not within the interval
today=`date +'%Y%m%d'`
[ $check_day -gt $today ] && continue
fi
# get the volume group and logical volume names (lvs pads them with spaces)
VG="`lvs --noheadings -o vg_name "$DEV" | tr -d ' '`"
LV="`lvs --noheadings -o lv_name "$DEV" | tr -d ' '`"
# get the free space and LV size (in megs), guess at the snapshot
# size, and see how much the admin will let us use (keeping MINFREE
# available)
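SPACE="`lvs --noheadings --units M --nosuffix -o vg_free "$DEV" | tr -d ' '`"
SIZE="`lvs --noheadings --units M --nosuffix -o lv_size "$DEV" | tr -d ' '`"
# lvs reports sizes like "10240.00", but expr only handles integers
SPACE="${SPACE%%.*}"
SIZE="${SIZE%%.*}"
SNAPSIZE="`expr "$SIZE" / 500`"
AVAIL="`expr "$SPACE" - "$MINFREE"`"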
# if we don't even have MINSNAP space available, skip the LV
if [ "$MINSNAP" -gt "$AVAIL" -o "$AVAIL" -le 0 ] ; then
log "warning" "Not enough free space on volume group for ${DEV}; skipping"
continue
fi
# make snapshot large enough to handle e.g. journal and other updates
[ "$SNAPSIZE" -lt "$MINSNAP" ] && SNAPSIZE="$MINSNAP"
# limit snapshot to available space (VG space minus min-free)
[ "$SNAPSIZE" -gt "$AVAIL" ] && SNAPSIZE="$AVAIL"
# don't need to check SNAPSIZE again: MINSNAP <= AVAIL, MINSNAP <= SNAPSIZE,
# and SNAPSIZE <= AVAIL, combined, means SNAPSIZE must be between MINSNAP
# and AVAIL, which is what we need -- assuming AVAIL > 0
# check it
check_fs "$VG" "$LV" "$TYPE" "$SNAPSIZE"
done
[-- Attachment #3: lvcheck.conf --]
[-- Type: text/plain, Size: 1289 bytes --]
#!/bin/sh
# lvcheck configuration file
# This file follows the pattern of sshd_config: default
# values are shown here, commented-out.
# EMAIL
# Address to send failure notifications to. If empty,
# failure notifications will not be sent.
#EMAIL='root'
# INTERVAL
# Days to wait between checks. All LVs use the same
# INTERVAL, but the "days since last check" value can
# be different per LV, since that value is stored in
# the filesystem superblock.
#INTERVAL=30
# AC_UNKNOWN
# Whether to run the *fsck checks if the script can't
# determine whether the machine is on AC power. Laptop
# users will want to set this to ABORT, while server and
# desktop users will probably want to set this to
# CONTINUE. Those are the only two valid values.
#AC_UNKNOWN="CONTINUE"
# MINSNAP
# Minimum snapshot size to take, in megabytes. The
# default snapshot size is 1/500 the size of the logical
# volume, but if that size is less than MINSNAP, the
# script will use MINSNAP instead. This should be large
# enough to handle e.g. journal updates, and other disk
# changes that require (semi-)constant space.
#MINSNAP=256
# MINFREE
# Minimum amount of space (in megabytes) to keep free in
# each volume group when creating snapshots.
#MINFREE=0
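The interaction between the 1/500 default, MINSNAP and MINFREE can be
restated as a standalone function. The name snapshot_size is
hypothetical; the arithmetic follows the main loop of lvcheck. It also
drops the fractional part that lvs prints with --units M --nosuffix
(e.g. 10240.00), since expr and shell arithmetic only accept integers.

```shell
#!/bin/bash
# Sketch of how lvcheck sizes a snapshot (all values in megabytes).
# Prints the size to use, or returns 1 if the LV should be skipped.
#   $1 lv_size - size of the origin LV
#   $2 vg_free - free space in the volume group
#   $3 minsnap - MINSNAP from lvcheck.conf
#   $4 minfree - MINFREE from lvcheck.conf
snapshot_size() {
    # lvs prints sizes such as "10240.00"; keep only the integer part
    local size="${1%%.*}" free="${2%%.*}" minsnap="$3" minfree="$4"
    local avail=$(( free - minfree ))
    local snap=$(( size / 500 ))

    # not even MINSNAP megabytes can be spared: skip this LV
    if [ "$avail" -le 0 ] || [ "$minsnap" -gt "$avail" ]; then
        return 1
    fi
    [ "$snap" -lt "$minsnap" ] && snap="$minsnap"   # floor at MINSNAP
    [ "$snap" -gt "$avail" ] && snap="$avail"       # cap at usable free space
    echo "$snap"
}
```

So a 100 GB LV gets the 256 MB minimum with the default settings, while a
1 TB LV gets about 2 GB, provided the volume group has that much free.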
* Re: [linux-lvm] Offline fsck (checking snapshots)
From: Charles Marcus @ 2008-04-27 1:26 UTC (permalink / raw)
To: LVM general discussion and development
Bryan Kadzban wrote:
> I've gotten a script that I think is reasonable, that handles this.
> With some help from others, it now works with XFS as well as ext2/3, and
> it's supposed to also work with JFS. Since it requires LVM, I think it
> might make sense to put something like it into the LVM userspace tools.
Sounds interesting... but any particular reason you're ignoring reiserfs?
* Re: [linux-lvm] Offline fsck (checking snapshots)
From: Bryan Kadzban @ 2008-04-27 2:17 UTC (permalink / raw)
To: LVM general discussion and development
Charles Marcus wrote:
> Bryan Kadzban wrote:
>> I've gotten a script that I think is reasonable, that handles this.
>> With some help from others, it now works with XFS as well as ext2/3,
>> and it's supposed to also work with JFS. Since it requires LVM, I
>> think it might make sense to put something like it into the LVM
>> userspace tools.
>
> Sounds interesting... but any particular reason you're ignoring
> reiserfs?
No particular reason, no. I just haven't used it in maybe 6 years, so I
don't remember much about it. I also assume that nobody listening to
the discussion on ext3-users uses it either (based on the fact that
nobody else asked for it). So it didn't get added. :-)
I assume fsck.reiserfs is the right executable to use? (I seem to
remember a reiserfsck, but not whether they were equivalent...) What
args should be used to get it to check the snapshot FS, preferably
making as few changes as possible? (E.g., ext3 requires a pre-check
check to clean up orphan inodes, otherwise the real check will exit with
a failure status; does reiserfs require anything similar?)
I don't remember whether it stores the last-fsck time either; if it
does, is there some way to get that out (for try_get_check_date)? Does
it support updating that time while an LV is online? What about forcing
a check (per-filesystem) on the next reboot? (Any ability to set the
last-fsck time would work for both preventing a check and forcing a
check, of course.)
Google says it can do an external journal; how can you tell whether
that's in use (to skip making snapshots of only half the device)?
Thanks!
* Re: [linux-lvm] Offline fsck (checking snapshots)
From: Charles Marcus @ 2008-04-27 19:48 UTC (permalink / raw)
To: LVM general discussion and development
On 4/26/2008, Bryan Kadzban (bryan@kadzban.is-a-geek.net) wrote:
>>> I've gotten a script that I think is reasonable, that handles
>>> this. With some help from others, it now works with XFS as well
>>> as ext2/3, and it's supposed to also work with JFS. Since it
>>> requires LVM, I think it might make sense to put something like
>>> it into the LVM userspace tools.
>> Sounds interesting... but any particular reason you're ignoring
>> reiserfs?
> No particular reason, no. I just haven't used it in maybe 6 years, so
> I don't remember much about it. I also assume that nobody listening
> to the discussion on ext3-users uses it either (based on the fact
> that nobody else asked for it). So it didn't get added. :-)
Heh - no worries, I wasn't complaining, just asking... I do use it for
my /var (maildirs), which is why I was asking...
I know, I've heard all of the horror stories... but my RAID card has a
BBU on it, and my servers all have good UPS's on them and are running
NUT, so they will safely shut down in the event of a prolonged power
failure (which has only happened once).
Mine has been rock-solid for almost 4 years now.
> I assume fsck.reiserfs is the right executable to use? (I seem to
> remember a reiserfsck, but not whether they were equivalent...) What
> args should be used to get it to check the snapshot FS, preferably
> making as few changes as possible? (E.g., ext3 requires a pre-check
> check to clean up orphan inodes, otherwise the real check will exit
> with a failure status; does reiserfs require anything similar?)
Ouch... I wish I could help, I'd be happy to, but I'm just a lowly sys
admin pretender, not a programmer... ;)
Right now, I'm just trying to find the time to get an automated backup
script running to pause some services (postfix+dovecot), take a snapshot
of my /var, restart the services, run rsnapshot on the snapshot volume,
then release the snapshot volume...
Right now I'm running my backups on the live filesystem. Since this
system isn't used all that heavily, especially at night, that doesn't
worry me *too* much, but I'd still prefer to 'do it right'... hence my
interest in your script...
--
Best regards,
Charles
* Re: [linux-lvm] Offline fsck (checking snapshots)
From: Bryan Kadzban @ 2008-04-27 23:37 UTC (permalink / raw)
To: LVM general discussion and development
Charles Marcus wrote:
> On 4/26/2008, Bryan Kadzban (bryan@kadzban.is-a-geek.net) wrote:
>> No particular reason, no. I just haven't used it in maybe 6 years,
>> so I don't remember much about it. I also assume that nobody
>> listening to the discussion on ext3-users uses it either (based on
>> the fact that nobody else asked for it). So it didn't get added.
>> :-)
Actually, now that I look again, there is a case in there for checking
reiserfs, but not for doing anything else (reading or writing the
last-checked date, for instance).
> I know, I've heard all of the horror stories... but my RAID card has
> a BBU on it, and my servers all have good UPS's on them
Some of the horror stories that I've heard are exactly the case that
you're protecting against with your batteries: Supposedly a sudden
removal of power causes some grief with certain journaling methods,
including the method that reiserfs uses. (Or used to use; maybe it's
changed since then.)
Anyway...
>> I assume fsck.reiserfs is the right executable to use? [...] What
>> args should be used to get it to check the snapshot FS, preferably
>> making as few changes as possible?
>
> Ouch... I wish I could help, I'd be happy to, but I'm just a lowly
> sys admin pretender, not a programmer... ;)
So I should go ask the reiserfs people then. That's fine. :-)
> Right now, I'm just trying to find the time to get an automated
> backup script running to pause some services (postfix+dovecot), take
> a snapshot of my /var, restart the services, run rsnapshot on the
> snapshot volume, then release the snapshot volume...
Feel free to use the LVM-specific bits of the script if you want (it is,
after all, GPLv2 or later). It's just that most of the script focuses
on running fsck, not doing a backup, so it has a bunch of stuff that you
probably don't need. But the LVM parts are the same no matter which
filesystem is on the logical volume.
The LVM parts are basically the check_fs function and the loop that
calls it; you'd want to change perform_check to do your backup instead.
You also won't care about the $fstype handling that's littered all over
the script if you're just going to run rsnapshot (so you don't need to
run blkid either). And you can probably drop try_delay_checks,
try_force_check, on_ac_power, should_still_check, and
try_get_check_date, along with the calls to each of them.
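Put together, the backup flow you describe might look something like the
sketch below. Everything specific in it is an assumption: the VG/LV
names, the mount point, the 1 GB snapshot size, the init-script paths for
postfix and dovecot, and the rsnapshot interval name ("daily"), which has
to match an interval defined in rsnapshot.conf. RUN=echo makes it a dry
run.

```shell
#!/bin/bash
# Sketch: snapshot-based backup of /var. All names are hypothetical.
# RUN defaults to "echo" (dry run); set it to an empty string to execute.
RUN="${RUN-echo}"

VG=vg0
LV=var
SNAP="${LV}-backup"
MNT=/mnt/var-snap
SNAPSIZE_MB=1024

backup_var() {
    # 1. quiesce the services that write to /var
    $RUN /etc/init.d/postfix stop
    $RUN /etc/init.d/dovecot stop
    # 2. take the snapshot while nothing is writing
    $RUN lvcreate -s -L "${SNAPSIZE_MB}M" -n "$SNAP" "$VG/$LV"
    local rc=$?
    # 3. restart the services right away, whether or not that worked
    $RUN /etc/init.d/dovecot start
    $RUN /etc/init.d/postfix start
    [ "$rc" -eq 0 ] || return 1
    # 4. back up from a read-only mount of the snapshot
    $RUN mount -o ro "/dev/$VG/$SNAP" "$MNT" || { $RUN lvremove -f "$VG/$SNAP"; return 1; }
    $RUN rsnapshot daily
    rc=$?
    # 5. release the snapshot
    $RUN umount "$MNT"
    $RUN lvremove -f "$VG/$SNAP"
    return "$rc"
}
```

rsnapshot would then be configured with /mnt/var-snap/ as its backup
point, so it never reads the live /var.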
* Re: [linux-lvm] Offline fsck (checking snapshots)
From: Dale @ 2008-04-27 23:48 UTC (permalink / raw)
To: LVM general discussion and development
Bryan Kadzban wrote:
> < SNIP >
>
> Some of the horror stories that I've heard are exactly the case that
> you're protecting against with your batteries: Supposedly a sudden
> removal of power causes some grief with certain journaling methods,
> including the method that reiserfs uses. (Or used to use; maybe it's
> changed since then.)
>
< SNIP >
I think things have changed since then. On a couple of occasions I have
had to unplug my rig or use the halt command without a clean shutdown,
and it fixed the errors, if any, when I rebooted. I have read of others
doing the same with no losses.
I have tried XFS and others and had trouble in these areas before. That
is one reason I use reiserfs, and it came highly recommended because it
recovers so well.
On XFS: if you have a UPS and can always get a clean shutdown, very
cool. It seriously hates unclean shutdowns.
Dale
:-) :-)
* Re: [linux-lvm] Offline fsck (checking snapshots)
From: David Coulson @ 2008-05-10 1:07 UTC (permalink / raw)
To: LVM general discussion and development
[-- Attachment #1: Type: text/plain, Size: 2717 bytes --]
I've been playing with this script today - Is there a more current
version, or someone who maintains it? There are a bunch of things that
seem to be broken that I ended up cleaning up on my own (e.g. it
calculates the snapshot size in M, then creates it in extents...).
I imagine there are other users who are frustrated with it, as it does
not run cleanly as it is.
David
* Re: [linux-lvm] Offline fsck (checking snapshots)
From: Bryan Kadzban @ 2008-05-11 0:25 UTC (permalink / raw)
To: LVM general discussion and development
David Coulson wrote:
> I've been playing with this script today - Is there a more current
> version, or someone who maintains it? There are a bunch of things
> that seem to be broken that I ended up cleaning up on my own
There isn't a more current version, no. I guess (for the moment anyway)
I maintain it, since I was the one that took the original and added the
config file, support for other filesystems, etc. (All that really
happened was some discussion on the ext3-users list.)
If you have patches, I'd be happy to apply them, and re-post. :-)
(I did try running it once or twice, but those were older versions, and
lots of stuff got changed since then. Unfortunately my main system
doesn't use LVM, and the qemu image that does seems to be corrupted (it
crashed at one point, and the RAID5 under the LVM got screwed up), so
it's been a long time since I ran it. It sounds like I should create
another qemu image and start testing it again.)
> (e.g. it calculates the snapshot size in M, then creates it in
> extents...).
Ouch. I don't remember exactly when that part of the script changed,
but obviously it should be using -L, and should have an "M" suffix.
I've fixed that.
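To spell out why that mattered: lvcreate -l takes a count of logical
extents, while -L takes a size. With the usual 4 MB extent size, a
megabyte figure passed to -l asks for a snapshot four times larger than
intended, and when the size had been capped at the VG's free space it
asks for more than is free, so lvcreate fails. Both helper names below
are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: how many megabytes "lvcreate -l COUNT" really asks
# for, given the VG's extent size in megabytes (4 MB is the usual default).
megabytes_for_extents() {
    local count="$1" extent_mb="$2"
    echo $(( count * extent_mb ))
}

# Hypothetical helper: lvcreate arguments for a snapshot whose size was
# computed in megabytes, i.e. -L with an explicit M suffix.
snapshot_args() {
    local vg="$1" lv="$2" snap="$3" size_mb="$4"
    printf '%s\n' "-s -L ${size_mb}M -n ${snap} ${vg}/${lv}"
}
```

For instance, megabytes_for_extents 256 4 prints 1024, while
lvcreate $(snapshot_args vg0 home home-snap 256) requests exactly 256 MB
(the substitution is left unquoted so it splits into separate arguments,
which assumes names without spaces).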
> I imagine there are other users who are frustrated with it, as it
> does not run cleanly as it is.
If there are any other users. ;-)
So far, this is just an idea that was brought up in the context of ext3
(and someone complaining about really long fsck times being forced on
them on huge FSes). I brought it up here in an attempt to get it maybe
merged into the LVM tools.