From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Jan Tulak <jtulak@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>,
Eric Sandeen <sandeen@sandeen.net>,
linux-xfs <linux-xfs@vger.kernel.org>
Subject: Re: [PATCH] fsck.xfs: allow forced repairs using xfs_repair
Date: Thu, 8 Mar 2018 08:28:38 -0800 [thread overview]
Message-ID: <20180308162838.GY18989@magnolia> (raw)
In-Reply-To: <CACj3i71SHP90DOAhpS3kWyY_6gnx9OzTqgHy7=uxAZH0cfUKMw@mail.gmail.com>
On Thu, Mar 08, 2018 at 11:57:40AM +0100, Jan Tulak wrote:
> On Tue, Mar 6, 2018 at 10:39 PM, Dave Chinner <david@fromorbit.com> wrote:
> > On Tue, Mar 06, 2018 at 12:51:18PM +0100, Jan Tulak wrote:
> >> On Tue, Mar 6, 2018 at 12:33 AM, Eric Sandeen <sandeen@sandeen.net> wrote:
> >> > On 3/5/18 4:31 PM, Dave Chinner wrote:
> >> >> On Mon, Mar 05, 2018 at 04:06:38PM -0600, Eric Sandeen wrote:
> >> >>> As for running automatically and fix any problems, we may need to make
> >> >>> a decision. If it won't mount due to a log problem, do we automatically
> >> >>> use -L or drop to a shell and punt to the admin? (That's what we would
> >> >>> do w/o any fsck -f invocation today...)
> >> >>
> >> >> Define the expected "forcefsck" semantics, and that will tell us
> >> >> what we need to do. Is it automatic system recovery? What if the
> >> >> root fs can't be mounted due to log replay problems?
> >> >
> >> > You're asking too much. ;) Semantics? ;) Best we can probably do
> >> > is copy what e2fsck does - it tries to replay the log before running
> >> > the actual fsck. So ... what does e2fsck do if /it/ can't replay
> >> > the log?
> >>
> >> As far as I can tell, in that case, e2fsck exit code indicates 4 -
> >> File system errors left uncorrected, but I'm studying ext testing
> >> tools and will try to verify it.
> >> About the -L flag, I think it is a bad idea - we don't want anything
> >> dangerous to happen here, so if it can't be fixed safely and in an
> >> automated way, just bail out.
> >> That being said, I added a log replay attempt in there (via mount/unmount).
> >
> > I really don't advise doing that for a forced filesystem check. If
> > the log is corrupt, mounting it will trigger the problems we are
> > trying to avoid/fix by running a forced filesystem check. As it is,
> > we're probably being run in this mode because mounting has already
> > failed and causing the system not to boot.
> >
> > What we need to do is list how the startup scripts work according to
> > what error is returned, and then match the behaviour we want in a
> > specific corruption case to the behaviour of a specific return
> > value.
> >
> > i.e. if we have a dirty log, then really we need manual
> > intervention. That means we need to return an error that will cause
> > the startup script to stop and drop into an interactive shell for
> > the admin to fix manually.
> >
> > This is what I mean by "define the expected forcefsck semantics" -
> > describe the behaviour of the system in reponse to the errors we can
> > return to it, and match them to the problem cases we need to resolve
> > with fsck.xfs.
>
> I tested it on Fedora 27. Exit codes 2 and 4 ("File system errors
> corrected, system should be rebooted" and "File system errors left
> uncorrected") drop the user into the emergency shell. Anything else
> and the boot continues.
FWIW Debian seems to panic() if the exit code has (1 << 2) set, where
"panic()" either drops to a shell if panic= is not given or actually
reboots the machine if panic= is given. All other cases proceed with
boot, including 2 (errors fixed, reboot now).
That said, the installer seems to set up root xfs as pass 0 in fstab so
fsck is not included in the initramfs at all.
> This happens before root volume is mounted during the boot, so I
> propose this behaviour for fsck.xfs:
> - if the volume/device is mounted, exit with 16 - usage or syntax
> error (just to be sure)
> - if the volume/device has a dirty log, exit with 4 - errors left
> uncorrected (drop to the shell)
> - if we find no errors, exit with 0 - no errors
> - if we find anything and xfs_repair ends successfully, exit with 1 -
> errors corrected
> - anything else and exit with 8 - operational error
>
> And is there any other way how to get the "there were some errors, but
> we corrected them" other than either 1) screenscrape xfs_repair or 2)
> run xfs_repair twice, once with -n to detect and then without -n to
> fix the found errors?
I wouldn't run it twice, repair can take quite a while to run.
--D
> >
> >> >>>> I also wonder if we can limit this to just the boot infrastructure,
> >> >>>> because I really don't like the idea of users using fsck.xfs -f to
> >> >>>> repair damage filesystems because "that's what I do to repair ext4
> >> >>>> filesystems"....
> >> >>>
> >> >>> Depending on how this gets fleshed out, fsck.xfs -f isn't any different
> >> >>> than bare xfs_repair... (Unless all of the above suggestions about dirty
> >> >>> logs get added, then it certainly is!) So, yeah...
> >> >>>
> >> >>> How would you propose limiting it to the boot environment?
> >> >>
> >> >> I have no idea - this is all way outside my area of expertise...
> >> >
> >> > A halfway measure would be to test whether the script is interactive, perhaps?
> >> >
> >> > https://www.tldp.org/LDP/abs/html/intandnonint.html
> >> >
> >> > case $- in
> >> > *i*) # interactive shell
> >> > ;;
> >> > *) # non-interactive shell
> >> > ;;
> >> >
> >>
> >> IMO, any such test would make fsck.xfs behave unpredictably for the
> >> user. If anyone wants to run fsck.xfs -f instead of xfs_repair, it is
> >> their choice.
> >
> > We limit user choices all the time. Default values, config options,
> > tuning variables, etc, IOWs, it's our choice as developers to allow
> > users to do something or not. And in this case, we made this choice
> > to limit what fsck.xfs could do a long time ago:
> >
> > # man fsck.xfs
> > .....
> > If you wish to check the consistency of an XFS filesystem,
> > or repair a damaged or corrupt XFS filesystem, see
> > xfs_repair(8).
> > .....
> > # fsck.xfs
> > If you wish to check the consistency of an XFS filesystem or
> > repair a damaged filesystem, see xfs_repair(8).
> > #
> >
>
> At that point, it was a consistent behaviour, do nothing all the time,
> no matter what.
>
> >
> >> We can print something "next time use xfs_repair
> >> directly" for an interactive session, but I don't like the idea of the
> >> script doing different things based on some (for the user) hidden
> >> variables.
> >
> > What hidden variable are you talking about here? Having a script
> > determine behaviour based on whether it is in an interactive
> > sessions or not is a common thing to do. There's nothing tricky or
> > unusual about it....
> >
>
> I'm not aware of any script or tool that would refuse to work except
> when started in a specific environment and noninteractively (doesn't
> mean they don't exist, but it is not common). And because it seems
> that fsck.xfs -f will do only what bare xfs_repair would do, no log
> replay, nothing... then I really think that changing what the script
> does (not just altering its output) based on environment tests is
> unnecessary. And for anyone without this specific knowledge, it would
> be confusing - people expect that for the same input the application
> or script does the same thing at the end.
>
> Cheers,
> Jan
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2018-03-08 16:28 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-03-05 15:05 [PATCH] fsck.xfs: allow forced repairs using xfs_repair Jan Tulak
2018-03-05 21:56 ` Dave Chinner
2018-03-05 22:06 ` Eric Sandeen
2018-03-05 22:20 ` Darrick J. Wong
2018-03-05 22:31 ` Dave Chinner
2018-03-05 23:33 ` Eric Sandeen
2018-03-06 11:51 ` Jan Tulak
2018-03-06 21:39 ` Dave Chinner
2018-03-08 10:57 ` Jan Tulak
2018-03-08 16:28 ` Darrick J. Wong [this message]
2018-03-08 22:36 ` Dave Chinner
2018-03-14 13:51 ` Jan Tulak
2018-03-14 15:25 ` Darrick J. Wong
2018-03-14 21:10 ` Dave Chinner
2018-03-15 17:01 ` Jan Tulak
2018-03-08 23:28 ` Eric Sandeen
2018-03-14 13:30 ` Jan Tulak
2018-03-14 15:19 ` Eric Sandeen
2018-03-15 11:16 ` Jan Tulak
2018-03-15 22:19 ` Dave Chinner
2018-03-15 17:45 ` [PATCH 1/2] xfs_repair: add flag -e to detect corrected errors Jan Tulak
2018-03-15 17:45 ` [PATCH 2/2 v1] fsck.xfs: allow forced repairs using xfs_repair Jan Tulak
2018-03-15 17:47 ` [PATCH 2/2 v2] " Jan Tulak
2018-03-15 17:50 ` [PATCH 2/2] " Jan Tulak
2018-03-15 18:11 ` Darrick J. Wong
2018-03-15 18:22 ` Jan Tulak
2018-03-15 18:28 ` [PATCH 2/2 v4] " Jan Tulak
2018-03-15 18:49 ` Darrick J. Wong
2018-03-16 10:19 ` Jan Tulak
2018-03-16 15:39 ` Darrick J. Wong
2018-03-16 17:07 ` [PATCH 2/2 v5] " Jan Tulak
2018-03-23 2:37 ` Eric Sandeen
2018-03-23 3:25 ` Darrick J. Wong
2018-03-23 3:29 ` Eric Sandeen
2018-03-23 3:42 ` Darrick J. Wong
2018-03-23 14:00 ` Jan Tulak
2018-03-23 14:14 ` Jan Tulak
2018-03-23 14:33 ` [PATCH 2/2 v6] " Jan Tulak
2022-09-28 5:28 ` Darrick J. Wong
2022-09-29 8:31 ` Carlos Maiolino
2018-03-15 18:03 ` [PATCH 1/2] xfs_repair: add flag -e to detect corrected errors Darrick J. Wong
2018-03-15 18:23 ` [PATCH 1/2 v2] " Jan Tulak
2018-03-15 18:44 ` Darrick J. Wong
2018-03-23 1:57 ` Eric Sandeen
2018-03-23 9:24 ` Jan Tulak
2018-03-23 14:32 ` [PATCH 1/2 v3] xfs_repair: add flag -e to modify exit code for " Jan Tulak
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180308162838.GY18989@magnolia \
--to=darrick.wong@oracle.com \
--cc=david@fromorbit.com \
--cc=jtulak@redhat.com \
--cc=linux-xfs@vger.kernel.org \
--cc=sandeen@sandeen.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).