From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: "Theodore Y. Ts'o" <tytso@mit.edu>,
Sodagudi Prasad <psodagud@codeaurora.org>,
adilger.kernel@dilger.ca, wen.xu@gatech.edu,
linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: Remounting filesystem read-only
Date: Sat, 28 Jul 2018 00:47:04 -0700
Message-ID: <20180728074704.GA4203@magnolia>
In-Reply-To: <20180728001823.GA28432@thunk.org>

On Fri, Jul 27, 2018 at 08:18:23PM -0400, Theodore Y. Ts'o wrote:
> On Fri, Jul 27, 2018 at 01:34:31PM -0700, Sodagudi Prasad wrote:
> > > The error should be pretty clear: "Inode table for bg 0 marked as
> > > needing zeroing". That should never happen.
> >
> > Can you provide a debug patch to detect when this corruption happens?
> > What is the source of this corruption, and how is this partition
> > getting corrupted? Or which file system operation led to this corruption?
>
> Do you have a reliable repro? If it's a one-off, it can be caused by
> *anything*. Crappy hardware, a bug in some proprietary, binary-only
> GPU driver dereferencing some wild pointer that corrupts kernel
> memory, etc.
>
> Asking for a debug patch is like asking for "can you create technology
> that can detect when a cockroach enters my house?"

Well, ext4 *could* add metadata read and write verifiers to complain
loudly in dmesg about stuff that shouldn't be there, so at least we'd
know when we're writing cockroaches into the house... :)

--D

> So if you have a reliable repro, then we know what operations might be
> triggering the corruption, and then you work on creating a minimal
> repro, and only *then* when we have a restricted set of possibilities
> that might be the cause (for example, if removing a GPU call makes the
> problem go away, then the patch would need to be in the proprietary
> GPU driver....)
>
> > I am digging into the code around this warning to understand more.
>
> The warning means that a flag in block group descriptor #0 is set
> that should never be set.  How did the flag get set?  There are any
> number of things that could cause that.
>
> You might want to look at the block group descriptor via dumpe2fs or
> debugfs, to see if it's just a single bit getting flipped, or if the
> entire block group descriptor is garbage. Note that under normal code
> paths, the flag *never* gets set by ext4 kernel code.  The flag will
> get set on non-block group 0 block group descriptors by mke2fs, and the
> ext4 kernel code will only clear the flag.
>
> Of course, if there is a bug in some driver that dereferences a
> pointer wildly, all bets are off.
>
> - Ted
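A minimal userspace sketch of the check discussed above, assuming the bg_flags bit values from the kernel's ext4.h (INODE_UNINIT 0x0001, BLOCK_UNINIT 0x0002, ITABLE_ZEROED 0x0004) and assuming that "marked as needing zeroing" corresponds to ITABLE_ZEROED being clear. It decodes a descriptor's flags into the names dumpe2fs prints, reports group 0 if its inode table is not marked as zeroed, and counts how many bits differ between two copies of a descriptor (for example, the primary and a backup copy, if one is available), which helps tell a single flipped bit from a descriptor that is garbage throughout. The function names are hypothetical; this is not ext4 or e2fsprogs code.

```python
# Assumed bg_flags bit values, taken from the kernel's ext4.h.
EXT4_BG_INODE_UNINIT = 0x0001  # inode bitmap/table not initialized
EXT4_BG_BLOCK_UNINIT = 0x0002  # block bitmap not initialized
EXT4_BG_INODE_ZEROED = 0x0004  # inode table has been zeroed

# Names as dumpe2fs prints them in its per-group output.
FLAG_NAMES = {
    EXT4_BG_INODE_UNINIT: "INODE_UNINIT",
    EXT4_BG_BLOCK_UNINIT: "BLOCK_UNINIT",
    EXT4_BG_INODE_ZEROED: "ITABLE_ZEROED",
}
KNOWN_MASK = EXT4_BG_INODE_UNINIT | EXT4_BG_BLOCK_UNINIT | EXT4_BG_INODE_ZEROED


def decode_bg_flags(flags: int) -> list[str]:
    """Hypothetical helper: return the names of the flags set in bg_flags."""
    names = [name for bit, name in FLAG_NAMES.items() if flags & bit]
    # bg_flags is a 16-bit field; report any bits outside the known three.
    unknown = flags & ~KNOWN_MASK & 0xFFFF
    if unknown:
        names.append(f"UNKNOWN(0x{unknown:04x})")
    return names


def check_group(group: int, flags: int) -> list[str]:
    """Hypothetical verifier: list suspicious states for one descriptor."""
    problems = []
    # Assumption: bits outside the three defined flags indicate garbage.
    if flags & ~KNOWN_MASK & 0xFFFF:
        problems.append("unknown bits set in bg_flags")
    # Assumption: "needing zeroing" means ITABLE_ZEROED is clear.
    if group == 0 and not flags & EXT4_BG_INODE_ZEROED:
        problems.append("inode table for bg 0 marked as needing zeroing")
    return problems


def bits_flipped(observed: bytes, reference: bytes) -> int:
    """Count the bits that differ between two copies of a descriptor."""
    if len(observed) != len(reference):
        raise ValueError("descriptors must be the same length")
    return sum(bin(a ^ b).count("1") for a, b in zip(observed, reference))
```

For instance, decode_bg_flags(0x0005) returns ['INODE_UNINIT', 'ITABLE_ZEROED'], and check_group(0, 0x0000) reports the bg 0 problem while check_group(5, 0x0000) reports nothing, since later groups may legitimately be left unzeroed at mkfs time. A bits_flipped() result of 1 points toward a single bit flip; a large count suggests the whole descriptor was overwritten.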