From: "Theodore Ts'o" <tytso@mit.edu>
To: Samuel Mendoza-Jonas <samjonas@amazon.com>
Cc: linux-ext4@vger.kernel.org, adilger.kernel@dilger.ca, benh@amazon.com
Subject: Re: Debugging ext4 corruption with nojournal & extents
Date: Mon, 8 Nov 2021 22:14:33 -0500
Message-ID: <YYnnmQjrYii0dOYH@mit.edu>
In-Reply-To: <20211108173520.xp6xphodfhcen2sy@u87e72aa3c6c25c.ant.amazon.com>
On Mon, Nov 08, 2021 at 09:35:20AM -0800, Samuel Mendoza-Jonas wrote:
> Based on that what I think is happening is
> - A file with separate (i.e. non-inline) extents is synced / written to disk
> (in this case, one of the large "compound" files)
> - ext4_end_io_end() kicks off writeback of extent metadata
> - AIUI this marks the related buffers dirty but does not wait on them in the
> no-journal case
> - The file is deleted, causing the extents to be "removed" and the blocks where
> they were stored are marked unused
> - A new file is created (any file, separate extents not required)
> - The new file is allocated the block that was just freed (the physical block
> where the old extents were located)
>
> Some time between this point and when the file is next read, the dirty extent
> buffer hits the disk instead of the intended data for the new file.
> A big-hammer hack in __ext4_handle_dirty_metadata() to always sync metadata
> blocks appears to avoid the issue but isn't ideal - most likely a better
> solution would be to ensure any dirty metadata buffers are synced before the
> inode is dropped.
>
> Overall does this summary sound valid, or have I wandered into the
> weeds somewhere?
Hmm... well, I can tell you what's *supposed* to happen. When the
extent block is freed, ext4_free_blocks() gets called with the
EXT4_FREE_BLOCKS_FORGET flag set. ext4_free_blocks() calls
ext4_forget() in two places: one when the bh passed to
ext4_free_blocks() is NULL, and one when it is non-NULL. In the
no-journal case, ext4_forget() then calls bforget(), which should
cause the dirty extent block to get thrown away.
This *should* have prevented your failure scenario from taking place,
since after the call to bforget() the dirty extent buffer *shouldn't*
have hit the disk. If your theory is correct, then somehow either (a)
bforget() wasn't called, or (b) bforget() didn't work. In either
case, the page writeback for the new file's data happened first, and
buffer cache writeback happened second, overwriting the intended data
for the new file.
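
To make that ordering concrete, here is a toy user-space model of the
scenario. It is not kernel code: Disk, BufferCache, run_scenario() and
the block number are hypothetical names chosen for illustration, and
the only thing it borrows from the real code is the idea that
bforget() discards a dirty buffer without writing it.

```python
"""Toy model of the stale-metadata overwrite; not kernel code."""


class Disk:
    """Maps physical block number -> last contents written."""

    def __init__(self):
        self.blocks = {}

    def write(self, blocknr, data):
        self.blocks[blocknr] = data


class BufferCache:
    """Stand-in for the buffer cache holding dirty metadata buffers."""

    def __init__(self):
        self.dirty = {}  # physical block -> pending contents

    def mark_dirty(self, blocknr, data):
        self.dirty[blocknr] = data

    def bforget(self, blocknr):
        # Models bforget(): drop the buffer without writing it out.
        self.dirty.pop(blocknr, None)

    def writeback(self, disk):
        for blocknr, data in sorted(self.dirty.items()):
            disk.write(blocknr, data)
        self.dirty.clear()


def run_scenario(forget_works):
    """Return what ends up on disk in the reused block."""
    disk, cache = Disk(), BufferCache()
    block = 1234  # hypothetical physical block number

    # 1. The old file's extent block is dirtied but not yet written.
    cache.mark_dirty(block, "old extent metadata")
    # 2. The file is deleted and the extent block is freed.
    if forget_works:
        cache.bforget(block)
    # 3. A new file reuses the block; its page writeback runs first.
    disk.write(block, "new file data")
    # 4. Buffer cache writeback runs second.
    cache.writeback(disk)
    return disk.blocks[block]


if __name__ == "__main__":
    print("bforget effective:", run_scenario(True))
    print("bforget skipped:  ", run_scenario(False))
```

With forget_works=False the model ends with the old extent metadata in
the reused block, which is the corruption pattern described above. With
forget_works=True the new file's data survives.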
Have you tried enabling the blktrace tracer in combination with some
of the ext4 tracepoints, such as ext4_forget and ext4_free_blocks, to
see if you can catch the double write happening? Unfortunately, we
don't have any tracepoints in fs/ext4/page-io.c that report the
physical block ranges coming from the writeback path, and the
tracepoints in fs/fs-writeback.c won't have the physical block number
either (just the inode and logical block numbers).
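
As a rough sketch of the tracing setup, the helper below turns on the
two ext4 tracepoints named above and selects the blk tracer through
tracefs. The mount point /sys/kernel/tracing, the per-device
trace/enable file, and the need for CONFIG_BLK_DEV_IO_TRACE are
assumptions about the target system (older setups mount tracefs at
/sys/kernel/debug/tracing). enable_tracing() and write_value() are
hypothetical helper names.

```python
"""Sketch: enable ext4 tracepoints plus the blk tracer via tracefs."""
import os

# Assumed mount point; older setups use /sys/kernel/debug/tracing.
TRACEFS = "/sys/kernel/tracing"
EVENTS = ("ext4/ext4_forget", "ext4/ext4_free_blocks")


def write_value(path, value):
    with open(path, "w") as f:
        f.write(value)


def enable_tracing(device, tracefs=TRACEFS, sysblock="/sys/block"):
    """Enable the listed events and block tracing for one device.

    Returns the list of control files written, in order.
    """
    written = []
    for event in EVENTS:
        path = os.path.join(tracefs, "events", event, "enable")
        write_value(path, "1")
        written.append(path)
    # Per-device switch for block tracing (assumed to be present).
    path = os.path.join(sysblock, device, "trace", "enable")
    write_value(path, "1")
    written.append(path)
    # Select the blk tracer so block events share the trace buffer.
    path = os.path.join(tracefs, "current_tracer")
    write_value(path, "blk")
    written.append(path)
    return written
```

Run as root, something like enable_tracing("sda") followed by reading
trace_pipe under the tracefs mount should interleave the ext4 events
with the block-layer writes, so that two writes to the same physical
block can be spotted. The device name is a placeholder.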
- Ted
Thread overview: 3+ messages
2021-11-08 17:35 Debugging ext4 corruption with nojournal & extents Samuel Mendoza-Jonas
2021-11-09 3:14 ` Theodore Ts'o [this message]
2021-11-15 23:55 ` Samuel Mendoza-Jonas