From: "Theodore Ts'o" <tytso@mit.edu>
To: "Windl, Ulrich" <u.windl@ukr.de>
Cc: Krister Johansen <kjlx@templeofstupid.com>,
"util-linux@vger.kernel.org" <util-linux@vger.kernel.org>,
Karel Zak <kzak@redhat.com>,
"systemd-devel@lists.freedesktop.org"
<systemd-devel@lists.freedesktop.org>,
David Reaver <me@davidreaver.com>
Subject: Re: [EXT] [systemd-devel] [PATCH] libblkid: fix spurious ext superblock checksum mismatches
Date: Tue, 19 Nov 2024 09:49:57 -0800
Message-ID: <20241119174957.GA3484088@mit.edu>
In-Reply-To: <7bc43689ca4249f18c60fa4b063822ea@ukr.de>

On Tue, Nov 19, 2024 at 08:15:29AM +0000, Windl, Ulrich wrote:
> > Reads of ext superblocks can race with updates. If libblkid observes a
> [Windl, Ulrich]
>
> I really wonder:
>
> Can one single block be inconsistent when read, considering that the
> block on disk is not inconsistent? That would mean that the block
> buffer you are reading is being modified by another process. AFAIK
> the basic UNIX semantic guarantee that a block is read atomically;
> if it's not, something is severely broken, and I don't think that
> O_DIRECT fixes that.

Yes, this can happen if the file system is mounted. The reason for
this is that the kernel updates metadata blocks via the block buffer
cache, with the jbd2 (journaled block layer v2) subsystem managing the
atomic updates. The jbd2 layer will block buffer cache writebacks
until the changes are committed in a jbd2 transaction. So the version
on disk is guaranteed to be consistent.

However, a buffer cache read does not have any consistency guarantees,
and if the file system is being actively modified, it is possible that
you could read a superblock where the checksum hasn't yet been updated.
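
To make the failure mode concrete, here is a minimal sketch of a
reader that verifies the superblock checksum and re-reads on a
mismatch. It is not the libblkid patch. It assumes the metadata_csum
feature is enabled, that the checksum is a CRC-32C seeded with ~0 over
the first 1020 bytes of the superblock, and that it is stored
little-endian in the last four bytes. The names crc32c_raw, sb_csum_ok
and read_sb_retry are made up for illustration.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

#define SB_OFFSET   1024   /* primary ext superblock starts 1 KiB into the device */
#define SB_SIZE     1024
#define SB_CSUM_OFF 1020   /* assumed: checksum occupies the last 4 bytes */

/* Bitwise CRC-32C (Castagnoli) with no implicit initial or final
 * inversion; the caller supplies the seed. */
static uint32_t crc32c_raw(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* True if the stored little-endian checksum matches the computed one. */
static bool sb_csum_ok(const unsigned char sb[SB_SIZE])
{
    uint32_t stored = (uint32_t)sb[SB_CSUM_OFF] |
                      (uint32_t)sb[SB_CSUM_OFF + 1] << 8 |
                      (uint32_t)sb[SB_CSUM_OFF + 2] << 16 |
                      (uint32_t)sb[SB_CSUM_OFF + 3] << 24;
    return stored == crc32c_raw(~0u, sb, SB_CSUM_OFF);
}

/* Hypothetical helper: re-read the superblock up to max_tries times.
 * Returns 0 on a read whose checksum matches, -1 on I/O error or if
 * every attempt saw a mismatch. */
static int read_sb_retry(int fd, unsigned char sb[SB_SIZE], int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        if (pread(fd, sb, SB_SIZE, SB_OFFSET) != SB_SIZE)
            return -1;
        if (sb_csum_ok(sb))
            return 0;
    }
    return -1;
}
```

A retry loop like this only narrows the window. It cannot tell a
superblock that is mid-update from one that is actually corrupt, so a
persistent mismatch still has to be treated as a failure.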

The O_DIRECT read isn't a magic bullet. For example, if you have a
scratch file system which is guaranteed not to survive a Kubernetes or
Borg container getting aborted, you might decide to format the file
system without a jbd2 journal, since that would be more efficient, and
by definition you don't care about the contents of the file system
after a crash. So there are millions of ext4 file systems in
hyperscale computing environments that are created without a journal;
and in that case, O_DIRECT will not be sufficient for guaranteeing a
consistent read of the superblock.

In the long term, I'll probably be adding an ioctl which will allow
userspace to read the superblock consistently for a mounted file
system. We actually already have two ioctls, EXT4_IOC_GETFSUUID and
FS_IOC_GETFSLABEL, which allow userspace to fetch the UUID and
label for a mounted file system. So eventually, I'll probably end
up adding EXT4_IOC_GET_SUPERBLOCK. Let me know if this is something
that util-linux would very much want.
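
For reference, fetching the label through the existing ioctl looks
roughly like the sketch below. It assumes a kernel and <linux/fs.h>
recent enough to define FS_IOC_GETFSLABEL and FSLABEL_MAX, and a file
system that implements the ioctl. The helper name get_fs_label is
hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>     /* FS_IOC_GETFSLABEL, FSLABEL_MAX */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: copy the label of the file system mounted at
 * `path` into `out`.  Returns 0 on success, -1 on error (errno set). */
static int get_fs_label(const char *path, char *out, size_t outlen)
{
    char label[FSLABEL_MAX] = "";
    int fd, rc, saved;

    if (outlen == 0) {
        errno = EINVAL;
        return -1;
    }
    /* Any file or directory on the mounted file system will do. */
    fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;

    rc = ioctl(fd, FS_IOC_GETFSLABEL, label);
    saved = errno;            /* close() may overwrite errno */
    close(fd);
    if (rc < 0) {
        errno = saved;        /* e.g. ENOTTY if the fs lacks the ioctl */
        return -1;
    }
    label[FSLABEL_MAX - 1] = '\0';
    snprintf(out, outlen, "%s", label);
    return 0;
}
```

The value comes from the kernel's in-memory state for the mounted file
system rather than from a raw read of the block device, which is why
it does not race with superblock updates.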

Note: this does require figuring out (a) whether the file system is
mounted, and (b) if so, where it is mounted. So if blkid wants to use
this, it would need to have something like the function
ext2fs_check_mount_point[1].
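
A much simpler lookup than ext2fs_check_mount_point can be sketched
with getmntent(3) over a mount table such as /proc/self/mounts. This
is only an illustration of the idea, not a substitute for that
function. The helper name find_mount_point is hypothetical, and the
sketch assumes that matching by device name, or by device number when
both paths are block devices, is good enough.

```c
#define _GNU_SOURCE
#include <mntent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper.  Returns 1 and fills `mnt` if `device` appears
 * in the mount table `table`, 0 if it does not, and -1 if the table
 * cannot be opened. */
static int find_mount_point(const char *table, const char *device,
                            char *mnt, size_t mntlen)
{
    struct stat dev_st, ent_st;
    struct mntent *ent;
    int found = 0;
    /* Remember the device number so aliases of the same device match. */
    int have_dev = stat(device, &dev_st) == 0 && S_ISBLK(dev_st.st_mode);
    FILE *f = setmntent(table, "r");

    if (!f)
        return -1;
    while ((ent = getmntent(f)) != NULL) {
        int same = strcmp(ent->mnt_fsname, device) == 0;

        if (!same && have_dev &&
            stat(ent->mnt_fsname, &ent_st) == 0 && S_ISBLK(ent_st.st_mode))
            same = ent_st.st_rdev == dev_st.st_rdev;
        if (same) {
            snprintf(mnt, mntlen, "%s", ent->mnt_dir);
            found = 1;
            break;
        }
    }
    endmntent(f);
    return found;
}
```

A caller would pass "/proc/self/mounts" as the table and, on a match,
open the returned directory to issue the ioctl.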

Cheers,

- Ted

[1] https://github.com/tytso/e2fsprogs/blob/950a0d69c82b585aba30118f01bf80151deffe8c/lib/ext2fs/ismounted.c#L363