From: Krister Johansen <kjlx@templeofstupid.com>
To: Lennart Poettering <lennart@poettering.net>
Cc: util-linux@vger.kernel.org, Karel Zak <kzak@redhat.com>,
	systemd-devel@lists.freedesktop.org,
	David Reaver <me@davidreaver.com>, Theodore Ts'o <tytso@mit.edu>
Subject: Re: [systemd-devel] [PATCH] libblkid: fix spurious ext superblock checksum mismatches
Date: Mon, 18 Nov 2024 15:13:52 -0800
Message-ID: <20241118231352.GC1885@templeofstupid.com>
In-Reply-To: <ZzvBgOP_skwId4ci@gardel-login>

On Mon, Nov 18, 2024 at 11:36:48PM +0100, Lennart Poettering wrote:
> On Mo, 18.11.24 12:35, Krister Johansen (kjlx@templeofstupid.com) wrote:
> 
> > Reads of ext superblocks can race with updates.  If libblkid observes a
> > checksum mismatch, re-read the superblock with O_DIRECT in order to get
> > a consistent view of its contents.  Only if the O_DIRECT read fails the
> > checksum should it be reported to have failed.
> >
> > This fixes a problem where devices that were named by filesystem label
> > failed to be found when systemd attempted to mount them on boot.  The
> > problem was caused by systemd-udevd using libblkid. If a read of a
> > superblock resulted in a checksum mismatch, udev will remove the
> > by-label links which result in the mount call failing to find the
> > device.  The checksum mismatch that was triggering the problem was
> > spurious, and when we use O_DIRECT, or even perform a subsequent retry,
> > the superblock is correctly read.  This resulted in a failure to mount
> > /boot in one out of every 2,000 or so attempts in our environment.
> >
> > e2fsprogs fixed[1] an identical version of this bug that afflicted
> > resize2fs during online grow operations when run from cloud-init.  The
> > fix there was also to use O_DIRECT in order to read the superblock.
> > This patch uses a similar approach: read the superblock with O_DIRECT in
> > the case where a bad checksum is detected.
> 
> Umpf. udev has a clearly defined protocol to comprehensively avoid
> such issues:
> 
> https://systemd.io/BLOCK_DEVICE_LOCKING
> 
> Partitioning tools should simply follow this logic, and udev and
> programs downstream from it will not even be tempted to operate with
> half-written superblocks, partition tables or such.
> 
> Hence, I personally am not convinced of that O_DIRECT approach. First
> of all, it only works on superblocks that have a useful checksum
> covering enough relevant data, and it can never really catch scenarios
> where a disk is comprehensively repartitioned, i.e. one or more fs and
> partition metadata changed at the same time...

I may have done a poor job of explaining this.  This is ext4 writing its
own superblock from the kernel, while reads see a potentially
inconsistent view of that write.  O_DIRECT causes the read to serialize
with the locks ext4 holds when it writes the superblock, which prevents
the read from observing a partial update.
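To make the retry concrete, here is a minimal C sketch of the approach the patch describes; it is not the libblkid patch itself. It assumes the ext4 on-disk layout (superblock at byte 1024, `s_checksum` as its last 4 bytes, CRC32C seeded with ~0 and stored little-endian without a final inversion), skips the magic and metadata_csum feature checks a real prober would do, and uses an assumed 4096-byte alignment for the O_DIRECT buffer so the read is aligned on both 512-byte and 4Kn devices. The helper names (`crc32c_raw`, `ext_sb_csum_ok`, `read_ext_sb`) are hypothetical.

```c
#define _GNU_SOURCE /* for O_DIRECT */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define EXT_SB_OFFSET   1024  /* superblock starts 1 KiB into the device */
#define EXT_SB_SIZE     1024
#define EXT_SB_CSUM_OFF 0x3FC /* offset of s_checksum within the superblock */
#define DIO_ALIGN       4096  /* assumption: covers 512-byte and 4Kn sectors */

/* Bitwise CRC32C (Castagnoli, reflected poly 0x82F63B78), no final XOR. */
uint32_t crc32c_raw(uint32_t crc, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1u));
    }
    return crc;
}

/* Returns 1 if the stored little-endian s_checksum matches the data. */
int ext_sb_csum_ok(const unsigned char *sb)
{
    assert(sb != NULL);
    uint32_t stored = (uint32_t)sb[EXT_SB_CSUM_OFF]
                    | (uint32_t)sb[EXT_SB_CSUM_OFF + 1] << 8
                    | (uint32_t)sb[EXT_SB_CSUM_OFF + 2] << 16
                    | (uint32_t)sb[EXT_SB_CSUM_OFF + 3] << 24;
    return crc32c_raw(~0u, sb, EXT_SB_CSUM_OFF) == stored;
}

/* Read the superblock through the page cache; on a checksum mismatch,
 * re-read it with O_DIRECT before declaring it bad.
 * Returns 0 on success (sb filled), -1 on I/O error or persistent mismatch. */
int read_ext_sb(const char *dev, unsigned char sb[EXT_SB_SIZE])
{
    int fd = open(dev, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    ssize_t n = pread(fd, sb, EXT_SB_SIZE, EXT_SB_OFFSET);
    close(fd);
    if (n != EXT_SB_SIZE)
        return -1;
    if (ext_sb_csum_ok(sb))
        return 0;

    /* Possibly a torn read: retry bypassing the page cache, which (per this
     * thread) serializes with ext4's superblock update. O_DIRECT needs an
     * aligned buffer, offset and length, so read the first 4 KiB. */
    void *buf = NULL;
    if (posix_memalign(&buf, DIO_ALIGN, DIO_ALIGN) != 0)
        return -1;
    int ret = -1;
    fd = open(dev, O_RDONLY | O_CLOEXEC | O_DIRECT);
    if (fd >= 0) {
        if (pread(fd, buf, DIO_ALIGN, 0) == DIO_ALIGN) {
            memcpy(sb, (unsigned char *)buf + EXT_SB_OFFSET, EXT_SB_SIZE);
            ret = ext_sb_csum_ok(sb) ? 0 : -1;
        }
        close(fd);
    }
    free(buf);
    return ret;
}
```

Only a mismatch on the O_DIRECT copy is reported as a bad checksum; if the O_DIRECT open or read fails, the sketch reports failure rather than trusting the buffered copy.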

It's not necessarily the partitioning tools causing this, but any
filesystem-level update that modifies the contents of the superblock.

-K

Thread overview: 11+ messages
2024-11-18 20:35 [PATCH] libblkid: fix spurious ext superblock checksum mismatches Krister Johansen
2024-11-18 22:36 ` [systemd-devel] " Lennart Poettering
2024-11-18 23:13   ` Krister Johansen [this message]
2024-11-19  8:19     ` [EXT] " Windl, Ulrich
2024-11-19  8:15 ` [EXT] " Windl, Ulrich
2024-11-19 17:49   ` Theodore Ts'o
2024-11-19 23:59     ` Krister Johansen
2024-11-20  6:07       ` Theodore Ts'o
2024-11-21 10:44     ` Karel Zak
2024-11-21 15:55       ` Theodore Ts'o
2024-11-22  8:54       ` Krister Johansen
