Re: uuid ioctl - was: Re: [PATCH] overlayfs: Trigger file re-evaluation by IMA / EVM after writes

linux-fsdevel.vger.kernel.org archive mirror

From: "Darrick J. Wong" <djwong@kernel.org>
To: Christian Brauner <brauner@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>, Theodore Ts'o <tytso@mit.edu>,
	Amir Goldstein <amir73il@gmail.com>,
	Jeff Layton <jlayton@kernel.org>,
	miklos@szeredi.hu, linux-fsdevel@vger.kernel.org,
	linux-xfs@vger.kernel.org
Subject: Re: uuid ioctl - was: Re: [PATCH] overlayfs: Trigger file re-evaluation by IMA / EVM after writes
Date: Fri, 2 Jun 2023 07:23:29 -0700
Message-ID: <20230602142329.GC16848@frogsfrogsfrogs>
In-Reply-To: <20230602-dividende-model-62b2bdc073cf@brauner>

On Fri, Jun 02, 2023 at 03:52:16PM +0200, Christian Brauner wrote:
> On Fri, Jun 02, 2023 at 04:34:58PM +1000, Dave Chinner wrote:
> > On Fri, Jun 02, 2023 at 12:27:14AM -0400, Theodore Ts'o wrote:
> > > On Thu, Jun 01, 2023 at 06:23:35PM -0700, Darrick J. Wong wrote:
> > > > Someone ought to cc Ted since I asked him about this topic this morning
> > > > and he said he hadn't noticed it going by...
> > > > 
> > > > > > > In addition the uuid should be set when the filesystem is mounted.
> > > > > > > Unless the filesystem implements a dedicated ioctl() - like ext4 - to
> > > > > > > change the uuid.
> > > > > > 
> > > > > > IMO, that ext4 functionality is a landmine waiting to be stepped on.
> > > > > > 
> > > > > > > We should not be changing the sb->s_uuid of filesystems dynamically.
> > > > > 
> > > > > Yeah, I kinda agree. If it works for ext4 and it's an ext4 specific
> > > > > ioctl then this is fine though.
> > > > 
> > > > Now that Dave's brought up all kinds of questions about other parts of
> > > > the kernel using s_uuid for things, I'm starting to think that even ext4
> > > > shouldn't be changing its own uuid on the fly.
> > > 
> > > So let's set some context here.  The tune2fs program in e2fsprogs has
> > > supported changing the UUID for a *very* long time.  Specifically,
> > > since September 7, 1996 (e2fsprogs version 1.05, when we first added
> > > the UUID field in the ext2 superblock).
> > 
> > Yup, and XFS has supported offline changing of the UUID a couple of
> > years before that.
> > 
> > > This feature was added from
> > > the very beginning since in Large Installation System Administration
> > > (LISA) systems, a very common thing to do is to image boot disks from
> > > a "golden master", and then afterwards, you want to make sure the file
> > > systems on each boot disk have a unique UUID; and this is done via
> > > "tune2fs -U random /dev/sdXX".  Since I was working at MIT Project
> > > Athena at the time, we regularly did this when installing Athena
> > > client workstations, and when I added UUID support to ext2, I made
> > > sure this feature was well-supported.
> > 
> > See xfs_copy(8). This was a tool originally written, IIRC, in early
> > 1995 for physically cloning sparse golden images in the SGI factory
> > production line. It was multi-threaded and could write up to 16 scsi
> > disks at once with a single ascending LBA order pass. The last thing
> > it does is change the UUID of each clone to make them unique.
> > 
> > There's nothing new here - this is all 30 years ago, and we've had
> > tools changing filesystems UUIDs for all this time.
> > 
> > > The tune2fs program allows the UUID to be changed while the file system
> > > is mounted (with some caveats), which it did by directly modifying the
> > > on-disk superblock.  Obviously, when it did that, it wouldn't change
> > > sb->s_uuid "dynamically", although the next time the file system was
> > > mounted, sb->s_uuid would get the new UUID.
> > 
> > Yes, which means for userspace and most of the kernel it's no
> > different to "unmount, change UUID, mount". It's effectively an
> > offline change, even if the on-disk superblock is changed while the
> > filesystem is mounted.
> > 
> > > If overlayfs and IMA are
> > > expecting that a file system's UUID would stay constant and persistent
> > > --- well, that's not true, and it has always been that way, since
> > > there are tools that make it trivially easy for a system administrator
> > > to adjust the UUID.
> > 
> > Yes, but that's not the point I've been making. My point is that the
> > *online change of sb->s_uuid* that was being proposed for the
> > XFS/generic variant of the ext4 online UUID change ioctl is
> > completely new, and that's where all the problems start....
> > 
> > > In addition to the LISA context, this feature is also commonly used in
> > > various cloud deployments, since when you create a new VM, it
> > > typically gets a new root file system, which is copied from a fixed,
> > > read-only image.  So on a particular hyperscale cloud system, if we
> > > didn't do anything special, there could be hundreds of thousands of VMs
> > > whose root file system would all have the same UUID, which would mean
> > > that the UUID... isn't terribly unique.
> > 
> > Again, nothing new here - we've been using snapshots/clones/reflinks
> > for efficient VM storage provisioning for well over 15 years now.
> > 
> > .....
> > 
> > > This is the reason why we added the ext4 ioctl; it was intended for
> > > the express use of "tune2fs -U", and like tune2fs -U, it doesn't
> > > actually change sb->s_uuid; it only changes the on-disk superblock's
> > > UUID.  This was mostly because we forgot about sb->s_uuid, to be
> > > honest, but it means that regardless of whether "tune2fs -U" directly
> > > modifies the block device, or uses the ext4 ioctl, the behaviour with
> > > respect to sb->s_uuid is the same; it's not modified when the on-disk
> > > uuid is changed.

...which means that anyone writing out non-ext4 ondisk metadata will now
be doing it with a stale fsuuid.  Er... that might just be an ext*
quirk that everyone will have to live with.

> > IOWs, not only was the ext4 functionality poorly thought out, it
> > was *poorly implemented*.
> > 
> > So, let's take a step back here - we've done the use case thing to
> > death now - and consider what is it we actually need here?
> > 
> > All we need for the hyperscale/VM provisioning use case is for the
> > UUID to be changed at first boot/mount time before anything else
> > happens.
> > 
> > So why do we need userspace to be involved in that? Indeed,
> > all the problems stem from needing to have userspace change the
> > UUID.
> > 
> > There's an obvious solution: a newly provisioned filesystem needs to
> > change the uuid at first mount. The only issue is the
> > kernel/filesystem doesn't know when the first mount is.
> > 
> > Darrick suggested "mount -o setuuid=xxxx" on #xfs earlier, but that
> > requires changing userspace init stuff and, well, I hate single use
> > case mount options like this.
> > 
> > However, we have a golden image that every client image is cloned
> > from. Say we set a special feature bit in that golden image that
> > means "need UUID regeneration". Then on the first mount of the
> > cloned image after provisioning, the filesystem sees the bit and
> > automatically regenerates the UUID without needing any help from
> > userspace at all.
> > 
> > Problem solved, yes? We don't need userspace to change the uuid on
> > first boot of the newly provisioned VM - the filesystem just makes
> > it happen.
> 
> systemd-repart implements the following logic currently: If the GPT
> *partition* and *disk* UUIDs are 0 then it will generate new UUIDs
> before the first mount.
> 
> So for the *filesystem* UUID I think the golden image should either have
> the UUID set to zero as well or to a special UUID. Either way, it would
> mean the filesystem needs to generate a new UUID when it is mounted the
> first time.
> 
> If we do this then all filesystems that support this should use the same
> value to indicate "generate new UUID".

Curiously, I noticed that blkid doesn't report the xfs uuid if it's all
zeroes:

# mkfs.xfs -f /dev/loop0 -m uuid=00000000-0000-0000-0000-000000000000

# blkid /dev/loop0
/dev/loop0: BLOCK_SIZE="512" TYPE="xfs"

Nor does udev create symlinks:

# ls /dev/disk/by-uuid/0*
ls: cannot access '/dev/disk/by-uuid/0*': No such file or directory

Nor does mounting by uuid work:

# mount UUID=00000000-0000-0000-0000-000000000000 /tmp/x
mount: /tmp/x: can't find UUID=00000000-0000-0000-0000-000000000000.

So I wonder if xfs even really needs a new superblock bit at all --
mounting via uuid doesn't work in the zeroed-uuid case, and the kernel
could indeed generate a new one at mount time before it populates
s_uuid, etc.  Then the initscripts can re-run blkid (or xfs_info) to
extract the new uuid and update config files as needed.
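
As a rough illustration of that idea, here is a minimal userspace sketch in
Python, not kernel code.  It assumes the policy floated above: an all-zeroes
on-disk uuid means "generate a fresh one at mount time, before s_uuid is
populated".  The names NIL_UUID and uuid_at_mount are hypothetical and exist
only for this example.

```python
import uuid
from typing import Tuple

# The all-zeroes uuid, 00000000-0000-0000-0000-000000000000.
NIL_UUID = uuid.UUID(int=0)


def uuid_at_mount(on_disk: uuid.UUID) -> Tuple[uuid.UUID, bool]:
    """Pick the uuid to expose for a mount.

    Returns (uuid_to_use, needs_writeback).  needs_writeback is True when
    a new uuid was generated and would have to be written to the on-disk
    superblock.  Assumed policy for this sketch: nil means "regenerate".
    """
    if on_disk == NIL_UUID:
        # Golden-image case: hand out a random (version 4) uuid.
        return uuid.uuid4(), True
    # Any other uuid is left alone.
    return on_disk, False


if __name__ == "__main__":
    golden = uuid.UUID("00000000-0000-0000-0000-000000000000")
    new_uuid, dirty = uuid_at_mount(golden)
    print(new_uuid, dirty)
```

A filesystem that already carries a non-zero uuid comes back unchanged with
needs_writeback set to False, so only the zeroed golden-image case pays for a
superblock update.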

Though, the first-mount uuid would still break anything recorded in the
non-xfs metadata by the image creating system (such as evm attributes).
But at least that's on the image creator people to know that.

> > 
> > If the "first run" init scripts are set up to run blkid to grab the
> > new uuid after mount and update whatever needs to be updated with
> > the new root filesystem UUID, then we've moved the entire problem
> > out of the VM boot path and back into the provisioning system where
> > it should be.
> > 
> > And then we don't need an ioctl to change UUIDs online, nor do we
> 
> It also doesn't really help that much. What userspace would need is a
> way to regenerate the filesystem UUID before the filesystem is mounted.
> It doesn't help that much if you have to mount it first to change it...

<shrug> Well it's the rootfs where we want to change the uuid at
first-run time, and all the config info that needs updating is inside
the rootfs anyway.  If someone needs mount-by-uuid for the rootfs during
the first run or they require a specific uuid, they can still run
xfs_admin from within the initramfs.

--D
