From: "Darrick J. Wong" <djwong@kernel.org>
To: Christian Brauner <brauner@kernel.org>, Theodore Ts'o <tytso@mit.edu>
Cc: Dave Chinner <david@fromorbit.com>,
Amir Goldstein <amir73il@gmail.com>,
Jeff Layton <jlayton@kernel.org>,
miklos@szeredi.hu, linux-fsdevel@vger.kernel.org,
linux-xfs@vger.kernel.org
Subject: Re: uuid ioctl - was: Re: [PATCH] overlayfs: Trigger file re-evaluation by IMA / EVM after writes
Date: Thu, 1 Jun 2023 18:23:35 -0700
Message-ID: <20230602012335.GB16848@frogsfrogsfrogs>
In-Reply-To: <20230522-unsensibel-backblech-7be4e920ba87@brauner>

Someone ought to cc Ted since I asked him about this topic this morning
and he said he hadn't noticed it going by...

On Mon, May 22, 2023 at 12:50:19PM +0200, Christian Brauner wrote:
> On Mon, May 22, 2023 at 08:49:50AM +1000, Dave Chinner wrote:
> > On Sat, May 20, 2023 at 11:17:35AM +0200, Christian Brauner wrote:
> > > On Fri, May 19, 2023 at 03:42:38PM -0400, Mimi Zohar wrote:
> > > > On Fri, 2023-04-07 at 10:31 +0200, Christian Brauner wrote:
> > > > > So, I think we want both; we want the ovl_copyattr() and the
> > > > > vfs_getattr_nosec() change:
> > > > >
> > > > > (1) overlayfs should copy up the inode version in ovl_copyattr(). That
> > > > > > is in line with what we do with all other inode attributes. IOW, the
> > > > > overlayfs inode's i_version counter should aim to mirror the
> > > > > relevant layer's i_version counter. I wouldn't know why that
> > > > > shouldn't be the case. Asking the other way around there doesn't
> > > > > seem to be any use for overlayfs inodes to have an i_version that
> > > > > isn't just mirroring the relevant layer's i_version.
> > > > > (2) Jeff's changes for ima to make it rely on vfs_getattr_nosec().
> > > > > Currently, ima assumes that it will get the correct i_version from
> > > > > > an inode but that just doesn't hold for stacking filesystems.
> > > > >
> > > > > While (1) would likely just fix the immediate bug (2) is correct and
> > > > > _robust_. If we change how attributes are handled vfs_*() helpers will
> > > > > get updated and ima with it. Poking at raw inodes without using
> > > > > appropriate helpers is much more likely to get ima into trouble.
> > > >
> > > > In addition to properly setting the i_version for IMA, EVM has a
> > > > similar issue with i_generation and s_uuid. Adding them to
> > > > ovl_copyattr() seems to resolve it. Does that make sense?
> > > >
> > > > diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> > > > index 923d66d131c1..cd0aeb828868 100644
> > > > --- a/fs/overlayfs/util.c
> > > > +++ b/fs/overlayfs/util.c
> > > > @@ -1118,5 +1118,8 @@ void ovl_copyattr(struct inode *inode)
> > > >  	inode->i_atime = realinode->i_atime;
> > > >  	inode->i_mtime = realinode->i_mtime;
> > > >  	inode->i_ctime = realinode->i_ctime;
> > > > +	inode->i_generation = realinode->i_generation;
> > > > +	if (inode->i_sb)
> > > > +		uuid_copy(&inode->i_sb->s_uuid, &realinode->i_sb-
> > >
> > > Overlayfs can consist of multiple lower layers and each of those lower
> > > layers may have a different uuid. So every time you trigger an
> > > ovl_copyattr() on a different layer this patch would alter the uuid of
> > > the overlayfs superblock.
> > >
> > > In addition the uuid should be set when the filesystem is mounted.
> > > Unless the filesystem implements a dedicated ioctl() - like ext4 - to
> > > change the uuid.
> >
> > IMO, that ext4 functionality is a landmine waiting to be stepped on.
> >
> > We should not be changing the sb->s_uuid of filesystems dynamically.
>
> Yeah, I kinda agree. If it works for ext4 and it's an ext4 specific
> ioctl then this is fine though.

Now that Dave's brought up all kinds of questions about other parts of
the kernel using s_uuid for things, I'm starting to think that even ext4
shouldn't be changing its own uuid on the fly.

Unless, of course, someone writes a way to record when s_uuid has been
accessed by something (e.g. pnfs, ima, etc) to lock out changes to the
value...
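A rough userspace sketch of that "record the access, then lock out
changes" idea follows. Everything in it is hypothetical (struct sb_uuid,
sb_uuid_get(), sb_uuid_set() are made-up names, not kernel interfaces),
and it leaves out the locking a real version would need to keep a reader
and a writer from racing:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define UUID_LEN 16

/* Hypothetical stand-in for the part of a superblock that holds the uuid. */
struct sb_uuid {
	unsigned char uuid[UUID_LEN];
	bool observed;	/* set once any consumer has read the uuid */
};

/* Consumers (think pnfs, ima) read through this helper, which pins the value. */
static inline void sb_uuid_get(struct sb_uuid *s, unsigned char out[UUID_LEN])
{
	assert(s && out);
	s->observed = true;
	memcpy(out, s->uuid, UUID_LEN);
}

/* Changing the uuid is refused once somebody has seen the old one. */
static inline int sb_uuid_set(struct sb_uuid *s, const unsigned char in[UUID_LEN])
{
	assert(s && in);
	if (s->observed)
		return -EBUSY;
	memcpy(s->uuid, in, UUID_LEN);
	return 0;
}
```

With this shape, a uuid change right after mkfs/mount still works, but
the first reader turns every later sb_uuid_set() into -EBUSY.
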
> Thanks for bringing this up. I had some thoughts on this (mostly at the
> end of this mail) but haven't had the time to express them.
>
> > The VFS does not guarantee in any way that it is safe to change the
> > sb->s_uuid (i.e. no locking, no change notifications, no udev
> > events, etc). Various subsystems - both in the kernel and in
> > userspace - use the sb->s_uuid as a canonical and/or persistent
> > filesystem/device identifier and are unprepared to have it change
> > while the filesystem is mounted and active.

...just like Dave just said.  Heh. :(

--D

> Yes, it is not a VFS concept for sure.
>
> >
> > I commented on this from an XFS perspective here when it was
> > proposed to copy this ext4 mis-feature in XFS:
> >
> > https://lore.kernel.org/linux-xfs/20230314062847.GQ360264@dread.disaster.area/
>
> So I read the thread back then and I agree with you specifically about:
>
> * changing uuid dynamically isn't a well-defined concept
> * hoisting the ext4 specific ioctl that allows changing the uuid
> dynamically into a generic vfs ioctl is premature and gives the
> impression that this is a well-defined concept when it isn't.
> * the chosen data structure with a flexible array member would probably
> work but is suboptimal
>
> >
> > Further to this, I also suspect that changing uuids online will
> > cause issues with userspace caching of fs uuids (e.g. libblkid and
> > anything that uses it) and information that uses uuids to identify
> > the filesystem that are set up at mount time (/dev/disk/by-uuid/
> > links, etc) by kernel events sent to userspace helpers...
>
> Yeah, that's a valid concern as it's common practice to put uuids into
> /etc/fstab so if they were allowed to change while the filesystem is
> mounted/superblock is active the minimum thing needed is for userspace
> to get a uevent so the /dev/disk/by-uuid/$uuid symlink can be updated by
> udev. But I digress.
>
> >
> > IMO, we shouldn't even be considering dynamic sb->s_uuid changes
> > without first working through the full system impacts of having
> > persistent userspace-visible filesystem identifiers change
> > dynamically...
>
> Yes.
>
> ---
>
> The thing that I think we could do is have all filesystems that can
> reasonably support it set a uuid. We currently don't do that. If we
> would start doing that then all filesystems that currently don't
> implement a separate f_fsid based on e.g., the disk's device number can
> just generate the f_fsid based on the uuid. This will make all these
> filesystems available to be used with fanotify - which requires f_fsid
> to be set for its most useful features.
>
> This is often the most useful for filesystems such as tmpfs which gained
> support for uuids quite recently. For such pseudo filesystems the
> lifetime of the uuid would be the lifetime of the superblock in contrast
> to filesystems like xfs that persist the uuid to disk. IOW, if you
> mount -t tmpfs tmpfs /mnt; umount /mnt; mount -t tmpfs tmpfs /mnt then
> you get a new uuid but the uuid stays fixed for the lifetime of the
> superblock and can't be changed.
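To make the "generate the f_fsid based on the uuid" step concrete, one
possible approach is to fold the two 64-bit halves of the uuid together
and split the result into two 32-bit words. This is only a userspace
sketch: struct fsid_sketch, load_le64() and fsid_from_uuid() are made-up
names, and reading the halves as little-endian is an assumption, not
something the proposal specifies:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two 32-bit words of a statfs f_fsid. */
struct fsid_sketch {
	uint32_t val[2];
};

/* Read 8 bytes as a little-endian 64-bit value, independent of host order. */
static inline uint64_t load_le64(const unsigned char *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Fold a 16-byte uuid into an fsid by XORing its two 64-bit halves. */
static inline struct fsid_sketch fsid_from_uuid(const unsigned char uuid[16])
{
	uint64_t folded;
	struct fsid_sketch id;

	assert(uuid);
	folded = load_le64(uuid) ^ load_le64(uuid + 8);
	id.val[0] = (uint32_t)folded;
	id.val[1] = (uint32_t)(folded >> 32);
	return id;
}
```

Because the fold is a pure function of the uuid, the fsid stays stable
for exactly as long as the uuid does, which is the superblock lifetime
in the tmpfs case described above.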
>
> So the patchset that you objected to had one part that made sense to me
> which was to hoist the ioctl that _gets_ the uuid from a filesystem
> into a generic ioctl. But I agree that the structure wasn't chosen
> nicely. I would prefer if this was a fixed size but extensible structure
> which is a concept we've had for a long time. So say we were to choose
> the following structure layout for the generic ioctl:
>
> struct fsuuid {
> 	__u32 fsu_len;
> 	__u32 fsu_flags;
> 	__u8 fsu_uuid[16]; // 8 * 16 = 128 = 64 * 2
> };
>
> then this would be compatible with ext4. It would also be extensible if
> we wanted to add additional fields in the future or switch to a new uuid
> format or whatever.
>
> A while back we did work for extensible structs in system calls but these
> extensible structs also work with ioctls.
>
> For example, see what we did for kernel/seccomp.c:
>
> /* Extensible Argument ioctls */
> #define EA_IOCTL(cmd)	((cmd) & ~(IOC_INOUT | IOCSIZE_MASK))
>
> 	switch (EA_IOCTL(cmd)) {
> 	case EA_IOCTL(SECCOMP_IOCTL_NOTIF_ADDFD):
> 		return seccomp_notify_addfd(filter, buf, _IOC_SIZE(cmd));
> 	default:
> 		return -EINVAL;
> 	}
>
> and then
>
> static long seccomp_notify_addfd(struct seccomp_filter *filter,
> 				 struct seccomp_notif_addfd __user *uaddfd,
> 				 unsigned int size)
> {
> 	[...]
>
> 	BUILD_BUG_ON(sizeof(addfd) < SECCOMP_NOTIFY_ADDFD_SIZE_VER0);
> 	BUILD_BUG_ON(sizeof(addfd) != SECCOMP_NOTIFY_ADDFD_SIZE_LATEST);
>
> 	if (size < SECCOMP_NOTIFY_ADDFD_SIZE_VER0 || size >= PAGE_SIZE)
> 		return -EINVAL;
>
> 	ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
> 	if (ret)
> 		return ret;
>
> 	[...]
> }
>
> So the struct is versioned by size the same as for system calls. The
> difference for the ioctl is that the size is already encoded in the
> ioctl when it is defined. So even with a fixed size struct it is
> trivially possible to extend the struct later as long as the extension
> is 64bit aligned.
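
The size-versioning rule in the quoted text can be modelled in plain
userspace C. The helper below is a sketch of the semantics only, not the
kernel's copy_struct_from_user(): copy_struct_sketch() and the
fsuuid_v0/fsuuid_v1 layouts are hypothetical, with v1 standing in for a
future struct that appends one 64-bit-aligned field:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Userspace model of size-versioned struct copying:
 *  - caller's struct smaller than ours: copy it, zero-fill the tail
 *  - caller's struct larger than ours: accept only if the extra bytes are 0
 */
static inline int copy_struct_sketch(void *dst, size_t ksize,
				     const void *src, size_t usize)
{
	size_t size = usize < ksize ? usize : ksize;
	const unsigned char *p = src;

	assert(dst && src);
	for (size_t i = ksize; i < usize; i++)
		if (p[i])
			return -E2BIG;	/* newer fields we don't understand */
	memset(dst, 0, ksize);
	memcpy(dst, src, size);
	return 0;
}

/* Hypothetical v0 layout (24 bytes) and a v1 that appends one field. */
struct fsuuid_v0 {
	uint32_t len;
	uint32_t flags;
	uint8_t uuid[16];
};

struct fsuuid_v1 {
	uint32_t len;
	uint32_t flags;
	uint8_t uuid[16];
	uint64_t extra;	/* made-up extension field */
};
```

An old caller passing fsuuid_v0 to code built against fsuuid_v1 gets
extra == 0, and a new caller passing fsuuid_v1 to old code succeeds as
long as it leaves extra zeroed; otherwise it sees -E2BIG.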