* Ensuring that mount(8) will always interpret a filesystem correctly
@ 2024-12-08  1:45 Demi Marie Obenour
  2024-12-09 10:26 ` Karel Zak
  0 siblings, 1 reply; 8+ messages in thread
From: Demi Marie Obenour @ 2024-12-08  1:45 UTC
To: util-linux

Is there a guarantee that if all data before the filesystem superblock is
zero, and that the filesystem never writes to this region, libblkid (and
thus, presumably, mount(8)) will always mount the filesystem with the
correct filesystem type, even if e.g. someone writes a file containing
a superblock of a different filesystem and the filesystem happens to put
it where that superblock is valid?

The motivation for this message is that systemd-gpt-generator generates
mountpoints based on Discoverable Partition Specification GUIDs.  These
indicate the mountpoint of the partition but not the filesystem type.
If a correctly-produced filesystem image will always continue to be
recognized as the correct type, this is fine.  Otherwise, an unlucky
combination of writes to the filesystem and filesystem allocation decisions
could cause the filesystem to start being mounted as the wrong type, which
would be very bad.  According to https://github.com/util-linux/util-linux/issues/1305,
libblkid can indeed probe for subsequent superblocks after the first one it
finds.
--
Sincerely,
Demi Marie Obenour (she/her/hers)

* Re: Ensuring that mount(8) will always interpret a filesystem correctly
  2024-12-08  1:45 Ensuring that mount(8) will always interpret a filesystem correctly Demi Marie Obenour
@ 2024-12-09 10:26 ` Karel Zak
  2024-12-10  5:11   ` Demi Marie Obenour
  0 siblings, 1 reply; 8+ messages in thread
From: Karel Zak @ 2024-12-09 10:26 UTC
To: Demi Marie Obenour; +Cc: util-linux

Hi Demi,

On Sat, Dec 07, 2024 at 08:45:32PM GMT, Demi Marie Obenour wrote:
> Is there a guarantee that if all data before the filesystem superblock is
> zero, and that the filesystem never writes to this region, libblkid (and
> thus, presumably, mount(8)) will always mount the filesystem with the
> correct filesystem type, even if e.g. someone writes a file containing
> a superblock of a different filesystem and the filesystem happens to put
> it where that superblock is valid?

The libblkid library offers multiple modes, with "safe mode" being the
default for detecting filesystems. In this mode, the library checks
for any additional valid superblocks on the device. There are
exceptions for certain filesystems on CD/DVD media (such as udf and
iso), but for regular filesystems, sharing the same device is not
allowed.

There is also an option to specify that a superblock is only valid if
no other area is using it (using blkid_probe_set_wiper() and
blkid_probe_use_wiper()). However, this is only used for LVM and
bcache.

The library does not require that there are zeros before the
superblock, as not all mkfs-like programs zero out all areas.

In recent years, there have been no reports of collisions. In the
entire history of the library, the only collisions I can recall are
with swap areas and luks, and occasionally with poorly detected FAT
filesystems (due to the messy design of FAT).

> The motivation for this message is that systemd-gpt-generator generates
> mountpoints based on Discoverable Partition Specification GUIDs.  These
> indicate the mountpoint of the partition but not the filesystem type.

Filesystem auto-detection is a common feature. The situation is
similar to having an "auto" fstype in fstab. The systemd-gpt-generator
simply identifies the partition as "/usr" (or any other mountpoint)
and the rest is the usual scenario.

> If a correctly-produced filesystem image will always continue to be
> recognized as the correct type, this is fine.  Otherwise, an unlucky
> combination of writes to the filesystem and filesystem allocation decisions
> could cause the filesystem to start being mounted as the wrong type, which
> would be very bad.  According to https://github.com/util-linux/util-linux/issues/1305,
> libblkid can indeed probe for subsequent superblocks after the first one it
> finds.

I believe the situation would be the same even without the
Discoverable Partition Specification. The kernel always divides the
whole disk into partitions, and libblkid/mount utilizes these
partitions. Therefore, the filesystems are automatically separated by
the partition table.

    Karel

--
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com

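To make the "safe mode" behaviour described above concrete, here is a
minimal Python sketch. It is not libblkid's implementation. It assumes a
tiny hand-written signature table (ext2/3/4 magic 0xEF53 at byte 1080,
XFS "XFSB" at byte 0, btrfs "_BHRfS_M" at byte 65600) and reports an
ambiguous result as soon as more than one signature matches. The names
SIGNATURES, AmbiguousResult, safe_probe and make_image are hypothetical,
and real probing validates far more than a magic string.

```python
import struct

# (name, byte offset of the magic, magic bytes). The offsets are the
# well-known on-disk locations; the table is deliberately tiny.
SIGNATURES = [
    ("ext2/3/4", 1024 + 56, struct.pack("<H", 0xEF53)),
    ("xfs", 0, b"XFSB"),
    ("btrfs", 65536 + 64, b"_BHRfS_M"),
]


class AmbiguousResult(Exception):
    """Raised when more than one filesystem signature matches."""


def safe_probe(image: bytes):
    """Return the single matching type, None if nothing matches,
    or raise AmbiguousResult if several signatures are present."""
    hits = [name for name, off, magic in SIGNATURES
            if image[off:off + len(magic)] == magic]
    if len(hits) > 1:
        raise AmbiguousResult(", ".join(hits))
    return hits[0] if hits else None


def make_image(size=128 * 1024, with_magics=()):
    """Build a zero-filled test image and stamp the requested magics into it."""
    buf = bytearray(size)
    for name, off, magic in SIGNATURES:
        if name in with_magics:
            buf[off:off + len(magic)] = magic
    return bytes(buf)
```

An image carrying only the XFS magic probes as "xfs"; stamping a second
magic into the same image makes safe_probe() raise instead of picking a
winner, which is the failure mode the thread is worried about.
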
* Re: Ensuring that mount(8) will always interpret a filesystem correctly
  2024-12-09 10:26 ` Karel Zak
@ 2024-12-10  5:11   ` Demi Marie Obenour
  2024-12-10 11:16     ` Karel Zak
  0 siblings, 1 reply; 8+ messages in thread
From: Demi Marie Obenour @ 2024-12-10  5:11 UTC
To: Karel Zak; +Cc: util-linux

On 12/9/24 5:26 AM, Karel Zak wrote:
>
> Hi Demi,
>
> On Sat, Dec 07, 2024 at 08:45:32PM GMT, Demi Marie Obenour wrote:
>> Is there a guarantee that if all data before the filesystem superblock is
>> zero, and that the filesystem never writes to this region, libblkid (and
>> thus, presumably, mount(8)) will always mount the filesystem with the
>> correct filesystem type, even if e.g. someone writes a file containing
>> a superblock of a different filesystem and the filesystem happens to put
>> it where that superblock is valid?
>
> The libblkid library offers multiple modes, with "safe mode" being the
> default for detecting filesystems. In this mode, the library checks
> for any additional valid superblocks on the device. There are
> exceptions for certain filesystems on CD/DVD media (such as udf and
> iso), but for regular filesystems, sharing the same device is not
> allowed.
>
> There is also an option to specify that a superblock is only valid if
> no other area is using it (using blkid_probe_set_wiper() and
> blkid_probe_use_wiper()). However, this is only used for LVM and
> bcache.
>
> The library does not require that there are zeros before the
> superblock, as not all mkfs-like programs zero out all areas.
>
> In recent years, there have been no reports of collisions. In the
> entire history of the library, the only collisions I can recall are
> with swap areas and luks, and occasionally with poorly detected FAT
> filesystems (due to the messy design of FAT).

Was https://github.com/util-linux/util-linux/issues/1305 a
collision between ZFS and ext4?

>> The motivation for this message is that systemd-gpt-generator generates
>> mountpoints based on Discoverable Partition Specification GUIDs.  These
>> indicate the mountpoint of the partition but not the filesystem type.
>
> Filesystem auto-detection is a common feature. The situation is
> similar to having an "auto" fstype in fstab. The systemd-gpt-generator
> simply identifies the partition as "/usr" (or any other mountpoint)
> and the rest is the usual scenario.
>
>> If a correctly-produced filesystem image will always continue to be
>> recognized as the correct type, this is fine.  Otherwise, an unlucky
>> combination of writes to the filesystem and filesystem allocation decisions
>> could cause the filesystem to start being mounted as the wrong type, which
>> would be very bad.  According to https://github.com/util-linux/util-linux/issues/1305,
>> libblkid can indeed probe for subsequent superblocks after the first one it
>> finds.
>
> I believe the situation would be the same even without the
> Discoverable Partition Specification. The kernel always divides the
> whole disk into partitions, and libblkid/mount utilizes these
> partitions. Therefore, the filesystems are automatically separated by
> the partition table.

/etc/fstab provides an explicit filesystem type.  The Discoverable
Partition Specification doesn't.
--
Sincerely,
Demi Marie Obenour (she/her/hers)

* Re: Ensuring that mount(8) will always interpret a filesystem correctly
  2024-12-10  5:11   ` Demi Marie Obenour
@ 2024-12-10 11:16     ` Karel Zak
  2024-12-10 23:28       ` Demi Marie Obenour
  0 siblings, 1 reply; 8+ messages in thread
From: Karel Zak @ 2024-12-10 11:16 UTC
To: Demi Marie Obenour; +Cc: util-linux

On Tue, Dec 10, 2024 at 12:11:49AM GMT, Demi Marie Obenour wrote:
> On 12/9/24 5:26 AM, Karel Zak wrote:
> >
> > Hi Demi,
> >
> > On Sat, Dec 07, 2024 at 08:45:32PM GMT, Demi Marie Obenour wrote:
> >> Is there a guarantee that if all data before the filesystem superblock is
> >> zero, and that the filesystem never writes to this region, libblkid (and
> >> thus, presumably, mount(8)) will always mount the filesystem with the
> >> correct filesystem type, even if e.g. someone writes a file containing
> >> a superblock of a different filesystem and the filesystem happens to put
> >> it where that superblock is valid?
> >
> > The libblkid library offers multiple modes, with "safe mode" being the
> > default for detecting filesystems. In this mode, the library checks
> > for any additional valid superblocks on the device. There are
> > exceptions for certain filesystems on CD/DVD media (such as udf and
> > iso), but for regular filesystems, sharing the same device is not
> > allowed.
> >
> > There is also an option to specify that a superblock is only valid if
> > no other area is using it (using blkid_probe_set_wiper() and
> > blkid_probe_use_wiper()). However, this is only used for LVM and
> > bcache.
> >
> > The library does not require that there are zeros before the
> > superblock, as not all mkfs-like programs zero out all areas.
> >
> > In recent years, there have been no reports of collisions. In the
> > entire history of the library, the only collisions I can recall are
> > with swap areas and luks, and occasionally with poorly detected FAT
> > filesystems (due to the messy design of FAT).
>
> Was https://github.com/util-linux/util-linux/issues/1305 a
> collision between ZFS and ext4?

Yes, but in this case, ZFS was incorrectly detected. As you can see
from the bug report, blkid ended with an "ambiguous result" error.

> > I believe the situation would be the same even without the
> > Discoverable Partition Specification. The kernel always divides the
> > whole disk into partitions, and libblkid/mount utilizes these
> > partitions. Therefore, the filesystems are automatically separated by
> > the partition table.
>
> /etc/fstab provides an explicit filesystem type.  The Discoverable
> Partition Specification doesn't.

You can use the "auto" file system type in fstab. It is also common
for people to not use the "-t <type>" option on the mount(8) command
line.

However, if you are paranoid, then specifying the file system type in
fstab and avoiding Discoverable Partitions is a good choice.

    Karel

--
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com

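For anyone who wants to follow the "specify the type in fstab" advice,
a small Python sketch like the one below can list the fstab entries that
still rely on auto-detection. It is only an illustration: the function
name entries_using_autodetect and the sample UUIDs are made up, and it
assumes the usual whitespace-separated fstab layout with the filesystem
type in the third field.

```python
def entries_using_autodetect(fstab_text: str):
    """Return (spec, mountpoint) pairs whose fstype field is "auto"."""
    found = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 3:
            continue  # malformed line; ignored in this sketch
        spec, mountpoint, fstype = fields[:3]
        if fstype == "auto":
            found.append((spec, mountpoint))
    return found


# Hypothetical fstab content, for illustration only.
SAMPLE = """\
# <spec>        <mountpoint>  <type>  <options>  <dump>  <pass>
UUID=1111-aaaa  /             ext4    defaults   0       1
UUID=2222-bbbb  /data         auto    defaults   0       2
"""

if __name__ == "__main__":
    for spec, mountpoint in entries_using_autodetect(SAMPLE):
        print(f"{mountpoint} ({spec}) depends on filesystem auto-detection")
```
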
* Re: Ensuring that mount(8) will always interpret a filesystem correctly
  2024-12-10 11:16     ` Karel Zak
@ 2024-12-10 23:28       ` Demi Marie Obenour
  2024-12-11 13:38         ` Theodore Ts'o
  0 siblings, 1 reply; 8+ messages in thread
From: Demi Marie Obenour @ 2024-12-10 23:28 UTC
To: Karel Zak; +Cc: util-linux

On 12/10/24 6:16 AM, Karel Zak wrote:
> On Tue, Dec 10, 2024 at 12:11:49AM GMT, Demi Marie Obenour wrote:
>> On 12/9/24 5:26 AM, Karel Zak wrote:
>>>
>>> Hi Demi,
>>>
>>> On Sat, Dec 07, 2024 at 08:45:32PM GMT, Demi Marie Obenour wrote:
>>>> Is there a guarantee that if all data before the filesystem superblock is
>>>> zero, and that the filesystem never writes to this region, libblkid (and
>>>> thus, presumably, mount(8)) will always mount the filesystem with the
>>>> correct filesystem type, even if e.g. someone writes a file containing
>>>> a superblock of a different filesystem and the filesystem happens to put
>>>> it where that superblock is valid?
>>>
>>> The libblkid library offers multiple modes, with "safe mode" being the
>>> default for detecting filesystems. In this mode, the library checks
>>> for any additional valid superblocks on the device. There are
>>> exceptions for certain filesystems on CD/DVD media (such as udf and
>>> iso), but for regular filesystems, sharing the same device is not
>>> allowed.
>>>
>>> There is also an option to specify that a superblock is only valid if
>>> no other area is using it (using blkid_probe_set_wiper() and
>>> blkid_probe_use_wiper()). However, this is only used for LVM and
>>> bcache.
>>>
>>> The library does not require that there are zeros before the
>>> superblock, as not all mkfs-like programs zero out all areas.
>>>
>>> In recent years, there have been no reports of collisions. In the
>>> entire history of the library, the only collisions I can recall are
>>> with swap areas and luks, and occasionally with poorly detected FAT
>>> filesystems (due to the messy design of FAT).
>>
>> Was https://github.com/util-linux/util-linux/issues/1305 a
>> collision between ZFS and ext4?
>
> Yes, but in this case, ZFS was incorrectly detected. As you can see
> from the bug report, blkid ended with an "ambiguous result" error.

Should blkid instead stop at the first valid superblock when probing
filesystems for mounting?

>>> I believe the situation would be the same even without the
>>> Discoverable Partition Specification. The kernel always divides the
>>> whole disk into partitions, and libblkid/mount utilizes these
>>> partitions. Therefore, the filesystems are automatically separated by
>>> the partition table.
>>
>> /etc/fstab provides an explicit filesystem type.  The Discoverable
>> Partition Specification doesn't.
>
> You can use the "auto" file system type in fstab. It is also common
> for people to not use the "-t <type>" option on the mount(8) command
> line.
>
> However, if you are paranoid, then specifying the file system type in
> fstab and avoiding Discoverable Partitions is a good choice.

Does that mean that Discoverable Partitions are a bad idea for any
filesystem that is not read-only?  Can you explain “if you are paranoid”?
--
Sincerely,
Demi Marie Obenour (she/her/hers)

* Re: Ensuring that mount(8) will always interpret a filesystem correctly
  2024-12-10 23:28       ` Demi Marie Obenour
@ 2024-12-11 13:38         ` Theodore Ts'o
  2024-12-14 22:08           ` Demi Marie Obenour
  0 siblings, 1 reply; 8+ messages in thread
From: Theodore Ts'o @ 2024-12-11 13:38 UTC
To: Demi Marie Obenour; +Cc: Karel Zak, util-linux

On Tue, Dec 10, 2024 at 06:28:28PM -0500, Demi Marie Obenour wrote:
> >> Was https://github.com/util-linux/util-linux/issues/1305 a
> >> collision between ZFS and ext4?
> >
> > Yes, but in this case, ZFS was incorrectly detected. As you can see
> > from the bug report, blkid ended with an "ambiguous result" error.

mke2fs (mkfs.ext4) does attempt to zero the typical locations where
conflicting superblocks might be found.  The ext4 metadata is located
at the beginning of the file system, except for the first 1k, which we
zero out on all platforms except for Sparc (the exact reason is
lost in the mists of time, since it pre-exists git, but as I recall
Sparc had something critical that would cause its BIOS to lose its
marbles if we zeroed it out), and we also zero out the very end of the
disk where the MD superblock is located.

It sounds like ZFS is putting its superblock someplace random that
mke2fs doesn't know about.  If someone wants to do the research
to let me know what needs to be zeroed out to zap the ZFS superblock,
please feel free to file a bug against e2fsck (or better yet, send me a
patch :-P ) and I'll be happy to add support for it.

> >> /etc/fstab provides an explicit filesystem type.  The Discoverable
> >> Partition Specification doesn't.

From what I can tell, the Discoverable Partition Table specification,
at least as defined here[1], only supports explicit file system types
supplied by the GPT partition table.

[1] https://uapi-group.org/specifications/specs/discoverable_partitions_specification/

My personal preference is this *is* the best way to do things; the
main reason why we have blkid is because of the disaster which is the
MSDOS FAT partition table, where there was only a single byte used for
the partition type, that (a) was largely ignored by other x86
operating systems, and (b) wasn't under our control, so we couldn't
define a new partition type each time we introduced a new Linux file
system.

In general, having explicit file system types, whether it is in
/etc/fstab, or in the GPT partition table, is the better way to go.
Using blkid is ideally the fallback when the best possible way doesn't
work, since it will ultimately always be a "best efforts" sort of
thing.

That being said, I suspect that if you ask, file system maintainers
will be happy to try to make things work better --- just send us a
patch or tell us what we need to do.  ZFS is not a native Linux file
system, and blkid pre-dates ZFS, so it's not something that I bothered
testing.  It doesn't help that I had absolutely zero interest in
dealing with Sun deliberately making the CDDL incompatible with the
GPL, and Larry Ellison potentially trying to sue us into the ground.  :-)

Cheers,

					- Ted

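As a rough illustration of the "single byte" point about the MSDOS
partition table, the Python sketch below reads the type byte of each of
the four primary entries in an MBR sector (table at byte 446, 16 bytes
per entry, type at byte 4 of the entry, 0x55 0xAA signature at byte 510).
The helper names and the short MBR_TYPES table are hypothetical and far
from complete. Note that type 0x83 only says "Linux"; it does not say
which Linux file system is inside, which is why probing is still needed.

```python
# A few well-known MBR partition type bytes; deliberately incomplete.
MBR_TYPES = {
    0x07: "HPFS/NTFS/exFAT",
    0x0B: "FAT32 (CHS)",
    0x0C: "FAT32 (LBA)",
    0x82: "Linux swap",
    0x83: "Linux",
}


def mbr_partition_types(sector: bytes):
    """Return (slot, type_byte, description) for each used primary entry."""
    if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("not an MBR boot sector")
    result = []
    for i in range(4):
        entry = sector[446 + 16 * i:446 + 16 * (i + 1)]
        ptype = entry[4]  # the one-byte partition type
        if ptype != 0:    # 0x00 marks an unused slot
            result.append((i + 1, ptype, MBR_TYPES.get(ptype, "unknown")))
    return result


def make_mbr(ptypes):
    """Build a minimal test sector with the given type bytes (max four)."""
    sector = bytearray(512)
    for i, ptype in enumerate(ptypes):
        sector[446 + 16 * i + 4] = ptype
    sector[510:512] = b"\x55\xaa"
    return bytes(sector)
```
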
* Re: Ensuring that mount(8) will always interpret a filesystem correctly
  2024-12-11 13:38         ` Theodore Ts'o
@ 2024-12-14 22:08           ` Demi Marie Obenour
  2024-12-15  3:20             ` Theodore Ts'o
  0 siblings, 1 reply; 8+ messages in thread
From: Demi Marie Obenour @ 2024-12-14 22:08 UTC
To: Theodore Ts'o; +Cc: Karel Zak, util-linux

On 12/11/24 8:38 AM, Theodore Ts'o wrote:
> On Tue, Dec 10, 2024 at 06:28:28PM -0500, Demi Marie Obenour wrote:
>>>> Was https://github.com/util-linux/util-linux/issues/1305 a
>>>> collision between ZFS and ext4?
>>>
>>> Yes, but in this case, ZFS was incorrectly detected. As you can see
>>> from the bug report, blkid ended with an "ambiguous result" error.
>
> mke2fs (mkfs.ext4) does attempt to zero the typical locations where
> conflicting superblocks might be found.  The ext4 metadata is located
> at the beginning of the file system, except for the first 1k, which we
> zero out on all platforms except for Sparc (the exact reason is
> lost in the mists of time, since it pre-exists git, but as I recall
> Sparc had something critical that would cause its BIOS to lose its
> marbles if we zeroed it out), and we also zero out the very end of the
> disk where the MD superblock is located.
>
> It sounds like ZFS is putting its superblock someplace random that
> mke2fs doesn't know about.  If someone wants to do the research
> to let me know what needs to be zeroed out to zap the ZFS superblock,
> please feel free to file a bug against e2fsck (or better yet, send me a
> patch :-P ) and I'll be happy to add support for it.

I’m not too worried about this, and instead am of the opinion that it
needs to be fixed on the blkid side (by ignoring the ZFS superblock).

>>>> /etc/fstab provides an explicit filesystem type.  The Discoverable
>>>> Partition Specification doesn't.
>
> From what I can tell, the Discoverable Partition Table specification,
> at least as defined here[1], only supports explicit file system types
> supplied by the GPT partition table.
>
> [1] https://uapi-group.org/specifications/specs/discoverable_partitions_specification/

It’s the other way around: the GPT only provides the mountpoint,
never the type.  That’s why I filed an issue [1] asking for
per-filesystem-type UUIDs.

[1]: https://github.com/uapi-group/specifications/issues/132

> My personal preference is this *is* the best way to do things; the
> main reason why we have blkid is because of the disaster which is the
> MSDOS FAT partition table, where there was only a single byte used for
> the partition type, that (a) was largely ignored by other x86
> operating systems, and (b) wasn't under our control, so we couldn't
> define a new partition type each time we introduced a new Linux file
> system.
>
> In general, having explicit file system types, whether it is in
> /etc/fstab, or in the GPT partition table, is the better way to go.
> Using blkid is ideally the fallback when the best possible way doesn't
> work, since it will ultimately always be a "best efforts" sort of
> thing.

Thanks for confirming what I expected.
--
Sincerely,
Demi Marie Obenour (she/her/hers)

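The "mountpoint but never the type" point can be sketched in a few lines
of Python. The table below is a hypothetical, incomplete excerpt: the
/home and /srv type GUIDs are quoted from memory of the Discoverable
Partitions Specification and should be checked against it before being
relied on. Whatever the GUID, the lookup can only ever return a mount
point; the filesystem type has to stay "auto" and be probed later.

```python
# Hypothetical excerpt of partition type GUID -> mount point.
# Verify the GUIDs against the specification before relying on them.
DPS_MOUNTPOINTS = {
    "933ac7e1-2eb4-4f13-b844-0e14e2aef915": "/home",
    "3b8f8425-20e0-4f3b-907f-1a25a76f98e8": "/srv",
}


def plan_mount(type_guid: str):
    """Return (mountpoint, fstype) for a known GUID, else None.

    The GUID carries no filesystem type, so fstype is always "auto"
    and detection is left to libblkid at mount time.
    """
    mountpoint = DPS_MOUNTPOINTS.get(type_guid.lower())
    if mountpoint is None:
        return None
    return (mountpoint, "auto")
```
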
* Re: Ensuring that mount(8) will always interpret a filesystem correctly
  2024-12-14 22:08           ` Demi Marie Obenour
@ 2024-12-15  3:20             ` Theodore Ts'o
  0 siblings, 0 replies; 8+ messages in thread
From: Theodore Ts'o @ 2024-12-15  3:20 UTC
To: Demi Marie Obenour; +Cc: Karel Zak, util-linux

On Sat, Dec 14, 2024 at 05:08:54PM -0500, Demi Marie Obenour wrote:
> > From what I can tell, the Discoverable Partition Table specification,
> > at least as defined here[1], only supports explicit file system types
> > supplied by the GPT partition table.
> >
> > [1] https://uapi-group.org/specifications/specs/discoverable_partitions_specification/
>
> It’s the other way around: the GPT only provides the mountpoint,
> never the type.  That’s why I filed an issue [1] asking for
> per-filesystem-type UUIDs.

Bleah, you're right.  Other partition tables, including MBR(!), used
the "partition type" to be the kind of file system.  (For example,
0x07 meant OS/2, 0x09 meant QNX/Coherent/OS-9, 0x0B meant FAT32 with
CHS addressing, 0x0C meant FAT32 with LBA, 0x39 meant Plan9, etc.)

When I saw "partition type" in the UEFI spec, I thought they were
seeing the path of wisdom and moving away from in-band signaling to an
explicit type specification --- but you're right, looking at the UEFI
spec more closely, it's about how the file system is to be used, not
the file system type.  (It's really not even the mount point, since
773f91ef-66d4-49b5-bd83-d683bf40ad16 means "per-user home partition",
but since the UUID doesn't specify the username, you wouldn't know
whether it was supposed to be mounted in /home/lucy, or /home/snoopy,
or /home/charlie_brown.  Yelch....)

> I’m not too worried about this, and instead am of the opinion that it
> needs to be fixed on the blkid side (by ignoring the ZFS superblock).

I disagree; blkid's *job* is to detect the file system type, and just
ignoring all ZFS superblocks means that it won't be able to detect ZFS
file systems, which would be sad.  And having some kind of arbitrary
preference where blkid were to say, "well, if it's ambiguous whether a
block device is ext4 or btrfs or ZFS, I'll just arbitrarily say ext4
because I like ext4 more" is, well, arbitrary.

The best way to solve this is to have users run "wipefs -a /dev/hdXX"
before running a mkfs program, but in the spirit of being kind to
users[1] who don't know about wipefs, or for distro installers that
don't bother to call wipefs, I'm perfectly happy to teach mkfs.ext4 how
to make the right thing happen automatically.  I just need to know how
to zap ZFS superblocks.

BTW, in practice this happens automatically for SSD's, since we will
call BLKDISCARD on the entire device, for better FTL GC performance.
But for HDD's, we will need to explicitly write zeroes in the correct
location.

Cheers,

					- Ted

[1] Using a variation from Strunk and White's "The Elements of Style"
where they said, "always write with a deep empathy towards the
reader", we should strive to program with deep empathy towards the
user.  :-)

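The "explicitly write zeroes in the correct location" step can be
sketched in Python as below. This is not what mke2fs or wipefs actually
do. The helpers zero_regions and head_and_tail are hypothetical, and the
region sizes in the usage example are placeholders, since the thread
leaves open which offsets would be needed to clear a ZFS label.

```python
import io


def zero_regions(dev, regions):
    """Write zeros over each (offset, length) region of a seekable,
    writable binary file object."""
    for offset, length in regions:
        dev.seek(offset)
        dev.write(bytes(length))  # bytes(n) is n zero bytes
    dev.flush()


def head_and_tail(device_size, head_len, tail_len):
    """Regions covering the first head_len and the last tail_len bytes."""
    return [(0, head_len), (device_size - tail_len, tail_len)]


if __name__ == "__main__":
    # In-memory stand-in for a block device; sizes are placeholders.
    dev = io.BytesIO(b"\xff" * 4096)
    zero_regions(dev, head_and_tail(4096, head_len=1024, tail_len=512))
    print(dev.getvalue()[:4], dev.getvalue()[-4:])
```

On a real device the same function would be called on a file opened in
"r+b" mode, with the offsets taken from the on-disk format documentation
of whatever signature is being cleared.
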