* Ensuring that mount(8) will always interpret a filesystem correctly
From: Demi Marie Obenour @ 2024-12-08  1:45 UTC
  To: util-linux

Is there a guarantee that if all data before the filesystem superblock is
zero, and that the filesystem never writes to this region, libblkid (and
thus, presumably, mount(8)) will always mount the filesystem with the
correct filesystem type, even if e.g. someone writes a file containing
a superblock of a different filesystem and the filesystem happens to put
it where that superblock is valid?

The motivation for this message is that systemd-gpt-generator generates
mountpoints based on Discoverable Partition Specification GUIDs.  These
indicate the mountpoint of the partition but not the filesystem type.
If a correctly-produced filesystem image will always continue to be
recognized as the correct type, this is fine.  Otherwise, an unlucky
combination of writes to the filesystem and filesystem allocation decisions
could cause the filesystem to start being mounted as the wrong type, which
would be very bad.  According to https://github.com/util-linux/util-linux/issues/1305,
libblkid can indeed probe for subsequent superblocks after the first one it
finds.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)



* Re: Ensuring that mount(8) will always interpret a filesystem correctly
From: Karel Zak @ 2024-12-09 10:26 UTC
  To: Demi Marie Obenour; +Cc: util-linux


 Hi Demi,

On Sat, Dec 07, 2024 at 08:45:32PM GMT, Demi Marie Obenour wrote:
> Is there a guarantee that if all data before the filesystem superblock is
> zero, and that the filesystem never writes to this region, libblkid (and
> thus, presumably, mount(8)) will always mount the filesystem with the
> correct filesystem type, even if e.g. someone writes a file containing
> a superblock of a different filesystem and the filesystem happens to put
> it where that superblock is valid?

The libblkid library offers multiple modes, with "safe mode" being the
default for detecting filesystems. In this mode, the library checks
for any additional valid superblocks on the device. There are
exceptions for certain filesystems on CD/DVD media (such as udf and
iso), but two regular filesystems are not allowed to share the same
device.
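To make the "safe mode" behaviour concrete, here is a toy Python model. It is not libblkid's code: the names SIGNATURES, matching_types() and safe_probe() are hypothetical, and each check looks only for a magic value at a fixed offset (real probers also check other superblock fields). Only the return convention (0 = one type found, 1 = nothing found, -2 = ambiguous result) is borrowed from the documented C function blkid_do_safeprobe().

```python
from typing import List, Optional, Tuple

# Hypothetical signature table: type -> (byte offset of magic, magic bytes).
# Real libblkid probers check far more than a single magic value.
SIGNATURES = {
    "ext4": (1080, b"\x53\xef"),      # s_magic 0xEF53 (LE) at 1024 + 0x38; also ext2/ext3
    "btrfs": (0x10040, b"_BHRfS_M"),  # superblock at 64 KiB, magic at +0x40
    "xfs": (0, b"XFSB"),              # XFS superblock magic at offset 0
}


def matching_types(image: bytes) -> List[str]:
    """Return every type whose magic appears at its expected offset."""
    return [fstype for fstype, (off, magic) in SIGNATURES.items()
            if image[off:off + len(magic)] == magic]


def safe_probe(image: bytes) -> Tuple[int, Optional[str]]:
    """Accept a result only if exactly one filesystem type matches."""
    found = matching_types(image)
    if not found:
        return 1, None       # nothing detected
    if len(found) > 1:
        return -2, None      # ambiguous: refuse to guess
    return 0, found[0]       # exactly one candidate
```

In C, the corresponding real entry points are blkid_new_probe_from_filename(), blkid_do_safeprobe() and blkid_probe_lookup_value() with the "TYPE" key.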

There is also an option to specify that a superblock is only valid if
no other area is using it (using blkid_probe_set_wiper() and
blkid_probe_use_wiper()). However, this is only used for LVM and
bcache.

The library does not require that there are zeros before the
superblock, as not all mkfs-like programs zero out all areas.

In recent years, there have been no reports of collisions. In the
entire history of the library, the only collisions I can recall are
with swap areas and luks, and occasionally with poorly detected FAT
filesystems (due to the messy design of FAT).

> The motivation for this message is that systemd-gpt-generator generates
> mountpoints based on Discoverable Partition Specification GUIDs.  These
> indicate the mountpoint of the partition but not the filesystem type.

Filesystem auto-detection is a common feature. The situation is
similar to having an "auto" fstype in fstab. The systemd-gpt-generator
simply identifies the partition as "/usr" (or any other mountpoint)
and the rest is the usual scenario.

> If a correctly-produced filesystem image will always continue to be
> recognized as the correct type, this is fine.  Otherwise, an unlucky
> combination of writes to the filesystem and filesystem allocation decisions
> could cause the filesystem to start being mounted as the wrong type, which
> would be very bad.  According to https://github.com/util-linux/util-linux/issues/1305,
> libblkid can indeed probe for subsequent superblocks after the first one it
> finds.

I believe the situation would be the same even without the
Discoverable Partition Specification. The kernel always divides the
whole disk into partitions, and libblkid/mount utilizes these
partitions. Therefore, the filesystems are automatically separated by
the partition table.

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com



* Re: Ensuring that mount(8) will always interpret a filesystem correctly
From: Demi Marie Obenour @ 2024-12-10  5:11 UTC
  To: Karel Zak; +Cc: util-linux

On 12/9/24 5:26 AM, Karel Zak wrote:
> 
>  Hi Demi,
> 
> On Sat, Dec 07, 2024 at 08:45:32PM GMT, Demi Marie Obenour wrote:
>> Is there a guarantee that if all data before the filesystem superblock is
>> zero, and that the filesystem never writes to this region, libblkid (and
>> thus, presumably, mount(8)) will always mount the filesystem with the
>> correct filesystem type, even if e.g. someone writes a file containing
>> a superblock of a different filesystem and the filesystem happens to put
>> it where that superblock is valid?
> 
> the libblkid library offers multiple modes, with "safe mode" being the
> default for detecting filesystems. In this mode, the library checks
> for any additional valid superblocks on the device. There are
> exceptions for certain filesystems on CD/DVD media (such as udf and
> iso), but for regular filesystems, sharing the same device is not
> allowed.
> 
> There is also an option to specify that a superblock is only valid if
> no other area is using it (using blkid_probe_set_wiper() and
> blkid_probe_use_wiper()). However, this is only used for LVM and
> bcache.
> 
> The library does not require that there are zeros before the
> superblock, as not all mkfs-like programs zero out all areas.
> 
> In recent years, there have been no reports of collisions. In the
> entire history of the library, the only collisions I can recall are
> with swap areas and luks, and occasionally with poorly detected FAT
> filesystems (due to the messy design of FAT).

Was https://github.com/util-linux/util-linux/issues/1305 a
collision between ZFS and ext4?

>> The motivation for this message is that systemd-gpt-generator generates
>> mountpoints based on Discoverable Partition Specification GUIDs.  These
>> indicate the mountpoint of the partition but not the filesystem type.
> 
> Filesystem auto-detection is a common feature. The situation is
> similar to having an "auto" fstype in fstab. The systemd-gpt-generator
> simply identifies the partition as "/usr" (or any other mountpoint)
> and the rest is the usual scenario.
>
>> If a correctly-produced filesystem image will always continue to be
>> recognized as the correct type, this is fine.  Otherwise, an unlucky
>> combination of writes to the filesystem and filesystem allocation decisions
>> could cause the filesystem to start being mounted as the wrong type, which
>> would be very bad.  According to https://github.com/util-linux/util-linux/issues/1305,
>> libblkid can indeed probe for subsequent superblocks after the first one it
>> finds.
> 
> I believe the situation would be the same even without the
> Discoverable Partition Specification. The kernel always divides the
> whole disk into partitions, and libblkid/mount utilizes these
> partitions. Therefore, the filesystems are automatically separated by
> the partition table.

/etc/fstab provides an explicit filesystem type.  The Discoverable
Partition Specification doesn't.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)


* Re: Ensuring that mount(8) will always interpret a filesystem correctly
From: Karel Zak @ 2024-12-10 11:16 UTC
  To: Demi Marie Obenour; +Cc: util-linux

On Tue, Dec 10, 2024 at 12:11:49AM GMT, Demi Marie Obenour wrote:
> On 12/9/24 5:26 AM, Karel Zak wrote:
> > 
> >  Hi Demi,
> > 
> > On Sat, Dec 07, 2024 at 08:45:32PM GMT, Demi Marie Obenour wrote:
> >> Is there a guarantee that if all data before the filesystem superblock is
> >> zero, and that the filesystem never writes to this region, libblkid (and
> >> thus, presumably, mount(8)) will always mount the filesystem with the
> >> correct filesystem type, even if e.g. someone writes a file containing
> >> a superblock of a different filesystem and the filesystem happens to put
> >> it where that superblock is valid?
> > 
> > the libblkid library offers multiple modes, with "safe mode" being the
> > default for detecting filesystems. In this mode, the library checks
> > for any additional valid superblocks on the device. There are
> > exceptions for certain filesystems on CD/DVD media (such as udf and
> > iso), but for regular filesystems, sharing the same device is not
> > allowed.
> > 
> > There is also an option to specify that a superblock is only valid if
> > no other area is using it (using blkid_probe_set_wiper() and
> > blkid_probe_use_wiper()). However, this is only used for LVM and
> > bcache.
> > 
> > The library does not require that there are zeros before the
> > superblock, as not all mkfs-like programs zero out all areas.
> > 
> > In recent years, there have been no reports of collisions. In the
> > entire history of the library, the only collisions I can recall are
> > with swap areas and luks, and occasionally with poorly detected FAT
> > filesystems (due to the messy design of FAT).
> 
> Was https://github.com/util-linux/util-linux/issues/1305 a
> collision between ZFS and ext4?

Yes, but in this case, ZFS was incorrectly detected. As you can see
from the bug report, blkid ended with an "ambiguous result" error.

> > I believe the situation would be the same even without the
> > Discoverable Partition Specification. The kernel always divides the
> > whole disk into partitions, and libblkid/mount utilizes these
> > partitions. Therefore, the filesystems are automatically separated by
> > the partition table.
> 
> /etc/fstab provides an explicit filesystem type.  The Discoverable
> Partition Specification doesn't.

You can use the "auto" file system type in fstab. It is also common
for people to not use the "-t <type>" option on the mount(8) command
line.

However, if you are paranoid, then specifying the file system type in
fstab and avoiding Discoverable Partitions is a good choice.

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com



* Re: Ensuring that mount(8) will always interpret a filesystem correctly
From: Demi Marie Obenour @ 2024-12-10 23:28 UTC
  To: Karel Zak; +Cc: util-linux

On 12/10/24 6:16 AM, Karel Zak wrote:
> On Tue, Dec 10, 2024 at 12:11:49AM GMT, Demi Marie Obenour wrote:
>> On 12/9/24 5:26 AM, Karel Zak wrote:
>>>
>>>  Hi Demi,
>>>
>>> On Sat, Dec 07, 2024 at 08:45:32PM GMT, Demi Marie Obenour wrote:
>>>> Is there a guarantee that if all data before the filesystem superblock is
>>>> zero, and that the filesystem never writes to this region, libblkid (and
>>>> thus, presumably, mount(8)) will always mount the filesystem with the
>>>> correct filesystem type, even if e.g. someone writes a file containing
>>>> a superblock of a different filesystem and the filesystem happens to put
>>>> it where that superblock is valid?
>>>
>>> the libblkid library offers multiple modes, with "safe mode" being the
>>> default for detecting filesystems. In this mode, the library checks
>>> for any additional valid superblocks on the device. There are
>>> exceptions for certain filesystems on CD/DVD media (such as udf and
>>> iso), but for regular filesystems, sharing the same device is not
>>> allowed.
>>>
>>> There is also an option to specify that a superblock is only valid if
>>> no other area is using it (using blkid_probe_set_wiper() and
>>> blkid_probe_use_wiper()). However, this is only used for LVM and
>>> bcache.
>>>
>>> The library does not require that there are zeros before the
>>> superblock, as not all mkfs-like programs zero out all areas.
>>>
>>> In recent years, there have been no reports of collisions. In the
>>> entire history of the library, the only collisions I can recall are
>>> with swap areas and luks, and occasionally with poorly detected FAT
>>> filesystems (due to the messy design of FAT).
>>
>> Was https://github.com/util-linux/util-linux/issues/1305 a
>> collision between ZFS and ext4?
> 
> Yes, but in this case, ZFS was incorrectly detected. As you can see
> from the bug report, blkid ended with an "ambiguous result" error.

Should blkid instead stop at the first valid superblock when probing
filesystems for mounting?

>>> I believe the situation would be the same even without the
>>> Discoverable Partition Specification. The kernel always divides the
>>> whole disk into partitions, and libblkid/mount utilizes these
>>> partitions. Therefore, the filesystems are automatically separated by
>>> the partition table.
>>
>> /etc/fstab provides an explicit filesystem type.  The Discoverable
>> Partition Specification doesn't.
> 
> You can use the "auto" file system type in fstab. It is also common
> for people to not use the "-t <type>" option on the mount(8) command
> line.
> 
> However, if you are paranoid, then specifying the file system type in
> fstab and avoiding Discoverable Partitions is a good choice.

Does that mean that Discoverable Partitions are a bad idea for any
filesystem that is not read-only?  Can you explain “if you are
paranoid”?
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)


* Re: Ensuring that mount(8) will always interpret a filesystem correctly
From: Theodore Ts'o @ 2024-12-11 13:38 UTC
  To: Demi Marie Obenour; +Cc: Karel Zak, util-linux

On Tue, Dec 10, 2024 at 06:28:28PM -0500, Demi Marie Obenour wrote:
> >> Was https://github.com/util-linux/util-linux/issues/1305 a
> >> collision between ZFS and ext4?
> > 
> > Yes, but in this case, ZFS was incorrectly detected. As you can see
> > from the bug report, blkid ended with an "ambiguous result" error.

mke2fs (mkfs.ext4) does attempt to zero the typical locations where
conflicting superblocks might be found.  The ext4 metadata is located
at the beginning of the file system, except for the first 1k, which we
zero out on all platforms except for Sparc (the exact reason is lost
in the mists of time, since it predates git, but as I recall Sparc had
something critical there that would cause its BIOS to lose its marbles
if we zeroed it out), and we also zero out the very end of the disk
where the MD superblock is located.
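A minimal sketch of the kind of clean-up described above, assuming a regular image file or a block device that can be opened read-write. The helper names and the 128 KiB tail size are hypothetical choices for illustration, not mke2fs's actual ranges; the skip_first_kib flag stands in for the Sparc exception.

```python
import os


def zero_range(f, offset: int, length: int) -> None:
    """Overwrite `length` bytes at `offset` with zeroes."""
    f.seek(offset)
    f.write(bytes(length))


def zap_stale_signatures(path: str, tail_bytes: int = 128 * 1024,
                         skip_first_kib: bool = False) -> None:
    """Zero the first 1 KiB and the last `tail_bytes` of a device or image."""
    with open(path, "r+b") as f:
        size = f.seek(0, os.SEEK_END)   # seeking to the end also works for block devices
        if not skip_first_kib:          # e.g. leave a Sparc disk label alone
            zero_range(f, 0, min(1024, size))
        tail = min(tail_bytes, size)    # assumed size of the end-of-disk region
        zero_range(f, size - tail, tail)
        f.flush()
        os.fsync(f.fileno())
```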

It sounds like ZFS is putting its superblock someplace random that
mke2fs doesn't know about.  If someone wants to do the research to let
me know what needs to be zeroed out to zap the ZFS superblock, please
feel free to file a bug against e2fsprogs (or better yet, send me a
patch :-P ) and I'll be happy to add support for it.

> >> /etc/fstab provides an explicit filesystem type.  The Discoverable
> >> Partition Specification doesn't.

From what I can tell, the Discoverable Partition Table specification,
at least as defined here[1] only supports explicit file system types
supplied by the GPT partition table.

[1] https://uapi-group.org/specifications/specs/discoverable_partitions_specification/

My personal preference is this *is* the best way to do things; the
main reason why we have blkid is because of the disaster which is the
MSDOS FAT partition table, where there was only a single byte used for
the partition type, that (a) was largely ignored by other x86
operating systems, and (b) wasn't under our control, so we couldn't
define a new partition type each time we introduced a new Linux file
system.

In general, having explicit file system types, whether it is in
/etc/fstab, or in the GPT partition table, is the better way to go.
Using blkid is ideally the fallback when the best possible way doesn't
work, since it will ultimately always be a "best efforts" sort of
thing.
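The preference stated above (explicit type first, probing only as a fallback) can be sketched as a small decision function. resolve_fstype() and its `probe` callback are hypothetical; `probe` stands for any probe that returns None unless it found exactly one type, and treating "auto" like a missing type mirrors the fstab convention mentioned earlier in the thread.

```python
from typing import Callable, Optional


def resolve_fstype(explicit: Optional[str],
                   probe: Callable[[], Optional[str]]) -> str:
    """Return the filesystem type to mount with.

    An explicit type (e.g. the third fstab field) always wins; "auto" or a
    missing type falls back to probing, and an inconclusive probe is fatal.
    """
    if explicit and explicit != "auto":
        return explicit
    detected = probe()
    if detected is None:
        raise RuntimeError("filesystem type unknown or ambiguous; not mounting")
    return detected
```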

That being said, I suspect that if you ask, file system maintainers
will be happy to try to make things work better --- just send us a
patch or tell us what we need to do.  ZFS is not a native Linux file
system, and blkid pre-dates ZFS, so it's not something that I bothered
testing.  It doesn't help that I had absolutely zero interest in
dealing with Sun deliberately making the CDDL incompatible with the
GPL, and Larry Ellison potentially trying to sue us into the ground.  :-)

Cheers,

						- Ted


* Re: Ensuring that mount(8) will always interpret a filesystem correctly
From: Demi Marie Obenour @ 2024-12-14 22:08 UTC
  To: Theodore Ts'o; +Cc: Karel Zak, util-linux

On 12/11/24 8:38 AM, Theodore Ts'o wrote:
> On Tue, Dec 10, 2024 at 06:28:28PM -0500, Demi Marie Obenour wrote:
>>>> Was https://github.com/util-linux/util-linux/issues/1305 a
>>>> collision between ZFS and ext4?
>>>
>>> Yes, but in this case, ZFS was incorrectly detected. As you can see
>>> from the bug report, blkid ended with an "ambiguous result" error.
> 
> mke2fs (mkfs.ext4) does attempt to zero the typical locations where
> conflicting superblocks might be found.  The ext4 metadata is located
> at the beginning of the file system, except for the first 1k, which we
> zero out on all platforms except for Sparc (the exact reason is lost
> in the mists of time, since it predates git, but as I recall Sparc had
> something critical there that would cause its BIOS to lose its marbles
> if we zeroed it out), and we also zero out the very end of the disk
> where the MD superblock is located.
> 
> It sounds like ZFS is putting its superblock someplace random that
> mke2fs doesn't know about.  If someone wants to do the research to let
> me know what needs to be zeroed out to zap the ZFS superblock, please
> feel free to file a bug against e2fsprogs (or better yet, send me a
> patch :-P ) and I'll be happy to add support for it.

I’m not too worried about this, and instead am of the opinion that it
needs to be fixed on the blkid side (by ignoring the ZFS superblock).

>>>> /etc/fstab provides an explicit filesystem type.  The Discoverable
>>>> Partition Specification doesn't.
> 
> From what I can tell, the Discoverable Partition Table specification,
> at least as defined here[1] only supports explicit file system types
> supplied by the GPT partition table.
> 
> [1] https://uapi-group.org/specifications/specs/discoverable_partitions_specification/

It’s the other way around: the GPT only provides the mountpoint,
never the type.  That’s why I filed an issue [1] asking for
per-filesystem-type UUIDs.

[1]: https://github.com/uapi-group/specifications/issues/132

> My personal preference is this *is* the best way to do things; the
> main reason why we have blkid is because of the disaster which is the
> MSDOS FAT partition table, where there was only a single byte used for
> the partition type, that (a) was largely ignored by other x86
> operating systems, and (b) wasn't under our control, so we couldn't
> define a new partition type each time we introduced a new Linux file
> system.
> 
> In general, having explicit file system types, whether it is in
> /etc/fstab, or in the GPT partition table, is the better way to go.
> Using blkid is ideally the fallback when the best possible way doesn't
> work, since it will ultimately always be a "best efforts" sort of
> thing.

Thanks for confirming what I expected.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)


* Re: Ensuring that mount(8) will always interpret a filesystem correctly
From: Theodore Ts'o @ 2024-12-15  3:20 UTC
  To: Demi Marie Obenour; +Cc: Karel Zak, util-linux

On Sat, Dec 14, 2024 at 05:08:54PM -0500, Demi Marie Obenour wrote:
> > From what I can tell, the Discoverable Partition Table specification,
> > at least as defined here[1] only supports explicit file system types
> > supplied by the GPT partition table.
> > 
> > [1] https://uapi-group.org/specifications/specs/discoverable_partitions_specification/
> 
> It’s the other way around: the GPT only provides the mountpoint,
> never the type.  That’s why I filed an issue [1] asking for
> per-filesystem-type UUIDs.

Bleah, you're right.  Other partition tables, including MBR(!), used
the "partition type" to indicate the kind of file system.  (For example,
0x07 meant OS/2, 0x09 meant QNX/Coherent/OS-9, 0x0B meant FAT32 with
CHS addressing, 0x0C meant FAT32 with LBA, 0x39 meant Plan 9, etc.)
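The single-byte limitation is easy to see in code. Below is a hedged sketch that reads the four primary entries of a classic MBR using the standard layout (entries start at offset 446, 16 bytes each, the type byte is at offset 4 of each entry, and 0x55 0xAA sits at offset 510). MBR_TYPES only repeats the codes listed above plus 0x83 for Linux, and mbr_partition_types() is a hypothetical helper.

```python
from typing import List

# Codes from the thread, plus 0x83 (Linux). One byte is all the room there is.
MBR_TYPES = {0x07: "OS/2", 0x09: "QNX/Coherent/OS-9", 0x0B: "FAT32 (CHS)",
             0x0C: "FAT32 (LBA)", 0x39: "Plan 9", 0x83: "Linux"}


def mbr_partition_types(sector0: bytes) -> List[int]:
    """Return the type byte of each non-empty primary partition entry."""
    if len(sector0) < 512 or sector0[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature")
    types = []
    for i in range(4):
        entry = sector0[446 + 16 * i: 446 + 16 * (i + 1)]
        if entry[4] != 0:            # 0x00 marks an unused entry
            types.append(entry[4])
    return types
```

Note that 0x83 says only "Linux", not which Linux file system, which is why blkid still has to look at the contents.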

When I saw "partition type" in the UEFI spec, I thought they were
seeing the path of wisdom and moving away from in-band signaling to an
explicit type specification --- but you're right, looking at the UEFI
spec more closely, it's about how the file system is to be used, not
the file system type.

(It's really not even the mount point, since
773f91ef-66d4-49b5-bd83-d683bf40ad16 means "per-user home partition",
but since the UUID doesn't specify the username, you wouldn't know
whether it was supposed to be mounted in /home/lucy, or /home/snoopy,
or /home/charlie_brown.  Yelch....)
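To illustrate that a GPT partition type GUID encodes usage rather than a file system, here is a small lookup sketch. The table lists a few well-known type GUIDs (the EFI System Partition, the x86-64 root and /home types from the Discoverable Partitions Specification, and the per-user home GUID quoted above); DPS_USAGE and partition_usage() are hypothetical names. Nothing in the result says ext4, btrfs or anything else.

```python
import uuid
from typing import Optional

# Type GUID -> intended use. None of these identify a file system type.
DPS_USAGE = {
    uuid.UUID("c12a7328-f81f-11d2-ba4b-00a0c93ec93b"): "EFI System Partition",
    uuid.UUID("4f68bce3-e8cd-4db1-96e7-fbcaf984b709"): "root (x86-64)",
    uuid.UUID("933ac7e1-2eb4-4f13-b844-0e14e2aef915"): "/home",
    uuid.UUID("773f91ef-66d4-49b5-bd83-d683bf40ad16"): "per-user home",
}


def partition_usage(type_guid_field: bytes) -> Optional[str]:
    """Decode the 16-byte partition type GUID field of a GPT entry.

    GPT stores the first three GUID fields little-endian, which is the
    layout uuid.UUID(bytes_le=...) expects.
    """
    return DPS_USAGE.get(uuid.UUID(bytes_le=type_guid_field))
```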

> I’m not too worried about this, and instead am of the opinion that it
> needs to be fixed on the blkid side (by ignoring the ZFS superblock).

I disagree; blkid's *job* is to detect the file system type, and just
ignoring all ZFS superblocks means that it won't be able to detect ZFS
file systems, which would be sad.  And having some kind of arbitrary
preference where blkid were to say, "well, if it's ambiguous whether a
block device is ext4 or btrfs or ZFS, I'll just arbitrarily say ext4
because I like ext4 more" is, well, arbitrary.
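The objection to an arbitrary preference can be shown with a toy comparison of two hypothetical policies: first_match() stops at the first type in a fixed preference order (roughly what "stop at the first valid superblock" would amount to), while strict() accepts only an unambiguous result. The type strings follow blkid's naming, where ZFS is reported as "zfs_member".

```python
from typing import Optional, Sequence, Set


def first_match(detected: Set[str], order: Sequence[str]) -> Optional[str]:
    """Return the first type in `order` that was detected."""
    for fstype in order:
        if fstype in detected:
            return fstype
    return None


def strict(detected: Set[str]) -> Optional[str]:
    """Return the type only if exactly one was detected."""
    if len(detected) == 1:
        return next(iter(detected))
    return None
```

With both ext4 and ZFS signatures present, first_match() returns whichever type happens to come first in `order`, so the ordering rather than the disk contents decides; strict() declines to answer, which corresponds to the "ambiguous result" error discussed earlier.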

The best way to solve this is to have users run "wipefs -a
/dev/hdXX" before running a mkfs program.  But in the spirit of being
kind to users[1] who don't know about wipefs, and to distro installers
that don't bother to call wipefs, I'm perfectly happy to teach
mkfs.ext4 how to make the right thing happen automatically.  I just
need to know how to zap ZFS superblocks.
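What "wipefs -a" does can be approximated, very loosely, as "find every known signature and zero its magic bytes". The sketch below is a toy under that assumption: SIGNATURES and wipe_signatures() are hypothetical, only two magic values are listed, and real file systems can keep backup copies of their superblock elsewhere on the device.

```python
from typing import Dict, List, Tuple

# Hypothetical signature table: type -> (offset of magic, magic bytes).
SIGNATURES: Dict[str, Tuple[int, bytes]] = {
    "ext4": (1080, b"\x53\xef"),
    "btrfs": (0x10040, b"_BHRfS_M"),
}


def wipe_signatures(image: bytearray) -> List[Tuple[str, int]]:
    """Zero each known magic found in `image`; return (type, offset) pairs."""
    erased = []
    for fstype, (offset, magic) in SIGNATURES.items():
        end = offset + len(magic)
        if image[offset:end] == magic:
            image[offset:end] = bytes(len(magic))
            erased.append((fstype, offset))
    return erased
```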

BTW, in practice this happens automatically for SSDs, since we will
call BLKDISCARD on the entire device, for better FTL GC performance.
But for HDDs, we will need to explicitly write zeroes in the correct
location.
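A hedged sketch of the SSD/HDD difference: try the BLKDISCARD ioctl on a byte range and fall back to writing zeroes when it fails (for example on a device without discard support, or on a regular image file). BLKDISCARD is 0x1277, i.e. _IO(0x12, 119) from <linux/fs.h>, and takes a pair of 64-bit values (start, length); discard_or_zero() is a hypothetical helper. The sketch does not assume that discarded blocks read back as zeroes, since that is device-dependent.

```python
import fcntl
import os
import struct

BLKDISCARD = 0x1277  # _IO(0x12, 119) in <linux/fs.h>


def discard_or_zero(path: str, offset: int, length: int) -> str:
    """Discard [offset, offset + length) if possible, else write zeroes."""
    fd = os.open(path, os.O_RDWR)
    try:
        try:
            # The ioctl argument is uint64_t range[2] = {start, length}.
            fcntl.ioctl(fd, BLKDISCARD, struct.pack("QQ", offset, length))
            return "discarded"   # read-back contents are device-dependent
        except OSError:
            # Not a block device, or discard unsupported: write zeroes instead.
            os.pwrite(fd, bytes(length), offset)
            os.fsync(fd)
            return "zeroed"
    finally:
        os.close(fd)
```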

Cheers,

					- Ted

[1] Using a variation on Strunk and White's "The Elements of Style",
where they said, "always write with a deep empathy towards the
reader", we should strive to program with deep empathy towards the
user.  :-)
