* inline extents @ 2018-09-19 16:23 Chris Murphy 2018-09-20 1:45 ` Dave Chinner 0 siblings, 1 reply; 4+ messages in thread From: Chris Murphy @ 2018-09-19 16:23 UTC (permalink / raw) To: xfs list Hi, I ran across this: xfs: introduce inode data inline feature https://lwn.net/Articles/759183/ I'm wondering if there's a chance it becomes the default one day in the not too distant future? And if it is enabled at all (user or by default), what happens if something where to directly overwrite 1024 bytes outside of the file system? For example: The file /boot/grub/grubenv is 1KiB and stores GRUB environment info, like what the default boot should be. Fedora up until now only writes to it when booted, so it goes through the file system, and the bootloader only reads from it. Fedora 29 has a new feature to test if boot+startup fails, so the bootloader can do a fallback at next boot, to a previously working entry. Part of this means GRUB (the bootloader code, not the user space code) uses "save_env" to overwrite the 1024 data bytes with updated environment information. On something like FAT or ext2 or even ext4 without checksums, this isn't a problem. On Btrfs, 1KiB is almost always going to be an inline extent, found inside a 16KiB leaf, and that leaf has a checksum predicated on the entire contents of that leaf. Overwrite 1KiB outside the file system and now the checksum is wrong, the kernel code will consider the entire 16KiB leaf corrupt. And that leaf might contain items totally unrelated to the file being modified so it could be a rather significant corruption. And it may not be fixable (I haven't really tested this yet and I think GRUB knows better than to write to a grubenv on Btrfs anyway). For XFS, I'm not sure how the inline extent is saved, and whether metadata checksumming includes or excludes the inline extent. I'm also kinda ignoring the reflink ramifications of this behavior, for now. Let's just say even if there's no corruption I'm really suspicious of bootloader code writing anything, even what seems to be a simple overwrite of two sectors. Thanks, -- Chris Murphy ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: inline extents 2018-09-19 16:23 inline extents Chris Murphy @ 2018-09-20 1:45 ` Dave Chinner 2018-09-20 3:18 ` Chris Murphy 0 siblings, 1 reply; 4+ messages in thread From: Dave Chinner @ 2018-09-20 1:45 UTC (permalink / raw) To: Chris Murphy; +Cc: xfs list On Wed, Sep 19, 2018 at 10:23:38AM -0600, Chris Murphy wrote: > Hi, > > I ran across this: > xfs: introduce inode data inline feature > https://lwn.net/Articles/759183/ > > I'm wondering if there's a chance it becomes the default one day in > the not too distant future? And if it is enabled at all (user or by > default), what happens if something where to directly overwrite 1024 > bytes outside of the file system? First somebody has to implement the feature in a way that it can be merged. :) > For example: > > The file /boot/grub/grubenv is 1KiB and stores GRUB environment info, > like what the default boot should be. Fedora up until now only writes > to it when booted, so it goes through the file system, and the > bootloader only reads from it. > > Fedora 29 has a new feature to test if boot+startup fails, so the > bootloader can do a fallback at next boot, to a previously working > entry. Part of this means GRUB (the bootloader code, not the user > space code) uses "save_env" to overwrite the 1024 data bytes with > updated environment information. That's just broken. Illegal. Completely unsupportable. Doesn't matter what the filesystem is, nobody is allowed to write directly to the block device a filesystem owns. > On something like FAT or ext2 or even ext4 without checksums, this > isn't a problem. On Btrfs, 1KiB is almost always going to be an inline > extent, found inside a 16KiB leaf, and that leaf has a checksum > predicated on the entire contents of that leaf. Overwrite 1KiB outside > the file system and now the checksum is wrong, the kernel code will > consider the entire 16KiB leaf corrupt. Yup. Or it's a shared data extent (i.e. a reflinked or deduped file) and writing to it corrupts the other copies because the filesystem wasn't able to COW it. > And that leaf might contain > items totally unrelated to the file being modified so it could be a > rather significant corruption. And it may not be fixable (I haven't > really tested this yet and I think GRUB knows better than to write to > a grubenv on Btrfs anyway). ext4 has inline data, too, so there's every chance grub will corrupt ext4 filesystems with tit's wonderful new feature. I'm not sure if the ext4 metadata cksums cover the entire inode and inline data, but if they do it's the same problem as btrfs. > For XFS, I'm not sure how the inline extent is saved, and whether > metadata checksumming includes or excludes the inline extent. When XFS implements this, it will be like btrfs as the data will be covered by the metadata CRCs for the inode, and so writing directly to it would corrupt the inode and render it unreadable by the filesystem. > I'm also kinda ignoring the reflink ramifications of this behavior, > for now. Let's just say even if there's no corruption I'm really > suspicious of bootloader code writing anything, even what seems to be > a simple overwrite of two sectors. You're not the only one Like I said, it doesn't matter what the filesystem is, overwriting file data by writing directly to the block device is not supportable. It's essentially a filesystem corruption vector, and grub needs to have that functionality removed immediately. Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: inline extents 2018-09-20 1:45 ` Dave Chinner @ 2018-09-20 3:18 ` Chris Murphy 2018-09-20 19:34 ` Theodore Y. Ts'o 0 siblings, 1 reply; 4+ messages in thread From: Chris Murphy @ 2018-09-20 3:18 UTC (permalink / raw) To: Dave Chinner, Linux FS Devel; +Cc: xfs list, Btrfs BTRFS, linux-ext4 Adding fsdevel@, linux-ext4, and btrfs@ (which has a separate subject on this same issue) On Wed, Sep 19, 2018 at 7:45 PM, Dave Chinner <david@fromorbit.com> wrote: >On Wed, Sep 19, 2018 at 10:23:38AM -0600, Chris Murphy wrote: >> Fedora 29 has a new feature to test if boot+startup fails, so the >> bootloader can do a fallback at next boot, to a previously working >> entry. Part of this means GRUB (the bootloader code, not the user >> space code) uses "save_env" to overwrite the 1024 data bytes with >> updated environment information. > > That's just broken. Illegal. Completely unsupportable. Doesn't > matter what the filesystem is, nobody is allowed to write directly > to the block device a filesystem owns. Yeah, the word I'm thinking of is abomination. However in their defense, grubenv and the 'save_env' command are old features: line 3638 @node Environment block http://git.savannah.gnu.org/cgit/grub.git/tree/docs/grub.texi "For safety reasons, this storage is only available when installed on a plain disk (no LVM or RAID), using a non-checksumming filesystem (no ZFS), and using BIOS or EFI functions (no ATA, USB or IEEE1275)." I haven't checked how it tests for this. But by now, it should list the supported file systems, rather than what's exempt. That's a shorter list. > ext4 has inline data, too, so there's every chance grub will corrupt > ext4 filesystems with tit's wonderful new feature. I'm not sure if > the ext4 metadata cksums cover the entire inode and inline data, but > if they do it's the same problem as btrfs. I don't see inline used with a default mkfs, but I do see metadata_csum e2fsprogs-1.44.3-1.fc29.x86_64 Filesystem features: has_journal ext_attr resize_inode dir_index filetype extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum Filesystem flags: signed_directory_hash Default mount options: user_xattr acl > >> For XFS, I'm not sure how the inline extent is saved, and whether >> metadata checksumming includes or excludes the inline extent. > > When XFS implements this, it will be like btrfs as the data will be > covered by the metadata CRCs for the inode, and so writing directly > to it would corrupt the inode and render it unreadable by the > filesystem. Good to know. > >> I'm also kinda ignoring the reflink ramifications of this behavior, >> for now. Let's just say even if there's no corruption I'm really >> suspicious of bootloader code writing anything, even what seems to be >> a simple overwrite of two sectors. > > You're not the only one > > Like I said, it doesn't matter what the filesystem is, overwriting > file data by writing directly to the block device is not > supportable. It's essentially a filesystem corruption vector, and > grub needs to have that functionality removed immediately. I'm in agreement with respect to the more complex file systems. We've already realized the folly of the bootloader being unable to do journal replay, ergo it doesn't really for sure have a complete picture of the file system anyway. That's suboptimal when it results in boot failure. But if it were going to use stale file system information, get a wrong idea of the file system, and then use that to do even 1024 bytes of writes? No, no, and no. Meanwhile, also in Fedoraland, it's one of the distros where grubenv and grub.cfg stuff is on the EFI System partition, which is FAT. This overwrite behavior will work there, but even this case is a kind of betrayal that the file is being modified, without its metadata being updated. I think it's an old era hack that by today's standards simply isn't good enough. I'm a little surprised that all UEFI implementations permit arbitrary writes from the pre-boot environment to arbitrary block devices, even with Secure Boot enabled. That seems specious. I know some of the file systems have reserve areas for bootloader stuff. I'm not sure if that's preferred over bootloaders just getting their own partition and controlling it stem to stern however they want. -- Chris Murphy ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: inline extents 2018-09-20 3:18 ` Chris Murphy @ 2018-09-20 19:34 ` Theodore Y. Ts'o 0 siblings, 0 replies; 4+ messages in thread From: Theodore Y. Ts'o @ 2018-09-20 19:34 UTC (permalink / raw) To: Chris Murphy Cc: Dave Chinner, Linux FS Devel, xfs list, Btrfs BTRFS, linux-ext4 On Wed, Sep 19, 2018 at 09:18:16PM -0600, Chris Murphy wrote: > > ext4 has inline data, too, so there's every chance grub will corrupt > > ext4 filesystems with tit's wonderful new feature. I'm not sure if > > the ext4 metadata cksums cover the entire inode and inline data, but > > if they do it's the same problem as btrfs. > > I don't see inline used with a default mkfs, but I do see metadata_csum The grub environment block is 1024 bytes, so unless you are using a 4k inode size (the default is 256 bytes), the grub environment block won't be an inline data file. So in practice, this won't be a problem. So in practice this won't be a problem for ext4 today. There are ways that grub can query the file system (using FIEMAP or FS_IOC_GETFLAGS) to see whether or not the file is an inline file. For xfs grub would also need to make sure the file isn't reflinked file (which can also be queried using FIEMAP). That's fine for inline, since a file won't magically change from stored in a block versus inline, so grub can check this at grub setup time. But the problem with reflink is that a file that was originally using a block exclusively could later have that block be shared by another file, at which point it's only safe to write the block using copy-on-write. > I know some of the file systems have reserve areas for bootloader > stuff. I'm not sure if that's preferred over bootloaders just getting > their own partition and controlling it stem to stern however they > want. Ext4 had reserved the bootloader inode, and we have defined a way for a bootloader to access it. The original intent was that not it just be fore the environment block, but also the grub stage 1.5 or stage 2 loader. This would allow grub to just use a fixed list of blocks or extents, and not need to understand the ext4 extents structures. We added way this when, because we anticipated this would be easier for grub than having it learn how to read ext4 extents structures. Instead, the bootloader would at setup time use the EXT4_IOC_SWAP_BOOT ioctl and then use a fixed list of blocks or block extents like LILO used to do. Of course, grub exceeded expectations and actually learned how to read ext4 file systems, and probably we (the ext4 developers) should have reached out to GRUB folks. Cheers, - Ted ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2018-09-21 1:19 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-09-19 16:23 inline extents Chris Murphy 2018-09-20 1:45 ` Dave Chinner 2018-09-20 3:18 ` Chris Murphy 2018-09-20 19:34 ` Theodore Y. Ts'o
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).