* xfs_clear_incompat_log_features considered harmful? @ 2024-01-31 23:00 Darrick J. Wong 2024-02-01 4:33 ` Christoph Hellwig 2024-02-05 1:09 ` Dave Chinner 0 siblings, 2 replies; 8+ messages in thread From: Darrick J. Wong @ 2024-01-31 23:00 UTC (permalink / raw) To: Christoph Hellwig, Dave Chinner; +Cc: xfs Hi everyone, Christoph spied the xfs_swapext_can_use_without_log_assistance function[0] in the atomic file updates patchset[1] and wondered why we go through this inverted-bitmask dance to avoid setting the XFS_SB_FEAT_INCOMPAT_LOG_SWAPEXT feature. (The same principles apply to xfs_attri_can_use_without_log_assistance from the since-merged LARP series.) The reason for this dance is that xfs_add_incompat_log_feature is an expensive operation -- it forces the log, pushes the AIL, and then if nobody's beaten us to it, sets the feature bit and issues a synchronous write of the primary superblock. That could be a one-time cost amortized over the life of the filesystem, but the log quiesce and cover operations call xfs_clear_incompat_log_features to remove feature bits opportunistically. On a moderately loaded filesystem this leads to us cycling those bits on and off over and over, which hurts performance. Why do we clear the log incompat bits? Back in ~2020 I think Dave and I had a conversation on IRC[2] about what the log incompat bits represent. IIRC in that conversation we decided that the log incompat bits protect unrecovered log items so that old kernels won't try to recover them and barf. Since a clean log has no protected log items, we could clear the bits at cover/quiesce time. At the time, I think we decided to go with this idea because we didn't like the idea of reducing the span of kernels that can recover a filesystem over the lifetime of that filesystem. [ISTR Eric pointing out at some point that adding incompat feature bits at runtime could confuse users who crash and try to recover with Ye Olde Knoppix CD because there's now a log incompat bit set that wasn't there at format time, but my memory is a bit hazy.] Christoph wondered why I don't just set the log incompat bits at mkfs time, especially for filesystems that have some newer feature set (e.g. parent pointers, metadir, rtgroups...) to avoid the runtime cost of adding the feature flag. I don't bother with that because of the log clearing behavior. He also noted that _can_use_without_log_assistance is potentially dangerous if distro vendors backport features to old kernels in a different order than they land in upstream. Another problem with this scheme is the l_incompat_users rwsem that we use to protect a log cleaning operation from clearing a feature bit that a frontend thread is trying to set -- this lock adds another way to fail w.r.t. locking. For the swapext series, I shard that into multiple locks just to work around the lockdep complaints, and that's fugly. My final point is that this cycling increases fstests runtime by a good 5%, though fstests isn't a normal workflow. So. Given that this set/clear dance imposes continuous runtime costs on all the users, I want to remove xfs_clear_incompat_log_features. Log incompat bits get set once, and they never go away. This eliminates the need for the rwsem, all the extra incompat-clearing bits in the log code, and fixes the performance problems I see. Going forward, I'd make mkfs set the log incompat features during a fresh format if any of the currently-undefined feature bits are set, which means that they'll be enabled by default on any filesystem with directory parent pointers and/or metadata directories. I'd also add mkfs -l options so that sysadmins can turn it on at format time. We can discuss whether we want to allow people to set the log incompat features at runtime -- allowing it at least for recent filesystems (e.g. v5 + rmap) is easier for users, but only if we decide that we don't really care about the "recover with old Knoppix" surprise. If we decide against online upgrades, we /could/ at least allow upgrades via xfs_admin like we have for bigtime/inobtcnt. Or we could decide that new functionality requires a reformat. Thoughts? I vote for removing xfs_clear_incompat_log_features and letting people turn on log incompat features at runtime. --D [0] https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/commit/?h=djwong-wtf&id=fc6f493c540c520b24e28dfa77c23eea773fa20d [1] https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/tag/?h=atomic-file-updates_2024-01-30 [2] https://lore.kernel.org/linux-xfs/10237117-2149-d504-bbad-6ec28d420c9b@oracle.com/ ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: xfs_clear_incompat_log_features considered harmful? 2024-01-31 23:00 xfs_clear_incompat_log_features considered harmful? Darrick J. Wong @ 2024-02-01 4:33 ` Christoph Hellwig 2024-02-05 1:09 ` Dave Chinner 1 sibling, 0 replies; 8+ messages in thread From: Christoph Hellwig @ 2024-02-01 4:33 UTC (permalink / raw) To: Darrick J. Wong; +Cc: Christoph Hellwig, Dave Chinner, xfs On Wed, Jan 31, 2024 at 03:00:43PM -0800, Darrick J. Wong wrote: > Hi everyone, > > Christoph spied the xfs_swapext_can_use_without_log_assistance > function[0] in the atomic file updates patchset[1] and wondered why we > go through this inverted-bitmask dance to avoid setting the > XFS_SB_FEAT_INCOMPAT_LOG_SWAPEXT feature. > > (The same principles apply to xfs_attri_can_use_without_log_assistance > from the since-merged LARP series.) xfs_attri_can_use_without_log_assistance actually is new in your patch stack. Not that my biggest issue is that this check actually is broken. The point of the compat,incompat,log_incompat features is that they move away from a linear version model where a new version implies all previous version to one where we have a bit designating exactly what is supported. Optimizing away the need to set a bit just because other bits are sit brings back this linear versining in a sneaky, undocumented and dangerous way. > The reason for this dance is that xfs_add_incompat_log_feature is an > expensive operation -- it forces the log, pushes the AIL, and then if > nobody's beaten us to it, sets the feature bit and issues a synchronous > write of the primary superblock. That could be a one-time cost > amortized over the life of the filesystem, Yes. > Given that this set/clear dance imposes continuous runtime costs on all > the users, I want to remove xfs_clear_incompat_log_features. Log > incompat bits get set once, and they never go away. This eliminates the > need for the rwsem, all the extra incompat-clearing bits in the log > code, and fixes the performance problems I see. I mostly agree. I don't think we have to strictly say they don't go away, but makign them go away should be explicit and we should only do that if someone has a clear use case for it. > Going forward, I'd make mkfs set the log incompat features during a > fresh format if any of the currently-undefined feature bits are set, > which means that they'll be enabled by default on any filesystem with > directory parent pointers and/or metadata directories. I'd also add > mkfs -l options so that sysadmins can turn it on at format time. Yes. > We can discuss whether we want to allow people to set the log incompat > features at runtime -- allowing it at least for recent filesystems (e.g. > v5 + rmap) is easier for users, but only if we decide that we don't > really care about the "recover with old Knoppix" surprise. If we decide > against online upgrades, we /could/ at least allow upgrades via > xfs_admin like we have for bigtime/inobtcnt. Or we could decide that > new functionality requires a reformat. I think a runtime add (for recent enough file systems) would be really useful. But it should be an explicit opt-in and not a silent upgrade, which avoids any surprices with reocvery tools. > Thoughts? I vote for removing xfs_clear_incompat_log_features and > letting people turn on log incompat features at runtime. Please do! ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: xfs_clear_incompat_log_features considered harmful? 2024-01-31 23:00 xfs_clear_incompat_log_features considered harmful? Darrick J. Wong 2024-02-01 4:33 ` Christoph Hellwig @ 2024-02-05 1:09 ` Dave Chinner 2024-02-05 6:45 ` Christoph Hellwig 2024-02-06 5:30 ` Darrick J. Wong 1 sibling, 2 replies; 8+ messages in thread From: Dave Chinner @ 2024-02-05 1:09 UTC (permalink / raw) To: Darrick J. Wong; +Cc: Christoph Hellwig, xfs On Wed, Jan 31, 2024 at 03:00:43PM -0800, Darrick J. Wong wrote: > Hi everyone, > > Christoph spied the xfs_swapext_can_use_without_log_assistance > function[0] in the atomic file updates patchset[1] and wondered why we > go through this inverted-bitmask dance to avoid setting the > XFS_SB_FEAT_INCOMPAT_LOG_SWAPEXT feature. > > (The same principles apply to xfs_attri_can_use_without_log_assistance > from the since-merged LARP series.) > > The reason for this dance is that xfs_add_incompat_log_feature is an > expensive operation -- it forces the log, pushes the AIL, and then if > nobody's beaten us to it, sets the feature bit and issues a synchronous > write of the primary superblock. That could be a one-time cost > amortized over the life of the filesystem, but the log quiesce and cover > operations call xfs_clear_incompat_log_features to remove feature bits > opportunistically. On a moderately loaded filesystem this leads to us > cycling those bits on and off over and over, which hurts performance. > > Why do we clear the log incompat bits? Back in ~2020 I think Dave and I > had a conversation on IRC[2] about what the log incompat bits represent. > IIRC in that conversation we decided that the log incompat bits protect > unrecovered log items so that old kernels won't try to recover them and > barf. Since a clean log has no protected log items, we could clear the > bits at cover/quiesce time. At the time, I think we decided to go with > this idea because we didn't like the idea of reducing the span of > kernels that can recover a filesystem over the lifetime of that > filesystem. I don't think that was the issue - it was a practical concern that having unnecessary log incompat fields set would prevent the common practice of provisioning VMs with newer kernels than the host is running. The issue arises if the host tries to mount the guest VM image to configure the clone of a golden image prior to first start. If there are log incompat fields set in the golden image that was generated by a newer kernel/OS image builder then the provisioning host cannot mount the filesystem even though the log is clean and recovery is unnecessary to mount the filesystem. Hence on unmount we really want the journal contents based log incompat bits cleared because there is nothing incompatible in the log and so there is no reason to prevent older kernels from mounting the filesytsem. > > [ISTR Eric pointing out at some point that adding incompat feature bits > at runtime could confuse users who crash and try to recover with Ye Olde > Knoppix CD because there's now a log incompat bit set that wasn't there > at format time, but my memory is a bit hazy.] > > Christoph wondered why I don't just set the log incompat bits at mkfs > time, especially for filesystems that have some newer feature set (e.g. > parent pointers, metadir, rtgroups...) to avoid the runtime cost of > adding the feature flag. I don't bother with that because of the log > clearing behavior. He also noted that _can_use_without_log_assistance > is potentially dangerous if distro vendors backport features to old > kernels in a different order than they land in upstream. This is what incompat feature bits are for, not log incompat bits. We don't need log incompat bits for pp, metaddir, rtgroups, etc because older kernels won't even get to log recovery as they see incompat feature bits set when they first read the superblock. IOWs, a log incompat flag should only be necessary to prevent log recovery on older kernels if we change how something is logged without otherwise changing the on disk format of those items (e.g. LARP). There are no incompat feature bits that are set in these cases, so we need a log incompat bit to be set whilst the journal has those items in it.... > Another problem with this scheme is the l_incompat_users rwsem that we > use to protect a log cleaning operation from clearing a feature bit that > a frontend thread is trying to set -- this lock adds another way to fail > w.r.t. locking. For the swapext series, I shard that into multiple > locks just to work around the lockdep complaints, and that's fugly. We can avoid that simply by not clearing the incompat bit at cover time, and instead do it only at unmount. Then it only gets set once per mount, and only gets cleared when we are running single threaded on unmount. No need for locking at this point holding the superblock buffer locked will serialise feature bit addition.... > Going forward, I'd make mkfs set the log incompat features during a > fresh format if any of the currently-undefined feature bits are set, > which means that they'll be enabled by default on any filesystem with > directory parent pointers and/or metadata directories. I'd also add > mkfs -l options so that sysadmins can turn it on at format time. At this point they are just normal incompat feature bits. Why not just use those instead? i.e. Why do we need log incompat bits to be permanently sticky when we've already got incompat feature bits for that? Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: xfs_clear_incompat_log_features considered harmful? 2024-02-05 1:09 ` Dave Chinner @ 2024-02-05 6:45 ` Christoph Hellwig 2024-02-06 5:23 ` Darrick J. Wong 2024-02-06 5:30 ` Darrick J. Wong 1 sibling, 1 reply; 8+ messages in thread From: Christoph Hellwig @ 2024-02-05 6:45 UTC (permalink / raw) To: Dave Chinner; +Cc: Darrick J. Wong, Christoph Hellwig, xfs On Mon, Feb 05, 2024 at 12:09:23PM +1100, Dave Chinner wrote: > The issue arises if the host tries to mount the guest VM image to > configure the clone of a golden image prior to first start. If there > are log incompat fields set in the golden image that was generated > by a newer kernel/OS image builder then the provisioning > host cannot mount the filesystem even though the log is clean and > recovery is unnecessary to mount the filesystem. Well, even with the current code base in Darrick's queue a mount alone won't upgrade features, you need to do an explicit exchrange or online repair operation. And I think we should basically never do log or other format incompatible changes without an explicit user action. The only exception would be if the feature is so old that we finally want to get rid of the old implementation, in which case we can think of automatically doing the upgrade with a big fat warning. > Hence on unmount we really want the journal contents based log > incompat bits cleared because there is nothing incompatible in the > log and so there is no reason to prevent older kernels from > mounting the filesytsem. Doing the clearing at unmount time only (and maybe freeze if someone really cares) sounds perfectly fine. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: xfs_clear_incompat_log_features considered harmful? 2024-02-05 6:45 ` Christoph Hellwig @ 2024-02-06 5:23 ` Darrick J. Wong 2024-02-09 6:33 ` Christoph Hellwig 0 siblings, 1 reply; 8+ messages in thread From: Darrick J. Wong @ 2024-02-06 5:23 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Dave Chinner, xfs On Sun, Feb 04, 2024 at 10:45:26PM -0800, Christoph Hellwig wrote: > On Mon, Feb 05, 2024 at 12:09:23PM +1100, Dave Chinner wrote: > > The issue arises if the host tries to mount the guest VM image to > > configure the clone of a golden image prior to first start. If there > > are log incompat fields set in the golden image that was generated > > by a newer kernel/OS image builder then the provisioning > > host cannot mount the filesystem even though the log is clean and > > recovery is unnecessary to mount the filesystem. > > Well, even with the current code base in Darrick's queue a mount alone > won't upgrade features, you need to do an explicit exchrange or online > repair operation. And I think we should basically never do log or > other format incompatible changes without an explicit user action. Should I add a flags bit to the ioctls so that programs can force them on if the process has CAP_SYS_ADMIN? Or would you rather a mount option "-o allow_log_upgrades=1" so that's totally under control of whoever writes fstab? The first option probably turns into an "and now everyone sets this" thing; the second one clutters up the mount options. > The only exception would be if the feature is so old that we finally > want to get rid of the old implementation, in which case we can think > of automatically doing the upgrade with a big fat warning. Heh, we're probably going to have to do that with bigtime come 2035. > > Hence on unmount we really want the journal contents based log > > incompat bits cleared because there is nothing incompatible in the > > log and so there is no reason to prevent older kernels from > > mounting the filesytsem. > > Doing the clearing at unmount time only (and maybe freeze if someone > really cares) sounds perfectly fine. Ugh, I only want to do this at umount time if I can get away with it. Freeze is already hard enough to grok. --D ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: xfs_clear_incompat_log_features considered harmful? 2024-02-06 5:23 ` Darrick J. Wong @ 2024-02-09 6:33 ` Christoph Hellwig 0 siblings, 0 replies; 8+ messages in thread From: Christoph Hellwig @ 2024-02-09 6:33 UTC (permalink / raw) To: Darrick J. Wong; +Cc: Christoph Hellwig, Dave Chinner, xfs On Mon, Feb 05, 2024 at 09:23:29PM -0800, Darrick J. Wong wrote: > > > > Well, even with the current code base in Darrick's queue a mount alone > > won't upgrade features, you need to do an explicit exchrange or online > > repair operation. And I think we should basically never do log or > > other format incompatible changes without an explicit user action. > > Should I add a flags bit to the ioctls so that programs can force them > on if the process has CAP_SYS_ADMIN? Please not, that's just a horrible interface. > Or would you rather a mount option > "-o allow_log_upgrades=1" so that's totally under control of whoever > writes fstab? > > The first option probably turns into an "and now everyone sets this" > thing; the second one clutters up the mount options. Yes, that or a CAP_SYS_ADMIN ioctl on the fs. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: xfs_clear_incompat_log_features considered harmful? 2024-02-05 1:09 ` Dave Chinner 2024-02-05 6:45 ` Christoph Hellwig @ 2024-02-06 5:30 ` Darrick J. Wong 2024-02-07 0:14 ` Dave Chinner 1 sibling, 1 reply; 8+ messages in thread From: Darrick J. Wong @ 2024-02-06 5:30 UTC (permalink / raw) To: Dave Chinner; +Cc: Christoph Hellwig, xfs On Mon, Feb 05, 2024 at 12:09:23PM +1100, Dave Chinner wrote: > On Wed, Jan 31, 2024 at 03:00:43PM -0800, Darrick J. Wong wrote: > > Hi everyone, > > > > Christoph spied the xfs_swapext_can_use_without_log_assistance > > function[0] in the atomic file updates patchset[1] and wondered why we > > go through this inverted-bitmask dance to avoid setting the > > XFS_SB_FEAT_INCOMPAT_LOG_SWAPEXT feature. > > > > (The same principles apply to xfs_attri_can_use_without_log_assistance > > from the since-merged LARP series.) > > > > The reason for this dance is that xfs_add_incompat_log_feature is an > > expensive operation -- it forces the log, pushes the AIL, and then if > > nobody's beaten us to it, sets the feature bit and issues a synchronous > > write of the primary superblock. That could be a one-time cost > > amortized over the life of the filesystem, but the log quiesce and cover > > operations call xfs_clear_incompat_log_features to remove feature bits > > opportunistically. On a moderately loaded filesystem this leads to us > > cycling those bits on and off over and over, which hurts performance. > > > > Why do we clear the log incompat bits? Back in ~2020 I think Dave and I > > had a conversation on IRC[2] about what the log incompat bits represent. > > IIRC in that conversation we decided that the log incompat bits protect > > unrecovered log items so that old kernels won't try to recover them and > > barf. Since a clean log has no protected log items, we could clear the > > bits at cover/quiesce time. At the time, I think we decided to go with > > this idea because we didn't like the idea of reducing the span of > > kernels that can recover a filesystem over the lifetime of that > > filesystem. > > I don't think that was the issue - it was a practical concern that > having unnecessary log incompat fields set would prevent the common > practice of provisioning VMs with newer kernels than the host is > running. aha, thanks for refreshing my memory. > The issue arises if the host tries to mount the guest VM image to > configure the clone of a golden image prior to first start. If there > are log incompat fields set in the golden image that was generated > by a newer kernel/OS image builder then the provisioning > host cannot mount the filesystem even though the log is clean and > recovery is unnecessary to mount the filesystem. > > Hence on unmount we really want the journal contents based log > incompat bits cleared because there is nothing incompatible in the > log and so there is no reason to prevent older kernels from > mounting the filesytsem. <nod> Will it be a problem if we crash with a log incompat bit set but none of the actual log items that it protects in the ondisk log? > > [ISTR Eric pointing out at some point that adding incompat feature bits > > at runtime could confuse users who crash and try to recover with Ye Olde > > Knoppix CD because there's now a log incompat bit set that wasn't there > > at format time, but my memory is a bit hazy.] > > > > Christoph wondered why I don't just set the log incompat bits at mkfs > > time, especially for filesystems that have some newer feature set (e.g. > > parent pointers, metadir, rtgroups...) to avoid the runtime cost of > > adding the feature flag. I don't bother with that because of the log > > clearing behavior. He also noted that _can_use_without_log_assistance > > is potentially dangerous if distro vendors backport features to old > > kernels in a different order than they land in upstream. > > This is what incompat feature bits are for, not log incompat bits. > We don't need log incompat bits for pp, metaddir, rtgroups, etc > because older kernels won't even get to log recovery as they see > incompat feature bits set when they first read the superblock. > > IOWs, a log incompat flag should only be necessary to prevent log > recovery on older kernels if we change how something is logged > without otherwise changing the on disk format of those items (e.g. > LARP). There are no incompat feature bits that are set in these > cases, so we need a log incompat bit to be set whilst the journal > has those items in it.... Ok. > > Another problem with this scheme is the l_incompat_users rwsem that we > > use to protect a log cleaning operation from clearing a feature bit that > > a frontend thread is trying to set -- this lock adds another way to fail > > w.r.t. locking. For the swapext series, I shard that into multiple > > locks just to work around the lockdep complaints, and that's fugly. > > We can avoid that simply by not clearing the incompat bit at cover > time, and instead do it only at unmount. Then it only gets set once > per mount, and only gets cleared when we are running single threaded > on unmount. No need for locking at this point holding the superblock > buffer locked will serialise feature bit addition.... Ok. I can live with that -- clearing the log-incompat bits on a clean unmount, and leaving them set at cleaning time. I think this means l_incompat_users can go away too. > > Going forward, I'd make mkfs set the log incompat features during a > > fresh format if any of the currently-undefined feature bits are set, > > which means that they'll be enabled by default on any filesystem with > > directory parent pointers and/or metadata directories. I'd also add > > mkfs -l options so that sysadmins can turn it on at format time. > > At this point they are just normal incompat feature bits. Why not > just use those instead? i.e. Why do we need log incompat bits to be > permanently sticky when we've already got incompat feature bits for > that? Without the "clear log incompat on clean" behavior, I don't think we need anything I wrote about in that paragraph anymore either. So, if I'm reading you both correctly, I'll: 1. change the log incompat handling code to clear them on umount only; 2. add a mount option to allow admins to permit setting of log incompat flags; 3. leave the xfs_{swapext,attri}_can_use_without_log_assistance functions as they are in djwong-wtf so that new incompat filesystem features will eliminate the need for setting/clearing the log incompat flags Did I get that right? --D > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: xfs_clear_incompat_log_features considered harmful? 2024-02-06 5:30 ` Darrick J. Wong @ 2024-02-07 0:14 ` Dave Chinner 0 siblings, 0 replies; 8+ messages in thread From: Dave Chinner @ 2024-02-07 0:14 UTC (permalink / raw) To: Darrick J. Wong; +Cc: Christoph Hellwig, xfs On Mon, Feb 05, 2024 at 09:30:11PM -0800, Darrick J. Wong wrote: > > So, if I'm reading you both correctly, I'll: > > 1. change the log incompat handling code to clear them on umount only; > 2. add a mount option to allow admins to permit setting of log incompat > flags; I'd maybe turn this aroudn the other way - mount option to disallow it, because I think the number of use cases where we don't want flags to be set are much smaller than the number of use cases where it either doesn't matter or using the new features is desired. > 3. leave the xfs_{swapext,attri}_can_use_without_log_assistance > functions as they are in djwong-wtf so that new incompat filesystem > features will eliminate the need for setting/clearing the log > incompat flags > > Did I get that right? Sounds mostly fine to me. Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2024-02-09 6:33 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-01-31 23:00 xfs_clear_incompat_log_features considered harmful? Darrick J. Wong 2024-02-01 4:33 ` Christoph Hellwig 2024-02-05 1:09 ` Dave Chinner 2024-02-05 6:45 ` Christoph Hellwig 2024-02-06 5:23 ` Darrick J. Wong 2024-02-09 6:33 ` Christoph Hellwig 2024-02-06 5:30 ` Darrick J. Wong 2024-02-07 0:14 ` Dave Chinner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox