* Re: [LSF/MM/BPF TOPIC] Filesystem Suspend Resume
  [not found] ` <acae7a99f8acb0ebf408bb6fc82ab53fb687559c.camel@HansenPartnership.com>
@ 2025-03-21 5:23 ` Christoph Hellwig
  2025-03-21 12:34   ` James Bottomley
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2025-03-21 5:23 UTC
  To: James Bottomley
  Cc: linux-fsdevel, lsf-pc, Rafael J. Wysocki, Pavel Machek, Len Brown,
      linux-pm

On Thu, Mar 20, 2025 at 02:15:15PM -0400, James Bottomley wrote:
> On Thu, 2025-03-20 at 09:48 -0700, Christoph Hellwig wrote:
> [...]
> > We finally got hibernate to freeze file system on suspend,
>
> I was looking for this to see if I could possibly plug something in for
> pseudo filesystems that don't have backing devices. However, I can't
> find the path where suspend causes freeze (at least the bdev doesn't
> seem to register any power notifier like the scsi block device does),
> where is the code?

Looking again I can't find it either. On the internet I find a patch
adding it from 2006:

https://groups.google.com/g/fa.linux.kernel/c/dtxsNJ7ks58/m/mqU8SIAbvLgJ

But I couldn't see if it got applied or disappeared again somehow.
Adding the relevant maintainers.
* Re: [LSF/MM/BPF TOPIC] Filesystem Suspend Resume
  2025-03-21 5:23 ` [LSF/MM/BPF TOPIC] Filesystem Suspend Resume Christoph Hellwig
@ 2025-03-21 12:34   ` James Bottomley
  2025-03-21 17:00     ` James Bottomley
  0 siblings, 1 reply; 19+ messages in thread
From: James Bottomley @ 2025-03-21 12:34 UTC
  To: Christoph Hellwig
  Cc: linux-fsdevel, lsf-pc, Rafael J. Wysocki, Pavel Machek, Len Brown,
      linux-pm

On Thu, 2025-03-20 at 22:23 -0700, Christoph Hellwig wrote:
> On Thu, Mar 20, 2025 at 02:15:15PM -0400, James Bottomley wrote:
> > On Thu, 2025-03-20 at 09:48 -0700, Christoph Hellwig wrote:
> > [...]
> > > We finally got hibernate to freeze file system on suspend,
> >
> > I was looking for this to see if I could possibly plug something in
> > for pseudo filesystems that don't have backing devices. However, I
> > can't find the path where suspend causes freeze (at least the bdev
> > doesn't seem to register any power notifier like the scsi block
> > device does), where is the code?
>
> Looking again I can't find it either. On the internet I find a patch
> adding it from 2006:
>
> https://groups.google.com/g/fa.linux.kernel/c/dtxsNJ7ks58/m/mqU8SIAbvLgJ

Wow google has a terrible interface. This is the lore link:

https://lore.kernel.org/all/200611011200.18438.rjw@sisk.pl/

So the patch indicates where to put direct hooks in the power management
but it operates via bdev_freeze/thaw() which wouldn't work for pseudo
filesystems, but could be replaced by a direct hook into the vfs that
would iterate over superblocks calling freeze_super/thaw_super().

> But I couldn't see if it got applied or disappeared again somehow.
> Adding the relevant maintainers.

It looks like it got reposted about 5 years later as well (in the middle
of a thread about xfs hibernate lockups):

https://lore.kernel.org/all/201108032315.06012.rjw__14254.1066081778$1312406161$gmane$org@sisk.pl/

Then again 6 months later:

https://lore.kernel.org/all/201201281445.49377.rjw@sisk.pl/

Everything kept foundering on deadlock problems between filesystems
needing threads to shrink and complete writeout and the freezing of
those threads.

Let me digest all that and see if we have more hope this time around.

Regards,

James
* Re: [LSF/MM/BPF TOPIC] Filesystem Suspend Resume
  2025-03-21 12:34   ` James Bottomley
@ 2025-03-21 17:00     ` James Bottomley
  2025-03-21 17:17       ` Lukas Wunner
  2025-03-24 11:38       ` [Lsf-pc] " Jan Kara
  0 siblings, 2 replies; 19+ messages in thread
From: James Bottomley @ 2025-03-21 17:00 UTC
  To: Christoph Hellwig
  Cc: linux-fsdevel, lsf-pc, Rafael J. Wysocki, Pavel Machek, Len Brown,
      linux-pm

On Fri, 2025-03-21 at 08:34 -0400, James Bottomley wrote:
[...]
> Let me digest all that and see if we have more hope this time around.

OK, I think I've gone over it all. The biggest problem with
resurrecting the patch was bugs in ext3, which isn't a problem now.
Most of the suspend system has been rearchitected to separate
suspending user space processes from kernel ones. The sync it
currently does occurs before even user processes are frozen. I think
(as most of the original proposals did) that we just do freeze all
supers (using the reverse list) after user processes are frozen but
just before kernel threads are (this shouldn't perturb the image
allocation in hibernate, which was another source of bugs in xfs).

There's a final wrinkle in that if I plumb efivarfs into all this, it
needs to know whether it was a hibernate or suspend, but I can add that
as an extra freeze_holder flag.

This looked like such a tiny can of worms when I opened it; now it
seems to be a lot bigger on the inside than it was on the outside,
sigh.

Regards,

James
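A toy sketch of the ordering proposed above, written as plain Python rather
than kernel code. Everything in it is an assumption made for illustration:
ToySuper stands in for a superblock, FreezeHolder only loosely mirrors the
kernel's freeze_holder flags (the values and the HIBERNATE bit are invented),
and thawing in forward list order is a guess, since the message only
specifies the reverse order for freezing.

```python
from enum import IntFlag


class FreezeHolder(IntFlag):
    """Toy stand-in for freeze_holder flags; values are invented."""
    KERNEL = 1
    HIBERNATE = 2  # hypothetical extra flag: this freeze is for hibernate


class ToySuper:
    """Toy superblock that records freeze/thaw calls in a shared log."""

    def __init__(self, name, log):
        self.name = name
        self.frozen = False
        self.log = log

    def freeze(self, who):
        self.frozen = True
        self.log.append(("freeze", self.name, who))

    def thaw(self, who):
        self.frozen = False
        self.log.append(("thaw", self.name, who))


def freeze_all(supers, who):
    """Freeze in reverse list order, so later-mounted filesystems go first."""
    for sb in reversed(supers):
        sb.freeze(who)


def thaw_all(supers, who):
    """Thaw in list order (an assumption; the thread does not specify it)."""
    for sb in supers:
        sb.thaw(who)


log = []
supers = [ToySuper(name, log) for name in ("rootfs", "home", "efivarfs")]
who = FreezeHolder.KERNEL | FreezeHolder.HIBERNATE

freeze_all(supers, who)   # would run after user tasks are frozen
thaw_all(supers, who)     # would run on resume

for op, name, holder in log:
    kind = "hibernate" if holder & FreezeHolder.HIBERNATE else "suspend"
    print(op, name, kind)
```

Passing the holder flags through to both freeze and thaw is what would let a
filesystem such as efivarfs tell, on the resume path, which kind of sleep it
is coming back from.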
* Re: [LSF/MM/BPF TOPIC] Filesystem Suspend Resume
  2025-03-21 17:00     ` James Bottomley
@ 2025-03-21 17:17       ` Lukas Wunner
  2025-03-21 18:20         ` James Bottomley
  2025-03-24 11:38       ` [Lsf-pc] " Jan Kara
  1 sibling, 1 reply; 19+ messages in thread
From: Lukas Wunner @ 2025-03-21 17:17 UTC
  To: James Bottomley
  Cc: Christoph Hellwig, linux-fsdevel, lsf-pc, Rafael J. Wysocki,
      Pavel Machek, Len Brown, linux-pm

On Fri, Mar 21, 2025 at 01:00:24PM -0400, James Bottomley wrote:
> There's a final wrinkle in that if I plumb efivarfs into all this, it
> needs to know whether it was a hibernate or suspend, but I can add that
> as an extra freeze_holder flag.

Perhaps system_entering_hibernation() does what you need?

Thanks,

Lukas
* Re: [LSF/MM/BPF TOPIC] Filesystem Suspend Resume
  2025-03-21 17:17       ` Lukas Wunner
@ 2025-03-21 18:20         ` James Bottomley
  0 siblings, 0 replies; 19+ messages in thread
From: James Bottomley @ 2025-03-21 18:20 UTC
  To: Lukas Wunner
  Cc: Christoph Hellwig, linux-fsdevel, lsf-pc, Rafael J. Wysocki,
      Pavel Machek, Len Brown, linux-pm

On Fri, 2025-03-21 at 18:17 +0100, Lukas Wunner wrote:
> On Fri, Mar 21, 2025 at 01:00:24PM -0400, James Bottomley wrote:
> > There's a final wrinkle in that if I plumb efivarfs into all this,
> > it needs to know whether it was a hibernate or suspend, but I can
> > add that as an extra freeze_holder flag.
>
> Perhaps system_entering_hibernation() does what you need?

efivarfs needs to know on the resume path, unfortunately, which that
call doesn't seem to work for. Also filesystems would have to suspend
before devices ... i.e. before this is set even in the suspend path, but
I suppose it would be possible to design a flag that has the width of
scope required (which would be about the same amount of work as simply
adding the extra flags to communicate what the freeze or thaw are for).

Regards,

James
* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Filesystem Suspend Resume
  2025-03-21 17:00     ` James Bottomley
  2025-03-21 17:17       ` Lukas Wunner
@ 2025-03-24 11:38       ` Jan Kara
  2025-03-24 14:34         ` James Bottomley
  2025-03-24 20:50         ` Dave Chinner
  1 sibling, 2 replies; 19+ messages in thread
From: Jan Kara @ 2025-03-24 11:38 UTC
  To: James Bottomley
  Cc: Christoph Hellwig, linux-fsdevel, lsf-pc, Rafael J. Wysocki,
      Pavel Machek, Len Brown, linux-pm

On Fri 21-03-25 13:00:24, James Bottomley via Lsf-pc wrote:
> On Fri, 2025-03-21 at 08:34 -0400, James Bottomley wrote:
> [...]
> > Let me digest all that and see if we have more hope this time around.
>
> OK, I think I've gone over it all. The biggest problem with
> resurrecting the patch was bugs in ext3, which isn't a problem now.
> Most of the suspend system has been rearchitected to separate
> suspending user space processes from kernel ones. The sync it
> currently does occurs before even user processes are frozen. I think
> (as most of the original proposals did) that we just do freeze all
> supers (using the reverse list) after user processes are frozen but
> just before kernel threads are (this shouldn't perturb the image
> allocation in hibernate, which was another source of bugs in xfs).

So as far as my memory serves the fundamental problem with this approach
was FUSE - once userspace is frozen, you cannot write to FUSE filesystems
so filesystem freezing of FUSE would block if userspace is already
suspended. You may even have a setup like:

bdev <- fs <- FUSE filesystem <- loopback file <- loop device <- another fs

So you really have to be careful to freeze this stack without causing
deadlocks. So you need to be freezing userspace after filesystems are
frozen but then you have to deal with the fact that parts of your userspace
will be blocked in the kernel (trying to do some write) waiting for the
filesystem to thaw. But it might be tractable these days since I have a
vague recollection that system suspend is now able to gracefully handle
even tasks in uninterruptible sleep.

> There's a final wrinkle in that if I plumb efivarfs into all this, it
> needs to know whether it was a hibernate or suspend, but I can add that
> as an extra freeze_holder flag.
>
> This looked like such a tiny can of worms when I opened it; now it
> seems to be a lot bigger on the inside than it was on the outside,
> sigh.

Never underestimate the amount of worms in a can ;)

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
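The ordering constraint in the stacked setup above can be illustrated with a
small user-space model. This is only a sketch under simplifying assumptions:
the stack is reduced to its three filesystems, the function name
first_blocked is invented, and a freeze that would hang in a real kernel is
reported by returning the name of the filesystem that gets stuck.

```python
def first_blocked(stack, order, userspace_running):
    """Return the first filesystem in `order` whose freeze would block, else None.

    `stack` lists (name, is_fuse) pairs from bottom to top. Freezing a
    filesystem flushes through every layer beneath it, so those layers must
    still be unfrozen, and every FUSE layer on the path (itself included)
    needs a runnable userspace daemon to service the writes.
    """
    names = [name for name, _ in stack]
    is_fuse = dict(stack)
    frozen = set()
    for name in order:
        path = names[: names.index(name) + 1]  # this layer plus all below it
        if any(lower in frozen for lower in path[:-1]):
            return name  # would write into an already frozen lower layer
        if not userspace_running and any(is_fuse[n] for n in path):
            return name  # flush needs a FUSE daemon that cannot run
        frozen.add(name)
    return None


# bdev <- fs <- FUSE filesystem <- loopback file <- loop device <- another fs
stack = [("fs", False), ("FUSE filesystem", True), ("another fs", False)]
top_down = ["another fs", "FUSE filesystem", "fs"]
bottom_up = list(reversed(top_down))

print(first_blocked(stack, top_down, userspace_running=True))
print(first_blocked(stack, top_down, userspace_running=False))
print(first_blocked(stack, bottom_up, userspace_running=True))
```

In this model only the top-down order with userspace still running completes.
Freezing userspace first stalls at the topmost filesystem, because its flush
has to travel through the FUSE layer, and freezing bottom-up stalls as soon
as the FUSE filesystem tries to write into the frozen backing filesystem.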
* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Filesystem Suspend Resume
  2025-03-24 11:38       ` [Lsf-pc] " Jan Kara
@ 2025-03-24 14:34         ` James Bottomley
  2025-03-24 19:28           ` Jan Kara
  2025-03-24 20:56           ` Dave Chinner
  2025-03-24 20:50         ` Dave Chinner
  1 sibling, 2 replies; 19+ messages in thread
From: James Bottomley @ 2025-03-24 14:34 UTC
  To: Jan Kara
  Cc: Christoph Hellwig, linux-fsdevel, lsf-pc, Rafael J. Wysocki,
      Pavel Machek, Len Brown, linux-pm

On Mon, 2025-03-24 at 12:38 +0100, Jan Kara wrote:
> On Fri 21-03-25 13:00:24, James Bottomley via Lsf-pc wrote:
> > On Fri, 2025-03-21 at 08:34 -0400, James Bottomley wrote:
> > [...]
> > > Let me digest all that and see if we have more hope this time
> > > around.
> >
> > OK, I think I've gone over it all. The biggest problem with
> > resurrecting the patch was bugs in ext3, which isn't a problem now.
> > Most of the suspend system has been rearchitected to separate
> > suspending user space processes from kernel ones. The sync it
> > currently does occurs before even user processes are frozen. I
> > think (as most of the original proposals did) that we just do freeze all
> > supers (using the reverse list) after user processes are frozen but
> > just before kernel threads are (this shouldn't perturb the image
> > allocation in hibernate, which was another source of bugs in xfs).
>
> So as far as my memory serves the fundamental problem with this
> approach was FUSE - once userspace is frozen, you cannot write to
> FUSE filesystems so filesystem freezing of FUSE would block if
> userspace is already suspended. You may even have a setup like:
>
> bdev <- fs <- FUSE filesystem <- loopback file <- loop device <- another fs
>
> So you really have to be careful to freeze this stack without causing
> deadlocks.

Ah, so that explains why the sys_sync() sits in suspend resume *before*
freezing userspace ... that always appeared odd to me.

> So you need to be freezing userspace after filesystems are
> frozen but then you have to deal with the fact that parts of your
> userspace will be blocked in the kernel (trying to do some write)
> waiting for the filesystem to thaw. But it might be tractable these
> days since I have a vague recollection that system suspend is now
> able to gracefully handle even tasks in uninterruptible sleep.

There is another thing I thought about: we don't actually have to
freeze across the sleep. It might be possible simply to invoke
freeze/thaw where sys_sync() is now done to get a better on stable
storage image? That should have fewer deadlock issues.

> > There's a final wrinkle in that if I plumb efivarfs into all this,
> > it needs to know whether it was a hibernate or suspend, but I can
> > add that as an extra freeze_holder flag.
> >
> > This looked like such a tiny can of worms when I opened it; now it
> > seems to be a lot bigger on the inside than it was on the outside,
> > sigh.
>
> Never underestimate the amount of worms in a can ;)

Tell me about it ...

Regards,

James
* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Filesystem Suspend Resume
  2025-03-24 14:34         ` James Bottomley
@ 2025-03-24 19:28           ` Jan Kara
  2025-03-27 14:55             ` Eric Sandeen
  2025-03-24 20:56           ` Dave Chinner
  1 sibling, 1 reply; 19+ messages in thread
From: Jan Kara @ 2025-03-24 19:28 UTC
  To: James Bottomley
  Cc: Jan Kara, Christoph Hellwig, linux-fsdevel, lsf-pc,
      Rafael J. Wysocki, Pavel Machek, Len Brown, linux-pm

On Mon 24-03-25 10:34:56, James Bottomley wrote:
> On Mon, 2025-03-24 at 12:38 +0100, Jan Kara wrote:
> > On Fri 21-03-25 13:00:24, James Bottomley via Lsf-pc wrote:
> > > On Fri, 2025-03-21 at 08:34 -0400, James Bottomley wrote:
> > > [...]
> > > > Let me digest all that and see if we have more hope this time
> > > > around.
> > >
> > > OK, I think I've gone over it all. The biggest problem with
> > > resurrecting the patch was bugs in ext3, which isn't a problem now.
> > > Most of the suspend system has been rearchitected to separate
> > > suspending user space processes from kernel ones. The sync it
> > > currently does occurs before even user processes are frozen. I
> > > think (as most of the original proposals did) that we just do freeze all
> > > supers (using the reverse list) after user processes are frozen but
> > > just before kernel threads are (this shouldn't perturb the image
> > > allocation in hibernate, which was another source of bugs in xfs).
> >
> > So as far as my memory serves the fundamental problem with this
> > approach was FUSE - once userspace is frozen, you cannot write to
> > FUSE filesystems so filesystem freezing of FUSE would block if
> > userspace is already suspended. You may even have a setup like:
> >
> > bdev <- fs <- FUSE filesystem <- loopback file <- loop device <- another fs
> >
> > So you really have to be careful to freeze this stack without causing
> > deadlocks.
>
> Ah, so that explains why the sys_sync() sits in suspend resume *before*
> freezing userspace ... that always appeared odd to me.
>
> > So you need to be freezing userspace after filesystems are
> > frozen but then you have to deal with the fact that parts of your
> > userspace will be blocked in the kernel (trying to do some write)
> > waiting for the filesystem to thaw. But it might be tractable these
> > days since I have a vague recollection that system suspend is now
> > able to gracefully handle even tasks in uninterruptible sleep.
>
> There is another thing I thought about: we don't actually have to
> freeze across the sleep. It might be possible simply to invoke
> freeze/thaw where sys_sync() is now done to get a better on stable
> storage image? That should have fewer deadlock issues.

Well, there's not going to be a huge difference between doing sync(2) and
doing freeze+thaw for each filesystem. After you thaw the filesystem
drivers usually mark that the fs is in inconsistent state and that triggers
journal replay / fsck on next mount.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Filesystem Suspend Resume
  2025-03-24 19:28           ` Jan Kara
@ 2025-03-27 14:55             ` Eric Sandeen
  2025-03-27 17:30               ` Jan Kara
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Sandeen @ 2025-03-27 14:55 UTC
  To: Jan Kara, James Bottomley
  Cc: Christoph Hellwig, linux-fsdevel, lsf-pc, Rafael J. Wysocki,
      Pavel Machek, Len Brown, linux-pm

On 3/24/25 2:28 PM, Jan Kara wrote:
> On Mon 24-03-25 10:34:56, James Bottomley wrote:
>> On Mon, 2025-03-24 at 12:38 +0100, Jan Kara wrote:
>>> On Fri 21-03-25 13:00:24, James Bottomley via Lsf-pc wrote:
>>>> On Fri, 2025-03-21 at 08:34 -0400, James Bottomley wrote:
>>>> [...]
>>>>> Let me digest all that and see if we have more hope this time
>>>>> around.
>>>>
>>>> OK, I think I've gone over it all. The biggest problem with
>>>> resurrecting the patch was bugs in ext3, which isn't a problem now.
>>>> Most of the suspend system has been rearchitected to separate
>>>> suspending user space processes from kernel ones. The sync it
>>>> currently does occurs before even user processes are frozen. I
>>>> think (as most of the original proposals did) that we just do freeze all
>>>> supers (using the reverse list) after user processes are frozen but
>>>> just before kernel threads are (this shouldn't perturb the image
>>>> allocation in hibernate, which was another source of bugs in xfs).
>>>
>>> So as far as my memory serves the fundamental problem with this
>>> approach was FUSE - once userspace is frozen, you cannot write to
>>> FUSE filesystems so filesystem freezing of FUSE would block if
>>> userspace is already suspended. You may even have a setup like:
>>>
>>> bdev <- fs <- FUSE filesystem <- loopback file <- loop device <- another fs
>>>
>>> So you really have to be careful to freeze this stack without causing
>>> deadlocks.
>>
>> Ah, so that explains why the sys_sync() sits in suspend resume *before*
>> freezing userspace ... that always appeared odd to me.
>>
>>> So you need to be freezing userspace after filesystems are
>>> frozen but then you have to deal with the fact that parts of your
>>> userspace will be blocked in the kernel (trying to do some write)
>>> waiting for the filesystem to thaw. But it might be tractable these
>>> days since I have a vague recollection that system suspend is now
>>> able to gracefully handle even tasks in uninterruptible sleep.
>>
>> There is another thing I thought about: we don't actually have to
>> freeze across the sleep. It might be possible simply to invoke
>> freeze/thaw where sys_sync() is now done to get a better on stable
>> storage image? That should have fewer deadlock issues.
>
> Well, there's not going to be a huge difference between doing sync(2) and
> doing freeze+thaw for each filesystem. After you thaw the filesystem
> drivers usually mark that the fs is in inconsistent state and that triggers
> journal replay / fsck on next mount.

For XFS, IIRC we only do that (mark the log dirty) so that we will process
orphan inodes if we crash while frozen, which today happens only during log
replay. I tried to remove that behavior long ago but didn't get very far.
(Since then maybe we have grown other reasons to mark dirty, not sure.)

https://lore.kernel.org/linux-xfs/83696ce6-4054-0e77-b4b8-e82a1a9fbbc3@redhat.com/

Does ext4 mark it dirty too? I actually thought it left a clean journal when
freezing.

Thanks,
-Eric

> 								Honza
* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Filesystem Suspend Resume
  2025-03-27 14:55             ` Eric Sandeen
@ 2025-03-27 17:30               ` Jan Kara
  0 siblings, 0 replies; 19+ messages in thread
From: Jan Kara @ 2025-03-27 17:30 UTC
  To: Eric Sandeen
  Cc: Jan Kara, James Bottomley, Christoph Hellwig, linux-fsdevel,
      lsf-pc, Rafael J. Wysocki, Pavel Machek, Len Brown, linux-pm

On Thu 27-03-25 09:55:21, Eric Sandeen wrote:
> On 3/24/25 2:28 PM, Jan Kara wrote:
> > On Mon 24-03-25 10:34:56, James Bottomley wrote:
> >> On Mon, 2025-03-24 at 12:38 +0100, Jan Kara wrote:
> >>> On Fri 21-03-25 13:00:24, James Bottomley via Lsf-pc wrote:
> >>>> On Fri, 2025-03-21 at 08:34 -0400, James Bottomley wrote:
> >>>> [...]
> >>>>> Let me digest all that and see if we have more hope this time
> >>>>> around.
> >>>>
> >>>> OK, I think I've gone over it all. The biggest problem with
> >>>> resurrecting the patch was bugs in ext3, which isn't a problem now.
> >>>> Most of the suspend system has been rearchitected to separate
> >>>> suspending user space processes from kernel ones. The sync it
> >>>> currently does occurs before even user processes are frozen. I
> >>>> think (as most of the original proposals did) that we just do freeze all
> >>>> supers (using the reverse list) after user processes are frozen but
> >>>> just before kernel threads are (this shouldn't perturb the image
> >>>> allocation in hibernate, which was another source of bugs in xfs).
> >>>
> >>> So as far as my memory serves the fundamental problem with this
> >>> approach was FUSE - once userspace is frozen, you cannot write to
> >>> FUSE filesystems so filesystem freezing of FUSE would block if
> >>> userspace is already suspended. You may even have a setup like:
> >>>
> >>> bdev <- fs <- FUSE filesystem <- loopback file <- loop device <- another fs
> >>>
> >>> So you really have to be careful to freeze this stack without causing
> >>> deadlocks.
> >>
> >> Ah, so that explains why the sys_sync() sits in suspend resume *before*
> >> freezing userspace ... that always appeared odd to me.
> >>
> >>> So you need to be freezing userspace after filesystems are
> >>> frozen but then you have to deal with the fact that parts of your
> >>> userspace will be blocked in the kernel (trying to do some write)
> >>> waiting for the filesystem to thaw. But it might be tractable these
> >>> days since I have a vague recollection that system suspend is now
> >>> able to gracefully handle even tasks in uninterruptible sleep.
> >>
> >> There is another thing I thought about: we don't actually have to
> >> freeze across the sleep. It might be possible simply to invoke
> >> freeze/thaw where sys_sync() is now done to get a better on stable
> >> storage image? That should have fewer deadlock issues.
> >
> > Well, there's not going to be a huge difference between doing sync(2) and
> > doing freeze+thaw for each filesystem. After you thaw the filesystem
> > drivers usually mark that the fs is in inconsistent state and that triggers
> > journal replay / fsck on next mount.
>
> For XFS, IIRC we only do that (mark the log dirty) so that we will process
> orphan inodes if we crash while frozen, which today happens only during log
> replay. I tried to remove that behavior long ago but didn't get very far.
> (Since then maybe we have grown other reasons to mark dirty, not sure.)
>
> https://lore.kernel.org/linux-xfs/83696ce6-4054-0e77-b4b8-e82a1a9fbbc3@redhat.com/
>
> Does ext4 mark it dirty too? I actually thought it left a clean journal when
> freezing.

The journal is completely checkpointed (thus emptied) while freezing but
thawing marks the superblock as requiring replay again and also background
filesystem threads (like lazy init, periodic superblock stats update, etc.)
can start creating transactions in the journal.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Filesystem Suspend Resume
  2025-03-24 14:34         ` James Bottomley
  2025-03-24 19:28           ` Jan Kara
@ 2025-03-24 20:56           ` Dave Chinner
  1 sibling, 0 replies; 19+ messages in thread
From: Dave Chinner @ 2025-03-24 20:56 UTC
  To: James Bottomley
  Cc: Jan Kara, Christoph Hellwig, linux-fsdevel, lsf-pc,
      Rafael J. Wysocki, Pavel Machek, Len Brown, linux-pm

On Mon, Mar 24, 2025 at 10:34:56AM -0400, James Bottomley wrote:
> On Mon, 2025-03-24 at 12:38 +0100, Jan Kara wrote:
> > On Fri 21-03-25 13:00:24, James Bottomley via Lsf-pc wrote:
> > > On Fri, 2025-03-21 at 08:34 -0400, James Bottomley wrote:
> > > [...]
> > > > Let me digest all that and see if we have more hope this time
> > > > around.
> > >
> > > OK, I think I've gone over it all. The biggest problem with
> > > resurrecting the patch was bugs in ext3, which isn't a problem now.
> > > Most of the suspend system has been rearchitected to separate
> > > suspending user space processes from kernel ones. The sync it
> > > currently does occurs before even user processes are frozen. I
> > > think (as most of the original proposals did) that we just do freeze all
> > > supers (using the reverse list) after user processes are frozen but
> > > just before kernel threads are (this shouldn't perturb the image
> > > allocation in hibernate, which was another source of bugs in xfs).
> >
> > So as far as my memory serves the fundamental problem with this
> > approach was FUSE - once userspace is frozen, you cannot write to
> > FUSE filesystems so filesystem freezing of FUSE would block if
> > userspace is already suspended. You may even have a setup like:
> >
> > bdev <- fs <- FUSE filesystem <- loopback file <- loop device <- another fs
> >
> > So you really have to be careful to freeze this stack without causing
> > deadlocks.
>
> Ah, so that explains why the sys_sync() sits in suspend resume *before*
> freezing userspace ... that always appeared odd to me.
>
> > So you need to be freezing userspace after filesystems are
> > frozen but then you have to deal with the fact that parts of your
> > userspace will be blocked in the kernel (trying to do some write)
> > waiting for the filesystem to thaw. But it might be tractable these
> > days since I have a vague recollection that system suspend is now
> > able to gracefully handle even tasks in uninterruptible sleep.
>
> There is another thing I thought about: we don't actually have to
> freeze across the sleep.

Yes we do. Filesystems have background workers that do stuff even
when the filesystem has been synced, and this can race with hibernate
shutting stuff down. This is the whole reason we needed to move to
filesystem freezing - to tell the filesystems to *temporarily stop
dirtying* new objects.

> It might be possible simply to invoke
> freeze/thaw where sys_sync() is now done to get a better on stable
> storage image? That should have fewer deadlock issues.

A freeze/thaw cycle still allows the filesystems to dirty objects in
the background whilst hibernate continues onwards assuming
filesystems are all clean.

It took a long time to get all those worms in the can, and we really
don't want to let them back out....

-Dave.
-- 
Dave Chinner
david@fromorbit.com
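The difference between a sync, a freeze/thaw cycle, and a freeze that is
held across the sleep can be shown with a toy model. This is a user-space
sketch, not kernel behaviour: ToyFs, its dirty counter and
dirty_at_snapshot are invented for illustration, and the background worker
simply skips its work while frozen where a real one would block.

```python
class ToyFs:
    """Toy filesystem with a dirty-object counter and a frozen flag."""

    def __init__(self):
        self.dirty = 0
        self.frozen = False

    def background_work(self):
        """Stand-in for lazy init, stats updates and similar workers."""
        if not self.frozen:
            self.dirty += 1

    def sync(self):
        self.dirty = 0

    def freeze(self):
        self.sync()          # freezing writes everything back first
        self.frozen = True

    def thaw(self):
        self.frozen = False


def sync_only(fs):
    fs.sync()


def freeze_then_thaw(fs):
    fs.freeze()
    fs.thaw()


def freeze_and_hold(fs):
    fs.freeze()


def dirty_at_snapshot(prepare):
    """Dirty objects present when the hibernate image would be taken."""
    fs = ToyFs()
    fs.dirty = 5
    prepare(fs)
    fs.background_work()     # a worker runs before the image is created
    return fs.dirty


for prepare in (sync_only, freeze_then_thaw, freeze_and_hold):
    print(prepare.__name__, dirty_at_snapshot(prepare))
```

In the model, only the held freeze leaves nothing dirty at snapshot time;
sync alone and freeze followed by thaw both let the background worker dirty
an object again before the image is taken.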
* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Filesystem Suspend Resume
  2025-03-24 11:38       ` [Lsf-pc] " Jan Kara
  2025-03-24 14:34         ` James Bottomley
@ 2025-03-24 20:50         ` Dave Chinner
  2025-03-24 21:02           ` James Bottomley
  1 sibling, 1 reply; 19+ messages in thread
From: Dave Chinner @ 2025-03-24 20:50 UTC
  To: Jan Kara
  Cc: James Bottomley, Christoph Hellwig, linux-fsdevel, lsf-pc,
      Rafael J. Wysocki, Pavel Machek, Len Brown, linux-pm

On Mon, Mar 24, 2025 at 12:38:20PM +0100, Jan Kara wrote:
> On Fri 21-03-25 13:00:24, James Bottomley via Lsf-pc wrote:
> > On Fri, 2025-03-21 at 08:34 -0400, James Bottomley wrote:
> > [...]
> > > Let me digest all that and see if we have more hope this time around.
> >
> > OK, I think I've gone over it all. The biggest problem with
> > resurrecting the patch was bugs in ext3, which isn't a problem now.
> > Most of the suspend system has been rearchitected to separate
> > suspending user space processes from kernel ones. The sync it
> > currently does occurs before even user processes are frozen. I think
> > (as most of the original proposals did) that we just do freeze all
> > supers (using the reverse list) after user processes are frozen but
> > just before kernel threads are (this shouldn't perturb the image
> > allocation in hibernate, which was another source of bugs in xfs).
>
> So as far as my memory serves the fundamental problem with this approach
> was FUSE - once userspace is frozen, you cannot write to FUSE filesystems
> so filesystem freezing of FUSE would block if userspace is already
> suspended. You may even have a setup like:
>
> bdev <- fs <- FUSE filesystem <- loopback file <- loop device <- another fs
>
> So you really have to be careful to freeze this stack without causing
> deadlocks. So you need to be freezing userspace after filesystems are
> frozen but then you have to deal with the fact that parts of your userspace
> will be blocked in the kernel (trying to do some write) waiting for the
> filesystem to thaw. But it might be tractable these days since I have a
> vague recollection that system suspend is now able to gracefully handle
> even tasks in uninterruptible sleep.

I thought we largely solved this problem with userspace flusher
threads being able to call prctl(PR_IO_FLUSHER) to tell the kernel
they are part of the IO stack and so need to be considered
special from the POV of memory allocation and write (dirty page)
throttling.

Maybe hibernate needs to be aware of these userspace flusher
tasks and only suspend them after filesystems are frozen instead
of when userspace is initially halted?

-Dave.
-- 
Dave Chinner
david@fromorbit.com
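For readers who have not used the interface mentioned above, a userspace
daemon can mark itself roughly as follows. This is a hedged sketch: prctl(2)
documents the option as PR_SET_IO_FLUSHER (Linux 5.6 and later, requires
CAP_SYS_RESOURCE); the helper name set_io_flusher and its return convention
are invented here, and the call is made through ctypes only to keep the
example self-contained.

```python
import ctypes
import errno
import sys

PR_SET_IO_FLUSHER = 57  # from <linux/prctl.h>, available since Linux 5.6


def set_io_flusher(enable):
    """Ask the kernel to mark (or unmark) the caller as an IO flusher.

    Returns (ok, err). ok is True on success; otherwise err holds an errno
    value (EPERM is expected without CAP_SYS_RESOURCE).
    """
    if not sys.platform.startswith("linux"):
        return False, errno.ENOSYS
    try:
        prctl = ctypes.CDLL(None, use_errno=True).prctl
    except (OSError, AttributeError):
        return False, errno.ENOSYS
    prctl.restype = ctypes.c_int
    zero = ctypes.c_ulong(0)  # the unused arguments must be zero
    flag = ctypes.c_ulong(1 if enable else 0)
    rc = prctl(PR_SET_IO_FLUSHER, flag, zero, zero, zero)
    if rc == 0:
        return True, 0
    return False, ctypes.get_errno() or errno.EINVAL


ok, err = set_io_flusher(True)
if ok:
    print("marked as IO flusher")
    set_io_flusher(False)  # undo, this is only a demonstration
else:
    print("prctl failed:", errno.errorcode.get(err, err))
```

Run without privileges, the call is expected to fail with EPERM, which the
sketch reports rather than treats as fatal.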
* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Filesystem Suspend Resume
  2025-03-24 20:50         ` Dave Chinner
@ 2025-03-24 21:02           ` James Bottomley
  2025-03-24 21:07             ` Dave Chinner
  0 siblings, 1 reply; 19+ messages in thread
From: James Bottomley @ 2025-03-24 21:02 UTC
  To: Dave Chinner, Jan Kara
  Cc: Christoph Hellwig, linux-fsdevel, lsf-pc, Rafael J. Wysocki,
      Pavel Machek, Len Brown, linux-pm

On Tue, 2025-03-25 at 07:50 +1100, Dave Chinner wrote:
> On Mon, Mar 24, 2025 at 12:38:20PM +0100, Jan Kara wrote:
> > On Fri 21-03-25 13:00:24, James Bottomley via Lsf-pc wrote:
> > > On Fri, 2025-03-21 at 08:34 -0400, James Bottomley wrote:
> > > [...]
> > > > Let me digest all that and see if we have more hope this time
> > > > around.
> > >
> > > OK, I think I've gone over it all. The biggest problem with
> > > resurrecting the patch was bugs in ext3, which isn't a problem
> > > now. Most of the suspend system has been rearchitected to
> > > separate suspending user space processes from kernel ones. The
> > > sync it currently does occurs before even user processes are
> > > frozen. I think (as most of the original proposals did) that we
> > > just do freeze all supers (using the reverse list) after user
> > > processes are frozen but just before kernel threads are (this
> > > shouldn't perturb the image allocation in hibernate, which was
> > > another source of bugs in xfs).
> >
> > So as far as my memory serves the fundamental problem with this
> > approach was FUSE - once userspace is frozen, you cannot write to
> > FUSE filesystems so filesystem freezing of FUSE would block if
> > userspace is already suspended. You may even have a setup like:
> >
> > bdev <- fs <- FUSE filesystem <- loopback file <- loop device <- another fs
> >
> > So you really have to be careful to freeze this stack without
> > causing deadlocks. So you need to be freezing userspace after
> > filesystems are frozen but then you have to deal with the fact that
> > parts of your userspace will be blocked in the kernel (trying to do
> > some write) waiting for the filesystem to thaw. But it might be
> > tractable these days since I have a vague recollection that system
> > suspend is now able to gracefully handle even tasks in
> > uninterruptible sleep.
>
> I thought we largely solved this problem with userspace flusher
> threads being able to call prctl(PR_IO_FLUSHER) to tell the kernel
> they are part of the IO stack and so need to be considered
> special from the POV of memory allocation and write (dirty page)
> throttling.
>
> Maybe hibernate needs to be aware of these userspace flusher
> tasks and only suspend them after filesystems are frozen instead
> of when userspace is initially halted?

I can confirm it's not. Its check for kernel thread is in
kernel/power/process.c:try_to_freeze_tasks(). It really only uses the
PF_KTHREAD flag in differentiating between user and kernel threads.

But what I heard in the session was that we should freeze filesystems
before any tasks because that means tasks touching the frozen fs freeze
themselves.

Regards,

James
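The two-pass selection described above, and the variant Dave floated of
holding flusher tasks back, can be modelled in a few lines. This is a
user-space illustration only: the flag bits, task names and the
defer_flushers switch are invented, and the real logic lives in
try_to_freeze_tasks().

```python
KTHREAD = 0x1      # toy bit standing in for PF_KTHREAD
IO_FLUSHER = 0x2   # toy bit for a task that asked for IO flusher treatment


def tasks_to_freeze(tasks, user_only, defer_flushers=False):
    """Pick the tasks one freezer pass would act on.

    With user_only=True, kernel threads are skipped, which matches the only
    distinction the current code is described as making. defer_flushers is
    a hypothetical extension that also leaves IO flusher tasks running
    until a later pass.
    """
    picked = []
    for name, flags in tasks:
        if user_only and flags & KTHREAD:
            continue
        if user_only and defer_flushers and flags & IO_FLUSHER:
            continue
        picked.append(name)
    return picked


tasks = [("bash", 0), ("fuse-daemon", IO_FLUSHER), ("kworker", KTHREAD)]

print(tasks_to_freeze(tasks, user_only=True))
print(tasks_to_freeze(tasks, user_only=True, defer_flushers=True))
print(tasks_to_freeze(tasks, user_only=False))
```

With only the kernel-thread check, the FUSE daemon is frozen together with
every other user task in the first pass; the hypothetical defer_flushers
variant keeps it runnable until filesystems have been frozen.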
* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Filesystem Suspend Resume
  2025-03-24 21:02 ` James Bottomley
@ 2025-03-24 21:07 ` Dave Chinner
  2025-03-25 13:42 ` Jan Kara
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Chinner @ 2025-03-24 21:07 UTC
To: James Bottomley
Cc: Jan Kara, Christoph Hellwig, linux-fsdevel, lsf-pc,
    Rafael J. Wysocki, Pavel Machek, Len Brown, linux-pm

On Mon, Mar 24, 2025 at 05:02:54PM -0400, James Bottomley wrote:
> On Tue, 2025-03-25 at 07:50 +1100, Dave Chinner wrote:
> > On Mon, Mar 24, 2025 at 12:38:20PM +0100, Jan Kara wrote:
> > > On Fri 21-03-25 13:00:24, James Bottomley via Lsf-pc wrote:
> > > > On Fri, 2025-03-21 at 08:34 -0400, James Bottomley wrote:
> > > > [...]
> > > > > Let me digest all that and see if we have more hope this time
> > > > > around.
> > > >
> > > > OK, I think I've gone over it all. The biggest problem with
> > > > resurrecting the patch was bugs in ext3, which isn't a problem
> > > > now. Most of the suspend system has been rearchitected to
> > > > separate suspending user space processes from kernel ones. The
> > > > sync it currently does occurs before even user processes are
> > > > frozen. I think (as most of the original proposals did) that we
> > > > just do freeze all supers (using the reverse list) after user
> > > > processes are frozen but just before kernel threads are (this
> > > > shouldn't perturb the image allocation in hibernate, which was
> > > > another source of bugs in xfs).
> > >
> > > So as far as my memory serves the fundamental problem with this
> > > approach was FUSE - once userspace is frozen, you cannot write to
> > > FUSE filesystems so filesystem freezing of FUSE would block if
> > > userspace is already suspended. You may even have a setup like:
> > >
> > > bdev <- fs <- FUSE filesystem <- loopback file <- loop device <-
> > > another fs
> > >
> > > So you really have to be careful to freeze this stack without
> > > causing deadlocks. So you need to be freezing userspace after
> > > filesystems are frozen but then you have to deal with the fact that
> > > parts of your userspace will be blocked in the kernel (trying to do
> > > some write) waiting for the filesystem to thaw. But it might be
> > > tractable these days since I have a vague recollection that system
> > > suspend is now able to gracefully handle even tasks in
> > > uninterruptible sleep.
> >
> > I thought we largely solved this problem with userspace flusher
> > threads being able to call prctl(PR_IO_FLUSHER) to tell the kernel
> > they are part of the IO stack and so need to be considered
> > special from the POV of memory allocation and write (dirty page)
> > throttling.
> >
> > Maybe hibernate needs to be aware of these userspace flusher
> > tasks and only suspend them after filesystems are frozen instead
> > of when userspace is initially halted?
>
> I can confirm it's not. Its check for kernel thread is in
> kernel/power/process.c:try_to_freeze_tasks(). It really only uses the
> PF_KTHREAD flag in differentiating between user and kernel threads.
>
> But what I heard in the session was that we should freeze filesystems
> before any tasks because that means tasks touching the frozen fs freeze
> themselves.

But that's exactly the behaviour that leads to FUSE based deadlocks,
is it not? i.e. freeze the backing fs, then try to freeze the FUSE
filesystem and the freeze blocks forever trying to write to the
frozen backing fs....

What am I missing here?

-Dave
--
Dave Chinner
david@fromorbit.com
* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Filesystem Suspend Resume
  2025-03-24 21:07 ` Dave Chinner
@ 2025-03-25 13:42 ` Jan Kara
  2025-03-26 2:36 ` James Bottomley
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Kara @ 2025-03-25 13:42 UTC
To: Dave Chinner
Cc: James Bottomley, Jan Kara, Christoph Hellwig, linux-fsdevel, lsf-pc,
    Rafael J. Wysocki, Pavel Machek, Len Brown, linux-pm

On Tue 25-03-25 08:07:52, Dave Chinner wrote:
> On Mon, Mar 24, 2025 at 05:02:54PM -0400, James Bottomley wrote:
> > On Tue, 2025-03-25 at 07:50 +1100, Dave Chinner wrote:
> > > On Mon, Mar 24, 2025 at 12:38:20PM +0100, Jan Kara wrote:
> > > > On Fri 21-03-25 13:00:24, James Bottomley via Lsf-pc wrote:
> > > > > On Fri, 2025-03-21 at 08:34 -0400, James Bottomley wrote:
> > > > > [...]
> > > > > > Let me digest all that and see if we have more hope this time
> > > > > > around.
> > > > >
> > > > > OK, I think I've gone over it all. The biggest problem with
> > > > > resurrecting the patch was bugs in ext3, which isn't a problem
> > > > > now. Most of the suspend system has been rearchitected to
> > > > > separate suspending user space processes from kernel ones. The
> > > > > sync it currently does occurs before even user processes are
> > > > > frozen. I think (as most of the original proposals did) that we
> > > > > just do freeze all supers (using the reverse list) after user
> > > > > processes are frozen but just before kernel threads are (this
> > > > > shouldn't perturb the image allocation in hibernate, which was
> > > > > another source of bugs in xfs).
> > > >
> > > > So as far as my memory serves the fundamental problem with this
> > > > approach was FUSE - once userspace is frozen, you cannot write to
> > > > FUSE filesystems so filesystem freezing of FUSE would block if
> > > > userspace is already suspended. You may even have a setup like:
> > > >
> > > > bdev <- fs <- FUSE filesystem <- loopback file <- loop device <-
> > > > another fs
> > > >
> > > > So you really have to be careful to freeze this stack without
> > > > causing deadlocks. So you need to be freezing userspace after
> > > > filesystems are frozen but then you have to deal with the fact that
> > > > parts of your userspace will be blocked in the kernel (trying to do
> > > > some write) waiting for the filesystem to thaw. But it might be
> > > > tractable these days since I have a vague recollection that system
> > > > suspend is now able to gracefully handle even tasks in
> > > > uninterruptible sleep.
> > >
> > > I thought we largely solved this problem with userspace flusher
> > > threads being able to call prctl(PR_IO_FLUSHER) to tell the kernel
> > > they are part of the IO stack and so need to be considered
> > > special from the POV of memory allocation and write (dirty page)
> > > throttling.
> > >
> > > Maybe hibernate needs to be aware of these userspace flusher
> > > tasks and only suspend them after filesystems are frozen instead
> > > of when userspace is initially halted?
> >
> > I can confirm it's not. Its check for kernel thread is in
> > kernel/power/process.c:try_to_freeze_tasks(). It really only uses the
> > PF_KTHREAD flag in differentiating between user and kernel threads.
> >
> > But what I heard in the session was that we should freeze filesystems
> > before any tasks because that means tasks touching the frozen fs freeze
> > themselves.
>
> But that's exactly the behaviour that leads to FUSE based deadlocks,
> is it not? i.e. freeze the backing fs, then try to freeze the FUSE
> filesystem and the freeze blocks forever trying to write to the
> frozen backing fs....
>
> What am I missing here?

I don't think that creates FUSE based deadlocks. What you describe is
generally a problem with the order in which filesystems are frozen and
can happen with loop devices as well. If you leave userspace running and
freeze filesystems in the proper order (which happens to be the reverse
ordering of the superblock list), then you should freeze all filesystems
without deadlocking.

If I remember correctly, the problem in the past was that if you leave
userspace running while freezing filesystems, some processes may enter
uninterruptible sleep waiting for the fs to be thawed, and in the past
the suspend code was not able to hibernate such processes. But I think
this obstacle was removed a couple of years ago, as now we could use the
TASK_FREEZABLE flag in sb_start_write() -> percpu_rwsem_wait and thus
allow tasks blocked on a frozen filesystem to be hibernated.

Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
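The ordering argument can be modelled outside the kernel. The toy below assumes that freezing a filesystem first has to write into the filesystem backing it, so a freeze attempt fails (standing in for a deadlock) when the backing layer is already frozen. struct toy_sb and the toy_* functions are invented for this illustration and are not kernel interfaces; the array order stands in for the superblock list, with backing filesystems mounted first.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a superblock; not the kernel's struct super_block. */
struct toy_sb {
	const char *name;
	int backing;	/* index of the filesystem this one writes into, or -1 */
	bool frozen;
};

/*
 * Freeze one filesystem. Freezing flushes dirty data into the backing
 * filesystem, so report failure (modelling a deadlock) if that backing
 * filesystem has already been frozen.
 */
static bool toy_freeze(struct toy_sb *sbs, size_t i)
{
	int b = sbs[i].backing;

	if (b >= 0 && sbs[b].frozen)
		return false;
	sbs[i].frozen = true;
	return true;
}

/*
 * Walk the list forwards or in reverse, stopping at the first freeze
 * that would block. Returns how many filesystems were frozen.
 */
static size_t toy_freeze_all(struct toy_sb *sbs, size_t n, bool reverse)
{
	size_t done = 0;

	for (size_t step = 0; step < n; step++) {
		size_t i = reverse ? n - 1 - step : step;

		if (!toy_freeze(sbs, i))
			break;
		done++;
	}
	return done;
}
```

With a three-layer stack (a filesystem on a block device, a FUSE filesystem on top of it, and another filesystem on a loop file inside FUSE), the reverse walk freezes all three, while the forward walk stops after the first because the second layer can no longer write into its frozen backing store.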
* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Filesystem Suspend Resume
  2025-03-25 13:42 ` Jan Kara
@ 2025-03-26 2:36 ` James Bottomley
  2025-03-26 14:59 ` Jan Kara
  0 siblings, 1 reply; 19+ messages in thread
From: James Bottomley @ 2025-03-26 2:36 UTC
To: Jan Kara, Dave Chinner
Cc: Christoph Hellwig, linux-fsdevel, lsf-pc, Rafael J. Wysocki,
    Pavel Machek, Len Brown, linux-pm

On Tue, 2025-03-25 at 14:42 +0100, Jan Kara wrote:
[...]
> If I remember correctly, the problem in the past was that if you
> leave userspace running while freezing filesystems, some processes
> may enter uninterruptible sleep waiting for fs to be thawed and in
> the past suspend code was not able to hibernate such processes. But I
> think this obstacle has been removed a couple of years ago as now we
> could use TASK_FREEZABLE flag in sb_start_write() ->
> percpu_rwsem_wait and thus allow tasks blocked on frozen filesystem
> to be hibernated.

I tested this and we do indeed deadlock hibernation on the processes
touching the filesystem (systemd-journald actually). But if I make
this change:

diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c
index 6083883c4fe0..720418720bbc 100644
--- a/kernel/locking/percpu-rwsem.c
+++ b/kernel/locking/percpu-rwsem.c
@@ -156,7 +156,7 @@ static void percpu_rwsem_wait(struct percpu_rw_semaphore *sem, bool reader)
 	spin_unlock_irq(&sem->waiters.lock);
 
 	while (wait) {
-		set_current_state(TASK_UNINTERRUPTIBLE);
+		set_current_state(TASK_UNINTERRUPTIBLE|TASK_FREEZABLE);
 		if (!smp_load_acquire(&wq_entry.private))
 			break;
 		schedule();

Then everything will work, with no lockdep problems (thanks,
Christian). Is that the change you want me to make or should
sb_start_write() be using a special freezable version of
percpu_rwsem_wait()?

Regards,

James
* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Filesystem Suspend Resume
  2025-03-26 2:36 ` James Bottomley
@ 2025-03-26 14:59 ` Jan Kara
  2025-03-26 15:25 ` James Bottomley
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Kara @ 2025-03-26 14:59 UTC
To: James Bottomley
Cc: Jan Kara, Dave Chinner, Christoph Hellwig, linux-fsdevel, lsf-pc,
    Rafael J. Wysocki, Pavel Machek, Len Brown, linux-pm

On Tue 25-03-25 22:36:56, James Bottomley wrote:
> On Tue, 2025-03-25 at 14:42 +0100, Jan Kara wrote:
> [...]
> > If I remember correctly, the problem in the past was that if you
> > leave userspace running while freezing filesystems, some processes
> > may enter uninterruptible sleep waiting for fs to be thawed and in
> > the past suspend code was not able to hibernate such processes. But I
> > think this obstacle has been removed a couple of years ago as now we
> > could use TASK_FREEZABLE flag in sb_start_write() ->
> > percpu_rwsem_wait and thus allow tasks blocked on frozen filesystem
> > to be hibernated.
>
> I tested this and we do indeed deadlock hibernation on the processes
> touching the filesystem (systemd-journald actually). But if I make
> this change:
>
> diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c
> index 6083883c4fe0..720418720bbc 100644
> --- a/kernel/locking/percpu-rwsem.c
> +++ b/kernel/locking/percpu-rwsem.c
> @@ -156,7 +156,7 @@ static void percpu_rwsem_wait(struct percpu_rw_semaphore *sem, bool reader)
>  	spin_unlock_irq(&sem->waiters.lock);
>  
>  	while (wait) {
> -		set_current_state(TASK_UNINTERRUPTIBLE);
> +		set_current_state(TASK_UNINTERRUPTIBLE|TASK_FREEZABLE);
>  		if (!smp_load_acquire(&wq_entry.private))
>  			break;
>  		schedule();
>
> Then everything will work, with no lockdep problems (thanks,
> Christian). Is that the change you want me to make or should
> sb_start_write be using a special freezable version of
> percpu_rwsem_wait()?

I was thinking about this. The possible problem with this is that a task
waiting in percpu_rwsem_wait() may be hibernated while it holds another
lock (e.g. some mutex); if another task is waiting for that mutex, then
hibernation fails because that other task cannot be hibernated. With
sb_start_write() specifically, this is usually not a problem because it
is the outermost lock we take. The only catch here would be if a process
is blocked in a write page fault for a frozen filesystem. Then we are
holding mmap_sem for the process, so hibernation could fail this way. But
I'd guess this is rare enough that we could live with that possibility.

So to summarize, I think we may need to introduce a freezable variant of
percpu_down_read() and use it in sb_start_write().

Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Filesystem Suspend Resume
  2025-03-26 14:59 ` Jan Kara
@ 2025-03-26 15:25 ` James Bottomley
  2025-03-27 14:28 ` James Bottomley
  0 siblings, 1 reply; 19+ messages in thread
From: James Bottomley @ 2025-03-26 15:25 UTC
To: Jan Kara
Cc: Dave Chinner, Christoph Hellwig, linux-fsdevel, lsf-pc,
    Rafael J. Wysocki, Pavel Machek, Len Brown, linux-pm

On Wed, 2025-03-26 at 15:59 +0100, Jan Kara wrote:
[...]
> So to summarize I think we may need to introduce a freezable variant of
> percpu_down_read() and use it in sb_start_write().

Aye, aye, sir! and thanks for making the can of worms bigger ...

This is what I came up with for freezable variants of
sb_start_write(). I'm still building the kernel (laptop only ...) so
I'll let you know in an hour or so if it actually works.

Regards,

James

---

diff --git a/include/linux/fs.h b/include/linux/fs.h
index dd84d1c3b8af..ce21d81c6e34 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1782,7 +1782,8 @@ static inline void __sb_end_write(struct super_block *sb, int level)
 
 static inline void __sb_start_write(struct super_block *sb, int level)
 {
-	percpu_down_read(sb->s_writers.rw_sem + level - 1);
+	percpu_down_read_freezable(sb->s_writers.rw_sem + level - 1,
+				   level == SB_FREEZE_WRITE);
 }
 
 static inline bool __sb_start_write_trylock(struct super_block *sb, int level)
diff --git a/include/linux/percpu-rwsem.h b/include/linux/percpu-rwsem.h
index c012df33a9f0..a55fe709b832 100644
--- a/include/linux/percpu-rwsem.h
+++ b/include/linux/percpu-rwsem.h
@@ -42,9 +42,10 @@ is_static struct percpu_rw_semaphore name = {				\
 #define DEFINE_STATIC_PERCPU_RWSEM(name)	\
 	__DEFINE_PERCPU_RWSEM(name, static)
 
-extern bool __percpu_down_read(struct percpu_rw_semaphore *, bool);
+extern bool __percpu_down_read(struct percpu_rw_semaphore *, bool, bool);
 
-static inline void percpu_down_read(struct percpu_rw_semaphore *sem)
+static inline void percpu_down_read_internal(struct percpu_rw_semaphore *sem,
+					      bool freezable)
 {
 	might_sleep();
 
@@ -62,7 +63,7 @@ static inline void percpu_down_read(struct percpu_rw_semaphore *sem)
 	if (likely(rcu_sync_is_idle(&sem->rss)))
 		this_cpu_inc(*sem->read_count);
 	else
-		__percpu_down_read(sem, false); /* Unconditional memory barrier */
+		__percpu_down_read(sem, false, freezable); /* Unconditional memory barrier */
 	/*
 	 * The preempt_enable() prevents the compiler from
 	 * bleeding the critical section out.
@@ -70,6 +71,17 @@ static inline void percpu_down_read(struct percpu_rw_semaphore *sem)
 	preempt_enable();
 }
 
+static inline void percpu_down_read(struct percpu_rw_semaphore *sem)
+{
+	percpu_down_read_internal(sem, false);
+}
+
+static inline void percpu_down_read_freezable(struct percpu_rw_semaphore *sem,
+					       bool freeze)
+{
+	percpu_down_read_internal(sem, freeze);
+}
+
 static inline bool percpu_down_read_trylock(struct percpu_rw_semaphore *sem)
 {
 	bool ret = true;
@@ -81,7 +93,7 @@ static inline bool percpu_down_read_trylock(struct percpu_rw_semaphore *sem)
 	if (likely(rcu_sync_is_idle(&sem->rss)))
 		this_cpu_inc(*sem->read_count);
 	else
-		ret = __percpu_down_read(sem, true); /* Unconditional memory barrier */
+		ret = __percpu_down_read(sem, true, false); /* Unconditional memory barrier */
 	preempt_enable();
 	/*
 	 * The barrier() from preempt_enable() prevents the compiler from
diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c
index 6083883c4fe0..890837b73476 100644
--- a/kernel/locking/percpu-rwsem.c
+++ b/kernel/locking/percpu-rwsem.c
@@ -138,7 +138,8 @@ static int percpu_rwsem_wake_function(struct wait_queue_entry *wq_entry,
 	return !reader; /* wake (readers until) 1 writer */
 }
 
-static void percpu_rwsem_wait(struct percpu_rw_semaphore *sem, bool reader)
+static void percpu_rwsem_wait(struct percpu_rw_semaphore *sem, bool reader,
+			      bool freeze)
 {
 	DEFINE_WAIT_FUNC(wq_entry, percpu_rwsem_wake_function);
 	bool wait;
@@ -156,7 +157,8 @@ static void percpu_rwsem_wait(struct percpu_rw_semaphore *sem, bool reader)
 	spin_unlock_irq(&sem->waiters.lock);
 
 	while (wait) {
-		set_current_state(TASK_UNINTERRUPTIBLE);
+		set_current_state(TASK_UNINTERRUPTIBLE |
+				  (freeze ? TASK_FREEZABLE : 0));
 		if (!smp_load_acquire(&wq_entry.private))
 			break;
 		schedule();
@@ -164,7 +166,8 @@ static void percpu_rwsem_wait(struct percpu_rw_semaphore *sem, bool reader)
 	__set_current_state(TASK_RUNNING);
 }
 
-bool __sched __percpu_down_read(struct percpu_rw_semaphore *sem, bool try)
+bool __sched __percpu_down_read(struct percpu_rw_semaphore *sem, bool try,
+				bool freeze)
 {
 	if (__percpu_down_read_trylock(sem))
 		return true;
@@ -174,7 +177,7 @@ bool __sched __percpu_down_read(struct percpu_rw_semaphore *sem, bool try)
 
 	trace_contention_begin(sem, LCB_F_PERCPU | LCB_F_READ);
 	preempt_enable();
-	percpu_rwsem_wait(sem, /* .reader = */ true);
+	percpu_rwsem_wait(sem, /* .reader = */ true, freeze);
 	preempt_disable();
 	trace_contention_end(sem, 0);
 
@@ -237,7 +240,7 @@ void __sched percpu_down_write(struct percpu_rw_semaphore *sem)
 	 */
 	if (!__percpu_down_write_trylock(sem)) {
 		trace_contention_begin(sem, LCB_F_PERCPU | LCB_F_WRITE);
-		percpu_rwsem_wait(sem, /* .reader = */ false);
+		percpu_rwsem_wait(sem, /* .reader = */ false, false);
 		contended = true;
 	}
 
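A note on the set_current_state() hunk in percpu_rwsem_wait(): it ORs a flag chosen by a conditional into the task state. In C, `|` binds more tightly than `?:`, so the conditional needs its own parentheses, otherwise the whole OR becomes the condition and the result is just one of the two branches. The sketch below shows both groupings. The DEMO_* constants are placeholder bit values chosen for illustration, not the kernel's definitions from include/linux/sched.h, and the function names are hypothetical.

```c
#include <stdbool.h>

/* Placeholder bit values for illustration only; not the kernel's definitions. */
#define DEMO_UNINTERRUPTIBLE	0x0002u
#define DEMO_FREEZABLE		0x2000u

/*
 * How the compiler groups "DEMO_UNINTERRUPTIBLE | freeze ? DEMO_FREEZABLE : 0":
 * the OR is evaluated first and used as the condition. Because
 * DEMO_UNINTERRUPTIBLE is non-zero, the condition is always true and the
 * uninterruptible bit is lost from the result.
 */
static unsigned int state_as_parsed(bool freeze)
{
	return (DEMO_UNINTERRUPTIBLE | freeze) ? DEMO_FREEZABLE : 0;
}

/* Intended grouping: always uninterruptible, optionally freezable as well. */
static unsigned int state_parenthesized(bool freeze)
{
	return DEMO_UNINTERRUPTIBLE | (freeze ? DEMO_FREEZABLE : 0);
}
```

With these placeholder values, state_as_parsed() returns DEMO_FREEZABLE whether or not freeze is set, while state_parenthesized() returns DEMO_UNINTERRUPTIBLE alone for non-freezable waits and both bits for freezable ones.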
* Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Filesystem Suspend Resume
  2025-03-26 15:25 ` James Bottomley
@ 2025-03-27 14:28 ` James Bottomley
  0 siblings, 0 replies; 19+ messages in thread
From: James Bottomley @ 2025-03-27 14:28 UTC
To: Jan Kara
Cc: Dave Chinner, Christoph Hellwig, linux-fsdevel, lsf-pc,
    Rafael J. Wysocki, Pavel Machek, Len Brown, linux-pm

On Wed, 2025-03-26 at 11:25 -0400, James Bottomley wrote:
> On Wed, 2025-03-26 at 15:59 +0100, Jan Kara wrote:
> [...]
> > So to summarize I think we may need to introduce a freezable variant
> > of percpu_down_read() and use it in sb_start_write().
>
> Aye, aye, sir! and thanks for making the can of worms bigger ...
>
> This is what I came up with for freezable variants of
> sb_start_write(). I'm still building the kernel (laptop only ...) so
> I'll let you know in an hour or so if it actually works.

Slightly longer than an hour, but I can confirm this all works. I've
also tested it with a filesystem on loop on a filesystem (with ext4 as
upper and lower) and it hibernates just fine while running some fio
stress. I've posted what I'm currently working with here, so people can
see it:

https://lore.kernel.org/all/20250327140613.25178-1-James.Bottomley@HansenPartnership.com/

Regards,

James