* [LSF/MM TOPIC] Phasing out kernel thread freezing
@ 2018-01-26  9:09 Luis R. Rodriguez
  2018-01-31 19:10 ` Darrick J. Wong
  0 siblings, 1 reply; 10+ messages in thread

From: Luis R. Rodriguez @ 2018-01-26  9:09 UTC (permalink / raw)
To: lsf-pc
Cc: linux-fsdevel, mcgrof, Jan Kara, Dave Chinner, Jeff Layton,
    Rafael J. Wysocki, Bart Van Assche, Jiri Kosina

Since the 2015 Kernel Summit in South Korea we have agreed that we should
phase out the kernel thread freezer. This is because the freezer was
originally added to the kernel to aid in going to suspend, ensuring no
unwanted IO activity would cause filesystem corruption; we could instead
replace it with the already implemented filesystem freeze/thaw calls.

Filesystems are not the only users of the freezer API now, though. Although
most uses outside of filesystems might be bogus, we are prone to hit many
regressions with a wide-sweep removal. Actually phasing out kernel thread
freezing turns out to be trickier than expected, even in filesystems alone,
so the current approach is to phase it out slowly, one step at a time: one
subsystem and driver type at a time. Clearly the first subsystem we should
tackle is filesystems.

We now seem to have reached consensus on how to do this for the few
filesystems which implement freeze_fs() only. The outstanding work I have
left is to evaluate the prospective use of sharing the same freeze
semantics as freeze_bdev(), initiated by dm, and a proper generic way to
address reference counting for sb freezing.
The only filesystems which implement freeze_fs() are:

  o xfs
  o reiserfs
  o nilfs2
  o jfs
  o f2fs
  o ext4
  o ext2
  o btrfs

Of these, the following have freezer helpers, which can then be removed
once the kernel automatically calls freeze_fs for us on suspend:

  o xfs
  o nilfs2
  o jfs
  o f2fs
  o ext4

Long term we need to decide what to do with filesystems which do not
implement freeze_fs(), or, for instance, filesystems which implement
freeze_super(). Jan Kara made a few suggestions in this regard which I'll
be evaluating soon; however, there are other special filesystems with
further considerations. As an example, for NFS Jeff Layton has suggested
having freeze_fs() make the RPC engine "park" newly issued RPCs for that
fs' client onto an rpc_wait_queue. For any RPC that has already been sent,
however, we need to wait for a reply. Once everything is quiesced we can
return and call it frozen. unfreeze_fs can then just have the engine stop
parking RPCs and wake up the waitq. He points out, however, that if we're
interested in making the cgroup freezer also work, then we may need to do
a bit more work to ensure that we don't end up with frozen tasks squatting
on VFS locks. Dave Chinner notes, though, that cgroup is broken by design
*if* it requires tasks to be frozen without holding any VFS/filesystem
lock context, and as such we *should* be able to ignore it.

We also need to decide what to do with complex layered situations; for
example, Bart Van Assche suggested considering the case of a filesystem
that exists on top of an md device, where the md device uses one or more
files as backing store, with the loop driver between the md device and the
files. Chinner has suggested allowing block devices to freeze superblocks
on the block device; however, some *may* prefer to have a call allowing a
superblock to quiesce the underlying block device, which would let md/dm
suspend whatever ongoing maintenance operations it has in progress until
the filesystem signals it needs to thaw.
The pros and cons of both approaches should probably be discussed, unless
it's already crystal clear which path to take.

Finally, we should evaluate any other potential users of the kernel
freezer API which have now grown dependent on it, even though it was
designed only to help avoid filesystem corruption on the way to suspend.
If none have really become dependent on it, then great: we can just remove
them one subsystem at a time to avoid regressions.

  Luis

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [LSF/MM TOPIC] Phasing out kernel thread freezing
  2018-01-26  9:09 [LSF/MM TOPIC] Phasing out kernel thread freezing Luis R. Rodriguez
@ 2018-01-31 19:10 ` Darrick J. Wong
  2018-02-04 22:41   ` Bart Van Assche
  0 siblings, 1 reply; 10+ messages in thread

From: Darrick J. Wong @ 2018-01-31 19:10 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: lsf-pc, linux-fsdevel, Jan Kara, Dave Chinner, Jeff Layton,
    Rafael J. Wysocki, Bart Van Assche, Jiri Kosina

On Fri, Jan 26, 2018 at 10:09:23AM +0100, Luis R. Rodriguez wrote:
> Since the 2015 Kernel Summit in South Korea we have agreed that we should
> phase out the kernel thread freezer. This is because the freezer was
> originally added to the kernel to aid in going to suspend, ensuring no
> unwanted IO activity would cause filesystem corruption; we could instead
> replace it with the already implemented filesystem freeze/thaw calls.
>
> Filesystems are not the only users of the freezer API now, though.
> Although most uses outside of filesystems might be bogus, we are prone
> to hit many regressions with a wide-sweep removal. Actually phasing out
> kernel thread freezing turns out to be trickier than expected, even in
> filesystems alone, so the current approach is to phase it out slowly,
> one step at a time: one subsystem and driver type at a time. Clearly the
> first subsystem we should tackle is filesystems.
>
> We now seem to have reached consensus on how to do this for the few
> filesystems which implement freeze_fs() only. The outstanding work I
> have left is to evaluate the prospective use of sharing the same freeze
> semantics as freeze_bdev(), initiated by dm, and a proper generic way to
> address reference counting for sb freezing.
>
> The only filesystems which implement freeze_fs() are:
>
>   o xfs
>   o reiserfs
>   o nilfs2
>   o jfs
>   o f2fs
>   o ext4
>   o ext2
>   o btrfs
>
> Of these, the following have freezer helpers, which can then be removed
> once the kernel automatically calls freeze_fs for us on suspend:
>
>   o xfs
>   o nilfs2
>   o jfs
>   o f2fs
>   o ext4
>
> Long term we need to decide what to do with filesystems which do not
> implement freeze_fs(), or, for instance, filesystems which implement
> freeze_super(). Jan Kara made a few suggestions in this regard which
> I'll be evaluating soon; however, there are other special filesystems
> with further considerations. As an example, for NFS Jeff Layton has
> suggested having freeze_fs() make the RPC engine "park" newly issued
> RPCs for that fs' client onto an rpc_wait_queue. For any RPC that has
> already been sent, however, we need to wait for a reply. Once everything
> is quiesced we can return and call it frozen. unfreeze_fs can then just
> have the engine stop parking RPCs and wake up the waitq. He points out,
> however, that if we're interested in making the cgroup freezer also
> work, then we may need to do a bit more work to ensure that we don't end
> up with frozen tasks squatting on VFS locks. Dave Chinner notes, though,
> that cgroup is broken by design *if* it requires tasks to be frozen
> without holding any VFS/filesystem lock context, and as such we *should*
> be able to ignore it.
>
> We also need to decide what to do with complex layered situations; for
> example, Bart Van Assche suggested considering the case of a filesystem
> that exists on top of an md device, where the md device uses one or more
> files as backing store, with the loop driver between the md device and
> the files. Chinner has suggested allowing block devices to freeze
> superblocks on the block device; however, some *may* prefer to have a
> call allowing a superblock to quiesce the underlying block device, which
> would let md/dm suspend whatever ongoing maintenance operations it has
> in progress until the filesystem signals it needs to thaw. The pros and
> cons of both approaches should probably be discussed, unless it's
> already crystal clear which path to take.

For a brief moment I pondered whether it would make sense to make
filesystems part of the device model so that the suspend code could work
out fs <-> bdev dependencies and know in which order to freeze filesystems
and quiesce devices, but every time I go digging into how all those macros
work I get confused and my eyes glaze over, so I don't know if this is at
all a good idea or just confused ramblings.

Maybe it would suffice to start freezing in reverse order of mount and
have some way to tell the underlying bdev that it should
flush/quiesce/whatever itself?

--D

> Finally, we should evaluate any other potential users of the kernel
> freezer API which have now grown dependent on it, even though it was
> designed only to help avoid filesystem corruption on the way to suspend.
> If none have really become dependent on it, then great: we can just
> remove them one subsystem at a time to avoid regressions.
>
> Luis

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [LSF/MM TOPIC] Phasing out kernel thread freezing
  2018-01-31 19:10 ` Darrick J. Wong
@ 2018-02-04 22:41   ` Bart Van Assche
  2018-02-05  8:28     ` Rafael J. Wysocki
  0 siblings, 1 reply; 10+ messages in thread

From: Bart Van Assche @ 2018-02-04 22:41 UTC (permalink / raw)
To: darrick.wong@oracle.com, mcgrof@kernel.org
Cc: jlayton@redhat.com, jikos@kernel.org, jack@suse.cz,
    david@fromorbit.com, lsf-pc@lists.linux-foundation.org,
    linux-fsdevel@vger.kernel.org, rafael@kernel.org

On Wed, 2018-01-31 at 11:10 -0800, Darrick J. Wong wrote:
> For a brief moment I pondered whether it would make sense to make
> filesystems part of the device model so that the suspend code could
> work out fs <-> bdev dependencies and know in which order to freeze
> filesystems and quiesce devices, but every time I go digging into how
> all those macros work I get confused and my eyes glaze over, so I don't
> know if this is at all a good idea or just confused ramblings.

If we have to go this way: shouldn't we introduce a new abstraction
("storage stack element" or similar) rather than making filesystems part
of the device model?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [LSF/MM TOPIC] Phasing out kernel thread freezing
  2018-02-04 22:41 ` Bart Van Assche
@ 2018-02-05  8:28   ` Rafael J. Wysocki
  2018-02-24  3:27     ` Luis R. Rodriguez
  0 siblings, 1 reply; 10+ messages in thread

From: Rafael J. Wysocki @ 2018-02-05  8:28 UTC (permalink / raw)
To: Bart Van Assche
Cc: darrick.wong@oracle.com, mcgrof@kernel.org, jlayton@redhat.com,
    jikos@kernel.org, jack@suse.cz, david@fromorbit.com,
    lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
    rafael@kernel.org

On Sun, Feb 4, 2018 at 11:41 PM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
> On Wed, 2018-01-31 at 11:10 -0800, Darrick J. Wong wrote:
>> For a brief moment I pondered whether it would make sense to make
>> filesystems part of the device model so that the suspend code could
>> work out fs <-> bdev dependencies and know in which order to freeze
>> filesystems and quiesce devices, but every time I go digging into how
>> all those macros work I get confused and my eyes glaze over, so I
>> don't know if this is at all a good idea or just confused ramblings.
>
> If we have to go this way: shouldn't we introduce a new abstraction
> ("storage stack element" or similar) rather than making filesystems
> part of the device model?

That would be my approach.

Trying to "suspend" filesystems at the same time as I/O devices (and all
of that asynchronously) may be problematic for ordering reasons and
similar.

Moreover, during hibernation devices are suspended two times (and resumed
in between, of course), whereas filesystems only need to be "suspended"
once.

With that in mind, I would add a mechanism allowing filesystems (and
possibly other components of the storage stack) to register a set of
callbacks for suspend and resume, and then invoke those callbacks in a
specific order.

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [LSF/MM TOPIC] Phasing out kernel thread freezing
  2018-02-05  8:28 ` Rafael J. Wysocki
@ 2018-02-24  3:27   ` Luis R. Rodriguez
  2018-02-25  9:45     ` Rafael J. Wysocki
  0 siblings, 1 reply; 10+ messages in thread

From: Luis R. Rodriguez @ 2018-02-24  3:27 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Bart Van Assche, darrick.wong@oracle.com, mcgrof@kernel.org,
    jlayton@redhat.com, jikos@kernel.org, jack@suse.cz,
    david@fromorbit.com, lsf-pc@lists.linux-foundation.org,
    linux-fsdevel@vger.kernel.org

On Mon, Feb 05, 2018 at 09:28:37AM +0100, Rafael J. Wysocki wrote:
> On Sun, Feb 4, 2018 at 11:41 PM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
>> On Wed, 2018-01-31 at 11:10 -0800, Darrick J. Wong wrote:
>>> For a brief moment I pondered whether it would make sense to make
>>> filesystems part of the device model so that the suspend code could
>>> work out fs <-> bdev dependencies and know in which order to freeze
>>> filesystems and quiesce devices, but every time I go digging into
>>> how all those macros work I get confused and my eyes glaze over, so
>>> I don't know if this is at all a good idea or just confused
>>> ramblings.
>>
>> If we have to go this way: shouldn't we introduce a new abstraction
>> ("storage stack element" or similar) rather than making filesystems
>> part of the device model?
>
> That would be my approach.
>
> Trying to "suspend" filesystems at the same time as I/O devices (and
> all of that asynchronously) may be problematic for ordering reasons
> and similar.

Oh look, another ordering issue. And this is why I was not a fan of the
device link API, even though that is what we got merged. Moving on...

> Moreover, during hibernation devices are suspended two times (and
> resumed in between, of course), whereas filesystems only need to be
> "suspended" once.

From your point of view, yes, but internally the VFS layer or filesystems
themselves may end up re-using this mechanism later for other things,
like snapshotting. And if some folks have it the way they want it, we may
need a dependency map between filesystems anyway, for filesystem-specific
reasons.

> With that in mind, I would add a mechanism allowing filesystems (and
> possibly other components of the storage stack) to register a set of
> callbacks for suspend and resume, and then invoke those callbacks in a
> specific order.

That's what I had done in my series; the issue here is order. Order in my
series is simple but should work for starters; later, however, I suspect
we'll need something more robust.

  Luis

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [LSF/MM TOPIC] Phasing out kernel thread freezing
  2018-02-24  3:27 ` Luis R. Rodriguez
@ 2018-02-25  9:45   ` Rafael J. Wysocki
  2018-02-25 17:22     ` Luis R. Rodriguez
  0 siblings, 1 reply; 10+ messages in thread

From: Rafael J. Wysocki @ 2018-02-25  9:45 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Rafael J. Wysocki, Bart Van Assche, darrick.wong@oracle.com,
    jlayton@redhat.com, jikos@kernel.org, jack@suse.cz,
    david@fromorbit.com, lsf-pc@lists.linux-foundation.org,
    linux-fsdevel@vger.kernel.org

On Sat, Feb 24, 2018 at 4:27 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> On Mon, Feb 05, 2018 at 09:28:37AM +0100, Rafael J. Wysocki wrote:
>> On Sun, Feb 4, 2018 at 11:41 PM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
>>> On Wed, 2018-01-31 at 11:10 -0800, Darrick J. Wong wrote:
>>>> For a brief moment I pondered whether it would make sense to make
>>>> filesystems part of the device model so that the suspend code could
>>>> work out fs <-> bdev dependencies and know in which order to freeze
>>>> filesystems and quiesce devices, but every time I go digging into
>>>> how all those macros work I get confused and my eyes glaze over, so
>>>> I don't know if this is at all a good idea or just confused
>>>> ramblings.
>>>
>>> If we have to go this way: shouldn't we introduce a new abstraction
>>> ("storage stack element" or similar) rather than making filesystems
>>> part of the device model?
>>
>> That would be my approach.
>>
>> Trying to "suspend" filesystems at the same time as I/O devices (and
>> all of that asynchronously) may be problematic for ordering reasons
>> and similar.
>
> Oh look, another ordering issue. And this is why I was not a fan of the
> device link API, even though that is what we got merged. Moving on...
>
>> Moreover, during hibernation devices are suspended two times (and
>> resumed in between, of course), whereas filesystems only need to be
>> "suspended" once.
>
> From your point of view, yes, but internally the VFS layer or
> filesystems themselves may end up re-using this mechanism later for
> other things, like snapshotting. And if some folks have it the way they
> want it, we may need a dependency map between filesystems anyway, for
> filesystem-specific reasons.

That's orthogonal to what I said.

A dependency map between filesystems and other components of the block
layer (like md, dm etc.) will be necessary going forward (if all of the
suspending and resuming of them is expected to be reliable, anyway), but
that doesn't change hibernation-related requirements one whit.

Filesystems need to be suspended (or frozen, or whatever terminology ends
up being used for that) *before* creating a hibernation image, and they
*cannot* be resumed (unfrozen etc.) after that until the system is off or
the kernel decides that the hibernation has failed and rolls back.

Whatever data/metadata are there in persistent storage before the image
is created, changing them after that point is potentially critically
harmful, so (in the hibernation case) all of the in-flight I/O that may
end up being written to persistent storage needs to be flushed before
creating the image.

However, *devices* are resumed after creating the image so that the image
itself can be written to persistent storage, and are suspended again
after that, before putting the system to sleep (for wakeup to work, among
other things).

That's why suspend/resume of filesystems cannot be tied to suspend/resume
of devices.

Note that this isn't the case for system suspend/resume (suspend-to-RAM
or suspend-to-idle).

>> With that in mind, I would add a mechanism allowing filesystems (and
>> possibly other components of the storage stack) to register a set of
>> callbacks for suspend and resume, and then invoke those callbacks in a
>> specific order.
>
> That's what I had done in my series; the issue here is order. Order in
> my series is simple but should work for starters; later, however, I
> suspect we'll need something more robust.

Quite likely.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [LSF/MM TOPIC] Phasing out kernel thread freezing
  2018-02-25  9:45 ` Rafael J. Wysocki
@ 2018-02-25 17:22   ` Luis R. Rodriguez
  2018-02-26  9:25     ` Rafael J. Wysocki
  0 siblings, 1 reply; 10+ messages in thread

From: Luis R. Rodriguez @ 2018-02-25 17:22 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Luis R. Rodriguez, Bart Van Assche, darrick.wong@oracle.com,
    jlayton@redhat.com, jikos@kernel.org, jack@suse.cz,
    david@fromorbit.com, lsf-pc@lists.linux-foundation.org,
    linux-fsdevel@vger.kernel.org

On Sun, Feb 25, 2018 at 10:45:26AM +0100, Rafael J. Wysocki wrote:
> On Sat, Feb 24, 2018 at 4:27 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
>> On Mon, Feb 05, 2018 at 09:28:37AM +0100, Rafael J. Wysocki wrote:
>>> Moreover, during hibernation devices are suspended two times (and
>>> resumed in between, of course), whereas filesystems only need to be
>>> "suspended" once.
>>
>> From your point of view, yes, but internally the VFS layer or
>> filesystems themselves may end up re-using this mechanism later for
>> other things, like snapshotting. And if some folks have it the way
>> they want it, we may need a dependency map between filesystems anyway,
>> for filesystem-specific reasons.
>
> That's orthogonal to what I said.

<-- snip -->

> However, *devices* are resumed after creating the image so that the
> image itself can be written to persistent storage, and are suspended
> again after that, before putting the system to sleep (for wakeup to
> work, among other things).
>
> That's why suspend/resume of filesystems cannot be tied to
> suspend/resume of devices.

Ah, yes, I see your point now. So for filesystems we really don't care
whether it's suspend or hibernation, we just need to freeze in the right
order. So long as we get that order right we should be OK.

Curious -- do we resume *all* devices after creating the image for
hibernation today? Not that I am advocating using devices for this
mechanism or resolution for filesystems, I'm just curious as we're on the
topic.

  Luis

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [LSF/MM TOPIC] Phasing out kernel thread freezing
  2018-02-25 17:22 ` Luis R. Rodriguez
@ 2018-02-26  9:25   ` Rafael J. Wysocki
  2018-02-26 12:44     ` Jan Kara
  0 siblings, 1 reply; 10+ messages in thread

From: Rafael J. Wysocki @ 2018-02-26  9:25 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Rafael J. Wysocki, Bart Van Assche, darrick.wong@oracle.com,
    jlayton@redhat.com, jikos@kernel.org, jack@suse.cz,
    david@fromorbit.com, lsf-pc@lists.linux-foundation.org,
    linux-fsdevel@vger.kernel.org

On Sun, Feb 25, 2018 at 6:22 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> On Sun, Feb 25, 2018 at 10:45:26AM +0100, Rafael J. Wysocki wrote:
>> On Sat, Feb 24, 2018 at 4:27 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
>>> On Mon, Feb 05, 2018 at 09:28:37AM +0100, Rafael J. Wysocki wrote:
>>>> Moreover, during hibernation devices are suspended two times (and
>>>> resumed in between, of course), whereas filesystems only need to be
>>>> "suspended" once.
>>>
>>> From your point of view, yes, but internally the VFS layer or
>>> filesystems themselves may end up re-using this mechanism later for
>>> other things, like snapshotting. And if some folks have it the way
>>> they want it, we may need a dependency map between filesystems
>>> anyway, for filesystem-specific reasons.
>>
>> That's orthogonal to what I said.
>
> <-- snip -->
>
>> However, *devices* are resumed after creating the image so that the
>> image itself can be written to persistent storage, and are suspended
>> again after that, before putting the system to sleep (for wakeup to
>> work, among other things).
>>
>> That's why suspend/resume of filesystems cannot be tied to
>> suspend/resume of devices.
>
> Ah, yes, I see your point now. So for filesystems we really don't care
> whether it's suspend or hibernation, we just need to freeze in the
> right order. So long as we get that order right we should be OK.

Generally, yes.

System suspend/resume (S2R, S2I), however, doesn't really require the
flushing part, so in principle the full-blown fs freezing is not
necessary in that case, strictly speaking. The ordering of writes still
has to be preserved (that is, writes to persistent storage in the
presence of a suspend-resume cycle must occur in the same order as
without the suspend-resume), but it doesn't matter too much when the
writes actually happen. They may be carried out before the suspend, after
the resume, or somewhere during one of them: all should be fine so long
as the ordering of writes doesn't change as a result of the
suspend-resume (and suspend-resume failures are like surprise resets from
the fs perspective in that case, so journaling should be sufficient to
recover from them).

Of course, the full-blown freezing will work for system suspend too, but
it may be high-latency, which is not desirable in some scenarios
utilizing system suspend/resume ("dark resume" or "lucid sleep"
scenarios, for example). That will be a concern in the long run (it kind
of is a concern already today), so I would consider special-casing it
from the outset.

> Curious -- do we resume *all* devices after creating the image for
> hibernation today?

Yes, we do. It is not straightforward to determine which devices will be
necessary to save the image in general, and we would need to resume a
significant subset of the device hierarchy for this purpose anyway.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [LSF/MM TOPIC] Phasing out kernel thread freezing
  2018-02-26  9:25 ` Rafael J. Wysocki
@ 2018-02-26 12:44   ` Jan Kara
  2018-02-26 13:27     ` Rafael J. Wysocki
  0 siblings, 1 reply; 10+ messages in thread

From: Jan Kara @ 2018-02-26 12:44 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Luis R. Rodriguez, Bart Van Assche, darrick.wong@oracle.com,
    jlayton@redhat.com, jikos@kernel.org, jack@suse.cz,
    david@fromorbit.com, lsf-pc@lists.linux-foundation.org,
    linux-fsdevel@vger.kernel.org

On Mon 26-02-18 10:25:55, Rafael J. Wysocki wrote:
> On Sun, Feb 25, 2018 at 6:22 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
>> Ah, yes, I see your point now. So for filesystems we really don't care
>> whether it's suspend or hibernation, we just need to freeze in the
>> right order. So long as we get that order right we should be OK.
>
> Generally, yes.
>
> System suspend/resume (S2R, S2I), however, doesn't really require the
> flushing part, so in principle the full-blown fs freezing is not
> necessary in that case, strictly speaking. The ordering of writes still
> has to be preserved (that is, writes to persistent storage in the
> presence of a suspend-resume cycle must occur in the same order as
> without the suspend-resume), but it doesn't matter too much when the
> writes actually happen.

I agree that in principle we don't have to flush all dirty data from the
page cache before system suspend (I believe this is what you are speaking
about here, isn't it?). However, from the implementation POV it is much
simpler that way, as otherwise processes get blocked in the kernel in
unexpected places, waiting for locks held by blocked flusher threads etc.

> They may be carried out before the suspend, after the resume, or
> somewhere during one of them: all should be fine so long as the ordering
> of writes doesn't change as a result of the suspend-resume (and
> suspend-resume failures are like surprise resets from the fs perspective
> in that case, so journaling should be sufficient to recover from them).

Err, I don't follow how a suspend failure looks like a surprise reset to
the fs. If suspend fails, we just thaw the filesystem and off we go, no
IO lost. If you speak about a situation where you suspend but then boot
without resuming -- yes, that looks like a surprise reset, and it behaves
that way already today.

> Of course, the full-blown freezing will work for system suspend too, but
> it may be high-latency, which is not desirable in some scenarios
> utilizing system suspend/resume ("dark resume" or "lucid sleep"
> scenarios, for example). That will be a concern in the long run (it kind
> of is a concern already today), so I would consider special-casing it
> from the outset.

Understood, but that would require a considerable amount of work on the
fs side, and the problem is hard enough as is :) And switching the freeze
implementation to avoid sync(2) if asked to is quite independent from
making system suspend use fs freezing.

								Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [LSF/MM TOPIC] Phasing out kernel thread freezing
  2018-02-26 12:44 ` Jan Kara
@ 2018-02-26 13:27   ` Rafael J. Wysocki
  0 siblings, 0 replies; 10+ messages in thread

From: Rafael J. Wysocki @ 2018-02-26 13:27 UTC (permalink / raw)
To: Jan Kara
Cc: Rafael J. Wysocki, Luis R. Rodriguez, Bart Van Assche,
    darrick.wong@oracle.com, jlayton@redhat.com, jikos@kernel.org,
    david@fromorbit.com, lsf-pc@lists.linux-foundation.org,
    linux-fsdevel@vger.kernel.org

On Mon, Feb 26, 2018 at 1:44 PM, Jan Kara <jack@suse.cz> wrote:
> On Mon 26-02-18 10:25:55, Rafael J. Wysocki wrote:
>> On Sun, Feb 25, 2018 at 6:22 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
>>> Ah, yes, I see your point now. So for filesystems we really don't
>>> care whether it's suspend or hibernation, we just need to freeze in
>>> the right order. So long as we get that order right we should be OK.
>>
>> Generally, yes.
>>
>> System suspend/resume (S2R, S2I), however, doesn't really require the
>> flushing part, so in principle the full-blown fs freezing is not
>> necessary in that case, strictly speaking. The ordering of writes
>> still has to be preserved (that is, writes to persistent storage in
>> the presence of a suspend-resume cycle must occur in the same order as
>> without the suspend-resume), but it doesn't matter too much when the
>> writes actually happen.
>
> I agree that in principle we don't have to flush all dirty data from
> the page cache before system suspend (I believe this is what you are
> speaking about here, isn't it?).

Yes, it is.

> However, from the implementation POV it is much simpler that way, as
> otherwise processes get blocked in the kernel in unexpected places,
> waiting for locks held by blocked flusher threads etc.

Understood.

>> They may be carried out before the suspend, after the resume, or
>> somewhere during one of them: all should be fine so long as the
>> ordering of writes doesn't change as a result of the suspend-resume
>> (and suspend-resume failures are like surprise resets from the fs
>> perspective in that case, so journaling should be sufficient to
>> recover from them).
>
> Err, I don't follow how a suspend failure looks like a surprise reset
> to the fs. If suspend fails, we just thaw the filesystem and off we go,
> no IO lost.

Right.

> If you speak about a situation where you suspend but then boot without
> resuming -- yes, that looks like a surprise reset, and it behaves that
> way already today.

I was talking about the latter.

>> Of course, the full-blown freezing will work for system suspend too,
>> but it may be high-latency, which is not desirable in some scenarios
>> utilizing system suspend/resume ("dark resume" or "lucid sleep"
>> scenarios, for example). That will be a concern in the long run (it
>> kind of is a concern already today), so I would consider
>> special-casing it from the outset.
>
> Understood, but that would require a considerable amount of work on the
> fs side, and the problem is hard enough as is :)

Fair enough. :-)

> And switching the freeze implementation to avoid sync(2) if asked to is
> quite independent from making system suspend use fs freezing.

Agreed.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 10+ messages in thread
end of thread, other threads:[~2018-02-26 13:27 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-26  9:09 [LSF/MM TOPIC] Phasing out kernel thread freezing Luis R. Rodriguez
2018-01-31 19:10 ` Darrick J. Wong
2018-02-04 22:41   ` Bart Van Assche
2018-02-05  8:28     ` Rafael J. Wysocki
2018-02-24  3:27       ` Luis R. Rodriguez
2018-02-25  9:45         ` Rafael J. Wysocki
2018-02-25 17:22           ` Luis R. Rodriguez
2018-02-26  9:25             ` Rafael J. Wysocki
2018-02-26 12:44               ` Jan Kara
2018-02-26 13:27                 ` Rafael J. Wysocki