* [LSF/MM TOPIC] Phasing out kernel thread freezing
@ 2018-01-26 9:09 Luis R. Rodriguez
2018-01-31 19:10 ` Darrick J. Wong
0 siblings, 1 reply; 10+ messages in thread
From: Luis R. Rodriguez @ 2018-01-26 9:09 UTC (permalink / raw)
To: lsf-pc
Cc: linux-fsdevel, mcgrof, Jan Kara, Dave Chinner, Jeff Layton,
Rafael J. Wysocki, Bart Van Assche, Jiri Kosina
Since the 2015 Kernel Summit in South Korea we have agreed that we should phase
out the kernel thread freezer. The freezer was originally added to the kernel
to aid suspend, by ensuring that no unwanted IO activity could cause filesystem
corruption, and we can instead accomplish this with the already implemented
filesystem freeze/thaw calls.

Filesystems are not the only users of the freezer API now, though. Although
most uses outside of filesystems are probably bogus, a wholesale removal is
prone to cause many regressions. Actually phasing out kernel thread freezing
turns out to be trickier than expected, even in filesystems alone, so the
current approach is to phase it out slowly, one step at a time: one subsystem
and driver type at a time. Clearly the first subsystem we should tackle is
filesystems.
We now seem to have reached consensus on how to do this for a few filesystems,
namely those which implement freeze_fs() only. The outstanding work I have
left is to evaluate the prospect of sharing the same freeze semantics as
freeze_bdev(), initiated by dm, and to find a proper, generic way to address
reference counting for sb freezing. The only filesystems which implement
freeze_fs() are:
o xfs
o reiserfs
o nilfs2
o jfs
o f2fs
o ext4
o ext2
o btrfs
Of these, the following have freezer helpers, which can then be removed once
the kernel automatically calls freeze_fs() for us on suspend:
o xfs
o nilfs2
o jfs
o f2fs
o ext4
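Mechanically, freeze_fs() and unfreeze_fs() are the ->freeze_fs/->unfreeze_fs
hooks in struct super_operations, so "the kernel automatically calls
freeze_fs() on suspend" means the suspend path invoking those hooks per
superblock. The sketch below is a compilable user-space model only: the
structs are simplified stand-ins for the kernel ones, and
fs_suspend_one()/fs_resume_one() are hypothetical names, not existing kernel
functions.

```c
#include <assert.h>

/* Mocked stand-ins for kernel types; the real ones live in <linux/fs.h>. */
struct super_block;

struct super_operations {
	int (*freeze_fs)(struct super_block *sb);   /* quiesce, flush journal */
	int (*unfreeze_fs)(struct super_block *sb); /* resume normal operation */
};

struct super_block {
	const char *s_id;
	int s_frozen;
	const struct super_operations *s_op;
};

/* A toy filesystem's hooks, standing in for e.g. xfs_fs_freeze(). */
static int xfs_freeze_fs(struct super_block *sb)   { sb->s_frozen = 1; return 0; }
static int xfs_unfreeze_fs(struct super_block *sb) { sb->s_frozen = 0; return 0; }

static const struct super_operations xfs_sops = {
	.freeze_fs   = xfs_freeze_fs,
	.unfreeze_fs = xfs_unfreeze_fs,
};

/* What a generic suspend path could do per superblock (hypothetical name). */
int fs_suspend_one(struct super_block *sb)
{
	if (!sb->s_op->freeze_fs)
		return -1; /* no hook: would fall back to the old kthread freezer */
	return sb->s_op->freeze_fs(sb);
}

int fs_resume_one(struct super_block *sb)
{
	return sb->s_op->unfreeze_fs(sb);
}
```

Filesystems outside the list above would hit the fallback branch, which is
exactly the long-term question raised below.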
Long term we need to decide what to do with filesystems which do not implement
freeze_fs(), or, for instance, filesystems which implement freeze_super(). Jan
Kara made a few suggestions in this regard which I'll be evaluating soon;
however, there are other special filesystems with further considerations. As
an example, for NFS Jeff Layton has suggested having freeze_fs() make the RPC
engine "park" newly issued RPCs for that fs' client onto an rpc_wait_queue.
For any RPC that has already been sent, however, we need to wait for a reply.
Once everything is quiesced we can return and call the filesystem frozen.
unfreeze_fs() can then just have the engine stop parking RPCs and wake up the
waitq. He points out, however, that if we're interested in making the cgroup
freezer also work, then we may need to do a bit more work to ensure that we
don't end up with frozen tasks squatting on VFS locks. Dave Chinner notes,
though, that the cgroup freezer is broken by design *unless* it guarantees
tasks are frozen without holding any VFS/filesystem lock context, and as such
we *should* be able to ignore it.
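Jeff's park-then-drain idea could be modeled roughly as follows. This is a toy
user-space model only: the real pieces would be the sunrpc layer's
rpc_wait_queue with rpc_sleep_on()/rpc_wake_up(), and every name below
(rpc_clnt_model, nfs_freeze(), ...) is made up for illustration.

```c
#include <assert.h>

/* Toy model of the proposal: new RPCs get parked while the fs is frozen;
 * freeze completes only once in-flight RPCs have drained. */
struct rpc_clnt_model {
	int frozen;     /* set by the freeze path */
	int in_flight;  /* RPCs already on the wire */
	int parked;     /* RPCs queued on the (mock) rpc_wait_queue */
};

/* Issue path: park instead of transmitting while frozen. */
void rpc_issue(struct rpc_clnt_model *c)
{
	if (c->frozen)
		c->parked++;     /* would rpc_sleep_on() the wait queue */
	else
		c->in_flight++;  /* transmitted normally */
}

void rpc_reply(struct rpc_clnt_model *c)
{
	c->in_flight--;
}

/* freeze_fs(): start parking, then wait for in-flight replies to drain. */
int nfs_freeze(struct rpc_clnt_model *c)
{
	c->frozen = 1;
	return c->in_flight == 0 ? 0 : -1; /* caller waits and retries */
}

/* unfreeze_fs(): stop parking and release the queue. */
void nfs_unfreeze(struct rpc_clnt_model *c)
{
	c->frozen = 0;
	c->in_flight += c->parked;  /* would rpc_wake_up() the wait queue */
	c->parked = 0;
}
```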
We also need to decide what to do with complex layered situations. For
example, Bart Van Assche suggested considering the case of a filesystem that
exists on top of an md device, where the md device uses one or more files as
backing store, with the loop driver between the md device and the files.
Chinner has suggested allowing block devices to freeze the superblocks on top
of them; however, some *may* prefer a call that allows a superblock to quiesce
the underlying block device, which would let md/dm suspend whatever ongoing
maintenance operations it has in progress until the filesystem indicates it
needs to thaw. The pros and cons of both approaches should probably be
discussed, unless it's already crystal clear which path to take.
Finally, we should evaluate any other potential users of the kernel freezer
API which have by now grown dependent on it, even though it was designed only
to help avoid filesystem corruption on the way to suspend. If none have really
become dependent on it, then great: we can just remove its uses one subsystem
at a time to avoid regressions.
Luis
* Re: [LSF/MM TOPIC] Phasing out kernel thread freezing
2018-01-26 9:09 [LSF/MM TOPIC] Phasing out kernel thread freezing Luis R. Rodriguez
@ 2018-01-31 19:10 ` Darrick J. Wong
2018-02-04 22:41 ` Bart Van Assche
0 siblings, 1 reply; 10+ messages in thread
From: Darrick J. Wong @ 2018-01-31 19:10 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: lsf-pc, linux-fsdevel, Jan Kara, Dave Chinner, Jeff Layton,
Rafael J. Wysocki, Bart Van Assche, Jiri Kosina
On Fri, Jan 26, 2018 at 10:09:23AM +0100, Luis R. Rodriguez wrote:
<-- snip -->
> We also need to decide what to do with complex layered situations. For
> example, Bart Van Assche suggested considering the case of a filesystem that
> exists on top of an md device, where the md device uses one or more files as
> backing store, with the loop driver between the md device and the files.
> Chinner has suggested allowing block devices to freeze the superblocks on top
> of them; however, some *may* prefer a call that allows a superblock to quiesce
> the underlying block device, which would let md/dm suspend whatever ongoing
> maintenance operations it has in progress until the filesystem indicates it
> needs to thaw. The pros and cons of both approaches should probably be
> discussed, unless it's already crystal clear which path to take.
For a brief moment I pondered whether it would make sense to make
filesystems part of the device model so that the suspend code could work
out fs <-> bdev dependencies and know in which order to freeze
filesystems and quiesce devices, but every time I go digging into how
all those macros work I get confused and my eyes glaze over, so I don't
know if this is at all a good idea or just confused ramblings.
Maybe it would suffice to start freezing in reverse order of mount and
have some way to tell the underlying bdev that it should
flush/quiesce/whatever itself?
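The reverse-of-mount-order rule can be sketched with a trivial model (names
and mount paths below are purely illustrative): freezing walks the mount table
backwards, so a filesystem stacked on a loop device is frozen before the
filesystem holding its backing file.

```c
#include <assert.h>
#include <string.h>

/* Mock mount table kept in mount order. */
#define MAX_MOUNTS 8
static const char *mounts[MAX_MOUNTS];
static int nr_mounts;
static char freeze_log[128]; /* records the order freezes happened in */

static void mount_fs(const char *name) { mounts[nr_mounts++] = name; }

static void freeze_one(const char *name)
{
	strcat(freeze_log, name);
	strcat(freeze_log, " ");
}

/* Freeze in reverse order of mount, so an upper fs (e.g. on a loop
 * device) quiesces before the fs backing it sees the flush. */
void freeze_all(void)
{
	for (int i = nr_mounts - 1; i >= 0; i--)
		freeze_one(mounts[i]);
}
```

Thawing would walk the same table forwards; the open question in the thread is
whether this simple ordering survives md/dm stacks that are not strictly
nested by mount time.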
--D
> Finally, we should evaluate any other potential uses of the kernel freezer API
> which now have grown dependent on it, even though the design for it was only to
> help avoid filesystem corruption on our way to suspend. If none have really
> become dependent on them, then great, we can just remove them one at a time
> subsystem at a time to avoid regressions.
>
> Luis
* Re: [LSF/MM TOPIC] Phasing out kernel thread freezing
2018-01-31 19:10 ` Darrick J. Wong
@ 2018-02-04 22:41 ` Bart Van Assche
2018-02-05 8:28 ` Rafael J. Wysocki
0 siblings, 1 reply; 10+ messages in thread
From: Bart Van Assche @ 2018-02-04 22:41 UTC (permalink / raw)
To: darrick.wong@oracle.com, mcgrof@kernel.org
Cc: jlayton@redhat.com, jikos@kernel.org, jack@suse.cz,
david@fromorbit.com, lsf-pc@lists.linux-foundation.org,
linux-fsdevel@vger.kernel.org, rafael@kernel.org
On Wed, 2018-01-31 at 11:10 -0800, Darrick J. Wong wrote:
> For a brief moment I pondered whether it would make sense to make
> filesystems part of the device model so that the suspend code could work
> out fs <-> bdev dependencies and know in which order to freeze
> filesystems and quiesce devices, but every time I go digging into how
> all those macros work I get confused and my eyes glaze over, so I don't
> know if this is at all a good idea or just confused ramblings.
If we have to go this way: shouldn't we introduce a new abstraction
("storage stack element" or similar) rather than making filesystems part of
the device model?
Thanks,
Bart.
* Re: [LSF/MM TOPIC] Phasing out kernel thread freezing
2018-02-04 22:41 ` Bart Van Assche
@ 2018-02-05 8:28 ` Rafael J. Wysocki
2018-02-24 3:27 ` Luis R. Rodriguez
0 siblings, 1 reply; 10+ messages in thread
From: Rafael J. Wysocki @ 2018-02-05 8:28 UTC (permalink / raw)
To: Bart Van Assche
Cc: darrick.wong@oracle.com, mcgrof@kernel.org, jlayton@redhat.com,
jikos@kernel.org, jack@suse.cz, david@fromorbit.com,
lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
rafael@kernel.org
On Sun, Feb 4, 2018 at 11:41 PM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
> On Wed, 2018-01-31 at 11:10 -0800, Darrick J. Wong wrote:
>> For a brief moment I pondered whether it would make sense to make
>> filesystems part of the device model so that the suspend code could work
>> out fs <-> bdev dependencies and know in which order to freeze
>> filesystems and quiesce devices, but every time I go digging into how
>> all those macros work I get confused and my eyes glaze over, so I don't
>> know if this is at all a good idea or just confused ramblings.
>
> If we have to go this way: shouldn't we introduce a new abstraction
> ("storage stack element" or similar) rather than making filesystems part of
> the device model?
That would be my approach.
Trying to "suspend" filesystems at the same time as I/O devices (and
all of that asynchronously) may be problematic for ordering reasons
and similar.
Moreover, during hibernation devices are suspended twice (and resumed in
between, of course), whereas filesystems only need to be "suspended" once.
With that in mind, I would add a mechanism allowing filesystems (and possibly
other components of the storage stack) to register a set of callbacks for
suspend and resume, and then invoke those callbacks in a specific order.
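Such a registration mechanism could look like the minimal model below. This is
a user-space sketch under assumptions, not kernel code: the names
(stack_elem, stack_register(), the priority scheme) are all hypothetical. The
idea is simply that suspend runs top-down (fs before md/dm before bdev) and
resume runs in the reverse order.

```c
#include <assert.h>

#define MAX_ELEMS 8

/* One "storage stack element" with its callbacks. */
struct stack_elem {
	int prio;              /* lower = closer to the top (fs) */
	int (*suspend)(void);
	int (*resume)(void);
};

static struct stack_elem elems[MAX_ELEMS];
static int nr_elems;

void stack_register(int prio, int (*s)(void), int (*r)(void))
{
	int i = nr_elems++;
	/* insertion sort by priority, so invocation order is well defined */
	while (i > 0 && elems[i - 1].prio > prio) {
		elems[i] = elems[i - 1];
		i--;
	}
	elems[i] = (struct stack_elem){ prio, s, r };
}

int stack_suspend(void) /* top of the stack first */
{
	for (int i = 0; i < nr_elems; i++)
		if (elems[i].suspend())
			return -1;
	return 0;
}

int stack_resume(void) /* bottom of the stack first */
{
	for (int i = nr_elems - 1; i >= 0; i--)
		if (elems[i].resume())
			return -1;
	return 0;
}

/* Instrumented callbacks for demonstration: each logs a token. */
static int call_log[2 * MAX_ELEMS], call_n;
static int fs_suspend(void)   { call_log[call_n++] = 1; return 0; }
static int bdev_suspend(void) { call_log[call_n++] = 2; return 0; }
static int fs_resume(void)    { call_log[call_n++] = 3; return 0; }
static int bdev_resume(void)  { call_log[call_n++] = 4; return 0; }
```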
* Re: [LSF/MM TOPIC] Phasing out kernel thread freezing
2018-02-05 8:28 ` Rafael J. Wysocki
@ 2018-02-24 3:27 ` Luis R. Rodriguez
2018-02-25 9:45 ` Rafael J. Wysocki
0 siblings, 1 reply; 10+ messages in thread
From: Luis R. Rodriguez @ 2018-02-24 3:27 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Bart Van Assche, darrick.wong@oracle.com, mcgrof@kernel.org,
jlayton@redhat.com, jikos@kernel.org, jack@suse.cz,
david@fromorbit.com, lsf-pc@lists.linux-foundation.org,
linux-fsdevel@vger.kernel.org
On Mon, Feb 05, 2018 at 09:28:37AM +0100, Rafael J. Wysocki wrote:
> On Sun, Feb 4, 2018 at 11:41 PM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
> > On Wed, 2018-01-31 at 11:10 -0800, Darrick J. Wong wrote:
> >> For a brief moment I pondered whether it would make sense to make
> >> filesystems part of the device model so that the suspend code could work
> >> out fs <-> bdev dependencies and know in which order to freeze
> >> filesystems and quiesce devices, but every time I go digging into how
> >> all those macros work I get confused and my eyes glaze over, so I don't
> >> know if this is at all a good idea or just confused ramblings.
> >
> > If we have to go this way: shouldn't we introduce a new abstraction
> > ("storage stack element" or similar) rather than making filesystems part of
> > the device model?
>
> That would be my approach.
>
> Trying to "suspend" filesystems at the same time as I/O devices (and
> all of that asynchronously) may be problematic for ordering reasons
> and similar.
Oh look, another ordering issue. And this is why I was not a fan of the
device link API even though that is what we got merged. Moving on...
> Moreover, during hibernation devices are suspended twice (and resumed in
> between, of course), whereas filesystems only need to be "suspended" once.
From your point of view, yes, but internally the VFS layer or filesystems
themselves may end up re-using this mechanism later for other things, like
snapshotting. And if some folks have it the way they want it, we may need a
dependency map between filesystems anyway, for filesystem-specific reasons.
> With that in mind, I would add a mechanism allowing filesystems (and
> possibly other components of the storage stack) to register a set of
> callbacks for suspend and resume and then invoking those callbacks in
> a specific order.
That's what I had done in my series, the issue here is order. Order in my
series is simple but should work for starters, later however I suspect we'll
need something more robust to help.
Luis
* Re: [LSF/MM TOPIC] Phasing out kernel thread freezing
2018-02-24 3:27 ` Luis R. Rodriguez
@ 2018-02-25 9:45 ` Rafael J. Wysocki
2018-02-25 17:22 ` Luis R. Rodriguez
0 siblings, 1 reply; 10+ messages in thread
From: Rafael J. Wysocki @ 2018-02-25 9:45 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Rafael J. Wysocki, Bart Van Assche, darrick.wong@oracle.com,
jlayton@redhat.com, jikos@kernel.org, jack@suse.cz,
david@fromorbit.com, lsf-pc@lists.linux-foundation.org,
linux-fsdevel@vger.kernel.org
On Sat, Feb 24, 2018 at 4:27 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> On Mon, Feb 05, 2018 at 09:28:37AM +0100, Rafael J. Wysocki wrote:
>> On Sun, Feb 4, 2018 at 11:41 PM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
>> > On Wed, 2018-01-31 at 11:10 -0800, Darrick J. Wong wrote:
>> >> For a brief moment I pondered whether it would make sense to make
>> >> filesystems part of the device model so that the suspend code could work
>> >> out fs <-> bdev dependencies and know in which order to freeze
>> >> filesystems and quiesce devices, but every time I go digging into how
>> >> all those macros work I get confused and my eyes glaze over, so I don't
>> >> know if this is at all a good idea or just confused ramblings.
>> >
>> > If we have to go this way: shouldn't we introduce a new abstraction
>> > ("storage stack element" or similar) rather than making filesystems part of
>> > the device model?
>>
>> That would be my approach.
>>
>> Trying to "suspend" filesystems at the same time as I/O devices (and
>> all of that asynchronously) may be problematic for ordering reasons
>> and similar.
>
> Oh look, another ordering issue. And this is why I was not a fan of the
> device link API even though that is what we got merged. Moving on...
>
>> Moreover, during hibernation devices are suspended twice (and resumed in
>> between, of course), whereas filesystems only need to be "suspended" once.
>
> From your point of view yes, but actually internally the VFS layer or
> filesystems themselves may end up re-using this mechanism later for
> other things like -- snapshotting. And if some folks have it the way
> they want it, we may need a dependency map between filesystems anyway
> for filesystem specific reasons.
That's orthogonal to what I said.
A dependency map between filesystems and other components of the block
layer (like md, dm etc) will be necessary going forward (if all of the
suspending and resuming of them is expected to be reliable anyway),
but that doesn't change hibernation-related requirements one whit.
Filesystems need to be suspended (or frozen or whatever terminology
ends up being used for that) *before* creating a hibernation image and
they *cannot* be resumed (unfrozen etc) after that until the system is
off or the kernel decides that the hibernation has failed and rolls
back. Whatever data/metadata are there in persistent storage before
the image is created, changing them after that point is potentially
critically harmful, so (in the hibernation case) all of the in-flight
I/O that may end up being written to persistent storage needs to be
flushed before creating the image.
However, *devices* are resumed after creating the image so that the
image itself can be written to persistent storage and are suspended
after that again before putting the system to sleep (for wakeup to
work, among other things).
That's why suspend/resume of filesystems cannot be tied to
suspend/resume of devices.
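The hibernation ordering described above can be written out as a toy timeline
(user-space model, all step names illustrative): filesystems are frozen once,
before the image is created, and must stay frozen while devices are resumed to
write the image out and then suspended again.

```c
#include <assert.h>
#include <string.h>

static char timeline[256];

static void step(const char *s)
{
	strcat(timeline, s);
	strcat(timeline, ";");
}

/* The hibernation sequence: note there is no "thaw-fs" step anywhere,
 * while devices go down, up, and down again. */
void hibernate_model(void)
{
	step("freeze-fs");        /* filesystems frozen exactly once */
	step("suspend-devices");
	step("create-image");     /* snapshot of memory */
	step("resume-devices");   /* needed to write the image out... */
	step("write-image");      /* ...while filesystems stay frozen */
	step("suspend-devices");  /* suspend again, then power off */
}
```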
Note that this isn't the case for system suspend/resume
(suspend-to-RAM or suspend-to-idle).
>> With that in mind, I would add a mechanism allowing filesystems (and
>> possibly other components of the storage stack) to register a set of
>> callbacks for suspend and resume and then invoking those callbacks in
>> a specific order.
>
> That's what I had done in my series, the issue here is order. Order in my
> series is simple but should work for starters, later however I suspect we'll
> need something more robust to help.
Quite likely.
Thanks,
Rafael
* Re: [LSF/MM TOPIC] Phasing out kernel thread freezing
2018-02-25 9:45 ` Rafael J. Wysocki
@ 2018-02-25 17:22 ` Luis R. Rodriguez
2018-02-26 9:25 ` Rafael J. Wysocki
0 siblings, 1 reply; 10+ messages in thread
From: Luis R. Rodriguez @ 2018-02-25 17:22 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Luis R. Rodriguez, Bart Van Assche, darrick.wong@oracle.com,
jlayton@redhat.com, jikos@kernel.org, jack@suse.cz,
david@fromorbit.com, lsf-pc@lists.linux-foundation.org,
linux-fsdevel@vger.kernel.org
On Sun, Feb 25, 2018 at 10:45:26AM +0100, Rafael J. Wysocki wrote:
> On Sat, Feb 24, 2018 at 4:27 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > On Mon, Feb 05, 2018 at 09:28:37AM +0100, Rafael J. Wysocki wrote:
> >> Moreover, during hibernation devices are suspended twice (and resumed in
> >> between, of course), whereas filesystems only need to be "suspended" once.
> >
> > From your point of view yes, but actually internally the VFS layer or
> > filesystems themselves may end up re-using this mechanism later for
> > other things like -- snapshotting. And if some folks have it the way
> > they want it, we may need a dependency map between filesystems anyway
> > for filesystem specific reasons.
>
> That's orthogonal to what I said.
<-- snip -->
> However, *devices* are resumed after creating the image so that the
> image itself can be written to persistent storage and are suspended
> after that again before putting the system to sleep (for wakeup to
> work, among other things).
>
> That's why suspend/resume of filesystems cannot be tied to
> suspend/resume of devices.
Ah, yes, I see your point now. So for filesystems we really don't care
whether it's suspend or hibernation; we just need to freeze in the right
order. As long as we get that order right we should be OK.
Curious -- do we resume *all* devices after creating the image for hibernation
today? Not that I am advocating using devices for this mechanism or resolution
for filesystems, I'm just curious as we're on the topic.
Luis
* Re: [LSF/MM TOPIC] Phasing out kernel thread freezing
2018-02-25 17:22 ` Luis R. Rodriguez
@ 2018-02-26 9:25 ` Rafael J. Wysocki
2018-02-26 12:44 ` Jan Kara
0 siblings, 1 reply; 10+ messages in thread
From: Rafael J. Wysocki @ 2018-02-26 9:25 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Rafael J. Wysocki, Bart Van Assche, darrick.wong@oracle.com,
jlayton@redhat.com, jikos@kernel.org, jack@suse.cz,
david@fromorbit.com, lsf-pc@lists.linux-foundation.org,
linux-fsdevel@vger.kernel.org
On Sun, Feb 25, 2018 at 6:22 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> On Sun, Feb 25, 2018 at 10:45:26AM +0100, Rafael J. Wysocki wrote:
>> On Sat, Feb 24, 2018 at 4:27 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
>> > On Mon, Feb 05, 2018 at 09:28:37AM +0100, Rafael J. Wysocki wrote:
>> >> Moreover, during hibernation devices are suspended twice (and resumed in
>> >> between, of course), whereas filesystems only need to be "suspended" once.
>> >
>> > From your point of view yes, but actually internally the VFS layer or
>> > filesystems themselves may end up re-using this mechanism later for
>> > other things like -- snapshotting. And if some folks have it the way
>> > they want it, we may need a dependency map between filesystems anyway
>> > for filesystem specific reasons.
>>
>> That's orthogonal to what I said.
>
> <-- snip -->
>
>> However, *devices* are resumed after creating the image so that the
>> image itself can be written to persistent storage and are suspended
>> after that again before putting the system to sleep (for wakeup to
>> work, among other things).
>>
>> That's why suspend/resume of filesystems cannot be tied to
>> suspend/resume of devices.
>
> Ah, yes, I see your point now. So for filesystems we really don't care
> whether it's suspend or hibernation; we just need to freeze in the right
> order. As long as we get that order right we should be OK.
Generally, yes.
System suspend/resume (S2R, S2I), however, doesn't really require the
flushing part, so in principle the full-blown fs freezing is not
necessary in that case, strictly speaking. The ordering of writes
still has to be preserved in this case (that is, writes to persistent
storage in the presence of a suspend-resume cycle must occur in the
same order as without the suspend-resume), but it doesn't matter too
much when the writes actually happen. They may be carried out before
the suspend or after the resume or somewhere during one of them: all
should be fine so long as the ordering of writes doesn't change as a
result of the suspend-resume (and suspend-resume failures are like
surprise resets from the fs perspective in that case, so journaling
should be sufficient to recover from them).
Of course, the full-blown freezing will work for system suspend too,
but it may be high-latency which is not desirable in some scenarios
utilizing system suspend/resume ("dark resume" or "lucid sleep"
scenarios, for example). That will be a concern in the long run (it
kind of is a concern already today), so I would consider
special-casing it from the outset.
> Curious -- do we resume *all* devices after creating the image for hibernation
> today?
Yes, we do.
It is not straightforward to determine which devices will be necessary
to save the image in general and we would need to resume a significant
subset of the device hierarchy for this purpose anyway.
Thanks,
Rafael
* Re: [LSF/MM TOPIC] Phasing out kernel thread freezing
2018-02-26 9:25 ` Rafael J. Wysocki
@ 2018-02-26 12:44 ` Jan Kara
2018-02-26 13:27 ` Rafael J. Wysocki
0 siblings, 1 reply; 10+ messages in thread
From: Jan Kara @ 2018-02-26 12:44 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Luis R. Rodriguez, Bart Van Assche, darrick.wong@oracle.com,
jlayton@redhat.com, jikos@kernel.org, jack@suse.cz,
david@fromorbit.com, lsf-pc@lists.linux-foundation.org,
linux-fsdevel@vger.kernel.org
On Mon 26-02-18 10:25:55, Rafael J. Wysocki wrote:
> On Sun, Feb 25, 2018 at 6:22 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > Ah, yes, I see your point now. So for filesystems we really don't care
> > whether it's suspend or hibernation; we just need to freeze in the right
> > order. As long as we get that order right we should be OK.
>
> Generally, yes.
>
> System suspend/resume (S2R, S2I), however, doesn't really require the
> flushing part, so in principle the full-blown fs freezing is not
> necessary in that case, strictly speaking. The ordering of writes
> still has to be preserved in this case (that is, writes to persistent
> storage in the presence of a suspend-resume cycle must occur in the
> same order as without the suspend-resume), but it doesn't matter too
> much when the writes actually happen.
I agree that in principle we don't have to flush all dirty data from the page
cache before system suspend (I believe this is what you are speaking about
here, isn't it?). However, from an implementation POV it is much simpler that
way, as otherwise processes get blocked in the kernel in unexpected places,
waiting for locks held by blocked flusher threads etc.
> They may be carried out before
> the suspend or after the resume or somewhere during one of them: all
> should be fine so long as the ordering of writes doesn't change as a
> result of the suspend-resume (and suspend-resume failures are like
> surprise resets from the fs perspective in that case, so journaling
> should be sufficient to recover from them).
Err, I don't follow how a suspend failure looks like a surprise reset to the
fs. If suspend fails, we just thaw the filesystem and off we go, no IO is
lost. If you speak about a situation where you suspend but then boot
without resuming - yes, that looks like a surprise reset and it behaves
that way already today.
> Of course, the full-blown freezing will work for system suspend too,
> but it may be high-latency which is not desirable in some scenarios
> utilizing system suspend/resume ("dark resume" or "lucid sleep"
> scenarios, for example). That will be a concern in the long run (it
> kind of is a concern already today), so I would consider
> special-casing it from the outset.
Understood, but that would require a considerable amount of work on the fs
side, and the problem is hard enough as is :) And switching the freeze
implementation to avoid sync(2) if asked to is quite independent of
implementing system suspend to use fs freezing.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [LSF/MM TOPIC] Phasing out kernel thread freezing
2018-02-26 12:44 ` Jan Kara
@ 2018-02-26 13:27 ` Rafael J. Wysocki
0 siblings, 0 replies; 10+ messages in thread
From: Rafael J. Wysocki @ 2018-02-26 13:27 UTC (permalink / raw)
To: Jan Kara
Cc: Rafael J. Wysocki, Luis R. Rodriguez, Bart Van Assche,
darrick.wong@oracle.com, jlayton@redhat.com, jikos@kernel.org,
david@fromorbit.com, lsf-pc@lists.linux-foundation.org,
linux-fsdevel@vger.kernel.org
On Mon, Feb 26, 2018 at 1:44 PM, Jan Kara <jack@suse.cz> wrote:
> On Mon 26-02-18 10:25:55, Rafael J. Wysocki wrote:
>> On Sun, Feb 25, 2018 at 6:22 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
>> > Ah, yes, I see your point now. So for filesystems we really don't
>> > care if its suspend or hibernation, we just need to freeze in the right
>> > order. So long as we get that order right we should be OK.
>>
>> Generally, yes.
>>
>> System suspend/resume (S2R, S2I), however, doesn't really require the
>> flushing part, so in principle the full-blown fs freezing is not
>> necessary in that case, strictly speaking. The ordering of writes
>> still has to be preserved in this case (that is, writes to persistent
>> storage in the presence of a suspend-resume cycle must occur in the
>> same order as without the suspend-resume), but it doesn't matter too
>> much when the writes actually happen.
>
> I agree that in principle we don't have to flush all dirty data from page
> cache before system suspend (I believe this is what you are speaking about
> here, isn't it?).
Yes, it is.
> However, from an implementation POV it is much simpler that way, as
> otherwise processes get blocked in the kernel in unexpected places, waiting
> for locks held by blocked flusher threads etc.
Understood.
>> They may be carried out before
>> the suspend or after the resume or somewhere during one of them: all
>> should be fine so long as the ordering of writes doesn't change as a
>> result of the suspend-resume (and suspend-resume failures are like
>> surprise resets from the fs perspective in that case, so journaling
>> should be sufficient to recover from them).
>
> Err, I don't follow how a suspend failure looks like a surprise reset to the
> fs. If suspend fails, we just thaw the filesystem and off we go, no IO is
> lost.
Right.
> If you speak about a situation where you suspend but then boot
> without resuming - yes, that looks like a surprise reset and it behaves
> that way already today.
I was talking about the latter.
>> Of course, the full-blown freezing will work for system suspend too,
>> but it may be high-latency which is not desirable in some scenarios
>> utilizing system suspend/resume ("dark resume" or "lucid sleep"
>> scenarios, for example). That will be a concern in the long run (it
>> kind of is a concern already today), so I would consider
>> special-casing it from the outset.
>
> Understood, but that would require a considerable amount of work on the fs
> side, and the problem is hard enough as is :)
Fair enough. :-)
> And switching the freeze implementation to avoid sync(2) if asked to is
> quite independent of implementing system suspend to use fs freezing.
Agreed.
Thanks,
Rafael