From: Dave Chinner <david@fromorbit.com>
To: NeilBrown <neilb@suse.de>
Cc: Len Brown <lenb@kernel.org>,
rjw@rjwysocki.net, linux-pm@vger.kernel.org,
linux-kernel@vger.kernel.org, Len Brown <len.brown@intel.com>
Subject: Re: [PATCH 1/1] suspend: delete sys_sync()
Date: Fri, 15 May 2015 09:54:26 +1000 [thread overview]
Message-ID: <20150514235426.GF4316@dastard> (raw)
In-Reply-To: <20150514092251.6d0625af@notabene.brown>
ng back On Thu, May 14, 2015 at 09:22:51AM +1000, NeilBrown wrote:
> On Mon, 11 May 2015 11:44:28 +1000 Dave Chinner <david@fromorbit.com> wrote:
>
> > On Fri, May 08, 2015 at 03:08:43AM -0400, Len Brown wrote:
> > > From: Len Brown <len.brown@intel.com>
> > >
> > > Remove sys_sync() from the kernel's suspend flow.
> > >
> > > sys_sync() is extremely expensive in some configurations,
> > > and so the kernel should not force users to pay this cost
> > > on every suspend.
> >
> > Since when? Please explain what your use case is that makes this
> > so prohibitively expensive it needs to be removed.
> >
> > >
> > > The user-space utilities s2ram and s2disk choose to invoke sync() today.
> > > A user can invoke suspend directly via /sys/power/state to skip that cost.
> >
> > So, you want to have s2disk write all the dirty pages in memory to
> > the suspend image, rather than to the filesystem?
> >
> > Either way you have to write that dirty data to disk, but if you
> > write it to the suspend image, it then has to be loaded again on
> > resume, and then written again to the filesystem the system has
> > resumed. This doesn't seem very efficient to me....
> >
> > And, quite frankly, machines fail to resume from suspne dall the
> > time. e.g. run out of batteries when they are under s2ram
> > conditions, or s2disk fails because a kernel upgrade was done before
> > the s2disk and so can't be resumed. With your change, users lose all
> > the data that was buffered in memory before suspend, whereas right
> > now it is written to disk and so nothing is lost if the resume from
> > suspend fails for whatever reason.
> >
> > IOWs, I can see several good reasons why the sys_sync() needs to
> > remain in the suspend code. User data safety and filesystem
> > integrity is far, far more important than a couple of seconds
> > improvement in suspend speed....
>
> To be honest, this sounds like superstition and fear, not science and fact.
>
> "filesystem integrity" is not an issue for the fast majority of filesystems
> which use journalling to ensure continued integrity even after a crash. I
> think even XFS does that :-)
It has nothing to do with journalling, and everything to do with
bring filesystems to an *idle state* before suspend runs. We have a
long history of bug reports with XFS that go: suspend, resume, XFS
almost immediately detects corruption, shuts down.
The problem is that "sync" doesn't make the filesystem idle - XFs
has *lots* of background work going on, and if we aren't *real
careful* the filesystem is still doing work while the hardware gets
powerd down and the suspend image is being taken. the result is on
resume that the on-disk filesystem state does not match the memory
image pulled back from resume, and we get shutdowns.
sys_sync() does not guarantee a filesystem is idle - it guarantees
the data in memory is recoverable, butit doesn't stop the filesystem
from doing things like writing back metadata or running background
cleaup tasks. If those aren't stopped properly, then we get into
the state where in-memory and on-disk state get out of whack. And
s2ram can have these problems too, because if there is IO in flight
when the hardware is powered down, that IO is lost....
Every time some piece of generic infrastructure changes behaviour
w.r.t. suspend/resume, we get a new set of problems being reported
by users. It's extremely hard to test for these problems and it
might take months of occasional corruption reports from a user to
isolate it to being a suspend/resume problem. It's a game of
whack-a-mole, because quite often they come down to the fact that
something changed and nobody in the XFS world knew they had to now
set an different initialisation flag on some structure or workqueue
to make it work the way it needed to work.
Go back an look at the history of sys_sync() in suspend discussions
over the past 10 years. You'll find me saying exactly the same
thing again and again about sys_sync(): it does not guarantee the
filesystem is in an idle or coherent, unchanging state, and nothing
in the suspend code tells the filesystem to enter an idle or frozen
state. We actually have mechanisms for doing this - we use it in the
storage layers to idle the filesystem while we do things like *take
a snapshot*.
What is the mechanism suspend to disk uses? It *takes a snapshot* of
system state, written to disk. It's supposed to be consistent, and
the only way you can guarantee the state of an active, mounted
filesystem has consistent in-memory state and on-disk state and
that it won't get changed is to *freeze the filesystem*.
Removing the sync is only going to make this problem worse because
the delta between on-disk and in-memory state is going to be much,
much larger. There is also likely to be significant filesystem
activity occurring when the filesystem has all it's background
threads and work queues abruptly frozen with no warning or
co-ordination, which makes it impossible for anyone to test
suspend/resume reliably.
Sorry for the long rant, but I've been saying the same thing for 10
years, which is abotu as long as I've been dealing with filesystem
corruptions that have resulted from suspend/resume.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2015-05-14 23:54 UTC|newest]
Thread overview: 71+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-05-08 7:08 [PATCH 1/1] suspend: delete sys_sync() Len Brown
2015-05-08 14:34 ` Alan Stern
2015-05-08 16:36 ` Len Brown
2015-05-08 19:13 ` One Thousand Gnomes
2015-05-08 19:32 ` Len Brown
2015-05-08 19:52 ` One Thousand Gnomes
2015-05-08 20:39 ` Rafael J. Wysocki
2015-05-08 20:30 ` Rafael J. Wysocki
2015-05-09 19:59 ` Alan Stern
2015-05-09 20:25 ` Henrique de Moraes Holschuh
2015-05-11 20:34 ` Len Brown
2015-05-12 6:11 ` Oliver Neukum
2015-06-25 17:11 ` Henrique de Moraes Holschuh
2015-06-30 20:04 ` Len Brown
2015-07-01 12:21 ` Henrique de Moraes Holschuh
2015-07-02 3:07 ` Len Brown
2015-07-03 1:42 ` Dave Chinner
2015-07-04 1:03 ` Rafael J. Wysocki
2015-07-04 8:50 ` Geert Uytterhoeven
2015-07-05 23:25 ` Rafael J. Wysocki
2015-07-04 14:19 ` Alan Stern
2015-07-05 23:28 ` Rafael J. Wysocki
2015-07-06 11:06 ` Pavel Machek
2015-07-06 13:59 ` Rafael J. Wysocki
2015-07-07 10:25 ` Pavel Machek
2015-07-07 12:22 ` Rafael J. Wysocki
2015-07-06 0:06 ` Dave Chinner
2015-07-06 11:11 ` Pavel Machek
2015-07-06 13:52 ` Rafael J. Wysocki
2015-07-07 1:17 ` Dave Chinner
2015-07-07 12:14 ` Rafael J. Wysocki
2015-07-07 13:16 ` Oliver Neukum
2015-07-07 14:32 ` Rafael J. Wysocki
2015-07-07 14:38 ` Oliver Neukum
2015-07-07 15:03 ` Alan Stern
2015-07-07 22:20 ` Rafael J. Wysocki
2015-07-08 11:20 ` Pavel Machek
2015-07-08 14:40 ` Alan Stern
2015-07-08 22:04 ` Rafael J. Wysocki
2015-07-07 22:11 ` Rafael J. Wysocki
2015-07-08 7:51 ` Oliver Neukum
2015-07-08 22:03 ` Rafael J. Wysocki
2015-07-09 7:32 ` Oliver Neukum
2015-07-09 23:22 ` Rafael J. Wysocki
2015-08-04 19:54 ` Pavel Machek
2015-07-08 11:17 ` Pavel Machek
2015-07-07 13:42 ` Takashi Iwai
2015-07-06 10:15 ` Ming Lei
2015-07-06 10:03 ` Pavel Machek
2015-05-11 1:44 ` Dave Chinner
2015-05-11 20:22 ` Len Brown
2015-05-12 22:34 ` Dave Chinner
2015-05-13 23:22 ` NeilBrown
2015-05-14 23:54 ` Dave Chinner [this message]
2015-05-15 0:34 ` Rafael J. Wysocki
2015-05-15 0:40 ` Ming Lei
2015-05-15 0:59 ` Rafael J. Wysocki
2015-05-15 5:13 ` Ming Lei
2015-05-15 10:35 ` One Thousand Gnomes
2015-05-18 1:57 ` NeilBrown
[not found] ` <CAJvTdKn_0EZ0ZuqO2e4+ExD8kFWcy78fse4zHr3uFZODOroXEg@mail.gmail.com>
2015-06-19 1:09 ` Dave Chinner
2015-06-19 2:35 ` Len Brown
2015-06-19 4:31 ` Dave Chinner
2015-06-19 6:34 ` Len Brown
2015-06-19 23:07 ` Dave Chinner
2015-06-20 5:26 ` Len Brown
2015-05-15 1:04 ` NeilBrown
2015-05-15 14:20 ` Alan Stern
2015-05-15 14:32 ` Alan Stern
2015-05-15 14:19 ` Alan Stern
2015-07-06 10:07 ` Pavel Machek
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150514235426.GF4316@dastard \
--to=david@fromorbit.com \
--cc=len.brown@intel.com \
--cc=lenb@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=neilb@suse.de \
--cc=rjw@rjwysocki.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox