Re: Re: standby to disk transition

From: Nigel Cunningham <nigel@suspend2.net>
To: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: linux-pm@osdl.org, linux-pm@lists.osdl.org,
	"Victor Porton" <porton@ex-code.com>, Pavel Machek <pavel@ucw.cz>
Subject: Re: Re: standby to disk transition
Date: Tue, 14 Mar 2006 10:18:57 +1000
Message-ID: <200603141019.02907.nigel@suspend2.net>
In-Reply-To: <200603140036.17051.rjw@sisk.pl>

Hi.

On Tuesday 14 March 2006 09:36, Rafael J. Wysocki wrote:
> Hi,
>
> On Tuesday 14 March 2006 00:11, Nigel Cunningham wrote:
> > On Tuesday 14 March 2006 08:42, Rafael J. Wysocki wrote:
> > > On Monday 13 March 2006 23:08, Pavel Machek wrote:
> > > > > >  > Yep, I call that suspend-to-both. It is planned, but not
> > > > > >  > really trivial, and I'm a little busy. If someone wants to
> > > > > >  > help....
> > > > > >
> > > > > > I was thinking a few days ago. With your move of all this stuff
> > > > > > to userspace, if it was done in multiple stages, we could
> > > > > > implement a form of checkpointing this way.
> > > > > >
> > > > > > So instead of doing the 'suspend to disk/ram' after 'write out
> > > > > > all pages', we just continue.
> > > > > >
> > > > > > Why is this useful ?  We've seen bugs reported that only ever
> > > > > > bite customers after they've run their workload for a month. 
> > > > > > Now, if they had a means of checkpointing, then when it crashes,
> > > > > > they could capture the last image that landed somewhere, and set
> > > > > > that up for more tests/monitoring with kprobes etc and reproduce
> > > > > > those hard-to-reproduce bugs a lot faster.
> > > > >
> > > > > I've been asked about this from time to time too. Apart from the
> > > > > issues Pavel has already mentioned, the big problem in my mind was
> > > > > figuring out what to do about disk storage. As the algorithm stands
> > > > > at the moment, the image includes information about the state of
> > > > > mounted filesystems. We'd need to somehow get rid of or be able to
> > > > > ignore that. Any suggestions?
> > > >
> > > > Well, copying all the filesystems would work, as would having no
> > > > filesystems at all :-) [ramdisk case]. And perhaps a practical
> > > > equivalent of "copy all filesystems" can be done with device mapper.
> > > >
> > > > [Of course, you'd have to copy all the filesystems back before doing
> > > > resume].
> > >
> > > If we had anything like fs suspend/resume, we could handle such things.
> > > We could also handle the "USB device mounted before suspend" problem
> > > (I think it's related).
> >
> > Well, we have bdev freezing, which I guess is what is used for fixing up
> > raid mirrors (but don't know for certain). I use it in refrigerating to
> > get XFS to really stop activity. I don't think it helps in this case
> > though:
>
> I don't think so either.
>
> > We need to be able to roll back the state of the filesystem in memory and
> > on disk to the point where the last checkpoint was made. Memory would be
> > straightforward if we want to do it dumbly and slowly - just reload the
> > whole checkpointed image. If we want to be more efficient, we'd want to
> > just load the pages that had changed (mark on (first) write?). But
> > filesystems seem to be a whole different story. Do any of the commonly
> > used filesystems support checkpointing and rollback at the moment?
>
> I'm not sure if we need a rollback as such.  What we need is to make sure
> the filesystems' state will be consistent before as well as after we have
> "reloaded" the snapshot.

Rereading what I think was Dave's original comment above (bug reports that
only bite customers...), I think the requirement is to be able to roll back
the entire system to the checkpoint: not merely to ensure it's consistent,
but to ensure it's the same, so that (all other things being equal) the bug
can be reproduced with the extra instrumentation in place. A filesystem that
was consistent but had (say) discarded the inodes and dentries that were in
memory at the time of the checkpoint might be throwing away the very data
required to reproduce the bug.
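
For the memory side only, the "mark on (first) write" idea quoted above
might look roughly like the toy Python model below. This is a sketch under
loud assumptions: the class and method names (CheckpointedMemory, checkpoint,
write, rollback) are invented for illustration, "pages" are plain bytearrays,
and nothing here corresponds to a real kernel interface.

```python
class CheckpointedMemory:
    """Toy model of copy-on-first-write checkpointing (not kernel code)."""

    def __init__(self, num_pages, page_size=4096):
        # Each "page" is just a zero-filled bytearray.
        self.pages = [bytearray(page_size) for _ in range(num_pages)]
        # page index -> contents saved at the first write after a checkpoint
        self.saved = {}

    def checkpoint(self):
        # Start a new epoch: nothing has been modified since this point.
        self.saved.clear()

    def write(self, page, offset, data):
        # Mark on first write: keep the pre-write contents once per epoch.
        if page not in self.saved:
            self.saved[page] = bytes(self.pages[page])
        # Assumes the write stays inside the page.
        self.pages[page][offset:offset + len(data)] = data

    def rollback(self):
        # Restore only the pages touched since the checkpoint.
        restored = len(self.saved)
        for page, original in self.saved.items():
            self.pages[page][:] = original
        self.saved.clear()
        return restored  # number of pages that had to be reloaded
```

Rolling back then costs one copy per modified page rather than a reload of
the whole image. It says nothing about the hard part, which is getting the
on-disk filesystem state back to the same point.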

HTH.

Nigel
-- 
See our web page for Howtos, FAQs, the Wiki and mailing list info.
http://www.suspend2.net                IRC: #suspend2 on Freenode
