From: Matt Helsley <matthltc@us.ibm.com>
To: Daniel Lezcano <daniel.lezcano@free.fr>
Cc: "Serge E. Hallyn" <serge@hallyn.com>,
linux-fsdevel@vger.kernel.org,
containers@lists.linux-foundation.org,
Jamie Lokier <jamie@shareable.org>,
Andreas Dilger <adilger@sun.com>
Subject: Re: [C/R v20][PATCH 38/96] c/r: dump open file descriptors
Date: Sun, 21 Mar 2010 19:12:42 -0700
Message-ID: <20100322021242.GI2887@count0.beaverton.ibm.com>
In-Reply-To: <4BA68884.3080003@free.fr>
On Sun, Mar 21, 2010 at 09:58:44PM +0100, Daniel Lezcano wrote:
> Serge E. Hallyn wrote:
> > Quoting Jamie Lokier (jamie@shareable.org):
> >
> >> Matt Helsley wrote:
> >>
> >>>> That said, if the intent is to allow the restore to be done on
> >>>> another node with a "similar" filesystem (e.g. created by rsync/node
> >>>> image), instead of having a coherent distributed filesystem on all
> >>>> of the nodes then the filename makes sense.
> >>>>
> >>> Yes, this is the intent.
> >>>
> >> I would worry about programs which are using files which have been
> >> deleted, renamed, or (very common) renamed-over by another process
> >> after being opened, as there's a good chance they will successfully
> >> open the wrong file after c/r, and corrupt state from then on.
> >>
> >
> > Userspace is expected to back up and restore the filesystem, for
> > instance using a btrfs snapshot or a simple rsync or tar.
> >
> >
> That does not solve the problem Jamie is talking about.
> An rsync or a tar will not see a deleted file, and using btrfs just to get
> C/R to work with deleted files is a bit overkill, no?
These are the same kinds of problems encountered during backup. You
can play fast and loose -- like taking a backup while everything is
running -- or you can play it conservative and freeze things.

I think btrfs snapshots are just one possible solution and it's not
overkill.
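
To make the renamed-over case Jamie describes concrete, here is a minimal
userspace sketch of how a tool could notice that a saved filename no longer
names the file a task has open. It is not from the patchset: fd_path_check(),
its result codes, and the /tmp paths are names made up for illustration. It
only compares (st_dev, st_ino) from fstat() and stat().

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch; not part of the c/r patchset. */
enum fd_path_state {
    FD_PATH_SAME,     /* path still names the file the fd refers to */
    FD_PATH_REPLACED, /* path now names a different file (renamed-over) */
    FD_PATH_MISSING,  /* path is gone (deleted or renamed away) */
    FD_PATH_ERROR     /* fstat()/stat() failed for some other reason */
};

/* Compare the file behind an open fd with whatever the path names now. */
static enum fd_path_state fd_path_check(int fd, const char *path)
{
    struct stat by_fd, by_path;

    if (fstat(fd, &by_fd) != 0)
        return FD_PATH_ERROR;
    if (stat(path, &by_path) != 0)
        return errno == ENOENT ? FD_PATH_MISSING : FD_PATH_ERROR;
    /* The open fd pins its inode, so its (st_dev, st_ino) cannot be reused. */
    if (by_fd.st_dev != by_path.st_dev || by_fd.st_ino != by_path.st_ino)
        return FD_PATH_REPLACED;
    return FD_PATH_SAME;
}

/* Usage: open a file, rename another file over it, then delete the name. */
static int rename_over_demo(enum fd_path_state out[3])
{
    char a[] = "/tmp/cr-demo-a-XXXXXX", b[] = "/tmp/cr-demo-b-XXXXXX";
    int fd_a = mkstemp(a), fd_b = mkstemp(b);

    if (fd_a < 0 || fd_b < 0)
        return -1;
    out[0] = fd_path_check(fd_a, a);    /* nothing has changed yet */
    if (rename(b, a) != 0)
        return -1;
    out[1] = fd_path_check(fd_a, a);    /* the name now points at b's inode */
    if (unlink(a) != 0)
        return -1;
    out[2] = fd_path_check(fd_a, a);    /* the name is gone entirely */
    close(fd_a);
    close(fd_b);
    return 0;
}
```

A check like this only detects the mismatch; it does nothing to recover the
old contents, which is why the snapshot or freeze step still matters.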
For some filesystems it might make sense to use the filesystem freezer to
ensure that no files are deleted while the backup takes place. Combined
with tools like rsync or rdiff-backup, these operations could be low-bandwidth
and low-latency if well-known live-migration techniques are used.

Or use dm snapshots.
I imagine fanotify could also be useful so long as userspace has marked
things correctly prior to checkpoint. My high-level understanding of
fanotify was that we'd be able to delay (or deny) deletion until checkpoint
is complete.

Or, if using fanotify is unacceptable, at the very least we could use
inotify to know when a file needed for restart has been deleted. It might
go something like:

    start watching files/dirs needed (fanotify or inotify)
        Delay/deny changes (fanotify ONLY)
    freeze tasks for checkpoint
    freeze filesystem contents:
        take btrfs snapshots OR
        take dm snapshots OR
        use filesystem freezer OR
        backup filesystem contents
    sys_checkpoint
    check for changes to the filesystem contents and report failure if they
        interfere with restart (inotify ONLY)
    thaw filesystem contents
    thaw tasks

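
As a rough illustration of the inotify-only variant of the steps above, the
sketch below watches a directory, leaves a placeholder where the freeze,
snapshot and sys_checkpoint steps would go, and then reports how many
deletions or renames were queued in the meantime. The helper names
(watch_start, watch_count_changes, checkpoint_window_demo) and the /tmp path
are hypothetical; only the inotify calls themselves are real interfaces.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Changes that could invalidate a filename saved in the checkpoint image. */
#define WATCH_MASK (IN_DELETE | IN_MOVED_FROM | IN_MOVED_TO)

/* Start watching a directory that holds files needed for restart. */
static int watch_start(const char *dir)
{
    int ifd = inotify_init1(IN_NONBLOCK | IN_CLOEXEC);

    if (ifd < 0)
        return -1;
    if (inotify_add_watch(ifd, dir, WATCH_MASK) < 0) {
        close(ifd);
        return -1;
    }
    return ifd;
}

/* Drain queued events; return the number of changes seen, or -1 on error. */
static int watch_count_changes(int ifd)
{
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
    int changes = 0;

    for (;;) {
        ssize_t len = read(ifd, buf, sizeof(buf));

        if (len < 0)
            return errno == EAGAIN ? changes : -1;  /* EAGAIN: queue is empty */
        if (len == 0)
            return changes;
        for (char *p = buf; p < buf + len;) {
            const struct inotify_event *ev = (const struct inotify_event *)p;

            if (ev->mask & IN_Q_OVERFLOW)
                return -1;          /* events were lost; treat as failure */
            if (ev->mask & WATCH_MASK)
                changes++;
            p += sizeof(*ev) + ev->len;
        }
    }
}

/* Usage: returns the number of interfering changes seen during the window. */
static int checkpoint_window_demo(int interfere)
{
    char dir[] = "/tmp/cr-watch-XXXXXX";
    char path[64];
    int fd, ifd, changes;

    if (!mkdtemp(dir))
        return -1;
    snprintf(path, sizeof(path), "%s/state", dir);
    fd = open(path, O_CREAT | O_WRONLY, 0600);
    if (fd < 0)
        return -1;
    close(fd);
    ifd = watch_start(dir);
    if (ifd < 0)
        return -1;
    /* Placeholder: freeze tasks, freeze/snapshot filesystem, sys_checkpoint. */
    if (interfere)
        unlink(path);               /* simulate another process deleting it */
    changes = watch_count_changes(ifd);
    close(ifd);
    unlink(path);                   /* cleanup; may already be gone */
    rmdir(dir);
    return changes;
}
```

A non-zero count after the checkpoint step would be the "report failure"
case; the thaw steps would follow either way.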
So there are lots of possible solutions, and they don't all involve trying to
stop the whole VFS or the whole machine. They also don't require anything
more in-kernel than what's already being pushed (our patchset and, for the
optional fanotify idea, Eric Paris' patchset).
> I have another question about deleted files. How is the case handled
> when a process has mapped a deleted file but has no associated file
> descriptor?
The VMA for a mapped file holds a struct file reference (vm_file). When
checkpoint walks the VMAs, that struct file is visited just like the struct
files reached from file descriptors.
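
The userspace-visible side of this can be seen with a small Linux-only
sketch: map a file, close the descriptor, unlink the name, and the mapping
still reads back and still appears in /proc/self/maps, marked "(deleted)".
This only demonstrates that the mapping keeps the file alive; it does not
show the checkpoint code itself. The function names and the /tmp path are
made up for the example.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Return 1 if /proc/self/maps lists a mapping of `path` marked "(deleted)". */
static int maps_shows_deleted(const char *path)
{
    const char *name = strrchr(path, '/');
    char line[4096];
    int found = 0;
    FILE *maps = fopen("/proc/self/maps", "r");

    if (!maps)
        return -1;
    name = name ? name + 1 : path;      /* match on the last path component */
    while (fgets(line, sizeof(line), maps))
        if (strstr(line, name) && strstr(line, "(deleted)"))
            found = 1;
    fclose(maps);
    return found;
}

/* Usage: returns 1 if the mapping outlives both the fd and the filename. */
static int deleted_mapping_demo(void)
{
    char path[] = "/tmp/cr-map-XXXXXX";
    const char msg[] = "still here";
    int fd = mkstemp(path);
    char *map;
    int ok;

    if (fd < 0 || write(fd, msg, sizeof(msg)) != (ssize_t)sizeof(msg))
        return -1;
    map = mmap(NULL, sizeof(msg), PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return -1;
    close(fd);      /* no file descriptor refers to the file any more */
    unlink(path);   /* and no name does either */
    ok = strcmp(map, msg) == 0 && maps_shows_deleted(path) == 1;
    munmap(map, sizeof(msg));
    return ok;
}
```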
Cheers,
-Matt Helsley