From: Oren Laadan <orenl@cs.columbia.edu>
To: Daniel Lezcano <daniel.lezcano@free.fr>
Cc: "Serge E. Hallyn" <serge@hallyn.com>,
linux-fsdevel@vger.kernel.org,
containers@lists.linux-foundation.org,
Jamie Lokier <jamie@shareable.org>,
Andreas Dilger <adilger@sun.com>
Subject: Re: [C/R v20][PATCH 38/96] c/r: dump open file descriptors
Date: Sun, 21 Mar 2010 17:36:13 -0400
Message-ID: <4BA6914D.8040007@cs.columbia.edu>
In-Reply-To: <4BA68884.3080003@free.fr>
Daniel Lezcano wrote:
> Serge E. Hallyn wrote:
>> Quoting Jamie Lokier (jamie@shareable.org):
>>
>>> Matt Helsley wrote:
>>>
>>>>> That said, if the intent is to allow the restore to be done on
>>>>> another node with a "similar" filesystem (e.g. created by rsync/node
>>>>> image), instead of having a coherent distributed filesystem on all
>>>>> of the nodes then the filename makes sense.
>>>>>
>>>> Yes, this is the intent.
>>>>
>>> I would worry about programs which are using files which have been
>>> deleted, renamed, or (very common) renamed-over by another process
>>> after being opened, as there's a good chance they will successfully
>>> open the wrong file after c/r, and corrupt state from then on.
>>>
>> Userspace is expected to back up and restore the filesystem, for
>> instance using a btrfs snapshot or a simple rsync or tar.
>>
>>
> That does not solve the problem Jamie is talking about.
> A rsync or a tar will not see a deleted file and using a btrfs to have
> the CR to work with the deleted files is a bit overkill, no ?
Let's separate the issues of file system snapshot and deleted files.

1) File system snapshot:
------------------------
The requirement is to preserve the file system state between the time
of the checkpoint and the time of the restart, because userspace will
expect it to remain the same.

The alternatives are:

a) Use a capable file system, like btrfs or (modified) nilfs.

b) Userspace saves the state, e.g. w/ tar or rsync (maybe incremental).

c) Assume/expect that the file system isn't modified between checkpoint
and restart (e.g. if we use c/r to suspend a user's session).

d) Expect userspace to adapt to changes if they occur, e.g. by having
the application be aware of the possibility, or by providing a wrapper
that will do some magic prior to restart (by looking at the checkpoint
image).

Options a, b and c are all transparent to the application, while option
d requires that applications become aware of c/r. That's ok, but our
primary goal is to be generic enough to support unmodified applications.

2) Deleted files:
-----------------
The requirement is that at restart we'll be able to restore the file
pointer in the kernel to a deleted file with the same properties and
contents as it had at the time of the checkpoint.

The alternatives we considered are:

e) For each deleted file, save the contents of that file as part of
the checkpoint image.
At restart, create a new file, populate it with the contents, open it
(to get an active file pointer), and finally unlink it, so it is,
again, deleted.

f) At checkpoint time, create a file (from scratch) in a dedicated
area of the file system (userspace configurable?), and copy the
contents of the deleted file to this file. Only save the file system
state after this is done.
At restart, open the alternative file instead, and then immediately
delete it.

g) At checkpoint time, re-link the file to a dedicated area of the
file system. This requires support from the underlying file system,
of course. For instance, it's trivial for ext2,3 but IIRC will need
help for ext4. Re-linking is essentially attaching a new filename
to an existing inode that is still referenced but is otherwise not
reachable, making it reachable again.
At restart, open the re-linked file and then immediately delete it.
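
To make the restart side of option (e) concrete, here is a rough
userspace sketch in C. It is only an illustration, not code from the
patch set: the helper names (fd_is_deleted, restore_deleted_file), the
scratch directory argument and the "cr-deleted-XXXXXX" template are
made up, and it restores only the contents and the file position, not
the mode or ownership. The nlink check is one plausible way to notice
at checkpoint time that an open file has been deleted.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: 1 if fd refers to a file with no links left
 * (deleted but still open), 0 if it is still linked, -1 on error. */
static int fd_is_deleted(int fd)
{
	struct stat st;

	if (fstat(fd, &st) < 0)
		return -1;
	return st.st_nlink == 0;
}

/* Hypothetical helper for option (e): rebuild a deleted file from the
 * contents saved in the image. Returns an open fd, or -1 on error.
 * 'dir' is an assumed scratch directory; mode/owner are not restored. */
static int restore_deleted_file(const char *dir, const void *buf,
				size_t len, off_t pos)
{
	char path[4096];
	const char *p = buf;
	int fd, n;

	n = snprintf(path, sizeof(path), "%s/cr-deleted-XXXXXX", dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	fd = mkstemp(path);		/* create a new file and open it */
	if (fd < 0)
		return -1;

	while (len > 0) {		/* populate with the saved contents */
		ssize_t w = write(fd, p, len);

		if (w < 0 && errno == EINTR)
			continue;
		if (w < 0)
			goto fail;
		p += w;
		len -= (size_t)w;
	}

	if (lseek(fd, pos, SEEK_SET) < 0)	/* restore the file position */
		goto fail;
	if (unlink(path) < 0)		/* the file is deleted again */
		goto fail;
	return fd;

fail:
	close(fd);
	unlink(path);
	return -1;
}
```

After restore_deleted_file() returns, the caller holds an open fd to a
file that has no name, which is the state the task had at checkpoint.
Options (f) and (g) would differ only in where the contents come from:
they open an existing file in the dedicated area and unlink it.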
> I have another question about the deleted files. How is handled the case
> when a process has a deleted mapped file but without an associated file
> descriptor ?
>
It works the same as with non-deleted files (assuming that we know
how to handle deleted files in general, e.g. options e,f,g above):

To checkpoint a task's mm we loop through the vma's and checkpoint
them. For a vma that corresponds to a mapped file, we first save
the vma->vm_file. In turn, for a file pointer we save the filename,
properties and credentials. A file pointer is saved as an independent
object and is assigned a unique id, the objref. The saved state of the
vma records this objref.

At restart, we will first see the file pointer object, and will
open the file to create a corresponding file pointer. Later, when
we restore the vma, we'll locate the (new) file pointer using the
objref and use it in mmap.
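
As a toy model of the objref idea (not the actual c/r code; the struct,
the function names and the fixed-size array are all made up for the
illustration), the bookkeeping could look roughly like this in C. At
checkpoint, each shared object gets an id the first time it is seen.
At restart, the id is bound to the newly created object, and later
users such as a vma look it up by id.

```c
#include <stddef.h>

#define MAX_OBJS 64

/* Hypothetical table: slot i holds the object whose objref is i + 1.
 * An objref of 0 is reserved to mean "none" or "error". */
struct objref_table {
	const void *objs[MAX_OBJS];
	int count;			/* used on the checkpoint side only */
};

/* Checkpoint side: return the objref already assigned to 'obj', or
 * assign the next free one. Returns 0 if the table is full. */
static int objref_get(struct objref_table *t, const void *obj)
{
	int i;

	for (i = 0; i < t->count; i++)
		if (t->objs[i] == obj)
			return i + 1;
	if (t->count >= MAX_OBJS)
		return 0;
	t->objs[t->count] = obj;
	return ++t->count;
}

/* Restart side: remember that 'ref' now maps to the newly created
 * object. Returns 0 on success, -1 if 'ref' is invalid or taken. */
static int objref_bind(struct objref_table *t, int ref, const void *obj)
{
	if (ref < 1 || ref > MAX_OBJS || t->objs[ref - 1] != NULL)
		return -1;
	t->objs[ref - 1] = obj;
	return 0;
}

/* Restart side: find the object bound to 'ref', or NULL if none. */
static const void *objref_lookup(const struct objref_table *t, int ref)
{
	if (ref < 1 || ref > MAX_OBJS)
		return NULL;
	return t->objs[ref - 1];
}
```

Since the file pointer object appears in the image before any vma that
refers to it, the lookup at vma restore time finds the file that was
just opened. A mapped file with no file descriptor needs no special
case, because the vma reaches it through the objref alone.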
Oren.
>> If we detect anything which really is not supported (for instance
>> inotify for now) then we fail and leave a log message explaining the
>> failure.
>>