linux-fsdevel.vger.kernel.org archive mirror
From: Daniel Lezcano <daniel.lezcano@free.fr>
To: Oren Laadan <orenl@cs.columbia.edu>
Cc: "Serge E. Hallyn" <serge@hallyn.com>,
	linux-fsdevel@vger.kernel.org,
	containers@lists.linux-foundation.org,
	Jamie Lokier <jamie@shareable.org>,
	Andreas Dilger <adilger@sun.com>
Subject: Re: [C/R v20][PATCH 38/96] c/r: dump open file descriptors
Date: Mon, 22 Mar 2010 09:40:32 +0100
Message-ID: <4BA72D00.7040406@free.fr>
In-Reply-To: <4BA6914D.8040007@cs.columbia.edu>

Oren Laadan wrote:
>
>
> Daniel Lezcano wrote:
>> Serge E. Hallyn wrote:
>>> Quoting Jamie Lokier (jamie@shareable.org):
>>>  
>>>> Matt Helsley wrote:
>>>>    
>>>>>> That said, if the intent is to allow the restore to be done on
>>>>>> another node with a "similar" filesystem (e.g. created by rsync/node
>>>>>> image), instead of having a coherent distributed filesystem on all
>>>>>> of the nodes then the filename makes sense.
>>>>>>         
>>>>> Yes, this is the intent.
>>>>>       
>>>> I would worry about programs which are using files which have been
>>>> deleted, renamed, or (very common) renamed-over by another process
>>>> after being opened, as there's a good chance they will successfully
>>>> open the wrong file after c/r, and corrupt state from then on.
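
The hazard Jamie describes can be observed from userspace. The following is a minimal sketch, not part of the patch set: it compares the inode behind an open descriptor with whatever the saved pathname resolves to now. The helper name path_still_names_open_file and the file names in demo() are made up for illustration.

```python
import os
import tempfile


def path_still_names_open_file(fd, path):
    """Return True only if `path` currently resolves to the inode behind `fd`."""
    opened = os.fstat(fd)
    if opened.st_nlink == 0:
        # No directory entry refers to this inode any more: it was deleted
        # (or its only name was renamed over).
        return False
    try:
        current = os.stat(path)
    except FileNotFoundError:
        return False
    # Same device and same inode number means the same file.
    return (opened.st_dev, opened.st_ino) == (current.st_dev, current.st_ino)


def demo():
    """Open a file, let a 'second process' rename over it, and compare."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "state.db")
        with open(path, "w") as f:
            f.write("old")
        fd = os.open(path, os.O_RDONLY)
        try:
            before = path_still_names_open_file(fd, path)
            replacement = os.path.join(workdir, "state.db.new")
            with open(replacement, "w") as f:
                f.write("new")
            os.replace(replacement, path)  # rename-over
            after = path_still_names_open_file(fd, path)
        finally:
            os.close(fd)
    return before, after
```

A restart that simply reopens the saved filename would succeed in the "after" case, but it would get the replacement file rather than the one the task had open.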
>>>>     
>>> Userspace is expected to back up and restore the filesystem, for
>>> instance using a btrfs snapshot or a simple rsync or tar.
>>>
>>>   
>> That does not solve the problem Jamie is talking about.
>> An rsync or a tar will not see a deleted file, and requiring btrfs
>> just to make C/R work with deleted files is a bit overkill, no?
>
> Let's separate the issues of file system snapshot and deleted files.
>
> 1) File system snapshot:
> ------------------------
> The requirement is to preserve the file system state between the time
> of the checkpoint and the time of the restart, because userspace will
> expect it to remain the same.
>
> The alternatives are:
>
> a) Use a capable file system, like btrfs, or (modified) nilfs.
>
> b) Userspace saves the state e.g. w/ tar or rsync (maybe incremental)
>
> c) Assume/expect that the file system isn't modified between checkpoint
> and restart (e.g. if we use c/r to suspend a user's session)
>
> d) Expect userspace to adapt to changes if they occur, e.g. by having
> the application be aware of the possibility, or by providing a wrapper
> that will do some magic prior to restart (by looking at the checkpoint
> image).
>
> Options a,b,c are all transparent to the application, while option
> d requires that applications become aware of c/r. That's ok, but our
> primary goal is to be generic enough to support unmodified applications.
>
> 2) Deleted files:
> -----------------
> The requirement is that at restart we'll be able to restore the file
> pointer in the kernel to a deleted file with the same properties and
> contents as it had at the time of the checkpoint.
>
> The alternatives we considered are:
>
> e) For each deleted file, save the contents of that file as part of
> the checkpoint image;
> At restart - create a new file, populate with the contents, open it
> (to get an active file pointer), and finally unlink it, so it is -
> again - deleted.
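
Option e can be imitated in userspace. This is a minimal sketch under stated assumptions, not the in-kernel implementation from the patch set: the record layout (a dict with "data" and "offset") and both function names are invented here, and only the contents and file offset are restored, not ownership, mode or timestamps.

```python
import os
import tempfile


def checkpoint_deleted_file(fd):
    """Capture the contents and current offset of an open, possibly unlinked, file."""
    chunks = []
    pos = 0
    while True:
        # pread() reads at an explicit position and leaves the fd offset alone.
        chunk = os.pread(fd, 65536, pos)
        if not chunk:
            break
        chunks.append(chunk)
        pos += len(chunk)
    offset = os.lseek(fd, 0, os.SEEK_CUR)
    return {"data": b"".join(chunks), "offset": offset}


def restore_deleted_file(record, scratch_dir):
    """Create a file, populate it, keep it open, then unlink it again."""
    fd, path = tempfile.mkstemp(dir=scratch_dir)
    try:
        view = memoryview(record["data"])
        while view:
            written = os.write(fd, view)
            view = view[written:]
        os.lseek(fd, record["offset"], os.SEEK_SET)
    except BaseException:
        os.close(fd)
        raise
    finally:
        # The name goes away in every case; on success the open fd keeps
        # the inode alive, so the file is "deleted but open" once more.
        os.unlink(path)
    return fd
```

The returned descriptor has a link count of zero, which matches the state the task saw at checkpoint time. The cost is that the whole file body travels inside the checkpoint image.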
>
> f) At checkpoint time, create a file (from scratch) in a dedicated
> area of the file system (userspace configurable?), and copy the
> contents of the deleted file to this file. Only save the file system
> state after this is done.
> At restart, open the alternative file instead, and then immediately
> delete it.
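
Option f differs from option e only in where the bytes live: in the file system rather than in the image. A rough userspace sketch follows. The "deleted-<objref>" naming scheme and both function names are assumptions made for illustration; the thread does not specify how the dedicated area is laid out.

```python
import os


def stash_deleted_file(fd, stash_dir, objref):
    """Checkpoint side: copy the unlinked file behind `fd` into `stash_dir`."""
    stash_path = os.path.join(stash_dir, "deleted-%d" % objref)
    # O_EXCL: refuse to overwrite a stand-in left by an earlier checkpoint.
    out = os.open(stash_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        pos = 0
        while True:
            chunk = os.pread(fd, 65536, pos)  # does not move the fd offset
            if not chunk:
                break
            view = memoryview(chunk)
            while view:
                view = view[os.write(out, view):]
            pos += len(chunk)
        # Make sure the copy is on disk before the file system state is saved.
        os.fsync(out)
    finally:
        os.close(out)
    return stash_path


def reopen_stashed_file(stash_path, flags=os.O_RDWR):
    """Restart side: open the stand-in, then immediately delete its name."""
    fd = os.open(stash_path, flags)
    os.unlink(stash_path)
    return fd
```

Because the stand-in is an ordinary file, whatever mechanism snapshots the file system (options a and b above) carries it along without special handling.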
>
> g) At checkpoint time, re-link the file to a dedicated area of the
> file system. This requires support from the underlying file system,
> of course. For instance, it's trivial for ext2,3 but IIRC will need
> help for ext4. Re-linking is essentially attaching a new filename
> to an existing inode that is still referenced but is otherwise not
> reachable - and making it reachable again.
> At restart, open the re-linked file and then immediately delete it.
>
>> I have another question about deleted files. How is the case
>> handled when a process has a deleted mapped file but no
>> associated file descriptor?
>>
>
> It works the same as with non-deleted files (assuming that we know
> how to handle deleted files in general, e.g. options e,f,g above):
>
> To checkpoint a task's mm we loop through the vma's and checkpoint
> them. For a vma that corresponds to a mapped file, we first save
> the vma->vm_file. In turn, for a file pointer we save the filename,
> properties, and credentials. A file pointer is saved as an independent
> object - and is assigned a unique id - objref. The state of the vma
> will indicate this objref.
>
> At restart, we will first see the file pointer object, and will
> open the file to create a corresponding file pointer. Later when
> we restore the vma, we'll locate the (new) file pointer using the
> objref and use it in mmap.
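
The objref scheme Oren describes can be modelled in a few lines. This is only an illustrative sketch: the record fields ("type", "objref", "name", "pos", "length") and the functions checkpoint() and restart() are invented here and do not reflect the actual image format. Python file objects stand in for kernel file pointers and mmap objects for vmas.

```python
import mmap


def checkpoint(mappings):
    """mappings: list of (open file object, mapping length) pairs.

    Each distinct file object is written once and gets a unique objref;
    every vma record then refers to its file by that objref.
    """
    image = []
    objrefs = {}  # id(file object) -> objref
    for f, length in mappings:
        key = id(f)
        if key not in objrefs:
            objrefs[key] = len(objrefs) + 1
            image.append({"type": "file", "objref": objrefs[key],
                          "name": f.name, "pos": f.tell()})
        image.append({"type": "vma", "objref": objrefs[key], "length": length})
    return image


def restart(image):
    """Replay the image in order: file records first, then the vmas using them."""
    files = {}  # objref -> newly opened file object
    vmas = []
    for rec in image:
        if rec["type"] == "file":
            f = open(rec["name"], "r+b")
            f.seek(rec["pos"])
            files[rec["objref"]] = f
        else:
            # Locate the (new) file pointer by objref and use it in mmap.
            f = files[rec["objref"]]
            vmas.append(mmap.mmap(f.fileno(), rec["length"]))
    return files, vmas
```

Two mappings of the same open file produce one file record and two vma records, so the file is reopened once at restart. Nothing here depends on a file descriptor existing in the task's fd table, which is why a mapped file with no descriptor is not a special case.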
>
> Oren.
>

Thanks Oren for the detailed answer.
