linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Oren Laadan <orenl@cs.columbia.edu>
To: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@osdl.org>,
	containers@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-api@vger.kernel.org, Thomas Gleixner <tglx@linutronix.de>,
	Serge Hallyn <serue@us.ibm.com>, Ingo Molnar <mingo@elte.hu>,
	"H. Peter Anvin" <hpa@zytor.com>
Subject: Re: [RFC v10][PATCH 08/13] Dump open file descriptors
Date: Mon, 01 Dec 2008 16:20:05 -0500	[thread overview]
Message-ID: <49345505.5050104@cs.columbia.edu> (raw)
In-Reply-To: <1228164679.2971.91.camel@nimitz>



Dave Hansen wrote:
> On Mon, 2008-12-01 at 15:23 -0500, Oren Laadan wrote:
>> Verifying that the size doesn't change does not ensure that the table's
>> contents remained the same, so we can still end up with obsolete data.
> 
> With the realloc() scheme, we have virtually no guarantees about how the
> fdtable that we read relates to the source.  All that we know is that
> the n'th fd was at this value at *some* time.
> 
> Using the scheme that I just suggested (and you evidently originally
> used) at least guarantees that we have an atomic copy of the fdtable.

It doesn't matter if the fdtable changes while we scan the fd's or
after we're done scanning the fd's, does it ?  the end result is the
same - inconsistency.

> Why is this done in two steps?  It first grabs a list of fd numbers
> which needs to be validated, then goes back and turns those into 'struct
> file's which it saves off.  Is there a problem with doing that
> fd->'struct file' conversion under the files->file_lock?

Again, say we get all the struct file for all those fd's. Now we have to
drop the lock, and start processing the struct files. So what if at this
point the fdtable changes ?  we still end up with potentially inconsistent
state.

We can do it atomically or non-atomically, I couldn't care less.

My argument is the following:
1) Ideally, the state should not change while we are checkpointing
2) If the state changes, then the behavior is undefined (but the kernel
   should not crash, of course).
3) It should be the userspace responsibility to ensure that (1) holds
   (for instance, that all tasks sharing these resources are frozen
   during the checkpoint)

#2 will be an uncommon event - a programmer/user/administrator mistake.
Kernel may help enforce this rule, but catching all such cases will be
expensive as it means looping through all tasks for each new resource.

As a first step, we can ensure that all threads of a process are frozen
at checkpoint time (and prevent unfreeze until checkpoint is over). But
that will not cover a clone() with CLONE_FS. And this will be the same
for other namespaces later.

Oren.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2008-12-01 21:20 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-11-27  1:04 [RFC v10][PATCH 00/13] Kernel based checkpoint/restart Oren Laadan
     [not found] ` <1227747884-14150-1-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-11-27  1:04   ` [RFC v10][PATCH 01/13] Create syscalls: sys_checkpoint, sys_restart Oren Laadan
2008-11-27  1:04   ` [RFC v10][PATCH 02/13] Checkpoint/restart: initial documentation Oren Laadan
     [not found]     ` <1227747884-14150-3-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-11-28 10:45       ` Al Viro
     [not found]         ` <20081128104554.GP28946-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
2008-12-01 18:15           ` Dave Hansen
2008-11-27  1:04   ` [RFC v10][PATCH 03/13] General infrastructure for checkpoint restart Oren Laadan
2008-11-27  1:04   ` [RFC v10][PATCH 04/13] x86 support for checkpoint/restart Oren Laadan
2008-11-27  1:04   ` [RFC v10][PATCH 05/13] Dump memory address space Oren Laadan
     [not found]     ` <1227747884-14150-6-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-11-28 10:53       ` Al Viro
     [not found]         ` <20081128105351.GQ28946-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
2008-12-01 18:00           ` Dave Hansen
2008-12-01 20:57             ` Oren Laadan
2008-11-27  1:04   ` [RFC v10][PATCH 06/13] Restore " Oren Laadan
2008-11-27  1:04   ` [RFC v10][PATCH 07/13] Infrastructure for shared objects Oren Laadan
2008-11-27  1:04   ` [RFC v10][PATCH 08/13] Dump open file descriptors Oren Laadan
     [not found]     ` <1227747884-14150-9-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-11-28 10:19       ` Al Viro
     [not found]         ` <20081128101919.GO28946-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
2008-12-01 17:47           ` Dave Hansen
2008-12-01 20:23             ` Oren Laadan
     [not found]               ` <493447DD.7010102-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-12-01 20:51                 ` Dave Hansen
2008-12-01 21:02                   ` Linus Torvalds
     [not found]                     ` <alpine.LFD.2.00.0812011258390.3256-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-12-01 21:25                       ` Dave Hansen
2008-12-01 21:20                   ` Oren Laadan [this message]
2008-11-27  1:04   ` [RFC v10][PATCH 09/13] Restore open file descriprtors Oren Laadan
     [not found]     ` <1227747884-14150-10-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-11-28 11:27       ` Al Viro
     [not found]         ` <20081128112745.GR28946-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
2008-12-01 19:22           ` Dave Hansen
2008-12-01 20:41             ` Oren Laadan
     [not found]               ` <49344C11.6090204-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-12-01 20:54                 ` Dave Hansen
2008-12-01 21:00                   ` Oren Laadan
     [not found]                     ` <49345086.4-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-12-01 21:07                       ` Dave Hansen
2008-12-02  1:31                         ` Dave Hansen
2008-12-02  1:12                       ` Dave Hansen
2008-11-27  1:04   ` [RFC v10][PATCH 10/13] External checkpoint of a task other than ourself Oren Laadan
2008-11-27  1:04   ` [RFC v10][PATCH 11/13] Track in-kernel when we expect checkpoint/restart to work Oren Laadan
2008-11-27  1:04   ` [RFC v10][PATCH 12/13] Checkpoint multiple processes Oren Laadan
2008-11-27  1:04   ` [RFC v10][PATCH 13/13] Restart " Oren Laadan
2008-12-03 23:58   ` [RFC v10][PATCH 00/13] Kernel based checkpoint/restart Serge E. Hallyn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=49345505.5050104@cs.columbia.edu \
    --to=orenl@cs.columbia.edu \
    --cc=akpm@linux-foundation.org \
    --cc=containers@lists.linux-foundation.org \
    --cc=dave@linux.vnet.ibm.com \
    --cc=hpa@zytor.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@elte.hu \
    --cc=serue@us.ibm.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@osdl.org \
    --cc=viro@ZenIV.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).