From: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
To: Dave Hansen <dave-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Cc: Al Viro <viro-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>,
Andrew Morton
<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
Linus Torvalds <torvalds-3NddpPZAyC0@public.gmane.org>,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>,
Serge Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>,
Ingo Molnar <mingo-X9Un+BFzKDI@public.gmane.org>,
"H. Peter Anvin" <hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
Subject: Re: [RFC v10][PATCH 08/13] Dump open file descriptors
Date: Mon, 01 Dec 2008 15:23:57 -0500 [thread overview]
Message-ID: <493447DD.7010102@cs.columbia.edu> (raw)
In-Reply-To: <1228153645.2971.36.camel@nimitz>
Dave Hansen wrote:
> On Fri, 2008-11-28 at 10:19 +0000, Al Viro wrote:
>> On Wed, Nov 26, 2008 at 08:04:39PM -0500, Oren Laadan wrote:
>>> +int cr_scan_fds(struct files_struct *files, int **fdtable)
>>> +{
>>> + struct fdtable *fdt;
>>> + int *fds;
>>> + int i, n = 0;
>>> + int tot = CR_DEFAULT_FDTABLE;
>>> +
>>> + fds = kmalloc(tot * sizeof(*fds), GFP_KERNEL);
>>> + if (!fds)
>>> + return -ENOMEM;
>>> +
>>> + /*
>>> + * We assume that the target task is frozen (or that we checkpoint
>>> + * ourselves), so we can safely proceed after krealloc() from where
>>> + * we left off; in the worst cases restart will fail.
>>> + */
>> Task may be frozen, but it may share the table with any number of other
>> tasks...
>
> First of all, thanks for looking at this, Al.
That goes for me too.
>
> I think Oren's assumption here is that all tasks possibly sharing the
> table would be frozen. I don't think that's a good assumption,
> either. :)
>
> This would be a lot safer and bulletproof if we size the allocation
> ahead of time, take all the locks, then retry if the size has changed.
Yes, I assume that all tasks possibly sharing the table are frozen for
this to work "right"; by "right" I mean that if checkpoint completes
then a matching restart would complete successfully.
Verifying that the size doesn't change does not ensure that the table's
contents remained the same, so we can still end up with obsolete data.
For that, you'll need to take the lock over a very long period of time,
while you capture and write data about the files.
So back to the assumption: the idea is that if the user does not adhere
to this assumption - ensuring that all tasks are frozen and that there
are no sharing otherwise - then the results are undefined (just as, e.g.
not calling execve() immediately after vfork() gives undefined results).
By "undefined" I mean that it may produce a checkpoint image that can't
be restarted, or fail. If the assumption doesn't hold, then the table may
change, and we will get a list of fd's which is "incorrect". The comment
before this function explicitly says that "The caller must validate the
file descriptors...", so the checkpoint may will fail subsequently, eg.
while trying to access a bad fd. Otherwise, the image file will reflect
a state that is inconsistent, and restart will fail.
Note that in both cases, the code will not crash the kernel (unless you
think I missed something...)
(BTW, the alternative would be to test for such sharing and abort with
an error. The only way I can think of is to loop through the all the
tasks, for each files_struct that we find. But we don't really need to,
given the argument above).
>
> I think that will just plain work of we do this:
>
>>> + spin_lock(&files->file_lock);
>>> + rcu_read_lock();
>>> + fdt = files_fdtable(files);
>>> + for (i = 0; i < fdt->max_fds; i++) {
>>> + if (!fcheck_files(files, i))
>>> + continue;
>>> + if (n == tot) {
>>> + /*
>>> + * fcheck_files() is safe with drop/re-acquire
>>> + * of the lock, because it tests: fd < max_fds
>>> + */
>>> + spin_unlock(&files->file_lock);
>>> + rcu_read_unlock();
>>> + tot *= 2; /* won't overflow: kmalloc will fail */
>
> free(fds);
> goto first_kmalloc_in_this_function;
>
>>> + }
>>> + fds[n++] = i;
>>> + }
>>> + rcu_read_unlock();
>>> + spin_unlock(&files->file_lock);
>>> +
>>> + *fdtable = fds;
>>> + return n;
>>> +}
>
> Right?
Lol .. that was actually in my original post, and changed in response to
comments that explicitly asked to not count in advance :o
Actually, I think the current code is cleaner, and I don't see how counting
in advance with the lock give us any advantage here.
>>> + switch (inode->i_mode & S_IFMT) {
>>> + case S_IFREG:
>>> + fd_type = CR_FD_FILE;
>>> + break;
>>> + case S_IFDIR:
>>> + fd_type = CR_FD_DIR;
>>> + break;
>>> + case S_IFLNK:
>>> + fd_type = CR_FD_LINK;
>> Opened symlinks? May I have whatever you'd been smoking, please?
>
> Ugh, that certainly doesn't have any place here. I do wonder if Oren
> had some use for that in the fully put together code, but it can
> certainly go for now.
Not anymore. Indeed ugh...
Thanks,
Oren.
>
> I'll send patches for these shortly.
>
> -- Dave
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2008-12-01 20:23 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-11-27 1:04 [RFC v10][PATCH 00/13] Kernel based checkpoint/restart Oren Laadan
[not found] ` <1227747884-14150-1-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-11-27 1:04 ` [RFC v10][PATCH 01/13] Create syscalls: sys_checkpoint, sys_restart Oren Laadan
2008-11-27 1:04 ` [RFC v10][PATCH 02/13] Checkpoint/restart: initial documentation Oren Laadan
[not found] ` <1227747884-14150-3-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-11-28 10:45 ` Al Viro
[not found] ` <20081128104554.GP28946-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
2008-12-01 18:15 ` Dave Hansen
2008-11-27 1:04 ` [RFC v10][PATCH 03/13] General infrastructure for checkpoint restart Oren Laadan
2008-11-27 1:04 ` [RFC v10][PATCH 04/13] x86 support for checkpoint/restart Oren Laadan
2008-11-27 1:04 ` [RFC v10][PATCH 05/13] Dump memory address space Oren Laadan
[not found] ` <1227747884-14150-6-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-11-28 10:53 ` Al Viro
[not found] ` <20081128105351.GQ28946-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
2008-12-01 18:00 ` Dave Hansen
2008-12-01 20:57 ` Oren Laadan
2008-11-27 1:04 ` [RFC v10][PATCH 06/13] Restore " Oren Laadan
2008-11-27 1:04 ` [RFC v10][PATCH 07/13] Infrastructure for shared objects Oren Laadan
2008-11-27 1:04 ` [RFC v10][PATCH 08/13] Dump open file descriptors Oren Laadan
[not found] ` <1227747884-14150-9-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-11-28 10:19 ` Al Viro
[not found] ` <20081128101919.GO28946-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
2008-12-01 17:47 ` Dave Hansen
2008-12-01 20:23 ` Oren Laadan [this message]
[not found] ` <493447DD.7010102-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-12-01 20:51 ` Dave Hansen
2008-12-01 21:02 ` Linus Torvalds
[not found] ` <alpine.LFD.2.00.0812011258390.3256-nfNrOhbfy2R17+2ddN/4kux8cNe9sq/dYPYVAmT7z5s@public.gmane.org>
2008-12-01 21:25 ` Dave Hansen
2008-12-01 21:20 ` Oren Laadan
2008-11-27 1:04 ` [RFC v10][PATCH 09/13] Restore open file descriprtors Oren Laadan
[not found] ` <1227747884-14150-10-git-send-email-orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-11-28 11:27 ` Al Viro
[not found] ` <20081128112745.GR28946-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
2008-12-01 19:22 ` Dave Hansen
2008-12-01 20:41 ` Oren Laadan
[not found] ` <49344C11.6090204-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-12-01 20:54 ` Dave Hansen
2008-12-01 21:00 ` Oren Laadan
[not found] ` <49345086.4-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2008-12-01 21:07 ` Dave Hansen
2008-12-02 1:31 ` Dave Hansen
2008-12-02 1:12 ` Dave Hansen
2008-11-27 1:04 ` [RFC v10][PATCH 10/13] External checkpoint of a task other than ourself Oren Laadan
2008-11-27 1:04 ` [RFC v10][PATCH 11/13] Track in-kernel when we expect checkpoint/restart to work Oren Laadan
2008-11-27 1:04 ` [RFC v10][PATCH 12/13] Checkpoint multiple processes Oren Laadan
2008-11-27 1:04 ` [RFC v10][PATCH 13/13] Restart " Oren Laadan
2008-12-03 23:58 ` [RFC v10][PATCH 00/13] Kernel based checkpoint/restart Serge E. Hallyn
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=493447DD.7010102@cs.columbia.edu \
--to=orenl-eqauephvms7envbuuze7ea@public.gmane.org \
--cc=akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org \
--cc=containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org \
--cc=dave-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org \
--cc=hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org \
--cc=linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org \
--cc=mingo-X9Un+BFzKDI@public.gmane.org \
--cc=serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org \
--cc=tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org \
--cc=torvalds-3NddpPZAyC0@public.gmane.org \
--cc=viro-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).