Re: Banning checkpoint (was: Re: What can OpenVZ do?)

* Re: Banning checkpoint (was: Re: What can OpenVZ do?)
From: Bodo Eggert @ 2009-02-24 13:00 UTC
  To: Alexey Dobriyan, Ingo Molnar, Nathan Lynch, linux-api, containers,
	mpm, linux-kernel, linux-mm, viro, hpa, Andrew Morton, torvalds,
	tglx, xemul, Dave Hansen

Alexey Dobriyan <adobriyan@gmail.com> wrote:
> On Thu, Feb 19, 2009 at 11:11:54AM -0800, Dave Hansen wrote:
>> On Thu, 2009-02-19 at 22:06 +0300, Alexey Dobriyan wrote:

>> Alexey, I agree with you here.  I've been fighting myself internally
>> about these two somewhat opposing approaches.  Of *course* we can
>> determine the "checkpointability" at sys_checkpoint() time by checking
>> all the various bits of state.
>> 
>> The problem that I think Ingo is trying to address here is that doing it
>> then makes it hard to figure out _when_ you went wrong.  That's the
>> single most critical piece of finding out how to go address it.
>> 
>> I see where you are coming from.  Ingo's suggestion has the *huge*
>> downside that we've got to go muck with a lot of generic code and hook
>> into all the things we don't support.
>> 
>> I think what I posted is a decent compromise.  It gets you those
>> warnings at runtime and is a one-way trip for any given process.  But,
>> it does detect in certain cases (fork() and unshare(FILES)) when it is
>> safe to make the trip back to the "I'm checkpointable" state again.
> 
> "Checkpointable" is not even a per-process property.
> 
> Imagine a set of SAs (struct xfrm_state) and SPDs (struct xfrm_policy).
> They are a) per-netns, b) persistent.
> 
> You can hook into socketcalls to mark the process as uncheckpointable,
> but since SAs and SPDs are persistent, the original process may have
> already exited. You're going to walk every process in the same netns as
> the SA adder and mark it as uncheckpointable. Definitely doable, but
> ugly, isn't it?
> 
> Same for iptables rules.
> 
> "Checkpointable" is a container property, OK?

IMO: Everything around the process may change, as long as the same change
could be made while the process is merely paused with
'kill -STOP $PID; ...; kill -CONT $PID;'. E.g. iptables rules can be changed
under a normal process, so this should not prevent checkpointing
(unless you checkpoint iptables, but don't do that then?).

BTW1: I might want to checkpoint something like seti@home. It connects
to a server from time to time and sends/receives a packet. If having opened
a socket once in its lifetime prevented checkpointing, this wouldn't be
possible. I see the benefit of a one-way flag forcing all syscalls to be made
checkpointable, but this won't work for sockets.

Therefore I think you need something in between. Some syscalls (etc.) are not
supported, so using them should just make the process uncheckpointable. But
some syscalls enter and leave non-checkpointable states by design; those need
at least counters.
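A minimal user-space model of that in-between scheme, assuming one sticky bit for unsupported syscalls plus a counter for resources that are uncheckpointable only while held (such as an open socket). All names here (struct ckpt_state, ckpt_deny(), ckpt_enter(), ckpt_leave()) are hypothetical and not taken from any posted patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task checkpointability state (illustration only). */
struct ckpt_state {
	bool denied;        /* sticky: an unsupported syscall was used */
	unsigned int busy;  /* transient uncheckpointable resources held now */
};

/* Unsupported syscall: one-way, never cleared for this task. */
static void ckpt_deny(struct ckpt_state *s)
{
	s->denied = true;
}

/* A by-design transient state begins, e.g. a socket is opened. */
static void ckpt_enter(struct ckpt_state *s)
{
	s->busy++;
}

/* The transient state ends, e.g. that socket is closed. */
static void ckpt_leave(struct ckpt_state *s)
{
	assert(s->busy > 0); /* every leave must pair with an earlier enter */
	s->busy--;
}

/* Checkpointable only if nothing sticky happened and nothing is held. */
static bool ckpt_checkpointable(const struct ckpt_state *s)
{
	return !s->denied && s->busy == 0;
}
```

With this split, a seti@home-style client that opens and later closes its socket returns to the checkpointable state, while a single unsupported syscall keeps it out for good.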

Maybe you'll also want to let the application decide whether it's OK to be
checkpointed under some conditions. The Seti client might know how to handle
broken connections, and duplicate or skipped transfers are expected anyway.
So the Seti client might declare the socket to be checkpointable, instead of
making the do-the-checkpoint application wait until it's closed.
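That application opt-in could be expressed as a per-descriptor bit that overrides the kernel's verdict. This is only a sketch: the app_accepts field, and whatever interface the application would use to set it, are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-descriptor view used by a checkpoint-time check. */
struct fd_entry {
	bool supported;    /* the kernel can save and restore this fd */
	bool app_accepts;  /* the app declared it copes with losing it */
};

/*
 * The task may be checkpointed if every open descriptor is either
 * supported or explicitly accepted by the application.
 */
static bool fds_checkpointable(const struct fd_entry *fds, size_t n)
{
	assert(fds != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (!fds[i].supported && !fds[i].app_accepts)
			return false;
	return true;
}
```

On restore, an accepted descriptor would come back broken (or not at all), and the application is trusted to reconnect.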

BTW2: There is also the problem of invalidating checkpoints. If a browser did
an HTTP PUT, you don't want to restore the checkpoint taken just before it
started the PUT request. The application should be able to signal this to
a checkpointing daemon. There will be a race, so an "invalidate checkpoints"
signal won't work, but if the application sends a stable hash value, the
duplicate can be detected. (Of course you'd say "don't do that then" for
browsers, but it's just an example. Also of course, the checkpoint daemon is
only needed for advanced setups; a simple "checkpoint $povray --store jobfile"
should just work.)
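One possible reading of the hash idea, sketched as daemon-side bookkeeping: the application reports a stable hash before each irreversible action, and a repeated hash means it was restored from a checkpoint taken before that action. The struct, the report_irreversible() name and the fixed capacity are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DONE 64 /* arbitrary capacity for the sketch */

/* Hypothetical daemon-side record of irreversible operations. */
struct done_set {
	uint64_t hash[MAX_DONE];
	size_t n;
};

/*
 * Called on behalf of the application before an irreversible action
 * (e.g. an HTTP PUT), with a hash that is stable across restore.
 * Returns true if the action is new and may proceed, false if the same
 * hash was already reported, i.e. the app was restored from an older
 * checkpoint and is about to repeat the action.
 */
static bool report_irreversible(struct done_set *d, uint64_t h)
{
	for (size_t i = 0; i < d->n; i++)
		if (d->hash[i] == h)
			return false;
	assert(d->n < MAX_DONE);
	d->hash[d->n++] = h;
	return true;
}
```

Because the daemon runs outside the checkpointed set, its record survives the restore, which is what sidesteps the race an "invalidate checkpoints" signal would have.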

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: What can OpenVZ do?
From: Ingo Molnar @ 2009-02-13 10:53 UTC
  To: Andrew Morton
  Cc: Dave Hansen, mpm, containers, hpa, linux-kernel, linux-mm, viro,
	linux-api, torvalds, tglx, xemul

* Andrew Morton <akpm@linux-foundation.org> wrote:

> Now, we've gone in blind before - most notably on the
> containers/cgroups/namespaces stuff.  That hail mary pass worked out
> acceptably, I think.  Maybe we got lucky.  I thought that
> net-namespaces in particular would never get there, but it did.
> 
> That was a very large and quite long-term-important user-visible
> feature.
> 
> checkpoint/restart/migration is also a long-term-...-feature.  But if
> at all possible I do think that we should go into it with our eyes a
> little less shut.

IMO, s/.../important/

More important than containers in fact. Being able to detach all
software state from the hw state and being able to reattach it:

   1) at a later point in time,                   or
   2) on a different piece of hardware,           or
   3) [future] in a different kernel

... is powerful stuff on a very conceptual level IMO.

The reason we don't have it in every OS is not that it isn't desired or
wanted, but that it's very, very hard to do on a wide scale. People would
love it even if it added (some) overhead.

This kind of featureset is actually the main motivator for virtualization.

If the native kernel was able to do checkpointing we'd have not only
near-zero-cost virtualization done at the right abstraction level
(when combined with containers/control-groups), but we'd also have
a few future feature items like:

  1) Kernel upgrades done intelligently: transparent reboot into an
     upgraded kernel.

  2) Downgrade-on-regressions done sanely: transparent downgrade+reboot
     to a known-working kernel. (as long as the regression is app
     misbehavior or a performance problem - not a kernel crash. Most
     regressions on kernel upgrades are not actual crashes or data
     corruption but functional and performance regressions - i.e. it's
     safely checkpointable and downgradeable.)

  3) Hibernation done intelligently: checkpoint everything, turn off
     system. Turn on system, restore everything from the checkpoint.

  4) Backups done intelligently: full "backups" of long-running
     computational jobs, maybe even of complex things like databases
     or desktop sessions.

  5) Remote debugging done intelligently: got a crashed session?
     Checkpoint the whole app in its anomalous state and upload the
     image (as long as you can trust the developer with that image
     and with the filesystem state that goes with it).

I don't see many long-term dragons here. The kernel is obviously always
able to do near-zero-overhead checkpointing: it knows about all its
own data structures, can enumerate them and knows how they map to
user-space objects.

The rest is performance considerations: do we want to embed
checkpointing helpers in certain runtime codepaths, to make
checkpointing faster? But if that is undesirable (serialization,
etc.), we can always fall back to the dumbest, zero-overhead methods.

There is _one_ interim runtime cost: the "can we checkpoint or not"
decision that the kernel has to make while the feature is not complete.

That, if this feature takes off, is just a short-term worry - as
basically everything will be checkpointable in the long run.

In any case, by designing checkpointing to reuse the existing LSM
callbacks, we'd hit multiple birds with the same stone. (One of
which is the constant complaints about the runtime costs of the LSM
callbacks - with checkpointing we get an independent, non-security
user of the facility which is a nice touch.)

So all things considered it does not look like a bad deal to me - but
I might be missing something nasty.

	Ingo

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: What can OpenVZ do?
From: Dave Hansen @ 2009-02-16 20:51 UTC
  To: Ingo Molnar
  Cc: Andrew Morton, linux-api, containers, hpa, linux-kernel, linux-mm,
	viro, mpm, tglx, torvalds, xemul, Nathan Lynch

On Fri, 2009-02-13 at 11:53 +0100, Ingo Molnar wrote:
> In any case, by designing checkpointing to reuse the existing LSM
> callbacks, we'd hit multiple birds with the same stone. (One of
> which is the constant complaints about the runtime costs of the LSM
> callbacks - with checkpointing we get an independent, non-security
> user of the facility which is a nice touch.)

There's a fundamental problem with using LSM that I'm seeing now that I
look at using it for file descriptors.  The LSM hooks are there to say,
"No, you can't do this" and abort whatever kernel operation was going
on.  That's good for detecting when we do something that's "bad" for
checkpointing.

*But* it completely falls on its face when we want to find out when we
are doing things that are *good*.  For instance, let's say that we open
a network socket.  The LSM hook sees it and marks us as
uncheckpointable.  What about when we close it?  We've become
checkpointable again.  But, there's no LSM hook for the close side
because we don't currently have a need for it.

We have a couple of options:

We can let uncheckpointable actions behave like security violations and
just abort the kernel calls.  The problem with this is that it makes it
difficult to do *anything* unless your application is 100% supported.
Pretty inconvenient, especially at first.  Might be useful later on
though.

We could just log the actions and let them proceed.  But the problem
with this is that we get no temporal information about when an app
transitions between the "good" and "bad" states.  We would also need to
work on culling the log output, since we'd potentially be getting a lot
of redundant data.

We could add to the set of security hooks.  Make sure that we cover all
the transitional states like close().

What I'm thinking about doing for now is what I have attached here.  It
allows the apps that we want to be checkpointed to query an interface
that uses the same checks that sys_checkpoint() does internally.
Say:

# cat /proc/1072/checkpointable
mm: 1
files: 0
...
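A sketch of how an application (or a test tool) might consume that summary, assuming the "key: value" layout in the example above. The /proc/<pid>/checkpointable file exists only with the proposed patches, reading it into a buffer is left out, and ckpt_lookup() is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up "key: value" in the text of the proposed
 * /proc/<pid>/checkpointable file. Returns the value (0 or 1),
 * or -1 if the key is missing.
 */
static int ckpt_lookup(const char *text, const char *key)
{
	size_t klen = strlen(key);
	const char *line = text;

	assert(text != NULL && key != NULL);
	while (*line) {
		int val;

		/* Match the whole key, then parse the number after ':'. */
		if (strncmp(line, key, klen) == 0 && line[klen] == ':' &&
		    sscanf(line + klen + 1, "%d", &val) == 1)
			return val;
		line = strchr(line, '\n');
		if (!line)
			break;
		line++;
	}
	return -1;
}
```

An application that sees "files: 0" would then go on to the per-descriptor fdinfo entries to find out which file is the problem.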

Then, when it realizes that its files can't be checkpointed, it can look
elsewhere:

/proc/1072/fdinfo/2:pos:	0
/proc/1072/fdinfo/2:flags:	02
/proc/1072/fdinfo/2:checkpointable: 0 (special file)
/proc/1072/fdinfo/3:pos:	0
/proc/1072/fdinfo/3:flags:	04000
/proc/1072/fdinfo/3:checkpointable: 0 (pipefs does not support checkpoint)
/proc/1072/fdinfo/4:pos:	0
/proc/1072/fdinfo/4:flags:	04002
/proc/1072/fdinfo/4:checkpointable: 0 (sockfs does not support checkpoint)
/proc/1074/fdinfo/0:pos:	0
/proc/1074/fdinfo/0:flags:	0100002
/proc/1074/fdinfo/0:checkpointable: 0 (devpts does not support checkpoint)

That requires zero overhead during runtime of the app.  It is also less
error-prone because we don't have any of the transitions to catch.
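Similarly, a hypothetical helper for pulling the flag and the reason out of one fdinfo file, assuming the "checkpointable: N (reason)" line format produced by the patch below; the function name and signature are illustrative only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse the "checkpointable:" line out of one fdinfo file's text.
 * Returns the flag (0 or 1), or -1 if the line is absent. If the line
 * carries a "(reason)", the reason is copied into reason[0..len-1].
 */
static int fdinfo_checkpointable(const char *text, char *reason, size_t len)
{
	static const char key[] = "checkpointable:";
	const char *p, *lp, *rp, *eol;
	int val;

	assert(text != NULL && reason != NULL && len > 0);
	reason[0] = '\0';
	p = strstr(text, key);
	if (!p || sscanf(p + strlen(key), "%d", &val) != 1)
		return -1;
	eol = strchr(p, '\n');
	lp = strchr(p, '(');
	/* Only accept parentheses that sit on the same line. */
	if (lp && (!eol || lp < eol)) {
		rp = strchr(lp, ')');
		if (rp && (!eol || rp < eol)) {
			size_t n = (size_t)(rp - lp - 1);

			if (n >= len)
				n = len - 1;
			memcpy(reason, lp + 1, n);
			reason[n] = '\0';
		}
	}
	return val;
}
```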

-- Dave

diff --git a/checkpoint/ckpt_file.c b/checkpoint/ckpt_file.c
index e3097ac..ebe776a 100644
--- a/checkpoint/ckpt_file.c
+++ b/checkpoint/ckpt_file.c
@@ -72,6 +72,31 @@ int cr_scan_fds(struct files_struct *files, int **fdtable)
 	return n;
 }
 
+int cr_can_checkpoint_file(struct file *file, char *explain, int left)
+{
+	char p[] = "checkpointable";
+	struct inode *inode = file->f_dentry->d_inode;
+	struct file_system_type *fs_type = inode->i_sb->s_type;
+
+	if (!(fs_type->fs_flags & FS_CHECKPOINTABLE)) {
+		if (explain)
+			snprintf(explain, left,
+				"%s: 0 (%s does not support checkpoint)\n",
+				p, fs_type->name);
+		return 0;
+	}
+
+	if (special_file(inode->i_mode)) {
+		if (explain)
+			snprintf(explain, left,	"%s: 0 (special file)\n", p);
+		return 0;
+	}
+
+	if (explain)
+		snprintf(explain, left, "%s: 1\n", p);
+	return 1;
+}
+
 /* cr_write_fd_data - dump the state of a given file pointer */
 static int cr_write_fd_data(struct cr_ctx *ctx, struct file *file, int parent)
 {
diff --git a/fs/proc/base.c b/fs/proc/base.c
index d467760..2300353 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1597,7 +1597,19 @@ out:
 	return ~0U;
 }
 
-#define PROC_FDINFO_MAX 64
+#define PROC_FDINFO_MAX PAGE_SIZE
+
+static void proc_fd_write_info(struct file *file, char *info)
+{
+	int max = PROC_FDINFO_MAX;
+	int p = 0;
+	if (!info)
+		return;
+
+	p += scnprintf(info+p, max-p, "pos:\t%lli\n", (long long) file->f_pos);
+	p += scnprintf(info+p, max-p, "flags:\t0%o\n", file->f_flags);
+	cr_can_checkpoint_file(file, info+p, max-p);
+}
 
 static int proc_fd_info(struct inode *inode, struct path *path, char *info)
 {
@@ -1622,12 +1634,7 @@ static int proc_fd_info(struct inode *inode, struct path *path, char *info)
 				*path = file->f_path;
 				path_get(&file->f_path);
 			}
-			if (info)
-				snprintf(info, PROC_FDINFO_MAX,
-					 "pos:\t%lli\n"
-					 "flags:\t0%o\n",
-					 (long long) file->f_pos,
-					 file->f_flags);
+			proc_fd_write_info(file, info);
 			spin_unlock(&files->file_lock);
 			put_files_struct(files);
 			return 0;
@@ -1831,10 +1838,15 @@ static int proc_readfd(struct file *filp, void *dirent, filldir_t filldir)
 static ssize_t proc_fdinfo_read(struct file *file, char __user *buf,
 				      size_t len, loff_t *ppos)
 {
-	char tmp[PROC_FDINFO_MAX];
-	int err = proc_fd_info(file->f_path.dentry->d_inode, NULL, tmp);
+	char *tmp = kmalloc(PROC_FDINFO_MAX, GFP_KERNEL);
+	int err;
+
+	if (!tmp)
+		return -ENOMEM;
+	err = proc_fd_info(file->f_path.dentry->d_inode, NULL, tmp);
 	if (!err)
 		err = simple_read_from_buffer(buf, len, ppos, tmp, strlen(tmp));
+	kfree(tmp);
 	return err;
 }
 
diff --git a/include/linux/checkpoint.h b/include/linux/checkpoint.h
index 217cf6e..84e69b0 100644
--- a/include/linux/checkpoint.h
+++ b/include/linux/checkpoint.h
@@ -142,11 +142,17 @@ static inline void __task_deny_checkpointing(struct task_struct *task,
 #define task_deny_checkpointing(p)  \
 	__task_deny_checkpointing(p, __FILE__, __LINE__)
 
+int cr_can_checkpoint_file(struct file *file, char *explain, int left);
+
 #else
 
 static inline void task_deny_checkpointing(struct task_struct *task) {}
 static inline void process_deny_checkpointing(struct task_struct *task) {}
 
-#endif
+static inline int cr_can_checkpoint_file(struct file *file, char *explain, int left)
+{
+	return 0;
+}
 
+#endif
 #endif /* _CHECKPOINT_CKPT_H_ */


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: What can OpenVZ do?
From: Ingo Molnar @ 2009-02-17 22:23 UTC
  To: Dave Hansen
  Cc: Andrew Morton, linux-api, containers, hpa, linux-kernel, linux-mm,
	viro, mpm, tglx, torvalds, xemul, Nathan Lynch


* Dave Hansen <dave@linux.vnet.ibm.com> wrote:

> On Fri, 2009-02-13 at 11:53 +0100, Ingo Molnar wrote:
> > In any case, by designing checkpointing to reuse the existing LSM
> > callbacks, we'd hit multiple birds with the same stone. (One of
> > which is the constant complaints about the runtime costs of the LSM
> > callbacks - with checkpointing we get an independent, non-security
> > user of the facility which is a nice touch.)
> 
> There's a fundamental problem with using LSM that I'm seeing 
> now that I look at using it for file descriptors.  The LSM 
> hooks are there to say, "No, you can't do this" and abort 
> whatever kernel operation was going on.  That's good for 
> detecting when we do something that's "bad" for checkpointing.
> 
> *But* it completely falls on its face when we want to find out 
> when we are doing things that are *good*.  For instance, let's 
> say that we open a network socket.  The LSM hook sees it and 
> marks us as uncheckpointable.  What about when we close it?  
> We've become checkpointable again.  But, there's no LSM hook 
> for the close side because we don't currently have a need for 
> it.

Uncheckpointable should be a one-way flag anyway. We want this 
to become usable, so uncheckpointable functionality should be as 
painful as possible, to make sure it's getting fixed ...

> We have a couple of options:
> 
> We can let uncheckpointable actions behave like security 
> violations and just abort the kernel calls.  The problem with 
> this is that it makes it difficult to do *anything* unless 
> your application is 100% supported. Pretty inconvenient, 
> especially at first.  Might be useful later on though.

It still beats "no checkpointing support at all in the upstream
kernel" by a wide margin. If an app fails, that's all the more
reason to bring checkpointing support up to production quality. We
don't want to make the 'interim' state _too_ convenient, because it
will quickly turn into the status quo.

Really, the LSM approach seems to be the right approach here. It 
keeps maintenance costs very low - there's no widespread 
BKL-style flaggery.

	Ingo

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: What can OpenVZ do?
From: Dave Hansen @ 2009-02-17 22:30 UTC
  To: Ingo Molnar
  Cc: Andrew Morton, linux-api, containers, hpa, linux-kernel, linux-mm,
	viro, mpm, tglx, torvalds, xemul, Nathan Lynch

On Tue, 2009-02-17 at 23:23 +0100, Ingo Molnar wrote:
> * Dave Hansen <dave@linux.vnet.ibm.com> wrote:
> > On Fri, 2009-02-13 at 11:53 +0100, Ingo Molnar wrote:
> > > In any case, by designing checkpointing to reuse the existing LSM
> > > callbacks, we'd hit multiple birds with the same stone. (One of
> > > which is the constant complaints about the runtime costs of the LSM
> > > callbacks - with checkpointing we get an independent, non-security
> > > user of the facility which is a nice touch.)
> > 
> > There's a fundamental problem with using LSM that I'm seeing 
> > now that I look at using it for file descriptors.  The LSM 
> > hooks are there to say, "No, you can't do this" and abort 
> > whatever kernel operation was going on.  That's good for 
> > detecting when we do something that's "bad" for checkpointing.
> > 
> > *But* it completely falls on its face when we want to find out 
> > when we are doing things that are *good*.  For instance, let's 
> > say that we open a network socket.  The LSM hook sees it and 
> > marks us as uncheckpointable.  What about when we close it?  
> > We've become checkpointable again.  But, there's no LSM hook 
> > for the close side because we don't currently have a need for 
> > it.
> 
> Uncheckpointable should be a one-way flag anyway. We want this 
> to become usable, so uncheckpointable functionality should be as 
> painful as possible, to make sure it's getting fixed ...

Again, as these patches stand, we don't support checkpointing when
non-simple files are opened.  Basically, if an open()/lseek() pair won't
get you back where you were, we don't deal with them.
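For reference, the "simple file" case reduces to something like the following: record the path, flags and offset, then reopen and seek on restore. struct saved_file and restore_simple_file() are illustrative names, not the patch set's code; the calls are plain POSIX open(2) and lseek(2).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical record for a "simple" file: enough to reopen it and
 * seek back. This is not the patch set's checkpoint image format.
 */
struct saved_file {
	const char *path;
	int flags;  /* open(2) flags, cf. fdinfo "flags:" */
	off_t pos;  /* file offset, cf. fdinfo "pos:" */
};

/* Returns a new fd positioned where the original one was, or -1. */
static int restore_simple_file(const struct saved_file *sf)
{
	int fd;

	assert(sf != NULL && sf->path != NULL);
	/* Never recreate or truncate the file while restoring it. */
	fd = open(sf->path, sf->flags & ~(O_CREAT | O_EXCL | O_TRUNC));
	if (fd < 0)
		return -1;
	if (lseek(fd, sf->pos, SEEK_SET) != sf->pos) {
		close(fd);
		return -1;
	}
	return fd;
}
```

Anything for which reopening by path cannot reproduce the original object and its state falls outside this model.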

init does non-checkpointable things.  If the flag is a one-way trip,
we'll never be able to checkpoint, because we'll always inherit init's
!checkpointable flag.

To fix this, we could start working on making sure we can checkpoint
init, but that's practically worthless.

-- Dave

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: What can OpenVZ do?
From: Ingo Molnar @ 2009-02-18  0:32 UTC
  To: Dave Hansen
  Cc: Andrew Morton, linux-api, containers, hpa, linux-kernel, linux-mm,
	viro, mpm, tglx, torvalds, xemul, Nathan Lynch


* Dave Hansen <dave@linux.vnet.ibm.com> wrote:

> On Tue, 2009-02-17 at 23:23 +0100, Ingo Molnar wrote:
> > * Dave Hansen <dave@linux.vnet.ibm.com> wrote:
> > > On Fri, 2009-02-13 at 11:53 +0100, Ingo Molnar wrote:
> > > > In any case, by designing checkpointing to reuse the existing LSM
> > > > callbacks, we'd hit multiple birds with the same stone. (One of
> > > > which is the constant complaints about the runtime costs of the LSM
> > > > callbacks - with checkpointing we get an independent, non-security
> > > > user of the facility which is a nice touch.)
> > > 
> > > There's a fundamental problem with using LSM that I'm seeing 
> > > now that I look at using it for file descriptors.  The LSM 
> > > hooks are there to say, "No, you can't do this" and abort 
> > > whatever kernel operation was going on.  That's good for 
> > > detecting when we do something that's "bad" for checkpointing.
> > > 
> > > *But* it completely falls on its face when we want to find out 
> > > when we are doing things that are *good*.  For instance, let's 
> > > say that we open a network socket.  The LSM hook sees it and 
> > > marks us as uncheckpointable.  What about when we close it?  
> > > We've become checkpointable again.  But, there's no LSM hook 
> > > for the close side because we don't currently have a need for 
> > > it.
> > 
> > Uncheckpointable should be a one-way flag anyway. We want this 
> > to become usable, so uncheckpointable functionality should be as 
> > painful as possible, to make sure it's getting fixed ...
> 
> Again, as these patches stand, we don't support checkpointing 
> when non-simple files are opened.  Basically, if an
> open()/lseek() pair won't get you back where you were, we
> don't deal with them.
> 
> init does non-checkpointable things.  If the flag is a one-way 
> trip, we'll never be able to checkpoint because we'll always 
> inherit init's !checkpointable flag.
> 
> To fix this, we could start working on making sure we can 
> checkpoint init, but that's practically worthless.

I mean, it should be a per-process (per-app) one-way flag, of
course. If the app does something unsupported, it becomes
non-checkpointable and that's it.

	Ingo

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: What can OpenVZ do?
From: Dave Hansen @ 2009-02-18  0:40 UTC
  To: Ingo Molnar
  Cc: Andrew Morton, linux-api, containers, hpa, linux-kernel, linux-mm,
	viro, mpm, tglx, torvalds, xemul, Nathan Lynch

On Wed, 2009-02-18 at 01:32 +0100, Ingo Molnar wrote:
> > > Uncheckpointable should be a one-way flag anyway. We want this 
> > > to become usable, so uncheckpointable functionality should be as 
> > > painful as possible, to make sure it's getting fixed ...
> > 
> > Again, as these patches stand, we don't support checkpointing 
> > when non-simple files are opened.  Basically, if an
> > open()/lseek() pair won't get you back where you were, we
> > don't deal with them.
> > 
> > init does non-checkpointable things.  If the flag is a one-way 
> > trip, we'll never be able to checkpoint because we'll always 
> > inherit init's !checkpointable flag.
> > 
> > To fix this, we could start working on making sure we can 
> > checkpoint init, but that's practically worthless.
> 
> I mean, it should be a per-process (per-app) one-way flag, of
> course. If the app does something unsupported, it becomes
> non-checkpointable and that's it.

OK, we can definitely do that.  Do you think it is OK to run through a
set of checks at exec() time to check if the app currently has any
unsupported things going on?  If we don't directly inherit the parent's
status, then we need to have *some* time when we check it.
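A toy model of the exec()-time check asked about here, assuming the flag stays one-way within a program image but is recomputed, not inherited, when a new image is loaded. Only descriptors are modeled, and every name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FDS 16  /* arbitrary size for the sketch */

/* Hypothetical task model (illustration only). */
struct task {
	bool fd_open[MAX_FDS];
	bool fd_supported[MAX_FDS];
	bool fd_cloexec[MAX_FDS];
	bool uncheckpointable;  /* one-way for the current program image */
};

/* Something unsupported happened: sticky until the next exec(). */
static void task_deny(struct task *t)
{
	t->uncheckpointable = true;
}

/*
 * At exec(): close close-on-exec descriptors, then recompute the flag
 * from what the new program actually holds instead of inheriting the
 * previous image's flag.
 */
static void task_exec(struct task *t)
{
	bool ok = true;

	assert(t != NULL);
	for (size_t fd = 0; fd < MAX_FDS; fd++) {
		if (t->fd_open[fd] && t->fd_cloexec[fd])
			t->fd_open[fd] = false;
		if (t->fd_open[fd] && !t->fd_supported[fd])
			ok = false;
	}
	t->uncheckpointable = !ok;
}
```

A real check would have to cover far more state than descriptors; the point is only where the recomputation happens.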

-- Dave

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: What can OpenVZ do?
From: Alexey Dobriyan @ 2009-02-18  5:11 UTC
  To: Dave Hansen
  Cc: Ingo Molnar, Andrew Morton, linux-api, containers, hpa,
	linux-kernel, linux-mm, viro, mpm, tglx, torvalds, xemul,
	Nathan Lynch

On Tue, Feb 17, 2009 at 04:40:39PM -0800, Dave Hansen wrote:
> On Wed, 2009-02-18 at 01:32 +0100, Ingo Molnar wrote:
> > > > Uncheckpointable should be a one-way flag anyway. We want this 
> > > > to become usable, so uncheckpointable functionality should be as 
> > > > painful as possible, to make sure it's getting fixed ...
> > > 
> > > Again, as these patches stand, we don't support checkpointing 
> > > when non-simple files are opened.  Basically, if an
> > > open()/lseek() pair won't get you back where you were, we
> > > don't deal with them.
> > > 
> > > init does non-checkpointable things.  If the flag is a one-way 
> > > trip, we'll never be able to checkpoint because we'll always 
> > > inherit init's !checkpointable flag.
> > > 
> > > To fix this, we could start working on making sure we can 
> > > checkpoint init, but that's practically worthless.
> > 
> > I mean, it should be a per-process (per-app) one-way flag, of
> > course. If the app does something unsupported, it becomes
> > non-checkpointable and that's it.
> 
> OK, we can definitely do that.  Do you think it is OK to run through a
> set of checks at exec() time to check if the app currently has any
> unsupported things going on?  If we don't directly inherit the parent's
> status, then we need to have *some* time when we check it.

Uncheckpointable is not one-way.

Imagine remap_file_pages(2) is unsupported. The app uses
remap_file_pages(2), then unmaps the interesting VMA. Now the app is
checkpointable again.
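That counterexample, modeled minimally: if checkpointability is derived from the current state at checkpoint time rather than latched, unmapping the offending VMA makes the task eligible again. The structs and the nonlinear bit are assumptions for illustration, not kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_VMAS 8  /* arbitrary size for the sketch */

/* Hypothetical address-space model (illustration only). */
struct vma {
	bool mapped;
	bool nonlinear;  /* set by remap_file_pages(2), assumed unsupported */
};

struct mm {
	struct vma vma[MAX_VMAS];
};

/* Evaluated from the current mappings, not from the task's history. */
static bool mm_checkpointable(const struct mm *mm)
{
	assert(mm != NULL);
	for (size_t i = 0; i < MAX_VMAS; i++)
		if (mm->vma[i].mapped && mm->vma[i].nonlinear)
			return false;
	return true;
}
```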

As for overloading LSM, I think it would be horrible.
Most hooks are useless, there are config options that add more LSM
hooks, and CPT and LSM are just totally orthogonal.

Instead, just (no offence) get big enough coverage: run modern and
past distros, run the servers packaged with them, and if you can
checkpoint all of that, you're mostly fine.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: What can OpenVZ do?
  2009-02-18  5:11           ` Alexey Dobriyan
@ 2009-02-18 18:16             ` Ingo Molnar
  2009-02-18 21:27               ` Dave Hansen
  0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2009-02-18 18:16 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: Dave Hansen, Andrew Morton, linux-api, containers, hpa,
	linux-kernel, linux-mm, viro, mpm, tglx, torvalds, xemul,
	Nathan Lynch

* Alexey Dobriyan <adobriyan@gmail.com> wrote:

> On Tue, Feb 17, 2009 at 04:40:39PM -0800, Dave Hansen wrote:
> > On Wed, 2009-02-18 at 01:32 +0100, Ingo Molnar wrote:
> > > > > Uncheckpointable should be a one-way flag anyway. We want this 
> > > > > to become usable, so uncheckpointable functionality should be as 
> > > > > painful as possible, to make sure it's getting fixed ...
> > > > 
> > > > Again, as these patches stand, we don't support checkpointing 
> > > > when non-simple files are opened.  Basically, if an
> > > > open()/lseek() pair won't get you back where you were, we 
> > > > don't deal with them.
> > > > 
> > > > init does non-checkpointable things.  If the flag is a one-way 
> > > > trip, we'll never be able to checkpoint because we'll always 
> > > > inherit init's !checkpointable flag.
> > > > 
> > > > To fix this, we could start working on making sure we can 
> > > > checkpoint init, but that's practically worthless.
> > > 
> > > I mean, it should be a per-process (per-app) one-way flag, of
> > > course. If the app does something unsupported, it becomes
> > > non-checkpointable and that's it.
> > 
> > OK, we can definitely do that.  Do you think it is OK to run through a
> > set of checks at exec() time to check if the app currently has any
> > unsupported things going on?  If we don't directly inherit the parent's
> > status, then we need to have *some* time when we check it.
> 
> Uncheckpointable is not one-way.
> 
> Imagine remap_file_pages(2) is unsupported. Now an app uses
> remap_file_pages(2), then unmaps the interesting VMA. Now the app is
> checkpointable again.

But that's precisely the kind of over-design that defeats the
common purpose, which would be to make everything
checkpointable (including weirdo APIs like fremap()).

Nothing motivates more than app designers complaining about the 
one-way flag.

Furthermore, it's _far_ easier to make a one-way flag SMP-safe. 
We just set it and that's it. When we unset it, what do we do about
SMP races with other threads in the same MM installing another
non-linear vma, etc.?
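A userspace C11 sketch of the SMP point, with hypothetical names throughout (`struct mm`, its toy spinlock, and the helpers are illustrations, not kernel code). Setting a one-way flag is a single atomic store that needs no coordination; clearing it correctly requires the recheck and the clear to be serialized against every path that can install a new nonlinear mapping.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-mm state; field names are illustrative only. */
struct mm {
	atomic_bool uncheckpointable;	/* the checkpoint flag */
	atomic_flag lock;		/* toy spinlock, needed only to clear */
	int nr_nonlinear;		/* unsupported mappings, under lock */
};

static void mm_lock(struct mm *mm)
{
	while (atomic_flag_test_and_set(&mm->lock))
		;	/* spin */
}

static void mm_unlock(struct mm *mm)
{
	atomic_flag_clear(&mm->lock);
}

void mm_init(struct mm *mm)
{
	atomic_init(&mm->uncheckpointable, false);
	atomic_flag_clear(&mm->lock);
	mm->nr_nonlinear = 0;
}

/* One-way: concurrent callers all store the same value, so there is
 * nothing to race with and no lock is required for the flag itself. */
void mark_uncheckpointable(struct mm *mm)
{
	atomic_store(&mm->uncheckpointable, true);
}

void install_nonlinear(struct mm *mm)
{
	mm_lock(mm);
	mm->nr_nonlinear++;
	mark_uncheckpointable(mm);
	mm_unlock(mm);
}

/* Two-way: the "is anything unsupported left?" check and the clear must
 * happen under the same lock install_nonlinear() takes.  Without it,
 * another thread could install a new nonlinear mapping between the check
 * and the clear, leaving the flag wrongly reading "checkpointable". */
void remove_nonlinear(struct mm *mm)
{
	mm_lock(mm);
	mm->nr_nonlinear--;
	if (mm->nr_nonlinear == 0)
		atomic_store(&mm->uncheckpointable, false);
	mm_unlock(mm);
}
```

In a purely one-way design, `install_nonlinear()` would need neither the counter nor the lock; both exist only to make clearing possible.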

> As for overloading LSM, I think it would be horrible. Most
> hooks are useless for CPT, there are config options that add
> extra LSM hooks, and CPT and LSM are just totally orthogonal.

Sure, it would have to be adapted to the needs of CPT, but I can
tell you one thing for sure: there's only one thing that is
worse than every syscall annotated with an LSM hook (which is 
the current status quo): every syscall annotated with an LSM 
hook _and_ a separate CPT hook.

It's just bad design. CPT might be orthogonal, but it wants to
hook into syscalls at roughly the same places LSM hooks into,
which pretty much settles the question.

If there are places that need new hooks, we can add them not
as CPT hooks but as security hooks. That way there's synergy:
LSM and CPT both advance, on each other's shoulders.
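A sketch of the shared-hook idea, assuming a simple hook table. `struct task`, both consumer functions and `call_remap_hooks()` are hypothetical and far simpler than the real LSM framework: the syscall path has one hook site, a policy consumer may deny the call, and a checkpoint consumer only records that the task now uses an unsupported feature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task state; names are illustrative, not kernel API. */
struct task {
	bool uncheckpointable;
	bool lsm_allows_remap;
};

/* One hook signature, several consumers: 0 allows, negative denies. */
typedef int (*remap_hook_fn)(struct task *t);

/* A policy-style consumer (stands in for an LSM). */
static int policy_remap_hook(struct task *t)
{
	return t->lsm_allows_remap ? 0 : -1;
}

/* A checkpoint consumer: never denies, only records that the task now
 * uses something the checkpointer cannot save. */
static int cpt_remap_hook(struct task *t)
{
	t->uncheckpointable = true;
	return 0;
}

static const remap_hook_fn remap_hooks[] = {
	policy_remap_hook,
	cpt_remap_hook,
};

/* The single call site in the syscall path, fanned out to every
 * registered consumer; the first denial stops the chain. */
int call_remap_hooks(struct task *t)
{
	for (size_t i = 0; i < sizeof(remap_hooks) / sizeof(remap_hooks[0]); i++) {
		int err = remap_hooks[i](t);

		if (err)
			return err;
	}
	return 0;
}
```

Because the policy consumer runs first and a denial stops the chain, a rejected call never marks the task.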

> Instead, just (no offence) get broad enough coverage: run
> modern and past distros, run the servers packaged with them,
> and if you can checkpoint all of that, you're mostly fine.

That's definitely good advice; it just doesn't give the kind of
minimal environment from which productization efforts can be
seeded.

	Ingo

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: What can OpenVZ do?
  2009-02-18 18:16             ` Ingo Molnar
@ 2009-02-18 21:27               ` Dave Hansen
  2009-02-18 23:15                 ` Ingo Molnar
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Hansen @ 2009-02-18 21:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Alexey Dobriyan, Nathan Lynch, linux-api, containers, mpm,
	linux-kernel, linux-mm, viro, hpa, Andrew Morton, torvalds, tglx,
	xemul

On Wed, 2009-02-18 at 19:16 +0100, Ingo Molnar wrote:
> Nothing motivates more than app designers complaining about the 
> one-way flag.
> 
> Furthermore, it's _far_ easier to make a one-way flag SMP-safe. 
> We just set it and that's it. When we unset it, what do we do about
> SMP races with other threads in the same MM installing another
> non-linear vma, etc.?

After looking at this for file descriptors, I have to really agree with
Ingo on this one, at least as far as the flag is concerned.  I want to
propose one teeny change, though:  I think the flag should be
per-resource.

We should have one flag in mm_struct, one in files_struct, etc...  The
task_is_checkpointable() function can just query task->mm, task->files,
etc...  This gives us nice behavior at clone() *and* fork() that just
works.
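A sketch of the per-resource flag, using simplified userspace structs that borrow the kernel's names but not its definitions (everything here, including `clone_thread()` and `fork_task()`, is hypothetical). Threads share the same `mm`/`files` objects, so a flag set through one thread is seen by all of them; a forked child gets its own copies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Simplified, hypothetical stand-ins for the kernel structures. */
struct mm_struct    { bool uncheckpointable; };
struct files_struct { bool uncheckpointable; };

struct task_struct {
	struct mm_struct *mm;
	struct files_struct *files;
};

/* A task is checkpointable only if every resource it references is. */
bool task_is_checkpointable(const struct task_struct *t)
{
	return !t->mm->uncheckpointable && !t->files->uncheckpointable;
}

/* Thread creation (shared VM and files): the child points at the same
 * resources, so a flag set via one thread is seen by all of them. */
struct task_struct clone_thread(const struct task_struct *parent)
{
	struct task_struct child = { parent->mm, parent->files };

	return child;
}

/* fork(): the child gets private copies, and each copy carries the flag
 * along with the state that caused it. */
struct task_struct fork_task(const struct task_struct *parent)
{
	struct task_struct child;

	child.mm = malloc(sizeof(*child.mm));
	child.files = malloc(sizeof(*child.files));
	if (child.mm == NULL || child.files == NULL)
		abort();
	*child.mm = *parent->mm;
	*child.files = *parent->files;
	return child;
}
```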

I'll do this for files_struct and see how it comes out so you can take a
peek.

-- Dave

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: What can OpenVZ do?
  2009-02-18 21:27               ` Dave Hansen
@ 2009-02-18 23:15                 ` Ingo Molnar
  2009-02-19 19:06                   ` Banning checkpoint (was: Re: What can OpenVZ do?) Alexey Dobriyan
  0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2009-02-18 23:15 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Alexey Dobriyan, Nathan Lynch, linux-api, containers, mpm,
	linux-kernel, linux-mm, viro, hpa, Andrew Morton, torvalds, tglx,
	xemul


* Dave Hansen <dave@linux.vnet.ibm.com> wrote:

> On Wed, 2009-02-18 at 19:16 +0100, Ingo Molnar wrote:
> > Nothing motivates more than app designers complaining about the 
> > one-way flag.
> > 
> > Furthermore, it's _far_ easier to make a one-way flag SMP-safe. 
> > We just set it and that's it. When we unset it, what do we do about
> > SMP races with other threads in the same MM installing another
> > non-linear vma, etc.?
> 
> After looking at this for file descriptors, I have to really 
> agree with Ingo on this one, at least as far as the flag is 
> concerned.  I want to propose one teeny change, though: I 
> think the flag should be per-resource.
> 
> We should have one flag in mm_struct, one in files_struct, 
> etc...  The task_is_checkpointable() function can just query 
> task->mm, task->files, etc...  This gives us nice behavior at 
> clone() *and* fork() that just works.
> 
> I'll do this for files_struct and see how it comes out so you 
> can take a peek.

Yeah, per-resource it should be. That's per-task in the normal
case, except for threaded workloads, where it's shared by
threads.

	Ingo

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Banning checkpoint (was: Re: What can OpenVZ do?)
  2009-02-18 23:15                 ` Ingo Molnar
@ 2009-02-19 19:06                   ` Alexey Dobriyan
  2009-02-19 19:11                     ` Dave Hansen
  0 siblings, 1 reply; 8+ messages in thread
From: Alexey Dobriyan @ 2009-02-19 19:06 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Dave Hansen, Nathan Lynch, linux-api, containers, mpm,
	linux-kernel, linux-mm, viro, hpa, Andrew Morton, torvalds, tglx,
	xemul

I think that all these efforts to abort checkpoint "intelligently" by
banning it early are completely misguided.

The "checkpointable" property isn't a one-way ticket like the "tainted"
flag, so implementing it like the tainted variable isn't right, atomic
or not, SMP-safe or not.

With filesystems, one has the ->f_op field to compare against banned
filesystems; one more flag isn't necessary.

Inotify isn't supported yet? You do

	if (!list_empty(&inode->inotify_watches))
		return -E;

without hooking into inotify syscalls.

ptrace(2) isn't supported -- look at struct task_struct::ptraced and
friends.

And so on.

A system call (or whatever) does something to some piece of kernel
internals. We look at this "something" when walking data structures and
abort if it's scary enough.
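A sketch of the checkpoint-time walk, with simplified, hypothetical structs (the inotify watch list is reduced to a counter and the file table to a NULL-terminated array). The errno values are assumptions standing in for the unspecified `-E` above. No syscall is hooked: the check inspects `->f_op`, the inotify state and the ptrace state only when a checkpoint is requested.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins; the real kernel structures differ. */
struct file_operations { const char *name; };

struct inode {
	int nr_inotify_watches;	/* stands in for !list_empty(&inode->inotify_watches) */
};

struct file {
	const struct file_operations *f_op;
	struct inode *inode;
};

struct task {
	bool ptraced;
	struct file **files;	/* NULL-terminated array of open files */
};

/* Example operation tables for a supported and an unsupported filesystem. */
const struct file_operations supported_fops   = { "supported" };
const struct file_operations unsupported_fops = { "unsupported" };

static const struct file_operations *const banned_fops[] = { &unsupported_fops };

static bool fop_banned(const struct file_operations *fop)
{
	for (size_t i = 0; i < sizeof(banned_fops) / sizeof(banned_fops[0]); i++)
		if (fop == banned_fops[i])
			return true;
	return false;
}

/* Called at sys_checkpoint() time: inspect the state itself, with no hooks
 * in the syscalls that created it.  Returns 0 or a negative errno. */
int check_task(const struct task *t)
{
	if (t->ptraced)
		return -EBUSY;
	for (struct file *const *fp = t->files; *fp != NULL; fp++) {
		const struct file *f = *fp;

		if (fop_banned(f->f_op))
			return -EOPNOTSUPP;
		if (f->inode->nr_inotify_watches > 0)
			return -EOPNOTSUPP;
	}
	return 0;
}
```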

Please, show at least one counter-example.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Banning checkpoint (was: Re: What can OpenVZ do?)
  2009-02-19 19:06                   ` Banning checkpoint (was: Re: What can OpenVZ do?) Alexey Dobriyan
@ 2009-02-19 19:11                     ` Dave Hansen
  2009-02-24  4:47                       ` Alexey Dobriyan
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Hansen @ 2009-02-19 19:11 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: Ingo Molnar, Nathan Lynch, linux-api, containers, mpm,
	linux-kernel, linux-mm, viro, hpa, Andrew Morton, torvalds, tglx,
	xemul

On Thu, 2009-02-19 at 22:06 +0300, Alexey Dobriyan wrote:
> Inotify isn't supported yet? You do
> 
>         if (!list_empty(&inode->inotify_watches))
>                 return -E;
> 
> without hooking into inotify syscalls.
> 
> ptrace(2) isn't supported -- look at struct task_struct::ptraced and
> friends.
> 
> And so on.
> 
> A system call (or whatever) does something to some piece of kernel
> internals. We look at this "something" when walking data structures and
> abort if it's scary enough.
> 
> Please, show at least one counter-example.

Alexey, I agree with you here.  I've been fighting myself internally
about these two somewhat opposing approaches.  Of *course* we can
determine the "checkpointability" at sys_checkpoint() time by checking
all the various bits of state.

The problem that I think Ingo is trying to address here is that doing it
then makes it hard to figure out _when_ you went wrong.  That's the
single most critical piece of information for working out how to
address it.

I see where you are coming from.  Ingo's suggestion has the *huge*
downside that we've got to go muck with a lot of generic code and hook
into all the things we don't support.

I think what I posted is a decent compromise.  It gets you those
warnings at runtime and is a one-way trip for any given process.  But,
it does detect in certain cases (fork() and unshare(FILES)) when it is
safe to make the trip back to the "I'm checkpointable" state again.
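A sketch of that compromise for file descriptors, under stated assumptions: the fixed-size `struct files_struct` and the helpers are hypothetical, not the kernel's fd table. The flag is set when an unsupported file is installed and is never cleared in place; only when the table is copied, as at fork() or unshare(CLONE_FILES), is the flag recomputed for the new copy from what is still open.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_FDS 8

/* Hypothetical, fixed-size fd table; not the kernel's files_struct. */
struct files_struct {
	bool open[NR_FDS];
	bool unsupported[NR_FDS];	/* fd refers to something we can't save */
	bool uncheckpointable;		/* one-way for the life of this table */
};

void fd_install(struct files_struct *fs, int fd, bool unsupported)
{
	fs->open[fd] = true;
	fs->unsupported[fd] = unsupported;
	if (unsupported)
		fs->uncheckpointable = true;	/* set, never cleared in place */
}

void fd_close(struct files_struct *fs, int fd)
{
	fs->open[fd] = false;		/* flag deliberately left alone */
}

/* fork() / unshare(CLONE_FILES): the table is being copied anyway and no
 * other task can see the copy yet, so this is a race-free point at which
 * to recompute the flag from what is actually still open. */
void dup_files(const struct files_struct *src, struct files_struct *dst)
{
	dst->uncheckpointable = false;
	for (int fd = 0; fd < NR_FDS; fd++) {
		dst->open[fd] = src->open[fd];
		dst->unsupported[fd] = src->unsupported[fd];
		if (dst->open[fd] && dst->unsupported[fd])
			dst->uncheckpointable = true;
	}
}
```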

-- Dave

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Banning checkpoint (was: Re: What can OpenVZ do?)
  2009-02-19 19:11                     ` Dave Hansen
@ 2009-02-24  4:47                       ` Alexey Dobriyan
  2009-02-24  5:11                         ` Dave Hansen
  0 siblings, 1 reply; 8+ messages in thread
From: Alexey Dobriyan @ 2009-02-24  4:47 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Ingo Molnar, Nathan Lynch, linux-api, containers, mpm,
	linux-kernel, linux-mm, viro, hpa, Andrew Morton, torvalds, tglx,
	xemul

On Thu, Feb 19, 2009 at 11:11:54AM -0800, Dave Hansen wrote:
> On Thu, 2009-02-19 at 22:06 +0300, Alexey Dobriyan wrote:
> > Inotify isn't supported yet? You do
> > 
> >         if (!list_empty(&inode->inotify_watches))
> >                 return -E;
> > 
> > without hooking into inotify syscalls.
> > 
> > ptrace(2) isn't supported -- look at struct task_struct::ptraced and
> > friends.
> > 
> > And so on.
> > 
> > A system call (or whatever) does something to some piece of kernel
> > internals. We look at this "something" when walking data structures and
> > abort if it's scary enough.
> > 
> > Please, show at least one counter-example.
> 
> Alexey, I agree with you here.  I've been fighting myself internally
> about these two somewhat opposing approaches.  Of *course* we can
> determine the "checkpointability" at sys_checkpoint() time by checking
> all the various bits of state.
> 
> The problem that I think Ingo is trying to address here is that doing it
> then makes it hard to figure out _when_ you went wrong.  That's the
> single most critical piece of information for working out how to
> address it.
> 
> I see where you are coming from.  Ingo's suggestion has the *huge*
> downside that we've got to go muck with a lot of generic code and hook
> into all the things we don't support.
> 
> I think what I posted is a decent compromise.  It gets you those
> warnings at runtime and is a one-way trip for any given process.  But,
> it does detect in certain cases (fork() and unshare(FILES)) when it is
> safe to make the trip back to the "I'm checkpointable" state again.

"Checkpointable" is not even a per-process property.

Imagine a set of SAs (struct xfrm_state) and SPDs (struct xfrm_policy).
They are a) per-netns, b) persistent.

You can hook into socketcalls to mark the process as uncheckpointable,
but since SAs and SPDs are persistent, the original process may have
already exited. You're going to walk every process in the same netns as
the SA adder and mark it as uncheckpointable. Definitely doable, but
ugly, isn't it?

Same for iptables rules.

"Checkpointable" is a container property, OK?
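To illustrate why a per-task flag misses this, a sketch with hypothetical structs in which any xfrm or iptables state in the netns is assumed to be unsupported. The task that added the SA can exit, yet the namespace still holds the SA, so only a check on the container's shared state catches it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-netns state; stands in for xfrm SAs/policies and
 * iptables rules, which outlive the process that created them. */
struct net {
	int nr_xfrm_states;
	int nr_xfrm_policies;
	int nr_ipt_rules;
};

struct task {
	struct net *net_ns;
	bool uncheckpointable;		/* per-task flag, for comparison */
};

/* Adding an SA changes the netns, not the task. */
void add_sa(struct task *t)
{
	t->net_ns->nr_xfrm_states++;
}

/* Assumption for the sketch: any xfrm or iptables state is unsupported. */
bool net_checkpointable(const struct net *net)
{
	return net->nr_xfrm_states == 0 && net->nr_xfrm_policies == 0 &&
	       net->nr_ipt_rules == 0;
}

/* Container-level check: the shared namespace state plus every task
 * that is still alive in it. */
bool container_checkpointable(const struct net *net,
			      const struct task *const *tasks, size_t nr)
{
	if (!net_checkpointable(net))
		return false;
	for (size_t i = 0; i < nr; i++)
		if (tasks[i]->uncheckpointable)
			return false;
	return true;
}
```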

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Banning checkpoint (was: Re: What can OpenVZ do?)
  2009-02-24  4:47                       ` Alexey Dobriyan
@ 2009-02-24  5:11                         ` Dave Hansen
  2009-02-24 15:43                           ` Serge E. Hallyn
  2009-02-24 20:09                           ` Alexey Dobriyan
  0 siblings, 2 replies; 8+ messages in thread
From: Dave Hansen @ 2009-02-24  5:11 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: Ingo Molnar, Nathan Lynch, linux-api, containers, mpm,
	linux-kernel, linux-mm, viro, hpa, Andrew Morton, torvalds, tglx,
	xemul

On Tue, 2009-02-24 at 07:47 +0300, Alexey Dobriyan wrote:
> > I think what I posted is a decent compromise.  It gets you those
> > warnings at runtime and is a one-way trip for any given process.  But,
> > it does detect in certain cases (fork() and unshare(FILES)) when it is
> > safe to make the trip back to the "I'm checkpointable" state again.
> 
> "Checkpointable" is not even a per-process property.
> 
> Imagine a set of SAs (struct xfrm_state) and SPDs (struct xfrm_policy).
> They are a) per-netns, b) persistent.
> 
> You can hook into socketcalls to mark the process as uncheckpointable,
> but since SAs and SPDs are persistent, the original process may have
> already exited. You're going to walk every process in the same netns as
> the SA adder and mark it as uncheckpointable. Definitely doable, but
> ugly, isn't it?
> 
> Same for iptables rules.
> 
> "Checkpointable" is a container property, OK?

Ideally, I completely agree.

But, we don't currently have a concept of a true container in the
kernel.  Do you have any suggestions for any current objects that we
could use in its place for a while?

-- Dave

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Banning checkpoint (was: Re: What can OpenVZ do?)
  2009-02-24  5:11                         ` Dave Hansen
@ 2009-02-24 15:43                           ` Serge E. Hallyn
  2009-02-24 20:09                           ` Alexey Dobriyan
  1 sibling, 0 replies; 8+ messages in thread
From: Serge E. Hallyn @ 2009-02-24 15:43 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Alexey Dobriyan, hpa, linux-api, containers, Nathan Lynch,
	linux-kernel, linux-mm, tglx, viro, mpm, Ingo Molnar, torvalds,
	Andrew Morton, xemul

Quoting Dave Hansen (dave@linux.vnet.ibm.com):
> On Tue, 2009-02-24 at 07:47 +0300, Alexey Dobriyan wrote:
> > > I think what I posted is a decent compromise.  It gets you those
> > > warnings at runtime and is a one-way trip for any given process.  But,
> > > it does detect in certain cases (fork() and unshare(FILES)) when it is
> > > safe to make the trip back to the "I'm checkpointable" state again.
> > 
> > "Checkpointable" is not even a per-process property.
> > 
> > Imagine a set of SAs (struct xfrm_state) and SPDs (struct xfrm_policy).
> > They are a) per-netns, b) persistent.
> > 
> > You can hook into socketcalls to mark the process as uncheckpointable,
> > but since SAs and SPDs are persistent, the original process may have
> > already exited. You're going to walk every process in the same netns as
> > the SA adder and mark it as uncheckpointable. Definitely doable, but
> > ugly, isn't it?
> > 
> > Same for iptables rules.
> > 
> > "Checkpointable" is a container property, OK?
> 
> Ideally, I completely agree.
> 
> But, we don't currently have a concept of a true container in the
> kernel.  Do you have any suggestions for any current objects that we
> could use in its place for a while?

I think the main point is that it makes the concept of marking a task as
uncheckpointable unworkable.  So at sys_checkpoint() time or when we cat
/proc/$$/checkpointable, we can check for all of the uncheckpointable
state of both $$ and its container (including whether $$ is a container
init).  But we can't expect that (to use Alexey's example) when one task
in a netns does a certain sys_socketcall, all tasks in the container
will be marked uncheckpointable.  Or at least we don't want to.

Which means task->uncheckpointable can't be the big stick which I think
you were hoping it would be.
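A sketch of the on-demand check described here, assuming a hypothetical `/proc/$$/checkpointable` handler that returns the first blocking reason. `struct container`, `struct task` and `checkpoint_blocker()` are illustrative only, and a full version would also walk every other task in the container.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical state; names are illustrative only. */
struct container {
	int nr_xfrm_states;	/* shared, persistent namespace state */
};

struct task {
	struct container *ct;
	bool is_container_init;
	int nr_unsupported_fds;	/* task-local state */
};

/* What a "cat /proc/$$/checkpointable" handler might compute on demand:
 * NULL if checkpointable, otherwise a short reason.  Nothing is marked at
 * syscall time; task and container state are inspected when asked. */
const char *checkpoint_blocker(const struct task *t)
{
	if (!t->is_container_init)
		return "not a container init";
	if (t->nr_unsupported_fds > 0)
		return "unsupported open file";
	if (t->ct->nr_xfrm_states > 0)
		return "container has xfrm state";
	return NULL;
}
```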

-serge

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Banning checkpoint (was: Re: What can OpenVZ do?)
  2009-02-24  5:11                         ` Dave Hansen
  2009-02-24 15:43                           ` Serge E. Hallyn
@ 2009-02-24 20:09                           ` Alexey Dobriyan
  1 sibling, 0 replies; 8+ messages in thread
From: Alexey Dobriyan @ 2009-02-24 20:09 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Ingo Molnar, Nathan Lynch, linux-api, containers, mpm,
	linux-kernel, linux-mm, viro, hpa, Andrew Morton, torvalds, tglx,
	xemul

On Mon, Feb 23, 2009 at 09:11:25PM -0800, Dave Hansen wrote:
> On Tue, 2009-02-24 at 07:47 +0300, Alexey Dobriyan wrote:
> > > I think what I posted is a decent compromise.  It gets you those
> > > warnings at runtime and is a one-way trip for any given process.  But,
> > > it does detect in certain cases (fork() and unshare(FILES)) when it is
> > > safe to make the trip back to the "I'm checkpointable" state again.
> > 
> > "Checkpointable" is not even a per-process property.
> > 
> > Imagine a set of SAs (struct xfrm_state) and SPDs (struct xfrm_policy).
> > They are a) per-netns, b) persistent.
> > 
> > You can hook into socketcalls to mark the process as uncheckpointable,
> > but since SAs and SPDs are persistent, the original process may have
> > already exited. You're going to walk every process in the same netns as
> > the SA adder and mark it as uncheckpointable. Definitely doable, but
> > ugly, isn't it?
> > 
> > Same for iptables rules.
> > 
> > "Checkpointable" is a container property, OK?
> 
> Ideally, I completely agree.
> 
> But, we don't currently have a concept of a true container in the
> kernel.  Do you have any suggestions for any current objects that we
> could use in its place for a while?

After all the foo_ns changes, struct nsproxy is such a thing.

More specifically: a process with a fully cloned nsproxy acting as init,
plus all its children. In terms of data structures: every task_struct in
such a tree, every nsproxy of them, every foo_ns, and so on down to lower
levels.
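A sketch of that definition, using hypothetical stand-ins for `struct nsproxy` and the task tree (only five namespace pointers, plus parent/child/sibling links). A task qualifies as a container root if every namespace in its nsproxy differs from its parent's and every descendant still uses those same namespaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified namespaces and tasks; not kernel definitions. */
struct ns { int id; };

struct nsproxy {
	const struct ns *uts_ns, *ipc_ns, *mnt_ns, *pid_ns, *net_ns;
};

struct task {
	const struct nsproxy *nsproxy;
	const struct task *parent;
	const struct task *first_child, *next_sibling;
};

/* Every namespace differs from the parent's: "fully cloned". */
static bool fully_cloned(const struct nsproxy *a, const struct nsproxy *b)
{
	return a->uts_ns != b->uts_ns && a->ipc_ns != b->ipc_ns &&
	       a->mnt_ns != b->mnt_ns && a->pid_ns != b->pid_ns &&
	       a->net_ns != b->net_ns;
}

static bool same_namespaces(const struct nsproxy *a, const struct nsproxy *b)
{
	return a->uts_ns == b->uts_ns && a->ipc_ns == b->ipc_ns &&
	       a->mnt_ns == b->mnt_ns && a->pid_ns == b->pid_ns &&
	       a->net_ns == b->net_ns;
}

/* Recursively check that a task and all its descendants use ns. */
static bool subtree_in(const struct task *t, const struct nsproxy *ns)
{
	if (!same_namespaces(t->nsproxy, ns))
		return false;
	for (const struct task *c = t->first_child; c != NULL; c = c->next_sibling)
		if (!subtree_in(c, ns))
			return false;
	return true;
}

/* A container, in this sketch: a task whose nsproxy is fully cloned from
 * its parent's, plus all descendants, all still in those namespaces. */
bool is_container_root(const struct task *init)
{
	if (init->parent == NULL)
		return false;	/* host init: not treated as a container here */
	return fully_cloned(init->nsproxy, init->parent->nsproxy) &&
	       subtree_in(init, init->nsproxy);
}
```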

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2009-02-24 20:03 UTC | newest]

Thread overview: 8+ messages
     [not found] <c6GC9-3l7-11@gated-at.bofh.it>
     [not found] ` <c6GLN-3xO-31@gated-at.bofh.it>
     [not found]   ` <c6IDT-6AZ-9@gated-at.bofh.it>
     [not found]     ` <c6INu-6Ol-1@gated-at.bofh.it>
     [not found]       ` <c6MR7-54s-3@gated-at.bofh.it>
     [not found]         ` <c6ZbG-hF-13@gated-at.bofh.it>
     [not found]           ` <c729F-5gF-21@gated-at.bofh.it>
     [not found]             ` <c73S1-8dJ-19@gated-at.bofh.it>
     [not found]               ` <c7mrC-4YD-3@gated-at.bofh.it>
     [not found]                 ` <c7mBk-5b2-23@gated-at.bofh.it>
     [not found]                   ` <c8Xp3-3TH-3@gated-at.bofh.it>
2009-02-24 13:00                     ` Banning checkpoint (was: Re: What can OpenVZ do?) Bodo Eggert
2009-02-24 13:00                     ` Bodo Eggert
2009-02-13 10:53 What can OpenVZ do? Ingo Molnar
2009-02-16 20:51 ` Dave Hansen
2009-02-17 22:23   ` Ingo Molnar
2009-02-17 22:30     ` Dave Hansen
2009-02-18  0:32       ` Ingo Molnar
2009-02-18  0:40         ` Dave Hansen
2009-02-18  5:11           ` Alexey Dobriyan
2009-02-18 18:16             ` Ingo Molnar
2009-02-18 21:27               ` Dave Hansen
2009-02-18 23:15                 ` Ingo Molnar
2009-02-19 19:06                   ` Banning checkpoint (was: Re: What can OpenVZ do?) Alexey Dobriyan
2009-02-19 19:11                     ` Dave Hansen
2009-02-24  4:47                       ` Alexey Dobriyan
2009-02-24  5:11                         ` Dave Hansen
2009-02-24 15:43                           ` Serge E. Hallyn
2009-02-24 20:09                           ` Alexey Dobriyan
