cryo and mm->arg

All of lore.kernel.org
 help / color / mirror / Atom feed

* cryo and mm->arg_start
@ 2008-07-11 13:13 Serge E. Hallyn
       [not found] ` <20080711131345.GA18870-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Serge E. Hallyn @ 2008-07-11 13:13 UTC (permalink / raw)
  To: Linux Containers

What cryo does right now to restart some task (say openmp stream) is:

	1. fork, ptrace_tracem(), then execute the original application
	   (stream)
	2. (some other stuff)
	3. through ptrace, cause the restarted process to read the
	   checkpointed data back into writeable maps.  This includes
	   the stack

The restarted task's filename is correctly reported through
/proc/$$/cmdline.  Once we rewrite the stack, it is corrupted.

The reason is that the cmdline contents are taken from mm->arg_start,
which varies with each execution.

On the one hand it's kind of a "small thing."  But IIUC it's like
did_exec in that there is no way to fix it for userspace.

One thing we could do here is to start extending the cryo approach
with Eric's checkpoint-as-a-coredump (caac?).  We generate the
tiniest of coredumps which, at first, contains nothing but
mm->arg_start and maybe a process id.  It would be simplest if
it also contained a filename for the real executable, but I don't
know that we could get away with that.  If we *could* get away
with that, then we could have a trivial fs/binfmt_cr.c "execute"
such a caac file, which would mean it would exec the original
executable, then change process settings in accordance with the
ccac file contents.

Any other ideas?  Comments?

-serge

^ permalink raw reply	[flat|nested] 7+ messages in thread

[parent not found: <20080711131345.GA18870-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>]

* Re: cryo and mm->arg_start
       [not found] ` <20080711131345.GA18870-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2008-07-11 16:38   ` Dave Hansen
  2008-07-11 21:26     ` Serge E. Hallyn
  2008-07-11 22:01     ` Matt Helsley
  0 siblings, 2 replies; 7+ messages in thread
From: Dave Hansen @ 2008-07-11 16:38 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: Linux Containers

On Fri, 2008-07-11 at 08:13 -0500, Serge E. Hallyn wrote:
> 
> One thing we could do here is to start extending the cryo approach
> with Eric's checkpoint-as-a-coredump (caac?).  We generate the
> tiniest of coredumps which, at first, contains nothing but
> mm->arg_start and maybe a process id.  It would be simplest if
> it also contained a filename for the real executable,

The exec model sounds reasonable to me.

But, I think the filename of the exe is going to have to be in the
checkpoint *already*.  It is mapped by at least one of the VMAs, and
will probably be dumped as a normal file-backed area.  

Now, since arg_start is already set up at exec time, it doesn't seem
unreasonable to have the theoretical fs/binfmt_cr.c set it as well.

-- Dave

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: cryo and mm->arg_start
  2008-07-11 16:38   ` Dave Hansen
@ 2008-07-11 21:26     ` Serge E. Hallyn
  2008-07-11 22:01     ` Matt Helsley
  1 sibling, 0 replies; 7+ messages in thread
From: Serge E. Hallyn @ 2008-07-11 21:26 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Linux Containers

Quoting Dave Hansen (dave-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org):
> On Fri, 2008-07-11 at 08:13 -0500, Serge E. Hallyn wrote:
> > 
> > One thing we could do here is to start extending the cryo approach
> > with Eric's checkpoint-as-a-coredump (caac?).  We generate the
> > tiniest of coredumps which, at first, contains nothing but
> > mm->arg_start and maybe a process id.  It would be simplest if
> > it also contained a filename for the real executable,
> 
> The exec model sounds reasonable to me.
> 
> But, I think the filename of the exe is going to have to be in the
> checkpoint *already*.  It is mapped by at least one of the VMAs, and
> will probably be dumped as a normal file-backed area.  
> 
> Now, since arg_start is already set up at exec time, it doesn't seem
> unreasonable to have the theoretical fs/binfmt_cr.c set it as well.
> 
> -- Dave

Ok.

So I'll play with this a bit over the next week.  I'm mostly unfamiliar
with the coredump code and have looked through the binfmts mainly for
tracking the order of security events, so this should be fun.

-serge

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: cryo and mm->arg_start
  2008-07-11 16:38   ` Dave Hansen
  2008-07-11 21:26     ` Serge E. Hallyn
@ 2008-07-11 22:01     ` Matt Helsley
       [not found]       ` <1215813673.5456.284.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
  1 sibling, 1 reply; 7+ messages in thread
From: Matt Helsley @ 2008-07-11 22:01 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Linux Containers


On Fri, 2008-07-11 at 09:38 -0700, Dave Hansen wrote:
> On Fri, 2008-07-11 at 08:13 -0500, Serge E. Hallyn wrote:
> > 
> > One thing we could do here is to start extending the cryo approach
> > with Eric's checkpoint-as-a-coredump (caac?).  We generate the
> > tiniest of coredumps which, at first, contains nothing but
> > mm->arg_start and maybe a process id.  It would be simplest if
> > it also contained a filename for the real executable,
> 
> The exec model sounds reasonable to me.
>
> But, I think the filename of the exe is going to have to be in the
> checkpoint *already*.  It is mapped by at least one of the VMAs, and
> will probably be dumped as a normal file-backed area.  

	Yes, the file that backed the exec will be there. Note that thanks to
"stacking" filesystems the path to the file backing the exe is not
_always_ going to be the same as the path to the file which userspace
exec'd in the first place. You can see this by comparing
the /proc/<pid>/exe symlink with the file backing the VMA.

	This is important to any program which checks the /proc/self/exe
symlink to find out where it's installed (Java does this, for example).
I think it's possible to do this with a binfmt -- it's just one more
detail to remember.

Cheers,
	-Matt

^ permalink raw reply	[flat|nested] 7+ messages in thread

[parent not found: <1215813673.5456.284.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>]

* Re: cryo and mm->arg_start
       [not found]       ` <1215813673.5456.284.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
@ 2008-07-13 21:08         ` Serge E. Hallyn
       [not found]           ` <20080713210846.GD8186-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Serge E. Hallyn @ 2008-07-13 21:08 UTC (permalink / raw)
  To: Matt Helsley; +Cc: Linux Containers, Dave Hansen

Quoting Matt Helsley (matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org):
> 
> On Fri, 2008-07-11 at 09:38 -0700, Dave Hansen wrote:
> > On Fri, 2008-07-11 at 08:13 -0500, Serge E. Hallyn wrote:
> > > 
> > > One thing we could do here is to start extending the cryo approach
> > > with Eric's checkpoint-as-a-coredump (caac?).  We generate the
> > > tiniest of coredumps which, at first, contains nothing but
> > > mm->arg_start and maybe a process id.  It would be simplest if
> > > it also contained a filename for the real executable,
> > 
> > The exec model sounds reasonable to me.
> >
> > But, I think the filename of the exe is going to have to be in the
> > checkpoint *already*.  It is mapped by at least one of the VMAs, and
> > will probably be dumped as a normal file-backed area.  
> 
> 	Yes, the file that backed the exec will be there. Note that thanks to
> "stacking" filesystems the path to the file backing the exe is not
> _always_ going to be the same as the path to the file which userspace
> exec'd in the first place. You can see this by comparing
> the /proc/<pid>/exe symlink with the file backing the VMA.
> 
> 	This is important to any program which checks the /proc/self/exe
> symlink to find out where it's installed (Java does this, for example).
> I think it's possible to do this with a binfmt -- it's just one more
> detail to remember.
> 
> Cheers,
> 	-Matt

Let's say that before starting my checkpointable job, I did

	mount -t ecryptfs /home/hallyn /home/hallyn

Now if the checkpointable job is /home/hallyn/somelongjob, then I think
it's fair to say that restart can fail if /home/hallyn at the restart
machine isn't ecryptfs-mounted.

In that case, would you still think there is a problem?

On the other hand, if the checkpointable job did the ecryptfs mount
itself, then it would be expected that at restart the ecryptfs mount
would be remounted.  How that would be done I have no idea offhand.

thanks,
-serge

^ permalink raw reply	[flat|nested] 7+ messages in thread

[parent not found: <20080713210846.GD8186-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>]

* Re: cryo and mm->arg_start
       [not found]           ` <20080713210846.GD8186-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2008-07-15 21:40             ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA
       [not found]               ` <20080715214050.GA29648-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: sukadev-r/Jw6+rmf7HQT0dZR+AlfA @ 2008-07-15 21:40 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: Linux Containers, Dave Hansen

Serge E. Hallyn [serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org] wrote:
| Quoting Matt Helsley (matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org):
| > 
| > On Fri, 2008-07-11 at 09:38 -0700, Dave Hansen wrote:
| > > On Fri, 2008-07-11 at 08:13 -0500, Serge E. Hallyn wrote:
| > > > 
| > > > One thing we could do here is to start extending the cryo approach
| > > > with Eric's checkpoint-as-a-coredump (caac?).  We generate the
| > > > tiniest of coredumps which, at first, contains nothing but
| > > > mm->arg_start and maybe a process id.  It would be simplest if
| > > > it also contained a filename for the real executable,
| > > 
| > > The exec model sounds reasonable to me.
| > >
| > > But, I think the filename of the exe is going to have to be in the
| > > checkpoint *already*.  It is mapped by at least one of the VMAs, and
| > > will probably be dumped as a normal file-backed area.  
| > 
| > 	Yes, the file that backed the exec will be there. Note that thanks to
| > "stacking" filesystems the path to the file backing the exe is not
| > _always_ going to be the same as the path to the file which userspace
| > exec'd in the first place. You can see this by comparing
| > the /proc/<pid>/exe symlink with the file backing the VMA.
| > 
| > 	This is important to any program which checks the /proc/self/exe
| > symlink to find out where it's installed (Java does this, for example).
| > I think it's possible to do this with a binfmt -- it's just one more
| > detail to remember.
| > 
| > Cheers,
| > 	-Matt
| 
| Let's say that before starting my checkpointable job, I did
| 
| 	mount -t ecryptfs /home/hallyn /home/hallyn
| 
| Now if the checkpointable job is /home/hallyn/somelongjob, then I think
| it's fair to say that restart can fail if /home/hallyn at the restart
| machine isn't ecryptfs-mounted.
| 
| In that case, would you still think there is a problem?
| 
| On the other hand, if the checkpointable job did the ecryptfs mount
| itself, then it would be expected that at restart the ecryptfs mount
| would be remounted.  How that would be done I have no idea offhand.

Hmm, wonder if the new /proc/pid/mountinfo with its mount-ids would
enable us to identify the filesystems that a given process expects.

Which brings up another question. If two processes in the same container
have different mount namespaces and mount points, we would need to
reestablish the mounts during restart right ?

Suka

^ permalink raw reply	[flat|nested] 7+ messages in thread

[parent not found: <20080715214050.GA29648-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>]

* Re: cryo and mm->arg_start
       [not found]               ` <20080715214050.GA29648-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2008-07-16 15:23                 ` Serge E. Hallyn
  0 siblings, 0 replies; 7+ messages in thread
From: Serge E. Hallyn @ 2008-07-16 15:23 UTC (permalink / raw)
  To: sukadev-r/Jw6+rmf7HQT0dZR+AlfA; +Cc: Linux Containers, Dave Hansen

Quoting sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org (sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org):
> Serge E. Hallyn [serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org] wrote:
> | Quoting Matt Helsley (matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org):
> | > 
> | > On Fri, 2008-07-11 at 09:38 -0700, Dave Hansen wrote:
> | > > On Fri, 2008-07-11 at 08:13 -0500, Serge E. Hallyn wrote:
> | > > > 
> | > > > One thing we could do here is to start extending the cryo approach
> | > > > with Eric's checkpoint-as-a-coredump (caac?).  We generate the
> | > > > tiniest of coredumps which, at first, contains nothing but
> | > > > mm->arg_start and maybe a process id.  It would be simplest if
> | > > > it also contained a filename for the real executable,
> | > > 
> | > > The exec model sounds reasonable to me.
> | > >
> | > > But, I think the filename of the exe is going to have to be in the
> | > > checkpoint *already*.  It is mapped by at least one of the VMAs, and
> | > > will probably be dumped as a normal file-backed area.  
> | > 
> | > 	Yes, the file that backed the exec will be there. Note that thanks to
> | > "stacking" filesystems the path to the file backing the exe is not
> | > _always_ going to be the same as the path to the file which userspace
> | > exec'd in the first place. You can see this by comparing
> | > the /proc/<pid>/exe symlink with the file backing the VMA.
> | > 
> | > 	This is important to any program which checks the /proc/self/exe
> | > symlink to find out where it's installed (Java does this, for example).
> | > I think it's possible to do this with a binfmt -- it's just one more
> | > detail to remember.
> | > 
> | > Cheers,
> | > 	-Matt
> | 
> | Let's say that before starting my checkpointable job, I did
> | 
> | 	mount -t ecryptfs /home/hallyn /home/hallyn
> | 
> | Now if the checkpointable job is /home/hallyn/somelongjob, then I think
> | it's fair to say that restart can fail if /home/hallyn at the restart
> | machine isn't ecryptfs-mounted.
> | 
> | In that case, would you still think there is a problem?
> | 
> | On the other hand, if the checkpointable job did the ecryptfs mount
> | itself, then it would be expected that at restart the ecryptfs mount
> | would be remounted.  How that would be done I have no idea offhand.
> 
> Hmm, wonder if the new /proc/pid/mountinfo with its mount-ids would
> enable us to identify the filesystems that a given process expects.

Interesting point.  Yes, it *should*, that's sort of the idea.  I don't
remember whether some of the limitations in terms of hiding mount-ids
from other namespaces were implemented or not, if so I suspect they
could be a problem.

> Which brings up another question. If two processes in the same container
> have different mount namespaces and mount points, we would need to
> reestablish the mounts during restart right ?

Yes.

-serge

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2008-07-16 15:23 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-07-11 13:13 cryo and mm->arg_start Serge E. Hallyn
     [not found] ` <20080711131345.GA18870-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-07-11 16:38   ` Dave Hansen
2008-07-11 21:26     ` Serge E. Hallyn
2008-07-11 22:01     ` Matt Helsley
     [not found]       ` <1215813673.5456.284.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2008-07-13 21:08         ` Serge E. Hallyn
     [not found]           ` <20080713210846.GD8186-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-07-15 21:40             ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA
     [not found]               ` <20080715214050.GA29648-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2008-07-16 15:23                 ` Serge E. Hallyn

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.