* cryo and mm->arg_start
@ 2008-07-11 13:13 Serge E. Hallyn
0 siblings, 1 reply; 7+ messages in thread
From: Serge E. Hallyn @ 2008-07-11 13:13 UTC (permalink / raw)
To: Linux Containers
What cryo does right now to restart some task (say openmp stream) is:
1. fork, ptrace(PTRACE_TRACEME), then exec the original application
(stream)
2. (some other stuff)
3. through ptrace, cause the restarted process to read the
checkpointed data back into writeable maps. This includes
the stack
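Step 1 above can be sketched roughly as follows. This is a minimal
illustration, not cryo's actual code: start_traced() is a hypothetical
helper name, and error handling is reduced to the bare minimum. The child
asks to be traced and then execs, which leaves it stopped with SIGTRAP
so the parent can drive it through ptrace.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from cryo): fork, request tracing in
 * the child, then exec argv[0]. Returns the pid of the child, stopped
 * at its post-exec SIGTRAP, or -1 on failure.
 */
pid_t start_traced(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: become a tracee of the parent, then exec. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(127);
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }

    /* Parent: a traced child stops with SIGTRAP once exec completes. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFSTOPPED(status) || WSTOPSIG(status) != SIGTRAP)
        return -1;
    return pid;
}
```

At this point the parent could use further ptrace requests to make the
stopped child reload checkpointed memory, as in step 3.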
Before step 3, the restarted task's command line is correctly reported
through /proc/$$/cmdline. Once we rewrite the stack, it is corrupted.
The reason is that the cmdline contents are read from the task's memory
at mm->arg_start, which varies with each execution.
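The link between /proc/<pid>/cmdline and the argument area on the stack
can be checked from inside a process. The sketch below assumes a kernel
that exports arg_start and arg_end as fields 48 and 49 of
/proc/self/stat (Linux 3.5 and later, so newer than the kernels under
discussion here). The function names are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Read arg_start/arg_end (fields 48 and 49 of /proc/self/stat). */
int get_arg_range(unsigned long *start, unsigned long *end)
{
    char buf[4096];
    FILE *f = fopen("/proc/self/stat", "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* Field 2 (comm) may contain spaces; resume after its closing ')'. */
    char *p = strrchr(buf, ')');
    if (!p)
        return -1;
    p++; /* now at the space before field 3 */

    /* Skip fields 3..47 so that p ends at the space before field 48. */
    for (int field = 3; field < 48; field++) {
        p = strchr(p + 1, ' ');
        if (!p)
            return -1;
    }
    return sscanf(p, "%lu %lu", start, end) == 2 ? 0 : -1;
}

/*
 * Returns 1 if /proc/self/cmdline equals the bytes in
 * [arg_start, arg_end), 0 if it differs, -1 on error.
 */
int cmdline_matches_stack(void)
{
    unsigned long start, end;
    char buf[4096];

    if (get_arg_range(&start, &end) < 0 || end <= start)
        return -1;
    size_t len = end - start;
    if (len > sizeof(buf))
        return -1;

    FILE *f = fopen("/proc/self/cmdline", "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof(buf), f);
    fclose(f);

    return n == len && memcmp(buf, (const void *)start, len) == 0;
}
```

If the stack contents at that address are replaced, as happens during
restart, the cmdline file reflects whatever bytes are there afterwards.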
On the one hand it's kind of a "small thing." But IIUC it's like
did_exec in that there is no way to fix it from userspace.
One thing we could do here is to start extending the cryo approach
with Eric's checkpoint-as-a-coredump (caac?). We generate the
tiniest of coredumps which, at first, contains nothing but
mm->arg_start and maybe a process id. It would be simplest if
it also contained a filename for the real executable, but I don't
know that we could get away with that. If we *could* get away
with that, then we could have a trivial fs/binfmt_cr.c "execute"
such a caac file, which would mean it would exec the original
executable, then change process settings in accordance with the
caac file contents.
Any other ideas? Comments?
-serge

* Re: cryo and mm->arg_start
@ 2008-07-11 16:38 ` Dave Hansen
  2008-07-11 21:26   ` Serge E. Hallyn
  2008-07-11 22:01   ` Matt Helsley
0 siblings, 2 replies; 7+ messages in thread
From: Dave Hansen @ 2008-07-11 16:38 UTC (permalink / raw)
To: Serge E. Hallyn; +Cc: Linux Containers

On Fri, 2008-07-11 at 08:13 -0500, Serge E. Hallyn wrote:
>
> One thing we could do here is to start extending the cryo approach
> with Eric's checkpoint-as-a-coredump (caac?). We generate the
> tiniest of coredumps which, at first, contains nothing but
> mm->arg_start and maybe a process id. It would be simplest if
> it also contained a filename for the real executable,

The exec model sounds reasonable to me.

But, I think the filename of the exe is going to have to be in the
checkpoint *already*. It is mapped by at least one of the VMAs, and
will probably be dumped as a normal file-backed area.

Now, since arg_start is already set up at exec time, it doesn't seem
unreasonable to have the theoretical fs/binfmt_cr.c set it as well.

-- Dave

* Re: cryo and mm->arg_start
  2008-07-11 16:38 ` Dave Hansen
@ 2008-07-11 21:26   ` Serge E. Hallyn
  2008-07-11 22:01   ` Matt Helsley
1 sibling, 0 replies; 7+ messages in thread
From: Serge E. Hallyn @ 2008-07-11 21:26 UTC (permalink / raw)
To: Dave Hansen; +Cc: Linux Containers

Quoting Dave Hansen (dave-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org):
> On Fri, 2008-07-11 at 08:13 -0500, Serge E. Hallyn wrote:
> >
> > One thing we could do here is to start extending the cryo approach
> > with Eric's checkpoint-as-a-coredump (caac?). We generate the
> > tiniest of coredumps which, at first, contains nothing but
> > mm->arg_start and maybe a process id. It would be simplest if
> > it also contained a filename for the real executable,
>
> The exec model sounds reasonable to me.
>
> But, I think the filename of the exe is going to have to be in the
> checkpoint *already*. It is mapped by at least one of the VMAs, and
> will probably be dumped as a normal file-backed area.
>
> Now, since arg_start is already set up at exec time, it doesn't seem
> unreasonable to have the theoretical fs/binfmt_cr.c set it as well.
>
> -- Dave

Ok. So I'll play with this a bit over the next week. I'm mostly
unfamiliar with the coredump code and have looked through the binfmts
mainly for tracking the order of security events, so this should be fun.

-serge

* Re: cryo and mm->arg_start
  2008-07-11 16:38 ` Dave Hansen
  2008-07-11 21:26   ` Serge E. Hallyn
@ 2008-07-11 22:01   ` Matt Helsley
1 sibling, 1 reply; 7+ messages in thread
From: Matt Helsley @ 2008-07-11 22:01 UTC (permalink / raw)
To: Dave Hansen; +Cc: Linux Containers

On Fri, 2008-07-11 at 09:38 -0700, Dave Hansen wrote:
> On Fri, 2008-07-11 at 08:13 -0500, Serge E. Hallyn wrote:
> >
> > One thing we could do here is to start extending the cryo approach
> > with Eric's checkpoint-as-a-coredump (caac?). We generate the
> > tiniest of coredumps which, at first, contains nothing but
> > mm->arg_start and maybe a process id. It would be simplest if
> > it also contained a filename for the real executable,
>
> The exec model sounds reasonable to me.
>
> But, I think the filename of the exe is going to have to be in the
> checkpoint *already*. It is mapped by at least one of the VMAs, and
> will probably be dumped as a normal file-backed area.

Yes, the file that backed the exec will be there. Note that thanks to
"stacking" filesystems the path to the file backing the exe is not
_always_ going to be the same as the path to the file which userspace
exec'd in the first place. You can see this by comparing
the /proc/<pid>/exe symlink with the file backing the VMA.

This is important to any program which checks the /proc/self/exe
symlink to find out where it's installed (Java does this, for example).
I think it's possible to do this with a binfmt -- it's just one more
detail to remember.

Cheers,
    -Matt

* Re: cryo and mm->arg_start
@ 2008-07-13 21:08 ` Serge E. Hallyn
0 siblings, 1 reply; 7+ messages in thread
From: Serge E. Hallyn @ 2008-07-13 21:08 UTC (permalink / raw)
To: Matt Helsley; +Cc: Linux Containers, Dave Hansen

Quoting Matt Helsley (matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org):
>
> On Fri, 2008-07-11 at 09:38 -0700, Dave Hansen wrote:
> > On Fri, 2008-07-11 at 08:13 -0500, Serge E. Hallyn wrote:
> > >
> > > One thing we could do here is to start extending the cryo approach
> > > with Eric's checkpoint-as-a-coredump (caac?). We generate the
> > > tiniest of coredumps which, at first, contains nothing but
> > > mm->arg_start and maybe a process id. It would be simplest if
> > > it also contained a filename for the real executable,
> >
> > The exec model sounds reasonable to me.
> >
> > But, I think the filename of the exe is going to have to be in the
> > checkpoint *already*. It is mapped by at least one of the VMAs, and
> > will probably be dumped as a normal file-backed area.
>
> Yes, the file that backed the exec will be there. Note that thanks to
> "stacking" filesystems the path to the file backing the exe is not
> _always_ going to be the same as the path to the file which userspace
> exec'd in the first place. You can see this by comparing
> the /proc/<pid>/exe symlink with the file backing the VMA.
>
> This is important to any program which checks the /proc/self/exe
> symlink to find out where it's installed (Java does this, for example).
> I think it's possible to do this with a binfmt -- it's just one more
> detail to remember.
>
> Cheers,
>     -Matt

Let's say that before starting my checkpointable job, I did

    mount -t ecryptfs /home/hallyn /home/hallyn

Now if the checkpointable job is /home/hallyn/somelongjob, then I think
it's fair to say that restart can fail if /home/hallyn at the restart
machine isn't ecryptfs-mounted.

In that case, would you still think there is a problem?

On the other hand, if the checkpointable job did the ecryptfs mount
itself, then it would be expected that at restart the ecryptfs mount
would be remounted. How that would be done I have no idea offhand.

thanks,
-serge

* Re: cryo and mm->arg_start
@ 2008-07-15 21:40 ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA
0 siblings, 1 reply; 7+ messages in thread
From: sukadev-r/Jw6+rmf7HQT0dZR+AlfA @ 2008-07-15 21:40 UTC (permalink / raw)
To: Serge E. Hallyn; +Cc: Linux Containers, Dave Hansen

Serge E. Hallyn [serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org] wrote:
| Quoting Matt Helsley (matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org):
| >
| > On Fri, 2008-07-11 at 09:38 -0700, Dave Hansen wrote:
| > > On Fri, 2008-07-11 at 08:13 -0500, Serge E. Hallyn wrote:
| > > >
| > > > One thing we could do here is to start extending the cryo approach
| > > > with Eric's checkpoint-as-a-coredump (caac?). We generate the
| > > > tiniest of coredumps which, at first, contains nothing but
| > > > mm->arg_start and maybe a process id. It would be simplest if
| > > > it also contained a filename for the real executable,
| > >
| > > The exec model sounds reasonable to me.
| > >
| > > But, I think the filename of the exe is going to have to be in the
| > > checkpoint *already*. It is mapped by at least one of the VMAs, and
| > > will probably be dumped as a normal file-backed area.
| >
| > Yes, the file that backed the exec will be there. Note that thanks to
| > "stacking" filesystems the path to the file backing the exe is not
| > _always_ going to be the same as the path to the file which userspace
| > exec'd in the first place. You can see this by comparing
| > the /proc/<pid>/exe symlink with the file backing the VMA.
| >
| > This is important to any program which checks the /proc/self/exe
| > symlink to find out where it's installed (Java does this, for example).
| > I think it's possible to do this with a binfmt -- it's just one more
| > detail to remember.
| >
| > Cheers,
| >     -Matt
|
| Let's say that before starting my checkpointable job, I did
|
|     mount -t ecryptfs /home/hallyn /home/hallyn
|
| Now if the checkpointable job is /home/hallyn/somelongjob, then I think
| it's fair to say that restart can fail if /home/hallyn at the restart
| machine isn't ecryptfs-mounted.
|
| In that case, would you still think there is a problem?
|
| On the other hand, if the checkpointable job did the ecryptfs mount
| itself, then it would be expected that at restart the ecryptfs mount
| would be remounted. How that would be done I have no idea offhand.

Hmm, wonder if the new /proc/pid/mountinfo with its mount-ids would
enable us to identify the filesystems that a given process expects.

Which brings up another question. If two processes in the same container
have different mount namespaces and mount points, we would need to
reestablish the mounts during restart right ?

Suka
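
As a rough illustration of what /proc/<pid>/mountinfo offers, the sketch
below pulls the mount ID, parent ID, mount point and filesystem type out
of one line. The struct and function names are invented for this
example; the field layout follows the kernel's proc documentation, where
a " - " separator ends the variable-length optional fields.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record for the fields a restart tool might care about. */
struct mount_entry {
    int id;                 /* unique mount ID */
    int parent_id;          /* ID of the parent mount */
    char mount_point[256];  /* mount point relative to the process root */
    char fstype[64];        /* filesystem type, e.g. "ext3" */
};

/*
 * Parse one mountinfo line of the form:
 *   ID PARENT MAJ:MIN ROOT MOUNTPOINT OPTS [optional...] - FSTYPE SRC SOPTS
 * Returns 0 on success, -1 if the line does not match.
 */
int parse_mountinfo_line(const char *line, struct mount_entry *e)
{
    if (sscanf(line, "%d %d %*s %*s %255s",
               &e->id, &e->parent_id, e->mount_point) != 3)
        return -1;

    /* Spaces inside paths are escaped as \040, so " - " is unambiguous. */
    const char *sep = strstr(line, " - ");
    if (!sep)
        return -1;
    if (sscanf(sep + 3, "%63s", e->fstype) != 1)
        return -1;
    return 0;
}
```

Running this over every line of /proc/<pid>/mountinfo for each task in a
container would give a per-task list of mounts that a restart would need
to find or recreate.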

* Re: cryo and mm->arg_start
@ 2008-07-16 15:23 ` Serge E. Hallyn
0 siblings, 0 replies; 7+ messages in thread
From: Serge E. Hallyn @ 2008-07-16 15:23 UTC (permalink / raw)
To: sukadev-r/Jw6+rmf7HQT0dZR+AlfA; +Cc: Linux Containers, Dave Hansen

Quoting sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org (sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org):
> Serge E. Hallyn [serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org] wrote:
> | Quoting Matt Helsley (matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org):
> | >
> | > On Fri, 2008-07-11 at 09:38 -0700, Dave Hansen wrote:
> | > > On Fri, 2008-07-11 at 08:13 -0500, Serge E. Hallyn wrote:
> | > > >
> | > > > One thing we could do here is to start extending the cryo approach
> | > > > with Eric's checkpoint-as-a-coredump (caac?). We generate the
> | > > > tiniest of coredumps which, at first, contains nothing but
> | > > > mm->arg_start and maybe a process id. It would be simplest if
> | > > > it also contained a filename for the real executable,
> | > >
> | > > The exec model sounds reasonable to me.
> | > >
> | > > But, I think the filename of the exe is going to have to be in the
> | > > checkpoint *already*. It is mapped by at least one of the VMAs, and
> | > > will probably be dumped as a normal file-backed area.
> | >
> | > Yes, the file that backed the exec will be there. Note that thanks to
> | > "stacking" filesystems the path to the file backing the exe is not
> | > _always_ going to be the same as the path to the file which userspace
> | > exec'd in the first place. You can see this by comparing
> | > the /proc/<pid>/exe symlink with the file backing the VMA.
> | >
> | > This is important to any program which checks the /proc/self/exe
> | > symlink to find out where it's installed (Java does this, for example).
> | > I think it's possible to do this with a binfmt -- it's just one more
> | > detail to remember.
> | >
> | > Cheers,
> | >     -Matt
> |
> | Let's say that before starting my checkpointable job, I did
> |
> |     mount -t ecryptfs /home/hallyn /home/hallyn
> |
> | Now if the checkpointable job is /home/hallyn/somelongjob, then I think
> | it's fair to say that restart can fail if /home/hallyn at the restart
> | machine isn't ecryptfs-mounted.
> |
> | In that case, would you still think there is a problem?
> |
> | On the other hand, if the checkpointable job did the ecryptfs mount
> | itself, then it would be expected that at restart the ecryptfs mount
> | would be remounted. How that would be done I have no idea offhand.
>
> Hmm, wonder if the new /proc/pid/mountinfo with its mount-ids would
> enable us to identify the filesystems that a given process expects.

Interesting point. Yes, it *should*, that's sort of the idea. I don't
remember whether some of the limitations in terms of hiding mount-ids
from other namespaces were implemented or not, if so I suspect they
could be a problem.

> Which brings up another question. If two processes in the same container
> have different mount namespaces and mount points, we would need to
> reestablish the mounts during restart right ?

Yes.

-serge