From: Oren Laadan <orenl@cs.columbia.edu>
To: Andrey Mirkin <major@openvz.org>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>,
	containers@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org, Louis.Rilling@kerlabs.com,
	Cedric Le Goater <clg@fr.ibm.com>,
	Pavel Emelyanov <xemul@openvz.org>
Subject: Re: [Devel] Re: [PATCH 08/10] Introduce functions to restart a process
Date: Sat, 25 Oct 2008 17:10:36 -0400
Message-ID: <49038B4C.2010009@cs.columbia.edu>
In-Reply-To: <200810240757.38012.major@openvz.org>



Andrey Mirkin wrote:
> On Thursday 23 October 2008 17:57 Dave Hansen wrote:
>> On Thu, 2008-10-23 at 13:00 +0400, Andrey Mirkin wrote:
>>>>>>> It is not related to the freezer code actually.
>>>>>>> That is needed to restart syscalls. Right now I don't have a code
>>>>>>> in my patchset which restarts a syscall, but later I plan to add
>>>>>>> it. In OpenVZ checkpointing we restart syscalls if process was
>>>>>>> caught in syscall during checkpointing.
>>>>>> Do you checkpoint uninterruptible syscalls as well? If only
>>>>>> interruptible syscalls are checkpointed, I'd say that either this
>>>>>> syscall uses ERESTARTSYS or ERESTART_RESTARTBLOCK, and then signal
>>>>>> handling code already does the trick, or this syscall does not
>>>>>> restart itself when interrupted, and well, this is life, userspace
>>>>>> just sees -EINTR, which is allowed by the syscall spec.
>>>>>> Actually this is how we checkpoint/migrate tasks in interruptible
>>>>>> syscalls in Kerrighed and this works.
>>>>> We checkpoint only interruptible syscalls. Some syscalls do not
>>>>> restart themself, that is why after restarting a process we restart
>>>>> syscall to complete it.
>>>> Can you please elaborate on this ?  I don't recall having had issues
>>>> with that.
>>> Right now in 2.6.18 kernel we restarts in such a way pause,
>>> rt_sigtimedwait and futex syscalls. Recently futex syscall was reworked
>>> and we will not need such hooks for it.
>> Could you elaborate on this a bit?
>>
>> If the futex syscall was reworked, perhaps we can do the same for
>> rt_sigtimedwait() and get rid of this code completely.
> 
> Well, we can try to rework rt_sigtimedwait(), but we will still need this code 
> in the future to restart pause syscall from kernel without returning to user 
> space. Also this code will be needed to restore some complex states.
> As concerns pause syscall I have already written to Louis about the problem we 
> are trying to solve with this code. There is a gap when process will be in 
> user space just before entering syscall again. At this time a signal can be 
> delivered to process and it even can be handled. So, we will miss a signal 
> which must interrupt pause syscall.

I'm not convinced that a real race exists, and even if it does, I'm not
convinced that hacking the assembly entry/exit code is the best way to
handle it.

Let me explain:

You are concerned about a race in which a signal is delivered to a task
that resumes from restart to user space and is about to (re)invoke 'pause()'
(because the restart so arranged its EIP and registers).

This almost always means that the user code is buggy and relies on specific
scheduling, because you can usually find a scheduling (without the C/R) where
the intended recipient of the signal was delayed and only calls pause() after
the signal is delivered.

For instance, if the sequence of events is:
	A calls pause() -> checkpoint -> restart ->
		B signals A -> A calls pause() (after restart),
then the following sequence is possible(*) without C/R:
	B signals A -> A calls pause()
because normally B cannot assume anything about when A is actually
suspended (which means the programmer did an imperfect job).

I said "almost always" and "usually" because there is one case where no
such alternative schedule exists: task B could, prior to sending the
signal, "ensure" that task A is already sleeping within the 'pause()'
syscall. While this is possible, it is definitely unusual, and in fact I
have never seen code that does that. And what if the sysadmin sends SIGSTOP
followed by SIGCONT?  In short, such code is simply broken.
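
To make the point concrete, here is a minimal user-space sketch (not taken
from the patchset) of how a program can wait for a signal without depending
on whether the signal arrives before or after it goes to sleep. It uses the
standard POSIX sigprocmask()/sigsuspend() pattern instead of a bare pause().
The names wait_for_usr1(), send_early() and on_usr1() are made up for this
illustration, and the choice of SIGUSR1 is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler; checked by the waiter. */
static volatile sig_atomic_t got_signal;

static void on_usr1(int signo)
{
	(void)signo;
	got_signal = 1;
}

/* Hypothetical helper: simulates "B signals A" before A starts waiting. */
void send_early(void)
{
	raise(SIGUSR1);
}

/*
 * Hypothetical helper: wait until SIGUSR1 has been handled.
 * before_wait (may be NULL) runs after the signal is blocked, so a
 * signal sent there stays pending instead of being lost.
 * Returns 0 on success, -1 on failure.
 */
int wait_for_usr1(void (*before_wait)(void))
{
	struct sigaction sa;
	sigset_t block, old, wait_mask;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_usr1;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR1, &sa, NULL) < 0)
		return -1;

	/* Block SIGUSR1 so it cannot be handled before we are ready. */
	sigemptyset(&block);
	sigaddset(&block, SIGUSR1);
	if (sigprocmask(SIG_BLOCK, &block, &old) < 0)
		return -1;

	got_signal = 0;
	if (before_wait)
		before_wait();

	/* Mask used while sleeping: the old mask with SIGUSR1 unblocked. */
	wait_mask = old;
	sigdelset(&wait_mask, SIGUSR1);

	/* sigsuspend() unblocks and sleeps atomically; no window remains. */
	while (!got_signal)
		sigsuspend(&wait_mask);

	return sigprocmask(SIG_SETMASK, &old, NULL);
}
```

Calling wait_for_usr1(send_early) returns even though the signal was sent
before the wait began, which is exactly the ordering that a bare pause()
cannot tolerate.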

More importantly, if you think about the operation and semantics of the
freezer cgroup - similar behavior is to be expected when you freeze and
then thaw a container.

Specifically, when you freeze a container that has a task in sys_pause(),
that task will abort the syscall and become frozen. As soon as it is
unfrozen, it will return to user space (with the EIP "rewound") only to
re-invoke the syscall. So the same "race" remains even if you only freeze
and then thaw, regardless of C/R.

Moreover, I argue that basically when you return from a sys_restart(), the
entire container should, by default, remain in frozen state - just like it
is with sys_checkpoint(). An explicit thaw will make the container resume
execution.

Therefore, there are two options: the first is to decide that this behavior
- going back to user space to re-invoke the syscall - is valid. In this case
you don't need a special hack for returning from sys_restart(). The second
option is to decide that it is broken, in which case you need to also fix
the freezer code. Personally, I think that this behavior is valid and need
not be fixed.

Finally, even if you do want to fix the behavior for this pathological case,
I don't see why you'd want to do it in this manner. Instead, you can add a
simple test prior to returning from sys_restart(), something like this:

	...
	/* almost done: now handle special cases: */
	if (our last syscall == __NR_pause) {
		ret = sys_pause();
	} else if (our last syscall == __NR_futex) {
		do some stuff;
		ret = sys_futex();
	} else {
		ret = what-we-want-to-return;
	}
	/* finally, return to user space */
	return ret;
}

I don't quite know what other "complex states" you refer to, but I wonder
whether the code "needed to restore some complex states" could be
implemented along the same lines.

The upside is clear: the code is less obscure, simple to debug, and not
architecture-dependent. (hehe .. it even runs faster because it saves a
whole kernel->user->kernel switch, what do you know !).
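
As a rough illustration of that architecture-independent shape, the dispatch
on the last syscall can be modeled in plain C. This is only a user-space
sketch with stubs: the enum, struct ckpt_task_state, restart_finish() and
the stub_*() functions are hypothetical names, not part of any posted patch,
and the stubs merely return fixed values where the kernel would call the
real sys_pause() or sys_futex().

```c
#include <errno.h>

/* Hypothetical record of which syscall the task was in at checkpoint. */
enum ckpt_last_syscall {
	CKPT_SYS_NONE,
	CKPT_SYS_PAUSE,
	CKPT_SYS_FUTEX,
};

/* Hypothetical per-task state saved in the checkpoint image. */
struct ckpt_task_state {
	enum ckpt_last_syscall last_syscall;
	long saved_ret;		/* value to return in the ordinary case */
};

/* Stubs standing in for re-entering the real syscalls in the kernel. */
static long stub_pause(void)
{
	return -EINTR;		/* pause() only returns when interrupted */
}

static long stub_futex(void)
{
	return 0;
}

/* Final step of the restart path: handle the special cases, then return. */
long restart_finish(const struct ckpt_task_state *st)
{
	switch (st->last_syscall) {
	case CKPT_SYS_PAUSE:
		return stub_pause();
	case CKPT_SYS_FUTEX:
		/* any futex-specific fixup would go here */
		return stub_futex();
	default:
		return st->saved_ret;
	}
}
```

Everything here is ordinary C with no entry/exit assembly involved, which
is the property argued for above.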

Oren.

