All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
To: "Serge E. Hallyn" <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Cc: Linux Containers
	<containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org>,
	linux-s390-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [PATCH RFC] s390: let tasks know to restart syscalls after sys_restart()
Date: Tue, 09 Feb 2010 16:12:11 -0500	[thread overview]
Message-ID: <4B71CFAB.70409@cs.columbia.edu> (raw)
In-Reply-To: <20100209201643.GB29094-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>



Serge E. Hallyn wrote:
> Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
>>
>> Serge E. Hallyn wrote:
>>> Quoting Oren Laadan (orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org):
>>>> Serge E. Hallyn wrote:
>>>>> (This is a patch against the checkpoint/restart kernel tree at
>>>>> http://git.ncl.cs.columbia.edu/?p=linux-cr.git;a=shortlog;h=refs/heads/ckpt-v19-rc2.9)
>>>>>
>>>>> On x86, do_signal() leaves -516 in eax while it freezes, which
>>>>> sys_restart() can use to detect that it should restart the
>>>>> syscall which was interrupted by a signal (or the freezer).
>>>>>
>>>>> On s390, gprs[2] gets tweaked to -EINTR (-4) instead, leaving
>>>>> us no reliable way to tell whether should be restarted.  If the
>>>>> task is checkpointed here and then restarted, then the last part
>>>>> of do_signal() won't be done, and the interrupted syscall won't
>>>>> be restarted.
>>>>>
>>>>> This patch defines TIF_RESTARTBLOCK as a thread flag showing that
>>>>> the thread expects to be frozen while kicked out of a restartable
>>>>> syscall by a signal.
>>>>>
>>>>> The TIF_RESTARTBLOCK flag is only set for the duration of the
>>>>> get get_signal_to_deliver() which is where the task may get
>>>>> frozen.  We also set it in sys_restart() if the checkpointed task
>>>>> had had TIF_RESTARTBLOCK set.  It will get cleared if upon exiting
>>>>> sys_restart() we jump to sysc_sigpending.  If instead we jump back
>>>>> to do_signal(), then TIF_RESTARTBLOCK will stay set again until
>>>>> after get_signal_to_deliver() so that if it immediately freezes and
>>>>> is re-checkpointed, the resulting second checkpoint image again
>>>>> will have TIF_RESTARTBLOCK set.
>>>> Two comments:
>>>>
>>>> 1) This note will be lost once we fold this patch into a clean
>>>> patchset. Can you please make it part of the code ?
>>> Sure, good idea.
>>>
>>>> 2) Maybe I'm missing something, but I'm not convinced. Can you
>>>> elaborate on why this is correct in different cases ?  Also, in
>>>> particular with respect to the post-signal-sent snippet in the
>>>> signal handling code:
>>>>
>>>>         if (retval == -ERESTART_RESTARTBLOCK
>>>>             && regs->psw.addr == continue_addr) {
>>>>
>>>>                 regs->gprs[2] = __NR_restart_syscall;
>>>>
>>>>                 set_thread_flag(TIF_RESTART_SVC);
>>>>
>>>>         }
>>>>
>>>>
>>>> Would it do what you expect after a restart ?  (restart modifies
>>>> the psw.addr)
>>> I don't understand the question.  After sys_restart(), we won't be
>>> returning to this kernel code.  We'll either immediately call
>>> restart_syscall(), or, if a signal was delivered before sys_restart(),
>>> completed, then do_signal() will start again from the top.
>> Ok, I re-read the code: let's look at these cases:
>>
>> case 1: checkpointee wasn't in syscall -- no problem.
>>
>> case 2: checkpointee was in syscall, no signal pending; when it was
>> frozen, regs->svcnr became 0, and that's what we save, so on restart
>> we won't enter that snippet again. Again, no problem.
>>
>> case 3: checkpointee was in syscall, signal pending;
>> case 4: checkpointee was in syscall, signal received at restart;
>> look at this snippet:
>>
>>         if (signr > 0 && regs->psw.addr == restart_addr) {
>>                 if (retval == -ERESTARTNOHAND
>>                     || (retval == -ERESTARTSYS
>>                          && !(current->sighand->action[signr-1].sa.sa_flags
>>                               & SA_RESTART))) {
>>                         regs->gprs[2] = -EINTR;
>>                         regs->psw.addr = continue_addr;
>>                 }
>>         }
>>
>> Because svcnr is/was 0, neither restart_addr nor continue_addr
>> were setup, so this condition is always false, which I think is
>> wrong.
> 
> I've been focusing on the ERESTART_RESTARTBLOCK case.  Can
> we agree that all cases appear to be handled correctly there?

Errr... everytime I need to go and read all the code from
scratch to convince myself... and I my mind just context
switched :(

> 
> For the ERESTARTSYS/ERESTARTNOHAND case, I'm probably not
> doing the right thing.  For a single checkpoint, since either
> there was no real signal (freezer) or it didn't get handled
> before checkpoint,  psw.addr gets checkpointed and restored
> as restart_addr, which is the right thing.  (since signr is
> not >0, we would have kept the values the same after
> get_signal_to_deliver()).
> 
> But if a real signal gets delivered upon exit of sys_restart(),
> then I think I do think we'll end up doing the wrong thing -
> we'll restart the interrupted system call with the orig_gpr2,
> so we'll pretend the signal did not get delivered, rather
> than proceed past the call to the system call (in userspace)
> with return value -EINTR.  (Just how wrong is that?)
> 
> This is all dense enough that it may be worth thinking of
> a different way to handle it, but I'm not sure what that
> way would be.  The challenge is finding a *simple*, reliable
> way to detect what the the initial conditions to do_signal()
> where, based on the register/thread_info values as they are
> at do_signal()->get_signal_to_deliver()->try_to_freeze(),
> given the ways the values get swapped in the block above
> the get_signal_to_deliver() call.
> 
> The simplest thing by far would be if we could safely
> move the get_signal_to_deliver() call before the
> 
> 	if (regs->svcnr) {
> 		continue_addr = regs->psw.addr;
> 		...
> 
> block.  I assume there are entry_64.S-related reasons why
> we cannot?

Yes, that's what I was thinking. And I don't know enough of
s390 to understand why we could not.

Either that or record the state before it is modified in
dedicated checkpoint-related fields (e.g. per thread).

Oren.

WARNING: multiple messages have this Message-ID (diff)
From: Oren Laadan <orenl@cs.columbia.edu>
To: linux-s390@vger.kernel.org
Subject: Re: [PATCH RFC] s390: let tasks know to restart syscalls after sys_restart()
Date: Tue, 09 Feb 2010 21:12:11 +0000	[thread overview]
Message-ID: <4B71CFAB.70409@cs.columbia.edu> (raw)
In-Reply-To: <20100209201643.GB29094@us.ibm.com>



Serge E. Hallyn wrote:
> Quoting Oren Laadan (orenl@cs.columbia.edu):
>>
>> Serge E. Hallyn wrote:
>>> Quoting Oren Laadan (orenl@cs.columbia.edu):
>>>> Serge E. Hallyn wrote:
>>>>> (This is a patch against the checkpoint/restart kernel tree at
>>>>> http://git.ncl.cs.columbia.edu/?p=linux-cr.git;a=shortlog;h=refs/heads/ckpt-v19-rc2.9)
>>>>>
>>>>> On x86, do_signal() leaves -516 in eax while it freezes, which
>>>>> sys_restart() can use to detect that it should restart the
>>>>> syscall which was interrupted by a signal (or the freezer).
>>>>>
>>>>> On s390, gprs[2] gets tweaked to -EINTR (-4) instead, leaving
>>>>> us no reliable way to tell whether should be restarted.  If the
>>>>> task is checkpointed here and then restarted, then the last part
>>>>> of do_signal() won't be done, and the interrupted syscall won't
>>>>> be restarted.
>>>>>
>>>>> This patch defines TIF_RESTARTBLOCK as a thread flag showing that
>>>>> the thread expects to be frozen while kicked out of a restartable
>>>>> syscall by a signal.
>>>>>
>>>>> The TIF_RESTARTBLOCK flag is only set for the duration of the
>>>>> get get_signal_to_deliver() which is where the task may get
>>>>> frozen.  We also set it in sys_restart() if the checkpointed task
>>>>> had had TIF_RESTARTBLOCK set.  It will get cleared if upon exiting
>>>>> sys_restart() we jump to sysc_sigpending.  If instead we jump back
>>>>> to do_signal(), then TIF_RESTARTBLOCK will stay set again until
>>>>> after get_signal_to_deliver() so that if it immediately freezes and
>>>>> is re-checkpointed, the resulting second checkpoint image again
>>>>> will have TIF_RESTARTBLOCK set.
>>>> Two comments:
>>>>
>>>> 1) This note will be lost once we fold this patch into a clean
>>>> patchset. Can you please make it part of the code ?
>>> Sure, good idea.
>>>
>>>> 2) Maybe I'm missing something, but I'm not convinced. Can you
>>>> elaborate on why this is correct in different cases ?  Also, in
>>>> particular with respect to the post-signal-sent snippet in the
>>>> signal handling code:
>>>>
>>>>         if (retval == -ERESTART_RESTARTBLOCK
>>>>             && regs->psw.addr == continue_addr) {
>>>>
>>>>                 regs->gprs[2] = __NR_restart_syscall;
>>>>
>>>>                 set_thread_flag(TIF_RESTART_SVC);
>>>>
>>>>         }
>>>>
>>>>
>>>> Would it do what you expect after a restart ?  (restart modifies
>>>> the psw.addr)
>>> I don't understand the question.  After sys_restart(), we won't be
>>> returning to this kernel code.  We'll either immediately call
>>> restart_syscall(), or, if a signal was delivered before sys_restart(),
>>> completed, then do_signal() will start again from the top.
>> Ok, I re-read the code: let's look at these cases:
>>
>> case 1: checkpointee wasn't in syscall -- no problem.
>>
>> case 2: checkpointee was in syscall, no signal pending; when it was
>> frozen, regs->svcnr became 0, and that's what we save, so on restart
>> we won't enter that snippet again. Again, no problem.
>>
>> case 3: checkpointee was in syscall, signal pending;
>> case 4: checkpointee was in syscall, signal received at restart;
>> look at this snippet:
>>
>>         if (signr > 0 && regs->psw.addr == restart_addr) {
>>                 if (retval == -ERESTARTNOHAND
>>                     || (retval == -ERESTARTSYS
>>                          && !(current->sighand->action[signr-1].sa.sa_flags
>>                               & SA_RESTART))) {
>>                         regs->gprs[2] = -EINTR;
>>                         regs->psw.addr = continue_addr;
>>                 }
>>         }
>>
>> Because svcnr is/was 0, neither restart_addr nor continue_addr
>> were setup, so this condition is always false, which I think is
>> wrong.
> 
> I've been focusing on the ERESTART_RESTARTBLOCK case.  Can
> we agree that all cases appear to be handled correctly there?

Errr... everytime I need to go and read all the code from
scratch to convince myself... and I my mind just context
switched :(

> 
> For the ERESTARTSYS/ERESTARTNOHAND case, I'm probably not
> doing the right thing.  For a single checkpoint, since either
> there was no real signal (freezer) or it didn't get handled
> before checkpoint,  psw.addr gets checkpointed and restored
> as restart_addr, which is the right thing.  (since signr is
> not >0, we would have kept the values the same after
> get_signal_to_deliver()).
> 
> But if a real signal gets delivered upon exit of sys_restart(),
> then I think I do think we'll end up doing the wrong thing -
> we'll restart the interrupted system call with the orig_gpr2,
> so we'll pretend the signal did not get delivered, rather
> than proceed past the call to the system call (in userspace)
> with return value -EINTR.  (Just how wrong is that?)
> 
> This is all dense enough that it may be worth thinking of
> a different way to handle it, but I'm not sure what that
> way would be.  The challenge is finding a *simple*, reliable
> way to detect what the the initial conditions to do_signal()
> where, based on the register/thread_info values as they are
> at do_signal()->get_signal_to_deliver()->try_to_freeze(),
> given the ways the values get swapped in the block above
> the get_signal_to_deliver() call.
> 
> The simplest thing by far would be if we could safely
> move the get_signal_to_deliver() call before the
> 
> 	if (regs->svcnr) {
> 		continue_addr = regs->psw.addr;
> 		...
> 
> block.  I assume there are entry_64.S-related reasons why
> we cannot?

Yes, that's what I was thinking. And I don't know enough of
s390 to understand why we could not.

Either that or record the state before it is modified in
dedicated checkpoint-related fields (e.g. per thread).

Oren.

  parent reply	other threads:[~2010-02-09 21:12 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-01-28 23:35 [PATCH RFC] s390: let tasks know to restart syscalls after sys_restart() Serge E. Hallyn
2010-01-28 23:35 ` [PATCH RFC] s390: let tasks know to restart syscalls after Serge E. Hallyn
     [not found] ` <20100128233528.GA10140-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2010-02-05  3:29   ` [PATCH RFC] s390: let tasks know to restart syscalls after sys_restart() Oren Laadan
     [not found]     ` <4B6B908F.10709-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2010-02-09 17:12       ` Serge E. Hallyn
     [not found]         ` <20100209171232.GA21899-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2010-02-09 17:41           ` Oren Laadan
     [not found]             ` <4B719E56.7000009-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2010-02-09 20:16               ` Serge E. Hallyn
     [not found]                 ` <20100209201643.GB29094-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2010-02-09 21:12                   ` Oren Laadan [this message]
2010-02-09 21:12                     ` Oren Laadan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4B71CFAB.70409@cs.columbia.edu \
    --to=orenl-eqauephvms7envbuuze7ea@public.gmane.org \
    --cc=containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org \
    --cc=linux-s390-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.