From: Mike Waychison <mikew-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
To: Jim Winget <winget-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
	haveblue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org,
	Ralph-Gordon.Paul-4bfl1RV3iZDOEhgYWvzSCYQuADTiUCJX@public.gmane.org,
	dave-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org
Subject: Re: [RFC][PATCH] x86_86 support of checkpoint/restart (Re: Checkpoint / Restart)
Date: Mon, 09 Feb 2009 11:25:23 -0800
Message-ID: <49908323.3090606@google.com>
In-Reply-To: <f4192e520902090953x43a98134hfaa8443d586a32a6-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

Jim Winget wrote:
> Any way to use a delayed checkpoint signal (perhaps somewhat
> non-deterministic, e.g. "do it now" really means "do it pretty soon") that
> is only taken on return to user space thus allowing a deterministic
> solution?

Ya, I'm thinking that a 'checkpoint' signal would be advisory, with the 
SIG_DFL action performing the checkpoint itself.

Considering that we'd need to cleanly get access to all registers, the 
checkpoint itself needs to be a well defined path from 
userland->kernelland.  I'm wondering if sys_checkpoint could be this 
well-defined path using the PTREGSCALL stub macro.

For tasks that aren't checkpoint-aware, SIG_DFL could possibly be done 
by having the vsyscall page/vdso implement the userland sighandler that 
calls sys_checkpoint.
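
To make the handler idea concrete, here is a rough userland sketch.  It
is only an illustration under stated assumptions: SIGUSR1 stands in for
the (not yet existing) checkpoint signal, and fake_sys_checkpoint() is a
hypothetical placeholder for the point where a vdso-provided handler
would enter the kernel via sys_checkpoint.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Assumption: SIGUSR1 stands in for the proposed checkpoint signal. */
#define CKPT_SIGNAL SIGUSR1

static volatile sig_atomic_t checkpoints_taken;

/* Hypothetical placeholder: a real handler would call sys_checkpoint here. */
static void fake_sys_checkpoint(void)
{
	checkpoints_taken++;
}

/* Userland handler playing the role of the default action. */
static void ckpt_handler(int signo)
{
	(void)signo;
	fake_sys_checkpoint();
}

/* Install the handler; returns 0 on success, -1 on error. */
int install_ckpt_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = ckpt_handler;
	sigemptyset(&sa.sa_mask);
	return sigaction(CKPT_SIGNAL, &sa, NULL);
}

/* Send one request to ourselves and report how many checkpoints ran. */
int demo_one_request(void)
{
	int rc = install_ckpt_handler();

	assert(rc == 0);
	(void)rc;
	raise(CKPT_SIGNAL);	/* returns only after the handler has run */
	return (int)checkpoints_taken;
}
```

A checkpoint-aware task would install its own handler in the same way
and decide for itself when to make the call.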

What this means though is that we won't be able to freeze or SIGSTOP 
tasks before checkpoint.  Both of these paths can be entered via a 
variety of kernel entry points and unless we start dumping the full 
ptregs on each entry point, we'll never be able to reliably get access 
to all registers.

sys_checkpoint itself would have to have its own method to quiesce all
the tasks (basically wait for all tasks to enter sys_checkpoint so that
a multi-task checkpoint is self-consistent).  The nice thing about a
signal too is that userland can block it and ignore it in a
deterministic way.  The failure logic for signals that are ignored or
blocked for a long time can be pushed back down to userland.
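
As a sketch of what "block it in a deterministic way" could look like
from userland, assuming again that SIGUSR1 stands in for the checkpoint
signal and that the task is single-threaded (a threaded task would use
pthread_sigmask instead).  All function names here are made up for the
example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Assumption: SIGUSR1 stands in for the proposed checkpoint signal. */
#define CKPT_SIGNAL SIGUSR1

static volatile sig_atomic_t ckpt_requests;

static void on_ckpt(int signo)
{
	(void)signo;
	ckpt_requests++;
}

/* Install a handler that only counts requests; 0 on success. */
int install_ckpt_counter(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_ckpt;
	sigemptyset(&sa.sa_mask);
	return sigaction(CKPT_SIGNAL, &sa, NULL);
}

/*
 * Run critical() with the checkpoint signal blocked.  Returns 1 if a
 * request arrived and was deferred, 0 if none did, -1 on error.
 */
int run_uninterrupted(void (*critical)(void))
{
	sigset_t block, old, pending;
	int deferred;

	sigemptyset(&block);
	sigaddset(&block, CKPT_SIGNAL);
	if (sigprocmask(SIG_BLOCK, &block, &old) != 0)
		return -1;

	critical();

	sigemptyset(&pending);
	deferred = (sigpending(&pending) == 0)
		? sigismember(&pending, CKPT_SIGNAL) : -1;

	/* Restoring the old mask lets a deferred request be delivered. */
	if (sigprocmask(SIG_SETMASK, &old, NULL) != 0)
		return -1;
	return deferred;
}

/* Example critical section: a request arrives but is not acted on yet. */
void critical_with_request(void)
{
	int before = (int)ckpt_requests;

	raise(CKPT_SIGNAL);
	assert(ckpt_requests == before);	/* handler has not run */
}
```

After install_ckpt_counter(), calling
run_uninterrupted(critical_with_request) keeps the request pending for
the whole critical section and only lets the handler run once the old
mask is restored.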

This is all a dramatic shift from the current way things are done, so
we'd do best to get a better feel for our options first.

> Jim
> 
> On Fri, Feb 6, 2009 at 4:17 PM, Nauman Rafique <nauman-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
> 
>> The patch sent by Masahiko assumes that all the user-space registers
>> are saved on the kernel stack on a system call. This is not true for
>> the majority of the system calls. The callee saved registers (as
>> defined by x86_64 ABI) - rbx, rbp, r12, r13, r14, r15 - are saved
>> only in some special cases. That means that these registers would
>> not be available to checkpoint code. Moreover, the restore code
>> would have no space in stack to restore those registers.
>>
>> This patch partially solves that problem, by using a stub around
>> checkpoint/restart system calls. This stub saves/restores those callee
>> saved registers to/from the kernel stack. This solves the problem in
>> the case of self checkpoint and restore.
>>
>> In case of external checkpoint, there is no clean way to have access
>> to these callee saved registers. We freeze or SIGSTOP the process that
>> has to be checkpointed. The process could have entered the kernel
>> space via any arbitrary code path before it was stopped or
>> frozen. Thus the callee saved registers were not saved in pt_regs
>> (i.e. the bottom of the kernel mode stack). They would be saved at
>> some arbitrary place in the kernel mode stack. And when we want to
>> checkpoint that process, we cannot find those registers and save them
>> in the checkpoint.
>>
>> Possible solutions to this external checkpointing problem include
>> saving/restoring all registers (not feasible as it would have
>> performance penalty for every code path), and overloading a signal for
>> achieving external checkpointing. Any ideas?
>> ---
>>
>>  arch/x86/include/asm/unistd_64.h |    4 ++--
>>  arch/x86/kernel/entry_64.S       |   10 ++++++++++
>>  arch/x86/mm/checkpoint.c         |    3 +--
>>  arch/x86/mm/restart.c            |    5 ++---
>>  4 files changed, 15 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/unistd_64.h b/arch/x86/include/asm/unistd_64.h
>> index fe7174d..76aa903 100644
>> --- a/arch/x86/include/asm/unistd_64.h
>> +++ b/arch/x86/include/asm/unistd_64.h
>> @@ -654,9 +654,9 @@ __SYSCALL(__NR_pipe2, sys_pipe2)
>>  #define __NR_inotify_init1                     294
>>  __SYSCALL(__NR_inotify_init1, sys_inotify_init1)
>>  #define __NR_checkpoint                                295
>> -__SYSCALL(__NR_checkpoint, sys_checkpoint)
>> +__SYSCALL(__NR_checkpoint, stub_checkpoint)
>>  #define __NR_restart                           296
>> -__SYSCALL(__NR_restart, sys_restart)
>> +__SYSCALL(__NR_restart, stub_restart)
>>
>>
>>  #ifndef __NO_STUBS
>> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
>> index b86f332..0369267 100644
>> --- a/arch/x86/kernel/entry_64.S
>> +++ b/arch/x86/kernel/entry_64.S
>> @@ -545,6 +545,14 @@ END(system_call)
>>  END(\label)
>>        .endm
>>
>> +       .macro FULLSTACKCALL label,func
>> +       .globl \label
>> +       \label:
>> +       leaq    \func(%rip),%rax
>> +       jmp     ptregscall_common
>> +       END(\label)
>> +       .endm
>> +
>>        CFI_STARTPROC
>>
>>        PTREGSCALL stub_clone, sys_clone, %r8
>> @@ -552,6 +560,8 @@ END(\label)
>>        PTREGSCALL stub_vfork, sys_vfork, %rdi
>>        PTREGSCALL stub_sigaltstack, sys_sigaltstack, %rdx
>>        PTREGSCALL stub_iopl, sys_iopl, %rsi
>> +       FULLSTACKCALL stub_restart, sys_restart
>> +       FULLSTACKCALL stub_checkpoint, sys_checkpoint
>>
>>  ENTRY(ptregscall_common)
>>        popq %r11
>> diff --git a/arch/x86/mm/checkpoint.c b/arch/x86/mm/checkpoint.c
>> index 2514f14..a26332d 100644
>> --- a/arch/x86/mm/checkpoint.c
>> +++ b/arch/x86/mm/checkpoint.c
>> @@ -75,10 +75,10 @@ static void cr_save_cpu_regs(struct cr_hdr_cpu *hh, struct task_struct *t)
>>        hh->ip = regs->ip;
>>        hh->cs = regs->cs;
>>        hh->flags = regs->flags;
>> +       hh->sp = regs->sp;
>>        hh->ss = regs->ss;
>>
>>  #ifdef CONFIG_X86_64
>> -       hh->sp = read_pda (oldrsp);
>>        hh->r8 = regs->r8;
>>        hh->r9 = regs->r9;
>>        hh->r10 = regs->r10;
>> @@ -90,7 +90,6 @@ static void cr_save_cpu_regs(struct cr_hdr_cpu *hh, struct task_struct *t)
>>        hh->ds = thread->ds;
>>        hh->es = thread->es;
>>  #else /* !CONFIG_X86_64 */
>> -       hh->sp = regs->sp;
>>        hh->ds = regs->ds;
>>        hh->es = regs->es;
>>  #endif /* CONFIG_X86_64 */
>> diff --git a/arch/x86/mm/restart.c b/arch/x86/mm/restart.c
>> index a10d63e..329f938 100644
>> --- a/arch/x86/mm/restart.c
>> +++ b/arch/x86/mm/restart.c
>> @@ -111,15 +111,14 @@ static int cr_load_cpu_regs(struct cr_hdr_cpu *hh, struct task_struct *t)
>>        regs->cs = hh->cs;
>>        regs->flags = hh->flags;
>>        regs->sp = hh->sp;
>> -       write_pda(oldrsp, hh->sp);
>>        regs->ss = hh->ss;
>>
>> -       thread->gs = hh->gs;
>> -       thread->fs = hh->fs;
>>  #ifdef CONFIG_X86_64
>>        do_arch_prctl(t, ARCH_SET_FS, hh->fs);
>>        do_arch_prctl(t, ARCH_SET_GS, hh->gs);
>>  #else
>> +       thread->gs = hh->gs;
>> +       thread->fs = hh->fs;
>>        loadsegment(gs, hh->gs);
>>        loadsegment(fs, hh->fs);
>>  #endif
>>
>>
