* [RFC PATCH 0/2] cr: Introduce s390x checkpoint/restart code
@ 2009-01-15 5:05 Serge E. Hallyn
[not found] ` <20090115050523.GA10415-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 8+ messages in thread
From: Serge E. Hallyn @ 2009-01-15 5:05 UTC (permalink / raw)
To: Linux Containers; +Cc: Martin Schwidefsky, Arnd Bergmann
Hi,
here is a first stab at extending Oren's application c/r patchset
(http://lkml.org/lkml/2008/12/29/38) to s390. I pretty much spent a day
or two looking through the s390 include and .S files and then took a
stab, so I won't be surprised to find these patches (and myself) the
subject of ridicule. For instance, I'm really not *sure* whether I
should be backing up the acrs registers (some s390 docs suggested
userspace could use them), the ksp, or the vdso_base. But one thing
I've got going for me at least... it works!
Please take a look, point and laugh, and maybe even explain
what's so funny and how to improve them.
thanks,
-serge
* [RFC PATCH 1/2] c/r: hook checkpoint and restart for s390
[not found] ` <20090115050523.GA10415-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-01-15 5:06 ` Serge E. Hallyn
2009-01-15 5:06 ` [RFC PATCH 2/2] cr: s390: fill in the read/write routines Serge E. Hallyn
2009-01-15 9:39 ` [RFC PATCH 0/2] cr: Introduce s390x checkpoint/restart code Martin Schwidefsky
2 siblings, 0 replies; 8+ messages in thread
From: Serge E. Hallyn @ 2009-01-15 5:06 UTC (permalink / raw)
To: Linux Containers; +Cc: Martin Schwidefsky, Arnd Bergmann
This is based almost 100% on the equivalent ppc patch by
Nathan Lynch. Thanks ntl :)
Signed-off-by: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
---
arch/s390/include/asm/checkpoint_hdr.h | 52 +++++++++++
arch/s390/include/asm/unistd.h | 4 +-
arch/s390/kernel/compat_wrapper.S | 12 +++
arch/s390/kernel/syscalls.S | 2 +
arch/s390/mm/Makefile | 1 +
arch/s390/mm/checkpoint.c | 153 ++++++++++++++++++++++++++++++++
checkpoint/Kconfig | 2 +-
checkpoint/checkpoint.c | 2 +
checkpoint/restart.c | 2 +
9 files changed, 228 insertions(+), 2 deletions(-)
create mode 100644 arch/s390/include/asm/checkpoint_hdr.h
create mode 100644 arch/s390/mm/checkpoint.c
diff --git a/arch/s390/include/asm/checkpoint_hdr.h b/arch/s390/include/asm/checkpoint_hdr.h
new file mode 100644
index 0000000..81ca76f
--- /dev/null
+++ b/arch/s390/include/asm/checkpoint_hdr.h
@@ -0,0 +1,52 @@
+#ifndef __ASM_S390_CKPT_HDR_H
+#define __ASM_S390_CKPT_HDR_H
+/*
+ * Checkpoint/restart - architecture specific headers s/390
+ *
+ * Copyright (C) 2008 Oren Laadan
+ *
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License. See the file COPYING in the main directory of the Linux
+ * distribution for more details.
+ */
+
+#include <linux/types.h>
+
+/*
+ * To maintain compatibility between 32-bit and 64-bit architecture flavors,
+ * keep data 64-bit aligned: use padding for structure members, and use
+ * __attribute__((aligned (8))) for the entire structure.
+ *
+ * Quoting Arnd Bergmann:
+ * "This structure has an odd multiple of 32-bit members, which means
+ * that if you put it into a larger structure that also contains 64-bit
+ * members, the larger structure may get different alignment on x86-32
+ * and x86-64, which you might want to avoid. I can't tell if this is
+ * an actual problem here. ... In this case, I'm pretty sure that
+ * sizeof(cr_hdr_task) on x86-32 is different from x86-64, since it
+ * will be 32-bit aligned on x86-32."
+ */
+
+#ifdef __KERNEL__
+#include <asm/processor.h>
+#else
+#include <sys/user.h>
+#endif
+
+struct cr_hdr_head_arch {
+ __u16 unimplemented;
+};
+
+struct cr_hdr_thread {
+ __s16 unimplemented;
+};
+
+struct cr_hdr_cpu {
+ __u64 unimplemented;
+};
+
+struct cr_hdr_mm_context {
+ __s16 unimplemented;
+};
+
+#endif /* __ASM_S390_CKPT_HDR__H */
diff --git a/arch/s390/include/asm/unistd.h b/arch/s390/include/asm/unistd.h
index c8ad350..ffe64a0 100644
--- a/arch/s390/include/asm/unistd.h
+++ b/arch/s390/include/asm/unistd.h
@@ -265,7 +265,9 @@
#define __NR_pipe2 325
#define __NR_dup3 326
#define __NR_epoll_create1 327
-#define NR_syscalls 328
+#define __NR_checkpoint 328
+#define __NR_restart 329
+#define NR_syscalls 330
/*
* There are some system calls that are not present on 64 bit, some
diff --git a/arch/s390/kernel/compat_wrapper.S b/arch/s390/kernel/compat_wrapper.S
index fc2c971..9546a81 100644
--- a/arch/s390/kernel/compat_wrapper.S
+++ b/arch/s390/kernel/compat_wrapper.S
@@ -1767,3 +1767,15 @@ sys_dup3_wrapper:
sys_epoll_create1_wrapper:
lgfr %r2,%r2 # int
jg sys_epoll_create1 # branch to system call
+
+ .globl sys_checkpoint_wrapper
+sys_checkpoint_wrapper:
+ lgfr %r2,%r2 # pid_t
+ lgfr %r3,%r3 # int
+ llgfr %r4,%r4 # unsigned long
+
+ .globl sys_restart_wrapper
+sys_restart_wrapper:
+ lgfr %r2,%r2 # int
+ lgfr %r3,%r3 # int
+ llgfr %r4,%r4 # unsigned long
diff --git a/arch/s390/kernel/syscalls.S b/arch/s390/kernel/syscalls.S
index 2d61787..54316c8 100644
--- a/arch/s390/kernel/syscalls.S
+++ b/arch/s390/kernel/syscalls.S
@@ -336,3 +336,5 @@ SYSCALL(sys_inotify_init1,sys_inotify_init1,sys_inotify_init1_wrapper)
SYSCALL(sys_pipe2,sys_pipe2,sys_pipe2_wrapper) /* 325 */
SYSCALL(sys_dup3,sys_dup3,sys_dup3_wrapper)
SYSCALL(sys_epoll_create1,sys_epoll_create1,sys_epoll_create1_wrapper)
+SYSCALL(sys_checkpoint,sys_checkpoint,sys_checkpoint_wrapper)
+SYSCALL(sys_restart,sys_restart,sys_restart_wrapper)
diff --git a/arch/s390/mm/Makefile b/arch/s390/mm/Makefile
index 2a74581..b3f0f32 100644
--- a/arch/s390/mm/Makefile
+++ b/arch/s390/mm/Makefile
@@ -6,3 +6,4 @@ obj-y := init.o fault.o extmem.o mmap.o vmem.o pgtable.o
obj-$(CONFIG_CMM) += cmm.o
obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
obj-$(CONFIG_PAGE_STATES) += page-states.o
+obj-$(CONFIG_CHECKPOINT_RESTART) += checkpoint.o
diff --git a/arch/s390/mm/checkpoint.c b/arch/s390/mm/checkpoint.c
new file mode 100644
index 0000000..7f7e0b1
--- /dev/null
+++ b/arch/s390/mm/checkpoint.c
@@ -0,0 +1,153 @@
+/*
+ * Checkpoint/restart - architecture specific support for s390
+ *
+ * Copyright (C) 2008 Oren Laadan
+ *
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License. See the file COPYING in the main directory of the Linux
+ * distribution for more details.
+ */
+
+#include <linux/checkpoint.h>
+#include <linux/checkpoint_hdr.h>
+#include <linux/kernel.h>
+
+/* dump the thread_struct of a given task */
+int cr_write_thread(struct cr_ctx *ctx, struct task_struct *t)
+{
+ struct cr_hdr h;
+ struct cr_hdr_thread *hh = cr_hbuf_get(ctx, sizeof(*hh));
+ struct thread_struct *thread;
+ int ret;
+
+ h.type = CR_HDR_THREAD;
+ h.len = sizeof(*hh);
+ h.parent = task_pid_vnr(t);
+
+ thread = &t->thread;
+
+ hh->unimplemented = 0xbeef;
+
+ ret = cr_write_obj(ctx, &h, hh);
+ cr_hbuf_put(ctx, sizeof(*hh));
+ WARN_ON_ONCE(ret < 0);
+
+ return ret;
+}
+
+/* dump the cpu state and registers of a given task */
+int cr_write_cpu(struct cr_ctx *ctx, struct task_struct *t)
+{
+ struct cr_hdr h;
+ struct cr_hdr_cpu *hh = cr_hbuf_get(ctx, sizeof(*hh));
+ int ret;
+
+ h.type = CR_HDR_CPU;
+ h.len = sizeof(*hh);
+ h.parent = task_pid_vnr(t);
+
+ hh->unimplemented = 0xdeadbeef;
+
+ ret = cr_write_obj(ctx, &h, hh);
+ cr_hbuf_put(ctx, sizeof(*hh));
+ WARN_ON_ONCE(ret < 0);
+
+ return ret;
+}
+
+int cr_write_head_arch(struct cr_ctx *ctx)
+{
+ struct cr_hdr h;
+ struct cr_hdr_head_arch *hh = cr_hbuf_get(ctx, sizeof(*hh));
+ int ret;
+
+ h.type = CR_HDR_HEAD_ARCH;
+ h.len = sizeof(*hh);
+ h.parent = 0;
+
+ /* FIXME: FPU/altivec etc */
+ hh->unimplemented = 0xbeef;
+
+ ret = cr_write_obj(ctx, &h, hh);
+ cr_hbuf_put(ctx, sizeof(*hh));
+
+ WARN_ON_ONCE(ret < 0);
+
+ return ret;
+}
+
+/* dump the mm->context state */
+int cr_write_mm_context(struct cr_ctx *ctx, struct mm_struct *mm, int parent)
+{
+ struct cr_hdr h;
+ struct cr_hdr_mm_context *hh = cr_hbuf_get(ctx, sizeof(*hh));
+ int ret;
+
+ h.type = CR_HDR_MM_CONTEXT;
+ h.len = sizeof(*hh);
+ h.parent = parent;
+
+ hh->unimplemented = 0xbeef;
+
+ ret = cr_write_obj(ctx, &h, hh);
+ cr_hbuf_put(ctx, sizeof(*hh));
+
+ WARN_ON_ONCE(ret < 0);
+ if (ret < 0)
+ goto out;
+
+ /* FIXME: NFI. */
+ ret = 0;
+out:
+ return ret;
+}
+
+/* restart APIs */
+
+/* read the thread_struct into the current task */
+int cr_read_thread(struct cr_ctx *ctx)
+{
+ WARN_ON_ONCE(true);
+ return -ENOSYS;
+}
+
+int cr_read_cpu(struct cr_ctx *ctx)
+{
+ WARN_ON_ONCE(true);
+ return -ENOSYS;
+}
+
+int cr_read_head_arch(struct cr_ctx *ctx)
+{
+ struct cr_hdr_head_arch *hh = cr_hbuf_get(ctx, sizeof(*hh));
+ int parent, ret = 0;
+
+ parent = cr_read_obj_type(ctx, hh, sizeof(*hh), CR_HDR_HEAD_ARCH);
+ if (parent < 0) {
+ ret = parent;
+ goto out;
+ } else if (parent != 0)
+ goto out;
+out:
+ cr_hbuf_put(ctx, sizeof(*hh));
+ return ret;
+}
+
+int cr_read_mm_context(struct cr_ctx *ctx, struct mm_struct *mm, int rparent)
+{
+ struct cr_hdr_mm_context *hh = cr_hbuf_get(ctx, sizeof(*hh));
+ int parent, ret = -EINVAL;
+
+ parent = cr_read_obj_type(ctx, hh, sizeof(*hh), CR_HDR_MM_CONTEXT);
+ if (parent < 0) {
+ ret = parent;
+ goto out;
+ }
+ if (parent != rparent)
+ goto out;
+
+ WARN_ON_ONCE(hh->unimplemented != (__s16)0xbeef);
+out:
+ cr_hbuf_put(ctx, sizeof(*hh));
+ return ret;
+}
diff --git a/checkpoint/Kconfig b/checkpoint/Kconfig
index ffaa635..31e7594 100644
--- a/checkpoint/Kconfig
+++ b/checkpoint/Kconfig
@@ -1,7 +1,7 @@
config CHECKPOINT_RESTART
prompt "Enable checkpoint/restart (EXPERIMENTAL)"
def_bool n
- depends on X86_32 && EXPERIMENTAL
+ depends on (X86_32 || S390) && EXPERIMENTAL
help
Application checkpoint/restart is the ability to save the
state of a running application so that it can later resume
diff --git a/checkpoint/checkpoint.c b/checkpoint/checkpoint.c
index fbcd9eb..06e15fc 100644
--- a/checkpoint/checkpoint.c
+++ b/checkpoint/checkpoint.c
@@ -8,6 +8,8 @@
* distribution for more details.
*/
+#define DEBUG 1
+
#include <linux/version.h>
#include <linux/sched.h>
#include <linux/ptrace.h>
diff --git a/checkpoint/restart.c b/checkpoint/restart.c
index 6b4cd75..f65a63e 100644
--- a/checkpoint/restart.c
+++ b/checkpoint/restart.c
@@ -8,6 +8,8 @@
* distribution for more details.
*/
+#define DEBUG 1
+
#include <linux/version.h>
#include <linux/sched.h>
#include <linux/wait.h>
--
1.6.1
* [RFC PATCH 2/2] cr: s390: fill in the read/write routines
[not found] ` <20090115050523.GA10415-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-01-15 5:06 ` [RFC PATCH 1/2] c/r: hook checkpoint and restart for s390 Serge E. Hallyn
@ 2009-01-15 5:06 ` Serge E. Hallyn
2009-01-15 9:39 ` [RFC PATCH 0/2] cr: Introduce s390x checkpoint/restart code Martin Schwidefsky
2 siblings, 0 replies; 8+ messages in thread
From: Serge E. Hallyn @ 2009-01-15 5:06 UTC (permalink / raw)
To: Linux Containers; +Cc: Martin Schwidefsky, Arnd Bergmann
This gets a simple checkpoint/restart working on an s390x.
Signed-off-by: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
---
arch/s390/include/asm/checkpoint_hdr.h | 35 +++++++++-
arch/s390/mm/checkpoint.c | 110 +++++++++++++++++++++++++++++---
2 files changed, 132 insertions(+), 13 deletions(-)
diff --git a/arch/s390/include/asm/checkpoint_hdr.h b/arch/s390/include/asm/checkpoint_hdr.h
index 81ca76f..2be5ced 100644
--- a/arch/s390/include/asm/checkpoint_hdr.h
+++ b/arch/s390/include/asm/checkpoint_hdr.h
@@ -34,19 +34,46 @@
#endif
struct cr_hdr_head_arch {
- __u16 unimplemented;
+ __u64 unimplemented;
};
struct cr_hdr_thread {
- __s16 unimplemented;
+ /* restart blocks */
+ __u64 unimplemented;
};
+/*
+ * Notes
+ * NUM_GPRS defined in <asm/ptrace.h> to be 16
+ * NUM_FPRS defined in <asm/ptrace.h> to be 16
+ * NUM_ACRS defined in <asm/ptrace.h> to be 16
+ */
struct cr_hdr_cpu {
- __u64 unimplemented;
+ psw_t psw;
+ unsigned long args[1];
+ s390_fp_regs fp_regs;
+ unsigned long gprs[NUM_GPRS];
+ unsigned long orig_gpr2;
+ unsigned short svcnr;
+ unsigned short ilc;
+ unsigned int acrs[NUM_ACRS];
+ unsigned long ksp;
+ unsigned long prot_addr;
+ unsigned int trap_no;
+ per_struct per_info;
+ unsigned long ieee_instruction_pointer;
+ unsigned long pfault_wait;
};
struct cr_hdr_mm_context {
- __s16 unimplemented;
+#if 0
+ unsigned long asce_bits;
+ unsigned long asce_limit;
+ int noexec;
+ int has_pgste;
+ int alloc_pgste;
+#endif
+ unsigned long vdso_base;
};
#endif /* __ASM_S390_CKPT_HDR__H */
diff --git a/arch/s390/mm/checkpoint.c b/arch/s390/mm/checkpoint.c
index 7f7e0b1..b2d7841 100644
--- a/arch/s390/mm/checkpoint.c
+++ b/arch/s390/mm/checkpoint.c
@@ -35,6 +35,29 @@ int cr_write_thread(struct cr_ctx *ctx, struct task_struct *t)
return ret;
}
+static void cr_save_cpu_regs(struct cr_hdr_cpu *hh, struct task_struct *t)
+{
+ struct thread_struct *thread = &t->thread;
+ struct pt_regs *regs = task_pt_regs(t);
+
+ memcpy(&hh->psw, &regs->psw, sizeof(psw_t));
+ hh->args[0] = regs->args[0];
+ hh->svcnr = regs->svcnr;
+ hh->ilc = regs->ilc;
+ memcpy(hh->gprs, regs->gprs, NUM_GPRS*sizeof(unsigned long));
+ hh->orig_gpr2 = regs->orig_gpr2;
+
+ memcpy(&hh->fp_regs, &thread->fp_regs, sizeof(s390_fp_regs));
+ memcpy(hh->acrs, thread->acrs, NUM_ACRS * sizeof(unsigned int));
+ hh->ksp = thread->ksp;
+ printk(KERN_NOTICE "%s: saving ksp as %lx\n", __func__, hh->ksp);
+ hh->prot_addr = thread->prot_addr;
+ hh->trap_no = thread->trap_no;
+ memcpy(&hh->per_info, &thread->per_info, sizeof(per_struct));
+ hh->ieee_instruction_pointer = thread->ieee_instruction_pointer;
+ hh->pfault_wait = thread->pfault_wait;
+}
+
/* dump the cpu state and registers of a given task */
int cr_write_cpu(struct cr_ctx *ctx, struct task_struct *t)
{
@@ -46,7 +69,7 @@ int cr_write_cpu(struct cr_ctx *ctx, struct task_struct *t)
h.len = sizeof(*hh);
h.parent = task_pid_vnr(t);
- hh->unimplemented = 0xdeadbeef;
+ cr_save_cpu_regs(hh, t);
ret = cr_write_obj(ctx, &h, hh);
cr_hbuf_put(ctx, sizeof(*hh));
@@ -87,16 +110,22 @@ int cr_write_mm_context(struct cr_ctx *ctx, struct mm_struct *mm, int parent)
h.len = sizeof(*hh);
h.parent = parent;
- hh->unimplemented = 0xbeef;
+#if 1
+ hh->vdso_base = mm->context.vdso_base;
+#else
+ hh->asce_bits = mm->context.asce_bits;
+ hh->asce_limit = mm->context.asce_limit;
+ hh->noexec = mm->context.noexec;
+ hh->has_pgste = mm->context.has_pgste;
+ hh->alloc_pgste = mm->context.alloc_pgste;
+#endif
ret = cr_write_obj(ctx, &h, hh);
cr_hbuf_put(ctx, sizeof(*hh));
- WARN_ON_ONCE(ret < 0);
if (ret < 0)
goto out;
- /* FIXME: NFI. */
ret = 0;
out:
return ret;
@@ -107,14 +136,64 @@ out:
/* read the thread_struct into the current task */
int cr_read_thread(struct cr_ctx *ctx)
{
- WARN_ON_ONCE(true);
- return -ENOSYS;
+ struct cr_hdr_thread *hh = cr_hbuf_get(ctx, sizeof(*hh));
+ int parent, ret;
+
+ parent = cr_read_obj_type(ctx, hh, sizeof(*hh), CR_HDR_THREAD);
+ if (parent < 0) {
+ ret = parent;
+ goto out;
+ }
+
+ if (hh->unimplemented != 0xbeef) {
+ printk(KERN_NOTICE "Error: cr file corrupted\n");
+ ret = -EINVAL;
+ goto out;
+ }
+ ret = 0;
+
+out:
+ cr_hbuf_put(ctx, sizeof(*hh));
+ return ret;
}
int cr_read_cpu(struct cr_ctx *ctx)
{
- WARN_ON_ONCE(true);
- return -ENOSYS;
+ struct cr_hdr_cpu *hh = cr_hbuf_get(ctx, sizeof(*hh));
+ struct thread_struct *thread = &current->thread;
+ struct pt_regs *regs = task_pt_regs(current);
+ int parent, ret;
+
+ parent = cr_read_obj_type(ctx, hh, sizeof(*hh), CR_HDR_CPU);
+ if (parent < 0) {
+ ret = parent;
+ goto out;
+ }
+ ret = 0;
+
+ //memcpy(&regs->psw, &hh->psw, sizeof(psw_t));
+ regs->psw.addr &= ~PSW_ADDR_INSN;
+ regs->psw.addr |= hh->psw.addr & PSW_ADDR_INSN;
+ regs->args[0] = hh->args[0];
+ regs->svcnr = hh->svcnr;
+ regs->ilc = hh->ilc;
+ memcpy(regs->gprs, hh->gprs, NUM_GPRS*sizeof(unsigned long));
+ regs->orig_gpr2 = hh->orig_gpr2;
+
+ memcpy(&thread->fp_regs, &hh->fp_regs, sizeof(s390_fp_regs));
+ memcpy(thread->acrs, hh->acrs, NUM_ACRS * sizeof(unsigned int));
+ printk(KERN_NOTICE "%s: orig task's ksp was %lx\n", __func__, thread->ksp);
+ thread->ksp = hh->ksp;
+ printk(KERN_NOTICE "%s: restoring ksp as %lx\n", __func__, hh->ksp);
+ thread->prot_addr = hh->prot_addr;
+ thread->trap_no = hh->trap_no;
+ memcpy(&thread->per_info, &hh->per_info, sizeof(per_struct));
+ thread->ieee_instruction_pointer = hh->ieee_instruction_pointer;
+ thread->pfault_wait = hh->pfault_wait;
+
+out:
+ cr_hbuf_put(ctx, sizeof(*hh));
+ return ret;
}
int cr_read_head_arch(struct cr_ctx *ctx)
@@ -128,6 +207,11 @@ int cr_read_head_arch(struct cr_ctx *ctx)
goto out;
} else if (parent != 0)
goto out;
+
+ if (hh->unimplemented != 0xbeef) {
+ printk(KERN_NOTICE "%s: checkpoint file corrupt\n", __func__);
+ ret = -EINVAL;
+ }
out:
cr_hbuf_put(ctx, sizeof(*hh));
return ret;
@@ -146,7 +230,15 @@ int cr_read_mm_context(struct cr_ctx *ctx, struct mm_struct *mm, int rparent)
if (parent != rparent)
goto out;
- WARN_ON_ONCE(hh->unimplemented != (__s16)0xbeef);
+#if 0
+ mm->context.asce_bits = hh->asce_bits;
+ mm->context.asce_limit = hh->asce_limit;
+ mm->context.noexec = hh->noexec;
+ mm->context.has_pgste = hh->has_pgste;
+ mm->context.alloc_pgste = hh->alloc_pgste;
+#endif
+ mm->context.vdso_base = hh->vdso_base;
+ ret = 0;
out:
cr_hbuf_put(ctx, sizeof(*hh));
return ret;
--
1.6.1
* Re: [RFC PATCH 0/2] cr: Introduce s390x checkpoint/restart code
[not found] ` <20090115050523.GA10415-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-01-15 5:06 ` [RFC PATCH 1/2] c/r: hook checkpoint and restart for s390 Serge E. Hallyn
2009-01-15 5:06 ` [RFC PATCH 2/2] cr: s390: fill in the read/write routines Serge E. Hallyn
@ 2009-01-15 9:39 ` Martin Schwidefsky
2009-01-15 9:55 ` Oren Laadan
2009-01-15 16:29 ` Serge E. Hallyn
2 siblings, 2 replies; 8+ messages in thread
From: Martin Schwidefsky @ 2009-01-15 9:39 UTC (permalink / raw)
To: Serge E. Hallyn; +Cc: Linux Containers, Arnd Bergmann
On Wed, 2009-01-14 at 23:05 -0600, Serge E. Hallyn wrote:
> Hi,
>
> here is a first stab at extending Oren's application c/r patchset
> (http://lkml.org/lkml/2008/12/29/38) to s390. I pretty much spent a day
> or two looking through the s390 include and .S files and then took a
> stab, so I won't be surprised to find these patches (and myself) the
> subject of ridicule. For instance, I'm really not *sure* whether I
> should be backing up the acrs registers (some s390 docs suggested
> userspace could use them), the ksp, or the vdso_base. But one thing
> I've got going for me at least... it works!
The access registers need to be saved, a0/a1 contain the TLS pointer and
the user can store anything to a2-a15. The ksp does not have to be
stored as it cannot contain an important value. If it would then we'd
have kernel state which would break checkpoint/restart. The restart code
needs to come up with a sensible initial value for ksp though. The
vdso_base code needs to be stored as well.
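A minimal sketch of that split between user-visible and kernel-only state. The types and helpers here (mock_thread, mock_cpu_image, mock_save_cpu, mock_restore_cpu) are hypothetical stand-ins, not the real s390 thread_struct or the cr_* API from the patches: the access registers go into the image, while ksp is left out and given a fresh value by the restarting side.

```c
#include <assert.h>
#include <string.h>

#define NUM_ACRS 16 /* value the patch notes quote from <asm/ptrace.h> */

/* Hypothetical stand-in for the thread_struct fields discussed here. */
struct mock_thread {
	unsigned int  acrs[NUM_ACRS]; /* a0/a1: TLS pointer, a2-a15: free for userspace */
	unsigned long ksp;            /* kernel stack pointer: kernel state only */
};

/* Hypothetical checkpoint record: only user-visible state is stored. */
struct mock_cpu_image {
	unsigned int acrs[NUM_ACRS];
};

static void mock_save_cpu(struct mock_cpu_image *img, const struct mock_thread *t)
{
	assert(img && t);
	memcpy(img->acrs, t->acrs, sizeof(img->acrs));
	/* ksp is deliberately not copied into the image */
}

static void mock_restore_cpu(struct mock_thread *t, const struct mock_cpu_image *img,
			     unsigned long fresh_ksp)
{
	assert(img && t);
	memcpy(t->acrs, img->acrs, sizeof(t->acrs));
	t->ksp = fresh_ksp; /* chosen by the restart side, never read from the image */
}
```

With this layout a stale ksp from the checkpointed task cannot leak into the restarted one, because the image has no field for it.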
This hunk from patch #2 worries me a bit:
struct cr_hdr_mm_context {
- __s16 unimplemented;
+#if 0
+ unsigned long asce_bits;
+ unsigned long asce_limit;
+ int noexec;
+ int has_pgste;
+ int alloc_pgste;
+#endif
+ unsigned long vdso_base;
};
The page table can have 2, 3, or 4 levels and if KVM is used the page
tables have the pgste table attached to them. If that is ignored then
the creation of the process address space on restart is definitely
broken.
> Please take a look, point and laugh, and maybe even explain
> what's so funny and how to improve them.
It doesn't look THAT bad ;-)
--
blue skies,
Martin.
"Reality continues to ruin my life." - Calvin.
* Re: [RFC PATCH 0/2] cr: Introduce s390x checkpoint/restart code
2009-01-15 9:39 ` [RFC PATCH 0/2] cr: Introduce s390x checkpoint/restart code Martin Schwidefsky
@ 2009-01-15 9:55 ` Oren Laadan
[not found] ` <496F082C.3020008-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-01-15 16:29 ` Serge E. Hallyn
1 sibling, 1 reply; 8+ messages in thread
From: Oren Laadan @ 2009-01-15 9:55 UTC (permalink / raw)
To: schwidefsky-tA70FqPdS9bQT0dZR+AlfA; +Cc: Linux Containers, Arnd Bergmann
Martin Schwidefsky wrote:
> On Wed, 2009-01-14 at 23:05 -0600, Serge E. Hallyn wrote:
>> Hi,
>>
>> here is a first stab at extending Oren's application c/r patchset
>> (http://lkml.org/lkml/2008/12/29/38) to s390. I pretty much spent a day
>> or two looking through the s390 include and .S files and then took a
>> stab, so I won't be surprised to find these patches (and myself) the
>> subject of ridicule. For instance, I'm really not *sure* whether I
>> should be backing up the acrs registers (some s390 docs suggested
>> userspace could use them), the ksp, or the vdso_base. But one thing
>> I've got going for me at least... it works!
>
> The access registers need to be saved, a0/a1 contain the TLS pointer and
> the user can store anything to a2-a15. The ksp does not have to be
> stored as it cannot contain an important value. If it would then we'd
> have kernel state which would break checkpoint/restart. The restart code
> needs to come up with a sensible initial value for ksp though. The
> vdso_base code needs to be stored as well.
>
> This hunk from patch #2 worries me a bit:
>
> struct cr_hdr_mm_context {
> - __s16 unimplemented;
> +#if 0
> + unsigned long asce_bits;
> + unsigned long asce_limit;
> + int noexec;
> + int has_pgste;
> + int alloc_pgste;
> +#endif
> + unsigned long vdso_base;
> };
>
> The page table can have 2, 3, or 4 levels and if KVM is used the page
> tables have the pgste table attached to them. If that is ignored then
> the creation of the process address space on restart is definitely
> broken.
Disclaimer: I have zero knowledge about s390 specifics, so take
this with a grain of salt...
That said, I wonder why would we care about the page table choice ?
Does user-level have any notion of this low-level detail ?
We save the VMAs in checkpoint, and reconstruct them in restart by
calling do_mmap_pgoff(). The nearest we get to that level is in
calling follow_page() in cr_consider_private_page(), at checkpoint.
I'd expect everything below to be entirely transparent to us.
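A userspace analogy of that approach, as a rough sketch only: the kernel code calls do_mmap_pgoff(), while this example uses plain mmap(), and struct saved_vma and restore_vma() are hypothetical names. The point is that the record holds only the address, length and protection of a region, and nothing about page table layout.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical checkpoint record for one private anonymous region. */
struct saved_vma {
	unsigned long start; /* address the region occupied at checkpoint */
	size_t        len;
	int           prot;  /* PROT_* bits */
};

/*
 * Recreate the region at its original address. MAP_FIXED replaces
 * whatever is currently mapped there, so this is only reasonable in a
 * process whose address space is being rebuilt from scratch.
 * Returns the mapped address, or MAP_FAILED on error.
 */
static void *restore_vma(const struct saved_vma *v)
{
	assert(v != NULL);
	return mmap((void *)v->start, v->len, v->prot,
		    MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
}
```

How many page table levels back the new mapping is decided by the kernel while it services the mmap() call; the caller never sees it.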
>
>> Please take a look, point and laugh, and maybe even explain
>> what's so funny and how to improve them.
>
> It doesn't look THAT bad ;-)
>
Thanks, Serge. It looks perfect to me ... see disclaimer above :p
Oren.
* Re: [RFC PATCH 0/2] cr: Introduce s390x checkpoint/restart code
[not found] ` <496F082C.3020008-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2009-01-15 10:10 ` Martin Schwidefsky
2009-01-15 22:25 ` Serge E. Hallyn
0 siblings, 1 reply; 8+ messages in thread
From: Martin Schwidefsky @ 2009-01-15 10:10 UTC (permalink / raw)
To: Oren Laadan; +Cc: Linux Containers, Arnd Bergmann
On Thu, 2009-01-15 at 04:55 -0500, Oren Laadan wrote:
>
> Martin Schwidefsky wrote:
> > This hunk from patch #2 worries me a bit:
> >
> > struct cr_hdr_mm_context {
> > - __s16 unimplemented;
> > +#if 0
> > + unsigned long asce_bits;
> > + unsigned long asce_limit;
> > + int noexec;
> > + int has_pgste;
> > + int alloc_pgste;
> > +#endif
> > + unsigned long vdso_base;
> > };
> >
> > The page table can have 2, 3, or 4 levels and if KVM is used the page
> > tables have the pgste table attached to them. If that is ignored then
> > the creation of the process address space on restart is definitely
> > broken.
>
> Disclaimer: I have zero knowledge about s390 specifics, so take
> this with a grain of salt...
>
> That said, I wonder why would we care about the page table choice ?
> Does user-level have any notion of this low-level detail ?
>
> We save the VMAs in checkpoint, and reconstruct them in restart by
> calling do_mmap_pgoff(). The nearest we get to that level is in
> calling follow_page() in cr_consider_private_page(), at checkpoint.
>
> I'd expect everything below to be entirely transparent to us.
Ok, the recreation of the VMAs with do_mmap_pgoff takes care of the
number of page table levels. It gets automatically upgraded when the
first VMA is mapped that is over the limit.
What is left are the pgstes tables. After you forked the new process
that is used to restart a KVM enabled process you need to call
s390_enable_sie(), preferably before you recreate the VMAs.
--
blue skies,
Martin.
"Reality continues to ruin my life." - Calvin.
* Re: [RFC PATCH 0/2] cr: Introduce s390x checkpoint/restart code
2009-01-15 9:39 ` [RFC PATCH 0/2] cr: Introduce s390x checkpoint/restart code Martin Schwidefsky
2009-01-15 9:55 ` Oren Laadan
@ 2009-01-15 16:29 ` Serge E. Hallyn
1 sibling, 0 replies; 8+ messages in thread
From: Serge E. Hallyn @ 2009-01-15 16:29 UTC (permalink / raw)
To: Martin Schwidefsky; +Cc: Linux Containers, Arnd Bergmann
Quoting Martin Schwidefsky (schwidefsky-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org):
> On Wed, 2009-01-14 at 23:05 -0600, Serge E. Hallyn wrote:
> > Hi,
> >
> > here is a first stab at extending Oren's application c/r patchset
> > (http://lkml.org/lkml/2008/12/29/38) to s390. I pretty much spent a day
> > or two looking through the s390 include and .S files and then took a
> > stab, so I won't be surprised to find these patches (and myself) the
> > subject of ridicule. For instance, I'm really not *sure* whether I
> > should be backing up the acrs registers (some s390 docs suggested
> > userspace could use them), the ksp, or the vdso_base. But one thing
> > I've got going for me at least... it works!
>
> The access registers need to be saved, a0/a1 contain the TLS pointer and
> the user can store anything to a2-a15. The ksp does not have to be
> stored as it cannot contain an important value. If it would then we'd
Ok, will drop the ksp part.
> have kernel state which would break checkpoint/restart. The restart code
> needs to come up with a sensible initial value for ksp though. The
> vdso_base code needs to be stored as well.
But the vdso is set up at exec() time, right? So if I reset vdso_base
to the checkpointed value, might it actually end up at the wrong place,
since the exec() of the 'restart' program might have placed the vdso
at a different location than where it was at checkpoint time?
I also notice that on s390 vdso_base seems to always be either
20000020000 or 20000000000, and so far my checkpointed and restart
programs have always had 20000000000, so testing doesn't really help :)
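One way to look at this from userspace, as a sketch under assumptions: getauxval(AT_SYSINFO_EHDR) reports where the running process's vdso was placed at exec() time, and a restart tool could compare that against the checkpointed value before deciding what to do. The enum and vdso_compare() are hypothetical, and what to do on a mismatch is left open here.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <sys/auxv.h>

/* Base address of the vdso in the calling process, 0 if none is reported. */
static uint64_t current_vdso_base(void)
{
	return (uint64_t)getauxval(AT_SYSINFO_EHDR);
}

enum vdso_check {
	VDSO_MATCH,    /* restarted process already has the vdso at the saved address */
	VDSO_MISMATCH, /* exec() of the restart program placed it elsewhere */
};

/* Compare the checkpointed base with the one the restarting process got. */
static enum vdso_check vdso_compare(uint64_t saved_base, uint64_t current_base)
{
	return saved_base == current_base ? VDSO_MATCH : VDSO_MISMATCH;
}
```

A call such as vdso_compare(saved, current_vdso_base()) would at least make the "wrong place" case visible instead of silently overwriting vdso_base.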
-serge
* Re: [RFC PATCH 0/2] cr: Introduce s390x checkpoint/restart code
2009-01-15 10:10 ` Martin Schwidefsky
@ 2009-01-15 22:25 ` Serge E. Hallyn
0 siblings, 0 replies; 8+ messages in thread
From: Serge E. Hallyn @ 2009-01-15 22:25 UTC (permalink / raw)
To: Martin Schwidefsky; +Cc: Linux Containers, Arnd Bergmann
Quoting Martin Schwidefsky (schwidefsky-tA70FqPdS9bQT0dZR+AlfA@public.gmane.org):
> What is left are the pgstes tables. After you forked the new process
> that is used to restart a KVM enabled process you need to call
> s390_enable_sie(), preferably before you recreate the VMAs.
So, let's say we're checkpointing 3 tasks in a s390-kvm.
Now we restart them outside of kvm. Some process will
fork(), exec() some restart program, which will in turn
fork() twice, then each of those programs will call
sys_restart(), read the info pertaining to the checkpoint
task they are to re-create, and set themselves up.
Should the s390_enable_sie() then not be correctly set
by the system automatically? So whether or not I'm
restarting in kvm, the kvm_arch_create_vm() for s390
will have been called?
Or do I misunderstand?
thanks,
-serge