Linux Container Development
 help / color / mirror / Atom feed
From: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
To: Matt Helsley <matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Cc: Containers
	<containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>
Subject: Re: [PATCH] checkpoint/restart of robust futex lists
Date: Thu, 18 Jun 2009 13:12:18 -0400	[thread overview]
Message-ID: <4A3A7572.9030907@cs.columbia.edu> (raw)
In-Reply-To: <20090603041919.GO9285-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>



Matt Helsley wrote:
>     Save and restore the [compat_]robust_list member of the task struct.
>     
>     These lists record which futexes the task holds. To keep the overhead of
>     robust futexes low the list is kept in userspace. When the task exits the
>     kernel carefully walks these lists to recover held futexes that
>     other tasks may be attempting to acquire with FUTEX_WAIT.
>     
>     Because they point to userspace memory that is saved/restored by
>     checkpoint/restart saving the list pointers works.
>     
>     While saving the pointers works during checkpoint, restart is tricky
>     because the robust futex ABI contains provisions for changes based on
>     checking the size of the list head. So we need to save the length of
>     the list head too in order to make sure that the kernel used during
>     restart is capable of handling that ABI. Since there is only one ABI
>     supported at the moment taking the list head's size is simple. Should
>     the ABI change we will need to use the same size as specified during
>     sys_set_robust_list() and hence some new means of determining the length
>     of this userspace structure in sys_checkpoint would be required.
>     
>     Rather than rewrite the logic that checks and handles the ABI we reuse
>     sys_set_robust_list() by factoring out the body of the function and
>     calling it during restart.
>     
>     Signed-off-by: Matt Helsley <matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

What happens if we checkpoint on a system with CONFIG_FUTEX and
restart on a a system with !CONFIG_FUTEX ?

Clearly, if processes used futex, restart should fail.

If they didn't - then all we know is that at the time of the
checkpoint they didn't. It could be that they didn't and will
never try, so restarting on a futex-less kernel is correct.

But it could be that they perhaps did and perhaps will attempt
again later on. In this case, restarting on a futex-less kernel
is incorrect, because it isn't compatible with the (intention)
of the original execution.

I suppose it's time to come up with some scheme to encode the
capabilities of a kernel in the checkpoint image. At restart
we can test compatibility of current kernel with original one,
and decide what to do.

The decision itself, or part of it - like the question above -
can be done in userspace.

The opposite case (checkpoint with !CONFIG_FUTEX and restart on
CONFIG_FUTEX) is always ok, because it will overwrite with null
values on the restarting tasks.

Oren.


> 
> diff --git a/checkpoint/process.c b/checkpoint/process.c
> index b604a85..084a2e4 100644
> --- a/checkpoint/process.c
> +++ b/checkpoint/process.c
> @@ -42,6 +42,17 @@ static int checkpoint_task_struct(struct ckpt_ctx *ctx, struct task_struct *t)
>  
>  	h->task_comm_len = TASK_COMM_LEN;
>  
> +#ifdef CONFIG_FUTEX
> +	/* These are __user pointers and can be saved without the objhash. */
> +	h->robust_futex_list = t->robust_list;
> +	h->robust_futex_head_len = sizeof(t->robust_list);
> +#ifdef CONFIG_COMPAT
> +	h->compat_robust_futex_list = t->compat_robust_list;
> +	h->compat_robust_futex_head_len = sizeof(t->compat_robust_list);
> +#endif
> +	/* FIXME save pi futex state?? */
> +#endif
> +
>  	/* FIXME: save remaining relevant task_struct fields */
>  
>  	ret = ckpt_write_obj(ctx, &h->h);
> @@ -366,6 +377,20 @@ static int restore_task_struct(struct ckpt_ctx *ctx)
>  	memset(t->comm, 0, TASK_COMM_LEN);
>  	ret = _ckpt_read_string(ctx, t->comm, h->task_comm_len);
>  
> +#ifdef CONFIG_FUTEX
> +	/* Since we restore the memory map the address remains the same and
> +	 * this is safe. This is the same as [compat_]sys_set_robust_list() */
> +	if (h->robust_futex_list) {
> +		struct robust_list_head __user *rfl = h->robust_futex_list;
> +		do_set_robust_list(rfl, h->robust_futex_head_len);
> +	}
> +#ifdef CONFIG_COMPAT
> +	if (h->compat_robust_futex_list) {
> +		struct compat_robust_list_head __user *crfl = h->compat_robust_futex_list;
> +		do_compat_set_robust_list(crfl, h->compat_robust_futex_head_len);
> +	}
> +#endif
> +#endif
>  	/* FIXME: restore remaining relevant task_struct fields */
>   out:
>  	ckpt_hdr_put(ctx, h);
> diff --git a/include/linux/checkpoint_hdr.h b/include/linux/checkpoint_hdr.h
> index cd427d8..f226f8f 100644
> --- a/include/linux/checkpoint_hdr.h
> +++ b/include/linux/checkpoint_hdr.h
> @@ -163,6 +163,12 @@ struct ckpt_hdr_task {
>  	__u32 exit_signal;
>  
>  	__u32 task_comm_len;
> +
> +	__u32 robust_futex_head_len;
> +	__u32 compat_robust_futex_head_len;
> +	__u64 robust_futex_list; /* a __user * */
> +	__u64 compat_robust_futex_list; /* a __user * */
> +	/* FIXME need futex prio inheritance state? */
>  } __attribute__((aligned(8)));
>  
>  /* namespaces */
> diff --git a/include/linux/compat.h b/include/linux/compat.h
> index f2ded21..f311e36 100644
> --- a/include/linux/compat.h
> +++ b/include/linux/compat.h
> @@ -165,7 +165,8 @@ struct compat_robust_list_head {
>  };
>  
>  extern void compat_exit_robust_list(struct task_struct *curr);
> -
> +extern long do_compat_set_robust_list(struct compat_robust_list_head __user *head,
> +				      compat_size_t len);
>  asmlinkage long
>  compat_sys_set_robust_list(struct compat_robust_list_head __user *head,
>  			   compat_size_t len);
> diff --git a/include/linux/futex.h b/include/linux/futex.h
> index dd0e06b..8685d1c 100644
> --- a/include/linux/futex.h
> +++ b/include/linux/futex.h
> @@ -178,6 +178,7 @@ union futex_key {
>  #define FUTEX_KEY_INIT (union futex_key) { .both = { .ptr = NULL } }
>  
>  #ifdef CONFIG_FUTEX
> +extern long do_set_robust_list(struct robust_list_head __user *head, size_t len);
>  extern void exit_robust_list(struct task_struct *curr);
>  extern void exit_pi_state_list(struct task_struct *curr);
>  extern int futex_cmpxchg_enabled;
> diff --git a/kernel/futex.c b/kernel/futex.c
> index f405c73..cf80e7c 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -1680,13 +1680,7 @@ pi_faulted:
>   * the list. There can only be one such pending lock.
>   */
>  
> -/**
> - * sys_set_robust_list - set the robust-futex list head of a task
> - * @head: pointer to the list-head
> - * @len: length of the list-head, as userspace expects
> - */
> -SYSCALL_DEFINE2(set_robust_list, struct robust_list_head __user *, head,
> -		size_t, len)
> +long do_set_robust_list(struct robust_list_head __user *head, size_t len)
>  {
>  	if (!futex_cmpxchg_enabled)
>  		return -ENOSYS;
> @@ -1702,6 +1696,17 @@ SYSCALL_DEFINE2(set_robust_list, struct robust_list_head __user *, head,
>  }
>  
>  /**
> + * sys_set_robust_list - set the robust-futex list head of a task
> + * @head: pointer to the list-head
> + * @len: length of the list-head, as userspace expects
> + */
> +SYSCALL_DEFINE2(set_robust_list, struct robust_list_head __user *, head,
> +		size_t, len)
> +{
> +	return do_set_robust_list(head, len);
> +}
> +
> +/**
>   * sys_get_robust_list - get the robust-futex list head of a task
>   * @pid: pid of the process [zero for current task]
>   * @head_ptr: pointer to a list-head pointer, the kernel fills it in
> diff --git a/kernel/futex_compat.c b/kernel/futex_compat.c
> index d607a5b..eac734c 100644
> --- a/kernel/futex_compat.c
> +++ b/kernel/futex_compat.c
> @@ -114,9 +114,9 @@ void compat_exit_robust_list(struct task_struct *curr)
>  	}
>  }
>  
> -asmlinkage long
> -compat_sys_set_robust_list(struct compat_robust_list_head __user *head,
> -			   compat_size_t len)
> +long
> +do_compat_set_robust_list(struct compat_robust_list_head __user *head,
> +			  compat_size_t len)
>  {
>  	if (!futex_cmpxchg_enabled)
>  		return -ENOSYS;
> @@ -130,6 +130,13 @@ compat_sys_set_robust_list(struct compat_robust_list_head __user *head,
>  }
>  
>  asmlinkage long
> +compat_sys_set_robust_list(struct compat_robust_list_head __user *head,
> +			   compat_size_t len)
> +{
> +	return do_compat_set_robust_list(head, len);
> +}
> +
> +asmlinkage long
>  compat_sys_get_robust_list(int pid, compat_uptr_t __user *head_ptr,
>  			   compat_size_t __user *len_ptr)
>  {
> 

  parent reply	other threads:[~2009-06-18 17:12 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-06-03  4:19 [PATCH] checkpoint/restart of robust futex lists Matt Helsley
     [not found] ` <20090603041919.GO9285-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-06-03 15:58   ` Serge E. Hallyn
     [not found]     ` <20090603155804.GA7848-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-06-08  2:28       ` Oren Laadan
2009-06-08  2:37   ` Oren Laadan
     [not found]     ` <4A2C7972.9090404-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-06-08 14:08       ` Serge E. Hallyn
     [not found]         ` <20090608140810.GB29432-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-06-08 16:31           ` Oren Laadan
     [not found]             ` <4A2D3CE3.6030400-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-06-08 16:59               ` Serge E. Hallyn
     [not found]                 ` <20090608165951.GA3610-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-06-08 18:34                   ` Oren Laadan
2009-06-18 17:12   ` Oren Laadan [this message]
     [not found]     ` <4A3A7572.9030907-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2009-06-18 19:24       ` Nathan Lynch
     [not found]         ` <m31vphgxa0.fsf-e+AXbWqSrlAAvxtiuMwx3w@public.gmane.org>
2009-06-19  3:41           ` Oren Laadan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4A3A7572.9030907@cs.columbia.edu \
    --to=orenl-eqauephvms7envbuuze7ea@public.gmane.org \
    --cc=containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org \
    --cc=matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox