* [PATCH] Clear the objhash before completing restart, but delay free (v2)
@ 2010-10-19 14:03 Dan Smith
[not found] ` <1287496991-18207-1-git-send-email-danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 9+ messages in thread
From: Dan Smith @ 2010-10-19 14:03 UTC (permalink / raw)
To: orenl-eQaUEPhvms7ENvBUuze7eA
Cc: containers-qjLDD68F18O7TbgM5vRIOg,
matthltc-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
This patch causes the restart coordinator to clear the object hash
before releasing the restarted tasks. It does this to make sure
that any objects being held exclusively by the hash are released
before the tasks start running again.
If we continue to postpone clearing the object hash until restart
returns to userspace there can be a race where the restarted tasks
behave differently due to the references held by the objhash.
One specific example of this is restarting half-closed pipes.
Without this patch we've got a race between the coordinator --
about to clear the object hash -- and two restarted tasks connected
via a half-closed pipe. Because the object hash contains a reference
to both ends of the pipe one end of the pipe will not be closed
and EPIPE/SIGPIPE won't be handled when reading from the pipe
for instance. As far as the restarted userspace task can tell the
pipe may briefly appear to re-open. Moving the object hash clear
prevents this race and others like it.
Note that eventually the coordinator would close the pipe and correct
behavior would be restored. Thus this bug would only affect the
correctness of userspace -- after a close() the pipe may briefly re-open
and allow more data to be sent before automatically closing again.
To avoid the overhead of actually freeing the object hash's structures
at the same time, this adds a queue to ckpt_obj_hash and pushes
the ckpt_obj structures there to be free()'d later during the cleanup
process.
Changes in v2:
- Avoid adding another list_head to ckpt_obj by re-using the hlist
hash node for the free list
Signed-off-by: Dan Smith <danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
---
include/linux/checkpoint.h | 1 +
kernel/checkpoint/objhash.c | 18 +++++++++++++++---
kernel/checkpoint/restart.c | 2 ++
3 files changed, 18 insertions(+), 3 deletions(-)
diff --git a/include/linux/checkpoint.h b/include/linux/checkpoint.h
index a11d40e..f888363 100644
--- a/include/linux/checkpoint.h
+++ b/include/linux/checkpoint.h
@@ -179,6 +179,7 @@ extern void restore_notify_error(struct ckpt_ctx *ctx);
extern int ckpt_obj_module_get(void);
extern void ckpt_obj_module_put(void);
+extern void ckpt_obj_hash_clear(struct ckpt_ctx *ctx);
extern void ckpt_obj_hash_free(struct ckpt_ctx *ctx);
extern int ckpt_obj_hash_alloc(struct ckpt_ctx *ctx);
diff --git a/kernel/checkpoint/objhash.c b/kernel/checkpoint/objhash.c
index 62c34ff..4aae0c0 100644
--- a/kernel/checkpoint/objhash.c
+++ b/kernel/checkpoint/objhash.c
@@ -36,6 +36,7 @@ struct ckpt_obj {
struct ckpt_obj_hash {
struct hlist_head *head;
struct hlist_head list;
+ struct hlist_head free;
int next_free_objref;
};
@@ -128,8 +129,9 @@ int ckpt_obj_module_get(void)
#define CKPT_OBJ_HASH_NBITS 10
#define CKPT_OBJ_HASH_TOTAL (1UL << CKPT_OBJ_HASH_NBITS)
-static void obj_hash_clear(struct ckpt_obj_hash *obj_hash)
+void ckpt_obj_hash_clear(struct ckpt_ctx *ctx)
{
+ struct ckpt_obj_hash *obj_hash = ctx->obj_hash;
struct hlist_head *h = obj_hash->head;
struct hlist_node *n, *t;
struct ckpt_obj *obj;
@@ -139,7 +141,9 @@ static void obj_hash_clear(struct ckpt_obj_hash *obj_hash)
hlist_for_each_entry_safe(obj, n, t, &h[i], hash) {
if (obj->ops->ref_drop)
obj->ops->ref_drop(obj->ptr, 1);
- kfree(obj);
+ hlist_del_init(&obj->hash);
+ hlist_del(&obj->next);
+ hlist_add_head(&obj->hash, &obj_hash->free);
}
}
}
@@ -149,7 +153,14 @@ void ckpt_obj_hash_free(struct ckpt_ctx *ctx)
struct ckpt_obj_hash *obj_hash = ctx->obj_hash;
if (obj_hash) {
- obj_hash_clear(obj_hash);
+ struct hlist_node *pos, *n;
+ struct ckpt_obj *obj;
+
+ ckpt_obj_hash_clear(ctx);
+
+ hlist_for_each_entry_safe(obj, pos, n, &obj_hash->free, hash)
+ kfree(obj);
+
kfree(obj_hash->head);
kfree(ctx->obj_hash);
ctx->obj_hash = NULL;
@@ -173,6 +184,7 @@ int ckpt_obj_hash_alloc(struct ckpt_ctx *ctx)
obj_hash->head = head;
obj_hash->next_free_objref = 1;
INIT_HLIST_HEAD(&obj_hash->list);
+ INIT_HLIST_HEAD(&obj_hash->free);
ctx->obj_hash = obj_hash;
return 0;
diff --git a/kernel/checkpoint/restart.c b/kernel/checkpoint/restart.c
index 17270b8..f2241a9 100644
--- a/kernel/checkpoint/restart.c
+++ b/kernel/checkpoint/restart.c
@@ -1349,6 +1349,8 @@ static int do_restore_coord(struct ckpt_ctx *ctx, pid_t pid)
if (ret < 0)
ckpt_err(ctx, ret, "restart failed (coordinator)\n");
+ ckpt_obj_hash_clear(ctx);
+
if (ckpt_test_error(ctx)) {
destroy_descendants(ctx);
ret = ckpt_get_error(ctx);
--
1.7.2.2
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] Clear the objhash before completing restart, but delay free (v2)
[not found] ` <1287496991-18207-1-git-send-email-danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2010-10-19 18:46 ` Matt Helsley
2010-11-01 16:24 ` Oren Laadan
2011-02-01 0:31 ` Oren Laadan
2 siblings, 0 replies; 9+ messages in thread
From: Matt Helsley @ 2010-10-19 18:46 UTC (permalink / raw)
To: Dan Smith
Cc: containers-qjLDD68F18O7TbgM5vRIOg,
matthltc-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
On Tue, Oct 19, 2010 at 07:03:11AM -0700, Dan Smith wrote:
> This patch causes the restart coordinator to clear the object hash
> before releasing the restarted tasks. It does this to make sure
> that any objects being held exclusively by the hash are released
> before the tasks start running again.
>
> If we continue to postpone clearing the object hash until restart
> returns to userspace there can be a race where the restarted tasks
> behave differently due to the references held by the objhash.
> One specific example of this is restarting half-closed pipes.
> Without this patch we've got a race between the coordinator --
> about to clear the object hash -- and two restarted tasks connected
> via a half-closed pipe. Because the object hash contains a reference
> to both ends of the pipe one end of the pipe will not be closed
> and EPIPE/SIGPIPE won't be handled when reading from the pipe
> for instance. As far as the restarted userspace task can tell the
> pipe may briefly appear to re-open. Moving the object hash clear
> prevents this race and others like it.
>
> Note that eventually the coordinator would close the pipe and correct
> behavior would be restored. Thus this bug would only affect the
> correctness of userspace -- after a close() the pipe may briefly re-open
> and allow more data to be sent before automatically closing again.
>
> To avoid the overhead of actually freeing the object hash's structures
> at the same time, this adds a queue to ckpt_obj_hash and pushes
> the ckpt_obj structures there to be free()'d later during the cleanup
> process.
>
> Changes in v2:
> - Avoid adding another list_head to ckpt_obj by re-using the hlist
> hash node for the free list
>
> Signed-off-by: Dan Smith <danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Looks good.
Reviewed-by: Matt Helsley <matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> ---
> include/linux/checkpoint.h | 1 +
> kernel/checkpoint/objhash.c | 18 +++++++++++++++---
> kernel/checkpoint/restart.c | 2 ++
> 3 files changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/checkpoint.h b/include/linux/checkpoint.h
> index a11d40e..f888363 100644
> --- a/include/linux/checkpoint.h
> +++ b/include/linux/checkpoint.h
> @@ -179,6 +179,7 @@ extern void restore_notify_error(struct ckpt_ctx *ctx);
> extern int ckpt_obj_module_get(void);
> extern void ckpt_obj_module_put(void);
>
> +extern void ckpt_obj_hash_clear(struct ckpt_ctx *ctx);
> extern void ckpt_obj_hash_free(struct ckpt_ctx *ctx);
> extern int ckpt_obj_hash_alloc(struct ckpt_ctx *ctx);
>
> diff --git a/kernel/checkpoint/objhash.c b/kernel/checkpoint/objhash.c
> index 62c34ff..4aae0c0 100644
> --- a/kernel/checkpoint/objhash.c
> +++ b/kernel/checkpoint/objhash.c
> @@ -36,6 +36,7 @@ struct ckpt_obj {
> struct ckpt_obj_hash {
> struct hlist_head *head;
> struct hlist_head list;
> + struct hlist_head free;
> int next_free_objref;
> };
>
> @@ -128,8 +129,9 @@ int ckpt_obj_module_get(void)
> #define CKPT_OBJ_HASH_NBITS 10
> #define CKPT_OBJ_HASH_TOTAL (1UL << CKPT_OBJ_HASH_NBITS)
>
> -static void obj_hash_clear(struct ckpt_obj_hash *obj_hash)
> +void ckpt_obj_hash_clear(struct ckpt_ctx *ctx)
> {
> + struct ckpt_obj_hash *obj_hash = ctx->obj_hash;
> struct hlist_head *h = obj_hash->head;
> struct hlist_node *n, *t;
> struct ckpt_obj *obj;
> @@ -139,7 +141,9 @@ static void obj_hash_clear(struct ckpt_obj_hash *obj_hash)
> hlist_for_each_entry_safe(obj, n, t, &h[i], hash) {
> if (obj->ops->ref_drop)
> obj->ops->ref_drop(obj->ptr, 1);
> - kfree(obj);
> + hlist_del_init(&obj->hash);
> + hlist_del(&obj->next);
> + hlist_add_head(&obj->hash, &obj_hash->free);
> }
> }
> }
> @@ -149,7 +153,14 @@ void ckpt_obj_hash_free(struct ckpt_ctx *ctx)
> struct ckpt_obj_hash *obj_hash = ctx->obj_hash;
>
> if (obj_hash) {
> - obj_hash_clear(obj_hash);
> + struct hlist_node *pos, *n;
> + struct ckpt_obj *obj;
> +
> + ckpt_obj_hash_clear(ctx);
> +
> + hlist_for_each_entry_safe(obj, pos, n, &obj_hash->free, hash)
> + kfree(obj);
> +
> kfree(obj_hash->head);
> kfree(ctx->obj_hash);
> ctx->obj_hash = NULL;
> @@ -173,6 +184,7 @@ int ckpt_obj_hash_alloc(struct ckpt_ctx *ctx)
> obj_hash->head = head;
> obj_hash->next_free_objref = 1;
> INIT_HLIST_HEAD(&obj_hash->list);
> + INIT_HLIST_HEAD(&obj_hash->free);
>
> ctx->obj_hash = obj_hash;
> return 0;
> diff --git a/kernel/checkpoint/restart.c b/kernel/checkpoint/restart.c
> index 17270b8..f2241a9 100644
> --- a/kernel/checkpoint/restart.c
> +++ b/kernel/checkpoint/restart.c
> @@ -1349,6 +1349,8 @@ static int do_restore_coord(struct ckpt_ctx *ctx, pid_t pid)
> if (ret < 0)
> ckpt_err(ctx, ret, "restart failed (coordinator)\n");
>
> + ckpt_obj_hash_clear(ctx);
> +
> if (ckpt_test_error(ctx)) {
> destroy_descendants(ctx);
> ret = ckpt_get_error(ctx);
> --
> 1.7.2.2
>
> _______________________________________________
> Containers mailing list
> Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> https://lists.linux-foundation.org/mailman/listinfo/containers
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] Clear the objhash before completing restart, but delay free (v2)
[not found] ` <1287496991-18207-1-git-send-email-danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2010-10-19 18:46 ` Matt Helsley
@ 2010-11-01 16:24 ` Oren Laadan
[not found] ` <4CCEE9AD.7010105-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2011-02-01 0:31 ` Oren Laadan
2 siblings, 1 reply; 9+ messages in thread
From: Oren Laadan @ 2010-11-01 16:24 UTC (permalink / raw)
To: Dan Smith
Cc: containers-qjLDD68F18O7TbgM5vRIOg,
matthltc-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
Dan,
Does this patch solve the performance problem in freeing all
the objhash entries upfront (e.g. before letting userspace
resume) ?
If so, is there still a performance hit for doing the 'clear'
portion early before resuming the tasks ? How much does it
depend on the complexity of the hierarchy being checkpointed ?
Oren.
On 10/19/2010 10:03 AM, Dan Smith wrote:
> This patch causes the restart coordinator to clear the object hash
> before releasing the restarted tasks. It does this to make sure
> that any objects being held exclusively by the hash are released
> before the tasks start running again.
>
> If we continue to postpone clearing the object hash until restart
> returns to userspace there can be a race where the restarted tasks
> behave differently due to the references held by the objhash.
> One specific example of this is restarting half-closed pipes.
> Without this patch we've got a race between the coordinator --
> about to clear the object hash -- and two restarted tasks connected
> via a half-closed pipe. Because the object hash contains a reference
> to both ends of the pipe one end of the pipe will not be closed
> and EPIPE/SIGPIPE won't be handled when reading from the pipe
> for instance. As far as the restarted userspace task can tell the
> pipe may briefly appear to re-open. Moving the object hash clear
> prevents this race and others like it.
>
> Note that eventually the coordinator would close the pipe and correct
> behavior would be restored. Thus this bug would only affect the
> correctness of userspace -- after a close() the pipe may briefly re-open
> and allow more data to be sent before automatically closing again.
>
> To avoid the overhead of actually freeing the object hash's structures
> at the same time, this adds a queue to ckpt_obj_hash and pushes
> the ckpt_obj structures there to be free()'d later during the cleanup
> process.
>
> Changes in v2:
> - Avoid adding another list_head to ckpt_obj by re-using the hlist
> hash node for the free list
>
> Signed-off-by: Dan Smith <danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> ---
> include/linux/checkpoint.h | 1 +
> kernel/checkpoint/objhash.c | 18 +++++++++++++++---
> kernel/checkpoint/restart.c | 2 ++
> 3 files changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/checkpoint.h b/include/linux/checkpoint.h
> index a11d40e..f888363 100644
> --- a/include/linux/checkpoint.h
> +++ b/include/linux/checkpoint.h
> @@ -179,6 +179,7 @@ extern void restore_notify_error(struct ckpt_ctx *ctx);
> extern int ckpt_obj_module_get(void);
> extern void ckpt_obj_module_put(void);
>
> +extern void ckpt_obj_hash_clear(struct ckpt_ctx *ctx);
> extern void ckpt_obj_hash_free(struct ckpt_ctx *ctx);
> extern int ckpt_obj_hash_alloc(struct ckpt_ctx *ctx);
>
> diff --git a/kernel/checkpoint/objhash.c b/kernel/checkpoint/objhash.c
> index 62c34ff..4aae0c0 100644
> --- a/kernel/checkpoint/objhash.c
> +++ b/kernel/checkpoint/objhash.c
> @@ -36,6 +36,7 @@ struct ckpt_obj {
> struct ckpt_obj_hash {
> struct hlist_head *head;
> struct hlist_head list;
> + struct hlist_head free;
> int next_free_objref;
> };
>
> @@ -128,8 +129,9 @@ int ckpt_obj_module_get(void)
> #define CKPT_OBJ_HASH_NBITS 10
> #define CKPT_OBJ_HASH_TOTAL (1UL << CKPT_OBJ_HASH_NBITS)
>
> -static void obj_hash_clear(struct ckpt_obj_hash *obj_hash)
> +void ckpt_obj_hash_clear(struct ckpt_ctx *ctx)
> {
> + struct ckpt_obj_hash *obj_hash = ctx->obj_hash;
> struct hlist_head *h = obj_hash->head;
> struct hlist_node *n, *t;
> struct ckpt_obj *obj;
> @@ -139,7 +141,9 @@ static void obj_hash_clear(struct ckpt_obj_hash *obj_hash)
> hlist_for_each_entry_safe(obj, n, t, &h[i], hash) {
> if (obj->ops->ref_drop)
> obj->ops->ref_drop(obj->ptr, 1);
> - kfree(obj);
> + hlist_del_init(&obj->hash);
> + hlist_del(&obj->next);
> + hlist_add_head(&obj->hash, &obj_hash->free);
> }
> }
> }
> @@ -149,7 +153,14 @@ void ckpt_obj_hash_free(struct ckpt_ctx *ctx)
> struct ckpt_obj_hash *obj_hash = ctx->obj_hash;
>
> if (obj_hash) {
> - obj_hash_clear(obj_hash);
> + struct hlist_node *pos, *n;
> + struct ckpt_obj *obj;
> +
> + ckpt_obj_hash_clear(ctx);
> +
> + hlist_for_each_entry_safe(obj, pos, n, &obj_hash->free, hash)
> + kfree(obj);
> +
> kfree(obj_hash->head);
> kfree(ctx->obj_hash);
> ctx->obj_hash = NULL;
> @@ -173,6 +184,7 @@ int ckpt_obj_hash_alloc(struct ckpt_ctx *ctx)
> obj_hash->head = head;
> obj_hash->next_free_objref = 1;
> INIT_HLIST_HEAD(&obj_hash->list);
> + INIT_HLIST_HEAD(&obj_hash->free);
>
> ctx->obj_hash = obj_hash;
> return 0;
> diff --git a/kernel/checkpoint/restart.c b/kernel/checkpoint/restart.c
> index 17270b8..f2241a9 100644
> --- a/kernel/checkpoint/restart.c
> +++ b/kernel/checkpoint/restart.c
> @@ -1349,6 +1349,8 @@ static int do_restore_coord(struct ckpt_ctx *ctx, pid_t pid)
> if (ret < 0)
> ckpt_err(ctx, ret, "restart failed (coordinator)\n");
>
> + ckpt_obj_hash_clear(ctx);
> +
> if (ckpt_test_error(ctx)) {
> destroy_descendants(ctx);
> ret = ckpt_get_error(ctx);
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] Clear the objhash before completing restart, but delay free (v2)
[not found] ` <4CCEE9AD.7010105-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2010-11-01 17:01 ` Dan Smith
[not found] ` <87oca9vuyw.fsf-bi+AKbBUZKagILUCTcTcHdKyNwTtLsGr@public.gmane.org>
0 siblings, 1 reply; 9+ messages in thread
From: Dan Smith @ 2010-11-01 17:01 UTC (permalink / raw)
To: Oren Laadan
Cc: containers-qjLDD68F18O7TbgM5vRIOg,
matthltc-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
OL> Does this patch solve the performance problem in freeing all
OL> the objhash entries upfront (e.g. before letting userspace
OL> resume) ?
OL> If so, is there still a performance hit for doing the 'clear'
OL> portion early before resuming the tasks ? How much does it
OL> depend on the complexity of the hierarchy being checkpointed ?
I've actually been unable to detect any real different in performance
between this and the way it was before. I've run it against several
different large checkpoints (multiple processes, with sockets, IPC,
many files open, etc). I have another option for the functionality,
which is to mark only specific items as needing to be free()'d early,
but I can't discern a difference between that or this option either.
What I can discern is that this patch is more likely to prevent
similar issues in the future by making the behavior of the hash far
more generally correct, IMHO.
--
Dan Smith
IBM Linux Technology Center
email: danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] Clear the objhash before completing restart, but delay free (v2)
[not found] ` <87oca9vuyw.fsf-bi+AKbBUZKagILUCTcTcHdKyNwTtLsGr@public.gmane.org>
@ 2010-11-01 17:50 ` Oren Laadan
[not found] ` <4CCEFDD4.1030308-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
0 siblings, 1 reply; 9+ messages in thread
From: Oren Laadan @ 2010-11-01 17:50 UTC (permalink / raw)
To: Dan Smith
Cc: containers-qjLDD68F18O7TbgM5vRIOg,
matthltc-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
On 11/01/2010 01:01 PM, Dan Smith wrote:
> OL> Does this patch solve the performance problem in freeing all
> OL> the objhash entries upfront (e.g. before letting userspace
> OL> resume) ?
>
> OL> If so, is there still a performance hit for doing the 'clear'
> OL> portion early before resuming the tasks ? How much does it
> OL> depend on the complexity of the hierarchy being checkpointed ?
>
> I've actually been unable to detect any real different in performance
> between this and the way it was before. I've run it against several
> different large checkpoints (multiple processes, with sockets, IPC,
> many files open, etc). I have another option for the functionality,
> which is to mark only specific items as needing to be free()'d early,
> but I can't discern a difference between that or this option either.
> What I can discern is that this patch is more likely to prevent
> similar issues in the future by making the behavior of the hash far
> more generally correct, IMHO.
So to be clear: is there a performance issue due to moving the
cleanup - some or all - to before allowing tasks to resume ?
(This is what I understood over IRC).
I agree that it's more correct and safe this way. But I want to
know what the implications of a (or any) fix we make.
Oren.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] Clear the objhash before completing restart, but delay free (v2)
[not found] ` <4CCEFDD4.1030308-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2010-11-01 18:10 ` Dan Smith
0 siblings, 0 replies; 9+ messages in thread
From: Dan Smith @ 2010-11-01 18:10 UTC (permalink / raw)
To: Oren Laadan
Cc: containers-qjLDD68F18O7TbgM5vRIOg,
matthltc-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
OL> So to be clear: is there a performance issue due to moving the
OL> cleanup - some or all - to before allowing tasks to resume ?
OL> (This is what I understood over IRC).
Not that I have been able to measure, no.
--
Dan Smith
IBM Linux Technology Center
email: danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] Clear the objhash before completing restart, but delay free (v2)
[not found] ` <1287496991-18207-1-git-send-email-danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2010-10-19 18:46 ` Matt Helsley
2010-11-01 16:24 ` Oren Laadan
@ 2011-02-01 0:31 ` Oren Laadan
[not found] ` <4D47544E.7080600-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2 siblings, 1 reply; 9+ messages in thread
From: Oren Laadan @ 2011-02-01 0:31 UTC (permalink / raw)
To: Dan Smith
Cc: containers-qjLDD68F18O7TbgM5vRIOg,
matthltc-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
I modifed the patch a bit according to our IRC chat today:
From 9c74f82411d77cf0194a17ba99af0dd31070e88a Mon Sep 17 00:00:00 2001
From: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
Date: Mon, 31 Jan 2011 19:01:49 -0500
Subject: [PATCH] c/r: clear the objhash before completing restart, but delay free (v3)
This patch causes the restart coordinator to drop references to
objects in the objhash before releasing the restarted tasks. This
ensures that objects held exclusively there are released before any
task starts running again.
Keeping those references when we allow restarted tasks to return to
userspace is racy if an object's behavior depends on this reference.
One example is restarting half-closed pipes: without this patch they
temporarily (until the coordinator clears the objhash) appear fully
open on both sides. Consider a pipe with read-side closed. A write to
the write-side should produce EPIPE/SIGPIPE, however, because the
objhash contains a reference to both ends of the pipe, the read-side
at restart will not really be closed until after the cleanup, and a
process would unexpectedly not receive an error. Moving the object
hash clear prevents this race and others like it.
To avoid the overhead of actually freeing the object hash's structures
at the same time, we defer the actual memory free to until after the
restarted tasks are allowed to resume execution.
Original patch posted by Dan Smith.
Changes in v3:
- Use a flag on ctx->kflags to indicate that "->drop" was done on
the entire hash instead of moving elements to a "free" list
Changes in v2:
- Avoid adding another list_head to ckpt_obj by re-using the hlist
hash node for the free list
Cc: Dan Smith <danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
---
include/linux/checkpoint.h | 3 +++
kernel/checkpoint/objhash.c | 17 +++++++++++++----
kernel/checkpoint/restart.c | 8 ++++++++
3 files changed, 24 insertions(+), 4 deletions(-)
diff --git a/include/linux/checkpoint.h b/include/linux/checkpoint.h
index adaf6af..6c0ccfd 100644
--- a/include/linux/checkpoint.h
+++ b/include/linux/checkpoint.h
@@ -50,11 +50,13 @@ extern long do_sys_restart(pid_t pid, int fd,
#define CKPT_CTX_RESTART_BIT 1
#define CKPT_CTX_SUCCESS_BIT 2
#define CKPT_CTX_ERROR_BIT 3
+#define CKPT_CTX_DROPPED_BIT 4
#define CKPT_CTX_CHECKPOINT (1 << CKPT_CTX_CHECKPOINT_BIT)
#define CKPT_CTX_RESTART (1 << CKPT_CTX_RESTART_BIT)
#define CKPT_CTX_SUCCESS (1 << CKPT_CTX_SUCCESS_BIT)
#define CKPT_CTX_ERROR (1 << CKPT_CTX_ERROR_BIT)
+#define CKPT_CTX_DROPPED (1 << CKPT_CTX_DROPPED_BIT)
/* ckpt_ctx: uflags */
#define CHECKPOINT_USER_FLAGS \
@@ -176,6 +178,7 @@ extern void restore_notify_error(struct ckpt_ctx *ctx);
/* obj_hash */
extern void ckpt_obj_hash_free(struct ckpt_ctx *ctx);
+extern void ckpt_obj_hash_drop(struct ckpt_ctx *ctx);
extern int ckpt_obj_hash_alloc(struct ckpt_ctx *ctx);
extern int restore_obj(struct ckpt_ctx *ctx, struct ckpt_hdr_objref *h);
diff --git a/kernel/checkpoint/objhash.c b/kernel/checkpoint/objhash.c
index 52062a8..40c2e9b 100644
--- a/kernel/checkpoint/objhash.c
+++ b/kernel/checkpoint/objhash.c
@@ -66,7 +66,7 @@ EXPORT_SYMBOL(register_checkpoint_obj);
#define CKPT_OBJ_HASH_NBITS 10
#define CKPT_OBJ_HASH_TOTAL (1UL << CKPT_OBJ_HASH_NBITS)
-static void obj_hash_clear(struct ckpt_obj_hash *obj_hash)
+static void obj_hash_free(struct ckpt_obj_hash *obj_hash, int drop, int clean)
{
struct hlist_head *h = obj_hash->head;
struct hlist_node *n, *t;
@@ -75,19 +75,28 @@ static void obj_hash_clear(struct ckpt_obj_hash *obj_hash)
for (i = 0; i < CKPT_OBJ_HASH_TOTAL; i++) {
hlist_for_each_entry_safe(obj, n, t, &h[i], hash) {
- if (obj->ops->ref_drop)
+ if (drop && obj->ops->ref_drop)
obj->ops->ref_drop(obj->ptr, 1);
- kfree(obj);
+ if (clean)
+ kfree(obj);
}
}
+
+}
+
+void ckpt_obj_hash_drop(struct ckpt_ctx *ctx)
+{
+ obj_hash_free(ctx->obj_hash, 1, 0);
+ ckpt_set_ctx_kflag(ctx, CKPT_CTX_DROPPED);
}
void ckpt_obj_hash_free(struct ckpt_ctx *ctx)
{
struct ckpt_obj_hash *obj_hash = ctx->obj_hash;
+ int dropped = ctx->kflags & CKPT_CTX_DROPPED;
if (obj_hash) {
- obj_hash_clear(obj_hash);
+ obj_hash_free(obj_hash, !dropped, 1);
kfree(obj_hash->head);
kfree(ctx->obj_hash);
ctx->obj_hash = NULL;
diff --git a/kernel/checkpoint/restart.c b/kernel/checkpoint/restart.c
index 2eae499..01da67f 100644
--- a/kernel/checkpoint/restart.c
+++ b/kernel/checkpoint/restart.c
@@ -1345,6 +1345,14 @@ static int do_restore_coord(struct ckpt_ctx *ctx, pid_t pid)
destroy_descendants(ctx);
ret = ckpt_get_error(ctx);
} else {
+ /*
+ * Drop references to all objects in the objhash before
+ * any task is allowed to resume: this prevents resources
+ * whose semantics depend on the number of references from
+ * temporarily misbehaving (e.g. a half-closed pipe may
+ * temporarily appear to fully-open).
+ */
+ ckpt_obj_hash_drop(ctx);
ckpt_set_success(ctx);
wake_up_all(&ctx->waitq);
}
--
1.7.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] Clear the objhash before completing restart, but delay free (v2)
[not found] ` <4D47544E.7080600-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2011-02-01 19:38 ` Dan Smith
[not found] ` <87pqrb5yvn.fsf-FLMGYpZoEPULwtHQx/6qkW3U47Q5hpJU@public.gmane.org>
0 siblings, 1 reply; 9+ messages in thread
From: Dan Smith @ 2011-02-01 19:38 UTC (permalink / raw)
To: Oren Laadan
Cc: containers-qjLDD68F18O7TbgM5vRIOg,
matthltc-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
OL> Original patch posted by Dan Smith.
Tested-by: Dan Smith <danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
With this and my pipe refcount fix, all my pipe tests pass, including
the ones that were stuck before because the hash was getting free'd
late.
Thanks!
--
Dan Smith
IBM Linux Technology Center
email: danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] Clear the objhash before completing restart, but delay free (v2)
[not found] ` <87pqrb5yvn.fsf-FLMGYpZoEPULwtHQx/6qkW3U47Q5hpJU@public.gmane.org>
@ 2011-02-02 14:55 ` Oren Laadan
0 siblings, 0 replies; 9+ messages in thread
From: Oren Laadan @ 2011-02-02 14:55 UTC (permalink / raw)
To: Dan Smith
Cc: containers-qjLDD68F18O7TbgM5vRIOg,
matthltc-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8
Thanks for testing - pushed both patches to v23-rc1.
Oren.
On 02/01/2011 02:38 PM, Dan Smith wrote:
> OL> Original patch posted by Dan Smith.
>
> Tested-by: Dan Smith <danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
>
> With this and my pipe refcount fix, all my pipe tests pass, including
> the ones that were stuck before because the hash was getting free'd
> late.
>
> Thanks!
>
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2011-02-02 14:55 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-10-19 14:03 [PATCH] Clear the objhash before completing restart, but delay free (v2) Dan Smith
[not found] ` <1287496991-18207-1-git-send-email-danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2010-10-19 18:46 ` Matt Helsley
2010-11-01 16:24 ` Oren Laadan
[not found] ` <4CCEE9AD.7010105-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2010-11-01 17:01 ` Dan Smith
[not found] ` <87oca9vuyw.fsf-bi+AKbBUZKagILUCTcTcHdKyNwTtLsGr@public.gmane.org>
2010-11-01 17:50 ` Oren Laadan
[not found] ` <4CCEFDD4.1030308-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2010-11-01 18:10 ` Dan Smith
2011-02-01 0:31 ` Oren Laadan
[not found] ` <4D47544E.7080600-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2011-02-01 19:38 ` Dan Smith
[not found] ` <87pqrb5yvn.fsf-FLMGYpZoEPULwtHQx/6qkW3U47Q5hpJU@public.gmane.org>
2011-02-02 14:55 ` Oren Laadan
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.