* multi-threaded app fails to restart
@ 2010-07-19 19:36 John Paul Walters
[not found] ` <AANLkTilxfsYGyYLwO__VmDLSFQ_s_Qe03G49kIEztVja-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: John Paul Walters @ 2010-07-19 19:36 UTC (permalink / raw)
To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
I have a very simple multi-threaded application that I'm testing with,
but I'm unable to get a restart to complete. I've tried both version
21 and version 22-dev. I'm using a 32-bit Debian install inside a
VMware Fusion virtual machine. The problem seems to be limited to
threads, as I'm able to checkpoint and restart the multitask test
application. The steps that I'm executing are:
./pthread_test &
[1] 3982
ps -efL | grep pthread_test
jwalters 3982 3357 3982 0 2 19:21 pts/0 00:00:00 ./pthread_test
jwalters 3982 3357 3983 0 2 19:21 pts/0 00:00:00 ./pthread_test
for i in 3982 3983; do echo $i > /containers/1/tasks ; done
echo FROZEN > /containers/1/freezer.state
cat /containers/1/freezer.state
FROZEN
./checkpoint 3982 > checkpoint_out
(there aren't any unusual-looking messages in the dmesg output at this point)
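The freeze step above can also be scripted. The sketch below is a hypothetical C helper, not part of the checkpoint/restart tools, and it assumes a cgroup v1 freezer hierarchy mounted at /containers as in the commands above. It reads every thread ID of a process from /proc/<pid>/task (the same LWPs that ps -efL lists), writes each TID to the cgroup's tasks file in a separate write, and then writes FROZEN to freezer.state. The names list_tids, write_ctl and freeze_process are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Collect up to cap thread IDs of process pid from /proc/<pid>/task.
 * Returns the number of TIDs stored, or -1 if the directory can't be read. */
static int list_tids(pid_t pid, pid_t *out, int cap)
{
    char dir[64];
    struct dirent *e;
    int n = 0;
    DIR *d;

    snprintf(dir, sizeof dir, "/proc/%d/task", (int) pid);
    d = opendir(dir);
    if (d == NULL)
        return -1;
    while (n < cap && (e = readdir(d)) != NULL) {
        if (e->d_name[0] == '.')          /* skip "." and ".." */
            continue;
        out[n++] = (pid_t) atoi(e->d_name);
    }
    closedir(d);
    return n;
}

/* Write val to a control file in one open/write/close cycle, like
 * "echo val > path" in the shell. Returns 0 on success, -1 on error. */
static int write_ctl(const char *path, const char *val)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (f == NULL)
        return -1;
    ok = fputs(val, f) != EOF;
    if (fclose(f) != 0)                   /* short values are flushed here */
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical helper: move every thread of pid into the cgroup
 * directory cg (e.g. "/containers/1"), then freeze that cgroup. */
static int freeze_process(const char *cg, pid_t pid)
{
    pid_t tids[256];
    char path[512], buf[32];
    int n;

    assert(cg != NULL);
    n = list_tids(pid, tids, 256);
    if (n <= 0)
        return -1;
    snprintf(path, sizeof path, "%s/tasks", cg);
    for (int i = 0; i < n; i++) {         /* one TID per write */
        snprintf(buf, sizeof buf, "%d\n", (int) tids[i]);
        if (write_ctl(path, buf) != 0)
            return -1;
    }
    snprintf(path, sizeof path, "%s/freezer.state", cg);
    return write_ctl(path, "FROZEN\n");
}
```

For example, freeze_process("/containers/1", 3982) would mirror the for loop and the freezer write above. The TIDs go in one at a time because, in cgroup v1, each write to tasks moves a single thread, so a multi-threaded process needs one write per thread.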
After thawing and killing off the running instance, I attempt to restart:
./restart -d < checkpoint_out
...
<4030>c/r read input 16384
<4030>c/r read input 16384
<4030>c/r read input 12789
<4030>c/r read input 0
<4029>restart succeeded
<4029>SIGCHLD: already collected
<4029>task terminated with signal 11
<4029>c/r succeeded
The tail end of the syslog also contains:
[ 3210.327177] [4029:4029:c/r:do_restart:1451] sys_restart returns 0
[ 3210.327190] [4033:4033:c/r:wait_task_sync:919] task sync done (errno 0)
[ 3210.327192] [4033:4033:c/r:clear_task_ctx:852] task 4033 clear checkpoint_ctx
[ 3210.327194] [4033:4033:c/r:do_restart:1451] sys_restart returns -516
[ 3210.327227] pthread_test[4033]: segfault at b781f424 ip b781f424 sp b75cc1c0 error 4
[ 3210.330254] [4031:4031:c/r:wait_task_sync:919] task sync done (errno 0)
[ 3210.330257] [4031:4031:c/r:clear_task_ctx:852] task 4031 clear checkpoint_ctx
[ 3210.330259] [4031:4031:c/r:restore_debug_free:144] 4 tasks registered, nr_tasks was 0 nr_total 0
[ 3210.330261] [4031:4031:c/r:restore_debug_free:147] active pid was 2, ctx->errno 0
[ 3210.330263] [4031:4031:c/r:restore_debug_free:149] kflags 22 uflags 0 oflags 1
[ 3210.330265] [4031:4031:c/r:restore_debug_free:151] task[0] to run 4031
[ 3210.330267] [4031:4031:c/r:restore_debug_free:151] task[1] to run 4033
[ 3210.330269] [4031:4031:c/r:restore_debug_free:176] pid 4029 type Coord state Success
[ 3210.330272] [4031:4031:c/r:restore_debug_free:176] pid 4031 type Root state Success
[ 3210.330274] [4031:4031:c/r:restore_debug_free:176] pid 4033 type Task state Success
[ 3210.330276] [4031:4031:c/r:restore_debug_free:176] pid 4032 type Ghost state Success
[ 3210.330285] [4031:4031:c/r:pgarr_release_pages:102] total pages 0
[ 3210.330288] [4031:4031:c/r:do_restart:1451] sys_restart returns -512
Any thoughts?
best regards,
JP
* Re: multi-threaded app fails to restart
[not found] ` <AANLkTilxfsYGyYLwO__VmDLSFQ_s_Qe03G49kIEztVja-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-07-19 19:54 ` Nathan Lynch
2010-07-19 20:27 ` John Paul Walters
0 siblings, 1 reply; 15+ messages in thread
From: Nathan Lynch @ 2010-07-19 19:54 UTC (permalink / raw)
To: John Paul Walters; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Mon, 2010-07-19 at 15:36 -0400, John Paul Walters wrote:
> [ 3210.327227] pthread_test[4033]: segfault at b781f424 ip b781f424 sp b75cc1c0 error 4
> [ 3210.330254] [4031:4031:c/r:wait_task_sync:919] task sync done (errno 0)
> [ 3210.330257] [4031:4031:c/r:clear_task_ctx:852] task 4031 clear checkpoint_ctx
> [ 3210.330259] [4031:4031:c/r:restore_debug_free:144] 4 tasks registered, nr_tasks was 0 nr_total 0
> [ 3210.330261] [4031:4031:c/r:restore_debug_free:147] active pid was 2, ctx->errno 0
> [ 3210.330263] [4031:4031:c/r:restore_debug_free:149] kflags 22 uflags 0 oflags 1
> [ 3210.330265] [4031:4031:c/r:restore_debug_free:151] task[0] to run 4031
> [ 3210.330267] [4031:4031:c/r:restore_debug_free:151] task[1] to run 4033
> [ 3210.330269] [4031:4031:c/r:restore_debug_free:176] pid 4029 type Coord state Success
> [ 3210.330272] [4031:4031:c/r:restore_debug_free:176] pid 4031 type Root state Success
> [ 3210.330274] [4031:4031:c/r:restore_debug_free:176] pid 4033 type Task state Success
> [ 3210.330276] [4031:4031:c/r:restore_debug_free:176] pid 4032 type Ghost state Success
> [ 3210.330285] [4031:4031:c/r:pgarr_release_pages:102] total pages 0
> [ 3210.330288] [4031:4031:c/r:do_restart:1451] sys_restart returns -512
>
> Any thoughts?

There were two patches posted to the containers list on 11 July - "fix
task tree traversal for threads" and "save/restore 'sysenter_return' for
threads". Can you try with those on top of ckpt-v22-dev?
* Re: multi-threaded app fails to restart
2010-07-19 19:54 ` Nathan Lynch
@ 2010-07-19 20:27 ` John Paul Walters
[not found] ` <AANLkTimpXSXQr1wew1wvZKnBFsOXD7f2tblY4EGmJoFM-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: John Paul Walters @ 2010-07-19 20:27 UTC (permalink / raw)
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

>> Ghost state Success
>> [ 3210.330285] [4031:4031:c/r:pgarr_release_pages:102] total pages 0
>> [ 3210.330288] [4031:4031:c/r:do_restart:1451] sys_restart returns -512
>>
>> Any thoughts?
>
> There were two patches posted to the containers list on 11 July - "fix
> task tree traversal for threads" and "save/restore 'sysenter_return' for
> threads". Can you try with those on top of ckpt-v22-dev?

Hi Nathan,

Thanks for your help. I applied the two patches as you suggested.
They fixed the first of the two bad sys_restart return values, but the
final one (quoted above, for what it's worth) still returns -512.
When I use the -d -v switches to restart, it appears to work (no error
messages are returned), but only the main thread is restored while the
second thread is not.

JP
* Re: multi-threaded app fails to restart
[not found] ` <AANLkTimpXSXQr1wew1wvZKnBFsOXD7f2tblY4EGmJoFM-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-07-20 3:24 ` Oren Laadan
[not found] ` <4C4516DD.1000809-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Oren Laadan @ 2010-07-20 3:24 UTC (permalink / raw)
To: John Paul Walters; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On 07/19/2010 04:27 PM, John Paul Walters wrote:
>>> Ghost state Success
>>> [ 3210.330285] [4031:4031:c/r:pgarr_release_pages:102] total pages 0
>>> [ 3210.330288] [4031:4031:c/r:do_restart:1451] sys_restart returns -512
>>>
>>> Any thoughts?
>>
>> There were two patches posted to the containers list on 11 July - "fix
>> task tree traversal for threads" and "save/restore 'sysenter_return' for
>> threads". Can you try with those on top of ckpt-v22-dev?
>
> Hi Nathan,
>
> Thanks for your help. I applied the two patches as you suggested.
> They fixed the first of the two bad sys_restart return values, but the
> final one (quoted above, for what it's worth) still returns -512.
> When I use the -d -v switches to restart, it appears to work (no error
> messages are returned), but only the main thread is restored while the
> second thread is not.

Hi John,

I just pushed a few more fixes related to signals to ckpt-v22-dev.
Can you please see if they fix your problem?

Also, can you please post the test program that you are using, so
we can try to replicate the problem?

Note that it is usually OK for sys_restart() to return -512: it
means that the process/thread was interrupted when the checkpoint
was taken, and it will now retry the same syscall from that point.

You can use the -F (--freezer) switch of restart(1) to freeze the
restarted tasks/threads before they are allowed to run in userspace.
Using it, you can tell whether the other thread dies immediately
after restart or is not restarted at all.

Thanks,

Oren.
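The negative numbers in these logs (-512 here, -516 for pid 4033 in the first log, and -4 later in the thread) are kernel return codes. Below is a small decoding sketch. The numeric values are copied from the kernel-internal include/linux/errno.h, which is not exported to userspace headers, so they are hard-coded under hypothetical K_* names; -4 is the ordinary EINTR. The ERESTART* codes are not meant to reach a program: on the way back to user space the kernel either restarts the interrupted syscall or turns the code into -EINTR, which is consistent with the point above that -512 is usually harmless.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Values from the kernel-internal include/linux/errno.h (plus EINTR = 4).
 * The K_ prefix is hypothetical, to avoid clashing with userspace errno.h. */
enum {
    K_EINTR                 = 4,
    K_ERESTARTSYS           = 512,
    K_ERESTARTNOINTR        = 513,
    K_ERESTARTNOHAND        = 514,
    K_ENOIOCTLCMD           = 515,   /* not a restart code */
    K_ERESTART_RESTARTBLOCK = 516,
};

/* Map a sys_restart-style return value (0 or -errno) to a symbolic name. */
static const char *restart_ret_name(long ret)
{
    switch (-ret) {
    case 0:                       return "success";
    case K_EINTR:                 return "EINTR";
    case K_ERESTARTSYS:           return "ERESTARTSYS";
    case K_ERESTARTNOINTR:        return "ERESTARTNOINTR";
    case K_ERESTARTNOHAND:        return "ERESTARTNOHAND";
    case K_ERESTART_RESTARTBLOCK: return "ERESTART_RESTARTBLOCK";
    default:                      return "unknown";
    }
}

/* True for the "interrupted syscall" family that the kernel resolves
 * (restart the syscall, or return -EINTR) before reaching user space. */
static bool is_syscall_restart_code(long ret)
{
    long e = -ret;

    return (e >= K_ERESTARTSYS && e <= K_ERESTARTNOHAND) ||
           e == K_ERESTART_RESTARTBLOCK;
}
```

So -512 decodes to ERESTARTSYS and -516 to ERESTART_RESTARTBLOCK, both in the restart family, while -4 is a plain EINTR that does reach the caller.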
* Re: multi-threaded app fails to restart
[not found] ` <4C4516DD.1000809-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
@ 2010-07-20 18:58 ` John Paul Walters
[not found] ` <AANLkTimPENgm-LSh6iMv2uxegRdHEivbGMTYmEfiOEJG-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: John Paul Walters @ 2010-07-20 18:58 UTC (permalink / raw)
To: Oren Laadan; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

>
> Hi John,
>
> I just pushed a few more fixes related to signals to ckpt-v22-dev.
> Can you please see if they fix your problem ?
>
> Also, can you please post the test program that you are using, so
> we can try to replicate the problem ?
>
> Note that it is usually ok for sys_restart() to return -512 -- it
> means that the process/thread was interrupted when the checkpoint,
> and it will now retry the same syscall from then.
>
> You can use the -F (--freezer) switch of restart(1) to freeze the
> restarted tasks/threads before they are allowed to run in userspace.
> Using it you can tell whether the other thread dies immediately
> after restart, or is not at all restarted.
>
> Thanks,
>
> Oren.
>

Hi Oren,

I grabbed the most recent v22-dev that includes the updates. I'm
still experiencing the same issue. Testing with -F indicates that the
second thread isn't being restarted. The code that I'm using is:

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <sys/syscall.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

#define OUTFILE "/tmp/cr-self.out"

void *
func (void *arg)
{
    FILE *file;
    int counter = 0;

    file = fopen(OUTFILE, "w+");

    while (1){
        sleep(2);
        counter++;
        fprintf(file, "Count %d\n", counter);
        fflush(file);
    }

    return NULL;
}

int
main (int argc, char **argv)
{
    pthread_t thread;
    close (0);
    close (1);
    close (2);
    unlink (OUTFILE);

    pthread_create(&thread, NULL, func, NULL);
    pthread_join(thread, NULL);
    return 0;
}

Thanks for your help,
JP
* Re: multi-threaded app fails to restart
[not found] ` <AANLkTimPENgm-LSh6iMv2uxegRdHEivbGMTYmEfiOEJG-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-07-20 23:12 ` Oren Laadan
[not found] ` <Pine.LNX.4.64.1007201906370.15255-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Oren Laadan @ 2010-07-20 23:12 UTC (permalink / raw)
To: John Paul Walters; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Hi John,

In your program, it is a thread of the root task (of the hierarchy)
that is missed. Indeed, the previous patch was incomplete: it did
fix the non-root-threads case but broke the root-threads case.
That was silly... Can you try this little patch:

Thanks for following up; it was very helpful!

Oren.

---
diff --git a/kernel/checkpoint/sys.c b/kernel/checkpoint/sys.c
index 171c867..3288af0 100644
--- a/kernel/checkpoint/sys.c
+++ b/kernel/checkpoint/sys.c
@@ -605,13 +605,13 @@ int walk_task_subtree(struct task_struct *root,
 			continue;
 		}
 
+		/* if not last thread - proceed with thread */
+		task = next_thread(task);
+		if (!thread_group_leader(task))
+			continue;
+
 		/* by definition, skip siblings of root */
 		while (task != root) {
-			/* if not last thread - proceed with thread */
-			task = next_thread(task);
-			if (!thread_group_leader(task))
-				break;
-
 			/* if has sibling - proceed with sibling */
 			if (!list_is_last(&task->sibling, &parent->children)) {
 				task = list_entry(task->sibling.next,
---
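To see why the root's threads were missed, here is a simplified model of the traversal. It is a hypothetical sketch, not the kernel's walk_task_subtree(): there is no locking, child processes hang off the thread-group leader only, and recursion replaces the kernel's iterative parent/sibling walk. What it keeps is the property the patch restores: after a task is visited, the remaining threads of its group are visited next, including when that task is the root. Before the patch, the next_thread() step sat inside the while (task != root) loop, so it never ran for the root itself, which matches the diagnosis above.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 8

/* Hypothetical, heavily simplified stand-in for struct task_struct. */
struct task {
    int pid;
    struct task *next_thread;             /* circular thread-group list;
                                             points to itself if single-threaded */
    struct task *children[MAX_CHILDREN];  /* child processes (their leaders) */
    size_t nr_children;
};

/* Depth-first walk from a thread-group leader: record the leader, then
 * every other thread in its group, then recurse into each child process.
 * Stores at most cap pids in out and returns how many it stored. */
static size_t walk_subtree(const struct task *leader, int *out, size_t cap)
{
    const struct task *t = leader;
    size_t n = 0;

    assert(leader != NULL && leader->next_thread != NULL);

    /* The thread loop runs for every leader, the root included: this is
     * the step that the pre-patch code skipped for the root. */
    do {
        if (n < cap)
            out[n++] = t->pid;
        t = t->next_thread;
    } while (t != leader);

    for (size_t i = 0; i < leader->nr_children; i++)
        n += walk_subtree(leader->children[i], out + n, cap - n);

    return n;
}
```

For a root leader 3582 with a second thread 3583 (the layout shown in the restart -v output later in the thread), this walk records 3582 and then 3583; with the thread step guarded by task != root, only 3582 would be recorded.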
* Re: multi-threaded app fails to restart
[not found] ` <Pine.LNX.4.64.1007201906370.15255-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
@ 2010-07-21 0:03 ` John Paul Walters
[not found] ` <AANLkTinZYiWPtSegjRJWnlc6hipFAZyujr8-2ug6ettF-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: John Paul Walters @ 2010-07-21 0:03 UTC (permalink / raw)
To: Oren Laadan; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Tue, Jul 20, 2010 at 7:12 PM, Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org> wrote:
>
> Hi John
>
> In your program, it is a thread of the root task (of the hierarchy)
> that is missed. Indeed the previous patch was incomplete - it did
> fix the non-root-threads case but spoiled the root-threads case.
> That was silly... well, can you try this little patch:
>
> Thanks for following up, was very helpful !
>
> Oren.

Hi Oren,

I'm still unable to fully restart the application with your patch, but
the result is now different. If I attempt to restart using --pidns
and -F, both threads are created and frozen. However, as soon as I
thaw them, I get a segfault. If I attempt to restart them without the
--pidns option, I get a message from restart indicating that it's
about to call sys_restart, and then restart hangs. I also have the
following in my syslog:

[ 1482.348060] [3753:3753:c/r:walk_task_subtree:633] total 2 ret 1
[ 1482.348060] [3753:3753:c/r:prepare_descendants:1148] nr 2/2
[ 1482.348060] [3753:3753:c/r:do_restore_coord:1320] restore prepare: 2
[ 1541.864073] [err -512][pos 419][E @ do_ghost_task:973]ghost restart failed
[ 1541.864343] [err -512][pos 419][E @ do_restore_task:1084]task restart failed
[ 1541.864346] [3755:3755:c/r:clear_task_ctx:852] task 3755 clear checkpoint_ctx
[ 1541.864349] [3755:3755:c/r:do_restart:1444] restart err -4, exiting
[ 1541.864352] [3755:3755:c/r:do_restart:1451] sys_restart returns -4
[ 1541.864366] [3757:3757:c/r:wait_checkpoint_ctx:938] wait_checkpoint_ctx: failed (-512)
[ 1541.864368] [3757:3757:c/r:do_restart:1444] restart err -4, exiting
[ 1541.864371] [3757:3757:c/r:do_restart:1451] sys_restart returns -4
[ 1541.864689] [3753:3753:c/r:wait_all_tasks_finish:1173] final sync kflags 0x1a (ret 0)
[ 1541.864692] [3753:3753:c/r:do_restore_coord:1325] restore finish: 0
[ 1541.864694] [3753:3753:c/r:do_restore_coord:1331] restore deferqueue: 0
[ 1541.864698] [err -512][pos 419][E @ ckpt_read_obj_type:426]Expecting to read type 9001
[ 1541.864700] [3753:3753:c/r:do_restore_coord:1336] restore tail: -512
[ 1541.864703] [err -512][pos 419][E @ do_restore_coord:1350]restart failed (coordinator)
[ 1541.864706] [3753:3753:c/r:walk_task_subtree:633] total 0 ret 0
[ 1541.864709] [3753:3753:c/r:clear_task_ctx:852] task 3753 clear checkpoint_ctx
[ 1541.864715] [3753:3753:c/r:do_restart:1451] sys_restart returns -4
[ 1541.864718] [3753:3753:c/r:restore_debug_free:144] 3 tasks registered, nr_tasks was 0 nr_total 1
[ 1541.864721] [3753:3753:c/r:restore_debug_free:147] active pid was 0, ctx->errno -512
[ 1541.864723] [3753:3753:c/r:restore_debug_free:149] kflags 26 uflags 0 oflags 1
[ 1541.864726] [3753:3753:c/r:restore_debug_free:151] task[0] to run 3755
[ 1541.864728] [3753:3753:c/r:restore_debug_free:151] task[1] to run 3757
[ 1541.864731] [3753:3753:c/r:restore_debug_free:176] pid 3753 type Coord state Failed
[ 1541.864735] [3753:3753:c/r:restore_debug_free:176] pid 3755 type Root state Failed
[ 1541.864737] [3753:3753:c/r:restore_debug_free:176] pid 3756 type Ghost state Failed

thanks,
JP
* Re: multi-threaded app fails to restart
[not found] ` <AANLkTinZYiWPtSegjRJWnlc6hipFAZyujr8-2ug6ettF-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-07-21 5:54 ` Oren Laadan
[not found] ` <Pine.LNX.4.64.1007210143120.22870-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Oren Laadan @ 2010-07-21 5:54 UTC (permalink / raw)
To: John Paul Walters; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Tue, 20 Jul 2010, John Paul Walters wrote:

> On Tue, Jul 20, 2010 at 7:12 PM, Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org> wrote:
> >
> > Hi John
> >
> > In your program, it is a thread of the root task (of the hierarchy)
> > that is missed. Indeed the previous patch was incomplete - it did
> > fix the non-root-threads case but spoiled the root-threads case.
> > That was silly... well, can you try this little patch:
> >
> > Thanks for following up, was very helpful !
> >
> > Oren.
>
> Hi Oren,
>
> I'm still unable to fully restart the application with your patch, but
> the result is now different. If I attempt to restart using --pidns
> and -F, both threads are created and frozen. However, as soon as I
> thaw them I get a segfault. If I attempt to restart them without the
> --pidns option, I get a message from restart indicating that it's
> about to call sys_restart and restart hangs. I also have the
> following in my syslog:

Hi John,

I assume the log below is for the --no-pidns case, right?
Can you also post the output of 'restart -vd ...'?
(Unfortunately, I won't have a chance to try it until the weekend.)

Thanks,

Oren.

> [ 1482.348060] [3753:3753:c/r:walk_task_subtree:633] total 2 ret 1
> [ 1482.348060] [3753:3753:c/r:prepare_descendants:1148] nr 2/2
> [ 1482.348060] [3753:3753:c/r:do_restore_coord:1320] restore prepare: 2
> [ 1541.864073] [err -512][pos 419][E @ do_ghost_task:973]ghost restart failed
> [ 1541.864343] [err -512][pos 419][E @ do_restore_task:1084]task restart failed
> [ 1541.864346] [3755:3755:c/r:clear_task_ctx:852] task 3755 clear checkpoint_ctx
> [ 1541.864349] [3755:3755:c/r:do_restart:1444] restart err -4, exiting
> [ 1541.864352] [3755:3755:c/r:do_restart:1451] sys_restart returns -4
> [ 1541.864366] [3757:3757:c/r:wait_checkpoint_ctx:938] wait_checkpoint_ctx: failed (-512)
> [ 1541.864368] [3757:3757:c/r:do_restart:1444] restart err -4, exiting
> [ 1541.864371] [3757:3757:c/r:do_restart:1451] sys_restart returns -4
> [ 1541.864689] [3753:3753:c/r:wait_all_tasks_finish:1173] final sync kflags 0x1a (ret 0)
> [ 1541.864692] [3753:3753:c/r:do_restore_coord:1325] restore finish: 0
> [ 1541.864694] [3753:3753:c/r:do_restore_coord:1331] restore deferqueue: 0
> [ 1541.864698] [err -512][pos 419][E @ ckpt_read_obj_type:426]Expecting to read type 9001
> [ 1541.864700] [3753:3753:c/r:do_restore_coord:1336] restore tail: -512
> [ 1541.864703] [err -512][pos 419][E @ do_restore_coord:1350]restart failed (coordinator)
> [ 1541.864706] [3753:3753:c/r:walk_task_subtree:633] total 0 ret 0
> [ 1541.864709] [3753:3753:c/r:clear_task_ctx:852] task 3753 clear checkpoint_ctx
> [ 1541.864715] [3753:3753:c/r:do_restart:1451] sys_restart returns -4
> [ 1541.864718] [3753:3753:c/r:restore_debug_free:144] 3 tasks registered, nr_tasks was 0 nr_total 1
> [ 1541.864721] [3753:3753:c/r:restore_debug_free:147] active pid was 0, ctx->errno -512
> [ 1541.864723] [3753:3753:c/r:restore_debug_free:149] kflags 26 uflags 0 oflags 1
> [ 1541.864726] [3753:3753:c/r:restore_debug_free:151] task[0] to run 3755
> [ 1541.864728] [3753:3753:c/r:restore_debug_free:151] task[1] to run 3757
> [ 1541.864731] [3753:3753:c/r:restore_debug_free:176] pid 3753 type Coord state Failed
> [ 1541.864735] [3753:3753:c/r:restore_debug_free:176] pid 3755 type Root state Failed
> [ 1541.864737] [3753:3753:c/r:restore_debug_free:176] pid 3756 type Ghost state Failed
>
> thanks,
> JP

_______________________________________________
Containers mailing list
Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
https://lists.linux-foundation.org/mailman/listinfo/containers
* Re: multi-threaded app fails to restart
[not found] ` <Pine.LNX.4.64.1007210143120.22870-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
@ 2010-07-21 12:52 ` John Paul Walters
[not found] ` <AANLkTinOFIzK8RZnp9NHouKv-WA7Omr08pPTGfrfVLfP-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: John Paul Walters @ 2010-07-21 12:52 UTC (permalink / raw)
To: Oren Laadan; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

>>
>> Hi Oren,
>>
>> I'm still unable to fully restart the application with your patch, but
>> the result is now different. If I attempt to restart using --pidns
>> and -F, both threads are created and frozen. However, as soon as I
>> thaw them I get a segfault. If I attempt to restart them without the
>> --pidns option, I get a message from restart indicating that it's
>> about to call sys_restart and restart hangs. I also have the
>> following in my syslog:
>
> Hi John,
>
> I assume the log below is for the --no-pidns case, right ?
> Can you also post the output of 'restart -vd ...' ?
> (Unfortunately I won't have a chance to try it until the weekend)
>

Hi Oren,

That's correct, the original log was for the --no-pidns case. Below
I've included the restart log up to the point where it hangs at
sys_restart. Thanks again for all of your help.

best,
JP

./restart -v -d --no-pidns < checkpoint_out
<4124>number of tasks: 2
<4124>number of vpids: 0
<4124>total tasks (including ghosts): 3
<4124>pid 3583: thread tgid 3582
<4124>pid 3583: creator set to 3582
<4124>pid 1: propagate session 3582
<4124>pid 1: creator set to 3582
<4124>pid 1: set session
<4124>pid 1: moving up to 3582
<4124>====== TASKS
<4124> [0] pid 3582 ppid 3349 sid 0 creator 0
<4124> [1] pid 3583 ppid 3349 sid 0 creator 3582 prev 1 T
<4124> [2] pid 1 ppid 3582 sid 3582 creator 3582 next 3583 S G
<4124>............
<4124>task[0].vidx = -1
<4124>task[1].vidx = -1
<4124>subtree (existing pidns)
<4124>forking child vpid 3582 flags 0x1
<4124>task 3582 forking with flags 11 numpids 1
<4124>task 3582 pid[0]=0
<4124>forked child vpid 4126 (asked 3582)
<4126>root task pid 4126
<4126>pid 3582: pid 4126 sid 3386 parent 4124
<4126>pid 3582: fork child 1 with session
<4126>forking child vpid 1 flags 0x12
<4126>task 1 forking with flags 11 numpids 1
<4126>task 1 pid[0]=0
<4126>forked child vpid 4127 (asked 1)
<4126>pid 3582: fork child 3583 without session
<4126>forking child vpid 3583 flags 0x4
<4126>task 3583 forking with flags 10911 numpids 1
<4126>task 3583 pid[0]=0
<4126>forked child vpid 4128 (asked 3583)
<4126>about to call sys_restart(), flags 0
<4125>====== PIDS ARRAY
<4125>[0] pid 3582 ppid 1 sid 1 pgid 3582
<4125>[1] pid 3583 ppid 1 sid 1 pgid 3582
<4125>............
<4125>c/r swap old 3582 new 4126
<4128>pid 3583: pid 4128 sid 3386 parent 4124
<4128>about to call sys_restart(), flags 0
<4125>c/r swap old 3583 new 4128
<4127>pid 1: pid 4127 sid 3386 parent 4126
<4125>c/r swap old 1 new 4127
<4125>====== PIDS ARRAY (swaped)
<4125>[0] pid 4126 ppid 1 sid 4127 pgid 4126
<4125>[1] pid 4128 ppid 1 sid 4127 pgid 4126
<4125>............
<4125>c/r read input 16384
<4127>about to call sys_restart(), flags 0x4
<4125>c/r read input 16384
<4125>c/r read input 16384
<4125>c/r read input 16384
<4125>c/r read input 16384
* Re: multi-threaded app fails to restart
[not found] ` <AANLkTinOFIzK8RZnp9NHouKv-WA7Omr08pPTGfrfVLfP-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-07-22 1:04 ` Oren Laadan
[not found] ` <Pine.LNX.4.64.1007212102010.6257-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Oren Laadan @ 2010-07-22 1:04 UTC (permalink / raw)
To: John Paul Walters; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

[-- Attachment #1: Type: TEXT/PLAIN, Size: 7585 bytes --]

Hi John,

This is a bit embarrassing, the behavior sounds too familiar --
please try the following patch:

--
diff --git a/arch/x86/kernel/checkpoint.c b/arch/x86/kernel/checkpoint.c
index 3fb9deb..b770f70 100644
--- a/arch/x86/kernel/checkpoint.c
+++ b/arch/x86/kernel/checkpoint.c
@@ -104,7 +104,7 @@ int checkpoint_thread(struct ckpt_ctx *ctx, struct task_struct *t)
 	h->gdt_entry_tls_entries = GDT_ENTRY_TLS_ENTRIES;
 	h->sizeof_tls_array = tls_size;
 	h->sysenter_return = (__u64) (unsigned long)
-		task_thread_info(current)->sysenter_return;
+		task_thread_info(t)->sysenter_return;
 
 	/* For simplicity dump the entire array */
 	memcpy(h + 1, t->thread.tls_array, tls_size);
--

On Wed, 21 Jul 2010, John Paul Walters wrote:

> >>
> >> Hi Oren,
> >>
> >> I'm still unable to fully restart the application with your patch, but
> >> the result is now different. If I attempt to restart using --pidns
> >> and -F, both threads are created and frozen. However, as soon as I
> >> thaw them I get a segfault. If I attempt to restart them without the
> >> --pidns option, I get a message from restart indicating that it's
> >> about to call sys_restart and restart hangs. I also have the
> >> following in my syslog:
> >
> > Hi John,
> >
> > I assume the log below is for the --no-pidns case, right ?
> > Can you also post the output of 'restart -vd ...' ?
> > (Unfortunately I won't have a chance to try it until the weekend)
> >
>
> Hi Oren,
>
> That's correct, the original log was for the --no-pidns case. Below
> I've included the restart log up to the point where it hangs at
> sys_restart. Thanks again for all of your help.
>
> best,
> JP
>
> ./restart -v -d --no-pidns < checkpoint_out
> <4124>number of tasks: 2
> <4124>number of vpids: 0
> <4124>total tasks (including ghosts): 3
> <4124>pid 3583: thread tgid 3582
> <4124>pid 3583: creator set to 3582
> <4124>pid 1: propagate session 3582
> <4124>pid 1: creator set to 3582
> <4124>pid 1: set session
> <4124>pid 1: moving up to 3582
> <4124>====== TASKS
> <4124> [0] pid 3582 ppid 3349 sid 0 creator 0
> <4124> [1] pid 3583 ppid 3349 sid 0 creator 3582 prev 1 T
> <4124> [2] pid 1 ppid 3582 sid 3582 creator 3582 next 3583 S G
> <4124>............
> <4124>task[0].vidx = -1
> <4124>task[1].vidx = -1
> <4124>subtree (existing pidns)
> <4124>forking child vpid 3582 flags 0x1
> <4124>task 3582 forking with flags 11 numpids 1
> <4124>task 3582 pid[0]=0
> <4124>forked child vpid 4126 (asked 3582)
> <4126>root task pid 4126
> <4126>pid 3582: pid 4126 sid 3386 parent 4124
> <4126>pid 3582: fork child 1 with session
> <4126>forking child vpid 1 flags 0x12
> <4126>task 1 forking with flags 11 numpids 1
> <4126>task 1 pid[0]=0
> <4126>forked child vpid 4127 (asked 1)
> <4126>pid 3582: fork child 3583 without session
> <4126>forking child vpid 3583 flags 0x4
> <4126>task 3583 forking with flags 10911 numpids 1
> <4126>task 3583 pid[0]=0
> <4126>forked child vpid 4128 (asked 3583)
> <4126>about to call sys_restart(), flags 0
> <4125>====== PIDS ARRAY
> <4125>[0] pid 3582 ppid 1 sid 1 pgid 3582
> <4125>[1] pid 3583 ppid 1 sid 1 pgid 3582
> <4125>............
> <4125>c/r swap old 3582 new 4126
> <4128>pid 3583: pid 4128 sid 3386 parent 4124
> <4128>about to call sys_restart(), flags 0
> <4125>c/r swap old 3583 new 4128
> <4127>pid 1: pid 4127 sid 3386 parent 4126
> <4125>c/r swap old 1 new 4127
> <4125>====== PIDS ARRAY (swaped)
> <4125>[0] pid 4126 ppid 1 sid 4127 pgid 4126
> <4125>[1] pid 4128 ppid 1 sid 4127 pgid 4126
> <4125>............
> <4125>c/r read input 16384
> <4127>about to call sys_restart(), flags 0x4
> <4125>c/r read input 16384
> <4125>c/r read input 16384
> <4125>c/r read input 16384
> <4125>c/r read input 16384
>
>
>
>
>
>
> > Thanks,
> >
> > Oren.
> >
> >>
> >>
> >> [ 1482.348060] [3753:3753:c/r:walk_task_subtree:633] total 2 ret 1
> >> [ 1482.348060] [3753:3753:c/r:prepare_descendants:1148] nr 2/2
> >> [ 1482.348060] [3753:3753:c/r:do_restore_coord:1320] restore prepare: 2
> >> [ 1541.864073] [err -512][pos 419][E @ do_ghost_task:973]ghost restart failed
> >> [ 1541.864343] [err -512][pos 419][E @ do_restore_task:1084]task restart failed
> >> [ 1541.864346] [3755:3755:c/r:clear_task_ctx:852] task 3755 clear checkpoint_ctx
> >> [ 1541.864349] [3755:3755:c/r:do_restart:1444] restart err -4, exiting
> >> [ 1541.864352] [3755:3755:c/r:do_restart:1451] sys_restart returns -4
> >> [ 1541.864366] [3757:3757:c/r:wait_checkpoint_ctx:938]
> >> wait_checkpoint_ctx: failed (-512)
> >> [ 1541.864368] [3757:3757:c/r:do_restart:1444] restart err -4, exiting
> >> [ 1541.864371] [3757:3757:c/r:do_restart:1451] sys_restart returns -4
> >> [ 1541.864689] [3753:3753:c/r:wait_all_tasks_finish:1173] final sync
> >> kflags 0x1a (ret 0)
> >> [ 1541.864692] [3753:3753:c/r:do_restore_coord:1325] restore finish: 0
> >> [ 1541.864694] [3753:3753:c/r:do_restore_coord:1331] restore deferqueue: 0
> >> [ 1541.864698] [err -512][pos 419][E @
> >> ckpt_read_obj_type:426]Expecting to read type 9001
> >> [ 1541.864700] [3753:3753:c/r:do_restore_coord:1336] restore tail: -512
> >> [ 1541.864703] [err -512][pos 419][E @ do_restore_coord:1350]restart
> >> failed (coordinator)
> >> [ 1541.864706] [3753:3753:c/r:walk_task_subtree:633] total 0 ret 0
> >> [ 1541.864709] [3753:3753:c/r:clear_task_ctx:852] task 3753 clear checkpoint_ctx
> >> [ 1541.864715] [3753:3753:c/r:do_restart:1451] sys_restart returns -4
> >> [ 1541.864718] [3753:3753:c/r:restore_debug_free:144] 3 tasks
> >> registered, nr_tasks was 0 nr_total 1
> >> [ 1541.864721] [3753:3753:c/r:restore_debug_free:147] active pid was
> >> 0, ctx->errno -512
> >> [ 1541.864723] [3753:3753:c/r:restore_debug_free:149] kflags 26 uflags
> >> 0 oflags 1
> >> [ 1541.864726] [3753:3753:c/r:restore_debug_free:151] task[0] to run 3755
> >> [ 1541.864728] [3753:3753:c/r:restore_debug_free:151] task[1] to run 3757
> >> [ 1541.864731] [3753:3753:c/r:restore_debug_free:176] pid 3753 type
> >> Coord state Failed
> >> [ 1541.864735] [3753:3753:c/r:restore_debug_free:176] pid 3755 type
> >> Root state Failed
> >> [ 1541.864737] [3753:3753:c/r:restore_debug_free:176] pid 3756 type
> >> Ghost state Failed
> >>
> >> thanks,
> >> JP
> >>
> >> >
> >> > ---
> >> > diff --git a/kernel/checkpoint/sys.c b/kernel/checkpoint/sys.c
> >> > index 171c867..3288af0 100644
> >> > --- a/kernel/checkpoint/sys.c
> >> > +++ b/kernel/checkpoint/sys.c
> >> > @@ -605,13 +605,13 @@ int walk_task_subtree(struct task_struct *root,
> >> >  			continue;
> >> >  		}
> >> >
> >> > +		/* if not last thread - proceed with thread */
> >> > +		task = next_thread(task);
> >> > +		if (!thread_group_leader(task))
> >> > +			continue;
> >> > +
> >> >  		/* by definition, skip siblings of root */
> >> >  		while (task != root) {
> >> > -			/* if not last thread - proceed with thread */
> >> > -			task = next_thread(task);
> >> > -			if (!thread_group_leader(task))
> >> > -				break;
> >> > -
> >> >  			/* if has sibling - proceed with sibling */
> >> >  			if (!list_is_last(&task->sibling, &parent->children)) {
> >> >  				task = list_entry(task->sibling.next,
> >> > ---
> >>
> >>
> >

[-- Attachment #2: Type: text/plain, Size: 206 bytes --]

_______________________________________________
Containers mailing list
Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
https://lists.linux-foundation.org/mailman/listinfo/containers

^ permalink raw reply related [flat|nested] 15+ messages in thread
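Oren's 7/22 patch, which John reports fixed the --pidns segfault, changes a single argument: `checkpoint_thread()` now reads `sysenter_return` from the thread being saved (`t`) rather than from `current`, which in this code path is presumably the task running the checkpoint. A minimal userspace sketch of that bug class follows. The `sx_*` structs and functions are hypothetical stand-ins, not the kernel's `struct thread_info` or `checkpoint_thread()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for the kernel structures involved; these are
 * NOT the real struct thread_info / struct task_struct definitions.
 */
struct sx_thread_info {
	unsigned long sysenter_return;
};

struct sx_task {
	int pid;
	struct sx_thread_info info;
};

/* Hypothetical stand-in for the per-thread checkpoint header. */
struct sx_thread_hdr {
	uint64_t sysenter_return;
};

/*
 * Before the fix: the value is read from the task doing the checkpoint
 * ("caller" plays the role of current), so every saved thread gets the
 * caller's value instead of its own.
 */
static void sx_save_before_fix(struct sx_thread_hdr *h,
			       const struct sx_task *caller,
			       const struct sx_task *t)
{
	assert(h != NULL && caller != NULL && t != NULL);
	h->sysenter_return = (uint64_t)caller->info.sysenter_return;
}

/* After the fix: the value is read from the thread being saved. */
static void sx_save_after_fix(struct sx_thread_hdr *h,
			      const struct sx_task *caller,
			      const struct sx_task *t)
{
	assert(h != NULL && caller != NULL && t != NULL);
	h->sysenter_return = (uint64_t)t->info.sysenter_return;
}
```

Given a caller whose value is 0x1000 and a thread whose value is 0x2000, `sx_save_before_fix()` stores 0x1000 in the thread's header while `sx_save_after_fix()` stores 0x2000; only the latter describes the thread that will actually be restarted.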
[parent not found: <Pine.LNX.4.64.1007212102010.6257-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>]
* Re: multi-threaded app fails to restart
[not found] ` <Pine.LNX.4.64.1007212102010.6257-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
@ 2010-07-22 16:23 ` John Paul Walters
[not found] ` <AANLkTimW98q0sFZeCAk3xHsEfBV9yhL4kUKHjNGxn_2P-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: John Paul Walters @ 2010-07-22 16:23 UTC (permalink / raw)
To: Oren Laadan; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Hi Oren,

Thanks for the patch. For the --pidns case, that seems to have solved
the problem. In the case of --no-pidns, restart still hangs as
described before. Should this work in the --no-pidns case, or is
it expected to fail in this case?

JP

On Wed, Jul 21, 2010 at 9:04 PM, Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org> wrote:
> Hi John,
>
> This is a bit embarrassing, the behavior sounds too familiar --
> please try the following patch:
>
> --
> diff --git a/arch/x86/kernel/checkpoint.c b/arch/x86/kernel/checkpoint.c
> index 3fb9deb..b770f70 100644
> --- a/arch/x86/kernel/checkpoint.c
> +++ b/arch/x86/kernel/checkpoint.c
> @@ -104,7 +104,7 @@ int checkpoint_thread(struct ckpt_ctx *ctx, struct task_struct *t)
>  	h->gdt_entry_tls_entries = GDT_ENTRY_TLS_ENTRIES;
>  	h->sizeof_tls_array = tls_size;
>  	h->sysenter_return = (__u64) (unsigned long)
> -		task_thread_info(current)->sysenter_return;
> +		task_thread_info(t)->sysenter_return;
>
>  	/* For simplicity dump the entire array */
>  	memcpy(h + 1, t->thread.tls_array, tls_size);
> --
>
> On Wed, 21 Jul 2010, John Paul Walters wrote:
>
>> >>
>> >> Hi Oren,
>> >>
>> >> I'm still unable to fully restart the application with your patch, but
>> >> the result is now different. If I attempt to restart using --pidns
>> >> and -F, both threads are created and frozen. However, as soon as I
>> >> thaw them I get a segfault.

^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <AANLkTimW98q0sFZeCAk3xHsEfBV9yhL4kUKHjNGxn_2P-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: multi-threaded app fails to restart
[not found] ` <AANLkTimW98q0sFZeCAk3xHsEfBV9yhL4kUKHjNGxn_2P-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-07-26 11:18 ` Oren Laadan
[not found] ` <Pine.LNX.4.64.1007260711310.1050-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Oren Laadan @ 2010-07-26 11:18 UTC (permalink / raw)
To: John Paul Walters; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

[-- Attachment #1: Type: TEXT/PLAIN, Size: 9692 bytes --]

Hi John,

Please try the following patch - it should be applied _instead_ of
the patch I sent on 7/20. The previous patch was still insufficient
when the root task has not only threads, but also a child (the child
was a "ghost" task used temporarily during restart).

I believe this patch correctly addresses the problem, and I tested it
against your program with and without --pidns. I'll wait for your
confirmation before pushing the fix to ckpt-v22-dev.

Thanks !

Oren.

---
diff --git a/kernel/checkpoint/sys.c b/kernel/checkpoint/sys.c
index 171c867..c5517c2 100644
--- a/kernel/checkpoint/sys.c
+++ b/kernel/checkpoint/sys.c
@@ -625,8 +625,11 @@ int walk_task_subtree(struct task_struct *root,
 		}
 
 		/* if we arrive at root again -- done */
-		if (task == root)
-			break;
+		if (task == root) {
+			/* if not last thread - proceed with thread */
+			task = root = next_thread(task);
+			if (thread_group_leader(task))
+				break;
 	}
 	read_unlock(&tasklist_lock);
---

On Thu, 22 Jul 2010, John Paul Walters wrote:

> Hi Oren,
>
> Thanks for the patch. For the --pidns case, that seems to have solved
> the problem. In the case of --no-pidns, restart still hangs as
> described before. Should this work in the --no-pidns case, or is
> it expected to fail in this case?
>
> JP

[-- Attachment #2: Type: text/plain, Size: 206 bytes --]

_______________________________________________
Containers mailing list
Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
https://lists.linux-foundation.org/mailman/listinfo/containers

^ permalink raw reply related [flat|nested] 15+ messages in thread
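Oren's note that the 7/20 patch "was still insufficient when the root task has not only threads, but also a child" can be checked against a small userspace model of that earlier loop ordering. This is a sketch under stated assumptions: the `v1_*` names and struct fields are hypothetical, and the parts of the loop outside the quoted 7/20 hunk are reconstructed from the diff context rather than taken from kernel/checkpoint/sys.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of task links (not struct task_struct). */
struct v1_task {
	int pid;
	struct v1_task *group_leader;  /* leader of this thread group */
	struct v1_task *next_thread;   /* circular list of the group's threads */
	struct v1_task *first_child;   /* first child process, or NULL */
	struct v1_task *next_sibling;  /* next child of the same parent, or NULL */
	struct v1_task *parent;        /* task whose child list holds this one */
};

static int v1_is_leader(const struct v1_task *t)
{
	return t->group_leader == t;
}

/*
 * Walk with the 7/20 ordering: step to the next thread before the
 * sibling/parent loop, and stop as soon as the walk is back at root.
 * Returns the number of tasks visited, or -1 if the callback fails.
 */
static int v1_walk(struct v1_task *root,
		   int (*func)(struct v1_task *, void *), void *data)
{
	struct v1_task *task = root;
	struct v1_task *parent = NULL;
	int total = 0;

	assert(root != NULL && func != NULL);
	for (;;) {
		if (func(task, data) < 0)
			return -1;
		total++;

		/* if has children - proceed with child */
		if (task->first_child) {
			parent = task;
			task = task->first_child;
			continue;
		}

		/* if not last thread - proceed with thread */
		task = task->next_thread;
		if (!v1_is_leader(task))
			continue;

		/* by definition, skip siblings of root */
		while (task != root) {
			if (task->next_sibling) {
				task = task->next_sibling;
				break;
			}
			task = parent;
			parent = parent->parent;
		}

		/* if we arrive at root again -- done (root's threads are skipped) */
		if (task == root)
			break;
	}
	return total;
}

/* Example callback: record visited pids in visit order. */
struct v1_log {
	int pids[16];
	int n;
};

static int v1_record(struct v1_task *t, void *data)
{
	struct v1_log *seen = data;

	if (seen->n >= 16)
		return -1;
	seen->pids[seen->n++] = t->pid;
	return 0;
}
```

With a root leader (pid 10) that has a second thread (pid 11) and one child (pid 12), this ordering visits 10 and 12, then stops when it traces back to the root, so pid 11 is never reached.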
[parent not found: <Pine.LNX.4.64.1007260711310.1050-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>]
* Re: multi-threaded app fails to restart
[not found] ` <Pine.LNX.4.64.1007260711310.1050-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
@ 2010-07-26 17:11 ` Dan Smith
[not found] ` <8739v6tbgj.fsf-FLMGYpZoEPULwtHQx/6qkW3U47Q5hpJU@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Dan Smith @ 2010-07-26 17:11 UTC (permalink / raw)
To: Oren Laadan; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

OL> diff --git a/kernel/checkpoint/sys.c b/kernel/checkpoint/sys.c
OL> index 171c867..c5517c2 100644
OL> --- a/kernel/checkpoint/sys.c
OL> +++ b/kernel/checkpoint/sys.c
OL> @@ -625,8 +625,11 @@ int walk_task_subtree(struct task_struct *root,
OL>  		}

OL>  		/* if we arrive at root again -- done */
OL> -		if (task == root)
OL> -			break;
OL> +		if (task == root) {
OL> +			/* if not last thread - proceed with thread */
OL> +			task = root = next_thread(task);
OL> +			if (thread_group_leader(task))
OL> +				break;

} // Need to close this block

Otherwise it seems to work for me:

Tested-by: Dan Smith <danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>

-- 
Dan Smith
IBM Linux Technology Center
email: danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org

^ permalink raw reply [flat|nested] 15+ messages in thread
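Combining Oren's 7/26 hunk with the closing brace Dan points out gives a traversal that keeps going after it returns to the root: it moves `root` to the next thread in the group and stops only when it wraps back to the thread-group leader. The userspace model below is a sketch, not the kernel function; `v2_task`, `v2_walk()` and `v2_record()` are hypothetical, and the loop body outside the quoted hunks is reconstructed from the diff context.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of the task links walk_task_subtree()
 * follows; field names are illustrative, not struct task_struct members.
 */
struct v2_task {
	int pid;
	struct v2_task *group_leader;  /* leader of this thread group */
	struct v2_task *next_thread;   /* circular list of the group's threads */
	struct v2_task *first_child;   /* first child process, or NULL */
	struct v2_task *next_sibling;  /* next child of the same parent, or NULL */
	struct v2_task *parent;        /* task whose child list holds this one */
};

static int v2_is_leader(const struct v2_task *t)
{
	return t->group_leader == t;
}

/*
 * Depth-first walk with the 7/26 change (brace closed): on returning to
 * root, move root to its next thread and keep walking until the walk
 * wraps around to the thread-group leader again. Returns the number of
 * tasks visited, or -1 if the callback fails.
 */
static int v2_walk(struct v2_task *root,
		   int (*func)(struct v2_task *, void *), void *data)
{
	struct v2_task *task = root;
	struct v2_task *parent = NULL;
	int total = 0;

	assert(root != NULL && func != NULL);
	for (;;) {
		if (func(task, data) < 0)
			return -1;
		total++;

		/* if has children - proceed with child */
		if (task->first_child) {
			parent = task;
			task = task->first_child;
			continue;
		}

		/* by definition, skip siblings of root */
		while (task != root) {
			/* if not last thread - proceed with thread */
			task = task->next_thread;
			if (!v2_is_leader(task))
				break;

			/* if has sibling - proceed with sibling */
			if (task->next_sibling) {
				task = task->next_sibling;
				break;
			}

			/* else, trace back to parent and proceed */
			task = parent;
			parent = parent->parent;
		}

		/* back at root: continue with root's next thread, if any */
		if (task == root) {
			task = root = task->next_thread;
			if (v2_is_leader(task))
				break;
		}
	}
	return total;
}

/* Example callback: record visited pids in visit order. */
struct v2_log {
	int pids[16];
	int n;
};

static int v2_record(struct v2_task *t, void *data)
{
	struct v2_log *seen = data;

	if (seen->n >= 16)
		return -1;
	seen->pids[seen->n++] = t->pid;
	return 0;
}
```

For a root leader (pid 10) with a second thread (pid 11) and one child (pid 12), the walk visits 10, 12, 11; for a leader with a second thread but no children, it visits both threads.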
[parent not found: <8739v6tbgj.fsf-FLMGYpZoEPULwtHQx/6qkW3U47Q5hpJU@public.gmane.org>]
* Re: multi-threaded app fails to restart
[not found] ` <8739v6tbgj.fsf-FLMGYpZoEPULwtHQx/6qkW3U47Q5hpJU@public.gmane.org>
@ 2010-07-26 17:56 ` John Paul Walters
[not found] ` <AANLkTikaaxCdjgKywJ6SvHpez_R1PNiW5LzNYAdAONxr-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: John Paul Walters @ 2010-07-26 17:56 UTC (permalink / raw)
To: Dan Smith; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

It works for me as well. Thanks for your help Oren.

JP

On Mon, Jul 26, 2010 at 1:11 PM, Dan Smith <danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> wrote:
> OL> diff --git a/kernel/checkpoint/sys.c b/kernel/checkpoint/sys.c
> OL> index 171c867..c5517c2 100644
> OL> --- a/kernel/checkpoint/sys.c
> OL> +++ b/kernel/checkpoint/sys.c
> OL> @@ -625,8 +625,11 @@ int walk_task_subtree(struct task_struct *root,
> OL>  		}
>
> OL>  		/* if we arrive at root again -- done */
> OL> -		if (task == root)
> OL> -			break;
> OL> +		if (task == root) {
> OL> +			/* if not last thread - proceed with thread */
> OL> +			task = root = next_thread(task);
> OL> +			if (thread_group_leader(task))
> OL> +				break;
>
> } // Need to close this block
>
> Otherwise it seems to work for me:
>
> Tested-by: Dan Smith <danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
>
> --
> Dan Smith
> IBM Linux Technology Center
> email: danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org
>

^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <AANLkTikaaxCdjgKywJ6SvHpez_R1PNiW5LzNYAdAONxr-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: multi-threaded app fails to restart
[not found] ` <AANLkTikaaxCdjgKywJ6SvHpez_R1PNiW5LzNYAdAONxr-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-07-26 18:18 ` Oren Laadan
0 siblings, 0 replies; 15+ messages in thread
From: Oren Laadan @ 2010-07-26 18:18 UTC (permalink / raw)
To: John Paul Walters
Cc: Dan Smith, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Great. Pushed fixes to ckpt-v22-dev.

Oren.

On 07/26/2010 01:56 PM, John Paul Walters wrote:
> It works for me as well. Thanks for your help Oren.
>
> JP
>
>
>
> On Mon, Jul 26, 2010 at 1:11 PM, Dan Smith <danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> wrote:
>> OL> diff --git a/kernel/checkpoint/sys.c b/kernel/checkpoint/sys.c
>> OL> index 171c867..c5517c2 100644
>> OL> --- a/kernel/checkpoint/sys.c
>> OL> +++ b/kernel/checkpoint/sys.c
>> OL> @@ -625,8 +625,11 @@ int walk_task_subtree(struct task_struct *root,
>> OL>  		}
>>
>> OL>  		/* if we arrive at root again -- done */
>> OL> -		if (task == root)
>> OL> -			break;
>> OL> +		if (task == root) {
>> OL> +			/* if not last thread - proceed with thread */
>> OL> +			task = root = next_thread(task);
>> OL> +			if (thread_group_leader(task))
>> OL> +				break;
>>
>> } // Need to close this block
>>
>> Otherwise it seems to work for me:
>>
>> Tested-by: Dan Smith <danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
>>
>> --
>> Dan Smith
>> IBM Linux Technology Center
>> email: danms-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org
>>
>

^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2010-07-26 18:18 UTC | newest]
Thread overview: 15+ messages
2010-07-19 19:36 multi-threaded app fails to restart John Paul Walters
[not found] ` <AANLkTilxfsYGyYLwO__VmDLSFQ_s_Qe03G49kIEztVja-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-07-19 19:54 ` Nathan Lynch
2010-07-19 20:27 ` John Paul Walters
[not found] ` <AANLkTimpXSXQr1wew1wvZKnBFsOXD7f2tblY4EGmJoFM-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-07-20 3:24 ` Oren Laadan
[not found] ` <4C4516DD.1000809-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2010-07-20 18:58 ` John Paul Walters
[not found] ` <AANLkTimPENgm-LSh6iMv2uxegRdHEivbGMTYmEfiOEJG-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-07-20 23:12 ` Oren Laadan
[not found] ` <Pine.LNX.4.64.1007201906370.15255-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
2010-07-21 0:03 ` John Paul Walters
[not found] ` <AANLkTinZYiWPtSegjRJWnlc6hipFAZyujr8-2ug6ettF-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-07-21 5:54 ` Oren Laadan
[not found] ` <Pine.LNX.4.64.1007210143120.22870-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
2010-07-21 12:52 ` John Paul Walters
[not found] ` <AANLkTinOFIzK8RZnp9NHouKv-WA7Omr08pPTGfrfVLfP-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-07-22 1:04 ` Oren Laadan
[not found] ` <Pine.LNX.4.64.1007212102010.6257-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
2010-07-22 16:23 ` John Paul Walters
[not found] ` <AANLkTimW98q0sFZeCAk3xHsEfBV9yhL4kUKHjNGxn_2P-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-07-26 11:18 ` Oren Laadan
[not found] ` <Pine.LNX.4.64.1007260711310.1050-CXF6herHY6ykSYb+qCZC/1i27PF6R63G9nwVQlTi/Pw@public.gmane.org>
2010-07-26 17:11 ` Dan Smith
[not found] ` <8739v6tbgj.fsf-FLMGYpZoEPULwtHQx/6qkW3U47Q5hpJU@public.gmane.org>
2010-07-26 17:56 ` John Paul Walters
[not found] ` <AANLkTikaaxCdjgKywJ6SvHpez_R1PNiW5LzNYAdAONxr-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-07-26 18:18 ` Oren Laadan