* multi-threaded app fails to restart
@ 2010-07-19 19:36 John Paul Walters
From: John Paul Walters @ 2010-07-19 19:36 UTC
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

I have a very simple multi-threaded application that I'm testing with,
but I'm unable to get a restart to complete.  I've tried both version
21 and version 22-dev.  I'm using a 32-bit Debian install inside a
VMware Fusion virtual machine.  The problem seems to be limited to
threads, as I'm able to checkpoint and restart the multitask test
application.  The steps that I'm executing are:

./pthread_test  &
[1] 3982

 ps -efL | grep pthread_test
jwalters  3982  3357  3982  0    2 19:21 pts/0    00:00:00 ./pthread_test
jwalters  3982  3357  3983  0    2 19:21 pts/0    00:00:00 ./pthread_test

for i in 3982 3983; do echo $i > /containers/1/tasks ; done

echo FROZEN > /containers/1/freezer.state

cat /containers/1/freezer.state
FROZEN

 ./checkpoint 3982 > checkpoint_out
(there aren't any unusual looking messages in the dmesg output at this point)
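
The freeze steps above can be wrapped in a small shell helper. This is only a sketch: `freeze_tasks` is a hypothetical name, the cgroup directory is passed as an argument (here it would be /containers/1), and the polling loop assumes the state may briefly read something other than FROZEN before it settles.

```shell
# Sketch: move the given task IDs into a freezer cgroup and freeze it.
# freeze_tasks is a hypothetical helper, not part of the c/r tools.
freeze_tasks() {
    cgroup_dir=$1
    shift
    for tid in "$@"; do
        # The tasks file takes one ID per write.
        echo "$tid" > "$cgroup_dir/tasks" || return 1
    done
    echo FROZEN > "$cgroup_dir/freezer.state" || return 1
    # Poll until the cgroup reports FROZEN, giving up after 10 tries.
    tries=0
    while [ "$(cat "$cgroup_dir/freezer.state")" != "FROZEN" ]; do
        tries=$((tries + 1))
        [ "$tries" -ge 10 ] && return 1
        sleep 1
    done
    return 0
}

# Example (matches the steps above):
#   freeze_tasks /containers/1 3982 3983
```

A non-zero return means either a write failed or the cgroup never reached FROZEN, in which case a checkpoint should not be attempted.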

After thawing and killing off the running instance, I attempt to restart:
./restart -d < checkpoint_out
...

<4030>c/r read input 16384
<4030>c/r read input 16384
<4030>c/r read input 12789
<4030>c/r read input 0
<4029>restart succeeded
<4029>SIGCHLD: already collected
<4029>task terminated with signal 11
<4029>c/r succeeded

The tail end of the syslog also contains:
[ 3210.327177] [4029:4029:c/r:do_restart:1451] sys_restart returns 0
[ 3210.327190] [4033:4033:c/r:wait_task_sync:919] task sync done (errno 0)
[ 3210.327192] [4033:4033:c/r:clear_task_ctx:852] task 4033 clear checkpoint_ctx
[ 3210.327194] [4033:4033:c/r:do_restart:1451] sys_restart returns -516
[ 3210.327227] pthread_test[4033]: segfault at b781f424 ip b781f424 sp b75cc1c0 error 4
[ 3210.330254] [4031:4031:c/r:wait_task_sync:919] task sync done (errno 0)
[ 3210.330257] [4031:4031:c/r:clear_task_ctx:852] task 4031 clear checkpoint_ctx
[ 3210.330259] [4031:4031:c/r:restore_debug_free:144] 4 tasks registered, nr_tasks was 0 nr_total 0
[ 3210.330261] [4031:4031:c/r:restore_debug_free:147] active pid was 2, ctx->errno 0
[ 3210.330263] [4031:4031:c/r:restore_debug_free:149] kflags 22 uflags 0 oflags 1
[ 3210.330265] [4031:4031:c/r:restore_debug_free:151] task[0] to run 4031
[ 3210.330267] [4031:4031:c/r:restore_debug_free:151] task[1] to run 4033
[ 3210.330269] [4031:4031:c/r:restore_debug_free:176] pid 4029 type Coord state Success
[ 3210.330272] [4031:4031:c/r:restore_debug_free:176] pid 4031 type Root state Success
[ 3210.330274] [4031:4031:c/r:restore_debug_free:176] pid 4033 type Task state Success
[ 3210.330276] [4031:4031:c/r:restore_debug_free:176] pid 4032 type Ghost state Success
[ 3210.330285] [4031:4031:c/r:pgarr_release_pages:102] total pages 0
[ 3210.330288] [4031:4031:c/r:do_restart:1451] sys_restart returns -512
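
For reading the segfault line above, the "error 4" field is the x86 page-fault error code, a bitmask: bit 0 distinguishes a protection violation from a not-present page, bit 1 a write from a read, and bit 2 user mode from kernel mode. The sketch below decodes those three bits; `decode_pf_error` is a hypothetical helper name, and higher bits are ignored.

```shell
# Sketch: decode the low three bits of an x86 page-fault error code,
# as printed in the "error N" field of a dmesg segfault line.
# decode_pf_error is a hypothetical helper name.
decode_pf_error() {
    code=$1
    if [ $((code & 1)) -ne 0 ]; then cause="protection violation"; else cause="page not present"; fi
    if [ $((code & 2)) -ne 0 ]; then access="write"; else access="read"; fi
    if [ $((code & 4)) -ne 0 ]; then mode="user"; else mode="kernel"; fi
    echo "$mode-mode $access, $cause"
}

decode_pf_error 4   # prints "user-mode read, page not present"
```

Since the faulting address and ip are both b781f424, the access that faulted appears to be the instruction fetch itself, which suggests the restarted thread jumped to an address that is not mapped after restart. The -512 and -516 return values correspond to the kernel-internal codes ERESTARTSYS and ERESTART_RESTARTBLOCK, so they may reflect interrupted system calls being restored and not necessarily a failure.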

Any thoughts?

best regards,
JP
