From: Jan Stancek <jstancek@redhat.com>
To: ltp-list@lists.sourceforge.net
Cc: Jeffrey Burke <jburke@redhat.com>
Subject: Re: [LTP] clone03/06 randomly crashing
Date: Mon, 3 Dec 2012 04:37:20 -0500 (EST) [thread overview]
Message-ID: <1835266843.9449106.1354527440428.JavaMail.root@redhat.com> (raw)
In-Reply-To: <50B8C48F.5010700@redhat.com>
----- Original Message -----
> From: "Jan Stancek" <jstancek@redhat.com>
> To: ltp-list@lists.sourceforge.net
> Cc: "Jeffrey Burke" <jburke@redhat.com>
> Sent: Friday, 30 November, 2012 3:37:03 PM
> Subject: [LTP] clone03/06 randomly crashing
>
> Hi,
>
> I'm occasionally getting core files from clone03/clone06 testcases.
> The testcase itself gives PASS, it is the child which is randomly
> crashing.
> It seems to occur more on single cpu systems.
>
> For example:
> Core was generated by `clone03'.
> Program terminated with signal 11, Segmentation fault.
> #0 0x0000000000402bfd in tst_print (tcid=0x403d0e "clone03", tnum=1,
> ttype=2,
> tmesg=0x14c6070 "unexpected signal 15 received (pid = 17427).")
> at tst_res.c:412
> 412 {
> (gdb) bt
> #0 0x0000000000402bfd in tst_print (tcid=0x403d0e "clone03", tnum=1,
> ttype=2,
> tmesg=0x14c6070 "unexpected signal 15 received (pid = 17427).")
> at tst_res.c:412
> #1 0x00000000004031be in tst_res (ttype=2, fname=<value optimized
> out>, arg_fmt=<value optimized out>) at tst_res.c:316
> #2 0x0000000000403761 in tst_brk (ttype=2, fname=0x0, func=0x4013d0
> <cleanup>, arg_fmt=<value optimized out>) at tst_res.c:640
> #3 0x0000000000403960 in tst_brkm (ttype=2, func=0x4013d0 <cleanup>,
> arg_fmt=<value optimized out>) at tst_res.c:698
> #4 0x0000000000403b45 in def_handler (sig=15) at tst_sig.c:248
> #5 <signal handler called>
> #6 0x00000037940db650 in __write_nocancel () at
> ../sysdeps/unix/syscall-template.S:82
> #7 0x000000000040169e in child_fn () at clone03.c:208
> #8 0x00000037940e890d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
>
> Dump of assembler code for function tst_print:
> 0x0000000000402bd0 <+0>: mov %rbx,-0x30(%rsp)
> 0x0000000000402bd5 <+5>: mov %rbp,-0x28(%rsp)
> 0x0000000000402bda <+10>: mov %edx,%ebx
> 0x0000000000402bdc <+12>: mov %r12,-0x20(%rsp)
> 0x0000000000402be1 <+17>: mov %r13,-0x18(%rsp)
> 0x0000000000402be6 <+22>: mov %rdi,%r12
> 0x0000000000402be9 <+25>: mov %r14,-0x10(%rsp)
> 0x0000000000402bee <+30>: mov %r15,-0x8(%rsp)
> 0x0000000000402bf3 <+35>: sub $0x2858,%rsp
> 0x0000000000402bfa <+42>: mov %esi,%r14d
> => 0x0000000000402bfd <+45>: mov %rcx,0x18(%rsp)
>
> (gdb) p $rsp
> $1 = (void *) 0x14c3800
> (gdb) x/1x $rsp
> 0x14c3800: Cannot access memory at address 0x14c3800
>
> It looks like it receives SIGTERM and while handling SIGTERM it hits
> SIGSEGV.
> I don't know what is source of that SIGTERM. I was looking into the
> second part
> and looks like the stack for child is not large enough.
>
> I modified clone03.c (see attached clone03_poison.patch) to get some
> extra
> empty buffer before the child's stack, which was set to pattern 0xDE.
>
> Before:
> |-------------------------------|
> child_stack
> child_stack+CHILD_STACK_SIZE
> After:
> |---------------------|-------------------------------|
> poision_start child_stack
> child_stack+CHILD_STACK_SIZE
>
> Now if I start clone03 and kill it I can randomly reproduce the
> SIGSEGV (attached clone03_kill.sh).
> The backtrace usually looks like:
> ... (random place)
> #5 0x000000000040324e in tst_res (ttype=2, fname=<value optimized
> out>, arg_fmt=<value optimized out>) at tst_res.c:316
> #6 0x00000000004037f1 in tst_brk (ttype=2, fname=0x0, func=0x401420
> <cleanup>, arg_fmt=<value optimized out>) at tst_res.c:640
> #7 0x00000000004039f0 in tst_brkm (ttype=2, func=0x401420 <cleanup>,
> arg_fmt=<value optimized out>) at tst_res.c:698
> #8 0x0000000000403bd5 in def_handler (sig=13) at tst_sig.c:248
> #9 <signal handler called>
> #10 0x0000003327cdb650 in __write_nocancel () at
> ../sysdeps/unix/syscall-template.S:82
> #11 0x000000000040172e in child_fn () at clone03.c:212
> #12 0x0000003327ce890d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
>
> (gdb) p poison_start
> $1 = (void *) 0xa02010
> (gdb) p child_stack
> $2 = (void *) 0xa03010
>
> (gdb) x/16x poison_start
> 0xa02010: 0xdededede 0xdededede 0xdededede 0xdededede
> 0xa02020: 0xdededede 0xdededede 0xdededede 0xdededede
> 0xa02030: 0xdededede 0xdededede 0xdededede 0xdededede
> 0xa02040: 0xdededede 0xdededede 0xdededede 0xdededede
> ...
> (gdb)
> 0xa02490: 0xdededede 0xdededede 0xdededede 0xdededede
> 0xa024a0: 0x00000018 0x00000030 0x00a02800 0x00000000
> 0xa024b0: 0x00a02740 0x00000000 0xdededede 0xdededede
> 0xa024c0: 0xdededede 0xdededede 0x27409296 0x00000033
>
> The above shows that 0xDE pattern has been overwritten.
>
> Extending child stack helps with the second part: SIGSEGV
> #define CHILD_STACK_SIZE 16384*4
> but I have no idea, where is that first SIGTERM coming from. Any
> ideas?
It appears to be ltp-pan, which sees the child as orphan.
When I added "-d 511", I've got some additional output:
<<<execution_status>>>
initiation_status="ok"
duration=0 termination_type=exited termination_id=0 corefile=no
cutime=0 cstime=0
<<<test_end>>>
pids still running:
orphans still running: -26125
clone03 1 TBROK : unexpected signal 15 received (pid = 26126).
clone03 2 TBROK : Remaining cases broken
pan was signaled with sig 2...
propagating sig 2 to orphaned pgrp -26125
orphans still running:
I'll send a patch, that adds wait() to parent.
Regards,
Jan
------------------------------------------------------------------------------
Keep yourself connected to Go Parallel:
BUILD Helping you discover the best ways to construct your parallel projects.
http://goparallel.sourceforge.net
_______________________________________________
Ltp-list mailing list
Ltp-list@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ltp-list
next prev parent reply other threads:[~2012-12-03 9:37 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-11-30 14:37 [LTP] clone03/06 randomly crashing Jan Stancek
2012-12-03 9:37 ` Jan Stancek [this message]
2012-12-03 13:03 ` Carmelo AMOROSO
2012-12-03 15:00 ` Wanlong Gao
2012-12-06 9:43 ` chrubis
[not found] <50C06D55.2020903@mips.com>
2012-12-06 10:47 ` Jan Stancek
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1835266843.9449106.1354527440428.JavaMail.root@redhat.com \
--to=jstancek@redhat.com \
--cc=jburke@redhat.com \
--cc=ltp-list@lists.sourceforge.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox