From: Jan Stancek <jstancek@redhat.com>
To: ltp-list@lists.sourceforge.net
Cc: Jeffrey Burke <jburke@redhat.com>
Subject: [LTP] clone03/06 randomly crashing
Date: Fri, 30 Nov 2012 15:37:03 +0100
Message-ID: <50B8C48F.5010700@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 4316 bytes --]
Hi,
I'm occasionally getting core files from the clone03/clone06 testcases.
The testcase itself reports PASS; it is the child process that randomly crashes.
It seems to occur more often on single-CPU systems.
For example:
Core was generated by `clone03'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000402bfd in tst_print (tcid=0x403d0e "clone03", tnum=1, ttype=2,
tmesg=0x14c6070 "unexpected signal 15 received (pid = 17427).") at tst_res.c:412
412 {
(gdb) bt
#0 0x0000000000402bfd in tst_print (tcid=0x403d0e "clone03", tnum=1, ttype=2,
tmesg=0x14c6070 "unexpected signal 15 received (pid = 17427).") at tst_res.c:412
#1 0x00000000004031be in tst_res (ttype=2, fname=<value optimized out>, arg_fmt=<value optimized out>) at tst_res.c:316
#2 0x0000000000403761 in tst_brk (ttype=2, fname=0x0, func=0x4013d0 <cleanup>, arg_fmt=<value optimized out>) at tst_res.c:640
#3 0x0000000000403960 in tst_brkm (ttype=2, func=0x4013d0 <cleanup>, arg_fmt=<value optimized out>) at tst_res.c:698
#4 0x0000000000403b45 in def_handler (sig=15) at tst_sig.c:248
#5 <signal handler called>
#6 0x00000037940db650 in __write_nocancel () at ../sysdeps/unix/syscall-template.S:82
#7 0x000000000040169e in child_fn () at clone03.c:208
#8 0x00000037940e890d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
Dump of assembler code for function tst_print:
0x0000000000402bd0 <+0>: mov %rbx,-0x30(%rsp)
0x0000000000402bd5 <+5>: mov %rbp,-0x28(%rsp)
0x0000000000402bda <+10>: mov %edx,%ebx
0x0000000000402bdc <+12>: mov %r12,-0x20(%rsp)
0x0000000000402be1 <+17>: mov %r13,-0x18(%rsp)
0x0000000000402be6 <+22>: mov %rdi,%r12
0x0000000000402be9 <+25>: mov %r14,-0x10(%rsp)
0x0000000000402bee <+30>: mov %r15,-0x8(%rsp)
0x0000000000402bf3 <+35>: sub $0x2858,%rsp
0x0000000000402bfa <+42>: mov %esi,%r14d
=> 0x0000000000402bfd <+45>: mov %rcx,0x18(%rsp)
(gdb) p $rsp
$1 = (void *) 0x14c3800
(gdb) x/1x $rsp
0x14c3800: Cannot access memory at address 0x14c3800
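It looks like the child receives SIGTERM and, while handling it, hits SIGSEGV.
I don't know the source of that SIGTERM. Looking into the second part (the
SIGSEGV), it looks like the child's stack is not large enough: the prologue of
tst_print alone reserves 0x2858 (10328) bytes, and afterwards $rsp points at
memory that cannot be accessed.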
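A guard page would make this kind of overflow fail at the first bad write
instead of silently corrupting whatever lies below the stack. The sketch below
is not part of clone03.c: guarded_stack_alloc() and guarded_stack_free() are
hypothetical helpers, and it assumes Linux with mmap()/mprotect() and a stack
that grows down.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map one inaccessible guard page followed by
 * stack_size usable bytes. Returns the lowest usable stack address,
 * or NULL on failure.
 */
void *guarded_stack_alloc(size_t stack_size)
{
	long page = sysconf(_SC_PAGESIZE);
	char *base;

	assert(stack_size > 0);
	if (page <= 0)
		return NULL;

	base = mmap(NULL, (size_t)page + stack_size, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return NULL;

	/* Any access to the lowest page now raises SIGSEGV immediately. */
	if (mprotect(base, (size_t)page, PROT_NONE) != 0) {
		munmap(base, (size_t)page + stack_size);
		return NULL;
	}
	return base + page;
}

/* Release a stack obtained from guarded_stack_alloc(). */
void guarded_stack_free(void *stack, size_t stack_size)
{
	long page = sysconf(_SC_PAGESIZE);

	munmap((char *)stack - page, (size_t)page + stack_size);
}
```

The child would then be started with (char *)stack + stack_size as the top of
its stack, and a write past the low end faults at the offending instruction.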
I modified clone03.c (see attached clone03_poison.patch) to place an extra
buffer, filled with the pattern 0xDE, directly below the child's stack.
Before:
|-------------------------------|
child_stack                     child_stack+CHILD_STACK_SIZE
After:
|---------------------|-------------------------------|
poison_start          child_stack                     child_stack+CHILD_STACK_SIZE
Now if I start clone03 and kill it I can randomly reproduce the SIGSEGV (attached clone03_kill.sh).
The backtrace usually looks like:
... (random place)
#5 0x000000000040324e in tst_res (ttype=2, fname=<value optimized out>, arg_fmt=<value optimized out>) at tst_res.c:316
#6 0x00000000004037f1 in tst_brk (ttype=2, fname=0x0, func=0x401420 <cleanup>, arg_fmt=<value optimized out>) at tst_res.c:640
#7 0x00000000004039f0 in tst_brkm (ttype=2, func=0x401420 <cleanup>, arg_fmt=<value optimized out>) at tst_res.c:698
#8 0x0000000000403bd5 in def_handler (sig=13) at tst_sig.c:248
#9 <signal handler called>
#10 0x0000003327cdb650 in __write_nocancel () at ../sysdeps/unix/syscall-template.S:82
#11 0x000000000040172e in child_fn () at clone03.c:212
#12 0x0000003327ce890d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb) p poison_start
$1 = (void *) 0xa02010
(gdb) p child_stack
$2 = (void *) 0xa03010
(gdb) x/16x poison_start
0xa02010: 0xdededede 0xdededede 0xdededede 0xdededede
0xa02020: 0xdededede 0xdededede 0xdededede 0xdededede
0xa02030: 0xdededede 0xdededede 0xdededede 0xdededede
0xa02040: 0xdededede 0xdededede 0xdededede 0xdededede
...
(gdb)
0xa02490: 0xdededede 0xdededede 0xdededede 0xdededede
0xa024a0: 0x00000018 0x00000030 0x00a02800 0x00000000
0xa024b0: 0x00a02740 0x00000000 0xdededede 0xdededede
0xa024c0: 0xdededede 0xdededede 0x27409296 0x00000033
The above shows that the 0xDE pattern has been overwritten, i.e. the child
wrote below the bottom of its stack.
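The same check could be done programmatically rather than by reading the dump.
A minimal sketch, assuming the poison area sits directly below child_stack as
in the attached patch; poison_fill() and poison_overflow_depth() are
hypothetical names, not LTP functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POISON_BYTE 0xDE

/* Fill the area below the child stack with the poison pattern. */
void poison_fill(void *start, size_t len)
{
	assert(start != NULL);
	memset(start, POISON_BYTE, len);
}

/*
 * Return the distance in bytes from the top of the poison area (the end
 * adjacent to child_stack) down to the lowest damaged byte, or 0 if the
 * pattern is intact. The stack grows down, so the lowest damaged byte
 * is a lower bound on how far the child overran its stack.
 */
size_t poison_overflow_depth(const void *start, size_t len)
{
	const unsigned char *p = start;
	size_t i;

	assert(start != NULL);
	for (i = 0; i < len; i++) {
		if (p[i] != POISON_BYTE)
			return len - i;
	}
	return 0;
}
```

For the dump above, the lowest damaged word shown is at 0xa024a0, so the child
reached at least 0xa03010 - 0xa024a0 = 0xb70 (2928) bytes below child_stack.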
Extending the child stack helps with the second part (the SIGSEGV):
#define CHILD_STACK_SIZE (16384 * 4)
but I have no idea where that first SIGTERM is coming from. Any ideas?
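One way to narrow down the sender would be a handler installed with
SA_SIGINFO, which receives the sending PID in si_pid for signals sent with
kill(2). A sketch under that assumption; trace_handler(),
trace_signal_sender() and the two globals are hypothetical names, not part of
the LTP library.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Written only from the handler; sig_atomic_t keeps the stores signal-safe. */
volatile sig_atomic_t last_signo;
volatile sig_atomic_t last_sender_pid;

/* Record which signal arrived and who sent it; no stdio in the handler. */
void trace_handler(int sig, siginfo_t *info, void *ucontext)
{
	(void)ucontext;
	last_signo = sig;
	/* si_pid is filled in for signals sent via kill(2) or sigqueue(3). */
	last_sender_pid = (sig_atomic_t)info->si_pid;
}

/* Install the tracing handler for one signal; returns 0 on success. */
int trace_signal_sender(int sig)
{
	struct sigaction sa;

	assert(sig > 0);
	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = trace_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(sig, &sa, NULL);
}
```

Calling trace_signal_sender(SIGTERM) early in the child and logging
last_sender_pid once it is safe to do so would show whether the signal comes
from the parent, the test harness, or somewhere else.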
Regards,
Jan
[-- Attachment #2: clone03_kill.sh --]
[-- Type: application/x-sh, Size: 174 bytes --]
[-- Attachment #3: clone03_poison.patch --]
[-- Type: text/x-patch, Size: 1229 bytes --]
diff --git a/testcases/kernel/syscalls/clone/clone03.c b/testcases/kernel/syscalls/clone/clone03.c
index 24ee8e6..dada00c 100644
--- a/testcases/kernel/syscalls/clone/clone03.c
+++ b/testcases/kernel/syscalls/clone/clone03.c
@@ -87,13 +87,15 @@ static int pfd[2];
char *TCID = "clone03"; /* Test program identifier. */
int TST_TOTAL = 1; /* Total number of test cases. */
+void *poison_start; /* poison area below the child's stack */
+void *child_stack; /* stack for child */
+#define POISON_SIZE getpagesize()
int main(int ac, char **av)
{
int lc;
char *msg;
- void *child_stack; /* stack for child */
char buff[10];
int child_pid;
@@ -104,10 +106,13 @@ int main(int ac, char **av)
setup();
/* Allocate stack for child */
- if ((child_stack = (void *)malloc(CHILD_STACK_SIZE)) == NULL) {
+ if ((poison_start = (void *)malloc(POISON_SIZE+CHILD_STACK_SIZE)) == NULL) {
tst_brkm(TBROK, cleanup, "Cannot allocate stack for child");
}
+ memset(poison_start, 0xDE, POISON_SIZE);
+ child_stack = (char *)poison_start + POISON_SIZE;
+
for (lc = 0; TEST_LOOPING(lc); lc++) {
Tst_count = 0;
@@ -154,7 +159,7 @@ int main(int ac, char **av)
}
- free(child_stack);
+ free(poison_start);
cleanup();
tst_exit();