public inbox for ltp@lists.linux.it
 help / color / mirror / Atom feed
From: Wanlong Gao <gaowanlong@cn.fujitsu.com>
To: Carmelo AMOROSO <carmelo.amoroso@st.com>
Cc: ltp-list@lists.sourceforge.net
Subject: Re: [LTP] clone03/06 randomly crashing
Date: Mon, 03 Dec 2012 23:00:45 +0800	[thread overview]
Message-ID: <50BCBE9D.7060506@cn.fujitsu.com> (raw)
In-Reply-To: <50BCA312.7070508@st.com>

On 12/03/2012 09:03 PM, Carmelo AMOROSO wrote:
> On 03/12/2012 10.37, Jan Stancek wrote:
>>
>>
>> ----- Original Message -----
>>> From: "Jan Stancek" <jstancek@redhat.com>
>>> To: ltp-list@lists.sourceforge.net
>>> Cc: "Jeffrey Burke" <jburke@redhat.com>
>>> Sent: Friday, 30 November, 2012 3:37:03 PM
>>> Subject: [LTP] clone03/06 randomly crashing
>>>
>>> Hi,
>>>
>>> I'm occasionally getting core files from clone03/clone06 testcases.
>>> The testcase itself gives PASS, it is the child which is randomly
>>> crashing.
>>> It seems to occur more on single cpu systems.
>>>
>>> For example:
>>> Core was generated by `clone03'.
>>> Program terminated with signal 11, Segmentation fault.
>>> #0  0x0000000000402bfd in tst_print (tcid=0x403d0e "clone03", tnum=1,
>>> ttype=2,
>>>     tmesg=0x14c6070 "unexpected signal 15 received (pid = 17427).")
>>>     at tst_res.c:412
>>> 412	{
>>> (gdb) bt
>>> #0  0x0000000000402bfd in tst_print (tcid=0x403d0e "clone03", tnum=1,
>>> ttype=2,
>>>     tmesg=0x14c6070 "unexpected signal 15 received (pid = 17427).")
>>>     at tst_res.c:412
>>> #1  0x00000000004031be in tst_res (ttype=2, fname=<value optimized
>>> out>, arg_fmt=<value optimized out>) at tst_res.c:316
>>> #2  0x0000000000403761 in tst_brk (ttype=2, fname=0x0, func=0x4013d0
>>> <cleanup>, arg_fmt=<value optimized out>) at tst_res.c:640
>>> #3  0x0000000000403960 in tst_brkm (ttype=2, func=0x4013d0 <cleanup>,
>>> arg_fmt=<value optimized out>) at tst_res.c:698
>>> #4  0x0000000000403b45 in def_handler (sig=15) at tst_sig.c:248
>>> #5  <signal handler called>
>>> #6  0x00000037940db650 in __write_nocancel () at
>>> ../sysdeps/unix/syscall-template.S:82
>>> #7  0x000000000040169e in child_fn () at clone03.c:208
>>> #8  0x00000037940e890d in clone () at
>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
>>>
>>> Dump of assembler code for function tst_print:
>>>    0x0000000000402bd0 <+0>:	mov    %rbx,-0x30(%rsp)
>>>    0x0000000000402bd5 <+5>:	mov    %rbp,-0x28(%rsp)
>>>    0x0000000000402bda <+10>:	mov    %edx,%ebx
>>>    0x0000000000402bdc <+12>:	mov    %r12,-0x20(%rsp)
>>>    0x0000000000402be1 <+17>:	mov    %r13,-0x18(%rsp)
>>>    0x0000000000402be6 <+22>:	mov    %rdi,%r12
>>>    0x0000000000402be9 <+25>:	mov    %r14,-0x10(%rsp)
>>>    0x0000000000402bee <+30>:	mov    %r15,-0x8(%rsp)
>>>    0x0000000000402bf3 <+35>:	sub    $0x2858,%rsp
>>>    0x0000000000402bfa <+42>:	mov    %esi,%r14d
>>> => 0x0000000000402bfd <+45>:	mov    %rcx,0x18(%rsp)
>>>
>>> (gdb) p $rsp
>>> $1 = (void *) 0x14c3800
>>> (gdb) x/1x $rsp
>>> 0x14c3800:	Cannot access memory at address 0x14c3800
>>>
>>> It looks like it receives SIGTERM and while handling SIGTERM it hits
>>> SIGSEGV.
>>> I don't know what is source of that SIGTERM. I was looking into the
>>> second part
>>> and looks like the stack for child is not large enough.
>>>
>>> I modified clone03.c (see attached clone03_poison.patch) to get some
>>> extra
>>> empty buffer before the child's stack, which was set to pattern 0xDE.
>>>
>>> Before:
>>>                           |-------------------------------|
>>>                      child_stack
>>>                                  child_stack+CHILD_STACK_SIZE
>>> After:
>>>     |---------------------|-------------------------------|
>>> poision_start        child_stack
>>>             child_stack+CHILD_STACK_SIZE
>>>
>>> Now if I start clone03 and kill it I can randomly reproduce the
>>> SIGSEGV (attached clone03_kill.sh).
>>> The backtrace usually looks like:
>>> ... (random place)
>>> #5  0x000000000040324e in tst_res (ttype=2, fname=<value optimized
>>> out>, arg_fmt=<value optimized out>) at tst_res.c:316
>>> #6  0x00000000004037f1 in tst_brk (ttype=2, fname=0x0, func=0x401420
>>> <cleanup>, arg_fmt=<value optimized out>) at tst_res.c:640
>>> #7  0x00000000004039f0 in tst_brkm (ttype=2, func=0x401420 <cleanup>,
>>> arg_fmt=<value optimized out>) at tst_res.c:698
>>> #8  0x0000000000403bd5 in def_handler (sig=13) at tst_sig.c:248
>>> #9  <signal handler called>
>>> #10 0x0000003327cdb650 in __write_nocancel () at
>>> ../sysdeps/unix/syscall-template.S:82
>>> #11 0x000000000040172e in child_fn () at clone03.c:212
>>> #12 0x0000003327ce890d in clone () at
>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
>>>
>>> (gdb) p poison_start
>>> $1 = (void *) 0xa02010
>>> (gdb) p child_stack
>>> $2 = (void *) 0xa03010
>>>
>>> (gdb) x/16x poison_start
>>> 0xa02010:	0xdededede	0xdededede	0xdededede	0xdededede
>>> 0xa02020:	0xdededede	0xdededede	0xdededede	0xdededede
>>> 0xa02030:	0xdededede	0xdededede	0xdededede	0xdededede
>>> 0xa02040:	0xdededede	0xdededede	0xdededede	0xdededede
>>> ...
>>> (gdb)
>>> 0xa02490:	0xdededede	0xdededede	0xdededede	0xdededede
>>> 0xa024a0:	0x00000018	0x00000030	0x00a02800	0x00000000
>>> 0xa024b0:	0x00a02740	0x00000000	0xdededede	0xdededede
>>> 0xa024c0:	0xdededede	0xdededede	0x27409296	0x00000033
>>>
>>> The above shows that 0xDE pattern has been overwritten.
>>>
>>> Extending child stack helps with the second part: SIGSEGV
>>> #define CHILD_STACK_SIZE 16384*4
>>> but I have no idea, where is that first SIGTERM coming from. Any
>>> ideas?
>>
>> It appears to be ltp-pan, which sees the child as orphan.
>> When I added "-d 511", I've got some additional output:
>>
>> <<<execution_status>>>
>> initiation_status="ok"
>> duration=0 termination_type=exited termination_id=0 corefile=no
>> cutime=0 cstime=0
>> <<<test_end>>>
>> pids still running: 
>> orphans still running: -26125 
>> clone03     1  TBROK  :  unexpected signal 15 received (pid = 26126).
>> clone03     2  TBROK  :  Remaining cases broken
>>
>> pan was signaled with sig 2...
>>   propagating sig 2 to orphaned pgrp -26125
>> orphans still running: 
>>
>> I'll send a patch, that adds wait() to parent.
>>
>> Regards,
>> Jan
> 
> Hi Jan,
> I think you're right. We have hit similar problems with setrlimit01, and
> few other tests.
> 
> Unfortunately we did not upstream these patches as we are still working
> with an older LTP.
> 
> I'll try to rebase it and share some other pending patches we are using
> in our project.

Sounds great, thank you very much.

Regards,
Wanlong Gao

> 
> Regards,
> Carmelo
> 
>>
>> ------------------------------------------------------------------------------
>> Keep yourself connected to Go Parallel: 
>> BUILD Helping you discover the best ways to construct your parallel projects.
>> http://goparallel.sourceforge.net
>> _______________________________________________
>> Ltp-list mailing list
>> Ltp-list@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/ltp-list
>>
>>
> 
> 
> ------------------------------------------------------------------------------
> Keep yourself connected to Go Parallel: 
> BUILD Helping you discover the best ways to construct your parallel projects.
> http://goparallel.sourceforge.net
> _______________________________________________
> Ltp-list mailing list
> Ltp-list@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/ltp-list
> 


------------------------------------------------------------------------------
Keep yourself connected to Go Parallel: 
BUILD Helping you discover the best ways to construct your parallel projects.
http://goparallel.sourceforge.net
_______________________________________________
Ltp-list mailing list
Ltp-list@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ltp-list

  reply	other threads:[~2012-12-03 15:02 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-11-30 14:37 [LTP] clone03/06 randomly crashing Jan Stancek
2012-12-03  9:37 ` Jan Stancek
2012-12-03 13:03   ` Carmelo AMOROSO
2012-12-03 15:00     ` Wanlong Gao [this message]
2012-12-06  9:43     ` chrubis
     [not found] <50C06D55.2020903@mips.com>
2012-12-06 10:47 ` Jan Stancek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=50BCBE9D.7060506@cn.fujitsu.com \
    --to=gaowanlong@cn.fujitsu.com \
    --cc=carmelo.amoroso@st.com \
    --cc=ltp-list@lists.sourceforge.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox