Flexible I/O Tester development
From: Jens Axboe <axboe@kernel.dk>
To: Roger Sibert <roger_sibert@xyratex.com>
Cc: FIO <fio@vger.kernel.org>
Subject: Re: core dump / segfault after 48 hour run
Date: Mon, 30 Sep 2013 12:18:48 -0600
Message-ID: <5249C088.8010202@kernel.dk>
In-Reply-To: <5249BF2C.8000505@kernel.dk>

On 09/30/2013 12:13 PM, Jens Axboe wrote:
> On 09/30/2013 10:20 AM, Roger Sibert wrote:
>> On Mon, Sep 30, 2013 at 12:07 PM, Jens Axboe <axboe@kernel.dk> wrote:
>>> On 09/30/2013 07:04 AM, Roger Sibert wrote:
>>>> Hello Everyone,
>>>>
>>>> I am using fio to run full-disk writes to an SSD after a secure
>>>> erase, to measure how long it takes for performance to stabilize.
>>>> After roughly 48 hours I see this on the screen.
>>>>
>>>> B2-058:~/longtermruntime # ./fio.64bit.static longtermruntime-192h.fio
>>>> seqwrite-phase: (g=0): rw=write, bs=512K-512K/512K-512K/512K-512K,
>>>> ioengine=libaio, iodepth=16
>>>> fio-2.1.2-15-gd5603
>>>> Starting 1 process
>>>> fio: pid=6895, got signal=11ne] [0KB/0KB/0KB /s] [0/0/0 iops] [eta
>>>> 06d:07h:05m:31s]
>>>>
>>>> seqwrite-phase: (groupid=0, jobs=1): err= 0: pid=6895: Sun Sep 29 03:40:38 2013
>>>>     lat (usec) : 1000=0.01%
>>>>     lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=99.15%
>>>>     lat (msec) : 100=0.56%, 250=0.28%, 500=0.01%, 750=0.01%
>>>>   cpu          : usr=0.00%, sys=0.00%, ctx=0, majf=0, minf=0
>>>>   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
>>>>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>>>>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
>>>>      issued    : total=r=0/w=67108865/d=0, short=r=0/w=0/d=0
>>>>
>>>> Run status group 0 (all jobs):
>>>>   WRITE: io=0KB, aggrb=0KB/s, minb=0KB/s, maxb=0KB/s,
>>>> mint=144006511329msec, maxt=144006511329msec
>>>>
>>>> Disk stats (read/write):
>>>>   sdb: ios=0/67108865, merge=0/0, ticks=0/2354077568,
>>>> in_queue=2353971492, util=100.00%
>>>> fio: file hash not empty on exit
>>>>
>>>> I took a look at one of the core files
>>>>
>>>> B2-057:~/longtermruntime # gdb core core
>>>> GNU gdb (GDB) SUSE (7.0-0.4.16)
>>>> Copyright (C) 2009 Free Software Foundation, Inc.
>>>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>>>> This is free software: you are free to change and redistribute it.
>>>> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>>>> and "show warranty" for details.
>>>> This GDB was configured as "x86_64-suse-linux".
>>>> For bug reporting instructions, please see:
>>>> <http://www.gnu.org/software/gdb/bugs/>...
>>>> "/root/longtermruntime/core": not in executable format: File format
>>>> not recognized
>>>> Missing separate debuginfo for the main executable file
>>>> Try: zypper install -C
>>>> "debuginfo(build-id)=559375f8a046f376897b4923007bff5b07ecd8d4"
>>>> Core was generated by `./fio.64bit.static longtermruntime-216h.fio'.
>>>> Program terminated with signal 11, Segmentation fault.
>>>> #0  0x000000000040a6c9 in ?? ()
>>>>
>>>> Is there anything else I can do with gdb to pull out more debug
>>>> information before restarting/retasking this system?  My gdb skills
>>>> aren't that great.
>>>
>>> I know it's a pain to reproduce (especially after a 48h run), but if you
>>> could edit the Makefile and remove the -O3 from the OPTFLAGS, then make
>>> clean, make all, and then reproduce. Then the core files will be of more
>>> use.
>>>
>>> For the core files you have now, try and do a 'bt' when you open them so
>>> I can see a backtrace. That might be enough to see what is going on.
>>>
>>> --
>>> Jens Axboe
>>>
>>
>> Let me try that again...  My gdb skills may be bad, but that doesn't
>> mean I shouldn't recognize that I was missing something.
>>
>> I changed how I opened the core file, which should give you what you
>> were actually asking for.
>>
>> B2-057:~/longtermruntime # gdb ./fio.64bit.static ./core
>> GNU gdb (GDB) SUSE (7.0-0.4.16)
>> Copyright (C) 2009 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>> This is free software: you are free to change and redistribute it.
>> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>> and "show warranty" for details.
>> This GDB was configured as "x86_64-suse-linux".
>> For bug reporting instructions, please see:
>> <http://www.gnu.org/software/gdb/bugs/>...
>> Reading symbols from /root/longtermruntime/fio.64bit.static...done.
>>
>> warning: core file may not match specified executable file.
>> Core was generated by `./fio.64bit.static longtermruntime-216h.fio'.
>> Program terminated with signal 11, Segmentation fault.
>> #0  0x000000000040a6c9 in __add_log_sample (iolog=0x872510, val=62,
>> ddir=<value optimized out>, bs=<value optimized out>,
>>     t=<value optimized out>) at stat.c:1517
>> 1517    stat.c: No such file or directory.
>>         in stat.c
>> (gdb) bt
>> #0  0x000000000040a6c9 in __add_log_sample (iolog=0x872510, val=62,
>> ddir=<value optimized out>, bs=<value optimized out>,
>>     t=<value optimized out>) at stat.c:1517
>> #1  0x0000000000440b05 in fio_libaio_queued (nr=1, io_us=0x8929a0,
>> td=0x7fe5e312b000) at engines/libaio.c:199
>> #2  fio_libaio_commit (nr=1, io_us=0x8929a0, td=0x7fe5e312b000) at
>> engines/libaio.c:218
>> #3  0x0000000000405385 in td_io_commit (td=0x7fe5e312b000) at ioengines.c:379
>> #4  0x000000000040572a in td_io_queue (td=0x7fe5e312b000,
>> io_u=0x891f20) at ioengines.c:329
>> #5  0x000000000043692f in do_io (td=0x7fe5e312b000) at backend.c:701
>> #6  thread_main (td=0x7fe5e312b000) at backend.c:1314
>> #7  0x0000000000438447 in fork_main (offset=0, shmid=<value optimized
>> out>) at backend.c:1464
>> #8  run_threads (offset=0, shmid=<value optimized out>) at backend.c:1726
>> #9  0x000000000043889d in fio_backend () at backend.c:1912
>> #10 0x00000000004702a4 in __libc_start_main ()
>> #11 0x0000000000000000 in ?? ()
> 
> OK, that helps a whole lot. So my guess is that you ran out of memory.
> Currently fio does not flush out the existing log, it just keeps
> appending to it and flushes at the end. This is done to not disturb the
> actual data run, but it does mean that for long runs, you can gobble up
> a lot of memory...
> 
> I will commit something that is a little more defensive so we don't
> actually segfault, just stop logging. Then we can look into handling it
> better in the future.

I committed this:

http://git.kernel.dk/?p=fio.git;a=commit;h=3c568239a319087a965b06bc2ed94d058810100f

to handle the failure a bit more gracefully at least.
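
For readers following along, here is a minimal sketch of the kind of
defensive handling described above: an in-memory sample log that doubles
its buffer when full and, if the allocation fails, stops logging instead
of crashing. It is not the code from the commit linked above. The names
(struct sample_log, log_init, log_add, failing_realloc) and the
allocator hook are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a benchmark's in-memory log; not fio's types. */
struct sample {
    unsigned long time_ms;
    unsigned long val;
};

struct sample_log {
    struct sample *entries;
    size_t nr;       /* samples stored so far */
    size_t max;      /* allocated capacity, in samples */
    int disabled;    /* set once the buffer can no longer grow */
};

/* Allocator hook, so a caller can simulate an out-of-memory condition. */
typedef void *(*realloc_fn)(void *, size_t);

/* Always fails; only useful for exercising the failure path. */
static void *failing_realloc(void *p, size_t n)
{
    (void)p;
    (void)n;
    return NULL;
}

/* Allocate room for 'initial' samples (must be > 0). Returns 0 or -1. */
static int log_init(struct sample_log *log, size_t initial)
{
    assert(log != NULL && initial > 0);
    log->entries = malloc(initial * sizeof(*log->entries));
    log->nr = 0;
    log->max = log->entries ? initial : 0;
    log->disabled = (log->entries == NULL);
    return log->disabled ? -1 : 0;
}

/* Append one sample. Returns 0 on success, -1 once logging is disabled. */
static int log_add(struct sample_log *log, realloc_fn grow,
                   unsigned long time_ms, unsigned long val)
{
    assert(log != NULL);
    if (log->disabled)
        return -1;

    if (log->nr == log->max) {
        size_t new_max = log->max * 2;
        void *p = NULL;

        /* Refuse to grow if the byte count would overflow size_t. */
        if (new_max > log->max &&
            new_max <= (size_t)-1 / sizeof(*log->entries))
            p = grow(log->entries, new_max * sizeof(*log->entries));

        if (!p) {
            /* The old buffer is still valid; keep it and stop logging. */
            fprintf(stderr, "log: failed to extend, logging stopped\n");
            log->disabled = 1;
            return -1;
        }
        log->entries = p;
        log->max = new_max;
    }

    log->entries[log->nr].time_ms = time_ms;
    log->entries[log->nr].val = val;
    log->nr++;
    return 0;
}
```

A caller would pass the standard realloc as the hook, for example
log_add(&log, realloc, t, v). After a failed grow, the samples collected
so far are still in log.entries and can be flushed at the end of the
run. Every later call returns -1 without touching memory.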

-- 
Jens Axboe


