fio newer than 2.2.12 segfaults on examples/tiobench-example.fio

* fio newer than 2.2.12 segfaults on examples/tiobench-example.fio
@ 2016-01-17 14:03 Bruce Cran
  2016-01-18 21:57 ` Jens Axboe
  0 siblings, 1 reply; 8+ messages in thread
From: Bruce Cran @ 2016-01-17 14:03 UTC
  To: fio

I'm seeing a crash when running anything newer than 2.2.12 (I've tried 
2.2.13 and 2.3) on my openSUSE Tumbleweed system: "./fio 
examples/tiobench-example.fio" causes a segfault in pthreads (apparently 
via fio_mutex_down).

-- 
Bruce


* Re: fio newer than 2.2.12 segfaults on examples/tiobench-example.fio
  2016-01-17 14:03 fio newer than 2.2.12 segfaults on examples/tiobench-example.fio Bruce Cran
@ 2016-01-18 21:57 ` Jens Axboe
  2016-01-21  4:30   ` Bruce Cran
  0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2016-01-18 21:57 UTC
  To: Bruce Cran, fio

On 01/17/2016 07:03 AM, Bruce Cran wrote:
> I'm seeing a crash when running anything newer than 2.2.12 (I've tried
> 2.2.13 and 2.3) on my openSUSE Tumbleweed system: "./fio
> examples/tiobench-example.fio" causes a segfault in pthreads (apparently
> via fio_mutex_down).

Can you compile with --disable-optimizations passed to configure, 
ensure that a core file is dumped (ulimit -c1000000000000 or whatever), 
and then run gdb ./fio core to show a full backtrace?

I can't reproduce the crash with 2.3 or current -git, runs fine for me.

-- 
Jens Axboe




* Re: fio newer than 2.2.12 segfaults on examples/tiobench-example.fio
  2016-01-18 21:57 ` Jens Axboe
@ 2016-01-21  4:30   ` Bruce Cran
  2016-01-21  4:33     ` Jens Axboe
  2016-01-21  4:38     ` Bruce Cran
  0 siblings, 2 replies; 8+ messages in thread
From: Bruce Cran @ 2016-01-21  4:30 UTC
  To: Jens Axboe, fio

On 01/18/2016 02:57 PM, Jens Axboe wrote:
> Can you compile with --disable-optimizations passed to configure, 
> ensure that a core file is dumped (ulimit -c1000000000000 or 
> whatever), and then run gdb ./fio core to show a full backtrace?
>
> I can't reproduce the crash with 2.3 or current -git, runs fine for me.
>

I'm wondering if this is a microcode bug, since I have one of the new 
Skylake CPUs (Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz; stepping: 3; 
microcode: 0x6a) and others have mentioned crashes in 
__lll_unlock_elision related to broken CPUs/BIOSes, though for older 
generations.
Anyway, here's the output and backtrace:

(gdb) run examples/tiobench-example.fio
Starting program: /home/bcran/workspace/fio/fio 
examples/tiobench-example.fio
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
f1: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
...
f2: (g=1): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
...
f3: (g=2): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
...
f4: (g=3): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
...
fio-2.4-1-g44ac
Starting 16 processes
Detaching after fork from child process 26797.
[New Thread 0x7fffe7d53700 (LWP 26796)]

Program received signal SIGSEGV, Segmentation fault.
__lll_unlock_elision (lock=0x7ffff7feb000, private=128) at 
../sysdeps/unix/sysv/linux/x86/elision-unlock.c:29
29    ../sysdeps/unix/sysv/linux/x86/elision-unlock.c: No such file or 
directory.
(gdb) bt
#0  __lll_unlock_elision (lock=0x7ffff7feb000, private=128) at 
../sysdeps/unix/sysv/linux/x86/elision-unlock.c:29
#1  0x00000000004482fa in fio_mutex_down_timeout (mutex=0x7ffff7feb000, 
msecs=10000) at mutex.c:141
#2  0x0000000000468dda in run_threads (sk_out=0x0) at backend.c:2183
#3  0x0000000000469487 in fio_backend (sk_out=0x0) at backend.c:2381
#4  0x0000000000486c42 in main (argc=2, argv=0x7fffffffdf18, 
envp=0x7fffffffdf30) at fio.c:63

-- 
Bruce


* Re: fio newer than 2.2.12 segfaults on examples/tiobench-example.fio
  2016-01-21  4:30   ` Bruce Cran
@ 2016-01-21  4:33     ` Jens Axboe
  2016-01-21  4:43       ` Bruce Cran
  2016-01-21  4:38     ` Bruce Cran
  1 sibling, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2016-01-21  4:33 UTC
  To: Bruce Cran, fio

On 01/20/2016 09:30 PM, Bruce Cran wrote:
> On 01/18/2016 02:57 PM, Jens Axboe wrote:
>> Can you compile with --disable-optimizations passed to configure,
>> ensure that a core file is dumped (ulimit -c1000000000000 or
>> whatever), and then run gdb ./fio core to show a full backtrace?
>>
>> I can't reproduce the crash with 2.3 or current -git, runs fine for me.
>>
>
> I'm wondering if this is a microcode bug, since I have one of the new
> Skylake CPUs (Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz; stepping: 3;
> microcode: 0x6a) and others have mentioned crashes in
> __lll_unlock_elision related to broken CPUs/BIOSes, though for older
> generations.
> Anyway, here's the output and backtrace:
>
> (gdb) run examples/tiobench-example.fio
> Starting program: /home/bcran/workspace/fio/fio
> examples/tiobench-example.fio
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> f1: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
> ...
> f2: (g=1): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
> ...
> f3: (g=2): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
> ...
> f4: (g=3): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
> ...
> fio-2.4-1-g44ac
> Starting 16 processes
> Detaching after fork from child process 26797.
> [New Thread 0x7fffe7d53700 (LWP 26796)]
>
> Program received signal SIGSEGV, Segmentation fault.
> __lll_unlock_elision (lock=0x7ffff7feb000, private=128) at
> ../sysdeps/unix/sysv/linux/x86/elision-unlock.c:29
> 29    ../sysdeps/unix/sysv/linux/x86/elision-unlock.c: No such file or
> directory.
> (gdb) bt
> #0  __lll_unlock_elision (lock=0x7ffff7feb000, private=128) at
> ../sysdeps/unix/sysv/linux/x86/elision-unlock.c:29
> #1  0x00000000004482fa in fio_mutex_down_timeout (mutex=0x7ffff7feb000,
> msecs=10000) at mutex.c:141
> #2  0x0000000000468dda in run_threads (sk_out=0x0) at backend.c:2183
> #3  0x0000000000469487 in fio_backend (sk_out=0x0) at backend.c:2381
> #4  0x0000000000486c42 in main (argc=2, argv=0x7fffffffdf18,
> envp=0x7fffffffdf30) at fio.c:63

Can you try and revert commit 09400a60042 and see if that fixes it?

-- 
Jens Axboe




* Re: fio newer than 2.2.12 segfaults on examples/tiobench-example.fio
  2016-01-21  4:33     ` Jens Axboe
@ 2016-01-21  4:43       ` Bruce Cran
  2016-01-21  4:51         ` Jens Axboe
  0 siblings, 1 reply; 8+ messages in thread
From: Bruce Cran @ 2016-01-21  4:43 UTC
  To: Jens Axboe, fio

On 01/20/2016 09:33 PM, Jens Axboe wrote:
> Can you try and revert commit 09400a60042 and see if that fixes it?

Yes, that does fix it.

-- 
Bruce



* Re: fio newer than 2.2.12 segfaults on examples/tiobench-example.fio
  2016-01-21  4:43       ` Bruce Cran
@ 2016-01-21  4:51         ` Jens Axboe
  2016-01-21  4:54           ` Jens Axboe
  0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2016-01-21  4:51 UTC
  To: Bruce Cran, fio

On 01/20/2016 09:43 PM, Bruce Cran wrote:
> On 01/20/2016 09:33 PM, Jens Axboe wrote:
>> Can you try and revert commit 09400a60042 and see if that fixes it?
>
> Yes, that does fix it.

So it sounds like a double unlock issue, as per the bug report you also 
referenced.

Can you check out the 2.4 release again, and see if there are cases 
where pthread_cond_timedwait() returns with the mutex unlocked already? 
The man page states:

"Upon successful return, the mutex shall have been locked and shall  be
  owned by the calling thread."

but it's not clear whether that also holds for an error return (e.g. a timeout).
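
The question above can be checked in isolation. POSIX specifies that
pthread_cond_timedwait() re-acquires the mutex before it returns, including
when it returns ETIMEDOUT, so the caller has to unlock exactly once on both
the success and the timeout path. What follows is a minimal sketch, not fio's
code: timedwait_then_unlock() and struct timedwait_result are hypothetical
names, and the demo assumes a PTHREAD_MUTEX_ERRORCHECK mutex so that an extra
unlock is reported as EPERM instead of being undefined behaviour.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical result record for the experiment; not part of fio. */
struct timedwait_result {
	int wait_rc;       /* return value of pthread_cond_timedwait() */
	int first_unlock;  /* return value of the first pthread_mutex_unlock() */
	int second_unlock; /* return value of a deliberate second unlock */
};

/*
 * Wait on a condition that nobody signals until timeout_ms has elapsed,
 * then unlock the mutex twice and record what each call returned.
 */
static struct timedwait_result timedwait_then_unlock(long timeout_ms)
{
	struct timedwait_result res;
	pthread_mutexattr_t attr;
	pthread_mutex_t lock;
	pthread_cond_t cond;
	struct timespec deadline;
	int rc;

	/* Assumption for the demo: an error-checking mutex, so unlocking a
	 * mutex the caller does not own returns EPERM. */
	rc = pthread_mutexattr_init(&attr);
	assert(rc == 0);
	rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	assert(rc == 0);
	rc = pthread_mutex_init(&lock, &attr);
	assert(rc == 0);
	rc = pthread_cond_init(&cond, NULL); /* default clock: CLOCK_REALTIME */
	assert(rc == 0);

	/* Build an absolute deadline timeout_ms from now. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	deadline.tv_sec += timeout_ms / 1000 + deadline.tv_nsec / 1000000000L;
	deadline.tv_nsec %= 1000000000L;

	pthread_mutex_lock(&lock);

	/* A return of 0 here can only be a spurious wakeup, so keep waiting
	 * until the call reports an error (expected: ETIMEDOUT). */
	do {
		res.wait_rc = pthread_cond_timedwait(&cond, &lock, &deadline);
	} while (res.wait_rc == 0);

	/* The mutex is owned again at this point, so this unlock succeeds. */
	res.first_unlock = pthread_mutex_unlock(&lock);

	/* A second unlock is the bug under discussion; the error-checking
	 * mutex reports it instead of corrupting state. */
	res.second_unlock = pthread_mutex_unlock(&lock);

	pthread_cond_destroy(&cond);
	pthread_mutex_destroy(&lock);
	pthread_mutexattr_destroy(&attr);
	return res;
}
```

Calling timedwait_then_unlock(50) from a small main() and printing the three
fields shows whether the timeout path leaves the mutex held. With a default
(non error-checking) mutex the second unlock is undefined behaviour, and with
lock elision enabled in glibc it can fault inside the unlock path, as in the
backtrace earlier in this thread.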

-- 
Jens Axboe


* Re: fio newer than 2.2.12 segfaults on examples/tiobench-example.fio
  2016-01-21  4:51         ` Jens Axboe
@ 2016-01-21  4:54           ` Jens Axboe
  0 siblings, 0 replies; 8+ messages in thread
From: Jens Axboe @ 2016-01-21  4:54 UTC
  To: Bruce Cran, fio

On 01/20/2016 09:51 PM, Jens Axboe wrote:
> On 01/20/2016 09:43 PM, Bruce Cran wrote:
>> On 01/20/2016 09:33 PM, Jens Axboe wrote:
>>> Can you try and revert commit 09400a60042 and see if that fixes it?
>>
>> Yes, that does fix it.
>
> So it sounds like a double unlock issue, as per the bug report you also
> referenced.
>
> Can you check out the 2.4 release again, and see if there are cases
> where pthread_cond_timedwait() returns with the mutex unlocked already?
> The man page states:
>
> "Upon successful return, the mutex shall have been locked and shall  be
>   owned by the calling thread."
>
> but it's not clear whether that also holds for an error return (e.g. a timeout).

Duh, nevermind:

http://git.kernel.dk/cgit/fio/commit/?id=42e833fa08803ccea6c99df353398a7423845c51

-- 
Jens Axboe




* Re: fio newer than 2.2.12 segfaults on examples/tiobench-example.fio
  2016-01-21  4:30   ` Bruce Cran
  2016-01-21  4:33     ` Jens Axboe
@ 2016-01-21  4:38     ` Bruce Cran
  1 sibling, 0 replies; 8+ messages in thread
From: Bruce Cran @ 2016-01-21  4:38 UTC
  To: Jens Axboe, fio

On 01/20/2016 09:30 PM, Bruce Cran wrote:
> I'm wondering if this is a microcode bug, since I have one of the new 
> Skylake CPUs (Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz; stepping: 3; 
> microcode: 0x6a) and others have mentioned crashes in 
> __lll_unlock_elision related to broken CPUs/BIOSes, though for older 
> generations.

For anyone who wants to see what the problems around lock elision are all 
about, see https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=800574 
(libc6: lock elision hazard on Intel Broadwell and Skylake).

-- 
Bruce


