* fio newer than 2.2.12 segfaults on examples/tiobench-example.fio
From: Bruce Cran @ 2016-01-17 14:03 UTC
To: fio

I'm seeing a crash when running anything newer than 2.2.12 (I've tried
2.2.13 and 2.3) on my openSUSE Tumbleweed system: "./fio
examples/tiobench-example.fio" causes a segfault in pthreads (apparently
via fio_mutex_down).

--
Bruce
* Re: fio newer than 2.2.12 segfaults on examples/tiobench-example.fio
From: Jens Axboe @ 2016-01-18 21:57 UTC
To: Bruce Cran, fio

On 01/17/2016 07:03 AM, Bruce Cran wrote:
> I'm seeing a crash when running anything newer than 2.2.12 (I've tried
> 2.2.13 and 2.3) on my openSUSE Tumbleweed system: "./fio
> examples/tiobench-example.fio" causes a segfault in pthreads (apparently
> via fio_mutex_down).

Can you compile with --disable-optimizations passed to configure,
ensure that a core file is dumped (ulimit -c1000000000000 or
whatever), and then run gdb ./fio core to show a full backtrace?

I can't reproduce the crash with 2.3 or current -git, runs fine for me.

--
Jens Axboe
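For reference, "ulimit -c" in the shell adjusts the RLIMIT_CORE resource limit of the shell and the processes it starts. A minimal sketch of doing the same from C with getrlimit()/setrlimit(), raising the soft limit as far as the hard limit allows; enable_core_dumps() is a hypothetical helper, not part of fio. Whether a core file actually appears also depends on system settings such as /proc/sys/kernel/core_pattern.

```c
#include <sys/resource.h>

/* Hypothetical helper: raise the soft core-file size limit to the hard
 * limit, roughly what "ulimit -c <max>" does in a shell.
 * Returns 0 on success, -1 on error (errno is set by the libc call). */
static int enable_core_dumps(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_CORE, &rl) != 0)
		return -1;

	/* An unprivileged process may raise its soft limit up to, but not
	 * beyond, the hard limit (which may be RLIM_INFINITY). */
	rl.rlim_cur = rl.rlim_max;
	return setrlimit(RLIMIT_CORE, &rl);
}
```

Calling it early in main() before the crash point has the same effect on that one process as running ulimit in the shell beforehand.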
* Re: fio newer than 2.2.12 segfaults on examples/tiobench-example.fio
From: Bruce Cran @ 2016-01-21 4:30 UTC
To: Jens Axboe, fio

On 01/18/2016 02:57 PM, Jens Axboe wrote:
> Can you compile with --disable-optimizations passed to configure,
> ensure that a core file is dumped (ulimit -c1000000000000 or
> whatever), and then run gdb ./fio core to show a full backtrace?
>
> I can't reproduce the crash with 2.3 or current -git, runs fine for me.
>

I'm wondering if this is a microcode bug, since I have one of the new
Skylake CPUs (Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz; stepping: 3;
microcode: 0x6a) and others have mentioned crashes in
__lll_unlock_elision related to broken CPUs/BIOSes, though for older
generations.
Anyway, here's the output and backtrace:

(gdb) run examples/tiobench-example.fio
Starting program: /home/bcran/workspace/fio/fio examples/tiobench-example.fio
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
f1: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
...
f2: (g=1): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
...
f3: (g=2): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
...
f4: (g=3): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
...
fio-2.4-1-g44ac
Starting 16 processes
Detaching after fork from child process 26797.
[New Thread 0x7fffe7d53700 (LWP 26796)]

Program received signal SIGSEGV, Segmentation fault.
__lll_unlock_elision (lock=0x7ffff7feb000, private=128) at ../sysdeps/unix/sysv/linux/x86/elision-unlock.c:29
29	../sysdeps/unix/sysv/linux/x86/elision-unlock.c: No such file or directory.
(gdb) bt
#0  __lll_unlock_elision (lock=0x7ffff7feb000, private=128) at ../sysdeps/unix/sysv/linux/x86/elision-unlock.c:29
#1  0x00000000004482fa in fio_mutex_down_timeout (mutex=0x7ffff7feb000, msecs=10000) at mutex.c:141
#2  0x0000000000468dda in run_threads (sk_out=0x0) at backend.c:2183
#3  0x0000000000469487 in fio_backend (sk_out=0x0) at backend.c:2381
#4  0x0000000000486c42 in main (argc=2, argv=0x7fffffffdf18, envp=0x7fffffffdf30) at fio.c:63

--
Bruce
* Re: fio newer than 2.2.12 segfaults on examples/tiobench-example.fio
From: Jens Axboe @ 2016-01-21 4:33 UTC
To: Bruce Cran, fio

On 01/20/2016 09:30 PM, Bruce Cran wrote:
> On 01/18/2016 02:57 PM, Jens Axboe wrote:
>> Can you compile with --disable-optimizations passed to configure,
>> ensure that a core file is dumped (ulimit -c1000000000000 or
>> whatever), and then run gdb ./fio core to show a full backtrace?
>>
>> I can't reproduce the crash with 2.3 or current -git, runs fine for me.
>>
>
> I'm wondering if this is a microcode bug, since I have one of the new
> Skylake CPUs (Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz; stepping: 3;
> microcode: 0x6a) and others have mentioned crashes in
> __lll_unlock_elision related to broken CPUs/BIOSes, though for older
> generations.
> Anyway, here's the output and backtrace:
>
> (gdb) run examples/tiobench-example.fio
> Starting program: /home/bcran/workspace/fio/fio examples/tiobench-example.fio
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> f1: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
> ...
> f2: (g=1): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
> ...
> f3: (g=2): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
> ...
> f4: (g=3): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
> ...
> fio-2.4-1-g44ac
> Starting 16 processes
> Detaching after fork from child process 26797.
> [New Thread 0x7fffe7d53700 (LWP 26796)]
>
> Program received signal SIGSEGV, Segmentation fault.
> __lll_unlock_elision (lock=0x7ffff7feb000, private=128) at ../sysdeps/unix/sysv/linux/x86/elision-unlock.c:29
> 29	../sysdeps/unix/sysv/linux/x86/elision-unlock.c: No such file or directory.
> (gdb) bt
> #0  __lll_unlock_elision (lock=0x7ffff7feb000, private=128) at ../sysdeps/unix/sysv/linux/x86/elision-unlock.c:29
> #1  0x00000000004482fa in fio_mutex_down_timeout (mutex=0x7ffff7feb000, msecs=10000) at mutex.c:141
> #2  0x0000000000468dda in run_threads (sk_out=0x0) at backend.c:2183
> #3  0x0000000000469487 in fio_backend (sk_out=0x0) at backend.c:2381
> #4  0x0000000000486c42 in main (argc=2, argv=0x7fffffffdf18, envp=0x7fffffffdf30) at fio.c:63

Can you try and revert commit 09400a60042 and see if that fixes it?

--
Jens Axboe
* Re: fio newer than 2.2.12 segfaults on examples/tiobench-example.fio
From: Bruce Cran @ 2016-01-21 4:43 UTC
To: Jens Axboe, fio

On 01/20/2016 09:33 PM, Jens Axboe wrote:
> Can you try and revert commit 09400a60042 and see if that fixes it?

Yes, that does fix it.

--
Bruce
* Re: fio newer than 2.2.12 segfaults on examples/tiobench-example.fio
From: Jens Axboe @ 2016-01-21 4:51 UTC
To: Bruce Cran, fio

On 01/20/2016 09:43 PM, Bruce Cran wrote:
> On 01/20/2016 09:33 PM, Jens Axboe wrote:
>> Can you try and revert commit 09400a60042 and see if that fixes it?
>
> Yes, that does fix it.

So it sounds like a double unlock issue, as per the bug report you also
referenced.

Can you check out the 2.4 release again, and see if there are cases
where pthread_cond_timedwait() returns with the mutex unlocked already?
The man page states:

"Upon successful return, the mutex shall have been locked and shall be
owned by the calling thread."

but it's not clear whether that also holds for an error return (e.g.
timed out).

--
Jens Axboe
* Re: fio newer than 2.2.12 segfaults on examples/tiobench-example.fio
From: Jens Axboe @ 2016-01-21 4:54 UTC
To: Bruce Cran, fio

On 01/20/2016 09:51 PM, Jens Axboe wrote:
> On 01/20/2016 09:43 PM, Bruce Cran wrote:
>> On 01/20/2016 09:33 PM, Jens Axboe wrote:
>>> Can you try and revert commit 09400a60042 and see if that fixes it?
>>
>> Yes, that does fix it.
>
> So it sounds like a double unlock issue, as per the bug report you also
> referenced.
>
> Can you check out the 2.4 release again, and see if there are cases
> where pthread_cond_timedwait() returns with the mutex unlocked already?
> The man page states:
>
> "Upon successful return, the mutex shall have been locked and shall be
> owned by the calling thread."
>
> but it's not clear whether that also holds for an error return (e.g.
> timed out).

Duh, never mind:

http://git.kernel.dk/cgit/fio/commit/?id=42e833fa08803ccea6c99df353398a7423845c51

--
Jens Axboe
* Re: fio newer than 2.2.12 segfaults on examples/tiobench-example.fio
From: Bruce Cran @ 2016-01-21 4:38 UTC
To: Jens Axboe, fio

On 01/20/2016 09:30 PM, Bruce Cran wrote:
> I'm wondering if this is a microcode bug, since I have one of the new
> Skylake CPUs (Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz; stepping: 3;
> microcode: 0x6a) and others have mentioned crashes in
> __lll_unlock_elision related to broken CPUs/BIOSes, though for older
> generations.

For anyone who wants to see what the problems around lock elision are
all about, see https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=800574
(libc6: lock elision hazard on Intel Broadwell and Skylake).

--
Bruce
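glibc's lock elision on x86 is built on Intel TSX restricted transactional memory (RTM). As a first hint of whether a machine falls into the class discussed in that report, one can ask CPUID whether RTM is advertised (leaf 7, subleaf 0, EBX bit 11). A minimal sketch, assuming GCC or Clang and their <cpuid.h>; cpu_has_rtm() is a hypothetical helper. Whether glibc actually elides a given lock also depends on how glibc was built and configured, and microcode updates can disable TSX, so the result is only an indication.

```c
#include <stdbool.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* GCC/Clang-provided header */
#endif

/* Hypothetical helper: true if CPUID advertises RTM (Intel TSX), the
 * hardware feature glibc's x86 lock elision uses. Always false on
 * non-x86 builds. */
static bool cpu_has_rtm(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	/* Leaf 7 must exist before it can be queried. */
	if (__get_cpuid_max(0, NULL) < 7)
		return false;

	__cpuid_count(7, 0, eax, ebx, ecx, edx);
	(void)eax; (void)ecx; (void)edx;

	/* CPUID.(EAX=07H, ECX=0):EBX bit 11 reports RTM. */
	return (ebx & (1u << 11)) != 0;
#else
	return false;
#endif
}
```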