* fio jobs die with sigsegv if --filesize=1tb
From: Case van Rij @ 2009-12-23 7:15 UTC
To: fio
A fairly basic random write test over NFS (1 job, direct I/O enabled)
against a pre-existing 1TB file results in SIGSEGV.
Tested on: CentOS 5.4 x86_64, kernel 2.6.18-164.6.1.el5, fio from
git (last change: Tue, 22 Dec 2009 08:06:43 +0000).
fio --name=rndwrs --ioengine=libaio --iodepth=4 --rw=randwrite \
    --bs=32k --direct=1 --size=1tb --numjobs=1 --filename=/mnt/nfs/1tb.vdb
random-writers: (g=0): rw=randwrite, bs=32K-32K/32K-32K, ioengine=libaio, iodepth=4
Starting 1 process
fio: pid=24153, got signal=11
Run status group 0 (all jobs):
fio: file hash not empty on exit
strace:
[pid 23961] open("/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size", O_RDONLY) = 8
[pid 23961] read(8, "64\n", 32) = 3
[pid 23961] close(8) = 0
[pid 23961] getpriority(PRIO_PROCESS, 0) = 20
[pid 23961] setpriority(PRIO_PROCESS, 0, 0) = 0
[pid 23961] getpriority(PRIO_PROCESS, 0) = 20
[pid 23961] io_setup(4, {47657422389248}) = 0
[pid 23961] getrusage(RUSAGE_SELF, {ru_utime={0, 1999}, ru_stime={0, 0}, ...}) = 0
[pid 23961] open("/mnt/nfs/1tb.vdb", O_RDWR|O_CREAT|O_DIRECT, 0600) = 8
[pid 23961] fadvise64(8, 0, 1, POSIX_FADV_DONTNEED) = 0
[pid 23961] fadvise64(8, 0, 1, POSIX_FADV_RANDOM) = 0
[pid 23961] --- SIGSEGV (Segmentation fault) @ 0 (0) ---
There is no core file, though, perhaps because reap_threads cleans up after the SIGSEGV.
with debug=all:
io 26954 invalidate cache /mnt/nfs/1tb.vdb: 0/1
file 26954 goodf=1, badf=2, ff=31
file 26954 get_next_file_rr: 0x2abe73d38028
file 26954 get_next_file: 0x2abe73d38028 [/mnt/nfs/1tb.vdb]
file 26954 get file /mnt/nfs/1tb.vdb, ref=1
random 26954 off rand 1425201762
random 26954 free: b=12242389915983151104, idx=536870912, bit=0
fio: pid=26954, got signal=11
process 26952 pid=26954: runstate 4 -> 9
process 26952 terminate group_id=-1
process 26952 setting terminate on random-writers/26954
diskutil 26952 update io ticks
the same test works if I replace --filesize=1tb with --filesize=1gb
(but makes for a far less interesting test).
Regards,
Case
* Re: fio jobs die with sigsegv if --filesize=1tb
From: Jens Axboe @ 2009-12-23 7:35 UTC
To: Case van Rij
Cc: fio

On Tue, Dec 22 2009, Case van Rij wrote:
> [...]
> the same test works if I replace --filesize=1tb with --filesize=1gb
> (but makes for a far less interesting test).

Looks like math overflow. Can you double check that ulimit -c is set
reasonably high (I usually just do ulimit -c1000000000), then remove the
-O2 from the fio makefile, recompile, and trigger the problem? That
should give you a clean core dump; invoke gdb with fio and that core
file so we can see exactly where it bombs.

--
Jens Axboe
* Re: fio jobs die with sigsegv if --filesize=1tb
From: Jens Axboe @ 2009-12-23 7:41 UTC
To: Case van Rij
Cc: fio

On Wed, Dec 23 2009, Jens Axboe wrote:
> [...]
> Looks like math overflow. [...]

I think it's a simple parser problem. Try 'size=1t' instead.

--
Jens Axboe
* Re: fio jobs die with sigsegv if --filesize=1tb
From: Jens Axboe @ 2009-12-23 7:54 UTC
To: Case van Rij
Cc: fio

On Wed, Dec 23 2009, Jens Axboe wrote:
> [...]
> I think it's a simple parser problem. Try 'size=1t' instead.

This should fix it: both "1t" and "1tb" now work (in upper case as well).
The segfault is a separate issue: apparently fio doesn't like 1-byte IO
jobs :-) I'll fix that separately; for now I've committed the patch below.

diff --git a/parse.c b/parse.c
index 7821861..a55e52b 100644
--- a/parse.c
+++ b/parse.c
@@ -162,9 +162,19 @@ int str_to_decimal(const char *str, long long *val, int kilo, void *data)
 	if (*val == LONG_MAX && errno == ERANGE)
 		return 1;
 
-	if (kilo)
-		*val *= get_mult_bytes(str[len - 1], data);
-	else
+	if (kilo) {
+		const char *p;
+		/*
+		 * if the last char is 'b' or 'B', the user likely used
+		 * "1gb" instead of just "1g". If the second to last is also
+		 * a letter, adjust.
+		 */
+		p = str + len - 1;
+		if ((*p == 'b' || *p == 'B') && isalpha(*(p - 1)))
+			--p;
+
+		*val *= get_mult_bytes(*p, data);
+	} else
 		*val *= get_mult_time(str[len - 1]);
 
 	return 0;

--
Jens Axboe