Flexible I/O Tester development
* fio jobs die with sigsegv if --filesize=1tb
@ 2009-12-23  7:15 Case van Rij
  2009-12-23  7:35 ` Jens Axboe
  0 siblings, 1 reply; 4+ messages in thread
From: Case van Rij @ 2009-12-23  7:15 UTC
  To: fio

A fairly basic random write test over NFS (1 job, direct I/O enabled,
pre-existing 1TB file) results in SIGSEGV.

tested on: CentOS 5.4 x86_64,  2.6.18-164.6.1.el5 kernel, fio from
git, last change: Tue, 22 Dec 2009 08:06:43 +0000

fio --name=rndwrs --ioengine=libaio --iodepth=4 --rw=randwrite \
    --bs=32k --direct=1 --size=1tb --numjobs=1 --filename=/mnt/nfs/1tb.vdb
random-writers: (g=0): rw=randwrite, bs=32K-32K/32K-32K, ioengine=libaio, iodepth=4
Starting 1 process
fio: pid=24153, got signal=11

Run status group 0 (all jobs):
fio: file hash not empty on exit

strace:
[pid 23961] open("/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size", O_RDONLY) = 8
[pid 23961] read(8, "64\n", 32)         = 3
[pid 23961] close(8)                    = 0
[pid 23961] getpriority(PRIO_PROCESS, 0) = 20
[pid 23961] setpriority(PRIO_PROCESS, 0, 0) = 0
[pid 23961] getpriority(PRIO_PROCESS, 0) = 20
[pid 23961] io_setup(4, {47657422389248}) = 0
[pid 23961] getrusage(RUSAGE_SELF, {ru_utime={0, 1999}, ru_stime={0, 0}, ...}) = 0
[pid 23961] open("/mnt/nfs/1tb.vdb", O_RDWR|O_CREAT|O_DIRECT, 0600) = 8
[pid 23961] fadvise64(8, 0, 1, POSIX_FADV_DONTNEED) = 0
[pid 23961] fadvise64(8, 0, 1, POSIX_FADV_RANDOM) = 0
[pid 23961] --- SIGSEGV (Segmentation fault) @ 0 (0) ---

There is no core file, possibly because reap_threads cleans up after the SIGSEGV (?)

with debug=all:
io       26954 invalidate cache /mnt/nfs/1tb.vdb: 0/1
file     26954 goodf=1, badf=2, ff=31
file     26954 get_next_file_rr: 0x2abe73d38028
file     26954 get_next_file: 0x2abe73d38028 [/mnt/nfs/1tb.vdb]
file     26954 get file /mnt/nfs/1tb.vdb, ref=1
random   26954 off rand 1425201762
random   26954 free: b=12242389915983151104, idx=536870912, bit=0
fio: pid=26954, got signal=11
process  26952 pid=26954: runstate 4 -> 9
process  26952 terminate group_id=-1
process  26952 setting terminate on random-writers/26954
diskutil    26952 update io ticks

The same test works if I replace --filesize=1tb with --filesize=1gb
(but that makes for a far less interesting test).

Regards,
Case


* Re: fio jobs die with sigsegv if --filesize=1tb
  2009-12-23  7:15 fio jobs die with sigsegv if --filesize=1tb Case van Rij
@ 2009-12-23  7:35 ` Jens Axboe
  2009-12-23  7:41   ` Jens Axboe
  0 siblings, 1 reply; 4+ messages in thread
From: Jens Axboe @ 2009-12-23  7:35 UTC
  To: Case van Rij; +Cc: fio

Looks like math overflow. Can you double check that ulimit -c is set
reasonably high (I usually just do ulimit -c1000000000), then remove the
-O2 from the fio makefile, recompile, and trigger the problem? That
should give you a clean core dump. Then invoke gdb with fio and that core
file so we can see exactly where it bombs.

-- 
Jens Axboe



* Re: fio jobs die with sigsegv if --filesize=1tb
  2009-12-23  7:35 ` Jens Axboe
@ 2009-12-23  7:41   ` Jens Axboe
  2009-12-23  7:54     ` Jens Axboe
  0 siblings, 1 reply; 4+ messages in thread
From: Jens Axboe @ 2009-12-23  7:41 UTC
  To: Case van Rij; +Cc: fio

On Wed, Dec 23 2009, Jens Axboe wrote:
> Looks like math overflow. Can you double check that ulimit -c is set
> reasonably high (I usually just do ulimit -c1000000000), then remove the
> -O2 from the fio makefile and recompile, then trigger the problem. That
> should give you a clean core dump, invoke gdb with fio and that core
> file so we can see exactly where it bombs.

I think it's a simple parser problem. Try 'size=1t' instead.

-- 
Jens Axboe



* Re: fio jobs die with sigsegv if --filesize=1tb
  2009-12-23  7:41   ` Jens Axboe
@ 2009-12-23  7:54     ` Jens Axboe
  0 siblings, 0 replies; 4+ messages in thread
From: Jens Axboe @ 2009-12-23  7:54 UTC
  To: Case van Rij; +Cc: fio

On Wed, Dec 23 2009, Jens Axboe wrote:
> I think it's a simple parser problem. Try 'size=1t' instead.

This should fix it, so that both "1t" and "1tb" work (or upper case). The
segfault is a separate issue: apparently fio doesn't like 1-byte IO jobs
:-)

I'll fix that separately; for now I've committed the patch below.

diff --git a/parse.c b/parse.c
index 7821861..a55e52b 100644
--- a/parse.c
+++ b/parse.c
@@ -162,9 +162,19 @@ int str_to_decimal(const char *str, long long *val, int kilo, void *data)
 	if (*val == LONG_MAX && errno == ERANGE)
 		return 1;
 
-	if (kilo)
-		*val *= get_mult_bytes(str[len - 1], data);
-	else
+	if (kilo) {
+		const char *p;
+		/*
+		 * if the last char is 'b' or 'B', the user likely used
+		 * "1gb" instead of just "1g". If the second to last is also
+		 * a letter, adjust.
+		 */
+		p = str + len - 1;
+		if ((*p == 'b' || *p == 'B') && isalpha(*(p - 1)))
+			--p;
+
+		*val *= get_mult_bytes(*p, data);
+	} else
 		*val *= get_mult_time(str[len - 1]);
 
 	return 0;

-- 
Jens Axboe
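
For readers who want to try the suffix handling outside of fio, here is a
minimal standalone sketch of the idea in the patch above. It is not fio's
code: mult_bytes() is a hypothetical stand-in for get_mult_bytes() (assumed
here to use powers of 1024 and to ignore the extra data argument), and
size_suffix() and parse_size() are illustrative names. The len >= 2 check is
an addition in this sketch so that the look-behind never reads before the
start of the string.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for fio's get_mult_bytes(). Assumption: powers
 * of 1024, and any unrecognized character (including 'b') means bytes.
 */
static unsigned long long mult_bytes(char c)
{
	switch (tolower((unsigned char)c)) {
	case 'k':
		return 1ULL << 10;
	case 'm':
		return 1ULL << 20;
	case 'g':
		return 1ULL << 30;
	case 't':
		return 1ULL << 40;
	default:
		return 1;
	}
}

/*
 * Return the character that selects the multiplier. A trailing 'b' or
 * 'B' is skipped when the character before it is also a letter, so
 * "1tb" yields 't' while "1b" still yields 'b'.
 */
static char size_suffix(const char *str)
{
	size_t len = strlen(str);
	const char *p;

	if (len == 0)
		return '\0';

	p = str + len - 1;
	/* len >= 2 keeps p[-1] inside the string (sketch-only guard) */
	if (len >= 2 && (*p == 'b' || *p == 'B') &&
	    isalpha((unsigned char)p[-1]))
		--p;

	return *p;
}

/* Parse "<digits><suffix>" into bytes; returns 0 on success, 1 on error. */
static int parse_size(const char *str, unsigned long long *val)
{
	char *end;
	unsigned long long n = strtoull(str, &end, 10);

	if (end == str)
		return 1;	/* no leading digits */

	*val = n * mult_bytes(size_suffix(str));
	return 0;
}
```

Under these assumptions, looking only at the last character of "1tb" gives
'b' and a multiplier of 1, which would be consistent with the 1-byte job
mentioned above; with the look-behind, parse_size("1tb", &v) gives the same
value as parse_size("1t", &v).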


