* fio causes segfault after particular of random writes
From: Nikolaus Jeremic @ 2010-09-23 12:38 UTC
To: fio
Hello,
I am using fio to benchmark SSDs and noticed that fio segfaults after
writing about 260000 MB at random with a block size of 4096 bytes in a
single job. Writing the same or a slightly larger amount of data
sequentially in 1 MB blocks works fine. The problem is reproducible
with several fio versions: 1.34, 1.41, 1.43, and 1.43.2 (as of 09/16/2010).
My system runs Gentoo Linux with kernel version 2.6.35:
uname -ar:
Linux ava-srv1 2.6.35-gentoo-r8 #1 SMP Thu Sep 23 01:15:44 CEST 2010 x86_64 Intel(R) Xeon(R) CPU X5550 @ 2.67GHz GenuineIntel GNU/Linux
The kernel log says:
[40594.224072] fio[8194]: segfault at 60000008 ip 000000000040f010 sp 00007fff15d04540 error 6 in fio[400000+3d000]
Shell output:
fio ssd-test-half 2>&1> ssd_raid0_half.txt
fio: pid=8194, got signal=11
fio: file hash not empty on exit
Content of 'ssd_raid0_half.txt':
ssd_raid0_rw: (g=0): rw=randwrite, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=31
ssd_raid0_rw: (g=1): rw=randwrite, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=31
Starting 2 processes
ssd_raid0_rw: (groupid=0, jobs=1): err= 0: pid=5614
Description : [SSD RAID0 random write test]
write: io=152636MB, bw=14733KB/s, iops=3683 , runt=10608985msec
slat (usec): min=0 , max=311 , avg=12.35, stdev= 6.25
clat (usec): min=0 , max=516847 , avg=8401.34, stdev=23180.68
lat (usec): min=0 , max=516861 , avg=8414.20, stdev=23181.16
bw (KB/s) : min= 1565, max=94672, per=100.30%, avg=14776.62, stdev=7192.34
cpu : usr=1.71%, sys=5.54%, ctx=22398949, majf=0, minf=687154
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%,>=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,>=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%,>=64=0.0%
issued r/w/d: total=0/39074846/0, short=0/0/0
lat (usec): 2=0.01%, 20=0.01%, 50=0.01%, 100=0.02%, 250=0.30%
lat (usec): 500=2.41%, 750=8.67%, 1000=11.26%
lat (msec): 2=40.79%, 4=17.93%, 10=3.51%, 20=2.65%, 50=9.30%
lat (msec): 100=1.06%, 250=2.05%, 500=0.05%, 750=0.01%
errors : total=0, first_error=0/<Success>
ssd_raid0_rw: (groupid=1, jobs=1): err= 0: pid=8194
Description : [SSD RAID0 random write test]
cpu : usr=0.00%, sys=0.00%, ctx=0, majf=0, minf=0
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%,>=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,>=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,>=64=0.0%
issued r/w/d: total=0/67108865/0, short=0/0/0
lat (usec): 2=0.01%, 20=0.01%, 50=0.01%, 100=0.02%, 250=0.47%
lat (usec): 500=2.42%, 750=7.34%, 1000=9.56%
lat (msec): 2=34.94%, 4=17.15%, 10=3.67%, 20=3.17%, 50=14.54%
lat (msec): 100=3.48%, 250=3.15%, 500=0.07%, 750=0.01%
errors : total=0, first_error=0/<Success>
Run status group 0 (all jobs):
WRITE: io=152636MB, aggrb=14732KB/s, minb=15086KB/s, maxb=15086KB/s, mint=10608985msec, maxt=10608985msec
Run status group 1 (all jobs):
Disk stats (read/write):
md9: ios=53/106183711, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=13/26545927, aggrmerge=0/0, aggrticks=2/308056055, aggrin_queue=308040475, aggrutil=0.00%
sdh: ios=4/26554543, merge=0/0, ticks=0/206354320, in_queue=206338240, util=0.00%
sdi: ios=14/26543759, merge=0/0, ticks=0/664104677, in_queue=664089630, util=0.00%
sdj: ios=8/26540642, merge=0/0, ticks=0/206861870, in_queue=206846313, util=0.00%
sdg: ios=29/26544767, merge=0/0, ticks=10/154903353, in_queue=154887717, util=0.00%
The corresponding job file contains:
[global]
name=ssd_raid0_rw
description=SSD RAID0 random write test
bs=4096
ioengine=libaio
iodepth=31
direct=1
continue_on_error=1
filename=/dev/md9
[rand-write1]
rw=randwrite
numjobs=1
norandommap
size=160050446336
stonewall
#write_iolog=io_patterns1.log
write_bw_log=bandwidth1
write_lat_log
[rand-write2]
rw=randwrite
numjobs=1
norandommap
size=320100892672
stonewall
#write_iolog=io_patterns2.log
write_bw_log=bandwidth2
write_lat_log
iostat -m (values were reset for md9 before starting the benchmark):
Linux 2.6.35-gentoo-r8 (ava-srv1) 09/23/10 _x86_64_ (16 CPU)
avg-cpu:   %user   %nice %system %iowait  %steal   %idle
            0.08    0.01    0.24    5.23    0.00   94.44
Device:          tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb             0.77         0.00         0.06        106       2559
sdc             0.00         0.00         0.00          0          0
sdd             0.00         0.00         0.00          0          0
sde             0.00         0.00         0.00          0          0
sdf             0.00         0.00         0.00          0          0
sda             0.78         0.00         0.06        104       2559
md0            15.21         0.00         0.06        202       2529
md3             0.01         0.00         0.00          1          0
md2             0.01         0.00         0.00          1          0
md1             0.11         0.00         0.00          4         15
sdg           589.24         0.00         2.30          0     103919
sdh           589.46         0.00         2.30          0     103958
sdi           589.22         0.00         2.30          0     103915
sdj           589.15         0.00         2.30          0     103903
md9          2357.07         0.00         9.21          0     415696
The first job completes successfully, but the second one causes the segfault.
I would be grateful for any help. Thank you.
Kind regards,
Nikolaus
--
Dipl.-Inf. Nikolaus Jeremic nikolaus.jeremic@uni-rostock.de
Universitaet Rostock Tel: (+49) 381 / 498 - 7633
Albert-Einstein-Str. 21 Fax: (+49) 381 / 498 - 7482
18051 Rostock, Germany wwwava.informatik.uni-rostock.de
* Re: fio causes segfault after particular of random writes
From: Jens Axboe @ 2010-09-24 6:57 UTC
To: Nikolaus Jeremic; +Cc: fio
On 2010-09-23 14:38, Nikolaus Jeremic wrote:
> Hello,
>
> I am using fio for benchmarking of SSDs and noticed that fio causes a
> segfault after writing about 260000 MB with block size of 4096 bytes
> at random in one job. Writing the same or just bigger amount of data
> sequentially in 1 MB blocks works well. The situation is reproducible
> with several fio versions, i.e. 1.34, 1.41, 1.43, 1.43.2 as of
> 09/16/2010.
That's not good. To help me with this, please do:
- Edit the Makefile in fio, remove the -O2 in there.
- make clean && make
- Run ulimit -c10000000000 or something large like that
- Now reproduce the problem. Fio will segfault again, and produce
a core file.
- compress the fio executable and core file and send them to me.
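The `ulimit -c` step raises the core file size limit for the shell that launches fio. For reference, the same limit can be raised from inside a process before it starts the program under test. The following is a minimal sketch, assuming a POSIX system; `raise_core_limit` is a hypothetical helper name and is not part of fio.

```c
#include <assert.h>
#include <sys/resource.h>

/*
 * Raise the soft core-file size limit to the hard limit for the calling
 * process (child processes inherit it). Returns 0 on success, -1 on error.
 * Hypothetical helper for illustration; not part of fio.
 */
int raise_core_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;

    /* An unprivileged process may raise its soft limit up to the hard limit. */
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}
```

If the hard limit itself is too small for the expected core file, only a privileged process can raise it, so the shell `ulimit -c` route remains the simpler choice in that case.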
--
Jens Axboe
* Re: fio causes segfault after particular of random writes
From: Jens Axboe @ 2010-09-24 8:09 UTC
To: Nikolaus Jeremic; +Cc: fio
On 2010-09-24 08:57, Jens Axboe wrote:
> On 2010-09-23 14:38, Nikolaus Jeremic wrote:
>> Hello,
>>
>> I am using fio for benchmarking of SSDs and noticed that fio causes a
>> segfault after writing about 260000 MB with block size of 4096 bytes
>> at random in one job. Writing the same or just bigger amount of data
>> sequentially in 1 MB blocks works well. The situation is reproducible
>> with several fio versions, i.e. 1.34, 1.41, 1.43, 1.43.2 as of
>> 09/16/2010.
>
> That's not good. To help me with this, please do:
>
> - Edit the Makefile in fio, remove the -O2 in there.
> - make clean && make
> - Run ulimit -c10000000000 or something large like that
> - Now reproduce the problem. Fio will segfault again, and produce
> a core file.
> - compress the fio executable and core file and send them to me.
One idea is that the logs grow far too large with your job descriptions.
You could try this patch; it will prevent the log from overflowing. It
will also slow down the workload; a real fix would need to flush the log
out-of-line.
But give it a spin.
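The idea that the logs grow too large can be checked against the numbers in the report. The second job stopped at 67108865 issued writes, which is 2^26 + 1. The log starts at 1024 samples and doubles when full, and the unpatched code stores the new byte size in an `int`. The sketch below works through that arithmetic. It assumes a 24-byte log record on x86_64, and it assumes that the failed resize leaves a NULL buffer pointer; neither is stated in the thread, so treat this as a hypothesis that fits the numbers, not a confirmed diagnosis. The function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * ASSUMPTION: one log record occupies 24 bytes on x86_64 (two 8-byte
 * fields followed by two 4-byte fields). This is not stated in the thread.
 */
#define ASSUMED_SAMPLE_BYTES 24u

/*
 * Starting from `start` samples and doubling each time, return the
 * capacity at which the next doubling needs a byte count that no longer
 * fits in a signed int.
 */
uint64_t first_overflowing_capacity(uint64_t start, uint64_t sample_bytes)
{
    uint64_t max_samples = start;

    while (sample_bytes * max_samples * 2 <= (uint64_t)INT_MAX)
        max_samples <<= 1;
    return max_samples;
}

/* Byte offset of record number `index` from the start of the buffer. */
uint64_t sample_offset(uint64_t index, uint64_t sample_bytes)
{
    return index * sample_bytes;
}
```

With a 32-bit `int`, `first_overflowing_capacity(1024, ASSUMED_SAMPLE_BYTES)` is 67108864, one less than the issued write count. `sample_offset(67108864, ASSUMED_SAMPLE_BYTES)` is 0x60000000, so under these assumptions a store to the second 8-byte field of that record through a NULL buffer would touch address 0x60000008, the faulting address in the kernel log.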
diff --git a/fio.c b/fio.c
index d20fc24..1306acf 100644
--- a/fio.c
+++ b/fio.c
@@ -1188,34 +1188,14 @@ static void *thread_main(void *data)
td->ts.io_bytes[1] = td->io_bytes[1];
fio_mutex_down(writeout_mutex);
- if (td->ts.bw_log) {
- if (td->o.bw_log_file) {
- finish_log_named(td, td->ts.bw_log,
- td->o.bw_log_file, "bw");
- } else
- finish_log(td, td->ts.bw_log, "bw");
- }
- if (td->ts.lat_log) {
- if (td->o.lat_log_file) {
- finish_log_named(td, td->ts.lat_log,
- td->o.lat_log_file, "lat");
- } else
- finish_log(td, td->ts.lat_log, "lat");
- }
- if (td->ts.slat_log) {
- if (td->o.lat_log_file) {
- finish_log_named(td, td->ts.slat_log,
- td->o.lat_log_file, "slat");
- } else
- finish_log(td, td->ts.slat_log, "slat");
- }
- if (td->ts.clat_log) {
- if (td->o.lat_log_file) {
- finish_log_named(td, td->ts.clat_log,
- td->o.lat_log_file, "clat");
- } else
- finish_log(td, td->ts.clat_log, "clat");
- }
+ if (td->ts.bw_log)
+ finish_log(td->ts.bw_log);
+ if (td->ts.lat_log)
+ finish_log(td->ts.lat_log);
+ if (td->ts.slat_log)
+ finish_log(td->ts.slat_log);
+ if (td->ts.clat_log)
+ finish_log(td->ts.clat_log);
fio_mutex_up(writeout_mutex);
if (td->o.exec_postrun)
exec_string(td->o.exec_postrun);
@@ -1680,8 +1660,8 @@ int main(int argc, char *argv[])
return 0;
if (write_bw_log) {
- setup_log(&agg_io_log[DDIR_READ]);
- setup_log(&agg_io_log[DDIR_WRITE]);
+ __setup_log(&agg_io_log[DDIR_READ], "agg-read_bw.log");
+ __setup_log(&agg_io_log[DDIR_WRITE], "agg-write_bw.log");
}
startup_mutex = fio_mutex_init(0);
@@ -1699,9 +1679,8 @@ int main(int argc, char *argv[])
if (!fio_abort) {
show_run_stats();
if (write_bw_log) {
- __finish_log(agg_io_log[DDIR_READ], "agg-read_bw.log");
- __finish_log(agg_io_log[DDIR_WRITE],
- "agg-write_bw.log");
+ finish_log(agg_io_log[DDIR_READ]);
+ finish_log(agg_io_log[DDIR_WRITE]);
}
}
diff --git a/init.c b/init.c
index fe4dbf2..f13d3e4 100644
--- a/init.c
+++ b/init.c
@@ -578,12 +578,25 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num)
goto err;
if (td->o.write_lat_log) {
- setup_log(&td->ts.lat_log);
- setup_log(&td->ts.slat_log);
- setup_log(&td->ts.clat_log);
+ if (td->o.lat_log_file)
+ setup_log_named(&td->ts.lat_log, td->o.lat_log_file, "lat");
+ else
+ setup_log(td, &td->ts.lat_log, "lat");
+ if (td->o.lat_log_file)
+ setup_log_named(&td->ts.slat_log, td->o.lat_log_file, "slat");
+ else
+ setup_log(td, &td->ts.slat_log, "slat");
+ if (td->o.lat_log_file)
+ setup_log_named(&td->ts.clat_log, td->o.lat_log_file, "clat");
+ else
+ setup_log(td, &td->ts.clat_log, "clat");
+ }
+ if (td->o.write_bw_log) {
+ if (td->o.bw_log_file)
+ setup_log_named(&td->ts.bw_log, td->o.bw_log_file, "bw");
+ else
+ setup_log(td, &td->ts.bw_log, "bw");
}
- if (td->o.write_bw_log)
- setup_log(&td->ts.bw_log);
if (!td->o.name)
td->o.name = strdup(jobname);
diff --git a/iolog.h b/iolog.h
index c35ce1e..2b2aa66 100644
--- a/iolog.h
+++ b/iolog.h
@@ -30,6 +30,8 @@ struct io_log {
unsigned long nr_samples;
unsigned long max_samples;
struct io_sample *log;
+ char *log_name;
+ unsigned int max_log_mb;
};
enum {
@@ -95,10 +97,11 @@ extern void show_run_stats(void);
extern void init_disk_util(struct thread_data *);
extern void update_rusage_stat(struct thread_data *);
extern void update_io_ticks(void);
-extern void setup_log(struct io_log **);
-extern void finish_log(struct thread_data *, struct io_log *, const char *);
-extern void finish_log_named(struct thread_data *, struct io_log *, const char *, const char *);
-extern void __finish_log(struct io_log *, const char *);
+extern void __setup_log(struct io_log **, const char *);
+extern void setup_log(struct thread_data *, struct io_log **, const char *);
+extern void setup_log_named(struct io_log **, const char *, const char *);
+extern void finish_log(struct io_log *);
+extern void flush_log(struct io_log *);
extern struct io_log *agg_io_log[2];
extern int write_bw_log;
extern void add_agg_sample(unsigned long, enum fio_ddir, unsigned int);
diff --git a/log.c b/log.c
index 266dc06..22d2524 100644
--- a/log.c
+++ b/log.c
@@ -491,22 +491,39 @@ int init_iolog(struct thread_data *td)
return ret;
}
-void setup_log(struct io_log **log)
+void __setup_log(struct io_log **log, const char *name)
{
struct io_log *l = malloc(sizeof(*l));
l->nr_samples = 0;
l->max_samples = 1024;
l->log = malloc(l->max_samples * sizeof(struct io_sample));
+ l->log_name = strdup(name);
+ l->max_log_mb = 10;
*log = l;
}
-void __finish_log(struct io_log *log, const char *name)
+void setup_log_named(struct io_log **log, const char *prefix,
+ const char *postfix)
+{
+ char file_name[256], *p;
+
+ snprintf(file_name, 200, "%s_%s.log", prefix, postfix);
+ p = basename(file_name);
+ __setup_log(log, p);
+}
+
+void setup_log(struct thread_data *td, struct io_log **log, const char *name)
+{
+ setup_log_named(log, td->o.name, name);
+}
+
+void flush_log(struct io_log *log)
{
unsigned int i;
FILE *f;
- f = fopen(name, "a");
+ f = fopen(log->log_name, "a");
if (!f) {
perror("fopen log");
return;
@@ -520,21 +537,13 @@ void __finish_log(struct io_log *log, const char *name)
}
fclose(f);
- free(log->log);
- free(log);
-}
-
-void finish_log_named(struct thread_data *td, struct io_log *log,
- const char *prefix, const char *postfix)
-{
- char file_name[256], *p;
-
- snprintf(file_name, 200, "%s_%s.log", prefix, postfix);
- p = basename(file_name);
- __finish_log(log, p);
+ log->nr_samples = 0;
}
-void finish_log(struct thread_data *td, struct io_log *log, const char *name)
+void finish_log(struct io_log *log)
{
- finish_log_named(td, log, td->o.name, name);
+ flush_log(log);
+ free(log->log);
+ free(log->log_name);
+ free(log);
}
diff --git a/stat.c b/stat.c
index b5ff010..02a8ad9 100644
--- a/stat.c
+++ b/stat.c
@@ -730,13 +730,24 @@ static void __add_log_sample(struct io_log *iolog, unsigned long val,
enum fio_ddir ddir, unsigned int bs,
unsigned long time)
{
- const int nr_samples = iolog->nr_samples;
+ int nr_samples = iolog->nr_samples;
if (iolog->nr_samples == iolog->max_samples) {
- int new_size = sizeof(struct io_sample) * iolog->max_samples*2;
-
- iolog->log = realloc(iolog->log, new_size);
- iolog->max_samples <<= 1;
+ int new_size;
+
+ new_size = sizeof(struct io_sample) * iolog->max_samples * 2;
+
+ /*
+ * If it fits, increase log size and add entry. If not, flush
+ * log
+ */
+ if (new_size <= (iolog->max_log_mb * 1024 * 1024UL)) {
+ iolog->log = realloc(iolog->log, new_size);
+ iolog->max_samples <<= 1;
+ } else {
+ flush_log(iolog);
+ nr_samples = iolog->nr_samples;
+ }
}
iolog->log[nr_samples].val = val;
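The grow-or-flush decision in the stat.c hunk can be sketched in isolation as follows. This is a simplified illustration, not the fio code: the `bounded_log` type and its functions are hypothetical, the flush only counts samples in place of appending them to a file, and the sketch adds a `realloc` failure check that the patch does not contain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for a log record (hypothetical layout). */
struct sample {
    unsigned long time;
    unsigned long val;
};

struct bounded_log {
    struct sample *buf;
    size_t nr;        /* samples currently buffered */
    size_t max;       /* current capacity in samples */
    size_t cap_bytes; /* the buffer never grows beyond this many bytes */
    size_t flushed;   /* total samples handed to the flush step so far */
};

int bounded_log_init(struct bounded_log *l, size_t start, size_t cap_bytes)
{
    l->buf = malloc(start * sizeof(*l->buf));
    if (!l->buf)
        return -1;
    l->nr = 0;
    l->max = start;
    l->cap_bytes = cap_bytes;
    l->flushed = 0;
    return 0;
}

/* Stand-in for writing the buffered samples out; afterwards the buffer is empty. */
static void bounded_log_flush(struct bounded_log *l)
{
    l->flushed += l->nr;
    l->nr = 0;
}

/* Add one sample: double the buffer while that fits under the cap, else flush. */
int bounded_log_add(struct bounded_log *l, unsigned long time, unsigned long val)
{
    if (l->nr == l->max) {
        size_t new_size = sizeof(*l->buf) * l->max * 2;

        if (new_size <= l->cap_bytes) {
            struct sample *p = realloc(l->buf, new_size);

            if (!p)
                return -1; /* keep the old buffer intact on failure */
            l->buf = p;
            l->max *= 2;
        } else {
            bounded_log_flush(l);
        }
    }
    l->buf[l->nr].time = time;
    l->buf[l->nr].val = val;
    l->nr++;
    return 0;
}

void bounded_log_free(struct bounded_log *l)
{
    free(l->buf);
    l->buf = NULL;
}
```

Because the flush runs inside the add path, a full buffer stalls the caller until the samples are written, which matches the note that this approach slows the workload down.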
--
Jens Axboe