From: Jens Axboe <axboe@kernel.dk>
To: Alan Hagge <Alan.Hagge@warnerbros.com>
Cc: fio@vger.kernel.org
Subject: Re: How to re-use default sequential filenames?
Date: Fri, 5 Apr 2013 10:39:02 +0200
Message-ID: <20130405083902.GL9683@kernel.dk>
In-Reply-To: <20130404184104.GE9683@kernel.dk>
On Thu, Apr 04 2013, Jens Axboe wrote:
> On Thu, Apr 04 2013, Jens Axboe wrote:
> > On Thu, Apr 04 2013, Alan Hagge wrote:
> > > I'm trying to put together a test of the write and read speed to some new
> > > SAN storage. Our workflow involves writing large numbers of 12 MiB files
> > > (on the order of 20,000 or so) at a time. I'd like to set up a config file
> > > section that will write all 20,000 files then read all 20,000 files and
> > > report on the write performance and the read performance (separately).
> > >
> > > I've tried something like this:
> > >
> > > [global]
> > > blocksize=4m
> > > filesize=12m
> > > nrfiles=20000
> > > openfiles=1
> > > file_service_type=sequential
> > > create_on_open=1
> > > ioengine=posixaio
> > >
> > > [write]
> > > rw=write
> > >
> > > [read]
> > > stonewall
> > > rw=read
> > >
> > > But the issue is that the files get created with default filenames
> > > (write.1.1, write.1.2, etc.), so that when the read job is run, it can't
> > > find any files (since it expects the files to be named read.1.1, read.1.2,
> > > etc.). If I try to specify the "filename=" option in either section, fio no
> > > longer appends the ".<thread>.<sequence>" to the filename, but rather tries
> > > to do all I/O to a single file.
> > >
> > > Is there a syntax for the "filename=" option that will allow me to specify a
> > > different root filename, but still use the ".<thread>.<sequence>" naming
> > > convention? Failing that, is there any other way to accomplish my goal?
> >
> > Good question, and no, you can't currently do that. But you should be
> > able to do that. Fio has no current option for specifying the naming. We
> > could have a fileprefix= option that allows you to set that.
> >
> > So we currently have two options. The first option is that you take on
> > this task. The file name (if not given with filename=) is generated in
> > init.c:add_job(), here:
> >
> > 	if (!td->o.filename && !td->files_index && !td->o.read_iolog_file) {
> > 		file_alloced = 1;
> >
> > 		if (td->o.nr_files == 1 && exists_and_not_file(jobname))
> > 			add_file(td, jobname);
> > 		else {
> > 			for (i = 0; i < td->o.nr_files; i++) {
> > 				sprintf(fname, "%s.%d.%d", jobname,
> > 					td->thread_number, i);
> > 				add_file(td, fname);
> > 			}
> > 		}
> > 	}
> >
> > Options are pretty easy to add, basically just an entry in the
> > fio_option options[] array in options.c with pretty much
> > self-explanatory fields. Add a matching string member to struct
> > thread_options in fio.h.
> >
> > The other option is that you claim that you are not a programmer, and
> > then you are at the mercy of someone else (most likely me!) doing it for
> > you. Since this is a good feature request, I can be talked into that as
> > well.
> >
> > Let me know.
>
> OK, so I gave it a quick shot, see below. Basically it allows you to set
> fileprefix= to override the jobname.threadnumber part of the file name. So
> not super flexible, we'd need some reserved keywords to make it fully
> flexible. E.g., it would be nifty if you could do:
>
> fileprefix=$jobnum.$threadnum.$filenum

Changed it a bit: the option is now filename_format, and it allows the
following keywords, which it replaces with the appropriate name or
number:

	$jobname	Name of the job
	$jobnum		Number of the job
	$filenum	Number of the file in the job
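
The substitution itself is simple. Below is a minimal standalone C sketch of expanding the three keywords into a file name. It is not the make_filename() from the patch below; the name expand_format() and its buffer-length handling are assumptions made for illustration. It scans the format once from left to right, so text substituted for a keyword (for example a job name that itself contains "$jobname") is never rescanned.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not fio's make_filename()): expand $jobname,
 * $jobnum and $filenum in fmt into buf, which holds len bytes including
 * the terminating NUL. Returns buf, or NULL if the result does not fit.
 */
static char *expand_format(char *buf, size_t len, const char *fmt,
			   const char *jobname, int jobnum, int filenum)
{
	size_t used = 0;

	assert(buf && fmt && jobname);
	if (len == 0)
		return NULL;

	/* Single left-to-right pass: substituted text is never rescanned */
	while (*fmt) {
		char num[32];
		const char *rep;
		size_t n;

		if (!strncmp(fmt, "$jobname", 8)) {
			rep = jobname;
			n = strlen(jobname);
			fmt += 8;
		} else if (!strncmp(fmt, "$jobnum", 7)) {
			snprintf(num, sizeof(num), "%d", jobnum);
			rep = num;
			n = strlen(num);
			fmt += 7;
		} else if (!strncmp(fmt, "$filenum", 8)) {
			snprintf(num, sizeof(num), "%d", filenum);
			rep = num;
			n = strlen(num);
			fmt += 8;
		} else {
			/* Not a keyword: copy one literal character */
			rep = fmt;
			n = 1;
			fmt++;
		}

		/* Leave room for the terminating NUL */
		if (used + n >= len)
			return NULL;
		memcpy(buf + used, rep, n);
		used += n;
	}

	buf[used] = '\0';
	return buf;
}
```

For instance, with the format testfiles.$filenum, file 4 expands to testfiles.4 for both the 'write' job and the 'read' job, which is what lets them share files. The single pass is used here instead of repeated strstr() passes over the whole buffer mainly so that substituted text cannot trigger further replacements.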

So for your use case, you would add this to the [global] section:

	filename_format=testfiles.$filenum

and then the 'write' and 'read' jobs would share those files. Let me
know if it works for you.

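Besides the naming, the patch also relaxes the size check in setup_files() so that a job with nrfiles= and filesize= but no size= is accepted; for new files at offset 0, each file's io_size is then the file size and the job size becomes their sum. As a rough sketch of what that means for the job file in the original question, here is the arithmetic in C. The struct and function names are hypothetical, and the I/O count assumes filesize is an exact multiple of blocksize, as 12m and 4m are.

```c
#include <assert.h>
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)

/* Hypothetical: per-job totals implied by nrfiles=, filesize=, blocksize= */
struct job_totals {
	uint64_t bytes;	/* sum of per-file io_size, used as the job size */
	uint64_t ios;	/* number of blocksize-sized I/Os */
};

/*
 * Assumes every file starts at offset 0 and is read or written in full,
 * and that file_size is an exact multiple of block_size.
 */
static struct job_totals compute_job_totals(uint64_t nr_files,
					     uint64_t file_size,
					     uint64_t block_size)
{
	struct job_totals t = { 0, 0 };
	uint64_t i;

	/* Mirrors the per-file "total_size += f->io_size" accumulation */
	for (i = 0; i < nr_files; i++)
		t.bytes += file_size;

	t.ios = t.bytes / block_size;
	return t;
}
```

For nrfiles=20000, filesize=12m and blocksize=4m this gives 251,658,240,000 bytes (240,000 MiB) and 60,000 I/Os per job, once for the write pass and again for the read pass over the same shared files.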
diff --git a/HOWTO b/HOWTO
index cf6d427..76effee 100644
--- a/HOWTO
+++ b/HOWTO
@@ -285,6 +285,32 @@ filename=str Fio normally makes up a filename based on the job name,
stdin or stdout. Which of the two depends on the read/write
direction set.
+filename_format=str
+ If sharing multiple files between jobs, it is usually necessary
+ to have fio generate the exact names that you want. By default,
+ fio will name a file based on the default file format
+ specification of jobname.jobnumber.filenumber. With this
+ option, that can be customized. Fio will recognize and replace
+ the following keywords in this string:
+
+ $jobname
+ The name of the worker thread or process.
+
+ $jobnum
+ The incremental number of the worker thread or
+ process.
+
+ $filenum
+ The incremental number of the file for that worker
+ thread or process.
+
+ To have dependent jobs share a set of files, this option can
+ be set to have fio generate filenames that are shared between
+ the two. For instance, if testfiles.$filenum is specified,
+ file number 4 for any job will be named testfiles.4. The
+ default of $jobname.$jobnum.$filenum will be used if
+ no other format specifier is given.
+
opendir=str Tell fio to recursively add any file it can find in this
directory and down the file system tree.
@@ -405,7 +431,7 @@ filesize=int Individual file sizes. May be a range, in which case fio
fill_device=bool
fill_fs=bool Sets size to something really large and waits for ENOSPC (no
space left on device) as the terminating condition. Only makes
- sense with sequential write. For a read workload, the mount
+ sense with sequential write. For a read workload, the mount
point will be filled first then IO started on the result. This
option doesn't make sense if operating on a raw device node,
since the size of that is already known by the file system.
diff --git a/filesetup.c b/filesetup.c
index e456186..88d6565 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -719,13 +719,14 @@ uint64_t get_start_offset(struct thread_data *td)
int setup_files(struct thread_data *td)
{
unsigned long long total_size, extend_size;
+ struct thread_options *o = &td->o;
struct fio_file *f;
unsigned int i;
int err = 0, need_extend;
dprint(FD_FILE, "setup files\n");
- if (td->o.read_iolog_file)
+ if (o->read_iolog_file)
goto done;
/*
@@ -753,15 +754,16 @@ int setup_files(struct thread_data *td)
total_size += f->real_file_size;
}
- if (td->o.fill_device)
+ if (o->fill_device)
td->fill_device_size = get_fs_free_counts(td);
/*
* device/file sizes are zero and no size given, punt
*/
- if ((!total_size || total_size == -1ULL) && !td->o.size &&
- !(td->io_ops->flags & FIO_NOIO) && !td->o.fill_device) {
- log_err("%s: you need to specify size=\n", td->o.name);
+ if ((!total_size || total_size == -1ULL) && !o->size &&
+ !(td->io_ops->flags & FIO_NOIO) && !o->fill_device &&
+ !(o->nr_files && (o->file_size_low || o->file_size_high))) {
+ log_err("%s: you need to specify size=\n", o->name);
td_verror(td, EINVAL, "total_file_size");
return 1;
}
@@ -776,27 +778,26 @@ int setup_files(struct thread_data *td)
for_each_file(td, f, i) {
f->file_offset = get_start_offset(td);
- if (!td->o.file_size_low) {
+ if (!o->file_size_low) {
/*
* no file size range given, file size is equal to
* total size divided by number of files. if that is
* zero, set it to the real file size.
*/
- f->io_size = td->o.size / td->o.nr_files;
+ f->io_size = o->size / o->nr_files;
if (!f->io_size)
f->io_size = f->real_file_size - f->file_offset;
- } else if (f->real_file_size < td->o.file_size_low ||
- f->real_file_size > td->o.file_size_high) {
- if (f->file_offset > td->o.file_size_low)
+ } else if (f->real_file_size < o->file_size_low ||
+ f->real_file_size > o->file_size_high) {
+ if (f->file_offset > o->file_size_low)
goto err_offset;
/*
* file size given. if it's fixed, use that. if it's a
* range, generate a random size in-between.
*/
- if (td->o.file_size_low == td->o.file_size_high) {
- f->io_size = td->o.file_size_low
- - f->file_offset;
- } else {
+ if (o->file_size_low == o->file_size_high)
+ f->io_size = o->file_size_low - f->file_offset;
+ else {
f->io_size = get_rand_file_size(td)
- f->file_offset;
}
@@ -806,15 +807,15 @@ int setup_files(struct thread_data *td)
if (f->io_size == -1ULL)
total_size = -1ULL;
else {
- if (td->o.size_percent)
- f->io_size = (f->io_size * td->o.size_percent) / 100;
+ if (o->size_percent)
+ f->io_size = (f->io_size * o->size_percent) / 100;
total_size += f->io_size;
}
if (f->filetype == FIO_TYPE_FILE &&
(f->io_size + f->file_offset) > f->real_file_size &&
!(td->io_ops->flags & FIO_DISKLESSIO)) {
- if (!td->o.create_on_open) {
+ if (!o->create_on_open) {
need_extend++;
extend_size += (f->io_size + f->file_offset);
} else
@@ -823,8 +824,8 @@ int setup_files(struct thread_data *td)
}
}
- if (!td->o.size || td->o.size > total_size)
- td->o.size = total_size;
+ if (!o->size || o->size > total_size)
+ o->size = total_size;
/*
* See if we need to extend some files
@@ -833,7 +834,7 @@ int setup_files(struct thread_data *td)
temp_stall_ts = 1;
if (output_format == FIO_OUTPUT_NORMAL)
log_info("%s: Laying out IO file(s) (%u file(s) /"
- " %lluMB)\n", td->o.name, need_extend,
+ " %lluMB)\n", o->name, need_extend,
extend_size >> 20);
for_each_file(td, f, i) {
@@ -844,7 +845,7 @@ int setup_files(struct thread_data *td)
assert(f->filetype == FIO_TYPE_FILE);
fio_file_clear_extend(f);
- if (!td->o.fill_device) {
+ if (!o->fill_device) {
old_len = f->real_file_size;
extend_len = f->io_size + f->file_offset -
old_len;
@@ -867,23 +868,23 @@ int setup_files(struct thread_data *td)
if (err)
return err;
- if (!td->o.zone_size)
- td->o.zone_size = td->o.size;
+ if (!o->zone_size)
+ o->zone_size = o->size;
/*
* iolog already set the total io size, if we read back
* stored entries.
*/
- if (!td->o.read_iolog_file)
- td->total_io_size = td->o.size * td->o.loops;
+ if (!o->read_iolog_file)
+ td->total_io_size = o->size * o->loops;
done:
- if (td->o.create_only)
+ if (o->create_only)
td->done = 1;
return 0;
err_offset:
- log_err("%s: you need to specify valid offset=\n", td->o.name);
+ log_err("%s: you need to specify valid offset=\n", o->name);
return 1;
}
diff --git a/fio.1 b/fio.1
index fe8ab76..0c2a243 100644
--- a/fio.1
+++ b/fio.1
@@ -151,6 +151,34 @@ a number of files by separating the names with a `:' character. `\-' is a
reserved name, meaning stdin or stdout, depending on the read/write direction
set.
.TP
+.BI filename_format \fR=\fPstr
+If sharing multiple files between jobs, it is usually necessary to have
+fio generate the exact names that you want. By default, fio will name a file
+based on the default file format specification of
+\fBjobname.jobnumber.filenumber\fP. With this option, that can be
+customized. Fio will recognize and replace the following keywords in this
+string:
+.RS
+.RS
+.TP
+.B $jobname
+The name of the worker thread or process.
+.TP
+.B $jobnum
+The incremental number of the worker thread or process.
+.TP
+.B $filenum
+The incremental number of the file for that worker thread or process.
+.RE
+.P
+To have dependent jobs share a set of files, this option can be set to
+have fio generate filenames that are shared between the two. For instance,
+if \fBtestfiles.$filenum\fR is specified, file number 4 for any job will
+be named \fBtestfiles.4\fR. The default of \fB$jobname.$jobnum.$filenum\fR
+will be used if no other format specifier is given.
+.RE
+.P
+.TP
.BI lockfile \fR=\fPstr
Fio defaults to not locking any files before it does IO to them. If a file or
file descriptor is shared, fio can serialize IO to that file to make the end
diff --git a/fio.h b/fio.h
index a1b2a93..db594ab 100644
--- a/fio.h
+++ b/fio.h
@@ -102,6 +102,7 @@ struct thread_options {
char *name;
char *directory;
char *filename;
+ char *filename_format;
char *opendir;
char *ioengine;
enum td_ddir td_ddir;
diff --git a/init.c b/init.c
index 9d15318..0da878d 100644
--- a/init.c
+++ b/init.c
@@ -799,6 +799,82 @@ static int setup_random_seeds(struct thread_data *td)
return 0;
}
+enum {
+	FPRE_NONE = 0,
+	FPRE_JOBNAME,
+	FPRE_JOBNUM,
+	FPRE_FILENUM
+};
+
+static struct fpre_keyword {
+	const char *keyword;
+	size_t strlen;
+	int key;
+} fpre_keywords[] = {
+	{ .keyword = "$jobname",	.key = FPRE_JOBNAME, },
+	{ .keyword = "$jobnum",		.key = FPRE_JOBNUM, },
+	{ .keyword = "$filenum",	.key = FPRE_FILENUM, },
+	{ .keyword = NULL, },
+};
+
+static char *make_filename(char *buf, struct thread_options *o,
+			   const char *jobname, int jobnum, int filenum)
+{
+	struct fpre_keyword *f;
+	char copy[PATH_MAX];
+
+	if (!o->filename_format || !strlen(o->filename_format)) {
+		sprintf(buf, "%s.%d.%d", jobname, jobnum, filenum);
+		return buf;
+	}
+
+	for (f = &fpre_keywords[0]; f->keyword; f++)
+		f->strlen = strlen(f->keyword);
+
+	strcpy(buf, o->filename_format);
+	memset(copy, 0, sizeof(copy));
+	for (f = &fpre_keywords[0]; f->keyword; f++) {
+		do {
+			size_t pre_len, post_start = 0;
+			char *str, *dst = copy;
+
+			str = strstr(buf, f->keyword);
+			if (!str)
+				break;
+
+			pre_len = str - buf;
+			if (strlen(str) != f->strlen)
+				post_start = pre_len + f->strlen;
+
+			if (pre_len) {
+				strncpy(dst, buf, pre_len);
+				dst += pre_len;
+			}
+
+			switch (f->key) {
+			case FPRE_JOBNAME:
+				dst += sprintf(dst, "%s", jobname);
+				break;
+			case FPRE_JOBNUM:
+				dst += sprintf(dst, "%d", jobnum);
+				break;
+			case FPRE_FILENUM:
+				dst += sprintf(dst, "%d", filenum);
+				break;
+			default:
+				assert(0);
+				break;
+			}
+
+			if (post_start)
+				strcpy(dst, buf + post_start);
+
+			strcpy(buf, copy);
+		} while (1);
+	}
+
+	return buf;
+}
/*
* Adds a job to the list of things todo. Sanitizes the various options
* to make sure we don't have conflicts, and initializes various
@@ -812,6 +888,7 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num)
unsigned int i;
char fname[PATH_MAX];
int numjobs, file_alloced;
+ struct thread_options *o = &td->o;
/*
* the def_thread is just for options, it's not a real job
@@ -835,26 +912,23 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num)
if (ioengine_load(td))
goto err;
- if (td->o.use_thread)
+ if (o->use_thread)
nr_thread++;
else
nr_process++;
- if (td->o.odirect)
+ if (o->odirect)
td->io_ops->flags |= FIO_RAWIO;
file_alloced = 0;
- if (!td->o.filename && !td->files_index && !td->o.read_iolog_file) {
+ if (!o->filename && !td->files_index && !o->read_iolog_file) {
file_alloced = 1;
- if (td->o.nr_files == 1 && exists_and_not_file(jobname))
+ if (o->nr_files == 1 && exists_and_not_file(jobname))
add_file(td, jobname);
else {
- for (i = 0; i < td->o.nr_files; i++) {
- sprintf(fname, "%s.%d.%d", jobname,
- td->thread_number, i);
- add_file(td, fname);
- }
+ for (i = 0; i < o->nr_files; i++)
+ add_file(td, make_filename(fname, o, jobname, td->thread_number, i));
}
}
@@ -879,9 +953,9 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num)
td->mutex = fio_mutex_init(FIO_MUTEX_LOCKED);
- td->ts.clat_percentiles = td->o.clat_percentiles;
- td->ts.percentile_precision = td->o.percentile_precision;
- memcpy(td->ts.percentile_list, td->o.percentile_list, sizeof(td->o.percentile_list));
+ td->ts.clat_percentiles = o->clat_percentiles;
+ td->ts.percentile_precision = o->percentile_precision;
+ memcpy(td->ts.percentile_list, o->percentile_list, sizeof(o->percentile_list));
for (i = 0; i < DDIR_RWDIR_CNT; i++) {
td->ts.clat_stat[i].min_val = ULONG_MAX;
@@ -889,9 +963,9 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num)
td->ts.lat_stat[i].min_val = ULONG_MAX;
td->ts.bw_stat[i].min_val = ULONG_MAX;
}
- td->ddir_seq_nr = td->o.ddir_seq_nr;
+ td->ddir_seq_nr = o->ddir_seq_nr;
- if ((td->o.stonewall || td->o.new_group) && prev_group_jobs) {
+ if ((o->stonewall || o->new_group) && prev_group_jobs) {
prev_group_jobs = 0;
groupid++;
}
@@ -907,43 +981,41 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num)
if (setup_rate(td))
goto err;
- if (td->o.write_lat_log) {
- setup_log(&td->lat_log, td->o.log_avg_msec);
- setup_log(&td->slat_log, td->o.log_avg_msec);
- setup_log(&td->clat_log, td->o.log_avg_msec);
+ if (o->write_lat_log) {
+ setup_log(&td->lat_log, o->log_avg_msec);
+ setup_log(&td->slat_log, o->log_avg_msec);
+ setup_log(&td->clat_log, o->log_avg_msec);
}
- if (td->o.write_bw_log)
- setup_log(&td->bw_log, td->o.log_avg_msec);
- if (td->o.write_iops_log)
- setup_log(&td->iops_log, td->o.log_avg_msec);
+ if (o->write_bw_log)
+ setup_log(&td->bw_log, o->log_avg_msec);
+ if (o->write_iops_log)
+ setup_log(&td->iops_log, o->log_avg_msec);
- if (!td->o.name)
- td->o.name = strdup(jobname);
+ if (!o->name)
+ o->name = strdup(jobname);
if (output_format == FIO_OUTPUT_NORMAL) {
if (!job_add_num) {
if (!strcmp(td->io_ops->name, "cpuio")) {
log_info("%s: ioengine=cpu, cpuload=%u,"
- " cpucycle=%u\n", td->o.name,
- td->o.cpuload,
- td->o.cpucycle);
+ " cpucycle=%u\n", o->name,
+ o->cpuload, o->cpucycle);
} else {
char *c1, *c2, *c3, *c4, *c5, *c6;
- c1 = to_kmg(td->o.min_bs[DDIR_READ]);
- c2 = to_kmg(td->o.max_bs[DDIR_READ]);
- c3 = to_kmg(td->o.min_bs[DDIR_WRITE]);
- c4 = to_kmg(td->o.max_bs[DDIR_WRITE]);
- c5 = to_kmg(td->o.min_bs[DDIR_TRIM]);
- c6 = to_kmg(td->o.max_bs[DDIR_TRIM]);
+ c1 = to_kmg(o->min_bs[DDIR_READ]);
+ c2 = to_kmg(o->max_bs[DDIR_READ]);
+ c3 = to_kmg(o->min_bs[DDIR_WRITE]);
+ c4 = to_kmg(o->max_bs[DDIR_WRITE]);
+ c5 = to_kmg(o->min_bs[DDIR_TRIM]);
+ c6 = to_kmg(o->max_bs[DDIR_TRIM]);
log_info("%s: (g=%d): rw=%s, bs=%s-%s/%s-%s/%s-%s,"
" ioengine=%s, iodepth=%u\n",
- td->o.name, td->groupid,
- ddir_str[td->o.td_ddir],
+ o->name, td->groupid,
+ ddir_str[o->td_ddir],
c1, c2, c3, c4, c5, c6,
- td->io_ops->name,
- td->o.iodepth);
+ td->io_ops->name, o->iodepth);
free(c1);
free(c2);
@@ -960,7 +1032,7 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num)
* recurse add identical jobs, clear numjobs and stonewall options
* as they don't apply to sub-jobs
*/
- numjobs = td->o.numjobs;
+ numjobs = o->numjobs;
while (--numjobs) {
struct thread_data *td_new = get_new_job(0, td, 1);
diff --git a/options.c b/options.c
index 3eb5fdc..bca217f 100644
--- a/options.c
+++ b/options.c
@@ -1132,6 +1132,14 @@ static struct fio_option options[FIO_MAX_OPTS] = {
.help = "File(s) to use for the workload",
},
{
+ .name = "filename_format",
+ .type = FIO_OPT_STR_STORE,
+ .off1 = td_var_offset(filename_format),
+ .prio = -1, /* must come after "directory" */
+ .help = "Override default $jobname.$jobnum.$filenum naming",
+ .def = "$jobname.$jobnum.$filenum",
+ },
+ {
.name = "kb_base",
.type = FIO_OPT_INT,
.off1 = td_var_offset(kb_base),
--
Jens Axboe