Flexible I/O Tester development
 help / color / mirror / Atom feed
* [PATCH] Allow to reset offset_increment counter
@ 2014-05-23 12:48 Jiri Horky
  2014-05-23 16:59 ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Jiri Horky @ 2014-05-23 12:48 UTC (permalink / raw)
  To: fio

[-- Attachment #1: Type: text/plain, Size: 1541 bytes --]

Hi all,

because I got bitten by this multiple times I decided to give this patch
a try :)

Current implementation of offset calculation when offset_increment is in
effect uses global thread_number as follows:
    f->file_offset = td->o.start_offset + (td->thread_number - 1) *
td->o.offset_increment;

The thread number gets incremented for every job (subjob) so even you
have multiple jobs with different filenames, the offset calculation is
shared. I find this very unintuitive, especially in cases the offsets
gets past the device/file. For example, if one wants to run sequential
read test in 16 threads of multiple devices (/dev/sd{b,c,d}) in one
group, which are of  1TB size, and to eliminate caching effect he wants
each read to start at different offset, the config could look like
following:

[seq-read-16threads-/dev/sdb]
rw=read
numjobs=16
offset_increment=50g
bs=512k
filename=/dev/sdb

[seq-read-16threads-/dev/sdc]
rw=read
numjobs=16
offset_increment=50g
bs=512k
filename=/dev/sdc

[seq-read-16threads-/dev/sdd]
rw=read
numjobs=16
offset_increment=50g
bs=512k
filename=/dev/sdd

The result will be that only /dev/sdb is full tested, three threads will
be run against /dev/sdc and /dev/sdd does not receive any IO at all.

The proposed patch add new option "reset_offset_increment" that causes
the offset calculation to be started again from zero. The idea behind is
similar to patch proposed here
(http://www.spinics.net/lists/fio/msg01213.html) but is more flexible.
The patch applies against 2.1.9 tag.

Regards
Jiri Horky

[-- Attachment #2: reset_offset.patch --]
[-- Type: text/x-patch, Size: 4201 bytes --]

diff -Nru fio-2.1.9/filesetup.c fio-2.1.9.patched/filesetup.c
--- fio-2.1.9/filesetup.c	2014-05-20 09:39:25.000000000 +0200
+++ fio-2.1.9.patched/filesetup.c	2014-05-23 13:22:31.069984999 +0200
@@ -750,7 +750,7 @@
 		return f->real_file_size;
 
 	return td->o.start_offset +
-		(td->thread_number - 1) * td->o.offset_increment;
+		(td->offset_thread_number - 1) * td->o.offset_increment;
 }
 
 /*
diff -Nru fio-2.1.9/fio.1 fio-2.1.9.patched/fio.1
--- fio-2.1.9/fio.1	2014-05-20 09:39:25.000000000 +0200
+++ fio-2.1.9.patched/fio.1	2014-05-23 13:40:02.154049921 +0200
@@ -647,10 +647,19 @@
 .TP
 .BI offset_increment \fR=\fPint
 If this is provided, then the real offset becomes the
-offset + offset_increment * thread_number, where the thread number is a counter
-that starts at 0 and is incremented for each job. This option is useful if
-there are several jobs which are intended to operate on a file in parallel in
-disjoint segments, with even spacing between the starting points.
+offset + offset_increment * offset_thread_number, where the offset thread number is a
+counter that starts at 0 and is incremented for each job unless reset_offset_increment
+is specified. This option is useful if there are several jobs which are intended to
+operate  on a file in parallel in disjoint segments, with even spacing between the 
+starting points.
+.TP
+.BI reset_offset_increment \fR=\fPbool
+By default, fio uses global thread numbers when computing offset with offset_increment.
+This may be often confusing as different jobs/threads may operate on different files
+but still share the same offset_increment. This option resets the offset_thread_number
+to 0. If your jobs operate on different files, use this option for every file
+to get offset_increment-per-file behavior. If you use subjobs (numbjobs=X), the 
+reset will be done only once before the first subjob starts.
 .TP
 .BI number_ios \fR=\fPint
 Fio will normally perform IOs until it has exhausted the size of the region
diff -Nru fio-2.1.9/fio.h fio-2.1.9.patched/fio.h
--- fio-2.1.9/fio.h	2014-05-20 09:39:07.000000000 +0200
+++ fio-2.1.9.patched/fio.h	2014-05-23 12:57:43.263971178 +0200
@@ -101,6 +101,7 @@
 	char verror[FIO_VERROR_SIZE];
 	pthread_t thread;
 	unsigned int thread_number;
+	unsigned int offset_thread_number;
 	unsigned int groupid;
 	struct thread_stat ts;
 
diff -Nru fio-2.1.9/init.c fio-2.1.9.patched/init.c
--- fio-2.1.9/init.c	2014-05-20 09:39:07.000000000 +0200
+++ fio-2.1.9.patched/init.c	2014-05-23 13:26:00.969989815 +0200
@@ -43,6 +43,7 @@
 
 static struct thread_data def_thread;
 struct thread_data *threads = NULL;
+static int offset_thread_number = 0;
 static char **job_sections;
 static int nr_job_sections;
 
@@ -1116,6 +1117,11 @@
 		groupid++;
 	}
 
+	if (o->reset_offset_increment) {
+		offset_thread_number = 0;
+	}
+
+	td->offset_thread_number = ++offset_thread_number;
 	td->groupid = groupid;
 	prev_group_jobs++;
 
@@ -1198,6 +1204,7 @@
 		td_new->o.numjobs = 1;
 		td_new->o.stonewall = 0;
 		td_new->o.new_group = 0;
+		td_new->o.reset_offset_increment = 0;
 
 		if (file_alloced) {
 			if (td_new->files) {
diff -Nru fio-2.1.9/options.c fio-2.1.9.patched/options.c
--- fio-2.1.9/options.c	2014-05-20 09:39:25.000000000 +0200
+++ fio-2.1.9.patched/options.c	2014-05-23 13:00:24.929232053 +0200
@@ -1669,6 +1669,18 @@
 		.group	= FIO_OPT_G_INVALID,
 	},
 	{
+		.name	= "reset_offset_increment",
+		.lname	= "Reset offset increment counter",
+		.type	= FIO_OPT_STR_VAL,
+		.off1	= td_var_offset(reset_offset_increment),
+		.help	= "Resets increment counter to 0",
+		.parent = "offset",
+		.hide	= 1,
+		.def	= "0",
+		.category = FIO_OPT_C_IO,
+		.group	= FIO_OPT_G_INVALID,
+	},
+	{
 		.name	= "number_ios",
 		.lname	= "Number of IOs to perform",
 		.type	= FIO_OPT_STR_VAL,
diff -Nru fio-2.1.9/thread_options.h fio-2.1.9.patched/thread_options.h
--- fio-2.1.9/thread_options.h	2014-05-20 09:39:07.000000000 +0200
+++ fio-2.1.9.patched/thread_options.h	2014-05-23 12:58:38.012721048 +0200
@@ -247,6 +247,7 @@
 	unsigned int flow_sleep;
 
 	unsigned long long offset_increment;
+	unsigned int reset_offset_increment;
 	unsigned long long number_ios;
 
 	unsigned int sync_file_range;

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Allow to reset offset_increment counter
  2014-05-23 12:48 [PATCH] Allow to reset offset_increment counter Jiri Horky
@ 2014-05-23 16:59 ` Jens Axboe
  2014-05-23 17:24   ` Jiri Horky
  0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2014-05-23 16:59 UTC (permalink / raw)
  To: Jiri Horky, fio

On 2014-05-23 06:48, Jiri Horky wrote:
> Hi all,
>
> because I got bitten by this multiple times I decided to give this patch
> a try :)
>
> Current implementation of offset calculation when offset_increment is in
> effect uses global thread_number as follows:
>      f->file_offset = td->o.start_offset + (td->thread_number - 1) *
> td->o.offset_increment;
>
> The thread number gets incremented for every job (subjob) so even you
> have multiple jobs with different filenames, the offset calculation is
> shared. I find this very unintuitive, especially in cases the offsets
> gets past the device/file. For example, if one wants to run sequential
> read test in 16 threads of multiple devices (/dev/sd{b,c,d}) in one
> group, which are of  1TB size, and to eliminate caching effect he wants
> each read to start at different offset, the config could look like
> following:

Maybe it would be better to have this offset calculation be on a 
per-thread-per-file basis? You are right in that it only makes sense 
within the same file or device, so maybe it'd be better to make it work 
more like you expect.

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Allow to reset offset_increment counter
  2014-05-23 16:59 ` Jens Axboe
@ 2014-05-23 17:24   ` Jiri Horky
  2014-06-02 18:28     ` Jiri Horky
  0 siblings, 1 reply; 6+ messages in thread
From: Jiri Horky @ 2014-05-23 17:24 UTC (permalink / raw)
  To: Jens Axboe, fio

Hi Jens,
On 05/23/2014 06:59 PM, Jens Axboe wrote:
> On 2014-05-23 06:48, Jiri Horky wrote:
>> Hi all,
>>
>> because I got bitten by this multiple times I decided to give this patch
>> a try :)
>>
>> Current implementation of offset calculation when offset_increment is in
>> effect uses global thread_number as follows:
>>      f->file_offset = td->o.start_offset + (td->thread_number - 1) *
>> td->o.offset_increment;
>>
>> The thread number gets incremented for every job (subjob) so even you
>> have multiple jobs with different filenames, the offset calculation is
>> shared. I find this very unintuitive, especially in cases the offsets
>> gets past the device/file. For example, if one wants to run sequential
>> read test in 16 threads of multiple devices (/dev/sd{b,c,d}) in one
>> group, which are of  1TB size, and to eliminate caching effect he wants
>> each read to start at different offset, the config could look like
>> following:
>
> Maybe it would be better to have this offset calculation be on a
> per-thread-per-file basis? You are right in that it only makes sense
> within the same file or device, so maybe it'd be better to make it
> work more like you expect.
I agree it should definitely be file-based, I just wasn't sure how you
would express that in the config file. Or you mean that that the offset
calculation would not be shared between different jobs (not subjobs)
even if they share the same file?
The fact is that one can always calculate the start offset in the new
job definition if he needs the offset calculation to be shared. And if
there are multiple files within a job, the offset_increment should be
independent.

I will try to look at this.

Jiri



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Allow to reset offset_increment counter
  2014-05-23 17:24   ` Jiri Horky
@ 2014-06-02 18:28     ` Jiri Horky
  2014-07-25  7:47       ` Jiri Horky
  0 siblings, 1 reply; 6+ messages in thread
From: Jiri Horky @ 2014-06-02 18:28 UTC (permalink / raw)
  To: Jens Axboe, fio

[-- Attachment #1: Type: text/plain, Size: 2079 bytes --]

Hi,

so here is another try. I thought about it a little and I think that it
only makes sense to define offset_increment together with numjobs=X
setting, i.e. when using subjobs. The patch reflects this. Each subjob
starts at next offset_increment for each file it operates on.

Cheers
Jirka H.


On 05/23/2014 07:24 PM, Jiri Horky wrote:
> Hi Jens,
> On 05/23/2014 06:59 PM, Jens Axboe wrote:
>> On 2014-05-23 06:48, Jiri Horky wrote:
>>> Hi all,
>>>
>>> because I got bitten by this multiple times I decided to give this patch
>>> a try :)
>>>
>>> Current implementation of offset calculation when offset_increment is in
>>> effect uses global thread_number as follows:
>>>      f->file_offset = td->o.start_offset + (td->thread_number - 1) *
>>> td->o.offset_increment;
>>>
>>> The thread number gets incremented for every job (subjob) so even you
>>> have multiple jobs with different filenames, the offset calculation is
>>> shared. I find this very unintuitive, especially in cases the offsets
>>> gets past the device/file. For example, if one wants to run sequential
>>> read test in 16 threads of multiple devices (/dev/sd{b,c,d}) in one
>>> group, which are of  1TB size, and to eliminate caching effect he wants
>>> each read to start at different offset, the config could look like
>>> following:
>> Maybe it would be better to have this offset calculation be on a
>> per-thread-per-file basis? You are right in that it only makes sense
>> within the same file or device, so maybe it'd be better to make it
>> work more like you expect.
> I agree it should definitely be file-based, I just wasn't sure how you
> would express that in the config file. Or you mean that that the offset
> calculation would not be shared between different jobs (not subjobs)
> even if they share the same file?
> The fact is that one can always calculate the start offset in the new
> job definition if he needs the offset calculation to be shared. And if
> there are multiple files within a job, the offset_increment should be
> independent.
>
> I will try to look at this.
>
> Jiri
>


[-- Attachment #2: fio.offset_increment.patch --]
[-- Type: text/x-patch, Size: 2194 bytes --]

diff --git a/filesetup.c b/filesetup.c
index 84eaed6..aa908ad 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -752,7 +752,7 @@ uint64_t get_start_offset(struct thread_data *td, struct fio_file *f)
 		return f->real_file_size;
 
 	return td->o.start_offset +
-		(td->thread_number - 1) * td->o.offset_increment;
+		td->subjob_number * td->o.offset_increment;
 }
 
 /*
diff --git a/fio.1 b/fio.1
index 62f40ea..10f4b77 100644
--- a/fio.1
+++ b/fio.1
@@ -658,9 +658,10 @@ Offset in the file to start I/O. Data before the offset will not be touched.
 .BI offset_increment \fR=\fPint
 If this is provided, then the real offset becomes the
 offset + offset_increment * thread_number, where the thread number is a counter
-that starts at 0 and is incremented for each job. This option is useful if
-there are several jobs which are intended to operate on a file in parallel in
-disjoint segments, with even spacing between the starting points.
+that starts at 0 and is incremented for each sub-job (i.e. when numjobs option
+is specified). This option is useful if there are several jobs which are
+intended to operate on a file in parallel in disjoint segments, with even
+spacing between the starting points.
 .TP
 .BI number_ios \fR=\fPint
 Fio will normally perform IOs until it has exhausted the size of the region
diff --git a/fio.h b/fio.h
index 4d4af0a..b0c247e 100644
--- a/fio.h
+++ b/fio.h
@@ -101,6 +101,7 @@ struct thread_data {
 	char verror[FIO_VERROR_SIZE];
 	pthread_t thread;
 	unsigned int thread_number;
+	unsigned int subjob_number;
 	unsigned int groupid;
 	struct thread_stat ts;
 
diff --git a/init.c b/init.c
index a546861..a454e5e 100644
--- a/init.c
+++ b/init.c
@@ -363,6 +363,7 @@ static struct thread_data *get_new_job(int global, struct thread_data *parent,
 	profile_add_hooks(td);
 
 	td->thread_number = thread_number;
+	td->subjob_number = 0;
 
 	if (!parent->o.group_reporting)
 		stat_number++;
@@ -1198,6 +1199,7 @@ static int add_job(struct thread_data *td, const char *jobname, int job_add_num,
 		td_new->o.numjobs = 1;
 		td_new->o.stonewall = 0;
 		td_new->o.new_group = 0;
+		td_new->subjob_number = numjobs;
 
 		if (file_alloced) {
 			if (td_new->files) {

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] Allow to reset offset_increment counter
  2014-06-02 18:28     ` Jiri Horky
@ 2014-07-25  7:47       ` Jiri Horky
  2014-07-25  7:51         ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Jiri Horky @ 2014-07-25  7:47 UTC (permalink / raw)
  To: Jens Axboe, fio

Hi Jens,

I wonder if you missed this email or you just didn't like the patch?

Regards
Jiri Horky

On 06/02/2014 08:28 PM, Jiri Horky wrote:
> Hi,
>
> so here is another try. I thought about it a little and I think that it
> only makes sense to define offset_increment together with numjobs=X
> setting, i.e. when using subjobs. The patch reflects this. Each subjob
> starts at next offset_increment for each file it operates on.
>
> Cheers
> Jirka H.
>
>
> On 05/23/2014 07:24 PM, Jiri Horky wrote:
>> Hi Jens,
>> On 05/23/2014 06:59 PM, Jens Axboe wrote:
>>> On 2014-05-23 06:48, Jiri Horky wrote:
>>>> Hi all,
>>>>
>>>> because I got bitten by this multiple times I decided to give this patch
>>>> a try :)
>>>>
>>>> Current implementation of offset calculation when offset_increment is in
>>>> effect uses global thread_number as follows:
>>>>      f->file_offset = td->o.start_offset + (td->thread_number - 1) *
>>>> td->o.offset_increment;
>>>>
>>>> The thread number gets incremented for every job (subjob) so even you
>>>> have multiple jobs with different filenames, the offset calculation is
>>>> shared. I find this very unintuitive, especially in cases the offsets
>>>> gets past the device/file. For example, if one wants to run sequential
>>>> read test in 16 threads of multiple devices (/dev/sd{b,c,d}) in one
>>>> group, which are of  1TB size, and to eliminate caching effect he wants
>>>> each read to start at different offset, the config could look like
>>>> following:
>>> Maybe it would be better to have this offset calculation be on a
>>> per-thread-per-file basis? You are right in that it only makes sense
>>> within the same file or device, so maybe it'd be better to make it
>>> work more like you expect.
>> I agree it should definitely be file-based, I just wasn't sure how you
>> would express that in the config file. Or you mean that that the offset
>> calculation would not be shared between different jobs (not subjobs)
>> even if they share the same file?
>> The fact is that one can always calculate the start offset in the new
>> job definition if he needs the offset calculation to be shared. And if
>> there are multiple files within a job, the offset_increment should be
>> independent.
>>
>> I will try to look at this.
>>
>> Jiri
>>



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Allow to reset offset_increment counter
  2014-07-25  7:47       ` Jiri Horky
@ 2014-07-25  7:51         ` Jens Axboe
  0 siblings, 0 replies; 6+ messages in thread
From: Jens Axboe @ 2014-07-25  7:51 UTC (permalink / raw)
  To: Jiri Horky, fio

On 2014-07-25 09:47, Jiri Horky wrote:
> Hi Jens,
>
> I wonder if you missed this email or you just didn't like the patch?

I mostly delayed it and then forgot about it. I agree on the notion of 
using sub-jobs for the index, practically speaking, I think that is the 
way that people would use it anyway. It'd be confusing if unrelated jobs 
had different offset increments. And with a bit of simple math, you can 
even make different groups of jobs index the same parts, if you want.

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2014-07-25  7:51 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-05-23 12:48 [PATCH] Allow to reset offset_increment counter Jiri Horky
2014-05-23 16:59 ` Jens Axboe
2014-05-23 17:24   ` Jiri Horky
2014-06-02 18:28     ` Jiri Horky
2014-07-25  7:47       ` Jiri Horky
2014-07-25  7:51         ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox