* [Bug report] Runtime, IOPS, bandwidth recorded incorrectly if small size with time_based
@ 2015-04-24 22:50 Akash Verma
2015-05-04 20:42 ` Jens Axboe
0 siblings, 1 reply; 9+ messages in thread
From: Akash Verma @ 2015-04-24 22:50 UTC (permalink / raw)
To: fio; +Cc: Brian Fulton
If there's a load with a very small size, but forced to repeat using
time_based, the runtime reported at the end, as well as IOPS,
bandwidth are wrong.
E.g. using this control file:
[short_and_suite]
rw=randread
bs=128k
size=128k
ioengine=libaio
time_based=1
runtime=1
filename=datafile.tmp
(where datafile.tmp already exists, for a more dramatic demonstration)
The results I'm seeing on a hard drive are:
read : io=304128KB, bw=24750MB/s, iops=198000, runt= 12msec
This is a regression from FIO 2.0.5, which gives the following results:
read : io=242176KB, bw=242418KB/s, iops=1893 , runt= 999msec
which is much more reasonable.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Bug report] Runtime, IOPS, bandwidth recorded incorrectly if small size with time_based
2015-04-24 22:50 [Bug report] Runtime, IOPS, bandwidth recorded incorrectly if small size with time_based Akash Verma
@ 2015-05-04 20:42 ` Jens Axboe
2015-05-04 21:25 ` Brian Fulton
0 siblings, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2015-05-04 20:42 UTC (permalink / raw)
To: Akash Verma, fio; +Cc: Brian Fulton
On 04/24/2015 04:50 PM, Akash Verma wrote:
> If there's a load with a very small size, but forced to repeat using
> time_based, the runtime reported at the end, as well as IOPS,
> bandwidth are wrong.
>
> E.g. using this control file:
>
> [short_and_suite]
> rw=randread
> bs=128k
> size=128k
> ioengine=libaio
> time_based=1
> runtime=1
> filename=datafile.tmp
>
> (where datafile.tmp already exists, for a more dramatic demonstration)
> The results I'm seeing on a hard drive are:
> read : io=304128KB, bw=24750MB/s, iops=198000, runt= 12msec
>
> This is a regression from FIO 2.0.5, which gives the following results:
> read : io=242176KB, bw=242418KB/s, iops=1893 , runt= 999msec
>
> which is much more reasonable.
Looks like a shorter run, try runtime=1s. And that makes me wonder if
the default conversion is buggy. What version are you comparing 2.0.5 to?
--
Jens Axboe
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Bug report] Runtime, IOPS, bandwidth recorded incorrectly if small size with time_based
2015-05-04 20:42 ` Jens Axboe
@ 2015-05-04 21:25 ` Brian Fulton
2015-05-04 22:02 ` Jens Axboe
0 siblings, 1 reply; 9+ messages in thread
From: Brian Fulton @ 2015-05-04 21:25 UTC (permalink / raw)
To: Jens Axboe; +Cc: Akash Verma, fio
Hi Jens,
I've been actively looking at this bug. It was introduced with change
e5437a073e658e8154b9e87bab5c7b3b06ed4255
(https://github.com/axboe/fio/commit/e5437a073e658e8154b9e87bab5c7b3b06ed4255).
In this change, the time granularity was changed from microseconds to
milliseconds. This change assumes that all loops take at least 1ms to
complete. This is not always true and tests where the loop gets
processed, regularly, in sub-millisecond time result in many zeroes
being summed which, ultimately, causes a completely inaccurate
runtime.
I have a fix for this, but I want to verify that it doesn't break
anything as it is changing the way the runtime is being calculated.
We are using fio version 2.2.6, though this issue should still be
present in the latest version (2.2.7).
Below is a patch fix based of the latest version of fio 2.2.7:
From f74e28614bfc7fcc87a3e43b4583a44ec7641dac Mon Sep 17 00:00:00 2001
From: Brian Fulton <bfulton@google.com>
Date: Mon, 4 May 2015 14:09:13 -0700
Subject: [PATCH] Fix Runtime, IOPS, bandwidth recorded incorrectly if small
size with time_based. Change runtime tabulation back to microseconds to
handle sub millisecond loops.
---
backend.c | 26 ++++++++++++++++++--------
1 file changed, 18 insertions(+), 8 deletions(-)
diff --git a/backend.c b/backend.c
index 65a3e18..591bdf7 100644
--- a/backend.c
+++ b/backend.c
@@ -363,6 +363,19 @@ static inline int runtime_exceeded(struct
thread_data *td, struct timeval *t)
return 0;
}
+/*
+ * We need to update the runtime consistently in ms, but keep a running
+ * tally of the current elapsed time in microseconds for sub millisecond
+ * updates.
+ */
+static inline void update_runtime(struct thread_data *td,
+ unsigned long long *elapsed_us, const int ddir)
+{
+ td->ts.runtime[ddir] -= (elapsed_us[ddir] + 999) / 1000;
+ elapsed_us[ddir] += utime_since_now(&td->start);
+ td->ts.runtime[ddir] += (elapsed_us[ddir] + 999) / 1000;
+}
+
static int break_on_this_error(struct thread_data *td, enum fio_ddir ddir,
int *retptr)
{
@@ -1306,7 +1319,7 @@ static void io_workqueue_fn(struct thread_data
*td, struct io_u *io_u)
*/
static void *thread_main(void *data)
{
- unsigned long long elapsed;
+ unsigned long long elapsed_us[DDIR_TRIM+1] = {0};
struct thread_data *td = data;
struct thread_options *o = &td->o;
pthread_condattr_t attr;
@@ -1545,16 +1558,13 @@ static void *thread_main(void *data)
fio_mutex_down(stat_mutex);
if (td_read(td) && td->io_bytes[DDIR_READ]) {
- elapsed = mtime_since_now(&td->start);
- td->ts.runtime[DDIR_READ] += elapsed;
+ update_runtime(td, elapsed_us, DDIR_READ);
}
if (td_write(td) && td->io_bytes[DDIR_WRITE]) {
- elapsed = mtime_since_now(&td->start);
- td->ts.runtime[DDIR_WRITE] += elapsed;
+ update_runtime(td, elapsed_us, DDIR_WRITE);
}
if (td_trim(td) && td->io_bytes[DDIR_TRIM]) {
- elapsed = mtime_since_now(&td->start);
- td->ts.runtime[DDIR_TRIM] += elapsed;
+ update_runtime(td, elapsed_us, DDIR_TRIM);
}
fio_gettime(&td->start, NULL);
fio_mutex_up(stat_mutex);
@@ -1579,7 +1589,7 @@ static void *thread_main(void *data)
check_update_rusage(td);
fio_mutex_down(stat_mutex);
- td->ts.runtime[DDIR_READ] += mtime_since_now(&td->start);
+ update_runtime(td, elapsed_us, DDIR_READ);
fio_gettime(&td->start, NULL);
fio_mutex_up(stat_mutex);
--
2.2.0.rc0.207.ga3a616c
I am currently performing regression tests on machines with this fix
to verify that the issues we were seeing are no longer present and to
verify that I didn't introduce any new issues.
- Brian
On Mon, May 4, 2015 at 1:42 PM, Jens Axboe <axboe@kernel.dk> wrote:
> On 04/24/2015 04:50 PM, Akash Verma wrote:
>>
>> If there's a load with a very small size, but forced to repeat using
>> time_based, the runtime reported at the end, as well as IOPS,
>> bandwidth are wrong.
>>
>> E.g. using this control file:
>>
>> [short_and_suite]
>> rw=randread
>> bs=128k
>> size=128k
>> ioengine=libaio
>> time_based=1
>> runtime=1
>> filename=datafile.tmp
>>
>> (where datafile.tmp already exists, for a more dramatic demonstration)
>> The results I'm seeing on a hard drive are:
>> read : io=304128KB, bw=24750MB/s, iops=198000, runt= 12msec
>>
>> This is a regression from FIO 2.0.5, which gives the following results:
>> read : io=242176KB, bw=242418KB/s, iops=1893 , runt= 999msec
>>
>> which is much more reasonable.
>
>
> Looks like a shorter run, try runtime=1s. And that makes me wonder if the
> default conversion is buggy. What version are you comparing 2.0.5 to?
>
> --
> Jens Axboe
>
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [Bug report] Runtime, IOPS, bandwidth recorded incorrectly if small size with time_based
2015-05-04 21:25 ` Brian Fulton
@ 2015-05-04 22:02 ` Jens Axboe
2015-05-04 22:08 ` Jens Axboe
0 siblings, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2015-05-04 22:02 UTC (permalink / raw)
To: Brian Fulton; +Cc: Akash Verma, fio
On 05/04/2015 03:25 PM, Brian Fulton wrote:
> Hi Jens,
>
> I've been actively looking at this bug. It was introduced with change
> e5437a073e658e8154b9e87bab5c7b3b06ed4255
> (https://github.com/axboe/fio/commit/e5437a073e658e8154b9e87bab5c7b3b06ed4255).
>
> In this change, the time granularity was changed from microseconds to
> milliseconds. This change assumes that all loops take at least 1ms to
> complete. This is not always true and tests where the loop gets
> processed, regularly, in sub-millisecond time result in many zeroes
> being summed which, ultimately, causes a completely inaccurate
> runtime.
>
> I have a fix for this, but I want to verify that it doesn't break
> anything as it is changing the way the runtime is being calculated.
>
> We are using fio version 2.2.6, though this issue should still be
> present in the latest version (2.2.7).
>
>
> Below is a patch fix based of the latest version of fio 2.2.7:
Thanks, that makes more sense. Akash, can you confirm that this fixes it
for you?
--
Jens Axboe
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Bug report] Runtime, IOPS, bandwidth recorded incorrectly if small size with time_based
2015-05-04 22:02 ` Jens Axboe
@ 2015-05-04 22:08 ` Jens Axboe
2015-05-04 23:17 ` Brian Fulton
0 siblings, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2015-05-04 22:08 UTC (permalink / raw)
To: Brian Fulton; +Cc: Akash Verma, fio
On 05/04/2015 04:02 PM, Jens Axboe wrote:
> On 05/04/2015 03:25 PM, Brian Fulton wrote:
>> Hi Jens,
>>
>> I've been actively looking at this bug. It was introduced with change
>> e5437a073e658e8154b9e87bab5c7b3b06ed4255
>> (https://github.com/axboe/fio/commit/e5437a073e658e8154b9e87bab5c7b3b06ed4255).
>>
>>
>> In this change, the time granularity was changed from microseconds to
>> milliseconds. This change assumes that all loops take at least 1ms to
>> complete. This is not always true and tests where the loop gets
>> processed, regularly, in sub-millisecond time result in many zeroes
>> being summed which, ultimately, causes a completely inaccurate
>> runtime.
>>
>> I have a fix for this, but I want to verify that it doesn't break
>> anything as it is changing the way the runtime is being calculated.
>>
>> We are using fio version 2.2.6, though this issue should still be
>> present in the latest version (2.2.7).
>>
>>
>> Below is a patch fix based of the latest version of fio 2.2.7:
>
> Thanks, that makes more sense. Akash, can you confirm that this fixes it
> for you?
Committed:
http://git.kernel.dk/?p=fio.git;a=commit;h=95603b7470fac1917cb11bc686e60b4339f8a2fc
Brian, I changed a few things:
- DDIR_RWDIR_CNT exists for array sizing
- You need a ',' after that zero to clear the array
- Style
--
Jens Axboe
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Bug report] Runtime, IOPS, bandwidth recorded incorrectly if small size with time_based
2015-05-04 22:08 ` Jens Axboe
@ 2015-05-04 23:17 ` Brian Fulton
2015-05-05 0:37 ` Jens Axboe
0 siblings, 1 reply; 9+ messages in thread
From: Brian Fulton @ 2015-05-04 23:17 UTC (permalink / raw)
To: Jens Axboe; +Cc: Akash Verma, fio
Hi Jens,
Akash and I both discovered this bug and work together. He is
currently on holiday for the next two weeks or so. I am currently
using the same tests that originally discovered the bug to verify it
is fixed.
Out of curiosity, why is the comma required after the zero in order to
initialize all elements to zero? I used this initializer instead of
memset just as it was used here:
https://github.com/axboe/fio/blob/master/cgroup.c#L26. Are there
certain compilers that require the comma? I also checked the C99 spec
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf, Section
6.7.8, page 125, sections 1 and 21. I just want to make sure I'm not
loosing my mind.
Thanks,
Brian
On Mon, May 4, 2015 at 3:08 PM, Jens Axboe <axboe@kernel.dk> wrote:
> On 05/04/2015 04:02 PM, Jens Axboe wrote:
>>
>> On 05/04/2015 03:25 PM, Brian Fulton wrote:
>>>
>>> Hi Jens,
>>>
>>> I've been actively looking at this bug. It was introduced with change
>>> e5437a073e658e8154b9e87bab5c7b3b06ed4255
>>>
>>> (https://github.com/axboe/fio/commit/e5437a073e658e8154b9e87bab5c7b3b06ed4255).
>>>
>>>
>>> In this change, the time granularity was changed from microseconds to
>>> milliseconds. This change assumes that all loops take at least 1ms to
>>> complete. This is not always true and tests where the loop gets
>>> processed, regularly, in sub-millisecond time result in many zeroes
>>> being summed which, ultimately, causes a completely inaccurate
>>> runtime.
>>>
>>> I have a fix for this, but I want to verify that it doesn't break
>>> anything as it is changing the way the runtime is being calculated.
>>>
>>> We are using fio version 2.2.6, though this issue should still be
>>> present in the latest version (2.2.7).
>>>
>>>
>>> Below is a patch fix based of the latest version of fio 2.2.7:
>>
>>
>> Thanks, that makes more sense. Akash, can you confirm that this fixes it
>> for you?
>
>
> Committed:
>
> http://git.kernel.dk/?p=fio.git;a=commit;h=95603b7470fac1917cb11bc686e60b4339f8a2fc
>
> Brian, I changed a few things:
>
> - DDIR_RWDIR_CNT exists for array sizing
> - You need a ',' after that zero to clear the array
> - Style
>
> --
> Jens Axboe
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Bug report] Runtime, IOPS, bandwidth recorded incorrectly if small size with time_based
2015-05-04 23:17 ` Brian Fulton
@ 2015-05-05 0:37 ` Jens Axboe
2015-05-05 1:12 ` Brian Fulton
0 siblings, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2015-05-05 0:37 UTC (permalink / raw)
To: Brian Fulton; +Cc: Akash Verma, fio
On 05/04/2015 05:17 PM, Brian Fulton wrote:
> Hi Jens,
>
> Akash and I both discovered this bug and work together. He is
> currently on holiday for the next two weeks or so. I am currently
> using the same tests that originally discovered the bug to verify it
> is fixed.
Ah perfect, I didn't realize that!
> Out of curiosity, why is the comma required after the zero in order to
> initialize all elements to zero? I used this initializer instead of
> memset just as it was used here:
> https://github.com/axboe/fio/blob/master/cgroup.c#L26. Are there
> certain compilers that require the comma? I also checked the C99 spec
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf, Section
> 6.7.8, page 125, sections 1 and 21. I just want to make sure I'm not
> loosing my mind.
Actually I might be mistaken, it's just an array. For structs you need
the comma to ensure it zero fills the rest of the members, for arrays I
believe your approach was actually fine.
--
Jens Axboe
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Bug report] Runtime, IOPS, bandwidth recorded incorrectly if small size with time_based
2015-05-05 0:37 ` Jens Axboe
@ 2015-05-05 1:12 ` Brian Fulton
2015-05-15 17:32 ` Brian Fulton
0 siblings, 1 reply; 9+ messages in thread
From: Brian Fulton @ 2015-05-05 1:12 UTC (permalink / raw)
To: Jens Axboe; +Cc: Akash Verma, fio
Hi Jens,
Thanks for your explanation regarding the comma initialization. I
just wanted to make sure I wasn't doing something that would break
certain compilers.
I was able to isolate the bug and reproduce it on one of my machines.
I then created the fix and found that the bug was no longer
reproducible with the fixed binary on the one machine (as of last
Friday May 1st 2015) .
I am currently running the tests that discovered the initial issue on
multiple machines. The tests may take some time to complete and upon
completion I'll post my findings.
- Brian
On Mon, May 4, 2015 at 5:37 PM, Jens Axboe <axboe@kernel.dk> wrote:
> On 05/04/2015 05:17 PM, Brian Fulton wrote:
>>
>> Hi Jens,
>>
>> Akash and I both discovered this bug and work together. He is
>> currently on holiday for the next two weeks or so. I am currently
>> using the same tests that originally discovered the bug to verify it
>> is fixed.
>
>
> Ah perfect, I didn't realize that!
>
>> Out of curiosity, why is the comma required after the zero in order to
>> initialize all elements to zero? I used this initializer instead of
>> memset just as it was used here:
>> https://github.com/axboe/fio/blob/master/cgroup.c#L26. Are there
>> certain compilers that require the comma? I also checked the C99 spec
>> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf, Section
>> 6.7.8, page 125, sections 1 and 21. I just want to make sure I'm not
>> loosing my mind.
>
>
> Actually I might be mistaken, it's just an array. For structs you need the
> comma to ensure it zero fills the rest of the members, for arrays I believe
> your approach was actually fine.
>
> --
> Jens Axboe
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Bug report] Runtime, IOPS, bandwidth recorded incorrectly if small size with time_based
2015-05-05 1:12 ` Brian Fulton
@ 2015-05-15 17:32 ` Brian Fulton
0 siblings, 0 replies; 9+ messages in thread
From: Brian Fulton @ 2015-05-15 17:32 UTC (permalink / raw)
To: Jens Axboe; +Cc: Akash Verma, fio
Hi Jens,
I've completed the regression tests and, from analyzing the results,
the bug appears to be no longer present. In addition, I did not see
any inconsistencies or issues that were caused by this change.
- Brian
On Mon, May 4, 2015 at 6:12 PM, Brian Fulton <bfulton@google.com> wrote:
> Hi Jens,
>
> Thanks for your explanation regarding the comma initialization. I
> just wanted to make sure I wasn't doing something that would break
> certain compilers.
>
> I was able to isolate the bug and reproduce it on one of my machines.
> I then created the fix and found that the bug was no longer
> reproducible with the fixed binary on the one machine (as of last
> Friday May 1st 2015) .
>
> I am currently running the tests that discovered the initial issue on
> multiple machines. The tests may take some time to complete and upon
> completion I'll post my findings.
>
> - Brian
>
> On Mon, May 4, 2015 at 5:37 PM, Jens Axboe <axboe@kernel.dk> wrote:
>> On 05/04/2015 05:17 PM, Brian Fulton wrote:
>>>
>>> Hi Jens,
>>>
>>> Akash and I both discovered this bug and work together. He is
>>> currently on holiday for the next two weeks or so. I am currently
>>> using the same tests that originally discovered the bug to verify it
>>> is fixed.
>>
>>
>> Ah perfect, I didn't realize that!
>>
>>> Out of curiosity, why is the comma required after the zero in order to
>>> initialize all elements to zero? I used this initializer instead of
>>> memset just as it was used here:
>>> https://github.com/axboe/fio/blob/master/cgroup.c#L26. Are there
>>> certain compilers that require the comma? I also checked the C99 spec
>>> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf, Section
>>> 6.7.8, page 125, sections 1 and 21. I just want to make sure I'm not
>>> loosing my mind.
>>
>>
>> Actually I might be mistaken, it's just an array. For structs you need the
>> comma to ensure it zero fills the rest of the members, for arrays I believe
>> your approach was actually fine.
>>
>> --
>> Jens Axboe
>>
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2015-05-15 17:32 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-04-24 22:50 [Bug report] Runtime, IOPS, bandwidth recorded incorrectly if small size with time_based Akash Verma
2015-05-04 20:42 ` Jens Axboe
2015-05-04 21:25 ` Brian Fulton
2015-05-04 22:02 ` Jens Axboe
2015-05-04 22:08 ` Jens Axboe
2015-05-04 23:17 ` Brian Fulton
2015-05-05 0:37 ` Jens Axboe
2015-05-05 1:12 ` Brian Fulton
2015-05-15 17:32 ` Brian Fulton
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.