* [Bug report] Runtime, IOPS, bandwidth recorded incorrectly if small size with time_based @ 2015-04-24 22:50 Akash Verma 2015-05-04 20:42 ` Jens Axboe 0 siblings, 1 reply; 9+ messages in thread From: Akash Verma @ 2015-04-24 22:50 UTC (permalink / raw) To: fio; +Cc: Brian Fulton If there's a load with a very small size, but forced to repeat using time_based, the runtime reported at the end, as well as IOPS, bandwidth are wrong. E.g. using this control file: [short_and_suite] rw=randread bs=128k size=128k ioengine=libaio time_based=1 runtime=1 filename=datafile.tmp (where datafile.tmp already exists, for a more dramatic demonstration) The results I'm seeing on a hard drive are: read : io=304128KB, bw=24750MB/s, iops=198000, runt= 12msec This is a regression from FIO 2.0.5, which gives the following results: read : io=242176KB, bw=242418KB/s, iops=1893 , runt= 999msec which is much more reasonable. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Bug report] Runtime, IOPS, bandwidth recorded incorrectly if small size with time_based 2015-04-24 22:50 [Bug report] Runtime, IOPS, bandwidth recorded incorrectly if small size with time_based Akash Verma @ 2015-05-04 20:42 ` Jens Axboe 2015-05-04 21:25 ` Brian Fulton 0 siblings, 1 reply; 9+ messages in thread From: Jens Axboe @ 2015-05-04 20:42 UTC (permalink / raw) To: Akash Verma, fio; +Cc: Brian Fulton On 04/24/2015 04:50 PM, Akash Verma wrote: > If there's a load with a very small size, but forced to repeat using > time_based, the runtime reported at the end, as well as IOPS, > bandwidth are wrong. > > E.g. using this control file: > > [short_and_suite] > rw=randread > bs=128k > size=128k > ioengine=libaio > time_based=1 > runtime=1 > filename=datafile.tmp > > (where datafile.tmp already exists, for a more dramatic demonstration) > The results I'm seeing on a hard drive are: > read : io=304128KB, bw=24750MB/s, iops=198000, runt= 12msec > > This is a regression from FIO 2.0.5, which gives the following results: > read : io=242176KB, bw=242418KB/s, iops=1893 , runt= 999msec > > which is much more reasonable. Looks like a shorter run, try runtime=1s. And that makes me wonder if the default conversion is buggy. What version are you comparing 2.0.5 to? -- Jens Axboe ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Bug report] Runtime, IOPS, bandwidth recorded incorrectly if small size with time_based 2015-05-04 20:42 ` Jens Axboe @ 2015-05-04 21:25 ` Brian Fulton 2015-05-04 22:02 ` Jens Axboe 0 siblings, 1 reply; 9+ messages in thread From: Brian Fulton @ 2015-05-04 21:25 UTC (permalink / raw) To: Jens Axboe; +Cc: Akash Verma, fio Hi Jens, I've been actively looking at this bug. It was introduced with change e5437a073e658e8154b9e87bab5c7b3b06ed4255 (https://github.com/axboe/fio/commit/e5437a073e658e8154b9e87bab5c7b3b06ed4255). In this change, the time granularity was changed from microseconds to milliseconds. This change assumes that all loops take at least 1ms to complete. This is not always true and tests where the loop gets processed, regularly, in sub-millisecond time result in many zeroes being summed which, ultimately, causes a completely inaccurate runtime. I have a fix for this, but I want to verify that it doesn't break anything as it is changing the way the runtime is being calculated. We are using fio version 2.2.6, though this issue should still be present in the latest version (2.2.7). Below is a patch fix based of the latest version of fio 2.2.7: From f74e28614bfc7fcc87a3e43b4583a44ec7641dac Mon Sep 17 00:00:00 2001 From: Brian Fulton <bfulton@google.com> Date: Mon, 4 May 2015 14:09:13 -0700 Subject: [PATCH] Fix Runtime, IOPS, bandwidth recorded incorrectly if small size with time_based. Change runtime tabulation back to microseconds to handle sub millisecond loops. --- backend.c | 26 ++++++++++++++++++-------- 1 file changed, 18 insertions(+), 8 deletions(-) diff --git a/backend.c b/backend.c index 65a3e18..591bdf7 100644 --- a/backend.c +++ b/backend.c @@ -363,6 +363,19 @@ static inline int runtime_exceeded(struct thread_data *td, struct timeval *t) return 0; } +/* + * We need to update the runtime consistently in ms, but keep a running + * tally of the current elapsed time in microseconds for sub millisecond + * updates. + */ +static inline void update_runtime(struct thread_data *td, + unsigned long long *elapsed_us, const int ddir) +{ + td->ts.runtime[ddir] -= (elapsed_us[ddir] + 999) / 1000; + elapsed_us[ddir] += utime_since_now(&td->start); + td->ts.runtime[ddir] += (elapsed_us[ddir] + 999) / 1000; +} + static int break_on_this_error(struct thread_data *td, enum fio_ddir ddir, int *retptr) { @@ -1306,7 +1319,7 @@ static void io_workqueue_fn(struct thread_data *td, struct io_u *io_u) */ static void *thread_main(void *data) { - unsigned long long elapsed; + unsigned long long elapsed_us[DDIR_TRIM+1] = {0}; struct thread_data *td = data; struct thread_options *o = &td->o; pthread_condattr_t attr; @@ -1545,16 +1558,13 @@ static void *thread_main(void *data) fio_mutex_down(stat_mutex); if (td_read(td) && td->io_bytes[DDIR_READ]) { - elapsed = mtime_since_now(&td->start); - td->ts.runtime[DDIR_READ] += elapsed; + update_runtime(td, elapsed_us, DDIR_READ); } if (td_write(td) && td->io_bytes[DDIR_WRITE]) { - elapsed = mtime_since_now(&td->start); - td->ts.runtime[DDIR_WRITE] += elapsed; + update_runtime(td, elapsed_us, DDIR_WRITE); } if (td_trim(td) && td->io_bytes[DDIR_TRIM]) { - elapsed = mtime_since_now(&td->start); - td->ts.runtime[DDIR_TRIM] += elapsed; + update_runtime(td, elapsed_us, DDIR_TRIM); } fio_gettime(&td->start, NULL); fio_mutex_up(stat_mutex); @@ -1579,7 +1589,7 @@ static void *thread_main(void *data) check_update_rusage(td); fio_mutex_down(stat_mutex); - td->ts.runtime[DDIR_READ] += mtime_since_now(&td->start); + update_runtime(td, elapsed_us, DDIR_READ); fio_gettime(&td->start, NULL); fio_mutex_up(stat_mutex); -- 2.2.0.rc0.207.ga3a616c I am currently performing regression tests on machines with this fix to verify that the issues we were seeing are no longer present and to verify that I didn't introduce any new issues. - Brian On Mon, May 4, 2015 at 1:42 PM, Jens Axboe <axboe@kernel.dk> wrote: > On 04/24/2015 04:50 PM, Akash Verma wrote: >> >> If there's a load with a very small size, but forced to repeat using >> time_based, the runtime reported at the end, as well as IOPS, >> bandwidth are wrong. >> >> E.g. using this control file: >> >> [short_and_suite] >> rw=randread >> bs=128k >> size=128k >> ioengine=libaio >> time_based=1 >> runtime=1 >> filename=datafile.tmp >> >> (where datafile.tmp already exists, for a more dramatic demonstration) >> The results I'm seeing on a hard drive are: >> read : io=304128KB, bw=24750MB/s, iops=198000, runt= 12msec >> >> This is a regression from FIO 2.0.5, which gives the following results: >> read : io=242176KB, bw=242418KB/s, iops=1893 , runt= 999msec >> >> which is much more reasonable. > > > Looks like a shorter run, try runtime=1s. And that makes me wonder if the > default conversion is buggy. What version are you comparing 2.0.5 to? > > -- > Jens Axboe > ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [Bug report] Runtime, IOPS, bandwidth recorded incorrectly if small size with time_based 2015-05-04 21:25 ` Brian Fulton @ 2015-05-04 22:02 ` Jens Axboe 2015-05-04 22:08 ` Jens Axboe 0 siblings, 1 reply; 9+ messages in thread From: Jens Axboe @ 2015-05-04 22:02 UTC (permalink / raw) To: Brian Fulton; +Cc: Akash Verma, fio On 05/04/2015 03:25 PM, Brian Fulton wrote: > Hi Jens, > > I've been actively looking at this bug. It was introduced with change > e5437a073e658e8154b9e87bab5c7b3b06ed4255 > (https://github.com/axboe/fio/commit/e5437a073e658e8154b9e87bab5c7b3b06ed4255). > > In this change, the time granularity was changed from microseconds to > milliseconds. This change assumes that all loops take at least 1ms to > complete. This is not always true and tests where the loop gets > processed, regularly, in sub-millisecond time result in many zeroes > being summed which, ultimately, causes a completely inaccurate > runtime. > > I have a fix for this, but I want to verify that it doesn't break > anything as it is changing the way the runtime is being calculated. > > We are using fio version 2.2.6, though this issue should still be > present in the latest version (2.2.7). > > > Below is a patch fix based of the latest version of fio 2.2.7: Thanks, that makes more sense. Akash, can you confirm that this fixes it for you? -- Jens Axboe ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Bug report] Runtime, IOPS, bandwidth recorded incorrectly if small size with time_based 2015-05-04 22:02 ` Jens Axboe @ 2015-05-04 22:08 ` Jens Axboe 2015-05-04 23:17 ` Brian Fulton 0 siblings, 1 reply; 9+ messages in thread From: Jens Axboe @ 2015-05-04 22:08 UTC (permalink / raw) To: Brian Fulton; +Cc: Akash Verma, fio On 05/04/2015 04:02 PM, Jens Axboe wrote: > On 05/04/2015 03:25 PM, Brian Fulton wrote: >> Hi Jens, >> >> I've been actively looking at this bug. It was introduced with change >> e5437a073e658e8154b9e87bab5c7b3b06ed4255 >> (https://github.com/axboe/fio/commit/e5437a073e658e8154b9e87bab5c7b3b06ed4255). >> >> >> In this change, the time granularity was changed from microseconds to >> milliseconds. This change assumes that all loops take at least 1ms to >> complete. This is not always true and tests where the loop gets >> processed, regularly, in sub-millisecond time result in many zeroes >> being summed which, ultimately, causes a completely inaccurate >> runtime. >> >> I have a fix for this, but I want to verify that it doesn't break >> anything as it is changing the way the runtime is being calculated. >> >> We are using fio version 2.2.6, though this issue should still be >> present in the latest version (2.2.7). >> >> >> Below is a patch fix based of the latest version of fio 2.2.7: > > Thanks, that makes more sense. Akash, can you confirm that this fixes it > for you? Committed: http://git.kernel.dk/?p=fio.git;a=commit;h=95603b7470fac1917cb11bc686e60b4339f8a2fc Brian, I changed a few things: - DDIR_RWDIR_CNT exists for array sizing - You need a ',' after that zero to clear the array - Style -- Jens Axboe ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Bug report] Runtime, IOPS, bandwidth recorded incorrectly if small size with time_based 2015-05-04 22:08 ` Jens Axboe @ 2015-05-04 23:17 ` Brian Fulton 2015-05-05 0:37 ` Jens Axboe 0 siblings, 1 reply; 9+ messages in thread From: Brian Fulton @ 2015-05-04 23:17 UTC (permalink / raw) To: Jens Axboe; +Cc: Akash Verma, fio Hi Jens, Akash and I both discovered this bug and work together. He is currently on holiday for the next two weeks or so. I am currently using the same tests that originally discovered the bug to verify it is fixed. Out of curiosity, why is the comma required after the zero in order to initialize all elements to zero? I used this initializer instead of memset just as it was used here: https://github.com/axboe/fio/blob/master/cgroup.c#L26. Are there certain compilers that require the comma? I also checked the C99 spec http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf, Section 6.7.8, page 125, sections 1 and 21. I just want to make sure I'm not loosing my mind. Thanks, Brian On Mon, May 4, 2015 at 3:08 PM, Jens Axboe <axboe@kernel.dk> wrote: > On 05/04/2015 04:02 PM, Jens Axboe wrote: >> >> On 05/04/2015 03:25 PM, Brian Fulton wrote: >>> >>> Hi Jens, >>> >>> I've been actively looking at this bug. It was introduced with change >>> e5437a073e658e8154b9e87bab5c7b3b06ed4255 >>> >>> (https://github.com/axboe/fio/commit/e5437a073e658e8154b9e87bab5c7b3b06ed4255). >>> >>> >>> In this change, the time granularity was changed from microseconds to >>> milliseconds. This change assumes that all loops take at least 1ms to >>> complete. This is not always true and tests where the loop gets >>> processed, regularly, in sub-millisecond time result in many zeroes >>> being summed which, ultimately, causes a completely inaccurate >>> runtime. >>> >>> I have a fix for this, but I want to verify that it doesn't break >>> anything as it is changing the way the runtime is being calculated. >>> >>> We are using fio version 2.2.6, though this issue should still be >>> present in the latest version (2.2.7). >>> >>> >>> Below is a patch fix based of the latest version of fio 2.2.7: >> >> >> Thanks, that makes more sense. Akash, can you confirm that this fixes it >> for you? > > > Committed: > > http://git.kernel.dk/?p=fio.git;a=commit;h=95603b7470fac1917cb11bc686e60b4339f8a2fc > > Brian, I changed a few things: > > - DDIR_RWDIR_CNT exists for array sizing > - You need a ',' after that zero to clear the array > - Style > > -- > Jens Axboe > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Bug report] Runtime, IOPS, bandwidth recorded incorrectly if small size with time_based 2015-05-04 23:17 ` Brian Fulton @ 2015-05-05 0:37 ` Jens Axboe 2015-05-05 1:12 ` Brian Fulton 0 siblings, 1 reply; 9+ messages in thread From: Jens Axboe @ 2015-05-05 0:37 UTC (permalink / raw) To: Brian Fulton; +Cc: Akash Verma, fio On 05/04/2015 05:17 PM, Brian Fulton wrote: > Hi Jens, > > Akash and I both discovered this bug and work together. He is > currently on holiday for the next two weeks or so. I am currently > using the same tests that originally discovered the bug to verify it > is fixed. Ah perfect, I didn't realize that! > Out of curiosity, why is the comma required after the zero in order to > initialize all elements to zero? I used this initializer instead of > memset just as it was used here: > https://github.com/axboe/fio/blob/master/cgroup.c#L26. Are there > certain compilers that require the comma? I also checked the C99 spec > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf, Section > 6.7.8, page 125, sections 1 and 21. I just want to make sure I'm not > loosing my mind. Actually I might be mistaken, it's just an array. For structs you need the comma to ensure it zero fills the rest of the members, for arrays I believe your approach was actually fine. -- Jens Axboe ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Bug report] Runtime, IOPS, bandwidth recorded incorrectly if small size with time_based 2015-05-05 0:37 ` Jens Axboe @ 2015-05-05 1:12 ` Brian Fulton 2015-05-15 17:32 ` Brian Fulton 0 siblings, 1 reply; 9+ messages in thread From: Brian Fulton @ 2015-05-05 1:12 UTC (permalink / raw) To: Jens Axboe; +Cc: Akash Verma, fio Hi Jens, Thanks for your explanation regarding the comma initialization. I just wanted to make sure I wasn't doing something that would break certain compilers. I was able to isolate the bug and reproduce it on one of my machines. I then created the fix and found that the bug was no longer reproducible with the fixed binary on the one machine (as of last Friday May 1st 2015) . I am currently running the tests that discovered the initial issue on multiple machines. The tests may take some time to complete and upon completion I'll post my findings. - Brian On Mon, May 4, 2015 at 5:37 PM, Jens Axboe <axboe@kernel.dk> wrote: > On 05/04/2015 05:17 PM, Brian Fulton wrote: >> >> Hi Jens, >> >> Akash and I both discovered this bug and work together. He is >> currently on holiday for the next two weeks or so. I am currently >> using the same tests that originally discovered the bug to verify it >> is fixed. > > > Ah perfect, I didn't realize that! > >> Out of curiosity, why is the comma required after the zero in order to >> initialize all elements to zero? I used this initializer instead of >> memset just as it was used here: >> https://github.com/axboe/fio/blob/master/cgroup.c#L26. Are there >> certain compilers that require the comma? I also checked the C99 spec >> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf, Section >> 6.7.8, page 125, sections 1 and 21. I just want to make sure I'm not >> loosing my mind. > > > Actually I might be mistaken, it's just an array. For structs you need the > comma to ensure it zero fills the rest of the members, for arrays I believe > your approach was actually fine. > > -- > Jens Axboe > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Bug report] Runtime, IOPS, bandwidth recorded incorrectly if small size with time_based 2015-05-05 1:12 ` Brian Fulton @ 2015-05-15 17:32 ` Brian Fulton 0 siblings, 0 replies; 9+ messages in thread From: Brian Fulton @ 2015-05-15 17:32 UTC (permalink / raw) To: Jens Axboe; +Cc: Akash Verma, fio Hi Jens, I've completed the regression tests and, from analyzing the results, the bug appears to be no longer present. In addition, I did not see any inconsistencies or issues that were caused by this change. - Brian On Mon, May 4, 2015 at 6:12 PM, Brian Fulton <bfulton@google.com> wrote: > Hi Jens, > > Thanks for your explanation regarding the comma initialization. I > just wanted to make sure I wasn't doing something that would break > certain compilers. > > I was able to isolate the bug and reproduce it on one of my machines. > I then created the fix and found that the bug was no longer > reproducible with the fixed binary on the one machine (as of last > Friday May 1st 2015) . > > I am currently running the tests that discovered the initial issue on > multiple machines. The tests may take some time to complete and upon > completion I'll post my findings. > > - Brian > > On Mon, May 4, 2015 at 5:37 PM, Jens Axboe <axboe@kernel.dk> wrote: >> On 05/04/2015 05:17 PM, Brian Fulton wrote: >>> >>> Hi Jens, >>> >>> Akash and I both discovered this bug and work together. He is >>> currently on holiday for the next two weeks or so. I am currently >>> using the same tests that originally discovered the bug to verify it >>> is fixed. >> >> >> Ah perfect, I didn't realize that! >> >>> Out of curiosity, why is the comma required after the zero in order to >>> initialize all elements to zero? I used this initializer instead of >>> memset just as it was used here: >>> https://github.com/axboe/fio/blob/master/cgroup.c#L26. Are there >>> certain compilers that require the comma? I also checked the C99 spec >>> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf, Section >>> 6.7.8, page 125, sections 1 and 21. I just want to make sure I'm not >>> loosing my mind. >> >> >> Actually I might be mistaken, it's just an array. For structs you need the >> comma to ensure it zero fills the rest of the members, for arrays I believe >> your approach was actually fine. >> >> -- >> Jens Axboe >> ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2015-05-15 17:32 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-04-24 22:50 [Bug report] Runtime, IOPS, bandwidth recorded incorrectly if small size with time_based Akash Verma 2015-05-04 20:42 ` Jens Axboe 2015-05-04 21:25 ` Brian Fulton 2015-05-04 22:02 ` Jens Axboe 2015-05-04 22:08 ` Jens Axboe 2015-05-04 23:17 ` Brian Fulton 2015-05-05 0:37 ` Jens Axboe 2015-05-05 1:12 ` Brian Fulton 2015-05-15 17:32 ` Brian Fulton
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.