* Something odd with the fio random offset generator & time_based runs
From: Steven Lang @ 2012-02-18 1:19 UTC (permalink / raw)
To: fio
While experimenting with some IO loads, I noticed that the random
offset selection wasn't as random as it should be. I started looking
for something complicated thinking maybe it had to do with large
devices. But when I took a step back and tried a simpler test, I
found it applied to any size.
Using a random map masks the problem because it is forced to cover all
the offsets. But for IO loads on large devices with multiple jobs,
random maps aren't useful.
Here's a simple demonstration on Linux. (Linux only because it uses
the special /proc/self/fd files. It can work on other OSs by writing
to a normal file or pipe and running the rest of the command on that
file/pipe.)
First, with a random map...
$ ./fio --name=test --rw=randread --iodepth=1 --ioengine=null --thread \
    --bs=128 --size=1k --runtime=6 --time_based \
    --write_iolog=/proc/self/fd/1 | grep 'test.1.0 read' | sort | uniq -c
184946 test.1.0 read 0 128
184945 test.1.0 read 128 128
184945 test.1.0 read 256 128
184946 test.1.0 read 384 128
184945 test.1.0 read 512 128
184946 test.1.0 read 640 128
184945 test.1.0 read 768 128
184945 test.1.0 read 896 128
As expected, it is pretty evenly spread. But now without a random map...
$ ./fio --name=test --rw=randread --iodepth=1 --ioengine=null --thread \
    --bs=128 --size=1k --runtime=6 --time_based --norandommap \
    --write_iolog=/proc/self/fd/1 | grep 'test.1.0 read' | sort | uniq -c
188409 test.1.0 read 0 128
565224 test.1.0 read 256 128
188408 test.1.0 read 384 128
565224 test.1.0 read 640 128
Even with a random map, this has an impact. The offsets which are
favored by the RNG will always be used first, before those unfavored.
Notice that the 3 offsets that got an extra IO were all in the set
returned in the second test. (Offsets 0, 384 and 640.)
This is not a new regression either. I tried using the OS random flag...
$ ./fio --name=test --rw=randread --iodepth=1 --ioengine=null --thread \
    --bs=128 --size=1k --runtime=6 --time_based --norandommap \
    --use_os_rand=1 --write_iolog=/proc/self/fd/1 \
    | grep 'test.1.0 read' | sort | uniq -c
558950 test.1.0 read 0 128
372632 test.1.0 read 384 128
558949 test.1.0 read 512 128
And when I tried going to an old 1.x release before the random
changes, it was identical to the use_os_rand flag.
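One way to read the counts above (a rough sketch for checking the arithmetic, not something fio prints): --size=1k with --bs=128 means 8 IOs cover the file once. Dividing each count by the number of complete 8-IO passes gives small whole numbers that add up to 8, which is what a fixed sequence of 8 offsets repeated over and over would produce. The helper name and variable names below are made up for illustration.

```python
# Counts copied from the two --norandommap runs above (offset -> IOs logged).
fio_rng_counts = {0: 188409, 256: 565224, 384: 188408, 640: 565224}
os_rand_counts = {0: 558950, 384: 372632, 512: 558949}

BLOCKS_PER_PASS = 1024 // 128  # --size=1k / --bs=128 = 8 IOs per pass


def per_pass_pattern(counts, blocks_per_pass=BLOCKS_PER_PASS):
    """Estimate how often each offset is hit within one 8-IO pass."""
    passes = sum(counts.values()) // blocks_per_pass  # complete passes in the run
    return {offset: round(n / passes) for offset, n in counts.items()}


fio_pattern = per_pass_pattern(fio_rng_counts)
os_pattern = per_pass_pattern(os_rand_counts)
print(fio_pattern)  # {0: 1, 256: 3, 384: 1, 640: 3}
print(os_pattern)   # {0: 3, 384: 2, 512: 3}
```

In both runs the per-pass hits sum to 8, and the leftover one to three IOs match a final pass cut short by the runtime limit.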
* Re: Something odd with the fio random offset generator & time_based runs
From: Steven Lang @ 2012-02-18 1:31 UTC (permalink / raw)
To: fio
Looking into it more, what seems to be going on is that after transferring
td->o.size bytes, a time_based job silently closes all open files,
resets, and starts over. Part of the process of resetting involves
re-seeding all the random numbers.
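To illustrate the effect, here is a toy model in Python. It is not fio's code: the generator, the seed value, and the function name are all assumptions chosen for the sketch. It only shows that re-seeding after every size/bs IOs makes each pass replay the same handful of offsets, whereas seeding once spreads the IO over all of them.

```python
import random
from collections import Counter

BLOCK_SIZE = 128
FILE_SIZE = 1024
BLOCKS = FILE_SIZE // BLOCK_SIZE  # 8 IOs cover the file once


def run(total_ios, reseed_each_pass, seed=1234):
    """Count offsets hit by a toy random-read job with no random map."""
    rng = random.Random(seed)
    hits = Counter()
    for i in range(total_ios):
        # Assumption: a "pass" ends after FILE_SIZE bytes, i.e. BLOCKS IOs.
        if reseed_each_pass and i % BLOCKS == 0:
            rng.seed(seed)  # models the reset re-seeding the generator
        hits[rng.randrange(BLOCKS) * BLOCK_SIZE] += 1
    return hits


passes = 1000
one_pass = run(BLOCKS, reseed_each_pass=False)
with_reset = run(passes * BLOCKS, reseed_each_pass=True)
without_reset = run(passes * BLOCKS, reseed_each_pass=False)

print("with reset:   ", sorted(with_reset.items()))
print("without reset:", sorted(without_reset.items()))
```

With the reset, every count is exactly `passes` times the count from a single pass, which is the same shape as the fio output in the first message.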
It seems like there are other implications to the reset as well. For
example, in async ioengines it drains all the remaining IO, which
results in less throughput. Potentially latency numbers could be
affected as well, either positively or negatively.
Would it make sense, for time_based runs, to not break out of the loop
in do_io() so the state doesn't get reset? This would also allow the
removal of the check in keep_running() for time_based, which would
then allow "time_based" and "loops" to be used together in a
meaningful way.
At first blush, this would probably just be a case of adding "||
(td->o.time_based)" to the loop condition in do_io(), and that
certainly addressed the case of random IO with norandommap I initially
saw. But the same performance issues could apply to sequential loads
and random mapped loads, and just changing the loop condition didn't
have any effect on that. There is something else which is determining
that the end condition is met for sequential loads, read_iolog loads,
and randommap loads. (For read_iolog loads, time_based and loops have
no effect, so I'm not sure of the value of changing that.)
Would it be difficult to change the behavior so only loops= causes
do_io() to be called repeatedly like that?
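As a rough model of the control flow being discussed (Python, with hypothetical names; the real loops live in fio's C code and have more exit conditions than this), the outer loop stands in for the "keep running until time is up" check and the inner loop for do_io(). The runtime is modelled as a fixed IO budget so the result is deterministic.

```python
def count_resets(io_budget, blocks_per_pass, stay_in_loop_when_time_based):
    """Toy model of a time_based job; returns how often state gets reset.

    io_budget stands in for --runtime (the IOs that fit in the time limit).
    """
    issued = 0
    resets = 0
    while issued < io_budget:  # outer loop: run until the time limit
        done_this_pass = 0
        # Inner loop: the analogue of do_io(). The extra "or" term mirrors
        # the suggested "|| (td->o.time_based)" in the loop condition.
        while issued < io_budget and (
            done_this_pass < blocks_per_pass or stay_in_loop_when_time_based
        ):
            issued += 1
            done_this_pass += 1
        if issued < io_budget:
            resets += 1  # left the inner loop early: close files, re-seed
    return resets


print(count_resets(80, 8, stay_in_loop_when_time_based=False))  # 9
print(count_resets(80, 8, stay_in_loop_when_time_based=True))   # 0
```

This only captures the norandommap case; as noted above, sequential, read_iolog, and random-map loads appear to hit some other end condition that this sketch does not model.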
* Re: Something odd with the fio random offset generator & time_based runs
From: Jens Axboe @ 2012-02-20 9:18 UTC (permalink / raw)
To: Steven Lang; +Cc: fio
On 02/18/2012 02:31 AM, Steven Lang wrote:
> Looking into it more, what seems to be going on is that after writing
> td->o.size bytes, a time_based job silently closes all open files,
> resets, and starts over. Part of the process of resetting involves
> re-seeding all the random numbers.
>
> It seems like there are other implications to the reset as well. For
> example, in async ioengines it drains all the remaining IO, which
> results in less throughput. Potentially latency numbers could be
> affected as well, either positively or negatively.
>
> Would it make sense, for time_based runs, to not break out of the loop
> in do_io() so the state doesn't get reset? This would also allow the
> removal of the check in keep_running() for time_based, which would
> then allow "time_based" and "loops" to be used together in a
> meaningful way.
>
> At first blush, this would probably just be a case of adding "||
> (td->o.time_based)" to the loop condition in do_io(), and that
> certainly addressed the case of random IO with norandommap I initially
> saw. But the same performance issues could apply to sequential loads
> and random mapped loads, and just changing the loop condition didn't
> have any effect on that. There is something else which is determining
> that the end condition is met for sequential loads, read_iolog loads,
> and randommap loads. (For read_iolog loads, time_based and loops have
> no effect, so I'm not sure of the value of changing that.)
>
> Would it be difficult to change the behavior so only loops= causes
> do_io() to be called repeatedly like that?
I agree that the correct solution would be to NOT jump out of the loop
and just keep going for time_based. As you point out, it might need
another change or two to ensure that we handle every condition. For
things like random map, we do need to clear it when it's full (or close
to), but that need not reset everything.
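A minimal sketch of that idea, in Python with a made-up class (this is not how fio implements its random map): when the map fills up, only the map is cleared, while the generator keeps its state, so later passes are not forced to replay the first one.

```python
import random


class ToyRandomMap:
    """Hypothetical stand-in for a random map: tracks blocks already used."""

    def __init__(self, nr_blocks, seed=1234):
        self.nr_blocks = nr_blocks
        self.rng = random.Random(seed)  # seeded once, never re-seeded
        self.used = set()
        self.clears = 0

    def next_block(self):
        if len(self.used) == self.nr_blocks:
            # Map is full: clear only the map, leave the RNG state alone.
            self.used.clear()
            self.clears += 1
        free = [b for b in range(self.nr_blocks) if b not in self.used]
        block = self.rng.choice(free)
        self.used.add(block)
        return block


rmap = ToyRandomMap(nr_blocks=8)
passes = [[rmap.next_block() for _ in range(8)] for _ in range(3)]
for p in passes:
    print(p)
```

Each pass still covers every block exactly once, but nothing else about the job (open files, queued IO, seeds) has to be torn down to get there.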
Do you have any time to hunt this further?
--
Jens Axboe
* Re: Something odd with the fio random offset generator & time_based runs
From: Steven Lang @ 2012-02-21 22:19 UTC (permalink / raw)
To: Jens Axboe; +Cc: fio
On Mon, Feb 20, 2012 at 1:18 AM, Jens Axboe <axboe@kernel.dk> wrote:
> I agree that the correct solution would be to NOT jump out of the loop
> and just keep going for time_based. As you point out, it might need
> another change or two to ensure that we handle every condition. For
> things like random map, we do need to clear it when it's full (or close
> to), but that need not reset everything.
>
> Do you have any time to hunt this further?
Probably not for a week or two; I haven't even had enough time to dig
into the main IO loop to figure out where the other exit conditions
(end of iolog, random map full, etc.) are coming from.
I also still owe a documentation patch to add the continue_on_error
option to the man page.