terminating fio jobs upon reaching steady state


* terminating fio jobs upon reaching steady state
@ 2016-06-30 15:49 Vincent Fu
  2016-07-12 19:55 ` Jens Axboe
From: Vincent Fu @ 2016-06-30 15:49 UTC
  To: Jens Axboe, fio@vger.kernel.org


Jens, I have been working on a patch to give fio the ability to terminate a job when steady state is attained. I have a crude implementation that is working at https://bitbucket.org/vincentfu/fio/commits/all but I'm hoping to receive some feedback on its design.

Currently, the feature works via helper_thread_main(). It collects data in a ring buffer for a specified duration and carries out the termination test when the ring buffer is full. If the test passes, the job terminates. If the test fails, the job continues, the oldest data point in the buffer is dropped, a new data point is added, and the termination test is carried out again. The cycle repeats until the job terminates because steady state was attained or for other reasons.
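The sliding-window cycle described above can be sketched roughly as follows. This is only an illustration in Python, not the patch's C code: run_until_steady, within_10_of_mean, and the sample values are hypothetical names and numbers, and treating "within" as inclusive (<=) is an assumption.

```python
from collections import deque

def run_until_steady(samples, ss_dur, is_steady):
    """Feed one sample per second into a ring buffer of ss_dur entries.

    Returns (second, window) for the first full window that passes the
    test, or None if the samples run out first.
    """
    # A deque with maxlen drops its oldest entry when a new one is appended
    # to a full buffer, which mirrors the "drop oldest, add newest" cycle.
    window = deque(maxlen=ss_dur)
    for second, value in enumerate(samples):
        window.append(value)
        # The test only runs once the buffer is full.
        if len(window) == ss_dur and is_steady(list(window)):
            return second, list(window)
    return None

def within_10_of_mean(window):
    """Hypothetical test: every point lies within 10 of the window mean."""
    mean = sum(window) / len(window)
    return all(abs(v - mean) <= 10 for v in window)

# Made-up per-second IOPS values for illustration only.
iops = [100, 150, 190, 200, 204, 198, 201, 199]
result = run_until_steady(iops, ss_dur=4, is_steady=within_10_of_mean)
print(result)
```

With these made-up samples the first two full windows fail the test and the third one passes, at which point a real job would terminate.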

Reporting of results is done via JSON output and includes per second IOPS/BW measurements from the ring buffer.

Three new job options control this feature:

ss_dur = size of the ring buffer used for termination test. Data points are collected once per second.

ss_ramp = ramp time once job is underway before data collection begins (to avoid false positives detecting steady state for jobs run on freshly formatted SSDs)

ss = criterion:limit

criterion is one of iops, bw, iops_slope, bw_slope

iops, bw: terminate the job when all data points in the buffer are within the specified limit of the mean

iops_slope, bw_slope: terminate the job when the magnitude of the least squares slope falls below the specified limit

limit can be either a fixed number or a percentage. ss=iops:10 will terminate the job when all IOPS values in the buffer are within 10 of the mean. ss=bw_slope:0.01% will terminate the job when the least squares slope falls below 0.01% of the mean bandwidth in the buffer.
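As a rough illustration of the slope criteria and the two forms of limit, here is a Python sketch. The helper names (parse_ss, least_squares_slope, slope_is_steady) are hypothetical and not taken from the patch, and fitting the line against sample indices 0, 1, 2, ... (one point per second) is an assumption.

```python
def parse_ss(option):
    """Split 'criterion:limit' into (criterion, limit, is_percent)."""
    criterion, _, limit = option.partition(":")
    is_percent = limit.endswith("%")
    return criterion, float(limit.rstrip("%")), is_percent

def least_squares_slope(values):
    """Slope of the least-squares line through (0, v0), (1, v1), ..."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

def slope_is_steady(values, option):
    """True when the slope magnitude falls below the limit."""
    _, limit, is_percent = parse_ss(option)
    slope = abs(least_squares_slope(values))
    if is_percent:
        # A percentage limit is taken relative to the mean of the buffer.
        mean = sum(values) / len(values)
        slope = 100.0 * slope / mean
    return slope < limit

print(parse_ss("bw_slope:0.01%"))
print(slope_is_steady([1000, 1000, 1000, 1000], "bw_slope:0.01%"))
print(slope_is_steady([100, 200, 300, 400], "iops_slope:5"))
```

A flat series has zero slope and passes any positive limit, while a series that climbs by 100 per second fails a fixed limit of 5.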

This command:

./fio --name=test --rw=randrw --rwmixread=100 --filename=fio --time_based --runtime=30s --ss_dur=10 --ss_ramp=3 --ss=iops:10000 --output-format=json

results in something like this:

"steadystate" : {
  "ss" : "iops:10000.000000",
  "duration" : 10,
  "steadystate_ramptime" : 3,
  "attained" : 1,
  "criterion" : 9664.599609,
  "data" : [
    613354,
    624473,
    623099,
    617032,
    617280,
    614679,
    607775,
    620743,
    617561,
    618400
  ]
}
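The reported numbers can be checked by hand. The short Python sketch below recomputes the largest deviation from the mean for the ten data points in the output above and compares it with the 10000 limit. Reading "criterion" as the maximum deviation from the mean is an inference from the arithmetic, not something the output states.

```python
# Data points copied from the "data" array in the output above.
data = [613354, 624473, 623099, 617032, 617280,
        614679, 607775, 620743, 617561, 618400]
limit = 10000.0  # from --ss=iops:10000

mean = sum(data) / len(data)
max_deviation = max(abs(v - mean) for v in data)
attained = max_deviation < limit

print(f"mean={mean:.1f} max_deviation={max_deviation:.1f} attained={attained}")
```

The mean works out to 617439.6 and the largest deviation to about 9664.6 (from the 607775 sample), which agrees with the reported "criterion" value to within rounding and is below the limit, consistent with "attained" : 1.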

I'm wondering how important the following requirements are.

1) Full bandwidth/IOPS series for the entire runtime

I can imagine a use case where a user would want to see how the device behaves as it approaches steady state, in addition to the behavior of the device while it is in steady state. Right now the patch only reports IOPS/BW data from the ring buffer that was used for testing the termination criterion.

2) Ability to report latency data for the steady state period

An obvious use case is to run a workload in steady state for a long duration and then report its latency distribution. The only way I can imagine to have this feature provide latency data is to maintain a ring buffer of io_u_plat[DDIR_RWDIR_CNT][FIO_IO_U_PLAT_NR] arrays, one for each thread. The latency distribution for the steady state period is then the difference between the overall final latency distribution and the distribution at the head of the ring buffer. The disadvantage of this approach is high memory consumption. A work-around is to have one job use this feature to drive the device to steady state and then run a second, identical time-based job for latency data collection, although there is no guarantee that the second job will meet the steady state criterion.
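The histogram-difference idea can be sketched as follows. This Python illustration is heavily simplified: it uses a single flat list of four buckets instead of io_u_plat[DDIR_RWDIR_CNT][FIO_IO_U_PLAT_NR], and the names steady_state_histogram, NR_BUCKETS, and per_second, along with all the counts, are hypothetical.

```python
from collections import deque

def steady_state_histogram(snapshots, final):
    """Subtract the oldest cumulative snapshot from the final histogram."""
    oldest = snapshots[0]
    return [f - o for f, o in zip(final, oldest)]

NR_BUCKETS = 4          # stand-in for the real number of latency buckets
ring = deque(maxlen=3)  # one cumulative snapshot per second of the window
cumulative = [0] * NR_BUCKETS

# Made-up per-second completion counts for each latency bucket.
per_second = [[5, 1, 0, 0], [4, 2, 0, 0], [3, 3, 1, 0],
              [3, 3, 1, 0], [3, 3, 1, 0]]

for counts in per_second:
    # Snapshot the running totals at the start of each second.
    ring.append(list(cumulative))
    cumulative = [c + n for c, n in zip(cumulative, counts)]

window_hist = steady_state_histogram(ring, cumulative)
print(window_hist)
```

The memory cost mentioned above shows up here as one full histogram copy per buffer slot, per thread.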

3) How should this interact with group_reporting?

Since reporting will be for the entire group, it seems sensible to have steady state detection operate at the group level, with BW and IOPS values summed over all threads in a group and all threads in a group terminating when steady state is attained.
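A small Python sketch of group-level detection follows, assuming each thread contributes one sample per second and the mean-deviation test is applied to the summed series. The names group_series and group_is_steady and the sample values are hypothetical.

```python
def group_series(per_thread):
    """Sum per-second samples across all threads in a reporting group."""
    return [sum(vals) for vals in zip(*per_thread)]

def group_is_steady(per_thread, limit):
    """Apply the mean-deviation test to the summed group series."""
    series = group_series(per_thread)
    mean = sum(series) / len(series)
    return all(abs(v - mean) <= limit for v in series)

# Made-up per-second IOPS for two threads in one group.
threads = [
    [100, 120, 90, 110],
    [200, 180, 210, 190],
]

print(group_series(threads))
print(group_is_steady(threads, limit=5))
print(group_is_steady([threads[0]], limit=5))
```

In this example each thread fluctuates on its own, but the group total is constant, so the group meets the criterion even though thread 0 alone would not.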

Vincent

--
Vincent Fu
Software Dev Engr II
SanDisk | a Western Digital brand
Remote Worker, Rockville, MD, USA
T: 801 987 7079
Vincent.Fu@SanDisk.com

PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).


* Re: terminating fio jobs upon reaching steady state
  2016-06-30 15:49 terminating fio jobs upon reaching steady state Vincent Fu
@ 2016-07-12 19:55 ` Jens Axboe
  2016-07-12 21:50   ` Vincent Fu
From: Jens Axboe @ 2016-07-12 19:55 UTC
  To: Vincent Fu, fio@vger.kernel.org

On 06/30/2016 08:49 AM, Vincent Fu wrote:
> Jens, I have been working on a patch to give fio the ability to
> terminate a job when steady state is attained. I have a crude
> implementation that is working at
> https://bitbucket.org/vincentfu/fio/commits/all but I'm hoping to
> receive some feedback on its design.
>
> Currently, the feature works via helper_thread_main(). It collects data
> in a ring buffer for a specified duration and carries out the
> termination test when the ring buffer is full. If the test passes, the
> job terminates. If the test fails, the job continues, the oldest data
> point in the buffer is dropped, a new data point is added, and the
> termination test is carried out again. The cycle repeats until the job
> terminates because steady state was attained or for other reasons.

Vincent, where are we at on this? Would be nice to get this included. A 
few comments:

- The global 'steadystate' bool. Seems like this should be a per-job 
thing, and not global. For a specific group of threads, whether we are 
in steady state or not should be checked for those td's.

- Should still work with ramptime. If we're in ramp time, no checking. 
Start steady state checking once we are out of ramp time.

- Would be nice if the SS output was there for non-json. At least 
something for the normal output, I think we can safely ignore the 
terse/csv in this regard.

- Should struct steady_state just be embedded in the thread_stat? It 
needs to be transmitted across the wire for client/server. So either you 
need to split it in two, or you need to embed it and sanitize it for 
on-the-wire transmission. You could keep the td->ss, and have a version 
that we export and embed that in thread_stat? If you do that, then have 
a steady_state_bla that's embedded in both struct steady_state and 
struct thread_stat.

Rest of the comments would be around cleanup, lots of little extra 
spaces here and there. But I can script sanitize that, so didn't want to 
address that in here.

-- 
Jens Axboe




* Re: terminating fio jobs upon reaching steady state
  2016-07-12 19:55 ` Jens Axboe
@ 2016-07-12 21:50   ` Vincent Fu
From: Vincent Fu @ 2016-07-12 21:50 UTC
  To: Jens Axboe, fio@vger.kernel.org

On 2016-7-12 3:55 PM, Jens Axboe wrote:
> On 06/30/2016 08:49 AM, Vincent Fu wrote:
>> Jens, I have been working on a patch to give fio the ability to
>> terminate a job when steady state is attained. I have a crude
>> implementation that is working at
>> https://bitbucket.org/vincentfu/fio/commits/all but I'm hoping to
>> receive some feedback on its design.
>>
>> Currently, the feature works via helper_thread_main(). It collects data
>> in a ring buffer for a specified duration and carries out the
>> termination test when the ring buffer is full. If the test passes, the
>> job terminates. If the test fails, the job continues, the oldest data
>> point in the buffer is dropped, a new data point is added, and the
>> termination test is carried out again. The cycle repeats until the job
>> terminates because steady state was attained or for other reasons.
> Vincent, where are we at on this? Would be nice to get this included. A
> few comments:
I have a tweak for the JSON output to make, and I'm still a bit troubled
by the absence of a couple features (latency data for the steady state
period, BW/IOPS sequence for the entire run), but maybe those are tasks
to work on later.
> - The global 'steadystate' bool. Seems like this should be a per-job
> thing, and not global. For a specific group of threads, whether we are
> in steady state or not should be checked for those td's.
I set it as a global to avoid iterating over all the td's if a user is
just running a normal job and steadystate testing is not engaged at all,
but it's easy enough to make this a per-job variable.
> - Should still work with ramptime. If we're in ramp time, no checking.
> Start steady state checking once we are out of ramp time.
I do have one question about this. Suppose steady state detection and
group_reporting are enabled with 2 jobs:
job 0: ramp time = 10s
job 1: ramp time = 60s
What if job 0 reaches steady state before job 1's ramp time elapses? I
think the appropriate thing to do is to terminate all the jobs even
though job 1 is still in its ramp time. Does this sound reasonable to you?
> - Would be nice if the SS output was there for non-json. At least
> something for the normal output, I think we can safely ignore the
> terse/csv in this regard.
Ok. I will add this.
> - Should struct steady_state just be embedded in the thread_stat? It
> needs to be transmitted across the wire for client/server. So either you
> need to split it in two, or you need to embed it and sanitize it for
> on-the-wire transmission. You could keep the td->ss, and have a version
> that we export and embed that in thread_stat? If you do that, then have
> a steady_state_bla that's embedded in both struct steady_state and
> struct thread_stat.
I might need more direction from you to understand the implications of
the different choices but I will study this some more.

Thanks for the feedback.

Vincent



