* terminating fio jobs upon reaching steady state
@ 2016-06-30 15:49 Vincent Fu
  2016-07-12 19:55 ` Jens Axboe
  0 siblings, 1 reply; 3+ messages in thread
From: Vincent Fu @ 2016-06-30 15:49 UTC
  To: Jens Axboe, fio@vger.kernel.org


Jens, I have been working on a patch to give fio the ability to terminate a job when steady state is attained. I have a crude but working implementation at https://bitbucket.org/vincentfu/fio/commits/all and I'm hoping to receive some feedback on its design.

Currently, the feature works via helper_thread_main(). It collects data in a ring buffer for a specified duration and carries out the termination test when the ring buffer is full. If the test passes, the job terminates. If the test fails, the job continues, the oldest data point in the buffer is dropped, a new data point is added, and the termination test is carried out again. The cycle repeats until the job terminates because steady state was attained or for other reasons.
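The cycle described above can be sketched as follows. This is only an illustration in Python, not the C code in the patch: `SteadyStateWindow`, `add_sample`, and `within_10_of_mean` are hypothetical names, and treating "within" as less-than-or-equal is an assumption.

```python
from collections import deque


class SteadyStateWindow:
    """Fixed-size window of per-second samples (hypothetical helper)."""

    def __init__(self, size, test):
        self.size = size
        self.test = test
        # With maxlen set, appending to a full deque drops the oldest sample.
        self.samples = deque(maxlen=size)

    def add_sample(self, value):
        """Add one sample; return True if the window is full and the test passes."""
        self.samples.append(value)
        if len(self.samples) < self.size:
            return False  # still filling the buffer, no test yet
        return self.test(list(self.samples))


def within_10_of_mean(data):
    """Example test: every sample lies within 10 of the window mean."""
    mean = sum(data) / len(data)
    return all(abs(x - mean) <= 10 for x in data)


window = SteadyStateWindow(size=3, test=within_10_of_mean)
results = [window.add_sample(v) for v in [100, 200, 205, 210, 208]]
print(results)
```

A real job would stop at the first True result; the list is kept here only to show that the test is not run until the buffer is full and is then re-run on every new sample.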

Results are reported via JSON output and include the per-second IOPS/BW measurements from the ring buffer.

Three new job options control this feature:

ss_dur = size of the ring buffer used for the termination test. Data points are collected once per second, so this is also the length of the test window in seconds.

ss_ramp = ramp time once job is underway before data collection begins (to avoid false positives detecting steady state for jobs run on freshly formatted SSDs)

ss = criterion:limit

criterion is one of iops, bw, iops_slope, bw_slope

iops, bw                        terminate the job when all data points in the buffer are within the specified limit of the mean

iops_slope, bw_slope    terminate the job when the magnitude of the least squares slope falls below the specified limit

limit can be either a fixed number or a percentage. ss=iops:10 will terminate the job when all IOPS values in the buffer are within 10 of the mean. ss=bw_slope:0.01% will terminate the job when the least squares slope falls below 0.01% of the mean bandwidth in the buffer.
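To make the two kinds of criterion and the two kinds of limit concrete, here is a rough Python sketch under stated assumptions. The function names (`parse_limit`, `max_deviation`, `ls_slope`, `steady`) are hypothetical and do not come from the patch; the sketch also assumes "within" means less-than-or-equal and "falls below" means strictly less than, and that a percentage limit is always taken relative to the mean of the buffer.

```python
def parse_limit(text):
    """Return (value, is_percent) for a limit such as '10' or '0.01%'."""
    if text.endswith("%"):
        return float(text[:-1]), True
    return float(text), False


def max_deviation(data):
    """Largest absolute distance of any sample from the mean."""
    mean = sum(data) / len(data)
    return max(abs(x - mean) for x in data)


def ls_slope(data):
    """Least squares slope of data against sample index 0..n-1 (needs n >= 2)."""
    n = len(data)
    x_mean = (n - 1) / 2
    y_mean = sum(data) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(data))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def steady(criterion, limit_text, data):
    """Apply an 'ss=criterion:limit' style test to one full buffer."""
    limit, is_percent = parse_limit(limit_text)
    mean = sum(data) / len(data)
    threshold = limit * mean / 100 if is_percent else limit
    if criterion.endswith("_slope"):
        return abs(ls_slope(data)) < threshold
    return max_deviation(data) <= threshold


data = [100, 102, 98, 101, 99]
print(steady("iops", "10", data))       # fixed limit on deviation from the mean
print(steady("iops", "1%", data))       # limit of 1% of the mean
print(steady("bw_slope", "0.5%", data)) # slope limit of 0.5% of the mean
```

For this toy buffer the mean is 100, the largest deviation is 2, and the slope is -0.3 per sample, so the first and third tests pass and the second fails.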

This command:

./fio --name=test --rw=randrw --rwmixread=100 --filename=fio --time_based --runtime=30s --ss_dur=10 --ss_ramp=3 --ss=iops:10000 --output-format=json

results in something like this:

 "steadystate" : {
        "ss" : "iops:10000.000000",
        "duration" : 10,
        "steadystate_ramptime" : 3,
        "attained" : 1,
        "criterion" : 9664.599609,
        "data" : [
          613354,
          624473,
          623099,
          617032,
          617280,
          614679,
          607775,
          620743,
          617561,
          618400
        ] }
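As a sanity check on the sample output, the reported "criterion" value appears to be the largest absolute deviation from the mean of the ten data points. That reading is inferred from the numbers above rather than stated anywhere in the output, so treat the short Python check below as an illustration only.

```python
# Data points copied from the "data" array in the sample output above.
data = [613354, 624473, 623099, 617032, 617280,
        614679, 607775, 620743, 617561, 618400]
limit = 10000.0          # from --ss=iops:10000
reported = 9664.599609   # the "criterion" field in the sample output

mean = sum(data) / len(data)
max_dev = max(abs(x - mean) for x in data)
attained = max_dev <= limit  # assumes "within" means less-than-or-equal

print(mean, max_dev, attained)
```

The largest deviation comes from the lowest sample, 607775, and agrees with the reported value to within single-precision rounding, which is consistent with "attained" being 1.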


How important are the following requirements?

1) Full bandwidth/IOPS series for the entire runtime

I can imagine a use case where a user would want to see how the device behaves as it approaches steady state, in addition to the behavior of the device while it is in steady state. Right now the patch only reports IOPS/BW data from the ring buffer that was used for testing the termination criterion.

2) Ability to report latency data for the steady state period

An obvious use case is to run a workload in steady state for a long duration and then report its latency distribution. The only way I can see for this feature to provide latency data is to maintain a ring buffer of io_u_plat[DDIR_RWDIR_CNT][FIO_IO_U_PLAT_NR] arrays, one for each thread. The latency distribution for the steady state period would then be the difference between the overall final latency distribution and the distribution at the head of the ring buffer. The disadvantage of this approach is high memory consumption. A work-around is to have one job use this feature to drive the device to steady state and then run a second, identical time-based job to collect latency data, although there is no guarantee that the second job will meet the steady state criterion.
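The subtraction idea can be sketched in Python as below. This is a simplified illustration, not fio code: each histogram is a flat list of four buckets rather than an io_u_plat[DDIR_RWDIR_CNT][FIO_IO_U_PLAT_NR] array, and `subtract_hist`, `snapshots`, and the sample counts are made up for the example. It assumes a snapshot of the cumulative histogram is stored at the start of each second.

```python
from collections import deque


def subtract_hist(final, baseline):
    """Bucket-by-bucket difference of two cumulative histograms."""
    return [f - b for f, b in zip(final, baseline)]


window = 3                         # plays the role of ss_dur
snapshots = deque(maxlen=window)   # ring buffer of cumulative snapshots
cumulative = [0, 0, 0, 0]

# Made-up completions per latency bucket for five consecutive seconds.
per_second = [
    [5, 1, 0, 0],
    [4, 2, 0, 0],
    [1, 3, 2, 0],
    [0, 3, 3, 0],
    [0, 2, 4, 0],
]

for counts in per_second:
    # Store the totals as they stood at the start of this second.
    snapshots.append(list(cumulative))
    cumulative = [c + n for c, n in zip(cumulative, counts)]

# Head of the ring buffer = totals just before the final window began.
steady_hist = subtract_hist(cumulative, snapshots[0])
print(steady_hist)
```

The result equals the sum of the last three per-second histograms. The memory cost mentioned above shows up here as one full histogram copy per second of window, and in fio that would be multiplied by the number of data directions and threads.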

3) How should this interact with group_reporting?

Since reporting will be for the entire group, it seems sensible to have steady state detection operate at the group level, with BW and IOPS values summed over all threads in a group and all threads in a group terminating when steady state is attained.
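A small Python sketch of why group-level detection can differ from per-thread detection. The names (`group_series`, `within_limit`) and the numbers are hypothetical, and the sketch assumes every thread contributes one sample per second over the same window.

```python
def group_series(per_thread):
    """Sum per-second samples across all threads in a group."""
    return [sum(values) for values in zip(*per_thread)]


def within_limit(data, limit):
    """True if every sample is within `limit` of the mean of `data`."""
    mean = sum(data) / len(data)
    return all(abs(x - mean) <= limit for x in data)


# Two threads whose individual IOPS oscillate in opposite phase.
thread_a = [100, 300, 100, 300]
thread_b = [300, 100, 300, 100]

summed = group_series([thread_a, thread_b])
print(summed)
print(within_limit(thread_a, 10), within_limit(summed, 10))
```

Neither thread looks steady on its own, but the summed series is flat, so a group-level test would terminate all threads in the group while a per-thread test would not.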


Vincent

--
Vincent Fu
Software Dev Engr II
SanDisk | a Western Digital brand
Remote Worker, Rockville, MD, USA
T: 801 987 7079
Vincent.Fu@SanDisk.com


