running jobs serially

* running jobs serially
From: Antoine Beaupré @ 2022-05-18 19:44 UTC
  To: fio

Hi,

One of the things I've been struggling with in fio for a while is how
to run batches of jobs with it.

I know I can call fio multiple times with different job files or
parameters, that's easy. But what I'd like to do is have a *single* job
file (or even multiple, actually) that would describe *multiple*
workloads that would need to be tested.

In particular, I'm looking for a way to reproduce the benchmarks
suggested here:

https://arstechnica.com/gadgets/2020/02/how-fast-are-your-disks-find-out-the-open-source-way-with-fio/

... without having to write all the glue the author had to build here:

https://github.com/jimsalterjrs/fio-test-scaffolding/

... which is quite a bit of goo.

I was hoping a simple thing like this would just do it:

[global]
# cargo-culting Salter
fallocate=none
ioengine=posixaio
runtime=60
time_based=1
end_fsync=1
stonewall=1
group_reporting=1

# Single 4KiB random read/write process
[randread-4k-4g-1x]
stonewall=1
rw=randread
bs=4k
size=4g
numjobs=1
iodepth=1

[randwrite-4k-4g-1x]
stonewall=1
rw=randwrite
bs=4k
size=4g
numjobs=1
iodepth=1

... but looking at the "normal" --output-format, it *looks* like the
jobs are all started at the same time. The files certainly seem to be
allocated all at once:

root@curie:/home# fio ars.fio 
randread-4k-4g-1x: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
randwrite-4k-4g-1x: (g=1): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.25
Starting 2 processes
Jobs: 1 (f=1): [r(1),P(1)][0.0%][r=16.5MiB/s][r=4228 IOPS][eta 49710d:06h:28m:14s]
randread-4k-4g-1x: (groupid=0, jobs=1): err= 0: pid=1033470: Wed May 18 15:41:04 2022
  read: IOPS=4754, BW=18.6MiB/s (19.5MB/s)(18.6MiB/1001msec)
    slat (nsec): min=1429, max=391678, avg=3239.76, stdev=6013.78
    clat (usec): min=163, max=6917, avg=205.05, stdev=108.47
     lat (usec): min=166, max=7308, avg=208.29, stdev=114.12
    clat percentiles (usec):
     |  1.00th=[  169],  5.00th=[  174], 10.00th=[  174], 20.00th=[  178],
     | 30.00th=[  182], 40.00th=[  184], 50.00th=[  196], 60.00th=[  200],
     | 70.00th=[  204], 80.00th=[  215], 90.00th=[  239], 95.00th=[  269],
     | 99.00th=[  412], 99.50th=[  478], 99.90th=[  635], 99.95th=[ 1045],
     | 99.99th=[ 6915]
   bw (  KiB/s): min=19248, max=19248, per=100.00%, avg=19248.00, stdev= 0.00, samples=1
   iops        : min= 4812, max= 4812, avg=4812.00, stdev= 0.00, samples=1
  lat (usec)   : 250=92.79%, 500=6.81%, 750=0.34%
  lat (msec)   : 2=0.04%, 10=0.02%
  cpu          : usr=2.90%, sys=2.90%, ctx=4767, majf=0, minf=44
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=4759,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1
randwrite-4k-4g-1x: (groupid=1, jobs=1): err= 0: pid=1033477: Wed May 18 15:41:04 2022
  write: IOPS=32.3k, BW=126MiB/s (132MB/s)(174MiB/1378msec); 0 zone resets
    slat (nsec): min=955, max=326042, avg=2726.11, stdev=2538.38
    clat (nsec): min=343, max=6896.5k, avg=18139.12, stdev=69899.75
     lat (usec): min=10, max=6899, avg=20.87, stdev=70.02
    clat percentiles (usec):
     |  1.00th=[   11],  5.00th=[   11], 10.00th=[   12], 20.00th=[   12],
     | 30.00th=[   13], 40.00th=[   13], 50.00th=[   14], 60.00th=[   15],
     | 70.00th=[   16], 80.00th=[   18], 90.00th=[   26], 95.00th=[   34],
     | 99.00th=[   62], 99.50th=[   91], 99.90th=[  231], 99.95th=[  326],
     | 99.99th=[ 4047]
   bw (  KiB/s): min=196064, max=196064, per=100.00%, avg=196064.00, stdev= 0.00, samples=1
   iops        : min=49016, max=49016, avg=49016.00, stdev= 0.00, samples=1
  lat (nsec)   : 500=0.01%, 750=0.01%
  lat (usec)   : 2=0.01%, 4=0.01%, 10=0.21%, 20=83.60%, 50=14.51%
  lat (usec)   : 100=1.22%, 250=0.37%, 500=0.03%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%
  cpu          : usr=11.84%, sys=18.01%, ctx=46292, majf=0, minf=46
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,44457,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=18.6MiB/s (19.5MB/s), 18.6MiB/s-18.6MiB/s (19.5MB/s-19.5MB/s), io=18.6MiB (19.5MB), run=1001-1001msec

Run status group 1 (all jobs):
  WRITE: bw=126MiB/s (132MB/s), 126MiB/s-126MiB/s (132MB/s-132MB/s), io=174MiB (182MB), run=1378-1378msec

Disk stats (read/write):
    dm-2: ios=4759/43132, merge=0/0, ticks=864/7281132, in_queue=7281996, util=67.32%, aggrios=4759/43181, aggrmerge=0/0, aggrticks=864/7378584, aggrin_queue=7379448, aggrutil=67.17%
    dm-0: ios=4759/43181, merge=0/0, ticks=864/7378584, in_queue=7379448, util=67.17%, aggrios=4759/43124, aggrmerge=0/57, aggrticks=778/8680, aggrin_queue=9487, aggrutil=67.02%
  sda: ios=4759/43124, merge=0/57, ticks=778/8680, in_queue=9487, util=67.02%


Those timestamps, specifically, should not be the same:

randwrite-4k-4g-1x: (groupid=1, jobs=1): err= 0: pid=1033477: Wed May 18 15:41:04 2022
randread-4k-4g-1x: (groupid=0, jobs=1): err= 0: pid=1033470: Wed May 18 15:41:04 2022

Am I missing something? Or are job files just *not* designed to run
things serially?

I looked in the archives for this, and only found this (unfulfilled,
AFAICT) request:

https://lore.kernel.org/fio/CANvN+emA01TZfbBx4aU+gg5CKfy+AEX_gZW7Jz4HMHvwkdBNoQ@mail.gmail.com/

and:

https://lore.kernel.org/fio/MWHPR04MB0320ED986E73B1E9994929B38F470@MWHPR04MB0320.namprd04.prod.outlook.com/

... but that talks about serialize_overlap which seems to be specific to
handling requests sent in parallel, not serializing jobs themselves.

For now, it feels like I need to fall back to shell scripts, which is a
little annoying: it would be really nice to be able to carry a full
workload in a single job file.
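If one does fall back to the shell, a minimal sketch of that glue is to run each job section of the same file as its own fio invocation, using fio's `--section` option (fio always applies the `[global]` section). The helper names `list_sections` and `run_sections` are hypothetical, and the sketch assumes section names contain no whitespace.

```shell
#!/bin/sh
# Hypothetical helpers: run every job section of a fio job file as a
# separate fio invocation, one after another.

# Print the names of all [sections] in a job file, except [global].
list_sections() {
    sed -n 's/^[[:space:]]*\[\([^]]*\)][[:space:]]*$/\1/p' "$1" | grep -v '^global$'
}

# Run each section in turn and stop at the first fio failure.
# Assumes section names contain no whitespace (they are word-split).
run_sections() {
    jobfile=$1
    for section in $(list_sections "$jobfile"); do
        echo "=== $section ==="
        fio --section="$section" "$jobfile" || return 1
    done
}
```

Usage would be `run_sections ars.fio`. Each section gets its own fio run and its own report, so the ordering comes from the shell loop rather than from `stonewall`.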

Thanks, and sorry if that's a dumb question. :)

-- 
Antoine Beaupré
torproject.org system administration

* RE: running jobs serially
From: Vincent Fu @ 2022-05-18 22:41 UTC
  To: Antoine Beaupré, fio@vger.kernel.org

The jobs you are running have the *stonewall* option which should make them run
serially unless something is very broken. Here is documentation for the
stonewall option:

https://fio.readthedocs.io/en/latest/fio_doc.html#cmdoption-arg-stonewall

You could add the write_bw_log=filename and log_unix_epoch=1 options to
confirm. You should see a timestamp for each IO and should be able to make
sure that all the writes are happening after the reads.
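As a sketch of where those two options go, the snippet below writes a cut-down copy of the two-job file with logging enabled in `[global]`. The helper name `write_logged_jobfile`, the output file name and the `ars` log prefix are illustrative choices, not part of the original setup. `stonewall=1` appears only in `[global]`, which applies it to every job.

```shell
#!/bin/sh
# Hypothetical helper: write a copy of the two-job file with bandwidth
# logging enabled so the run order can be checked afterwards.
write_logged_jobfile() {
    cat > "$1" <<'EOF'
[global]
fallocate=none
ioengine=posixaio
runtime=60
time_based=1
end_fsync=1
stonewall=1
group_reporting=1
# One bandwidth log per job (ars_bw.<N>.log), with Unix-epoch
# timestamps instead of zero-based ones.
write_bw_log=ars
log_unix_epoch=1

[randread-4k-4g-1x]
rw=randread
bs=4k
size=4g
numjobs=1
iodepth=1

[randwrite-4k-4g-1x]
rw=randwrite
bs=4k
size=4g
numjobs=1
iodepth=1
EOF
}
```

Running `write_logged_jobfile ars-logged.fio && fio ars-logged.fio` should then leave `ars_bw.1.log` and `ars_bw.2.log` behind; if the jobs are serialized, the last timestamp of the first log comes before the first timestamp of the second.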

* RE: running jobs serially
From: Antoine Beaupré @ 2022-05-19  1:16 UTC
  To: Vincent Fu, fio@vger.kernel.org

On 2022-05-18 22:41:24, Vincent Fu wrote:
> The jobs you are running have the *stonewall* option which should make them run
> serially unless something is very broken.

Yeah, so that's something I added deliberately for that purpose, but three
things make me think it's not working properly.

 1. the timestamps are identical for the two jobs

        randwrite-4k-4g-1x: (groupid=1, jobs=1): err= 0: pid=1033477: Wed May 18 15:41:04 2022
         randread-4k-4g-1x: (groupid=0, jobs=1): err= 0: pid=1033470: Wed May 18 15:41:04 2022

 2. when fio starts, it says:

         Starting 2 processes

    I would have expected it to start one process at a time

 3. when running larger batches, it starts laying out all files before
    starting the jobs:

$ fio ars.fio
randread-4k-4g-1x: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
randwrite-4k-4g-1x: (g=1): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
randread-64k-256m-16x: (g=2): rw=randread, bs=(R) 64.0KiB-64.0KiB, (W) 64.0KiB-64.0KiB, (T) 64.0KiB-64.0KiB, ioengine=posixaio, iodepth=16
...
randwrite-64k-256m-16x: (g=3): rw=randwrite, bs=(R) 64.0KiB-64.0KiB, (W) 64.0KiB-64.0KiB, (T) 64.0KiB-64.0KiB, ioengine=posixaio, iodepth=16
...
randread-1m-16g-1x: (g=4): rw=randread, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=posixaio, iodepth=1
randwrite-1m-16g-1x: (g=5): rw=randwrite, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=posixaio, iodepth=1
fio-3.25
Starting 36 processes
randread-4k-4g-1x: Laying out IO file (1 file / 4096MiB)
randwrite-4k-4g-1x: Laying out IO file (1 file / 4096MiB)
randread-64k-256m-16x: Laying out IO file (1 file / 256MiB)
randread-64k-256m-16x: Laying out IO file (1 file / 256MiB)
randread-64k-256m-16x: Laying out IO file (1 file / 256MiB)
randread-64k-256m-16x: Laying out IO file (1 file / 256MiB)
randread-64k-256m-16x: Laying out IO file (1 file / 256MiB)
randread-64k-256m-16x: Laying out IO file (1 file / 256MiB)
randread-64k-256m-16x: Laying out IO file (1 file / 256MiB)
randread-64k-256m-16x: Laying out IO file (1 file / 256MiB)
randread-64k-256m-16x: Laying out IO file (1 file / 256MiB)
randread-64k-256m-16x: Laying out IO file (1 file / 256MiB)
randread-64k-256m-16x: Laying out IO file (1 file / 256MiB)
randread-64k-256m-16x: Laying out IO file (1 file / 256MiB)
randread-64k-256m-16x: Laying out IO file (1 file / 256MiB)
randread-64k-256m-16x: Laying out IO file (1 file / 256MiB)
randread-64k-256m-16x: Laying out IO file (1 file / 256MiB)
randread-64k-256m-16x: Laying out IO file (1 file / 256MiB)
randwrite-64k-256m-16x: Laying out IO file (1 file / 256MiB)
randwrite-64k-256m-16x: Laying out IO file (1 file / 256MiB)
randwrite-64k-256m-16x: Laying out IO file (1 file / 256MiB)
randwrite-64k-256m-16x: Laying out IO file (1 file / 256MiB)
randwrite-64k-256m-16x: Laying out IO file (1 file / 256MiB)
randwrite-64k-256m-16x: Laying out IO file (1 file / 256MiB)
randwrite-64k-256m-16x: Laying out IO file (1 file / 256MiB)
randwrite-64k-256m-16x: Laying out IO file (1 file / 256MiB)
randwrite-64k-256m-16x: Laying out IO file (1 file / 256MiB)
randwrite-64k-256m-16x: Laying out IO file (1 file / 256MiB)
randwrite-64k-256m-16x: Laying out IO file (1 file / 256MiB)
randwrite-64k-256m-16x: Laying out IO file (1 file / 256MiB)
randwrite-64k-256m-16x: Laying out IO file (1 file / 256MiB)
randwrite-64k-256m-16x: Laying out IO file (1 file / 256MiB)
randwrite-64k-256m-16x: Laying out IO file (1 file / 256MiB)
randwrite-64k-256m-16x: Laying out IO file (1 file / 256MiB)
randread-1m-16g-1x: Laying out IO file (1 file / 16384MiB)
[...]

I would have expected those files to be laid out right before each job
starts, not all at once at the beginning, although I'm not sure what
difference that would make. Maybe it would save disk space, at least?
Say I have limited space left on the partition and want to run
multiple large jobs: I'd expect each job to clean up after itself.

> Here is documentation for the stonewall option:
>
> https://fio.readthedocs.io/en/latest/fio_doc.html#cmdoption-arg-stonewall

Speaking of which, it's not clear to me if I need to add stonewall to
each job or if I can just add it to the top-level global options and be
done with it...

> You could add the write_bw_log=filename and log_unix_epoch=1 options to
> confirm. You should see a timestamp for each IO and should be able to make
> sure that all the writes are happening after the reads.

So I tried this, and it's a little hard to figure out the output. But
looking at:

    head -1 $(ls *bw*.log -v)

it does look like the first timestamp increases from one log to the
next, so the tests are not run in parallel.
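A slightly more explicit version of that `head -1` check is sketched below: for each bandwidth log, in job order, it prints the first and last timestamp and flags any log that starts before the previous one ended. It assumes fio's comma-separated log format with the time in the first column, one log per stonewalled job (`numjobs=1`), and logs passed in execution order (as `ls -v` gives for `*_bw.N.log`). `check_serial` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: given fio bandwidth logs in job order, print each
# log's first and last timestamp and fail if any log starts before the
# previous one ended (i.e. the jobs overlapped).
check_serial() {
    prev_last=""
    rc=0
    for log in "$@"; do
        # Column 1 of each log line is the timestamp; strip stray blanks.
        span=$(awk -F, '{ t = $1; gsub(/[ \t]/, "", t) }
                        NR == 1 { first = t }
                        { last = t }
                        END { if (NR > 0) print first, last }' "$log")
        if [ -z "$span" ]; then
            echo "$log: empty log" >&2
            rc=1
            continue
        fi
        first=${span% *}
        last=${span#* }
        echo "$log: first=$first last=$last"
        if [ -n "$prev_last" ] && [ "$first" -lt "$prev_last" ]; then
            echo "$log: starts before the previous log ended" >&2
            rc=1
        fi
        prev_last=$last
    done
    return $rc
}
```

Usage would be something like `check_serial $(ls -v *_bw.*.log)`; a zero exit status means no log started before its predecessor finished. With `numjobs` greater than 1, logs from the same group overlap by design, so the check would need to compare groups rather than individual logs.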

So maybe the bug is *just* 1 and 2: (1) the timestamps in the final
report are incorrect, and (2) processes are all started at once (and 1
may be related to 2!)

Does that make sense?

Thanks for the quick response!

-- 
Antoine Beaupré
torproject.org system administration

* RE: running jobs serially
From: Vincent Fu @ 2022-05-19 13:35 UTC
  To: Antoine Beaupré, fio@vger.kernel.org

> -----Original Message-----
> From: Antoine Beaupré [mailto:anarcat@torproject.org]
> Sent: Wednesday, May 18, 2022 9:17 PM
> To: Vincent Fu <vincent.fu@samsung.com>; fio@vger.kernel.org
> Subject: RE: running jobs serially
> 
> On 2022-05-18 22:41:24, Vincent Fu wrote:
> > The jobs you are running have the *stonewall* option which should make them run
> > serially unless something is very broken.
> 
> Yeah, so that's something I added deliberately for that purpose, but three
> things make me think it's not working properly.
> 
>  1. the timestamps are identical for the two jobs
> 
>         randwrite-4k-4g-1x: (groupid=1, jobs=1): err= 0: pid=1033477: Wed May 18 15:41:04 2022
>          randread-4k-4g-1x: (groupid=0, jobs=1): err= 0: pid=1033470: Wed May 18 15:41:04 2022
> 
>  2. when fio starts, it says:
> 
>          Starting 2 processes
> 
>     I would have expected it to start one process at a time
> 

<snip>

> 
> Speaking of which, it's not clear to me if I need to add stonewall to
> each job or if I can just add it to the top-level global options and be
> done with it...
> 

The stonewall option is needed only in the global section and will apply
to all of the jobs.

<snip>

> 
> So maybe the bug is *just* 1 and 2: (1) the timestamps in the final
> report are incorrect, and (2) processes are all started at once (and 1
> may be related to 2!)
> 

The timestamp is actually the time at which the summary output was generated,
not the time the job started or stopped.

https://github.com/axboe/fio/blob/fio-3.30/stat.c#L1161

All of the processes are created when fio starts up but they do not start issuing IO
until it is their turn.

* RE: running jobs serially
From: Antoine Beaupré @ 2022-05-19 14:02 UTC
  To: Vincent Fu, fio@vger.kernel.org

On 2022-05-19 13:35:02, Vincent Fu wrote:
>> -----Original Message-----
>> From: Antoine Beaupré [mailto:anarcat@torproject.org]

[...]

>> Speaking of which, it's not clear to me if I need to add stonewall to
>> each job or if I can just add it to the top-level global options and be
>> done with it...
>
> The stonewall option is needed only in the global section and will apply
> to all of the jobs.

That's great to hear, thanks. That makes total sense. It can still be
applied to only a single job if I want to, right?

>> So maybe the bug is *just* 1 and 2: (1) the timestamps in the final
>> report are incorrect, and (2) processes are all started at once (and 1
>> may be related to 2!)
>
> The timestamp is actually the time at which the summary output was generated,
> not the time the job started or stopped.
>
> https://github.com/axboe/fio/blob/fio-3.30/stat.c#L1161

Aaah... so that's why I was confused.

Could this be changed? It looks like neither the group_run_stats nor the
thread_stat structs have the start timestamps, although thread_stat has
the runtime... Maybe they could be extended so that the display is a
little less confusing?

Or maybe I'm the only one confused by this?

> All of the processes are created when fio starts up but they do not start issuing IO
> until it is their turn.

Couldn't this affect metrics, especially on low-end systems with lots of
jobs? I would think that some processes could end up being swapped in
and out... If a process is stonewalled, I would have assumed it only
*starts* after the stonewall.

-- 
Antoine Beaupré
torproject.org system administration
