Re: Running a separate fio process for each disk?

Flexible I/O Tester development

* Re: Running a separate fio process for each disk?
From: Allen Schade @ 2015-11-13 22:04 UTC (permalink / raw)
  To: Jens Axboe; +Cc: fio

I'm actually launching a completely separate instance of fio for each disk.
I want to say it's because when I ran them under the same fio process, I had
issues with the JSON files merging the data in an unexpected way.

Version is 2.2.6
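Below is a minimal sketch of the one-fio-per-disk setup described above. It assumes the job options from write_random_1M.fio are passed as fio command-line flags and that each instance writes its own JSON report. The helper name `fio_cmd_for` and the `fio_<disk>.json` file names are hypothetical. The sketch only prints the commands, because running them performs random writes to the raw devices and destroys their data.

```shell
#!/bin/bash
# Hypothetical helper: print (do not run) one fio command line per disk.
# Each instance gets its own JSON report, so results are never merged.
fio_cmd_for() {  # arg: disk name without /dev/, e.g. sdb
  local disk=$1
  echo "fio --name=random_write_1024k --filename=/dev/$disk" \
       "--rw=randwrite --bs=1M --ioengine=sync --direct=1" \
       "--time_based --runtime=900 --ramp_time=10" \
       "--output-format=json --output=fio_$disk.json"
}

for d in sdb sdc sdd; do
  fio_cmd_for "$d"
done
```

To launch the instances in parallel, you could run each printed line in the background and finish with a single `wait`.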


* Re: Running a separate fio process for each disk?
From: Jens Axboe @ 2015-11-13 22:06 UTC (permalink / raw)
  To: Allen Schade; +Cc: fio

On 11/13/2015 03:04 PM, Allen Schade wrote:
> I'm actually launching a completely separate instance of fio for each
> disk. I want to say it's because when I ran them under the same fio
> process I had issues with the json files merging the data in an
> unexpected way.

OK - in any case, that should be fine.

> Version is 2.2.6

Could you try current -git? I vaguely remember some clock issue that 
could have caused this.

-- 
Jens Axboe




* Re: Running a separate fio process for each disk?
From: Allen Schade @ 2015-11-20 18:28 UTC (permalink / raw)
  To: Jens Axboe, Caio Villela; +Cc: fio

Jens,
We tried with the latest version and gathered some detailed info.

Hey Caio,
Can you paste your experiment information as a reply here? Also, switch
your email to plain-text mode, or the vger.kernel.org list server
will reject it as spam.


* Re: Running a separate fio process for each disk?
From: Caio Villela @ 2015-11-20 19:37 UTC (permalink / raw)
  To: Allen Schade; +Cc: Jens Axboe, fio

Hello Allen and Jens,

Sorry for the long output; it's included in case you want the details.
Here is a simple explanation of the problem. I want to run a 15-minute
random write test using 1 MiB requests and measure throughput and latency.
The problem seems to be that when the test system has a large number of
drives (the system I am testing here has 28), the time accounting goes
wrong for some of the fio processes.
What you see below is that during the first 15 minutes all disks are
getting hit equally, as they should. Then, at the 15-minute mark, 15 drives
are still running, and 5 minutes past the specified 15 minutes one drive is
still running. Looking at the number of IOs sent to each drive, the ones
that ran during that excess time have many more IOs. FIO still reports that
all drives ran for 15 minutes, although some ran for more than 20 minutes.

We will try running a single FIO process instead of 28 instances to
see if this goes away.

Details:

I have the following control file to do this:

cat write_random_1M.fio
# Synthetic Latency Analysis Experimental FIO Control File
# Copyright (C) 2015 CODENAME
# Shared with VENDOR under NDA.
# Authors: Caio Villela

# --------------------Global Settings--------------------#

[global]
runtime=900
ioengine=sync
time_based=1
norandommap=1
bwavgtime=5000
direct=1
thread=1
do_verify=0
numjobs=1
continue_on_error=io
ramp_time=10

# ----------------Pure Random Workloads----------------#

# Random Write Workload
# 100% Writes @1024k
[random_write_1024k]
rw=randwrite
bs=1048576

I am monitoring the number of IOs per drive with this script:

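cat disp_total_iops_512k.sh
# For each drive, print the total number of 512 KiB (524288-byte) write
# requests: the sum of fields 2-12 on the 524288 line of the histogram.
for i in b c d e f g h i j k l m n o p q r s t u v w x y z aa ab ac; do
    echo -n "sd$i "
    grep 524288 "/sys/block/sd$i/write_request_histo" |
        awk '{print $2 + $3 + $4 + $5 + $6 + $7 + $8 + $9 + $10 + $11 + $12}'
done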

As you can see, all drives start at zero IOs.


****___ EXPERIMENT WITH 28 DRIVES   ____*****

Starts at 10:07:33, should end at 10:22:33


cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh
Thu Nov 19 10:07:33 PST 2015
sdb 0
sdc 0
sdd 0
sde 0
sdf 0
sdg 0
sdh 0
sdi 0
sdj 0
sdk 0
sdl 0
sdm 0
sdn 0
sdo 0
sdp 0
sdq 0
sdr 0
sds 0
sdt 0
sdu 0
sdv 0
sdw 0
sdx 0
sdy 0
sdz 0
sdaa 0
sdab 0
sdac 0


cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh
Thu Nov 19 10:09:24 PST 2015
sdb 21732
sdc 21835
sdd 21655
sde 21907
sdf 21949
sdg 21753
sdh 21863
sdi 21745
sdj 21679
sdk 21621
sdl 21894
sdm 21437
sdn 21555
sdo 21492
sdp 21677
sdq 21717
sdr 21736
sds 21350
sdt 21909
sdu 22082
sdv 22148
sdw 21778
sdx 21380
sdy 21770
sdz 21749
sdaa 22161
sdab 21798
sdac 21485

cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh
Thu Nov 19 10:13:33 PST 2015
sdb 71328
sdc 71811
sdd 70860
sde 71680
sdf 71743
sdg 71171
sdh 71843
sdi 71512
sdj 71328
sdk 70977
sdl 71614
sdm 70715
sdn 70622
sdo 70424
sdp 71338
sdq 70990
sdr 71458
sds 70082
sdt 71995
sdu 72433
sdv 72504
sdw 71687
sdx 70402
sdy 71299
sdz 71376
sdaa 72729
sdab 71302
sdac 70775
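You can cross-check these counters with the two sdb samples at 10:09:24 and 10:13:33, which are 249 s apart. They give the per-drive request rate, and from that the count you would expect after the 900 s runtime. This is a sketch that assumes the rate stays constant over the run. The helper names are hypothetical.

```shell
#!/bin/bash
# Estimate a drive's request rate from two samples of the counter printed
# by disp_total_iops_512k.sh, then project the count after a run length.
rate_per_sec() {  # args: count1 count2 seconds_between
  awk -v a="$1" -v b="$2" -v s="$3" 'BEGIN { printf "%.1f\n", (b - a) / s }'
}

expected_count() {  # args: rate_per_sec seconds
  awk -v r="$1" -v s="$2" 'BEGIN { printf "%.0f\n", r * s }'
}

# sdb: 21732 at 10:09:24, 71328 at 10:13:33 (249 s later)
r=$(rate_per_sec 21732 71328 249)
echo "sdb: $r requests/s, ~$(expected_count "$r" 900) requests in 900 s"
```

The projection of roughly 179,000 requests is close to where the drives that stopped on time end up (for example, sdn finishes at 179394). The drives that overran end well above it.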

cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh
Thu Nov 19 10:22:25 PST 2015
sdb 177075
sdc 178134
sdd 176100
sde 178115
sdf 177950
sdg 176743
sdh 178478
sdi 177500
sdj 177239
sdk 176294
sdl 177594
sdm 175325
sdn 175189
sdo 174962
sdp 177076
sdq 176177
sdr 177336
sds 173842
sdt 178753
sdu 179514
sdv 179681
sdw 178229
sdx 174532
sdy 176912
sdz 176911
sdaa 180506
sdab 177070
sdac 175550
cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh
Thu Nov 19 10:22:31 PST 2015 ---->> SHOULD END HERE !!!!
sdb 178204
sdc 179207
sdd 177169
sde 179235
sdf 179028
sdg 177822
sdh 179602
sdi 178644
sdj 178364
sdk 177397
sdl 178707
sdm 176411
sdn 176273
sdo 176054
sdp 178182
sdq 177258
sdr 178428
sds 174951
sdt 179844
sdu 180614
sdv 180792
sdw 179292
sdx 175625
sdy 178005
sdz 177983
sdaa 181612
sdab 178120
sdac 176646

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              41.50      1678.00         0.00       3356          0
sdb             170.00         0.00     87040.00          0     174080
sdc             186.50         0.00     95488.00          0     190976
sdd             210.00         0.00    107520.00          0     215040
sde               0.00         0.00         0.00          0          0
sdf               0.00         0.00         0.00          0          0
sdg               0.00         0.00         0.00          0          0
sdh             221.00         0.00    113152.00          0     226304
sdi             208.50         0.00    106752.00          0     213504
sdj             201.00         0.00    102912.00          0     205824
sdk             209.50         0.00    107264.00          0     214528
sdl             204.50         0.00    104704.00          0     209408
sdm             206.00         0.00    105472.00          0     210944
sdn               0.00         0.00         0.00          0          0
sdo               0.00         0.00         0.00          0          0
sdp               0.00         0.00         0.00          0          0
sdq               0.00         0.00         0.00          0          0
sdr             194.50         0.00     99584.00          0     199168
sds             192.00         0.00     98304.00          0     196608
sdt             198.00         0.00    101376.00          0     202752
sdu               0.00         0.00         0.00          0          0
sdv               0.00         0.00         0.00          0          0
sdw               0.00         0.00         0.00          0          0
sdx             211.00         0.00    108032.00          0     216064
sdy             205.00         0.00    104960.00          0     209920
sdz               0.00         0.00         0.00          0          0
sdaa              0.00         0.00         0.00          0          0
sdab            205.50         0.00    105216.00          0     210432
sdac              0.00         0.00         0.00          0          0


Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              55.50      1842.00        32.00       3684         64
sdb             186.50         0.00     95488.00          0     190976
sdc               0.00         0.00         0.00          0          0
sdd             212.50         0.00    108800.00          0     217600
sde               0.00         0.00         0.00          0          0
sdf               0.00         0.00         0.00          0          0
sdg               0.00         0.00         0.00          0          0
sdh             217.00         0.00    111104.00          0     222208
sdi             206.50         0.00    105728.00          0     211456
sdj             195.50         0.00    100096.00          0     200192
sdk             201.00         0.00    102912.00          0     205824
sdl             199.00         0.00    101888.00          0     203776
sdm               0.00         0.00         0.00          0          0
sdn               0.00         0.00         0.00          0          0
sdo               0.00         0.00         0.00          0          0
sdp               0.00         0.00         0.00          0          0
sdq               0.00         0.00         0.00          0          0
sdr             181.00         0.00     92672.00          0     185344
sds             192.50         0.00     98560.00          0     197120
sdt             207.50         0.00    106240.00          0     212480
sdu               0.00         0.00         0.00          0          0
sdv               0.00         0.00         0.00          0          0
sdw               0.00         0.00         0.00          0          0
sdx             211.00         0.00    108032.00          0     216064
sdy             190.00         0.00     97280.00          0     194560
sdz               0.00         0.00         0.00          0          0
sdaa              0.00         0.00         0.00          0          0
sdab            202.50         0.00    103680.00          0     207360
sdac              0.00         0.00         0.00          0          0
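Dividing kB_wrtn/s by tps in the iostat rows above gives an average of 512 KiB per request (for sdb, 87040 / 170). This suggests each 1 MiB write from fio reaches the disk as two 512 KiB requests, which would explain why the monitoring script counts the 524288-byte histogram line. The cause of the split is an assumption (for example, a per-device maximum request size) and is not shown in this thread. A small awk sketch of the calculation, with a hypothetical function name:

```shell
#!/bin/bash
# Average write request size (KiB) per device from iostat rows:
# kB_wrtn/s (field 4) divided by tps (field 2). Idle devices are skipped.
avg_req_kb() {
  awk '$1 ~ /^sd/ && $2 > 0 { printf "%s %.0f\n", $1, $4 / $2 }'
}

avg_req_kb <<'EOF'
sdb             170.00         0.00     87040.00          0     174080
sdd             210.00         0.00    107520.00          0     215040
sde               0.00         0.00         0.00          0          0
EOF
```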



cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh
Thu Nov 19 10:23:12 PST 2015
sdb 186323
sdc 183418
sdd 185207
sde 182338
sdf 182166
sdg 181010
sdh 187822
sdi 186776
sdj 186457
sdk 184790
sdl 186859
sdm 181900
sdn 179394
sdo 179228
sdp 181386
sdq 180374
sdr 186584
sds 181994
sdt 188087
sdu 183868
sdv 183976
sdw 182562
sdx 183664
sdy 186096
sdz 181240
sdaa 184892
sdab 186197
sdac 179804

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              17.00       362.00        56.00        724        112
sdb             207.50         0.00    106240.00          0     212480
sdc               0.00         0.00         0.00          0          0
sdd             190.50         0.00     97536.00          0     195072
sde               0.00         0.00         0.00          0          0
sdf               0.00         0.00         0.00          0          0
sdg               0.00         0.00         0.00          0          0
sdh             189.50         0.00     97024.00          0     194048
sdi               0.00         0.00         0.00          0          0
sdj             208.00         0.00    106496.00          0     212992
sdk               0.00         0.00         0.00          0          0
sdl             195.50         0.00    100096.00          0     200192
sdm               0.00         0.00         0.00          0          0
sdn               0.00         0.00         0.00          0          0
sdo               0.00         0.00         0.00          0          0
sdp               0.00         0.00         0.00          0          0
sdq               0.00         0.00         0.00          0          0
sdr             213.00         0.00    109056.00          0     218112
sds               0.00         0.00         0.00          0          0
sdt             208.00         0.00    106496.00          0     212992
sdu               0.00         0.00         0.00          0          0
sdv               0.00         0.00         0.00          0          0
sdw               0.00         0.00         0.00          0          0
sdx             199.50         0.00    102144.00          0     204288
sdy             189.50         0.00     97024.00          0     194048
sdz               0.00         0.00         0.00          0          0
sdaa              0.00         0.00         0.00          0          0
sdab            215.00         0.00    110080.00          0     220160
sdac              0.00         0.00         0.00          0          0

cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh
Thu Nov 19 10:23:29 PST 2015
sdb 189869
sdc 183418
sdd 188716
sde 182338
sdf 182166
sdg 181010
sdh 191382
sdi 187256
sdj 190025
sdk 184790
sdl 190415
sdm 181900
sdn 179394
sdo 179228
sdp 181386
sdq 180374
sdr 190136
sds 181994
sdt 191684
sdu 183868
sdv 183976
sdw 182562
sdx 187147
sdy 189641
sdz 181240
sdaa 184892
sdab 189731
sdac 179804

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              41.50      1712.00         0.00       3424          0
sdb             206.00         0.00    105472.00          0     210944
sdc               0.00         0.00         0.00          0          0
sdd             198.50         0.00    101632.00          0     203264
sde               0.00         0.00         0.00          0          0
sdf               0.00         0.00         0.00          0          0
sdg               0.00         0.00         0.00          0          0
sdh             189.00         0.00     96768.00          0     193536
sdi               0.00         0.00         0.00          0          0
sdj             199.50         0.00    102144.00          0     204288
sdk               0.00         0.00         0.00          0          0
sdl             203.00         0.00    103936.00          0     207872
sdm               0.00         0.00         0.00          0          0
sdn               0.00         0.00         0.00          0          0
sdo               0.00         0.00         0.00          0          0
sdp               0.00         0.00         0.00          0          0
sdq               0.00         0.00         0.00          0          0
sdr             201.00         0.00    102912.00          0     205824
sds               0.00         0.00         0.00          0          0
sdt             195.00         0.00     99840.00          0     199680
sdu               0.00         0.00         0.00          0          0
sdv               0.00         0.00         0.00          0          0
sdw               0.00         0.00         0.00          0          0
sdx             202.00         0.00    103424.00          0     206848
sdy              31.50         0.00     16128.00          0      32256
sdz               0.00         0.00         0.00          0          0
sdaa              0.00         0.00         0.00          0          0
sdab              0.00         0.00         0.00          0          0
sdac              0.00         0.00         0.00          0          0

cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh
Thu Nov 19 10:23:55 PST 2015
sdb 194927
sdc 183418
sdd 193720
sde 182338
sdf 182166
sdg 181010
sdh 196466
sdi 187256
sdj 195111
sdk 184790
sdl 195463
sdm 181900
sdn 179394
sdo 179228
sdp 181386
sdq 180374
sdr 195167
sds 181994
sdt 196775
sdu 183868
sdv 183976
sdw 182562
sdx 192105
sdy 192874
sdz 181240
sdaa 184892
sdab 191350
sdac 179804

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              61.50      2086.00         0.00       4172          0
sdb             189.00         0.00     96768.00          0     193536
sdc               0.00         0.00         0.00          0          0
sdd             200.00         0.00    102400.00          0     204800
sde               0.00         0.00         0.00          0          0
sdf               0.00         0.00         0.00          0          0
sdg               0.00         0.00         0.00          0          0
sdh               0.00         0.00         0.00          0          0
sdi               0.00         0.00         0.00          0          0
sdj               0.00         0.00         0.00          0          0
sdk               0.00         0.00         0.00          0          0
sdl             184.50         0.00     94464.00          0     188928
sdm               0.00         0.00         0.00          0          0
sdn               0.00         0.00         0.00          0          0
sdo               0.00         0.00         0.00          0          0
sdp               0.00         0.00         0.00          0          0
sdq               0.00         0.00         0.00          0          0
sdr             215.50         0.00    110336.00          0     220672
sds               0.00         0.00         0.00          0          0
sdt             223.00         0.00    114176.00          0     228352
sdu               0.00         0.00         0.00          0          0
sdv               0.00         0.00         0.00          0          0
sdw               0.00         0.00         0.00          0          0
sdx             205.00         0.00    104960.00          0     209920
sdy               0.00         0.00         0.00          0          0
sdz               0.00         0.00         0.00          0          0
sdaa              0.00         0.00         0.00          0          0
sdab              0.00         0.00         0.00          0          0
sdac              0.00         0.00         0.00          0          0

cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh
Thu Nov 19 10:24:36 PST 2015
sdb 203100
sdc 183418
sdd 201814
sde 182338
sdf 182166
sdg 181010
sdh 201704
sdi 187256
sdj 200326
sdk 184790
sdl 203577
sdm 181900
sdn 179394
sdo 179228
sdp 181386
sdq 180374
sdr 203330
sds 181994
sdt 204983
sdu 183868
sdv 183976
sdw 182562
sdx 200165
sdy 192874
sdz 181240
sdaa 184892
sdab 191350
sdac 179804

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              18.50      2060.00         8.00       4120         16
sdb               0.00         0.00         0.00          0          0
sdc               0.00         0.00         0.00          0          0
sdd             204.50         0.00    104704.00          0     209408
sde               0.00         0.00         0.00          0          0
sdf               0.00         0.00         0.00          0          0
sdg               0.00         0.00         0.00          0          0
sdh               0.00         0.00         0.00          0          0
sdi               0.00         0.00         0.00          0          0
sdj               0.00         0.00         0.00          0          0
sdk               0.00         0.00         0.00          0          0
sdl             209.50         0.00    107264.00          0     214528
sdm               0.00         0.00         0.00          0          0
sdn               0.00         0.00         0.00          0          0
sdo               0.00         0.00         0.00          0          0
sdp               0.00         0.00         0.00          0          0
sdq               0.00         0.00         0.00          0          0
sdr             208.00         0.00    106496.00          0     212992
sds               0.00         0.00         0.00          0          0
sdt             210.00         0.00    107520.00          0     215040
sdu               0.00         0.00         0.00          0          0
sdv               0.00         0.00         0.00          0          0
sdw               0.00         0.00         0.00          0          0
sdx               0.00         0.00         0.00          0          0
sdy               0.00         0.00         0.00          0          0
sdz               0.00         0.00         0.00          0          0
sdaa              0.00         0.00         0.00          0          0
sdab              0.00         0.00         0.00          0          0
sdac              0.00         0.00         0.00          0          0

cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh
Thu Nov 19 10:25:09 PST 2015
sdb 207564
sdc 183418
sdd 208503
sde 182338
sdf 182166
sdg 181010
sdh 201704
sdi 187256
sdj 200326
sdk 184790
sdl 210362
sdm 181900
sdn 179394
sdo 179228
sdp 181386
sdq 180374
sdr 210114
sds 181994
sdt 211744
sdu 183868
sdv 183976
sdw 182562
sdx 201236
sdy 192874
sdz 181240
sdaa 184892
sdab 191350
sdac 179804

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              19.00      2050.00        12.00       4100         24
sdb               0.00         0.00         0.00          0          0
sdc               0.00         0.00         0.00          0          0
sdd             195.00         0.00     99840.00          0     199680
sde               0.00         0.00         0.00          0          0
sdf               0.00         0.00         0.00          0          0
sdg               0.00         0.00         0.00          0          0
sdh               0.00         0.00         0.00          0          0
sdi               0.00         0.00         0.00          0          0
sdj               0.00         0.00         0.00          0          0
sdk               0.00         0.00         0.00          0          0
sdl             197.00         0.00    100864.00          0     201728
sdm               0.00         0.00         0.00          0          0
sdn               0.00         0.00         0.00          0          0
sdo               0.00         0.00         0.00          0          0
sdp               0.00         0.00         0.00          0          0
sdq               0.00         0.00         0.00          0          0
sdr             212.00         0.00    108544.00          0     217088
sds               0.00         0.00         0.00          0          0
sdt             198.00         0.00    101376.00          0     202752
sdu               0.00         0.00         0.00          0          0
sdv               0.00         0.00         0.00          0          0
sdw               0.00         0.00         0.00          0          0
sdx               0.00         0.00         0.00          0          0
sdy               0.00         0.00         0.00          0          0
sdz               0.00         0.00         0.00          0          0
sdaa              0.00         0.00         0.00          0          0
sdab              0.00         0.00         0.00          0          0
sdac              0.00         0.00         0.00          0          0


cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh
Thu Nov 19 10:26:12 PST 2015
sdb 207564
sdc 183418
sdd 220962
sde 182338
sdf 182166
sdg 181010
sdh 201704
sdi 187256
sdj 200326
sdk 184790
sdl 222883
sdm 181900
sdn 179394
sdo 179228
sdp 181386
sdq 180374
sdr 222560
sds 181994
sdt 224287
sdu 183868
sdv 183976
sdw 182562
sdx 201236
sdy 192874
sdz 181240
sdaa 184892
sdab 191350
sdac 179804




Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              19.50      2128.00         0.00       4256          0
sdb               0.00         0.00         0.00          0          0
sdc               0.00         0.00         0.00          0          0
sdd             219.00         0.00    112128.00          0     224256
sde               0.00         0.00         0.00          0          0
sdf               0.00         0.00         0.00          0          0
sdg               0.00         0.00         0.00          0          0
sdh               0.00         0.00         0.00          0          0
sdi               0.00         0.00         0.00          0          0
sdj               0.00         0.00         0.00          0          0
sdk               0.00         0.00         0.00          0          0
sdl               0.00         0.00         0.00          0          0
sdm               0.00         0.00         0.00          0          0
sdn               0.00         0.00         0.00          0          0
sdo               0.00         0.00         0.00          0          0
sdp               0.00         0.00         0.00          0          0
sdq               0.00         0.00         0.00          0          0
sdr             207.00         0.00    105984.00          0     211968
sds               0.00         0.00         0.00          0          0
sdt             183.50         0.00     93952.00          0     187904
sdu               0.00         0.00         0.00          0          0
sdv               0.00         0.00         0.00          0          0
sdw               0.00         0.00         0.00          0          0
sdx               0.00         0.00         0.00          0          0
sdy               0.00         0.00         0.00          0          0
sdz               0.00         0.00         0.00          0          0
sdaa              0.00         0.00         0.00          0          0
sdab              0.00         0.00         0.00          0          0
sdac              0.00         0.00         0.00          0          0

cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh
Thu Nov 19 10:27:00 PST 2015
sdb 207564
sdc 183418
sdd 230499
sde 182338
sdf 182166
sdg 181010
sdh 201704
sdi 187256
sdj 200326
sdk 184790
sdl 228202
sdm 181900
sdn 179394
sdo 179228
sdp 181386
sdq 180374
sdr 232184
sds 181994
sdt 234098
sdu 183868
sdv 183976
sdw 182562
sdx 201236
sdy 192874
sdz 181240
sdaa 184892
sdab 191350
sdac 179804



Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              19.00      2056.00         8.00       4112         16
sdb               0.00         0.00         0.00          0          0
sdc               0.00         0.00         0.00          0          0
sdd               0.00         0.00         0.00          0          0
sde               0.00         0.00         0.00          0          0
sdf               0.00         0.00         0.00          0          0
sdg               0.00         0.00         0.00          0          0
sdh               0.00         0.00         0.00          0          0
sdi               0.00         0.00         0.00          0          0
sdj               0.00         0.00         0.00          0          0
sdk               0.00         0.00         0.00          0          0
sdl               0.00         0.00         0.00          0          0
sdm               0.00         0.00         0.00          0          0
sdn               0.00         0.00         0.00          0          0
sdo               0.00         0.00         0.00          0          0
sdp               0.00         0.00         0.00          0          0
sdq               0.00         0.00         0.00          0          0
sdr             197.00         0.00    100864.00          0     201728
sds               0.00         0.00         0.00          0          0
sdt               0.00         0.00         0.00          0          0
sdu               0.00         0.00         0.00          0          0
sdv               0.00         0.00         0.00          0          0
sdw               0.00         0.00         0.00          0          0
sdx               0.00         0.00         0.00          0          0
sdy               0.00         0.00         0.00          0          0
sdz               0.00         0.00         0.00          0          0
sdaa              0.00         0.00         0.00          0          0
sdab              0.00         0.00         0.00          0          0
sdac              0.00         0.00         0.00          0          0
cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh
Thu Nov 19 10:27:31 PST 2015
sdb 207564
sdc 183418
sdd 232922
sde 182338
sdf 182166
sdg 181010
sdh 201704
sdi 187256
sdj 200326
sdk 184790
sdl 228202
sdm 181900
sdn 179394
sdo 179228
sdp 181386
sdq 180374
sdr 238277
sds 181994
sdt 238404
sdu 183868
sdv 183976
sdw 182562
sdx 201236
sdy 192874
sdz 181240
sdaa 184892
sdab 191350
sdac 179804


All done more than 5 minutes past the 15-minute mark.

cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh
Thu Nov 19 10:27:51 PST 2015
sdb 207564
sdc 183418
sdd 232922
sde 182338
sdf 182166
sdg 181010
sdh 201704
sdi 187256
sdj 200326
sdk 184790
sdl 228202
sdm 181900
sdn 179394
sdo 179228
sdp 181386
sdq 180374
sdr 238426
sds 181994
sdt 238404
sdu 183868
sdv 183976
sdw 182562
sdx 201236
sdy 192874
sdz 181240
sdaa 184892
sdab 191350
sdac 179804
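To quantify the overrun, this sketch subtracts each drive's counter at 10:22:31 (about the expected end) from its final value at 10:27:51. It then converts the difference to seconds, assuming roughly 200 requests/s per drive, which is the rate seen in the iostat output. Only four drives are shown. The helper name `overrun` and the fixed rate are assumptions.

```shell
#!/bin/bash
# Estimate how long each drive kept writing after the expected end time,
# assuming a constant rate of about 200 requests/s per drive.
RATE=200
overrun() {  # stdin lines: "<drive> <count_at_deadline> <final_count>"
  awk -v rate="$RATE" '{
    extra = $3 - $2
    printf "%s %d extra requests, ~%.0f s over\n", $1, extra, extra / rate
  }'
}

overrun <<'EOF'
sdn 176273 179394
sdc 179207 183418
sdt 179844 238404
sdr 178428 238426
EOF
```

By this estimate, sdt and sdr kept writing for about five extra minutes. The drives that did stop show only 15 to 20 s of activity after the snapshot, which may be explained by start-up skew plus `ramp_time=10`.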


* Re: Running a separate fio process for each disk?
From: Jens Axboe @ 2015-11-20 19:50 UTC (permalink / raw)
  To: Caio Villela, Allen Schade; +Cc: fio

On 11/20/2015 12:37 PM, Caio Villela wrote:
> Hello Allen and Jens,
>
> Sorry for the long output; it is there in case you want the details.
> Here is a simple explanation of the problem. I want to run a 15-minute
> random write using 1 MB requests and measure throughput and latency.
> The problem seems to be that when the test system has a large
> number of drives (the system I am testing here has 28 drives),
> the time accounting seems to go bad for some of the processes.
> What you see below is that during the first 15 minutes all disks
> are hit equally, as they should be. After 15 minutes, however,
> 15 drives are still running, and 5 minutes past the specified
> 15 minutes one drive is still running. Looking at the number of
> IOs sent to each drive, the ones that ran into the excess time
> received many more IOs. FIO still reports that all drives ran for
> 15 minutes, although some ran for more than 20 minutes.
>
> We will try running a single process instead of 28 instances of FIO
> to see if this goes away.
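Because fio's own report claimed 15 minutes for every drive, one independent check is to time each per-disk process from the outside. The Python sketch below is an illustration only, not something used in this thread: `run_and_time` and `find_overruns` are hypothetical helper names, and the demo launches harmless stand-in commands instead of fio.

```python
import subprocess
import sys
import time


def run_and_time(commands, poll_interval_s=0.01):
    """Start all commands at once; return {label: wall-clock seconds until exit}.

    `commands` maps a label (for example a disk name) to an argv list.
    """
    start = time.monotonic()
    pending = {label: subprocess.Popen(argv) for label, argv in commands.items()}
    durations = {}
    while pending:
        for label, proc in list(pending.items()):
            if proc.poll() is not None:  # process has exited
                durations[label] = time.monotonic() - start
                del pending[label]
        time.sleep(poll_interval_s)
    return durations


def find_overruns(durations, expected_s, tolerance_s):
    """Labels whose measured wall-clock time exceeds expected_s + tolerance_s."""
    limit = expected_s + tolerance_s
    return sorted(label for label, secs in durations.items() if secs > limit)


if __name__ == "__main__":
    # Stand-in commands; replace them with real per-disk fio invocations.
    demo = {
        "fast": [sys.executable, "-c", "pass"],
        "slow": [sys.executable, "-c", "import time; time.sleep(1.5)"],
    }
    measured = run_and_time(demo)
    print(find_overruns(measured, expected_s=0.0, tolerance_s=1.0))
```

For the 15-minute run you would pass one fio argv per disk with `expected_s=900`; the tolerance has to absorb fio's own startup time, and any particular value for it is a guess rather than something taken from this thread.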

Could you also check if adding clocksource=gettimeofday makes any 
difference? This sounds very odd.

Assuming this was run with fio -git?
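A minimal sketch of trying the `clocksource=gettimeofday` suggestion with the one-process-per-disk layout, assuming the job options are passed on the fio command line as `--option=value`. The device paths, `direct=1`, and the JSON output file naming are illustrative assumptions. Note that a random-write job against a raw block device destroys its contents.

```python
def per_disk_commands(disks, runtime_s=900, clocksource="gettimeofday"):
    """Build one fio argv per disk, mirroring the one-process-per-disk setup.

    Assumptions for illustration: raw device paths, direct=1, and one JSON
    output file per disk named after the device.
    """
    commands = {}
    for disk in disks:
        name = disk.rsplit("/", 1)[-1]  # "/dev/sdb" -> "sdb"
        commands[name] = [
            "fio",
            f"--name={name}",
            f"--filename={disk}",
            "--rw=randwrite",
            "--bs=1M",
            "--direct=1",
            "--time_based",
            f"--runtime={runtime_s}",
            f"--clocksource={clocksource}",
            "--output-format=json",
            f"--output={name}.json",
        ]
    return commands
```

Each argv can be handed to `subprocess.Popen`, one process per disk, as in the original setup; passing `clocksource="cpu"` instead returns to the clock source under which the problem was reported.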

-- 
Jens Axboe




* Re: Running a separate fio process for each disk?
  2015-11-20 19:50           ` Jens Axboe
@ 2015-11-20 22:20             ` Akash Verma
  2015-11-21  0:03               ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread
From: Akash Verma @ 2015-11-20 22:20 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Caio Villela, Allen Schade, fio

Hi Jens,
The issue is not seen with non-CPU clock sources, or when using a
single process (with individual threads, the only configuration I tried). We
only see the issue when using multiple processes and the CPU clock
source.
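Akash's report suggests that driving all disks from one fio process, with each job run as a thread, avoided the problem. The sketch below renders such a job file. The device list, `direct=1`, and naming each section after its device are assumptions; treat it as a starting point rather than a tested configuration.

```python
def single_process_jobfile(disks, runtime_s=900):
    """Render one fio job file that drives every disk from a single fio process."""
    lines = [
        "[global]",
        "rw=randwrite",
        "bs=1M",
        "direct=1",  # assumption: bypass the page cache
        "time_based",
        f"runtime={runtime_s}",
        # fio forks one process per job by default; thread=1 makes it
        # create the jobs as POSIX threads inside one process instead.
        "thread=1",
        "",
    ]
    for disk in disks:
        name = disk.rsplit("/", 1)[-1]  # "/dev/sdb" -> section "[sdb]"
        lines += [f"[{name}]", f"filename={disk}", ""]
    return "\n".join(lines)
```

Writing the returned text to a file and passing that file to fio runs all per-disk jobs under one process.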


* Re: Running a separate fio process for each disk?
  2015-11-20 22:20             ` Akash Verma
@ 2015-11-21  0:03               ` Jens Axboe
  2015-11-21  0:21                 ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2015-11-21  0:03 UTC (permalink / raw)
  To: Akash Verma; +Cc: Caio Villela, Allen Schade, fio

Hi,

OK, I see. Can you pull the latest -git, and then run fio --cpuclock-test
on one of the boxes where you see the issue? It should have
commit 5896d827e1e2 or later.
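Our understanding, which the thread does not spell out, is that the CPU clock source reads a per-CPU counter (the TSC on x86), so jobs scheduled on different CPUs can disagree about the time if those counters are not in sync, and that `--cpuclock-test` checks for this. The sketch below only illustrates the general shape of such a check on samples that have already been collected: if samples are taken in a known global order (for instance, each one while holding a shared lock), the clock values should never decrease. `Sample` and `find_backwards_steps` are hypothetical names; this is not fio's code, and reading the raw counter is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    seq: int    # global order in which samples were taken
    cpu: int    # CPU the sample was read on
    clock: int  # raw clock value read on that CPU


def find_backwards_steps(samples):
    """Return (earlier, later) pairs where the clock went backwards in global order."""
    ordered = sorted(samples, key=lambda s: s.seq)
    bad = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.clock < prev.clock:  # a later sample saw an earlier time
            bad.append((prev, cur))
    return bad
```

An empty result means the samples are consistent; any pair points at two CPUs whose clocks disagree.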



* Re: Running a separate fio process for each disk?
  2015-11-21  0:03               ` Jens Axboe
@ 2015-11-21  0:21                 ` Jens Axboe
  2015-11-24 15:51                   ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2015-11-21  0:21 UTC (permalink / raw)
  To: Akash Verma; +Cc: Caio Villela, Allen Schade, fio

And finally, there's a potential fix if you run commit
99afcdb53dc3 or later. Please try that as well and
see if it behaves any better for you.




-- 
Jens Axboe




* Re: Running a separate fio process for each disk?
  2015-11-21  0:21                 ` Jens Axboe
@ 2015-11-24 15:51                   ` Jens Axboe
  2015-11-24 20:51                     ` Akash Verma
  0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2015-11-24 15:51 UTC (permalink / raw)
  To: Akash Verma; +Cc: Caio Villela, Allen Schade, fio

Did you try current -git yet? I think it should work for both scenarios.
It's a silly bug; it would be great to have confirmation that it's fixed.
Then I'll spin a new release.




-- 
Jens Axboe




* Re: Running a separate fio process for each disk?
  2015-11-24 15:51                   ` Jens Axboe
@ 2015-11-24 20:51                     ` Akash Verma
  2015-11-25  1:18                       ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread
From: Akash Verma @ 2015-11-24 20:51 UTC (permalink / raw)
  To: Jens Axboe, Michael Bella; +Cc: Caio Villela, Allen Schade, fio

Sorry for not getting back to you. I didn't get a chance to try the latest
git, and I'm off on vacation soon, so I'm CCing Michael and Caio, who
might have a chance to try it out before Thursday. Michael or Caio,
could you try running the two things Jens asked for (the cpuclock test using
both the FIO we've been using and the latest from git, and
the regular multi-process FIO run with the latest git)?


* Re: Running a separate fio process for each disk?
  2015-11-24 20:51                     ` Akash Verma
@ 2015-11-25  1:18                       ` Jens Axboe
  2015-12-03 18:54                         ` Akash Verma
  0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2015-11-25  1:18 UTC (permalink / raw)
  To: Akash Verma, Michael Bella; +Cc: Caio Villela, Allen Schade, fio

No worries, I know this week is a bit more problematic than usual. I'll 
hold off on the new release until I know.




-- 
Jens Axboe




* Re: Running a separate fio process for each disk?
  2015-11-25  1:18                       ` Jens Axboe
@ 2015-12-03 18:54                         ` Akash Verma
  2015-12-03 18:58                           ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread
From: Akash Verma @ 2015-12-03 18:54 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Michael Bella, Caio Villela, Allen Schade, fio

Jens, I confirmed that the issue is not seen with the latest FIO (I used
version fio-2.2.12-15-gcdab).
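To check whether a box is running at least the build confirmed here, one could parse the string printed by `fio --version`. A minimal sketch, with the caveat that the threshold below is simply the build tested in this thread (the fix itself is commit 99afcdb53dc3), and that comparing version numbers is a heuristic: it does not prove that a given build contains that commit. The function names are hypothetical.

```python
import re

_VERSION_RE = re.compile(r"fio-(\d+)\.(\d+)(?:\.(\d+))?(?:-(\d+)-g[0-9a-f]+)?")


def parse_fio_version(text):
    """Parse e.g. 'fio-2.2.12-15-gcdab' into (2, 2, 12, 15).

    The last field counts commits past the tag; it is 0 for a plain
    release string such as 'fio-2.2.6'.
    """
    m = _VERSION_RE.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognized fio version string: {text!r}")
    major, minor, patch, ahead = m.groups()
    return (int(major), int(minor), int(patch or 0), int(ahead or 0))


# Build reported in this thread as no longer showing the problem.
CONFIRMED_GOOD = parse_fio_version("fio-2.2.12-15-gcdab")


def at_least_confirmed_build(text):
    """Heuristic: True if the version sorts at or after the confirmed build."""
    return parse_fio_version(text) >= CONFIRMED_GOOD
```

With the 2.2.6 release from the start of this thread, `at_least_confirmed_build("fio-2.2.6")` is `False`.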


* Re: Running a separate fio process for each disk?
  2015-12-03 18:54                         ` Akash Verma
@ 2015-12-03 18:58                           ` Jens Axboe
  0 siblings, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2015-12-03 18:58 UTC (permalink / raw)
  To: Akash Verma; +Cc: Michael Bella, Caio Villela, Allen Schade, fio

Perfect! Thanks for reporting and re-testing.




-- 
Jens Axboe





Thread overview: 13+ messages
     [not found] <CADp+U7ibiKciX8_cpzGzob4oL-UF-H+W7kYuiujovD0ba=hM6A@mail.gmail.com>
     [not found] ` <56464ACC.9030605@kernel.dk>
2015-11-13 22:04   ` Running a separate fio process for each disk? Allen Schade
2015-11-13 22:06     ` Jens Axboe
2015-11-20 18:28       ` Allen Schade
2015-11-20 19:37         ` Caio Villela
2015-11-20 19:50           ` Jens Axboe
2015-11-20 22:20             ` Akash Verma
2015-11-21  0:03               ` Jens Axboe
2015-11-21  0:21                 ` Jens Axboe
2015-11-24 15:51                   ` Jens Axboe
2015-11-24 20:51                     ` Akash Verma
2015-11-25  1:18                       ` Jens Axboe
2015-12-03 18:54                         ` Akash Verma
2015-12-03 18:58                           ` Jens Axboe
