* Re: Running a separate fio process for each disk?
       [not found] ` <56464ACC.9030605@kernel.dk>
@ 2015-11-13 22:04   ` Allen Schade
  2015-11-13 22:06     ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread
From: Allen Schade @ 2015-11-13 22:04 UTC (permalink / raw)
  To: Jens Axboe; +Cc: fio

[-- Attachment #1: Type: text/plain, Size: 235 bytes --]

I'm actually launching a completely separate instance of fio for each disk.
I want to say it's because when I ran them under the same fio process I had
issues with the json files merging the data in an unexpected way.

Version is 2.2.6

[-- Attachment #2: Type: text/html, Size: 285 bytes --]

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Running a separate fio process for each disk?
  2015-11-13 22:04   ` Running a separate fio process for each disk? Allen Schade
@ 2015-11-13 22:06     ` Jens Axboe
  2015-11-20 18:28       ` Allen Schade
  0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2015-11-13 22:06 UTC (permalink / raw)
  To: Allen Schade; +Cc: fio

On 11/13/2015 03:04 PM, Allen Schade wrote:
> I'm actually launching a completely separate instance of fio for each
> disk. I want to say its because when I ran them under the same fio
> process I had issues with the json files merging the data in an
> unexpected way.

OK - in any case, that should be fine.

> Version is 2.2.6

Could you try current -git? I vaguely remember some clock issue that
could have caused this.

-- 
Jens Axboe

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Running a separate fio process for each disk?
  2015-11-13 22:06     ` Jens Axboe
@ 2015-11-20 18:28       ` Allen Schade
  2015-11-20 19:37         ` Caio Villela
  0 siblings, 1 reply; 13+ messages in thread
From: Allen Schade @ 2015-11-20 18:28 UTC (permalink / raw)
  To: Jens Axboe, Caio Villela; +Cc: fio

Jens,
We tried with the latest version and gathered some detailed info.

Hey Caio,
Can you paste your experiment information as a reply here? Also, switch
your email to plain text mode, or the vger.kernel.org email address
will reject your email as spam.

On Fri, Nov 13, 2015 at 2:06 PM, Jens Axboe <axboe@kernel.dk> wrote:
> On 11/13/2015 03:04 PM, Allen Schade wrote:
>>
>> I'm actually launching a completely separate instance of fio for each
>> disk. I want to say its because when I ran them under the same fio
>> process I had issues with the json files merging the data in an
>> unexpected way.
>
>
> OK - in any case, that should be fine.
>
>> Version is 2.2.6
>
>
> Could you try current -git? I vaguely remember some clock issue that could
> have caused this.
>
> --
> Jens Axboe
>

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Running a separate fio process for each disk?
  2015-11-20 18:28       ` Allen Schade
@ 2015-11-20 19:37         ` Caio Villela
  2015-11-20 19:50           ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread
From: Caio Villela @ 2015-11-20 19:37 UTC (permalink / raw)
  To: Allen Schade; +Cc: Jens Axboe, fio

[-- Attachment #1: Type: text/plain, Size: 27592 bytes --]

Hello Allen and Jens,

Sorry for the long output, this is just in case you want the details.
Here is a simple explanation of the problem. I want to run a 15 minute
random write, using 1 MiB requests, and measure throughput and latency.
What seems to be the problem is that if the test system has a large
number of drives (the system that I am testing here has 28 drives),
then the time accounting seems to go bad for some of the processes.

What you see below is that during the first 15 minutes, all disks
receive the same load, as they should. Then, after 15 minutes, 15 drives
are still running, and five minutes past the specified 15 minutes there
is still one drive running. Looking at the number of IOs sent to each
drive, the ones that ran during that excess time have many more IOs.
FIO still reports that all drives ran for 15 minutes, although some ran
for more than 20 minutes.

We will attempt to run a single process instead of 28 instances of FIO
to see if this goes away.

Details:

I have the following control file to do this:

cat write_random_1M.fio
# Synthetic Latency Analysis Experimental FIO Control File
# Copyright (C) 2015 CODENAME
# Shared with VENDOR under NDA.
# Authors: Caio Villela
# --------------------Global Settings--------------------#
[global]
runtime=900
ioengine=sync
time_based=1
norandommap=1
bwavgtime=5000
direct=1
thread=1
do_verify=0
numjobs=1
continue_on_error=io
ramp_time=10

# ----------------Pure Random Workloads----------------#
# Random Write Workload
# 100% Writes @1024k
[random_write_1024k]
rw=randwrite
bs=1048576

I am monitoring the number of IOs per drive by using the script

cat disp_total_iops_512k.sh
for i in b c d e f g h i j k l m n o p q r s t u v w x y z aa ab ac; do
    echo -n "sd$i "
    cat /sys/block/sd$i/write_request_histo | grep 524288 | awk '{print $2 + $3 + $4 + $5 + $6 + $7 + $8 + $9 + $10 + $11 + $12}'
done

And as you can see it starts with all drives with zero IOs.

****___ EXPERIMENT WITH 28 DRIVES ____*****

Starts at 10:07:33, should end at 10:22:33

cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh
Thu Nov 19 10:07:33 PST 2015
sdb 0 sdc 0 sdd 0 sde 0 sdf 0 sdg 0 sdh 0
sdi 0 sdj 0 sdk 0 sdl 0 sdm 0 sdn 0 sdo 0
sdp 0 sdq 0 sdr 0 sds 0 sdt 0 sdu 0 sdv 0
sdw 0 sdx 0 sdy 0 sdz 0 sdaa 0 sdab 0 sdac 0

cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh
Thu Nov 19 10:09:24 PST 2015
sdb 21732 sdc 21835 sdd 21655 sde 21907 sdf 21949 sdg 21753 sdh 21863
sdi 21745 sdj 21679 sdk 21621 sdl 21894 sdm 21437 sdn 21555 sdo 21492
sdp 21677 sdq 21717 sdr 21736 sds 21350 sdt 21909 sdu 22082 sdv 22148
sdw 21778 sdx 21380 sdy 21770 sdz 21749 sdaa 22161 sdab 21798 sdac 21485

cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh
Thu Nov 19 10:13:33 PST 2015
sdb 71328 sdc 71811 sdd 70860 sde 71680 sdf 71743 sdg 71171 sdh 71843
sdi 71512 sdj 71328 sdk 70977 sdl 71614 sdm 70715 sdn 70622 sdo 70424
sdp 71338 sdq 70990 sdr 71458 sds 70082 sdt 71995 sdu 72433 sdv 72504
sdw 71687 sdx 70402 sdy 71299 sdz 71376 sdaa 72729 sdab 71302 sdac 70775

cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh
Thu Nov 19 10:22:25 PST 2015
sdb 177075 sdc 178134 sdd 176100 sde 178115 sdf 177950 sdg 176743 sdh 178478
sdi 177500 sdj
177239 sdk 176294 sdl 177594 sdm 175325 sdn 175189 sdo 174962 sdp 177076 sdq 176177 sdr 177336 sds 173842 sdt 178753 sdu 179514 sdv 179681 sdw 178229 sdx 174532 sdy 176912 sdz 176911 sdaa 180506 sdab 177070 sdac 175550 cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh Thu Nov 19 10:22:31 PST 2015 ---->> SHOULD END HERE !!!! sdb 178204 sdc 179207 sdd 177169 sde 179235 sdf 179028 sdg 177822 sdh 179602 sdi 178644 sdj 178364 sdk 177397 sdl 178707 sdm 176411 sdn 176273 sdo 176054 sdp 178182 sdq 177258 sdr 178428 sds 174951 sdt 179844 sdu 180614 sdv 180792 sdw 179292 sdx 175625 sdy 178005 sdz 177983 sdaa 181612 sdab 178120 sdac 176646 Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn sda 41.50 1678.00 0.00 3356 0 sdb 170.00 0.00 87040.00 0 174080 sdc 186.50 0.00 95488.00 0 190976 sdd 210.00 0.00 107520.00 0 215040 sde 0.00 0.00 0.00 0 0 sdf 0.00 0.00 0.00 0 0 sdg 0.00 0.00 0.00 0 0 sdh 221.00 0.00 113152.00 0 226304 sdi 208.50 0.00 106752.00 0 213504 sdj 201.00 0.00 102912.00 0 205824 sdk 209.50 0.00 107264.00 0 214528 sdl 204.50 0.00 104704.00 0 209408 sdm 206.00 0.00 105472.00 0 210944 sdn 0.00 0.00 0.00 0 0 sdo 0.00 0.00 0.00 0 0 sdp 0.00 0.00 0.00 0 0 sdq 0.00 0.00 0.00 0 0 sdr 194.50 0.00 99584.00 0 199168 sds 192.00 0.00 98304.00 0 196608 sdt 198.00 0.00 101376.00 0 202752 sdu 0.00 0.00 0.00 0 0 sdv 0.00 0.00 0.00 0 0 sdw 0.00 0.00 0.00 0 0 sdx 211.00 0.00 108032.00 0 216064 sdy 205.00 0.00 104960.00 0 209920 sdz 0.00 0.00 0.00 0 0 sdaa 0.00 0.00 0.00 0 0 sdab 205.50 0.00 105216.00 0 210432 sdac 0.00 0.00 0.00 0 0 Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn sda 55.50 1842.00 32.00 3684 64 sdb 186.50 0.00 95488.00 0 190976 sdc 0.00 0.00 0.00 0 0 sdd 212.50 0.00 108800.00 0 217600 sde 0.00 0.00 0.00 0 0 sdf 0.00 0.00 0.00 0 0 sdg 0.00 0.00 0.00 0 0 sdh 217.00 0.00 111104.00 0 222208 sdi 206.50 0.00 105728.00 0 211456 sdj 195.50 0.00 100096.00 0 200192 sdk 201.00 0.00 102912.00 0 205824 sdl 199.00 0.00 101888.00 0 203776 sdm 0.00 0.00 0.00 0 0 sdn 0.00 0.00 0.00 
0 0 sdo 0.00 0.00 0.00 0 0 sdp 0.00 0.00 0.00 0 0 sdq 0.00 0.00 0.00 0 0 sdr 181.00 0.00 92672.00 0 185344 sds 192.50 0.00 98560.00 0 197120 sdt 207.50 0.00 106240.00 0 212480 sdu 0.00 0.00 0.00 0 0 sdv 0.00 0.00 0.00 0 0 sdw 0.00 0.00 0.00 0 0 sdx 211.00 0.00 108032.00 0 216064 sdy 190.00 0.00 97280.00 0 194560 sdz 0.00 0.00 0.00 0 0 sdaa 0.00 0.00 0.00 0 0 sdab 202.50 0.00 103680.00 0 207360 sdac 0.00 0.00 0.00 0 0 cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh Thu Nov 19 10:23:12 PST 2015 sdb 186323 sdc 183418 sdd 185207 sde 182338 sdf 182166 sdg 181010 sdh 187822 sdi 186776 sdj 186457 sdk 184790 sdl 186859 sdm 181900 sdn 179394 sdo 179228 sdp 181386 sdq 180374 sdr 186584 sds 181994 sdt 188087 sdu 183868 sdv 183976 sdw 182562 sdx 183664 sdy 186096 sdz 181240 sdaa 184892 sdab 186197 sdac 179804 Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn sda 17.00 362.00 56.00 724 112 sdb 207.50 0.00 106240.00 0 212480 sdc 0.00 0.00 0.00 0 0 sdd 190.50 0.00 97536.00 0 195072 sde 0.00 0.00 0.00 0 0 sdf 0.00 0.00 0.00 0 0 sdg 0.00 0.00 0.00 0 0 sdh 189.50 0.00 97024.00 0 194048 sdi 0.00 0.00 0.00 0 0 sdj 208.00 0.00 106496.00 0 212992 sdk 0.00 0.00 0.00 0 0 sdl 195.50 0.00 100096.00 0 200192 sdm 0.00 0.00 0.00 0 0 sdn 0.00 0.00 0.00 0 0 sdo 0.00 0.00 0.00 0 0 sdp 0.00 0.00 0.00 0 0 sdq 0.00 0.00 0.00 0 0 sdr 213.00 0.00 109056.00 0 218112 sds 0.00 0.00 0.00 0 0 sdt 208.00 0.00 106496.00 0 212992 sdu 0.00 0.00 0.00 0 0 sdv 0.00 0.00 0.00 0 0 sdw 0.00 0.00 0.00 0 0 sdx 199.50 0.00 102144.00 0 204288 sdy 189.50 0.00 97024.00 0 194048 sdz 0.00 0.00 0.00 0 0 sdaa 0.00 0.00 0.00 0 0 sdab 215.00 0.00 110080.00 0 220160 sdac 0.00 0.00 0.00 0 0 cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh Thu Nov 19 10:23:29 PST 2015 sdb 189869 sdc 183418 sdd 188716 sde 182338 sdf 182166 sdg 181010 sdh 191382 sdi 187256 sdj 190025 sdk 184790 sdl 190415 sdm 181900 sdn 179394 sdo 179228 sdp 181386 sdq 180374 sdr 190136 sds 181994 sdt 191684 sdu 183868 sdv 183976 sdw 182562 sdx 187147 sdy 
189641 sdz 181240 sdaa 184892 sdab 189731 sdac 179804 Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn sda 41.50 1712.00 0.00 3424 0 sdb 206.00 0.00 105472.00 0 210944 sdc 0.00 0.00 0.00 0 0 sdd 198.50 0.00 101632.00 0 203264 sde 0.00 0.00 0.00 0 0 sdf 0.00 0.00 0.00 0 0 sdg 0.00 0.00 0.00 0 0 sdh 189.00 0.00 96768.00 0 193536 sdi 0.00 0.00 0.00 0 0 sdj 199.50 0.00 102144.00 0 204288 sdk 0.00 0.00 0.00 0 0 sdl 203.00 0.00 103936.00 0 207872 sdm 0.00 0.00 0.00 0 0 sdn 0.00 0.00 0.00 0 0 sdo 0.00 0.00 0.00 0 0 sdp 0.00 0.00 0.00 0 0 sdq 0.00 0.00 0.00 0 0 sdr 201.00 0.00 102912.00 0 205824 sds 0.00 0.00 0.00 0 0 sdt 195.00 0.00 99840.00 0 199680 sdu 0.00 0.00 0.00 0 0 sdv 0.00 0.00 0.00 0 0 sdw 0.00 0.00 0.00 0 0 sdx 202.00 0.00 103424.00 0 206848 sdy 31.50 0.00 16128.00 0 32256 sdz 0.00 0.00 0.00 0 0 sdaa 0.00 0.00 0.00 0 0 sdab 0.00 0.00 0.00 0 0 sdac 0.00 0.00 0.00 0 0 cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh Thu Nov 19 10:23:55 PST 2015 sdb 194927 sdc 183418 sdd 193720 sde 182338 sdf 182166 sdg 181010 sdh 196466 sdi 187256 sdj 195111 sdk 184790 sdl 195463 sdm 181900 sdn 179394 sdo 179228 sdp 181386 sdq 180374 sdr 195167 sds 181994 sdt 196775 sdu 183868 sdv 183976 sdw 182562 sdx 192105 sdy 192874 sdz 181240 sdaa 184892 sdab 191350 sdac 179804 Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn sda 61.50 2086.00 0.00 4172 0 sdb 189.00 0.00 96768.00 0 193536 sdc 0.00 0.00 0.00 0 0 sdd 200.00 0.00 102400.00 0 204800 sde 0.00 0.00 0.00 0 0 sdf 0.00 0.00 0.00 0 0 sdg 0.00 0.00 0.00 0 0 sdh 0.00 0.00 0.00 0 0 sdi 0.00 0.00 0.00 0 0 sdj 0.00 0.00 0.00 0 0 sdk 0.00 0.00 0.00 0 0 sdl 184.50 0.00 94464.00 0 188928 sdm 0.00 0.00 0.00 0 0 sdn 0.00 0.00 0.00 0 0 sdo 0.00 0.00 0.00 0 0 sdp 0.00 0.00 0.00 0 0 sdq 0.00 0.00 0.00 0 0 sdr 215.50 0.00 110336.00 0 220672 sds 0.00 0.00 0.00 0 0 sdt 223.00 0.00 114176.00 0 228352 sdu 0.00 0.00 0.00 0 0 sdv 0.00 0.00 0.00 0 0 sdw 0.00 0.00 0.00 0 0 sdx 205.00 0.00 104960.00 0 209920 sdy 0.00 0.00 0.00 0 0 sdz 0.00 0.00 0.00 0 0 
sdaa 0.00 0.00 0.00 0 0 sdab 0.00 0.00 0.00 0 0 sdac 0.00 0.00 0.00 0 0 cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh Thu Nov 19 10:24:36 PST 2015 sdb 203100 sdc 183418 sdd 201814 sde 182338 sdf 182166 sdg 181010 sdh 201704 sdi 187256 sdj 200326 sdk 184790 sdl 203577 sdm 181900 sdn 179394 sdo 179228 sdp 181386 sdq 180374 sdr 203330 sds 181994 sdt 204983 sdu 183868 sdv 183976 sdw 182562 sdx 200165 sdy 192874 sdz 181240 sdaa 184892 sdab 191350 sdac 179804 Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn sda 18.50 2060.00 8.00 4120 16 sdb 0.00 0.00 0.00 0 0 sdc 0.00 0.00 0.00 0 0 sdd 204.50 0.00 104704.00 0 209408 sde 0.00 0.00 0.00 0 0 sdf 0.00 0.00 0.00 0 0 sdg 0.00 0.00 0.00 0 0 sdh 0.00 0.00 0.00 0 0 sdi 0.00 0.00 0.00 0 0 sdj 0.00 0.00 0.00 0 0 sdk 0.00 0.00 0.00 0 0 sdl 209.50 0.00 107264.00 0 214528 sdm 0.00 0.00 0.00 0 0 sdn 0.00 0.00 0.00 0 0 sdo 0.00 0.00 0.00 0 0 sdp 0.00 0.00 0.00 0 0 sdq 0.00 0.00 0.00 0 0 sdr 208.00 0.00 106496.00 0 212992 sds 0.00 0.00 0.00 0 0 sdt 210.00 0.00 107520.00 0 215040 sdu 0.00 0.00 0.00 0 0 sdv 0.00 0.00 0.00 0 0 sdw 0.00 0.00 0.00 0 0 sdx 0.00 0.00 0.00 0 0 sdy 0.00 0.00 0.00 0 0 sdz 0.00 0.00 0.00 0 0 sdaa 0.00 0.00 0.00 0 0 sdab 0.00 0.00 0.00 0 0 sdac 0.00 0.00 0.00 0 0 cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh Thu Nov 19 10:25:09 PST 2015 sdb 207564 sdc 183418 sdd 208503 sde 182338 sdf 182166 sdg 181010 sdh 201704 sdi 187256 sdj 200326 sdk 184790 sdl 210362 sdm 181900 sdn 179394 sdo 179228 sdp 181386 sdq 180374 sdr 210114 sds 181994 sdt 211744 sdu 183868 sdv 183976 sdw 182562 sdx 201236 sdy 192874 sdz 181240 sdaa 184892 sdab 191350 sdac 179804 Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn sda 19.00 2050.00 12.00 4100 24 sdb 0.00 0.00 0.00 0 0 sdc 0.00 0.00 0.00 0 0 sdd 195.00 0.00 99840.00 0 199680 sde 0.00 0.00 0.00 0 0 sdf 0.00 0.00 0.00 0 0 sdg 0.00 0.00 0.00 0 0 sdh 0.00 0.00 0.00 0 0 sdi 0.00 0.00 0.00 0 0 sdj 0.00 0.00 0.00 0 0 sdk 0.00 0.00 0.00 0 0 sdl 197.00 0.00 100864.00 0 201728 sdm 0.00 
0.00 0.00 0 0 sdn 0.00 0.00 0.00 0 0 sdo 0.00 0.00 0.00 0 0 sdp 0.00 0.00 0.00 0 0 sdq 0.00 0.00 0.00 0 0 sdr 212.00 0.00 108544.00 0 217088 sds 0.00 0.00 0.00 0 0 sdt 198.00 0.00 101376.00 0 202752 sdu 0.00 0.00 0.00 0 0 sdv 0.00 0.00 0.00 0 0 sdw 0.00 0.00 0.00 0 0 sdx 0.00 0.00 0.00 0 0 sdy 0.00 0.00 0.00 0 0 sdz 0.00 0.00 0.00 0 0 sdaa 0.00 0.00 0.00 0 0 sdab 0.00 0.00 0.00 0 0 sdac 0.00 0.00 0.00 0 0 cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh Thu Nov 19 10:26:12 PST 2015 sdb 207564 sdc 183418 sdd 220962 sde 182338 sdf 182166 sdg 181010 sdh 201704 sdi 187256 sdj 200326 sdk 184790 sdl 222883 sdm 181900 sdn 179394 sdo 179228 sdp 181386 sdq 180374 sdr 222560 sds 181994 sdt 224287 sdu 183868 sdv 183976 sdw 182562 sdx 201236 sdy 192874 sdz 181240 sdaa 184892 sdab 191350 sdac 179804 Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn sda 19.50 2128.00 0.00 4256 0 sdb 0.00 0.00 0.00 0 0 sdc 0.00 0.00 0.00 0 0 sdd 219.00 0.00 112128.00 0 224256 sde 0.00 0.00 0.00 0 0 sdf 0.00 0.00 0.00 0 0 sdg 0.00 0.00 0.00 0 0 sdh 0.00 0.00 0.00 0 0 sdi 0.00 0.00 0.00 0 0 sdj 0.00 0.00 0.00 0 0 sdk 0.00 0.00 0.00 0 0 sdl 0.00 0.00 0.00 0 0 sdm 0.00 0.00 0.00 0 0 sdn 0.00 0.00 0.00 0 0 sdo 0.00 0.00 0.00 0 0 sdp 0.00 0.00 0.00 0 0 sdq 0.00 0.00 0.00 0 0 sdr 207.00 0.00 105984.00 0 211968 sds 0.00 0.00 0.00 0 0 sdt 183.50 0.00 93952.00 0 187904 sdu 0.00 0.00 0.00 0 0 sdv 0.00 0.00 0.00 0 0 sdw 0.00 0.00 0.00 0 0 sdx 0.00 0.00 0.00 0 0 sdy 0.00 0.00 0.00 0 0 sdz 0.00 0.00 0.00 0 0 sdaa 0.00 0.00 0.00 0 0 sdab 0.00 0.00 0.00 0 0 sdac 0.00 0.00 0.00 0 0 cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh Thu Nov 19 10:27:00 PST 2015 sdb 207564 sdc 183418 sdd 230499 sde 182338 sdf 182166 sdg 181010 sdh 201704 sdi 187256 sdj 200326 sdk 184790 sdl 228202 sdm 181900 sdn 179394 sdo 179228 sdp 181386 sdq 180374 sdr 232184 sds 181994 sdt 234098 sdu 183868 sdv 183976 sdw 182562 sdx 201236 sdy 192874 sdz 181240 sdaa 184892 sdab 191350 sdac 179804 Device: tps kB_read/s kB_wrtn/s kB_read 
kB_wrtn sda 19.00 2056.00 8.00 4112 16 sdb 0.00 0.00 0.00 0 0 sdc 0.00 0.00 0.00 0 0 sdd 0.00 0.00 0.00 0 0 sde 0.00 0.00 0.00 0 0 sdf 0.00 0.00 0.00 0 0 sdg 0.00 0.00 0.00 0 0 sdh 0.00 0.00 0.00 0 0 sdi 0.00 0.00 0.00 0 0 sdj 0.00 0.00 0.00 0 0 sdk 0.00 0.00 0.00 0 0 sdl 0.00 0.00 0.00 0 0 sdm 0.00 0.00 0.00 0 0 sdn 0.00 0.00 0.00 0 0 sdo 0.00 0.00 0.00 0 0 sdp 0.00 0.00 0.00 0 0 sdq 0.00 0.00 0.00 0 0 sdr 197.00 0.00 100864.00 0 201728 sds 0.00 0.00 0.00 0 0 sdt 0.00 0.00 0.00 0 0 sdu 0.00 0.00 0.00 0 0 sdv 0.00 0.00 0.00 0 0 sdw 0.00 0.00 0.00 0 0 sdx 0.00 0.00 0.00 0 0 sdy 0.00 0.00 0.00 0 0 sdz 0.00 0.00 0.00 0 0 sdaa 0.00 0.00 0.00 0 0 sdab 0.00 0.00 0.00 0 0 sdac 0.00 0.00 0.00 0 0 cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh Thu Nov 19 10:27:31 PST 2015 sdb 207564 sdc 183418 sdd 232922 sde 182338 sdf 182166 sdg 181010 sdh 201704 sdi 187256 sdj 200326 sdk 184790 sdl 228202 sdm 181900 sdn 179394 sdo 179228 sdp 181386 sdq 180374 sdr 238277 sds 181994 sdt 238404 sdu 183868 sdv 183976 sdw 182562 sdx 201236 sdy 192874 sdz 181240 sdaa 184892 sdab 191350 sdac 179804 All done at more than 5 minutes past the 15 minute mark. cln4:/home/hdd_test# date; ./disp_total_iops_512k.sh Thu Nov 19 10:27:51 PST 2015 sdb 207564 sdc 183418 sdd 232922 sde 182338 sdf 182166 sdg 181010 sdh 201704 sdi 187256 sdj 200326 sdk 184790 sdl 228202 sdm 181900 sdn 179394 sdo 179228 sdp 181386 sdq 180374 sdr 238426 sds 181994 sdt 238404 sdu 183868 sdv 183976 sdw 182562 sdx 201236 sdy 192874 sdz 181240 sdaa 184892 sdab 191350 sdac 179804 On Fri, Nov 20, 2015 at 10:28 AM, Allen Schade <aschade@google.com> wrote: > Jens, > We tried with the latest version and gathered some detailed info. > > Hey Caio, > Can you paste your experiment information as a reply here. Also switch > your email to plain text mode or the vger.kernel.org email address > will reject your email as spam. 
> > On Fri, Nov 13, 2015 at 2:06 PM, Jens Axboe <axboe@kernel.dk> wrote: > > On 11/13/2015 03:04 PM, Allen Schade wrote: > >> > >> I'm actually launching a completely separate instance of fio for each > >> disk. I want to say its because when I ran them under the same fio > >> process I had issues with the json files merging the data in an > >> unexpected way. > > > > > > OK - in any case, that should be fine. > > > >> Version is 2.2.6 > > > > > > Could you try current -git? I vaguely remember some clock issue that > could > > have caused this. > > > > -- > > Jens Axboe > > > [-- Attachment #2: Type: text/html, Size: 90856 bytes --] ^ permalink raw reply [flat|nested] 13+ messages in thread
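To see which drives kept receiving writes after the deadline, two snapshots from the monitoring script can be compared. This is a minimal sketch, assuming each snapshot was saved to a file whose lines hold whole "name count" pairs; the function name `still_running` and the idea of saving snapshots to files are hypothetical and not part of the original scripts.

```shell
#!/bin/sh
# still_running OLD NEW
# Print every drive whose write counter grew between two saved snapshots,
# followed by the number of additional IOs it received.
# Assumption: each line of a snapshot file holds complete "name count" pairs.
still_running() {
    awk '
        FNR == 1 { file++ }              # track which input file is being read
        {
            # walk the line two fields at a time: drive name, then counter
            for (i = 1; i + 1 <= NF; i += 2) {
                if (file == 1)
                    old[$i] = $(i + 1) + 0
                else if ($(i + 1) + 0 > old[$i])
                    print $i, $(i + 1) - old[$i]
            }
        }
    ' "$1" "$2"
}
```

Fed the 10:23:12 and 10:23:29 snapshots above, it would list sdb with 3546 additional IOs and skip sdc, whose counter stayed at 183418.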
* Re: Running a separate fio process for each disk?
  2015-11-20 19:37         ` Caio Villela
@ 2015-11-20 19:50           ` Jens Axboe
  2015-11-20 22:20             ` Akash Verma
  0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2015-11-20 19:50 UTC (permalink / raw)
  To: Caio Villela, Allen Schade; +Cc: fio

On 11/20/2015 12:37 PM, Caio Villela wrote:
> Hello Allen and Jens,
>
> Sorry for the long output, this is just in case you want the details.
> Here is a simple explanation for the problem. I want to run a 15 minute
> random write, using 1 Meg requests, and measure throughput and latency.
> What seems to be the problem is that if the test system has a large
> number of drives - the system that I am testing here has 28 drives -
> then the time accounting seems to go bad for some of the processes.
> What you see below is that during the 15 minutes from start, all disks
> are getting hit the same, as they should. Then, after 15 minutes, there
> are 15 drives that are still running.... after 5 minutes over the
> specified 15 minutes, there is still one drive running. Then looking at
> the amount of IOs sent to each drive, the ones that ran on that excess
> time have much more IOs. FIO still reports that all drives ran for 15
> minutes, although some ran for more than 20 minutes.
>
> We will attempt to run a single process instead of 28 instances of FIO
> to see if this goes away.

Could you also check if adding clocksource=gettimeofday makes any
difference? This sounds very odd.

Assuming this was run with fio -git?

-- 
Jens Axboe

^ permalink raw reply [flat|nested] 13+ messages in thread
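Jens's suggestion amounts to adding one line to the [global] section of the job file. Below is a small sketch of doing that without editing the file by hand. `clocksource` is a documented fio job option; the helper name `with_clocksource` and the output file name in the usage comment are made up for illustration.

```shell
#!/bin/sh
# with_clocksource SOURCE < jobfile
# Copy a fio job file from stdin to stdout, adding "clocksource=SOURCE"
# directly after the [global] header. SOURCE is one of the values fio
# documents for this option: gettimeofday, clock_gettime or cpu.
with_clocksource() {
    awk -v cs="$1" '
        { print }                                   # pass every line through
        /^\[global\][ \t]*$/ { print "clocksource=" cs }
    '
}

# Usage (commented out so that sourcing this file has no side effects):
# with_clocksource gettimeofday < write_random_1M.fio > write_random_1M_gtod.fio
```

Writing a modified copy keeps the original job file untouched, so the two clock sources can be compared run against run.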
* Re: Running a separate fio process for each disk? 2015-11-20 19:50 ` Jens Axboe @ 2015-11-20 22:20 ` Akash Verma 2015-11-21 0:03 ` Jens Axboe 0 siblings, 1 reply; 13+ messages in thread From: Akash Verma @ 2015-11-20 22:20 UTC (permalink / raw) To: Jens Axboe; +Cc: Caio Villela, Allen Schade, fio Hi Jens, The issue is not seen with non-cpu clock sources, or when using a single process (with individual threads, the only config I tried). We only see the issue when using multiple processes and the cpu clock source. On Fri, Nov 20, 2015 at 11:50 AM, Jens Axboe <axboe@kernel.dk> wrote: > On 11/20/2015 12:37 PM, Caio Villela wrote: >> >> Hello Allen and Jens, >> >> Sorry for the long output, this is just in case you want the details. >> Here is a simple explanation for the problem. I want to run a 15 minute >> random write, using 1 Meg requests, and measure throughput and latency. >> What seems to be the problem is that if the test system has a large >> number of drives - the system that I am testing here has 28 drives - >> then the time accounting seems to go bad for some of the processes. >> What you see below is that during the 15 minutes from start, all disks >> are getting hit the same, as they should. Then, after 15 minutes, there >> are 15 drives that are still running.... after 5 minutes over the >> specified 15 minutes, there is still one drive running. Then looking at >> the amount of IOs sent to each drive, the ones that ran on that excess >> time have much more IOs. FIO still reports that all drives ran for 15 >> minutes, although some ran for more than 20 minutes. >> >> We will attempt to run a single process instead of 28 instances of FIO >> to see if this goes away. > > > Could you also check if adding clocksource=gettimeofday makes any > difference? This sounds very odd. > > Assuming this was run with fio -git? 
> > > -- > Jens Axboe > > -- > To unsubscribe from this list: send the line "unsubscribe fio" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 13+ messages in thread
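The configuration that did not show the problem, one fio process with one thread per disk, can be described in a single job file with a section per device. The sketch below generates such a file from the settings posted earlier in the thread. The function name `gen_jobfile`, the per-device section names and the /dev/sdX paths are assumptions for illustration, not the job file that was actually used.

```shell
#!/bin/sh
# gen_jobfile DEV...
# Print a fio job file with the thread's [global] settings and one job
# section per device name (for example: sdb sdc sdd).
# Assumption: each DEV is a block device reachable as /dev/DEV.
gen_jobfile() {
    cat <<'EOF'
[global]
runtime=900
ioengine=sync
time_based=1
norandommap=1
bwavgtime=5000
direct=1
thread=1
do_verify=0
numjobs=1
continue_on_error=io
ramp_time=10
rw=randwrite
bs=1048576
EOF
    # one section per disk; thread=1 above makes each job a thread
    for dev in "$@"; do
        printf '\n[random_write_1024k_%s]\nfilename=/dev/%s\n' "$dev" "$dev"
    done
}
```

Something like `gen_jobfile sdb sdc sdd > all_disks.fio` followed by `fio all_disks.fio` would then run all disks from one process. Note that this workload writes to the raw devices and destroys their contents.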
* Re: Running a separate fio process for each disk? 2015-11-20 22:20 ` Akash Verma @ 2015-11-21 0:03 ` Jens Axboe 2015-11-21 0:21 ` Jens Axboe 0 siblings, 1 reply; 13+ messages in thread From: Jens Axboe @ 2015-11-21 0:03 UTC (permalink / raw) To: Akash Verma; +Cc: Caio Villela, Allen Schade, fio [-- Attachment #1: Type: text/plain, Size: 2104 bytes --] Hi, OK, I see. Can you pull the latest -git, and then run fio --cpuclock-test on one of the boxes where you see the issue? It should have commit 5896d827e1e2 or later. On Fri, Nov 20, 2015 at 3:20 PM, Akash Verma <akashv@google.com> wrote: > Hi Jens, > The issue is not seen with non-cpu clock sources, or when using a > single process (with individual threads, the only config I tried). We > only see the issue when using multiple processes and the cpu clock > source. > > On Fri, Nov 20, 2015 at 11:50 AM, Jens Axboe <axboe@kernel.dk> wrote: > > On 11/20/2015 12:37 PM, Caio Villela wrote: > >> > >> Hello Allen and Jens, > >> > >> Sorry for the long output, this is just in case you want the details. > >> Here is a simple explanation for the problem. I want to run a 15 minute > >> random write, using 1 Meg requests, and measure throughput and latency. > >> What seems to be the problem is that if the test system has a large > >> number of drives - the system that I am testing here has 28 drives - > >> then the time accounting seems to go bad for some of the processes. > >> What you see below is that during the 15 minutes from start, all disks > >> are getting hit the same, as they should. Then, after 15 minutes, there > >> are 15 drives that are still running.... after 5 minutes over the > >> specified 15 minutes, there is still one drive running. Then looking at > >> the amount of IOs sent to each drive, the ones that ran on that excess > >> time have much more IOs. FIO still reports that all drives ran for 15 > >> minutes, although some ran for more than 20 minutes. 
> >> > >> We will attempt to run a single process instead of 28 instances of FIO > >> to see if this goes away. > > > > > > Could you also check if adding clocksource=gettimeofday makes any > > difference? This sounds very odd. > > > > Assuming this was run with fio -git? > > > > > > -- > > Jens Axboe > > > > -- > > To unsubscribe from this list: send the line "unsubscribe fio" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > [-- Attachment #2: Type: text/html, Size: 3018 bytes --] ^ permalink raw reply [flat|nested] 13+ messages in thread
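Since only some of the 28 processes ran past the deadline, it may be worth repeating the clock self-test a few times rather than trusting a single pass. This is a sketch, assuming `fio --cpuclock-test` exits non-zero when the test fails (not verified here); the `FIO` variable and the function name `repeat_cpuclock_test` are hypothetical conveniences.

```shell
#!/bin/sh
# repeat_cpuclock_test [RUNS]
# Run the fio CPU clock self-test RUNS times (default 5) and return
# non-zero if any run failed. Set FIO to point at a specific binary,
# for example a freshly built one from -git.
repeat_cpuclock_test() {
    runs=${1:-5}
    failures=0
    i=1
    while [ "$i" -le "$runs" ]; do
        # Assumption: a failed self-test is reported through the exit status
        if ! "${FIO:-fio}" --cpuclock-test; then
            echo "run $i: cpuclock test failed" >&2
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures of $runs runs failed"
    [ "$failures" -eq 0 ]
}
```

For example, `FIO=./fio repeat_cpuclock_test 10` would exercise a locally built binary ten times.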
* Re: Running a separate fio process for each disk? 2015-11-21 0:03 ` Jens Axboe @ 2015-11-21 0:21 ` Jens Axboe 2015-11-24 15:51 ` Jens Axboe 0 siblings, 1 reply; 13+ messages in thread From: Jens Axboe @ 2015-11-21 0:21 UTC (permalink / raw) To: Akash Verma; +Cc: Caio Villela, Allen Schade, fio And finally, there's a potential fix, if you run commit 99afcdb53dc3 or later. So please do try that as well, and see if that behaves any better for you. On 11/20/2015 05:03 PM, Jens Axboe wrote: > Hi, > > OK, I see. Can you pull the latest -git, and then run fio > --cpuclock-test on one of the boxes where you see the issue? It should > have commit 5896d827e1e2 or later. > > > On Fri, Nov 20, 2015 at 3:20 PM, Akash Verma <akashv@google.com > <mailto:akashv@google.com>> wrote: > > Hi Jens, > The issue is not seen with non-cpu clock sources, or when using a > single process (with individual threads, the only config I tried). We > only see the issue when using multiple processes and the cpu clock > source. > > On Fri, Nov 20, 2015 at 11:50 AM, Jens Axboe <axboe@kernel.dk > <mailto:axboe@kernel.dk>> wrote: > > On 11/20/2015 12:37 PM, Caio Villela wrote: > >> > >> Hello Allen and Jens, > >> > >> Sorry for the long output, this is just in case you want the > details. > >> Here is a simple explanation for the problem. I want to run a 15 > minute > >> random write, using 1 Meg requests, and measure throughput and > latency. > >> What seems to be the problem is that if the test system has a large > >> number of drives - the system that I am testing here has 28 drives - > >> then the time accounting seems to go bad for some of the processes. > >> What you see below is that during the 15 minutes from start, all > disks > >> are getting hit the same, as they should. Then, after 15 > minutes, there > >> are 15 drives that are still running.... after 5 minutes over the > >> specified 15 minutes, there is still one drive running. 
Then > looking at > >> the amount of IOs sent to each drive, the ones that ran on that > excess > >> time have much more IOs. FIO still reports that all drives ran > for 15 > >> minutes, although some ran for more than 20 minutes. > >> > >> We will attempt to run a single process instead of 28 instances > of FIO > >> to see if this goes away. > > > > > > Could you also check if adding clocksource=gettimeofday makes any > > difference? This sounds very odd. > > > > Assuming this was run with fio -git? > > > > > > -- > > Jens Axboe > > > > -- > > To unsubscribe from this list: send the line "unsubscribe fio" in > > the body of a message tomajordomo@vger.kernel.org <mailto:majordomo@vger.kernel.org> > > More majordomo info athttp://vger.kernel.org/majordomo-info.html > > -- Jens Axboe ^ permalink raw reply [flat|nested] 13+ messages in thread
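Both requests name a minimum commit, so before re-running it helps to confirm that the build tree actually contains it. Here is a sketch using `git merge-base --is-ancestor`; the function name `has_commit` is hypothetical, and the usage line assumes it is run from inside a fio checkout.

```shell
#!/bin/sh
# has_commit COMMIT [REF]
# Succeed if COMMIT is an ancestor of (or equal to) REF in the current
# repository; REF defaults to HEAD.
has_commit() {
    git merge-base --is-ancestor "$1" "${2:-HEAD}"
}

# Usage, from inside a fio checkout (commented out to avoid side effects):
# has_commit 99afcdb53dc3 && echo "fix is included"
```

An ancestry check is used instead of comparing version strings because the thread refers to commits that were not yet part of a tagged release.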
* Re: Running a separate fio process for each disk? 2015-11-21 0:21 ` Jens Axboe @ 2015-11-24 15:51 ` Jens Axboe 2015-11-24 20:51 ` Akash Verma 0 siblings, 1 reply; 13+ messages in thread From: Jens Axboe @ 2015-11-24 15:51 UTC (permalink / raw) To: Akash Verma; +Cc: Caio Villela, Allen Schade, fio Did you try current -git yet? I think it should work for both scenarios. It's a silly bug, would be great to have confirmation that it's fixed. Then I'll spin a new release. On 11/20/2015 05:21 PM, Jens Axboe wrote: > And finally, there's a potential fix, if you run commit > 99afcdb53dc3 or later. So please do try that as well, and > see if that behaves any better for you. > > > On 11/20/2015 05:03 PM, Jens Axboe wrote: >> Hi, >> >> OK, I see. Can you pull the latest -git, and then run fio >> --cpuclock-test on one of the boxes where you see the issue? It should >> have commit 5896d827e1e2 or later. >> >> >> On Fri, Nov 20, 2015 at 3:20 PM, Akash Verma <akashv@google.com >> <mailto:akashv@google.com>> wrote: >> >> Hi Jens, >> The issue is not seen with non-cpu clock sources, or when using a >> single process (with individual threads, the only config I tried). We >> only see the issue when using multiple processes and the cpu clock >> source. >> >> On Fri, Nov 20, 2015 at 11:50 AM, Jens Axboe <axboe@kernel.dk >> <mailto:axboe@kernel.dk>> wrote: >> > On 11/20/2015 12:37 PM, Caio Villela wrote: >> >> >> >> Hello Allen and Jens, >> >> >> >> Sorry for the long output, this is just in case you want the >> details. >> >> Here is a simple explanation for the problem. I want to run a 15 >> minute >> >> random write, using 1 Meg requests, and measure throughput and >> latency. >> >> What seems to be the problem is that if the test system has a >> large >> >> number of drives - the system that I am testing here has 28 >> drives - >> >> then the time accounting seems to go bad for some of the >> processes. 
>> >> What you see below is that during the 15 minutes from start, all >> disks >> >> are getting hit the same, as they should. Then, after 15 >> minutes, there >> >> are 15 drives that are still running.... after 5 minutes over the >> >> specified 15 minutes, there is still one drive running. Then >> looking at >> >> the amount of IOs sent to each drive, the ones that ran on that >> excess >> >> time have much more IOs. FIO still reports that all drives ran >> for 15 >> >> minutes, although some ran for more than 20 minutes. >> >> >> >> We will attempt to run a single process instead of 28 instances >> of FIO >> >> to see if this goes away. >> > >> > >> > Could you also check if adding clocksource=gettimeofday makes any >> > difference? This sounds very odd. >> > >> > Assuming this was run with fio -git? >> > >> > >> > -- >> > Jens Axboe >> > >> > -- >> > To unsubscribe from this list: send the line "unsubscribe fio" in >> > the body of a message tomajordomo@vger.kernel.org >> <mailto:majordomo@vger.kernel.org> >> > More majordomo info athttp://vger.kernel.org/majordomo-info.html >> >> > > -- Jens Axboe ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Running a separate fio process for each disk?
  2015-11-24 15:51                   ` Jens Axboe
@ 2015-11-24 20:51                     ` Akash Verma
  2015-11-25  1:18                       ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread
From: Akash Verma @ 2015-11-24 20:51 UTC (permalink / raw)
  To: Jens Axboe, Michael Bella; +Cc: Caio Villela, Allen Schade, fio

Sorry for not getting back - I didn't get a chance to try the latest
git, and I'm off on vacation soon; I'm ccing Michael and Caio who
might have a chance to try it out before Thursday. Michael or Caio,
could you try running the two things Jens asked for (the cpuclock test
using the FIO we're currently using as well as the latest from git, and
the regular multi-process FIO run with the latest git)?

On Tue, Nov 24, 2015 at 7:51 AM, Jens Axboe <axboe@kernel.dk> wrote:
> Did you try current -git yet? I think it should work for both scenarios.
> It's a silly bug, would be great to have confirmation that it's fixed. Then
> I'll spin a new release.
>
>
>
> On 11/20/2015 05:21 PM, Jens Axboe wrote:
>>
>> And finally, there's a potential fix, if you run commit
>> 99afcdb53dc3 or later. So please do try that as well, and
>> see if that behaves any better for you.
>>
>>
>> On 11/20/2015 05:03 PM, Jens Axboe wrote:
>>>
>>> Hi,
>>>
>>> OK, I see. Can you pull the latest -git, and then run fio
>>> --cpuclock-test on one of the boxes where you see the issue? It should
>>> have commit 5896d827e1e2 or later.
>>>
>>>
>>> On Fri, Nov 20, 2015 at 3:20 PM, Akash Verma <akashv@google.com
>>> <mailto:akashv@google.com>> wrote:
>>>
>>> Hi Jens,
>>> The issue is not seen with non-cpu clock sources, or when using a
>>> single process (with individual threads, the only config I tried). We
>>> only see the issue when using multiple processes and the cpu clock
>>> source.
>>> >>> On Fri, Nov 20, 2015 at 11:50 AM, Jens Axboe <axboe@kernel.dk >>> <mailto:axboe@kernel.dk>> wrote: >>> > On 11/20/2015 12:37 PM, Caio Villela wrote: >>> >> >>> >> Hello Allen and Jens, >>> >> >>> >> Sorry for the long output, this is just in case you want the >>> details. >>> >> Here is a simple explanation for the problem. I want to run a 15 >>> minute >>> >> random write, using 1 Meg requests, and measure throughput and >>> latency. >>> >> What seems to be the problem is that if the test system has a >>> large >>> >> number of drives - the system that I am testing here has 28 >>> drives - >>> >> then the time accounting seems to go bad for some of the >>> processes. >>> >> What you see below is that during the 15 minutes from start, all >>> disks >>> >> are getting hit the same, as they should. Then, after 15 >>> minutes, there >>> >> are 15 drives that are still running.... after 5 minutes over the >>> >> specified 15 minutes, there is still one drive running. Then >>> looking at >>> >> the amount of IOs sent to each drive, the ones that ran on that >>> excess >>> >> time have much more IOs. FIO still reports that all drives ran >>> for 15 >>> >> minutes, although some ran for more than 20 minutes. >>> >> >>> >> We will attempt to run a single process instead of 28 instances >>> of FIO >>> >> to see if this goes away. >>> > >>> > >>> > Could you also check if adding clocksource=gettimeofday makes any >>> > difference? This sounds very odd. >>> > >>> > Assuming this was run with fio -git? >>> > >>> > >>> > -- >>> > Jens Axboe >>> > >>> > -- >>> > To unsubscribe from this list: send the line "unsubscribe fio" in >>> > the body of a message tomajordomo@vger.kernel.org >>> <mailto:majordomo@vger.kernel.org> >>> > More majordomo info athttp://vger.kernel.org/majordomo-info.html >>> >>> >> >> > > > -- > Jens Axboe > ^ permalink raw reply [flat|nested] 13+ messages in thread
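The two configurations reported above as unaffected were a single fio process with one thread per disk and any non-cpu clock source. As a rough illustration only, the sketch below builds one job file that combines both for the workload described in the thread (15-minute random write, 1M requests). The helper name `build_job_file` and the device paths are hypothetical. The options `rw`, `bs`, `time_based`, `runtime`, `thread`, `filename` and `clocksource` are standard fio job options, but settings such as ioengine, direct I/O and queue depth are not stated in the thread and are left out here.

```python
def build_job_file(devices, runtime_s=900, clocksource="gettimeofday"):
    """Return the text of a fio job file with one job section per device.

    All jobs share a [global] section and run as threads of one fio
    process instead of as separate fio instances.
    """
    lines = [
        "[global]",
        "rw=randwrite",                # random write workload
        "bs=1M",                       # 1 MiB requests
        "time_based",                  # run for the full runtime
        f"runtime={runtime_s}",        # seconds; 900 = 15 minutes
        "thread",                      # threads rather than forked processes
        f"clocksource={clocksource}",  # avoid the cpu clock source
        "",
    ]
    for dev in devices:
        name = dev.rsplit("/", 1)[-1]  # e.g. "/dev/sdb" -> "sdb"
        lines += [f"[{name}]", f"filename={dev}", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical device names, for illustration only.
    print(build_job_file(["/dev/sdb", "/dev/sdc"]))
```

The returned text can be written to a file and passed to fio as a job file. Generating the sections from a list keeps the 28-disk case manageable without hand-editing 28 near-identical stanzas.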
* Re: Running a separate fio process for each disk?
  2015-11-24 20:51 ` Akash Verma
@ 2015-11-25 1:18 ` Jens Axboe
  2015-12-03 18:54 ` Akash Verma
  0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2015-11-25 1:18 UTC (permalink / raw)
  To: Akash Verma, Michael Bella; +Cc: Caio Villela, Allen Schade, fio

No worries, I know this week is a bit more problematic than usual. I'll
hold off on the new release until I know.

On 11/24/2015 01:51 PM, Akash Verma wrote:
> Sorry for not getting back - I didn't get a chance to try the latest
> git, and I'm off on vacation soon; I'm ccing Michael and Caio who
> might have a chance to try it out before Thursday. Michael or Caio,
> could you try run the two things Jens asked (the cpuclock test using
> the FIO we've been currently using as well as the latest from Git; and
> the regular multi-process FIO run with the latest git)?

--
Jens Axboe

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Running a separate fio process for each disk?
  2015-11-25 1:18 ` Jens Axboe
@ 2015-12-03 18:54 ` Akash Verma
  2015-12-03 18:58 ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread
From: Akash Verma @ 2015-12-03 18:54 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Michael Bella, Caio Villela, Allen Schade, fio

[-- Attachment #1: Type: text/plain, Size: 4255 bytes --]

Jens, I confirmed that the issue is not seen with the latest FIO (I used
version fio-2.2.12-15-gcdab).

On Tue, Nov 24, 2015 at 5:18 PM, Jens Axboe <axboe@kernel.dk> wrote:
> No worries, I know this week is a bit more problematic than usual. I'll
> hold off on the new release until I know.

[-- Attachment #2: Type: text/html, Size: 5938 bytes --]

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Running a separate fio process for each disk?
  2015-12-03 18:54 ` Akash Verma
@ 2015-12-03 18:58 ` Jens Axboe
  0 siblings, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2015-12-03 18:58 UTC (permalink / raw)
  To: Akash Verma; +Cc: Michael Bella, Caio Villela, Allen Schade, fio

Perfect! Thanks for reporting and re-testing.

On 12/03/2015 11:54 AM, Akash Verma wrote:
> Jens, I confirmed that the issue is not seen with the latest FIO (I used
> version fio-2.2.12-15-gcdab).

--
Jens Axboe

^ permalink raw reply [flat|nested] 13+ messages in thread
Thread overview: 13+ messages
[not found] <CADp+U7ibiKciX8_cpzGzob4oL-UF-H+W7kYuiujovD0ba=hM6A@mail.gmail.com>
[not found] ` <56464ACC.9030605@kernel.dk>
2015-11-13 22:04 ` Running a separate fio process for each disk? Allen Schade
2015-11-13 22:06 ` Jens Axboe
2015-11-20 18:28 ` Allen Schade
2015-11-20 19:37 ` Caio Villela
2015-11-20 19:50 ` Jens Axboe
2015-11-20 22:20 ` Akash Verma
2015-11-21 0:03 ` Jens Axboe
2015-11-21 0:21 ` Jens Axboe
2015-11-24 15:51 ` Jens Axboe
2015-11-24 20:51 ` Akash Verma
2015-11-25 1:18 ` Jens Axboe
2015-12-03 18:54 ` Akash Verma
2015-12-03 18:58 ` Jens Axboe