All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@kernel.dk>
To: Thierry BERTAUD <tbertaud@conforama.fr>,
	"fio@vger.kernel.org" <fio@vger.kernel.org>
Subject: Re: fio-2c2c93d and with a lot of numjobs causing fio Jobs to Time Out
Date: Wed, 30 Dec 2015 10:08:16 -0700	[thread overview]
Message-ID: <56840F80.4060903@kernel.dk> (raw)
In-Reply-To: <VI1PR05MB13437814C9CD31D2CECE7398B1FC0@VI1PR05MB1343.eurprd05.prod.outlook.com>

On 12/29/2015 01:29 AM, Thierry BERTAUD wrote:
> Fio team,
>
> During a important workload fio is hanging.
> with iodepth=128 and numjobs=128 , i have the issue but not with both set at 64.
> Environment is AIX 6.1 against SAN disk (hba card 8 Gbps)
>
> Could you take a look at this?
>
> Error message and script file are copy-pasted below.
> # cat myfio.fio
> [global]
> ioengine=posixaio
> invalidate=1
> norandommap
> refill_buffers
> randrepeat=0
> runtime=300
> time_based
> direct=1
> group_reporting
>
> [test]
> stonewall
> filename=/dev/${HDISK}
> rw=${RWTYPE}
> rwmixread=${RRATIO}
> bs=${BS}
> iodepth=${QD}
> numjobs=${NJ}
> write_bw_log=${BS}-${RWTYPE}-${RRATIO}R-${QD}q-${QD}-${hdisk}_bw.results
> write_iops_log=${BS}-${RWTYPE}-${RRATIO}R-${QD}q-${QD}-${hdisk}_iops.results
> write_lat_log=${BS}-${RWTYPE}-${RRATIO}R-${QD}q-${QD}-${hdisk}_lat.results
> #
>
> Command:
> # HDISK=hdisk3 RWTYPE=rw RRATIO=0 BS=65536 QD=128 NJ=128 /home/padmin/fio/fio-2c2c93d/fio myfio.fio
>
> PS: Output:
> # ps -ef | grep fio | wc -l
>       130
> # ps -ef | grep fio | grep -v grep | wc -l
>       130
> # nmon
> # ps -ef | grep fio | grep -v grep | wc -l
>       130
> # ps -ef | grep fio | grep -v grep | wc -l
>       130
> # date;ps -ef | grep fio | grep -v grep | wc -l
> Tue Dec 29 02:59:07 CET 2015
>        17
> # date;ps -ef | grep fio | grep -v grep | wc -l
> Tue Dec 29 02:59:12 CET 2015
>        17
> # date;ps -ef | grep fio | grep -v grep | wc -l
> Tue Dec 29 03:01:35 CET 2015
>         1
> # date;ps -ef | grep fio | grep -v grep | wc -l
> Tue Dec 29 03:04:12 CET 2015
>        10
> # date;ps -ef | grep fio | grep -v grep | wc -l
> Tue Dec 29 03:10:17 CET 2015
>         1
> # date;ps -ef | grep fio | grep -v grep | wc -l
> Tue Dec 29 03:12:18 CET 2015
>        18
> # date;ps -ef | grep fio | grep -v grep | wc -l
> Tue Dec 29 03:13:39 CET 2015
>        18
> # date;ps -ef | grep fio | grep -v grep | wc -l
> Tue Dec 29 03:20:52 CET 2015
>        34
> # date;ps -ef | grep fio | grep -v grep | wc -l
> Tue Dec 29 03:25:03 CET 2015
>        66
> # date;ps -ef | grep fio | grep -v grep | wc -l
> Tue Dec 29 03:26:43 CET 2015
>        66
> #
> # date;ps -ef | grep fio | grep -v grep | wc -l
> Tue Dec 29 09:15:05 CET 2015
>         2
> #
>
> Command output:
> # HDISK=hdisk3 RWTYPE=rw RRATIO=0 BS=65536 QD=128 NJ=128 /home/padmin/fio/fio-2c2c93d/fio myfio.fio
> test: (g=0): rw=rw, bs=64K-64K/64K-64K/64K-64K, ioengine=posixaio, iodepth=128
> ...
> fio-2.3
> Starting 128 processes
> fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.0 iops] [eta 17m:00s]
> fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.12.9% done] [0KB/1991KB/0KB /s] [0/31/0 iops] [eta 41m:01s]
> fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> Jobs: 115 (f=115): [W(5),_(1),W(10),_(1),W(11),_(1),W(6),_(1),W(6),_(1),W(14),_(1),W(8),_(1),W(3),_(1),W(1),_(1),W(2),_(1),W(4),_(1),W(1),_(1),W(1),_(1),W(43)] [7.3% done] [0KB/11662KBJobs: 115 (f=115): [W(5),_(1),W(10),_(1),W(11),_(1),W(6),_(1),W(6),_(1),W(14),_(1),W(8),_(1),W(3),_(1),W(1),_(1),W(2),_(1),W(4),_(1),W(1),_(1),W(1),_(1),W(43)] [7.3% done] [0KB/17351KBJobs: 115 (f=115): [W(5),_(1),W(10),_(1),W(11),_(1),W(6),_(1),W(6),_(1),W(14),_(1),W(8),_(1),W(3),_(1),W(1),_(1),W(2),_(1),W(4),_(1),W(1),_(1),W(1),_(1),W(43)] [7.3% done] [0KB/6328KB/Jobs: 115 (f=115): [W(5),_(1),W(10),_(1),W(11),_(1),W(6),_(1),W(6),_(1),W(14),_(1),W(8),_(1),W(3),_(1),W(1),_(1),W(2),_(1),W(4),_(1),W(1),_(1),W(1),_(1),W(43)] [7.3% done] [0KB/7111KB/fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> Jobs: 113 (f=113): [W(5),_(1),W(10),_(1),W(11),_(1),W(6),_(1),W(6),_(1),W(4),_(1),W(5),_(1),W(3),_(1),W(8),_(1),W(3),_(1),W(1),_(1),W(2),_(1),W(4),_(1),W(1),_(1),W(1),_(1),W(43)] [6.4%fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> Jobs: 111 (f=111): [W(5),_(1),W(10),_(1),W(11),_(1),W(6),_(1),W(6),_(1),W(4),_(1),W(5),_(1),W(3),_(1),W(8),_(1),W(3),_(1),W(1),_(1),W(2),_(1),W(4),_(1),W(1),_(1),W(1),_(1),W(13),_(1),Wfio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> Jobs: 108 (f=108): [W(5),_(1),W(10),_(1),W(11),_(1),W(6),_(1),W(6),_(1),W(4),_(1),W(5),_(1),W(3),_(1),W(8),_(1),W(3),_(1),W(1),_(1),W(2),_(1),W(2),_(1),W(1),_(1),W(1),_(3),W(13),_(1),WJobs: 108 (f=108): [W(5),_(1),W(10),_(1),W(11),_(1),W(6),_(1),W(6),_(1),W(4),_(1),W(5),_(1),W(3),_(1),W(8),_(1),W(3),_(1),W(1),_(1),W(2),_(1),W(2),_(1),W(1),_(1),W(1),_(3),W(13),_(1),WJobs: 108 (f=108): [W(5),_(1),W(10),_(1),W(11),_(1),W(6),_(1),W(6),_(1),W(4),_(1),W(5),_(1),W(3),_(1),W(8),_(1),W(3),_(1),W(1),_(1),W(2),_(1),W(2),_(1),W(1),_(1),W(1),_(3),W(13),_(1),Wfio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> Jobs: 106 (f=106): [W(5),_(1),W(10),_(1),W(11),_(1),W(6),_(1),W(6),_(1),W(4),_(1),W(5),_(1),W(3),_(1),W(8),_(1),W(3),_(1),W(1),_(1),W(2),_(1),W(2),_(1),W(1),_(1),W(1),_(3),W(5),_(1),W(fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.:12m:03s]
> Jobs: 105 (f=105): [W(5),_(1),W(10),_(1),W(11),_(1),W(6),_(1),W(6),_(1),W(4),_(1),W(5),_(1),W(3),_(1),W(8),_(1),W(3),_(1),W(1),_(1),W(2),_(1),W(2),_(1),W(1),_(1),W(1),_(3),W(1),_(1),W(Jobs: 105 (f=105): [W(5),_(1),W(10),_(1),W(11),_(1),W(6),_(1),W(6),_(1),W(4),_(1),W(5),_(1),W(3),_(1),W(8),_(1),W(3),_(1),W(1),_(1),W(2),_(1),W(2),_(1),W(1),_(1),W(1),_(3),W(1),_(1),W(fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job. 02h:18m:13s]
> Jobs: 104 (f=104): [W(5),_(1),W(10),_(1),W(11),_(1),W(6),_(1),W(6),_(1),W(4),_(1),W(5),_(1),W(3),_(1),W(8),_(1),W(3),_(1),W(1),_(1),W(2),_(1),W(2),_(1),W(1),_(1),W(1),_(3),W(1),_(1),W(Jobs: 104 (f=104): [W(5),_(1),W(10),_(1),W(11),_(1),W(6),_(1),W(6),_(1),W(4),_(1),W(5),_(1),W(3),_(1),W(8),_(1),W(3),_(1),W(1),_(1),W(2),_(1),W(2),_(1),W(1),_(1),W(1),_(3),W(1),_(1),W(fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job./0 iops] [eta 02h:24m:22s]
> Jobs: 103 (f=103): [W(5),_(1),W(10),_(1),W(11),_(1),W(6),_(1),W(6),_(1),W(4),_(1),W(5),_(1),W(3),_(1),W(8),_(1),W(3),_(1),W(1),_(1),W(2),_(1),W(2),_(1),W(1),_(1),W(1),_(3),W(1),_(1),W(Jobs: 103 (f=103): [W(5),_(1),W(10),_(1),W(11),_(1),W(6),_(1),W(6),_(1),W(4),_(1),W(5),_(1),W(3),_(1),W(8),_(1),W(3),_(1),W(1),_(1),W(2),_(1),W(2),_(1),W(1),_(1),W(1),_(3),W(1),_(1),W(fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.ops] [eta 02h:30m:34s]
> fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> Jobs: 100 (f=100): [W(5),_(1),W(10),_(1),W(11),_(1),W(6),_(1),W(6),_(1),W(4),_(1),W(5),_(1),W(3),_(1),W(8),_(1),W(3),_(1),W(1),_(1),W(2),_(1),W(1),_(2),W(1),_(1),W(1),_(3),W(1),_(1),W(fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.10168KB/0KB /s] [0/158/0 iops] [eta 02h:49m:19s]
> Jobs: 99 (f=99): [W(5),_(1),W(10),_(1),W(11),_(1),W(6),_(1),W(6),_(1),W(4),_(1),W(5),_(1),W(3),_(1),W(8),_(1),W(3),_(1),W(1),_(1),W(2),_(1),W(1),_(2),W(1),_(1),W(1),_(3),W(1),_(1),W(2)fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.933KB/0KB /s] [0/233/0 iops] [eta 02h:55m:35s]
> fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> Jobs: 97 (f=97): [W(5),_(1),W(10),_(1),W(11),_(1),W(6),_(1),W(6),_(1),W(4),_(1),W(5),_(1),W(3),_(1),W(8),_(1),W(3),_(1),W(1),_(1),W(2),_(1),W(1),_(2),W(1),_(1),W(1),_(3),W(1),_(1),W(2)fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job. [3.3% done] [0KB/9376KB/0KB /s] [0/146/0 iops] [eta 03h:08m:09s]
> Jobs: 96 (f=96): [W(5),_(1),W(10),_(1),W(11),_(1),W(6),_(1),W(6),_(1),W(4),_(1),W(5),_(1),W(3),_(1),W(8),_(1),W(3),_(1),W(1),_(1),W(2),_(1),W(1),_(2),W(1),_(1),W(1),_(3),W(1),_(1),W(2)fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job._(1),W(2)] [3.2% done] [0KB/1493KB/0KB /s] [0/23/0 iops] [eta 03h:14m:27s]
> Jobs: 95 (f=95): [W(5),_(1),W(10),_(1),W(11),_(1),W(6),_(1),W(6),_(1),W(4),_(1),W(5),_(1),W(3),_(1),W(8),_(1),W(3),_(1),W(1),_(1),W(2),_(1),W(1),_(2),W(1),_(1),W(1),_(3),W(1),_(1),W(2),_(2),W(3),_(1),W(1),_(1),W(1),_(1),W(1),_(1),W(1),_(1),W(3),_(1),W(1),_(1),W(3),_(1),W(3),_(1),W(5),_(2),W(1),_(1),W(2)] [3.1% done] [0KB/15573KB/0KB /s] [0/243/0 iops] [eta 03h:20m:4fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> Jobs: 94 (f=94): [W(5),_(1),W(10),_(1),W(11),_(1),W(6),_(1),W(6),_(1),W(4),_(1),W(5),_(1),W(3),_(1),W(8),_(1),W(3),_(3),W(2),_(1),W(1),_(2),W(1),_(1),W(1),_(3),W(1),_(1),W(2),_(2),W(3),_(1),W(1),_(1),W(1),_(1),W(1),_(1),W(1),_(1),W(3),_(1),W(1),_(1),W(3),_(1),W(3),_(1),W(5),_(2),W(1),_(1),W(2)] [3.0% done] [0KB/11733KB/0KB /s] [0/183/0 iops] [eta 03h:27m:05s]       fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> Jobs: 93 (f=93): [W(5),_(1),W(10),_(1),W(11),_(1),W(6),_(1),W(6),_(1),W(4),_(1),W(5),_(1),W(3),_(1),W(8),_(1),W(3),_(3),W(2),_(1),W(1),_(4),W(1),_(3),W(1),_(1),W(2),_(2),W(3),_(1),W(1)Jobs: 93 (f=93): [W(5),_(1),W(10),_(1),W(11),_(1),W(6),_(1),W(6),_(1),W(4),_(1),W(5),_(1),W(3),_(1),W(8),_(1),W(3),_(3),W(2),_(1),W(1),_(4),W(1),_(3),W(1),_(1),W(2),_(2),W(3),_(1),W(1)fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job. [2.9% done] [0KB/8675KB/0KB /s] [0/135/0 iops] [eta 03h:33m:23s]
> Jobs: 92 (f=92): [W(5),_(1),W(10),_(1),W(11),_(1),W(6),_(1),W(6),_(1),W(4),_(1),W(5),_(1),W(3),_(1),W(8),_(1),W(3),_(3),W(2),_(1),W(1),_(4),W(1),_(3),W(1),_(1),W(2),_(2),W(1),_(1),W(1)fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job._(1),W(2)] [2.8% done] [0KB/1207KB/0KB /s] [0/18/0 iops] [eta 03h:39m:44s]
> Jobs: 91 (f=91): [W(5),_(1),W(10),_(1),W(11),_(1),W(6),_(1),W(6),_(1),W(4),_(1),W(5),_(1),W(3),_(1),W(8),_(1),W(3),_(3),W(2),_(1),W(1),_(4),W(1),_(3),W(1),_(1),W(2),_(2),W(1),_(1),W(1)Jobs: 91 (f=91): [W(5),_(1),W(10),_(1),W(11),_(1),W(6),_(1),W(6),_(1),W(4),_(1),W(5),_(1),W(3),_(1),W(8),_(1),W(3),_(3),W(2),_(1),W(1),_(4),W(1),_(3),W(1),_(1),W(2),_(2),W(1),_(1),W(1)fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job._(1),W(2)] [2.8% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 03h:46m:06s]
> fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> Jobs: 88 (f=88): [W(5),_(1),W(10),_(1),W(11),_(1),W(6),_(1),W(6),_(1),W(4),_(1),W(5),_(1),W(3),_(1),W(8),_(2),W(2),_(3),W(2),_(1),W(1),_(4),W(1),_(3),W(1),_(1),W(2),_(2),W(1),_(1),W(1)fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job._(1),W(2)] [2.6% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 04h:05m:21s]
> fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> Jobs: 86 (f=86): [W(5),_(1),W(10),_(1),W(11),_(1),W(6),_(1),W(6),_(1),W(4),_(1),W(5),_(1),W(3),_(1),W(8),_(2),W(2),_(3),W(2),_(1),W(1),_(4),W(1),_(3),W(1),_(1),W(2),_(2),W(1),_(1),W(1)fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job. [2.4% done] [0KB/19697KB/0KB /s] [0/307/0 iops] [eta 04h:18m:12s]
> Jobs: 85 (f=85): [W(5),_(1),W(10),_(1),W(11),_(1),W(6),_(1),W(6),_(1),W(4),_(1),W(5),_(1),W(3),_(1),W(8),_(2),W(2),_(3),W(2),_(1),W(1),_(4),W(1),_(3),W(1),_(1),W(2),_(2),W(1),_(1),W(1)fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job._(1)] [2.4% done] [0KB/18062KB/0KB /s] [0/282/0 iops] [eta 04h:24m:37s]
> Jobs: 84 (f=84): [W(5),_(1),W(10),_(1),W(11),_(1),W(6),_(1),W(6),_(1),W(4),_(1),W(5),_(1),W(3),_(1),W(8),_(2),W(2),_(3),W(2),_(1),W(1),_(4),W(1),_(3),W(1),_(1),W(2),_(2),W(1),_(1),W(1)fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.% done] [0KB/8736KB/0KB /s] [0/136/0 iops] [eta 04h:31m:03s]
> Jobs: 83 (f=83): [W(5),_(1),W(5),_(1),W(4),_(1),W(11),_(1),W(6),_(1),W(6),_(1),W(4),_(1),W(5),_(1),W(3),_(1),W(8),_(2),W(2),_(3),W(2),_(1),W(1),_(4),W(1),_(3),W(1),_(1),W(2),_(2),W(1),fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.(1)] [2.3% done] [0KB/34379KB/0KB /s] [0/537/0 iops] [eta 04h:37m:31s]
> Jobs: 82 (f=82): [W(5),_(1),W(5),_(1),W(4),_(1),W(2),_(1),W(8),_(1),W(6),_(1),W(6),_(1),W(4),_(1),W(5),_(1),W(3),_(1),W(8),_(2),W(2),_(3),W(2),_(1),W(1),_(4),W(1),_(3),W(1),_(1),W(2),_fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.1),W(1),_(1)] [2.2% done] [0KB/9447KB/0KB /s] [0/147/0 iops] [eta 04h:44m:00s]
> Jobs: 81 (f=81): [W(5),_(1),W(1),_(1),W(3),_(1),W(4),_(1),W(2),_(1),W(8),_(1),W(6),_(1),W(6),_(1),W(4),_(1),W(5),_(1),W(3),_(1),W(8),_(2),W(2),_(3),W(2),_(1),W(1),_(4),W(1),_(3),W(1),_(1),W(2),_(2),W(1),_(1),W(1),_(1),W(1),_(1),W(1),_(1),W(1),_(3),W(3),_(3),W(3),_(1),W(2),_(4),W(2),_(3),W(1),_(1),W(1),_(1)] [2.2% done] [0KB/9173KB/0KB /s] [0/143/0 iops] [eta 04h:50mfio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> Jobs: 78 (f=78): [W(5),_(1),W(1),_(1),W(3),_(1),W(4),_(1),W(2),_(1),W(8),_(1),W(6),_(2),W(5),_(1),W(4),_(1),W(2),_(1),W(2),_(1),W(3),_(1),W(8),_(2),W(2),_(3),W(2),_(1),W(1),_(4),W(1),_(3),W(1),_(1),W(2),_(2),W(1),_(1),W(1),_(1),W(1),_(1),W(1),_(1),W(1),_(3),W(3),_(3),W(1),_(1),W(1),_(1),W(2),_(4),W(2),_(3),W(1),_(1),W(1),_(1)] [2.1% done] [0KB/4830KB/0KB /s] [0/75/0fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> Jobs: 77 (f=77): [W(4),_(2),W(1),_(1),W(3),_(1),W(4),_(1),W(2),_(1),W(8),_(1),W(6),_(2),W(5),_(1),W(4),_(1),W(2),_(1),W(2),_(1),W(3),_(1),W(8),_(2),W(2),_(3),W(2),_(1),W(1),_(4),W(1),_(3),W(1),_(1),W(2),_(2),W(1),_(1),W(1),_(1),W(1),_(1),W(1),_(1),W(1),_(3),W(3),_(3),W(1),_(1),W(1),_(1),W(2),_(4),W(2),_(3),W(1),_(1),W(1),_(1)] [2.0% done] [0KB/16763KB/0KB /s] [0/261Jobs: 77 (f=77): [W(4),_(2),W(1),_(1),W(3),_(1),W(4),_(1),W(2),_(1),W(8),_(1),W(6),_(2),W(5),_(1),W(4),_(1),W(2),_(1),W(2),_(1),W(3),_(1),W(8),_(2),W(2),_(3),W(2),_(1),W(1),_(4),W(1),_(3),W(1),_(1),W(2),_(2),W(1),_(1),W(1),_(1),W(1),_(1),W(1),_(1),W(1),_(3),W(3),_(3),W(1),_(1),W(1),_(1),W(2),_(4),W(2),_(3),W(1),_(1),W(1),_(1)] [2.0% done] [0KB/0KB/0KB /s] [0/0/0 iopJobs: 77 (f=77): [W(4),_(2),W(1),_(1),W(3),_(1),W(4),_(1),W(2),_(1),W(8),_(1),W(6),_(2),W(5),_(1),W(4),_(1),W(2),_(1),W(2),_(1),W(3),_(1),W(8),_(2),W(2),_(3),W(2),_(1),W(1),_(4),W(1),_(3),W(1),_(1),W(2),_(2),W(1),_(1),W(1),_(1),W(1),_(1),W(1),_(1),W(1),_(3),W(3
),_(3),W(1),_(1),W(1),_(1),W(2),_(4),W(2),_(3),W(1),_(1),W(1),_(1)] [2.0% done] [0KB/11377KB/0KB /s] [0/177Jobs: 77 (f=77): [W(4),_(2),W(1),_(1),W(3),_(1),W(4),_(1),W(2),_(1),W(8),_(1),W(6),_(2),W(5),_(1),W(4),_(1),W(2),_(1),W(2),_(1),W(3),_(1),W(8),_(2),W(2),_(3),W(2),_(1),W(1),_(4),W(1),_(3),W(1),_(1),W(2),_(2),W(1),_(1),W(1),_(1),W(1),_(1),W(1),_(1),W(1),_(3),W(3),_(3),W(1),_(1),W(1),_(1),W(2),_(4),W(2),_(3),W(1),_(1),W(1),_(1)] [2.1% done] [0KB/12017KB/0KB /s] [0/187fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> Jobs: 76 (f=76): [W(4),_(2),W(1),_(1),W(3),_(1),W(4),_(1),W(2),_(1),W(5),_(1),W(2),_(1),W(6),_(2),W(5),_(1),W(4),_(1),W(2),_(1),W(2),_(1),W(3),_(1),W(8),_(2),W(2),_(3),W(2),_(1),W(1),_(4),W(1),_(3),W(1),_(1),W(2),_(2),W(1),_(1),W(1),_(1),W(1),_(1),W(1),_(1),W(1),_(3),W(3),_(3),W(1),_(1),W(1),_(1),W(2),_(4),W(2),_(3),W(1),_(1),W(1),_(1)] [2.0% done] [0KB/22162KB/0KB fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> Jobs: 75 (f=75): [W(2),_(1),W(1),_(2),W(1),_(1),W(3),_(1),W(4),_(1),W(2),_(1),W(5),_(1),W(2),_(1),W(6),_(2),W(5),_(1),W(4),_(1),W(2),_(1),W(2),_(1),W(3),_(1),W(8),_(2),W(2),_(3),W(2),_(1),W(1),_(4),W(1),_(3),W(1),_(1),W(2),_(2),W(1),_(1),W(1),_(1),W(1),_(1),W(1),_(1),W(1),_(3),W(3),_(3),W(1),_(1),W(1),_(1),W(2),_(4),W(2),_(3),W(1),_(1),W(1),_(1)] [2.0% done] [0KB/10fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> Jobs: 72 (f=72): [W(2),_(1),W(1),_(5),W(2),_(1),W(4),_(1),W(2),_(1),W(5),_(1),W(1),_(2),W(6),_(2),W(5),_(1),W(4),_(1),W(2),_(1),W(2),_(1),W(3),_(1),W(8),_(2),W(2),_(3),W(2),_(1),W(1),_(4),W(1),_(3),W(1),_(1),W(2),_(2),W(1),_(1),W(1),_(1),W(1),_(1),W(1),_(1),W(1),_(3),W(3),_(3),W(1),_(1),W(1),_(1),W(2),_(4),W(2),_(3),W(1),_(1),W(1),_(1)] [1.9% done] [0KB/6250KB/0KB /fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> Jobs: 70 (f=70): [W(2),_(1),W(1),_(5),W(1),_(2),W(1),_(1),W(2),_(1),W(2),_(1),W(5),_(1),W(1),_(2),W(6),_(2),W(5),_(1),W(4),_(1),W(2),_(1),W(2),_(1),W(3),_(1),W(8),_(2),W(2),_(3),W(2),_(1),W(1),_(4),W(1),_(3),W(1),_(1),W(2),_(2),W(1),_(1),W(1),_(1),W(1),_(1),W(1),_(1),W(1),_(3),W(3),_(3),W(1),_(1),W(1),_(1),W(2),_(4),W(2),_(3),W(1),_(1),W(1),_(1)] [1.8% done] [0KB/10fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> Jobs: 69 (f=69): [W(2),_(1),W(1),_(5),W(1),_(2),W(1),_(1),W(1),_(2),W(2),_(1),W(5),_(1),W(1),_(2),W(6),_(2),W(5),_(1),W(4),_(1),W(2),_(1),W(2),_(1),W(3),_(1),W(8),_(2),W(2),_(3),W(2),_(1),W(1),_(4),W(1),_(3),W(1),_(1),W(2),_(2),W(1),_(1),W(1),_(1),W(1),_(1),W(1),_(1),W(1),_(3),W(3),_(3),W(1),_(1),W(1),_(1),W(2),_(4),W(2),_(3),W(1),_(1),W(1),_(1)] [1.8% done] [0KB/78Jobs: 69 (f=69): [W(2),_(1),W(1),_(5),W(1),_(2),W(1),_(1),W(1),_(2),W(2),_(1),W(5),_(1),W(1),_(2),W(6),_(2),W(5),_(1),W(4),_(1),W(2),_(1),W(2),_(1),W(3),_(1),W(8),_(2),W(2),_(3),W(2),_(1),W(1),_(4),W(1),_(3),W(1),_(1),W(2),_(2),W(1),_(1),W(1),_(1),W(1),_(1),W(1),_(1),W(1),_(3),W(3),_(3),W(1),_(1),W(1),_(1),W(2),_(4),W(2),_(3),W(1),_(1),W(1),_(1)] [1.8% done] [0KB/12fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> Jobs: 68 (f=68): [W(2),_(1),W(1),_(5),W(1),_(2),W(1),_(1),W(1),_(2),W(2),_(1),W(5),_(1),W(1),_(2),W(6),_(2),W(5),_(1),W(4),_(1),W(2),_(1),W(2),_(1),W(3),_(1),W(8),_(2),W(2),_(3),W(2),_(1),W(1),_(4),W(1),_(3),W(1),_(1),W(2),_(2),W(1),_(1),W(1),_(1),W(1),_(1),W(1),_(1),W(1),_(3),W(3),_(5),W(1),_(1),W(2),_(4),W(2),_(3),W(1),_(1),W(1),_(1)] [1.8% done] [0KB/1280KB/0KB /Jobs: 68 (f=68): [W(2),_(1),W(1),_(5),W(1),_(2),W(1),_(1),W(1),_(2),W(2),_(1),W(5),_(1),W(1),_(2),W(6),_(2),W(5),_(1),W(4),_(1),W(2),_(1),W(2),_(1),W(3),_(1),W(8),_(2),W(2),_(3),W(2),_(1),W(1),_(4),W(1),_(3),W(1),_(1),W(2),_(2),W(1),_(1),W(1),_(1),W(1),_(1),W(1),_(1),W(1),_(3),W(2),f(1),_(5),W(1),_(1),W(2),_(4),W(2),_(3),W(1),_(1),W(1),_(1)] [1.8% done] [0KB/8097KB/fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> Jobs: 65 (f=65): [W(2),_(1),W(1),_(5),W(1),_(2),W(1),_(1),W(1),_(2),W(2),_(1),W(5),_(4),W(6),_(2),W(5),_(1),W(4),_(1),W(2),_(2),W(1),_(1),W(3),_(1),W(8),_(2),W(2),_(3),W(2),_(1),W(1),_(4),W(1),_(3),W(1),_(2),W(1),_(2),W(1),_(1),W(1),_(1),W(1),_(1),W(1),_(1),W(1),_(3),W(3),_(5),W(1),_(1),W(2),_(4),W(2),_(3),W(1),_(1),W(1),_(1)] [1.7% done] [0KB/11649KB/0KB /s] [0/182fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
> Jobs: 62 (f=62): [W(2),_(1),W(1),_(5),W(1),_(2),W(1),_(1),W(1),_(3),W(1),_(1),W(5),_(4),W(5),_(3),W(5),_(1),W(3),_(2),W(2),_(2),W(1),_(1),W(3),_(1),W(8),_(2),W(2),_(3),W(2),_(1),W(1),_(4),W(1),_(3),W(1),_(2),W(1),_(2),W(1),_(1),W(1),_(1),W(1),_(1),W(1),_(1),W(1),_(3),W(3),_(5),W(1),_(1),W(2),_(4),W(2),_(3),W(1),_(1),W(1),_(1)] [1.6% done] [0KB/6257KB/0KB /s] [0/97/0fio: job 'test' hasn't exited in 103 seconds, it appears to be stuck. Doing forceful exit of this job.
> fio: job 'test' hasn't exited in 106 seconds, it appears to be stuck. Doing forceful exit of this job.
> fio: job 'test' hasn't exited in 104 seconds, it appears to be stuck. Doing forceful exit of this job.
> fio: job 'test' hasn't exited in 105 seconds, it appears to be stuck. Doing forceful exit of this job.
> fio: job 'test' hasn't exited in 104 seconds, it appears to be stuck. Doing forceful exit of this job.
> fio: job 'test' hasn't exited in 105 seconds, it appears to be stuck. Doing forceful exit of this job.
> fio: job 'test' hasn't exited in 105 seconds, it appears to be stuck. Doing forceful exit of this job.
> fio: job 'test' hasn't exited in 104 seconds, it appears to be stuck. Doing forceful exit of this job.
> Jobs: 4 (f=4): [_(55),W(1),_(13),W(1),_(6),W(1),_(42),W(1),_(8)] [0.5% done] [0KB/105.8MB/0KB /s] [0/1681/0 iops] [eta 21h:01m:48s]                                                                                                                                                                                                                                             fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.iops] [eta 21h:01m:02s]
> fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job. 21h:08m:31s]
> fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.3s]
> fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.
>
> WHat the process do:
> # truss -af -p 3015098
> 3015098:        psargs: fio myfio.fio
> 3015098:                                                        = 0
> 3015098:        117309669: statx("/tmp/fio-dump-status", 0x2FF20648, 128, 010) Err#2  ENOENT
> 3015098:        117309669: _nsleep(0x2FF20A60, 0x2FF20AD8)      = 0
> 3015098:        117309669: statx("/tmp/fio-dump-status", 0x2FF20648, 128, 010) Err#2  ENOENT
> 3015098:        117309669: _nsleep(0x2FF20A60, 0x2FF20AD8)      = 0
> 3015098:        117309669: statx("/tmp/fio-dump-status", 0x2FF20648, 128, 010) Err#2  ENOENT
> 3015098:        117309669: _nsleep(0x2FF20A60, 0x2FF20AD8)      = 0
> 3015098:        117309669: statx("/tmp/fio-dump-status", 0x2FF20648, 128, 010) Err#2  ENOENT
> 3015098:        117309669: _nsleep(0x2FF20A60, 0x2FF20AD8)      = 0
> 3015098:        117309669: statx("/tmp/fio-dump-status", 0x2FF20648, 128, 010) Err#2  ENOENT
> 3015098:        117309669: _nsleep(0x2FF20A60, 0x2FF20AD8)      = 0
> 3015098:        117309669: statx("/tmp/fio-dump-status", 0x2FF20648, 128, 010) Err#2  ENOENT
> 3015098:        117309669: _nsleep(0x2FF20A60, 0x2FF20AD8)      = 0
> 3015098:        117309669: statx("/tmp/fio-dump-status", 0x2FF20648, 128, 010) Err#2  ENOENT
> 3015098:        117309669: _nsleep(0x2FF20A60, 0x2FF20AD8)      = 0
> 3015098:        117309669: statx("/tmp/fio-dump-status", 0x2FF20648, 128, 010) Err#2  ENOENT
> 3015098:        117309669: _nsleep(0x2FF20A60, 0x2FF20AD8)      = 0
> 3015098:        117309669: statx("/tmp/fio-dump-status", 0x2FF20648, 128, 010) Err#2  ENOENT
> 3015098:        117309669: _nsleep(0x2FF20A60, 0x2FF20AD8)      = 0
> 3015098:        117309669: statx("/tmp/fio-dump-status", 0x2FF20648, 128, 010) Err#2  ENOENT
> 3015098:        117309669: _nsleep(0x2FF20A60, 0x2FF20AD8)      = 0
> 3015098:        117309669: statx("/tmp/fio-dump-status", 0x2FF20648, 128, 010) Err#2  ENOENT
> 3015098:        117309669: _nsleep(0x2FF20A60, 0x2FF20AD8)      = 0
> Pstatus: process is not stopped

That's the main thread, what are the IO threads doing?

> # truss -acf -p 3015098
> psargs: fio myfio.fio
> Pstatus: process is not stopped
> syscall               seconds   calls  errors
> _nsleep        10.44    1423
> kioctl           .00      56     56
> statx            .00    1367   1367
>                           ----     ---    ---
> sys totals:               .00    2846   1423
> usr time:                 .00
> elapsed:                  .00
> #
>
> The file not exist in /tmp but the program look for it!!!
> # ls -l /tmp/fio-dump-status
> ls: 0653-341 The file /tmp/fio-dump-status does not exist.
> #

That's expected - when the main thread goes to sleep, it checks for this 
status dump file. If it exists, then it'll show the current summary of 
stats, then unlink it.

-- 
Jens Axboe



  parent reply	other threads:[~2015-12-30 17:08 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-12-29  8:29 fio-2c2c93d and with a lot of numjobs causing fio Jobs to Time Out Thierry BERTAUD
2015-12-29 14:30 ` Thierry BERTAUD
2015-12-30 17:08 ` Jens Axboe [this message]
2015-12-30 18:34   ` Thierry BERTAUD
2015-12-31  4:02     ` Jens Axboe
2015-12-31 16:18       ` Thierry BERTAUD
2015-12-31 16:33         ` Andrey Kuzmin
2016-01-01 12:43           ` Thierry BERTAUD
2016-01-01 12:46             ` Thierry BERTAUD
2016-01-01 12:54             ` Andrey Kuzmin
2016-01-01 14:00               ` Thierry BERTAUD
2016-01-01 14:04                 ` Andrey Kuzmin
2016-01-01 14:49                   ` Thierry BERTAUD
2016-01-01 16:19                     ` Andrey Kuzmin
2016-01-01 17:31                       ` Thierry BERTAUD
2016-01-18 23:58                         ` How to stress a 8Gbps card to get 500 000 IOPS or 1600 MB/s Thierry BERTAUD
2016-01-19  0:15                           ` David Nellans
2016-01-19  0:29                             ` Thierry BERTAUD
     [not found]                               ` <CAB-428m2HJBUPdtYsCKtxfYPuscV1i=ucCvCAPh4ShqWPNmTXw@mail.gmail.com>
     [not found]                                 ` <VI1PR05MB1343CA9499CC6AB3B7A0D2C0B1C10@VI1PR05MB1343.eurprd05.prod.outlook.com>
2016-01-19  2:45                                   ` Alireza Haghdoost
2016-01-19  0:33                             ` Kulkarni, Vasu
2016-01-19  0:40                               ` David Nellans
2016-01-19  1:04                                 ` Thierry BERTAUD
2016-01-19  1:15                                   ` Alireza Haghdoost
2016-01-19  1:59                                     ` Thierry BERTAUD

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56840F80.4060903@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=fio@vger.kernel.org \
    --cc=tbertaud@conforama.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.