Flexible I/O Tester development
 help / color / mirror / Atom feed
* Shared files with thread=1: func=xfer, err=Bad file descriptor
@ 2014-04-01  1:08 Andrey Kuzmin
  2014-04-01 14:31 ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Andrey Kuzmin @ 2014-04-01  1:08 UTC (permalink / raw)
  To: fio; +Cc: Jens Axboe

The following simple setup
---
[global]
ioengine=psync
direct=1
sync=1

[randrwrite]
filename=/tmp/test.dat
lockfile=exclusive
filesize=1m
;
thread=1
numjobs=4
runtime=10
time_based=1
;
rw=randwrite
bs=4k
---

runs fine with jobs being processes, and fails as follows with thread=1.

randrwrite: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
...
fio-2.1.6.1-23-g9f97
Starting 4 threads
fio: pid=10455, err=9/file:engines/sync.c:67, func=xfer, error=Bad
file descriptor]
fio: pid=10453, err=9/file:engines/sync.c:67, func=xfer, error=Bad
file descriptor
fio: pid=10454, err=9/file:engines/sync.c:67, func=xfer, error=Bad
file descriptor
Jobs: 1 (f=1): [wXXX] [100.0% done] [0KB/2372KB/0KB /s] [0/593/0 iops]
[eta 00m:00s]
randrwrite: (groupid=0, jobs=1): err= 0: pid=10452: Mon Mar 31 18:05:26 2014
  write: io=14900KB, bw=1489.9KB/s, iops=372, runt= 10001msec
    clat (usec): min=419, max=260739, avg=1546.00, stdev=4329.54
     lat (usec): min=424, max=260744, avg=1551.95, stdev=4329.58
    clat percentiles (usec):
     |  1.00th=[  588],  5.00th=[  732], 10.00th=[  780], 20.00th=[  828],
     | 30.00th=[  876], 40.00th=[  956], 50.00th=[ 1080], 60.00th=[ 1432],
     | 70.00th=[ 2064], 80.00th=[ 2256], 90.00th=[ 2480], 95.00th=[ 2736],
     | 99.00th=[ 3344], 99.50th=[ 3632], 99.90th=[ 4512], 99.95th=[23680],
     | 99.99th=[261120]
    bw (KB  /s): min=    5, max= 1099, per=22.09%, avg=519.90, stdev=359.34
    lat (usec) : 500=0.19%, 750=6.42%, 1000=37.61%
    lat (msec) : 2=23.68%, 4=31.81%, 10=0.24%, 50=0.03%, 500=0.03%
  cpu          : usr=0.76%, sys=8.08%, ctx=8191, majf=0, minf=5
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=3725/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1
randrwrite: (groupid=0, jobs=1): err= 9 (file:engines/sync.c:67,
func=xfer, error=Bad file descriptor): pid=10453: Mon Mar 31 18:05:26
2014
  write: io=3072.0KB, bw=581465B/s, iops=142, runt=  5410msec
    clat (usec): min=587, max=33827, avg=1817.50, stdev=2464.14
     lat (usec): min=593, max=33837, avg=1823.85, stdev=2464.47
    clat percentiles (usec):
     |  1.00th=[  804],  5.00th=[  876], 10.00th=[  932], 20.00th=[  988],
     | 30.00th=[ 1048], 40.00th=[ 1096], 50.00th=[ 1208], 60.00th=[ 2024],
     | 70.00th=[ 2224], 80.00th=[ 2352], 90.00th=[ 2576], 95.00th=[ 2896],
     | 99.00th=[ 4080], 99.50th=[32640], 99.90th=[34048], 99.95th=[34048],
     | 99.99th=[34048]
    bw (KB  /s): min=  165, max= 1266, per=25.28%, avg=595.00, stdev=450.04
    lat (usec) : 750=0.39%, 1000=22.76%
    lat (msec) : 2=36.67%, 4=38.88%, 10=0.52%, 20=0.13%, 50=0.52%
  cpu          : usr=0.15%, sys=5.69%, ctx=2288, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.1%, 4=99.9%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=769/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1
randrwrite: (groupid=0, jobs=1): err= 9 (file:engines/sync.c:67,
func=xfer, error=Bad file descriptor): pid=10454: Mon Mar 31 18:05:26
2014
  write: io=2528.0KB, bw=478497B/s, iops=117, runt=  5410msec
    clat (usec): min=299, max=33920, avg=2106.79, stdev=3775.38
     lat (usec): min=303, max=33930, avg=2113.60, stdev=3775.68
    clat percentiles (usec):
     |  1.00th=[  564],  5.00th=[  820], 10.00th=[  932], 20.00th=[  996],
     | 30.00th=[ 1064], 40.00th=[ 1160], 50.00th=[ 1256], 60.00th=[ 1976],
     | 70.00th=[ 2224], 80.00th=[ 2384], 90.00th=[ 2672], 95.00th=[ 3056],
     | 99.00th=[33024], 99.50th=[33024], 99.90th=[34048], 99.95th=[34048],
     | 99.99th=[34048]
    bw (KB  /s): min=   46, max=  728, per=18.61%, avg=438.17, stdev=265.37
    lat (usec) : 500=0.16%, 750=3.63%, 1000=16.59%
    lat (msec) : 2=39.81%, 4=37.28%, 10=0.63%, 20=0.32%, 50=1.42%
  cpu          : usr=0.30%, sys=5.03%, ctx=2048, majf=0, minf=2
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.2%, 4=99.8%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=633/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1
randrwrite: (groupid=0, jobs=1): err= 9 (file:engines/sync.c:67,
func=xfer, error=Bad file descriptor): pid=10455: Mon Mar 31 18:05:26
2014
  write: io=3052.0KB, bw=583286B/s, iops=142, runt=  5358msec
    clat (usec): min=541, max=24712, avg=1707.90, stdev=1533.18
     lat (usec): min=547, max=24743, avg=1714.89, stdev=1533.64
    clat percentiles (usec):
     |  1.00th=[  796],  5.00th=[  876], 10.00th=[  908], 20.00th=[  980],
     | 30.00th=[ 1048], 40.00th=[ 1128], 50.00th=[ 1240], 60.00th=[ 1736],
     | 70.00th=[ 2224], 80.00th=[ 2384], 90.00th=[ 2640], 95.00th=[ 2896],
     | 99.00th=[ 3280], 99.50th=[ 4640], 99.90th=[24704], 99.95th=[24704],
     | 99.99th=[24704]
    bw (KB  /s): min=  204, max= 1009, per=20.39%, avg=480.00, stdev=283.93
    lat (usec) : 750=0.65%, 1000=22.12%
    lat (msec) : 2=38.35%, 4=38.09%, 10=0.26%, 50=0.39%
  cpu          : usr=0.37%, sys=5.45%, ctx=2261, majf=0, minf=5
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.1%, 4=99.9%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=764/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: io=23552KB, aggrb=2354KB/s, minb=467KB/s, maxb=1489KB/s,
mint=5358msec, maxt=10001msec

Disk stats (read/write):
  sda: ios=0/13818, merge=0/2227, ticks=0/7428, in_queue=7260, util=72.37%

Is file sharing with non-trivial lockfile= setting is expected to work
with threads=1, or is it an unanticipated use case?

Regards,
Andrey


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Shared files with thread=1: func=xfer, err=Bad file descriptor
  2014-04-01  1:08 Shared files with thread=1: func=xfer, err=Bad file descriptor Andrey Kuzmin
@ 2014-04-01 14:31 ` Jens Axboe
  2014-04-01 15:56   ` Andrey Kuzmin
  0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2014-04-01 14:31 UTC (permalink / raw)
  To: Andrey Kuzmin, fio

On 03/31/2014 07:08 PM, Andrey Kuzmin wrote:
> The following simple setup
> ---
> [global]
> ioengine=psync
> direct=1
> sync=1
>
> [randrwrite]
> filename=/tmp/test.dat
> lockfile=exclusive
> filesize=1m
> ;
> thread=1
> numjobs=4
> runtime=10
> time_based=1
> ;
> rw=randwrite
> bs=4k
> ---
>
> runs fine with jobs being processes, and fails as follows with thread=1.
>
> randrwrite: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
> ...
> fio-2.1.6.1-23-g9f97
> Starting 4 threads
> fio: pid=10455, err=9/file:engines/sync.c:67, func=xfer, error=Bad
> file descriptor]
> fio: pid=10453, err=9/file:engines/sync.c:67, func=xfer, error=Bad
> file descriptor
> fio: pid=10454, err=9/file:engines/sync.c:67, func=xfer, error=Bad
> file descriptor
> Jobs: 1 (f=1): [wXXX] [100.0% done] [0KB/2372KB/0KB /s] [0/593/0 iops]
> [eta 00m:00s]
> randrwrite: (groupid=0, jobs=1): err= 0: pid=10452: Mon Mar 31 18:05:26 2014
>    write: io=14900KB, bw=1489.9KB/s, iops=372, runt= 10001msec
>      clat (usec): min=419, max=260739, avg=1546.00, stdev=4329.54
>       lat (usec): min=424, max=260744, avg=1551.95, stdev=4329.58
>      clat percentiles (usec):
>       |  1.00th=[  588],  5.00th=[  732], 10.00th=[  780], 20.00th=[  828],
>       | 30.00th=[  876], 40.00th=[  956], 50.00th=[ 1080], 60.00th=[ 1432],
>       | 70.00th=[ 2064], 80.00th=[ 2256], 90.00th=[ 2480], 95.00th=[ 2736],
>       | 99.00th=[ 3344], 99.50th=[ 3632], 99.90th=[ 4512], 99.95th=[23680],
>       | 99.99th=[261120]
>      bw (KB  /s): min=    5, max= 1099, per=22.09%, avg=519.90, stdev=359.34
>      lat (usec) : 500=0.19%, 750=6.42%, 1000=37.61%
>      lat (msec) : 2=23.68%, 4=31.81%, 10=0.24%, 50=0.03%, 500=0.03%
>    cpu          : usr=0.76%, sys=8.08%, ctx=8191, majf=0, minf=5
>    IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>       submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>       complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>       issued    : total=r=0/w=3725/d=0, short=r=0/w=0/d=0
>       latency   : target=0, window=0, percentile=100.00%, depth=1
> randrwrite: (groupid=0, jobs=1): err= 9 (file:engines/sync.c:67,
> func=xfer, error=Bad file descriptor): pid=10453: Mon Mar 31 18:05:26
> 2014
>    write: io=3072.0KB, bw=581465B/s, iops=142, runt=  5410msec
>      clat (usec): min=587, max=33827, avg=1817.50, stdev=2464.14
>       lat (usec): min=593, max=33837, avg=1823.85, stdev=2464.47
>      clat percentiles (usec):
>       |  1.00th=[  804],  5.00th=[  876], 10.00th=[  932], 20.00th=[  988],
>       | 30.00th=[ 1048], 40.00th=[ 1096], 50.00th=[ 1208], 60.00th=[ 2024],
>       | 70.00th=[ 2224], 80.00th=[ 2352], 90.00th=[ 2576], 95.00th=[ 2896],
>       | 99.00th=[ 4080], 99.50th=[32640], 99.90th=[34048], 99.95th=[34048],
>       | 99.99th=[34048]
>      bw (KB  /s): min=  165, max= 1266, per=25.28%, avg=595.00, stdev=450.04
>      lat (usec) : 750=0.39%, 1000=22.76%
>      lat (msec) : 2=36.67%, 4=38.88%, 10=0.52%, 20=0.13%, 50=0.52%
>    cpu          : usr=0.15%, sys=5.69%, ctx=2288, majf=0, minf=0
>    IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>       submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>       complete  : 0=0.1%, 4=99.9%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>       issued    : total=r=0/w=769/d=0, short=r=0/w=0/d=0
>       latency   : target=0, window=0, percentile=100.00%, depth=1
> randrwrite: (groupid=0, jobs=1): err= 9 (file:engines/sync.c:67,
> func=xfer, error=Bad file descriptor): pid=10454: Mon Mar 31 18:05:26
> 2014
>    write: io=2528.0KB, bw=478497B/s, iops=117, runt=  5410msec
>      clat (usec): min=299, max=33920, avg=2106.79, stdev=3775.38
>       lat (usec): min=303, max=33930, avg=2113.60, stdev=3775.68
>      clat percentiles (usec):
>       |  1.00th=[  564],  5.00th=[  820], 10.00th=[  932], 20.00th=[  996],
>       | 30.00th=[ 1064], 40.00th=[ 1160], 50.00th=[ 1256], 60.00th=[ 1976],
>       | 70.00th=[ 2224], 80.00th=[ 2384], 90.00th=[ 2672], 95.00th=[ 3056],
>       | 99.00th=[33024], 99.50th=[33024], 99.90th=[34048], 99.95th=[34048],
>       | 99.99th=[34048]
>      bw (KB  /s): min=   46, max=  728, per=18.61%, avg=438.17, stdev=265.37
>      lat (usec) : 500=0.16%, 750=3.63%, 1000=16.59%
>      lat (msec) : 2=39.81%, 4=37.28%, 10=0.63%, 20=0.32%, 50=1.42%
>    cpu          : usr=0.30%, sys=5.03%, ctx=2048, majf=0, minf=2
>    IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>       submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>       complete  : 0=0.2%, 4=99.8%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>       issued    : total=r=0/w=633/d=0, short=r=0/w=0/d=0
>       latency   : target=0, window=0, percentile=100.00%, depth=1
> randrwrite: (groupid=0, jobs=1): err= 9 (file:engines/sync.c:67,
> func=xfer, error=Bad file descriptor): pid=10455: Mon Mar 31 18:05:26
> 2014
>    write: io=3052.0KB, bw=583286B/s, iops=142, runt=  5358msec
>      clat (usec): min=541, max=24712, avg=1707.90, stdev=1533.18
>       lat (usec): min=547, max=24743, avg=1714.89, stdev=1533.64
>      clat percentiles (usec):
>       |  1.00th=[  796],  5.00th=[  876], 10.00th=[  908], 20.00th=[  980],
>       | 30.00th=[ 1048], 40.00th=[ 1128], 50.00th=[ 1240], 60.00th=[ 1736],
>       | 70.00th=[ 2224], 80.00th=[ 2384], 90.00th=[ 2640], 95.00th=[ 2896],
>       | 99.00th=[ 3280], 99.50th=[ 4640], 99.90th=[24704], 99.95th=[24704],
>       | 99.99th=[24704]
>      bw (KB  /s): min=  204, max= 1009, per=20.39%, avg=480.00, stdev=283.93
>      lat (usec) : 750=0.65%, 1000=22.12%
>      lat (msec) : 2=38.35%, 4=38.09%, 10=0.26%, 50=0.39%
>    cpu          : usr=0.37%, sys=5.45%, ctx=2261, majf=0, minf=5
>    IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>       submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>       complete  : 0=0.1%, 4=99.9%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>       issued    : total=r=0/w=764/d=0, short=r=0/w=0/d=0
>       latency   : target=0, window=0, percentile=100.00%, depth=1
>
> Run status group 0 (all jobs):
>    WRITE: io=23552KB, aggrb=2354KB/s, minb=467KB/s, maxb=1489KB/s,
> mint=5358msec, maxt=10001msec
>
> Disk stats (read/write):
>    sda: ios=0/13818, merge=0/2227, ticks=0/7428, in_queue=7260, util=72.37%
>
> Is file sharing with non-trivial lockfile= setting is expected to work
> with threads=1, or is it an unanticipated use case?

No, that's supposed to work. I think it's a race between shared files, 
and opening/closing of them when you have a timed based job. I'll take a 
look at it.

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Shared files with thread=1: func=xfer, err=Bad file descriptor
  2014-04-01 14:31 ` Jens Axboe
@ 2014-04-01 15:56   ` Andrey Kuzmin
  2014-04-01 18:24     ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Andrey Kuzmin @ 2014-04-01 15:56 UTC (permalink / raw)
  To: Jens Axboe; +Cc: fio

Thanks. On a separate (or may be related) note, shared files handling
in dup_files seems to be broken (unless I'm missing the logic behind
dropping non-shared files). The patch below has fixed it for me.

diff --git a/init.c b/init.c
index 9b0666d..99038b2 100644
--- a/init.c
+++ b/init.c
@@ -1160,17 +1161,20 @@ static int add_job(struct thread_data *td, const char *j
                td_new->o.new_group = 0;

                if (file_alloced) {
-                       td_new->files_index = 0;
-                       td_new->files_size = 0;
                        if (td_new->files) {
                                struct fio_file *f;
                                for_each_file(td_new, f, i) {
                                        if (f->file_name)
-                                               free(f->file_name);
-                                       free(f);
+                                               sfree(f->file_name);
+                                       sfree(f);
                                }
+                               free(td_new->files);
                                td_new->files = NULL;
                        }
+                       td_new->files_index = 0;
+                       td_new->files_size = 0;
                        if (td_new->o.filename) {
                                free(td_new->o.filename);
                                td_new->o.filename = NULL;
Regards,
Andrey


On Tue, Apr 1, 2014 at 7:31 AM, Jens Axboe <axboe@kernel.dk> wrote:
> On 03/31/2014 07:08 PM, Andrey Kuzmin wrote:
>>
>> The following simple setup
>> ---
>> [global]
>> ioengine=psync
>> direct=1
>> sync=1
>>
>> [randrwrite]
>> filename=/tmp/test.dat
>> lockfile=exclusive
>> filesize=1m
>> ;
>> thread=1
>> numjobs=4
>> runtime=10
>> time_based=1
>> ;
>> rw=randwrite
>> bs=4k
>> ---
>>
>> runs fine with jobs being processes, and fails as follows with thread=1.
>>
>> randrwrite: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=psync,
>> iodepth=1
>> ...
>> fio-2.1.6.1-23-g9f97
>> Starting 4 threads
>> fio: pid=10455, err=9/file:engines/sync.c:67, func=xfer, error=Bad
>> file descriptor]
>> fio: pid=10453, err=9/file:engines/sync.c:67, func=xfer, error=Bad
>> file descriptor
>> fio: pid=10454, err=9/file:engines/sync.c:67, func=xfer, error=Bad
>> file descriptor
>> Jobs: 1 (f=1): [wXXX] [100.0% done] [0KB/2372KB/0KB /s] [0/593/0 iops]
>> [eta 00m:00s]
>> randrwrite: (groupid=0, jobs=1): err= 0: pid=10452: Mon Mar 31 18:05:26
>> 2014
>>    write: io=14900KB, bw=1489.9KB/s, iops=372, runt= 10001msec
>>      clat (usec): min=419, max=260739, avg=1546.00, stdev=4329.54
>>       lat (usec): min=424, max=260744, avg=1551.95, stdev=4329.58
>>      clat percentiles (usec):
>>       |  1.00th=[  588],  5.00th=[  732], 10.00th=[  780], 20.00th=[
>> 828],
>>       | 30.00th=[  876], 40.00th=[  956], 50.00th=[ 1080], 60.00th=[
>> 1432],
>>       | 70.00th=[ 2064], 80.00th=[ 2256], 90.00th=[ 2480], 95.00th=[
>> 2736],
>>       | 99.00th=[ 3344], 99.50th=[ 3632], 99.90th=[ 4512],
>> 99.95th=[23680],
>>       | 99.99th=[261120]
>>      bw (KB  /s): min=    5, max= 1099, per=22.09%, avg=519.90,
>> stdev=359.34
>>      lat (usec) : 500=0.19%, 750=6.42%, 1000=37.61%
>>      lat (msec) : 2=23.68%, 4=31.81%, 10=0.24%, 50=0.03%, 500=0.03%
>>    cpu          : usr=0.76%, sys=8.08%, ctx=8191, majf=0, minf=5
>>    IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
>> >=64=0.0%
>>       submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>> >=64=0.0%
>>       complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>> >=64=0.0%
>>       issued    : total=r=0/w=3725/d=0, short=r=0/w=0/d=0
>>       latency   : target=0, window=0, percentile=100.00%, depth=1
>> randrwrite: (groupid=0, jobs=1): err= 9 (file:engines/sync.c:67,
>> func=xfer, error=Bad file descriptor): pid=10453: Mon Mar 31 18:05:26
>> 2014
>>    write: io=3072.0KB, bw=581465B/s, iops=142, runt=  5410msec
>>      clat (usec): min=587, max=33827, avg=1817.50, stdev=2464.14
>>       lat (usec): min=593, max=33837, avg=1823.85, stdev=2464.47
>>      clat percentiles (usec):
>>       |  1.00th=[  804],  5.00th=[  876], 10.00th=[  932], 20.00th=[
>> 988],
>>       | 30.00th=[ 1048], 40.00th=[ 1096], 50.00th=[ 1208], 60.00th=[
>> 2024],
>>       | 70.00th=[ 2224], 80.00th=[ 2352], 90.00th=[ 2576], 95.00th=[
>> 2896],
>>       | 99.00th=[ 4080], 99.50th=[32640], 99.90th=[34048],
>> 99.95th=[34048],
>>       | 99.99th=[34048]
>>      bw (KB  /s): min=  165, max= 1266, per=25.28%, avg=595.00,
>> stdev=450.04
>>      lat (usec) : 750=0.39%, 1000=22.76%
>>      lat (msec) : 2=36.67%, 4=38.88%, 10=0.52%, 20=0.13%, 50=0.52%
>>    cpu          : usr=0.15%, sys=5.69%, ctx=2288, majf=0, minf=0
>>    IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
>> >=64=0.0%
>>       submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>> >=64=0.0%
>>       complete  : 0=0.1%, 4=99.9%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>> >=64=0.0%
>>       issued    : total=r=0/w=769/d=0, short=r=0/w=0/d=0
>>       latency   : target=0, window=0, percentile=100.00%, depth=1
>> randrwrite: (groupid=0, jobs=1): err= 9 (file:engines/sync.c:67,
>> func=xfer, error=Bad file descriptor): pid=10454: Mon Mar 31 18:05:26
>> 2014
>>    write: io=2528.0KB, bw=478497B/s, iops=117, runt=  5410msec
>>      clat (usec): min=299, max=33920, avg=2106.79, stdev=3775.38
>>       lat (usec): min=303, max=33930, avg=2113.60, stdev=3775.68
>>      clat percentiles (usec):
>>       |  1.00th=[  564],  5.00th=[  820], 10.00th=[  932], 20.00th=[
>> 996],
>>       | 30.00th=[ 1064], 40.00th=[ 1160], 50.00th=[ 1256], 60.00th=[
>> 1976],
>>       | 70.00th=[ 2224], 80.00th=[ 2384], 90.00th=[ 2672], 95.00th=[
>> 3056],
>>       | 99.00th=[33024], 99.50th=[33024], 99.90th=[34048],
>> 99.95th=[34048],
>>       | 99.99th=[34048]
>>      bw (KB  /s): min=   46, max=  728, per=18.61%, avg=438.17,
>> stdev=265.37
>>      lat (usec) : 500=0.16%, 750=3.63%, 1000=16.59%
>>      lat (msec) : 2=39.81%, 4=37.28%, 10=0.63%, 20=0.32%, 50=1.42%
>>    cpu          : usr=0.30%, sys=5.03%, ctx=2048, majf=0, minf=2
>>    IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
>> >=64=0.0%
>>       submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>> >=64=0.0%
>>       complete  : 0=0.2%, 4=99.8%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>> >=64=0.0%
>>       issued    : total=r=0/w=633/d=0, short=r=0/w=0/d=0
>>       latency   : target=0, window=0, percentile=100.00%, depth=1
>> randrwrite: (groupid=0, jobs=1): err= 9 (file:engines/sync.c:67,
>> func=xfer, error=Bad file descriptor): pid=10455: Mon Mar 31 18:05:26
>> 2014
>>    write: io=3052.0KB, bw=583286B/s, iops=142, runt=  5358msec
>>      clat (usec): min=541, max=24712, avg=1707.90, stdev=1533.18
>>       lat (usec): min=547, max=24743, avg=1714.89, stdev=1533.64
>>      clat percentiles (usec):
>>       |  1.00th=[  796],  5.00th=[  876], 10.00th=[  908], 20.00th=[
>> 980],
>>       | 30.00th=[ 1048], 40.00th=[ 1128], 50.00th=[ 1240], 60.00th=[
>> 1736],
>>       | 70.00th=[ 2224], 80.00th=[ 2384], 90.00th=[ 2640], 95.00th=[
>> 2896],
>>       | 99.00th=[ 3280], 99.50th=[ 4640], 99.90th=[24704],
>> 99.95th=[24704],
>>       | 99.99th=[24704]
>>      bw (KB  /s): min=  204, max= 1009, per=20.39%, avg=480.00,
>> stdev=283.93
>>      lat (usec) : 750=0.65%, 1000=22.12%
>>      lat (msec) : 2=38.35%, 4=38.09%, 10=0.26%, 50=0.39%
>>    cpu          : usr=0.37%, sys=5.45%, ctx=2261, majf=0, minf=5
>>    IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
>> >=64=0.0%
>>       submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>> >=64=0.0%
>>       complete  : 0=0.1%, 4=99.9%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>> >=64=0.0%
>>       issued    : total=r=0/w=764/d=0, short=r=0/w=0/d=0
>>       latency   : target=0, window=0, percentile=100.00%, depth=1
>>
>> Run status group 0 (all jobs):
>>    WRITE: io=23552KB, aggrb=2354KB/s, minb=467KB/s, maxb=1489KB/s,
>> mint=5358msec, maxt=10001msec
>>
>> Disk stats (read/write):
>>    sda: ios=0/13818, merge=0/2227, ticks=0/7428, in_queue=7260,
>> util=72.37%
>>
>> Is file sharing with non-trivial lockfile= setting is expected to work
>> with threads=1, or is it an unanticipated use case?
>
>
> No, that's supposed to work. I think it's a race between shared files, and
> opening/closing of them when you have a timed based job. I'll take a look at
> it.
>
> --
> Jens Axboe
>


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: Shared files with thread=1: func=xfer, err=Bad file descriptor
  2014-04-01 15:56   ` Andrey Kuzmin
@ 2014-04-01 18:24     ` Jens Axboe
  2014-04-01 18:56       ` Andrey Kuzmin
  0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2014-04-01 18:24 UTC (permalink / raw)
  To: Andrey Kuzmin; +Cc: fio

[-- Attachment #1: Type: text/plain, Size: 1801 bytes --]

On 04/01/2014 09:56 AM, Andrey Kuzmin wrote:
> Thanks. On a separate (or may be related) note, shared files handling
> in dup_files seems to be broken (unless I'm missing the logic behind
> dropping non-shared files). The patch below has fixed it for me.
>
> diff --git a/init.c b/init.c
> index 9b0666d..99038b2 100644
> --- a/init.c
> +++ b/init.c
> @@ -1160,17 +1161,20 @@ static int add_job(struct thread_data *td, const char *j
>                  td_new->o.new_group = 0;
>
>                  if (file_alloced) {
> -                       td_new->files_index = 0;
> -                       td_new->files_size = 0;
>                          if (td_new->files) {
>                                  struct fio_file *f;
>                                  for_each_file(td_new, f, i) {
>                                          if (f->file_name)
> -                                               free(f->file_name);
> -                                       free(f);
> +                                               sfree(f->file_name);
> +                                       sfree(f);
>                                  }
> +                               free(td_new->files);
>                                  td_new->files = NULL;
>                          }
> +                       td_new->files_index = 0;
> +                       td_new->files_size = 0;
>                          if (td_new->o.filename) {
>                                  free(td_new->o.filename);
>                                  td_new->o.filename = NULL;

Thanks, I'll add this, definitely looks like a missing conversion at 
some point...

And while this isn't the bug, I think it's touching in the right area. 
Can you try with this one-liner and see if that fixes your issues with 
EBADF?

-- 
Jens Axboe


[-- Attachment #2: file-close-race-shadow.patch --]
[-- Type: text/x-patch, Size: 315 bytes --]

diff --git a/filesetup.c b/filesetup.c
index db80a8b7e28b..ecbe3a5fb41b 100644
--- a/filesetup.c
+++ b/filesetup.c
@@ -1421,6 +1459,7 @@ void dup_files(struct thread_data *td, struct thread_data *org)
 			assert(0);
 		}
 		__f->fd = -1;
+		__f->shadow_fd = -1;
 		fio_file_reset(td, __f);
 
 		if (f->file_name) {

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: Shared files with thread=1: func=xfer, err=Bad file descriptor
  2014-04-01 18:24     ` Jens Axboe
@ 2014-04-01 18:56       ` Andrey Kuzmin
  2014-04-01 18:59         ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Andrey Kuzmin @ 2014-04-01 18:56 UTC (permalink / raw)
  To: Jens Axboe; +Cc: fio

Seems to have fixed it: EBADF does not show up anymore even in a 120s run.

randrwrite: (groupid=0, jobs=1): err= 0: pid=11373: Tue Apr  1 11:54:39 2014
  write: io=25608KB, bw=218501B/s, iops=53, runt=120011msec
    clat (usec): min=317, max=281422, avg=3839.52, stdev=8501.45
     lat (usec): min=320, max=281427, avg=3847.68, stdev=8501.90
    clat percentiles (usec):
     |  1.00th=[  652],  5.00th=[  844], 10.00th=[  956], 20.00th=[ 1160],
     | 30.00th=[ 1512], 40.00th=[ 2160], 50.00th=[ 2576], 60.00th=[ 2960],
     | 70.00th=[ 3248], 80.00th=[ 3664], 90.00th=[ 4768], 95.00th=[15424],
     | 99.00th=[33024], 99.50th=[33536], 99.90th=[83456], 99.95th=[209920],
     | 99.99th=[280576]
    bw (KB  /s): min=    1, max= 1016, per=24.43%, avg=238.72, stdev=226.57
    lat (usec) : 500=0.05%, 750=2.58%, 1000=9.56%
    lat (msec) : 2=25.02%, 4=47.47%, 10=9.03%, 20=3.86%, 50=2.25%
    lat (msec) : 100=0.11%, 250=0.05%, 500=0.03%
  cpu          : usr=0.24%, sys=3.14%, ctx=20699, majf=0, minf=3
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=6402/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1
randrwrite: (groupid=0, jobs=1): err= 0: pid=11374: Tue Apr  1 11:54:39 2014
  write: io=26880KB, bw=229370B/s, iops=55, runt=120003msec
    clat (usec): min=353, max=309465, avg=3914.37, stdev=9882.36
     lat (usec): min=356, max=309476, avg=3922.20, stdev=9882.87
    clat percentiles (usec):
     |  1.00th=[  636],  5.00th=[  812], 10.00th=[  924], 20.00th=[ 1112],
     | 30.00th=[ 1400], 40.00th=[ 1976], 50.00th=[ 2416], 60.00th=[ 2864],
     | 70.00th=[ 3184], 80.00th=[ 3600], 90.00th=[ 4640], 95.00th=[15424],
     | 99.00th=[33536], 99.50th=[36096], 99.90th=[166912], 99.95th=[232448],
     | 99.99th=[309248]
    bw (KB  /s): min=    2, max= 1208, per=24.62%, avg=240.54, stdev=228.83
    lat (usec) : 500=0.10%, 750=3.15%, 1000=10.70%
    lat (msec) : 2=26.32%, 4=45.31%, 10=8.14%, 20=3.41%, 50=2.68%
    lat (msec) : 100=0.06%, 250=0.07%, 500=0.04%
  cpu          : usr=0.29%, sys=3.12%, ctx=21225, majf=0, minf=1
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=6720/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1
randrwrite: (groupid=0, jobs=1): err= 0: pid=11375: Tue Apr  1 11:54:39 2014
  write: io=30728KB, bw=262194B/s, iops=64, runt=120008msec
    clat (usec): min=305, max=315339, avg=4275.75, stdev=12442.00
     lat (usec): min=309, max=315347, avg=4284.04, stdev=12442.51
    clat percentiles (usec):
     |  1.00th=[  628],  5.00th=[  812], 10.00th=[  924], 20.00th=[ 1112],
     | 30.00th=[ 1384], 40.00th=[ 1960], 50.00th=[ 2384], 60.00th=[ 2832],
     | 70.00th=[ 3184], 80.00th=[ 3600], 90.00th=[ 4640], 95.00th=[16768],
     | 99.00th=[33536], 99.50th=[39680], 99.90th=[232448], 99.95th=[280576],
     | 99.99th=[313344]
    bw (KB  /s): min=    2, max= 1295, per=28.94%, avg=282.72, stdev=271.58
    lat (usec) : 500=0.09%, 750=3.37%, 1000=10.80%
    lat (msec) : 2=26.56%, 4=44.99%, 10=7.39%, 20=3.16%, 50=3.33%
    lat (msec) : 100=0.09%, 250=0.13%, 500=0.08%
  cpu          : usr=0.33%, sys=3.40%, ctx=22837, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=7682/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1
randrwrite: (groupid=0, jobs=1): err= 0: pid=11376: Tue Apr  1 11:54:39 2014
  write: io=34064KB, bw=290616B/s, iops=70, runt=120026msec
    clat (usec): min=351, max=376615, avg=4112.48, stdev=11571.34
     lat (usec): min=355, max=376621, avg=4120.65, stdev=11572.04
    clat percentiles (usec):
     |  1.00th=[  644],  5.00th=[  812], 10.00th=[  924], 20.00th=[ 1096],
     | 30.00th=[ 1352], 40.00th=[ 1864], 50.00th=[ 2384], 60.00th=[ 2768],
     | 70.00th=[ 3120], 80.00th=[ 3504], 90.00th=[ 4576], 95.00th=[16192],
     | 99.00th=[34048], 99.50th=[41728], 99.90th=[193536], 99.95th=[259072],
     | 99.99th=[374784]
    bw (KB  /s): min=    1, max= 1736, per=31.18%, avg=304.63, stdev=317.27
    lat (usec) : 500=0.09%, 750=3.43%, 1000=10.79%
    lat (msec) : 2=27.56%, 4=44.46%, 10=7.21%, 20=2.92%, 50=3.21%
    lat (msec) : 100=0.15%, 250=0.11%, 500=0.07%
  cpu          : usr=0.28%, sys=3.66%, ctx=24223, majf=0, minf=2
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=8516/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: io=117280KB, aggrb=977KB/s, minb=213KB/s, maxb=283KB/s,
mint=120003msec, maxt=120026msec

Disk stats (read/write):
  sda: ios=0/76211, merge=0/17768, ticks=0/99456, in_queue=98572, util=81.61%
Regards,
Andrey


On Tue, Apr 1, 2014 at 11:24 AM, Jens Axboe <axboe@kernel.dk> wrote:
> On 04/01/2014 09:56 AM, Andrey Kuzmin wrote:
>>
>> Thanks. On a separate (or may be related) note, shared files handling
>> in dup_files seems to be broken (unless I'm missing the logic behind
>> dropping non-shared files). The patch below has fixed it for me.
>>
>> diff --git a/init.c b/init.c
>> index 9b0666d..99038b2 100644
>> --- a/init.c
>> +++ b/init.c
>> @@ -1160,17 +1161,20 @@ static int add_job(struct thread_data *td, const
>> char *j
>>                  td_new->o.new_group = 0;
>>
>>                  if (file_alloced) {
>> -                       td_new->files_index = 0;
>> -                       td_new->files_size = 0;
>>                          if (td_new->files) {
>>                                  struct fio_file *f;
>>                                  for_each_file(td_new, f, i) {
>>                                          if (f->file_name)
>> -                                               free(f->file_name);
>> -                                       free(f);
>> +                                               sfree(f->file_name);
>> +                                       sfree(f);
>>                                  }
>> +                               free(td_new->files);
>>                                  td_new->files = NULL;
>>                          }
>> +                       td_new->files_index = 0;
>> +                       td_new->files_size = 0;
>>                          if (td_new->o.filename) {
>>                                  free(td_new->o.filename);
>>                                  td_new->o.filename = NULL;
>
>
> Thanks, I'll add this, definitely looks like a missing conversion at some
> point...
>
> And while this isn't the bug, I think it's touching in the right area. Can
> you try with this one-liner and see if that fixes your issues with EBADF?
>
> --
> Jens Axboe
>


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Shared files with thread=1: func=xfer, err=Bad file descriptor
  2014-04-01 18:56       ` Andrey Kuzmin
@ 2014-04-01 18:59         ` Jens Axboe
  0 siblings, 0 replies; 6+ messages in thread
From: Jens Axboe @ 2014-04-01 18:59 UTC (permalink / raw)
  To: Andrey Kuzmin; +Cc: fio

On 04/01/2014 12:56 PM, Andrey Kuzmin wrote:
> Seems to have fixed it: EBADF does not show up anymore even in a 120s run.

Great, it has been committed. The issue was that fio would mistakenly 
not assign shadow_fd on that one alloc, hence it would think it was fd 0 
and close it. And when open(2) then (validly) started handing out 
untracked fd 0, then all hell broke lose.

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2014-04-01 18:59 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-04-01  1:08 Shared files with thread=1: func=xfer, err=Bad file descriptor Andrey Kuzmin
2014-04-01 14:31 ` Jens Axboe
2014-04-01 15:56   ` Andrey Kuzmin
2014-04-01 18:24     ` Jens Axboe
2014-04-01 18:56       ` Andrey Kuzmin
2014-04-01 18:59         ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox