* [BUG] active zones exceeded error with max_open_zones
@ 2025-04-23 17:11 Sean Anderson
2025-04-24 3:10 ` Damien Le Moal
2025-04-24 6:13 ` Shinichiro Kawasaki
0 siblings, 2 replies; 8+ messages in thread
From: Sean Anderson @ 2025-04-23 17:11 UTC (permalink / raw)
To: Shin'ichiro Kawasaki, Jens Axboe, fio; +Cc: Damien Le Moal
Hi,
I'm getting an "active zones exceeded" error when running fio with
--rw=randwrite mode:
# fio --bs=4k --rw=randwrite --norandommap --fsync=1 --number_ios=16384 --name=flushes --direct=1 --zonemode=zbd --max_open_zones=1978 --filename=/dev/my_zone_dev
flushes: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.39
Starting 1 process
active zones exceeded error, dev my_zone_dev, sector 189520 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
fio: io_u error on file /dev/my_zone_dev: Value too large for defined data type: write offset=97034240, buflen=4096
/dev/my_zone_dev: Exceeded max_active_zones limit. Check conditions of zones out of I/O ranges.
fio: pid=2549, err=75/file:io_u.c:1976, func=io_u error, error=Value too large for defined data type
flushes: (groupid=0, jobs=1): err=75 (file:io_u.c:1976, func=io_u error, error=Value too large for defined data type): pid=2549: Wed Apr 23 17:01:03 2025
write: IOPS=262, BW=1050KiB/s (1075kB/s)(9092KiB/8661msec); 0 zone resets
clat (usec): min=983, max=20564, avg=3645.67, stdev=4347.94
lat (usec): min=984, max=20564, avg=3645.75, stdev=4347.94
clat percentiles (usec):
| 1.00th=[ 996], 5.00th=[ 1012], 10.00th=[ 1029], 20.00th=[ 1418],
| 30.00th=[ 1434], 40.00th=[ 1434], 50.00th=[ 1450], 60.00th=[ 1450],
| 70.00th=[ 1467], 80.00th=[ 5669], 90.00th=[12256], 95.00th=[12780],
| 99.00th=[15008], 99.50th=[15533], 99.90th=[16712], 99.95th=[17171],
| 99.99th=[20579]
bw ( KiB/s): min= 500, max= 1205, per=100.00%, avg=1052.88, stdev=195.04, samples=17
iops : min= 125, max= 301, avg=262.88, stdev=48.79, samples=17
lat (usec) : 1000=1.76%
lat (msec) : 2=74.05%, 4=1.10%, 10=4.75%, 20=18.25%, 50=0.04%
fsync/fdatasync/sync_file_range:
sync (usec): min=50, max=11641, avg=160.03, stdev=798.31
sync percentiles (usec):
| 1.00th=[ 53], 5.00th=[ 57], 10.00th=[ 66], 20.00th=[ 73],
| 30.00th=[ 81], 40.00th=[ 82], 50.00th=[ 83], 60.00th=[ 84],
| 70.00th=[ 85], 80.00th=[ 87], 90.00th=[ 178], 95.00th=[ 208],
| 99.00th=[ 603], 99.50th=[ 1549], 99.90th=[11600], 99.95th=[11600],
| 99.99th=[11600]
cpu : usr=0.00%, sys=49.31%, ctx=2823, majf=0, minf=181
IO depths : 1=200.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,2274,0,2273 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=1050KiB/s (1075kB/s), 1050KiB/s-1050KiB/s (1075kB/s-1075kB/s), io=9092KiB (9310kB), run=8661-8661msec
Disk stats (read/write):
my_zone_dev: ios=170/4498, sectors=1336/17992, merge=0/0, ticks=0/118, in_queue=230, util=47.80%
The issue seems to be that fio writes to a bunch of zones but never
finishes them because they're not full yet:
# blkzone report -c 16 /dev/my_block_dev
start: 0x000000000, len 0x000020, cap 0x00001f, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x000000020, len 0x000020, cap 0x00001f, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x000000040, len 0x000020, cap 0x00001f, wptr 0x000008 reset:0 non-seq:0, zcond: 4(cl) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x000000060, len 0x000020, cap 0x00001f, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x000000080, len 0x000020, cap 0x00001f, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x0000000a0, len 0x000020, cap 0x00001f, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x0000000c0, len 0x000020, cap 0x00001f, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x0000000e0, len 0x000020, cap 0x00001f, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x000000100, len 0x000020, cap 0x00001f, wptr 0x000008 reset:0 non-seq:0, zcond: 4(cl) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x000000120, len 0x000020, cap 0x00001f, wptr 0x000008 reset:0 non-seq:0, zcond: 4(cl) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x000000140, len 0x000020, cap 0x00001f, wptr 0x000008 reset:0 non-seq:0, zcond: 4(cl) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x000000160, len 0x000020, cap 0x00001f, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x000000180, len 0x000020, cap 0x00001f, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x0000001a0, len 0x000020, cap 0x00001f, wptr 0x000010 reset:0 non-seq:0, zcond: 4(cl) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x0000001c0, len 0x000020, cap 0x00001f, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
start: 0x0000001e0, len 0x000020, cap 0x00001f, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
This issue doesn't seem to occur with --rw=write because sequential
writes fill up zones and they get finished automatically.
--Sean
* Re: [BUG] active zones exceeded error with max_open_zones
From: Damien Le Moal @ 2025-04-24 3:10 UTC
To: Sean Anderson, Shin'ichiro Kawasaki, Jens Axboe, fio

On 4/24/25 02:11, Sean Anderson wrote:
> I'm getting an "active zones exceeded" error when running fio with
> --rw=randwrite mode:
>
> # fio --bs=4k --rw=randwrite --norandommap --fsync=1 --number_ios=16384 --name=flushes --direct=1 --zonemode=zbd --max_open_zones=1978 --filename=/dev/my_zone_dev

--max_open_zones=1978 is an extremely large value that likely exceeds the
drive's capabilities, which is what fio is telling you. What are your
drive's maximum open and active zone limits?

cat /sys/block/my_zone_dev/queue/max_open_zones
cat /sys/block/my_zone_dev/queue/max_active_zones

fio uses the min_not_zero of these two values as the maximum number of
zones that can be written simultaneously. In particular, if your drive has
an active zone limit, you *cannot* write to more zones than that limit at
the same time.

fio defaults to max_open_zones=min_not_zero(drive max open, drive max
active) and, for a random write workload, it will:
- pick zones randomly, up to max_open_zones
- direct write I/Os to a randomly chosen zone in the current set of open
  zones and, when an open zone becomes full, randomly pick another zone to
  replace it

For your workload, if you want to measure the maximum "random" write
performance of your disk, simply do NOT specify --max_open_zones=. fio
will pick the best possible number for you.
-- 
Damien Le Moal
Western Digital Research
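The zone-selection behavior Damien describes above can be sketched in a few
lines of Python. This is an illustrative model only, not fio's actual zbd
code; the function names `min_not_zero` (mirroring the kernel helper he
references) and `pick_write_zone` are made up for this sketch, and
full-zone replacement is omitted for brevity.

```python
import random

def min_not_zero(a, b):
    """Treat 0 as 'no limit', as the kernel's min_not_zero() does."""
    if a == 0:
        return b
    if b == 0:
        return a
    return min(a, b)

def pick_write_zone(open_zones, all_zones, max_open, rng=random):
    """Pick the target zone for the next random write.

    While under the limit, open a new randomly chosen zone; once the
    limit is reached, all further writes go to the existing open set.
    """
    if len(open_zones) < max_open:
        candidates = [z for z in all_zones if z not in open_zones]
        if candidates:
            open_zones.add(rng.choice(candidates))
    return rng.choice(sorted(open_zones))

# Example: a drive reporting max_open=14 and max_active=0 (no limit).
max_open = min_not_zero(14, 0)
open_zones = set()
for _ in range(1000):
    zone = pick_write_zone(open_zones, range(64), max_open)

# The tracked open set never exceeds the limit.
assert len(open_zones) <= max_open
```

The key property is that writes are only ever directed at the tracked open
set, so the device-side active-zone count should stay bounded by the limit.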
* Re: [BUG] active zones exceeded error with max_open_zones
From: Sean Anderson @ 2025-04-24 5:27 UTC
To: Damien Le Moal, Shin'ichiro Kawasaki, Jens Axboe, fio

On 4/23/25 23:10, Damien Le Moal wrote:
> --max_open_zones=1978 is an extremely large value that likely exceeds the
> drive's capabilities, which is what fio is telling you. What are your
> drive's maximum open and active zone limits?
>
> cat /sys/block/my_zone_dev/queue/max_open_zones
> cat /sys/block/my_zone_dev/queue/max_active_zones

This is correct for the drive.

> fio defaults to max_open_zones=min_not_zero(drive max open, drive max
> active) and, for a random write workload, it will:
> - pick zones randomly, up to max_open_zones
> - direct write I/Os to a randomly chosen zone in the current set of open
>   zones and, when an open zone becomes full, randomly pick another zone
>   to replace it

Well, the issue is that it appears to just pick random zones, not random
open zones.

> For your workload, if you want to measure the maximum "random" write
> performance of your disk, simply do NOT specify --max_open_zones=. fio
> will pick the best possible number for you.

Same issue.
--Sean
* Re: [BUG] active zones exceeded error with max_open_zones
From: Damien Le Moal @ 2025-04-24 5:40 UTC
To: Sean Anderson, Shin'ichiro Kawasaki, Jens Axboe, fio

On 4/24/25 14:27, Sean Anderson wrote:
> This is correct for the drive.

I am sure it is. My point is that you cannot use the --max_open_zones fio
option with a value larger than those sysfs attribute values.

> Well, the issue is that it appears to just pick random zones, not random
> open zones.

The drive may be completely empty, all zones empty, so there are no open
zones to choose from. In that case fio picks a random zone and adds it to
the set of open zones it tracks. It repeats that until the set of open
zones reaches the limit, at which point fio has no choice but to keep
writing to these open zones until they are full.

If the drive already has open zones in the fio workload range, these zones
are added to the set of open zones when fio starts.

> Same issue.

Are you specifying an offset+size range? If yes, how many zones are open
in that range and outside of it?

What drive is it? Looking at the blkzone report you sent, the zones look
ridiculously small (32 sectors...). Is this a null_blk device? If that is
the case, try creating the drive with larger zones. You may be hitting a
bug with such tiny zones. Since such drives do not exist in the field, we
never really tested such an extreme configuration.

-- 
Damien Le Moal
Western Digital Research
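For reference, null_blk can be loaded with larger zones via module
parameters. A sketch, assuming a recent kernel; the parameter names follow
Documentation/block/null_blk.rst and should be verified against your kernel
version before use:

```shell
# Sketch: recreate the device with 64 MiB zones instead of tiny ones.
# zone_size and zone_capacity are in MiB; zone_capacity=0 means "same as size".
modprobe -r null_blk
modprobe null_blk nr_devices=1 zoned=1 zone_size=64 zone_capacity=62 \
        zone_max_open=14 zone_max_active=14

# Confirm the geometry and limits the block layer sees.
blkzone report -c 4 /dev/nullb0
cat /sys/block/nullb0/queue/max_open_zones
cat /sys/block/nullb0/queue/max_active_zones
```

If the error disappears with larger zones, that would support the theory
that the tiny-zone geometry is exposing an untested corner case.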
* Re: [BUG] active zones exceeded error with max_open_zones
From: Sean Anderson @ 2025-04-24 5:53 UTC
To: Damien Le Moal, Shin'ichiro Kawasaki, Jens Axboe, fio

On 4/24/25 01:40, Damien Le Moal wrote:
> I am sure it is. My point is that you cannot use the --max_open_zones fio
> option with a value larger than those sysfs attribute values.

As I said:

# cat /sys/block/my_zone_dev/queue/max_open_zones
1978
# cat /sys/block/my_zone_dev/queue/max_active_zones
1978

> The drive may be completely empty, all zones empty, so there are no open
> zones to choose from. In that case fio picks a random zone and adds it to
> the set of open zones it tracks. It repeats that until the set of open
> zones reaches the limit, at which point fio has no choice but to keep
> writing to these open zones until they are full.
>
> If the drive already has open zones in the fio workload range, these
> zones are added to the set of open zones when fio starts.

But it's not doing this.

> Are you specifying an offset+size range? If yes, how many zones are open
> in that range and outside of it?

The full command line and output are in my original email. I ran `blkzone
reset` before this run. If I don't do a reset first, it fails almost
immediately.

> What drive is it? Looking at the blkzone report you sent, the zones look
> ridiculously small (32 sectors...). Is this a null_blk device?

Something I'm working on. I'm not done testing yet (which is why I was
messing around with fio). You can probably guess what it is based on the
geometry.

I'm actually thinking of creating a device mapper driver (layer?) to
combine the zones, because a lot of other layers expect larger zones; e.g.
btrfs expects 4M zones. I'm not sure whether it should go under linear or
not.

But to be honest, this is a bit strange to me. Ideally, filesystems should
take advantage of smaller zones because they more closely approximate
conventional zones, and if they need to store larger structures they
should just store them in multiple zones.

> If that is the case, try creating the drive with larger zones. You may
> be hitting a bug with such tiny zones. Since such drives do not exist in
> the field, we never really tested such an extreme configuration.

The hardware I am targeting naturally has small zones, so I am interested
in finding/fixing these sorts of bugs.

--Sean
* Re: [BUG] active zones exceeded error with max_open_zones
From: Shinichiro Kawasaki @ 2025-04-24 6:13 UTC
To: Sean Anderson; +Cc: Jens Axboe, fio@vger.kernel.org, Damien Le Moal

On Apr 23, 2025 / 13:11, Sean Anderson wrote:
> I'm getting an "active zones exceeded" error when running fio with
> --rw=randwrite mode:
>
> # fio --bs=4k --rw=randwrite --norandommap --fsync=1 --number_ios=16384 --name=flushes --direct=1 --zonemode=zbd --max_open_zones=1978 --filename=/dev/my_zone_dev
> flushes: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1

Hi,

The block size is 4k, and according to the blkzone report the gap between
the zone size and the zone capacity is 512 bytes, so I guess fio cannot
fill each zone to capacity with 4k writes. It looks likely that this
misalignment between the zone capacity and the block size leaves small
unwritten remainders in many zones; those zones are kept open, and the
device exceeds its max active zones limit. Based on this guess, I suggest
trying a 512-byte block size, which divides the capacity evenly.

> fio-3.39

Recently I contributed a fix which handles the case where many small
remainders are left by a random write workload to zoned block devices [1].
The fix was merged after fio version 3.39. I also suggest trying the
latest fio code with a 4k block size.

[1] https://github.com/axboe/fio/commit/e2e29bf6f8300186d267fa46a7b266d14d174575
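The arithmetic behind that guess can be checked quickly, using the geometry
from the blkzone report (zone length 0x20 sectors, capacity 0x1f sectors,
512-byte sectors):

```python
SECTOR = 512
zone_cap_bytes = 0x1f * SECTOR   # 31 sectors = 15872 bytes usable per zone

# With 4 KiB writes, the capacity cannot be filled exactly:
full_4k_writes = zone_cap_bytes // 4096   # 3 complete 4k writes fit
leftover_4k = zone_cap_bytes % 4096       # bytes 4k writes can never cover

# With 512 B writes, the capacity divides evenly, so zones can be filled:
leftover_512 = zone_cap_bytes % 512

print(full_4k_writes, leftover_4k, leftover_512)  # → 3 3584 0
```

With 4k writes, every zone would retain a 3584-byte (7-sector) tail that
can never be written, so no randomly written zone ever reaches the full
condition and each one stays counted against the active-zone limit.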
* Re: [BUG] active zones exceeded error with max_open_zones
From: Sean Anderson @ 2025-04-24 14:01 UTC
To: Shinichiro Kawasaki; +Cc: Jens Axboe, fio@vger.kernel.org, Damien Le Moal

On 4/24/25 02:13, Shinichiro Kawasaki wrote:
> The block size is 4k, and according to the blkzone report the gap between
> the zone size and the zone capacity is 512 bytes, so I guess fio cannot
> fill each zone to capacity with 4k writes. It looks likely that this
> misalignment between the zone capacity and the block size leaves small
> unwritten remainders in many zones; those zones are kept open, and the
> device exceeds its max active zones limit. Based on this guess, I suggest
> trying a 512-byte block size, which divides the capacity evenly.

Same issue. And fio seems to be capable of working with the gap in
--rw=write mode.

> Recently I contributed a fix which handles the case where many small
> remainders are left by a random write workload to zoned block devices
> [1]. The fix was merged after fio version 3.39. I also suggest trying
> the latest fio code with a 4k block size.
>
> [1] https://github.com/axboe/fio/commit/e2e29bf6f8300186d267fa46a7b266d14d174575

Yeah, I saw that, but it didn't appear to be related to this issue. I will
try it out tonight.

--Sean
* Re: [BUG] active zones exceeded error with max_open_zones
From: Sean Anderson @ 2025-04-25 4:14 UTC
To: Shinichiro Kawasaki; +Cc: Jens Axboe, fio@vger.kernel.org, Damien Le Moal

On 4/24/25 10:01, Sean Anderson wrote:
>> Recently I contributed a fix which handles the case where many small
>> remainders are left by a random write workload to zoned block devices
>> [1]. The fix was merged after fio version 3.39. I also suggest trying
>> the latest fio code with a 4k block size.
>>
>> [1] https://github.com/axboe/fio/commit/e2e29bf6f8300186d267fa46a7b266d14d174575
>
> Yeah, I saw that, but it didn't appear to be related to this issue. I
> will try it out tonight.

Same issue. I will look into this more this weekend.

--Sean