From: Thomas Glanzmann <thomas@glanzmann.de>
To: fio@vger.kernel.org
Subject: Evenly distribute jobs and iodepth over a 1 TiB device so that every byte is written to in parallel
Date: Tue, 15 Jul 2025 07:17:45 +0200 [thread overview]
Message-ID: <aHXkeSsuqlIj2R64@glanzmann.de> (raw)
Hello,
I have a 1 TiB NVMe namespace from a NetApp connected via two distinct
direct links to a Linux system over NVMe/TCP. I would like to generate
read and write I/O using multiple jobs/iodepth so that every byte of the
device is being written to in parallel with the maximum number of
available parallel inflight I/Os. The NetApp does deduplication
and compression by default, so I want to generate random data: I suspect
that if I don't use refill_buffers, the NetApp will notice that the same
buffer contents are reused over and over again and dedup them. I tried:
fio --ioengine=libaio --refill_buffers --filesize=25G --ramp_time=2s \
--runtime=1m --numjobs=40 --direct=1 --verify=0 --randrepeat=0 \
--group_reporting --filename=/dev/nvme0n1 --name=1mhqd --blocksize=1m \
--iodepth=1638 --readwrite=write
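As a quick sanity check of what that command asks for: the total number of potentially in-flight I/Os is numjobs times iodepth, and at a 1 MiB block size each job's in-flight window alone is sizable. A small sketch with the numbers from the command above:

```python
# In-flight I/O budget implied by the fio command above.
numjobs = 40
iodepth = 1638
blocksize_mib = 1

total_inflight = numjobs * iodepth
print(total_inflight)  # 65520 I/Os potentially in flight at once

# Per-job in-flight data at 1 MiB blocks, in GiB:
per_job_gib = iodepth * blocksize_mib / 1024
print(round(per_job_gib, 1))  # ≈ 1.6 GiB in flight per job
```

Whether the device and transport can actually sustain that many outstanding requests is a separate question (see the iostat aqu-sz numbers further down).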
I'm on a Linux system with 40 hyperthreads and a Mellanox 2x 25
Gbit/s card hooked up to the NetApp:
(live) [~] ip -br a s
...
eth6 UP 192.168.0.100/24
eth7 UP 192.168.1.100/24
(live) [~] nvme list-subsys /dev/nvme0n1
nvme-subsys0 - NQN=nqn.1992-08.com.netapp:sn.e0a0273a60b711f09deed039ead647e8:subsystem.svm1_subsystem_553
hostnqn=nqn.2014-08.org.nvmexpress:uuid:20f011e6-9ab8-584f-abb0-a260d2d685c4
\
+- nvme0 tcp traddr=192.168.0.2,trsvcid=4420,src_addr=192.168.0.100 live optimized
+- nvme1 tcp traddr=192.168.1.2,trsvcid=4420,src_addr=192.168.1.100 live optimized
na2501::*> network interface show
Logical Status Network Current Current Is
Vserver Interface Admin/Oper Address/Mask Node Port Home
----------- ---------- ---------- ------------------ ------------- ------- ----
...
svm1
lif_svm1_2660 up/up 192.168.1.2/24 na2501-02 e4c true
lif_svm1_9354 up/up 192.168.0.2/24 na2501-01 e4c true
So when I run the above command, the NetApp only reports a few hundred GiB of
physically allocated space:
na2501::*> aggr show -fields physical-used
aggregate physical-used
-------------- -------------
dataFA_4_p0_i1 169.5GB
So, I ran:
(live) [~] pv < /dev/urandom > /dev/nvme0n1
1.00TiB 0:59:14 [ 294MiB/s] [======================>] 100%
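The pv numbers above are internally consistent; a quick cross-check of the average rate (figures taken from the pv line, nothing else assumed):

```python
# Cross-check the pv run: 1 TiB written in 59 min 14 s.
total_mib = 1024 * 1024        # 1 TiB in MiB
seconds = 59 * 60 + 14         # 0:59:14 -> 3554 s

rate_mib_s = total_mib / seconds
print(round(rate_mib_s))       # ≈ 295 MiB/s, matching pv's ~294 MiB/s readout
```

/dev/urandom is also a likely bottleneck here, which is one reason an fio-based approach with refill_buffers is attractive.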
And afterwards more physical space was used:
na2501::*> aggr show -fields physical-used
aggregate physical-used
-------------- -------------
dataFA_4_p0_i1 1.15TB
So, what is the best way to use fio to write random data to every byte of this
1 TiB device in parallel?
- Is there a command line parameter?
- Or should I create 40 25.6 GiB (1024/40) partitions and give them as
colon separated list to fio?
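One way to split the device evenly across jobs without creating partitions is fio's offset_increment option: each job then works on its own non-overlapping region of the raw device. Below is a sketch that generates such a jobfile; the device path, block size, and per-job iodepth are illustrative assumptions, not tested against this setup. Note that 1 TiB does not divide evenly by 40, so the last ~16 MiB of the device is left unwritten by this layout.

```python
# Sketch: generate a fio jobfile that splits a 1 TiB device into 40
# equal, non-overlapping regions via fio's offset_increment option.
NUMJOBS = 40
DEV_SIZE_MIB = 1024 * 1024              # 1 TiB in MiB
PER_JOB_MIB = DEV_SIZE_MIB // NUMJOBS   # 26214 MiB (~25.6 GiB) per job

jobfile = f"""[global]
ioengine=libaio
direct=1
filename=/dev/nvme0n1
blocksize=1m
iodepth=32
rw=write
refill_buffers
group_reporting

[fill]
numjobs={NUMJOBS}
size={PER_JOB_MIB}m
offset_increment={PER_JOB_MIB}m
"""
print(jobfile)
```

fio computes each job's real offset as offset + offset_increment * thread_number, so job N writes [N * 26214 MiB, (N+1) * 26214 MiB) and the whole device (minus the remainder) is filled in parallel.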
I would also like to determine the number of queues and the queue depth.
Is there a command available for that? When I run:
fio --ioengine=libaio --refill_buffers --filesize=8G --ramp_time=2s \
--runtime=1m --numjobs=40 --direct=1 --verify=0 --randrepeat=0 \
--group_reporting --filename=/dev/nvme0n1 --name=4khqd --blocksize=4k \
--iodepth=1638 --readwrite=randwrite
and also watch 'iostat -xm 2', I can see that aqu-sz is 194.87 per path and
391.96 for the multipathed device nvme0n1. So I have a rough idea, but I
would like a command on Linux that shows me the available queues and
queue depths.
avg-cpu: %user %nice %system %iowait %steal %idle
1.67 0.00 6.48 85.13 0.00 6.72
Device r/s rMB/s rrqm/s %rrqm r_await rareq-sz w/s wMB/s wrqm/s %wrqm w_await wareq-sz d/s dMB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
nvme0c0n1 0.00 0.00 0.00 0.00 0.00 0.00 60855.00 237.71 0.00 0.00 3.20 4.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 194.87 100.00
nvme0c1n1 0.00 0.00 0.00 0.00 0.00 0.00 58719.00 229.37 0.00 0.00 3.32 4.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 194.94 100.00
nvme0n1 0.00 0.00 0.00 0.00 0.00 0.00 119570.50 467.07 0.00 0.00 3.28 4.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 391.96 100.00
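The aqu-sz column iostat reports is consistent with Little's law (average queue size equals arrival rate times mean wait time), which can be verified from the other columns of the nvme0n1 row above:

```python
# Little's law sanity check against the iostat sample above:
# average queue size = arrival rate x mean wait time.
iops = 119570.5            # w/s for nvme0n1
w_await_ms = 3.28          # average write latency in milliseconds

aqu_sz = iops * (w_await_ms / 1000.0)
print(round(aqu_sz, 1))    # ≈ 392, matching iostat's reported aqu-sz of 391.96
```

So the observed ~392 outstanding requests on the multipathed device reflect the effective queue depth the transport is sustaining, not the 65520 that numjobs * iodepth nominally requests.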
Cheers,
Thomas
Thread overview: 3+ messages
2025-07-15 5:17 Thomas Glanzmann [this message]
2025-07-15 20:44 ` Sitsofe Wheeler
2025-07-17  7:52   ` Thomas Glanzmann