From: Thomas Glanzmann <thomas@glanzmann.de>
To: fio@vger.kernel.org
Subject: Evenly distribute jobs and iodepth over a 1 TiB device so that every byte is written to in parallel
Date: Tue, 15 Jul 2025 07:17:45 +0200 [thread overview]
Message-ID: <aHXkeSsuqlIj2R64@glanzmann.de> (raw)
Hello,
I have a 1 TiB NVMe namespace from a NetApp connected via two distinct
direct links to a Linux system over NVMe/TCP. I would like to generate
read and write I/O using multiple jobs/iodepth so that every byte of the
device is being written to in parallel with the maximum number of
available parallel inflight I/Os. The NetApp does deduplication
and compression by default, so I want to generate random data: I suspect
that if I don't use refill_buffers, the NetApp will notice that the same
buffer contents are reused over and over again and dedup them. I tried:
fio --ioengine=libaio --refill_buffers --filesize=25G --ramp_time=2s \
--runtime=1m --numjobs=40 --direct=1 --verify=0 --randrepeat=0 \
--group_reporting --filename=/dev/nvme0n1 --name=1mhqd --blocksize=1m \
--iodepth=1638 --readwrite=write
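As a quick sanity check of what that command asks for: the total number of potentially in-flight I/Os is numjobs times iodepth, and at a 1 MiB block size each job's in-flight window alone is sizable. A small sketch with the numbers from the command above:

```python
# In-flight I/O budget implied by the fio command above.
numjobs = 40
iodepth = 1638
blocksize_mib = 1

total_inflight = numjobs * iodepth
print(total_inflight)  # 65520 I/Os potentially in flight at once

# Per-job in-flight data at 1 MiB blocks, in GiB:
per_job_gib = iodepth * blocksize_mib / 1024
print(round(per_job_gib, 1))  # ≈ 1.6 GiB in flight per job
```

Whether the device and transport can actually sustain that many outstanding requests is a separate question (see the iostat aqu-sz numbers further down).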
I'm on a Linux system with 40 hyperthreads and a Mellanox 2x 25
Gbit/s card hooked up to the NetApp:
(live) [~] ip -br a s
...
eth6 UP 192.168.0.100/24
eth7 UP 192.168.1.100/24
(live) [~] nvme list-subsys /dev/nvme0n1
nvme-subsys0 - NQN=nqn.1992-08.com.netapp:sn.e0a0273a60b711f09deed039ead647e8:subsystem.svm1_subsystem_553
hostnqn=nqn.2014-08.org.nvmexpress:uuid:20f011e6-9ab8-584f-abb0-a260d2d685c4
\
+- nvme0 tcp traddr=192.168.0.2,trsvcid=4420,src_addr=192.168.0.100 live optimized
+- nvme1 tcp traddr=192.168.1.2,trsvcid=4420,src_addr=192.168.1.100 live optimized
na2501::*> network interface show
Logical Status Network Current Current Is
Vserver Interface Admin/Oper Address/Mask Node Port Home
----------- ---------- ---------- ------------------ ------------- ------- ----
...
svm1
lif_svm1_2660 up/up 192.168.1.2/24 na2501-02 e4c true
lif_svm1_9354 up/up 192.168.0.2/24 na2501-01 e4c true
So when I run the above command, the NetApp only reports a few hundred GiB of
physically allocated space:
na2501::*> aggr show -fields physical-used
aggregate physical-used
-------------- -------------
dataFA_4_p0_i1 169.5GB
So, I ran:
(live) [~] pv < /dev/urandom > /dev/nvme0n1
1.00TiB 0:59:14 [ 294MiB/s] [======================>] 100%
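The pv numbers above are internally consistent; a quick cross-check of the average rate (figures taken from the pv line, nothing else assumed):

```python
# Cross-check the pv run: 1 TiB written in 59 min 14 s.
total_mib = 1024 * 1024        # 1 TiB in MiB
seconds = 59 * 60 + 14         # 0:59:14 -> 3554 s

rate_mib_s = total_mib / seconds
print(round(rate_mib_s))       # ≈ 295 MiB/s, matching pv's ~294 MiB/s readout
```

/dev/urandom is also a likely bottleneck here, which is one reason an fio-based approach with refill_buffers is attractive.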
And afterwards more physical space was used:
na2501::*> aggr show -fields physical-used
aggregate physical-used
-------------- -------------
dataFA_4_p0_i1 1.15TB
So, what is the best way to use fio to write random data to every byte of this
1 TiB device in parallel?
- Is there a command line parameter?
- Or should I create 40 25.6 GiB (1024/40) partitions and give them as
colon separated list to fio?
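One way to split the device evenly across jobs without creating partitions is fio's offset_increment option: each job then works on its own non-overlapping region of the raw device. Below is a sketch that generates such a jobfile; the device path, block size, and per-job iodepth are illustrative assumptions, not tested against this setup. Note that 1 TiB does not divide evenly by 40, so the last ~16 MiB of the device is left unwritten by this layout.

```python
# Sketch: generate a fio jobfile that splits a 1 TiB device into 40
# equal, non-overlapping regions via fio's offset_increment option.
NUMJOBS = 40
DEV_SIZE_MIB = 1024 * 1024              # 1 TiB in MiB
PER_JOB_MIB = DEV_SIZE_MIB // NUMJOBS   # 26214 MiB (~25.6 GiB) per job

jobfile = f"""[global]
ioengine=libaio
direct=1
filename=/dev/nvme0n1
blocksize=1m
iodepth=32
rw=write
refill_buffers
group_reporting

[fill]
numjobs={NUMJOBS}
size={PER_JOB_MIB}m
offset_increment={PER_JOB_MIB}m
"""
print(jobfile)
```

fio computes each job's real offset as offset + offset_increment * thread_number, so job N writes [N * 26214 MiB, (N+1) * 26214 MiB) and the whole device (minus the remainder) is filled in parallel.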
I would also like to determine the number of queues and the queue depth.
Is there a command available for that? When I run:
fio --ioengine=libaio --refill_buffers --filesize=8G --ramp_time=2s \
--runtime=1m --numjobs=40 --direct=1 --verify=0 --randrepeat=0 \
--group_reporting --filename=/dev/nvme0n1 --name=4khqd --blocksize=4k \
--iodepth=1638 --readwrite=randwrite
and also watch 'iostat -xm 2', I can see that aqu-sz is 194.87 per path and
391.96 for the multipathed device nvme0n1. So I have a rough idea, but I
would like a command on Linux that shows me the available queues and
queue depths.
avg-cpu: %user %nice %system %iowait %steal %idle
1.67 0.00 6.48 85.13 0.00 6.72
Device r/s rMB/s rrqm/s %rrqm r_await rareq-sz w/s wMB/s wrqm/s %wrqm w_await wareq-sz d/s dMB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
nvme0c0n1 0.00 0.00 0.00 0.00 0.00 0.00 60855.00 237.71 0.00 0.00 3.20 4.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 194.87 100.00
nvme0c1n1 0.00 0.00 0.00 0.00 0.00 0.00 58719.00 229.37 0.00 0.00 3.32 4.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 194.94 100.00
nvme0n1 0.00 0.00 0.00 0.00 0.00 0.00 119570.50 467.07 0.00 0.00 3.28 4.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 391.96 100.00
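The aqu-sz column iostat reports is consistent with Little's law (average queue size equals arrival rate times mean wait time), which can be verified from the other columns of the nvme0n1 row above:

```python
# Little's law sanity check against the iostat sample above:
# average queue size = arrival rate x mean wait time.
iops = 119570.5            # w/s for nvme0n1
w_await_ms = 3.28          # average write latency in milliseconds

aqu_sz = iops * (w_await_ms / 1000.0)
print(round(aqu_sz, 1))    # ≈ 392, matching iostat's reported aqu-sz of 391.96
```

So the observed ~392 outstanding requests on the multipathed device reflect the effective queue depth the transport is sustaining, not the 65520 that numjobs * iodepth nominally requests.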
Cheers,
Thomas
Thread overview: 3+ messages
2025-07-15 5:17 Thomas Glanzmann [this message]
2025-07-15 20:44 ` Sitsofe Wheeler
2025-07-17  7:52   ` Thomas Glanzmann