From: Thomas Glanzmann <thomas@glanzmann.de>
To: linux-nfs@vger.kernel.org
Subject: NFS 4 Trunking load balancing and failover
Date: Wed, 22 Jan 2025 05:46:00 +0100
Message-ID: <Z5B4CPZlNgobvwxu@glanzmann.de>
Hello,
we tried to use nconnect and trunking to access a NetApp NFS filer from a
Debian system running Linux kernel 6.12.9, using the following commands:
root@debian-08:~# mount -o nconnect=8,max_connect=16 10.0.10.48:/vol41 /mnt
root@debian-08:~# mount -o nconnect=8,max_connect=16 10.0.20.48:/vol41 /mnt
root@debian-08:~# netstat -an | grep 2049
tcp 0 0 10.0.10.28:834 10.0.10.48:2049 ESTABLISHED
tcp 0 0 10.0.10.28:826 10.0.10.48:2049 ESTABLISHED
tcp 0 0 10.0.10.28:951 10.0.10.48:2049 ESTABLISHED
tcp 0 0 10.0.10.28:707 10.0.10.48:2049 ESTABLISHED
tcp 0 0 10.0.10.28:853 10.0.10.48:2049 ESTABLISHED
tcp 0 0 10.0.10.28:914 10.0.10.48:2049 ESTABLISHED
tcp 0 0 10.0.20.28:862 10.0.20.48:2049 ESTABLISHED
tcp 0 0 10.0.20.28:771 10.0.20.48:2049 TIME_WAIT
tcp 0 0 10.0.10.28:844 10.0.10.48:2049 ESTABLISHED
tcp 0 0 10.0.10.28:980 10.0.10.48:2049 ESTABLISHED
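Counted per server address (just a quick one-liner over the same netstat
output), the picture is:

netstat -an | grep ':2049 ' | grep ESTABLISHED | awk '{print $5}' | sort | uniq -c
      8 10.0.10.48:2049
      1 10.0.20.48:2049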
On the NetApp you can see that the traffic is unevenly distributed across the
two links:
n2 : 1/22/2025 04:38:58
                          Recv Recv Data   Recv   Sent Sent Data   Sent Current
LIF  Vserver           Packet     (Bps) Errors Packet     (Bps) Errors Port
---- ----------------- ------ --------- ------ ------ --------- ------ -------
nfs1 frontend-08-nfs41  26865 905471818      0  13786   1599403      0 e0e-10
nfs2 frontend-08-nfs41   3952 114124809      0   1737    201578      0 e0f-20
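The imbalance should also be visible from the client side: the kernel NFS
client exposes one 'xprt:' line per TCP connection in /proc/self/mountstats,
each with per-connection RPC counters, so something along these lines gives a
rough idea of how the requests are spread over the individual connections (the
exact field layout of the xprt: lines depends on the kernel version, so take
this only as a sketch):

grep -E '^device |xprt:' /proc/self/mountstats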
While that works, we noticed that eight TCP connections are established to the
first IP address but only one to the second. When generating load we can see
that the majority of the NFS traffic goes to the first IP. Is there a way to
have more TCP connections established to the second IP?
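(One idea we have not tested, so take it only as a guess: it could be that
nconnect only applies to the server address of the initial mount and a trunked
address added by a later mount always gets a single connection. Mounting in the
opposite order, i.e.

mount -o nconnect=8,max_connect=16 10.0.20.48:/vol41 /mnt
mount -o nconnect=8,max_connect=16 10.0.10.48:/vol41 /mnt

should at least show whether the asymmetry just follows the mount order.)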
We also noticed that when we take the first server IP down, the NFS session
stalls. We had hoped that the NFS client would transparently fail over to the
second IP address. Is that planned for the future?
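(For anyone reproducing the failover test: one way to take the first server IP
down is to set the corresponding LIF administratively down on the NetApp,
roughly like

network interface modify -vserver frontend-08-nfs41 -lif nfs1 -status-admin down

though the exact syntax may vary with the ONTAP version.)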
I also tried the above with the VMware ESXi hypervisor, on the most recent
version (8.0 Update 3c). There the traffic is distributed equally across the
two links, and when one of the two links is taken down, I/O continues.
Our setup: We have a NetApp AFF A150. The controllers of the AFF A150 are
connected to a Linux VM with two 10 Gbit/s links, and the VM likewise has two
dedicated 10 Gbit/s links. To keep the traffic on separate paths we use two
VLANs, so we end up with two dedicated 10 Gbit/s links between the Linux VM
and the NetApp.
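To illustrate the layout on the Linux VM side, the two paths look roughly like
this (the interface names and VLAN IDs are only placeholders, not our exact
configuration; the addresses are the ones from the netstat output above):

ip link add link eth0 name eth0.10 type vlan id 10
ip addr add 10.0.10.28/24 dev eth0.10
ip link set eth0.10 up
ip link add link eth1 name eth1.20 type vlan id 20
ip addr add 10.0.20.28/24 dev eth1.20
ip link set eth1.20 up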
We also noticed that we get the best performance from Linux to the NetApp
filer using the following mount options over a single path:
-o vers=3,nconnect=16
With that setup we get 150k 4k random IOPS at a queue depth of 256
(4 threads with a queue depth of 64 each). This maxes out a 10 Gbit/s link,
and with 4k random I/Os it also maxes out the CPU of our NetApp controller.
The disks (16 4 TB SSDs) are 25 - 50% busy.
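In other words, the single-path numbers above come from a mount along these
lines (same export as before, over either one of the two addresses):

mount -o vers=3,nconnect=16 10.0.10.48:/vol41 /mnt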
We used the following fio commands to generate load (nproc = 4).
# high queue depth:
fio --ioengine=libaio --filesize=2G --ramp_time=2s --runtime=1m --numjobs=$(nproc) --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=write --blocksize=1m --iodepth=64 --readwrite=write --unlink=1
fio --ioengine=libaio --filesize=2G --ramp_time=2s --runtime=1m --numjobs=$(nproc) --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=randwrite --blocksize=4k --iodepth=256 --readwrite=randwrite --unlink=1
fio --ioengine=libaio --filesize=2G --ramp_time=2s --runtime=1m --numjobs=$(nproc) --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=read --blocksize=1m --iodepth=64 --readwrite=read --unlink=1
fio --ioengine=libaio --filesize=2G --ramp_time=2s --runtime=1m --numjobs=$(nproc) --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=randread --blocksize=4k --iodepth=256 --readwrite=randread --unlink=1
# 1 qd:
fio --ioengine=libaio --filesize=16G --ramp_time=2s --runtime=1m --numjobs=1 --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=write --blocksize=1m --iodepth=1 --readwrite=write --unlink=1
fio --ioengine=libaio --filesize=16G --ramp_time=2s --runtime=1m --numjobs=1 --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=randwrite --blocksize=4k --iodepth=1 --readwrite=randwrite --unlink=1
fio --ioengine=libaio --filesize=16G --ramp_time=2s --runtime=1m --numjobs=1 --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=read --blocksize=1m --iodepth=1 --readwrite=read --unlink=1
fio --ioengine=libaio --filesize=16G --ramp_time=2s --runtime=1m --numjobs=1 --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=randread --blocksize=4k --iodepth=1 --readwrite=randread --unlink=1
Cheers,
Thomas