* NFS 4 Trunking load balancing and failover
@ 2025-01-22 4:46 Thomas Glanzmann
2025-01-23 14:02 ` Anton Gavriliuk
0 siblings, 1 reply; 2+ messages in thread
From: Thomas Glanzmann @ 2025-01-22 4:46 UTC (permalink / raw)
To: linux-nfs
Hello,
we tried to use nconnect and link trunking to access a NetApp NFS export from
a Debian system running Linux kernel 6.12.9, with the following commands:
root@debian-08:~# mount -o nconnect=8,max_connect=16 10.0.10.48:/vol41 /mnt
root@debian-08:~# mount -o nconnect=8,max_connect=16 10.0.20.48:/vol41 /mnt
root@debian-08:~# netstat -an | grep 2049
tcp 0 0 10.0.10.28:834 10.0.10.48:2049 ESTABLISHED
tcp 0 0 10.0.10.28:826 10.0.10.48:2049 ESTABLISHED
tcp 0 0 10.0.10.28:951 10.0.10.48:2049 ESTABLISHED
tcp 0 0 10.0.10.28:707 10.0.10.48:2049 ESTABLISHED
tcp 0 0 10.0.10.28:853 10.0.10.48:2049 ESTABLISHED
tcp 0 0 10.0.10.28:914 10.0.10.48:2049 ESTABLISHED
tcp 0 0 10.0.20.28:862 10.0.20.48:2049 ESTABLISHED
tcp 0 0 10.0.20.28:771 10.0.20.48:2049 TIME_WAIT
tcp 0 0 10.0.10.28:844 10.0.10.48:2049 ESTABLISHED
tcp 0 0 10.0.10.28:980 10.0.10.48:2049 ESTABLISHED
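The imbalance is easy to summarize with a short awk filter. This is just a sketch that counts fields from the netstat output above (nothing NFS-specific); in practice you would pipe `netstat -an | grep 2049` or `ss -tn` into the awk stage directly:

```shell
# Count ESTABLISHED NFS (port 2049) connections per server IP.
# The sample below mirrors the netstat output shown above.
sample='tcp 0 0 10.0.10.28:834 10.0.10.48:2049 ESTABLISHED
tcp 0 0 10.0.10.28:826 10.0.10.48:2049 ESTABLISHED
tcp 0 0 10.0.10.28:951 10.0.10.48:2049 ESTABLISHED
tcp 0 0 10.0.10.28:707 10.0.10.48:2049 ESTABLISHED
tcp 0 0 10.0.10.28:853 10.0.10.48:2049 ESTABLISHED
tcp 0 0 10.0.10.28:914 10.0.10.48:2049 ESTABLISHED
tcp 0 0 10.0.20.28:862 10.0.20.48:2049 ESTABLISHED
tcp 0 0 10.0.20.28:771 10.0.20.48:2049 TIME_WAIT
tcp 0 0 10.0.10.28:844 10.0.10.48:2049 ESTABLISHED
tcp 0 0 10.0.10.28:980 10.0.10.48:2049 ESTABLISHED'
# Field 5 is the remote endpoint, field 6 the TCP state.
echo "$sample" | awk '$6 == "ESTABLISHED" { split($5, a, ":"); n[a[1]]++ }
                      END { for (ip in n) print ip, n[ip] }'
# prints "10.0.10.48 8" and "10.0.20.48 1" (order may vary)
```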
On the NetApp you can see that the traffic is unevenly distributed over the
two links:
n2 : 1/22/2025 04:38:58
                         Recv  Recv Data   Recv   Sent  Sent Data   Sent Current
LIF  Vserver           Packet      (Bps) Errors Packet      (Bps) Errors Port
---- ----------------- ------ --------- ------ ------ --------- ------ -------
nfs1 frontend-08-nfs41  26865 905471818      0  13786   1599403      0 e0e-10
nfs2 frontend-08-nfs41   3952 114124809      0   1737    201578      0 e0f-20
While that works, we noticed that eight TCP connections are established to
the first IP address, but only one to the second. When generating load we
can see that the majority of the NFS traffic goes to the first IP. Is there
a way to have more TCP connections established to the second IP?
We also noticed that when we take the first server IP down, the NFS session
stalls. We had hoped that the NFS client code would transparently fail over
to the second IP address. Is that planned for the future?
I also tried the above with the VMware ESXi hypervisor, using the most
recent version (8.0 Update 3c). There, the traffic is distributed equally
across the two links, and when one of the two links is taken down, the I/O
continues.
Our setup: we have a NetApp AFF A150. Its controllers are connected to a
Linux VM using two 10 Gbit/s links, and the VM likewise has two dedicated
10 Gbit/s links. To direct the traffic we use two VLANs, which gives us two
dedicated 10 Gbit/s paths between the Linux VM and the NetApp.
We also noticed that we get the best performance from Linux to the NetApp
filer over a single path using the following mount options:

-o vers=3,nconnect=16
With that setup we can get 150k 4k random IOPS at a queue depth of 256
(4 threads with a queue depth of 64 each). This maxes out a 10 Gbit/s link
with 4k random I/Os; it also maxes out the CPU of our NetApp controller.
The disks (16 4 TB SSDs) are 25-50% busy.
We used the following commands to generate load. nproc = 4.
# high queue depth:
fio --ioengine=libaio --filesize=2G --ramp_time=2s --runtime=1m --numjobs=$(nproc) --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=write --blocksize=1m --iodepth=64 --readwrite=write --unlink=1
fio --ioengine=libaio --filesize=2G --ramp_time=2s --runtime=1m --numjobs=$(nproc) --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=randwrite --blocksize=4k --iodepth=256 --readwrite=randwrite --unlink=1
fio --ioengine=libaio --filesize=2G --ramp_time=2s --runtime=1m --numjobs=$(nproc) --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=read --blocksize=1m --iodepth=64 --readwrite=read --unlink=1
fio --ioengine=libaio --filesize=2G --ramp_time=2s --runtime=1m --numjobs=$(nproc) --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=randread --blocksize=4k --iodepth=256 --readwrite=randread --unlink=1
# 1 qd:
fio --ioengine=libaio --filesize=16G --ramp_time=2s --runtime=1m --numjobs=1 --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=write --blocksize=1m --iodepth=1 --readwrite=write --unlink=1
fio --ioengine=libaio --filesize=16G --ramp_time=2s --runtime=1m --numjobs=1 --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=randwrite --blocksize=4k --iodepth=1 --readwrite=randwrite --unlink=1
fio --ioengine=libaio --filesize=16G --ramp_time=2s --runtime=1m --numjobs=1 --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=read --blocksize=1m --iodepth=1 --readwrite=read --unlink=1
fio --ioengine=libaio --filesize=16G --ramp_time=2s --runtime=1m --numjobs=1 --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=randread --blocksize=4k --iodepth=1 --readwrite=randread --unlink=1
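For reference, the raw payload bandwidth implied by the 4k random result can be checked with a couple of lines of arithmetic; RPC framing, NFS headers and TCP/IP overhead come on top of this payload figure:

```python
# Payload bandwidth implied by 150k IOPS at 4 KiB per I/O.
iops = 150_000
block_size = 4096  # bytes per I/O

bytes_per_sec = iops * block_size          # 614,400,000 B/s
gbit_per_sec = bytes_per_sec * 8 / 1e9     # decimal gigabit, as link speeds are quoted

print(f"{gbit_per_sec:.2f} Gbit/s payload")  # prints "4.92 Gbit/s payload"
```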
Cheers,
Thomas
* Re: NFS 4 Trunking load balancing and failover
2025-01-22 4:46 NFS 4 Trunking load balancing and failover Thomas Glanzmann
@ 2025-01-23 14:02 ` Anton Gavriliuk
0 siblings, 0 replies; 2+ messages in thread
From: Anton Gavriliuk @ 2025-01-23 14:02 UTC (permalink / raw)
To: Thomas Glanzmann; +Cc: linux-nfs
> Also we noticed that when we take the first server ip down, the NFS
> sessions stalls. We hoped that the NFS client code transparently uses the
> second ip address. Is that planned for the future?
This is a very good question. Half a year ago I had exactly the same problem.
It looks like the current NFSv4 trunking is good for load balancing,
but not for failover.
If there are N links between the NFSv4 server and the client, losing any
single link means losing all the other N-1 links as well.
Anton