* NFS 4 Trunking load balancing and failover
@ 2025-01-22 4:46 Thomas Glanzmann
2025-01-23 14:02 ` Anton Gavriliuk
0 siblings, 1 reply; 2+ messages in thread
From: Thomas Glanzmann @ 2025-01-22 4:46 UTC (permalink / raw)
To: linux-nfs
Hello,
we tried to use nconnect and link trunking to access a NetApp NFS export from
a Debian system running Linux kernel 6.12.9, with the following commands:
root@debian-08:~# mount -o nconnect=8,max_connect=16 10.0.10.48:/vol41 /mnt
root@debian-08:~# mount -o nconnect=8,max_connect=16 10.0.20.48:/vol41 /mnt
root@debian-08:~# netstat -an | grep 2049
tcp 0 0 10.0.10.28:834 10.0.10.48:2049 ESTABLISHED
tcp 0 0 10.0.10.28:826 10.0.10.48:2049 ESTABLISHED
tcp 0 0 10.0.10.28:951 10.0.10.48:2049 ESTABLISHED
tcp 0 0 10.0.10.28:707 10.0.10.48:2049 ESTABLISHED
tcp 0 0 10.0.10.28:853 10.0.10.48:2049 ESTABLISHED
tcp 0 0 10.0.10.28:914 10.0.10.48:2049 ESTABLISHED
tcp 0 0 10.0.20.28:862 10.0.20.48:2049 ESTABLISHED
tcp 0 0 10.0.20.28:771 10.0.20.48:2049 TIME_WAIT
tcp 0 0 10.0.10.28:844 10.0.10.48:2049 ESTABLISHED
tcp 0 0 10.0.10.28:980 10.0.10.48:2049 ESTABLISHED
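The imbalance is easy to summarize with a short awk filter. This is just a sketch that counts fields from the netstat output above (nothing NFS-specific); in practice you would pipe `netstat -an | grep 2049` or `ss -tn` into the awk stage directly:

```shell
# Count ESTABLISHED NFS (port 2049) connections per server IP.
# The sample below mirrors the netstat output shown above.
sample='tcp 0 0 10.0.10.28:834 10.0.10.48:2049 ESTABLISHED
tcp 0 0 10.0.10.28:826 10.0.10.48:2049 ESTABLISHED
tcp 0 0 10.0.10.28:951 10.0.10.48:2049 ESTABLISHED
tcp 0 0 10.0.10.28:707 10.0.10.48:2049 ESTABLISHED
tcp 0 0 10.0.10.28:853 10.0.10.48:2049 ESTABLISHED
tcp 0 0 10.0.10.28:914 10.0.10.48:2049 ESTABLISHED
tcp 0 0 10.0.20.28:862 10.0.20.48:2049 ESTABLISHED
tcp 0 0 10.0.20.28:771 10.0.20.48:2049 TIME_WAIT
tcp 0 0 10.0.10.28:844 10.0.10.48:2049 ESTABLISHED
tcp 0 0 10.0.10.28:980 10.0.10.48:2049 ESTABLISHED'
# Field 5 is the remote endpoint, field 6 the TCP state.
echo "$sample" | awk '$6 == "ESTABLISHED" { split($5, a, ":"); n[a[1]]++ }
                      END { for (ip in n) print ip, n[ip] }'
# prints "10.0.10.48 8" and "10.0.20.48 1" (order may vary)
```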
On the NetApp you can see that the traffic is unevenly distributed over the
two links:
n2 : 1/22/2025 04:38:58
                         Recv  Recv Data   Recv   Sent  Sent Data   Sent Current
LIF  Vserver           Packet      (Bps) Errors Packet      (Bps) Errors Port
---- ----------------- ------ --------- ------ ------ --------- ------ -------
nfs1 frontend-08-nfs41  26865 905471818      0  13786   1599403      0 e0e-10
nfs2 frontend-08-nfs41   3952 114124809      0   1737    201578      0 e0f-20
While that works, we noticed that eight TCP connections are established to
the first IP address, but only one to the second. When generating load we
can see that the majority of the NFS traffic goes to the first IP. Is there
a way to have more TCP connections established to the second IP?
We also noticed that when we take the first server IP down, the NFS session
stalls. We had hoped that the NFS client code would transparently fail over
to the second IP address. Is that planned for the future?
I also tried the above with the VMware ESXi hypervisor, using the most
recent version (8.0 Update 3c). There, the traffic is distributed equally
across the two links, and when one of the two links is taken down, the I/O
continues.
Our setup: we have a NetApp AFF A150. Its controllers are connected to a
Linux VM using two 10 Gbit/s links, and the VM likewise has two dedicated
10 Gbit/s links. To direct the traffic we use two VLANs, which gives us two
dedicated 10 Gbit/s paths between the Linux VM and the NetApp.
We also noticed that we get the best performance from Linux to the NetApp
filer over a single path using the following mount options:

-o vers=3,nconnect=16
With that setup we can get 150k 4k random IOPS at a queue depth of 256
(4 threads with a queue depth of 64 each). This maxes out a 10 Gbit/s link
with 4k random I/Os; it also maxes out the CPU of our NetApp controller.
The disks (16 4 TB SSDs) are 25-50% busy.
We used the following commands to generate load. nproc = 4.
# high queue depth:
fio --ioengine=libaio --filesize=2G --ramp_time=2s --runtime=1m --numjobs=$(nproc) --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=write --blocksize=1m --iodepth=64 --readwrite=write --unlink=1
fio --ioengine=libaio --filesize=2G --ramp_time=2s --runtime=1m --numjobs=$(nproc) --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=randwrite --blocksize=4k --iodepth=256 --readwrite=randwrite --unlink=1
fio --ioengine=libaio --filesize=2G --ramp_time=2s --runtime=1m --numjobs=$(nproc) --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=read --blocksize=1m --iodepth=64 --readwrite=read --unlink=1
fio --ioengine=libaio --filesize=2G --ramp_time=2s --runtime=1m --numjobs=$(nproc) --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=randread --blocksize=4k --iodepth=256 --readwrite=randread --unlink=1
# 1 qd:
fio --ioengine=libaio --filesize=16G --ramp_time=2s --runtime=1m --numjobs=1 --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=write --blocksize=1m --iodepth=1 --readwrite=write --unlink=1
fio --ioengine=libaio --filesize=16G --ramp_time=2s --runtime=1m --numjobs=1 --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=randwrite --blocksize=4k --iodepth=1 --readwrite=randwrite --unlink=1
fio --ioengine=libaio --filesize=16G --ramp_time=2s --runtime=1m --numjobs=1 --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=read --blocksize=1m --iodepth=1 --readwrite=read --unlink=1
fio --ioengine=libaio --filesize=16G --ramp_time=2s --runtime=1m --numjobs=1 --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=randread --blocksize=4k --iodepth=1 --readwrite=randread --unlink=1
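For reference, the raw payload bandwidth implied by the 4k random result can be checked with a couple of lines of arithmetic; RPC framing, NFS headers and TCP/IP overhead come on top of this payload figure:

```python
# Payload bandwidth implied by 150k IOPS at 4 KiB per I/O.
iops = 150_000
block_size = 4096  # bytes per I/O

bytes_per_sec = iops * block_size          # 614,400,000 B/s
gbit_per_sec = bytes_per_sec * 8 / 1e9     # decimal gigabit, as link speeds are quoted

print(f"{gbit_per_sec:.2f} Gbit/s payload")  # prints "4.92 Gbit/s payload"
```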
Cheers,
Thomas
* Re: NFS 4 Trunking load balancing and failover
2025-01-22 4:46 NFS 4 Trunking load balancing and failover Thomas Glanzmann
@ 2025-01-23 14:02 ` Anton Gavriliuk
0 siblings, 0 replies; 2+ messages in thread
From: Anton Gavriliuk @ 2025-01-23 14:02 UTC (permalink / raw)
To: Thomas Glanzmann; +Cc: linux-nfs
> Also we noticed that when we take the first server ip down, the NFS
> sessions stalls. We hoped that the NFS client code transparently uses the
> second ip address. Is that planned for the future?
This is a very good question. Half a year ago I had exactly the same problem.
It looks like the current NFSv4 trunking is good for load balancing,
but not for failover.
If there are N links between the NFSv4 server and the client, losing any
single link means losing all the other N-1 links as well.
Anton