From: Thomas Glanzmann <thomas@glanzmann.de>
To: linux-nvme@lists.infradead.org
Subject: NetApp Snapmirror Active Sync NVMe target and Linux initiator: Paths disappearing after a power failure of one NetApp a150 system
Date: Thu, 18 Sep 2025 18:09:52 +0200
Message-ID: <aMwu0DK_KRGF_UGD@glanzmann.de>
Hello,
today I tried to use two NetApp AFF A150 systems, each consisting of two
controllers and 8 disks, with NVMe/TCP. The idea is that you have 2
paths to each controller via two VLANs. If a controller or an A150 fails, the
other one should take over. That worked; however, when the manually power-fenced
system came back, I saw that the paths to it were 'changing' and then
disappearing. I could, however, reconnect the paths manually.
I assume that the NVMe target of the power-fenced system came up without the
full configuration loaded and that, as a result, the Linux NVMe initiator dropped
the paths. Is that correct, or is it a bug in the Linux NVMe stack? I used Debian
trixie (6.12.43+deb13-cloud-amd64).
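To keep an eye on the path states during such a test, the `nvme list-subsys` output can be tallied with a small helper (a sketch; `count_states` is my own name, it simply matches the state column of the output shown below):

```shell
# Tally path states from `nvme list-subsys` output read on stdin.
# Usage: nvme list-subsys /dev/nvme4n1 | count_states
count_states() {
  awk '/ live /       { live++ }
       / connecting / { conn++ }
       END { printf "live=%d connecting=%d\n", live + 0, conn + 0 }'
}
```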
- Before the failure I had 8 paths:
(debian-05) [~] nvme list-subsys /dev/nvme4n1
nvme-subsys4 - NQN=nqn.1992-08.com.netapp:sn.347a30cc947511f083a6d039ea2800fc:subsystem.subsystem
hostnqn=nqn.2014-08.org.nvmexpress:uuid:a44b1e42-29ec-1c09-c043-741282002d9c
\
+- nvme10 tcp traddr=10.0.10.238,trsvcid=4420,src_addr=10.0.10.25 live non-optimized
+- nvme11 tcp traddr=10.0.20.238,trsvcid=4420,src_addr=10.0.20.25 live non-optimized
+- nvme4 tcp traddr=10.0.10.235,trsvcid=4420,src_addr=10.0.10.25 live optimized
+- nvme5 tcp traddr=10.0.20.235,trsvcid=4420,src_addr=10.0.20.25 live optimized
+- nvme6 tcp traddr=10.0.10.236,trsvcid=4420,src_addr=10.0.10.25 live non-optimized
+- nvme7 tcp traddr=10.0.20.236,trsvcid=4420,src_addr=10.0.20.25 live non-optimized
+- nvme8 tcp traddr=10.0.10.237,trsvcid=4420,src_addr=10.0.10.25 live non-optimized
+- nvme9 tcp traddr=10.0.20.237,trsvcid=4420,src_addr=10.0.20.25 live non-optimized
(debian-05) [~] mount /dev/nvme4n1 /mnt
(debian-05) [~] cd /mnt/
(debian-05) [/mnt] while true; do date | tee -a date.log; sync; sleep 1; done
...
Thu Sep 18 16:04:49 CEST 2025
Thu Sep 18 16:04:50 CEST 2025
# 10 seconds for the path failover
Thu Sep 18 16:05:00 CEST 2025
Thu Sep 18 16:05:01 CEST 2025
...
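The stall can also be measured from date.log after the fact by printing the largest gap between consecutive timestamps (a sketch; `max_gap` is my own helper and assumes GNU date can parse the logged format):

```shell
# Print the largest gap in seconds between consecutive timestamps read
# from stdin, one `date` line per input line (e.g. date.log).
max_gap() {
  prev="" max=0
  while IFS= read -r line; do
    t=$(date -d "$line" +%s) || continue
    if [ -n "$prev" ] && [ $((t - prev)) -gt "$max" ]; then
      max=$((t - prev))
    fi
    prev=$t
  done
  echo "$max"
}
```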
- While the first AFF A150 was offline:
(debian-05) [/mnt] nvme list-subsys /dev/nvme4n1
nvme-subsys4 - NQN=nqn.1992-08.com.netapp:sn.347a30cc947511f083a6d039ea2800fc:subsystem.subsystem
hostnqn=nqn.2014-08.org.nvmexpress:uuid:a44b1e42-29ec-1c09-c043-741282002d9c
\
+- nvme10 tcp traddr=10.0.10.238,trsvcid=4420,src_addr=10.0.10.25 live non-optimized
+- nvme11 tcp traddr=10.0.20.238,trsvcid=4420,src_addr=10.0.20.25 live non-optimized
+- nvme4 tcp traddr=10.0.10.235,trsvcid=4420 connecting optimized
+- nvme5 tcp traddr=10.0.20.235,trsvcid=4420 connecting optimized
+- nvme6 tcp traddr=10.0.10.236,trsvcid=4420 connecting non-optimized
+- nvme7 tcp traddr=10.0.20.236,trsvcid=4420 connecting non-optimized
+- nvme8 tcp traddr=10.0.10.237,trsvcid=4420,src_addr=10.0.10.25 live non-optimized
+- nvme9 tcp traddr=10.0.20.237,trsvcid=4420,src_addr=10.0.20.25 live non-optimized
- Then the state changed again; I don't have the output because I ran it under watch, but it said 'changing' for the failed paths.
- And then the paths disappeared completely and did not come back.
(debian-05) [/mnt] nvme list-subsys /dev/nvme4n1
nvme-subsys4 - NQN=nqn.1992-08.com.netapp:sn.347a30cc947511f083a6d039ea2800fc:subsystem.subsystem
hostnqn=nqn.2014-08.org.nvmexpress:uuid:a44b1e42-29ec-1c09-c043-741282002d9c
\
+- nvme10 tcp traddr=10.0.10.238,trsvcid=4420,src_addr=10.0.10.25 live non-optimized
+- nvme11 tcp traddr=10.0.20.238,trsvcid=4420,src_addr=10.0.20.25 live non-optimized
+- nvme8 tcp traddr=10.0.10.237,trsvcid=4420,src_addr=10.0.10.25 live non-optimized
+- nvme9 tcp traddr=10.0.20.237,trsvcid=4420,src_addr=10.0.20.25 live non-optimized
- Then I manually reconnected:
(debian-05) [/mnt] nvme connect -t tcp -a 10.0.10.235 -s 4420 -n nqn.1992-08.com.netapp:sn.347a30cc947511f083a6d039ea2800fc:subsystem.subsystem -i 14
(debian-05) [/mnt] nvme connect -t tcp -a 10.0.20.235 -s 4420 -n nqn.1992-08.com.netapp:sn.347a30cc947511f083a6d039ea2800fc:subsystem.subsystem -i 14
(debian-05) [/mnt] nvme connect -t tcp -a 10.0.10.236 -s 4420 -n nqn.1992-08.com.netapp:sn.347a30cc947511f083a6d039ea2800fc:subsystem.subsystem -i 14
(debian-05) [/mnt] nvme connect -t tcp -a 10.0.20.236 -s 4420 -n nqn.1992-08.com.netapp:sn.347a30cc947511f083a6d039ea2800fc:subsystem.subsystem -i 14
(debian-05) [/mnt] nvme list-subsys /dev/nvme4n1
nvme-subsys4 - NQN=nqn.1992-08.com.netapp:sn.347a30cc947511f083a6d039ea2800fc:subsystem.subsystem
hostnqn=nqn.2014-08.org.nvmexpress:uuid:a44b1e42-29ec-1c09-c043-741282002d9c
\
+- nvme10 tcp traddr=10.0.10.238,trsvcid=4420,src_addr=10.0.10.25 live non-optimized
+- nvme11 tcp traddr=10.0.20.238,trsvcid=4420,src_addr=10.0.20.25 live non-optimized
+- nvme12 tcp traddr=10.0.20.236,trsvcid=4420,src_addr=10.0.20.25 live non-optimized
+- nvme5 tcp traddr=10.0.10.235,trsvcid=4420,src_addr=10.0.10.25 live optimized
+- nvme6 tcp traddr=10.0.20.235,trsvcid=4420,src_addr=10.0.20.25 live optimized
+- nvme7 tcp traddr=10.0.10.236,trsvcid=4420,src_addr=10.0.10.25 live non-optimized
+- nvme8 tcp traddr=10.0.10.237,trsvcid=4420,src_addr=10.0.10.25 live non-optimized
 +- nvme9 tcp traddr=10.0.20.237,trsvcid=4420,src_addr=10.0.20.25 live non-optimized
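Scripted, the manual reconnect looks like this (a sketch; it only prints the commands, pipe the output into `sh` to actually run them). I added --ctrl-loss-tmo=-1, which is a real nvme-cli option meaning "retry forever" instead of the default 600 seconds, although I am not sure it helps when the target answers with the DNR bit set as seen in dmesg below:

```shell
#!/bin/sh
# Print the reconnect commands for the four failed paths; pipe into `sh`
# to execute. --ctrl-loss-tmo=-1 keeps the host retrying indefinitely.
NQN="nqn.1992-08.com.netapp:sn.347a30cc947511f083a6d039ea2800fc:subsystem.subsystem"

reconnect_cmds() {
  for addr in 10.0.10.235 10.0.20.235 10.0.10.236 10.0.20.236; do
    echo "nvme connect -t tcp -a $addr -s 4420 -n $NQN -i 14 --ctrl-loss-tmo=-1"
  done
}

reconnect_cmds
```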
In dmesg I saw:
[90303.743154] nvme nvme6: Reconnecting in 10 seconds...
[90311.695617] nvme nvme4: Connect Invalid Data Parameter, subsysnqn "nqn.1992-08.com.netapp:sn.347a30cc947511f083a6d039ea2800fc:subsystem.subsystem"
[90311.695714] nvme nvme4: failed to connect queue: 0 ret=16770
[90311.695754] nvme nvme4: Failed reconnect attempt 1/-1
[90311.695775] nvme nvme5: Connect Invalid Data Parameter, subsysnqn "nqn.1992-08.com.netapp:sn.347a30cc947511f083a6d039ea2800fc:subsystem.subsystem"
[90311.695789] nvme nvme4: Removing controller (16770)...
[90311.695876] nvme nvme5: failed to connect queue: 0 ret=16770
[90311.695909] nvme nvme4: Removing ctrl: NQN "nqn.1992-08.com.netapp:sn.347a30cc947511f083a6d039ea2800fc:subsystem.subsystem"
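For what it's worth, ret=16770 decodes as 0x4182: the Fabrics status Connect Invalid Parameters (0x182, the kernel's NVME_SC_CONNECT_INVALID_PARAM) with the DNR (Do Not Retry) bit (0x4000) set, which would explain why the host removed the controller instead of continuing to retry:

```shell
# Decode the kernel's ret value: 16770 = 0x4182
# = DNR bit (0x4000) | Connect Invalid Parameters status (0x182).
printf '0x%x dnr=0x%x sc=0x%x\n' 16770 $((16770 & 0x4000)) $((16770 & 0x3fff))
```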
full dmesg: https://tg.st/u/297bb062eca5c9d05b533691ce98c2fffc31deafe45fe399d7492381b47313bb.txt
Cheers,
Thomas