From: Thomas Glanzmann <thomas@glanzmann.de>
To: linux-nvme@lists.infradead.org
Subject: NetApp Snapmirror Active Sync NVMe target and Linux initiator: Paths disappearing after a power failure of one NetApp a150 system
Date: Thu, 18 Sep 2025 18:09:52 +0200
Message-ID: <aMwu0DK_KRGF_UGD@glanzmann.de>
Hello,
today I tried to use two NetApp AFF A150 systems, each consisting of two
controllers and 8 disks, with NVMe/TCP. The idea is that you have 2
paths to each controller via two VLANs. If a controller or a whole A150 fails,
the other one should take over. That worked; however, when the manually
power-fenced system came back, I saw that the paths to it were 'changing' and
then disappearing. I could, however, manually reconnect the paths.
I assume that the NVMe target of the power-fenced system came up without the
full configuration loaded. As a result, the Linux NVMe initiator dropped the
paths. Is that correct, or is it a bug in the Linux NVMe stack? I used Debian
trixie (6.12.43+deb13-cloud-amd64).
- Before I had 8 paths:
(debian-05) [~] nvme list-subsys /dev/nvme4n1
nvme-subsys4 - NQN=nqn.1992-08.com.netapp:sn.347a30cc947511f083a6d039ea2800fc:subsystem.subsystem
hostnqn=nqn.2014-08.org.nvmexpress:uuid:a44b1e42-29ec-1c09-c043-741282002d9c
\
+- nvme10 tcp traddr=10.0.10.238,trsvcid=4420,src_addr=10.0.10.25 live non-optimized
+- nvme11 tcp traddr=10.0.20.238,trsvcid=4420,src_addr=10.0.20.25 live non-optimized
+- nvme4 tcp traddr=10.0.10.235,trsvcid=4420,src_addr=10.0.10.25 live optimized
+- nvme5 tcp traddr=10.0.20.235,trsvcid=4420,src_addr=10.0.20.25 live optimized
+- nvme6 tcp traddr=10.0.10.236,trsvcid=4420,src_addr=10.0.10.25 live non-optimized
+- nvme7 tcp traddr=10.0.20.236,trsvcid=4420,src_addr=10.0.20.25 live non-optimized
+- nvme8 tcp traddr=10.0.10.237,trsvcid=4420,src_addr=10.0.10.25 live non-optimized
+- nvme9 tcp traddr=10.0.20.237,trsvcid=4420,src_addr=10.0.20.25 live non-optimized
(debian-05) [~] mount /dev/nvme4n1 /mnt
(debian-05) [~] cd /mnt/
(debian-05) [/mnt] while true; do date | tee -a date.log; sync; sleep 1; done
...
Thu Sep 18 16:04:49 CEST 2025
Thu Sep 18 16:04:50 CEST 2025
# 10 seconds for the path failover
Thu Sep 18 16:05:00 CEST 2025
Thu Sep 18 16:05:01 CEST 2025
...
- While the first AFF A150 was offline:
(debian-05) [/mnt] nvme list-subsys /dev/nvme4n1
nvme-subsys4 - NQN=nqn.1992-08.com.netapp:sn.347a30cc947511f083a6d039ea2800fc:subsystem.subsystem
hostnqn=nqn.2014-08.org.nvmexpress:uuid:a44b1e42-29ec-1c09-c043-741282002d9c
\
+- nvme10 tcp traddr=10.0.10.238,trsvcid=4420,src_addr=10.0.10.25 live non-optimized
+- nvme11 tcp traddr=10.0.20.238,trsvcid=4420,src_addr=10.0.20.25 live non-optimized
+- nvme4 tcp traddr=10.0.10.235,trsvcid=4420 connecting optimized
+- nvme5 tcp traddr=10.0.20.235,trsvcid=4420 connecting optimized
+- nvme6 tcp traddr=10.0.10.236,trsvcid=4420 connecting non-optimized
+- nvme7 tcp traddr=10.0.20.236,trsvcid=4420 connecting non-optimized
+- nvme8 tcp traddr=10.0.10.237,trsvcid=4420,src_addr=10.0.10.25 live non-optimized
+- nvme9 tcp traddr=10.0.20.237,trsvcid=4420,src_addr=10.0.20.25 live non-optimized
- I don't have the next output because I ran it in watch; it said 'changing' for the failed paths.
- And then the paths disappeared completely and did not come back:
(debian-05) [/mnt] nvme list-subsys /dev/nvme4n1
nvme-subsys4 - NQN=nqn.1992-08.com.netapp:sn.347a30cc947511f083a6d039ea2800fc:subsystem.subsystem
hostnqn=nqn.2014-08.org.nvmexpress:uuid:a44b1e42-29ec-1c09-c043-741282002d9c
\
+- nvme10 tcp traddr=10.0.10.238,trsvcid=4420,src_addr=10.0.10.25 live non-optimized
+- nvme11 tcp traddr=10.0.20.238,trsvcid=4420,src_addr=10.0.20.25 live non-optimized
+- nvme8 tcp traddr=10.0.10.237,trsvcid=4420,src_addr=10.0.10.25 live non-optimized
+- nvme9 tcp traddr=10.0.20.237,trsvcid=4420,src_addr=10.0.20.25 live non-optimized
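Because the intermediate 'changing' state was only visible in watch, a timestamped log would preserve it for a later report. The following is a minimal sketch, assuming the list-subsys text format shown in the listings above (path lines start with "+-" and end with the controller state and the ANA state). The function name summarise_paths and the log file name paths.log are made up for illustration, and other nvme-cli versions may format the output differently.

```shell
#!/usr/bin/env bash
# Count paths per "<controller state> <ANA state>" pair from
# `nvme list-subsys` output read on stdin.
# Assumption: path lines start with "+-" and end with the two state columns.
summarise_paths() {
    awk '$1 == "+-" { count[$(NF-1) " " $NF]++ }
         END { for (k in count) print count[k], k }' | LC_ALL=C sort
}

# Hypothetical logging loop (not run here); paths.log is an arbitrary name:
#   while true; do date; nvme list-subsys /dev/nvme4n1 | summarise_paths; \
#       sleep 1; done | tee -a paths.log

# Self-check with two path lines copied from the "offline" listing:
summarise_paths <<'EOF'
 +- nvme10 tcp traddr=10.0.10.238,trsvcid=4420,src_addr=10.0.10.25 live non-optimized
 +- nvme4 tcp traddr=10.0.10.235,trsvcid=4420 connecting optimized
EOF
```

Each summary line is a count followed by the state pair, so a transition such as "connecting" to a different state shows up as a changed line between two timestamps.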
- Then I manually reconnected:
(debian-05) [/mnt] nvme connect -t tcp -a 10.0.10.235 -s 4420 -n nqn.1992-08.com.netapp:sn.347a30cc947511f083a6d039ea2800fc:subsystem.subsystem -i 14
(debian-05) [/mnt] nvme connect -t tcp -a 10.0.20.235 -s 4420 -n nqn.1992-08.com.netapp:sn.347a30cc947511f083a6d039ea2800fc:subsystem.subsystem -i 14
(debian-05) [/mnt] nvme connect -t tcp -a 10.0.10.236 -s 4420 -n nqn.1992-08.com.netapp:sn.347a30cc947511f083a6d039ea2800fc:subsystem.subsystem -i 14
(debian-05) [/mnt] nvme connect -t tcp -a 10.0.20.236 -s 4420 -n nqn.1992-08.com.netapp:sn.347a30cc947511f083a6d039ea2800fc:subsystem.subsystem -i 14
(debian-05) [/mnt] nvme list-subsys /dev/nvme4n1
nvme-subsys4 - NQN=nqn.1992-08.com.netapp:sn.347a30cc947511f083a6d039ea2800fc:subsystem.subsystem
hostnqn=nqn.2014-08.org.nvmexpress:uuid:a44b1e42-29ec-1c09-c043-741282002d9c
\
+- nvme10 tcp traddr=10.0.10.238,trsvcid=4420,src_addr=10.0.10.25 live non-optimized
+- nvme11 tcp traddr=10.0.20.238,trsvcid=4420,src_addr=10.0.20.25 live non-optimized
+- nvme12 tcp traddr=10.0.20.236,trsvcid=4420,src_addr=10.0.20.25 live non-optimized
+- nvme5 tcp traddr=10.0.10.235,trsvcid=4420,src_addr=10.0.10.25 live optimized
+- nvme6 tcp traddr=10.0.20.235,trsvcid=4420,src_addr=10.0.20.25 live optimized
+- nvme7 tcp traddr=10.0.10.236,trsvcid=4420,src_addr=10.0.10.25 live non-optimized
+- nvme8 tcp traddr=10.0.10.237,trsvcid=4420,src_addr=10.0.10.25 live non-optimized
+- nvme9 tcp traddr=10.0.20.237,trsvcid=4420,src_addr=10.0.20.25 live non-optimized
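The four manual reconnects above differ only in the target address, so they can be generated from a list. This is a sketch only: it reuses exactly the options from the commands above, the function name reconnect_cmds is made up, and it merely prints the commands so they can be reviewed before being run.

```shell
#!/usr/bin/env bash
# Subsystem NQN and the addresses of the paths that disappeared,
# both copied from the commands above.
NQN='nqn.1992-08.com.netapp:sn.347a30cc947511f083a6d039ea2800fc:subsystem.subsystem'
ADDRS='10.0.10.235 10.0.20.235 10.0.10.236 10.0.20.236'

# Print one "nvme connect" command per address (dry run, nothing is executed).
reconnect_cmds() {
    local addr
    for addr in $ADDRS; do
        printf 'nvme connect -t tcp -a %s -s 4420 -n %s -i 14\n' "$addr" "$NQN"
    done
}

reconnect_cmds
```

To actually reconnect, the printed commands could be piped to a shell as root, for example `reconnect_cmds | sh`.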
In dmesg I saw:
[90303.743154] nvme nvme6: Reconnecting in 10 seconds...
[90311.695617] nvme nvme4: Connect Invalid Data Parameter, subsysnqn "nqn.1992-08.com.netapp:sn.347a30cc947511f083a6d039ea2800fc:subsystem.subsystem"
[90311.695714] nvme nvme4: failed to connect queue: 0 ret=16770
[90311.695754] nvme nvme4: Failed reconnect attempt 1/-1
[90311.695775] nvme nvme5: Connect Invalid Data Parameter, subsysnqn "nqn.1992-08.com.netapp:sn.347a30cc947511f083a6d039ea2800fc:subsystem.subsystem"
[90311.695789] nvme nvme4: Removing controller (16770)...
[90311.695876] nvme nvme5: failed to connect queue: 0 ret=16770
[90311.695909] nvme nvme4: Removing ctrl: NQN "nqn.1992-08.com.netapp:sn.347a30cc947511f083a6d039ea2800fc:subsystem.subsystem"
full dmesg: https://tg.st/u/297bb062eca5c9d05b533691ce98c2fffc31deafe45fe399d7492381b47313bb.txt
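The value ret=16770 in the log can be split into the NVMe completion status fields. The sketch below assumes the usual layout of that field as used in the Linux NVMe headers: status code in bits 7:0, status code type in bits 10:8, command retry delay in bits 12:11, More in bit 13, and Do Not Retry in bit 14. The function name decode_status is made up for illustration.

```shell
#!/usr/bin/env bash
# Split a kernel "ret=<n>" NVMe status value into its fields.
decode_status() {
    local status=$1
    local sc=$(( status & 0xff ))            # Status Code, bits 7:0
    local sct=$(( (status >> 8) & 0x7 ))     # Status Code Type, bits 10:8
    local crd=$(( (status >> 11) & 0x3 ))    # Command Retry Delay, bits 12:11
    local more=$(( (status >> 13) & 0x1 ))   # More, bit 13
    local dnr=$(( (status >> 14) & 0x1 ))    # Do Not Retry, bit 14
    printf 'status=0x%04x sct=0x%x sc=0x%02x crd=%d more=%d dnr=%d\n' \
        "$status" "$sct" "$sc" "$crd" "$more" "$dnr"
}

decode_status 16770
# prints: status=0x4182 sct=0x1 sc=0x82 crd=0 more=0 dnr=1
```

Status code type 0x1 with status code 0x82 corresponds to "Connect Invalid Parameters", which matches the "Connect Invalid Data Parameter" message in the log. The Do Not Retry bit is set, which would be consistent with the host removing the controller after the first failed attempt even though the attempt counter reads 1/-1. That reading is an interpretation and has not been confirmed in this thread.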
Cheers,
Thomas