linux-nvme.lists.infradead.org archive mirror
* Unable to reconnect namespace via NVMe/TCP
@ 2025-08-12 15:48 Anton Gavriliuk
  2025-08-13  6:01 ` Nilay Shroff
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Anton Gavriliuk @ 2025-08-12 15:48 UTC (permalink / raw)
  To: linux-nvme

Hi

I have an NVMe/TCP target and an NVMe/TCP initiator, both running
RHEL 10 (6.12.0-55.25.1.el10_0.x86_64).

The NVMe/TCP target exports a single NVMe SSD:

      "namespaces": [
        {
          "device": {
            "nguid": "01000000-0000-0000-8ce3-8ee3064aa4f2",
            "path": "/dev/nvme0n1"
          },
          "enable": 1,
          "nsid": 1
        }
      ],

If the NVMe/TCP target is not available, the initiator tries to reconnect
every 10 seconds:

[ 2586.071048] nvme nvme9: failed to connect socket: -111
[ 2586.071403] nvme nvme9: Failed reconnect attempt 16/-1
[ 2586.071565] nvme nvme9: Reconnecting in 10 seconds...
[ 2596.310921] nvme nvme9: failed to connect socket: -111
[ 2596.311186] nvme nvme9: Failed reconnect attempt 17/-1
[ 2596.311349] nvme nvme9: Reconnecting in 10 seconds...
[ 2606.550772] nvme nvme9: failed to connect socket: -111
[ 2606.551252] nvme nvme9: Failed reconnect attempt 18/-1
[ 2606.551592] nvme nvme9: Reconnecting in 10 seconds...

When the NVMe/TCP target becomes available again, the initiator fails to
reconnect the namespace:

[ 2606.551592] nvme nvme9: Reconnecting in 10 seconds...
[ 2616.793080] nvme nvme9: creating 16 I/O queues.
[ 2616.829881] nvme nvme9: mapped 16/0/0 default/read/poll queues.
[ 2616.833685] nvme nvme9: Successfully reconnected (attempt 19/-1)
[ 2616.834446] nvme nvme9: identifiers changed for nsid 1
[ 2616.835618] block nvme9n1: no usable path - requeuing I/O
[ 2616.856602] block nvme9n1: no available path - failing I/O
[ 2616.856811] block nvme9n1: no available path - failing I/O

After that, the nvme9n1 namespace is missing from the "nvme list" output.

Anton



* Re: Unable to reconnect namespace via NVMe/TCP
  2025-08-12 15:48 Unable to reconnect namespace via NVMe/TCP Anton Gavriliuk
@ 2025-08-13  6:01 ` Nilay Shroff
  2025-08-18 20:58 ` Chris Leech
  2025-08-19  6:09 ` Hannes Reinecke
  2 siblings, 0 replies; 9+ messages in thread
From: Nilay Shroff @ 2025-08-13  6:01 UTC (permalink / raw)
  To: Anton Gavriliuk, linux-nvme



On 8/12/25 9:18 PM, Anton Gavriliuk wrote:
> Hi
> 
> There are NVMe/TCP target and initiator servers, both running on
> RHEL10 (6.12.0-55.25.1.el10_0.x86_64)
> 
> NVMe/TCP target exports single NVMe SSD
> 
>       "namespaces": [
>         {
>           "device": {
>             "nguid": "01000000-0000-0000-8ce3-8ee3064aa4f2",
>             "path": "/dev/nvme0n1"
>           },
>           "enable": 1,
>           "nsid": 1
>         }
>       ],
> 
> If NVMe/TCP target is not available, initiator tries to reconnect
> every 10 seconds
> 
> [ 2586.071048] nvme nvme9: failed to connect socket: -111
> [ 2586.071403] nvme nvme9: Failed reconnect attempt 16/-1
> [ 2586.071565] nvme nvme9: Reconnecting in 10 seconds...
> [ 2596.310921] nvme nvme9: failed to connect socket: -111
> [ 2596.311186] nvme nvme9: Failed reconnect attempt 17/-1
> [ 2596.311349] nvme nvme9: Reconnecting in 10 seconds...
> [ 2606.550772] nvme nvme9: failed to connect socket: -111
> [ 2606.551252] nvme nvme9: Failed reconnect attempt 18/-1
> [ 2606.551592] nvme nvme9: Reconnecting in 10 seconds...
> 
> when NVMe/TCP target become available, initiator failed reconnect the namespace
> 
> [ 2606.551592] nvme nvme9: Reconnecting in 10 seconds...
> [ 2616.793080] nvme nvme9: creating 16 I/O queues.
> [ 2616.829881] nvme nvme9: mapped 16/0/0 default/read/poll queues.
> [ 2616.833685] nvme nvme9: Successfully reconnected (attempt 19/-1)
> [ 2616.834446] nvme nvme9: identifiers changed for nsid 1

It seems that after the initiator reconnected to the target, the namespace
identifiers (UUID, NGUID, EUI64, or CSI, the command set identifier) have
changed for nsid 1, so the host kernel removed this namespace (nvme9n1).

> [ 2616.835618] block nvme9n1: no usable path - requeuing I/O
> [ 2616.856602] block nvme9n1: no available path - failing I/O
> [ 2616.856811] block nvme9n1: no available path - failing I/O
> 
> and there is no nvme9n1 namespace in the "nvme list" output.
> 
So you may want to check on the target what could have changed for
nsid 1 during the reconnect window.
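
To illustrate the check described above, here is a minimal Python sketch.
It is not the kernel's code: the `NsIds` type, the helper names, and the
UUID values are made up for this example, and it only assumes what is
stated above, namely that a change in any one identifier makes the host
treat nsid 1 as a different namespace.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NsIds:
    """Identifiers a host could remember for one namespace (illustrative type)."""
    uuid: str
    nguid: str
    eui64: str
    csi: int


def ids_equal(old: NsIds, new: NsIds) -> bool:
    # The namespace only counts as "the same" if every identifier matches.
    return old == new


def changed_fields(old: NsIds, new: NsIds) -> list[str]:
    # Names of the identifiers that differ between the two scans.
    names = ("uuid", "nguid", "eui64", "csi")
    return [n for n in names if getattr(old, n) != getattr(new, n)]


# Made-up values: same NGUID before and after, but a different UUID.
before = NsIds(uuid="aaaaaaaa-0000-0000-0000-000000000001",
               nguid="01000000-0000-0000-8ce3-8ee3064aa4f2",
               eui64="", csi=0)
after = NsIds(uuid="bbbbbbbb-0000-0000-0000-000000000002",
              nguid="01000000-0000-0000-8ce3-8ee3064aa4f2",
              eui64="", csi=0)

print(ids_equal(before, after))
print(changed_fields(before, after))
```

Comparing the identifiers reported before and after the reconnect in this
way shows which field to look for on the target side.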

As a side note, it’s generally best to reproduce and report issues with
an upstream kernel when posting to the kernel mailing list. This helps
ensure the report gets prompt attention and makes it easier for others
to debug and assist.

Thanks,
--Nilay




* Re: Unable to reconnect namespace via NVMe/TCP
  2025-08-12 15:48 Unable to reconnect namespace via NVMe/TCP Anton Gavriliuk
  2025-08-13  6:01 ` Nilay Shroff
@ 2025-08-18 20:58 ` Chris Leech
  2025-08-19 13:12   ` Maurizio Lombardi
  2025-08-19 13:45   ` Anton Gavriliuk
  2025-08-19  6:09 ` Hannes Reinecke
  2 siblings, 2 replies; 9+ messages in thread
From: Chris Leech @ 2025-08-18 20:58 UTC (permalink / raw)
  To: Anton Gavriliuk; +Cc: linux-nvme

On Tue, Aug 12, 2025 at 06:48:29PM +0300, Anton Gavriliuk wrote:
> Hi
> 
> There are NVMe/TCP target and initiator servers, both running on
> RHEL10 (6.12.0-55.25.1.el10_0.x86_64)
> 
> NVMe/TCP target exports single NVMe SSD
> 
>       "namespaces": [
>         {
>           "device": {
>             "nguid": "01000000-0000-0000-8ce3-8ee3064aa4f2",
>             "path": "/dev/nvme0n1"
>           },
>           "enable": 1,
>           "nsid": 1
>         }
>       ],
> 
> If NVMe/TCP target is not available, initiator tries to reconnect
> every 10 seconds

How is the target becoming unavailable?  Is it a network interruption,
or is the target being rebooted or reconfigured?

Is the shared snippet the entire "namespaces" section of the
configuration file?

It looks like the target code will generate a random uuid for a device
when it's configured, which could then trip up the host attempting to
reconnect across a target reboot. But I think the uuid can be saved as
well as the nguid in the target configuration.
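
The suspected behaviour can be sketched in Python as follows. This is a
toy model, not the target code: `namespace_uuid` is a hypothetical helper,
and it assumes only what is described above, that a uuid saved in the
configuration is used as-is and that a random one is generated otherwise.

```python
import uuid


def namespace_uuid(device_cfg: dict) -> str:
    # Use the uuid saved in the configuration if there is one;
    # otherwise fall back to a freshly generated random uuid.
    saved = device_cfg.get("uuid")
    return saved if saved is not None else str(uuid.uuid4())


# Configuration as posted: nguid saved, uuid not saved.
cfg_without_uuid = {
    "nguid": "01000000-0000-0000-8ce3-8ee3064aa4f2",
    "path": "/dev/nvme0n1",
}
# Same configuration with a uuid saved next to the nguid (made-up value).
cfg_with_uuid = dict(cfg_without_uuid,
                     uuid="aaaaaaaa-0000-0000-0000-000000000001")

# Each call stands in for configuring the target after one boot.
first_boot = namespace_uuid(cfg_without_uuid)
second_boot = namespace_uuid(cfg_without_uuid)
print(first_boot == second_boot)   # random uuids, so these differ

stable_first = namespace_uuid(cfg_with_uuid)
stable_second = namespace_uuid(cfg_with_uuid)
print(stable_first == stable_second)
```

Without a saved uuid, the host sees a new uuid for the same nsid after
every target reboot; with a saved uuid, the value stays the same.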

- Chris Leech




* Re: Unable to reconnect namespace via NVMe/TCP
  2025-08-12 15:48 Unable to reconnect namespace via NVMe/TCP Anton Gavriliuk
  2025-08-13  6:01 ` Nilay Shroff
  2025-08-18 20:58 ` Chris Leech
@ 2025-08-19  6:09 ` Hannes Reinecke
  2025-08-19 10:55   ` Yi Zhang
  2 siblings, 1 reply; 9+ messages in thread
From: Hannes Reinecke @ 2025-08-19  6:09 UTC (permalink / raw)
  To: linux-nvme

On 8/12/25 17:48, Anton Gavriliuk wrote:
> Hi
> 
> There are NVMe/TCP target and initiator servers, both running on
> RHEL10 (6.12.0-55.25.1.el10_0.x86_64)
> 
> NVMe/TCP target exports single NVMe SSD
> 
>        "namespaces": [
>          {
>            "device": {
>              "nguid": "01000000-0000-0000-8ce3-8ee3064aa4f2",
>              "path": "/dev/nvme0n1"
>            },
>            "enable": 1,
>            "nsid": 1
>          }
>        ],
> 
> If NVMe/TCP target is not available, initiator tries to reconnect
> every 10 seconds
> 
> [ 2586.071048] nvme nvme9: failed to connect socket: -111
> [ 2586.071403] nvme nvme9: Failed reconnect attempt 16/-1
> [ 2586.071565] nvme nvme9: Reconnecting in 10 seconds...
> [ 2596.310921] nvme nvme9: failed to connect socket: -111
> [ 2596.311186] nvme nvme9: Failed reconnect attempt 17/-1
> [ 2596.311349] nvme nvme9: Reconnecting in 10 seconds...
> [ 2606.550772] nvme nvme9: failed to connect socket: -111
> [ 2606.551252] nvme nvme9: Failed reconnect attempt 18/-1
> [ 2606.551592] nvme nvme9: Reconnecting in 10 seconds...
> 
> when NVMe/TCP target become available, initiator failed reconnect the namespace
> 
> [ 2606.551592] nvme nvme9: Reconnecting in 10 seconds...
> [ 2616.793080] nvme nvme9: creating 16 I/O queues.
> [ 2616.829881] nvme nvme9: mapped 16/0/0 default/read/poll queues.
> [ 2616.833685] nvme nvme9: Successfully reconnected (attempt 19/-1)
> [ 2616.834446] nvme nvme9: identifiers changed for nsid 1
> [ 2616.835618] block nvme9n1: no usable path - requeuing I/O
> [ 2616.856602] block nvme9n1: no available path - failing I/O
> [ 2616.856811] block nvme9n1: no available path - failing I/O
> 
> and there is no nvme9n1 namespace in the "nvme list" output.
> 
This looks like the missed re-scan issue I found recently.
Should be fixed with
9546ad1a9bda ("nvme: requeue namespace scan on missed AENs")

(And you are running RHEL. Please open a bugzilla with RH.)
(And why am I even answering that?)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



* Re: Unable to reconnect namespace via NVMe/TCP
  2025-08-19  6:09 ` Hannes Reinecke
@ 2025-08-19 10:55   ` Yi Zhang
  2025-08-19 13:43     ` Yi Zhang
  0 siblings, 1 reply; 9+ messages in thread
From: Yi Zhang @ 2025-08-19 10:55 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: linux-nvme, Chris Leech, Maurizio Lombardi

Hi Hannes

I tried with the upstream kernel v6.17-rc2, and it can still be reproduced.

# dmesg | tail -30
[  219.560691] nvme nvme0: Failed reconnect attempt 3/-1
[  219.565784] nvme nvme0: Reconnecting in 10 seconds...
[  229.795215] nvme nvme0: failed to connect socket: -111
[  229.800369] nvme nvme0: Failed reconnect attempt 4/-1
[  229.805450] nvme nvme0: Reconnecting in 10 seconds...
[  240.034918] nvme nvme0: failed to connect socket: -111
[  240.040093] nvme nvme0: Failed reconnect attempt 5/-1
[  240.045165] nvme nvme0: Reconnecting in 10 seconds...
[  250.274619] nvme nvme0: failed to connect socket: -111
[  250.279776] nvme nvme0: Failed reconnect attempt 6/-1
[  250.284855] nvme nvme0: Reconnecting in 10 seconds...
[  260.514102] nvme nvme0: failed to connect socket: -111
[  260.519261] nvme nvme0: Failed reconnect attempt 7/-1
[  260.524340] nvme nvme0: Reconnecting in 10 seconds...
[  270.754031] nvme nvme0: failed to connect socket: -111
[  270.759184] nvme nvme0: Failed reconnect attempt 8/-1
[  270.764263] nvme nvme0: Reconnecting in 10 seconds...
[  280.993410] nvme nvme0: failed to connect socket: -111
[  280.998591] nvme nvme0: Failed reconnect attempt 9/-1
[  281.003653] nvme nvme0: Reconnecting in 10 seconds...
[  291.249090] nvme nvme0: creating 4 I/O queues.
[  291.264959] nvme nvme0: mapped 4/0/0 default/read/poll queues.
[  291.271975] nvme nvme0: Successfully reconnected (attempt 10/-1)
[  291.273897] nvme nvme0: identifiers changed for nsid 2
[  291.283631] block nvme0n1: no available path - failing I/O
[  291.289139] block nvme0n1: no available path - failing I/O
[  291.294649] block nvme0n1: no available path - failing I/O
[  291.300159] block nvme0n1: no available path - failing I/O
[  291.305665] block nvme0n1: no available path - failing I/O
[  291.311197] block nvme0n1: no available path - failing I/O



--
Best Regards,
  Yi Zhang




* Re: Unable to reconnect namespace via NVMe/TCP
  2025-08-18 20:58 ` Chris Leech
@ 2025-08-19 13:12   ` Maurizio Lombardi
  2025-08-19 13:45   ` Anton Gavriliuk
  1 sibling, 0 replies; 9+ messages in thread
From: Maurizio Lombardi @ 2025-08-19 13:12 UTC (permalink / raw)
  To: Chris Leech, Anton Gavriliuk; +Cc: linux-nvme

On Mon Aug 18, 2025 at 10:58 PM CEST, Chris Leech wrote:
> On Tue, Aug 12, 2025 at 06:48:29PM +0300, Anton Gavriliuk wrote:
>> Hi
>> 
>> There are NVMe/TCP target and initiator servers, both running on
>> RHEL10 (6.12.0-55.25.1.el10_0.x86_64)
>> 
>> NVMe/TCP target exports single NVMe SSD
>> 
>>       "namespaces": [
>>         {
>>           "device": {
>>             "nguid": "01000000-0000-0000-8ce3-8ee3064aa4f2",
>>             "path": "/dev/nvme0n1"
>>           },
>>           "enable": 1,
>>           "nsid": 1
>>         }
>>       ],
>> 
>> If NVMe/TCP target is not available, initiator tries to reconnect
>> every 10 seconds
>
> How is the target becoming unavailable?  Is it as network interruption,
> or is the target being rebooted or reconfigured?
>
> Is the shared snippet the entire "namespaces" section of the
> configuration file?
>
> It looks like the target code will generate a random uuid for a device
> when it's configured, which could then trip up the host attempting to
> reconnect across a target reboot. But I think the uuid can be saved as
> well as the nguid in the target configuration.

Indeed, normally, nvmetcli saves the uuid in the config file.

Maurizio



* Re: Unable to reconnect namespace via NVMe/TCP
  2025-08-19 10:55   ` Yi Zhang
@ 2025-08-19 13:43     ` Yi Zhang
  2025-08-19 14:29       ` Anton Gavriliuk
  0 siblings, 1 reply; 9+ messages in thread
From: Yi Zhang @ 2025-08-19 13:43 UTC (permalink / raw)
  To: Hannes Reinecke, Anton Gavriliuk
  Cc: linux-nvme, Chris Leech, Maurizio Lombardi

Hi Anton

Please try to add the uuid in the device field, which should fix your issue.



-- 
Best Regards,
  Yi Zhang




* Re: Unable to reconnect namespace via NVMe/TCP
  2025-08-18 20:58 ` Chris Leech
  2025-08-19 13:12   ` Maurizio Lombardi
@ 2025-08-19 13:45   ` Anton Gavriliuk
  1 sibling, 0 replies; 9+ messages in thread
From: Anton Gavriliuk @ 2025-08-19 13:45 UTC (permalink / raw)
  To: Chris Leech; +Cc: linux-nvme

> How is the target becoming unavailable?  Is it as network interruption, or is the target being rebooted or reconfigured?

NVMe/TCP target reboot.

> Is the shared snippet the entire "namespaces" section of the
> configuration file?

Yes, correct.

Anton




* Re: Unable to reconnect namespace via NVMe/TCP
  2025-08-19 13:43     ` Yi Zhang
@ 2025-08-19 14:29       ` Anton Gavriliuk
  0 siblings, 0 replies; 9+ messages in thread
From: Anton Gavriliuk @ 2025-08-19 14:29 UTC (permalink / raw)
  To: Yi Zhang; +Cc: Hannes Reinecke, linux-nvme, Chris Leech, Maurizio Lombardi

Hi Yi Zhang

> Please try to add the uuid in the device field, which should fix your issue.

On the NVMe/TCP target, the nguid and uuid of the given device
(/dev/nvme0n1) are exactly the same:

[root@memverge4 ~]# cat /sys/class/block/nvme0n1/uuid
01000000-0000-0000-8ce3-8ee3064aa4f2
[root@memverge4 ~]# cat /sys/class/block/nvme0n1/nguid
01000000-0000-0000-8ce3-8ee3064aa4f2

So I added the uuid to the configuration:

      "namespaces": [
        {
          "device": {
            "nguid": "01000000-0000-0000-8ce3-8ee3064aa4f2",
            "uuid": "01000000-0000-0000-8ce3-8ee3064aa4f2",
            "path": "/dev/nvme0n1"
          },
          "enable": 1,
          "nsid": 1
        }
      ],

Yes, this fixed my issue: the namespace is now reconnected automatically
after an NVMe/TCP target reboot.
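
For anyone who wants to check an existing target configuration for the same
problem, a small Python sketch along these lines may help. It is only an
illustration: `namespaces_missing_uuid` is a hypothetical helper, the NQN is
made up, and the top-level "subsystems"/"nqn" layout around the "namespaces"
list is an assumption about the saved JSON file, not something shown in this
thread.

```python
import json


def namespaces_missing_uuid(config: dict) -> list[tuple[str, int]]:
    # Collect (subsystem NQN, nsid) for every namespace whose "device"
    # section has no "uuid" entry.
    missing = []
    for subsys in config.get("subsystems", []):      # assumed layout
        for ns in subsys.get("namespaces", []):
            if not ns.get("device", {}).get("uuid"):
                missing.append((subsys.get("nqn", "?"), ns.get("nsid")))
    return missing


# Inline sample instead of a real file; the NQN is a made-up placeholder.
sample = json.loads("""
{
  "subsystems": [
    {
      "nqn": "nqn.2025-08.example:testsubsys",
      "namespaces": [
        {
          "device": {
            "nguid": "01000000-0000-0000-8ce3-8ee3064aa4f2",
            "path": "/dev/nvme0n1"
          },
          "enable": 1,
          "nsid": 1
        }
      ]
    }
  ]
}
""")

unfixed = namespaces_missing_uuid(sample)
print(unfixed)

# Add the uuid as in the fixed configuration above and check again.
device = sample["subsystems"][0]["namespaces"][0]["device"]
device["uuid"] = "01000000-0000-0000-8ce3-8ee3064aa4f2"
fixed = namespaces_missing_uuid(sample)
print(fixed)
```

In real use the dictionary would come from `json.load()` on the saved
configuration file rather than from an inline string.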

Anton


