All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] nvme: missed AENs during scanning
@ 2025-04-03  7:19 Hannes Reinecke
  2025-04-03  7:19 ` [PATCH 1/2] nvme: requeue namespace scan on missed AENs Hannes Reinecke
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Hannes Reinecke @ 2025-04-03  7:19 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Sagi Grimberg, Keith Busch, linux-nvme, Hannes Reinecke

Hi all,

I have been tracking down a long-standing issue from one of our partners
which had a test where namespaces have been remapped in quick succession.
The operation was to remove a namespace and add a new namespace at the same
NSID with a different UUID and different ANA group ID.
And that repeated in quick succession.
There had been several attempts to fix this ([1], [2], and the patch from
Keith in 1f021341eef4 ("nvme-multipath: defer partition scanning")), but
the test case continued to fail ending up with all paths in ANA Inaccessible.

Eventually it turned out that we're skipping NS Changed AENs; if namescan
scanning is active when we receive a NS Changed AEN we'll simply ignore it,
and do not re-read the namespace list as we should.
Additionally the NVMe base spec states (NVMe Base Specification v2.1, Figure
151 'Asynchonous Event Information - Notice': Asymmetric Namespace Access Change):

  A controller shall not send this even if an Attached Namespace Attribute Changed
  asynchronous event [ .. ] is sent for the same event.

so we need to re-read the ANA log page after we rescanned the namespace list to
update the ANA states of the new namespaces.

As usual, comments and reviews are welcome.

[1] https://lore.kernel.org/linux-nvme/20240902111548.41430-1-hare@kernel.org/
[2] https://lore.kernel.org/linux-nvme/20241007100134.21104-1-hare@kernel.org/

Hannes Reinecke (2):
  nvme: requeue namespace scan on missed AENs
  nvme: re-read ANA log page after ns scan completes

 drivers/nvme/host/core.c | 9 +++++++++
 1 file changed, 9 insertions(+)

-- 
2.35.3



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/2] nvme: requeue namespace scan on missed AENs
  2025-04-03  7:19 [PATCH 0/2] nvme: missed AENs during scanning Hannes Reinecke
@ 2025-04-03  7:19 ` Hannes Reinecke
  2025-04-05 23:01   ` Keith Busch
  2025-04-03  7:19 ` [PATCH 2/2] nvme: re-read ANA log page after ns scan completes Hannes Reinecke
  2025-04-07 14:32 ` [PATCH 0/2] nvme: missed AENs during scanning Christoph Hellwig
  2 siblings, 1 reply; 10+ messages in thread
From: Hannes Reinecke @ 2025-04-03  7:19 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Sagi Grimberg, Keith Busch, linux-nvme, Hannes Reinecke

From: Hannes Reinecke <hare@kernel.org

Scanning for namespaces can take some time, so if the target is
reconfigured while the scan is running we will miss the AEN.
So check if the NVME_AER_NOTICE_NS_CHANGED bit is set once the scan is
finished, and requeue scanning to pick up any missed changes.

Signed-off-by: Hannes Reinecke <hare@kernel.org>
---
 drivers/nvme/host/core.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 8359d0aa0e44..70f9c2d2b113 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4292,6 +4292,10 @@ static void nvme_scan_work(struct work_struct *work)
 			nvme_scan_ns_sequential(ctrl);
 	}
 	mutex_unlock(&ctrl->scan_lock);
+
+	/* Requeue if we have missed AENs */
+	if (test_bit(NVME_AER_NOTICE_NS_CHANGED, &ctrl->events))
+		nvme_queue_scan(ctrl);
 }
 
 /*
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/2] nvme: re-read ANA log page after ns scan completes
  2025-04-03  7:19 [PATCH 0/2] nvme: missed AENs during scanning Hannes Reinecke
  2025-04-03  7:19 ` [PATCH 1/2] nvme: requeue namespace scan on missed AENs Hannes Reinecke
@ 2025-04-03  7:19 ` Hannes Reinecke
  2025-04-03 14:55   ` Keith Busch
  2025-04-05 23:02   ` Keith Busch
  2025-04-07 14:32 ` [PATCH 0/2] nvme: missed AENs during scanning Christoph Hellwig
  2 siblings, 2 replies; 10+ messages in thread
From: Hannes Reinecke @ 2025-04-03  7:19 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Sagi Grimberg, Keith Busch, linux-nvme, Hannes Reinecke

When scanning for new namespaces we might have missed an ANA AEN,
so we should always re-read the ANA log page after scanning to ensure
we don't miss updates there.

Signed-off-by: Hannes Reinecke <hare@kernel.org>
---
 drivers/nvme/host/core.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 70f9c2d2b113..6197de34c3c8 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4296,6 +4296,11 @@ static void nvme_scan_work(struct work_struct *work)
 	/* Requeue if we have missed AENs */
 	if (test_bit(NVME_AER_NOTICE_NS_CHANGED, &ctrl->events))
 		nvme_queue_scan(ctrl);
+#if CONFIG_NVME_MULTIPATH
+	else
+		/* Re-read the ANA log page to not miss updates */
+		queue_work(nvme_wq, &ctrl->ana_work);
+#endif
 }
 
 /*
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] nvme: re-read ANA log page after ns scan completes
  2025-04-03  7:19 ` [PATCH 2/2] nvme: re-read ANA log page after ns scan completes Hannes Reinecke
@ 2025-04-03 14:55   ` Keith Busch
  2025-04-03 15:32     ` Hannes Reinecke
  2025-04-05 23:02   ` Keith Busch
  1 sibling, 1 reply; 10+ messages in thread
From: Keith Busch @ 2025-04-03 14:55 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Christoph Hellwig, Sagi Grimberg, linux-nvme

On Thu, Apr 03, 2025 at 09:19:30AM +0200, Hannes Reinecke wrote:
> When scanning for new namespaces we might have missed an ANA AEN,
> so we should always re-read the ANA log page after scanning to ensure
> we don't miss updates there.

Worst case we might check a log page that hasn't changed. Rescans should
be rare, so I guess it's fine. But I'm not sure why scanning might cause
an ANA AEN to be missed. ?
 
> Signed-off-by: Hannes Reinecke <hare@kernel.org>
> ---
>  drivers/nvme/host/core.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 70f9c2d2b113..6197de34c3c8 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -4296,6 +4296,11 @@ static void nvme_scan_work(struct work_struct *work)
>  	/* Requeue if we have missed AENs */
>  	if (test_bit(NVME_AER_NOTICE_NS_CHANGED, &ctrl->events))
>  		nvme_queue_scan(ctrl);
> +#if CONFIG_NVME_MULTIPATH
> +	else
> +		/* Re-read the ANA log page to not miss updates */
> +		queue_work(nvme_wq, &ctrl->ana_work);
> +#endif


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] nvme: re-read ANA log page after ns scan completes
  2025-04-03 14:55   ` Keith Busch
@ 2025-04-03 15:32     ` Hannes Reinecke
  2025-04-03 15:37       ` Keith Busch
  0 siblings, 1 reply; 10+ messages in thread
From: Hannes Reinecke @ 2025-04-03 15:32 UTC (permalink / raw)
  To: Keith Busch, Hannes Reinecke; +Cc: Christoph Hellwig, Sagi Grimberg, linux-nvme

On 4/3/25 16:55, Keith Busch wrote:
> On Thu, Apr 03, 2025 at 09:19:30AM +0200, Hannes Reinecke wrote:
>> When scanning for new namespaces we might have missed an ANA AEN,
>> so we should always re-read the ANA log page after scanning to ensure
>> we don't miss updates there.
> 
> Worst case we might check a log page that hasn't changed. Rescans should
> be rare, so I guess it's fine. But I'm not sure why scanning might cause
> an ANA AEN to be missed. ?
>   
Well, it's not that we're _missing_ AENs here.
As per quoted spec ANA change events are not sent if there had been
an event (on the controller) which would have caused both an ANA change
event and an NS changed AEN to be sent.

So any NS changed event really needs to be treated as a combination of
NS changed _and_ ANA changed, and after each namespace rescan we need
to re-read the ANA log page.

Sigh.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] nvme: re-read ANA log page after ns scan completes
  2025-04-03 15:32     ` Hannes Reinecke
@ 2025-04-03 15:37       ` Keith Busch
  2025-04-03 15:52         ` Hannes Reinecke
  0 siblings, 1 reply; 10+ messages in thread
From: Keith Busch @ 2025-04-03 15:37 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Hannes Reinecke, Christoph Hellwig, Sagi Grimberg, linux-nvme

On Thu, Apr 03, 2025 at 05:32:59PM +0200, Hannes Reinecke wrote:
> On 4/3/25 16:55, Keith Busch wrote:
> > On Thu, Apr 03, 2025 at 09:19:30AM +0200, Hannes Reinecke wrote:
> > > When scanning for new namespaces we might have missed an ANA AEN,
> > > so we should always re-read the ANA log page after scanning to ensure
> > > we don't miss updates there.
> > 
> > Worst case we might check a log page that hasn't changed. Rescans should
> > be rare, so I guess it's fine. But I'm not sure why scanning might cause
> > an ANA AEN to be missed. ?
> Well, it's not that we're _missing_ AENs here.
> As per quoted spec ANA change events are not sent if there had been
> an event (on the controller) which would have caused both an ANA change
> event and an NS changed AEN to be sent.
> 
> So any NS changed event really needs to be treated as a combination of
> NS changed _and_ ANA changed, and after each namespace rescan we need
> to re-read the ANA log page.

Oh, good call. I forgot about that text.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] nvme: re-read ANA log page after ns scan completes
  2025-04-03 15:37       ` Keith Busch
@ 2025-04-03 15:52         ` Hannes Reinecke
  0 siblings, 0 replies; 10+ messages in thread
From: Hannes Reinecke @ 2025-04-03 15:52 UTC (permalink / raw)
  To: Keith Busch; +Cc: Hannes Reinecke, Christoph Hellwig, Sagi Grimberg, linux-nvme

On 4/3/25 17:37, Keith Busch wrote:
> On Thu, Apr 03, 2025 at 05:32:59PM +0200, Hannes Reinecke wrote:
>> On 4/3/25 16:55, Keith Busch wrote:
>>> On Thu, Apr 03, 2025 at 09:19:30AM +0200, Hannes Reinecke wrote:
>>>> When scanning for new namespaces we might have missed an ANA AEN,
>>>> so we should always re-read the ANA log page after scanning to ensure
>>>> we don't miss updates there.
>>>
>>> Worst case we might check a log page that hasn't changed. Rescans should
>>> be rare, so I guess it's fine. But I'm not sure why scanning might cause
>>> an ANA AEN to be missed. ?
>> Well, it's not that we're _missing_ AENs here.
>> As per quoted spec ANA change events are not sent if there had been
>> an event (on the controller) which would have caused both an ANA change
>> event and an NS changed AEN to be sent.
>>
>> So any NS changed event really needs to be treated as a combination of
>> NS changed _and_ ANA changed, and after each namespace rescan we need
>> to re-read the ANA log page.
> 
> Oh, good call. I forgot about that text.

NP. We all did :-)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] nvme: requeue namespace scan on missed AENs
  2025-04-03  7:19 ` [PATCH 1/2] nvme: requeue namespace scan on missed AENs Hannes Reinecke
@ 2025-04-05 23:01   ` Keith Busch
  0 siblings, 0 replies; 10+ messages in thread
From: Keith Busch @ 2025-04-05 23:01 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Christoph Hellwig, Sagi Grimberg, linux-nvme

On Thu, Apr 03, 2025 at 09:19:29AM +0200, Hannes Reinecke wrote:
> From: Hannes Reinecke <hare@kernel.org
> 
> Scanning for namespaces can take some time, so if the target is
> reconfigured while the scan is running we will miss the AEN.
> So check if the NVME_AER_NOTICE_NS_CHANGED bit is set once the scan is
> finished, and requeue scanning to pick up any missed changes.

Looks good.

Reviewed-by: Keith Busch <kbusch@kernel.org>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] nvme: re-read ANA log page after ns scan completes
  2025-04-03  7:19 ` [PATCH 2/2] nvme: re-read ANA log page after ns scan completes Hannes Reinecke
  2025-04-03 14:55   ` Keith Busch
@ 2025-04-05 23:02   ` Keith Busch
  1 sibling, 0 replies; 10+ messages in thread
From: Keith Busch @ 2025-04-05 23:02 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Christoph Hellwig, Sagi Grimberg, linux-nvme

On Thu, Apr 03, 2025 at 09:19:30AM +0200, Hannes Reinecke wrote:
> When scanning for new namespaces we might have missed an ANA AEN,
> so we should always re-read the ANA log page after scanning to ensure
> we don't miss updates there.

Maybe just update the commit message to refer to the spec: 

  A controller shall not send this event if an Attached Namespace Attribute Changed asynchronous
  event ... is sent for the same event

Change looks good.

Reviewed-by: Keith Busch <kbusch@kernel.org>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/2] nvme: missed AENs during scanning
  2025-04-03  7:19 [PATCH 0/2] nvme: missed AENs during scanning Hannes Reinecke
  2025-04-03  7:19 ` [PATCH 1/2] nvme: requeue namespace scan on missed AENs Hannes Reinecke
  2025-04-03  7:19 ` [PATCH 2/2] nvme: re-read ANA log page after ns scan completes Hannes Reinecke
@ 2025-04-07 14:32 ` Christoph Hellwig
  2 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2025-04-07 14:32 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Christoph Hellwig, Sagi Grimberg, Keith Busch, linux-nvme

Thanks, I've applied this to nvme-6.15 with minor edits to the commit
messages, mostly base on what Keith suggested.



^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2025-04-07 16:28 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-03  7:19 [PATCH 0/2] nvme: missed AENs during scanning Hannes Reinecke
2025-04-03  7:19 ` [PATCH 1/2] nvme: requeue namespace scan on missed AENs Hannes Reinecke
2025-04-05 23:01   ` Keith Busch
2025-04-03  7:19 ` [PATCH 2/2] nvme: re-read ANA log page after ns scan completes Hannes Reinecke
2025-04-03 14:55   ` Keith Busch
2025-04-03 15:32     ` Hannes Reinecke
2025-04-03 15:37       ` Keith Busch
2025-04-03 15:52         ` Hannes Reinecke
2025-04-05 23:02   ` Keith Busch
2025-04-07 14:32 ` [PATCH 0/2] nvme: missed AENs during scanning Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.