* [PATCH] nvme: re-read ANA log on NS CHANGED AEN
@ 2020-12-04 15:03 Hannes Reinecke
2020-12-04 15:20 ` Keith Busch
0 siblings, 1 reply; 5+ messages in thread
From: Hannes Reinecke @ 2020-12-04 15:03 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-nvme, Sagi Grimberg, Keith Busch, Hannes Reinecke
As discussed on the mailing list we might be getting an NS CHANGED AEN
for a namespace on a newly created ANA group, and due to optimisations
within the spec no corresponding ANA CHANGED AEN will be send.
Ideally we would re-read the ANA log when we're figuring out that
no ANA Group exists, but that code is hidden behind two stacked void
functions, so it'll be impossible to recover from an error on reading
the ANA log.
Instead this patch re-reads the ANA log on every NS CHANGED AEN prior
to scanning the namespaces; that will resolve the situation, too,
but doesn't risk into running into an unrecoverable error.
Reported-by: Martin George <marting@netapp.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
drivers/nvme/host/core.c | 8 +++++++-
drivers/nvme/host/multipath.c | 18 ++++++++++++++----
drivers/nvme/host/nvme.h | 1 +
3 files changed, 22 insertions(+), 5 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 95ef4943d8bd..084a05442ac2 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -123,7 +123,7 @@ static void nvme_set_queue_dying(struct nvme_ns *ns)
nvme_update_bdev_size(ns->disk);
}
-static void nvme_queue_scan(struct nvme_ctrl *ctrl)
+void nvme_queue_scan(struct nvme_ctrl *ctrl)
{
/*
* Only new queue scan work when admin and IO queues are both alive
@@ -4292,6 +4292,12 @@ static void nvme_handle_aen_notice(struct nvme_ctrl *ctrl, u32 result)
switch (aer_notice_type) {
case NVME_AER_NOTICE_NS_CHANGED:
set_bit(NVME_AER_NOTICE_NS_CHANGED, &ctrl->events);
+#ifdef CONFIG_NVME_MULTIPATH
+ if (ctrl->ana_log_buf) {
+ queue_work(nvme_wq, &ctrl->ana_work);
+ break;
+ }
+#endif
nvme_queue_scan(ctrl);
break;
case NVME_AER_NOTICE_FW_ACT_STARTING:
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 74896be40c17..b920d57d0e39 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -535,10 +535,16 @@ static int nvme_read_ana_log(struct nvme_ctrl *ctrl)
goto out_unlock;
}
- error = nvme_parse_ana_log(ctrl, &nr_change_groups,
- nvme_update_ana_state);
- if (error)
- goto out_unlock;
+ /*
+ * Don't update ANA groups if triggered by an NS CHANGED
+ * AEN; we'll be rescanning all namespaces anyway afterwards.
+ */
+ if (!test_bit(NVME_AER_NOTICE_NS_CHANGED, &ctrl->events)) {
+ error = nvme_parse_ana_log(ctrl, &nr_change_groups,
+ nvme_update_ana_state);
+ if (error)
+ goto out_unlock;
+ }
/*
* In theory we should have an ANATT timer per group as they might enter
@@ -557,6 +563,10 @@ static int nvme_read_ana_log(struct nvme_ctrl *ctrl)
del_timer_sync(&ctrl->anatt_timer);
out_unlock:
mutex_unlock(&ctrl->ana_lock);
+
+ if (test_bit(NVME_AER_NOTICE_NS_CHANGED, &ctrl->events))
+ nvme_queue_scan(ctrl);
+
return error;
}
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index cc111136a981..b0d01c2ce092 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -589,6 +589,7 @@ void nvme_uninit_ctrl(struct nvme_ctrl *ctrl);
void nvme_start_ctrl(struct nvme_ctrl *ctrl);
void nvme_stop_ctrl(struct nvme_ctrl *ctrl);
int nvme_init_identify(struct nvme_ctrl *ctrl);
+void nvme_queue_scan(struct nvme_ctrl *ctrl);
void nvme_remove_namespaces(struct nvme_ctrl *ctrl);
--
2.16.4
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: [PATCH] nvme: re-read ANA log on NS CHANGED AEN
2020-12-04 15:03 [PATCH] nvme: re-read ANA log on NS CHANGED AEN Hannes Reinecke
@ 2020-12-04 15:20 ` Keith Busch
2020-12-04 16:32 ` Hannes Reinecke
0 siblings, 1 reply; 5+ messages in thread
From: Keith Busch @ 2020-12-04 15:20 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: Keith Busch, Christoph Hellwig, linux-nvme, Sagi Grimberg
On Fri, Dec 04, 2020 at 04:03:10PM +0100, Hannes Reinecke wrote:
> - error = nvme_parse_ana_log(ctrl, &nr_change_groups,
> - nvme_update_ana_state);
> - if (error)
> - goto out_unlock;
> + /*
> + * Don't update ANA groups if triggered by an NS CHANGED
> + * AEN; we'll be rescanning all namespaces anyway afterwards.
> + */
> + if (!test_bit(NVME_AER_NOTICE_NS_CHANGED, &ctrl->events)) {
> + error = nvme_parse_ana_log(ctrl, &nr_change_groups,
> + nvme_update_ana_state);
> + if (error)
> + goto out_unlock;
> + }
>
> /*
> * In theory we should have an ANATT timer per group as they might enter
> @@ -557,6 +563,10 @@ static int nvme_read_ana_log(struct nvme_ctrl *ctrl)
> del_timer_sync(&ctrl->anatt_timer);
> out_unlock:
> mutex_unlock(&ctrl->ana_lock);
> +
> + if (test_bit(NVME_AER_NOTICE_NS_CHANGED, &ctrl->events))
> + nvme_queue_scan(ctrl);
> +
> return error;
> }
The scan work can be started from other contexts too, so there are some
unfortunate races here if one of those other contexts clears ctrl->events
before the ANA work gets to see it, which then gets you back to having
unusable namespaces when that scan work uses a stale ana log.
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [PATCH] nvme: re-read ANA log on NS CHANGED AEN
2020-12-04 15:20 ` Keith Busch
@ 2020-12-04 16:32 ` Hannes Reinecke
2020-12-04 17:37 ` Keith Busch
0 siblings, 1 reply; 5+ messages in thread
From: Hannes Reinecke @ 2020-12-04 16:32 UTC (permalink / raw)
To: Keith Busch; +Cc: Keith Busch, Christoph Hellwig, linux-nvme, Sagi Grimberg
On 12/4/20 4:20 PM, Keith Busch wrote:
> On Fri, Dec 04, 2020 at 04:03:10PM +0100, Hannes Reinecke wrote:
>> - error = nvme_parse_ana_log(ctrl, &nr_change_groups,
>> - nvme_update_ana_state);
>> - if (error)
>> - goto out_unlock;
>> + /*
>> + * Don't update ANA groups if triggered by an NS CHANGED
>> + * AEN; we'll be rescanning all namespaces anyway afterwards.
>> + */
>> + if (!test_bit(NVME_AER_NOTICE_NS_CHANGED, &ctrl->events)) {
>> + error = nvme_parse_ana_log(ctrl, &nr_change_groups,
>> + nvme_update_ana_state);
>> + if (error)
>> + goto out_unlock;
>> + }
>>
>> /*
>> * In theory we should have an ANATT timer per group as they might enter
>> @@ -557,6 +563,10 @@ static int nvme_read_ana_log(struct nvme_ctrl *ctrl)
>> del_timer_sync(&ctrl->anatt_timer);
>> out_unlock:
>> mutex_unlock(&ctrl->ana_lock);
>> +
>> + if (test_bit(NVME_AER_NOTICE_NS_CHANGED, &ctrl->events))
>> + nvme_queue_scan(ctrl);
>> +
>> return error;
>> }
>
> The scan work can be started from other contexts too, so there are some
> unfortunate races here if one of those other contexts clears ctrl->events
> before the ANA work gets to see it, which then gets you back to having
> unusable namespaces when that scan work uses a stale ana log.
>
Maybe this on top?
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 27be05ef0ed7..83e28fb9fba4 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -114,6 +114,10 @@ static void nvme_set_queue_dying(struct nvme_ns *ns)
static void nvme_queue_scan(struct nvme_ctrl *ctrl)
{
+#ifdef CONFIG_NVME_MULTIPATH
+ if (ctrl->ana_log_buf)
+ flush_work(nvme_wq, &ctrl->ana_work);
+#endif
/*
* Only new queue scan work when admin and IO queues are both alive
*/
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: [PATCH] nvme: re-read ANA log on NS CHANGED AEN
2020-12-04 16:32 ` Hannes Reinecke
@ 2020-12-04 17:37 ` Keith Busch
2020-12-05 9:50 ` Hannes Reinecke
0 siblings, 1 reply; 5+ messages in thread
From: Keith Busch @ 2020-12-04 17:37 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: Keith Busch, Christoph Hellwig, linux-nvme, Sagi Grimberg
On Fri, Dec 04, 2020 at 05:32:21PM +0100, Hannes Reinecke wrote:
> On 12/4/20 4:20 PM, Keith Busch wrote:
> > The scan work can be started from other contexts too, so there are some
> > unfortunate races here if one of those other contexts clears ctrl->events
> > before the ANA work gets to see it, which then gets you back to having
> > unusable namespaces when that scan work uses a stale ana log.
> >
> Maybe this on top?
It does shrink the race, but doesn't eliminate it if scan work starts
after the bit is set in ctrl->events, but before the ana work is queued.
I've another question, though. You mentioned this approach for recovery
reasons. Whether you read the ana log here or where I suggested, there's
no recovery occuring if the ana log read fails: your patch just queues
the scan work that won't find the matching ana group id, which is the
same result if refreshing the ana log inline with the namespace scan
also fails. Is their a difference I'm not seeing?
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] nvme: re-read ANA log on NS CHANGED AEN
2020-12-04 17:37 ` Keith Busch
@ 2020-12-05 9:50 ` Hannes Reinecke
0 siblings, 0 replies; 5+ messages in thread
From: Hannes Reinecke @ 2020-12-05 9:50 UTC (permalink / raw)
To: Keith Busch; +Cc: Keith Busch, Christoph Hellwig, linux-nvme, Sagi Grimberg
On 12/4/20 6:37 PM, Keith Busch wrote:
> On Fri, Dec 04, 2020 at 05:32:21PM +0100, Hannes Reinecke wrote:
>> On 12/4/20 4:20 PM, Keith Busch wrote:
>>> The scan work can be started from other contexts too, so there are some
>>> unfortunate races here if one of those other contexts clears ctrl->events
>>> before the ANA work gets to see it, which then gets you back to having
>>> unusable namespaces when that scan work uses a stale ana log.
>>>
>> Maybe this on top?
>
> It does shrink the race, but doesn't eliminate it if scan work starts
> after the bit is set in ctrl->events, but before the ana work is queued.
>
> I've another question, though. You mentioned this approach for recovery
> reasons. Whether you read the ana log here or where I suggested, there's
> no recovery occuring if the ana log read fails: your patch just queues
> the scan work that won't find the matching ana group id, which is the
> same result if refreshing the ana log inline with the namespace scan
> also fails. Is their a difference I'm not seeing?
>
My point was that with your approach we're doing I/O in the middle of a
non-recoverable section.
While the result of the actual I/O probably don't matter much (as you
rightly pointed out), it's the side effect of an I/O error which I'm
worried about.
If we hit an I/O error it means that error recovery _will_ have started.
Which means that we cannot rely on the remaining part of that function
to be executed properly, as the controller might have been torn down
already. So my worry here is that we're left with a halfway-initialized
block device and have no way to recover that.
In queued approach I've suggested an I/O error on reading the ANA log
would cause the scan to be aborted, which is perfectly fine as it'll be
redone at the end of the error recovery anyway.
At the expense of always having to read the ANA log before scanning,
I'll admit.
But maybe there's another way, by lifting the check for the ANA
information prior to setting up the block device ... lemme check.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2020-12-05 9:50 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-12-04 15:03 [PATCH] nvme: re-read ANA log on NS CHANGED AEN Hannes Reinecke
2020-12-04 15:20 ` Keith Busch
2020-12-04 16:32 ` Hannes Reinecke
2020-12-04 17:37 ` Keith Busch
2020-12-05 9:50 ` Hannes Reinecke
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.