linux-nvme.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] 2 patches about asynchronous events notification
@ 2016-06-17 17:21 Guilherme G. Piccoli
  2016-06-17 17:21 ` [PATCH 1/2] nvme: introduce asynchronous events textual output Guilherme G. Piccoli
  2016-06-17 17:21 ` [PATCH 2/2] [RFC] nvme: enable asynchronous events notification by default Guilherme G. Piccoli
  0 siblings, 2 replies; 10+ messages in thread
From: Guilherme G. Piccoli @ 2016-06-17 17:21 UTC (permalink / raw)


The 2 patches in this small series are related to asynchronous events' handling.

The 1st aims to print textual information instead of just print the event code.
The 2nd patch enables asynchronous events notification by default.

There are some points below, for which I'd be glad to hear some suggestions!

1) In the 1st patch, we do first a direct lookup in a table and then we
fallback to full table search in case direct lookup does not succeed. We can
rely in direct lookup solely if we are sure the NVMe spec. won't ever add
events with non-incremental codes. Since I cannot guarantee this myself, my
choice was to keep the full table search (which code won't be reached with
current events of the spec.);

2) The 2nd patch was sent as a RFC, because I'm not sure if there is some
con in enabling those events by default; I might be missing some caveat here.

Thanks in advance.



Guilherme G. Piccoli (2):
  nvme: introduce asynchronous events textual output
  nvme: enable asynchronous events notification by default

 drivers/nvme/host/core.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++--
 drivers/nvme/host/nvme.h |  9 ++++++
 drivers/nvme/host/pci.c  | 16 ++++++++++
 3 files changed, 102 insertions(+), 2 deletions(-)

-- 
2.1.0

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/2] nvme: introduce asynchronous events textual output
  2016-06-17 17:21 [PATCH 0/2] 2 patches about asynchronous events notification Guilherme G. Piccoli
@ 2016-06-17 17:21 ` Guilherme G. Piccoli
  2016-06-17 17:21 ` [PATCH 2/2] [RFC] nvme: enable asynchronous events notification by default Guilherme G. Piccoli
  1 sibling, 0 replies; 10+ messages in thread
From: Guilherme G. Piccoli @ 2016-06-17 17:21 UTC (permalink / raw)


Currently the nvme driver prints asynchronous events as numbers,
based on NVMe specification codes. This patch introduces the
textual output for such events, keeping the codes too.

This is useful for error reporting purposes in human readable
format.

Signed-off-by: Guilherme G. Piccoli <gpiccoli at linux.vnet.ibm.com>
---
 drivers/nvme/host/core.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++--
 drivers/nvme/host/nvme.h |  9 ++++++
 2 files changed, 86 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 1a51584..8f95947 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -58,6 +58,79 @@ static DEFINE_SPINLOCK(dev_list_lock);
 
 static struct class *nvme_class;
 
+/* NVMe Async Events descriptive messages:
+ * Error status messages - Type 0x0
+ */
+static const struct nvme_ae_info nvme_ae_error_status[] = {
+	{ 0x0, "Write to invalid doorbell register"	},
+	{ 0x1, "Invalid doorbell write value"		},
+	{ 0x2, "Diagnostic failure"			},
+	{ 0x3, "Persistent internal error"		},
+	{ 0x4, "Transient internal error"		},
+	{ 0x5, "Firmware image load error"		},
+};
+
+/* SMART/Health status messages - Type 0x1 */
+static const struct nvme_ae_info nvme_ae_health_status[] = {
+	{ 0x0, "NVM subsystem reliability"		},
+	{ 0x1, "Temperature threshold"			},
+	{ 0x2, "Spare below threshold"			},
+};
+
+/* Notice messages - Type 0x2
+ * Event 0x0 in the below table is referred as NVME_AER_NOTICE_NS_CHANGED on
+ * nvme.h and is not just reported; the driver perform a queue rescan too.
+ */
+static const struct nvme_ae_info nvme_ae_notice[] = {
+	{ 0x0, "Namespace attribute changed"		},
+	{ 0x1, "Firmware activation starting"		},
+};
+
+/* NVM/IO Command Set Specific Status messages - Type 0x6 */
+static const struct nvme_ae_info nvme_ae_io_command_set[] = {
+	{ 0x0, "Reservation log page available"		},
+};
+
+static const char *nvme_get_ae_info(u32 result)
+{
+	const struct nvme_ae_info *tbl;
+	u8 evnt, size, type, i;
+
+	type = result & NVME_AE_TYPE_MASK;
+	evnt = (result & NVME_AE_INFO_MASK) >> 8;
+
+	switch (type) {
+	case 0x0: /* Error status */
+		tbl = nvme_ae_error_status;
+		size = ARRAY_SIZE(nvme_ae_error_status);
+		break;
+	case 0x1: /* SMART/Health status */
+		tbl = nvme_ae_health_status;
+		size = ARRAY_SIZE(nvme_ae_health_status);
+		break;
+	case 0x2: /* Notice */
+		tbl = nvme_ae_notice;
+		size = ARRAY_SIZE(nvme_ae_notice);
+		break;
+	case 0x6: /* NVM/IO Command Set Specific Status */
+		tbl = nvme_ae_io_command_set;
+		size = ARRAY_SIZE(nvme_ae_io_command_set);
+		break;
+	default: /* Reserved types/Vendor specific */
+		return NULL;
+	}
+
+	/* Try first a direct lookup to avoid full table search */
+	if (evnt < size && tbl[evnt].event == evnt)
+		return tbl[evnt].desc;
+
+	for (i = 0; i < size; i++)
+		if (tbl[i].event == evnt)
+			return tbl[i].desc;
+
+	return NULL;
+}
+
 bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 		enum nvme_ctrl_state new_state)
 {
@@ -1656,11 +1729,13 @@ void nvme_complete_async_event(struct nvme_ctrl *ctrl,
 
 	switch (result & 0xff07) {
 	case NVME_AER_NOTICE_NS_CHANGED:
-		dev_info(ctrl->device, "rescanning\n");
+		dev_info(ctrl->device,
+			"rescanning (Namespace attribute change detected)\n");
 		nvme_queue_scan(ctrl);
 		break;
 	default:
-		dev_warn(ctrl->device, "async event result %08x\n", result);
+		dev_warn(ctrl->device, "async event result: %s (%08x)\n",
+					nvme_get_ae_info(result), result);
 	}
 }
 EXPORT_SYMBOL_GPL(nvme_complete_async_event);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 1daa048..a075eb1 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -154,6 +154,15 @@ struct nvme_ctrl_ops {
 	void (*submit_async_event)(struct nvme_ctrl *ctrl, int aer_idx);
 };
 
+/* NVMe Asynchronous Events (AE) textual parsing helpers */
+#define NVME_AE_TYPE_MASK	0x7
+#define NVME_AE_INFO_MASK	0xFF00
+
+struct nvme_ae_info {
+	u8 event;
+	const char *desc;
+};
+
 static inline bool nvme_ctrl_ready(struct nvme_ctrl *ctrl)
 {
 	u32 val = 0;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/2] [RFC] nvme: enable asynchronous events notification by default
  2016-06-17 17:21 [PATCH 0/2] 2 patches about asynchronous events notification Guilherme G. Piccoli
  2016-06-17 17:21 ` [PATCH 1/2] nvme: introduce asynchronous events textual output Guilherme G. Piccoli
@ 2016-06-17 17:21 ` Guilherme G. Piccoli
  2016-06-20  6:55   ` Sagi Grimberg
  1 sibling, 1 reply; 10+ messages in thread
From: Guilherme G. Piccoli @ 2016-06-17 17:21 UTC (permalink / raw)


Asynchronous events notification currently is disabled by default;
to enable it, one should issue a set-feature command through
nvme-cli userspace application. The tool also allows disabling
these events once they're enable, as per user desire.

This patch makes the asynchronous events notification enabled by
default; to do so, we submit the set-feature command from the
driver, in the end of nvme_reset_work() routine. This way, the
feature is enabled on the driver initialization and after resets,
in case they happen.

Enabling the asynchronous events notification feature by default
is interesting as an error report feature.

Signed-off-by: Guilherme G. Piccoli <gpiccoli at linux.vnet.ibm.com>
---
 drivers/nvme/host/pci.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index befac5b..3a82390 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1779,6 +1779,18 @@ static void nvme_remove_dead_ctrl(struct nvme_dev *dev, int status)
 		nvme_put_ctrl(&dev->ctrl);
 }
 
+static int nvme_enable_async_events(struct nvme_dev *dev)
+{
+	struct nvme_command c;
+
+	memset(&c, 0, sizeof(c));
+	c.common.opcode = nvme_admin_set_features;
+	c.common.cdw10[0] = 0xB; /* Feature: Async Event Configuration */
+	c.common.cdw10[1] = 0x3FF; /* Enable all available events */
+
+	return nvme_submit_sync_cmd(dev->ctrl.admin_q, &c, NULL, 0);
+}
+
 static void nvme_reset_work(struct work_struct *work)
 {
 	struct nvme_dev *dev = container_of(work, struct nvme_dev, reset_work);
@@ -1849,6 +1861,10 @@ static void nvme_reset_work(struct work_struct *work)
 
 	if (dev->online_queues > 1)
 		nvme_queue_scan(&dev->ctrl);
+
+	if (nvme_enable_async_events(dev))
+		dev_info(dev->ctrl.device,
+			"failed to enable async events notification\n");
 	return;
 
  out:
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/2] [RFC] nvme: enable asynchronous events notification by default
  2016-06-17 17:21 ` [PATCH 2/2] [RFC] nvme: enable asynchronous events notification by default Guilherme G. Piccoli
@ 2016-06-20  6:55   ` Sagi Grimberg
  2016-06-20 17:03     ` Guilherme G. Piccoli
  0 siblings, 1 reply; 10+ messages in thread
From: Sagi Grimberg @ 2016-06-20  6:55 UTC (permalink / raw)



> Asynchronous events notification currently is disabled by default;
> to enable it, one should issue a set-feature command through
> nvme-cli userspace application. The tool also allows disabling
> these events once they're enable, as per user desire.
>
> This patch makes the asynchronous events notification enabled by
> default; to do so, we submit the set-feature command from the
> driver, in the end of nvme_reset_work() routine. This way, the
> feature is enabled on the driver initialization and after resets,
> in case they happen.

AEN was already enabled by default in:

commit f866fc4282a81673ef973ad54c68235a3263b42e
Author: Christoph Hellwig <hch at lst.de>
Date:   Tue Apr 26 13:52:00 2016 +0200

     nvme: move AER handling to common code

     The transport driver still needs to do the actual submission, but 
all the
     higher level code can be shared.

     Signed-off-by: Christoph Hellwig <hch at lst.de>
     Reviewed-by: Sagi Grimberg <sagi at grimberg.me>
     Signed-off-by: Jens Axboe <axboe at fb.com>

(Assuming you have at least a single queue).

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 2/2] [RFC] nvme: enable asynchronous events notification by default
  2016-06-20  6:55   ` Sagi Grimberg
@ 2016-06-20 17:03     ` Guilherme G. Piccoli
  2016-06-20 20:21       ` Keith Busch
  0 siblings, 1 reply; 10+ messages in thread
From: Guilherme G. Piccoli @ 2016-06-20 17:03 UTC (permalink / raw)


On 06/20/2016 03:55 AM, Sagi Grimberg wrote:
>
>> Asynchronous events notification currently is disabled by default;
>> to enable it, one should issue a set-feature command through
>> nvme-cli userspace application. The tool also allows disabling
>> these events once they're enable, as per user desire.
>>
>> This patch makes the asynchronous events notification enabled by
>> default; to do so, we submit the set-feature command from the
>> driver, in the end of nvme_reset_work() routine. This way, the
>> feature is enabled on the driver initialization and after resets,
>> in case they happen.
>
> AEN was already enabled by default in:
>
> commit f866fc4282a81673ef973ad54c68235a3263b42e
> Author: Christoph Hellwig <hch at lst.de>
> Date:   Tue Apr 26 13:52:00 2016 +0200
>
>      nvme: move AER handling to common code
>
>      The transport driver still needs to do the actual submission, but
> all the
>      higher level code can be shared.
>
>      Signed-off-by: Christoph Hellwig <hch at lst.de>
>      Reviewed-by: Sagi Grimberg <sagi at grimberg.me>
>      Signed-off-by: Jens Axboe <axboe at fb.com>
>
> (Assuming you have at least a single queue).

Sagi, I might be wrong in my understanding (sorry in advance if this is 
the case hehe) but I guess the above commit only re-factored the async 
events handling on driver - the functions reworked by the patch aim to 
handle the events once they are 'captured', which is feasible only if 
they are enabled by setting the feature (with nvme-cli, for example).

The goal of my patch is to do the 'job' of nvme-cli automatically from 
within the driver.

One thing I was thinking is to add a sysfs parameter to be able to avoid 
the automatic enablement of async events in case we have a bogus adapter 
showing too many events, for example...not sure if this is necessary 
though, so I didn't add the parameter on this submission.

Thanks,


Guilherme

>
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 2/2] [RFC] nvme: enable asynchronous events notification by default
  2016-06-20 17:03     ` Guilherme G. Piccoli
@ 2016-06-20 20:21       ` Keith Busch
  2016-06-24  8:17         ` Christoph Hellwig
  0 siblings, 1 reply; 10+ messages in thread
From: Keith Busch @ 2016-06-20 20:21 UTC (permalink / raw)


On Mon, Jun 20, 2016@02:03:37PM -0300, Guilherme G. Piccoli wrote:
> The goal of my patch is to do the 'job' of nvme-cli automatically from
> within the driver.
> 
> One thing I was thinking is to add a sysfs parameter to be able to avoid the
> automatic enablement of async events in case we have a bogus adapter showing
> too many events, for example...not sure if this is necessary though, so I
> didn't add the parameter on this submission.

Instead of simply logging a message, is there something better we can do
to notify a user of an async event result? The event should be masked
by the controller until the appropriate log page is read, and I don't
think scanning the kernel logs is how such a program wants to find out
which page to read.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 2/2] [RFC] nvme: enable asynchronous events notification by default
  2016-06-20 20:21       ` Keith Busch
@ 2016-06-24  8:17         ` Christoph Hellwig
  2016-06-28 15:11           ` Keith Busch
  2016-07-06 22:47           ` Guilherme G. Piccoli
  0 siblings, 2 replies; 10+ messages in thread
From: Christoph Hellwig @ 2016-06-24  8:17 UTC (permalink / raw)


On Mon, Jun 20, 2016@04:21:35PM -0400, Keith Busch wrote:
> Instead of simply logging a message, is there something better we can do
> to notify a user of an async event result? The event should be masked
> by the controller until the appropriate log page is read, and I don't
> think scanning the kernel logs is how such a program wants to find out
> which page to read.

Yes, I'm not too excited about handling the AERs in kernel space,
having a userspace daemon to listen for them would generally be more
useful.

The only exception is the namespace changed AER that we already handle.
But while we're at it: why do we make use of the changed namespace
list log page in this case instead of doing a full recan.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 2/2] [RFC] nvme: enable asynchronous events notification by default
  2016-06-24  8:17         ` Christoph Hellwig
@ 2016-06-28 15:11           ` Keith Busch
  2016-06-29  4:17             ` Gabriel Krisman Bertazi
  2016-07-06 22:47           ` Guilherme G. Piccoli
  1 sibling, 1 reply; 10+ messages in thread
From: Keith Busch @ 2016-06-28 15:11 UTC (permalink / raw)


On Fri, Jun 24, 2016@01:17:26AM -0700, Christoph Hellwig wrote:
> Yes, I'm not too excited about handling the AERs in kernel space,
> having a userspace daemon to listen for them would generally be more
> useful.

How do daemons typically want to be notified of an event like this? I'm
thinking sysfs_notify or maybe a KOBJ_CHANGE uevent. I don't know, this
is unfamiliar territory to me.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 2/2] [RFC] nvme: enable asynchronous events notification by default
  2016-06-28 15:11           ` Keith Busch
@ 2016-06-29  4:17             ` Gabriel Krisman Bertazi
  0 siblings, 0 replies; 10+ messages in thread
From: Gabriel Krisman Bertazi @ 2016-06-29  4:17 UTC (permalink / raw)


Keith Busch <keith.busch at intel.com> writes:

> On Fri, Jun 24, 2016@01:17:26AM -0700, Christoph Hellwig wrote:
>> Yes, I'm not too excited about handling the AERs in kernel space,
>> having a userspace daemon to listen for them would generally be more
>> useful.
>
> How do daemons typically want to be notified of an event like this? I'm
> thinking sysfs_notify or maybe a KOBJ_CHANGE uevent. I don't know, this
> is unfamiliar territory to me.

For IPR (IBM Hardware RAID SCSI controller) we use KOBJ_CHANGE events
for a very similar situation, in which the driver signals a userspace
daemon, which goes over a sysfs attribute to collect adapter data dumps.

I'm also working on another async notifications mechanism for IPR,
similar to what is supported by nvme, in which I plan to use the same
uevent interface for userspace notifications.

-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 2/2] [RFC] nvme: enable asynchronous events notification by default
  2016-06-24  8:17         ` Christoph Hellwig
  2016-06-28 15:11           ` Keith Busch
@ 2016-07-06 22:47           ` Guilherme G. Piccoli
  1 sibling, 0 replies; 10+ messages in thread
From: Guilherme G. Piccoli @ 2016-07-06 22:47 UTC (permalink / raw)


On 06/24/2016 05:17 AM, Christoph Hellwig wrote:
> On Mon, Jun 20, 2016@04:21:35PM -0400, Keith Busch wrote:
>> Instead of simply logging a message, is there something better we can do
>> to notify a user of an async event result? The event should be masked
>> by the controller until the appropriate log page is read, and I don't
>> think scanning the kernel logs is how such a program wants to find out
>> which page to read.
>
> Yes, I'm not too excited about handling the AERs in kernel space,
> having a userspace daemon to listen for them would generally be more
> useful.

Thanks very much for all the replies - sorry for the delay in my response.

I guess we have 2 ways of dealing with those events, one being from 
kernel, the other by working in a daemon like Krisman mentioned (the IPR 
case). Both approaches seems to have their importance; an automatic 
error reporting tool might schedule greps on dmesg to find errors, so it 
can be useful showing them on kernel logs. More yet, some critical 
errors should be reported on kernel logs always, IMHO (like the adapter 
persistent error).

On the other hand, daemons seem to be a better accepted solution (as per 
this thread's comments hehe) and they are more popular maybe. So 2 
possible implementations come to my mind, one not excluding the other. 
I'll elaborate below, but since they can work concomitantly, I'd go for 
implementing both.


(i) We can implement a daemon like Krisman suggested, that uses uevents 
and reads the last async event from a sysfs entry, cleaning the 
necessary bit to allow more events to be reported;

(ii) We can implement a special sysfs parameter that defaults to 0, 
allowing the use of (i). This parameter is a timeout, in seconds - when 
this is set to a value >0, approach (i) is disabled, and we start to 
monitor the events from the driver. Once an event is captured, we print 
it on kernel log, wait the timeout and clear the necessary bit to allow 
new events to be reported, and so on...allowing this way a dmesg events 
approach, without flooding the log or inhibiting the users that prefer (i).


Comments and suggestions are much appreciated.
Thanks in advance,



Guilherme


> The only exception is the namespace changed AER that we already handle.
> But while we're at it: why do we make use of the changed namespace
> list log page in this case instead of doing a full recan.
>
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2016-07-06 22:47 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-06-17 17:21 [PATCH 0/2] 2 patches about asynchronous events notification Guilherme G. Piccoli
2016-06-17 17:21 ` [PATCH 1/2] nvme: introduce asynchronous events textual output Guilherme G. Piccoli
2016-06-17 17:21 ` [PATCH 2/2] [RFC] nvme: enable asynchronous events notification by default Guilherme G. Piccoli
2016-06-20  6:55   ` Sagi Grimberg
2016-06-20 17:03     ` Guilherme G. Piccoli
2016-06-20 20:21       ` Keith Busch
2016-06-24  8:17         ` Christoph Hellwig
2016-06-28 15:11           ` Keith Busch
2016-06-29  4:17             ` Gabriel Krisman Bertazi
2016-07-06 22:47           ` Guilherme G. Piccoli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).