* [PATCH 0/2] 2 patches about asynchronous events notification @ 2016-06-17 17:21 Guilherme G. Piccoli 2016-06-17 17:21 ` [PATCH 1/2] nvme: introduce asynchronous events textual output Guilherme G. Piccoli 2016-06-17 17:21 ` [PATCH 2/2] [RFC] nvme: enable asynchronous events notification by default Guilherme G. Piccoli 0 siblings, 2 replies; 10+ messages in thread From: Guilherme G. Piccoli @ 2016-06-17 17:21 UTC (permalink / raw) The 2 patches in this small series are related to asynchronous events' handling. The 1st aims to print textual information instead of just print the event code. The 2nd patch enables asynchronous events notification by default. There are some points below, for which I'd be glad to hear some suggestions! 1) In the 1st patch, we do first a direct lookup in a table and then we fallback to full table search in case direct lookup does not succeed. We can rely in direct lookup solely if we are sure the NVMe spec. won't ever add events with non-incremental codes. Since I cannot guarantee this myself, my choice was to keep the full table search (which code won't be reached with current events of the spec.); 2) The 2nd patch was sent as a RFC, because I'm not sure if there is some con in enabling those events by default; I might be missing some caveat here. Thanks in advance. Guilherme G. Piccoli (2): nvme: introduce asynchronous events textual output nvme: enable asynchronous events notification by default drivers/nvme/host/core.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++-- drivers/nvme/host/nvme.h | 9 ++++++ drivers/nvme/host/pci.c | 16 ++++++++++ 3 files changed, 102 insertions(+), 2 deletions(-) -- 2.1.0 ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/2] nvme: introduce asynchronous events textual output 2016-06-17 17:21 [PATCH 0/2] 2 patches about asynchronous events notification Guilherme G. Piccoli @ 2016-06-17 17:21 ` Guilherme G. Piccoli 2016-06-17 17:21 ` [PATCH 2/2] [RFC] nvme: enable asynchronous events notification by default Guilherme G. Piccoli 1 sibling, 0 replies; 10+ messages in thread From: Guilherme G. Piccoli @ 2016-06-17 17:21 UTC (permalink / raw) Currently the nvme driver prints asynchronous events as numbers, based on NVMe specification codes. This patch introduces the textual output for such events, keeping the codes too. This is useful for error reporting purposes in human readable format. Signed-off-by: Guilherme G. Piccoli <gpiccoli at linux.vnet.ibm.com> --- drivers/nvme/host/core.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++-- drivers/nvme/host/nvme.h | 9 ++++++ 2 files changed, 86 insertions(+), 2 deletions(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 1a51584..8f95947 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -58,6 +58,79 @@ static DEFINE_SPINLOCK(dev_list_lock); static struct class *nvme_class; +/* NVMe Async Events descriptive messages: + * Error status messages - Type 0x0 + */ +static const struct nvme_ae_info nvme_ae_error_status[] = { + { 0x0, "Write to invalid doorbell register" }, + { 0x1, "Invalid doorbell write value" }, + { 0x2, "Diagnostic failure" }, + { 0x3, "Persistent internal error" }, + { 0x4, "Transient internal error" }, + { 0x5, "Firmware image load error" }, +}; + +/* SMART/Health status messages - Type 0x1 */ +static const struct nvme_ae_info nvme_ae_health_status[] = { + { 0x0, "NVM subsystem reliability" }, + { 0x1, "Temperature threshold" }, + { 0x2, "Spare below threshold" }, +}; + +/* Notice messages - Type 0x2 + * Event 0x0 in the below table is referred as NVME_AER_NOTICE_NS_CHANGED on + * nvme.h and is not just reported; the driver perform a queue rescan too. + */ +static const struct nvme_ae_info nvme_ae_notice[] = { + { 0x0, "Namespace attribute changed" }, + { 0x1, "Firmware activation starting" }, +}; + +/* NVM/IO Command Set Specific Status messages - Type 0x6 */ +static const struct nvme_ae_info nvme_ae_io_command_set[] = { + { 0x0, "Reservation log page available" }, +}; + +static const char *nvme_get_ae_info(u32 result) +{ + const struct nvme_ae_info *tbl; + u8 evnt, size, type, i; + + type = result & NVME_AE_TYPE_MASK; + evnt = (result & NVME_AE_INFO_MASK) >> 8; + + switch (type) { + case 0x0: /* Error status */ + tbl = nvme_ae_error_status; + size = ARRAY_SIZE(nvme_ae_error_status); + break; + case 0x1: /* SMART/Health status */ + tbl = nvme_ae_health_status; + size = ARRAY_SIZE(nvme_ae_health_status); + break; + case 0x2: /* Notice */ + tbl = nvme_ae_notice; + size = ARRAY_SIZE(nvme_ae_notice); + break; + case 0x6: /* NVM/IO Command Set Specific Status */ + tbl = nvme_ae_io_command_set; + size = ARRAY_SIZE(nvme_ae_io_command_set); + break; + default: /* Reserved types/Vendor specific */ + return NULL; + } + + /* Try first a direct lookup to avoid full table search */ + if (evnt < size && tbl[evnt].event == evnt) + return tbl[evnt].desc; + + for (i = 0; i < size; i++) + if (tbl[i].event == evnt) + return tbl[i].desc; + + return NULL; +} + bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl, enum nvme_ctrl_state new_state) { @@ -1656,11 +1729,13 @@ void nvme_complete_async_event(struct nvme_ctrl *ctrl, switch (result & 0xff07) { case NVME_AER_NOTICE_NS_CHANGED: - dev_info(ctrl->device, "rescanning\n"); + dev_info(ctrl->device, + "rescanning (Namespace attribute change detected)\n"); nvme_queue_scan(ctrl); break; default: - dev_warn(ctrl->device, "async event result %08x\n", result); + dev_warn(ctrl->device, "async event result: %s (%08x)\n", + nvme_get_ae_info(result), result); } } EXPORT_SYMBOL_GPL(nvme_complete_async_event); diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h index 1daa048..a075eb1 100644 --- a/drivers/nvme/host/nvme.h +++ b/drivers/nvme/host/nvme.h @@ -154,6 +154,15 @@ struct nvme_ctrl_ops { void (*submit_async_event)(struct nvme_ctrl *ctrl, int aer_idx); }; +/* NVMe Asynchronous Events (AE) textual parsing helpers */ +#define NVME_AE_TYPE_MASK 0x7 +#define NVME_AE_INFO_MASK 0xFF00 + +struct nvme_ae_info { + u8 event; + const char *desc; +}; + static inline bool nvme_ctrl_ready(struct nvme_ctrl *ctrl) { u32 val = 0; -- 2.1.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/2] [RFC] nvme: enable asynchronous events notification by default 2016-06-17 17:21 [PATCH 0/2] 2 patches about asynchronous events notification Guilherme G. Piccoli 2016-06-17 17:21 ` [PATCH 1/2] nvme: introduce asynchronous events textual output Guilherme G. Piccoli @ 2016-06-17 17:21 ` Guilherme G. Piccoli 2016-06-20 6:55 ` Sagi Grimberg 1 sibling, 1 reply; 10+ messages in thread From: Guilherme G. Piccoli @ 2016-06-17 17:21 UTC (permalink / raw) Asynchronous events notification currently is disabled by default; to enable it, one should issue a set-feature command through nvme-cli userspace application. The tool also allows disabling these events once they're enable, as per user desire. This patch makes the asynchronous events notification enabled by default; to do so, we submit the set-feature command from the driver, in the end of nvme_reset_work() routine. This way, the feature is enabled on the driver initialization and after resets, in case they happen. Enabling the asynchronous events notification feature by default is interesting as an error report feature. Signed-off-by: Guilherme G. Piccoli <gpiccoli at linux.vnet.ibm.com> --- drivers/nvme/host/pci.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index befac5b..3a82390 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -1779,6 +1779,18 @@ static void nvme_remove_dead_ctrl(struct nvme_dev *dev, int status) nvme_put_ctrl(&dev->ctrl); } +static int nvme_enable_async_events(struct nvme_dev *dev) +{ + struct nvme_command c; + + memset(&c, 0, sizeof(c)); + c.common.opcode = nvme_admin_set_features; + c.common.cdw10[0] = 0xB; /* Feature: Async Event Configuration */ + c.common.cdw10[1] = 0x3FF; /* Enable all available events */ + + return nvme_submit_sync_cmd(dev->ctrl.admin_q, &c, NULL, 0); +} + static void nvme_reset_work(struct work_struct *work) { struct nvme_dev *dev = container_of(work, struct nvme_dev, reset_work); @@ -1849,6 +1861,10 @@ static void nvme_reset_work(struct work_struct *work) if (dev->online_queues > 1) nvme_queue_scan(&dev->ctrl); + + if (nvme_enable_async_events(dev)) + dev_info(dev->ctrl.device, + "failed to enable async events notification\n"); return; out: -- 2.1.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/2] [RFC] nvme: enable asynchronous events notification by default 2016-06-17 17:21 ` [PATCH 2/2] [RFC] nvme: enable asynchronous events notification by default Guilherme G. Piccoli @ 2016-06-20 6:55 ` Sagi Grimberg 2016-06-20 17:03 ` Guilherme G. Piccoli 0 siblings, 1 reply; 10+ messages in thread From: Sagi Grimberg @ 2016-06-20 6:55 UTC (permalink / raw) > Asynchronous events notification currently is disabled by default; > to enable it, one should issue a set-feature command through > nvme-cli userspace application. The tool also allows disabling > these events once they're enable, as per user desire. > > This patch makes the asynchronous events notification enabled by > default; to do so, we submit the set-feature command from the > driver, in the end of nvme_reset_work() routine. This way, the > feature is enabled on the driver initialization and after resets, > in case they happen. AEN was already enabled by default in: commit f866fc4282a81673ef973ad54c68235a3263b42e Author: Christoph Hellwig <hch at lst.de> Date: Tue Apr 26 13:52:00 2016 +0200 nvme: move AER handling to common code The transport driver still needs to do the actual submission, but all the higher level code can be shared. Signed-off-by: Christoph Hellwig <hch at lst.de> Reviewed-by: Sagi Grimberg <sagi at grimberg.me> Signed-off-by: Jens Axboe <axboe at fb.com> (Assuming you have at least a single queue). ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 2/2] [RFC] nvme: enable asynchronous events notification by default 2016-06-20 6:55 ` Sagi Grimberg @ 2016-06-20 17:03 ` Guilherme G. Piccoli 2016-06-20 20:21 ` Keith Busch 0 siblings, 1 reply; 10+ messages in thread From: Guilherme G. Piccoli @ 2016-06-20 17:03 UTC (permalink / raw) On 06/20/2016 03:55 AM, Sagi Grimberg wrote: > >> Asynchronous events notification currently is disabled by default; >> to enable it, one should issue a set-feature command through >> nvme-cli userspace application. The tool also allows disabling >> these events once they're enable, as per user desire. >> >> This patch makes the asynchronous events notification enabled by >> default; to do so, we submit the set-feature command from the >> driver, in the end of nvme_reset_work() routine. This way, the >> feature is enabled on the driver initialization and after resets, >> in case they happen. > > AEN was already enabled by default in: > > commit f866fc4282a81673ef973ad54c68235a3263b42e > Author: Christoph Hellwig <hch at lst.de> > Date: Tue Apr 26 13:52:00 2016 +0200 > > nvme: move AER handling to common code > > The transport driver still needs to do the actual submission, but > all the > higher level code can be shared. > > Signed-off-by: Christoph Hellwig <hch at lst.de> > Reviewed-by: Sagi Grimberg <sagi at grimberg.me> > Signed-off-by: Jens Axboe <axboe at fb.com> > > (Assuming you have at least a single queue). Sagi, I might be wrong in my understanding (sorry in advance if this is the case hehe) but I guess the above commit only re-factored the async events handling on driver - the functions reworked by the patch aim to handle the events once they are 'captured', which is feasible only if they are enabled by setting the feature (with nvme-cli, for example). The goal of my patch is to do the 'job' of nvme-cli automatically from within the driver. One thing I was thinking is to add a sysfs parameter to be able to avoid the automatic enablement of async events in case we have a bogus adapter showing too many events, for example...not sure if this is necessary though, so I didn't add the parameter on this submission. Thanks, Guilherme > > _______________________________________________ > Linux-nvme mailing list > Linux-nvme at lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-nvme > ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 2/2] [RFC] nvme: enable asynchronous events notification by default 2016-06-20 17:03 ` Guilherme G. Piccoli @ 2016-06-20 20:21 ` Keith Busch 2016-06-24 8:17 ` Christoph Hellwig 0 siblings, 1 reply; 10+ messages in thread From: Keith Busch @ 2016-06-20 20:21 UTC (permalink / raw) On Mon, Jun 20, 2016@02:03:37PM -0300, Guilherme G. Piccoli wrote: > The goal of my patch is to do the 'job' of nvme-cli automatically from > within the driver. > > One thing I was thinking is to add a sysfs parameter to be able to avoid the > automatic enablement of async events in case we have a bogus adapter showing > too many events, for example...not sure if this is necessary though, so I > didn't add the parameter on this submission. Instead of simply logging a message, is there something better we can do to notify a user of an async event result? The event should be masked by the controller until the appropriate log page is read, and I don't think scanning the kernel logs is how such a program wants to find out which page to read. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 2/2] [RFC] nvme: enable asynchronous events notification by default 2016-06-20 20:21 ` Keith Busch @ 2016-06-24 8:17 ` Christoph Hellwig 2016-06-28 15:11 ` Keith Busch 2016-07-06 22:47 ` Guilherme G. Piccoli 0 siblings, 2 replies; 10+ messages in thread From: Christoph Hellwig @ 2016-06-24 8:17 UTC (permalink / raw) On Mon, Jun 20, 2016@04:21:35PM -0400, Keith Busch wrote: > Instead of simply logging a message, is there something better we can do > to notify a user of an async event result? The event should be masked > by the controller until the appropriate log page is read, and I don't > think scanning the kernel logs is how such a program wants to find out > which page to read. Yes, I'm not too excited about handling the AERs in kernel space, having a userspace daemon to listen for them would generally be more useful. The only exception is the namespace changed AER that we already handle. But while we're at it: why do we make use of the changed namespace list log page in this case instead of doing a full recan. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 2/2] [RFC] nvme: enable asynchronous events notification by default 2016-06-24 8:17 ` Christoph Hellwig @ 2016-06-28 15:11 ` Keith Busch 2016-06-29 4:17 ` Gabriel Krisman Bertazi 2016-07-06 22:47 ` Guilherme G. Piccoli 1 sibling, 1 reply; 10+ messages in thread From: Keith Busch @ 2016-06-28 15:11 UTC (permalink / raw) On Fri, Jun 24, 2016@01:17:26AM -0700, Christoph Hellwig wrote: > Yes, I'm not too excited about handling the AERs in kernel space, > having a userspace daemon to listen for them would generally be more > useful. How do daemons typically want to be notified of an event like this? I'm thinking sysfs_notify or maybe a KOBJ_CHANGE uevent. I don't know, this is unfamiliar territory to me. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 2/2] [RFC] nvme: enable asynchronous events notification by default 2016-06-28 15:11 ` Keith Busch @ 2016-06-29 4:17 ` Gabriel Krisman Bertazi 0 siblings, 0 replies; 10+ messages in thread From: Gabriel Krisman Bertazi @ 2016-06-29 4:17 UTC (permalink / raw) Keith Busch <keith.busch at intel.com> writes: > On Fri, Jun 24, 2016@01:17:26AM -0700, Christoph Hellwig wrote: >> Yes, I'm not too excited about handling the AERs in kernel space, >> having a userspace daemon to listen for them would generally be more >> useful. > > How do daemons typically want to be notified of an event like this? I'm > thinking sysfs_notify or maybe a KOBJ_CHANGE uevent. I don't know, this > is unfamiliar territory to me. For IPR (IBM Hardware RAID SCSI controller) we use KOBJ_CHANGE events for a very similar situation, in which the driver signals a userspace daemon, which goes over a sysfs attribute to collect adapter data dumps. I'm also working on another async notifications mechanism for IPR, similar to what is supported by nvme, in which I plan to use the same uevent interface for userspace notifications. -- Gabriel Krisman Bertazi ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 2/2] [RFC] nvme: enable asynchronous events notification by default 2016-06-24 8:17 ` Christoph Hellwig 2016-06-28 15:11 ` Keith Busch @ 2016-07-06 22:47 ` Guilherme G. Piccoli 1 sibling, 0 replies; 10+ messages in thread From: Guilherme G. Piccoli @ 2016-07-06 22:47 UTC (permalink / raw) On 06/24/2016 05:17 AM, Christoph Hellwig wrote: > On Mon, Jun 20, 2016@04:21:35PM -0400, Keith Busch wrote: >> Instead of simply logging a message, is there something better we can do >> to notify a user of an async event result? The event should be masked >> by the controller until the appropriate log page is read, and I don't >> think scanning the kernel logs is how such a program wants to find out >> which page to read. > > Yes, I'm not too excited about handling the AERs in kernel space, > having a userspace daemon to listen for them would generally be more > useful. Thanks very much for all the replies - sorry for the delay in my response. I guess we have 2 ways of dealing with those events, one being from kernel, the other by working in a daemon like Krisman mentioned (the IPR case). Both approaches seems to have their importance; an automatic error reporting tool might schedule greps on dmesg to find errors, so it can be useful showing them on kernel logs. More yet, some critical errors should be reported on kernel logs always, IMHO (like the adapter persistent error). On the other hand, daemons seem to be a better accepted solution (as per this thread's comments hehe) and they are more popular maybe. So 2 possible implementations come to my mind, one not excluding the other. I'll elaborate below, but since they can work concomitantly, I'd go for implementing both. (i) We can implement a daemon like Krisman suggested, that uses uevents and reads the last async event from a sysfs entry, cleaning the necessary bit to allow more events to be reported; (ii) We can implement a special sysfs parameter that defaults to 0, allowing the use of (i). This parameter is a timeout, in seconds - when this is set to a value >0, approach (i) is disabled, and we start to monitor the events from the driver. Once an event is captured, we print it on kernel log, wait the timeout and clear the necessary bit to allow new events to be reported, and so on...allowing this way a dmesg events approach, without flooding the log or inhibiting the users that prefer (i). Comments and suggestions are much appreciated. Thanks in advance, Guilherme > The only exception is the namespace changed AER that we already handle. > But while we're at it: why do we make use of the changed namespace > list log page in this case instead of doing a full recan. > > _______________________________________________ > Linux-nvme mailing list > Linux-nvme at lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-nvme > ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2016-07-06 22:47 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-06-17 17:21 [PATCH 0/2] 2 patches about asynchronous events notification Guilherme G. Piccoli 2016-06-17 17:21 ` [PATCH 1/2] nvme: introduce asynchronous events textual output Guilherme G. Piccoli 2016-06-17 17:21 ` [PATCH 2/2] [RFC] nvme: enable asynchronous events notification by default Guilherme G. Piccoli 2016-06-20 6:55 ` Sagi Grimberg 2016-06-20 17:03 ` Guilherme G. Piccoli 2016-06-20 20:21 ` Keith Busch 2016-06-24 8:17 ` Christoph Hellwig 2016-06-28 15:11 ` Keith Busch 2016-06-29 4:17 ` Gabriel Krisman Bertazi 2016-07-06 22:47 ` Guilherme G. Piccoli
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).