* [PATCH v2] nvme: Avoid reset work on watchdog timer function during error recovery
@ 2016-04-12 23:32 Guilherme G. Piccoli
2016-04-13 5:22 ` Jens Axboe
0 siblings, 1 reply; 4+ messages in thread
From: Guilherme G. Piccoli @ 2016-04-12 23:32 UTC (permalink / raw)
This patch adds a check on nvme_watchdog_timer() function to avoid the
call to reset_work() when an error recovery process is ongoing on
controller. The check is made by looking at pci_channel_offline()
result.
If we don't check for this on nvme_watchdog_timer(), error recovery
mechanism can't recover well, because reset_work() won't be able to
do its job (since we're in the middle of an error) and so the
controller is removed from the system before error recovery mechanism
can perform slot reset (which would allow the adapter to recover).
In this patch we also have splitted the huge condition expression in
nvme_watchdog_timer() by introducing an auxiliary function to help
make the code more readable.
Reviewed-by: Keith Busch <keith.busch at intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
Signed-off-by: Guilherme G. Piccoli <gpiccoli at linux.vnet.ibm.com>
---
drivers/nvme/host/pci.c | 36 ++++++++++++++++++++++++++++--------
1 file changed, 28 insertions(+), 8 deletions(-)
v2:
* Refactored the if condition on nvme_watchdog_timer() into a
new auxiliary function to improve readability.
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 24ccda3..897cfcb 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1350,22 +1350,42 @@ static int nvme_configure_admin_queue(struct nvme_dev *dev)
return result;
}
+static bool nvme_should_reset(struct nvme_dev *dev, u32 csts)
+{
+
+ /* If true, indicates loss of adapter communication, possibly by a
+ * NVMe Subsystem reset. */
+ bool nssro = dev->subsystem && (csts & NVME_CSTS_NSSRO);
+
+ /* If there is a reset ongoing, we shouldn't reset again. */
+ if (work_busy(&dev->reset_work))
+ return false;
+
+ /* We shouldn't reset unless the controller is on fatal error state
+ * _or_ if we lost the communication with it. */
+ if (!(csts & NVME_CSTS_CFS) &&
+ !(dev->subsystem && (csts & NVME_CSTS_NSSRO)))
+ return false;
+
+ /* If PCI error recovery process is happening, we cannot reset or
+ * the recovery mechanism will surely fail. */
+ if (pci_channel_offline(to_pci_dev(dev->dev)))
+ return false;
+
+ return true;
+}
+
static void nvme_watchdog_timer(unsigned long data)
{
struct nvme_dev *dev = (struct nvme_dev *)data;
u32 csts = readl(dev->bar + NVME_REG_CSTS);
- /*
- * Skip controllers currently under reset.
- */
- if (!work_pending(&dev->reset_work) && !work_busy(&dev->reset_work) &&
- ((csts & NVME_CSTS_CFS) ||
- (dev->subsystem && (csts & NVME_CSTS_NSSRO)))) {
- if (queue_work(nvme_workq, &dev->reset_work)) {
+ /* Skip controllers under certain specific conditions. */
+ if (nvme_should_reset(dev, csts)) {
+ if (queue_work(nvme_workq, &dev->reset_work))
dev_warn(dev->dev,
"Failed status: 0x%x, reset controller.\n",
csts);
- }
return;
}
--
2.1.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH v2] nvme: Avoid reset work on watchdog timer function during error recovery
2016-04-12 23:32 [PATCH v2] nvme: Avoid reset work on watchdog timer function during error recovery Guilherme G. Piccoli
@ 2016-04-13 5:22 ` Jens Axboe
2016-04-13 6:03 ` Jens Axboe
0 siblings, 1 reply; 4+ messages in thread
From: Jens Axboe @ 2016-04-13 5:22 UTC (permalink / raw)
On 04/12/2016 05:32 PM, Guilherme G. Piccoli wrote:
> This patch adds a check on nvme_watchdog_timer() function to avoid the
> call to reset_work() when an error recovery process is ongoing on
> controller. The check is made by looking at pci_channel_offline()
> result.
>
> If we don't check for this on nvme_watchdog_timer(), error recovery
> mechanism can't recover well, because reset_work() won't be able to
> do its job (since we're in the middle of an error) and so the
> controller is removed from the system before error recovery mechanism
> can perform slot reset (which would allow the adapter to recover).
>
> In this patch we also have splitted the huge condition expression in
> nvme_watchdog_timer() by introducing an auxiliary function to help
> make the code more readable.
Added for 4.7, thanks.
--
Jens Axboe
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH v2] nvme: Avoid reset work on watchdog timer function during error recovery
2016-04-13 5:22 ` Jens Axboe
@ 2016-04-13 6:03 ` Jens Axboe
2016-04-13 14:12 ` Guilherme G. Piccoli
0 siblings, 1 reply; 4+ messages in thread
From: Jens Axboe @ 2016-04-13 6:03 UTC (permalink / raw)
On 04/12/2016 11:22 PM, Jens Axboe wrote:
> On 04/12/2016 05:32 PM, Guilherme G. Piccoli wrote:
>> This patch adds a check on nvme_watchdog_timer() function to avoid the
>> call to reset_work() when an error recovery process is ongoing on
>> controller. The check is made by looking at pci_channel_offline()
>> result.
>>
>> If we don't check for this on nvme_watchdog_timer(), error recovery
>> mechanism can't recover well, because reset_work() won't be able to
>> do its job (since we're in the middle of an error) and so the
>> controller is removed from the system before error recovery mechanism
>> can perform slot reset (which would allow the adapter to recover).
>>
>> In this patch we also have splitted the huge condition expression in
>> nvme_watchdog_timer() by introducing an auxiliary function to help
>> make the code more readable.
>
> Added for 4.7, thanks.
And dropped again, please at least verify that patches don't introduce
new warnings. You're adding a nssro bool, that then never gets used. Did
you test this version?
--
Jens Axboe
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH v2] nvme: Avoid reset work on watchdog timer function during error recovery
2016-04-13 6:03 ` Jens Axboe
@ 2016-04-13 14:12 ` Guilherme G. Piccoli
0 siblings, 0 replies; 4+ messages in thread
From: Guilherme G. Piccoli @ 2016-04-13 14:12 UTC (permalink / raw)
On 04/13/2016 03:03 AM, Jens Axboe wrote:
> On 04/12/2016 11:22 PM, Jens Axboe wrote:
>> On 04/12/2016 05:32 PM, Guilherme G. Piccoli wrote:
>>> This patch adds a check on nvme_watchdog_timer() function to avoid the
>>> call to reset_work() when an error recovery process is ongoing on
>>> controller. The check is made by looking at pci_channel_offline()
>>> result.
>>>
>>> If we don't check for this on nvme_watchdog_timer(), error recovery
>>> mechanism can't recover well, because reset_work() won't be able to
>>> do its job (since we're in the middle of an error) and so the
>>> controller is removed from the system before error recovery mechanism
>>> can perform slot reset (which would allow the adapter to recover).
>>>
>>> In this patch we also have splitted the huge condition expression in
>>> nvme_watchdog_timer() by introducing an auxiliary function to help
>>> make the code more readable.
>>
>> Added for 4.7, thanks.
>
> And dropped again, please at least verify that patches don't introduce
> new warnings. You're adding a nssro bool, that then never gets used. Did
> you test this version?
>
I'm very sorry Jens, totally my fault. The patch was tested and it's
working fine. I just...forgot to check compilation warnings and to use
the checkpatch script :(
Just sent a v3 correcting the issue you pointed and some minor issues
the checkpatch script pointed. The v3 was re-tested too!
Thanks,
Guilherme
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2016-04-13 14:12 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-04-12 23:32 [PATCH v2] nvme: Avoid reset work on watchdog timer function during error recovery Guilherme G. Piccoli
2016-04-13 5:22 ` Jens Axboe
2016-04-13 6:03 ` Jens Axboe
2016-04-13 14:12 ` Guilherme G. Piccoli
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.