* [PATCH] Avoid reset work on watchdog timer function during error recovery
@ 2016-04-06 20:30 Guilherme G. Piccoli
2016-04-06 20:50 ` Keith Busch
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Guilherme G. Piccoli @ 2016-04-06 20:30 UTC (permalink / raw)
This patch adds a check on nvme_watchdog_timer() function to avoid the
call to reset_work() when an error recovery process is ongoing on
controller. The check is made by looking at pci_channel_offline()
result.
If we don't check for this on nvme_watchdog_timer(), error recovery
mechanism can't recover well, because reset_work() won't be able to
do its job (since we're in the middle of an error) and so the
controller is removed from the system before error recovery mechanism
can perform slot reset (which would allow the adapter to recover).
Signed-off-by: Guilherme G. Piccoli <gpiccoli at linux.vnet.ibm.com>
---
drivers/nvme/host/pci.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 24ccda3..3cc146d 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1357,10 +1357,12 @@ static void nvme_watchdog_timer(unsigned long data)
/*
* Skip controllers currently under reset.
+ * Also, skip controllers going through PCI error recovery.
*/
if (!work_pending(&dev->reset_work) && !work_busy(&dev->reset_work) &&
((csts & NVME_CSTS_CFS) ||
- (dev->subsystem && (csts & NVME_CSTS_NSSRO)))) {
+ (dev->subsystem && (csts & NVME_CSTS_NSSRO))) &&
+ !pci_channel_offline(to_pci_dev(dev->dev))) {
if (queue_work(nvme_workq, &dev->reset_work)) {
dev_warn(dev->dev,
"Failed status: 0x%x, reset controller.\n",
--
2.1.0
^ permalink raw reply related [flat|nested] 13+ messages in thread* [PATCH] Avoid reset work on watchdog timer function during error recovery
2016-04-06 20:30 [PATCH] Avoid reset work on watchdog timer function during error recovery Guilherme G. Piccoli
@ 2016-04-06 20:50 ` Keith Busch
2016-04-07 8:05 ` Johannes Thumshirn
2016-04-07 13:11 ` Christoph Hellwig
2 siblings, 0 replies; 13+ messages in thread
From: Keith Busch @ 2016-04-06 20:50 UTC (permalink / raw)
On Wed, Apr 06, 2016@05:30:35PM -0300, Guilherme G. Piccoli wrote:
> This patch adds a check on nvme_watchdog_timer() function to avoid the
> call to reset_work() when an error recovery process is ongoing on
> controller. The check is made by looking at pci_channel_offline()
> result.
>
> If we don't check for this on nvme_watchdog_timer(), error recovery
> mechanism can't recover well, because reset_work() won't be able to
> do its job (since we're in the middle of an error) and so the
> controller is removed from the system before error recovery mechanism
> can perform slot reset (which would allow the adapter to recover).
>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli at linux.vnet.ibm.com>
Reviewed-by: Keith Busch <keith.busch at intel.com>
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH] Avoid reset work on watchdog timer function during error recovery
2016-04-06 20:30 [PATCH] Avoid reset work on watchdog timer function during error recovery Guilherme G. Piccoli
2016-04-06 20:50 ` Keith Busch
@ 2016-04-07 8:05 ` Johannes Thumshirn
2016-04-07 13:11 ` Christoph Hellwig
2 siblings, 0 replies; 13+ messages in thread
From: Johannes Thumshirn @ 2016-04-07 8:05 UTC (permalink / raw)
On Mittwoch, 6. April 2016 17:30:35 CEST Guilherme G. Piccoli wrote:
> This patch adds a check on nvme_watchdog_timer() function to avoid the
> call to reset_work() when an error recovery process is ongoing on
> controller. The check is made by looking at pci_channel_offline()
> result.
>
> If we don't check for this on nvme_watchdog_timer(), error recovery
> mechanism can't recover well, because reset_work() won't be able to
> do its job (since we're in the middle of an error) and so the
> controller is removed from the system before error recovery mechanism
> can perform slot reset (which would allow the adapter to recover).
>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli at linux.vnet.ibm.com>
Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
--
Johannes Thumshirn Storage
jthumshirn at suse.de +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: Felix Imend?rffer, Jane Smithard, Graham Norton
HRB 21284 (AG N?rnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH] Avoid reset work on watchdog timer function during error recovery
2016-04-06 20:30 [PATCH] Avoid reset work on watchdog timer function during error recovery Guilherme G. Piccoli
2016-04-06 20:50 ` Keith Busch
2016-04-07 8:05 ` Johannes Thumshirn
@ 2016-04-07 13:11 ` Christoph Hellwig
2016-04-07 13:17 ` Guilherme G. Piccoli
` (2 more replies)
2 siblings, 3 replies; 13+ messages in thread
From: Christoph Hellwig @ 2016-04-07 13:11 UTC (permalink / raw)
On Wed, Apr 06, 2016@05:30:35PM -0300, Guilherme G. Piccoli wrote:
> /*
> * Skip controllers currently under reset.
> + * Also, skip controllers going through PCI error recovery.
> */
> if (!work_pending(&dev->reset_work) && !work_busy(&dev->reset_work) &&
> ((csts & NVME_CSTS_CFS) ||
> - (dev->subsystem && (csts & NVME_CSTS_NSSRO)))) {
> + (dev->subsystem && (csts & NVME_CSTS_NSSRO))) &&
> + !pci_channel_offline(to_pci_dev(dev->dev))) {
> if (queue_work(nvme_workq, &dev->reset_work)) {
> dev_warn(dev->dev,
> "Failed status: 0x%x, reset controller.\n",
This looks correct to me, but the condition is getting basically
unreadable. Can you factor it out into a little helper returns a
boolean value if we should reset or not, and document each condition,
e.g. something like the function below, just with proper comments:
static bool nvme_should_reset(struct nvme_dev *dev)
{
u32 csts = readl(dev->bar + NVME_REG_CSTS);
if (!(csts & NVME_CSTS_CFS) &&
!((dev->subsystem && (csts & NVME_CSTS_NSSRO))))
return false;
if (work_pending(&dev->reset_work))
return false;
if (work_busy(&dev->reset_work))
return false;
if (pci_channel_offline(to_pci_dev(dev->dev))
return false;
return true;
}
^ permalink raw reply [flat|nested] 13+ messages in thread* [PATCH] Avoid reset work on watchdog timer function during error recovery
2016-04-07 13:11 ` Christoph Hellwig
@ 2016-04-07 13:17 ` Guilherme G. Piccoli
2016-04-07 13:18 ` Sagi Grimberg
2016-04-07 14:24 ` Keith Busch
2 siblings, 0 replies; 13+ messages in thread
From: Guilherme G. Piccoli @ 2016-04-07 13:17 UTC (permalink / raw)
On 04/07/2016 10:11 AM, Christoph Hellwig wrote:
> On Wed, Apr 06, 2016@05:30:35PM -0300, Guilherme G. Piccoli wrote:
>> /*
>> * Skip controllers currently under reset.
>> + * Also, skip controllers going through PCI error recovery.
>> */
>> if (!work_pending(&dev->reset_work) && !work_busy(&dev->reset_work) &&
>> ((csts & NVME_CSTS_CFS) ||
>> - (dev->subsystem && (csts & NVME_CSTS_NSSRO)))) {
>> + (dev->subsystem && (csts & NVME_CSTS_NSSRO))) &&
>> + !pci_channel_offline(to_pci_dev(dev->dev))) {
>> if (queue_work(nvme_workq, &dev->reset_work)) {
>> dev_warn(dev->dev,
>> "Failed status: 0x%x, reset controller.\n",
>
> This looks correct to me, but the condition is getting basically
> unreadable. Can you factor it out into a little helper returns a
> boolean value if we should reset or not, and document each condition,
> e.g. something like the function below, just with proper comments:
>
> static bool nvme_should_reset(struct nvme_dev *dev)
> {
> u32 csts = readl(dev->bar + NVME_REG_CSTS);
>
> if (!(csts & NVME_CSTS_CFS) &&
> !((dev->subsystem && (csts & NVME_CSTS_NSSRO))))
> return false;
>
> if (work_pending(&dev->reset_work))
> return false;
> if (work_busy(&dev->reset_work))
> return false;
> if (pci_channel_offline(to_pci_dev(dev->dev))
> return false;
>
> return true;
> }
>
Heheh agreed, very good suggestion indeed. I found myself struggling to
understand this huge condition line. I'll improve it following your
suggestion, thanks.
Cheers,
Guilherme
^ permalink raw reply [flat|nested] 13+ messages in thread* [PATCH] Avoid reset work on watchdog timer function during error recovery
2016-04-07 13:11 ` Christoph Hellwig
2016-04-07 13:17 ` Guilherme G. Piccoli
@ 2016-04-07 13:18 ` Sagi Grimberg
2016-04-07 13:26 ` Christoph Hellwig
2016-04-07 14:24 ` Keith Busch
2 siblings, 1 reply; 13+ messages in thread
From: Sagi Grimberg @ 2016-04-07 13:18 UTC (permalink / raw)
On 07/04/16 16:11, Christoph Hellwig wrote:
> On Wed, Apr 06, 2016@05:30:35PM -0300, Guilherme G. Piccoli wrote:
>> /*
>> * Skip controllers currently under reset.
>> + * Also, skip controllers going through PCI error recovery.
>> */
>> if (!work_pending(&dev->reset_work) && !work_busy(&dev->reset_work) &&
>> ((csts & NVME_CSTS_CFS) ||
>> - (dev->subsystem && (csts & NVME_CSTS_NSSRO)))) {
>> + (dev->subsystem && (csts & NVME_CSTS_NSSRO))) &&
>> + !pci_channel_offline(to_pci_dev(dev->dev))) {
>> if (queue_work(nvme_workq, &dev->reset_work)) {
>> dev_warn(dev->dev,
>> "Failed status: 0x%x, reset controller.\n",
> This looks correct to me, but the condition is getting basically
> unreadable. Can you factor it out into a little helper returns a
> boolean value if we should reset or not, and document each condition,
> e.g. something like the function below, just with proper comments:
Agreed, but nvme_should_reset() sounds like you are checking if a reset
is needed, maybe call it nvme_can_reset()?
>
> static bool nvme_should_reset(struct nvme_dev *dev)
> {
> u32 csts = readl(dev->bar + NVME_REG_CSTS);
>
> if (!(csts & NVME_CSTS_CFS) &&
> !((dev->subsystem && (csts & NVME_CSTS_NSSRO))))
> return false;
>
> if (work_pending(&dev->reset_work))
> return false;
> if (work_busy(&dev->reset_work))
> return false;
> if (pci_channel_offline(to_pci_dev(dev->dev))
> return false;
>
> return true;
> }
>
^ permalink raw reply [flat|nested] 13+ messages in thread* [PATCH] Avoid reset work on watchdog timer function during error recovery
2016-04-07 13:18 ` Sagi Grimberg
@ 2016-04-07 13:26 ` Christoph Hellwig
0 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2016-04-07 13:26 UTC (permalink / raw)
On Thu, Apr 07, 2016@04:18:51PM +0300, Sagi Grimberg wrote:
> Agreed, but nvme_should_reset() sounds like you are checking if a reset
> is needed,
But that's exaxtly what it does! It reads CSTS to check for one of the
two bits in there indicating that we should initiate a reset, except for
various conditions why we think we better shouldn't reset at the
moment..
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH] Avoid reset work on watchdog timer function during error recovery
2016-04-07 13:11 ` Christoph Hellwig
2016-04-07 13:17 ` Guilherme G. Piccoli
2016-04-07 13:18 ` Sagi Grimberg
@ 2016-04-07 14:24 ` Keith Busch
2016-04-07 15:42 ` Guilherme G. Piccoli
2 siblings, 1 reply; 13+ messages in thread
From: Keith Busch @ 2016-04-07 14:24 UTC (permalink / raw)
On Thu, Apr 07, 2016@06:11:31AM -0700, Christoph Hellwig wrote:
> static bool nvme_should_reset(struct nvme_dev *dev)
> {
> u32 csts = readl(dev->bar + NVME_REG_CSTS);
>
> if (!(csts & NVME_CSTS_CFS) &&
> !((dev->subsystem && (csts & NVME_CSTS_NSSRO))))
> return false;
>
> if (work_pending(&dev->reset_work))
> return false;
> if (work_busy(&dev->reset_work))
> return false;
We could also skip checking work_pending since work_busy does that for us.
^ permalink raw reply [flat|nested] 13+ messages in thread* [PATCH] Avoid reset work on watchdog timer function during error recovery
2016-04-07 14:24 ` Keith Busch
@ 2016-04-07 15:42 ` Guilherme G. Piccoli
2016-04-07 15:46 ` Keith Busch
0 siblings, 1 reply; 13+ messages in thread
From: Guilherme G. Piccoli @ 2016-04-07 15:42 UTC (permalink / raw)
On 04/07/2016 11:24 AM, Keith Busch wrote:
> On Thu, Apr 07, 2016@06:11:31AM -0700, Christoph Hellwig wrote:
>> static bool nvme_should_reset(struct nvme_dev *dev)
>> {
>> u32 csts = readl(dev->bar + NVME_REG_CSTS);
>>
>> if (!(csts & NVME_CSTS_CFS) &&
>> !((dev->subsystem && (csts & NVME_CSTS_NSSRO))))
>> return false;
>>
>> if (work_pending(&dev->reset_work))
>> return false;
>> if (work_busy(&dev->reset_work))
>> return false;
>
> We could also skip checking work_pending since work_busy does that for us.
>
Great! What name should/can I choose for the function?
Thanks,
Guilherme
^ permalink raw reply [flat|nested] 13+ messages in thread* [PATCH] Avoid reset work on watchdog timer function during error recovery
2016-04-07 15:42 ` Guilherme G. Piccoli
@ 2016-04-07 15:46 ` Keith Busch
2016-04-07 16:03 ` Guilherme G. Piccoli
0 siblings, 1 reply; 13+ messages in thread
From: Keith Busch @ 2016-04-07 15:46 UTC (permalink / raw)
On Thu, Apr 07, 2016@12:42:29PM -0300, Guilherme G. Piccoli wrote:
> Great! What name should/can I choose for the function?
I liked Christoph's 'nvme_should_reset' suggestion.
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH] Avoid reset work on watchdog timer function during error recovery
2016-04-07 15:46 ` Keith Busch
@ 2016-04-07 16:03 ` Guilherme G. Piccoli
2016-04-12 19:01 ` Christoph Hellwig
0 siblings, 1 reply; 13+ messages in thread
From: Guilherme G. Piccoli @ 2016-04-07 16:03 UTC (permalink / raw)
On 04/07/2016 12:46 PM, Keith Busch wrote:
> On Thu, Apr 07, 2016@12:42:29PM -0300, Guilherme G. Piccoli wrote:
>> Great! What name should/can I choose for the function?
>
> I liked Christoph's 'nvme_should_reset' suggestion.
>
Ok, cool. I'll use this name.
Thanks,
Guilherme
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH] Avoid reset work on watchdog timer function during error recovery
2016-04-07 16:03 ` Guilherme G. Piccoli
@ 2016-04-12 19:01 ` Christoph Hellwig
2016-04-12 23:33 ` Guilherme G. Piccoli
0 siblings, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2016-04-12 19:01 UTC (permalink / raw)
On Thu, Apr 07, 2016@01:03:02PM -0300, Guilherme G. Piccoli wrote:
> On 04/07/2016 12:46 PM, Keith Busch wrote:
> >On Thu, Apr 07, 2016@12:42:29PM -0300, Guilherme G. Piccoli wrote:
> >>Great! What name should/can I choose for the function?
> >
> >I liked Christoph's 'nvme_should_reset' suggestion.
> >
>
> Ok, cool. I'll use this name.
Did you manage to get around resending it?
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2016-04-12 23:33 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-04-06 20:30 [PATCH] Avoid reset work on watchdog timer function during error recovery Guilherme G. Piccoli
2016-04-06 20:50 ` Keith Busch
2016-04-07 8:05 ` Johannes Thumshirn
2016-04-07 13:11 ` Christoph Hellwig
2016-04-07 13:17 ` Guilherme G. Piccoli
2016-04-07 13:18 ` Sagi Grimberg
2016-04-07 13:26 ` Christoph Hellwig
2016-04-07 14:24 ` Keith Busch
2016-04-07 15:42 ` Guilherme G. Piccoli
2016-04-07 15:46 ` Keith Busch
2016-04-07 16:03 ` Guilherme G. Piccoli
2016-04-12 19:01 ` Christoph Hellwig
2016-04-12 23:33 ` Guilherme G. Piccoli
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.