From: Nilay Shroff <nilay@linux.ibm.com>
To: Keith Busch <kbusch@kernel.org>
Cc: linux-nvme@lists.infradead.org, axboe@fb.com, hch@lst.de,
sagi@grimberg.me, linux-block@vger.kernel.org,
gjoyce@linux.ibm.com
Subject: Re: [PATCH RESEND] nvme-pci: Fix EEH failure on ppc after subsystem reset
Date: Wed, 13 Mar 2024 17:29:35 +0530 [thread overview]
Message-ID: <af991867-02a3-4ff9-94fc-50955913f227@linux.ibm.com> (raw)
In-Reply-To: <ZfBm72xPozFN99GA@kbusch-mbp.mynextlight.net>
On 3/12/24 20:00, Keith Busch wrote:
> On Mon, Mar 11, 2024 at 06:28:21PM +0530, Nilay Shroff wrote:
>> @@ -3295,10 +3304,13 @@ static pci_ers_result_t nvme_error_detected(struct pci_dev *pdev,
>> case pci_channel_io_frozen:
>> dev_warn(dev->ctrl.device,
>> "frozen state error detected, reset controller\n");
>> - if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESETTING)) {
>> - nvme_dev_disable(dev, true);
>> - return PCI_ERS_RESULT_DISCONNECT;
>> + if (nvme_ctrl_state(&dev->ctrl) != NVME_CTRL_RESETTING) {
>> + if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESETTING)) {
>> + nvme_dev_disable(dev, true);
>> + return PCI_ERS_RESULT_DISCONNECT;
>> + }
>> }
>> + flush_work(&dev->ctrl.reset_work);
>
> I was messing with a similar idea. I wasn't sure if EEH calls the error
> handler inline with the error, in which case this would try to flush the
> work within the same work, which obviously doesn't work. As long as its
> called from a different thread, then this should be fine.
The EEH recovery happens from a different thread, so the flush_work here
should work as expected.
>> @@ -2776,6 +2776,16 @@ static void nvme_reset_work(struct work_struct *work)
>> out_unlock:
>> mutex_unlock(&dev->shutdown_lock);
>> out:
>> + /*
>> + * If PCI recovery is ongoing then let it finish first
>> + */
>> + if (pci_channel_offline(to_pci_dev(dev->dev))) {
>> + dev_warn(dev->ctrl.device, "PCI recovery is ongoing so let it finish\n");
>> + if (nvme_ctrl_state(&dev->ctrl) != NVME_CTRL_RESETTING)
>> + WRITE_ONCE(dev->ctrl.state, NVME_CTRL_RESETTING);
>
> This may break the state machine, like if the device was hot removed
> during all this error handling. This will force the state back to
> RESETTING when it should be DEAD.
Agreed, we shouldn't force the ctrl state to RESETTING here.
>
> I think what you need is just allow a controller to reset from a
> connecting state. Have to be careful that wouldn't break any other
> expectations, though.
Yeah, OK, I got your point. I have reworked the ctrl state machine as you
suggested and tested the changes. The updated state machine code is shown
below:
@@ -546,10 +546,11 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 		break;
 	case NVME_CTRL_RESETTING:
 		switch (old_state) {
 		case NVME_CTRL_NEW:
 		case NVME_CTRL_LIVE:
+		case NVME_CTRL_CONNECTING:
 			changed = true;
 			fallthrough;
 		default:
 			break;
 		}
The reset_work function updated accordingly is shown below. Here we ensure
that even while PCI error recovery is in progress, if we cannot move the ctrl
state to RESETTING then we let reset_work make forward progress and set the
ctrl state to DEAD.
@@ -2774,10 +2774,19 @@ static void nvme_reset_work(struct work_struct *work)
 	return;
 out_unlock:
 	mutex_unlock(&dev->shutdown_lock);
 out:
+	/*
+	 * If PCI recovery is ongoing then let it finish first
+	 */
+	if (pci_channel_offline(to_pci_dev(dev->dev))) {
+		if (nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESETTING)) {
+			dev_warn(dev->ctrl.device, "PCI recovery is ongoing, let it finish\n");
+			return;
+		}
+	}
 	/*
 	 * Set state to deleting now to avoid blocking nvme_wait_reset(), which
 	 * may be holding this pci_dev's device lock.
 	 */
 	dev_warn(dev->ctrl.device, "Disabling device after reset failure: %d\n",
I have now also included in my testing the hot removal of the NVMe adapter while
EEH recovery is in progress, and the EEH recovery code handles this case well:
when EEH recovery is in progress and we hot remove the adapter being recovered,
the EEH handler stops the recovery and sets the PCI channel state to
"pci_channel_io_perm_failure".
Collected the logs of this case (shown below):
-----------------------------------------------
# nvme list-subsys
nvme-subsys0 - NQN=nqn.1994-11.com.samsung:nvme:PM1735:2.5-inch:S6EUNA0R500358
hostnqn=nqn.2014-08.org.nvmexpress:uuid:12b49f6e-0276-4746-b10c-56815b7e6dc2
iopolicy=numa
# lspci
0018:01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM173X
# dmesg
[ 561.639102] EEH: Recovering PHB#18-PE#10000
[ 561.639120] EEH: PE location: N/A, PHB location: N/A
[ 561.639128] EEH: Frozen PHB#18-PE#10000 detected
<snip>
<snip>
[ 561.639428] EEH: This PCI device has failed 2 times in the last hour and will be permanently disabled after 5 failures.
[ 561.639441] EEH: Notify device drivers to shutdown
[ 561.639450] EEH: Beginning: 'error_detected(IO frozen)'
[ 561.639458] PCI 0018:01:00.0#10000: EEH: Invoking nvme->error_detected(IO frozen)
[ 561.639462] nvme nvme0: frozen state error detected, reset controller
[ 561.719078] nvme 0018:01:00.0: enabling device (0000 -> 0002)
[ 561.719318] nvme nvme0: PCI recovery is ongoing so let it finish
<!!!! WHILE EEH RECOVERY IS IN PROGRESS WE HOT REMOVE THE NVMe ADAPTER !!!!>
[ 563.850328] rpaphp: RPA HOT Plug PCI Controller Driver version: 0.1
<snip>
[ 571.879092] PCI 0018:01:00.0#10000: EEH: nvme driver reports: 'need reset'
[ 571.879097] EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'need reset'
<snip>
<snip>
[ 571.881761] EEH: Reset without hotplug activity
[ 574.039807] EEH: PHB#18-PE#10000: Slot inactive after reset: 0x2 (attempt 1)
[ 574.309091] EEH: Failure 2 resetting PHB#18-PE#10000 (attempt 2)
[ 574.309091]
[ 574.579094] EEH: Failure 2 resetting PHB#18-PE#10000 (attempt 3)
[ 574.579094]
[ 574.579101] eeh_handle_normal_event: Cannot reset, err=-5
[ 574.579104] EEH: Unable to recover from failure from PHB#18-PE#10000.
[ 574.579104] Please try reseating or replacing it
<snip>
<snip>
[ 574.581314] EEH: Beginning: 'error_detected(permanent failure)'
[ 574.581320] PCI 0018:01:00.0#10000: EEH: no device
# lspci
<empty>
# nvme list-subsys
<empty>
Another case tested: when reset_work is ongoing after a subsystem-reset command
and the user immediately hot removes the NVMe adapter, EEH recovery is *not*
triggered and the ctrl makes forward progress to the "DEAD" state.
Collected the logs of this case (shown below):
-----------------------------------------------
# nvme list-subsys
nvme-subsys0 - NQN=nqn.1994-11.com.samsung:nvme:PM1735:2.5-inch:S6EUNA0R500358
hostnqn=nqn.2014-08.org.nvmexpress:uuid:12b49f6e-0276-4746-b10c-56815b7e6dc2
iopolicy=numa
# lspci
0018:01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM173X
# nvme subsystem-reset /dev/nvme0
<!!!! HOT REMOVE THE NVMe ADAPTER !!!!>
# dmesg
[ 9967.143886] eeh_handle_normal_event: Cannot find PCI bus for PHB#18-PE#10000
[ 9967.224078] eeh_handle_normal_event: Cannot find PCI bus for PHB#18-PE#10000
<snip>
[ 9967.223858] nvme 0018:01:00.0: enabling device (0000 -> 0002)
[ 9967.224140] nvme nvme0: Disabling device after reset failure: -19
# lspci
<empty>
# nvme list-subsys
<empty>
Please let me know if the above changes look good to you. If they do, I will
spin a new version of the patch and send it for review.
Thanks,
--Nilay