From: Sinan Kaya <okaya@codeaurora.org>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: linux-pci@vger.kernel.org, timur@codeaurora.org, ryan@finnie.org,
	linux-arm-msm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, stable@vger.kernel.org,
	Bjorn Helgaas <bhelgaas@google.com>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Kate Stewart <kstewart@linuxfoundation.org>,
	Frederick Lawler <fred@fredlawl.com>,
	Dongdong Liu <liudongdong3@huawei.com>,
	Mika Westerberg <mika.westerberg@linux.intel.com>,
	open list <linux-kernel@vger.kernel.org>,
	Don Brace <don.brace@microsemi.com>,
	esc.storagedev@microsemi.com, linux-scsi@vger.kernel.org
Subject: Re: [PATCH V2] PCI/portdrv: do not disable device on reboot/shutdown
Date: Wed, 23 May 2018 18:57:18 -0400
Message-ID: <61f70fd6-52fd-da07-ce73-303f95132131@codeaurora.org>
In-Reply-To: <20180523213249.GD150632@bhelgaas-glaptop.roam.corp.google.com>

On 5/23/2018 5:32 PM, Bjorn Helgaas wrote:
> 
> The crash seems to indicate that the hpsa device attempted a DMA after
> we cleared the Root Port's PCI_COMMAND_MASTER, which means
> hpsa_shutdown() didn't stop DMA from the device (it looks like *most*
> shutdown methods don't disable device DMA, so it's in good company).

All drivers are expected to shut down DMA and interrupts in their shutdown()
routines. They can skip tearing down threads, data structures, etc., but
disabling DMA and interrupts is required. That is the difference between the
shutdown() and remove() callbacks.
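
To make that contract concrete, here is a minimal userspace sketch. It is not
kernel code: struct toy_dev and the toy_* functions are hypothetical names
made up for illustration, and the two flags merely stand in for the real DMA
and interrupt enables of a device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical toy device; not a real kernel structure. */
struct toy_dev {
	bool dma_enabled;
	bool irq_enabled;
	void *ring;		/* stands in for driver data structures */
};

/* Bring the device up: allocate state, then enable IRQ and DMA. */
static int toy_probe(struct toy_dev *d)
{
	assert(d != NULL);
	d->ring = malloc(64);
	if (!d->ring)
		return -1;
	d->irq_enabled = true;
	d->dma_enabled = true;
	return 0;
}

/* Stop the hardware from touching memory or raising interrupts. */
static void toy_quiesce(struct toy_dev *d)
{
	d->dma_enabled = false;
	d->irq_enabled = false;
}

/* shutdown(): quiesce only; software state may be left in place. */
static void toy_shutdown(struct toy_dev *d)
{
	toy_quiesce(d);
}

/* remove(): quiesce and undo everything probe() did. */
static void toy_remove(struct toy_dev *d)
{
	toy_quiesce(d);
	free(d->ring);
	d->ring = NULL;
}
```

After toy_shutdown() the device is quiet but d->ring is still allocated; after
toy_remove() nothing from toy_probe() is left behind.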

If you see that this is not being done in HPSA, then that is where the
bugfix should be.

The counterargument is that if shutdown() is not implemented, remove() should
at least be called. Expecting every driver to implement a shutdown() callback
is bad design, in my opinion.

The code should fall back to remove() when shutdown() doesn't exist.
I can propose a patch for this, but that is another story to chase.
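
A rough sketch of what such a fallback could look like, written as a
standalone userspace model rather than as a patch. The fb_* names are
hypothetical and do not exist in the kernel; this only illustrates the idea
and says nothing about how the driver core would actually implement it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a device and its driver callbacks. */
struct fb_dev {
	int quiesced;
};

struct fb_driver {
	void (*remove)(struct fb_dev *);
	void (*shutdown)(struct fb_dev *);	/* may be NULL */
};

/*
 * Proposed behavior: prefer shutdown(), but fall back to remove()
 * so that a driver without shutdown() still quiesces its device.
 */
static void fb_device_shutdown(const struct fb_driver *drv, struct fb_dev *dev)
{
	assert(drv != NULL && dev != NULL);
	if (drv->shutdown)
		drv->shutdown(dev);
	else if (drv->remove)
		drv->remove(dev);
}

/* Example remove() that leaves the device quiet. */
static void fb_example_remove(struct fb_dev *dev)
{
	dev->quiesced = 1;
}
```

With a driver that sets only .remove, fb_device_shutdown() ends up calling
fb_example_remove(), so the device is still quiesced on the shutdown path.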

> 
>> This has been found to cause crashes on HP DL360 Gen9 machines during
>> reboot. Besides, kexec is already clearing the bus master bit in
>> pci_device_shutdown() after all PCI drivers are removed.
> 
> The original path was:
> 
>   pci_device_shutdown(hpsa)
>     drv->shutdown
>       hpsa_shutdown                     # hpsa_pci_driver.shutdown
>   ...
>   pci_device_shutdown(RP)               # root port
>     drv->shutdown
>       pcie_portdrv_remove               # pcie_portdriver.shutdown
>         pcie_port_device_remove
>           pci_disable_device
>             do_pci_disable_device
>               # clear RP PCI_COMMAND_MASTER
>     if (kexec)
>       pci_clear_master(RP)
>         # clear RP PCI_COMMAND_MASTER
> 
> If I understand correctly, the new path after this patch is:
> 
>   pci_device_shutdown(hpsa)
>     drv->shutdown
>       hpsa_shutdown                     # hpsa_pci_driver.shutdown
>   ...
>   pci_device_shutdown(RP)               # root port
>     drv->shutdown
>       pcie_portdrv_shutdown             # pcie_portdriver.shutdown
>         __pcie_portdrv_remove(RP, false)
>           pcie_port_device_remove(RP, false)
>             # do NOT clear RP PCI_COMMAND_MASTER

yup

>     if (kexec)
>       pci_clear_master(RP)
>         # clear RP PCI_COMMAND_MASTER
> 
> I guess this patch avoids the panic during reboot because we're not in
> the kexec path, so we never clear PCI_COMMAND_MASTER for the Root
> Port, so the hpsa device can DMA happily until the lights go out.
> 
> But DMA continuing for some random amount of time before the reboot or
> shutdown happens makes me a little queasy.  That doesn't sound safe.
> The more I think about this, the more confused I get.  What am I
> missing?  

see above.

> 
>> Just remove the extra clear in shutdown path by separating the remove and
>> shutdown APIs in the PORTDRV.
>>
>>  static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev,
>> @@ -218,7 +228,7 @@ static struct pci_driver pcie_portdriver = {
>>  
>>  	.probe		= pcie_portdrv_probe,
>>  	.remove		= pcie_portdrv_remove,
>> -	.shutdown	= pcie_portdrv_remove,
>> +	.shutdown	= pcie_portdrv_shutdown,
> 
> What are the circumstances when we call .remove() vs .shutdown()?
> 
> I guess the main (maybe only) way to call .remove() is to hot-remove
> the port?  And .shutdown() is basically used in the reboot and kexec
> paths?

Correct. shutdown() is only called on the reboot/shutdown paths. If you echo
1 into the device's sysfs remove file, remove() gets called, which is handy for
hotplug use cases. remove() needs to be the exact opposite of probe(): it must
clean up resources and leave the HW in a state where probe() can reinitialize it.

> 
>>  	.err_handler	= &pcie_portdrv_err_handler,
>>  
>> -- 
>> 2.7.4
>>
> 


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

