All of lore.kernel.org
 help / color / mirror / Atom feed
* [Patch]PCI:AER:Notify which device has no error_detected callback
@ 2019-12-13 11:44 Yicong Yang
  2019-12-13 22:57 ` Bjorn Helgaas
  0 siblings, 1 reply; 6+ messages in thread
From: Yicong Yang @ 2019-12-13 11:44 UTC (permalink / raw)
  To: linux-pci, bhelgaas

The PCI error recovery will fail if any device under
root port doesn't have an error_detected callback.
Currently only failure result is printed, which is
not enough to determine which device leads to the
failure and the detailed failure reason.

Add print information if certain device under root
port has no error_detected callback.

Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
---
 drivers/pci/pcie/err.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index b0e6048..ec37c33 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -61,8 +61,10 @@ static int report_error_detected(struct pci_dev *dev,
 		 * error callbacks of "any" device in the subtree, and will
 		 * exit in the disconnected error state.
 		 */
-		if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE)
+		if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) {
 			vote = PCI_ERS_RESULT_NO_AER_DRIVER;
+			pci_info(dev, "AER: Device has no error_detected callback\n");
+		}
 		else
 			vote = PCI_ERS_RESULT_NONE;
 	} else {
--
2.8.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [Patch]PCI:AER:Notify which device has no error_detected callback
  2019-12-13 11:44 [Patch]PCI:AER:Notify which device has no error_detected callback Yicong Yang
@ 2019-12-13 22:57 ` Bjorn Helgaas
  2019-12-16 10:30   ` Yicong Yang
  0 siblings, 1 reply; 6+ messages in thread
From: Bjorn Helgaas @ 2019-12-13 22:57 UTC (permalink / raw)
  To: Yicong Yang; +Cc: linux-pci

On Fri, Dec 13, 2019 at 07:44:34PM +0800, Yicong Yang wrote:
> The PCI error recovery will fail if any device under
> root port doesn't have an error_detected callback.
> Currently only failure result is printed, which is
> not enough to determine which device leads to the
> failure and the detailed failure reason.
> 
> Add print information if certain device under root
> port has no error_detected callback.
> 
> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>

Applied to pci/aer for v5.6, thanks!

I also added a trivial follow-on patch to factor out the "AER: "
prefix (attached below).  This code is now used by DPC as well as AER,
so "AER: " might not be exactly the correct prefix, but I didn't try
to untangle that.

> ---
>  drivers/pci/pcie/err.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index b0e6048..ec37c33 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -61,8 +61,10 @@ static int report_error_detected(struct pci_dev *dev,
>  		 * error callbacks of "any" device in the subtree, and will
>  		 * exit in the disconnected error state.
>  		 */
> -		if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE)
> +		if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) {
>  			vote = PCI_ERS_RESULT_NO_AER_DRIVER;
> +			pci_info(dev, "AER: Device has no error_detected callback\n");
> +		}
>  		else
>  			vote = PCI_ERS_RESULT_NONE;
>  	} else {
> --
> 2.8.1
> 

commit 9694ef043ea4 ("PCI/AER: Factor message prefixes with dev_fmt()")
Author: Bjorn Helgaas <bhelgaas@google.com>
Date:   Fri Dec 13 16:46:05 2019 -0600

    PCI/AER: Factor message prefixes with dev_fmt()
    
    Define dev_fmt() with the common prefix of log messages so we don't have to
    repeat it in every printk.  No functional change intended.
    
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index abde5e000f5d..747ef8b41d1f 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -10,6 +10,8 @@
  *	Zhang Yanmin (yanmin.zhang@intel.com)
  */
 
+#define dev_fmt(fmt) "AER: " fmt
+
 #include <linux/pci.h>
 #include <linux/module.h>
 #include <linux/kernel.h>
@@ -64,7 +66,7 @@ static int report_error_detected(struct pci_dev *dev,
 		 */
 		if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) {
 			vote = PCI_ERS_RESULT_NO_AER_DRIVER;
-			pci_info(dev, "AER: Driver has no error_detected callback\n");
+			pci_info(dev, "driver has no error_detected callback\n");
 		} else {
 			vote = PCI_ERS_RESULT_NONE;
 		}
@@ -236,12 +238,12 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state,
 
 	pci_aer_clear_device_status(dev);
 	pci_cleanup_aer_uncorrect_error_status(dev);
-	pci_info(dev, "AER: Device recovery successful\n");
+	pci_info(dev, "device recovery successful\n");
 	return;
 
 failed:
 	pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT);
 
 	/* TODO: Should kernel panic here? */
-	pci_info(dev, "AER: Device recovery failed\n");
+	pci_info(dev, "device recovery failed\n");
 }

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [Patch]PCI:AER:Notify which device has no error_detected callback
  2019-12-13 22:57 ` Bjorn Helgaas
@ 2019-12-16 10:30   ` Yicong Yang
  2019-12-16 17:58     ` Bjorn Helgaas
  0 siblings, 1 reply; 6+ messages in thread
From: Yicong Yang @ 2019-12-16 10:30 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci



On 2019/12/14 6:57, Bjorn Helgaas wrote:
> On Fri, Dec 13, 2019 at 07:44:34PM +0800, Yicong Yang wrote:
>> The PCI error recovery will fail if any device under
>> root port doesn't have an error_detected callback.
>> Currently only failure result is printed, which is
>> not enough to determine which device leads to the
>> failure and the detailed failure reason.
>>
>> Add print information if certain device under root
>> port has no error_detected callback.
>>
>> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
> Applied to pci/aer for v5.6, thanks!
>
> I also added a trivial follow-on patch to factor out the "AER: "
> prefix (attached below).  This code is now used by DPC as well as AER,
> so "AER: " might not be exactly the correct prefix, but I didn't try
> to untangle that.
>
>> ---
>>  drivers/pci/pcie/err.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
>> index b0e6048..ec37c33 100644
>> --- a/drivers/pci/pcie/err.c
>> +++ b/drivers/pci/pcie/err.c
>> @@ -61,8 +61,10 @@ static int report_error_detected(struct pci_dev *dev,
>>  		 * error callbacks of "any" device in the subtree, and will
>>  		 * exit in the disconnected error state.
>>  		 */
>> -		if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE)
>> +		if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) {
>>  			vote = PCI_ERS_RESULT_NO_AER_DRIVER;
>> +			pci_info(dev, "AER: Device has no error_detected callback\n");
>> +		}
>>  		else
>>  			vote = PCI_ERS_RESULT_NONE;
>>  	} else {
>> --
>> 2.8.1
>>
> commit 9694ef043ea4 ("PCI/AER: Factor message prefixes with dev_fmt()")
> Author: Bjorn Helgaas <bhelgaas@google.com>
> Date:   Fri Dec 13 16:46:05 2019 -0600
>
>     PCI/AER: Factor message prefixes with dev_fmt()
>     
>     Define dev_fmt() with the common prefix of log messages so we don't have to
>     repeat it in every printk.  No functional change intended.
>     
>     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index abde5e000f5d..747ef8b41d1f 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -10,6 +10,8 @@
>   *	Zhang Yanmin (yanmin.zhang@intel.com)
>   */
>  
> +#define dev_fmt(fmt) "AER: " fmt
> +
>  #include <linux/pci.h>
>  #include <linux/module.h>
>  #include <linux/kernel.h>
> @@ -64,7 +66,7 @@ static int report_error_detected(struct pci_dev *dev,
>  		 */
>  		if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) {
>  			vote = PCI_ERS_RESULT_NO_AER_DRIVER;
> -			pci_info(dev, "AER: Driver has no error_detected callback\n");
> +			pci_info(dev, "driver has no error_detected callback\n");

I think use "device" here is more proper. Sometimes the device is even not bind to the driver, which
will also lead to the recovery failure.


>  		} else {
>  			vote = PCI_ERS_RESULT_NONE;
>  		}
> @@ -236,12 +238,12 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state,
>  
>  	pci_aer_clear_device_status(dev);
>  	pci_cleanup_aer_uncorrect_error_status(dev);
> -	pci_info(dev, "AER: Device recovery successful\n");
> +	pci_info(dev, "device recovery successful\n");
>  	return;
>  
>  failed:
>  	pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT);
>  
>  	/* TODO: Should kernel panic here? */
> -	pci_info(dev, "AER: Device recovery failed\n");
> +	pci_info(dev, "device recovery failed\n");
>  }
>
> .
>



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Patch]PCI:AER:Notify which device has no error_detected callback
  2019-12-16 10:30   ` Yicong Yang
@ 2019-12-16 17:58     ` Bjorn Helgaas
  2019-12-16 18:28       ` Alex Williamson
  0 siblings, 1 reply; 6+ messages in thread
From: Bjorn Helgaas @ 2019-12-16 17:58 UTC (permalink / raw)
  To: Yicong Yang; +Cc: linux-pci, Alex Williamson

[+cc Alex]

On Mon, Dec 16, 2019 at 06:30:11PM +0800, Yicong Yang wrote:
> On 2019/12/14 6:57, Bjorn Helgaas wrote:
> > On Fri, Dec 13, 2019 at 07:44:34PM +0800, Yicong Yang wrote:
> >> The PCI error recovery will fail if any device under
> >> root port doesn't have an error_detected callback.
> >> Currently only failure result is printed, which is
> >> not enough to determine which device leads to the
> >> failure and the detailed failure reason.
> >>
> >> Add print information if certain device under root
> >> port has no error_detected callback.
> >>
> >> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
> > Applied to pci/aer for v5.6, thanks!
> >
> > I also added a trivial follow-on patch to factor out the "AER: "
> > prefix (attached below).  This code is now used by DPC as well as AER,
> > so "AER: " might not be exactly the correct prefix, but I didn't try
> > to untangle that.
> >
> >> ---
> >>  drivers/pci/pcie/err.c | 4 +++-
> >>  1 file changed, 3 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> >> index b0e6048..ec37c33 100644
> >> --- a/drivers/pci/pcie/err.c
> >> +++ b/drivers/pci/pcie/err.c
> >> @@ -61,8 +61,10 @@ static int report_error_detected(struct pci_dev *dev,
> >>  		 * error callbacks of "any" device in the subtree, and will
> >>  		 * exit in the disconnected error state.
> >>  		 */
> >> -		if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE)
> >> +		if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) {
> >>  			vote = PCI_ERS_RESULT_NO_AER_DRIVER;
> >> +			pci_info(dev, "AER: Device has no error_detected callback\n");
> >> +		}
> >>  		else
> >>  			vote = PCI_ERS_RESULT_NONE;
> >>  	} else {
> >> --
> >> 2.8.1
> >>
> > commit 9694ef043ea4 ("PCI/AER: Factor message prefixes with dev_fmt()")
> > Author: Bjorn Helgaas <bhelgaas@google.com>
> > Date:   Fri Dec 13 16:46:05 2019 -0600
> >
> >     PCI/AER: Factor message prefixes with dev_fmt()
> >     
> >     Define dev_fmt() with the common prefix of log messages so we don't have to
> >     repeat it in every printk.  No functional change intended.
> >     
> >     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> >
> > diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> > index abde5e000f5d..747ef8b41d1f 100644
> > --- a/drivers/pci/pcie/err.c
> > +++ b/drivers/pci/pcie/err.c
> > @@ -10,6 +10,8 @@
> >   *	Zhang Yanmin (yanmin.zhang@intel.com)
> >   */
> >  
> > +#define dev_fmt(fmt) "AER: " fmt
> > +
> >  #include <linux/pci.h>
> >  #include <linux/module.h>
> >  #include <linux/kernel.h>
> > @@ -64,7 +66,7 @@ static int report_error_detected(struct pci_dev *dev,
> >  		 */
> >  		if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) {
> >  			vote = PCI_ERS_RESULT_NO_AER_DRIVER;
> > -			pci_info(dev, "AER: Driver has no error_detected callback\n");
> > +			pci_info(dev, "driver has no error_detected callback\n");
> 
> I think use "device" here is more proper. Sometimes the device is
> even not bind to the driver, which will also lead to the recovery
> failure.

Hmmm, maybe so, although the error_detected callback is definitely a
property of the *driver*, not the device.  What would you think of
something like this?

  pci_info(dev, dev->driver ? "driver not capable of error recovery\n" :
                              "no driver\n");

I am curious about the larger question of why we fail if the device
has no driver.  I understand why we fail the recovery for a device
with a driver that has no err_handlers: we can't just reset the device
out from under the driver.

But why do we fail if a device has no driver at all?  Shouldn't it be
safe to reset such a device?  The commit log of 918b4053184c
("PCI/AER: Report success only when every device has AER-aware
driver") suggests that it has something to do with KVM and the
pci-stub driver, but I don't understand it.

> >  		} else {
> >  			vote = PCI_ERS_RESULT_NONE;
> >  		}
> > @@ -236,12 +238,12 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state,
> >  
> >  	pci_aer_clear_device_status(dev);
> >  	pci_cleanup_aer_uncorrect_error_status(dev);
> > -	pci_info(dev, "AER: Device recovery successful\n");
> > +	pci_info(dev, "device recovery successful\n");
> >  	return;
> >  
> >  failed:
> >  	pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT);
> >  
> >  	/* TODO: Should kernel panic here? */
> > -	pci_info(dev, "AER: Device recovery failed\n");
> > +	pci_info(dev, "device recovery failed\n");
> >  }

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Patch]PCI:AER:Notify which device has no error_detected callback
  2019-12-16 17:58     ` Bjorn Helgaas
@ 2019-12-16 18:28       ` Alex Williamson
  2019-12-17 11:12         ` Yicong Yang
  0 siblings, 1 reply; 6+ messages in thread
From: Alex Williamson @ 2019-12-16 18:28 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Yicong Yang, linux-pci

On Mon, 16 Dec 2019 11:58:32 -0600
Bjorn Helgaas <helgaas@kernel.org> wrote:

> [+cc Alex]
> 
> On Mon, Dec 16, 2019 at 06:30:11PM +0800, Yicong Yang wrote:
> > On 2019/12/14 6:57, Bjorn Helgaas wrote:  
> > > On Fri, Dec 13, 2019 at 07:44:34PM +0800, Yicong Yang wrote:  
> > >> The PCI error recovery will fail if any device under
> > >> root port doesn't have an error_detected callback.
> > >> Currently only failure result is printed, which is
> > >> not enough to determine which device leads to the
> > >> failure and the detailed failure reason.
> > >>
> > >> Add print information if certain device under root
> > >> port has no error_detected callback.
> > >>
> > >> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>  
> > > Applied to pci/aer for v5.6, thanks!
> > >
> > > I also added a trivial follow-on patch to factor out the "AER: "
> > > prefix (attached below).  This code is now used by DPC as well as AER,
> > > so "AER: " might not be exactly the correct prefix, but I didn't try
> > > to untangle that.
> > >  
> > >> ---
> > >>  drivers/pci/pcie/err.c | 4 +++-
> > >>  1 file changed, 3 insertions(+), 1 deletion(-)
> > >>
> > >> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> > >> index b0e6048..ec37c33 100644
> > >> --- a/drivers/pci/pcie/err.c
> > >> +++ b/drivers/pci/pcie/err.c
> > >> @@ -61,8 +61,10 @@ static int report_error_detected(struct pci_dev *dev,
> > >>  		 * error callbacks of "any" device in the subtree, and will
> > >>  		 * exit in the disconnected error state.
> > >>  		 */
> > >> -		if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE)
> > >> +		if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) {
> > >>  			vote = PCI_ERS_RESULT_NO_AER_DRIVER;
> > >> +			pci_info(dev, "AER: Device has no error_detected callback\n");
> > >> +		}
> > >>  		else
> > >>  			vote = PCI_ERS_RESULT_NONE;
> > >>  	} else {
> > >> --
> > >> 2.8.1
> > >>  
> > > commit 9694ef043ea4 ("PCI/AER: Factor message prefixes with dev_fmt()")
> > > Author: Bjorn Helgaas <bhelgaas@google.com>
> > > Date:   Fri Dec 13 16:46:05 2019 -0600
> > >
> > >     PCI/AER: Factor message prefixes with dev_fmt()
> > >     
> > >     Define dev_fmt() with the common prefix of log messages so we don't have to
> > >     repeat it in every printk.  No functional change intended.
> > >     
> > >     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> > >
> > > diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> > > index abde5e000f5d..747ef8b41d1f 100644
> > > --- a/drivers/pci/pcie/err.c
> > > +++ b/drivers/pci/pcie/err.c
> > > @@ -10,6 +10,8 @@
> > >   *	Zhang Yanmin (yanmin.zhang@intel.com)
> > >   */
> > >  
> > > +#define dev_fmt(fmt) "AER: " fmt
> > > +
> > >  #include <linux/pci.h>
> > >  #include <linux/module.h>
> > >  #include <linux/kernel.h>
> > > @@ -64,7 +66,7 @@ static int report_error_detected(struct pci_dev *dev,
> > >  		 */
> > >  		if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) {
> > >  			vote = PCI_ERS_RESULT_NO_AER_DRIVER;
> > > -			pci_info(dev, "AER: Driver has no error_detected callback\n");
> > > +			pci_info(dev, "driver has no error_detected callback\n");  
> > 
> > I think use "device" here is more proper. Sometimes the device is
> > even not bind to the driver, which will also lead to the recovery
> > failure.  
> 
> Hmmm, maybe so, although the error_detected callback is definitely a
> property of the *driver*, not the device.  What would you think of
> something like this?
> 
>   pci_info(dev, dev->driver ? "driver not capable of error recovery\n" :
>                               "no driver\n");
> 
> I am curious about the larger question of why we fail if the device
> has no driver.  I understand why we fail the recovery for a device
> with a driver that has no err_handlers: we can't just reset the device
> out from under the driver.
> 
> But why do we fail if a device has no driver at all?  Shouldn't it be
> safe to reset such a device?  The commit log of 918b4053184c
> ("PCI/AER: Report success only when every device has AER-aware
> driver") suggests that it has something to do with KVM and the
> pci-stub driver, but I don't understand it.

That commit is from 2012, when legacy KVM device assignment existed in
the kernel and was able to drive a device without really binding to it
as a driver.  Sane users of this mechanism would at least bind the
device to pci-stub to prevent it from being used by both a host driver
and KVM simultaneously.  In any case, legacy KVM device assignment no
longer exists in the kernel, so if that was the justification to not
reset driver-less devices, we can probably fill that gap now.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Patch]PCI:AER:Notify which device has no error_detected callback
  2019-12-16 18:28       ` Alex Williamson
@ 2019-12-17 11:12         ` Yicong Yang
  0 siblings, 0 replies; 6+ messages in thread
From: Yicong Yang @ 2019-12-17 11:12 UTC (permalink / raw)
  To: Alex Williamson, Bjorn Helgaas; +Cc: linux-pci

On 2019/12/17 2:28, Alex Williamson wrote:
> On Mon, 16 Dec 2019 11:58:32 -0600
> Bjorn Helgaas <helgaas@kernel.org> wrote:
>
>> [+cc Alex]
>>
>> On Mon, Dec 16, 2019 at 06:30:11PM +0800, Yicong Yang wrote:
>>> On 2019/12/14 6:57, Bjorn Helgaas wrote:  
>>>> On Fri, Dec 13, 2019 at 07:44:34PM +0800, Yicong Yang wrote:  
>>>>> The PCI error recovery will fail if any device under
>>>>> root port doesn't have an error_detected callback.
>>>>> Currently only failure result is printed, which is
>>>>> not enough to determine which device leads to the
>>>>> failure and the detailed failure reason.
>>>>>
>>>>> Add print information if certain device under root
>>>>> port has no error_detected callback.
>>>>>
>>>>> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>  
>>>> Applied to pci/aer for v5.6, thanks!
>>>>
>>>> I also added a trivial follow-on patch to factor out the "AER: "
>>>> prefix (attached below).  This code is now used by DPC as well as AER,
>>>> so "AER: " might not be exactly the correct prefix, but I didn't try
>>>> to untangle that.
>>>>  
>>>>> ---
>>>>>  drivers/pci/pcie/err.c | 4 +++-
>>>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
>>>>> index b0e6048..ec37c33 100644
>>>>> --- a/drivers/pci/pcie/err.c
>>>>> +++ b/drivers/pci/pcie/err.c
>>>>> @@ -61,8 +61,10 @@ static int report_error_detected(struct pci_dev *dev,
>>>>>  		 * error callbacks of "any" device in the subtree, and will
>>>>>  		 * exit in the disconnected error state.
>>>>>  		 */
>>>>> -		if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE)
>>>>> +		if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) {
>>>>>  			vote = PCI_ERS_RESULT_NO_AER_DRIVER;
>>>>> +			pci_info(dev, "AER: Device has no error_detected callback\n");
>>>>> +		}
>>>>>  		else
>>>>>  			vote = PCI_ERS_RESULT_NONE;
>>>>>  	} else {
>>>>> --
>>>>> 2.8.1
>>>>>  
>>>> commit 9694ef043ea4 ("PCI/AER: Factor message prefixes with dev_fmt()")
>>>> Author: Bjorn Helgaas <bhelgaas@google.com>
>>>> Date:   Fri Dec 13 16:46:05 2019 -0600
>>>>
>>>>     PCI/AER: Factor message prefixes with dev_fmt()
>>>>     
>>>>     Define dev_fmt() with the common prefix of log messages so we don't have to
>>>>     repeat it in every printk.  No functional change intended.
>>>>     
>>>>     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>>>>
>>>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
>>>> index abde5e000f5d..747ef8b41d1f 100644
>>>> --- a/drivers/pci/pcie/err.c
>>>> +++ b/drivers/pci/pcie/err.c
>>>> @@ -10,6 +10,8 @@
>>>>   *	Zhang Yanmin (yanmin.zhang@intel.com)
>>>>   */
>>>>  
>>>> +#define dev_fmt(fmt) "AER: " fmt
>>>> +
>>>>  #include <linux/pci.h>
>>>>  #include <linux/module.h>
>>>>  #include <linux/kernel.h>
>>>> @@ -64,7 +66,7 @@ static int report_error_detected(struct pci_dev *dev,
>>>>  		 */
>>>>  		if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) {
>>>>  			vote = PCI_ERS_RESULT_NO_AER_DRIVER;
>>>> -			pci_info(dev, "AER: Driver has no error_detected callback\n");
>>>> +			pci_info(dev, "driver has no error_detected callback\n");  
>>> I think use "device" here is more proper. Sometimes the device is
>>> even not bind to the driver, which will also lead to the recovery
>>> failure.  
>> Hmmm, maybe so, although the error_detected callback is definitely a
>> property of the *driver*, not the device.  What would you think of
>> something like this?
>>
>>   pci_info(dev, dev->driver ? "driver not capable of error recovery\n" :
>>                               "no driver\n");

well, it seems like a better way. maybe :-)
    pci_info(dev, dev->driver ? "driver has no error_detected callback\n" :
                                                   "device has no driver\n")

>>
>> I am curious about the larger question of why we fail if the device
>> has no driver.  I understand why we fail the recovery for a device
>> with a driver that has no err_handlers: we can't just reset the device
>> out from under the driver.
>>
>> But why do we fail if a device has no driver at all?  Shouldn't it be
>> safe to reset such a device?  The commit log of 918b4053184c
>> ("PCI/AER: Report success only when every device has AER-aware
>> driver") suggests that it has something to do with KVM and the
>> pci-stub driver, but I don't understand it.
> That commit is from 2012, when legacy KVM device assignment existed in
> the kernel and was able to drive a device without really binding to it
> as a driver.  Sane users of this mechanism would at least bind the
> device to pci-stub to prevent it from being used by both a host driver
> and KVM simultaneously.  In any case, legacy KVM device assignment no
> longer exists in the kernel, so if that was the justification to not
> reset driver-less devices, we can probably fill that gap now.  Thanks,
>
> Alex
>

I think the reason why we fail if a device has no driver is the same as the driver has no
error_detected callback. Under both conditions, the next step process is uncertain.

If the error is from other devices, probably we can just ignore the unbind devices. But
I'm not certain whether errors will occur at devices with no drivers and results to ignore
them. If it's okay, maybe just ignore the unbind devices and return PCI_ERS_RESULT_NONE.
however, I tried to inject errors on an unbind device using aer_inject and the recovery
routine is invoked. is there real condition like this?
> .
>



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2019-12-17 11:12 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-12-13 11:44 [Patch]PCI:AER:Notify which device has no error_detected callback Yicong Yang
2019-12-13 22:57 ` Bjorn Helgaas
2019-12-16 10:30   ` Yicong Yang
2019-12-16 17:58     ` Bjorn Helgaas
2019-12-16 18:28       ` Alex Williamson
2019-12-17 11:12         ` Yicong Yang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.