* pci_error_handlers for pci_host_bridge ?
@ 2018-06-06 12:27 Subrahmanya Lingappa
2018-06-06 12:50 ` Bjorn Helgaas
0 siblings, 1 reply; 9+ messages in thread
From: Subrahmanya Lingappa @ 2018-06-06 12:27 UTC (permalink / raw)
To: linux-pci; +Cc: Bjorn Helgaas
Hi,
according to https://github.com/torvalds/linux/blob/master/Documentation/PCI/pci-error-recovery.txt
as part of AER handling, struct pci_error_handlers{} is implemented by
endpoint drivers to handle device specific recovery steps for "struct
pci_driver".
But we have a platform_driver which implements "struct
pci_host_bridge" which also supports AER capability how can we support
pci_error_handlers() for the host bridge drivers ?
Thanks,
Subrahmanya.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: pci_error_handlers for pci_host_bridge ?
2018-06-06 12:27 pci_error_handlers for pci_host_bridge ? Subrahmanya Lingappa
@ 2018-06-06 12:50 ` Bjorn Helgaas
2018-06-07 10:15 ` Subrahmanya Lingappa
0 siblings, 1 reply; 9+ messages in thread
From: Bjorn Helgaas @ 2018-06-06 12:50 UTC (permalink / raw)
To: Subrahmanya Lingappa; +Cc: linux-pci, Bjorn Helgaas
Hi Subrahmanya,
On Wed, Jun 06, 2018 at 05:57:17PM +0530, Subrahmanya Lingappa wrote:
> Hi,
> according to https://github.com/torvalds/linux/blob/master/Documentation/PCI/pci-error-recovery.txt
>
> as part of AER handling, struct pci_error_handlers{} is implemented by
> endpoint drivers to handle device specific recovery steps for "struct
> pci_driver".
>
> But we have a platform_driver which implements "struct
> pci_host_bridge" which also supports AER capability how can we support
> pci_error_handlers() for the host bridge drivers ?
I assume you're referring to Mobiveil. Can you explain more of the
topology here? Can you also include "sudo lspci -vv" output?
The AER capability is an optional capability of PCIe device functions.
A host bridge is not itself a PCIe function; it's a bridge between a
platform-specific host bus and the PCIe bus.
Sometimes there is a PCI function that corresponds to the host bridge,
but that's not required by the PCI specs and there is no generic
programming model for it.
If you have an PCIe function corresponding to the Mobiveil host
bridge, and it has an AER capability, what would you want the error
handlers to do? This function would not normally be a Root Port or
other type 1 PCI-to-PCI bridge device, so it's not clear how its AER
would integrate with the PCIe hierarchy.
Bjorn
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: pci_error_handlers for pci_host_bridge ?
2018-06-06 12:50 ` Bjorn Helgaas
@ 2018-06-07 10:15 ` Subrahmanya Lingappa
2018-06-07 10:50 ` poza
0 siblings, 1 reply; 9+ messages in thread
From: Subrahmanya Lingappa @ 2018-06-07 10:15 UTC (permalink / raw)
To: Bjorn Helgaas; +Cc: linux-pci, Bjorn Helgaas
Bjorn,
On Wed, Jun 6, 2018 at 6:20 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> Hi Subrahmanya,
>
> On Wed, Jun 06, 2018 at 05:57:17PM +0530, Subrahmanya Lingappa wrote:
>> Hi,
>> according to https://github.com/torvalds/linux/blob/master/Documentation/PCI/pci-error-recovery.txt
>>
>> as part of AER handling, struct pci_error_handlers{} is implemented by
>> endpoint drivers to handle device specific recovery steps for "struct
>> pci_driver".
>>
>> But we have a platform_driver which implements "struct
>> pci_host_bridge" which also supports AER capability how can we support
>> pci_error_handlers() for the host bridge drivers ?
>
> I assume you're referring to Mobiveil. Can you explain more of the
> topology here? Can you also include "sudo lspci -vv" output?
>
Yes, it is for Mobiveil's Host bridge driver :
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/pci/host/pcie-mobiveil.c
lspci output is now is not available, I'll try to get it sooner.
we have an endpoint connected directly to Rootport as follows
RC Rootport ----->BUS------> EP
> The AER capability is an optional capability of PCIe device functions.
> A host bridge is not itself a PCIe function; it's a bridge between a
> platform-specific host bus and the PCIe bus.
>
> Sometimes there is a PCI function that corresponds to the host bridge,
> but that's not required by the PCI specs and there is no generic
> programming model for it.
>
> If you have an PCIe function corresponding to the Mobiveil host
> bridge, and it has an AER capability, what would you want the error
> handlers to do? This function would not normally be a Root Port or
> other type 1 PCI-to-PCI bridge device, so it's not clear how its AER
> would integrate with the PCIe hierarchy.
>
Yes we do have a PCI function with AER capability, after an AER reported by EP,
AER driver initiates an hot_reset on subordinate bus, which happens to
be downstream port
for RC. So we get a downstream port link down happens in this case RC
driver needs to follow
a specific register restore sequence, which is most of the HW specific
initialization done in probe function of the driver to recover
properly.
I was wondering if this can be handled by using AER error handlers, or
would suggest a better way to handle this ?
As of now plan is to handle this situation is by calling a minimal
probe recovery sequence after link down interrupt within the
driver interrupt service routine.
> Bjorn
Thanks,
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: pci_error_handlers for pci_host_bridge ?
2018-06-07 10:15 ` Subrahmanya Lingappa
@ 2018-06-07 10:50 ` poza
2018-06-07 13:59 ` Bjorn Helgaas
2018-06-09 10:08 ` Subrahmanya Lingappa
0 siblings, 2 replies; 9+ messages in thread
From: poza @ 2018-06-07 10:50 UTC (permalink / raw)
To: Subrahmanya Lingappa
Cc: Bjorn Helgaas, linux-pci, Bjorn Helgaas, linux-pci-owner
On 2018-06-07 15:45, Subrahmanya Lingappa wrote:
> Bjorn,
>
> On Wed, Jun 6, 2018 at 6:20 PM, Bjorn Helgaas <helgaas@kernel.org>
> wrote:
>> Hi Subrahmanya,
>>
>> On Wed, Jun 06, 2018 at 05:57:17PM +0530, Subrahmanya Lingappa wrote:
>>> Hi,
>>> according to
>>> https://github.com/torvalds/linux/blob/master/Documentation/PCI/pci-error-recovery.txt
>>>
>>> as part of AER handling, struct pci_error_handlers{} is implemented
>>> by
>>> endpoint drivers to handle device specific recovery steps for "struct
>>> pci_driver".
>>>
>>> But we have a platform_driver which implements "struct
>>> pci_host_bridge" which also supports AER capability how can we
>>> support
>>> pci_error_handlers() for the host bridge drivers ?
>>
>> I assume you're referring to Mobiveil. Can you explain more of the
>> topology here? Can you also include "sudo lspci -vv" output?
>>
> Yes, it is for Mobiveil's Host bridge driver :
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/pci/host/pcie-mobiveil.c
> lspci output is now is not available, I'll try to get it sooner.
>
> we have an endpoint connected directly to Rootport as follows
> RC Rootport ----->BUS------> EP
>
>> The AER capability is an optional capability of PCIe device functions.
>> A host bridge is not itself a PCIe function; it's a bridge between a
>> platform-specific host bus and the PCIe bus.
>>
>> Sometimes there is a PCI function that corresponds to the host bridge,
>> but that's not required by the PCI specs and there is no generic
>> programming model for it.
>>
>> If you have an PCIe function corresponding to the Mobiveil host
>> bridge, and it has an AER capability, what would you want the error
>> handlers to do? This function would not normally be a Root Port or
>> other type 1 PCI-to-PCI bridge device, so it's not clear how its AER
>> would integrate with the PCIe hierarchy.
>>
> Yes we do have a PCI function with AER capability, after an AER
> reported by EP,
> AER driver initiates an hot_reset on subordinate bus, which happens to
> be downstream port
> for RC. So we get a downstream port link down happens in this case RC
> driver needs to follow
> a specific register restore sequence, which is most of the HW specific
> initialization done in probe function of the driver to recover
> properly.
Are you looking at something similar to pci_error_handlers to be called
for your RC driver ?
where probably you are expecting during ERR_NONFATAL recovery you would
want to restore some of your platform
specific registers. I dont think that support exists now. since
pci_error_handlers is of struct pci_driver
while yours is platform driver.
Although please also note that ERR_FATAL is no more handled with error
and recovery callbacks.
that are just going to be handled with removal, re-enumeration of the
devices.
but I suppose in any case you want to restore the registers in any type
of uncorrectable error.
although this is really platform specific, some sort of quirk I cna
think of, but again err.c has to check
that quirk's existence and calls platform specific callback
(that again I am not sure because such things do not exist with respect
to error/recovery callbacks)
Yeah just re-thinking, this is too specific, not to be addressed by
generic framework I think.
> I was wondering if this can be handled by using AER error handlers, or
> would suggest a better way to handle this ?
>
> As of now plan is to handle this situation is by calling a minimal
> probe recovery sequence after link down interrupt within the
> driver interrupt service routine.
Well I think that is a better place,
I was wondering why are you loosing registers at the first point ?
Is because of link down even you are loosing them ? some issue with hw !
Lets hear from Bjorn anyway, I am curious.
>
>> Bjorn
>
> Thanks,
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: pci_error_handlers for pci_host_bridge ?
2018-06-07 10:50 ` poza
@ 2018-06-07 13:59 ` Bjorn Helgaas
2018-06-08 11:57 ` Subrahmanya Lingappa
2018-06-09 10:08 ` Subrahmanya Lingappa
1 sibling, 1 reply; 9+ messages in thread
From: Bjorn Helgaas @ 2018-06-07 13:59 UTC (permalink / raw)
To: poza; +Cc: Subrahmanya Lingappa, linux-pci, Bjorn Helgaas, linux-pci-owner
On Thu, Jun 07, 2018 at 04:20:10PM +0530, poza@codeaurora.org wrote:
> On 2018-06-07 15:45, Subrahmanya Lingappa wrote:
> > Bjorn,
> >
> > On Wed, Jun 6, 2018 at 6:20 PM, Bjorn Helgaas <helgaas@kernel.org>
> > wrote:
> > > Hi Subrahmanya,
> > >
> > > On Wed, Jun 06, 2018 at 05:57:17PM +0530, Subrahmanya Lingappa wrote:
> > > > Hi,
> > > > according to https://github.com/torvalds/linux/blob/master/Documentation/PCI/pci-error-recovery.txt
> > > >
> > > > as part of AER handling, struct pci_error_handlers{} is
> > > > implemented by
> > > > endpoint drivers to handle device specific recovery steps for "struct
> > > > pci_driver".
> > > >
> > > > But we have a platform_driver which implements "struct
> > > > pci_host_bridge" which also supports AER capability how can we
> > > > support
> > > > pci_error_handlers() for the host bridge drivers ?
> > >
> > > I assume you're referring to Mobiveil. Can you explain more of the
> > > topology here? Can you also include "sudo lspci -vv" output?
> > >
> > Yes, it is for Mobiveil's Host bridge driver :
> > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/pci/host/pcie-mobiveil.c
> > lspci output is now is not available, I'll try to get it sooner.
> >
> > we have an endpoint connected directly to Rootport as follows
> > RC Rootport ----->BUS------> EP
> >
> > > The AER capability is an optional capability of PCIe device functions.
> > > A host bridge is not itself a PCIe function; it's a bridge between a
> > > platform-specific host bus and the PCIe bus.
> > >
> > > Sometimes there is a PCI function that corresponds to the host bridge,
> > > but that's not required by the PCI specs and there is no generic
> > > programming model for it.
> > >
> > > If you have an PCIe function corresponding to the Mobiveil host
> > > bridge, and it has an AER capability, what would you want the error
> > > handlers to do? This function would not normally be a Root Port or
> > > other type 1 PCI-to-PCI bridge device, so it's not clear how its AER
> > > would integrate with the PCIe hierarchy.
> > >
> > Yes we do have a PCI function with AER capability, after an AER
> > reported by EP, AER driver initiates an hot_reset on subordinate
> > bus, which happens to be downstream port for RC. So we get a
> > downstream port link down happens in this case RC driver needs to
> > follow a specific register restore sequence, which is most of the
> > HW specific initialization done in probe function of the driver
> > to recover properly.
I'm still not quite getting your point. Let's try a concrete example.
Assume this topology (this is what you described above):
00:00.0 Root Port to [bus 01]
01:00.0 Endpoint
Then we have this situation:
- AER driver binds to Root Port 00:00.0
- Endpoint 01:00.0 detects an error and sends ERR_* message upstream
- Root Port 00:00.0 receives ERR_* message
- Root Port 00:00.0 generates AER interrupt
- AER driver handles the interrupt and does a secondary bus reset on
Root Port 00:00.0. This resets the Endpoint (01:00.0) but not the
Root Port itself.
It sounds like you need to do an RC register restore sequence at this
point, i.e., something like mobiveil_host_init()? That sounds like a
hardware defect. Doing a secondary bus reset via the Root Port's
bridge control register should have no effect on the RC itself.
What am I missing?
> Are you looking at something similar to pci_error_handlers to be called for
> your RC driver ?
> where probably you are expecting during ERR_NONFATAL recovery you would want
> to restore some of your platform
> specific registers. I dont think that support exists now. since
> pci_error_handlers is of struct pci_driver
> while yours is platform driver.
>
> Although please also note that ERR_FATAL is no more handled with error and
> recovery callbacks.
> that are just going to be handled with removal, re-enumeration of the
> devices.
>
> but I suppose in any case you want to restore the registers in any type of
> uncorrectable error.
>
> although this is really platform specific, some sort of quirk I cna think
> of, but again err.c has to check
> that quirk's existence and calls platform specific callback
> (that again I am not sure because such things do not exist with respect to
> error/recovery callbacks)
>
> Yeah just re-thinking, this is too specific, not to be addressed by generic
> framework I think.
>
> > I was wondering if this can be handled by using AER error handlers, or
> > would suggest a better way to handle this ?
> >
> > As of now plan is to handle this situation is by calling a minimal
> > probe recovery sequence after link down interrupt within the
> > driver interrupt service routine.
>
> Well I think that is a better place,
> I was wondering why are you loosing registers at the first point ?
> Is because of link down even you are loosing them ? some issue with hw !
>
>
> Lets hear from Bjorn anyway, I am curious.
>
> >
> > > Bjorn
> >
> > Thanks,
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: pci_error_handlers for pci_host_bridge ?
2018-06-07 13:59 ` Bjorn Helgaas
@ 2018-06-08 11:57 ` Subrahmanya Lingappa
2018-06-08 12:36 ` Bjorn Helgaas
0 siblings, 1 reply; 9+ messages in thread
From: Subrahmanya Lingappa @ 2018-06-08 11:57 UTC (permalink / raw)
To: Bjorn Helgaas; +Cc: poza, linux-pci, Bjorn Helgaas, linux-pci-owner
Bjorn,
On Thu, Jun 7, 2018 at 7:29 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Thu, Jun 07, 2018 at 04:20:10PM +0530, poza@codeaurora.org wrote:
>> On 2018-06-07 15:45, Subrahmanya Lingappa wrote:
>> > Bjorn,
>> >
>> > On Wed, Jun 6, 2018 at 6:20 PM, Bjorn Helgaas <helgaas@kernel.org>
>> > wrote:
>> > > Hi Subrahmanya,
>> > >
>> > > On Wed, Jun 06, 2018 at 05:57:17PM +0530, Subrahmanya Lingappa wrote:
>> > > > Hi,
>> > > > according to https://github.com/torvalds/linux/blob/master/Documentation/PCI/pci-error-recovery.txt
>> > > >
>> > > > as part of AER handling, struct pci_error_handlers{} is
>> > > > implemented by
>> > > > endpoint drivers to handle device specific recovery steps for "struct
>> > > > pci_driver".
>> > > >
>> > > > But we have a platform_driver which implements "struct
>> > > > pci_host_bridge" which also supports AER capability how can we
>> > > > support
>> > > > pci_error_handlers() for the host bridge drivers ?
>> > >
>> > > I assume you're referring to Mobiveil. Can you explain more of the
>> > > topology here? Can you also include "sudo lspci -vv" output?
>> > >
>> > Yes, it is for Mobiveil's Host bridge driver :
>> > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/pci/host/pcie-mobiveil.c
>> > lspci output is now is not available, I'll try to get it sooner.
>> >
>> > we have an endpoint connected directly to Rootport as follows
>> > RC Rootport ----->BUS------> EP
>> >
>> > > The AER capability is an optional capability of PCIe device functions.
>> > > A host bridge is not itself a PCIe function; it's a bridge between a
>> > > platform-specific host bus and the PCIe bus.
>> > >
>> > > Sometimes there is a PCI function that corresponds to the host bridge,
>> > > but that's not required by the PCI specs and there is no generic
>> > > programming model for it.
>> > >
>> > > If you have an PCIe function corresponding to the Mobiveil host
>> > > bridge, and it has an AER capability, what would you want the error
>> > > handlers to do? This function would not normally be a Root Port or
>> > > other type 1 PCI-to-PCI bridge device, so it's not clear how its AER
>> > > would integrate with the PCIe hierarchy.
>> > >
>> > Yes we do have a PCI function with AER capability, after an AER
>> > reported by EP, AER driver initiates an hot_reset on subordinate
>> > bus, which happens to be downstream port for RC. So we get a
>> > downstream port link down happens in this case RC driver needs to
>> > follow a specific register restore sequence, which is most of the
>> > HW specific initialization done in probe function of the driver
>> > to recover properly.
>
> I'm still not quite getting your point. Let's try a concrete example.
> Assume this topology (this is what you described above):
>
> 00:00.0 Root Port to [bus 01]
> 01:00.0 Endpoint
>
> Then we have this situation:
>
> - AER driver binds to Root Port 00:00.0
> - Endpoint 01:00.0 detects an error and sends ERR_* message upstream
> - Root Port 00:00.0 receives ERR_* message
> - Root Port 00:00.0 generates AER interrupt
> - AER driver handles the interrupt and does a secondary bus reset on
> Root Port 00:00.0. This resets the Endpoint (01:00.0) but not the
> Root Port itself.
>
> It sounds like you need to do an RC register restore sequence at this
> point, i.e., something like mobiveil_host_init()? That sounds like a
> hardware defect. Doing a secondary bus reset via the Root Port's
> bridge control register should have no effect on the RC itself.
>
> What am I missing?
Yes, in this situation PCI config registers are fine, but due to a bug
RC, config space registers will get reset
in the SW we need to call the mobiveil_host_init() again to recover in
case of directly connected EP to the Root-port.
We do get an local interrupt when this problem occurs, so this can be
handled in Driver's ISR;
or is there a better way to handle it ?
Thanks.
>
>> Are you looking at something similar to pci_error_handlers to be called for
>> your RC driver ?
>> where probably you are expecting during ERR_NONFATAL recovery you would want
>> to restore some of your platform
>> specific registers. I dont think that support exists now. since
>> pci_error_handlers is of struct pci_driver
>> while yours is platform driver.
>>
>> Although please also note that ERR_FATAL is no more handled with error and
>> recovery callbacks.
>> that are just going to be handled with removal, re-enumeration of the
>> devices.
>>
>> but I suppose in any case you want to restore the registers in any type of
>> uncorrectable error.
>>
>> although this is really platform specific, some sort of quirk I cna think
>> of, but again err.c has to check
>> that quirk's existence and calls platform specific callback
>> (that again I am not sure because such things do not exist with respect to
>> error/recovery callbacks)
>>
>> Yeah just re-thinking, this is too specific, not to be addressed by generic
>> framework I think.
>>
>> > I was wondering if this can be handled by using AER error handlers, or
>> > would suggest a better way to handle this ?
>> >
>> > As of now plan is to handle this situation is by calling a minimal
>> > probe recovery sequence after link down interrupt within the
>> > driver interrupt service routine.
>>
>> Well I think that is a better place,
>> I was wondering why are you loosing registers at the first point ?
>> Is because of link down even you are loosing them ? some issue with hw !
>>
>>
>> Lets hear from Bjorn anyway, I am curious.
>>
>> >
>> > > Bjorn
>> >
>> > Thanks,
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: pci_error_handlers for pci_host_bridge ?
2018-06-08 11:57 ` Subrahmanya Lingappa
@ 2018-06-08 12:36 ` Bjorn Helgaas
2018-06-09 10:10 ` Subrahmanya Lingappa
0 siblings, 1 reply; 9+ messages in thread
From: Bjorn Helgaas @ 2018-06-08 12:36 UTC (permalink / raw)
To: Subrahmanya Lingappa; +Cc: poza, linux-pci, Bjorn Helgaas, linux-pci-owner
On Fri, Jun 08, 2018 at 05:27:12PM +0530, Subrahmanya Lingappa wrote:
> Bjorn,
>
> On Thu, Jun 7, 2018 at 7:29 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Thu, Jun 07, 2018 at 04:20:10PM +0530, poza@codeaurora.org wrote:
> >> On 2018-06-07 15:45, Subrahmanya Lingappa wrote:
> >> > Bjorn,
> >> >
> >> > On Wed, Jun 6, 2018 at 6:20 PM, Bjorn Helgaas <helgaas@kernel.org>
> >> > wrote:
> >> > > Hi Subrahmanya,
> >> > >
> >> > > On Wed, Jun 06, 2018 at 05:57:17PM +0530, Subrahmanya Lingappa wrote:
> >> > > > Hi,
> >> > > > according to https://github.com/torvalds/linux/blob/master/Documentation/PCI/pci-error-recovery.txt
> >> > > >
> >> > > > as part of AER handling, struct pci_error_handlers{} is
> >> > > > implemented by
> >> > > > endpoint drivers to handle device specific recovery steps for "struct
> >> > > > pci_driver".
> >> > > >
> >> > > > But we have a platform_driver which implements "struct
> >> > > > pci_host_bridge" which also supports AER capability how can we
> >> > > > support
> >> > > > pci_error_handlers() for the host bridge drivers ?
> >> > >
> >> > > I assume you're referring to Mobiveil. Can you explain more of the
> >> > > topology here? Can you also include "sudo lspci -vv" output?
> >> > >
> >> > Yes, it is for Mobiveil's Host bridge driver :
> >> > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/pci/host/pcie-mobiveil.c
> >> > lspci output is now is not available, I'll try to get it sooner.
> >> >
> >> > we have an endpoint connected directly to Rootport as follows
> >> > RC Rootport ----->BUS------> EP
> >> >
> >> > > The AER capability is an optional capability of PCIe device functions.
> >> > > A host bridge is not itself a PCIe function; it's a bridge between a
> >> > > platform-specific host bus and the PCIe bus.
> >> > >
> >> > > Sometimes there is a PCI function that corresponds to the host bridge,
> >> > > but that's not required by the PCI specs and there is no generic
> >> > > programming model for it.
> >> > >
> >> > > If you have an PCIe function corresponding to the Mobiveil host
> >> > > bridge, and it has an AER capability, what would you want the error
> >> > > handlers to do? This function would not normally be a Root Port or
> >> > > other type 1 PCI-to-PCI bridge device, so it's not clear how its AER
> >> > > would integrate with the PCIe hierarchy.
> >> > >
> >> > Yes we do have a PCI function with AER capability, after an AER
> >> > reported by EP, AER driver initiates an hot_reset on subordinate
> >> > bus, which happens to be downstream port for RC. So we get a
> >> > downstream port link down happens in this case RC driver needs to
> >> > follow a specific register restore sequence, which is most of the
> >> > HW specific initialization done in probe function of the driver
> >> > to recover properly.
> >
> > I'm still not quite getting your point. Let's try a concrete example.
> > Assume this topology (this is what you described above):
> >
> > 00:00.0 Root Port to [bus 01]
> > 01:00.0 Endpoint
> >
> > Then we have this situation:
> >
> > - AER driver binds to Root Port 00:00.0
> > - Endpoint 01:00.0 detects an error and sends ERR_* message upstream
> > - Root Port 00:00.0 receives ERR_* message
> > - Root Port 00:00.0 generates AER interrupt
> > - AER driver handles the interrupt and does a secondary bus reset on
> > Root Port 00:00.0. This resets the Endpoint (01:00.0) but not the
> > Root Port itself.
> >
> > It sounds like you need to do an RC register restore sequence at this
> > point, i.e., something like mobiveil_host_init()? That sounds like a
> > hardware defect. Doing a secondary bus reset via the Root Port's
> > bridge control register should have no effect on the RC itself.
> >
> > What am I missing?
> Yes, in this situation PCI config registers are fine, but due to a bug
> RC, config space registers will get reset
> in the SW we need to call the mobiveil_host_init() again to recover in
> case of directly connected EP to the Root-port.
>
> We do get an local interrupt when this problem occurs, so this can be
> handled in Driver's ISR;
> or is there a better way to handle it ?
I don't see a better way to handle it.
If the PCI core were doing something wrong here, e.g., missing some
restore required by hardware that follows the spec, we would of course
want a nice way to fix it in the core.
But this sounds like hardware that does not follow the spec, so the
workaround should be in the hardware-specific driver.
Note that this may be an issue if/when you want to support ACPI with
this hardware, because in that case we use the generic ACPI host
bridge driver (drivers/acpi/pci_root.c) and there is less opportunity
for device-specific workarounds like this.
Bjorn
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: pci_error_handlers for pci_host_bridge ?
2018-06-07 10:50 ` poza
2018-06-07 13:59 ` Bjorn Helgaas
@ 2018-06-09 10:08 ` Subrahmanya Lingappa
1 sibling, 0 replies; 9+ messages in thread
From: Subrahmanya Lingappa @ 2018-06-09 10:08 UTC (permalink / raw)
To: poza; +Cc: Bjorn Helgaas, linux-pci, Bjorn Helgaas, linux-pci-owner
Poza,
On Thu, Jun 7, 2018 at 4:20 PM, <poza@codeaurora.org> wrote:
>
> On 2018-06-07 15:45, Subrahmanya Lingappa wrote:
>>
>> Bjorn,
>>
>> On Wed, Jun 6, 2018 at 6:20 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
>>>
>>> Hi Subrahmanya,
>>>
>>> On Wed, Jun 06, 2018 at 05:57:17PM +0530, Subrahmanya Lingappa wrote:
>>>>
>>>> Hi,
>>>> according to https://github.com/torvalds/linux/blob/master/Documentation/PCI/pci-error-recovery.txt
>>>>
>>>> as part of AER handling, struct pci_error_handlers{} is implemented by
>>>> endpoint drivers to handle device specific recovery steps for "struct
>>>> pci_driver".
>>>>
>>>> But we have a platform_driver which implements "struct
>>>> pci_host_bridge" which also supports AER capability how can we support
>>>> pci_error_handlers() for the host bridge drivers ?
>>>
>>>
>>> I assume you're referring to Mobiveil. Can you explain more of the
>>> topology here? Can you also include "sudo lspci -vv" output?
>>>
>> Yes, it is for Mobiveil's Host bridge driver :
>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/pci/host/pcie-mobiveil.c
>> lspci output is now is not available, I'll try to get it sooner.
>>
>> we have an endpoint connected directly to Rootport as follows
>> RC Rootport ----->BUS------> EP
>>
>>> The AER capability is an optional capability of PCIe device functions.
>>> A host bridge is not itself a PCIe function; it's a bridge between a
>>> platform-specific host bus and the PCIe bus.
>>>
>>> Sometimes there is a PCI function that corresponds to the host bridge,
>>> but that's not required by the PCI specs and there is no generic
>>> programming model for it.
>>>
>>> If you have an PCIe function corresponding to the Mobiveil host
>>> bridge, and it has an AER capability, what would you want the error
>>> handlers to do? This function would not normally be a Root Port or
>>> other type 1 PCI-to-PCI bridge device, so it's not clear how its AER
>>> would integrate with the PCIe hierarchy.
>>>
>> Yes we do have a PCI function with AER capability, after an AER reported by EP,
>> AER driver initiates an hot_reset on subordinate bus, which happens to
>> be downstream port
>> for RC. So we get a downstream port link down happens in this case RC
>> driver needs to follow
>> a specific register restore sequence, which is most of the HW specific
>> initialization done in probe function of the driver to recover
>> properly.
>
>
> Are you looking at something similar to pci_error_handlers to be called for your RC driver ?
> where probably you are expecting during ERR_NONFATAL recovery you would want to restore some of your platform
> specific registers. I dont think that support exists now. since pci_error_handlers is of struct pci_driver
> while yours is platform driver.
>
Yes, that's why I asked this question here to ask how do we handle in
case of a platform driver.
>
> Although please also note that ERR_FATAL is no more handled with error and recovery callbacks.
> that are just going to be handled with removal, re-enumeration of the devices.
>
> but I suppose in any case you want to restore the registers in any type of uncorrectable error.
>
yes
>
> although this is really platform specific, some sort of quirk I cna think of, but again err.c has to check
> that quirk's existence and calls platform specific callback
> (that again I am not sure because such things do not exist with respect to error/recovery callbacks)
>
can you point me to any other cases this scheme might have been implemented ?
>
> Yeah just re-thinking, this is too specific, not to be addressed by generic framework I think.
>
>> I was wondering if this can be handled by using AER error handlers, or
>> would suggest a better way to handle this ?
>>
>> As of now plan is to handle this situation is by calling a minimal
>> probe recovery sequence after link down interrupt within the
>> driver interrupt service routine.
>
>
> Well I think that is a better place,
> I was wondering why are you loosing registers at the first point ?
> Is because of link down even you are loosing them ? some issue with hw !
>
yes, due to AER FATAL error link reset happens on subordinate bus, in
this case since EP is directly connected to root port,
its a downstream port link down, in this case RC is designed to get
reset to its config registers, though PCI config space remains intact.
Thanks.
>
>
> Lets hear from Bjorn anyway, I am curious.
>
>>
>>> Bjorn
>>
>>
>> Thanks,
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: pci_error_handlers for pci_host_bridge ?
2018-06-08 12:36 ` Bjorn Helgaas
@ 2018-06-09 10:10 ` Subrahmanya Lingappa
0 siblings, 0 replies; 9+ messages in thread
From: Subrahmanya Lingappa @ 2018-06-09 10:10 UTC (permalink / raw)
To: Bjorn Helgaas; +Cc: poza, linux-pci, Bjorn Helgaas, linux-pci-owner
Bjorn,
Understood;
Thanks for the insights.
For now we will fix with ISR patch to root bridge driver locally till
we get a proper Hardware fix.
Thanks.
On Fri, Jun 8, 2018 at 6:06 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Fri, Jun 08, 2018 at 05:27:12PM +0530, Subrahmanya Lingappa wrote:
>> Bjorn,
>>
>> On Thu, Jun 7, 2018 at 7:29 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
>> > On Thu, Jun 07, 2018 at 04:20:10PM +0530, poza@codeaurora.org wrote:
>> >> On 2018-06-07 15:45, Subrahmanya Lingappa wrote:
>> >> > Bjorn,
>> >> >
>> >> > On Wed, Jun 6, 2018 at 6:20 PM, Bjorn Helgaas <helgaas@kernel.org>
>> >> > wrote:
>> >> > > Hi Subrahmanya,
>> >> > >
>> >> > > On Wed, Jun 06, 2018 at 05:57:17PM +0530, Subrahmanya Lingappa wrote:
>> >> > > > Hi,
>> >> > > > according to https://github.com/torvalds/linux/blob/master/Documentation/PCI/pci-error-recovery.txt
>> >> > > >
>> >> > > > as part of AER handling, struct pci_error_handlers{} is
>> >> > > > implemented by
>> >> > > > endpoint drivers to handle device specific recovery steps for "struct
>> >> > > > pci_driver".
>> >> > > >
>> >> > > > But we have a platform_driver which implements "struct
>> >> > > > pci_host_bridge" which also supports AER capability how can we
>> >> > > > support
>> >> > > > pci_error_handlers() for the host bridge drivers ?
>> >> > >
>> >> > > I assume you're referring to Mobiveil. Can you explain more of the
>> >> > > topology here? Can you also include "sudo lspci -vv" output?
>> >> > >
>> >> > Yes, it is for Mobiveil's Host bridge driver :
>> >> > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/pci/host/pcie-mobiveil.c
>> >> > lspci output is now is not available, I'll try to get it sooner.
>> >> >
>> >> > we have an endpoint connected directly to Rootport as follows
>> >> > RC Rootport ----->BUS------> EP
>> >> >
>> >> > > The AER capability is an optional capability of PCIe device functions.
>> >> > > A host bridge is not itself a PCIe function; it's a bridge between a
>> >> > > platform-specific host bus and the PCIe bus.
>> >> > >
>> >> > > Sometimes there is a PCI function that corresponds to the host bridge,
>> >> > > but that's not required by the PCI specs and there is no generic
>> >> > > programming model for it.
>> >> > >
>> >> > > If you have an PCIe function corresponding to the Mobiveil host
>> >> > > bridge, and it has an AER capability, what would you want the error
>> >> > > handlers to do? This function would not normally be a Root Port or
>> >> > > other type 1 PCI-to-PCI bridge device, so it's not clear how its AER
>> >> > > would integrate with the PCIe hierarchy.
>> >> > >
>> >> > Yes we do have a PCI function with AER capability, after an AER
>> >> > reported by EP, AER driver initiates an hot_reset on subordinate
>> >> > bus, which happens to be downstream port for RC. So we get a
>> >> > downstream port link down happens in this case RC driver needs to
>> >> > follow a specific register restore sequence, which is most of the
>> >> > HW specific initialization done in probe function of the driver
>> >> > to recover properly.
>> >
>> > I'm still not quite getting your point. Let's try a concrete example.
>> > Assume this topology (this is what you described above):
>> >
>> > 00:00.0 Root Port to [bus 01]
>> > 01:00.0 Endpoint
>> >
>> > Then we have this situation:
>> >
>> > - AER driver binds to Root Port 00:00.0
>> > - Endpoint 01:00.0 detects an error and sends ERR_* message upstream
>> > - Root Port 00:00.0 receives ERR_* message
>> > - Root Port 00:00.0 generates AER interrupt
>> > - AER driver handles the interrupt and does a secondary bus reset on
>> > Root Port 00:00.0. This resets the Endpoint (01:00.0) but not the
>> > Root Port itself.
>> >
>> > It sounds like you need to do an RC register restore sequence at this
>> > point, i.e., something like mobiveil_host_init()? That sounds like a
>> > hardware defect. Doing a secondary bus reset via the Root Port's
>> > bridge control register should have no effect on the RC itself.
>> >
>> > What am I missing?
>> Yes, in this situation PCI config registers are fine, but due to a bug
>> RC, config space registers will get reset
>> in the SW we need to call the mobiveil_host_init() again to recover in
>> case of directly connected EP to the Root-port.
>>
>> We do get an local interrupt when this problem occurs, so this can be
>> handled in Driver's ISR;
>> or is there a better way to handle it ?
>
> I don't see a better way to handle it.
>
> If the PCI core were doing something wrong here, e.g., missing some
> restore required by hardware that follows the spec, we would of course
> want a nice way to fix it in the core.
>
> But this sounds like hardware that does not follow the spec, so the
> workaround should be in the hardware-specific driver.
>
> Note that this may be an issue if/when you want to support ACPI with
> this hardware, because in that case we use the generic ACPI host
> bridge driver (drivers/acpi/pci_root.c) and there is less opportunity
> for device-specific workarounds like this.
>
> Bjorn
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2018-06-09 10:10 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-06-06 12:27 pci_error_handlers for pci_host_bridge ? Subrahmanya Lingappa
2018-06-06 12:50 ` Bjorn Helgaas
2018-06-07 10:15 ` Subrahmanya Lingappa
2018-06-07 10:50 ` poza
2018-06-07 13:59 ` Bjorn Helgaas
2018-06-08 11:57 ` Subrahmanya Lingappa
2018-06-08 12:36 ` Bjorn Helgaas
2018-06-09 10:10 ` Subrahmanya Lingappa
2018-06-09 10:08 ` Subrahmanya Lingappa
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.