* [PATCH] pciehp: Acknowledge the spurious "cmd completed" event.
@ 2013-11-05 22:33 Rajat Jain
2013-11-06 0:25 ` Bjorn Helgaas
0 siblings, 1 reply; 20+ messages in thread
From: Rajat Jain @ 2013-11-05 22:33 UTC (permalink / raw)
To: linux-pci, linux-hotplug, Bjorn Helgaas, Kenji Kaneshige,
Yijing Wang
Cc: Guenter Roeck, Rajat Jain, Greg KH, Tom Nguyen, Kristen Accardi,
Rajat Jain
In case of a spurious "cmd completed", pcie_write_cmd() does not
clear it, but yet expects more "cmd completed" events to be generated.
This does not happen because the previous (spurious) event has not
been acknowledged. Fix that.
Signed-off-by: Rajat Jain <rajatjain@juniper.net>
Signed-off-by: Guenter Roeck <groeck@juniper.net>
---
This is how I saw it in action: my controller does not implement any
hot-plug elements (LED, power ctrl, EMI etc) but still supports Command
completed bit.
- During initialization,
pcie_disable_notification()
-> pcie_write_cmd()
-> writes to Slot control register
-> which causes PCI_EXP_SLTSTA_CC to get set, which is not
cleared, because IRQ is not generated (we just disabled
notifications).
- After some time,
pcie_enable_notification()
-> pcie_write_cmd()
-> finds PCI_EXP_SLTSTA_CC is set, assumes it is spurious.
-> Does not clear it, yet expects more command completed
events to be generated (never happens).
drivers/pci/hotplug/pciehp_hpc.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 5b8d749..ba8e06f 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -185,6 +185,7 @@ static int pcie_write_cmd(struct controller *ctrl, u16 cmd, u16 mask)
}
if (slot_status & PCI_EXP_SLTSTA_CC) {
+ pciehp_writew(ctrl, PCI_EXP_SLTSTA, PCI_EXP_SLTSTA_CC);
if (!ctrl->no_cmd_complete) {
/*
* After 1 sec and CMD_COMPLETED still not set, just
--
1.7.9.5
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH] pciehp: Acknowledge the spurious "cmd completed" event.
2013-11-05 22:33 Rajat Jain
@ 2013-11-06 0:25 ` Bjorn Helgaas
2013-11-06 2:38 ` Rajat Jain
0 siblings, 1 reply; 20+ messages in thread
From: Bjorn Helgaas @ 2013-11-06 0:25 UTC (permalink / raw)
To: Rajat Jain
Cc: linux-pci, linux-hotplug, Kenji Kaneshige, Yijing Wang,
Guenter Roeck, Rajat Jain, Greg KH, Tom Nguyen, Kristen Accardi,
Rajat Jain
On Tue, Nov 05, 2013 at 02:33:28PM -0800, Rajat Jain wrote:
> In case of a spurious "cmd completed", pcie_write_cmd() does not
> clear it, but yet expects more "cmd completed" events to be generated.
> This does not happen because the previous (spurious) event has not
> been acknowledged. Fix that.
>
> Signed-off-by: Rajat Jain <rajatjain@juniper.net>
> Signed-off-by: Guenter Roeck <groeck@juniper.net>
> ---
> This is how I saw it in action: my controller does not implement any
> hot-plug elements (LED, power ctrl, EMI etc) but still supports Command
> completed bit.
> - During initialization,
> pcie_disable_notification()
> -> pcie_write_cmd()
> -> writes to Slot control register
> -> which causes PCI_EXP_SLTSTA_CC to get set, which is not
> cleared, because IRQ is not generated (we just disabled
> notifications).
> - After some time,
> pcie_enable_notification()
> -> pcie_write_cmd()
> -> finds PCI_EXP_SLTSTA_CC is set, assumes it is spurious.
> -> Does not clear it, yet expects more command completed
> events to be generated (never happens).
I'm not sure this "cmd completed" is actually spurious. Spec section
7.8.10 is very clear that any write to Slot Control must cause a
hot-plug command to be generated (if the port is hot-plug capable).
Can you collect "lspci -vv" output for your controller? I assume
you're hitting this case in pcie_init() (added by 5808639bfa98
("pciehp: fix slow probing")):
/*
* Controller doesn't notify of command completion if the "No
* Command Completed Support" bit is set in Slot Capability
* register or the controller supports none of power
* controller, attention led, power led and EMI.
*/
if (NO_CMD_CMPL(ctrl) ||
!(POWER_CTRL(ctrl) | ATTN_LED(ctrl) | PWR_LED(ctrl) | EMI(ctrl)))
ctrl->no_cmd_complete = 1;
and we're setting "no_cmd_complete = 1" for your controller, which
keeps us from waiting for completion in pcie_write_cmd().
I'm dubious about the assertion that a controller without power
control, attention LED, power LED, or EMI can't support command
completion. I don't see anything in the spec to that effect.
Since you're seeing PCI_EXP_SLTSTA_CC=1, your controller *should*
support Command Completion notification and PCI_EXP_SLTCAP_NCCS should
be 0 (per Table 7-20), so I wonder what happens on your system if you
change pcie_init() so it leaves "ctrl->no_cmd_complete = 0" instead?
Does it work correctly then?
I know we can't just drop the "!(POWER_CTRL(ctrl) | ...)" tests
because we don't want to reintroduce the problem fixed by
5808639bfa98, but I wonder if we can find a better fix that addresses
both problems.
Bjorn
>
> drivers/pci/hotplug/pciehp_hpc.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index 5b8d749..ba8e06f 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -185,6 +185,7 @@ static int pcie_write_cmd(struct controller *ctrl, u16 cmd, u16 mask)
> }
>
> if (slot_status & PCI_EXP_SLTSTA_CC) {
> + pciehp_writew(ctrl, PCI_EXP_SLTSTA, PCI_EXP_SLTSTA_CC);
> if (!ctrl->no_cmd_complete) {
> /*
> * After 1 sec and CMD_COMPLETED still not set, just
> --
> 1.7.9.5
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] pciehp: Acknowledge the spurious "cmd completed" event.
2013-11-06 0:25 ` Bjorn Helgaas
@ 2013-11-06 2:38 ` Rajat Jain
2013-11-07 21:53 ` Rajat Jain
0 siblings, 1 reply; 20+ messages in thread
From: Rajat Jain @ 2013-11-06 2:38 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: linux-pci, linux-hotplug, Kenji Kaneshige, Yijing Wang,
Guenter Roeck, Rajat Jain, Greg KH, Tom Nguyen, Kristen Accardi,
Rajat Jain
Hello Bjorn,
On Tue, Nov 5, 2013 at 4:25 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Tue, Nov 05, 2013 at 02:33:28PM -0800, Rajat Jain wrote:
>> In case of a spurious "cmd completed", pcie_write_cmd() does not
>> clear it, but yet expects more "cmd completed" events to be generated.
>> This does not happen because the previous (spurious) event has not
>> been acknowledged. Fix that.
>>
>> Signed-off-by: Rajat Jain <rajatjain@juniper.net>
>> Signed-off-by: Guenter Roeck <groeck@juniper.net>
>> ---
>> This is how I saw it in action: my controller does not implement any
>> hot-plug elements (LED, power ctrl, EMI etc) but still supports Command
>> completed bit.
>> - During initialization,
>> pcie_disable_notification()
>> -> pcie_write_cmd()
>> -> writes to Slot control register
>> -> which causes PCI_EXP_SLTSTA_CC to get set, which is not
>> cleared, because IRQ is not generated (we just disabled
>> notifications).
>> - After some time,
>> pcie_enable_notification()
>> -> pcie_write_cmd()
>> -> finds PCI_EXP_SLTSTA_CC is set, assumes it is spurious.
>> -> Does not clear it, yet expects more command completed
>> events to be generated (never happens).
>
> I'm not sure this "cmd completed" is actually spurious. Spec section
> 7.8.10 is very clear that any write to Slot Control must cause a
> hot-plug command to be generated (if the port is hot-plug capable).
I agree, and I think I am witnessing a "genuine" command completion
(caused by disabling of notifications).
>
> Can you collect "lspci -vv" output for your controller?
>
Sure:
PCI bridge: Integrated Device Technology, Inc. Device 807a (rev 02)
(prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Bus: primary=02, secondary=50, subordinate=5f, sec-latency=0
I/O behind bridge: 00003000-00003fff
Memory behind bridge: 8c000000-8fffffff
Prefetchable memory behind bridge: 00000000b0600000-00000000b07fffff
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [40] Express (v2) Downstream Port (Slot+), MSI 00
DevCap: MaxPayload 2048 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
ExtTag+ RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
LnkCap: Port #3, Speed 5GT/s, Width x4, ASPM L0s L1, Latency
L0 <4us, L1 <4us
ClockPM- Surprise+ LLActRep+ BwNot+
LnkCtl: ASPM Disabled; Disabled- Retrain- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk-
DLActive- BWMgmt- ABWMgmt-
SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise-
Slot #3, PowerLimit 0.000W; Interlock- NoCompl-
SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet+ CmdCplt+ HPIrq+ LinkChg+
Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt+ PresDet- Interlock-
Changed: MRL- PresDet- LinkState-
DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-,
OBFF Not Supported ARIFwd+
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-,
OBFF Disabled ARIFwd-
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-,
Selectable De-emphasis: -6dB
Transmit Margin: Normal Operating Range,
EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB,
EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-,
LinkEqualizationRequest-
Capabilities: [c0] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000ff041740 Data: 0003
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [200 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=4
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=04 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
Status: NegoPending- InProgress-
Port Arbitration Table <?>
Capabilities: [320 v1] Access Control Services
ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+
EgressCtrl+ DirectTrans+
ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd-
EgressCtrl- DirectTrans-
Capabilities: [330 v1] #12
Kernel driver in use: pcieport
In addition, here is what the pciehp driver spews out:
Hotplug Controller:
Seg/Bus/Dev/Func/IRQ : 0000:02:03.0 IRQ 21
Vendor ID : 0x111d
Device ID : 0x807a
Subsystem ID : 0x0000
Subsystem Vendor ID : 0x0000
PCIe Cap offset : 0x40
PCI resource [7] : [io 0x13000-0x13fff]
PCI resource [8] : [mem 0xc0c000000-0xc0fffffff]
PCI resource [9] : [mem 0xc30600000-0xc307fffff 64bit pref]
Slot Capabilities : 0x00180040
Physical Slot Number : 3
Attention Button : no
Power Controller : no
MRL Sensor : no
Attention Indicator : no
Power Indicator : no
Hot-Plug Surprise : no
EMI Present : no
Command Completed : yes
Slot Status : 0x0010
Slot Control : 0x0000
Link Active Reporting supported
HPC vendor_id 111d device_id 807a ss_vid 0 ss_did 0
Registering domain:bus:dev=0000:50:00 sun=3
Unexpected CMD_COMPLETED. Need to wait for command completed event.
Command not completed in 1000 msec
The last two output lines are part of the problem. Each controller
takes 1 second to initialized as the message shows.
> I assume
> you're hitting this case in pcie_init() (added by 5808639bfa98
> ("pciehp: fix slow probing")):
>
> /*
> * Controller doesn't notify of command completion if the "No
> * Command Completed Support" bit is set in Slot Capability
> * register or the controller supports none of power
> * controller, attention led, power led and EMI.
> */
> if (NO_CMD_CMPL(ctrl) ||
> !(POWER_CTRL(ctrl) | ATTN_LED(ctrl) | PWR_LED(ctrl) | EMI(ctrl)))
> ctrl->no_cmd_complete = 1;
>
> and we're setting "no_cmd_complete = 1" for your controller, which
> keeps us from waiting for completion in pcie_write_cmd().
>
> I'm dubious about the assertion that a controller without power
> control, attention LED, power LED, or EMI can't support command
> completion. I don't see anything in the spec to that effect.
I agree, specially since I am seeing this contrary behavior (to that
assertion) in action :-)
>
> Since you're seeing PCI_EXP_SLTSTA_CC=1, your controller *should*
> support Command Completion notification and PCI_EXP_SLTCAP_NCCS should
> be 0 (per Table 7-20), so I wonder what happens on your system if you
> change pcie_init() so it leaves "ctrl->no_cmd_complete = 0" instead?
> Does it work correctly then?
Yes, it work well after that (Both the error output lines go away as
well). And the 1 second delay per controller is also gone.
>
> I know we can't just drop the "!(POWER_CTRL(ctrl) | ...)" tests
> because we don't want to reintroduce the problem fixed by
> 5808639bfa98, but I wonder if we can find a better fix that addresses
> both problems.
Please let me know if any one has any suggestions? IMHO, this patch
should yet not cause the original problem to reappear because we are
clearing the bit *only* when we know that it is already set (and no
way to clear it because at least in my controller no subsequent
interrupts can be generated). But I do not have enough understanding,
and will be glad to get any pointers.
Thanks,
- Rajat
>
> Bjorn
>
>>
>> drivers/pci/hotplug/pciehp_hpc.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
>> index 5b8d749..ba8e06f 100644
>> --- a/drivers/pci/hotplug/pciehp_hpc.c
>> +++ b/drivers/pci/hotplug/pciehp_hpc.c
>> @@ -185,6 +185,7 @@ static int pcie_write_cmd(struct controller *ctrl, u16 cmd, u16 mask)
>> }
>>
>> if (slot_status & PCI_EXP_SLTSTA_CC) {
>> + pciehp_writew(ctrl, PCI_EXP_SLTSTA, PCI_EXP_SLTSTA_CC);
>> if (!ctrl->no_cmd_complete) {
>> /*
>> * After 1 sec and CMD_COMPLETED still not set, just
>> --
>> 1.7.9.5
>>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] pciehp: Acknowledge the spurious "cmd completed" event.
2013-11-06 2:38 ` Rajat Jain
@ 2013-11-07 21:53 ` Rajat Jain
2013-11-08 1:23 ` Bjorn Helgaas
0 siblings, 1 reply; 20+ messages in thread
From: Rajat Jain @ 2013-11-07 21:53 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: linux-pci, linux-hotplug, Kenji Kaneshige, Yijing Wang,
Guenter Roeck, Greg KH, Tom Nguyen, Kristen Accardi, Rajat Jain,
Rajat Jain
Hi Bjorn / list,
I think there are two independent problems that need to be addressed.
Problem #1: In any condition where (e.g. spurious interrupt) once the
execution reaches inside the if condition below, the PCI_EXP_SLTSTA_CC
bit is already set, and needs to be cleared. This does not happen
today and hence this patch was aimed at solving that. Please note that
this will not cause any problems to the systems that were fixed by
5808639bfa98, because this is equally applicable to any case.
>>>
>>> drivers/pci/hotplug/pciehp_hpc.c | 1 +
>>> 1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
>>> index 5b8d749..ba8e06f 100644
>>> --- a/drivers/pci/hotplug/pciehp_hpc.c
>>> +++ b/drivers/pci/hotplug/pciehp_hpc.c
>>> @@ -185,6 +185,7 @@ static int pcie_write_cmd(struct controller *ctrl, u16 cmd, u16 mask)
>>> }
>>>
>>> if (slot_status & PCI_EXP_SLTSTA_CC) {
>>> + pciehp_writew(ctrl, PCI_EXP_SLTSTA, PCI_EXP_SLTSTA_CC);
>>> if (!ctrl->no_cmd_complete) {
>>> /*
>>> * After 1 sec and CMD_COMPLETED still not set, just
>>> --
>>> 1.7.9.5
>>>
Problem #2: Looks like we have now 2 classes of systems:
a) The 5808639bfa98 was addressed towards systems that advertise the
"Command completion notification" capability, but do not notify of
command completion in case MRL, Power Ctrl, Attn LED and Pwr LED bits
are not written in SLOT_CTRL register (not implemented in
capabilities). Thus no command completion shall be generated for
pcie_disable_notifications() for e.g.
b) I have a system that where none of the above described HP elements
are present implemented, but the controller supports command
completion, and sends it out for every write of the slot control
register. Thus notification for command complete is generated for
pcie_disable_notifiction().
I don't believe there is a way to differentiate between these 2
classes of systems at init time, unless we try to generate a
notification and do or do not get one.
The PCIe specification section 7.8.10 seems to lean towards category
(b) of systems, but today this class of systems shall get penalized by
delay of 1 second (per controller) during probe. I can try and address
that, but admittedly I could not think of a better solution than
moving this code from pcie_init() to pcie_write_cmd().
> if (NO_CMD_CMPL(ctrl) ||
> !(POWER_CTRL(ctrl) | ATTN_LED(ctrl) | PWR_LED(ctrl) | EMI(ctrl)))
> ctrl->no_cmd_complete = 1;
Please advise.
Also, I'd appreciate if you could please let me know if there are any
comments on the patch for problem #1 above.
Thanks,
Rajat
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] pciehp: Acknowledge the spurious "cmd completed" event.
2013-11-07 21:53 ` Rajat Jain
@ 2013-11-08 1:23 ` Bjorn Helgaas
2013-11-08 17:30 ` Rajat Jain
0 siblings, 1 reply; 20+ messages in thread
From: Bjorn Helgaas @ 2013-11-08 1:23 UTC (permalink / raw)
To: Rajat Jain
Cc: linux-pci, linux-hotplug, Kenji Kaneshige, Yijing Wang,
Guenter Roeck, Greg KH, Tom Nguyen, Kristen Accardi, Rajat Jain,
Rajat Jain
On Thu, Nov 07, 2013 at 01:53:44PM -0800, Rajat Jain wrote:
> Hi Bjorn / list,
>
> I think there are two independent problems that need to be addressed.
>
> Problem #1: In any condition where (e.g. spurious interrupt) once the
> execution reaches inside the if condition below, the PCI_EXP_SLTSTA_CC
> bit is already set, and needs to be cleared. This does not happen
> today and hence this patch was aimed at solving that. Please note that
> this will not cause any problems to the systems that were fixed by
> 5808639bfa98, because this is equally applicable to any case.
Your patch might be the right thing to do, but something still niggles
at me, and I can't quite put my finger on it yet.
> >>>
> >>> drivers/pci/hotplug/pciehp_hpc.c | 1 +
> >>> 1 file changed, 1 insertion(+)
> >>>
> >>> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> >>> index 5b8d749..ba8e06f 100644
> >>> --- a/drivers/pci/hotplug/pciehp_hpc.c
> >>> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> >>> @@ -185,6 +185,7 @@ static int pcie_write_cmd(struct controller *ctrl, u16 cmd, u16 mask)
> >>> }
> >>>
> >>> if (slot_status & PCI_EXP_SLTSTA_CC) {
> >>> + pciehp_writew(ctrl, PCI_EXP_SLTSTA, PCI_EXP_SLTSTA_CC);
> >>> if (!ctrl->no_cmd_complete) {
> >>> /*
> >>> * After 1 sec and CMD_COMPLETED still not set, just
> >>> --
> >>> 1.7.9.5
> >>>
>
> Problem #2: Looks like we have now 2 classes of systems:
>
> a) The 5808639bfa98 was addressed towards systems that advertise the
> "Command completion notification" capability, but do not notify of
> command completion in case MRL, Power Ctrl, Attn LED and Pwr LED bits
> are not written in SLOT_CTRL register (not implemented in
> capabilities). Thus no command completion shall be generated for
> pcie_disable_notifications() for e.g.
This seems pretty clearly out of spec. I read the 82801H spec for the
part used in the system in bz 10751, and the documentation seems to
comply with the spec. Possibly the part doesn't work as advertised,
but I think it's more likely there's some subtlety we're still missing.
I'm a little bit surprised that we'd be using pciehp on that Thinkpad R61
instead of acpiphp. The kernel in bz 10751 is so ancient that it might
not be negotiating correctly for control of pciehp.
> b) I have a system that where none of the above described HP elements
> are present implemented, but the controller supports command
> completion, and sends it out for every write of the slot control
> register. Thus notification for command complete is generated for
> pcie_disable_notifiction().
>
> I don't believe there is a way to differentiate between these 2
> classes of systems at init time, unless we try to generate a
> notification and do or do not get one.
>
> The PCIe specification section 7.8.10 seems to lean towards category
> (b) of systems, but today this class of systems shall get penalized by
> delay of 1 second (per controller) during probe.
Where does this delay happen on your system? I tried to work through
the path but I don't see it yet. Here's what I expect to happen on your
system:
pciehp_probe
pcie_init
readl(SLTCAP)
if (NO_CMD_CMPL || !(POWER_CTRL | ATTN_LED | PWR_LED | EMI)) # true
ctrl->no_cmd_complete = 1 # set (condition above is true)
writew(SLTSTA, 0x1f) # clear ABP PFD MRLSC PDC CC
pcie_disable_notification
pcie_write_cmd(0, PDCE | ABPE | MRLSCE | PFDE | HPIE | CCIE | DLLSCE)
readw(SLTSTA) # CC == 0 (was cleared above)
writew(SLTCTL)
...
# no waiting here because no_cmd_complete == 1
...
hardware sets SLTSTA.CC
...
pcie_init_notification
pcie_enable_notification
pcie_write_cmd
readw(SLTSTA) # CC == 1 (was set by HW above)
if (SLTSTA.CC) # true
if (!NO_CMD_CMPL)) # true
dbg("Unexpected CMD_COMPLETED. ...")
ctrl->no_cmd_complete = 0
writew(SLTCTL)
...
# this time we wait because no_cmd_complete == 0
...
pcie_wait_cmd(..., poll == 1) # now no_cmd_complete == 0
pcie_poll_cmd
readw(SLTSTA) # CC == 1 on first read
if (SLTSTA.CC)
writew(SLTSTA, CC)
return 1
I would think you'd see the "Unexpected CMD_COMPLETED. Need to wait for
command completed event" message (if enabled), but I don't see where the
one second timeout would happen. It looks like we would call
pcie_poll_cmd(), but it would return immediately because SLTSTA.CC is
already set.
Bjorn
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] pciehp: Acknowledge the spurious "cmd completed" event.
2013-11-08 1:23 ` Bjorn Helgaas
@ 2013-11-08 17:30 ` Rajat Jain
2013-11-08 23:15 ` Bjorn Helgaas
0 siblings, 1 reply; 20+ messages in thread
From: Rajat Jain @ 2013-11-08 17:30 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: linux-pci, linux-hotplug, Kenji Kaneshige, Yijing Wang,
Guenter Roeck, Greg KH, Tom Nguyen, Kristen Accardi, Rajat Jain,
Rajat Jain
Hi,
On Thu, Nov 7, 2013 at 5:23 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Thu, Nov 07, 2013 at 01:53:44PM -0800, Rajat Jain wrote:
>> Hi Bjorn / list,
>>
>> I think there are two independent problems that need to be addressed.
>>
>> Problem #1: In any condition where (e.g. spurious interrupt) once the
>> execution reaches inside the if condition below, the PCI_EXP_SLTSTA_CC
>> bit is already set, and needs to be cleared. This does not happen
>> today and hence this patch was aimed at solving that. Please note that
>> this will not cause any problems to the systems that were fixed by
>> 5808639bfa98, because this is equally applicable to any case.
>
> Your patch might be the right thing to do, but something still niggles
> at me, and I can't quite put my finger on it yet.
>
>> >>>
>> >>> drivers/pci/hotplug/pciehp_hpc.c | 1 +
>> >>> 1 file changed, 1 insertion(+)
>> >>>
>> >>> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
>> >>> index 5b8d749..ba8e06f 100644
>> >>> --- a/drivers/pci/hotplug/pciehp_hpc.c
>> >>> +++ b/drivers/pci/hotplug/pciehp_hpc.c
>> >>> @@ -185,6 +185,7 @@ static int pcie_write_cmd(struct controller *ctrl, u16 cmd, u16 mask)
>> >>> }
>> >>>
>> >>> if (slot_status & PCI_EXP_SLTSTA_CC) {
>> >>> + pciehp_writew(ctrl, PCI_EXP_SLTSTA, PCI_EXP_SLTSTA_CC);
>> >>> if (!ctrl->no_cmd_complete) {
>> >>> /*
>> >>> * After 1 sec and CMD_COMPLETED still not set, just
>> >>> --
>> >>> 1.7.9.5
>> >>>
>>
>> Problem #2: Looks like we have now 2 classes of systems:
>>
>> a) The 5808639bfa98 was addressed towards systems that advertise the
>> "Command completion notification" capability, but do not notify of
>> command completion in case MRL, Power Ctrl, Attn LED and Pwr LED bits
>> are not written in SLOT_CTRL register (not implemented in
>> capabilities). Thus no command completion shall be generated for
>> pcie_disable_notifications() for e.g.
>
> This seems pretty clearly out of spec. I read the 82801H spec for the
> part used in the system in bz 10751, and the documentation seems to
> comply with the spec. Possibly the part doesn't work as advertised,
> but I think it's more likely there's some subtlety we're still missing.
>
> I'm a little bit surprised that we'd be using pciehp on that Thinkpad R61
> instead of acpiphp. The kernel in bz 10751 is so ancient that it might
> not be negotiating correctly for control of pciehp.
>
>> b) I have a system that where none of the above described HP elements
>> are present implemented, but the controller supports command
>> completion, and sends it out for every write of the slot control
>> register. Thus notification for command complete is generated for
>> pcie_disable_notifiction().
>>
>> I don't believe there is a way to differentiate between these 2
>> classes of systems at init time, unless we try to generate a
>> notification and do or do not get one.
>>
>> The PCIe specification section 7.8.10 seems to lean towards category
>> (b) of systems, but today this class of systems shall get penalized by
>> delay of 1 second (per controller) during probe.
>
> Where does this delay happen on your system? I tried to work through
> the path but I don't see it yet. Here's what I expect to happen on your
> system:
>
> pciehp_probe
> pcie_init
> readl(SLTCAP)
> if (NO_CMD_CMPL || !(POWER_CTRL | ATTN_LED | PWR_LED | EMI)) # true
> ctrl->no_cmd_complete = 1 # set (condition above is true)
> writew(SLTSTA, 0x1f) # clear ABP PFD MRLSC PDC CC
> pcie_disable_notification
> pcie_write_cmd(0, PDCE | ABPE | MRLSCE | PFDE | HPIE | CCIE | DLLSCE)
> readw(SLTSTA) # CC == 0 (was cleared above)
> writew(SLTCTL)
> ...
> # no waiting here because no_cmd_complete == 1
> ...
> hardware sets SLTSTA.CC
> ...
> pcie_init_notification
> pcie_enable_notification
> pcie_write_cmd
> readw(SLTSTA) # CC == 1 (was set by HW above)
> if (SLTSTA.CC) # true
> if (!NO_CMD_CMPL)) # true
> dbg("Unexpected CMD_COMPLETED. ...")
> ctrl->no_cmd_complete = 0
> writew(SLTCTL)
> ...
> # this time we wait because no_cmd_complete == 0
> ...
> pcie_wait_cmd(..., poll == 1) # now no_cmd_complete == 0
Actually pcie_wait_cmd() shall be called with poll=0. That is because
the following conditions are both false (we just enabled both the
interrupts while enabling notifications):
if (!(slot_ctrl & PCI_EXP_SLTCTL_HPIE) ||
!(slot_ctrl & PCI_EXP_SLTCTL_CCIE))
poll = 1;
Thus it does not poll, but waits for interrupt. Which does not happen
because the CC bit in the slot status register was not cleared. As a
result, we get the following messages on the console for each
controller (and 1 second delay):
===========================
Unexpected CMD_COMPLETED. Need to wait for command completed event.
Command not completed in 1000 msec
===========================
With my patch, we'd eliminate the 1 second delay and the second line
of output above. If we also want to get rid of the first line
"Unexpected CMD_COMPLETED..." also, that would probably have to be
solved by changing the behavior to assume that CC shall be generated
for every SLOTCTRL register write.
Please suggest.
Thanks,
Rajat
> pcie_poll_cmd
> readw(SLTSTA) # CC == 1 on first read
> if (SLTSTA.CC)
> writew(SLTSTA, CC)
> return 1
>
> I would think you'd see the "Unexpected CMD_COMPLETED. Need to wait for
> command completed event" message (if enabled),
> but I don't see where the
> one second timeout would happen. It looks like we would call
> pcie_poll_cmd(), but it would return immediately because SLTSTA.CC is
> already set.
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] pciehp: Acknowledge the spurious "cmd completed" event.
2013-11-08 17:30 ` Rajat Jain
@ 2013-11-08 23:15 ` Bjorn Helgaas
2013-11-11 21:26 ` Rajat Jain
0 siblings, 1 reply; 20+ messages in thread
From: Bjorn Helgaas @ 2013-11-08 23:15 UTC (permalink / raw)
To: Rajat Jain
Cc: linux-pci@vger.kernel.org, linux hotplug mailing, Kenji Kaneshige,
Yijing Wang, Guenter Roeck, Greg KH, Tom Nguyen, Kristen Accardi,
Rajat Jain, Rajat Jain
On Fri, Nov 8, 2013 at 10:30 AM, Rajat Jain <rajatjain.linux@gmail.com> wrote:
> On Thu, Nov 7, 2013 at 5:23 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> Where does this delay happen on your system? I tried to work through
>> the path but I don't see it yet. Here's what I expect to happen on your
>> system:
>>
>> pciehp_probe
>> pcie_init
>> readl(SLTCAP)
>> if (NO_CMD_CMPL || !(POWER_CTRL | ATTN_LED | PWR_LED | EMI)) # true
>> ctrl->no_cmd_complete = 1 # set (condition above is true)
>> writew(SLTSTA, 0x1f) # clear ABP PFD MRLSC PDC CC
>> pcie_disable_notification
>> pcie_write_cmd(0, PDCE | ABPE | MRLSCE | PFDE | HPIE | CCIE | DLLSCE)
>> readw(SLTSTA) # CC == 0 (was cleared above)
>> writew(SLTCTL)
>> ...
>> # no waiting here because no_cmd_complete == 1
>> ...
>> hardware sets SLTSTA.CC
>> ...
>> pcie_init_notification
>> pcie_enable_notification
>> pcie_write_cmd
>> readw(SLTSTA) # CC == 1 (was set by HW above)
>> if (SLTSTA.CC) # true
>> if (!NO_CMD_CMPL)) # true
>> dbg("Unexpected CMD_COMPLETED. ...")
>> ctrl->no_cmd_complete = 0
>> writew(SLTCTL)
>> ...
>> # this time we wait because no_cmd_complete == 0
>> ...
>> pcie_wait_cmd(..., poll == 1) # now no_cmd_complete == 0
>
> Actually pcie_wait_cmd() shall be called with poll=0. That is because
> the following conditions are both false (we just enabled both the
> interrupts while enabling notifications):
>
> if (!(slot_ctrl & PCI_EXP_SLTCTL_HPIE) ||
> !(slot_ctrl & PCI_EXP_SLTCTL_CCIE))
> poll = 1;
>
> Thus it does not poll, but waits for interrupt. Which does not happen
> because the CC bit in the slot status register was not cleared.
Right, of course. Thanks for pointing out my error.
> As a
> result, we get the following messages on the console for each
> controller (and 1 second delay):
> ===========================
> Unexpected CMD_COMPLETED. Need to wait for command completed event.
> Command not completed in 1000 msec
> ===========================
> With my patch, we'd eliminate the 1 second delay and the second line
> of output above. If we also want to get rid of the first line
> "Unexpected CMD_COMPLETED..." also, that would probably have to be
> solved by changing the behavior to assume that CC shall be generated
> for every SLOTCTRL register write.
I *do* want to get rid of the "Unexpected CMD_COMPLETED" complaint.
We shouldn't complain about hardware that is working perfectly fine.
I don't know the best way to do that yet. I have found a box with the
same hardware that was fixed by 5808639bfa98, so I hope to play with
it and figure out something that will work nicely for both scenarios.
BTW, can you open a report at bugzilla.kernel.org and attach your
"lspci -vvxxx" and dmesg output? When we eventually merge a fix, I'd
like to have the details archived somewhere.
Bjorn
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] pciehp: Acknowledge the spurious "cmd completed" event.
2013-11-08 23:15 ` Bjorn Helgaas
@ 2013-11-11 21:26 ` Rajat Jain
2013-11-23 0:51 ` Bjorn Helgaas
0 siblings, 1 reply; 20+ messages in thread
From: Rajat Jain @ 2013-11-11 21:26 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: linux-pci@vger.kernel.org, linux hotplug mailing, Kenji Kaneshige,
Yijing Wang, Greg KH, Tom Nguyen, Kristen Accardi, Rajat Jain,
Rajat Jain, Guenter Roeck
Hello,
>> With my patch, we'd eliminate the 1 second delay and the second line
>> of output above. If we also want to get rid of the first line
>> "Unexpected CMD_COMPLETED..." also, that would probably have to be
>> solved by changing the behavior to assume that CC shall be generated
>> for every SLOTCTRL register write.
>
> I *do* want to get rid of the "Unexpected CMD_COMPLETED" complaint.
> We shouldn't complain about hardware that is working perfectly fine.
> I don't know the best way to do that yet. I have found a box with the
> same hardware that was fixed by 5808639bfa98, so I hope to play with
> it and figure out something that will work nicely for both scenarios.
Please keep posted :-)
>
> BTW, can you open a report at bugzilla.kernel.org and attach your
> "lspci -vvxxx" and dmesg output? When we eventually merge a fix, I'd
> like to have the details archived somewhere.
Done:
https://bugzilla.kernel.org/show_bug.cgi?id=64821
Please let me know if you need any help in trying something out - I'd
be more than keen to help - write code or test.
BTW, I figured another way that solves the current problem at hand
(for both kind of controllers). Today the events are cleared just
before call to pcie_disable_notification, if we clear them afterwards
also, the current problem is taken care of for all controllers. Of
course this does not look the right way to solve the problem though
(since the ctrl->no_cmd_completed shall still be 1, which is not
right).
@@ -948,6 +948,10 @@ struct controller *pcie_init(struct pcie_device *dev)
/* Clear all remaining event bits in Slot Status register */
if (pciehp_writew(ctrl, PCI_EXP_SLTSTA, 0x1f))
goto abort_ctrl;
/* Disable sotfware notification */
pcie_disable_notification(ctrl);
+ /* Clear all remaining event bits in Slot Status register */
+ if (pciehp_writew(ctrl, PCI_EXP_SLTSTA, 0x1f))
+ goto abort_ctrl;
+
ctrl_info(ctrl, "HPC vendor_id %x device_id %x ss_vid %x ss_did %x\n",
pdev->vendor, pdev->device, pdev->subsystem_vendor,
pdev->subsystem_device);
Thanks,
Rajat
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] pciehp: Acknowledge the spurious "cmd completed" event.
2013-11-11 21:26 ` Rajat Jain
@ 2013-11-23 0:51 ` Bjorn Helgaas
2013-11-23 1:59 ` Guenter Roeck
2013-11-23 14:56 ` Rajat Jain
0 siblings, 2 replies; 20+ messages in thread
From: Bjorn Helgaas @ 2013-11-23 0:51 UTC (permalink / raw)
To: Rajat Jain
Cc: linux-pci@vger.kernel.org, linux hotplug mailing, Kenji Kaneshige,
Yijing Wang, Greg KH, Tom Nguyen, Kristen Accardi, Rajat Jain,
Rajat Jain, Guenter Roeck
On Mon, Nov 11, 2013 at 01:26:06PM -0800, Rajat Jain wrote:
> Hello,
>
> >> With my patch, we'd eliminate the 1 second delay and the second line
> >> of output above. If we also want to get rid of the first line
> >> "Unexpected CMD_COMPLETED..." also, that would probably have to be
> >> solved by changing the behavior to assume that CC shall be generated
> >> for every SLOTCTRL register write.
> >
> > I *do* want to get rid of the "Unexpected CMD_COMPLETED" complaint.
> > We shouldn't complain about hardware that is working perfectly fine.
> > I don't know the best way to do that yet. I have found a box with the
> > same hardware that was fixed by 5808639bfa98, so I hope to play with
> > it and figure out something that will work nicely for both scenarios.
>
> Please keep posted :-)
>
> >
> > BTW, can you open a report at bugzilla.kernel.org and attach your
> > "lspci -vvxxx" and dmesg output? When we eventually merge a fix, I'd
> > like to have the details archived somewhere.
>
> Done:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=64821
>
> Please let me know if you need any help in trying something out - I'd
> be more than keen to help - write code or test.
What do you think of the patch below? I'm afraid we'll trip over
a few other old parts similar to the 82801H, but I'd rather do that
than penalize the parts that work correctly.
PCI: pciehp: Support command completion even with no hotplug hardware
From: Bjorn Helgaas <bhelgaas@google.com>
Commit 5808639bfa98 ("pciehp: fix slow probing") fixed a slow probing
problem on hardware that doesn't conform to the spec, but caused a
similar problem for hardware that *does* conform to the spec.
Per PCIe 3.0, sec 7.8.10 and 6.7.3.2, any write to Slot Control generates a
hot-plug command. Ports that can accept new commands with no delay can set
the "No Command Completed Support" bit. Otherwise the port must indicate
its completion of the command and readiness to accept a new command with a
"command completed event."
5808639bfa98 assumes ports that lack a power controller, power indicator,
attention indicator, and interlock will not generate completion events,
even if they neglect to set "No Command Completed Support." But on ports
that lack those elements and *do* support command completion notification,
it causes:
Unexpected CMD_COMPLETED. Need to wait for command completed event.
Command not completed in 1000 msec
and forces us to wait for a 1 second timeout.
This patch makes the 5808639bfa98 workaround into a quirk that's applied
only to devices known to be broken, currently just Intel 82801H ports.
There are probably other similarly-broken devices that may now probe
slowly, but I don't know how to catch them all without penalizing the
ones that play by the rules.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=64821
Reported-by: Rajat Jain <rajatjain@juniper.net>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
drivers/pci/hotplug/pciehp_hpc.c | 39 +++++++++++++++++++++++++++++---------
1 file changed, 30 insertions(+), 9 deletions(-)
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 3eea3fdd4b0b..2fd2bd59e07f 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -881,6 +881,34 @@ static inline void dbg_ctrl(struct controller *ctrl)
ctrl_info(ctrl, "Slot Control : 0x%04x\n", reg16);
}
+static int pciehp_no_command_complete(struct controller *ctrl)
+{
+ struct pcie_device *dev = ctrl->pcie;
+ struct pci_dev *pdev = dev->port;
+ u16 vendor, device;
+
+ if (NO_CMD_CMPL(ctrl))
+ return 1;
+
+ /*
+ * Controller should notify on command completion unless the "No
+ * Command Completed Support" bit is set. But some hardware does
+ * not. See https://bugzilla.kernel.org/show_bug.cgi?id=10751
+ */
+ if (!(POWER_CTRL(ctrl) | ATTN_LED(ctrl) | PWR_LED(ctrl) | EMI(ctrl))) {
+ vendor = pdev->vendor;
+ device = pdev->device;
+ if (vendor == PCI_VENDOR_ID_INTEL &&
+ device >= 0x283f && device <= 0x2849) {
+ dev_info(&dev->device, "device [%04x:%04x] does not notify on hotplug command completion\n",
+ vendor, device);
+ return 1;
+ }
+ }
+
+ return 0;
+}
+
struct controller *pcie_init(struct pcie_device *dev)
{
struct controller *ctrl;
@@ -902,15 +930,8 @@ struct controller *pcie_init(struct pcie_device *dev)
mutex_init(&ctrl->ctrl_lock);
init_waitqueue_head(&ctrl->queue);
dbg_ctrl(ctrl);
- /*
- * Controller doesn't notify of command completion if the "No
- * Command Completed Support" bit is set in Slot Capability
- * register or the controller supports none of power
- * controller, attention led, power led and EMI.
- */
- if (NO_CMD_CMPL(ctrl) ||
- !(POWER_CTRL(ctrl) | ATTN_LED(ctrl) | PWR_LED(ctrl) | EMI(ctrl)))
- ctrl->no_cmd_complete = 1;
+
+ ctrl->no_cmd_complete = pciehp_no_command_complete(ctrl);
/* Check if Data Link Layer Link Active Reporting is implemented */
if (pciehp_readl(ctrl, PCI_EXP_LNKCAP, &link_cap)) {
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH] pciehp: Acknowledge the spurious "cmd completed" event.
2013-11-23 0:51 ` Bjorn Helgaas
@ 2013-11-23 1:59 ` Guenter Roeck
2013-11-23 14:56 ` Rajat Jain
1 sibling, 0 replies; 20+ messages in thread
From: Guenter Roeck @ 2013-11-23 1:59 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Rajat Jain, linux-pci@vger.kernel.org, linux hotplug mailing,
Kenji Kaneshige, Yijing Wang, Greg KH, Tom Nguyen,
Kristen Accardi, Rajat Jain, Rajat Jain, Guenter Roeck
On Fri, Nov 22, 2013 at 05:51:02PM -0700, Bjorn Helgaas wrote:
> On Mon, Nov 11, 2013 at 01:26:06PM -0800, Rajat Jain wrote:
> > Hello,
> >
> > >> With my patch, we'd eliminate the 1 second delay and the second line
> > >> of output above. If we also want to get rid of the first line
> > >> "Unexpected CMD_COMPLETED..." also, that would probably have to be
> > >> solved by changing the behavior to assume that CC shall be generated
> > >> for every SLOTCTRL register write.
> > >
> > > I *do* want to get rid of the "Unexpected CMD_COMPLETED" complaint.
> > > We shouldn't complain about hardware that is working perfectly fine.
> > > I don't know the best way to do that yet. I have found a box with the
> > > same hardware that was fixed by 5808639bfa98, so I hope to play with
> > > it and figure out something that will work nicely for both scenarios.
> >
> > Please keep posted :-)
> >
> > >
> > > BTW, can you open a report at bugzilla.kernel.org and attach your
> > > "lspci -vvxxx" and dmesg output? When we eventually merge a fix, I'd
> > > like to have the details archived somewhere.
> >
> > Done:
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=64821
> >
> > Please let me know if you need any help in trying something out - I'd
> > be more than keen to help - write code or test.
>
> What do you think of the patch below? I'm afraid we'll trip over
> a few other old parts similar to the 82801H, but I'd rather do that
> than penalize the parts that work correctly.
>
Works nicely as far as I can see. Tested on P2020 and P5040 based systems
with IDT 89HPES48H12G2.
Tested-by: Guenter Roeck <groeck@juniper.net>
Guenter
>
> PCI: pciehp: Support command completion even with no hotplug hardware
>
> From: Bjorn Helgaas <bhelgaas@google.com>
>
> Commit 5808639bfa98 ("pciehp: fix slow probing") fixed a slow probing
> problem on hardware that doesn't conform to the spec, but caused a
> similar problem for hardware that *does* conform to the spec.
>
> Per PCIe 3.0, sec 7.8.10 and 6.7.3.2, any write to Slot Control generates a
> hot-plug command. Ports that can accept new commands with no delay can set
> the "No Command Completed Support" bit. Otherwise the port must indicate
> its completion of the command and readiness to accept a new command with a
> "command completed event."
>
> 5808639bfa98 assumes ports that lack a power controller, power indicator,
> attention indicator, and interlock will not generate completion events,
> even if they neglect to set "No Command Completed Support." But on ports
> that lack those elements and *do* support command completion notification,
> it causes:
>
> Unexpected CMD_COMPLETED. Need to wait for command completed event.
> Command not completed in 1000 msec
>
> and forces us to wait for a 1 second timeout.
>
> This patch makes the 5808639bfa98 workaround into a quirk that's applied
> only to devices known to be broken, currently just Intel 82801H ports.
> There are probably other similarly-broken devices that may now probe
> slowly, but I don't know how to catch them all without penalizing the
> ones that play by the rules.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=64821
> Reported-by: Rajat Jain <rajatjain@juniper.net>
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
> drivers/pci/hotplug/pciehp_hpc.c | 39 +++++++++++++++++++++++++++++---------
> 1 file changed, 30 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index 3eea3fdd4b0b..2fd2bd59e07f 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -881,6 +881,34 @@ static inline void dbg_ctrl(struct controller *ctrl)
> ctrl_info(ctrl, "Slot Control : 0x%04x\n", reg16);
> }
>
> +static int pciehp_no_command_complete(struct controller *ctrl)
> +{
> + struct pcie_device *dev = ctrl->pcie;
> + struct pci_dev *pdev = dev->port;
> + u16 vendor, device;
> +
> + if (NO_CMD_CMPL(ctrl))
> + return 1;
> +
> + /*
> + * Controller should notify on command completion unless the "No
> + * Command Completed Support" bit is set. But some hardware does
> + * not. See https://bugzilla.kernel.org/show_bug.cgi?id=10751
> + */
> + if (!(POWER_CTRL(ctrl) | ATTN_LED(ctrl) | PWR_LED(ctrl) | EMI(ctrl))) {
> + vendor = pdev->vendor;
> + device = pdev->device;
> + if (vendor == PCI_VENDOR_ID_INTEL &&
> + device >= 0x283f && device <= 0x2849) {
> + dev_info(&dev->device, "device [%04x:%04x] does not notify on hotplug command completion\n",
> + vendor, device);
> + return 1;
> + }
> + }
> +
> + return 0;
> +}
> +
> struct controller *pcie_init(struct pcie_device *dev)
> {
> struct controller *ctrl;
> @@ -902,15 +930,8 @@ struct controller *pcie_init(struct pcie_device *dev)
> mutex_init(&ctrl->ctrl_lock);
> init_waitqueue_head(&ctrl->queue);
> dbg_ctrl(ctrl);
> - /*
> - * Controller doesn't notify of command completion if the "No
> - * Command Completed Support" bit is set in Slot Capability
> - * register or the controller supports none of power
> - * controller, attention led, power led and EMI.
> - */
> - if (NO_CMD_CMPL(ctrl) ||
> - !(POWER_CTRL(ctrl) | ATTN_LED(ctrl) | PWR_LED(ctrl) | EMI(ctrl)))
> - ctrl->no_cmd_complete = 1;
> +
> + ctrl->no_cmd_complete = pciehp_no_command_complete(ctrl);
>
> /* Check if Data Link Layer Link Active Reporting is implemented */
> if (pciehp_readl(ctrl, PCI_EXP_LNKCAP, &link_cap)) {
> --
> To unsubscribe from this list: send the line "unsubscribe linux-hotplug" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: [PATCH] pciehp: Acknowledge the spurious "cmd completed" event.
2013-11-23 0:51 ` Bjorn Helgaas
2013-11-23 1:59 ` Guenter Roeck
@ 2013-11-23 14:56 ` Rajat Jain
2013-11-23 19:32 ` Bjorn Helgaas
1 sibling, 1 reply; 20+ messages in thread
From: Rajat Jain @ 2013-11-23 14:56 UTC (permalink / raw)
To: Bjorn Helgaas, Rajat Jain
Cc: linux-pci@vger.kernel.org, linux hotplug mailing, Kenji Kaneshige,
Yijing Wang, Greg KH, Tom Nguyen, Kristen Accardi, Rajat Jain,
Guenter Roeck
Hello Bjorn,
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=64821
> >
> > Please let me know if you need any help in trying something out - I'd
> > be more than keen to help - write code or test.
>
> What do you think of the patch below? I'm afraid we'll trip over a few
> other old parts similar to the 82801H, but I'd rather do that than
> penalize the parts that work correctly.
Tested and works fine. Looks good (on a side note, we are introducing device specific check in a common code, but I do not think there is any other cleaner way). I think this patch should be applied as it solves the problem of slow probing.
On a different note, I feel there is still a need to apply my original patch. There is still an open problem in case of spurious interrupts (or in any case where the condition "if (slot_status & PCI_EXP_SLTSTA_CC)" becomes true in pcie_write_cmd()). That is because once that happens, we never clear that interrupt, and no further hotplug interrupts shall be received unless we do that.
Thanks,
Rajat
>
>
> PCI: pciehp: Support command completion even with no hotplug hardware
>
> From: Bjorn Helgaas <bhelgaas@google.com>
>
> Commit 5808639bfa98 ("pciehp: fix slow probing") fixed a slow probing
> problem on hardware that doesn't conform to the spec, but caused a
> similar problem for hardware that *does* conform to the spec.
>
> Per PCIe 3.0, sec 7.8.10 and 6.7.3.2, any write to Slot Control
> generates a hot-plug command. Ports that can accept new commands with
> no delay can set the "No Command Completed Support" bit. Otherwise the
> port must indicate its completion of the command and readiness to accept
> a new command with a "command completed event."
>
> 5808639bfa98 assumes ports that lack a power controller, power
> indicator, attention indicator, and interlock will not generate
> completion events, even if they neglect to set "No Command Completed
> Support." But on ports that lack those elements and *do* support command
> completion notification, it causes:
>
> Unexpected CMD_COMPLETED. Need to wait for command completed event.
> Command not completed in 1000 msec
>
> and forces us to wait for a 1 second timeout.
>
> This patch makes the 5808639bfa98 workaround into a quirk that's applied
> only to devices known to be broken, currently just Intel 82801H ports.
> There are probably other similarly-broken devices that may now probe
> slowly, but I don't know how to catch them all without penalizing the
> ones that play by the rules.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=64821
> Reported-by: Rajat Jain <rajatjain@juniper.net>
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
> drivers/pci/hotplug/pciehp_hpc.c | 39 +++++++++++++++++++++++++++++--
> -------
> 1 file changed, 30 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c
> b/drivers/pci/hotplug/pciehp_hpc.c
> index 3eea3fdd4b0b..2fd2bd59e07f 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -881,6 +881,34 @@ static inline void dbg_ctrl(struct controller
> *ctrl)
> ctrl_info(ctrl, "Slot Control : 0x%04x\n", reg16);
> }
>
> +static int pciehp_no_command_complete(struct controller *ctrl) {
> + struct pcie_device *dev = ctrl->pcie;
> + struct pci_dev *pdev = dev->port;
> + u16 vendor, device;
> +
> + if (NO_CMD_CMPL(ctrl))
> + return 1;
> +
> + /*
> + * Controller should notify on command completion unless the "No
> + * Command Completed Support" bit is set. But some hardware does
> + * not. See https://bugzilla.kernel.org/show_bug.cgi?id=10751
> + */
> + if (!(POWER_CTRL(ctrl) | ATTN_LED(ctrl) | PWR_LED(ctrl) |
> EMI(ctrl))) {
> + vendor = pdev->vendor;
> + device = pdev->device;
> + if (vendor == PCI_VENDOR_ID_INTEL &&
> + device >= 0x283f && device <= 0x2849) {
> + dev_info(&dev->device, "device [%04x:%04x] does not
> notify on hotplug command completion\n",
> + vendor, device);
> + return 1;
> + }
> + }
> +
> + return 0;
> +}
> +
> struct controller *pcie_init(struct pcie_device *dev) {
> struct controller *ctrl;
> @@ -902,15 +930,8 @@ struct controller *pcie_init(struct pcie_device
> *dev)
> mutex_init(&ctrl->ctrl_lock);
> init_waitqueue_head(&ctrl->queue);
> dbg_ctrl(ctrl);
> - /*
> - * Controller doesn't notify of command completion if the "No
> - * Command Completed Support" bit is set in Slot Capability
> - * register or the controller supports none of power
> - * controller, attention led, power led and EMI.
> - */
> - if (NO_CMD_CMPL(ctrl) ||
> - !(POWER_CTRL(ctrl) | ATTN_LED(ctrl) | PWR_LED(ctrl) |
> EMI(ctrl)))
> - ctrl->no_cmd_complete = 1;
> +
> + ctrl->no_cmd_complete = pciehp_no_command_complete(ctrl);
>
> /* Check if Data Link Layer Link Active Reporting is
> implemented */
> if (pciehp_readl(ctrl, PCI_EXP_LNKCAP, &link_cap)) {
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] pciehp: Acknowledge the spurious "cmd completed" event.
2013-11-23 14:56 ` Rajat Jain
@ 2013-11-23 19:32 ` Bjorn Helgaas
2013-11-25 19:03 ` Rajat Jain
0 siblings, 1 reply; 20+ messages in thread
From: Bjorn Helgaas @ 2013-11-23 19:32 UTC (permalink / raw)
To: Rajat Jain
Cc: Rajat Jain, linux-pci@vger.kernel.org, linux hotplug mailing,
Kenji Kaneshige, Yijing Wang, Greg KH, Tom Nguyen,
Kristen Accardi, Rajat Jain, Guenter Roeck
On Sat, Nov 23, 2013 at 7:56 AM, Rajat Jain <rajatjain@juniper.net> wrote:
> Hello Bjorn,
>
>> >
>> > https://bugzilla.kernel.org/show_bug.cgi?id=64821
>> >
>> > Please let me know if you need any help in trying something out - I'd
>> > be more than keen to help - write code or test.
>>
>> What do you think of the patch below? I'm afraid we'll trip over a few
>> other old parts similar to the 82801H, but I'd rather do that than
>> penalize the parts that work correctly.
>
> Tested and works fine. Looks good (on a side note, we are introducing device specific check in a common code, but I do not think there is any other cleaner way). I think this patch should be applied as it solves the problem of slow probing.
Thanks for testing. I don't like the device-specific code either, but
it does seem like a device-specific defect, and I didn't have any
better ideas.
> On a different note, I feel there is still a need to apply my original patch. There is still an open problem in case of spurious interrupts (or in any case where the condition "if (slot_status & PCI_EXP_SLTSTA_CC)" becomes true in pcie_write_cmd()). That is because once that happens, we never clear that interrupt, and no further hotplug interrupts shall be received unless we do that.
I agree this is an issue and we should address it somehow. My
hesitation is just that I'd prefer to do some more aggressive
restructuring rather than apply a point fix. For example:
- We currently look at PCI_EXP_SLTSTA_CC in pcie_isr(),
pcie_poll_cmd(), and pcie_write_cmd(). I think it would be better to
look at it only in pcie_isr().
- I don't think pcie_poll_cmd() should exist at all; we should poll by
calling pcie_isr() instead.
- We need pcie_write_cmd(), but I think the way it waits is backwards.
Currently we issue the command, then wait for it to complete. I
think we should issue the command, note the current time, and return
without waiting. The *next* time we need to issue a command, we can
wait for completion of the previous one (or timeout) if necessary.
But maybe we need the point fix in the interim, especially if anybody
can actually produce the scenario you mention.
Bjorn
^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: [PATCH] pciehp: Acknowledge the spurious "cmd completed" event.
2013-11-23 19:32 ` Bjorn Helgaas
@ 2013-11-25 19:03 ` Rajat Jain
2014-02-12 0:34 ` Bjorn Helgaas
0 siblings, 1 reply; 20+ messages in thread
From: Rajat Jain @ 2013-11-25 19:03 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Rajat Jain, linux-pci@vger.kernel.org, linux hotplug mailing,
Kenji Kaneshige, Yijing Wang, Greg KH, Tom Nguyen,
Kristen Accardi, Rajat Jain, Guenter Roeck
Hello,
> > On a different note, I feel there is still a need to apply my original
> patch. There is still an open problem in case of spurious interrupts (or
> in any case where the condition "if (slot_status & PCI_EXP_SLTSTA_CC)"
> becomes true in pcie_write_cmd()). That is because once that happens, we
> never clear that interrupt, and no further hotplug interrupts shall be
> received unless we do that.
>
> I agree this is an issue and we should address it somehow. My
> hesitation is just that I'd prefer to do some more aggressive
> restructuring rather than apply a point fix. For example:
OK, I'll attempt to fix it that way when I get time.
>
> - We currently look at PCI_EXP_SLTSTA_CC in pcie_isr(), pcie_poll_cmd(),
> and pcie_write_cmd(). I think it would be better to look at it only in
> pcie_isr().
>
> - I don't think pcie_poll_cmd() should exist at all; we should poll by
> calling pcie_isr() instead.
>
> - We need pcie_write_cmd(), but I think the way it waits is backwards.
> Currently we issue the command, then wait for it to complete. I think
> we should issue the command, note the current time, and return without
> waiting. The *next* time we need to issue a command, we can wait for
> completion of the previous one (or timeout) if necessary.
>
> But maybe we need the point fix in the interim, especially if anybody
> can actually produce the scenario you mention.
Ok.
Thanks,
Rajat
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] pciehp: Acknowledge the spurious "cmd completed" event.
2013-11-25 19:03 ` Rajat Jain
@ 2014-02-12 0:34 ` Bjorn Helgaas
2014-02-12 1:08 ` Rajat Jain
0 siblings, 1 reply; 20+ messages in thread
From: Bjorn Helgaas @ 2014-02-12 0:34 UTC (permalink / raw)
To: Rajat Jain
Cc: Rajat Jain, linux-pci@vger.kernel.org, linux hotplug mailing,
Kenji Kaneshige, Yijing Wang, Greg KH, Tom Nguyen,
Kristen Accardi, Rajat Jain, Guenter Roeck
On Mon, Nov 25, 2013 at 07:03:11PM +0000, Rajat Jain wrote:
> Hello,
>
> > > On a different note, I feel there is still a need to apply my original
> > patch. There is still an open problem in case of spurious interrupts (or
> > in any case where the condition "if (slot_status & PCI_EXP_SLTSTA_CC)"
> > becomes true in pcie_write_cmd()). That is because once that happens, we
> > never clear that interrupt, and no further hotplug interrupts shall be
> > received unless we do that.
> >
> > I agree this is an issue and we should address it somehow. My
> > hesitation is just that I'd prefer to do some more aggressive
> > restructuring rather than apply a point fix. For example:
>
> OK, I'll attempt to fix it that way when I get time.
>
> >
> > - We currently look at PCI_EXP_SLTSTA_CC in pcie_isr(), pcie_poll_cmd(),
> > and pcie_write_cmd(). I think it would be better to look at it only in
> > pcie_isr().
> >
> > - I don't think pcie_poll_cmd() should exist at all; we should poll by
> > calling pcie_isr() instead.
> >
> > - We need pcie_write_cmd(), but I think the way it waits is backwards.
> > Currently we issue the command, then wait for it to complete. I think
> > we should issue the command, note the current time, and return without
> > waiting. The *next* time we need to issue a command, we can wait for
> > completion of the previous one (or timeout) if necessary.
> >
> > But maybe we need the point fix in the interim, especially if anybody
> > can actually produce the scenario you mention.
>
> Ok.
This patch is still in patchwork, but I've lost track of where we are.
Did you resolve this in the series that I just applied, or is it still
an outstanding issue?
Bjorn
^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: [PATCH] pciehp: Acknowledge the spurious "cmd completed" event.
2014-02-12 0:34 ` Bjorn Helgaas
@ 2014-02-12 1:08 ` Rajat Jain
2014-02-20 7:42 ` Rajat Jain
0 siblings, 1 reply; 20+ messages in thread
From: Rajat Jain @ 2014-02-12 1:08 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Rajat Jain, linux-pci@vger.kernel.org, linux hotplug mailing,
Kenji Kaneshige, Yijing Wang, Greg KH, Tom Nguyen,
Kristen Accardi, Rajat Jain, Guenter Roeck
SGVsbG8gQmpvcm4sDQoNCj4gLS0tLS1PcmlnaW5hbCBNZXNzYWdlLS0tLS0NCj4gRnJvbTogbGlu
dXgtaG90cGx1Zy1vd25lckB2Z2VyLmtlcm5lbC5vcmcgW21haWx0bzpsaW51eC1ob3RwbHVnLQ0K
PiBvd25lckB2Z2VyLmtlcm5lbC5vcmddIE9uIEJlaGFsZiBPZiBCam9ybiBIZWxnYWFzDQo+IFNl
bnQ6IFR1ZXNkYXksIEZlYnJ1YXJ5IDExLCAyMDE0IDQ6MzUgUE0NCj4gVG86IFJhamF0IEphaW4N
Cj4gQ2M6IFJhamF0IEphaW47IGxpbnV4LXBjaUB2Z2VyLmtlcm5lbC5vcmc7IGxpbnV4IGhvdHBs
dWcgbWFpbGluZzsgS2VuamkNCj4gS2FuZXNoaWdlOyBZaWppbmcgV2FuZzsgR3JlZyBLSDsgVG9t
IE5ndXllbjsgS3Jpc3RlbiBBY2NhcmRpOyBSYWphdA0KPiBKYWluOyBHdWVudGVyIFJvZWNrDQo+
IFN1YmplY3Q6IFJlOiBbUEFUQ0hdIHBjaWVocDogQWNrbm93bGVkZ2UgdGhlIHNwdXJpb3VzICJj
bWQgY29tcGxldGVkIg0KPiBldmVudC4NCj4gDQo+IE9uIE1vbiwgTm92IDI1LCAyMDEzIGF0IDA3
OjAzOjExUE0gKzAwMDAsIFJhamF0IEphaW4gd3JvdGU6DQo+ID4gSGVsbG8sDQo+ID4NCj4gPiA+
ID4gT24gYSBkaWZmZXJlbnQgbm90ZSwgSSBmZWVsIHRoZXJlIGlzIHN0aWxsIGEgbmVlZCB0byBh
cHBseSBteQ0KPiA+ID4gPiBvcmlnaW5hbA0KPiA+ID4gcGF0Y2guIFRoZXJlIGlzIHN0aWxsIGFu
IG9wZW4gcHJvYmxlbSBpbiBjYXNlIG9mIHNwdXJpb3VzIGludGVycnVwdHMNCj4gPiA+IChvciBp
biBhbnkgY2FzZSB3aGVyZSB0aGUgY29uZGl0aW9uICJpZiAoc2xvdF9zdGF0dXMgJg0KPiBQQ0lf
RVhQX1NMVFNUQV9DQykiDQo+ID4gPiBiZWNvbWVzIHRydWUgaW4gcGNpZV93cml0ZV9jbWQoKSku
IFRoYXQgaXMgYmVjYXVzZSBvbmNlIHRoYXQNCj4gPiA+IGhhcHBlbnMsIHdlIG5ldmVyIGNsZWFy
IHRoYXQgaW50ZXJydXB0LCBhbmQgbm8gZnVydGhlciBob3RwbHVnDQo+ID4gPiBpbnRlcnJ1cHRz
IHNoYWxsIGJlIHJlY2VpdmVkIHVubGVzcyB3ZSBkbyB0aGF0Lg0KPiA+ID4NCj4gPiA+IEkgYWdy
ZWUgdGhpcyBpcyBhbiBpc3N1ZSBhbmQgd2Ugc2hvdWxkIGFkZHJlc3MgaXQgc29tZWhvdy4gIE15
DQo+ID4gPiBoZXNpdGF0aW9uIGlzIGp1c3QgdGhhdCBJJ2QgcHJlZmVyIHRvIGRvIHNvbWUgbW9y
ZSBhZ2dyZXNzaXZlDQo+ID4gPiByZXN0cnVjdHVyaW5nIHJhdGhlciB0aGFuIGFwcGx5IGEgcG9p
bnQgZml4LiAgRm9yIGV4YW1wbGU6DQo+ID4NCj4gPiBPSywgSSdsbCBhdHRlbXB0IHRvIGZpeCBp
dCB0aGF0IHdheSB3aGVuIEkgZ2V0IHRpbWUuDQo+ID4NCj4gPiA+DQo+ID4gPiAtIFdlIGN1cnJl
bnRseSBsb29rIGF0IFBDSV9FWFBfU0xUU1RBX0NDIGluIHBjaWVfaXNyKCksDQo+ID4gPiBwY2ll
X3BvbGxfY21kKCksIGFuZCBwY2llX3dyaXRlX2NtZCgpLiAgSSB0aGluayBpdCB3b3VsZCBiZSBi
ZXR0ZXINCj4gPiA+IHRvIGxvb2sgYXQgaXQgb25seSBpbiBwY2llX2lzcigpLg0KPiA+ID4NCj4g
PiA+IC0gSSBkb24ndCB0aGluayBwY2llX3BvbGxfY21kKCkgc2hvdWxkIGV4aXN0IGF0IGFsbDsg
d2Ugc2hvdWxkIHBvbGwNCj4gPiA+IGJ5IGNhbGxpbmcgcGNpZV9pc3IoKSBpbnN0ZWFkLg0KPiA+
ID4NCj4gPiA+IC0gV2UgbmVlZCBwY2llX3dyaXRlX2NtZCgpLCBidXQgSSB0aGluayB0aGUgd2F5
IGl0IHdhaXRzIGlzDQo+IGJhY2t3YXJkcy4NCj4gPiA+ICBDdXJyZW50bHkgd2UgaXNzdWUgdGhl
IGNvbW1hbmQsIHRoZW4gd2FpdCBmb3IgaXQgdG8gY29tcGxldGUuICBJDQo+ID4gPiB0aGluayB3
ZSBzaG91bGQgaXNzdWUgdGhlIGNvbW1hbmQsIG5vdGUgdGhlIGN1cnJlbnQgdGltZSwgYW5kIHJl
dHVybg0KPiA+ID4gd2l0aG91dCB3YWl0aW5nLiAgVGhlICpuZXh0KiB0aW1lIHdlIG5lZWQgdG8g
aXNzdWUgYSBjb21tYW5kLCB3ZSBjYW4NCj4gPiA+IHdhaXQgZm9yIGNvbXBsZXRpb24gb2YgdGhl
IHByZXZpb3VzIG9uZSAob3IgdGltZW91dCkgaWYgbmVjZXNzYXJ5Lg0KPiA+ID4NCj4gPiA+IEJ1
dCBtYXliZSB3ZSBuZWVkIHRoZSBwb2ludCBmaXggaW4gdGhlIGludGVyaW0sIGVzcGVjaWFsbHkg
aWYNCj4gPiA+IGFueWJvZHkgY2FuIGFjdHVhbGx5IHByb2R1Y2UgdGhlIHNjZW5hcmlvIHlvdSBt
ZW50aW9uLg0KPiA+DQo+ID4gT2suDQo+IA0KPiBUaGlzIHBhdGNoIGlzIHN0aWxsIGluIHBhdGNo
d29yaywgYnV0IEkndmUgbG9zdCB0cmFjayBvZiB3aGVyZSB3ZSBhcmUuDQo+IERpZCB5b3UgcmVz
b2x2ZSB0aGlzIGluIHRoZSBzZXJpZXMgdGhhdCBJIGp1c3QgYXBwbGllZCwgb3IgaXMgaXQgc3Rp
bGwNCj4gYW4gb3V0c3RhbmRpbmcgaXNzdWU/DQoNCk5vLCBJIGRpZCBub3Qgc29sdmUgaXQuIEl0
IGlzIHN0aWxsIGFuIG91dHN0YW5kaW5nIGlzc3VlLiBTbyBmYXIgSSBhbSB1c2luZyB5b3VyIHBh
dGNoIHRvIG92ZXJjb21lIHRoaXM6DQoNCmh0dHA6Ly93d3cuc3Bpbmljcy5uZXQvbGlzdHMvaG90
cGx1Zy9tc2cwNTgzMC5odG1sDQoNCkknbGwganVzdCBhdHRlbXB0IHRvIGNvbmNsdWRlIHRoZSBz
dGF0dXMgb24gdGhpcyBpc3N1ZSBzbyB0aGF0IHlvdSBjYW4gbWFrZSB0aGUgZGVjaXNpb24gb24g
dGhlIGNvdXJzZSBvZiBhY3Rpb24uIElNSE8gdGhlcmUgYXJlIDIgaW5kZXBlbmRlbnQgaXNzdWVz
IHRoYXQgd2UgZGlzY3Vzc2VkIGluIHRoaXMgdGhyZWFkOg0KCQ0KMSkgUENJZSBjb21wbGlhbnQg
SFcgKHRoYXQgZ2VuZXJhdGVzIGNtZCBjb21wbGV0ZWQgaW50ZXJydXB0cyBhdCBldmVyeSB3cml0
ZSBvZiBTbG90X2N0cmwgcmVnaXN0ZXIpIGJlaW5nIHBlbmFsaXplZCB3aXRoIDEgc2Vjb25kIGRl
bGF5IGR1cmluZyB0aGUgYm9vdCB1cC4gWW91ciBwYXRjaCBzb2x2ZXMgdGhpcy4NCg0KMikgSWYg
dGhlcmUgaXMgYSBnZW51aW5lIHNwdXJpb3VzIGludGVycnVwdCwgaXQgZG9lcyBub3QgZ2V0IGFj
a25vd2xlZGdlZC4gSSBoYWQgb3JpZ2luYWxseSBwb3N0ZWQgYSBwYXRjaCBmb3IgVEhJUyBwcm9i
bGVtLg0KaHR0cDovL3d3dy5zcGluaWNzLm5ldC9saXN0cy9ob3RwbHVnL21zZzA1ODE1Lmh0bWwN
Cg0KWW91IGhhZCBpbmRpY2F0ZWQgdGhhdCB5b3Ugd291bGQgcmF0aGVyIHdhbnQgYSBiaWdnZXIg
cmVzdHJ1Y3R1cmluZyBvZiB0aGUgZHJpdmVyIHRvIHNvbHZlICgyKS4NCg0KTXkgb2JzZXJ2YXRp
b246IE1ZIHByb2JsZW0gKGluIG15IHNldHVwKSBpcyBub3Qgc2VlbiBpZiBJIHVzZSBlaXRoZXIg
b2YgdGhlIHBhdGNoZXMgKHlvdXJzIG9yIG1pbmUpLg0KDQpNeSBvcGluaW9uOiBJIHRoaW5rIG15
IHBhdGNoIHNvbHZlcyAoMikgYnV0IG1pZ2h0IG5vdCBzb2x2ZSAoMSkgZm9yIGFsbCBjb3JuZXIg
Y2FzZXMuIEFsc28geW91ciBwYXRjaCBzb2x2ZXMgKDEpIGJ1dCBtYXkgbm90IHNvbHZlICgyKSBm
b3IgYWxsIGNvcm5lciBjYXNlcyAtVGh1cyB3ZSBzaG91bGQgcHJvYmFibHkgc29sdmUgYm90aCBv
ZiB0aGVzZSBwcm9ibGVtcyBpbmRpdmlkdWFsbHkuDQoNClRoYW5rcywNCg0KUmFqYXQNCg0KDQoN
CiANCg0KDQo+IA0KPiBCam9ybg0KPiAtLQ0KPiBUbyB1bnN1YnNjcmliZSBmcm9tIHRoaXMgbGlz
dDogc2VuZCB0aGUgbGluZSAidW5zdWJzY3JpYmUgbGludXgtaG90cGx1ZyINCj4gaW4gdGhlIGJv
ZHkgb2YgYSBtZXNzYWdlIHRvIG1ham9yZG9tb0B2Z2VyLmtlcm5lbC5vcmcgTW9yZSBtYWpvcmRv
bW8NCj4gaW5mbyBhdCAgaHR0cDovL3ZnZXIua2VybmVsLm9yZy9tYWpvcmRvbW8taW5mby5odG1s
DQo+IA0KDQo=
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] pciehp: Acknowledge the spurious "cmd completed" event.
2014-02-12 1:08 ` Rajat Jain
@ 2014-02-20 7:42 ` Rajat Jain
2014-02-20 22:20 ` Bjorn Helgaas
0 siblings, 1 reply; 20+ messages in thread
From: Rajat Jain @ 2014-02-20 7:42 UTC (permalink / raw)
To: Rajat Jain
Cc: Bjorn Helgaas, Rajat Jain, linux-pci@vger.kernel.org,
linux hotplug mailing, Kenji Kaneshige, Yijing Wang, Greg KH,
Tom Nguyen, Kristen Accardi, Guenter Roeck
Hello Bjorn,
>>
>> On Mon, Nov 25, 2013 at 07:03:11PM +0000, Rajat Jain wrote:
>> > Hello,
>> >
>> > > > On a different note, I feel there is still a need to apply my
>> > > > original
>> > > patch. There is still an open problem in case of spurious interrupts
>> > > (or in any case where the condition "if (slot_status &
>> PCI_EXP_SLTSTA_CC)"
>> > > becomes true in pcie_write_cmd()). That is because once that
>> > > happens, we never clear that interrupt, and no further hotplug
>> > > interrupts shall be received unless we do that.
>> > >
>> > > I agree this is an issue and we should address it somehow. My
>> > > hesitation is just that I'd prefer to do some more aggressive
>> > > restructuring rather than apply a point fix. For example:
>> >
>> > OK, I'll attempt to fix it that way when I get time.
>> >
>> > >
>> > > - We currently look at PCI_EXP_SLTSTA_CC in pcie_isr(),
>> > > pcie_poll_cmd(), and pcie_write_cmd(). I think it would be better
>> > > to look at it only in pcie_isr().
>> > >
>> > > - I don't think pcie_poll_cmd() should exist at all; we should poll
>> > > by calling pcie_isr() instead.
>> > >
>> > > - We need pcie_write_cmd(), but I think the way it waits is
>> backwards.
>> > > Currently we issue the command, then wait for it to complete. I
>> > > think we should issue the command, note the current time, and return
>> > > without waiting. The *next* time we need to issue a command, we can
>> > > wait for completion of the previous one (or timeout) if necessary.
>> > >
>> > > But maybe we need the point fix in the interim, especially if
>> > > anybody can actually produce the scenario you mention.
>> >
>> > Ok.
>>
>> This patch is still in patchwork, but I've lost track of where we are.
>> Did you resolve this in the series that I just applied, or is it still
>> an outstanding issue?
>
> No, I did not solve it. It is still an outstanding issue. So far I am using your patch to overcome this:
>
> http://www.spinics.net/lists/hotplug/msg05830.html
>
> I'll just attempt to conclude the status on this issue so that you can make the decision on the course of action. IMHO there are 2 independent issues that we discussed in this thread:
>
> 1) PCIe compliant HW (that generates cmd completed interrupts at every write of Slot_ctrl register) being penalized with 1 second delay during the boot up. Your patch solves this.
>
> 2) If there is a genuine spurious interrupt, it does not get acknowledged. I had originally posted a patch for THIS problem.
> http://www.spinics.net/lists/hotplug/msg05815.html
>
> You had indicated that you would rather want a bigger restructuring of the driver to solve (2).
>
> My observation: MY problem (in my setup) is not seen if I use either of the patches (yours or mine).
>
> My opinion: I think my patch solves (2) but might not solve (1) for all corner cases. Also your patch solves (1) but may not solve (2) for all corner cases -Thus we should probably solve both of these problems individually.
>
Just wondering if you decided on how to solve this problem?
Are you planning this for 3.15?
Thanks,
Rajat
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] pciehp: Acknowledge the spurious "cmd completed" event.
2014-02-20 7:42 ` Rajat Jain
@ 2014-02-20 22:20 ` Bjorn Helgaas
2014-02-21 1:43 ` Rajat Jain
0 siblings, 1 reply; 20+ messages in thread
From: Bjorn Helgaas @ 2014-02-20 22:20 UTC (permalink / raw)
To: Rajat Jain
Cc: Rajat Jain, Rajat Jain, linux-pci@vger.kernel.org,
linux hotplug mailing, Kenji Kaneshige, Yijing Wang, Greg KH,
Tom Nguyen, Kristen Accardi, Guenter Roeck
On Thu, Feb 20, 2014 at 12:42 AM, Rajat Jain <rajatxjain@gmail.com> wrote:
> Hello Bjorn,
>
>>>
>>> On Mon, Nov 25, 2013 at 07:03:11PM +0000, Rajat Jain wrote:
>>> > Hello,
>>> >
>>> > > > On a different note, I feel there is still a need to apply my
>>> > > > original
>>> > > patch. There is still an open problem in case of spurious interrupts
>>> > > (or in any case where the condition "if (slot_status &
>>> PCI_EXP_SLTSTA_CC)"
>>> > > becomes true in pcie_write_cmd()). That is because once that
>>> > > happens, we never clear that interrupt, and no further hotplug
>>> > > interrupts shall be received unless we do that.
>>> > >
>>> > > I agree this is an issue and we should address it somehow. My
>>> > > hesitation is just that I'd prefer to do some more aggressive
>>> > > restructuring rather than apply a point fix. For example:
>>> >
>>> > OK, I'll attempt to fix it that way when I get time.
>>> >
>>> > >
>>> > > - We currently look at PCI_EXP_SLTSTA_CC in pcie_isr(),
>>> > > pcie_poll_cmd(), and pcie_write_cmd(). I think it would be better
>>> > > to look at it only in pcie_isr().
>>> > >
>>> > > - I don't think pcie_poll_cmd() should exist at all; we should poll
>>> > > by calling pcie_isr() instead.
>>> > >
>>> > > - We need pcie_write_cmd(), but I think the way it waits is
>>> backwards.
>>> > > Currently we issue the command, then wait for it to complete. I
>>> > > think we should issue the command, note the current time, and return
>>> > > without waiting. The *next* time we need to issue a command, we can
>>> > > wait for completion of the previous one (or timeout) if necessary.
>>> > >
>>> > > But maybe we need the point fix in the interim, especially if
>>> > > anybody can actually produce the scenario you mention.
>>> >
>>> > Ok.
>>>
>>> This patch is still in patchwork, but I've lost track of where we are.
>>> Did you resolve this in the series that I just applied, or is it still
>>> an outstanding issue?
>>
>> No, I did not solve it. It is still an outstanding issue. So far I am using your patch to overcome this:
>>
>> http://www.spinics.net/lists/hotplug/msg05830.html
>>
>> I'll just attempt to conclude the status on this issue so that you can make the decision on the course of action. IMHO there are 2 independent issues that we discussed in this thread:
>>
>> 1) PCIe compliant HW (that generates cmd completed interrupts at every write of Slot_ctrl register) being penalized with 1 second delay during the boot up. Your patch solves this.
>>
>> 2) If there is a genuine spurious interrupt, it does not get acknowledged. I had originally posted a patch for THIS problem.
>> http://www.spinics.net/lists/hotplug/msg05815.html
>>
>> You had indicated that you would rather want a bigger restructuring of the driver to solve (2).
>>
>> My observation: MY problem (in my setup) is not seen if I use either of the patches (yours or mine).
>>
>> My opinion: I think my patch solves (2) but might not solve (1) for all corner cases. Also your patch solves (1) but may not solve (2) for all corner cases -Thus we should probably solve both of these problems individually.
>>
>
> Just wondering if you decided on how to solve this problem?
>
> Are you planning this for 3.15?
Sorry, I haven't had a chance to work on this, so I don't think *I*
will get anything done for v3.15. To make forward progress, maybe we
should merge your original patch? Would you mind posting it again so
it gets into patchwork again?
Bjorn
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH] pciehp: Acknowledge the spurious "cmd completed" event.
@ 2014-02-21 1:42 Rajat Jain
2014-04-24 22:48 ` Bjorn Helgaas
0 siblings, 1 reply; 20+ messages in thread
From: Rajat Jain @ 2014-02-21 1:42 UTC (permalink / raw)
To: Bjorn Helgaas, linux-pci, linux-kernel, Alex Williamson
Cc: Rajat Jain, Guenter Roeck
In case of a spurious "cmd completed", pcie_write_cmd() does not
clear it, but yet expects more "cmd completed" events to be generated.
This does not happen because the previous (spurious) event has not
been acknowledged. Fix that.
Signed-off-by: Rajat Jain <rajatxjain@gmail.com>
Signed-off-by: Rajat Jain <rajatjain@juniper.net>
Signed-off-by: Guenter Roeck <groeck@juniper.net>
---
drivers/pci/hotplug/pciehp_hpc.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index d7d058f..1463412 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -159,6 +159,8 @@ static void pcie_write_cmd(struct controller *ctrl, u16 cmd, u16 mask)
pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
if (slot_status & PCI_EXP_SLTSTA_CC) {
+ pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
+ PCI_EXP_SLTSTA_CC);
if (!ctrl->no_cmd_complete) {
/*
* After 1 sec and CMD_COMPLETED still not set, just
--
1.7.9.5
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH] pciehp: Acknowledge the spurious "cmd completed" event.
2014-02-20 22:20 ` Bjorn Helgaas
@ 2014-02-21 1:43 ` Rajat Jain
0 siblings, 0 replies; 20+ messages in thread
From: Rajat Jain @ 2014-02-21 1:43 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Rajat Jain, Rajat Jain, linux-pci@vger.kernel.org,
linux hotplug mailing, Kenji Kaneshige, Yijing Wang, Greg KH,
Tom Nguyen, Kristen Accardi, Guenter Roeck
On Thu, Feb 20, 2014 at 2:20 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Thu, Feb 20, 2014 at 12:42 AM, Rajat Jain <rajatxjain@gmail.com> wrote:
>> Hello Bjorn,
>>
>>>>
>>>> On Mon, Nov 25, 2013 at 07:03:11PM +0000, Rajat Jain wrote:
>>>> > Hello,
>>>> >
>>>> > > > On a different note, I feel there is still a need to apply my
>>>> > > > original
>>>> > > patch. There is still an open problem in case of spurious interrupts
>>>> > > (or in any case where the condition "if (slot_status &
>>>> PCI_EXP_SLTSTA_CC)"
>>>> > > becomes true in pcie_write_cmd()). That is because once that
>>>> > > happens, we never clear that interrupt, and no further hotplug
>>>> > > interrupts shall be received unless we do that.
>>>> > >
>>>> > > I agree this is an issue and we should address it somehow. My
>>>> > > hesitation is just that I'd prefer to do some more aggressive
>>>> > > restructuring rather than apply a point fix. For example:
>>>> >
>>>> > OK, I'll attempt to fix it that way when I get time.
>>>> >
>>>> > >
>>>> > > - We currently look at PCI_EXP_SLTSTA_CC in pcie_isr(),
>>>> > > pcie_poll_cmd(), and pcie_write_cmd(). I think it would be better
>>>> > > to look at it only in pcie_isr().
>>>> > >
>>>> > > - I don't think pcie_poll_cmd() should exist at all; we should poll
>>>> > > by calling pcie_isr() instead.
>>>> > >
>>>> > > - We need pcie_write_cmd(), but I think the way it waits is
>>>> backwards.
>>>> > > Currently we issue the command, then wait for it to complete. I
>>>> > > think we should issue the command, note the current time, and return
>>>> > > without waiting. The *next* time we need to issue a command, we can
>>>> > > wait for completion of the previous one (or timeout) if necessary.
>>>> > >
>>>> > > But maybe we need the point fix in the interim, especially if
>>>> > > anybody can actually produce the scenario you mention.
>>>> >
>>>> > Ok.
>>>>
>>>> This patch is still in patchwork, but I've lost track of where we are.
>>>> Did you resolve this in the series that I just applied, or is it still
>>>> an outstanding issue?
>>>
>>> No, I did not solve it. It is still an outstanding issue. So far I am using your patch to overcome this:
>>>
>>> http://www.spinics.net/lists/hotplug/msg05830.html
>>>
>>> I'll just attempt to conclude the status on this issue so that you can make the decision on the course of action. IMHO there are 2 independent issues that we discussed in this thread:
>>>
>>> 1) PCIe compliant HW (that generates cmd completed interrupts at every write of Slot_ctrl register) being penalized with 1 second delay during the boot up. Your patch solves this.
>>>
>>> 2) If there is a genuine spurious interrupt, it does not get acknowledged. I had originally posted a patch for THIS problem.
>>> http://www.spinics.net/lists/hotplug/msg05815.html
>>>
>>> You had indicated that you would rather want a bigger restructuring of the driver to solve (2).
>>>
>>> My observation: MY problem (in my setup) is not seen if I use either of the patches (yours or mine).
>>>
>>> My opinion: I think my patch solves (2) but might not solve (1) for all corner cases. Also your patch solves (1) but may not solve (2) for all corner cases -Thus we should probably solve both of these problems individually.
>>>
>>
>> Just wondering if you decided on how to solve this problem?
>>
>> Are you planning this for 3.15?
>
> Sorry, I haven't had a chance to work on this, so I don't think *I*
> will get anything done for v3.15. To make forward progress, maybe we
> should merge your original patch? Would you mind posting it again so
> it gets into patchwork again?
Just sent it again, Thanks!
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] pciehp: Acknowledge the spurious "cmd completed" event.
2014-02-21 1:42 [PATCH] pciehp: Acknowledge the spurious "cmd completed" event Rajat Jain
@ 2014-04-24 22:48 ` Bjorn Helgaas
0 siblings, 0 replies; 20+ messages in thread
From: Bjorn Helgaas @ 2014-04-24 22:48 UTC (permalink / raw)
To: Rajat Jain
Cc: linux-pci, linux-kernel, Alex Williamson, Rajat Jain,
Guenter Roeck
On Thu, Feb 20, 2014 at 05:42:31PM -0800, Rajat Jain wrote:
> In case of a spurious "cmd completed", pcie_write_cmd() does not
> clear it, but yet expects more "cmd completed" events to be generated.
> This does not happen because the previous (spurious) event has not
> been acknowledged. Fix that.
>
> Signed-off-by: Rajat Jain <rajatxjain@gmail.com>
> Signed-off-by: Rajat Jain <rajatjain@juniper.net>
> Signed-off-by: Guenter Roeck <groeck@juniper.net>
Applied to pci/hotplug for v3.16, thanks!
> ---
> drivers/pci/hotplug/pciehp_hpc.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index d7d058f..1463412 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -159,6 +159,8 @@ static void pcie_write_cmd(struct controller *ctrl, u16 cmd, u16 mask)
>
> pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
> if (slot_status & PCI_EXP_SLTSTA_CC) {
> + pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
> + PCI_EXP_SLTSTA_CC);
> if (!ctrl->no_cmd_complete) {
> /*
> * After 1 sec and CMD_COMPLETED still not set, just
> --
> 1.7.9.5
>
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2014-04-24 22:48 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-02-21 1:42 [PATCH] pciehp: Acknowledge the spurious "cmd completed" event Rajat Jain
2014-04-24 22:48 ` Bjorn Helgaas
-- strict thread matches above, loose matches on Subject: below --
2013-11-05 22:33 Rajat Jain
2013-11-06 0:25 ` Bjorn Helgaas
2013-11-06 2:38 ` Rajat Jain
2013-11-07 21:53 ` Rajat Jain
2013-11-08 1:23 ` Bjorn Helgaas
2013-11-08 17:30 ` Rajat Jain
2013-11-08 23:15 ` Bjorn Helgaas
2013-11-11 21:26 ` Rajat Jain
2013-11-23 0:51 ` Bjorn Helgaas
2013-11-23 1:59 ` Guenter Roeck
2013-11-23 14:56 ` Rajat Jain
2013-11-23 19:32 ` Bjorn Helgaas
2013-11-25 19:03 ` Rajat Jain
2014-02-12 0:34 ` Bjorn Helgaas
2014-02-12 1:08 ` Rajat Jain
2014-02-20 7:42 ` Rajat Jain
2014-02-20 22:20 ` Bjorn Helgaas
2014-02-21 1:43 ` Rajat Jain
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).