On 11/11/2011 05:11 PM, Matthew Garrett wrote:
> On Fri, Nov 11, 2011 at 12:27:55PM +0000, James Bottomley wrote:
>
>> The next question is: is the driver the correct place? This sounds like
>> a PCIe Link Power Management blacklist set ... which might need updating
>> on the fly ... might we need a user knob for this (like we have for the
>> SCSI black/white list)?
>
> The only thing that knows whether any given piece of hardware is broken
> here is the hardware-specific driver. There's already a user knob for
> this, but some devices will fall over the moment the hardware is enabled
> if the ASPM state is broken which makes it more difficult to fix up that
> way.

It's quite hard to identify all affected hardware; as far as I understand,
in all cases it was various controllers. The hardware vendor could probably
do it, but I doubt that is possible in the current situation.

Some details about my test node: the controller was an Adaptec 5805

[ 3.045666] Adaptec aacraid driver 1.1-7[28000]-ms
[ 3.045964] aacraid 0000:04:00.0: PCI INT A -> GSI 32 (level, low) -> IRQ 32
[ 3.311288] AAC0: kernel 5.2-0[18252] Nov 22 2010
[ 3.311398] AAC0: monitor 5.2-0[18252]
[ 3.311508] AAC0: bios 5.2-0[18252]
[ 3.311611] AAC0: serial 1D2811B07D0
[ 3.311715] AAC0: Non-DASD support enabled.
[ 3.311818] AAC0: 64bit support enabled.
[ 3.311927] AAC0: 64 Bit DAC enabled
[ 3.319115] scsi0 : aacraid

When the ASPM policy was changed with the non-patched driver, there were a
lot of the following messages in dmesg:

[ 6.279007] pcieport 0000:00:09.0: [ 6] Bad TLP
[ 6.279119] pcieport 0000:00:09.0: AER: Corrected error received: id=0048
[ 6.279378] pcieport 0000:00:09.0: AER: Corrected error received: id=0048
[ 6.279495] pcieport 0000:00:09.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0048(Receiver ID)
[ 6.279661] pcieport 0000:00:09.0: device [8086:3410] error status/mask=00000040/00000001
[ 6.279823] pcieport 0000:00:09.0: [ 6] Bad TLP
[ 6.283044] pcieport 0000:00:09.0: AER: Corrected error received: id=0048
[ 6.283185] pcieport 0000:00:09.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0048(Receiver ID)
[ 6.283391] pcieport 0000:00:09.0: device [8086:3410] error status/mask=00000040/00000001

lspci output is attached.

Thank you,
	Vasily Averin