* Re: suspended DRAM bridge
2013-04-01 21:41 ` Rafael J. Wysocki
@ 2013-04-01 22:03 ` Martin Mokrejs
2013-04-02 14:30 ` Martin Mokrejs
1 sibling, 0 replies; 5+ messages in thread
From: Martin Mokrejs @ 2013-04-01 22:03 UTC
To: Rafael J. Wysocki; +Cc: Linux PM list, Sarah Sharp
Rafael J. Wysocki wrote:
> On Monday, April 01, 2013 11:09:09 PM Martin Mokrejs wrote:
>> Hi Rafael,
>>
>> Rafael J. Wysocki wrote:
>>> On Monday, April 01, 2013 06:50:17 PM Martin Mokrejs wrote:
>>>> Hi Rafael,
>>>> I have a simple question. Why does my DRAM controller appear to be suspended?
>>>
>>> I suppose that runtime PM is disabled for that device and therefore
>>> runtime_status is meaningless.
>>
>> But I really mean this pair of values:
>>
>> /sys/bus/pci/devices/0000:00:00.0/power/runtime_status:suspended
>>
>> /sys/bus/pci/devices/0000:00:00.0/power/control:auto
>>
>>
>>>
>>> And really, that attribute is for *debugging* things by developers who know
>>> what they are looking for and not for random poking.
>>
>> Well, if you or I are to figure out why laptop-mode-tools makes my life
>> even more miserable with hotplug issues, then the requested output of
>>
>> grep . /sys/bus/pci/devices/*/power/runtime_status
>> grep . /sys/bus/pci/devices/*/power/control
>>
>> is crap. How can I infer anything if I cannot trust the values?
>
> The phrase "trust the values" above doesn't make sense. You need to know what
> the values are supposed to *mean* in the first place, which you evidently
> don't.
>
> And what they mean is:
>
> /sys/devices/.../power/runtime_status - the current value of the device's
> runtime_status attribute. [Notice that you need to know the code in question
> to interpret that attribute.] That field always has some value, even though
> it may not make sense *to* *you*.
That's why I asked what 'suspended' means in the case of a DRAM bridge (the only
such bridge, in a laptop that is evidently working). ;-)
>
> /sys/devices/.../power/control - "on" means that user space doesn't allow the
> device to be runtime-suspended, while "auto" means that it *does* allow that
> to happen. Nothing more or less than that.
>
> In particular, "auto" doesn't necessarily mean that the device will be
> runtime-suspended at all, and the value of runtime_status requires
> interpretation.
But your conclusion was that
/sys/bus/pci/devices/0000:00:00.0/power/runtime_status:suspended
does NOT mean it is actually suspended, since otherwise the laptop would not work.
>
> That's how it goes.
I did realize that control:'auto' does NOT automatically imply runtime_status:'suspended',
don't worry. I posted a pair of values for 0000:00:00.0 above, and your explanation
suggested that
/sys/bus/pci/devices/0000:00:00.0/power/runtime_status:suspended
and
/sys/bus/pci/devices/0000:00:00.0/power/control:auto
do not tell the truth (that the device is actually suspended).
But if I send you /sys/bus/pci/devices/0000:00:00.0/power/runtime_status:suspended,
what do you need /sys/bus/pci/devices/0000:00:00.0/power/control for?
And what would it add, even if I also sent /sys/bus/pci/devices/0000:00:00.0/power/control:auto?
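The control and runtime_status attributes discussed above can be collected for each device in a single pass. Below is a minimal Python sketch that assumes the standard sysfs layout under /sys/bus/pci/devices. The helper names read_runtime_pm and _read_attr are hypothetical. runtime_enabled is read only when it is present, because it is not exposed on every kernel configuration.

```python
import os

SYSFS_PCI = "/sys/bus/pci/devices"


def _read_attr(path):
    """Return the stripped contents of a sysfs attribute, or None if unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def read_runtime_pm(root=SYSFS_PCI):
    """Map each device under root to (control, runtime_status, runtime_enabled).

    Hypothetical helper; roughly what
    'grep . <root>/*/power/control' and '.../runtime_status' print.
    Devices lacking control or runtime_status are skipped;
    runtime_enabled is None when the kernel does not expose it.
    """
    result = {}
    for dev in sorted(os.listdir(root)):
        power = os.path.join(root, dev, "power")
        control = _read_attr(os.path.join(power, "control"))
        status = _read_attr(os.path.join(power, "runtime_status"))
        if control is None or status is None:
            continue
        enabled = _read_attr(os.path.join(power, "runtime_enabled"))
        result[dev] = (control, status, enabled)
    return result
```

A result of ("auto", "suspended", ...) for 0000:00:00.0 shows only two things: user space permits runtime suspend, and the status field reads "suspended". If runtime_enabled is available and reads "disabled", that matches the case described above, where runtime_status is meaningless.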
>
> Now, if you want me (or anyone else on this list) to help you, why don't you
> test 3.9-rc5 with the patch at https://patchwork.kernel.org/patch/2368081/
> applied and send *one* message describing *briefly* what *does* *not* *work*
> for you, without attaching any logs, lspci outputs and so on just yet?
Since you asked, OK, will do.
>
> Then, we can try to address the problems you have in 3.9-rc5 and go back to the
> (still supported) 'stable' kernels from there.
>
> Does that sound like a workable plan?
Sure.
Martin
* Re: suspended DRAM bridge
From: Martin Mokrejs @ 2013-04-02 14:30 UTC
To: Rafael J. Wysocki; +Cc: Linux PM list, Sarah Sharp
Rafael J. Wysocki wrote:
> Now, if you want me (or anyone else on this list) to help you, why don't you
> test 3.9-rc5 with the patch at https://patchwork.kernel.org/patch/2368081/
> applied and send *one* message describing *briefly* what *does* *not* *work*
> for you, without attaching any logs, lspci outputs and so on just yet?
>
> Then, we can try to address the problems you have in 3.9-rc5 and go back to the
> (still supported) 'stable' kernels from there.
So, I tried 3.9-rc5 with the above patch, and it fixed neither of my issues: the dead xHCI
port, nor pciehp being completely broken. I did not test whether there is an
improvement on the acpiphp side (which regressed in 3.8 compared to 3.7).
I tested everything under laptop-mode-tools, which enabled the power saving.
The xHCI issue, with pcie_aspm=off:
Unplugging a mouse results in:
/sys/bus/pci/devices/0000:00:1c.4/power/runtime_status:active
[cut]
-/sys/bus/pci/devices/0000:0b:00.0/power/runtime_status:active
+/sys/bus/pci/devices/0000:0b:00.0/power/runtime_status:suspended
After the unplug, the TI chip does not detect a device being reconnected to the port
while it is suspended. I think this just repeats what Sarah already said. Either 'lsusb -v'
or 'echo on > /sys/bus/pci/devices/0000:0b:00.0/power/control' recovers from the problem.
That also ensures the port does not go dead again on the next USB device unplug.
Either pcieport should refuse to suspend that particular device underneath it, or xhci_hcd
should do so itself, since it is xhci_hcd that effectively committed suicide.
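The 'echo on' workaround above sets power/control to "on" for the xHCI controller. This means user space no longer allows that device to be runtime-suspended. Below is a minimal Python sketch of the same write. On a real system it needs root. The helper name forbid_runtime_suspend is hypothetical. Returning the previous value, so that it can be written back later, is a convenience added here and is not part of the report.

```python
import os

SYSFS_PCI = "/sys/bus/pci/devices"


def forbid_runtime_suspend(bdf, root=SYSFS_PCI):
    """Write "on" to <root>/<bdf>/power/control and return the previous value.

    Hypothetical helper, same effect as:
        echo on > /sys/bus/pci/devices/<bdf>/power/control
    """
    path = os.path.join(root, bdf, "power", "control")
    with open(path) as f:
        previous = f.read().strip()  # typically "auto" or "on"
    with open(path, "w") as f:
        f.write("on")  # "on": runtime suspend not allowed by user space
    return previous
```

For the controller in this report, the call would be forbid_runtime_suspend("0000:0b:00.0").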
The eSATA-card-based pciehp testing, with pcie_aspm=off:
Although 1c.7 and 11:00 were not suspended, the eject of the cold-plugged card went unnoticed.
As a result, /proc/iomem and /proc/interrupts keep reporting the old values, claiming the
cold-boot state did not change during the hot eject.
'lspci -vvv' reports 0xff values for the broken 11:00 entry, and the entry covers only its very
first line (like lspci without extra verbosity).
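The 0xff values are what a PCI config-space read returns when no device answers at that address. Below is a minimal Python sketch that checks this directly through the device's sysfs config file, assuming the standard layout. The helper name device_responds is hypothetical.

```python
import os
import struct

SYSFS_PCI = "/sys/bus/pci/devices"


def device_responds(bdf, root=SYSFS_PCI):
    """Return True unless the device's PCI vendor ID reads back as 0xffff.

    Reads the first 4 bytes of config space (vendor ID, device ID,
    little-endian). An all-ones vendor ID means nothing answered the
    config read. A missing sysfs entry also counts as not responding.
    """
    try:
        with open(os.path.join(root, bdf, "config"), "rb") as f:
            header = f.read(4)
    except FileNotFoundError:
        return False
    if len(header) < 4:
        return False
    vendor_id, _device_id = struct.unpack("<HH", header)
    return vendor_id != 0xFFFF
```

If device_responds("0000:11:00.0") returns False while the entry is still listed, that would match the stale state described here.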
Then, 'rmmod sata_sil24' removes only the driver's association with the memory ranges
assigned to it; the 11:00 device itself remains in /proc/iomem with its memory ranges.
At the same time, /proc/interrupts claims IRQ 19 was released (if we can trust it).
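Whether the 11:00 device is still listed in /proc/iomem can also be checked with a script. Below is a minimal Python sketch. It assumes the usual "<start>-<end> : <owner>" line format, with two spaces of indentation per nesting level. Addresses may read as zero unless it runs as root. The helper name iomem_entries is hypothetical.

```python
def iomem_entries(owner, path="/proc/iomem"):
    """Return [(start, end, depth), ...] for every /proc/iomem line owned by owner.

    depth is the nesting level (two spaces of indentation per level).
    """
    entries = []
    with open(path) as f:
        for line in f:
            body = line.rstrip("\n")
            stripped = body.lstrip(" ")
            depth = (len(body) - len(stripped)) // 2
            span, sep, name = stripped.partition(" : ")
            if not sep or name != owner:
                continue
            start, end = (int(x, 16) for x in span.split("-"))
            entries.append((start, end, depth))
    return entries
```

A non-empty iomem_entries("0000:11:00.0") after the card has been ejected would match the stale ranges described above. After rmmod, iomem_entries("sata_sil24") should come back empty.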
In case it's unclear: because of those 0xff values, one cannot squeeze any details out of lspci.
In dmesg, only the rmmod produced new lines (confirming the eject went unnoticed).
A subsequent hot insert of the card does not result in an IRQ being obtained, if we trust
/proc/interrupts, which claimed it was never released. But lspci shows it received
IRQ 19, and compared to the cold-booted state with the card inserted and the driver loaded,
'Latency: 0, Cache Line Size: 64 bytes' no longer appears in the 'lspci -vvv' output
for the hot-plugged card. In fact, the whole line is gone.
During the hot insert no driver was loaded, so a manual 'modprobe sata_sil24' loads the
driver, but the driver already fails during its init with 'enabling device (0000 -> 0003)'
and 'failed to clear port RST', while claiming it is using IRQ 19.
Loading the driver made the 'Latency: 0' line reappear in 'lspci -vvv' for the 11:00 device.
Attaching a SATA drive to the eSATA card is not noticed anywhere.
A subsequent rescan-scsi-bus call from a shell does not help: sata_sil24 repeats the same
procedure that failed during its initialization, so it fails again. The SATA disk attached
to the card is not detected, and the link remains down.
I could have distilled more out of the log files, but you wanted a brief answer,
so, in brief: it still doesn't work.
Because this goes only to you two and linux-pm, please let me know whether linux-pci
and linux-acpi should be updated on this. They received quite a lot of information over
the last days/weeks, whereas this email may reach many other people while missing those
who were already involved. I will leave it up to you; just a note.
Martin