Re: suspended DRAM bridge

* Re: suspended DRAM bridge
       [not found] <5159BAC9.80700@fold.natur.cuni.cz>
@ 2013-04-01 20:56 ` Rafael J. Wysocki
  2013-04-01 21:09   ` Martin Mokrejs
  0 siblings, 1 reply; 5+ messages in thread
From: Rafael J. Wysocki @ 2013-04-01 20:56 UTC (permalink / raw)
  To: Martin Mokrejs; +Cc: Linux PM list

On Monday, April 01, 2013 06:50:17 PM Martin Mokrejs wrote:
> Hi Rafael,
>   I have a simple question. Why seems my DRAM controller suspended?

I suppose that runtime PM is disabled for that device and therefore
runtime_status is meaningless.

And really, that attribute is for *debugging* things by developers who know
what they are looking for and not for random poking.

Besides, this is a host bridge, not a DRAM controller.

> Does it make any sense in a laptop computer? How could the laptop work at all?

I suppose it wouldn't work if the PCI host bridge were suspended.  At least
it couldn't access memory and the PCI bus then.

Thanks,
Rafael

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

* Re: suspended DRAM bridge
  2013-04-01 20:56 ` suspended DRAM bridge Rafael J. Wysocki
@ 2013-04-01 21:09   ` Martin Mokrejs
  2013-04-01 21:41     ` Rafael J. Wysocki
  0 siblings, 1 reply; 5+ messages in thread
From: Martin Mokrejs @ 2013-04-01 21:09 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linux PM list

Hi Rafael,

Rafael J. Wysocki wrote:
> On Monday, April 01, 2013 06:50:17 PM Martin Mokrejs wrote:
>> Hi Rafael,
>>   I have a simple question. Why seems my DRAM controller suspended?
> 
> I suppose that runtime PM is disabled for that device and therefore
> runtime_status is meaningless.

But I really mean this pair of values:

/sys/bus/pci/devices/0000:00:00.0/power/runtime_status:suspended

/sys/bus/pci/devices/0000:00:00.0/power/control:auto

> 
> And really, that attribute is for *debugging* things by developers who know
> what they are looking for and not for random poking.

Well, if me or you are to figure out why laptop-mode-tools make my life
even more miserable with hotplug issues the requests to provide

grep . /sys/bus/pci/devices/*/power/runtime_status
grep . /sys/bus/pci/devices/*/power/control

provide crap. How can I infer something if I cannot trust the values?
Don't you think that this is the reason why you have a headache and me as well?
Seriously, only pcieport driver reports PME# enabled/disabled messages
in my system although

find /sys/bus/pci/devices/*/power/ -name control | while read -r f; do echo on > "$f"; done

should trigger similar messages from other drivers as well! Provided they are
somewhat equally verbose under the same kernel debug level. But they are all silent.
And if I start to think about what the values mean, it looks silly that my only DRAM controller
is suspended.

I really do think that devices which cannot be ever suspended under a particular
condition should not claim they are suspended if they did not.

I reported that I sometimes see only PME# enabled (or just disabled) in dmesg
for a given device, *NOT* the accompanying opposite action on the same device.
As I see that most of my PCI devices never report a change of their status,
I was hoping /sys/bus/pci/devices/*/power/runtime_status is correct.

> 
> Besides, this is a host bridge, not a DRAM controller.

Hmm.

> 
>> Does it make any sense in a laptop computer? How could the laptop work at all?
> 
> I suppose it wouldn't work if the PCI host bridge were suspended.  At least
> it couldn't access memory and the PCI bus then.

Thank you, had the same, although naive, expectation.

Thanks,
Martin

* Re: suspended DRAM bridge
  2013-04-01 21:09   ` Martin Mokrejs
@ 2013-04-01 21:41     ` Rafael J. Wysocki
  2013-04-01 22:03       ` Martin Mokrejs
  2013-04-02 14:30       ` Martin Mokrejs
  0 siblings, 2 replies; 5+ messages in thread
From: Rafael J. Wysocki @ 2013-04-01 21:41 UTC (permalink / raw)
  To: Martin Mokrejs; +Cc: Linux PM list, Sarah Sharp

On Monday, April 01, 2013 11:09:09 PM Martin Mokrejs wrote:
> Hi Rafael,
> 
> Rafael J. Wysocki wrote:
> > On Monday, April 01, 2013 06:50:17 PM Martin Mokrejs wrote:
> >> Hi Rafael,
> >>   I have a simple question. Why seems my DRAM controller suspended?
> > 
> > I suppose that runtime PM is disabled for that device and therefore
> > runtime_status is meaningless.
> 
> But I really mean this pair of values:
> 
> /sys/bus/pci/devices/0000:00:00.0/power/runtime_status:suspended
> 
> /sys/bus/pci/devices/0000:00:00.0/power/control:auto
> 
> 
> > 
> > And really, that attribute is for *debugging* things by developers who know
> > what they are looking for and not for random poking.
> 
> Well, if me or you are to figure out why laptop-mode-tools make my life
> even more miserable with hotplug issues the requests to provide
> 
> grep . /sys/bus/pci/devices/*/power/runtime_status
> grep . /sys/bus/pci/devices/*/power/control
> 
> provide crap. How can I infer something if I cannot trust the values?

The phrase "trust the values" above doesn't make sense.  You need to know what
the values are supposed to *mean* in the first place, which you evidently
don't.

And what they mean is:

/sys/devices/.../power/runtime_status - the current value of the device's
runtime_status attribute.  [Notice that you need to know the code
in question to know the meaning of that attribute.]  That field always has
a certain value, even though it may not make sense *to* *you*.

/sys/devices/.../power/control - "on" means that user space doesn't allow the
device to be runtime-suspended, while "auto" means that it *does* allow that
to happen.  Nothing more or less than that.

In particular, "auto" doesn't need to mean that the device will be
runtime-suspended at all and the value of runtime_status requires
interpretation.

That's how it goes.
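
To look at the attributes described above side by side for one device,
something like the following sketch may help.  It is not taken from the kernel
sources: show_rpm is a made-up helper name, and runtime_enabled is assumed to
be present only on kernels built with CONFIG_PM_ADVANCED_DEBUG, so the function
tolerates its absence.

```shell
#!/bin/sh
# Sketch only: show_rpm is a hypothetical helper, not an existing tool.
# Prints control, runtime_status and (if exposed) runtime_enabled
# for the device directory given as the first argument.
show_rpm() {
    dev=$1
    for attr in control runtime_status runtime_enabled; do
        f="$dev/power/$attr"
        if [ -r "$f" ]; then
            printf '%s=%s\n' "$attr" "$(cat "$f")"
        else
            # Assumption: the attribute may be missing on some kernel configs.
            printf '%s=<not available>\n' "$attr"
        fi
    done
}

# Example call (path taken from this thread):
#   show_rpm /sys/bus/pci/devices/0000:00:00.0
```

The function only prints the raw values; it does not try to interpret them.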

Now, if you want me (or anyone else on this list) to help you, why don't you
test 3.9-rc5 with the patch at https://patchwork.kernel.org/patch/2368081/
applied and send *one* message describing *briefly* what *does* *not* *work*
for you, without attaching any logs, lspci outputs and so on just yet?

Then, we can try to address the problems you have in 3.9-rc5 and go back to the
(still supported) 'stable' kernels from there.

Does that sound like a workable plan?

Rafael

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

* Re: suspended DRAM bridge
  2013-04-01 21:41     ` Rafael J. Wysocki
@ 2013-04-01 22:03       ` Martin Mokrejs
  2013-04-02 14:30       ` Martin Mokrejs
  1 sibling, 0 replies; 5+ messages in thread
From: Martin Mokrejs @ 2013-04-01 22:03 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linux PM list, Sarah Sharp

Rafael J. Wysocki wrote:
> On Monday, April 01, 2013 11:09:09 PM Martin Mokrejs wrote:
>> Hi Rafael,
>>
>> Rafael J. Wysocki wrote:
>>> On Monday, April 01, 2013 06:50:17 PM Martin Mokrejs wrote:
>>>> Hi Rafael,
>>>>   I have a simple question. Why seems my DRAM controller suspended?
>>>
>>> I suppose that runtime PM is disabled for that device and therefore
>>> runtime_status is meaningless.
>>
>> But I really mean this pair of values:
>>
>> /sys/bus/pci/devices/0000:00:00.0/power/runtime_status:suspended
>>
>> /sys/bus/pci/devices/0000:00:00.0/power/control:auto
>>
>>
>>>
>>> And really, that attribute is for *debugging* things by developers who know
>>> what they are looking for and not for random poking.
>>
>> Well, if me or you are to figure out why laptop-mode-tools make my life
>> even more miserable with hotplug issues the requests to provide
>>
>> grep . /sys/bus/pci/devices/*/power/runtime_status
>> grep . /sys/bus/pci/devices/*/power/control
>>
>> provide crap. How can I infer something if I cannot trust the values?
> 
> The phrase "trust the values" above doesn't make sense.  You need to know what
> the values are supposed to *mean* in the first place, which you evidently
> don't.
> 
> And what they mean is:
> 
> /sys/devices/.../power/runtime_status - the current value of the device's
> runtime_status attribute.  [Notice that you need to know the code
> in question to know the meaning of that attribute.]  That field always has
> a certain value, even though it may not make sense *to* *you*.

That's why I asked what 'suspended' means in case of a DRAM bridge (of the only
bridge in a working laptop). ;-)

> 
> /sys/devices/.../power/control - "on" means that user space doesn't allow the
> device to be runtime-suspended, while "auto" means that it *does* allow that
> to happen.  Nothing more or less than that.
> 
> In particular, "auto" doesn't need to mean that the device will be
> runtime-suspended at all and the value of runtime_status requires
> interpretation.

But your conclusion was that 
/sys/bus/pci/devices/0000:00:00.0/power/runtime_status:suspended
does NOT mean it is actually suspended otherwise the laptop would not work.

> 
> That's how it goes.

I did realize that control:'auto' does NOT immediately mean runtime_status:'suspended',
don't worry. I posted above a pair of values for 0000:00:00.0 and your explanation
seemed that 
/sys/bus/pci/devices/0000:00:00.0/power/runtime_status:suspended
and
/sys/bus/pci/devices/0000:00:00.0/power/control:auto

does not say the truth (that the device is actually suspended).


But if I sent you /sys/bus/pci/devices/0000:00:00.0/power/runtime_status:suspended
then what for do you need /sys/bus/pci/devices/0000:00:00.0/power/control ?
And even, if I sent /sys/bus/pci/devices/0000:00:00.0/power/control:auto.

> 
> Now, if you want me (or anyone else on this list) to help you, why don't you
> test 3.9-rc5 with the patch at https://patchwork.kernel.org/patch/2368081/
> applied and send *one* message describing *briefly* what *does* *not* *work*
> for you, without attaching any logs, lspci outputs and so on just yet?

You just asked, ok, will do.

> 
> Then, we can try to address the problems you have in 3.9-rc5 and go back to the
> (still supported) 'stable' kernels from there.
> 
> Does that sound like a workable plan?

Sure.

Martin

* Re: suspended DRAM bridge
  2013-04-01 21:41     ` Rafael J. Wysocki
  2013-04-01 22:03       ` Martin Mokrejs
@ 2013-04-02 14:30       ` Martin Mokrejs
  1 sibling, 0 replies; 5+ messages in thread
From: Martin Mokrejs @ 2013-04-02 14:30 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linux PM list, Sarah Sharp

Rafael J. Wysocki wrote:
> Now, if you want me (or anyone else on this list) to help you, why don't you
> test 3.9-rc5 with the patch at https://patchwork.kernel.org/patch/2368081/
> applied and send *one* message describing *briefly* what *does* *not* *work*
> for you, without attaching any logs, lspci outputs and so on just yet?
> 
> Then, we can try to address the problems you have in 3.9-rc5 and go back to the
> (still supported) 'stable' kernels from there.

So, I tried 3.9-rc5 with the above patch, and neither of my issues (the dead xHCI
port and pciehp being completely broken) is fixed. I did not test whether there is an
improvement on the acpiphp side (which was a regression in 3.8 compared to 3.7).
I tested everything under laptop-mode-tools, which enabled the power saving.

The xHCI issue while pcie_aspm=off:
An unplug of a mouse results in:

/sys/bus/pci/devices/0000:00:1c.4/power/runtime_status:active
[cut]
-/sys/bus/pci/devices/0000:0b:00.0/power/runtime_status:active
+/sys/bus/pci/devices/0000:0b:00.0/power/runtime_status:suspended
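
A before/after comparison like the one above can be produced roughly as
follows. This is only a sketch; snapshot_rpm and the file names before.txt and
after.txt are made up for illustration.

```shell
#!/bin/sh
# Sketch only: snapshot_rpm is a hypothetical helper.
# Prints "path:value" for every runtime_status file found under
# ROOT/*/power/, one per line, in glob (sorted) order.
snapshot_rpm() {
    root=$1
    for f in "$root"/*/power/runtime_status; do
        [ -r "$f" ] || continue
        printf '%s:%s\n' "$f" "$(cat "$f")"
    done
}

# Intended use:
#   snapshot_rpm /sys/bus/pci/devices > before.txt
#   (unplug the device)
#   snapshot_rpm /sys/bus/pci/devices > after.txt
#   diff -u before.txt after.txt
```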

After the unplug, the TI chip does not detect a device being reconnected to the port
while it is suspended. I think this just repeats what Sarah already said. Either 'lsusb -v'
or 'echo on > /sys/bus/pci/devices/0000:0b:00.0/power/control' recovers from the problem.
That also ensures the dead port does not happen again upon the next USB device unplug.
Either pcieport should deny suspend of the particular device underneath, or xhci_hcd
should do it by itself, as it actually caused the suicide itself.
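
The 'echo on' workaround can be wrapped with a read-back check, roughly like
this. It is a sketch only: forbid_runtime_suspend is a made-up name, the
setting is not expected to survive a reboot, and a power-saving tool may
well write "auto" again later.

```shell
#!/bin/sh
# Sketch only: forbid_runtime_suspend is a hypothetical helper.
# Writes "on" to DEVDIR/power/control and verifies the value by reading it back.
# Returns non-zero if the file is not writable or the value did not stick.
forbid_runtime_suspend() {
    ctl="$1/power/control"
    [ -w "$ctl" ] || { echo "cannot write $ctl" >&2; return 1; }
    echo on > "$ctl" || return 1
    [ "$(cat "$ctl")" = "on" ]
}

# Example call (needs root; path taken from this thread):
#   forbid_runtime_suspend /sys/bus/pci/devices/0000:0b:00.0
```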

The eSATA-card based pciehp testing while pcie_aspm=off:
Although 1c.7 and 11:00 were not suspended, an eject of the cold-plugged card went unnoticed.
As a result, /proc/iomem and /proc/interrupts report old values, as if the cold-boot
status did not change during the hot eject.
'lspci -vvv' reports 0xff values for a broken 11:00 entry, which covers only the very first
line of an entry (like lspci without extra verbosity).
Then, rmmod sata_sil24 removes just the driver association with the memory ranges
assigned to it, but the 11:00 device remains in /proc/iomem with its memory ranges.
At the same time /proc/interrupts claims IRQ 19 was released (if we can trust it).
In case it's unclear: due to those 0xff values one cannot squeeze any details out of lspci.
In dmesg only the rmmod caused some new lines (confirming the eject went unnoticed).
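
The 0xff values can also be checked without lspci. PCI config reads return all
ones when nothing answers at an address, so the first four bytes of the
device's sysfs 'config' file (vendor and device ID) show whether the device is
still reachable. This is a sketch only: config_reads_all_ones is a made-up
name, and the full sysfs path of the 11:00 device is left as a placeholder
because it is not given here.

```shell
#!/bin/sh
# Sketch only: config_reads_all_ones is a hypothetical helper.
# Succeeds (exit status 0) if the first 4 bytes of the given config-space
# file, i.e. vendor ID and device ID, all read back as 0xff.
config_reads_all_ones() {
    id=$(od -An -tx1 -N4 "$1" 2>/dev/null | tr -d ' \n')
    [ "$id" = "ffffffff" ]
}

# Example call (placeholder path, substitute the real device directory):
#   config_reads_all_ones /sys/bus/pci/devices/<device>/config && echo "device not responding"
```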

Subsequent hot insert of the card does not result in IRQ being obtained if we trust
/proc/interrupts claimed it was never released. But, lspci shows it received
IRQ 19 and compared to cold-booted state with a card inserted and driver loaded,
'Latency: 0, Cache Line Size: 64 bytes' does not appear anymore in 'lspci -vvv'
describing the hot plugged card. Actually, the whole line is gone.

During the hot insert no driver was loaded, so a manual 'modprobe sata_sil24' loads the
driver, but the driver already fails during its init with 'enabling device (0000 -> 0003)'
and 'failed to clear port RST' while it claims it is using IRQ 19.
Loading the driver caused the 'Latency: 0' line to pop up in 'lspci -vvv' for the 11:00 device.

Attaching a SATA drive to the eSATA card is not noticed anywhere.

A subsequent rescan-scsi-bus call from a shell does not help: sata_sil24 tries the same procedure
as during its initialization, which failed, so it fails again. The SATA disk attached to the card
is not detected and the link remains down.

I could have distilled more out of the log files but you wanted the answer to be brief,
so, in brief, it still doesn't work.

Because this goes only to you two and linux-pm, please let me know whether linux-pci
and linux-acpi should be updated on this. They got quite a lot of info over the last days/weeks,
and this email may reach many other people while missing those who were already involved.
I will leave it up to you. Just a note.

Martin
