From: "Rafael J. Wysocki" <rjw@sisk.pl>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Frans Pop <elendil@planet.nl>, Greg KH <greg@kroah.com>,
	Ingo Molnar <mingo@elte.hu>,
	jbarnes@virtuousgeek.org, lenb@kernel.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	tiwai@suse.de, Andrew Morton <akpm@linux-foundation.org>
Subject: Re: Regression from 2.6.26: Hibernation (possibly suspend) broken on Toshiba R500 (bisected)
Date: Thu, 4 Dec 2008 23:40:58 +0100
Message-ID: <200812042340.58694.rjw@sisk.pl>
In-Reply-To: <alpine.LFD.2.00.0812040754020.3256@nehalem.linux-foundation.org>

On Thursday, 4 of December 2008, Linus Torvalds wrote:
> 
> On Thu, 4 Dec 2008, Frans Pop wrote:
> 
> > On Wednesday 03 December 2008, Linus Torvalds wrote:
> > > Well, I think that what _would_ be generally correct, and actually
> > > pretty simple, is a rather different approach: just not sizing things
> > > behind a transparent bridge AT ALL, since it really shouldn't matter.
> > 
> > I've given your patch a try and the few resumes from STR I've done were 
> > all successful. That's not 100% conclusive yet, but a nice start.
> > Some info from logs etc. below.
> 
> Ok, but I thought you had a hard time reproducing this _anyway_, even with 
> just plain -rc7. No?
> 
> That said, of the various patches posted, the "don't bother allocating 
> bridging windows for transparent bridges" one is not just the simplest, 
> but the only one that actually makes sense so far.
> 
> So I'm happy it's apparently working for you, I'm just wondering about 
> whether your success means a lot. It seems that Rafael is the one who had 
> more failures?

That is most probably correct: I got a resume failure with that patch
applied, so it evidently doesn't fix the problem. :-(

> > > > Also, I would be happy to actually understand _why_ this happens.
> > >
> > > 100% agreed. I do _not_ see why it should ever matter how we set up a
> > > PCI bridging window - whether prefetchable or not - on a bridge that
> > > should be transparent. It sounds really odd. I'm wondering if there is
> > > something we're missing here.
> > 
> > The theory that it is really a resume issue and not a device layout issue 
> > sounds logical. Especially as everything always works correctly after a 
> > normal boot.
> 
> Yes, that does sound like a convincing argument. Usually real PCI resource 
> clashes result in some kind of run-time problems, and wouldn't necessarily 
> be suspend-specific per se.
> 
> That said, suspend/resume does a lot of unusual things, so it could still 
> be some odd PCI resource clash that only triggers problems in the 
> suspend/resume case. But since the exact layouts and the sizing of the 
> resources don't really seem to matter, a simple PCI resource clash seems 
> rather unlikely.

I agree.

That said, the "don't bother allocating bridging windows for transparent
bridges" patch resulted in the following layout on my box (from /proc/iomem):

88000000-8807ffff : 0000:00:02.1
88080000-88083fff : 0000:00:1b.0
  88080000-88083fff : ICH HD audio
88084000-88087fff : 0000:03:0b.1
88088000-88088fff : 0000:03:0b.0
  88088000-88088fff : yenta_socket
88089000-880897ff : 0000:03:0b.1
  88089000-880897ff : firewire_ohci
88089800-880898ff : 0000:03:0b.3
  88089800-880898ff : mmc0
8808a000-8808afff : Intel Flush Page
8c000000-8fffffff : PCI CardBus 0000:04
90000000-93ffffff : PCI CardBus 0000:04

while my "don't allocate bridging windows for cardbus bridges behind
transparent bridges" patch I've just sent (appended for easier reference)
results in the layout:

88000000-880fffff : PCI Bus 0000:03
  88000000-88003fff : 0000:03:0b.1
  88004000-88004fff : 0000:03:0b.0
    88004000-88004fff : yenta_socket
  88005000-880057ff : 0000:03:0b.1
    88005000-880057ff : firewire_ohci
  88005800-880058ff : 0000:03:0b.3
    88005800-880058ff : mmc0
88100000-8817ffff : 0000:00:02.1
88180000-88183fff : 0000:00:1b.0
  88180000-88183fff : ICH HD audio
88184000-88184fff : Intel Flush Page
8c000000-8fffffff : PCI CardBus 0000:04
90000000-93ffffff : PCI CardBus 0000:04

where devices behind the transparent bridge (PCI Bus 0000:03) are located
_before_ ICH HD audio in the memory address space, and this one appears to
work.  So there _may_ be an effect of the layout too.
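
The ordering difference between the two layouts can be checked mechanically.
The following is only a minimal Python sketch, not part of either patch: the
helper names (parse_iomem, lowest_start, bus03_before_audio) are hypothetical,
and the two strings are abbreviated copies of the /proc/iomem excerpts above.
It compares the lowest address of any device on bus 0000:03 with the address
of the HD audio device (0000:00:1b.0).

```python
import re

# Matches lines such as "  88080000-88083fff : ICH HD audio".
LINE_RE = re.compile(r"^(\s*)([0-9a-f]+)-([0-9a-f]+) : (.+)$")

# Abbreviated copies of the two layouts quoted in this message.
LAYOUT_TRANSPARENT_PATCH = """\
88000000-8807ffff : 0000:00:02.1
88080000-88083fff : 0000:00:1b.0
  88080000-88083fff : ICH HD audio
88084000-88087fff : 0000:03:0b.1
88088000-88088fff : 0000:03:0b.0
88089000-880897ff : 0000:03:0b.1
88089800-880898ff : 0000:03:0b.3
"""

LAYOUT_CARDBUS_PATCH = """\
88000000-880fffff : PCI Bus 0000:03
  88000000-88003fff : 0000:03:0b.1
  88004000-88004fff : 0000:03:0b.0
  88005000-880057ff : 0000:03:0b.1
  88005800-880058ff : 0000:03:0b.3
88100000-8817ffff : 0000:00:02.1
88180000-88183fff : 0000:00:1b.0
  88180000-88183fff : ICH HD audio
"""


def parse_iomem(text):
    """Return (depth, start, end, name) tuples; depth is the nesting level."""
    entries = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        indent, start, end, name = m.groups()
        entries.append((len(indent) // 2, int(start, 16), int(end, 16), name))
    return entries


def lowest_start(entries, predicate):
    """Lowest start address among entries whose name satisfies predicate."""
    return min(start for _, start, _, name in entries if predicate(name))


def bus03_before_audio(text):
    """True if every bus 0000:03 device sits below the HD audio BAR."""
    entries = parse_iomem(text)
    bus03 = lowest_start(entries, lambda n: n.startswith("0000:03:"))
    audio = lowest_start(entries, lambda n: n == "0000:00:1b.0")
    return bus03 < audio


print(bus03_before_audio(LAYOUT_TRANSPARENT_PATCH))
print(bus03_before_audio(LAYOUT_CARDBUS_PATCH))
```

This only compares addresses; it says nothing about why the ordering would
matter for resume.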

> So some kind of resume-time ordering or timing issue does seem like the 
> most likely thing. But that still leaves us not knowing what the real 
> _root_ cause of this all is - very irritating. Even if not allocating the 
> unnecessary bridging windows "fixes" things, it would be really really 
> good to know exactly what it is that causes problems.

Well, given that both affected boxes have the same chipset (945GM), I seriously
suspect some nastiness in that chipset that we're not aware of, especially
since the problem is not reproducible without snd_hda_intel (at least on my
box).

> > Below info from 3 kernels, all based on 2.6.28-rc7-91:
> > A) unpatched
> > B) with the revert/debug patch
> > C) with the oneliner "ignore transparent bridges" patch
> > 
> > AFAICT all results are probably as expected.
> > 
> > From lspci -vvxxx:
> > 00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge
> > - for A)
> > 	I/O behind bridge: 00003000-00003fff
> > 	Memory behind bridge: e0100000-e03fffff
> > 	Prefetchable memory behind bridge: 0000000080000000-0000000083ffffff
> > - for B)
> > 	I/O behind bridge: 00003000-00003fff
> > 	Memory behind bridge: e0100000-e03fffff
> > - for C)
> > 	Memory behind bridge: e0100000-e03fffff
> 
> And this all makes total sense. The e0100000-e03fffff MMIO bridge range is 
> apparently set up by the firmware, which is why it shows up in all cases. 
> And the (A) case has that prefetchable memory range, because that's the 
> only case that finds - and cares about - the prefetch window for the 
> CardBus controller. 
> 
> And both (A) and (B) have the IO bridging window, because regardless of 
> whether we see a valid CardBus prefetchable memory window with good 
> alignment, we'll always see the IO ports, so we'll try to allocate that 
> bridging window, except in (C) when we decide that due to the transparent 
> nature, we simply don't care.
> 
> So the PCI resources make sense in all three cases, and we understand 
> those. The differences in the actual Cardbus ranges also all make sense. 
> So it all still boils down to the PCI layer doing everything right in 
> _all_ cases, just making slightly different - but all valid - choices 
> depending on essentially random details (eg the revert/debug patch case 
> the "random detail" is just enabling a small incorrect alignment).
> 
> IOW, it really doesn't look like a PCI resource allocator bug. Quite the 
> reverse, I'd say that in the end this whole thread points out just how 
> robust the whole PCI and cardbus resource allocation is, with the code 
> really very gracefully just adjusting in a sane manner to all these 
> different cases.
> 
> Of course, none of that helps us with any kind of idea of what the real 
> problem is. Device ordering bug in setting up PCI resources at resume? 
> Perhaps just a plain bug in PCI bridge resume code (even when you resume 
> things in the right order)?
> 
> And I still worry that perhaps it's just a timing bug, where having a PCI 
> bridging window changes timing of various PCI accesses, and the _real_ bug 
> is actually in the sound card or ethernet driver resume, which happens to 
> work with one timing and not with another.
> 
> Since it's apparently STR, has anybody gotten _anything_ sane out of 
> trying to enable PM_TRACE_RTC, and then doing that 
> 
> 	echo 1 > /sys/power/pm_trace
> 
> because even with the (very limited) set of standard trace-points, it 
> should still be able to tell which device we were trying to resume last in 
> the failure case. Maybe that gives some hint?

Well, I think more fine-grained debugging will be necessary.

I've already checked the resume ordering of PCI devices on my box and it
is the following:

pci:0000:00:00.0
pci:0000:00:02.0 <- graphics
pci:0000:00:02.1 <- graphics
pci:0000:00:1b.0 <- snd_hda_intel
pci:0000:00:1c.0 <- PCI Express port 1
pci:0000:00:1c.2 <- PCI Express port 3
pci:0000:00:1d.0 <- USB UHCI
pci:0000:00:1d.1 <- USB UHCI
pci:0000:00:1d.2 <- USB UHCI
pci:0000:00:1d.3 <- USB UHCI
pci:0000:00:1d.7 <- USB EHCI
pci:0000:00:1e.0 <- transparent bridge (Intel Corporation 82801 Mobile PCI Bridge)
pci:0000:00:1f.0 <- ISA bridge
pci:0000:00:1f.2 <- SATA (ahci)
pci:0000:01:00.0 <- e1000e
No Bus:0000:01
pci:0000:02:00.0 <- wireless (iwlagn)
No Bus:0000:02
pci:0000:03:0b.0 <- cardbus bridge
pci:0000:03:0b.1 <- FireWire
pci:0000:03:0b.3 <- SD Host controller (Texas Instruments)
No Bus:0000:04
No Bus:0000:03

So, snd_hda_intel resumes before all of the bridges and the layout of devices
_behind_ the transparent bridge shouldn't affect it at all.
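
The ordering claim can be verified against the list above with a short
script. This is a minimal Python sketch under stated assumptions: the helper
names are hypothetical, the "No Bus" lines are left out, and a device counts
as a bridge only if its annotation in the list says so (PCI Express ports,
the transparent bridge, the ISA bridge and the cardbus bridge).

```python
# Resume order as listed in this message: (PCI address, annotation).
RESUME_ORDER = [
    ("0000:00:00.0", ""),
    ("0000:00:02.0", "graphics"),
    ("0000:00:02.1", "graphics"),
    ("0000:00:1b.0", "snd_hda_intel"),
    ("0000:00:1c.0", "PCI Express port 1"),
    ("0000:00:1c.2", "PCI Express port 3"),
    ("0000:00:1d.0", "USB UHCI"),
    ("0000:00:1d.1", "USB UHCI"),
    ("0000:00:1d.2", "USB UHCI"),
    ("0000:00:1d.3", "USB UHCI"),
    ("0000:00:1d.7", "USB EHCI"),
    ("0000:00:1e.0", "transparent bridge"),
    ("0000:00:1f.0", "ISA bridge"),
    ("0000:00:1f.2", "SATA (ahci)"),
    ("0000:01:00.0", "e1000e"),
    ("0000:02:00.0", "wireless (iwlagn)"),
    ("0000:03:0b.0", "cardbus bridge"),
    ("0000:03:0b.1", "FireWire"),
    ("0000:03:0b.3", "SD Host controller"),
]

# Assumption: these substrings identify the bridges in the annotations.
BRIDGE_LABELS = ("PCI Express port", "transparent bridge",
                 "ISA bridge", "cardbus bridge")


def position(order, address):
    """Index of a PCI address in the resume order."""
    return [addr for addr, _ in order].index(address)


def bridges(order):
    """Addresses whose annotation marks them as a bridge."""
    return [addr for addr, label in order
            if any(key in label for key in BRIDGE_LABELS)]


def resumes_before_all_bridges(order, address):
    """True if the device resumes earlier than every annotated bridge."""
    pos = position(order, address)
    return all(pos < position(order, b) for b in bridges(order))


print(resumes_before_all_bridges(RESUME_ORDER, "0000:00:1b.0"))
```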

Thanks,
Rafael
