linux-tegra.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH v3 1/3] ARM: add cpufreq transiton notifier to adjust loops_per_jiffy for smp
       [not found]   ` <1344577046-14847-2-git-send-email-shawn.guo-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
@ 2012-09-06 22:35     ` Stephen Warren
       [not found]       ` <5049252D.9000802-3lzwWm7+Weoh9ZMKESR00Q@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Stephen Warren @ 2012-09-06 22:35 UTC (permalink / raw)
  To: Shawn Guo
  Cc: Kevin Hilman, Nishanth Menon, Mike Turquette, Richard Zhao,
	linux-pm-u79uwXL29TY76Z2rM5mHXA,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ, Mark Brown,
	cpufreq-u79uwXL29TY76Z2rM5mHXA, Rafael J. Wysocki,
	Shilimkar Santosh, Rob Herring,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Russell King - ARM Linux, Richard Zhao,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On 08/09/2012 11:37 PM, Shawn Guo wrote:
> From: Richard Zhao <richard.zhao-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
> 
> If CONFIG_SMP, cpufreq skips loops_per_jiffy update, because different
> arch has different per-cpu loops_per_jiffy definition.
> 
> Signed-off-by: Richard Zhao <richard.zhao-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
> Acked-by: Russell King <rmk+kernel-lFZ/pmaqli7XmaaqVzeoHQ@public.gmane.org>
> Acked-by: Santosh Shilimkar <santosh.shilimkar-l0cyMroinI0@public.gmane.org>
> Signed-off-by: Shawn Guo <shawn.guo-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>

I believe this patch is causing issues initializing PCI-Express on Tegra.

In next-20120906, I cold-booted 10 times. 3 times, PCIe initialized OK,
and 7 times, the driver timed out in arch/arm/mach-tegra/pcie.c function
tegra_pcie_check_link(). With this patch reverted, another 10 cold boot
attempts all succeeded just fine. Similarly, the regression appeared in
next-20120905, and I isolated it to arch/arm/kernel/, and this is the
only patch in that directory between next-20120904 and next-20120905.

Do you have any idea what the problem might be?

Looking at the timestamps in dmesg in the failing case, the driver is
waiting the expected (per pcie.c driver code) 1.2 seconds before giving
up on the port, although I suppose if the kernel's idea of real-time is
off, then the dmesg log timestamps might be off too.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 1/3] ARM: add cpufreq transiton notifier to adjust loops_per_jiffy for smp
       [not found]       ` <5049252D.9000802-3lzwWm7+Weoh9ZMKESR00Q@public.gmane.org>
@ 2012-09-07  2:58         ` Shawn Guo
       [not found]           ` <20120907025847.GI26709-rvtDTF3kK1ictlrPMvKcciBecyulp+rMXqFh9Ls21Oc@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Shawn Guo @ 2012-09-07  2:58 UTC (permalink / raw)
  To: Stephen Warren
  Cc: Rafael J. Wysocki, Kevin Hilman, Nishanth Menon, Mike Turquette,
	Russell King - ARM Linux, linux-pm-u79uwXL29TY76Z2rM5mHXA,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ, Mark Brown,
	cpufreq-u79uwXL29TY76Z2rM5mHXA, Shilimkar Santosh, Rob Herring,
	Richard Zhao, Richard Zhao,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Thu, Sep 06, 2012 at 04:35:25PM -0600, Stephen Warren wrote:
> I believe this patch is causing issues initializing PCI-Express on Tegra.
> 
> In next-20120906, I cold-booted 10 times. 3 times, PCIe initialized OK,
> and 7 times, the driver timed out in arch/arm/mach-tegra/pcie.c function
> tegra_pcie_check_link(). With this patch reverted, another 10 cold boot
> attempts all succeeded just fine. Similarly, the regression appeared in
> next-20120905, and I isolated it to arch/arm/kernel/, and this is the
> only patch in that directory between next-20120904 and next-20120905.
> 
> Do you have any idea what the problem might be?
> 
> Looking at the timestamps in dmesg in the failing case, the driver is
> waiting the expected (per pcie.c driver code) 1.2 seconds before giving
> up on the port, although I suppose if the kernel's idea of real-time is
> off, then the dmesg log timestamps might be off too.

Just for identifying the problem, can you test the following change to
see if it fixes the failure.

Regards,
Shawn

diff --git a/arch/arm/mach-tegra/cpu-tegra.c b/arch/arm/mach-tegra/cpu-tegra.c
index ceb52db..a3bbdc9 100644
--- a/arch/arm/mach-tegra/cpu-tegra.c
+++ b/arch/arm/mach-tegra/cpu-tegra.c
@@ -247,5 +247,5 @@ static void __exit tegra_cpufreq_exit(void)
 MODULE_AUTHOR("Colin Cross <ccross-z5hGa2qSFaRBDgjK7y7TUQ@public.gmane.org>");
 MODULE_DESCRIPTION("cpufreq driver for Nvidia Tegra2");
 MODULE_LICENSE("GPL");
-module_init(tegra_cpufreq_init);
+late_init(tegra_cpufreq_init);
 module_exit(tegra_cpufreq_exit);

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 1/3] ARM: add cpufreq transiton notifier to adjust loops_per_jiffy for smp
       [not found]           ` <20120907025847.GI26709-rvtDTF3kK1ictlrPMvKcciBecyulp+rMXqFh9Ls21Oc@public.gmane.org>
@ 2012-09-07 16:55             ` Stephen Warren
  2012-09-10  1:17               ` Shawn Guo
  0 siblings, 1 reply; 7+ messages in thread
From: Stephen Warren @ 2012-09-07 16:55 UTC (permalink / raw)
  To: Shawn Guo
  Cc: Rafael J. Wysocki, Kevin Hilman, Nishanth Menon, Mike Turquette,
	Russell King - ARM Linux, linux-pm-u79uwXL29TY76Z2rM5mHXA,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ, Mark Brown,
	cpufreq-u79uwXL29TY76Z2rM5mHXA, Shilimkar Santosh, Rob Herring,
	Richard Zhao, Richard Zhao,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On 09/06/2012 08:58 PM, Shawn Guo wrote:
> On Thu, Sep 06, 2012 at 04:35:25PM -0600, Stephen Warren wrote:
>> I believe this patch is causing issues initializing PCI-Express on Tegra.
>>
>> In next-20120906, I cold-booted 10 times. 3 times, PCIe initialized OK,
>> and 7 times, the driver timed out in arch/arm/mach-tegra/pcie.c function
>> tegra_pcie_check_link(). With this patch reverted, another 10 cold boot
>> attempts all succeeded just fine. Similarly, the regression appeared in
>> next-20120905, and I isolated it to arch/arm/kernel/, and this is the
>> only patch in that directory between next-20120904 and next-20120905.
>>
>> Do you have any idea what the problem might be?
>>
>> Looking at the timestamps in dmesg in the failing case, the driver is
>> waiting the expected (per pcie.c driver code) 1.2 seconds before giving
>> up on the port, although I suppose if the kernel's idea of real-time is
>> off, then the dmesg log timestamps might be off too.
> 
> Just for identifying the problem, can you test the following change to
> see if it fixes the failure.

Yes, that does solve the problem (well, with s/late_init/late_initcall/).

As you imply, that change wouldn't help if cpu-tegra.c was built as a
module. So, is this something you can work around in your patch?

Thanks for the quick response.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 1/3] ARM: add cpufreq transiton notifier to adjust loops_per_jiffy for smp
  2012-09-07 16:55             ` Stephen Warren
@ 2012-09-10  1:17               ` Shawn Guo
       [not found]                 ` <20120910011741.GP26709-rvtDTF3kK1ictlrPMvKcciBecyulp+rMXqFh9Ls21Oc@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Shawn Guo @ 2012-09-10  1:17 UTC (permalink / raw)
  To: Stephen Warren
  Cc: Rafael J. Wysocki, Kevin Hilman, Nishanth Menon, Mike Turquette,
	Russell King - ARM Linux, linux-pm, devicetree-discuss,
	Mark Brown, cpufreq, Shilimkar Santosh, Rob Herring, Richard Zhao,
	Richard Zhao, linux-arm-kernel, linux-tegra@vger.kernel.org

On Fri, Sep 07, 2012 at 10:55:00AM -0600, Stephen Warren wrote:
> Yes, that does solve the problem (well, with s/late_init/late_initcall/).
> 
> As you imply, that change wouldn't help if cpu-tegra.c was built as a
> module.

I doubt that.  Have you confirmed it with testing?  When you install
module cpu-tegra, the pcie initialization has been done, right?

> So, is this something you can work around in your patch?
> 
Before we head to a solution, we need to identify the cause.

One thing I note is there is no loops_per_jiffy updating in cpu-tegra
driver.  But if I understand correctly, the driver is running on
MP system.  It should update loops_per_jiffy when cpu frequency
changes, right?  I'm worrying about that the timeout in tegra pcie
driver just happens to work with a wrong loops_per_jiffy, but becomes
broken when loops_per_jiffy gets correct.

-- 
Regards,
Shawn

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 1/3] ARM: add cpufreq transiton notifier to adjust loops_per_jiffy for smp
       [not found]                 ` <20120910011741.GP26709-rvtDTF3kK1ictlrPMvKcciBecyulp+rMXqFh9Ls21Oc@public.gmane.org>
@ 2012-09-10 17:17                   ` Stephen Warren
       [not found]                     ` <504E20BE.8050107-3lzwWm7+Weoh9ZMKESR00Q@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Stephen Warren @ 2012-09-10 17:17 UTC (permalink / raw)
  To: Shawn Guo, Prashant Gaikwad
  Cc: Rafael J. Wysocki, Kevin Hilman, Nishanth Menon, Mike Turquette,
	Russell King - ARM Linux, linux-pm-u79uwXL29TY76Z2rM5mHXA,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ, Mark Brown,
	cpufreq-u79uwXL29TY76Z2rM5mHXA, Shilimkar Santosh, Rob Herring,
	Richard Zhao, Richard Zhao,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On 09/09/2012 07:17 PM, Shawn Guo wrote:
> On Fri, Sep 07, 2012 at 10:55:00AM -0600, Stephen Warren wrote:
>> Yes, that does solve the problem (well, with s/late_init/late_initcall/).
>>
>> As you imply, that change wouldn't help if cpu-tegra.c was built as a
>> module.
> 
> I doubt that.  Have you confirmed it with testing?  When you install
> module cpu-tegra, the pcie initialization has been done, right?

Never mind, the code can't be built as a module anyway.

Aside from that, I misinterpreted your test patch, and thought that it
was moving the Tegra cpufreq driver initialization earlier, but it's
actually moving it later, and in fact by chance after PCIe
initialization, which explains why it solves the issue.

I think the root of the problem is that cpufreq is lowering the CPU
frequency, yet the patch which converted Tegra to the common clock
framework removed the code that actually changes the CPU clock rate. So,
cpufreq thinks the CPU is running at e.g. 216MHz, but it's actually
still running at 1.0GHz, and hence re-calculating the delay loops breaks
things, since delays aren't as long as the system thinks they are. The
variability is due to whether lowering the CPU frequency just happens to
occur before or after the PCIe controller is initialized.

Prashant, are you able to fix the clock driver deficiency within the
next 2-3 days or so (so I can include the fix in the pull requests I
send for 3.7, which need to be sent before the end of the week)? Or, do
we need to disable cpufreq for Tegra because of this?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 1/3] ARM: add cpufreq transiton notifier to adjust loops_per_jiffy for smp
       [not found]                     ` <504E20BE.8050107-3lzwWm7+Weoh9ZMKESR00Q@public.gmane.org>
@ 2012-09-11 11:59                       ` Prashant Gaikwad
       [not found]                         ` <504F2789.1070006-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Prashant Gaikwad @ 2012-09-11 11:59 UTC (permalink / raw)
  To: Stephen Warren
  Cc: Shawn Guo, Rafael J. Wysocki, Kevin Hilman, Nishanth Menon,
	Mike Turquette, Russell King - ARM Linux,
	linux-pm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ@public.gmane.org,
	Mark Brown, cpufreq-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Shilimkar Santosh, Rob Herring, Richard Zhao, Richard Zhao,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Monday 10 September 2012 10:47 PM, Stephen Warren wrote:
> On 09/09/2012 07:17 PM, Shawn Guo wrote:
>> On Fri, Sep 07, 2012 at 10:55:00AM -0600, Stephen Warren wrote:
>>> Yes, that does solve the problem (well, with s/late_init/late_initcall/).
>>>
>>> As you imply, that change wouldn't help if cpu-tegra.c was built as a
>>> module.
>> I doubt that.  Have you confirmed it with testing?  When you install
>> module cpu-tegra, the pcie initialization has been done, right?
> Never mind, the code can't be built as a module anyway.
>
> Aside from that, I misinterpreted your test patch, and thought that it
> was moving the Tegra cpufreq driver initialization earlier, but it's
> actually moving it later, and in fact by chance after PCIe
> initialization, which explains why it solves the issue.
>
> I think the root of the problem is that cpufreq is lowering the CPU
> frequency, yet the patch which converted Tegra to the common clock
> framework removed the code that actually changes the CPU clock rate. So,
> cpufreq thinks the CPU is running at e.g. 216MHz, but it's actually
> still running at 1.0GHz, and hence re-calculating the delay loops breaks
> things, since delays aren't as long as the system thinks they are. The
> variability is due to whether lowering the CPU frequency just happens to
> occur before or after the PCIe controller is initialized.
>
> Prashant, are you able to fix the clock driver deficiency within the
> next 2-3 days or so (so I can include the fix in the pull requests I
> send for 3.7, which need to be sent before the end of the week)? Or, do
> we need to disable cpufreq for Tegra because of this?

Your fix looks good to me except the concern I have mentioned in reply to that patch.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 1/3] ARM: add cpufreq transiton notifier to adjust loops_per_jiffy for smp
       [not found]                         ` <504F2789.1070006-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
@ 2012-09-11 15:16                           ` Stephen Warren
  0 siblings, 0 replies; 7+ messages in thread
From: Stephen Warren @ 2012-09-11 15:16 UTC (permalink / raw)
  To: Prashant Gaikwad
  Cc: Shawn Guo, Rafael J. Wysocki, Kevin Hilman, Nishanth Menon,
	Mike Turquette, Russell King - ARM Linux,
	linux-pm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ@public.gmane.org,
	Mark Brown, cpufreq-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Shilimkar Santosh, Rob Herring, Richard Zhao, Richard Zhao,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On 09/11/2012 05:59 AM, Prashant Gaikwad wrote:
> On Monday 10 September 2012 10:47 PM, Stephen Warren wrote:
>> On 09/09/2012 07:17 PM, Shawn Guo wrote:
>>> On Fri, Sep 07, 2012 at 10:55:00AM -0600, Stephen Warren wrote:
>>>> Yes, that does solve the problem (well, with
>>>> s/late_init/late_initcall/).
>>>>
>>>> As you imply, that change wouldn't help if cpu-tegra.c was built as a
>>>> module.
>>> I doubt that.  Have you confirmed it with testing?  When you install
>>> module cpu-tegra, the pcie initialization has been done, right?
>> Never mind, the code can't be built as a module anyway.
>>
>> Aside from that, I misinterpreted your test patch, and thought that it
>> was moving the Tegra cpufreq driver initialization earlier, but it's
>> actually moving it later, and in fact by chance after PCIe
>> initialization, which explains why it solves the issue.
>>
>> I think the root of the problem is that cpufreq is lowering the CPU
>> frequency, yet the patch which converted Tegra to the common clock
>> framework removed the code that actually changes the CPU clock rate. So,
>> cpufreq thinks the CPU is running at e.g. 216MHz, but it's actually
>> still running at 1.0GHz, and hence re-calculating the delay loops breaks
>> things, since delays aren't as long as the system thinks they are. The
>> variability is due to whether lowering the CPU frequency just happens to
>> occur before or after the PCIe controller is initialized.
>>
>> Prashant, are you able to fix the clock driver deficiency within the
>> next 2-3 days or so (so I can include the fix in the pull requests I
>> send for 3.7, which need to be sent before the end of the week)? Or, do
>> we need to disable cpufreq for Tegra because of this?
> 
> Your fix looks good to me except the concern I have mentioned in reply
> to that patch.

Well, the cpufreq driver will need explicit knowledge of
Tega20-vs-Tegra30 anyway, due to e.g. differing CPU clock names,
probable different latencies, etc.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2012-09-11 15:16 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <1344577046-14847-1-git-send-email-shawn.guo@linaro.org>
     [not found] ` <1344577046-14847-2-git-send-email-shawn.guo@linaro.org>
     [not found]   ` <1344577046-14847-2-git-send-email-shawn.guo-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
2012-09-06 22:35     ` [PATCH v3 1/3] ARM: add cpufreq transiton notifier to adjust loops_per_jiffy for smp Stephen Warren
     [not found]       ` <5049252D.9000802-3lzwWm7+Weoh9ZMKESR00Q@public.gmane.org>
2012-09-07  2:58         ` Shawn Guo
     [not found]           ` <20120907025847.GI26709-rvtDTF3kK1ictlrPMvKcciBecyulp+rMXqFh9Ls21Oc@public.gmane.org>
2012-09-07 16:55             ` Stephen Warren
2012-09-10  1:17               ` Shawn Guo
     [not found]                 ` <20120910011741.GP26709-rvtDTF3kK1ictlrPMvKcciBecyulp+rMXqFh9Ls21Oc@public.gmane.org>
2012-09-10 17:17                   ` Stephen Warren
     [not found]                     ` <504E20BE.8050107-3lzwWm7+Weoh9ZMKESR00Q@public.gmane.org>
2012-09-11 11:59                       ` Prashant Gaikwad
     [not found]                         ` <504F2789.1070006-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
2012-09-11 15:16                           ` Stephen Warren

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).