* Re: Fwd: next boot: 101 boots: 89 pass, 12 fail (next-20141216)
[not found] ` <CAMAWPa98kyaUJpt=GLw4cH7fkg+8BkrA0kds=H-2-4fWXy_tqw@mail.gmail.com>
@ 2014-12-16 21:03 ` Nishanth Menon
2014-12-17 4:07 ` Viresh Kumar
0 siblings, 1 reply; 11+ messages in thread
From: Nishanth Menon @ 2014-12-16 21:03 UTC (permalink / raw)
To: Kevin Hilman, Tony Lindgren, Viresh Kumar
Cc: linux-omap, linux-pm@vger.kernel.org
+Viresh and linux-pm
On 12/16/2014 10:59 AM, Kevin Hilman wrote:
> FYI... New boot failures in today's linux-next for omap4-panda-es and
> omap5-uevm:
>
> http://status.armcloud.us/boot/?lab-khilman&fail&omap
http://storage.armcloud.us/kernel-ci/next/next-20141216/arm-multi_v7_defconfig/lab-khilman/boot-omap5-uevm.html
[ 2.071996] ------------[ cut here ]------------
[ 2.076831] kernel BUG at ../drivers/cpufreq/cpufreq.c:1258!
[ 2.082753] Internal error: Oops - BUG: 0 [#1] SMP ARM
http://storage.armcloud.us/kernel-ci/next/next-20141216/arm-multi_v7_defconfig/lab-khilman/boot-omap4-panda-es.html
[ 2.109588] ------------[ cut here ]------------
[ 2.114410] kernel BUG at ../drivers/cpufreq/cpufreq.c:1258!
[ 2.120330] Internal error: Oops - BUG: 0 [#1] SMP ARM
>
> The omap3-beagle-xm one has been fixed by Tero, but the fix hasn't hit
> -next yet. I haven't had a chance to bisect these yet.
>
> Kevin
>
>
> ---------- Forwarded message ----------
> From: Kevin's boot bot <khilman@kernel.org>
> Date: Mon, Dec 15, 2014 at 10:28 PM
> Subject: next boot: 101 boots: 89 pass, 12 fail (next-20141216)
> To: kernel-build-reports@lists.linaro.org
>
>
> Full Build report: http://status.armcloud.us/build/next/kernel/next-20141216/
> Full Boot report:
> http://status.armcloud.us/boot/all/job/next/kernel/next-20141216/
>
> Tree/Branch: next
> Git describe: next-20141216
>
> Failed boot tests
> =================
> emev2-kzm9d: FAIL: arm-shmobile_defconfig
> u-boot: ERROR: timeout getting
> DHCP address.
>
> http://storage.armcloud.us/kernel-ci/next/next-20141216/arm-shmobile_defconfig/lab-khilman/boot-emev2-kzm9d.html
> exynos5420-arndale-octa: FAIL:
> arm-multi_v7_defconfig+CONFIG_ARM_LPAE=y
> kernel: ERROR: failed to boot:
> <class 'pexpect.TIMEOUT'>
>
> http://storage.armcloud.us/kernel-ci/next/next-20141216/arm-multi_v7_defconfig+CONFIG_ARM_LPAE=y/lab-khilman/boot-exynos5420-arndale-octa.html
> exynos5800-peach-pi: FAIL:
> arm-multi_v7_defconfig+CONFIG_ARM_LPAE=y
> kernel: ERROR: failed to boot:
> <class 'pexpect.TIMEOUT'>
>
> http://storage.armcloud.us/kernel-ci/next/next-20141216/arm-multi_v7_defconfig+CONFIG_ARM_LPAE=y/lab-khilman/boot-exynos5800-peach-pi.html
> omap5-uevm: FAIL:
> arm-multi_v7_defconfig+CONFIG_ARM_LPAE=y
> kernel: Unable to handle kernel
> paging request at virtual address ffffffec
>
>
> http://storage.armcloud.us/kernel-ci/next/next-20141216/arm-multi_v7_defconfig+CONFIG_ARM_LPAE=y/lab-khilman/boot-omap5-uevm.html
> exynos5800-peach-pi: FAIL: arm-multi_v7_defconfig
> kernel: ERROR: failed to boot:
> <class 'pexpect.TIMEOUT'>
>
> http://storage.armcloud.us/kernel-ci/next/next-20141216/arm-multi_v7_defconfig/lab-khilman/boot-exynos5800-peach-pi.html
> exynos5420-arndale-octa: FAIL: arm-multi_v7_defconfig
> kernel: ERROR: failed to boot:
> <class 'pexpect.TIMEOUT'>
>
> http://storage.armcloud.us/kernel-ci/next/next-20141216/arm-multi_v7_defconfig/lab-khilman/boot-exynos5420-arndale-octa.html
> omap5-uevm: FAIL: arm-multi_v7_defconfig
> kernel: Unable to handle kernel
> paging request at virtual address ffffffec
>
>
> http://storage.armcloud.us/kernel-ci/next/next-20141216/arm-multi_v7_defconfig/lab-khilman/boot-omap5-uevm.html
> omap4-panda-es: FAIL: arm-multi_v7_defconfig
> kernel: ERROR: failed to boot:
> Kernel panic
>
> http://storage.armcloud.us/kernel-ci/next/next-20141216/arm-multi_v7_defconfig/lab-khilman/boot-omap4-panda-es.html
> exynos5420-arndale-octa: FAIL: arm-exynos_defconfig
> kernel: ERROR: failed to boot:
> <class 'pexpect.TIMEOUT'>
>
> http://storage.armcloud.us/kernel-ci/next/next-20141216/arm-exynos_defconfig/lab-khilman/boot-exynos5420-arndale-octa.html
> exynos5800-peach-pi: FAIL: arm-exynos_defconfig
> kernel: ERROR: failed to boot:
> <class 'pexpect.TIMEOUT'>
>
> http://storage.armcloud.us/kernel-ci/next/next-20141216/arm-exynos_defconfig/lab-khilman/boot-exynos5800-peach-pi.html
> exynos5410-odroid-xu: FAIL: arm-exynos_defconfig
> ERROR: Timeout waiting for
> command: if test -n ${preboot}; then run preboot; fi.
>
> http://storage.armcloud.us/kernel-ci/next/next-20141216/arm-exynos_defconfig/lab-khilman/boot-exynos5410-odroid-xu.html
> omap3-beagle-xm,legacy: FAIL: arm-omap2plus_defconfig
> kernel: ERROR: did not start booting.
>
> http://storage.armcloud.us/kernel-ci/next/next-20141216/arm-omap2plus_defconfig/lab-khilman/boot-omap3-beagle-xm,legacy.html
>
> Full Report
> ===========
>
> arm-shmobile_defconfig
> ----------------------
> emev2-kzm9d: FAIL - u-boot: ERROR: timeout
> getting DHCP address.
>
> arm-davinci_all_defconfig
> -------------------------
> dm365evm,legacy: PASS
> da850-evm: PASS
>
> arm-tegra_defconfig
> -------------------
> tegra124-jetson-tk1: PASS
> tegra30-beaver: PASS
>
> arm-bcm2835_defconfig
> ---------------------
> bcm2835-rpi: PASS
>
> arm-multi_v7_defconfig+CONFIG_CPU_BIG_ENDIAN=y
> ----------------------------------------------
> armada-xp-openblocks-ax3-4: PASS
> armada-370-mirabox: PASS
>
> arm-multi_v7_defconfig+CONFIG_ARM_LPAE=y
> ----------------------------------------
> tegra124-jetson-tk1: PASS
> exynos5410-odroid-xu: PASS
> sun7i-a20-cubieboard2: PASS
> exynos5420-arndale-octa: FAIL - kernel: ERROR: failed
> to boot: <class 'pexpect.TIMEOUT'>
> armada-xp-openblocks-ax3-4: PASS
> exynos5800-peach-pi: FAIL - kernel: ERROR: failed
> to boot: <class 'pexpect.TIMEOUT'>
> sun7i-a20-bananapi: PASS
> omap5-uevm: FAIL (Warnings: 1) - kernel:
> Unable to handle kernel paging request at virtual address ffffffec
>
> exynos5422-odroid-xu3: PASS (Warnings: 1)
> exynos5250-arndale: PASS
> rk3288-evb-rk808: PASS
> vexpress-v2p-ca15: PASS
>
> arm-imx_v6_v7_defconfig
> -----------------------
> imx6dl-wandboard,wand-dual: PASS
> imx6dl-wandboard,wand-solo: PASS
> imx6q-wandboard: PASS
>
> arm-versatile_defconfig
> -----------------------
> versatilepb,legacy: PASS
>
> arm-multi_v7_defconfig
> ----------------------
> am335x-boneblack: PASS
> qcom-msm8974-sony-xperia-honami: PASS
> armada-370-mirabox: PASS
> sun4i-a10-cubieboard: PASS
> omap3-overo-tobi: PASS
> am335x-bone: PASS
> sun7i-a20-bananapi: PASS
> tegra124-jetson-tk1: PASS
> omap3-beagle-xm: PASS
> sun7i-a20-cubieboard2: PASS
> exynos5410-odroid-xu: PASS
> omap4-panda: PASS
> imx6q-wandboard: PASS
> imx6dl-wandboard,wand-dual: PASS
> omap3-beagle: PASS
> ste-snowball: PASS
> tegra30-beaver: PASS
> omap3-n900: PASS
> qcom-apq8074-dragonboard: PASS
> qcom-apq8064-cm-qs600: PASS
> bcm28155-ap: PASS
> exynos5800-peach-pi: FAIL - kernel: ERROR: failed
> to boot: <class 'pexpect.TIMEOUT'>
> imx6dl-wandboard,wand-solo: PASS
> omap3-overo-storm-tobi: PASS
> exynos5422-odroid-xu3: PASS (Warnings: 1)
> rk3288-evb-rk808: PASS
> qcom-apq8064-ifc6410: PASS
> stih410-b2120: PASS
> exynos5420-arndale-octa: FAIL - kernel: ERROR: failed
> to boot: <class 'pexpect.TIMEOUT'>
> armada-xp-openblocks-ax3-4: PASS
> vexpress-v2p-ca9: PASS
> vexpress-v2p-ca15: PASS
> omap5-uevm: FAIL (Warnings: 1) - kernel:
> Unable to handle kernel paging request at virtual address ffffffec
>
> omap4-panda-es: FAIL (Warnings: 1) - kernel:
> ERROR: failed to boot: Kernel panic
> exynos5250-arndale: PASS
> zynq-zc702: PASS
> am437x-gp-evm: PASS (Warnings: 1)
>
> arm-vexpress_defconfig
> ----------------------
> vexpress-v2p-ca9: PASS
> vexpress-v2p-ca15: PASS
>
> arm-sunxi_defconfig
> -------------------
> sun7i-a20-bananapi: PASS
> sun7i-a20-cubieboard2: PASS
> sun4i-a10-cubieboard: PASS
>
> arm-qcom_defconfig
> ------------------
> qcom-apq8064-cm-qs600: PASS
> qcom-apq8064-ifc6410: PASS
> qcom-msm8974-sony-xperia-honami: PASS
> qcom-apq8074-dragonboard: PASS
>
> arm-u8500_defconfig
> -------------------
> ste-snowball: PASS
>
> arm-exynos_defconfig
> --------------------
> exynos5420-arndale-octa: FAIL (Warnings: 1) - kernel:
> ERROR: failed to boot: <class 'pexpect.TIMEOUT'>
> exynos5422-odroid-xu3: PASS (Warnings: 1)
> exynos5250-arndale: PASS
> exynos5800-peach-pi: FAIL - kernel: ERROR: failed
> to boot: <class 'pexpect.TIMEOUT'>
> exynos5410-odroid-xu: FAIL - ERROR: Timeout waiting
> for command: if test -n ${preboot}; then run preboot; fi.
>
> arm-mvebu_v7_defconfig+CONFIG_CPU_BIG_ENDIAN=y
> ----------------------------------------------
> armada-xp-openblocks-ax3-4: PASS
> armada-370-mirabox: PASS
>
> arm-bcm_defconfig
> -----------------
> bcm28155-ap: PASS
>
> arm-omap2plus_defconfig
> -----------------------
> omap4-panda: PASS
> omap3-overo-storm-tobi: PASS
> omap3-beagle-xm: PASS
> omap3-n900: PASS
> omap3-beagle-xm,legacy: FAIL - kernel: ERROR: did not
> start booting.
> am335x-bone: PASS
> am335x-boneblack: PASS
> omap3-n900,legacy: PASS (Warnings: 3)
> omap3-overo-tobi: PASS
> omap3-beagle,legacy: PASS (Warnings: 2)
> omap3-overo-tobi,legacy: PASS (Warnings: 2)
> omap3-overo-storm-tobi,legacy: PASS (Warnings: 1)
> omap5-uevm: PASS (Warnings: 1)
> omap4-panda-es: PASS
> am437x-gp-evm: PASS
> omap3-beagle: PASS
>
> arm-sama5_defconfig
> -------------------
> sama5d35ek: PASS
> at91-sama5d3_xplained: PASS
>
> arm-mvebu_v7_defconfig
> ----------------------
> armada-xp-openblocks-ax3-4: PASS
> armada-370-mirabox: PASS
>
> arm64-defconfig+CONFIG_OF_UNITTEST=y
> ------------------------------------
> qemu-aarch64,legacy: PASS
>
> arm64-defconfig
> ---------------
> qemu-aarch64,legacy: PASS
>
--
Regards,
Nishanth Menon
* Re: Fwd: next boot: 101 boots: 89 pass, 12 fail (next-20141216)
2014-12-16 21:03 ` Fwd: next boot: 101 boots: 89 pass, 12 fail (next-20141216) Nishanth Menon
@ 2014-12-17 4:07 ` Viresh Kumar
2014-12-17 15:28 ` Nishanth Menon
2014-12-17 17:16 ` Kevin Hilman
0 siblings, 2 replies; 11+ messages in thread
From: Viresh Kumar @ 2014-12-17 4:07 UTC (permalink / raw)
To: Nishanth Menon
Cc: Kevin Hilman, Tony Lindgren, linux-omap, linux-pm@vger.kernel.org
On 17 December 2014 at 02:33, Nishanth Menon <nm@ti.com> wrote:
> http://storage.armcloud.us/kernel-ci/next/next-20141216/arm-multi_v7_defconfig/lab-khilman/boot-omap5-uevm.html
> [ 2.071996] ------------[ cut here ]------------
> [ 2.076831] kernel BUG at ../drivers/cpufreq/cpufreq.c:1258!
> [ 2.082753] Internal error: Oops - BUG: 0 [#1] SMP ARM
This is what we have hit:
    if ((cpufreq_driver->flags & CPUFREQ_NEED_INITIAL_FREQ_CHECK)
        && has_target()) {
        /* Are we running at unknown frequency ? */
        ret = cpufreq_frequency_table_get_index(policy, policy->cur);
        if (ret == -EINVAL) {
            /* Warn user and fix it */
            pr_warn("%s: CPU%d: Running at unlisted freq: %u KHz\n",
                    __func__, policy->cpu, policy->cur);
            ret = __cpufreq_driver_target(policy, policy->cur - 1,
                                          CPUFREQ_RELATION_L);
            /*
             * Reaching here after boot in a few seconds may not
             * mean that system will remain stable at "unknown"
             * frequency for longer duration. Hence, a BUG_ON().
             */
            BUG_ON(ret); /********* We have hit this one *******/
            pr_warn("%s: CPU%d: Unlisted initial frequency changed to: %u KHz\n",
                    __func__, policy->cpu, policy->cur);
        }
    }
So the SoC was running at an unlisted frequency, and when we tried to
change to some other valid (listed) frequency, we failed.
The comment above it describes why it is a BUG.. It's some SoC issue
and needs to be resolved by somebody with a board.
So, in short __cpufreq_driver_target() failed to change freq..
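The check described above can be modelled in user space. The following is
a minimal sketch, not the kernel code: the frequency table,
fake_set_rate() and the clk_fails flag are hypothetical stand-ins for the
platform's frequency table and clock driver. It shows why an unlisted
rate is detected, why asking for policy->cur - 1 with a "lowest frequency
at or above target" relation (CPUFREQ_RELATION_L) lands on the next
listed rate, and how a failing clock driver leaves a non-zero ret for
BUG_ON() to trip on.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical frequency table in kHz; real tables come from platform data. */
static const unsigned int freq_table_khz[] = { 350000, 700000, 920000, 1200000 };
#define NUM_FREQS (sizeof(freq_table_khz) / sizeof(freq_table_khz[0]))

/* Return the index of an exact match, or -EINVAL if the rate is unlisted. */
int table_get_index(unsigned int freq_khz)
{
    for (size_t i = 0; i < NUM_FREQS; i++)
        if (freq_table_khz[i] == freq_khz)
            return (int)i;
    return -EINVAL;
}

/* Lowest listed frequency at or above the target, or 0 if there is none. */
unsigned int lowest_at_or_above(unsigned int target_khz)
{
    unsigned int best = 0;

    for (size_t i = 0; i < NUM_FREQS; i++)
        if (freq_table_khz[i] >= target_khz &&
            (best == 0 || freq_table_khz[i] < best))
            best = freq_table_khz[i];
    return best;
}

/* Stand-in for the clock driver: fails on request or for an invalid rate. */
int fake_set_rate(unsigned int *cur_khz, unsigned int new_khz, int clk_fails)
{
    if (clk_fails || new_khz == 0)
        return -EINVAL;
    *cur_khz = new_khz;
    return 0;
}

/*
 * Model of the initial frequency check: if the current rate is unlisted,
 * try to move to a listed one. A non-zero return is the case in which
 * the quoted kernel code hits BUG_ON(ret).
 */
int initial_freq_check(unsigned int *cur_khz, int clk_fails)
{
    if (table_get_index(*cur_khz) != -EINVAL)
        return 0; /* already running at a listed frequency */
    return fake_set_rate(cur_khz, lowest_at_or_above(*cur_khz - 1), clk_fails);
}
```

With this hypothetical table, a starting rate of 699977 is moved to
700000 when fake_set_rate() succeeds. When it fails,
initial_freq_check() returns -EINVAL and the rate stays where it was.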
--
viresh
* Re: Fwd: next boot: 101 boots: 89 pass, 12 fail (next-20141216)
2014-12-17 4:07 ` Viresh Kumar
@ 2014-12-17 15:28 ` Nishanth Menon
2014-12-17 16:27 ` Viresh Kumar
2014-12-17 17:16 ` Kevin Hilman
1 sibling, 1 reply; 11+ messages in thread
From: Nishanth Menon @ 2014-12-17 15:28 UTC (permalink / raw)
To: Viresh Kumar
Cc: Kevin Hilman, Tony Lindgren, linux-omap, linux-pm@vger.kernel.org
On 09:37-20141217, Viresh Kumar wrote:
> On 17 December 2014 at 02:33, Nishanth Menon <nm@ti.com> wrote:
> > http://storage.armcloud.us/kernel-ci/next/next-20141216/arm-multi_v7_defconfig/lab-khilman/boot-omap5-uevm.html
> > [ 2.071996] ------------[ cut here ]------------
> > [ 2.076831] kernel BUG at ../drivers/cpufreq/cpufreq.c:1258!
> > [ 2.082753] Internal error: Oops - BUG: 0 [#1] SMP ARM
>
> This is what we have hit:
>
> if ((cpufreq_driver->flags & CPUFREQ_NEED_INITIAL_FREQ_CHECK)
> && has_target()) {
> /* Are we running at unknown frequency ? */
> ret = cpufreq_frequency_table_get_index(policy, policy->cur);
> if (ret == -EINVAL) {
> /* Warn user and fix it */
> pr_warn("%s: CPU%d: Running at unlisted freq: %u KHz\n",
> __func__, policy->cpu, policy->cur);
> ret = __cpufreq_driver_target(policy, policy->cur - 1,
> CPUFREQ_RELATION_L);
>
> /*
> * Reaching here after boot in a few seconds may not
> * mean that system will remain stable at "unknown"
> * frequency for longer duration. Hence, a BUG_ON().
> */
> BUG_ON(ret); /********* We have hit
> this one *******/
> pr_warn("%s: CPU%d: Unlisted initial frequency
> changed to: %u KHz\n",
> __func__, policy->cpu, policy->cur);
> }
> }
>
>
> So the SoC was running on unlisted frequency and when we tried to
> change to some other valid (listed) frequency, we failed.
>
> The comment over it describes why it is a BUG.. Its some SoC issue
> and need to be resolved by somebody with a board.
>
> So, in short __cpufreq_driver_target() failed to change freq..
I still do not see the need to crash the entire system. OK, fine,
cpufreq is broken, but the rest of the system can easily function.
That BUG does look like an ugly point and a lack of proper cleanup
logic: cpufreq should be expected to report the failure and gracefully
shut itself down, not screw up my platform boot.
For that matter, we have had cpufreq working on TI platforms for years
now. I will eventually find time to track the issue down if no one else
beats me to it, but it kinda indicates that things are probably starting
to bitrot...
--
Regards,
Nishanth Menon
* Re: Fwd: next boot: 101 boots: 89 pass, 12 fail (next-20141216)
2014-12-17 15:28 ` Nishanth Menon
@ 2014-12-17 16:27 ` Viresh Kumar
2014-12-17 16:43 ` Nishanth Menon
0 siblings, 1 reply; 11+ messages in thread
From: Viresh Kumar @ 2014-12-17 16:27 UTC (permalink / raw)
To: Nishanth Menon
Cc: Kevin Hilman, Tony Lindgren, linux-omap, linux-pm@vger.kernel.org
On 17 December 2014 at 20:58, Nishanth Menon <nm@ti.com> wrote:
> I still do not see the need to crash the entire system - OK, fine
> cpufreq is broke, but the remaining part of the system can easily
> function. That BUG does look like a ugly point and lack of proper
> cleanup logic - cpufreq should be expected to report and gracefully
> shut itself down, not screw up my platform boot.
http://lists.linaro.org/pipermail/linaro-kernel/2013-November/009128.html
We came to this conclusion because you insisted that it's not safe
for the system to continue at a frequency that is not in the kernel's
freq table. It may run well, but we don't know what will happen in
the longer run..
And surely it deserves a BUG_ON() then.
* Re: Fwd: next boot: 101 boots: 89 pass, 12 fail (next-20141216)
2014-12-17 16:27 ` Viresh Kumar
@ 2014-12-17 16:43 ` Nishanth Menon
2014-12-18 1:54 ` Viresh Kumar
0 siblings, 1 reply; 11+ messages in thread
From: Nishanth Menon @ 2014-12-17 16:43 UTC (permalink / raw)
To: Viresh Kumar
Cc: Kevin Hilman, Tony Lindgren, linux-omap, linux-pm@vger.kernel.org
On Wed, Dec 17, 2014 at 10:27 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 17 December 2014 at 20:58, Nishanth Menon <nm@ti.com> wrote:
>> I still do not see the need to crash the entire system - OK, fine
>> cpufreq is broke, but the remaining part of the system can easily
>> function. That BUG does look like a ugly point and lack of proper
>> cleanup logic - cpufreq should be expected to report and gracefully
>> shut itself down, not screw up my platform boot.
>
> http://lists.linaro.org/pipermail/linaro-kernel/2013-November/009128.html
>
> We came to this conclusion because you insisted that its not safe
> for the system to continue on a unsupported frequency from kernel's
> freq table. It may run well, but we don't know what will happen in
> longer run..
>
I do realize that I had a different opinion back then, given bootloader
screw-ups. But now that we have discovered a potentially bad
configuration (in this case, for some reason, almost ALL TI platforms
"have bad configuration", which could be due to recent clock code
changes or whatever), just killing the boot does not make sense to me,
since the bootloader may not always be the sole cause of that path
going wrong.
---
Regards,
Nishanth Menon
* Re: Fwd: next boot: 101 boots: 89 pass, 12 fail (next-20141216)
2014-12-17 4:07 ` Viresh Kumar
2014-12-17 15:28 ` Nishanth Menon
@ 2014-12-17 17:16 ` Kevin Hilman
2014-12-17 19:11 ` Kevin Hilman
2014-12-18 2:01 ` Viresh Kumar
1 sibling, 2 replies; 11+ messages in thread
From: Kevin Hilman @ 2014-12-17 17:16 UTC (permalink / raw)
To: Viresh Kumar
Cc: Nishanth Menon, Kevin Hilman, Tony Lindgren, linux-omap,
linux-pm@vger.kernel.org, Tero Kristo
[+ Tero]
On Tue, Dec 16, 2014 at 8:07 PM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 17 December 2014 at 02:33, Nishanth Menon <nm@ti.com> wrote:
>> http://storage.armcloud.us/kernel-ci/next/next-20141216/arm-multi_v7_defconfig/lab-khilman/boot-omap5-uevm.html
>> [ 2.071996] ------------[ cut here ]------------
>> [ 2.076831] kernel BUG at ../drivers/cpufreq/cpufreq.c:1258!
>> [ 2.082753] Internal error: Oops - BUG: 0 [#1] SMP ARM
>
[...]
> So the SoC was running on unlisted frequency and when we tried to
> change to some other valid (listed) frequency, we failed.
>
> The comment over it describes why it is a BUG.. Its some SoC issue
> and need to be resolved by somebody with a board.
>
> So, in short __cpufreq_driver_target() failed to change freq..
So this looks like a bug that has been hiding, but just exposed
because cpufreq-cpu0 (now cpufreq-dt) was not getting built-in since
before v3.18.
On omap4-panda-es, v3.18 with multi_v7_defconfig + CPUFREQ_DT enabled,
I see this:
[ 2.062103] cpufreq: __cpufreq_add_dev: CPU0: Running at unlisted freq: 699977 KHz
[ 2.070404] cpufreq: __cpufreq_add_dev: CPU0: Unlisted initial frequency changed to: 700000 KHz
No BUG. But, in next-20141216,
[ 2.083953] cpufreq: __cpufreq_add_dev: CPU0: Running at unlisted freq: 699977 KHz
[ 2.091949] cpu cpu0: failed to set clock rate: -22
[ 2.097045] cpufreq: __target_index: Failed to change cpu frequency: -22
And then the BUG.
So the BUG() itself isn't the problem with this regression. There's
been a fair amount of changes in the OMAP clk driver (including some
other regressions), so I suspect the culprit to be lying somewhere in
the recent OMAP clock changes.
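The -22 in the next-20141216 log is a negative errno value. As a small
user-space sketch (describe_kernel_error() is a hypothetical helper, and
it assumes the usual Linux errno numbering, where EINVAL is 22), such a
return code can be decoded like this:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a kernel-style return code (0 or a negative errno such as -22)
 * as human-readable text. Returns what snprintf() returns.
 */
int describe_kernel_error(int ret, char *buf, size_t len)
{
    if (ret >= 0)
        return snprintf(buf, len, "success (%d)", ret);
    /* Kernel code returns -errno, so negate before looking it up. */
    return snprintf(buf, len, "%d: %s", ret, strerror(-ret));
}
```

For -22 this yields the text for EINVAL, which suggests the rate change
was rejected as an invalid argument before that error propagated up to
BUG_ON(ret).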
Kevin
* Re: Fwd: next boot: 101 boots: 89 pass, 12 fail (next-20141216)
2014-12-17 17:16 ` Kevin Hilman
@ 2014-12-17 19:11 ` Kevin Hilman
2014-12-18 2:01 ` Viresh Kumar
1 sibling, 0 replies; 11+ messages in thread
From: Kevin Hilman @ 2014-12-17 19:11 UTC (permalink / raw)
To: Kevin Hilman
Cc: Viresh Kumar, Nishanth Menon, Tony Lindgren, linux-omap,
linux-pm@vger.kernel.org, Tero Kristo, Mike Turquette
On Wed, Dec 17, 2014 at 9:16 AM, Kevin Hilman <khilman@kernel.org> wrote:
[...]
> So the BUG() itself isn't the problem with this regression. There's
> been a fair amount of changes in the OMAP clk driver (including some
> other regressions), so I suspect the culprit to be lying somewhere in
> the recent OMAP clock changes.
So I attempted to bisect this down, and while it's pinpointing the
problem to the clk-next branch, it's not that simple. I bisected this
on next-20141216, which is where it first showed up, and the bisect
reported: e03f3bb62ca8a1124bc408046c50aed7629b24cc is the first bad
commit.
That commit is where clk-next is merged into linux-next. What's
interesting is that testing clk-next by itself was just fine, and
linux-next before merging clk-next was just fine. The bisection here
is complicated because OMAP clock-related changes went in through
drivers/clk and through arch/arm/mach-omap2, so debugging this down
further is going to be a more manual effort. And one I will leave for
someone else.
Kevin
* Re: Fwd: next boot: 101 boots: 89 pass, 12 fail (next-20141216)
2014-12-17 16:43 ` Nishanth Menon
@ 2014-12-18 1:54 ` Viresh Kumar
0 siblings, 0 replies; 11+ messages in thread
From: Viresh Kumar @ 2014-12-18 1:54 UTC (permalink / raw)
To: Nishanth Menon
Cc: Kevin Hilman, Tony Lindgren, linux-omap, linux-pm@vger.kernel.org
On 17 December 2014 at 22:13, Nishanth Menon <nm@ti.com> wrote:
> I do realize that i did have different opinion given bootloader screw
> ups. Given that we have discovered a potentially bad configuration (in
> this case for some reason almost ALL TI platforms "have bad
> configuration" - could be due to recent clock code changes or what
> ever), just killing boot does not make sense to me as purely
> bootloader being the cause may not always be the case for that path to
> go wrong.
Sorry, I still *disagree*.
It's not about bootloaders getting screwed up.. Yes, the bootloader was
indeed running at a frequency not listed in the kernel. But we *tried* our
best to get that resolved and we failed to change the frequency.
We *shouldn't* allow the kernel to move an inch further. It might be
dangerous for the platform to continue further as we are talking about the
almighty CPU. So, the BUG is still the right way forward, at least for me.
--
viresh
* Re: Fwd: next boot: 101 boots: 89 pass, 12 fail (next-20141216)
2014-12-17 17:16 ` Kevin Hilman
2014-12-17 19:11 ` Kevin Hilman
@ 2014-12-18 2:01 ` Viresh Kumar
2014-12-18 22:37 ` Nishanth Menon
1 sibling, 1 reply; 11+ messages in thread
From: Viresh Kumar @ 2014-12-18 2:01 UTC (permalink / raw)
To: Kevin Hilman
Cc: Nishanth Menon, Tony Lindgren, linux-omap,
linux-pm@vger.kernel.org, Tero Kristo
On 17 December 2014 at 22:46, Kevin Hilman <khilman@kernel.org> wrote:
> So this looks like a bug that has been hiding, but just exposed
> because cpufreq-cpu0 (now cpufreq-dt) was not getting built-in since
> before v3.18.
>
> On omap4-panda-es, v3.18 with multi_v7_defconfig + CPUFREQ_DT enabled,
> I see this:
>
> [ 2.062103] cpufreq: __cpufreq_add_dev: CPU0: Running at unlisted
> freq: 699977 KHz
> [ 2.070404] cpufreq: __cpufreq_add_dev: CPU0: Unlisted initial
> frequency changed to: 700000 KHz
>
> No BUG. But, in next-20141216,
>
> [ 2.083953] cpufreq: __cpufreq_add_dev: CPU0: Running at unlisted
> freq: 699977 KHz
> [ 2.091949] cpu cpu0: failed to set clock rate: -22
> [ 2.097045] cpufreq: __target_index: Failed to change cpu frequency: -22
>
> And then the BUG.
>
> So the BUG() itself isn't the problem with this regression. There's
> been a fair amount of changes in the OMAP clk driver (including some
> other regressions), so I suspect the culprit to be lying somewhere in
> the recent OMAP clock changes.
Yeah. I agree..
* Re: Fwd: next boot: 101 boots: 89 pass, 12 fail (next-20141216)
2014-12-18 2:01 ` Viresh Kumar
@ 2014-12-18 22:37 ` Nishanth Menon
2014-12-18 22:51 ` Kevin Hilman
0 siblings, 1 reply; 11+ messages in thread
From: Nishanth Menon @ 2014-12-18 22:37 UTC (permalink / raw)
To: Viresh Kumar, Kevin Hilman
Cc: Tony Lindgren, linux-omap, linux-pm@vger.kernel.org, Tero Kristo
On 12/17/2014 08:01 PM, Viresh Kumar wrote:
> On 17 December 2014 at 22:46, Kevin Hilman <khilman@kernel.org> wrote:
>> So this looks like a bug that has been hiding, but just exposed
>> because cpufreq-cpu0 (now cpufreq-dt) was not getting built-in since
>> before v3.18.
>>
>> On omap4-panda-es, v3.18 with multi_v7_defconfig + CPUFREQ_DT enabled,
>> I see this:
>>
>> [ 2.062103] cpufreq: __cpufreq_add_dev: CPU0: Running at unlisted
>> freq: 699977 KHz
>> [ 2.070404] cpufreq: __cpufreq_add_dev: CPU0: Unlisted initial
>> frequency changed to: 700000 KHz
>>
>> No BUG. But, in next-20141216,
>>
>> [ 2.083953] cpufreq: __cpufreq_add_dev: CPU0: Running at unlisted
>> freq: 699977 KHz
>> [ 2.091949] cpu cpu0: failed to set clock rate: -22
>> [ 2.097045] cpufreq: __target_index: Failed to change cpu frequency: -22
>>
>> And then the BUG.
>>
>> So the BUG() itself isn't the problem with this regression. There's
>> been a fair amount of changes in the OMAP clk driver (including some
>> other regressions), so I suspect the culprit to be lying somewhere in
>> the recent OMAP clock changes.
>
> Yeah. I agree..
>
https://git.linaro.org/people/mike.turquette/linux.git/commit/6f8e853d18a98ee95832ffebfaa288d42ae28cd5
Finally makes it work.
The build warnings actually did give an indication of the issue at hand:
> next-20141216
> arch/arm/mach-omap2/cclock3xxx_data.c:262:2: warning: initialization from incompatible pointer type [enabled by default]
> arch/arm/mach-omap2/cclock3xxx_data.c:262:2: warning: (near initialization for ‘dpll1_ck_ops.determine_rate’) [enabled by default]
> arch/arm/mach-omap2/cclock3xxx_data.c:375:2: warning: initialization from incompatible pointer type [enabled by default]
> arch/arm/mach-omap2/cclock3xxx_data.c:375:2: warning: (near initialization for ‘dpll4_ck_ops.determine_rate’) [enabled by default]
> drivers/clk/ti/dpll.c:38:2: warning: initialization from incompatible pointer type [enabled by default]
> drivers/clk/ti/dpll.c:38:2: warning: (near initialization for ‘dpll_m4xen_ck_ops.determine_rate’) [enabled by default]
> drivers/clk/ti/dpll.c:61:2: warning: initialization from incompatible pointer type [enabled by default]
> drivers/clk/ti/dpll.c:61:2: warning: (near initialization for ‘dpll_ck_ops.determine_rate’) [enabled by default]
> drivers/clk/ti/dpll.c:72:2: warning: initialization from incompatible pointer type [enabled by default]
> drivers/clk/ti/dpll.c:72:2: warning: (near initialization for ‘dpll_no_gate_ck_ops.determine_rate’) [enabled by default]
> drivers/clk/ti/dpll.c:111:2: warning: initialization from incompatible pointer type [enabled by default]
> drivers/clk/ti/dpll.c:111:2: warning: (near initialization for ‘omap3_dpll_ck_ops.determine_rate’) [enabled by default]
> drivers/clk/ti/dpll.c:123:2: warning: initialization from incompatible pointer type [enabled by default]
> drivers/clk/ti/dpll.c:123:2: warning: (near initialization for ‘omap3_dpll_per_ck_ops.determine_rate’) [enabled by default]
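The warnings above all point at .determine_rate initializers. Below is a
minimal sketch of that class of problem, assuming the warnings come from
callbacks whose signatures no longer matched the ops field they were
assigned to. All types and signatures here (clk_hw_sketch,
clk_ops_sketch, the min/max arguments) are made up for illustration and
are not the kernel's actual prototypes. When a callback field in an ops
structure changes its prototype, every implementation has to change with
it; otherwise the compiler emits "initialization from incompatible
pointer type" and a call through the pointer passes arguments the callee
does not expect.

```c
#include <stddef.h>

/* Hypothetical clock handle, standing in for a real hardware clock. */
struct clk_hw_sketch {
    unsigned long rate;
};

/* Hypothetical ops table whose callback takes min/max rate limits. */
struct clk_ops_sketch {
    long (*determine_rate)(struct clk_hw_sketch *hw, unsigned long rate,
                           unsigned long min_rate, unsigned long max_rate);
};

/* Implementation whose signature matches the ops field exactly. */
long dpll_determine_rate_sketch(struct clk_hw_sketch *hw, unsigned long rate,
                                unsigned long min_rate, unsigned long max_rate)
{
    (void)hw; /* unused in this sketch */
    if (rate < min_rate)
        return (long)min_rate;
    if (rate > max_rate)
        return (long)max_rate;
    return (long)rate;
}

/* Compiles cleanly because the function type matches the field type. */
const struct clk_ops_sketch dpll_ops_sketch = {
    .determine_rate = dpll_determine_rate_sketch,
};

/*
 * An implementation still declared with an older, shorter form, e.g.
 *     long old_determine_rate(struct clk_hw_sketch *hw, unsigned long rate);
 * would trigger the incompatible-pointer-type warning if assigned to
 * .determine_rate, and calling through it would be undefined behaviour.
 */
```

The design point is that such a warning is not cosmetic: the mismatch is
only caught at build time, and at run time the callee simply reads
whatever happens to be in its argument slots.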
As of next-20141218 things seem to have settled down a bit.
next-20141218 + https://patchwork.kernel.org/patch/5484401/
The various platforms I have access to look like the following. It is a
pretty simple script, and I am still getting my remote farm to work
properly when scripts are downloaded over the serial port.. but anyways..
> next-20141218
> 1: am335x-evm: BOOT: PASS: err=10 warn=25, CPUFreq: PASS, CPUIdle: N/A: http://slexy.org/raw/s2BvPhDIrP
> 2: am335x-sk: BOOT: PASS: err=9 warn=26, CPUFreq: PASS, CPUIdle: N/A: http://slexy.org/raw/s21kzAb5L3
> 3: am3517-evm: BOOT: FAIL: http://slexy.org/raw/s202ZptHTS (script download failed + a warning has popped up for OPP)
> 4: am37x-evm: BOOT: FAIL: http://slexy.org/raw/s20pvU2Nl2 (script download fail)
> 5: am437x-sk: BOOT: PASS: crit=2 err=13 warn=57, CPUFreq: N/A, CPUIdle: N/A: http://slexy.org/raw/s2hKPxe9YG
> 6: am43xx-epos: BOOT: PASS: crit=2 err=16 warn=58, CPUFreq: N/A, CPUIdle: N/A: http://slexy.org/raw/s2R6OoHCBj
> 7: am43xx-gpevm: BOOT: PASS: crit=2 err=13 warn=57, CPUFreq: N/A, CPUIdle: N/A: http://slexy.org/raw/s20fG9U1ZL
> 8: BeagleBoard-X15(am57xx-evm): BOOT: PASS: err=20 warn=24, CPUFreq: PASS, CPUIdle: N/A: http://slexy.org/raw/s2qzlorZWB
> 9: BeagleBoard-XM: BOOT: PASS: err=9 warn=19, CPUFreq: PASS, CPUIdle: FAIL: http://slexy.org/raw/s2OzLVwqPy
> 10: beagleboard-vanilla: BOOT: PASS: err=9 warn=25, CPUFreq: PASS, CPUIdle: FAIL: http://slexy.org/raw/s21EMHNZVl
> 11: beaglebone-black: BOOT: PASS: err=8 warn=25, CPUFreq: PASS, CPUIdle: N/A: http://slexy.org/raw/s2quawv4Qe
> 12: beaglebone: BOOT: PASS: err=9 warn=20, CPUFreq: PASS, CPUIdle: N/A: http://slexy.org/raw/s2iCPBeBSA
> 13: craneboard: BOOT: PASS: err=21 warn=93, CPUFreq: N/A, CPUIdle: N/A: http://slexy.org/raw/s21Mshl31r
> 14: dra72x-evm: BOOT: PASS: crit=2 err=14 warn=27, CPUFreq: N/A, CPUIdle: N/A: http://slexy.org/raw/s2JiZnJlRL
> 15: dra7xx-evm: BOOT: PASS: err=11 warn=31, CPUFreq: PASS, CPUIdle: N/A: http://slexy.org/raw/s20g14WWfL
> 16: OMAP3430-Labrador(LDP): BOOT: PASS: err=7 warn=25, CPUFreq: PASS, CPUIdle: FAIL: http://slexy.org/raw/s2PZR2j1Kw
> 17: n900: BOOT: FAIL: http://slexy.org/raw/s2lHkDllfG (I have been seeing this for a while - not reproducible on Tony's setup.. not a regression)
> 18: omap5-evm: BOOT: PASS: err=15 warn=24, CPUFreq: PASS, CPUIdle: N/A: http://slexy.org/raw/s21GJKUu8d
> 19: pandaboard-es: BOOT: PASS: err=20 warn=33, CPUFreq: PASS, CPUIdle: PASS: http://slexy.org/raw/s21X5eWQnW
> 20: pandaboard-vanilla: BOOT: PASS: err=20 warn=31, CPUFreq: PASS, CPUIdle: PASS: http://slexy.org/raw/s21HadWaKr
> 21: sdp2430: BOOT: FAIL: http://slexy.org/raw/s284xHOf52 (script download fail)
> 22: sdp3430: BOOT: PASS: err=22 warn=28, CPUFreq: PASS, CPUIdle: FAIL: http://slexy.org/raw/s2GJpDG16e
> 23: sdp4430: BOOT: PASS: err=21 warn=33, CPUFreq: PASS, CPUIdle: PASS: http://slexy.org/raw/s21ChfWSXs
> TOTAL = 23 boards, Booted Boards = 19, No Boot boards = 4
A script download failure does imply my farm still has issues to resolve..
but anyways.. more or less we are operational again after the breakage
in next-20141216
--
Regards,
Nishanth Menon
* Re: Fwd: next boot: 101 boots: 89 pass, 12 fail (next-20141216)
2014-12-18 22:37 ` Nishanth Menon
@ 2014-12-18 22:51 ` Kevin Hilman
0 siblings, 0 replies; 11+ messages in thread
From: Kevin Hilman @ 2014-12-18 22:51 UTC (permalink / raw)
To: Nishanth Menon
Cc: Viresh Kumar, Kevin Hilman, Tony Lindgren, linux-omap,
linux-pm@vger.kernel.org, Tero Kristo
On Thu, Dec 18, 2014 at 2:37 PM, Nishanth Menon <nm@ti.com> wrote:
> script download fail does imply my farm has still issues to resolve..
> but anyways.. more or less we are back operational again since it was
> broken by next-20141216
I'm not seeing any more failures in next-20141218 either.
Kevin