public inbox for linux-clk@vger.kernel.org
From: Tony Lindgren <tony@atomide.com>
To: Tero Kristo <t-kristo@ti.com>
Cc: Christophe Lyon <christophe.lyon@linaro.org>,
	Stephen Boyd <sboyd@kernel.org>,
	Jerome Brunet <jbrunet@baylibre.com>,
	Michael Turquette <mturquette@baylibre.com>,
	Shawn Lin <shawn.lin@rock-chips.com>,
	Arnd Bergmann <arnd@arndb.de>, Jyri Sarha <jsarha@ti.com>,
	Thorsten Leemhuis <regressions@leemhuis.info>,
	linux-omap <linux-omap@vger.kernel.org>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>,
	linux-clk <linux-clk@vger.kernel.org>
Subject: Re: Possible kernel bug in torvalds/linux/master
Date: Sun, 25 Mar 2018 08:39:27 -0700
Message-ID: <20180325153927.GB5700@atomide.com>
In-Reply-To: <20180325151904.GA5700@atomide.com>

* Tony Lindgren <tony@atomide.com> [180325 15:20]:
> Hi,
> 
> * Arnd Bergmann <arnd@arndb.de> [180325 13:30]:
> > On Sun, Mar 25, 2018 at 3:03 PM, Christophe Lyon
> > <christophe.lyon@linaro.org> wrote:
> > > Hi Arnd,
> > >
> > > We have a Jenkins job that builds the kernel from torvalds/linux
> > > master branch multi_v7 defconfig every day, using our last GCC release
> > > (7.2-2017-11), and boots a beaglebone-black board.
> > >
> > > Last week it started to fail. I first suspected a Lava problem, but
> > > the job now fails every time, and Remi Duraffort from the Lava team
> > > thinks it's really a kernel problem.
> > >
> > > Is this something you are interested in investigating? Or should we
> > > switch to another "less-edge" branch?
> > >
> > > The last successful run:
> > > https://ci.linaro.org/job/tcwg-buildapp/app=linux+multi_v7,label=tcwg-x86_64-build,target=arm-linux-gnueabihf/75/
> > > The next one failed:
> > > https://ci.linaro.org/job/tcwg-buildapp/app=linux+multi_v7,label=tcwg-x86_64-build,target=arm-linux-gnueabihf/76
> > >
> > > Build 75 was with this kernel commit:
> > > Merge branch 'for-4.16-fixes'
> > > 1b5f3ba415fe4cf8b8b39c8d104ed44cde330658
> > >
> > > Build 76 was with:
> > > Merge tag 'clk-fixes-for-linus'
> > > 3215b9d57a2c75c4305a3956ca303d7004485200
> > 
> > Hi Christophe,
> > 
> > This branch is certainly the right one to test, thanks for the report!
> > From looking at the output above, it seems that the kernel no longer
> > boots at all, and fails to even print any messages. Between the
> > two runs, I see the following commits:
> > 
> > 3215b9d57a2c Merge tag 'clk-fixes-for-linus' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
> > 303851e14a8f Merge tag 'for-linus' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
> > 76c0b6a36a12 Merge tag 'scsi-fixes' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
> > 645102eac15e Merge tag 'nfsd-4.16-1' of git://linux-nfs.org/~bfields/linux
> > 32d43cd391ba kvm/x86: fix icebp instruction handling
> > e8980d67d601 RDMA/ucma: Ensure that CM_ID exists prior to access it
> > 68ef3bc31664 nfsd: remove blocked locks on client teardown
> > 80cf79ae4f68 RDMA/verbs: Remove restrack entry from XRCD structure
> > ed65a4dc2208 RDMA/ucma: Fix use-after-free access in ucma_close
> > 7997f3b2df75 clk: bcm2835: Protect sections updating shared registers
> > 49012d1bf5f7 clk: bcm2835: Fix ana->maskX definitions
> > 2975d5de6428 RDMA/ucma: Check AF family prior resolving address
> > 8a53fc511c5e clk: aspeed: Prevent reset if clock is enabled
> > d90c76bb6112 clk: aspeed: Fix is_enabled for certain clocks
> > bd8602ca42f6 infiniband: bnxt_re: use BIT_ULL() for 64-bit bit masks
> > 5388a508479d infiniband: qplib_fp: fix pointer cast
> > 42cea83f9524 IB/mlx5: Fix cleanup order on unload
> > 0c81ffc60d52 RDMA/ucma: Don't allow join attempts for unsupported AF family
> > 7688f2c3bbf5 RDMA/ucma: Fix access to non-initialized CM_ID object
> > 9dea9a2ff61c RDMA/core: Do not use invalid destination in determining port reuse
> > f3f134f5260a RDMA/mlx5: Fix crash while accessing garbage pointer and
> > freed memory
> > c2b37f76485f IB/mlx5: Fix integer overflows in mlx5_ib_create_srq
> > 2c292dbb398e IB/mlx5: Fix out-of-bounds read in create_raw_packet_qp_rq
> > 14bc1dff7427 scsi: qla2xxx: Remove FC_NO_LOOP_ID for FCP and FC-NVMe Discovery
> > 318aaf34f117 scsi: libsas: defer ata device eh commands to libata
> > 55c19eee3b47 clk: qcom: msm8916: Fix return value check in
> > qcom_apcs_msm8916_clk_probe()
> > 9903e41ae1f5 clk: hisilicon: hi3660:Fix potential NULL dereference in
> > hi3660_stub_clk_probe()
> > 56e1ee353943 Merge branch 'clk-helpers' (early part) into clk-fixes
> > 04bf9ab3359f clk: fix determine rate error with pass-through clock
> > 91584eb51b47 Merge branch 'clk-phase' into clk-fixes
> > bd13c6cbd3c0 Merge tag 'ti-clk-fixes-4.16' of
> > https://github.com/t-kristo/linux-pm into clk-fixes
> > a88bb86d58ce Merge tag 'clk-imx-fixes-4.16' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into
> > clk-fixes
> > 957a42e8599a Merge tag 'sunxi-clk-fixes-for-4.16' of
> > https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into
> > clk-fixes
> > 99652a469df1 clk: migrate the count of orphaned clocks at init
> > 7f95beea3608 clk: update cached phase to respect the fact when setting phase
> > 762790b75210 clk: ti: am43xx: add set-rate-parent support for display
> > clkctrl clock
> > c083dc5f3738 clk: ti: am33xx: add set-rate-parent support for display
> > clkctrl clock
> > 49159a9dc3da clk: ti: clkctrl: add support for CLK_SET_RATE_PARENT flag
> > a275b315334d clk: imx51-imx53: Fix UART4/5 registration on i.MX50 and i.MX53
> > 5682e268350f clk: sunxi-ng: a31: Fix CLK_OUT_* clock ops
> > 
> > Out of these, all the interesting ones are clk related:
> > 
> > 56e1ee353943 Merge branch 'clk-helpers' (early part) into clk-fixes
> > 04bf9ab3359f clk: fix determine rate error with pass-through clock
> > 91584eb51b47 Merge branch 'clk-phase' into clk-fixes
> > bd13c6cbd3c0 Merge tag 'ti-clk-fixes-4.16' of
> > https://github.com/t-kristo/linux-pm into clk-fixes
> > 99652a469df1 clk: migrate the count of orphaned clocks at init
> > 7f95beea3608 clk: update cached phase to respect the fact when setting phase
> > 762790b75210 clk: ti: am43xx: add set-rate-parent support for display
> > clkctrl clock
> > c083dc5f3738 clk: ti: am33xx: add set-rate-parent support for display
> > clkctrl clock
> > 49159a9dc3da clk: ti: clkctrl: add support for CLK_SET_RATE_PARENT flag
> > 
> > I've added the involved parties to Cc. We also see the same thing on
> > kernelci, where many OMAP based systems now fail to boot, with the
> > problem starting at the same commit:
> > 
> > https://kernelci.org/boot/all/job/mainline/branch/master/kernel/v4.16-rc6-431-gbcfc1f455466/
> > 
> > It's possible that this has already been debugged and a fix is being worked on,
> > but I'm not aware of anything, since I have not followed my email
> > while travelling.
> 
> I've confirmed that omap2plus_defconfig boots on bbb while
> multi_v7_defconfig fails to boot with the following:
> 
> l4_wkup_cm:clk:0010:0: failed to disable
> Unhandled fault: external abort on non-linefetch (0x1028) at 0xfa30e054
> pgd = 4b21228f
> [fa30e054] *pgd=48211452(bad)
> Internal error: : 1028 [#1] SMP ARM
> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc6-00075-g3215b9d57a2c #709
> Hardware name: Generic AM33XX (Flattened Device Tree)
> PC is at _update_sysc_cache+0x2c/0x88
> LR is at _enable+0x19c/0x274
> pc : [<c032a844>]    lr : [<c032afc8>]    psr: 40000013
> sp : db0adea0  ip : 00000003  fp : 00000000
> r10: c144997c  r9 : 00000157  r8 : 00000003
> r7 : c151d30c  r6 : 00000000  r5 : c1678ef4  r4 : c151b2f0
> r3 : fa30e054  r2 : c151b360  r1 : 00000054  r0 : c151b2f0
> Flags: nZcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
> Control: 10c5387d  Table: 80204019  DAC: 00000051
> Process swapper/0 (pid: 1, stack limit = 0x2ddf0754)
> Stack: (0xdb0adea0 to 0xdb0ae000)
> dea0: c151b2f0 c032afc8 00000000 a0000013 c1504c48 c151b2f0 c151b314 c1504c48
> dec0: c151b328 c1311c78 a0000013 c0c15ec4 00000011 edaa6d91 c131297c c151b2f0
> dee0: c150ce28 c131297c ffffe000 c1312a68 c1504c48 00000000 c131297c c0302730
> df00: dfdffb06 dfdffafa c1250ecc 00000100 00000157 c0361f34 c124f400 c10cc358
> df20: 00000000 00000002 00000002 c10dec28 00000000 c1504c48 c10eeca0 c10dec9c
> df40: 00000000 dfdffb06 00000000 edaa6d91 00000000 c1677700 c1677700 c13cf824
> df60: c13cf83c 00000003 00000157 c144997c 00000000 c1300e2c 00000002 00000002
> df80: 00000000 c13005c0 00000000 c0d96788 00000000 00000000 00000000 00000000
> dfa0: 00000000 c0d96790 00000000 c03010e8 00000000 00000000 00000000 00000000
> dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 d5370d56 dcffd777
> [<c032a844>] (_update_sysc_cache) from [<c032afc8>] (_enable+0x19c/0x274)
> [<c032afc8>] (_enable) from [<c1311c78>] (_setup.part.16+0xd8/0x418)
> [<c1311c78>] (_setup.part.16) from [<c1312a68>] (__omap_hwmod_setup_all+0xec/0x100)
> [<c1312a68>] (__omap_hwmod_setup_all) from [<c0302730>] (do_one_initcall+0x54/0x18c)
> [<c0302730>] (do_one_initcall) from [<c1300e2c>] (kernel_init_freeable+0x144/0x1d0)
> [<c1300e2c>] (kernel_init_freeable) from [<c0d96790>] (kernel_init+0x8/0x110)
> [<c0d96790>] (kernel_init) from [<c03010e8>] (ret_from_fork+0x14/0x2c)
> Exception stack(0xdb0adfb0 to 0xdb0adff8)
> dfa0:                                     00000000 00000000 00000000 00000000
> dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
> Code: e31c0c01 e5903048 e0833001 1a00000a (e5933000)
> 
> Tero, it might be some timing related clock issue?

Looks like git bisect points to commit c083dc5f3738 ("clk: ti: am33xx:
add set-rate-parent support for display clkctrl clock"). I also verified
that reverting it makes bbb boot again.
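
For readers unfamiliar with the approach, here is a minimal sketch of the
binary search that a bisect performs. It is an illustration only and makes
several assumptions: the history is flattened into a hypothetical linear
list (the real range contains merges, which git bisect handles itself), the
ordering of the three TI commits is inferred from the log listing above, and
fails_to_boot() is a stand-in for actually building multi_v7_defconfig and
booting it on the board.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, the last one is bad,
    and every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # commits[hi] is known to be bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid        # first bad commit is at mid or earlier
        else:
            lo = mid + 1    # first bad commit is after mid
    return commits[lo]


# Hypothetical linearised history, oldest first (assumption, see above).
history = [
    "1b5f3ba415fe",  # last good build (build 75)
    "49159a9dc3da",  # clk: ti: clkctrl: add support for CLK_SET_RATE_PARENT flag
    "c083dc5f3738",  # clk: ti: am33xx: add set-rate-parent support ...
    "762790b75210",  # clk: ti: am43xx: add set-rate-parent support ...
    "3215b9d57a2c",  # first failing build (build 76)
]


def fails_to_boot(commit: str) -> bool:
    # Stand-in for "build and boot on bbb"; encodes the reported result.
    return history.index(commit) >= history.index("c083dc5f3738")


culprit = first_bad(history, fails_to_boot)
print(culprit)
```

Each step halves the remaining range, so only a handful of build-and-boot
cycles are needed even for a much longer list of commits.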

Regards,

Tony

