linux-sh.vger.kernel.org archive mirror
From: Geert Uytterhoeven <geert@linux-m68k.org>
To: linux-arm-kernel@lists.infradead.org
Subject: Re: (bisected) Lock up on sh73a0/kzm9g on cpuidle initialization
Date: Tue, 25 Nov 2014 21:27:49 +0000
Message-ID: <CAMuHMdUMpmdYe+ED=M8FVvMMWNmrs8o4WzQ-uc2nFHNdL99HNQ@mail.gmail.com>
In-Reply-To: <20141125180159.GI5050@linux.vnet.ibm.com>

Hi Paul,

On Tue, Nov 25, 2014 at 7:01 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Tue, Nov 25, 2014 at 06:49:16PM +0100, Geert Uytterhoeven wrote:
>> On Fri, Nov 7, 2014 at 8:59 AM, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>> > On Thu, Nov 6, 2014 at 10:02 PM, Daniel Lezcano
>> > <daniel.lezcano@linaro.org> wrote:
>> >> On 11/06/2014 09:38 PM, Geert Uytterhoeven wrote:
>> >>> When CONFIG_CPU_IDLE=y, the kernel locks up during cpuidle initialization
>> >>> on Renesas sh73a0/kzm9g-reference, which has a dual-core Cortex-A9.
>> >>>
>> >>> Last message is:
>> >>>
>> >>>      DMA: preallocated 256 KiB pool for atomic coherent allocations
>> >>>
>> >>> After this it's supposed to print:
>> >>>
>> >>>      cpuidle: using governor ladder
>> >>>      cpuidle: using governor menu
>> >>>
>> >>> I've bisected this to commit 442bf3aaf55a91ebfec71da46a4ee10a3c905bcc
>> >>> ("sched: Let the scheduler see CPU idle states").
>> >>>
>> >>> Reverting that commit, and commit 83a0a96a5f26d974580fd7251043ff70c8f1823d
>> >>> ("sched/fair: Leverage the idle state info when choosing the "idlest"
>> >>> cpu"), which depends on it, fixes the problem.
>> >>>
>> >>> I saw the discussion "lockdep splat in CPU hotplug", so I enabled lockdep
>> >>> debugging, but didn't see a lockdep splat.
>> >>
>> >> Did you try the attached fix?
>> >>
>> >> https://lkml.org/lkml/2014/10/22/722
>> >
>> > Thanks, I didn't try that.
>> >
>> > However, this patch seems to be in v3.18-rc3, so I'm already using it.
>> > Hence it doesn't fix the problem for me.
>> >
>> > On another board, with a dual Cortex-A15, the problem doesn't show up.
>>
>> This problem (regression introduced in v3.18-rc1) is still present in v3.18-rc6.
>>
>> I did some more investigations, and it's hanging in the call to
>> synchronize_rcu() in cpuidle_uninstall_idle_handler(), which was added in
>> commit 442bf3aaf55a91ebfec71da46a4ee10a3c905bcc.
>> More specifically, it's blocked on the wait_for_completion(&rcu.completion)
>> in kernel/rcu/update.c:void wait_rcu_gp(call_rcu_func_t crf).
>
> You didn't disable RCU CPU stall warnings, did you?  If you did, please
> re-enable them, as the stall warning messages will likely help to debug
> this.  The soft-lockup checks can also be quite valuable.
>
> If you haven't run with CONFIG_PROVE_RCU=y, please try that.  For example,
> if you have CONFIG_PREEMPT=y and you do synchronize_rcu() from within
> an RCU read-side critical section (don't do that, it will hang!!!),
> then you will get a lockdep splat.
>
> Does any sort of system activity (keyboard, network, etc.) unstick the
> system?

Thanks! Unfortunately none of the above helped.

However, I found the culprit. It turned out to be a platform issue, not an
issue in the generic cpuidle or RCU code. Read on below if you're
interested in the gory details; otherwise just skip, and sleep well again tonight ;-)

> If you have tried all those things without good effect, could you please
> send along your .config and an alt-sysrq-t dump of all tasks' stacks?

As I didn't manage to trigger a sysrq dump over the serial console,
I just called __handle_sysrq() right before the wait_for_completion(), after
a small delay. The dump didn't show anything suspicious. Everything
looked the same as on the dual-core Cortex-A15 board, where the problem
doesn't manifest.

Then I noticed the sched debug output on the Cortex-A15, which was missing
from the Cortex-A9 build. Enabling it on the Cortex-A9 gave:

    Sched Debug Version: v0.11, 3.18.0-rc6-kzm9g-reference-04913-gedc89a2a2059c7ff-dirty #101
    ktime                                   : 0.000000
    sched_clk                               : 0.000000
    cpu_clk                                 : 0.000000
    jiffies                                 : 4294928896

Oops, time is not advancing?

Dmesg also shows (early):

    clocksource_of_init: no matching clocksources found

and the timer is only initialized much later, after cpuidle initialization:

    sh_cmt e6138000.timer: ch0: used for periodic clock events

Hacking up a timer node for "arm,cortex-a9-twd-timer" in sh73a0.dtsi
(with some "guessed" values) made it work.

Thanks!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


Thread overview: 7+ messages
2014-11-06 20:38 (bisected) Lock up on sh73a0/kzm9g on cpuidle initialization Geert Uytterhoeven
2014-11-06 21:02 ` Daniel Lezcano
2014-11-07  7:59   ` Geert Uytterhoeven
2014-11-25 17:49     ` Geert Uytterhoeven
2014-11-25 18:01       ` Paul E. McKenney
2014-11-25 21:27         ` Geert Uytterhoeven [this message]
2014-11-25 22:23           ` Daniel Lezcano
