public inbox for linux-kernel@vger.kernel.org
From: Thomas Gleixner <tglx@linutronix.de>
To: Florian Fainelli <f.fainelli@gmail.com>,
	Boqun Feng <boqun.feng@gmail.com>
Cc: Frederic Weisbecker <frederic@kernel.org>,
	"Russell King (Oracle)" <linux@armlinux.org.uk>,
	Joel Fernandes <joel@joelfernandes.org>,
	Anna-Maria Behnsen <anna-maria@linutronix.de>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-kernel@vger.kernel.org, kernel-team@meta.com,
	paulmck@kernel.org, mingo@kernel.org, rcu@vger.kernel.org,
	neeraj.upadhyay@amd.com, urezki@gmail.com,
	qiang.zhang1211@gmail.com, bigeasy@linutronix.de,
	chenzhongjin@huawei.com, yangjihong1@huawei.com,
	rostedt@goodmis.org, Justin Chen <justin.chen@broadcom.com>
Subject: Re: Unexplained long boot delays [Was Re: [GIT PULL] RCU changes for v6.9]
Date: Thu, 14 Mar 2024 23:51:50 +0100
Message-ID: <87msr0l94p.ffs@tglx>
In-Reply-To: <8f977bbb-d949-4e90-b3d2-b9815189b842@gmail.com>

On Thu, Mar 14 2024 at 14:53, Florian Fainelli wrote:
> On 3/14/24 14:21, Thomas Gleixner wrote:
>> 8ca1836769d758e4fbf5851bb81e181c52193f5d is related, but does not fully
>> explain the fail. I haven't yet spotted where this goes into lala land.
>
> It was a lot harder to generate the same issue on cold boot against 
> 8ca1836769d758e4fbf5851bb81e181c52193f5d,

That's good, as it points in exactly the right direction as far as I
can tell from the data we have, but I might turn out to be wrong.

> however, just like against 36e40df35d2c1891fe58241640c7c95de4aa739b,
> it would happen resuming from suspend to DRAM whereby the CPU core(s)
> lost their power and had to be re-initialized. Eventually I got a cold
> boot log:
>
> https://gist.github.com/ffainelli/b5684585c78518a5492cbbf1c7dce16e

The picture is similar to the one before:

<idle>-0         2d.... 3016627us : tmigr_update_events: child=00000000 group=6b74d49d group_lvl=0 child_active=0 group_active=1 nextevt=3023000000 next_expiry=3023000000 child_evt_expiry=0 child_evtcpu=0
<idle>-0         2d.... 3016628us : tmigr_group_set_cpu_inactive: group=6b74d49d lvl=0 numa=0 active=1 migrator=1 parent=00000000 childmask=4
<idle>-0         2d.... 3016629us : tmigr_cpu_idle: cpu=2 parent=6b74d49d nextevt=3023000000 wakeup=9223372036854775807

<idle>-0         0d.... 3016684us : tmigr_group_set_cpu_inactive: group=6b74d49d lvl=0 numa=0 active=0 migrator=ff parent=00000000 childmask=1
<idle>-0         0d.... 3016685us : tmigr_cpu_idle: cpu=0 parent=6b74d49d nextevt=9223372036854775807 wakeup=9223372036854775807
<idle>-0         0d.... 3024623us : tmigr_cpu_new_timer_idle: cpu=0 parent=6b74d49d nextevt=9223372036854775807 wakeup=9223372036854775807

<idle>-0         1d.s.. 162080532us : timer_cancel: timer=2e281df7

Just a different CPU this time.
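
For readers following the trace excerpts: the fields are plain key=value
pairs, and 9223372036854775807 is 2^63 - 1, the largest signed 64-bit
value, which the kernel defines as KTIME_MAX. The sketch below pulls the
fields out of one of the lines above and checks for that value. Reading
KTIME_MAX as "no event queued" is an interpretation of the trace, and
parse_fields() is a hypothetical helper, not part of any kernel tooling.

```python
import re

# Largest signed 64-bit value; matches the kernel's KTIME_MAX.
KTIME_MAX = 2**63 - 1

def parse_fields(line):
    """Return the key=value pairs of one trace line as a dict of strings."""
    return dict(re.findall(r"(\w+)=(\w*)", line))

line = ("<idle>-0         0d.... 3016685us : tmigr_cpu_idle: cpu=0 "
        "parent=6b74d49d nextevt=9223372036854775807 "
        "wakeup=9223372036854775807")

fields = parse_fields(line)
# Assumption: nextevt == KTIME_MAX means this CPU reported no pending event.
no_event = int(fields["nextevt"]) == KTIME_MAX
print(fields["cpu"], fields["parent"], no_event)
```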

The first expiring timer:

kcompact-42        1d.... 2552891us : timer_start: timer=2e281df7 function=process_timeout expires=4294670348 [timeout=500] bucket_expiry=4294670352 cpu=1 idx=66 flags=

Last expiry before going south:

   <idle>-0         1..s.. 3006620us : timer_expire_entry: timer=6f47b280 function=process_timeout now=4294670302 baseclk=4294670302

bucket_expiry - now: 4294670352 - 4294670302 = 50 jiffies

3006620us + 50*1000us = 3056620us
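
The same calculation as a small sketch. It assumes HZ=1000, i.e. one
jiffy is 1000us, which is what the 50*1000us step above implies; the
function name is made up for illustration.

```python
# Assumption: HZ=1000, so one jiffy lasts 1000us.
HZ = 1000
USEC_PER_JIFFY = 1_000_000 // HZ

def expected_expiry_us(trace_us, now_jiffies, bucket_expiry_jiffies):
    """Project a jiffies-based bucket expiry onto the trace timestamp axis."""
    delta_jiffies = bucket_expiry_jiffies - now_jiffies
    return trace_us + delta_jiffies * USEC_PER_JIFFY

# Values from the timer_start and timer_expire_entry lines above.
delta = 4294670352 - 4294670302
expiry_us = expected_expiry_us(3006620, 4294670302, 4294670352)
print(delta, expiry_us)
```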

So the previous observation of hitting the exact point of the last CPU
going idle does not matter.

What really bothers me is:

<idle>-0         2d.... 3016629us : tmigr_cpu_idle: cpu=2 parent=6b74d49d nextevt=3023000000 wakeup=9223372036854775807

which has an event between these events:

<idle>-0         0d.... 3016685us : tmigr_cpu_idle: cpu=0 parent=6b74d49d nextevt=9223372036854775807 wakeup=9223372036854775807
<idle>-0         0d.... 3024623us : tmigr_cpu_new_timer_idle: cpu=0 parent=6b74d49d nextevt=9223372036854775807 wakeup=9223372036854775807

But that event is before the next expiring timer. Confused, but Frederic
just told me on IRC he's on to something.
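
The ordering described above can be checked with a few lines. This is
only a sketch: it assumes nextevt is in nanoseconds on the same time
base as the trace timestamps, and it takes 3056620us as the bucket
expiry of the first expiring timer (HZ=1000 assumed).

```python
NSEC_PER_USEC = 1000

# Assumption: nextevt is in nanoseconds; convert to trace microseconds.
nextevt_us = 3023000000 // NSEC_PER_USEC

cpu0_idle_us = 3016685            # tmigr_cpu_idle, cpu=0
cpu0_new_timer_idle_us = 3024623  # tmigr_cpu_new_timer_idle, cpu=0
timer_bucket_expiry_us = 3056620  # first expiring timer, HZ=1000 assumed

# CPU2's queued event falls between the two CPU0 trace points ...
falls_between = cpu0_idle_us < nextevt_us < cpu0_new_timer_idle_us
# ... and is earlier than the bucket expiry of the first expiring timer.
before_timer = nextevt_us < timer_bucket_expiry_us
print(nextevt_us, falls_between, before_timer)
```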

> Does the consistent ~159s mean anything?

I don't think so. It might be the limitation of the clockevent device,
the maximum sleep time restriction or some other unrelated event (device
interrupt, hrtimer) which happens to pop up after this time for some
reason.

But it's definitely not relevant to the problem itself. It's just the
thing which brings the machine back to life. Otherwise it might sit
there forever.
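
For reference, the ~159s figure can be read off the excerpts in this
mail. The sketch assumes the timer_cancel of timer=2e281df7 at
162080532us marks the point where the machine came back to life, and
that the timer should have expired at 3056620us (HZ=1000 assumed).

```python
USEC_PER_SEC = 1_000_000

expected_expiry_us = 3056620   # projected bucket expiry of timer=2e281df7
timer_cancel_us = 162080532    # timer_cancel of the same timer in the trace

# How long the timer sat unserviced past its expected expiry.
delay_us = timer_cancel_us - expected_expiry_us
print(f"{delay_us / USEC_PER_SEC:.2f}s")
```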

Thanks,

        tglx

Thread overview: 47+ messages
2024-03-08 17:15 [GIT PULL] RCU changes for v6.9 Boqun Feng
2024-03-11 19:43 ` pr-tracker-bot
2024-03-12 20:32 ` Unexplained long boot delays [Was Re: [GIT PULL] RCU changes for v6.9] Florian Fainelli
2024-03-12 21:01   ` Frederic Weisbecker
2024-03-12 21:15     ` Paul E. McKenney
2024-03-12 21:35       ` Florian Fainelli
2024-03-12 22:05     ` Florian Fainelli
2024-03-12 21:07   ` Boqun Feng
2024-03-12 21:34     ` Florian Fainelli
2024-03-12 21:44       ` Linus Torvalds
2024-03-12 23:48         ` Boqun Feng
2024-03-13 16:01           ` Joel Fernandes
2024-03-13 21:30             ` Florian Fainelli
2024-03-13 21:59               ` Russell King (Oracle)
2024-03-13 22:04                 ` Florian Fainelli
2024-03-13 22:49                   ` Russell King (Oracle)
2024-03-13 23:29                     ` Florian Fainelli
2024-03-14  1:15                       ` Linus Torvalds
2024-03-14  1:22                         ` Florian Fainelli
2024-03-13 22:52                   ` Frederic Weisbecker
2024-03-14  3:44                     ` Florian Fainelli
2024-03-14  5:12                       ` Boqun Feng
2024-03-14  6:33                         ` Boqun Feng
2024-03-14  9:32                           ` Thomas Gleixner
2024-03-14  9:11                       ` Thomas Gleixner
2024-03-14 10:41                       ` Frederic Weisbecker
2024-03-14 18:35                         ` Florian Fainelli
2024-03-14 18:51                           ` Boqun Feng
2024-03-14 19:09                             ` Florian Fainelli
2024-03-14 20:45                               ` Thomas Gleixner
2024-03-14 21:21                                 ` Thomas Gleixner
2024-03-14 21:53                                   ` Florian Fainelli
2024-03-14 22:51                                     ` Thomas Gleixner [this message]
2024-03-14 21:58                                   ` Thomas Gleixner
2024-03-14 22:05                                   ` Boqun Feng
2024-03-14 22:10                                     ` Boqun Feng
2024-03-15  1:14                                     ` [PATCH] timer/migration: Remove buggy early return on deactivation [was Re: Unexplained long boot delays [Was Re: [GIT PULL] RCU changes for v6.9]] Frederic Weisbecker
2024-03-15  1:20                                       ` Frederic Weisbecker
2024-03-15 13:44                                       ` Florian Fainelli
2024-03-16 19:06                                       ` [tip: timers/urgent] timer/migration: Remove buggy early return on deactivation tip-bot2 for Frederic Weisbecker
2024-03-26 16:41                                       ` [PATCH] timer/migration: Remove buggy early return on deactivation [was Re: Unexplained long boot delays [Was Re: [GIT PULL] RCU changes for v6.9]] Anna-Maria Behnsen
2024-03-26 17:18                                         ` Frederic Weisbecker
2024-04-04 16:50                                           ` [PATCH] timers/migration: Return early on deactivation Anna-Maria Behnsen
2024-04-04 22:19                                             ` Frederic Weisbecker
2024-04-05  8:53                                               ` [PATCH v2] " Anna-Maria Behnsen
2024-04-05  9:11                                                 ` [tip: timers/urgent] " tip-bot2 for Anna-Maria Behnsen
2024-03-14  9:03                 ` Unexplained long boot delays [Was Re: [GIT PULL] RCU changes for v6.9] Thomas Gleixner
