From: "Richard W.M. Jones" <rjones@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Aaron Thompson <dev@aaront.org>, linux-kernel@vger.kernel.org
Subject: Re: printk.time causes rare kernel boot hangs
Date: Wed, 14 Jun 2023 10:45:22 +0100
Message-ID: <20230614094522.GA7636@redhat.com>
In-Reply-To: <20230614092158.GF1639749@hirez.programming.kicks-ass.net>

On Wed, Jun 14, 2023 at 11:21:58AM +0200, Peter Zijlstra wrote:
> On Tue, Jun 13, 2023 at 02:41:05PM +0100, Richard W.M. Jones wrote:
> > [Being tracked in this bug which contains much more detail:
> > https://gitlab.com/qemu-project/qemu/-/issues/1696 ]
> 
> Can I please just get the detail in mail instead of having to go look at
> random websites?

Sure, the kernel hangs after printing:

[    0.070120] x86/cpu: User Mode Instruction Prevention (UMIP) activated
[    0.070120] Last level iTLB entries: 4KB 512, 2MB 255, 4MB 127
[    0.070120] Last level dTLB entries: 4KB 512, 2MB 255, 4MB 127, 1GB 0
[    0.070120] Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization
[    0.070120] Spectre V2 : Mitigation: Retpolines
[    0.070120] Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch
[    0.070120] Spectre V2 : Spectre v2 / SpectreRSB : Filling RSB on VMEXIT
[    0.070120] Spectre V2 : Enabling Speculation Barrier for firmware calls
[    0.070120] RETBleed: Mitigation: untrained return thunk
[    0.070120] Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier
[    0.070120] Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl
[    0.070120] Freeing SMP alternatives memory: 48K

The next message we'd expect here would be:

[    0.070794] smpboot: CPU0: AMD Ryzen 9 3900X 12-Core Processor (family: 0x17, model: 0x71, stepping: 0x0)

I believe this bug would affect bare metal too: basically any kernel
compiled with CONFIG_PRINTK_TIME.  However, the hang is very rare.
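
Since the hang is rare and the logs are long, the signature above can be
checked mechanically.  The following is only a minimal sketch, assuming the
log always stops at the same place as shown here; the helper name
hung_after_alternatives and the two marker constants are hypothetical and
are taken from the messages quoted in this mail, not from any tool:

```python
# Markers copied from the boot log quoted in this mail (assumption: the
# hang always occurs between these two messages).
LAST_SEEN = "Freeing SMP alternatives memory"
EXPECTED_NEXT = "smpboot: CPU0:"


def hung_after_alternatives(log_text: str) -> bool:
    """Return True if LAST_SEEN was printed but EXPECTED_NEXT never followed."""
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if LAST_SEEN in line:
            # Hang signature: no smpboot line anywhere after the marker.
            return not any(EXPECTED_NEXT in later for later in lines[i + 1:])
    # The marker was never reached, so this is not the hang described here.
    return False
```

Calling hung_after_alternatives(open("/tmp/log").read()) on the log left
behind by a failed run would then tell this hang apart from unrelated
failures.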

> > Recent kernels hang rarely when booted on qemu.  Usually you need to
> > boot 100s or 1,000s of times to see the hang, compared to 292,612 [sic]
> > successful boots which I was able to do before the problematic commit.
> > 
> > A reproducer (you'll probably need to use Fedora) is:
> 
> Debian only shop here... in fact, I still have machines without systemd.

Debian should work too, actually.  Just run the following command until
it hangs:

> >   $ while guestfish -a /dev/null -v run >& /tmp/log; do echo -n . ; done
> > 
> > You will need to leave it running for probably several hours, and
> > examine the /tmp/log file at the end.
> > 
> > I tracked this down to the following commit:
> > 
> >   commit f31dcb152a3d0816e2f1deab4e64572336da197d
> >   Author: Aaron Thompson <dev@aaront.org>
> >   Date:   Thu Apr 13 17:50:12 2023 +0000
> > 
> >     sched/clock: Fix local_clock() before sched_clock_init()
> >     
> >     Have local_clock() return sched_clock() if sched_clock_init() has not
> >     yet run. sched_clock_cpu() has this check but it was not included in the
> >     new noinstr implementation of local_clock().
> > 
> >   (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f31dcb152a3d0816e2f1deab4e64572336da197d)
> > 
> > Reverting this commit fixes the problem.
> > 
> > I don't know _why_ this commit is wrong, but can we revert it as it
> > causes serious problems with libguestfs hanging randomly.
> > 
> > Or if there's anything you want me to try out then let me know,
> > because I can reproduce the problem locally quite easily.
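
For unattended runs, the shell loop quoted above can be sketched with a
per-boot timeout so that a hung boot ends the loop instead of stalling it.
This is a rough sketch only: the function name boot_until_failure, the
timeout and the run limit are assumptions of mine, and it does not clean up
any child processes left behind by a run that timed out:

```python
import subprocess


def boot_until_failure(cmd, log_path, timeout_s, max_runs):
    """Run cmd repeatedly; return (status, number of successful runs).

    status is "timeout" if a run exceeded timeout_s, "failed" if a run
    exited non-zero, and "ok" if all max_runs runs succeeded.
    """
    for done in range(max_runs):
        # Overwrite the log each time so it only holds the most recent run.
        with open(log_path, "wb") as log:
            try:
                result = subprocess.run(
                    cmd,
                    stdout=log,
                    stderr=subprocess.STDOUT,
                    timeout=timeout_s,
                )
            except subprocess.TimeoutExpired:
                return ("timeout", done)
        if result.returncode != 0:
            return ("failed", done)
    return ("ok", max_runs)
```

For the reproducer above, cmd would be
["guestfish", "-a", "/dev/null", "-v", "run"] and log_path "/tmp/log";
suitable values for timeout_s and max_runs are a guess and depend on how
long a normal boot takes on the machine.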
> 
> Well, since it's virt and all, can you attach gdb to the gdb-stub and
> see where it's at? Any clue is better than no clue.

I'll see if this is possible, but I didn't have much luck with gdb on
qemu guests in the past.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org

