From: "Richard W.M. Jones" <rjones@redhat.com>
To: YiFei Zhu <zhuyifei@google.com>
Cc: dev@aaront.org, linux-kernel@vger.kernel.org,
	peterz@infradead.org, zhuyifei1999@gmail.com
Subject: Re: printk.time causes rare kernel boot hangs
Date: Thu, 15 Jun 2023 12:29:19 +0100
Message-ID: <20230615112919.GM7636@redhat.com>
In-Reply-To: <20230615110429.2839058-1-zhuyifei@google.com>

On Thu, Jun 15, 2023 at 11:04:29AM +0000, YiFei Zhu wrote:
> > FWIW attached is a test program that runs the qemu instances in
> > parallel (up to 8 threads), which seems to be a quicker way to hit the
> > problem for me.  Even on Intel, with this test I can hit the bug in a
> > few hundred iterations.
> 
> A friend sent me here so I took a look.
> 
> I was unable to reproduce with this script after 10000 iterations,
> on an AMD Gentoo Linux host:
> 
> Host kernel:  6.3.3 vanilla
> Guest kernel: git commit f31dcb152a3d0816e2f1deab4e64572336da197d
> Guest config: Provided full-fat Fedora config + CONFIG_GDB_SCRIPTS
> QEMU:         8.0.2 (with kvm_amd)
> Hardware:     AMD Ryzen 7 PRO 5850U
> 
> I wonder if anything on the host side affects this, or could be some
> sort of race condition.

We've had multiple independent reports of people reproducing the bug
since this story (unfortunately) hit Hacker News.  Your configuration
above should work, so I still don't know what the deciding factor is.
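The attached test program is not reproduced in this archive.  Below is
a minimal sketch of the same idea, not the actual script.  It assumes
that each boot attempt is a single command that exits on its own when
the guest powers off, and that a hang shows up as that command
outliving a timeout.  The function names and the 60 second default are
hypothetical.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def boot_once(cmd, timeout):
    """Run one boot attempt. Return True if it exited, False if it timed out."""
    try:
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
                       timeout=timeout, check=False)
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed and reaped the child here.
        return False


def run_parallel(cmd, iterations, workers=8, timeout=60.0):
    """Run `iterations` boot attempts, up to `workers` at once.

    Returns the number of attempts that hit the timeout (assumed hangs).
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda _: boot_once(cmd, timeout), range(iterations))
        return sum(1 for ok in results if not ok)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python3 repro.py qemu-system-x86_64 <args...>
    hangs = run_parallel(sys.argv[1:], iterations=1000)
    print(f"{hangs} hang(s) out of 1000 boots")
```

Running several guests at once presumably matters because it adds host
load and scheduling noise, which is one plausible reason the parallel
version hits the problem sooner than serial runs.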

[...]

> If you can reproduce the original bug (without the msleep or busy wait
> patch), could you check if you can reproduce that with idle=poll? If so,
> can you run "p show_state_filter(0)" so we get a stack trace of kernel_init,
> assuming it hit a similar issue as if msleep was added. If idle=poll does
> not work, or you can't call functions from within gdb (some old qemu versions
> did not support this), see if you can send an alt-sysrq-w to show stacks of
> blocked tasks.
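One way to deliver alt-sysrq-w to a wedged guest is from the QEMU side.
In the human monitor this is `sendkey alt-sysrq-w`.  The sketch below
does the same over QMP with the `send-key` command.  It assumes QEMU was
started with something like `-qmp unix:/tmp/qmp.sock,server=on,wait=off`
(the socket path is hypothetical) and that the guest has magic SysRq
enabled, e.g. via `sysrq_always_enabled` on the kernel command line.

```python
import json
import socket
import sys


def qmp_command(name, arguments=None):
    """Encode one QMP command as a newline-terminated JSON message."""
    msg = {"execute": name}
    if arguments is not None:
        msg["arguments"] = arguments
    return (json.dumps(msg) + "\r\n").encode()


def sysrq_keys(letter):
    """Arguments for send-key: press Alt, SysRq and `letter` together."""
    return {"keys": [{"type": "qcode", "data": k} for k in ("alt", "sysrq", letter)]}


def send_sysrq(sock_path, letter="w"):
    """Connect to a QMP unix socket and inject alt-sysrq-<letter>."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        f = s.makefile("rwb")
        f.readline()  # server greeting: {"QMP": {...}}
        f.write(qmp_command("qmp_capabilities"))
        f.flush()
        f.readline()  # expected {"return": {}}
        f.write(qmp_command("send-key", sysrq_keys(letter)))
        f.flush()
        # Simplification: an asynchronous QMP event could arrive here
        # instead of the command's reply.
        return json.loads(f.readline())


if __name__ == "__main__" and len(sys.argv) > 1:
    print(send_sysrq(sys.argv[1]))
```

The blocked-task stacks then appear on the guest console, so this is
most useful with a serial console being captured.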

(1) Adding idle=poll to the guest kernel

=> Bug still occurs, with about the same frequency as before.
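For anyone repeating test (1), here is a sketch of building a direct
kernel boot command line with idle=poll added to the guest's parameters.
The helper name and the particular set of options are assumptions, not
the command line used by the reproducer.  `-s` is QEMU's shorthand for
`-gdb tcp::1234`, which is what the `target remote localhost:1234` in the
gdb session below connects to.

```python
def qemu_argv(kernel, cmdline_params, gdb=False):
    """Build a qemu-system-x86_64 argv for a direct kernel boot (hypothetical helper)."""
    argv = [
        "qemu-system-x86_64",
        "-enable-kvm",
        "-nographic",            # serial console on stdio
        "-no-reboot",            # exit instead of rebooting, so a run ends cleanly
        "-kernel", kernel,
        "-append", " ".join(cmdline_params),  # guest kernel command line
    ]
    if gdb:
        argv.append("-s")        # gdb stub listening on tcp::1234
    return argv


argv = qemu_argv("bzImage", ["console=ttyS0", "printk.time=1", "idle=poll"], gdb=True)
print(" ".join(argv))
```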

(2) Connect with gdb to qemu's gdb-stub:

Trying to evaluate show_state_filter(0) didn't work for reasons I
don't understand:

(gdb) target remote localhost:1234
Remote debugging using localhost:1234
warning: Remote gdbserver does not support determining executable automatically.
RHEL <=6.8 and <=7.2 versions of gdbserver do not support such automatic execut.
The following versions of gdbserver support it:
- Upstream version of gdbserver (unsupported) 7.10 or later
- Red Hat Developer Toolset (DTS) version of gdbserver from DTS 4.0 or later (o)
- RHEL-7.3 versions of gdbserver (on any architecture)
arch_static_branch (branch=false, key=<optimized out>)
    at ./arch/x86/include/asm/jump_label.h:27
27     asm_volatile_goto("1:"
(gdb) bt
#0  arch_static_branch (branch=false, key=<optimized out>)
    at ./arch/x86/include/asm/jump_label.h:27
#1  static_key_false (key=<optimized out>) at ./include/linux/jump_label.h:207
#2  native_write_msr (high=222, low=719927812, msr=1760)
    at ./arch/x86/include/asm/msr.h:147
#3  wrmsrl (val=954202667524, msr=1760) at ./arch/x86/include/asm/msr.h:262
#4  lapic_next_deadline (delta=474, evt=0xffff88804e81bf40)
    at arch/x86/kernel/apic/apic.c:491
#5  0xffffffff81143667 in clockevents_program_event (dev=0xffff88804e81bf40, 
    expires=<optimized out>, force=<optimized out>)
    at kernel/time/clockevents.c:334
#6  0xffffffff81143c0b in tick_handle_periodic (dev=0xffff88804e81bf40)
    at kernel/time/tick-common.c:133
#7  0xffffffff8105d01c in local_apic_timer_interrupt ()
    at arch/x86/kernel/apic/apic.c:1095
#8  __sysvec_apic_timer_interrupt (regs=regs@entry=0xffffc90000003ee8)
    at arch/x86/kernel/apic/apic.c:1112
#9  0xffffffff81e61a91 in sysvec_apic_timer_interrupt (regs=0xffffc90000003ee8)
    at arch/x86/kernel/apic/apic.c:1106
#10 0xffffffff8200144a in asm_sysvec_apic_timer_interrupt ()
    at ./arch/x86/include/asm/idtentry.h:645
#11 0x0000000000000000 in ?? ()
(gdb) p show_state_filter(0)
[Inferior 1 (process 1) exited normally]
The program being debugged exited while in a function called from GDB.
Evaluation of the expression containing the function
(show_state_filter) will be abandoned.
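Frames #2 and #3 of the backtrace can be decoded by hand.  wrmsrl()
splits the 64-bit value into the low and high 32-bit halves that
native_write_msr() receives, and MSR 1760 is 0x6E0, the architectural
IA32_TSC_DEADLINE register.  So, at least in this one snapshot, the vCPU
was in the periodic tick handler re-arming the TSC-deadline timer rather
than sitting in kernel_init; a single sample like this does not by
itself show where the boot is stuck.

```python
IA32_TSC_DEADLINE = 0x6E0  # architectural MSR for the local APIC TSC-deadline timer


def split_msr_value(val):
    """Split a 64-bit MSR value into the (low, high) halves written via EAX/EDX."""
    return val & 0xFFFFFFFF, val >> 32


low, high = split_msr_value(954202667524)  # val from frame #3
print(low, high, hex(1760))                # prints: 719927812 222 0x6e0
```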

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top


