All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Richard W.M. Jones" <rjones@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: "Arnd Bergmann" <arnd@arndb.de>,
	"Naresh Kamboju" <naresh.kamboju@linaro.org>,
	"Anders Roxell" <anders.roxell@linaro.org>,
	"Daniel Díaz" <daniel.diaz@linaro.org>,
	"Benjamin Copeland" <ben.copeland@linaro.org>,
	linux-kernel@vger.kernel.org, x86@kernel.org,
	"Richard Henderson" <richard.henderson@linaro.org>
Subject: Re: qemu-x86_64 booting with 8.0.0 stil see int3: when running LTP tracing testing.
Date: Tue, 4 Jul 2023 08:46:20 +0100	[thread overview]
Message-ID: <20230704074620.GA17440@redhat.com> (raw)
In-Reply-To: <20230621160655.GL2053369@hirez.programming.kicks-ass.net>

On Wed, Jun 21, 2023 at 06:06:55PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 21, 2023 at 05:31:15PM +0200, Arnd Bergmann wrote:
> 
> > I don't know much about x86 exception handling, but my guess is
> > that this is where the stack overflows, so this backtrace
> > is not all that useful. Looking at the full log from your link,
> > I see that recursion through asm_exc_int3:
> > 
> > <4>[   49.886694]  ? exc_int3+0x62/0x80
> > <4>[   49.886714]  ? asm_exc_int3+0x3e/0x50
> > <4>[   49.886759]  ? preempt_count_sub+0x5/0x80
> > <4>[   49.886783]  ? preempt_count_sub+0x5/0x80
> > <4>[   49.886805]  ? irq_work_queue+0x40/0x80
> > <4>[   49.886826]  ? defer_console_output+0x49/0x80
> > <4>[   49.886840]  ? vprintk+0x42/0x60
> > <4>[   49.886857]  ? _printk+0x5d/0x80
> > <4>[   49.886891]  ? die+0x9c/0xe0
> > <4>[   49.886922]  ? exc_int3+0x62/0x80
> > <4>[   49.886942]  ? asm_exc_int3+0x3e/0x50
> 
> So this is exc_int3(), but it is past poke_int3_handler() which is the
> counter-part of text_poke_bp_batch() -- die() seems to suggest we're
> even past do_int3().
> 
> The fact that it continued past poke_int3_handler() suggests that it is
> a 'stray' int3, not covered by an active text modification site.
> 
> > <4>[   49.886966]  ? __pfx_tick_sched_timer+0x10/0x10
> > <4>[   49.886980]  ? __pfx_read_tsc+0x10/0x10
> > <4>[   49.887010]  ? preempt_count_sub+0x5/0x80
> > <4>[   49.887035]  ? preempt_count_sub+0x5/0x80
> > <4>[   49.887057]  ? __hrtimer_run_queues+0xee/0x330
> > <4>[   49.887070]  ? _raw_spin_unlock_irqrestore+0x28/0x50
> > <4>[   49.887085]  ? __hrtimer_run_queues+0xee/0x330
> > <4>[   49.887136]  ? hrtimer_interrupt+0xf6/0x390
> > <4>[   49.887167]  ? __sysvec_apic_timer_interrupt+0x64/0x1a0
> > <4>[   49.887189]  ? sysvec_apic_timer_interrupt+0x7a/0x90
> > <4>[   49.887206]  </IRQ>
> > <4>[   49.887219]  <TASK>
> > <4>[   49.887231]  ? asm_sysvec_apic_timer_interrupt+0x1f/0x30
> > <4>[   49.887252]  ? sched_rt_period_timer+0x4/0x390
> > <4>[   49.887272]  ? __pfx_do_sync_core+0x10/0x10
> > <4>[   49.887305]  ? insn_get_displacement+0x9/0x160
> > <4>[   49.887329]  ? insn_get_displacement+0x9/0x160
> > <4>[   49.887344]  ? insn_get_immediate+0xd2/0x270
> > <4>[   49.887365]  ? insn_decode+0x113/0x150
> > <4>[   49.887384]  ? text_poke_loc_init+0xea/0x220
> > <4>[   49.887429]  ? sched_rt_period_timer+0x4/0x390
> > <4>[   49.887451]  ? text_poke_queue+0x89/0xa0
> 
> This is the text_poke_queue() stage, it is still collecting sites to
> poke but hasn't actually started yet. The actual poking happens in
> text_poke_bp_batch(), which isn't in the call-chain afaict.
> 
> > <4>[   49.887477]  ? ftrace_replace_code+0x149/0x1f0
> > <4>[   49.887508]  ? ftrace_modify_all_code+0x71/0x140
> > <4>[   49.887533]  ? arch_ftrace_update_code+0xd/0x20
> > <4>[   49.887550]  ? ftrace_shutdown+0xf5/0x220
> > <4>[   49.887577]  ? unregister_ftrace_function+0x2e/0x150
> > <4>[   49.887608]  ? stack_trace_sysctl+0x82/0xb0
> > <4>[   49.887636]  ? proc_sys_call_handler+0x18b/0x280
> > <4>[   49.887675]  ? proc_sys_write+0x17/0x20
> > <4>[   49.887692]  ? vfs_write+0x324/0x3f0
> > <4>[   49.887742]  ? ksys_write+0x75/0xe0
> > <4>[   49.887771]  ? __x64_sys_write+0x1f/0x30
> > <4>[   49.887787]  ? do_syscall_64+0x48/0xa0
> > <4>[   49.887802]  ? sysvec_apic_timer_interrupt+0x4d/0x90
> > <4>[   49.887822]  ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> > <4>[   49.887878]  </TASK>

Did anything happen with this bug later?  I see nothing on any public
list, nor any public bug tracker.

We have been having the same sort of problem
(https://bugzilla.redhat.com/show_bug.cgi?id=2216496).  It's another
of those bugs that requires hundreds or thousands of boot iterations
before you can see it.  There is a test in comment 27 but it requires
guestfish and some hacking to work.  I'll try to come up with a
cleaner test later.

We see stack traces like:

[    3.081939] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[    3.082266] clocksource: Switched to clocksource acpi_pm
[    3.090201] NET: Registered PF_INET protocol family
[    3.093098] int3: 0000 [#1] PREEMPT SMP NOPTI
[    3.093098] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 6.4.0-10173-ga901a3568fd2 #8
[    3.093098] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
[    3.093098] RIP: 0010:__mod_timer+0x1c3/0x370
[    3.093098] Code: 00 00 41 bd ff ff ff ff 31 d2 4c 89 f6 4c 89 ff e8 f2 ef ff ff 41 89 c4 85 c0 75 09 83 e3 01 0f 85 54 ff ff ff 41 8b 4f 20 66 <90> f7 c1 00 00 10 00 0f 84 23 01 00 00 48 c7 c3 40 cc 01 00 65 48
[    3.093098] RSP: 0018:ffffaf1600013e00 EFLAGS: 00000046
[    3.093098] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000280003
[    3.093098] RDX: 0000000000000000 RSI: ffff9aa90fd9dec0 RDI: ffffffff8441e4b8
[    3.093098] RBP: 00000000fffc200d R08: ffffffff8441e4a0 R09: ffffffff8441e4b8
[    3.093098] R10: 0000000000000001 R11: 000000000002e990 R12: 0000000000000000
[    3.093098] R13: 00000000ffffffff R14: ffff9aa90fd9dec0 R15: ffffffff8441e4b8
[    3.093098] FS:  0000000000000000(0000) GS:ffff9aa90fd80000(0000) knlGS:0000000000000000
[    3.093098] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.093098] CR2: 0000000000000000 CR3: 000000004e02e000 CR4: 0000000000750ee0
[    3.093098] PKRU: 55555554
[    3.093098] Call Trace:
[    3.093098]  <TASK>
[    3.093098]  ? die+0x31/0x80
[    3.093098]  ? exc_int3+0x10e/0x120
[    3.093098]  ? asm_exc_int3+0x39/0x40
[    3.093098]  ? __mod_timer+0x1c3/0x370
[    3.093098]  ? __mod_timer+0x1c3/0x370
[    3.093098]  queue_delayed_work_on+0x23/0x30
[    3.093098]  neigh_table_init+0x1bb/0x2e0
[    3.093098]  arp_init+0x12/0x50
[    3.093098]  inet_init+0x15b/0x2f0
[    3.093098]  ? __pfx_inet_init+0x10/0x10
[    3.093098]  do_one_initcall+0x58/0x230
[    3.093098]  kernel_init_freeable+0x199/0x2d0
[    3.093098]  ? __pfx_kernel_init+0x10/0x10
[    3.093098]  kernel_init+0x15/0x1b0
[    3.093098]  ret_from_fork+0x2c/0x50
[    3.093098]  </TASK>
[    3.093098] Modules linked in:
[    3.093098] ---[ end trace 0000000000000000 ]---
[    3.093098] RIP: 0010:__mod_timer+0x1c3/0x370
[    3.093098] Code: 00 00 41 bd ff ff ff ff 31 d2 4c 89 f6 4c 89 ff e8 f2 ef ff ff 41 89 c4 85 c0 75 09 83 e3 01 0f 85 54 ff ff ff 41 8b 4f 20 66 <90> f7 c1 00 00 10 00 0f 84 23 01 00 00 48 c7 c3 40 cc 01 00 65 48
[    3.093098] RSP: 0018:ffffaf1600013e00 EFLAGS: 00000046
[    3.093098] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000280003
[    3.093098] RDX: 0000000000000000 RSI: ffff9aa90fd9dec0 RDI: ffffffff8441e4b8
[    3.093098] RBP: 00000000fffc200d R08: ffffffff8441e4a0 R09: ffffffff8441e4b8
[    3.093098] R10: 0000000000000001 R11: 000000000002e990 R12: 0000000000000000
[    3.093098] R13: 00000000ffffffff R14: ffff9aa90fd9dec0 R15: ffffffff8441e4b8
[    3.093098] FS:  0000000000000000(0000) GS:ffff9aa90fd80000(0000) knlGS:0000000000000000
[    3.093098] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.093098] CR2: 0000000000000000 CR3: 000000004e02e000 CR4: 0000000000750ee0
[    3.093098] PKRU: 55555554
[    3.093098] Kernel panic - not syncing: Fatal exception in interrupt

There are many variations, but the common pattern seems to be
<something in the clock or timer code> -> int3 exception

It only happens under qemu TCG (software emulation).

It goes away if we recompile qemu without MTTCG support.

It only happens with -smp enabled, we are using qemu -smp 4

We are using qemu-system-x86_64 full system emulation on x86_64 host
(ie. forcing KVM off).

It happens with the latest upstream kernel and qemu, compiled from
source.

I'd love to have a better way to debug this or collect more
diagnostics if you have any suggestions.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html


  reply	other threads:[~2023-07-04  7:47 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CA+G9fYsETJQm0Ue7hGsb+nbsiMikwycOV3V0DPr6WC2r61KRBQ@mail.gmail.com>
2023-06-21 15:31 ` qemu-x86_64 booting with 8.0.0 stil see int3: when running LTP tracing testing Arnd Bergmann
2023-06-21 16:06   ` Peter Zijlstra
2023-07-04  7:46     ` Richard W.M. Jones [this message]
2023-07-04 13:21       ` Richard W.M. Jones
2023-07-05 16:28       ` Richard W.M. Jones
     [not found]         ` <CAFXwXrmbpuFNf5=nQxiTteo8fpCdAbK4pEAN176Cq0yvwZcfFw@mail.gmail.com>
2023-07-05 16:35           ` Richard W.M. Jones
     [not found]             ` <CAFXwXrk1FEZPUO-zqNVJZ6YCHKUkgNehwmyDYuOr5fx8ff0OCA@mail.gmail.com>
2023-07-05 16:40               ` Richard W.M. Jones
2023-07-06  6:13                 ` Richard Henderson
2023-07-05 16:37           ` Richard W.M. Jones
2023-07-05 21:50         ` Richard W.M. Jones
2023-07-06  6:30           ` Richard Henderson
2023-07-06 10:28             ` Richard W.M. Jones
2023-08-08  5:27               ` John Stultz
2023-08-08  7:28                 ` Richard W.M. Jones
2025-10-25 19:48                   ` Gregory Price

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230704074620.GA17440@redhat.com \
    --to=rjones@redhat.com \
    --cc=anders.roxell@linaro.org \
    --cc=arnd@arndb.de \
    --cc=ben.copeland@linaro.org \
    --cc=daniel.diaz@linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=naresh.kamboju@linaro.org \
    --cc=peterz@infradead.org \
    --cc=richard.henderson@linaro.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.