All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Richard W.M. Jones" <rjones@redhat.com>
To: Richard Henderson <richard.henderson@linaro.org>
Cc: "Peter Zijlstra" <peterz@infradead.org>,
	"Arnd Bergmann" <arnd@arndb.de>,
	"Naresh Kamboju" <naresh.kamboju@linaro.org>,
	"Anders Roxell" <anders.roxell@linaro.org>,
	"Daniel Díaz" <daniel.diaz@linaro.org>,
	"Benjamin Copeland" <ben.copeland@linaro.org>,
	"lkml - Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	"the arch/x86 maintainers" <x86@kernel.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>
Subject: Re: qemu-x86_64 booting with 8.0.0 stil see int3: when running LTP tracing testing.
Date: Wed, 5 Jul 2023 17:35:23 +0100	[thread overview]
Message-ID: <20230705163523.GJ7636@redhat.com> (raw)
In-Reply-To: <CAFXwXrmbpuFNf5=nQxiTteo8fpCdAbK4pEAN176Cq0yvwZcfFw@mail.gmail.com>

On Wed, Jul 05, 2023 at 06:32:30PM +0200, Richard Henderson wrote:
> https://gitlab.com/qemu-project/qemu/-/commit/
> 3307e08c6f142bb3d2406cfbc0ee19359748b51a

I've got that patch in my qemu tree.  Do you suspect it's
wrong and want me to try reverting it?

Rich.

> r~
> 
> On Wed, 5 July 2023, 18:28 Richard W.M. Jones, <rjones@redhat.com> wrote:
> 
>     On Tue, Jul 04, 2023 at 08:46:20AM +0100, Richard W.M. Jones wrote:
>     > We have been having the same sort of problem
>     > (https://bugzilla.redhat.com/show_bug.cgi?id=2216496).  It's another
>     > of those bugs that requires hundreds or thousands of boot iterations
>     > before you can see it.  There is a test in comment 27 but it requires
>     > guestfish and some hacking to work.  I'll try to come up with a
>     > cleaner test later.
>     >
>     > We see stack traces like:
>     >
>     > [    3.081939] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff,
>     max_idle_ns: 2085701024 ns
>     > [    3.082266] clocksource: Switched to clocksource acpi_pm
>     > [    3.090201] NET: Registered PF_INET protocol family
>     > [    3.093098] int3: 0000 [#1] PREEMPT SMP NOPTI
>     > [    3.093098] CPU: 3 PID: 1 Comm: swapper/0 Not tainted
>     6.4.0-10173-ga901a3568fd2 #8
>     > [    3.093098] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
>     rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
>     > [    3.093098] RIP: 0010:__mod_timer+0x1c3/0x370
>     > [    3.093098] Code: 00 00 41 bd ff ff ff ff 31 d2 4c 89 f6 4c 89 ff e8
>     f2 ef ff ff 41 89 c4 85 c0 75 09 83 e3 01 0f 85 54 ff ff ff 41 8b 4f 20 66
>     <90> f7 c1 00 00 10 00 0f 84 23 01 00 00 48 c7 c3 40 cc 01 00 65 48
>     > [    3.093098] RSP: 0018:ffffaf1600013e00 EFLAGS: 00000046
>     > [    3.093098] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
>     0000000000280003
>     > [    3.093098] RDX: 0000000000000000 RSI: ffff9aa90fd9dec0 RDI:
>     ffffffff8441e4b8
>     > [    3.093098] RBP: 00000000fffc200d R08: ffffffff8441e4a0 R09:
>     ffffffff8441e4b8
>     > [    3.093098] R10: 0000000000000001 R11: 000000000002e990 R12:
>     0000000000000000
>     > [    3.093098] R13: 00000000ffffffff R14: ffff9aa90fd9dec0 R15:
>     ffffffff8441e4b8
>     > [    3.093098] FS:  0000000000000000(0000) GS:ffff9aa90fd80000(0000)
>     knlGS:0000000000000000
>     > [    3.093098] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>     > [    3.093098] CR2: 0000000000000000 CR3: 000000004e02e000 CR4:
>     0000000000750ee0
>     > [    3.093098] PKRU: 55555554
>     > [    3.093098] Call Trace:
>     > [    3.093098]  <TASK>
>     > [    3.093098]  ? die+0x31/0x80
>     > [    3.093098]  ? exc_int3+0x10e/0x120
>     > [    3.093098]  ? asm_exc_int3+0x39/0x40
>     > [    3.093098]  ? __mod_timer+0x1c3/0x370
>     > [    3.093098]  ? __mod_timer+0x1c3/0x370
>     > [    3.093098]  queue_delayed_work_on+0x23/0x30
>     > [    3.093098]  neigh_table_init+0x1bb/0x2e0
>     > [    3.093098]  arp_init+0x12/0x50
>     > [    3.093098]  inet_init+0x15b/0x2f0
>     > [    3.093098]  ? __pfx_inet_init+0x10/0x10
>     > [    3.093098]  do_one_initcall+0x58/0x230
>     > [    3.093098]  kernel_init_freeable+0x199/0x2d0
>     > [    3.093098]  ? __pfx_kernel_init+0x10/0x10
>     > [    3.093098]  kernel_init+0x15/0x1b0
>     > [    3.093098]  ret_from_fork+0x2c/0x50
>     > [    3.093098]  </TASK>
>     > [    3.093098] Modules linked in:
>     > [    3.093098] ---[ end trace 0000000000000000 ]---
>     > [    3.093098] RIP: 0010:__mod_timer+0x1c3/0x370
>     > [    3.093098] Code: 00 00 41 bd ff ff ff ff 31 d2 4c 89 f6 4c 89 ff e8
>     f2 ef ff ff 41 89 c4 85 c0 75 09 83 e3 01 0f 85 54 ff ff ff 41 8b 4f 20 66
>     <90> f7 c1 00 00 10 00 0f 84 23 01 00 00 48 c7 c3 40 cc 01 00 65 48
>     > [    3.093098] RSP: 0018:ffffaf1600013e00 EFLAGS: 00000046
>     > [    3.093098] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
>     0000000000280003
>     > [    3.093098] RDX: 0000000000000000 RSI: ffff9aa90fd9dec0 RDI:
>     ffffffff8441e4b8
>     > [    3.093098] RBP: 00000000fffc200d R08: ffffffff8441e4a0 R09:
>     ffffffff8441e4b8
>     > [    3.093098] R10: 0000000000000001 R11: 000000000002e990 R12:
>     0000000000000000
>     > [    3.093098] R13: 00000000ffffffff R14: ffff9aa90fd9dec0 R15:
>     ffffffff8441e4b8
>     > [    3.093098] FS:  0000000000000000(0000) GS:ffff9aa90fd80000(0000)
>     knlGS:0000000000000000
>     > [    3.093098] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>     > [    3.093098] CR2: 0000000000000000 CR3: 000000004e02e000 CR4:
>     0000000000750ee0
>     > [    3.093098] PKRU: 55555554
>     > [    3.093098] Kernel panic - not syncing: Fatal exception in interrupt
>     >
>     > There are many variations, but the common pattern seems to be
>     > <something in the clock or timer code> -> int3 exception
>     >
>     > It only happens under qemu TCG (software emulation).
>     >
>     > It goes away if we recompile qemu without MTTCG support.
>     >
>     > It only happens with -smp enabled, we are using qemu -smp 4
>     >
>     > We are using qemu-system-x86_64 full system emulation on x86_64 host
>     > (ie. forcing KVM off).
>     >
>     > It happens with the latest upstream kernel and qemu, compiled from
>     > source.
> 
>     I got a bit further on this one.
> 
>     The bug happens in the code that updates the static branch used for at
>     least these two keys:
> 
>       static DEFINE_STATIC_KEY_FALSE(__sched_clock_stable);
>       DEFINE_STATIC_KEY_FALSE(timers_migration_enabled);
> 
>     There are probably others (it seems a generic problem with how static
>     branches are handled by TCG), but I only see the bug for those two.
> 
>     When the static branch is updated we end up calling
>     arch/x86/kernel/alternative.c:text_poke_bp_batch().  It's best to read
>     the description of that function to see where int3 is used:
> 
>       https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/
>     arch/x86/kernel/alternative.c#n2086
> 
>     I modified the qemu TCG int3 helper so it dumps the code at %rip when
>     the interrupt fires, and I can actually see the changes in the above
>     function happen, first int3 being written, then the end of the nop,
>     then the int3 being overwritten with the first byte of the nop.
> 
>     Unfortunately the int3 still fires after the code has been completely
>     rewritten to its final (ie nop) value.
> 
>     This seems to indicate to me that neither the self-write to the kernel
>     text segment, nor sync_core (implemented by a "iret to self" trick)
>     invalidates the qemu TCG translation block containing the old int3
>     helper call.  Thus we (qemu) never "see" the new nop, we keep
>     emulating int3, and that causes the kernel to crash.
> 
>     I added print statements inside tb_invalidate_phys_page() and this
>     function seems never to be called at all.  It's my understanding that
>     at least the kernel writing to its text segment ought to cause
>     tb_invalidate_phys_page() to be called, but I'm not super-familiar
>     with this qemu code.
> 
>     Richard Henderson - do you have any suggestions?
> 
>     Rich.
> 
>     --
>     Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/
>     ~rjones
>     Read my programming and virtualization blog: http://rwmj.wordpress.com
>     libguestfs lets you edit virtual machines.  Supports shell scripting,
>     bindings from many languages.  http://libguestfs.org
> 
> 

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top


  parent reply	other threads:[~2023-07-05 16:39 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CA+G9fYsETJQm0Ue7hGsb+nbsiMikwycOV3V0DPr6WC2r61KRBQ@mail.gmail.com>
2023-06-21 15:31 ` qemu-x86_64 booting with 8.0.0 stil see int3: when running LTP tracing testing Arnd Bergmann
2023-06-21 16:06   ` Peter Zijlstra
2023-07-04  7:46     ` Richard W.M. Jones
2023-07-04 13:21       ` Richard W.M. Jones
2023-07-05 16:28       ` Richard W.M. Jones
     [not found]         ` <CAFXwXrmbpuFNf5=nQxiTteo8fpCdAbK4pEAN176Cq0yvwZcfFw@mail.gmail.com>
2023-07-05 16:35           ` Richard W.M. Jones [this message]
     [not found]             ` <CAFXwXrk1FEZPUO-zqNVJZ6YCHKUkgNehwmyDYuOr5fx8ff0OCA@mail.gmail.com>
2023-07-05 16:40               ` Richard W.M. Jones
2023-07-06  6:13                 ` Richard Henderson
2023-07-05 16:37           ` Richard W.M. Jones
2023-07-05 21:50         ` Richard W.M. Jones
2023-07-06  6:30           ` Richard Henderson
2023-07-06 10:28             ` Richard W.M. Jones
2023-08-08  5:27               ` John Stultz
2023-08-08  7:28                 ` Richard W.M. Jones
2025-10-25 19:48                   ` Gregory Price

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230705163523.GJ7636@redhat.com \
    --to=rjones@redhat.com \
    --cc=anders.roxell@linaro.org \
    --cc=arnd@arndb.de \
    --cc=ben.copeland@linaro.org \
    --cc=daniel.diaz@linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=naresh.kamboju@linaro.org \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=richard.henderson@linaro.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.