public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: "Richard W.M. Jones" <rjones@redhat.com>
To: Richard Henderson <richard.henderson@linaro.org>
Cc: "Peter Zijlstra" <peterz@infradead.org>,
	"Arnd Bergmann" <arnd@arndb.de>,
	"Naresh Kamboju" <naresh.kamboju@linaro.org>,
	"Anders Roxell" <anders.roxell@linaro.org>,
	"Daniel Díaz" <daniel.diaz@linaro.org>,
	"Benjamin Copeland" <ben.copeland@linaro.org>,
	"lkml - Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	"the arch/x86 maintainers" <x86@kernel.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>
Subject: Re: qemu-x86_64 booting with 8.0.0 stil see int3: when running LTP tracing testing.
Date: Wed, 5 Jul 2023 17:40:06 +0100	[thread overview]
Message-ID: <20230705164006.GL7636@redhat.com> (raw)
In-Reply-To: <CAFXwXrk1FEZPUO-zqNVJZ6YCHKUkgNehwmyDYuOr5fx8ff0OCA@mail.gmail.com>

On Wed, Jul 05, 2023 at 06:36:43PM +0200, Richard Henderson wrote:
> No, I thought it would be the fix for 8.0.0.

Thanks for the suggestion anyway.

Am I right in thinking that tb_invalidate_phys_page() ought to be
called when the kernel self-modifies its text segment?  If there's
some function that we expect to be called in this case then I can
instrument it.

Rich.

> r~
> 
> On Wed, 5 July 2023, 18:35 Richard W.M. Jones, <rjones@redhat.com> wrote:
> 
>     On Wed, Jul 05, 2023 at 06:32:30PM +0200, Richard Henderson wrote:
>     > https://gitlab.com/qemu-project/qemu/-/commit/
>     > 3307e08c6f142bb3d2406cfbc0ee19359748b51a
> 
>     I've got that patch in my qemu tree.  Do you suspect it's
>     wrong and want me to try reverting it?
> 
>     Rich.
> 
>     > r~
>     >
>     > On Wed, 5 July 2023, 18:28 Richard W.M. Jones, <rjones@redhat.com> wrote:
>     >
>     >     On Tue, Jul 04, 2023 at 08:46:20AM +0100, Richard W.M. Jones wrote:
>     >     > We have been having the same sort of problem
>     >     > (https://bugzilla.redhat.com/show_bug.cgi?id=2216496).  It's
>     another
>     >     > of those bugs that requires hundreds or thousands of boot
>     iterations
>     >     > before you can see it.  There is a test in comment 27 but it
>     requires
>     >     > guestfish and some hacking to work.  I'll try to come up with a
>     >     > cleaner test later.
>     >     >
>     >     > We see stack traces like:
>     >     >
>     >     > [    3.081939] clocksource: acpi_pm: mask: 0xffffff max_cycles:
>     0xffffff,
>     >     max_idle_ns: 2085701024 ns
>     >     > [    3.082266] clocksource: Switched to clocksource acpi_pm
>     >     > [    3.090201] NET: Registered PF_INET protocol family
>     >     > [    3.093098] int3: 0000 [#1] PREEMPT SMP NOPTI
>     >     > [    3.093098] CPU: 3 PID: 1 Comm: swapper/0 Not tainted
>     >     6.4.0-10173-ga901a3568fd2 #8
>     >     > [    3.093098] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
>     BIOS
>     >     rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
>     >     > [    3.093098] RIP: 0010:__mod_timer+0x1c3/0x370
>     >     > [    3.093098] Code: 00 00 41 bd ff ff ff ff 31 d2 4c 89 f6 4c 89
>     ff e8
>     >     f2 ef ff ff 41 89 c4 85 c0 75 09 83 e3 01 0f 85 54 ff ff ff 41 8b 4f
>     20 66
>     >     <90> f7 c1 00 00 10 00 0f 84 23 01 00 00 48 c7 c3 40 cc 01 00 65 48
>     >     > [    3.093098] RSP: 0018:ffffaf1600013e00 EFLAGS: 00000046
>     >     > [    3.093098] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
>     >     0000000000280003
>     >     > [    3.093098] RDX: 0000000000000000 RSI: ffff9aa90fd9dec0 RDI:
>     >     ffffffff8441e4b8
>     >     > [    3.093098] RBP: 00000000fffc200d R08: ffffffff8441e4a0 R09:
>     >     ffffffff8441e4b8
>     >     > [    3.093098] R10: 0000000000000001 R11: 000000000002e990 R12:
>     >     0000000000000000
>     >     > [    3.093098] R13: 00000000ffffffff R14: ffff9aa90fd9dec0 R15:
>     >     ffffffff8441e4b8
>     >     > [    3.093098] FS:  0000000000000000(0000) GS:ffff9aa90fd80000
>     (0000)
>     >     knlGS:0000000000000000
>     >     > [    3.093098] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>     >     > [    3.093098] CR2: 0000000000000000 CR3: 000000004e02e000 CR4:
>     >     0000000000750ee0
>     >     > [    3.093098] PKRU: 55555554
>     >     > [    3.093098] Call Trace:
>     >     > [    3.093098]  <TASK>
>     >     > [    3.093098]  ? die+0x31/0x80
>     >     > [    3.093098]  ? exc_int3+0x10e/0x120
>     >     > [    3.093098]  ? asm_exc_int3+0x39/0x40
>     >     > [    3.093098]  ? __mod_timer+0x1c3/0x370
>     >     > [    3.093098]  ? __mod_timer+0x1c3/0x370
>     >     > [    3.093098]  queue_delayed_work_on+0x23/0x30
>     >     > [    3.093098]  neigh_table_init+0x1bb/0x2e0
>     >     > [    3.093098]  arp_init+0x12/0x50
>     >     > [    3.093098]  inet_init+0x15b/0x2f0
>     >     > [    3.093098]  ? __pfx_inet_init+0x10/0x10
>     >     > [    3.093098]  do_one_initcall+0x58/0x230
>     >     > [    3.093098]  kernel_init_freeable+0x199/0x2d0
>     >     > [    3.093098]  ? __pfx_kernel_init+0x10/0x10
>     >     > [    3.093098]  kernel_init+0x15/0x1b0
>     >     > [    3.093098]  ret_from_fork+0x2c/0x50
>     >     > [    3.093098]  </TASK>
>     >     > [    3.093098] Modules linked in:
>     >     > [    3.093098] ---[ end trace 0000000000000000 ]---
>     >     > [    3.093098] RIP: 0010:__mod_timer+0x1c3/0x370
>     >     > [    3.093098] Code: 00 00 41 bd ff ff ff ff 31 d2 4c 89 f6 4c 89
>     ff e8
>     >     f2 ef ff ff 41 89 c4 85 c0 75 09 83 e3 01 0f 85 54 ff ff ff 41 8b 4f
>     20 66
>     >     <90> f7 c1 00 00 10 00 0f 84 23 01 00 00 48 c7 c3 40 cc 01 00 65 48
>     >     > [    3.093098] RSP: 0018:ffffaf1600013e00 EFLAGS: 00000046
>     >     > [    3.093098] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
>     >     0000000000280003
>     >     > [    3.093098] RDX: 0000000000000000 RSI: ffff9aa90fd9dec0 RDI:
>     >     ffffffff8441e4b8
>     >     > [    3.093098] RBP: 00000000fffc200d R08: ffffffff8441e4a0 R09:
>     >     ffffffff8441e4b8
>     >     > [    3.093098] R10: 0000000000000001 R11: 000000000002e990 R12:
>     >     0000000000000000
>     >     > [    3.093098] R13: 00000000ffffffff R14: ffff9aa90fd9dec0 R15:
>     >     ffffffff8441e4b8
>     >     > [    3.093098] FS:  0000000000000000(0000) GS:ffff9aa90fd80000
>     (0000)
>     >     knlGS:0000000000000000
>     >     > [    3.093098] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>     >     > [    3.093098] CR2: 0000000000000000 CR3: 000000004e02e000 CR4:
>     >     0000000000750ee0
>     >     > [    3.093098] PKRU: 55555554
>     >     > [    3.093098] Kernel panic - not syncing: Fatal exception in
>     interrupt
>     >     >
>     >     > There are many variations, but the common pattern seems to be
>     >     > <something in the clock or timer code> -> int3 exception
>     >     >
>     >     > It only happens under qemu TCG (software emulation).
>     >     >
>     >     > It goes away if we recompile qemu without MTTCG support.
>     >     >
>     >     > It only happens with -smp enabled, we are using qemu -smp 4
>     >     >
>     >     > We are using qemu-system-x86_64 full system emulation on x86_64
>     host
>     >     > (ie. forcing KVM off).
>     >     >
>     >     > It happens with the latest upstream kernel and qemu, compiled from
>     >     > source.
>     >
>     >     I got a bit further on this one.
>     >
>     >     The bug happens in the code that updates the static branch used for
>     at
>     >     least these two keys:
>     >
>     >       static DEFINE_STATIC_KEY_FALSE(__sched_clock_stable);
>     >       DEFINE_STATIC_KEY_FALSE(timers_migration_enabled);
>     >
>     >     There are probably others (it seems a generic problem with how static
>     >     branches are handled by TCG), but I only see the bug for those two.
>     >
>     >     When the static branch is updated we end up calling
>     >     arch/x86/kernel/alternative.c:text_poke_bp_batch().  It's best to
>     read
>     >     the description of that function to see where int3 is used:
>     >
>     >       https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
>     tree/
>     >     arch/x86/kernel/alternative.c#n2086
>     >
>     >     I modified the qemu TCG int3 helper so it dumps the code at %rip when
>     >     the interrupt fires, and I can actually see the changes in the above
>     >     function happen, first int3 being written, then the end of the nop,
>     >     then the int3 being overwritten with the first byte of the nop.
>     >
>     >     Unfortunately the int3 still fires after the code has been completely
>     >     rewritten to its final (ie nop) value.
>     >
>     >     This seems to indicate to me that neither the self-write to the
>     kernel
>     >     text segment, nor sync_core (implemented by a "iret to self" trick)
>     >     invalidates the qemu TCG translation block containing the old int3
>     >     helper call.  Thus we (qemu) never "see" the new nop, we keep
>     >     emulating int3, and that causes the kernel to crash.
>     >
>     >     I added print statements inside tb_invalidate_phys_page() and this
>     >     function seems never to be called at all.  It's my understanding that
>     >     at least the kernel writing to its text segment ought to cause
>     >     tb_invalidate_phys_page() to be called, but I'm not super-familiar
>     >     with this qemu code.
>     >
>     >     Richard Henderson - do you have any suggestions?
>     >
>     >     Rich.
>     >
>     >     --
>     >     Richard Jones, Virtualization Group, Red Hat http://people.redhat.com
>     /
>     >     ~rjones
>     >     Read my programming and virtualization blog: http://
>     rwmj.wordpress.com
>     >     libguestfs lets you edit virtual machines.  Supports shell scripting,
>     >     bindings from many languages.  http://libguestfs.org
>     >
>     >
> 
>     --
>     Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/
>     ~rjones
>     Read my programming and virtualization blog: http://rwmj.wordpress.com
>     virt-top is 'top' for virtual machines.  Tiny program with many
>     powerful monitoring features, net stats, disk stats, logging, etc.
>     http://people.redhat.com/~rjones/virt-top
> 
> 

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html


  parent reply	other threads:[~2023-07-05 16:41 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CA+G9fYsETJQm0Ue7hGsb+nbsiMikwycOV3V0DPr6WC2r61KRBQ@mail.gmail.com>
2023-06-21 15:31 ` qemu-x86_64 booting with 8.0.0 stil see int3: when running LTP tracing testing Arnd Bergmann
2023-06-21 16:06   ` Peter Zijlstra
2023-07-04  7:46     ` Richard W.M. Jones
2023-07-04 13:21       ` Richard W.M. Jones
2023-07-05 16:28       ` Richard W.M. Jones
     [not found]         ` <CAFXwXrmbpuFNf5=nQxiTteo8fpCdAbK4pEAN176Cq0yvwZcfFw@mail.gmail.com>
2023-07-05 16:35           ` Richard W.M. Jones
     [not found]             ` <CAFXwXrk1FEZPUO-zqNVJZ6YCHKUkgNehwmyDYuOr5fx8ff0OCA@mail.gmail.com>
2023-07-05 16:40               ` Richard W.M. Jones [this message]
2023-07-06  6:13                 ` Richard Henderson
2023-07-05 16:37           ` Richard W.M. Jones
2023-07-05 21:50         ` Richard W.M. Jones
2023-07-06  6:30           ` Richard Henderson
2023-07-06 10:28             ` Richard W.M. Jones
2023-08-08  5:27               ` John Stultz
2023-08-08  7:28                 ` Richard W.M. Jones
2025-10-25 19:48                   ` Gregory Price

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230705164006.GL7636@redhat.com \
    --to=rjones@redhat.com \
    --cc=anders.roxell@linaro.org \
    --cc=arnd@arndb.de \
    --cc=ben.copeland@linaro.org \
    --cc=daniel.diaz@linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=naresh.kamboju@linaro.org \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=richard.henderson@linaro.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox