All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Borislav Petkov <bp@alien8.de>,
	 Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>, x86-ml <x86@kernel.org>,
	 lkml <linux-kernel@vger.kernel.org>
Subject: Re: [GIT PULL] locking/urgent for v6.17-rc1
Date: Fri, 22 Aug 2025 17:28:02 -0700	[thread overview]
Message-ID: <aKkLEtoDXKxAAWju@google.com> (raw)
In-Reply-To: <20250822141654.Sjoffo8F@linutronix.de>

On Fri, Aug 22, 2025, Sebastian Andrzej Siewior wrote:
> On 2025-08-21 12:45:52 [-0700], Sean Christopherson wrote:
> > Piggybacking the futex private hashing attention, the new fanciness is causing
> > crashes in my testing.  The crashes are 100% reproducible, but my reproducer is
> > simply running a variety of tests in parallel, i.e. isn't very debug-friendly,
> > and the code itself is black magic to me, so all I've done is bisect.
> > 
> > I reported the issue on the original thread, but haven't seen any follow-up.
> > 
> > https://lore.kernel.org/all/aJ_vEP2EHj6l0xRT@google.com
> 
> I somehow missed it. Can you try rc2 with the patch I just sent? 

No dice, fails with the same signature.

I got a trimmed down reproduer.  Load KVM, run this in the background (in a loop)
to constantly trigger try_to_wake_up() on relevant tasks (needs to be run as root):

  echo Y > /sys/module/kvm/parameters/nx_huge_pages
  echo N > /sys/module/kvm/parameters/nx_huge_pages
  sleep .2

and then run the hardware_disable_test KVM selftest (from
tools/testing/selftests/kvm/hardware_disable_test.c).

Strace on hardware_disable_test spewed a whole pile of these

  wait4(32861, 0x7ffc66475dec, WNOHANG, NULL) = 0
  futex(0x7fb735c43000, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)

immediately before the crash.  I assume it corresponds to this:

		/* Child is still running, keep waiting. */
		if (pid != waitpid(pid, &status, WNOHANG))
			continue;

I also got a new splat on the "WARN_ON_ONCE(ret < 0);" at the end of __futex_ref_atomic_end().
This happened during boot; AFAICT our userspace was setting up cgroups.  In this
case, the system hung and I had to reboot.

  ------------[ cut here ]------------
  WARNING: CPU: 45 PID: 0 at kernel/futex/core.c:1604 futex_ref_rcu+0xbf/0xf0
  Modules linked in: vfat fat i2c_mux_pca954x i2c_mux spidev cdc_acm xhci_pci xhci_hcd gq(O) sha3_generic
  CPU: 45 UID: 0 PID: 0 Comm: swapper/45 Tainted: G S         O        6.17.0-smp--1278d576b27d-futex #886 NONE 
  Tainted: [S]=CPU_OUT_OF_SPEC, [O]=OOT_MODULE
  Hardware name: Google LLC Indus/Indus_QC_03, BIOS 30.110.0 09/13/2024
  RIP: 0010:futex_ref_rcu+0xbf/0xf0
  Code: c7 04 0a 00 00 00 00 48 ff c0 eb c2 65 ff 01 89 e8 4c 01 f0 48 ff c0 48 89 c1 f0 48 0f c1 8b 48 01 00 00 48 01 c1 74 06 79 0c <0f> 0b eb 08 48 89 df e8 55 0a f9 ff 48 89 df 5b 41 5e 5d e9 f9 5c
  RSP: 0018:ffffa43c8d440ec8 EFLAGS: 00010286
  RAX: 8000000000000000 RBX: ffff933782245080 RCX: ffffffffffffffff
  RDX: 0000000000000060 RSI: 0000000000000060 RDI: ffffffffac840520
  RBP: 0000000000000000 R08: ffff933680044d00 R09: ffffffff00000000
  R10: ffff9365c9b59e00 R11: ffff9365c9b59e00 R12: ffffffffab77ac10
  R13: ffff9337822451b8 R14: 7fffffffffffffff R15: ffff9365c749de00
  FS:  0000000000000000(0000) GS:ffff9395514f2000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fad8a21cf38 CR3: 00000055c062b002 CR4: 00000000007706f0
  PKRU: 55555554
  Call Trace:
   <IRQ>
   rcu_do_batch+0x250/0x7e0
   rcu_core+0x12f/0x230
   handle_softirqs+0xc8/0x280
   __irq_exit_rcu+0x48/0x100
   sysvec_apic_timer_interrupt+0x74/0x80
   </IRQ>
   <TASK>
   asm_sysvec_apic_timer_interrupt+0x1a/0x20
  RIP: 0010:cpuidle_enter_state+0xfb/0x290
  Code: bb f6 ff ff 49 89 c4 8b 73 04 bf ff ff ff ff e8 9b 68 d8 ff 31 ff e8 f4 32 48 ff 80 7c 24 04 00 74 05 e8 c8 68 d8 ff fb 85 ed <0f> 88 ba 00 00 00 89 e9 48 6b f9 68 4c 8b 44 24 08 49 8b 54 38 30
  RSP: 0018:ffffa43c803d3e80 EFLAGS: 00000206
  RAX: ffff9395514f2000 RBX: ffff9394ff776548 RCX: 000000000000001f
  RDX: 000000000018ec50 RSI: 000000000000002d RDI: 0000000000000000
  RBP: 0000000000000003 R08: 0000000000000002 R09: 0000000000000002
  R10: 00000000000003dc R11: 0000000000000389 R12: 00000010fb32644d
  R13: 00000010fb2333f7 R14: ffffffffad276f68 R15: 0000000000000003
   cpuidle_enter+0x2c/0x40
   do_idle+0x1ac/0x250
   cpu_startup_entry+0x2a/0x30
   start_secondary+0x80/0x80
   common_startup_64+0x13e/0x140
   </TASK>
  ---[ end trace 0000000000000000 ]---

Heh, and two more when booting a different system.  Guess it's my lucky day.
This time whatever went sideways didn't appear to be fatal as the system booted
and I could ssh in.  One is the same WARN as above, and the second WARN on the
system hit the

  WARN_ON_ONCE(atomic_long_read(&mm->futex_atomic) != 0);

in futex_hash_allocate().

  ------------[ cut here ]------------
  WARNING: CPU: 120 PID: 11779 at kernel/futex/core.c:1553 futex_hash_allocate+0x436/0x450
  Modules linked in: vfat fat ccp k10temp i2c_piix4 cdc_acm xhci_pci xhci_hcd gq(O) sha3_generic
  CPU: 120 UID: 0 PID: 11779 Comm: borglet Tainted: G     U  W  O        6.17.0-smp--1278d576b27d-futex #886 NONE 
  Tainted: [U]=USER, [W]=WARN, [O]=OOT_MODULE
  Hardware name: Google, Inc.                                                       Arcadia_IT_80/Arcadia_IT_80, BIOS 34.64.2-0 12/26/2024
  RIP: 0010:futex_hash_allocate+0x436/0x450
  Code: 31 ff 65 48 8b 05 ba bc ae 02 48 3b 44 24 48 75 20 44 89 f8 48 83 c4 50 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 0b e9 9d fe ff ff <0f> 0b e9 c0 fe ff ff e8 ce 99 af 00 66 66 66 66 66 2e 0f 1f 84 00
  RSP: 0018:ffffbbc0f1237d10 EFLAGS: 00010286
  RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffffa25747532180
  RDX: 0000000000000400 RSI: 000000000000ffc0 RDI: 00000000000039b8
  RBP: ffffa296a2620000 R08: 00000000004029c0 R09: 00000000ffffffff
  R10: 00000000ffffffff R11: 0000000000010040 R12: ffffa2571336b700
  R13: ffffa2571336b600 R14: ffffa2571336b600 R15: ffffa296b9270000
  FS:  00007f6bbd3297c0(0000) GS:ffffa2d5a31b2000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f6bae810f38 CR3: 00000001330a4001 CR4: 0000000000770ef0
  PKRU: 55555554
  Call Trace:
   <TASK>
   ? cgroup_can_fork+0x258/0x420
   copy_process+0xae3/0xff0
   kernel_clone+0x99/0x320
   __x64_sys_clone+0xc8/0xf0
   do_syscall_64+0x6f/0x1f0
   ? arch_exit_to_user_mode_prepare+0x9/0x50
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
  RIP: 0033:0x7f6bbd466051
  Code: 48 85 ff 74 3d 48 85 f6 74 38 48 83 ee 10 48 89 4e 08 48 89 3e 48 89 d7 4c 89 c2 4d 89 c8 4c 8b 54 24 08 b8 38 00 00 00 0f 05 <48> 85 c0 7c 13 74 01 c3 31 ed 58 5f ff d0 48 89 c7 b8 3c 00 00 00
  RSP: 002b:00007fffff2eda98 EFLAGS: 00000206 ORIG_RAX: 0000000000000038
  RAX: ffffffffffffffda RBX: 00007f6bae812700 RCX: 00007f6bbd466051
  RDX: 00007f6bae8129d0 RSI: 00007f6bae810f30 RDI: 00000000003d0f00
  RBP: 00007fffff2edad0 R08: 00007f6bae812700 R09: 00007f6bae812700
  R10: 00007f6bae8129d0 R11: 0000000000000206 R12: 00007f6bae8129d0
  R13: 00007fffff2edb66 R14: 00007fffff2edbd0 R15: 00007f6bae810f40
   </TASK>
  ---[ end trace 0000000000000000 ]---

  reply	other threads:[~2025-08-23  0:28 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-09 18:02 [GIT PULL] locking/urgent for v6.17-rc1 Borislav Petkov
2025-08-10  5:58 ` pr-tracker-bot
2025-08-21 18:19 ` Linus Torvalds
2025-08-21 19:45   ` Sean Christopherson
2025-08-22 14:16     ` Sebastian Andrzej Siewior
2025-08-23  0:28       ` Sean Christopherson [this message]
2025-08-25 16:04         ` Sebastian Andrzej Siewior
2025-08-25 23:55           ` Sean Christopherson
2025-08-22 10:57   ` Sebastian Andrzej Siewior
2025-08-22 14:12     ` [PATCH] futex: Move futex_hash_free() back to __mmput() Sebastian Andrzej Siewior
2025-08-27 11:34       ` [tip: locking/urgent] " tip-bot2 for Sebastian Andrzej Siewior
2025-08-31 12:23       ` tip-bot2 for Sebastian Andrzej Siewior

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aKkLEtoDXKxAAWju@google.com \
    --to=seanjc@google.com \
    --cc=bigeasy@linutronix.de \
    --cc=bp@alien8.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.