linux-kernel.vger.kernel.org archive mirror
From: Calvin Owens <calvin@wbinvd.org>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-kernel@vger.kernel.org, "Lai, Yi" <yi1.lai@linux.intel.com>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	x86@kernel.org
Subject: Re: [tip: locking/urgent] futex: Allow to resize the private local hash
Date: Fri, 20 Jun 2025 18:02:28 -0700
Message-ID: <aFYEpPIwhlL1WvR0@mozart.vkv.me>
In-Reply-To: <aFWuwdJUEUD8VcTJ@mozart.vkv.me>

On Friday 06/20 at 11:56 -0700, Calvin Owens wrote:
> On Friday 06/20 at 12:31 +0200, Sebastian Andrzej Siewior wrote:
> > On 2025-06-19 14:07:30 [-0700], Calvin Owens wrote:
> > > > Machine #2 oopsed with the GCC kernel after just over an hour:
> > > > 
> > > >     BUG: unable to handle page fault for address: ffff88a91eac4458
> > > >     RIP: 0010:futex_hash+0x16/0x90
> > …
> > > >     Call Trace:
> > > >      <TASK>
> > > >      futex_wait_setup+0x51/0x1b0
> > …
> > 
> > The futex_hash_bucket pointer has an invalid ->priv pointer.
> > This could be use-after-free or double-free. I've been looking through
> > your config and you don't have CONFIG_SLAB_FREELIST_* set. I don't
> > remember which one, but one of the two has a "primitive" double-free
> > detection.
> > 
> > …
> > > I am not able to reproduce the oops at all with these options:
> > > 
> > >     * DEBUG_PAGEALLOC_ENABLE_DEFAULT
> > >     * SLUB_DEBUG_ON
> > 
> > SLUB_DEBUG_ON is something that would "reliably" notice double free.
> > If you drop SLUB_DEBUG_ON (but keep SLUB_DEBUG) then you can boot with
> > slab_debug=f keeping only the consistency checks. The "poison" checks
> > would be excluded for instance. That allocation is kvzalloc() but it
> > should be small on your machine to avoid vmalloc() and use only
> > kmalloc().
> 
> I'll try slab_debug=f next.

I just hit the oops with SLUB_DEBUG and slab_debug=f, but nothing new
was logged.

> > > I'm also experimenting with stress-ng as a reproducer, no luck so far.
> > 
> > Not sure what you are using there. I think cargo does:
> > - lock/unlock in threads
> > - create a new thread, which triggers auto-resize
> > - auto-resize gets delayed due to lock/unlock in other threads (the
> >   reference is held)
> 
> I've tried various combinations of --io, --fork, --exec, --futex, --cpu,
> --vm, and --forkheavy. It's not mixing the operations in threads as I
> understand it, so I guess it won't ever do anything like what you're
> describing no matter what stressors I run?
> 
> I did get this message once, something I haven't seen before:
> 
>     [33024.247423] [    T281] sched: DL replenish lagged too much
> 
> ...but maybe that's my fault for overloading it so much.
> 
> > And now something happens leading to what we see.
> > _Maybe_ the cargo application terminates/execs before the new struct is
> > assigned in an unexpected way.
> > The regular hash bucket has reference counting so it should raise
> > warnings if it goes wrong. I haven't seen those.
> > 
> > > A third machine with an older Skylake CPU died overnight, but nothing
> > > was logged over netconsole. Luckily it actually has a serial header on
> > > the motherboard, so that's wired up and it's running again, maybe it
> > > dies in a different way that might be a better clue...
> > 
> > So far I *think* that cargo does something that I don't expect and this
> > leads to a memory double-free. The SLUB_DEBUG_ON hopefully delays the
> > process long enough that the double free does not trigger.
> > 
> > I think I'm going to look for a random Rust package that is using cargo
> > for building (unless you have a recommendation) and look at what it is
> > doing. It was always cargo after all. Maybe this sheds some light.
> 
> The list of things in my big build that use cargo is pretty short:
> 
>     === Dependendency Snapshot ===
>     Dep    =mc:house:cargo-native.do_install
>     Package=mc:house:cargo-native.do_populate_sysroot
>     RDep   =mc:house:cargo-c-native.do_prepare_recipe_sysroot
>             mc:house:cargo-native.do_create_spdx
>             mc:house:cbindgen-native.do_prepare_recipe_sysroot
>             mc:house:librsvg-native.do_prepare_recipe_sysroot
>             mc:house:librsvg.do_prepare_recipe_sysroot
>             mc:house:libstd-rs.do_prepare_recipe_sysroot
>             mc:house:python3-maturin-native.do_prepare_recipe_sysroot
>             mc:house:python3-maturin-native.do_populate_sysroot
>             mc:house:python3-rpds-py.do_prepare_recipe_sysroot
>             mc:house:python3-setuptools-rust-native.do_prepare_recipe_sysroot
> 
> I've tried building each of those targets alone (and all of them
> together) in a loop, but that hasn't triggered anything. I guess that
> other concurrent builds are necessary to trigger whatever this is.
> 
> I tried using stress-ng --vm and --cpu together to "load up" the machine
> while running the isolated targets, but that hasn't worked either.
> 
> If you want to run *exactly* what I am, clone this unholy mess:
> 
>     https://github.com/jcalvinowens/meta-house
> 
> ...set up for Yocto and install kas as described here:
> 
>     https://docs.yoctoproject.org/ref-manual/system-requirements.html#ubuntu-and-debian
>     https://github.com/jcalvinowens/meta-house/blob/6f6a9c643169fc37ba809f7230261d0e5255b6d7/README.md#kas
> 
> ...and run (for the 32-thread machine):
> 
>     BB_NUMBER_THREADS="48" PARALLEL_MAKE="-j 36" kas build kas/walnascar.yaml -- -k
> 
> Fair warning, it needs a *lot* of RAM at the high concurrency, I have
> 96GB with 128GB of swap to spill into. It needs ~500GB of disk space if
> it runs to completion and downloads ~15GB of tarballs when it starts.
> 
> Annoyingly it won't work if the system compiler is gcc-15 right now (the
> version of glib it has won't build, haven't had a chance to fix it yet).
> 
> > > > > Thanks,
> > > > > Calvin
> > 
> > Sebastian
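
The sequence Sebastian lists above (lock/unlock in several threads, then
a new thread whose creation triggers the auto-resize while the others
still hold a reference) can be sketched in userspace roughly as follows.
This is only an illustration of that access pattern, not a confirmed
reproducer: run_pattern(), worker() and the iteration counts are made-up
names and values, and it assumes glibc's default process-private
mutexes, which only enter futex(2) when the lock is contended.

```c
#include <pthread.h>
#include <stdlib.h>

/* One contended mutex, so lock/unlock ends up in private futex ops. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long shared_counter;

/* Take and release the contended lock *arg times. */
static void *worker(void *arg)
{
	int iters = *(int *)arg;

	for (int i = 0; i < iters; i++) {
		pthread_mutex_lock(&lock);
		shared_counter++;
		pthread_mutex_unlock(&lock);
	}
	return NULL;
}

/*
 * Hypothetical helper: start nr_lockers long-running threads, then keep
 * creating nr_spawns shorter-lived threads while the lockers are still
 * hammering the mutex. Returns the final counter, or -1 on error.
 */
long run_pattern(int nr_lockers, int nr_spawns, int iters)
{
	int total = nr_lockers + nr_spawns;
	int spawn_iters = iters / 10 + 1;	/* arbitrary shorter lifetime */
	pthread_t *tids = calloc(total, sizeof(*tids));
	int started = 0;
	long ret;

	if (!tids)
		return -1;

	shared_counter = 0;
	for (; started < total; started++) {
		/* Both arguments outlive the threads: we join before returning. */
		int *arg = started < nr_lockers ? &iters : &spawn_iters;

		if (pthread_create(&tids[started], NULL, worker, arg))
			break;
	}
	for (int i = 0; i < started; i++)
		pthread_join(tids[i], NULL);

	ret = started == total ? shared_counter : -1;
	free(tids);
	return ret;
}
```

Calling something like run_pattern(4, 64, 100000) in a loop from main()
(built with -pthread) keeps thread creation overlapping with contended
lock/unlock. Whether that is enough to exercise the resize path the way
cargo does is untested.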
