From: Peter Zijlstra <peterz@infradead.org>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-kernel@vger.kernel.org,
	"André Almeida" <andrealmeid@igalia.com>,
	"Darren Hart" <dvhart@infradead.org>,
	"Davidlohr Bueso" <dave@stgolabs.net>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Juri Lelli" <juri.lelli@redhat.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Valentin Schneider" <vschneid@redhat.com>,
	"Waiman Long" <longman@redhat.com>
Subject: Re: [PATCH v9 00/11] futex: Add support task local hash maps.
Date: Mon, 10 Mar 2025 16:57:41 +0100
Message-ID: <20250310155741.GF19344@noisy.programming.kicks-ass.net>
In-Reply-To: <20250304145837.a8g07J-k@linutronix.de>

On Tue, Mar 04, 2025 at 03:58:37PM +0100, Sebastian Andrzej Siewior wrote:
> On 2025-03-03 17:40:16 [+0100], To Peter Zijlstra wrote:
> …
> > You avoided the two states by dropping refcount only there is no !new
> > pointer. That should work.
> …
> > My first few tests succeeded. And I have a few RCU annotations, which I
> > post once I complete them and finish my requeue-pi tests.
> 
> get_futex_key() has this:
> |…
> |         if (!fshared) {
> |…
> |                 if (IS_ENABLED(CONFIG_MMU))
> |                         key->private.mm = mm;
> |                 else
> |                         key->private.mm = NULL;
> |
> |                 key->private.address = address;
> |
> 
> and now __futex_hash_private() has this:
> | {
> |         if (!futex_key_is_private(key))
> |                 return NULL;
> |
> |         if (!fph)
> |                 fph = rcu_dereference(key->private.mm->futex_phash);
> 
> Dereferencing mm won't work on !CONFIG_MMU. We could limit private hash
> to !CONFIG_BASE_SMALL && CONFIG_MMU.

Humph, yeah, not sure we should care about !MMU.

> Ignoring this, I managed to crash the box on top of 49fd6b8f5d59
> ("futex: Implement FUTEX2_MPOL"). I had one commit on top to make the
> prctl not blocking (make futex_hash_allocate(, false)). This is to simulate
> the fork resize. The backtrace:
> | [   T8658] BUG: unable to handle page fault for address: fffffffffffffff0
> | [   T8658] #PF: supervisor read access in kernel mode
> | [   T8658] #PF: error_code(0x0000) - not-present page
> | [   T8658] PGD 2c5a067 P4D 2c5a067 PUD 2c5c067 PMD 0
> | [   T8658] Oops: Oops: 0000 [#1] PREEMPT_RT SMP NOPTI
> | [   T8658] CPU: 6 UID: 1001 PID: 8658 Comm: thread-create-l Not tainted 6.14.0-rc4+ #188 676565269ee73396c27dead3a66b3f774bd9af57
> | [   T8658] Hardware name: Intel Corporation S2600CP/S2600CP, BIOS SE5C600.86B.02.03.0003.041920141333 04/19/2014
> | [   T8658] RIP: 0010:plist_check_list+0xb/0xa0
> | [   T8658] Code: cc cc 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 41 54 49 89 fc 55 53 48 83 ec 10 <48> 8b 1f 48 8b 43 08 48 39 c7  74 27 48 8b 4f 08 50 49 89 f8 48 89
> | [   T8658] RSP: 0018:ffffc90022e27c90 EFLAGS: 00010286
> | [   T8658] RAX: 0000000000000000 RBX: ffffc90022e27e00 RCX: 0000000000000000
> | [   T8658] RDX: ffff888558da02a8 RSI: ffff888558da02a8 RDI: fffffffffffffff0
> | [   T8658] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8885680dc980
> | [   T8658] R10: 0000031e8e1a7200 R11: ffff888574990028 R12: fffffffffffffff0
> | [   T8658] R13: ffff888558da02a8 R14: ffffc90022e27e48 R15: ffffc90022e27d38
> | [   T8658] FS:  00007f741af9e6c0(0000) GS:ffff8885a7c2b000(0000) knlGS:0000000000000000
> | [   T8658] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> | [   T8658] CR2: fffffffffffffff0 CR3: 00000006d7aca005 CR4: 00000000000626f0
> | [   T8658] Call Trace:
> | [   T8658]  <TASK>
> | [   T8658]  plist_del+0x28/0x100
> | [   T8658]  __futex_unqueue+0x29/0x40
> | [   T8658]  futex_unqueue_pi+0x1f/0x40
> | [   T8658]  futex_lock_pi+0x24d/0x420
> | [   T8658]  do_futex+0x57/0x190
> | [   T8658]  __x64_sys_futex+0xfe/0x1a0
> 
> It takes about 1h+ to reproduce. And only on one particular stubborn
> box. This originates from futex_unqueue_pi() after
> futex_q_lockptr_lock(). I have another crash within
> futex_q_lockptr_lock() (in spin_lock()).
> 
> This looks like the locking task was not enqueued in the hash bucket
> during the resize. This means there was a timeout and the unlocking task
> removed it while looking for the next owner. But the unlocking part
> acquired an additional reference to avoid a resize in that case. So,
> confused I am.

Yeah, weird that.

> I reverted to 50ca0ec83226 ("futex: Resize local futex hash table based
> on number of threads."), have another "always resize hack" and so
> far it looks good.
> Looking at __futex_pivot_hash() there is this:
> |         if (fph) {
> |                 if (rcuref_read(&fph->users) != 0) {
> |                         mm->futex_phash_new = new;
> |                         return false;
> |                 }
> |
> |                 futex_rehash_private(fph, new);
> |         }
> 
> So we stash the new pointer as long as rcuref_read() does not return 0.
> How stable is rcuref_read()'s 0 return actually? The code says:
> 
> | static inline unsigned int rcuref_read(rcuref_t *ref)
> | {
> |         unsigned int c = atomic_read(&ref->refcnt);
> |
> |         /* Return 0 if within the DEAD zone. */
> |         return c >= RCUREF_RELEASED ? 0 : c + 1;
> | }
> 
> so if it got negative on its final put, the c becomes -1 / 0xff…ff. This
> +1 will be 0 and we do a resize. But it is negative and did not reach
> RCUREF_DEAD yet so it can be bumped back to positive. It will not be
> deconstructed because the cmpxchg in rcuref_put_slowpath() fails. So it
> will remain active. But we do a resize here and end up with two private
> hashes. That is why I had the `released' member.

I am not quite sure I follow. If rcuref_put_slowpath() returns true,
then the value has been set to DEAD (high nibble E), any concurrent
inc/dec will move it away from that a little, but it will always be set
back to DEAD (IOW, you need 1<<29 concurrent modifications into the same
direction to push it out of the DEAD range).

As long as it is within those 29 bits of DEAD, rcuref_read() should
return 0.
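
To make the ranges concrete, here is a small userspace model of the
rcuref_read() logic quoted above. It is only a sketch: the RCUREF_*
values are what I recall from the kernel's rcuref header and should be
checked against the tree, and model_rcuref_read() / model_selfcheck()
are made-up names that operate on a plain integer rather than an
atomic_t.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, recalled from the kernel's rcuref header; verify them. */
#define RCUREF_ONEREF		0x00000000U
#define RCUREF_MAXREF		0x7FFFFFFFU
#define RCUREF_SATURATED	0xA0000000U
#define RCUREF_RELEASED		0xC0000000U
#define RCUREF_DEAD		0xE0000000U
#define RCUREF_NOREF		0xFFFFFFFFU

/* Hypothetical userspace stand-in for rcuref_read() on a raw counter value. */
static inline unsigned int model_rcuref_read(uint32_t c)
{
	/* Return 0 if within the DEAD zone. */
	return c >= RCUREF_RELEASED ? 0 : c + 1;
}

/* Hypothetical helper: walk the interesting boundary values. */
void model_selfcheck(void)
{
	/* Final put took the counter to -1 (NOREF): already reported as 0. */
	assert(model_rcuref_read(RCUREF_NOREF) == 0);

	/* DEAD itself, and DEAD pushed down by up to 1<<29, still read as 0. */
	assert(model_rcuref_read(RCUREF_DEAD) == 0);
	assert(model_rcuref_read(RCUREF_DEAD - (1U << 29)) == 0);

	/* One step below RELEASED leaves the zone and reads as non-zero. */
	assert(model_rcuref_read(RCUREF_RELEASED - 1) != 0);

	/* A live counter holding one reference (stored as 0) reads as 1. */
	assert(model_rcuref_read(RCUREF_ONEREF) == 1);
}
```

Note that in this model the -1 case already returns 0 because
0xffffffff >= RCUREF_RELEASED, not because of the +1 wrapping around.
The model says nothing about the race between the final put and a
concurrent get; it only shows which raw values read back as 0.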

