All of lore.kernel.org
 help / color / mirror / Atom feed
From: Boqun Feng <boqun.feng@gmail.com>
To: Waiman Long <llong@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Breno Leitao <leitao@debian.org>, Ingo Molnar <mingo@redhat.com>,
	Will Deacon <will@kernel.org>,
	aeh@meta.com, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, jhs@mojatatu.com, kernel-team@meta.com,
	Erik Lundgren <elundgren@meta.com>,
	"Paul E. McKenney" <paulmck@kernel.org>
Subject: Re: [PATCH] lockdep: Speed up lockdep_unregister_key() with expedited RCU synchronization
Date: Tue, 25 Mar 2025 11:45:10 -0700	[thread overview]
Message-ID: <Z-L5ttC9qllTAEbO@boqun-archlinux> (raw)
In-Reply-To: <934d794b-7ebc-422c-b4fe-3e658a2e5e7a@redhat.com>

On Tue, Mar 25, 2025 at 10:52:16AM -0400, Waiman Long wrote:
> On 3/24/25 11:41 PM, Boqun Feng wrote:
> > On Mon, Mar 24, 2025 at 09:56:25PM -0400, Waiman Long wrote:
> > > On 3/24/25 8:47 PM, Boqun Feng wrote:
> > > > On Mon, Mar 24, 2025 at 12:30:10PM -0700, Boqun Feng wrote:
> > > > > On Mon, Mar 24, 2025 at 12:21:07PM -0700, Boqun Feng wrote:
> > > > > > On Mon, Mar 24, 2025 at 01:23:50PM +0100, Eric Dumazet wrote:
> > > > > > [...]
> > > > > > > > > ---
> > > > > > > > >    kernel/locking/lockdep.c | 6 ++++--
> > > > > > > > >    1 file changed, 4 insertions(+), 2 deletions(-)
> > > > > > > > > 
> > > > > > > > > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> > > > > > > > > index 4470680f02269..a79030ac36dd4 100644
> > > > > > > > > --- a/kernel/locking/lockdep.c
> > > > > > > > > +++ b/kernel/locking/lockdep.c
> > > > > > > > > @@ -6595,8 +6595,10 @@ void lockdep_unregister_key(struct lock_class_key *key)
> > > > > > > > >         if (need_callback)
> > > > > > > > >                 call_rcu(&delayed_free.rcu_head, free_zapped_rcu);
> > > > > > > > > 
> > > > > > > > > -     /* Wait until is_dynamic_key() has finished accessing k->hash_entry. */
> > > > > > > > > -     synchronize_rcu();
> > > > > > I feel a bit confusing even for the old comment, normally I would expect
> > > > > > the caller of lockdep_unregister_key() should guarantee the key has been
> > > > > > unpublished, in other words, there is no way a lockdep_unregister_key()
> > > > > > could race with a register_lock_class()/lockdep_init_map_type(). The
> > > > > > synchronize_rcu() is not needed then.
> > > > > > 
> > > > > > Let's say someone breaks my assumption above, then when doing a
> > > > > > register_lock_class() with a key about to be unregister, I cannot see
> > > > > > anything stops the following:
> > > > > > 
> > > > > > 	CPU 0				CPU 1
> > > > > > 	=====				=====
> > > > > > 	register_lock_class():
> > > > > > 	  ...
> > > > > > 	  } else if (... && !is_dynamic_key(lock->key)) {
> > > > > > 	  	// ->key is not unregistered yet, so this branch is not
> > > > > > 		// taken.
> > > > > > 	  	return NULL;
> > > > > > 	  }
> > > > > > 	  				lockdep_unregister_key(..);
> > > > > > 					// key unregister, can be free
> > > > > > 					// any time.
> > > > > > 	  key = lock->key->subkeys + subclass; // BOOM! UAF.
> > This is not a UAF :(
> > 
> > > > > > So either we don't need the synchronize_rcu() here or the
> > > > > > synchronize_rcu() doesn't help at all. Am I missing something subtle
> > > > > > here?
> > > > > > 
> > > > > Oh! Maybe I was missing register_lock_class() must be called with irq
> > > > > disabled, which is also an RCU read-side critical section.
> > > > > 
> > > > Since register_lock_class() will be call with irq disabled, maybe hazard
> > > > pointers [1] is better because most of the case we only have nr_cpus
> > > > readers, so the potential hazard pointer slots are fixed.
> > > > 
> > > > So the below patch can reduce the time of the tc command from real ~1.7
> > > > second (v6.14) to real ~0.05 second (v6.14 + patch) in my test env,
> > > > which is not surprising given it's a dedicated hazard pointers for
> > > > lock_class_key.
> > > > 
> > > > Thoughts?
> > > My understanding is that it is not a race between register_lock_class() and
> > > lockdep_unregister_key(). It is the fact that the structure that holds the
> > > lock_class_key may be freed immediately after return from
> > > lockdep_unregister_key(). So any processes that are in the process of
> > > iterating the hash_list containing the hash_entry to be unregistered may hit
> > You mean the lock_keys_hash table, right? I used register_lock_class()
> > as an example, because it's one of the places that iterates
> > lock_keys_hash. IIUC lock_keys_hash is only used in
> > lockdep_{un,}register_key() and is_dynamic_key() (which are only called
> > by lockdep_init_map_type() and register_lock_class()).
> > 
> > > a UAF problem. See commit 61cc4534b6550 ("locking/lockdep: Avoid potential
> > > access of invalid memory in lock_class") for a discussion of this kind of
> > > UAF problem.
> > > 
> > That commit seemed fixing a race between disabling lockdep and
> > unregistering key, and most importantly, call zap_class() for the
> > unregistered key even if lockdep is disabled (debug_locks = 0). It might
> > be related, but I'm not sure that's the reason of putting
> > synchronize_rcu() there. Say you want to synchronize between
> > /proc/lockdep and lockdep_unregister_key(), and you have
> > synchronize_rcu() in lockdep_unregister_key(), what's the RCU read-side
> > critical section at /proc/lockdep?
> I agree that the commit that I mentioned is not relevant to the current
> case. You are right that is_dynamic_key() is the only function that is
> problematic, the other two are protected by the lockdep_lock. So they are
> safe. Anyway, I believe that the actual race happens in the iteration of the
> hashed list in is_dynamic_key(). The key that you save in the
> lockdep_key_hazptr in your proposed patch should never be the key (dead_key)

The key stored in lockdep_key_hazptr is the one that the rest of the
function will use after is_dynamic_key() return true. That is,

	CPU 0				CPU 1
	=====				=====
	WRITE_ONCE(*lockdep_key_hazptr, key);
	smp_mb();

	is_dynamic_key():
	  hlist_for_each_entry_rcu(k, hash_head, hash_entry) {
	    if (k == key) {
	      found = true;
	      break;
	    }
	  }
	  				lockdep_unregister_key():
					  hlist_for_each_entry_rcu(k, hash_head, hash_entry) {
					    if (k == key) {
					      hlist_del_rcu(&k->hash_entry);
				              found = true;
				              break;
					    }
					  }

				        smp_mb();

					synchronize_lockdep_key_hazptr():
					  for_each_possible_cpu(cpu) {
					    <wait for the hazptr slot on
					    that CPU to be not equal to
					    the removed key>
					  }


, so that if is_dynamic_key() finds a key was in the hash, even though
later on the key would be removed by lockdep_unregister_key(), the
hazard pointers guarantee lockdep_unregister_key() would wait for the
hazard pointer to release.

> that is passed to lockdep_unregister_key(). In is_dynamic_key():
> 
>     hlist_for_each_entry_rcu(k, hash_head, hash_entry) {
>                 if (k == key) {
>                         found = true;
>                         break;
>                 }
>         }
> 
> key != k (dead_key), but before accessing its content to get to hash_entry,

It is the dead_key.

> an interrupt/NMI can happen. In the mean time, the structure holding the key
> is freed and its content can be overwritten with some garbage. When
> interrupt/NMI returns, hash_entry can point to anything leading to crash or
> an infinite loop.  Perhaps we can use some kind of synchronization mechanism

No, hash_entry cannot be freed or overwritten because the user has
protect the key with hazard pointers, only when the user reset the
hazard pointer to NULL, lockdep_unregister_key() can then return and the
key can be freed.

> between is_dynamic_key() and lockdep_unregister_key() to prevent this kind
> of racing. For example, we can have an atomic counter associated with each

The hazard pointer I proposed provides the exact synchronization ;-)

Regards,
Boqun

> head of the hashed table. is_dynamic_key() can increment the counter if it
> is not zero to proceed and lockdep_unregister_key() have to make sure it can
> safely decrement the counter to 0 before going ahead. Just a thought!
> 
> Cheers,
> Longman

  parent reply	other threads:[~2025-03-25 18:45 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-21  9:30 [PATCH] lockdep: Speed up lockdep_unregister_key() with expedited RCU synchronization Breno Leitao
2025-03-21 10:37 ` Eric Dumazet
2025-03-21 14:22   ` Breno Leitao
2025-03-24 12:12 ` Peter Zijlstra
2025-03-24 12:23   ` Eric Dumazet
2025-03-24 12:24     ` Eric Dumazet
2025-03-24 19:21     ` Boqun Feng
2025-03-24 19:30       ` Boqun Feng
2025-03-25  0:47         ` Boqun Feng
2025-03-25  1:56           ` Waiman Long
2025-03-25  3:41             ` Boqun Feng
     [not found]               ` <934d794b-7ebc-422c-b4fe-3e658a2e5e7a@redhat.com>
2025-03-25 14:57                 ` Waiman Long
2025-03-25 18:45                 ` Boqun Feng [this message]
2025-03-25 19:23                   ` Waiman Long
2025-03-25 19:42                     ` Boqun Feng
2025-03-25 23:20                       ` Waiman Long
2025-03-26  5:25                         ` Boqun Feng
     [not found]                           ` <df237702-55c3-466b-b51e-f3fe46ae03ba@redhat.com>
2025-03-26 16:40                             ` Waiman Long
2025-03-26 16:47                               ` Boqun Feng
2025-03-26 17:02                                 ` Waiman Long
2025-03-26 17:10                                   ` Paul E. McKenney
2025-03-26 18:42                                     ` Boqun Feng
2025-03-26 21:37                                       ` Paul E. McKenney
2025-03-31 16:48                                       ` Breno Leitao
2025-03-31 17:34                                         ` Boqun Feng
2025-03-31 17:26                             ` Boqun Feng
2025-03-31 17:33                               ` Waiman Long
2025-03-31 18:33                                 ` Paul E. McKenney
2025-03-31 18:57                                   ` Waiman Long
2025-03-31 21:21                                     ` Boqun Feng
2025-03-31 21:47                                       ` Waiman Long
2025-03-31 17:42                               ` Eric Dumazet
2025-07-09 10:00 ` Breno Leitao
2025-07-09 13:57   ` Waiman Long
2025-07-09 14:57     ` Boqun Feng
2025-07-19 17:40 ` [tip: locking/core] " tip-bot2 for Breno Leitao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z-L5ttC9qllTAEbO@boqun-archlinux \
    --to=boqun.feng@gmail.com \
    --cc=aeh@meta.com \
    --cc=edumazet@google.com \
    --cc=elundgren@meta.com \
    --cc=jhs@mojatatu.com \
    --cc=kernel-team@meta.com \
    --cc=leitao@debian.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=llong@redhat.com \
    --cc=mingo@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.