public inbox for rcu@vger.kernel.org
From: Joel Fernandes <joelagnelf@nvidia.com>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "paulmck@kernel.org" <paulmck@kernel.org>,
	"rcu@vger.kernel.org" <rcu@vger.kernel.org>,
	"boqun.feng@gmail.com" <boqun.feng@gmail.com>,
	"mark.rutland@arm.com" <mark.rutland@arm.com>,
	"roman.gushchin@linux.dev" <roman.gushchin@linux.dev>
Subject: Re: Handling large numbers of hazard pointers
Date: Sun, 14 Dec 2025 07:14:42 +0000	[thread overview]
Message-ID: <207B1F17-D789-48DC-88E3-352ECF0DAF11@nvidia.com> (raw)
In-Reply-To: <961B6DB0-3071-41D0-9AB6-70B76C547F19@nvidia.com>



> On Dec 14, 2025, at 3:37 PM, Joel Fernandes <joelagnelf@nvidia.com> wrote:
> 
> 
> 
>>> On Dec 14, 2025, at 11:38 AM, Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>>> 
>>> On 2025-12-13 17:31, Paul E. McKenney wrote:
>>> Hello!
>>> I didn't do a good job of answering the "what about large numbers of
>>> hazard pointers" at Boqun's and my hazard-pointers talk at Linux Plumbers
>>> Conference yesterday, so please allow me to at least start on the path
>>> towards fixing that problem.
>>> Also, there were a couple of people participating whose email addresses
>>> I don't know, so please feel free to CC them.
>>> The trick is that in real workloads to date, although there might be
>>> millions of hazard pointers, there will typically only be a few active
>>> per CPU at a given time.  This of course suggests a per-CPU data structure
>>> tracking the active ones.  Allocating a hazard pointer grabs an unused one
>>> from this array, or, if all entries are in use, takes memory provided by
>>> the caller and links it into an overflow list.  Either way, it returns a
>>> pointer to the hazard pointer that is now visible to updaters.  When done,
>>> the caller calls a function that marks the array entry as unused or
>>> removes the element from the list, as the case may be.  Because hazard
>>> pointers can migrate among CPUs, full synchronization is required when
>>> operating on the array and the overflow list.
>>> And either way, the caller is responsible for allocating and freeing the
>>> backup hazard-pointer structure that will be used in case of overflow.
>>> And also either way, the updater need only deal with hazard pointers
>>> that are currently in use.
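
[For illustration, the array-plus-overflow scheme described above might look like the following userspace sketch. All names are hypothetical and synchronization is omitted; a real kernel implementation would use per-CPU data and the full synchronization Paul mentions.]

```c
#include <assert.h>
#include <stddef.h>

#define HP_SLOTS_PER_CPU 8

/* Caller-provided backup storage, linked in on overflow. */
struct hp_backup {
	void *ptr;
	struct hp_backup *next;
};

/* Per-CPU state: a small array of slots plus an overflow list. */
struct hp_cpu {
	void *slots[HP_SLOTS_PER_CPU];	/* NULL means unused */
	struct hp_backup *overflow;
};

/* Grab an unused array slot, or link in the caller's backup node. */
static void **hp_acquire(struct hp_cpu *cpu, void *obj,
			 struct hp_backup *backup)
{
	int i;

	for (i = 0; i < HP_SLOTS_PER_CPU; i++) {
		if (!cpu->slots[i]) {
			cpu->slots[i] = obj;
			return &cpu->slots[i];
		}
	}
	backup->ptr = obj;		/* all slots busy: overflow */
	backup->next = cpu->overflow;
	cpu->overflow = backup;
	return &backup->ptr;
}

/* Mark the array entry unused, or unlink the backup node.
 * (The range check is a sketch-level shortcut; a real version
 * would record which case applied rather than compare pointers.) */
static void hp_release(struct hp_cpu *cpu, void **hp)
{
	struct hp_backup **p;

	if (hp >= &cpu->slots[0] && hp < &cpu->slots[HP_SLOTS_PER_CPU]) {
		*hp = NULL;
		return;
	}
	for (p = &cpu->overflow; *p; p = &(*p)->next) {
		if (&(*p)->ptr == hp) {
			*p = (*p)->next;
			return;
		}
	}
}
```

[Either way, the returned pointer is what updaters would scan; release either NULLs the array entry or unlinks the node, matching the two cases above.]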
>> OK, so let me state how I see the fundamental problem you are trying
>> to address, and detail a possible solution.
>> 
>> * Problem Statement
>> 
>> Assuming we go for an array of hazard pointer slots per CPU to cover
>> the fast path (common case), we still need to handle the
>> overflow scenario, where more hazard pointers are accessed
>> concurrently for a given CPU than the array size, either due to
>> preemption, nested interrupts, or simply nested calls.
>> 
>> * Possible Solution
>> 
>> Requiring the HP caller to allocate backup space is clearly something
>> that would cover all scenarios. My worry is that tracking this backup
>> space allocation may be cumbersome for the user, especially if this
>> requires heap allocation.
>> 
>> Where the backup space can be allocated will likely depend on how long
>> the HP will be accessed. My working hypothesis here (let me know if
>> I'm wrong) is that most of those HP users will complete their access
>> within the same stack frame where the HP was acquired. This is the
>> primary use-case I would like to make sure is convenient.
>> 
>> For that use-case the users can simply allocate room on their
>> stack frame for the backup HP slot. The requirement here is that they
>> clear the HP slot before the end of the current stack frame.
>> If there is enough room in the per-CPU array, they use that, else
>> they add the backup slot from their stack into the backup slot
>> list. When they are done, if they used a backup slot, they need
>> to remove it from the list.
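
[The stack-frame discipline described above might look like this single-CPU sketch: hypothetical names, a deliberately tiny array so the overflow path actually triggers, and no synchronization.]

```c
#include <assert.h>
#include <stddef.h>

#define HP_SLOTS 1			/* tiny on purpose: forces overflow */

struct hp_backup {
	void *ptr;
	struct hp_backup *next;
};

static void *hp_slots[HP_SLOTS];
static struct hp_backup *hp_overflow;

static void **hp_acquire(void *obj, struct hp_backup *backup)
{
	if (!hp_slots[0]) {
		hp_slots[0] = obj;
		return &hp_slots[0];
	}
	backup->ptr = obj;		/* array full: use the stack slot */
	backup->next = hp_overflow;
	hp_overflow = backup;
	return &backup->ptr;
}

static void hp_release(void **hp)
{
	struct hp_backup **p;

	if (hp == &hp_slots[0]) {
		*hp = NULL;
		return;
	}
	for (p = &hp_overflow; *p; p = &(*p)->next) {
		if (&(*p)->ptr == hp) {
			*p = (*p)->next;
			return;
		}
	}
}

/* The pattern: the backup slot lives in this stack frame, and is
 * guaranteed cleared (unlinked) before the frame returns. */
static int read_protected(int *obj)
{
	struct hp_backup backup;	/* backup HP slot on the stack */
	void **hp = hp_acquire(obj, &backup);
	int v = *(int *)*hp;		/* access while protected */

	hp_release(hp);			/* must precede return */
	return v;
}
```

[The requirement that the slot be cleared before the end of the frame falls out naturally here: hp_release() runs before read_protected() returns, so the list never holds a dangling stack address.]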
>> 
>> There could still be room for more advanced use-cases where the
>> backup slots are allocated on the heap, but I suspect that it would
>> make the API trickier to use and should be reserved for use-cases
>> that really require it.
>> 
>> Thoughts?
> 
> This sounds fine to me.
> 
> We have had similar issues with kfree_rcu(): running out of preallocated memory there meant we just fell back to a slow path (synchronize), but for hazard pointers I am not sure what such a slow path would be, since this problem appears to be on the reader side.
> 
> Perhaps we can also preallocate overflow nodes in the API itself, and tap into those in case of overflow? The user would then not need to provide their own storage for overflow purposes, I think. And perhaps this preallocated pool of overflow nodes could be common to all hazptr users. Ideally the API user would not have to deal with overflow at all; it would transparently work behind the scenes.
> 
> I think we need not worry too much about the preallocated overflow nodes themselves running out, because that is no different from reserved memory needed in atomic context, which needs a minimum reserve anyway, right? And we have a bounded number of CPUs and a bounded number of contexts, so the number of *active* nodes required at any given time should also be bounded?
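
[A hedged sketch of that transparent-pool idea, with hypothetical names and no locking; a real version would size the pool from the bounded CPUs-times-contexts worst case and pop/push with atomics or a lock.]

```c
#include <assert.h>
#include <stddef.h>

#define HP_POOL_SIZE 64			/* >= worst-case CPUs * nesting depth */

struct hp_node {
	void *ptr;
	struct hp_node *next;
};

/* Shared, preallocated pool of overflow nodes, kept on a freelist,
 * common to all hazptr users. */
static struct hp_node hp_pool[HP_POOL_SIZE];
static struct hp_node *hp_free;

static void hp_pool_init(void)
{
	int i;

	hp_free = NULL;
	for (i = 0; i < HP_POOL_SIZE; i++) {
		hp_pool[i].next = hp_free;
		hp_free = &hp_pool[i];
	}
}

/* Transparent overflow: the caller never supplies storage. */
static struct hp_node *hp_pool_get(void)
{
	struct hp_node *n = hp_free;

	if (n)
		hp_free = n->next;
	return n;			/* NULL only if the bound was wrong */
}

static void hp_pool_put(struct hp_node *n)
{
	n->next = hp_free;
	hp_free = n;
}
```

[The acquire path would call hp_pool_get() only when the per-CPU array is full, so in the common case the pool is never touched at all.]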
> 
> Thoughts?
> 
> Thanks.

By the way, I wanted to emphasize my PoV that requiring user-provided storage seems to negate one of the big benefits of hazard pointers. Unless there is a solid use case for it, we should probably not require the user to provide separate storage, IMHO (or we should make allocation internal to the API, as I mentioned).

Thanks.

>> 
>> Thanks,
>> 
>> Mathieu
>> 
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> https://www.efficios.com
>> 

Thread overview: 12+ messages
2025-12-13 22:31 Handling large numbers of hazard pointers Paul E. McKenney
2025-12-14  2:38 ` Mathieu Desnoyers
2025-12-14  6:37   ` Joel Fernandes
2025-12-14  7:14     ` Joel Fernandes [this message]
2025-12-14 13:43       ` Mathieu Desnoyers
2025-12-14 20:49         ` Boqun Feng
2025-12-15 19:34           ` Mathieu Desnoyers
2025-12-15 21:11             ` Mathieu Desnoyers
2025-12-14 22:36         ` Joel Fernandes
2025-12-14 23:26           ` Joel Fernandes
2025-12-15 19:38             ` Mathieu Desnoyers
2025-12-17  7:54               ` Joel Fernandes
