All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] posix-timers: Reduce spinlock contention
@ 2025-02-14 13:59 Eric Dumazet
  2025-02-14 13:59 ` [PATCH 1/2] posix-timers: Make next_posix_timer_id an atomic_t Eric Dumazet
  2025-02-14 13:59 ` [PATCH 2/2] posix-timers: Use RCU in posix_timer_add() Eric Dumazet
  0 siblings, 2 replies; 8+ messages in thread
From: Eric Dumazet @ 2025-02-14 13:59 UTC (permalink / raw)
  To: Anna-Maria Behnsen, Frederic Weisbecker, Thomas Gleixner
  Cc: linux-kernel, Benjamin Segall, Eric Dumazet, Eric Dumazet

We had incidents involving gazillions of concurrent posix timers.

Benjamin Segall is planning to add a cond_resched() in exit_itimers()

In this small series, I am reducing hash_lock contention in posix_timer_add()

Eric Dumazet (2):
  posix-timers: Make next_posix_timer_id an atomic_t
  posix-timers: Use RCU in posix_timer_add()

 include/linux/sched/signal.h |  2 +-
 kernel/time/posix-timers.c   | 21 +++++++++++++++++----
 2 files changed, 18 insertions(+), 5 deletions(-)

-- 
2.48.1.601.g30ceb7b040-goog


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/2] posix-timers: Make next_posix_timer_id an atomic_t
  2025-02-14 13:59 [PATCH 0/2] posix-timers: Reduce spinlock contention Eric Dumazet
@ 2025-02-14 13:59 ` Eric Dumazet
  2025-02-14 13:59 ` [PATCH 2/2] posix-timers: Use RCU in posix_timer_add() Eric Dumazet
  1 sibling, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2025-02-14 13:59 UTC (permalink / raw)
  To: Anna-Maria Behnsen, Frederic Weisbecker, Thomas Gleixner
  Cc: linux-kernel, Benjamin Segall, Eric Dumazet, Eric Dumazet

Instead of relying on a global and shared hash_lock
to protect sig->next_posix_timer_id, make it atomic.

This allows the following patch to use RCU.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/sched/signal.h | 2 +-
 kernel/time/posix-timers.c   | 9 +++++----
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index d5d03d919df8..72649d7fce2a 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -136,7 +136,7 @@ struct signal_struct {
 #ifdef CONFIG_POSIX_TIMERS
 
 	/* POSIX.1b Interval Timers */
-	unsigned int		next_posix_timer_id;
+	atomic_t		next_posix_timer_id;
 	struct hlist_head	posix_timers;
 	struct hlist_head	ignored_posix_timers;
 
diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index 1b675aee99a9..204a351a2fd3 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -105,13 +105,14 @@ static int posix_timer_add(struct k_itimer *timer)
 	 * a plan to handle the resulting CRIU regression gracefully.
 	 */
 	for (cnt = 0; cnt <= INT_MAX; cnt++) {
-		spin_lock(&hash_lock);
-		id = sig->next_posix_timer_id;
+		id = atomic_inc_return(&sig->next_posix_timer_id) - 1;
 
-		/* Write the next ID back. Clamp it to the positive space */
-		sig->next_posix_timer_id = (id + 1) & INT_MAX;
+		/* Clamp id to the positive space */
+		id &= INT_MAX;
 
 		head = &posix_timers_hashtable[hash(sig, id)];
+
+		spin_lock(&hash_lock);
 		if (!__posix_timers_find(head, sig, id)) {
 			hlist_add_head_rcu(&timer->t_hash, head);
 			spin_unlock(&hash_lock);
-- 
2.48.1.601.g30ceb7b040-goog


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/2] posix-timers: Use RCU in posix_timer_add()
  2025-02-14 13:59 [PATCH 0/2] posix-timers: Reduce spinlock contention Eric Dumazet
  2025-02-14 13:59 ` [PATCH 1/2] posix-timers: Make next_posix_timer_id an atomic_t Eric Dumazet
@ 2025-02-14 13:59 ` Eric Dumazet
  2025-02-14 16:59   ` David Laight
  2025-02-17 19:24   ` Thomas Gleixner
  1 sibling, 2 replies; 8+ messages in thread
From: Eric Dumazet @ 2025-02-14 13:59 UTC (permalink / raw)
  To: Anna-Maria Behnsen, Frederic Weisbecker, Thomas Gleixner
  Cc: linux-kernel, Benjamin Segall, Eric Dumazet, Eric Dumazet

If many posix timers are hashed in posix_timers_hashtable,
hash_lock can be held for long durations.

This can be really bad in some cases as Thomas
explained in https://lore.kernel.org/all/87ednpyyeo.ffs@tglx/

We can perform all searches under RCU, then acquire
the lock only when there is a good chance to need it,
and after cpu caches were populated.

I also added a cond_resched() in the possible long loop.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 kernel/time/posix-timers.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index 204a351a2fd3..dd2f9016d3dc 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -112,7 +112,19 @@ static int posix_timer_add(struct k_itimer *timer)
 
 		head = &posix_timers_hashtable[hash(sig, id)];
 
+		rcu_read_lock();
+		if (__posix_timers_find(head, sig, id)) {
+			rcu_read_unlock();
+			cond_resched();
+			continue;
+		}
+		rcu_read_unlock();
 		spin_lock(&hash_lock);
+		/*
+		 * We must perform the lookup under hash_lock protection
+		 * because another thread could have used the same id.
+		 * This is very unlikely, but possible.
+		 */
 		if (!__posix_timers_find(head, sig, id)) {
 			hlist_add_head_rcu(&timer->t_hash, head);
 			spin_unlock(&hash_lock);
-- 
2.48.1.601.g30ceb7b040-goog


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] posix-timers: Use RCU in posix_timer_add()
  2025-02-14 13:59 ` [PATCH 2/2] posix-timers: Use RCU in posix_timer_add() Eric Dumazet
@ 2025-02-14 16:59   ` David Laight
  2025-02-14 17:48     ` Eric Dumazet
  2025-02-17 19:24   ` Thomas Gleixner
  1 sibling, 1 reply; 8+ messages in thread
From: David Laight @ 2025-02-14 16:59 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Anna-Maria Behnsen, Frederic Weisbecker, Thomas Gleixner,
	linux-kernel, Benjamin Segall, Eric Dumazet

On Fri, 14 Feb 2025 13:59:11 +0000
Eric Dumazet <edumazet@google.com> wrote:

> If many posix timers are hashed in posix_timers_hashtable,
> hash_lock can be held for long durations.
> 
> This can be really bad in some cases as Thomas
> explained in https://lore.kernel.org/all/87ednpyyeo.ffs@tglx/
> 
> We can perform all searches under RCU, then acquire
> the lock only when there is a good chance to need it,
> and after cpu caches were populated.
> 
> I also added a cond_resched() in the possible long loop.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  kernel/time/posix-timers.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
> index 204a351a2fd3..dd2f9016d3dc 100644
> --- a/kernel/time/posix-timers.c
> +++ b/kernel/time/posix-timers.c
> @@ -112,7 +112,19 @@ static int posix_timer_add(struct k_itimer *timer)
>  
>  		head = &posix_timers_hashtable[hash(sig, id)];
>  
> +		rcu_read_lock();
> +		if (__posix_timers_find(head, sig, id)) {
> +			rcu_read_unlock();
> +			cond_resched();
> +			continue;
> +		}
> +		rcu_read_unlock();
>  		spin_lock(&hash_lock);
> +		/*
> +		 * We must perform the lookup under hash_lock protection
> +		 * because another thread could have used the same id.
> +		 * This is very unlikely, but possible.
> +		 */

If next_posix_timer_id is 64bit (so can't wrap) I think you can compare the
(unmasked by MAX_INT) value being used with the current value.
If the difference is small (well less than MAX_INT) I don't think you need
the rescan.
(Not going to help 32bit - but who cares :-)

	David

>  		if (!__posix_timers_find(head, sig, id)) {
>  			hlist_add_head_rcu(&timer->t_hash, head);
>  			spin_unlock(&hash_lock);


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] posix-timers: Use RCU in posix_timer_add()
  2025-02-14 16:59   ` David Laight
@ 2025-02-14 17:48     ` Eric Dumazet
  2025-02-17 19:25       ` Thomas Gleixner
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2025-02-14 17:48 UTC (permalink / raw)
  To: David Laight
  Cc: Anna-Maria Behnsen, Frederic Weisbecker, Thomas Gleixner,
	linux-kernel, Benjamin Segall, Eric Dumazet

On Fri, Feb 14, 2025 at 5:59 PM David Laight
<david.laight.linux@gmail.com> wrote:
>
> On Fri, 14 Feb 2025 13:59:11 +0000
> Eric Dumazet <edumazet@google.com> wrote:
>
> > If many posix timers are hashed in posix_timers_hashtable,
> > hash_lock can be held for long durations.
> >
> > This can be really bad in some cases as Thomas
> > explained in https://lore.kernel.org/all/87ednpyyeo.ffs@tglx/
> >
> > We can perform all searches under RCU, then acquire
> > the lock only when there is a good chance to need it,
> > and after cpu caches were populated.
> >
> > I also added a cond_resched() in the possible long loop.
> >
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > ---
> >  kernel/time/posix-timers.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> >
> > diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
> > index 204a351a2fd3..dd2f9016d3dc 100644
> > --- a/kernel/time/posix-timers.c
> > +++ b/kernel/time/posix-timers.c
> > @@ -112,7 +112,19 @@ static int posix_timer_add(struct k_itimer *timer)
> >
> >               head = &posix_timers_hashtable[hash(sig, id)];
> >
> > +             rcu_read_lock();
> > +             if (__posix_timers_find(head, sig, id)) {
> > +                     rcu_read_unlock();
> > +                     cond_resched();
> > +                     continue;
> > +             }
> > +             rcu_read_unlock();
> >               spin_lock(&hash_lock);
> > +             /*
> > +              * We must perform the lookup under hash_lock protection
> > +              * because another thread could have used the same id.
> > +              * This is very unlikely, but possible.
> > +              */
>
> If next_posix_timer_id is 64bit (so can't wrap) I think you can compare the
> (unmasked by MAX_INT) value being used with the current value.
> If the difference is small (well less than MAX_INT) I don't think you need
> the rescan.
> (Not going to help 32bit - but who cares :-)

I just noticed the rescan is racy anyway, because when the other threads add
a timer, the timer->it_signal and timer->it_id are temporarily zero.

There is a small race window.

We can set timer->it_id earlier [1], but not timer->it_signal

More work is needed :)

[1]

diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index dd2f9016d3dc..59ff75c81cff 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -126,6 +126,7 @@ static int posix_timer_add(struct k_itimer *timer)
                 * This is very unlikely, but possible.
                 */
                if (!__posix_timers_find(head, sig, id)) {
+                       timer->it_id = (timer_t)id;
                        hlist_add_head_rcu(&timer->t_hash, head);
                        spin_unlock(&hash_lock);
                        return id;
@@ -428,7 +429,6 @@ static int do_timer_create(clockid_t which_clock,
struct sigevent *event,
                return new_timer_id;
        }

-       new_timer->it_id = (timer_t) new_timer_id;
        new_timer->it_clock = which_clock;
        new_timer->kclock = kc;
        new_timer->it_overrun = -1LL;

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] posix-timers: Use RCU in posix_timer_add()
  2025-02-14 13:59 ` [PATCH 2/2] posix-timers: Use RCU in posix_timer_add() Eric Dumazet
  2025-02-14 16:59   ` David Laight
@ 2025-02-17 19:24   ` Thomas Gleixner
  2025-02-18 14:16     ` Thomas Gleixner
  1 sibling, 1 reply; 8+ messages in thread
From: Thomas Gleixner @ 2025-02-17 19:24 UTC (permalink / raw)
  To: Eric Dumazet, Anna-Maria Behnsen, Frederic Weisbecker
  Cc: linux-kernel, Benjamin Segall, Eric Dumazet, Eric Dumazet,
	Andrey Vagin, Pavel Tikhomirov, Peter Zijlstra

On Fri, Feb 14 2025 at 13:59, Eric Dumazet wrote:

> If many posix timers are hashed in posix_timers_hashtable,
> hash_lock can be held for long durations.
>
> This can be really bad in some cases as Thomas
> explained in https://lore.kernel.org/all/87ednpyyeo.ffs@tglx/

I really hate the horrible ABI which we can't get rid of w/o breaking
CRIU.

The global hash really needs to go away and be replaced by a per signal
xarray. That can be done, but due to CRIU there is no way to make this
non-sparse by reusing holes, which are created by deleted timers.

The sad truth is that the kernel has absolutely zero clue that this
happens in a CRIU restore operation context, unless I'm missing
something.

If it would be able to detect it, then we could work around it
somehow. But without that there is not much we can do aside of breaking
the ABI.

Though in the above thread the CRIU people already signaled that they
are willing to work out a migration scheme. I just forgot to revisit
this. Let me stare at it some more.

> We can perform all searches under RCU, then acquire
> the lock only when there is a good chance to need it,
> and after cpu caches were populated.
>
> I also added a cond_resched() in the possible long loop.

https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#changelog

> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  kernel/time/posix-timers.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
> index 204a351a2fd3..dd2f9016d3dc 100644
> --- a/kernel/time/posix-timers.c
> +++ b/kernel/time/posix-timers.c
> @@ -112,7 +112,19 @@ static int posix_timer_add(struct k_itimer *timer)
>  
>  		head = &posix_timers_hashtable[hash(sig, id)];
>  
> +		rcu_read_lock();
> +		if (__posix_timers_find(head, sig, id)) {
> +			rcu_read_unlock();
> +			cond_resched();
> +			continue;
> +		}
> +		rcu_read_unlock();
>  		spin_lock(&hash_lock);
> +		/*
> +		 * We must perform the lookup under hash_lock protection
> +		 * because another thread could have used the same id.

Hmm, that won't help and is broken already today as timer->id is set at
the call site after releasing hash_lock.

> +		 * This is very unlikely, but possible.

Only if the process is able to install INT_MAX - 1 timers and the stupid
search wraps around (INT_MAX loops) on the other thread and ends up at
the same number again. But yes, theoretically it's possible. :)

So the timer ID must be set _before_ adding it to the hash list, but
that wants to be a seperate patch.

> +		 */
>  		if (!__posix_timers_find(head, sig, id)) {
>  			hlist_add_head_rcu(&timer->t_hash, head);
>  			spin_unlock(&hash_lock);

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] posix-timers: Use RCU in posix_timer_add()
  2025-02-14 17:48     ` Eric Dumazet
@ 2025-02-17 19:25       ` Thomas Gleixner
  0 siblings, 0 replies; 8+ messages in thread
From: Thomas Gleixner @ 2025-02-17 19:25 UTC (permalink / raw)
  To: Eric Dumazet, David Laight
  Cc: Anna-Maria Behnsen, Frederic Weisbecker, linux-kernel,
	Benjamin Segall, Eric Dumazet

On Fri, Feb 14 2025 at 18:48, Eric Dumazet wrote:
> I just noticed the rescan is racy anyway, because when the other threads add
> a timer, the timer->it_signal and timer->it_id are temporarily zero.

Ah, you noticed too, but that has nothing to do with the rescan. That's
broken already today.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] posix-timers: Use RCU in posix_timer_add()
  2025-02-17 19:24   ` Thomas Gleixner
@ 2025-02-18 14:16     ` Thomas Gleixner
  0 siblings, 0 replies; 8+ messages in thread
From: Thomas Gleixner @ 2025-02-18 14:16 UTC (permalink / raw)
  To: Eric Dumazet, Anna-Maria Behnsen, Frederic Weisbecker
  Cc: linux-kernel, Benjamin Segall, Eric Dumazet, Eric Dumazet,
	Andrey Vagin, Pavel Tikhomirov, Peter Zijlstra

On Mon, Feb 17 2025 at 20:24, Thomas Gleixner wrote:
> On Fri, Feb 14 2025 at 13:59, Eric Dumazet wrote:
>> @@ -112,7 +112,19 @@ static int posix_timer_add(struct k_itimer *timer)
>>  
>>  		head = &posix_timers_hashtable[hash(sig, id)];
>>  
>> +		rcu_read_lock();
>> +		if (__posix_timers_find(head, sig, id)) {
>> +			rcu_read_unlock();
>> +			cond_resched();
>> +			continue;
>> +		}
>> +		rcu_read_unlock();
>>  		spin_lock(&hash_lock);
>> +		/*
>> +		 * We must perform the lookup under hash_lock protection
>> +		 * because another thread could have used the same id.
>
> Hmm, that won't help and is broken already today as timer->id is set at
> the call site after releasing hash_lock.
>
>> +		 * This is very unlikely, but possible.
>
> Only if the process is able to install INT_MAX - 1 timers and the stupid
> search wraps around (INT_MAX loops) on the other thread and ends up at
> the same number again. But yes, theoretically it's possible. :)
>
> So the timer ID must be set _before_ adding it to the hash list, but
> that wants to be a seperate patch.

It's even worse. __posix_timers_find() checks for both timer->it_id and
timer->it_signal, but the latter is only set when the timer is about to
go live. I have an idea, but that might be a bad one :)

Thanks,

        tglx



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-02-18 14:16 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-14 13:59 [PATCH 0/2] posix-timers: Reduce spinlock contention Eric Dumazet
2025-02-14 13:59 ` [PATCH 1/2] posix-timers: Make next_posix_timer_id an atomic_t Eric Dumazet
2025-02-14 13:59 ` [PATCH 2/2] posix-timers: Use RCU in posix_timer_add() Eric Dumazet
2025-02-14 16:59   ` David Laight
2025-02-14 17:48     ` Eric Dumazet
2025-02-17 19:25       ` Thomas Gleixner
2025-02-17 19:24   ` Thomas Gleixner
2025-02-18 14:16     ` Thomas Gleixner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.