All of lore.kernel.org
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: Eric Dumazet <edumazet@google.com>,
	Anna-Maria Behnsen <anna-maria@linutronix.de>,
	Frederic Weisbecker <frederic@kernel.org>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
	Benjamin Segall <bsegall@google.com>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Eric Dumazet <edumazet@google.com>,
	Andrey Vagin <avagin@openvz.org>,
	Pavel Tikhomirov <ptikhomirov@virtuozzo.com>,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH 2/2] posix-timers: Use RCU in posix_timer_add()
Date: Mon, 17 Feb 2025 20:24:21 +0100	[thread overview]
Message-ID: <87o6z0jrx6.ffs@tglx> (raw)
In-Reply-To: <20250214135911.2037402-3-edumazet@google.com>

On Fri, Feb 14 2025 at 13:59, Eric Dumazet wrote:

> If many posix timers are hashed in posix_timers_hashtable,
> hash_lock can be held for long durations.
>
> This can be really bad in some cases as Thomas
> explained in https://lore.kernel.org/all/87ednpyyeo.ffs@tglx/

I really hate the horrible ABI which we can't get rid of w/o breaking
CRIU.

The global hash really needs to go away and be replaced by a per signal
xarray. That can be done, but due to CRIU there is no way to make this
non-sparse by reusing holes, which are created by deleted timers.

The sad truth is that the kernel has absolutely zero clue that this
happens in a CRIU restore operation context, unless I'm missing
something.

If it would be able to detect it, then we could work around it
somehow. But without that there is not much we can do aside of breaking
the ABI.

Though in the above thread the CRIU people already signaled that they
are willing to work out a migration scheme. I just forgot to revisit
this. Let me stare at it some more.

> We can perform all searches under RCU, then acquire
> the lock only when there is a good chance to need it,
> and after cpu caches were populated.
>
> I also added a cond_resched() in the possible long loop.

https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#changelog

> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  kernel/time/posix-timers.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
> index 204a351a2fd3..dd2f9016d3dc 100644
> --- a/kernel/time/posix-timers.c
> +++ b/kernel/time/posix-timers.c
> @@ -112,7 +112,19 @@ static int posix_timer_add(struct k_itimer *timer)
>  
>  		head = &posix_timers_hashtable[hash(sig, id)];
>  
> +		rcu_read_lock();
> +		if (__posix_timers_find(head, sig, id)) {
> +			rcu_read_unlock();
> +			cond_resched();
> +			continue;
> +		}
> +		rcu_read_unlock();
>  		spin_lock(&hash_lock);
> +		/*
> +		 * We must perform the lookup under hash_lock protection
> +		 * because another thread could have used the same id.

Hmm, that won't help and is broken already today as timer->id is set at
the call site after releasing hash_lock.

> +		 * This is very unlikely, but possible.

Only if the process is able to install INT_MAX - 1 timers and the stupid
search wraps around (INT_MAX loops) on the other thread and ends up at
the same number again. But yes, theoretically it's possible. :)

So the timer ID must be set _before_ adding it to the hash list, but
that wants to be a seperate patch.

> +		 */
>  		if (!__posix_timers_find(head, sig, id)) {
>  			hlist_add_head_rcu(&timer->t_hash, head);
>  			spin_unlock(&hash_lock);

Thanks,

        tglx

  parent reply	other threads:[~2025-02-17 19:24 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-14 13:59 [PATCH 0/2] posix-timers: Reduce spinlock contention Eric Dumazet
2025-02-14 13:59 ` [PATCH 1/2] posix-timers: Make next_posix_timer_id an atomic_t Eric Dumazet
2025-02-14 13:59 ` [PATCH 2/2] posix-timers: Use RCU in posix_timer_add() Eric Dumazet
2025-02-14 16:59   ` David Laight
2025-02-14 17:48     ` Eric Dumazet
2025-02-17 19:25       ` Thomas Gleixner
2025-02-17 19:24   ` Thomas Gleixner [this message]
2025-02-18 14:16     ` Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87o6z0jrx6.ffs@tglx \
    --to=tglx@linutronix.de \
    --cc=anna-maria@linutronix.de \
    --cc=avagin@openvz.org \
    --cc=bsegall@google.com \
    --cc=edumazet@google.com \
    --cc=eric.dumazet@gmail.com \
    --cc=frederic@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=ptikhomirov@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.