From: Shrikanth Hegde <sshegde@linux.ibm.com>
To: Steve Wahl <steve.wahl@hpe.com>
Cc: Russ Anderson <rja@hpe.com>, Dimitri Sivanich <sivanich@hpe.com>,
	Kyle Meyer <kyle.meyer@hpe.com>,
	Anna-Maria Behnsen <anna-maria@linutronix.de>,
	Frederic Weisbecker <frederic@kernel.org>,
	Ingo Molnar <mingo@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] tick/sched: Limit non-timekeeper CPUs calling jiffies update
Date: Tue, 28 Oct 2025 20:52:37 +0530
Message-ID: <32315105-af88-4894-8d45-35f8700df534@linux.ibm.com>
In-Reply-To: <aQDSMxKDr85kPCJJ@swahl-home.5wahls.com>



On 10/28/25 7:54 PM, Steve Wahl wrote:
> On Tue, Oct 28, 2025 at 11:39:30AM +0530, Shrikanth Hegde wrote:
>>
>>
>> On 10/28/25 12:04 AM, Steve Wahl wrote:
>>> On large NUMA systems, while running a test program that saturates the
>>> inter-processor and inter-NUMA links, acquiring the jiffies_lock can
>>> be very expensive.  If the cpu designated to do jiffies updates
>>> (tick_do_timer_cpu) gets delayed and other cpus decide to do the
>>> jiffies update themselves, a large number of them decide to do so at
>>> the same time.  The inexpensive check against tick_next_period is far
>>> quicker than actually acquiring the lock, so most of these get in line
>>> to obtain the lock.  If obtaining the lock is slow enough, this
>>> spirals into the vast majority of CPUs continuously being stuck
>>> waiting for this lock, just to obtain it and find out that time has
>>> already been updated by another cpu. For example, on one random entry
>>> to kdb by manually-injected NMI, I saw 2912 of 3840 cpus stuck here.
>>>
>>> To avoid this, allow only one non-timekeeper CPU to call
>>> tick_do_update_jiffies64() at any given time, resetting
>>> ts->stalled_jiffies only if the jiffies update function is actually
>>> called.
>>>
>>> With this change, manually interrupting the test I find at most two
>>> CPUs in the tick_do_update_jiffies64 function (the timekeeper and one
>>> other).
>>>
>>> Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
>>> ---
>>>
>>> v2: Rewritten to use an atomic to gate non-timekeeping cpus calling the
>>>       jiffies update, as suggested by tglx. Title of patch has changed
>>>       since trylock is no longer used.
>>>
>>> v1 discussion:
>>> https://lore.kernel.org/all/20251013150959.298288-1-steve.wahl@hpe.com/
>>>
>>>    kernel/time/tick-sched.c | 30 ++++++++++++++++++++++++++----
>>>    1 file changed, 26 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
>>> index c527b421c865..3ff3eb1f90d0 100644
>>> --- a/kernel/time/tick-sched.c
>>> +++ b/kernel/time/tick-sched.c
>>> @@ -201,6 +201,27 @@ static inline void tick_sched_flag_clear(struct tick_sched *ts,
>>>    	ts->flags &= ~flag;
>>>    }
>>> +/*
>>> + * Allow only one non-timekeeper CPU at a time to update jiffies from
>>> + * the timer tick.
>>> + *
>>> + * Returns true if update was run.
>>> + */
>>> +static bool tick_limited_update_jiffies64(struct tick_sched *ts, ktime_t now)
>>> +{
>>> +	static atomic_t in_progress;
>>> +	int inp;
>>> +
>>> +	inp = atomic_read(&in_progress);
>>> +	if (inp || !atomic_try_cmpxchg(&in_progress, &inp, 1))
>>> +		return false;
>>> +
>>
>> You only get here if (ts->last_tick_jiffies == jiffies), so it may not be necessary to check again.
> 
> TGLX had this in his rewrite suggestion, and I looked pretty intensely
> at this test.
> 
> The situation I'm looking to resolve is caused by inter-NUMA links
> being abnormally swamped with traffic.  Especially for writes, accesses
> to shared memory locations, such as the atomic operations on
> in_progress right above this, take longer than one would usually
> expect.  So to me it makes sense that things may have changed since
> the atomic_try_cmpxchg was initiated, and so I left the check in
> place.
> 

I see. One possibility is that, in that window, the update runs in
parallel on the timekeeper CPU (tick_do_timer_cpu), which always
performs it.
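
To make that window concrete, here is a minimal userspace sketch of the
recheck. All sim_* names are hypothetical stand-ins, not kernel symbols,
and unlike the patch (which returns true whenever the gate was won) the
helper here reports whether the slow path actually ran.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for kernel state; not kernel definitions. */
static uint64_t sim_jiffies;		/* models the global jiffies counter */
static unsigned int sim_updates;	/* counts entries into the slow update path */

/* Models tick_do_update_jiffies64(): the expensive, lock-protected update. */
static void sim_update_jiffies(void)
{
	sim_jiffies++;
	sim_updates++;
}

/*
 * Models the recheck done after the gate has been won: run the slow
 * update only if jiffies still equals the value this CPU last observed.
 */
static bool sim_update_if_stale(uint64_t last_tick_jiffies)
{
	if (last_tick_jiffies != sim_jiffies)
		return false;	/* another CPU already advanced jiffies */
	sim_update_jiffies();
	return true;
}

/* Walks through both outcomes; returns the number of slow-path updates. */
static unsigned int sim_demo(void)
{
	bool first, second;

	sim_jiffies = 100;
	sim_updates = 0;

	/* Nobody else has updated yet: the caller does the work. */
	first = sim_update_if_stale(100);

	/* jiffies has since moved on: a stale observer skips the update. */
	second = sim_update_if_stale(100);

	assert(first && !second);
	return sim_updates;
}
```

In this model the second caller plays the role of a CPU whose cmpxchg
completed only after the timekeeper had already advanced jiffies, so it
skips the expensive path.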

>>> +	if (ts->last_tick_jiffies == jiffies)
>>> +		tick_do_update_jiffies64(now);
>>> +	atomic_set(&in_progress, 0);
>>> +	return true;
>>> +}
>>> +
>>>    #define MAX_STALLED_JIFFIES 5
>>>    static void tick_sched_do_timer(struct tick_sched *ts, ktime_t now)
>>> @@ -239,10 +260,11 @@ static void tick_sched_do_timer(struct tick_sched *ts, ktime_t now)
>>>    		ts->stalled_jiffies = 0;
>>>    		ts->last_tick_jiffies = READ_ONCE(jiffies);
>>>    	} else {
>>> -		if (++ts->stalled_jiffies == MAX_STALLED_JIFFIES) {
>>> -			tick_do_update_jiffies64(now);
>>> -			ts->stalled_jiffies = 0;
>>> -			ts->last_tick_jiffies = READ_ONCE(jiffies);
>>> +		if (++ts->stalled_jiffies >= MAX_STALLED_JIFFIES) {
>>> +			if (tick_limited_update_jiffies64(ts, now)) {
>>> +				ts->stalled_jiffies = 0;
>>> +				ts->last_tick_jiffies = READ_ONCE(jiffies);
>>> +			}
>>>    		}
>>>    	}
>>
>>
>> Yes. This could help large systems.
>>
>> Acked-by: Shrikanth Hegde <sshegde@linux.ibm.com>
> 
> Thanks for your time reviewing!
> 
> --
> Steve Wahl
> 
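
For readers following along outside the kernel tree, the gating pattern
discussed above can be approximated in userspace with C11 atomics. This
is only a sketch under assumptions: gate_in_progress, gate_try_enter()
and gate_leave() are hypothetical names, and C11
atomic_compare_exchange_strong() stands in for the kernel's
atomic_try_cmpxchg().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace analogue of the patch's static in_progress flag. */
static atomic_int gate_in_progress;

/* Try to become the single updater; never blocks or spins. */
static bool gate_try_enter(void)
{
	int expected = atomic_load_explicit(&gate_in_progress,
					    memory_order_relaxed);

	/* Cheap read first, so losers avoid a contended write to the line. */
	if (expected != 0)
		return false;

	/* Only one caller can move the flag from 0 to 1. */
	return atomic_compare_exchange_strong(&gate_in_progress, &expected, 1);
}

/* Release the gate; must only be called by the current holder. */
static void gate_leave(void)
{
	assert(atomic_load(&gate_in_progress) == 1);
	atomic_store(&gate_in_progress, 0);
}
```

A caller would wrap the expensive update as
"if (gate_try_enter()) { ...update...; gate_leave(); }", and callers
that lose the race simply return instead of queueing on a lock.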


Thread overview:
2025-10-27 18:34 [PATCH v2] tick/sched: Limit non-timekeeper CPUs calling jiffies update Steve Wahl
2025-10-28  6:09 ` Shrikanth Hegde
2025-10-28 14:24   ` Steve Wahl
2025-10-28 15:22     ` Shrikanth Hegde [this message]
2025-11-01 19:29 ` [tip: timers/core] " tip-bot2 for Steve Wahl
