From: Waiman Long <longman@redhat.com>
To: paulmck@kernel.org, Feng Tang <feng.tang@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>, linux-kernel@vger.kernel.org
Subject: Re: A couple of TSC questions
Date: Sun, 2 Apr 2023 21:04:04 -0400
Message-ID: <293db107-a572-592f-cc27-e59ab81a4e60@redhat.com>
In-Reply-To: <3daa086c-b4a0-47a9-8bfc-aac4139013c4@paulmck-laptop>
On 3/31/23 13:16, Paul E. McKenney wrote:
> On Tue, Mar 28, 2023 at 02:58:54PM -0700, Paul E. McKenney wrote:
>> On Mon, Mar 27, 2023 at 10:19:54AM +0800, Feng Tang wrote:
>>> On Fri, Mar 24, 2023 at 05:47:33PM -0700, Paul E. McKenney wrote:
>>>> On Wed, Mar 22, 2023 at 01:14:48PM +0800, Feng Tang wrote:
> [ . . . ]
>
>>>>>> Second, we are very occasionally running into console messages like this:
>>>>>>
>>>>>> Measured 2 cycles TSC warp between CPUs, turning off TSC clock.
>>>>>>
>>>>>> This comes from check_tsc_sync_source() and indicates that one CPU's
>>>>>> TSC read produced a later time than a later read from some other CPU.
>>>>>> I am beginning to suspect that these can be caused by unscheduled delays
>>>>>> in the TSC synchronization code, but figured I should ask you if you have
>>>>>> ever seen these. And of course, if so, what the usual causes might be.
>>>>> I haven't seen this error myself or got similar reports. Usually it
>>>>> should be easy to detect once happened, as falling back to HPET
>>>>> will trigger obvious performance degradation.
>>>> And that is exactly what happened. ;-)
>>>>
>>>>> Could you give more detail about when and how it happens, and the
>>>>> HW info like how many sockets the platform has.
>>>> We are in early days, so I am checking for other experiences.
>>>>
>>>>> CC Thomas, Waiman, as they discussed a similar case here:
>>>>> https://lore.kernel.org/lkml/87h76ew3sb.ffs@tglx/T/#md4d0a88fb708391654e78312ffa75b481690699f
>>>> Fun! ;-)
>> Waiman, do you recall what fraction of the benefit was provided by the
>> first patch, that is, the one that grouped the sync_lock, last_tsc,
>> max_warp, nr_warps, and random_warps global variables into a single
>> struct?
The purpose of the first patch is just to avoid false cacheline sharing
between the watchdog CPU and another CPU that happens to access nearby
data in the same cacheline.
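
To illustrate the idea, here is a minimal userspace sketch, not the actual patch. It groups the five variables named above into one struct and aligns that struct to a cacheline, so unrelated globals cannot land in the same line. The 64-byte line size, the struct name tsc_sync_state, the plain int standing in for the kernel's lock type, and the same_cacheline() helper are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cacheline size; the kernel uses its own per-arch constant. */
#define CACHELINE_BYTES 64

/*
 * Hypothetical grouping of the TSC sync variables. Aligning the first
 * member aligns the whole struct and pads its size to a full line, so
 * no unrelated variable can share the cacheline with these fields.
 */
struct tsc_sync_state {
	alignas(CACHELINE_BYTES) int sync_lock;	/* stand-in for the real lock type */
	uint64_t last_tsc;
	uint64_t max_warp;
	int nr_warps;
	int random_warps;
};

static_assert(sizeof(struct tsc_sync_state) == CACHELINE_BYTES,
	      "sync state should occupy exactly one cacheline");

static struct tsc_sync_state sync_state;

/* Illustrative helper: do two addresses fall in the same cacheline? */
static int same_cacheline(const void *a, const void *b)
{
	return (uintptr_t)a / CACHELINE_BYTES == (uintptr_t)b / CACHELINE_BYTES;
}
```

With this layout, same_cacheline(&sync_state.sync_lock, &sync_state.random_warps) holds, and any global placed after sync_state starts on a different line.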
Now I realize that I should have followed up on this patch series. The
problem reported in that patch series happens on one system only, I believe.
> And what we are seeing is unlikely to be due to cache-latency-induced
> delays. We see a very precise warp, for example, one system always
> has 182 cycles of TSC warp, another 273 cycles, and a third 469 cycles.
> Another is at the insanely large value of about 2^64/10, and shows some
> variation, but that variation is only about 0.1%.
>
> But any given system only sees warp on about half of its reboots.
> Perhaps due to the automation sometimes power cycling?
>
> There are few enough affected systems that investigation will take
> some time.
Maybe the difference in warp is due to the NUMA distance of the running CPU
from the node where the data resides. It will be interesting to see if my
patch helps.
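
For reference, the warp being discussed can be sketched as follows. This is a simplified, single-threaded model of the bookkeeping, not the code in check_tsc_sync_source(): the struct warp_state and record_tsc_sample() names are hypothetical, the lock that serializes samples between CPUs is omitted, and the caller passes in the TSC value instead of reading the counter.

```c
#include <stdint.h>

/* Hypothetical shared bookkeeping for the warp check. */
struct warp_state {
	uint64_t last_tsc;	/* most recent sample from any CPU */
	uint64_t max_warp;	/* largest backwards step seen so far */
	int nr_warps;		/* number of backwards steps seen */
};

/*
 * Record one TSC sample. In a real check this would run with a lock
 * held so that samples from different CPUs are strictly ordered.
 * A warp is counted when a later sample reads lower than an earlier one.
 */
static void record_tsc_sample(struct warp_state *s, uint64_t now)
{
	if (now < s->last_tsc) {
		uint64_t warp = s->last_tsc - now;

		if (warp > s->max_warp)
			s->max_warp = warp;
		s->nr_warps++;
	}
	s->last_tsc = now;
}
```

Feeding this the samples 1000, 1182, 1000 in that order records one warp of 182 cycles, which is how a fixed offset between two CPUs' counters would show up as a repeatable warp value.
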
Cheers,
Longman
Thread overview: 12+ messages
2023-03-21 23:23 A couple of TSC questions Paul E. McKenney
2023-03-22 5:14 ` Feng Tang
2023-03-25 0:47 ` Paul E. McKenney
2023-03-27 2:19 ` Feng Tang
2023-03-28 21:58 ` Paul E. McKenney
2023-03-31 17:16 ` Paul E. McKenney
2023-04-03 1:04 ` Waiman Long [this message]
2023-04-03 2:00 ` Paul E. McKenney
2023-04-03 2:05 ` Waiman Long
2023-04-03 3:38 ` Paul E. McKenney
2023-04-03 15:11 ` Feng Tang
2023-04-13 18:39 ` Paul E. McKenney