From: Rajvi Jingar <rajvi.Jingar@intel.com>
To: <tglx@linutronix.de>
Cc: <mingo@kernel.org>, <hpa@zytor.com>, <x86@kernel.org>,
<peterz@infradead.org>, <linux-kernel@vger.kernel.org>,
<christopher.s.hall@intel.com>
Subject: Re: [PATCH] x86/tsc: Always Running Timer (ART) nanoseconds clocksource
Date: Tue, 6 Mar 2018 16:46:02 -0800 [thread overview]
Message-ID: <1520383562.29569.49.camel@intel.com> (raw)
In-Reply-To: <alpine.DEB.2.21.1803011022330.1396@nanos.tec.linutronix.de>
Hi Thomas,
Thank you for your review comments. Please find my answers inline.
On Thu, 2018-03-01 at 12:40 +0100, Thomas Gleixner wrote:
> On Wed, 28 Feb 2018, Rajvi Jingar wrote:
>
> Subject: x86/tsc: Always Running Timer (ART) nanoseconds clocksource
>
> Please don't use clocksource here. That's misleading because
> clocksources
> are related to the time keeping infrastructure. What the patch
> provides is a
> conversion/correlation function for ART.
>
Sure. v2 has it corrected.
> > Some clock distribution mechanisms (e.g. PCIe-PTM) require time to
> > be
> > distributed in units of nanoseconds. In order to cross-timestamp
> > local
> > device time across domains the local device timestamp needs to be
> > correlated with TSC.
> >
> > On systems that support ART, a CPUID leaf (0x15) returns parameter
> > Nominal Core Crystal Clock Frequency such that:
> >
> > ART_value (in ticks) = (cryst_freq * ART.ns) / 1e9
> >
> > Add a special case for Goldmont-based platform (which returns
> > cryst_freq 0)
> > to manually set the frequency to 19.2MHz.
> >
> > Signed-off-by: Rajvi Jingar <rajvi.jingar@intel.com>
> > Signed-off-by: Christopher S. Hall <christopher.s.hall@intel.com>
>
> This SOB chain is wrong. Christopher is not transporting your patch.
> If he
> was involved in development then please use the:
>
> Co-developed-by: Christopher S. Hall <christopher.s.hall@intel.com>
> Signed-off-by: Christopher S. Hall <christopher.s.hall@intel.com>
> Signed-off-by: Rajvi Jingar <rajvi.jingar@intel.com>
>
> format.
>
Adding Cristopher to "Suggested-by" tag since it was suggested by him.
> > --- a/arch/x86/include/asm/cpufeatures.h
> > +++ b/arch/x86/include/asm/cpufeatures.h
> > @@ -108,6 +108,7 @@
> > #define X86_FEATURE_EXTD_APICID ( 3*32+26) /*
> > Extended APICID (8 bits) */
> > #define X86_FEATURE_AMD_DCM ( 3*32+27) /* AMD
> > multi-node processor */
> > #define X86_FEATURE_APERFMPERF ( 3*32+28) /* P-
> > State hardware coordination feedback capability (APERF/MPERF MSRs)
> > */
> > +#define X86_FEATURE_ART_NS ( 3*32+29) /* Always
> > running timer (ART) in nanoseconds */
>
> What's the point of this feature flag? You are not using it in the
> conversion function for sanity checking the invocation.
>
> Also the naming is bogus as it suggests that the ART value is
> actually in
> nano seconds which is not true at all.
>
> What it allows is to do a translation from nanosecond based ART
> values
> - where ever they come from - to TSC.
Flag was introduced because of the unreliability of the
CPUID[0x15].ECX, to check in driver whether it is set or not before
calling this conversion. It has been removed from v2 since new
conversion uses tsc_khz instead. We can utilize
X86_FEATURE_TSC_KNOWN_FREQ flag to add check in driver before calling
this conversion.
> > static u32 art_to_tsc_numerator;
> > static u32 art_to_tsc_denominator;
> > +static u32 art_to_tsc_hz;
>
> I really do not understand your attempt to connect this to TSC. It's
> just
> wrong. From your changelog:
>
> ART_value (in ticks) = (cryst_freq * ART.ns) / 1e9
>
> Where is TSC in that formula? Also what is ART.ns? This does not make
> any
> sense at all.
>
> From the SDM:
>
> The invariant TSC is based on the invariant timekeeping hardware
> (called Always Running Timer or ART), that runs at the core
> crystal
> clock frequency.
>
> So ART_TICKS is simply the value read from the ART register in a
> device and
> the unit of this value is the core crystal clock frequency.
>
> Now what you want to achieve is the conversion of ART_TICKS to
> nanoseconds. That is:
>
> ART_NS = ART_TICKS * CORE_CRYSTAL_FREQ / 1e9
>
To correct the above formula,
ART_NS = ART_TICKS * 1e9 / CORE_CRYSTAL_FREQ
Added the ART_NS->TSC formula in changelog with more details.
> > cpuid(ART_CPUID_LEAF, &art_to_tsc_denominator,
> > - &art_to_tsc_numerator, unused, unused+1);
> > + &art_to_tsc_numerator, &art_to_tsc_hz, &unused);
>
> That means that the variable you want here is:
>
> core_crystal_freq
>
> and not some misleading randomly chosen one. Again from the SDM:
>
> ECX Bits 31 - 00: An unsigned integer which is the nominal frequency
> of the
> core crystal clock in Hz.
>
Variable core_crystal_freq has been removed in v2.
> > if (art_to_tsc_denominator < ART_MIN_DENOMINATOR)
> > return;
> > @@ -1001,6 +1002,15 @@ static void __init detect_art(void)
> >
> > /* Make this sticky over multiple CPU init calls */
> > setup_force_cpu_cap(X86_FEATURE_ART);
> > +
> > + if (art_to_tsc_hz == 0) {
> > + if (boot_cpu_data.x86_model ==
> > INTEL_FAM6_ATOM_GOLDMONT)
> > + art_to_tsc_hz = 19200000;
> > + else
> > + return;
>
> Please make this a switch case right away. Given the track record of
> Intels
> bogus frequency information in CPUID this will grow before this patch
> is
> merged.
>
> switch (boot_cpu_data.x86_model) {
> case INTEL_FAM6_ATOM_GOLDMONT:
> /* Add a comment explaining why goldmont is special
> */
> art_to_tsc_hz = 19200000;
> break;
> default: return;
> }
>
This hardcoding was redundant for this conversion so it has been
removed and v2 uses existing frequency hardcode for platforms where
CPUID[15H].ECX == 0.
> > + }
> > +
> > + setup_force_cpu_cap(X86_FEATURE_ART_NS);
>
> This still makes no sense. Can you please elaborate what this feature
> is for?
>
> > @@ -1179,6 +1189,27 @@ struct system_counterval_t
> > convert_art_to_tsc(u64 art)
> > }
> > EXPORT_SYMBOL(convert_art_to_tsc);
> >
> > +#define ART_NS_QUANTITY 1000000000
>
> What on earth does this constant mean? It's simply NSEC_PER_SEC, i.e.
> 1e9,
> if I did not miscount the trailing zeros. There is absolutely no
> point to
> invent obscure new constants if there are meaningful and correct ones
> available already.
>
Sure. I missed out the already existing constant. Thanks for pointing
it out.
> > +/*
> > + * Convert ART ns to TSC given numerator/denominator found in
> > detect_art()
>
> Please use proper kernel doc to document the function.
>
> > + */
> > +struct system_counterval_t convert_art_ns_to_tsc(u64 art_ns)
>
> How do you get a ART value in nanoseconds in the first place? You are
> mumbling something unspecific in your changelog:
>
> Some clock distribution mechanisms (e.g. PCIe-PTM) require time
> to be
> distributed in units of nanoseconds.
>
> Of course you completely fail to explain how that is supposed to
> work. The
> original explanation for ART was that ART is distributed to PCIe as
> is and
> the time stamps taken in devices are in ART frequency. That's how PTP
> uses
> it, right?
>
> Now you say, that PCIe-PTM provides ART values in nanosecond units. I
> assume that's done in hardware and uses the same conversion formula:
>
> ART_NS = ART_TICKS * CORE_CRYSTAL_FREQ / 1e9
>
> That brings up the obvious question how PCIe-PTM knows about
> CORE_CRYSTAL_FREQ on Goldmont if the CPUID does not. What a mess.
>
> All this information wants to be in the changelog and not left to the
> reader/reviewer to be figured out with crystalballs.
>
> So for full correlation to TSC you need to go back to the original
> core
> crystal ticks and then do the conversion to TSC. The way you are
> doing this
> is:
>
> ART_TICKS = ART_NS * 1e9 / CORE_CRYSTAL_FREQ;
>
> and then:
>
> TSC = art_tsc_offset + ART_TICKS * art_tsc_nominator /
> art_tsc_denominator
>
> Sorry, but that is just mindless hackery. The complete conversion
> function
> is:
>
> TSC = art_tsc_offset + (ART_TICKS * 1e9 * art_tsc_nominator) /
> (CORE_CRYSTAL_FREQ *
> art_tsc_denominator)
>
> The relevant values are already known at init time. So you can simply
> compute the compound values.
>
> art_ns_tsc_nominator = 1e9 * art_tsc_nominator;
> art_ns_tsc_denominator = CORE_CRYSTAL_FREQ *
> art_tsc_denominator;
>
> and the computation boils down to:
>
> res = div64_u64_rem(art_ns, art_ns_tsc_denominator, &rem);
> res *= art_ns_to_tsc_numerator;
>
> rem *= art_ns_to_tsc_numerator;
> res += div64_u64(rem, art_ns_tsc_denominator);
> res += art_tsc_offset;
>
> instead of a completely uncomprehensible mess which is also prone to
> lose
> precision.
>
> Hmm?
>
> Thanks,
>
> tglx
Formula has been changed in v2 to calculate TSC from given ART in
nanoseconds that is much straightforward.
Thanks,
Rajvi
prev parent reply other threads:[~2018-03-06 23:51 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-02-28 11:54 [PATCH] x86/tsc: Always Running Timer (ART) nanoseconds clocksource Rajvi Jingar
2018-03-01 11:40 ` Thomas Gleixner
2018-03-07 0:46 ` Rajvi Jingar [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1520383562.29569.49.camel@intel.com \
--to=rajvi.jingar@intel.com \
--cc=christopher.s.hall@intel.com \
--cc=hpa@zytor.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=peterz@infradead.org \
--cc=tglx@linutronix.de \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.