From mboxrd@z Thu Jan 1 00:00:00 1970 From: Robert Bragg Subject: Re: [PATCH v9 09/11] drm/i915: add dev.i915.oa_max_sample_rate sysctl Date: Thu, 10 Nov 2016 13:57:26 +0000 Message-ID: References: <20161107194957.3385-1-robert@sixbynine.org> <20161107194957.3385-10-robert@sixbynine.org> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1117849728==" Return-path: In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" To: Matthew Auld Cc: ML dri-devel , David Airlie , Intel Graphics Development , Sourab Gupta , Daniel Vetter List-Id: dri-devel@lists.freedesktop.org --===============1117849728== Content-Type: multipart/alternative; boundary=001a11377e18282e240540f2c2b6 --001a11377e18282e240540f2c2b6 Content-Type: text/plain; charset=UTF-8 On Wed, Nov 9, 2016 at 7:52 PM, Matthew Auld wrote: > On 7 November 2016 at 19:49, Robert Bragg wrote: > > The maximum OA sampling frequency is now configurable via a > > dev.i915.oa_max_sample_rate sysctl parameter. > > > > Following the precedent set by perf's similar > > kernel.perf_event_max_sample_rate the default maximum rate is 100000Hz > > > > Signed-off-by: Robert Bragg > > --- > > drivers/gpu/drm/i915/i915_perf.c | 61 ++++++++++++++++++++++++++++++ > ++-------- > > 1 file changed, 50 insertions(+), 11 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_perf.c > b/drivers/gpu/drm/i915/i915_perf.c > > index e51c1d8..1a87fe9 100644 > > --- a/drivers/gpu/drm/i915/i915_perf.c > > +++ b/drivers/gpu/drm/i915/i915_perf.c > > @@ -82,6 +82,21 @@ static u32 i915_perf_stream_paranoid = true; > > #define INVALID_CTX_ID 0xffffffff > > > > > > +/* For sysctl proc_dointvec_minmax of i915_oa_max_sample_rate > > + * > > + * 160ns is the smallest sampling period we can theoretically program > the OA > > + * unit with on Haswell, corresponding to 6.25MHz. > > + */ > > +static int oa_sample_rate_hard_limit = 6250000; > > + > > +/* Theoretically we can program the OA unit to sample every 160ns but > don't > > + * allow that by default unless root... > > + * > > + * The default threshold of 100000Hz is based on perf's similar > > + * kernel.perf_event_max_sample_rate sysctl parameter. > > + */ > > +static u32 i915_oa_max_sample_rate = 100000; > > + > > /* XXX: beware if future OA HW adds new report formats that the current > > * code assumes all reports have a power-of-two size and ~(size - 1) can > > * be used as a mask to align the OA tail pointer. > > @@ -1314,6 +1329,7 @@ static int read_properties_unlocked(struct > drm_i915_private *dev_priv, > > } > > > > for (i = 0; i < n_props; i++) { > > + u64 oa_period, oa_freq_hz; > > u64 id, value; > > int ret; > > > > @@ -1359,21 +1375,35 @@ static int read_properties_unlocked(struct > drm_i915_private *dev_priv, > > return -EINVAL; > > } > > > > - /* NB: The exponent represents a period as > follows: > > - * > > - * 80ns * 2^(period_exponent + 1) > > - * > > - * Theoretically we can program the OA unit to > sample > > + /* Theoretically we can program the OA unit to > sample > > * every 160ns but don't allow that by default > unless > > * root. > > * > > - * Referring to perf's > > - * kernel.perf_event_max_sample_rate for a > precedent > > - * (100000 by default); with an OA exponent of 6 > we get > > - * a period of 10.240 microseconds -just under > 100000Hz > > + * On Haswell the period is derived from the > exponent > > + * as: > > + * > > + * period = 80ns * 2^(exponent + 1) > > + */ > > + BUILD_BUG_ON(sizeof(oa_period) != 8); > > + oa_period = 80ull * (2ull << value); > > + > > + /* This check is primarily to ensure that > oa_period <= > > + * UINT32_MAX (before passing to do_div which > only > > + * accepts a u32 denominator), but we can also > skip > > + * checking anything < 1Hz which implicitly > can't be > > + * limited via an integer oa_max_sample_rate. > > */ > > - if (value < 6 && !capable(CAP_SYS_ADMIN)) { > > - DRM_ERROR("Minimum OA sampling exponent > is 6 without root privileges\n"); > > + if (oa_period <= NSEC_PER_SEC) { > > + u64 tmp = NSEC_PER_SEC; > > + do_div(tmp, oa_period); > > + oa_freq_hz = tmp; > > + } else > > + oa_freq_hz = 0; > > + > > + if (oa_freq_hz > i915_oa_max_sample_rate && > > + !capable(CAP_SYS_ADMIN)) { > > + DRM_ERROR("OA exponent would exceed the > max sampling frequency (sysctl dev.i915.oa_max_sample_rate) %uHz without > root privileges\n", > This line is getting a little too long. > yeah, though the coding style has the exception for string constants so they remain grepable which can be useful too. Could maybe remove the bracketed naming of the corresponding sysctl parameter, though it does seem like pertinent/useful info if someone were to hit this limit. > > Reviewed-by: Matthew Auld > thanks --001a11377e18282e240540f2c2b6 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable


On Wed, Nov 9, 2016 at 7:52 PM, Matthew Auld <matthew.wil= liam.auld@gmail.com> wrote:
On 7 November 2016 at 19:49, Robert Bragg <robert@sixbynine.org> wr= ote:
> The maximum OA sampling frequency is now configurable via a
> dev.i915.oa_max_sample_rate sysctl parameter.
>
> Following the precedent set by perf's similar
> kernel.perf_event_max_sample_rate the default maximum rate is 100= 000Hz
>
> Signed-off-by: Robert Bragg <robert@sixbynine.org>
> ---
>=C2=A0 drivers/gpu/drm/i915/i915_perf.c | 61 +++++++++++++++++++++= +++++++++++--------
>=C2=A0 1 file changed, 50 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i= 915/i915_perf.c
> index e51c1d8..1a87fe9 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -82,6 +82,21 @@ static u32 i915_perf_stream_paranoid =3D true;
>=C2=A0 #define INVALID_CTX_ID 0xffffffff
>
>
> +/* For sysctl proc_dointvec_minmax of i915_oa_max_sample_rate
> + *
> + * 160ns is the smallest sampling period we can theoretically program= the OA
> + * unit with on Haswell, corresponding to 6.25MHz.
> + */
> +static int oa_sample_rate_hard_limit =3D 6250000;
> +
> +/* Theoretically we can program the OA unit to sample every 160ns but= don't
> + * allow that by default unless root...
> + *
> + * The default threshold of 100000Hz is based on perf's similar > + * kernel.perf_event_max_sample_rate sysctl parameter.
> + */
> +static u32 i915_oa_max_sample_rate =3D 100000;
> +
>=C2=A0 /* XXX: beware if future OA HW adds new report formats that the = current
>=C2=A0 =C2=A0* code assumes all reports have a power-of-two size and ~(= size - 1) can
>=C2=A0 =C2=A0* be used as a mask to align the OA tail pointer.
> @@ -1314,6 +1329,7 @@ static int read_properties_unlocked(struct = drm_i915_private *dev_priv,
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0}
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0for (i =3D 0; i < n_props; i++) {<= br> > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0u64 oa_period,= oa_freq_hz;
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0u64 id, v= alue;
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0int ret;<= br> >
> @@ -1359,21 +1375,35 @@ static int read_properties_unlocked(struc= t drm_i915_private *dev_priv,
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0return -EINVAL;
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0}
>
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0/* NB: The exponent represents a period as follows:
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 *
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 *=C2=A0 =C2=A080ns * 2^(period_exponent + 1)
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 *
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 * Theoretically we can program the OA unit to sample
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0/* Theoretically we can program the OA unit to sample
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 * every 160ns but don't allow that by default unle= ss
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 * root.
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 *
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 * Referring to perf's
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 * kernel.perf_event_max_sample_rate for a precedent
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 * (100000 by default); with an OA exponent of 6 we get
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 * a period of 10.240 microseconds -just under 100000Hz
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 * On Haswell the period is derived from the exponent
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 * as:
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 *
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 *=C2=A0 =C2=A0period =3D 80ns * 2^(exponent + 1)
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 */
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0BUILD_BUG_ON(sizeof(oa_period) !=3D 8);
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0oa_period =3D 80ull * (2ull << value);
> +
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0/* This check is primarily to ensure that oa_period <=3D > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 * UINT32_MAX (before passing to do_div which only
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 * accepts a u32 denominator), but we can also skip
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 * checking anything < 1Hz which implicitly can't be > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 * limited via an integer oa_max_sample_rate.
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 */
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0if (value < 6 && !capable(CAP_SYS_ADMIN)) {
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0DRM_ERROR("Minimum OA sampli= ng exponent is 6 without root privileges\n");
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0if (oa_period <=3D NSEC_PER_SEC) {
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0u64 tmp =3D NSEC_PER_SEC;
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0do_div(tmp, oa_period);
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0oa_freq_hz =3D tmp;
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0} else
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0oa_freq_hz =3D 0;
> +
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0if (oa_freq_hz > i915_oa_max_sample_rate &&
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0!capable(CAP_SYS_ADMIN)) {
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0DRM_ERROR("OA exponent would= exceed the max sampling frequency (sysctl dev.i915.oa_max_sample_rate) %uH= z without root privileges\n",
This line is getting a little too long.
= =C2=A0
yeah, though the coding style has the exception for string= constants so they remain grepable which can be useful too. Could maybe rem= ove the bracketed naming of the corresponding sysctl parameter, though it d= oes seem like pertinent/useful info if someone were to hit this limit.
<= br>
=C2=A0

Reviewed-by: Matthew Auld <matthew.auld@intel.com>

thanks
--001a11377e18282e240540f2c2b6-- --===============1117849728== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KSW50ZWwtZ2Z4 IG1haWxpbmcgbGlzdApJbnRlbC1nZnhAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vaW50ZWwtZ2Z4Cg== --===============1117849728==--