From mboxrd@z Thu Jan 1 00:00:00 1970
From: Robert Bragg
Subject: Re: [RFC 0/6] Non perf based Gen Graphics OA unit driver
Date: Wed, 30 Sep 2015 14:36:41 +0100
References: <1443537549-6905-1-git-send-email-robert@sixbynine.org> <20150930083027.GF9929@nuc-i3427.alporthouse.com>
In-Reply-To: <20150930083027.GF9929@nuc-i3427.alporthouse.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============0661687681=="
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx"
To: Chris Wilson, Robert Bragg, intel-gfx@lists.freedesktop.org, Daniel Vetter, Sourab Gupta, Zhenyu Wang, Jani Nikula, David Airlie, Peter Zijlstra, Ingo Molnar, Kan Liang, Alexander Shishkin, Zheng Yan, Mark Rutland, Matt Fleming, dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org, linux-api@vger.kernel.org
List-Id: linux-api@vger.kernel.org

--===============0661687681==
Content-Type: multipart/alternative; boundary=001a114782847a98db0520f7060d

--001a114782847a98db0520f7060d
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

On Wed, Sep 30, 2015 at 9:30 AM, Chris Wilson wrote:

> On Tue, Sep 29, 2015 at 03:39:03PM +0100, Robert Bragg wrote:
> > Updating Mesa and GPU Top to experiment with this was straightforward
> > given the similarity to the perf interface.  The main difference is that
> > it only supports forwarding metrics via read()s instead of an mmaped
> > circular buffer. As mentioned above, I think that suits this well, and
> > requires no additional copying of data. I think the userspace code has
> > ended up being a little simpler too.
>
> Did you try updating the existing perf based overlay?
>

I don't recall the overlay attempting to read OA counters, but potentially it could be quite nice to add support - sorry I hadn't considered that so far.

I don't believe being perf based or not will affect the effort to do this though. The perf based driver doesn't handle OA counter normalization in the kernel, so userspace needs to be able to handle that - which is probably the bigger effort.

One thing to note about your early pmu driver is that it was for counters that were explicitly sampled from the cpu via mmio, using an hrtimer. I think they were a better fit for the existing perf design than the OA unit, primarily because they were explicitly read from the cpu and each counter was very independent.

> > Overall the driver currently isn't much more code than with perf (~200
> > lines).
> >
> > Personally my gut feeling a.t.m, is that we should aim to move forward
> > independent from perf.
> >
> > I'd really appreciate some feedback from others on this though.
> >
> > Daniel and Chris; although I think it made sense at the outset to try
> > and use perf, in light of the above would you be open to a non-perf
> > based driver for the OA unit?
>
> No. I strongly dislike that there will be multiple incompatible perf
> interfaces and strongly like the coupling with other profiling that
> comes with perf - i.e. we very much want to simultaneously sample CPU
> and GPU workloads along with other devices, that information is much
> more useful to me for the purposes of scheduling work and maximising
> concurrency than optimising shaders.
>

In this case I don't think there's inherently any more compatibility that comes from using perf or not - no existing userspace will Just Work™ with the perf based OA driver.

I think some of the cases you're referring to may be ok to expose via the existing perf infrastructure, but I'm currently enabling the OA unit, which poses some unique difficulties I've tried to explain.

A guiding differentiator for whether the existing perf pmu infrastructure is a good fit may be whether or not the counter is orthogonal (in terms of configuration and normalization) and explicitly readable from the cpu.

'i915 perf' shows my lack of imagination in naming this, and maybe another name could imply a more limited scope. I.e. on a case by case basis, when looking to expose new counters we can still evaluate whether it makes sense to expose them via the existing perf infrastructure or this.

- Robert

> -Chris
>
> --
> Chris Wilson, Intel Open Source Technology Centre

--001a114782847a98db0520f7060d
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable


--001a114782847a98db0520f7060d--
--===============0661687681==--