From: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
To: "tursulin@ursulin.net" <tursulin@ursulin.net>
Cc: "peterz@infradead.org" <peterz@infradead.org>,
"Intel-gfx@lists.freedesktop.org"
<Intel-gfx@lists.freedesktop.org>
Subject: Re: [RFC v3 00/11] i915 PMU and engine busy stats
Date: Tue, 12 Sep 2017 02:03:04 +0000 [thread overview]
Message-ID: <1505152854.7443.6.camel@intel.com> (raw)
In-Reply-To: <20170911152559.7077-1-tvrtko.ursulin@linux.intel.com>
Hi,
Just tried v3 series. perf-stat works fine. From the IGT tests which I wrote for i915 PMU
(https://patchwork.freedesktop.org/series/29313/) all pass (assuming pmu.enabled will be exposed
in debugfs) except cpu_online subtest. And this is pretty interesting - see details below.
Ok, be prepared for the long mail:)...
So, cpu_online subtest loads RCS0 engine 100% and starts to put CPUs offline one by one starting
from CPU0 (don't forget to have CONFIG_BOOTPARAM_HOTPLUG_CPU0=y in .config). Test expectation is
that i915 PMU will report almost 100% (with 2% tolerance) busyness of RCS0. Test requests metric
just twice: before running on RCS0 and right after. This fails as follows:
Executed on rcs0 for 32004262184us
i915/rcs0-busy/: 2225723999us
(perf_pmu:6325) CRITICAL: Test assertion failure function test_cpu_online, file perf_pmu.c:719:
(perf_pmu:6325) CRITICAL: Failed assertion: perf_elapsed(&pm.metrics[0]) > (1-USAGE_TOLERANCE) * elapsed_ns(&start, &now)
Stack trace:
#0 [__igt_fail_assert+0xf1]
#1 [__real_main773+0xff1]
#2 [main+0x35]
#3 [__libc_start_main+0xf5]
#4 [_start+0x29]
#5 [<unknown>+0x29]
Subtest cpu_online failed.
**** DEBUG ****
(perf_pmu:6325) DEBUG: Test requirement passed: is_hotplug_cpu0()
(perf_pmu:6325) INFO: perf_init: enabled 1 metrics from 1 requested
(perf_pmu:6325) ioctl-wrappers-DEBUG: Test requirement passed: gem_has_ring(fd, ring)
(perf_pmu:6325) INFO: Executed on rcs0 for 32004262184us
(perf_pmu:6325) INFO: i915/rcs0-busy/: 2225723999us
(perf_pmu:6325) CRITICAL: Test assertion failure function test_cpu_online, file perf_pmu.c:719:
(perf_pmu:6325) CRITICAL: Failed assertion: perf_elapsed(&pm.metrics[0]) > (1-USAGE_TOLERANCE) * elapsed_ns(&start, &now)
Now. Looks like that by itself PMU context migration works. For example, if you will comment out
"perf_pmu_migrate_context(&pmu->base, cpu, target)" you will get:
Executed on rcs0 for 32004434918us
i915/rcs0-busy/: 76623707us
Compare with previous:
Executed on rcs0 for 32004262184us
i915/rcs0-busy/: 2225723999us
This test passed on the previous set of patches, I mean Tvrtko's v2 series + my patches.
So, it seems we are loosing counter values somehow. I saw in the patches that this place really was modified - you
have added subtraction from initial counter value:
static void i915_pmu_event_read(struct perf_event *event)
{
local64_set(&event->count,
__i915_pmu_event_read(event) -
local64_read(&event->hw.prev_count));
}
But looks like the problem is that with the PMU context migration we get sequence of events start/stop (or maybe
add/del) which eventually call our i915_pmu_enable/disable. Here is the dmesg log with the obvious printk:
[ 153.971096] [IGT] perf_pmu: starting subtest cpu_online
[ 153.971151] i915_pmu_enable: event->hw.prev_count=0
[ 154.036015] i915_pmu_disable: event->hw.prev_count=0
[ 154.048027] i915_pmu_enable: event->hw.prev_count=0
[ 154.049343] smpboot: CPU 0 is now offline
[ 155.059028] smpboot: Booting Node 0 Processor 0 APIC 0x0
[ 155.155078] smpboot: CPU 1 is now offline
[ 156.161026] x86: Booting SMP configuration:
[ 156.161027] smpboot: Booting Node 0 Processor 1 APIC 0x2
[ 156.197065] IRQ 122: no longer affine to CPU2
[ 156.198087] smpboot: CPU 2 is now offline
[ 157.208028] smpboot: Booting Node 0 Processor 2 APIC 0x4
[ 157.263093] smpboot: CPU 3 is now offline
[ 158.273026] smpboot: Booting Node 0 Processor 3 APIC 0x6
[ 158.310026] i915_pmu_disable: event->hw.prev_count=76648307
[ 158.319020] i915_pmu_enable: event->hw.prev_count=76648307
[ 158.319098] IRQ 124: no longer affine to CPU4
[ 158.320368] smpboot: CPU 4 is now offline
[ 159.326030] smpboot: Booting Node 0 Processor 4 APIC 0x1
[ 159.365306] smpboot: CPU 5 is now offline
[ 160.371030] smpboot: Booting Node 0 Processor 5 APIC 0x3
[ 160.421077] IRQ 125: no longer affine to CPU6
[ 160.422093] smpboot: CPU 6 is now offline
[ 161.429030] smpboot: Booting Node 0 Processor 6 APIC 0x5
[ 161.467091] smpboot: CPU 7 is now offline
[ 162.473027] smpboot: Booting Node 0 Processor 7 APIC 0x7
[ 162.527019] i915_pmu_disable: event->hw.prev_count=4347548222
[ 162.546017] i915_pmu_enable: event->hw.prev_count=4347548222
[ 162.547317] smpboot: CPU 0 is now offline
[ 163.553028] smpboot: Booting Node 0 Processor 0 APIC 0x0
[ 163.621089] smpboot: CPU 1 is now offline
[ 164.627028] x86: Booting SMP configuration:
[ 164.627029] smpboot: Booting Node 0 Processor 1 APIC 0x2
[ 164.669308] smpboot: CPU 2 is now offline
[ 165.679025] smpboot: Booting Node 0 Processor 2 APIC 0x4
[ 165.717089] smpboot: CPU 3 is now offline
[ 166.723025] smpboot: Booting Node 0 Processor 3 APIC 0x6
[ 166.775016] i915_pmu_disable: event->hw.prev_count=8574197312
[ 166.787016] i915_pmu_enable: event->hw.prev_count=8574197312
[ 166.788309] smpboot: CPU 4 is now offline
[ 167.794025] smpboot: Booting Node 0 Processor 4 APIC 0x1
[ 167.837114] smpboot: CPU 5 is now offline
[ 168.847025] smpboot: Booting Node 0 Processor 5 APIC 0x3
[ 168.889312] smpboot: CPU 6 is now offline
[ 169.899030] smpboot: Booting Node 0 Processor 6 APIC 0x5
[ 169.944104] smpboot: CPU 7 is now offline
[ 170.954032] smpboot: Booting Node 0 Processor 7 APIC 0x7
[ 171.000016] i915_pmu_disable: event->hw.prev_count=12815138319
[ 171.008017] i915_pmu_enable: event->hw.prev_count=12815138319
[ 171.009304] smpboot: CPU 0 is now offline
[ 172.017028] smpboot: Booting Node 0 Processor 0 APIC 0x0
[ 172.096104] smpboot: CPU 1 is now offline
[ 173.106025] x86: Booting SMP configuration:
[ 173.106026] smpboot: Booting Node 0 Processor 1 APIC 0x2
[ 173.147078] smpboot: CPU 2 is now offline
[ 174.153025] smpboot: Booting Node 0 Processor 2 APIC 0x4
[ 174.192093] smpboot: CPU 3 is now offline
[ 175.198028] smpboot: Booting Node 0 Processor 3 APIC 0x6
[ 175.229042] i915_pmu_disable: event->hw.prev_count=17035889079
[ 175.242030] i915_pmu_enable: event->hw.prev_count=17035889079
[ 175.242163] IRQ fixup: irq 120 move in progress, old vector 131
[ 175.242165] IRQ fixup: irq 121 move in progress, old vector 147
[ 175.242171] IRQ 124: no longer affine to CPU4
[ 175.243435] smpboot: CPU 4 is now offline
[ 176.248040] smpboot: Booting Node 0 Processor 4 APIC 0x1
[ 176.285328] smpboot: CPU 5 is now offline
[ 177.296039] smpboot: Booting Node 0 Processor 5 APIC 0x3
[ 177.325067] IRQ 125: no longer affine to CPU6
[ 177.326087] smpboot: CPU 6 is now offline
[ 178.335036] smpboot: Booting Node 0 Processor 6 APIC 0x5
[ 178.377063] IRQ 122: no longer affine to CPU7
[ 178.378086] smpboot: CPU 7 is now offline
[ 179.388028] smpboot: Booting Node 0 Processor 7 APIC 0x7
[ 179.454030] i915_pmu_disable: event->hw.prev_count=21269856967
[ 179.470026] i915_pmu_enable: event->hw.prev_count=21269856967
[ 179.471110] smpboot: CPU 0 is now offline
[ 180.481028] smpboot: Booting Node 0 Processor 0 APIC 0x0
[ 180.551075] smpboot: CPU 1 is now offline
[ 181.558029] x86: Booting SMP configuration:
[ 181.558030] smpboot: Booting Node 0 Processor 1 APIC 0x2
[ 181.595096] smpboot: CPU 2 is now offline
[ 182.605029] smpboot: Booting Node 0 Processor 2 APIC 0x4
[ 182.657084] smpboot: CPU 3 is now offline
[ 183.668030] smpboot: Booting Node 0 Processor 3 APIC 0x6
[ 183.709017] i915_pmu_disable: event->hw.prev_count=25497358644
[ 183.727016] i915_pmu_enable: event->hw.prev_count=25497358644
[ 183.728305] smpboot: CPU 4 is now offline
[ 184.734027] smpboot: Booting Node 0 Processor 4 APIC 0x1
[ 184.767090] smpboot: CPU 5 is now offline
[ 185.777036] smpboot: Booting Node 0 Processor 5 APIC 0x3
[ 185.823096] smpboot: CPU 6 is now offline
[ 186.829051] smpboot: Booting Node 0 Processor 6 APIC 0x5
[ 186.856350] smpboot: CPU 7 is now offline
[ 187.862051] smpboot: Booting Node 0 Processor 7 APIC 0x7
[ 187.871216] [IGT] perf_pmu: exiting, ret=99
[ 187.889199] Console: switching to colour frame buffer device 240x67
[ 187.889583] i915_pmu_disable: event->hw.prev_count=29754080941
And the result which I got in userspace for this run were
Executed on rcs0 for 32003587981us
i915/rcs0-busy/: 2247436461us
After that I decided to roll back the change with counting values which I mentioned before, i.e.:
static void i915_pmu_event_read(struct perf_event *event)
{
local64_set(&event->count,
__i915_pmu_event_read(event) /*-
local64_read(&event->hw.prev_count)*/);
}
And I got test PASSED :):
Executed on rcs0 for 32002282603us
i915/rcs0-busy/: 31998855052us
Subtest cpu_online: SUCCESS (33.950s)
At this point I need to go home :). Maybe you will have time to look into this issue? If not, I will continue
tomorrow.
Regards,
Dmitry.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
next prev parent reply other threads:[~2017-09-12 2:03 UTC|newest]
Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-09-11 15:25 [RFC v3 00/11] i915 PMU and engine busy stats Tvrtko Ursulin
2017-09-11 15:25 ` [RFC 01/11] drm/i915: Convert intel_rc6_residency_us to ns Tvrtko Ursulin
2017-09-14 19:48 ` Chris Wilson
2017-09-11 15:25 ` [RFC 02/11] drm/i915: Add intel_energy_uJ Tvrtko Ursulin
2017-09-14 19:49 ` Chris Wilson
2017-09-15 9:18 ` Tvrtko Ursulin
2017-09-14 20:36 ` Ville Syrjälä
2017-09-15 6:56 ` Tvrtko Ursulin
2017-09-15 8:51 ` Chris Wilson
2017-09-15 10:07 ` Tvrtko Ursulin
2017-09-15 10:34 ` Ville Syrjälä
2017-09-15 10:38 ` Chris Wilson
2017-09-15 11:16 ` Tvrtko Ursulin
2017-09-11 15:25 ` [RFC 03/11] drm/i915: Extract intel_get_cagf Tvrtko Ursulin
2017-09-14 19:51 ` Chris Wilson
2017-09-11 15:25 ` [RFC 04/11] drm/i915/pmu: Expose a PMU interface for perf queries Tvrtko Ursulin
2017-09-12 2:06 ` Rogozhkin, Dmitry V
2017-09-12 14:59 ` Tvrtko Ursulin
2017-09-13 8:57 ` [RFC v6 " Tvrtko Ursulin
2017-09-13 10:34 ` [RFC v7 " Tvrtko Ursulin
2017-09-15 0:00 ` Rogozhkin, Dmitry V
2017-09-15 7:57 ` Tvrtko Ursulin
2017-09-14 19:46 ` [RFC " Chris Wilson
2017-09-11 15:25 ` [RFC 05/11] drm/i915/pmu: Suspend sampling when GPU is idle Tvrtko Ursulin
2017-09-13 10:34 ` [RFC v5 " Tvrtko Ursulin
2017-09-14 19:57 ` Chris Wilson
2017-09-15 9:22 ` Tvrtko Ursulin
2017-09-11 15:25 ` [RFC 06/11] drm/i915: Wrap context schedule notification Tvrtko Ursulin
2017-09-11 15:25 ` [RFC 07/11] drm/i915: Engine busy time tracking Tvrtko Ursulin
2017-09-14 20:16 ` Chris Wilson
2017-09-15 9:45 ` Tvrtko Ursulin
2017-09-11 15:25 ` [RFC 08/11] drm/i915: Export engine busy stats in debugfs Tvrtko Ursulin
2017-09-14 20:17 ` Chris Wilson
2017-09-15 9:46 ` Tvrtko Ursulin
2017-09-11 15:25 ` [RFC 09/11] drm/i915/pmu: Wire up engine busy stats to PMU Tvrtko Ursulin
2017-09-11 15:25 ` [RFC 10/11] drm/i915: Export engine stats API to other users Tvrtko Ursulin
2017-09-12 18:35 ` Ben Widawsky
2017-09-14 20:26 ` Chris Wilson
2017-09-15 9:49 ` Tvrtko Ursulin
2017-09-19 19:50 ` Ben Widawsky
2017-09-19 20:11 ` Rogozhkin, Dmitry V
2017-09-29 10:59 ` Joonas Lahtinen
2017-09-11 15:25 ` [RFC 11/11] drm/i915: Gate engine stats collection with a static key Tvrtko Ursulin
2017-09-13 12:18 ` [RFC v3 " Tvrtko Ursulin
2017-09-14 20:22 ` Chris Wilson
2017-09-15 9:51 ` Tvrtko Ursulin
2017-09-11 15:50 ` ✗ Fi.CI.BAT: warning for i915 PMU and engine busy stats (rev3) Patchwork
2017-09-12 2:03 ` Rogozhkin, Dmitry V [this message]
2017-09-12 14:54 ` [RFC v3 00/11] i915 PMU and engine busy stats Tvrtko Ursulin
2017-09-12 22:01 ` Rogozhkin, Dmitry V
2017-09-13 8:54 ` [RFC v6 04/11] drm/i915/pmu: Expose a PMU interface for perf queries Tvrtko Ursulin
2017-09-13 9:01 ` [RFC v3 00/11] i915 PMU and engine busy stats Tvrtko Ursulin
2017-09-13 9:34 ` ✗ Fi.CI.BAT: warning for i915 PMU and engine busy stats (rev4) Patchwork
2017-09-13 10:46 ` ✗ Fi.CI.BAT: failure for i915 PMU and engine busy stats (rev6) Patchwork
2017-09-13 13:27 ` ✓ Fi.CI.BAT: success for i915 PMU and engine busy stats (rev7) Patchwork
2017-09-13 21:24 ` ✓ Fi.CI.IGT: " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1505152854.7443.6.camel@intel.com \
--to=dmitry.v.rogozhkin@intel.com \
--cc=Intel-gfx@lists.freedesktop.org \
--cc=peterz@infradead.org \
--cc=tursulin@ursulin.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.