public inbox for intel-gfx@lists.freedesktop.org
* [PATCH 0/9] intel-gpu-top improvements
@ 2014-07-18 15:38 Robert Bragg
  2014-07-18 15:38 ` [PATCH 1/9] intel_gpu_top: don't fclose NULL output Robert Bragg
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: Robert Bragg @ 2014-07-18 15:38 UTC (permalink / raw)
  To: intel-gfx

Since it seemed to be a recurring complaint that developers didn't feel
they could trust the numbers from intel-gpu-top, I reviewed the
implementation and came across a few issues that I've tried to address in
this series.

Just to let others know: I'm also experimenting with collecting this
data, and more, from the kernel via the perf interface (building on work
originally done by Chris Wilson last year), so these may just be stop-gap
improvements if those experiments pan out.

- Robert

Robert Bragg (9):
  intel_gpu_top: don't fclose NULL output
  intel_gpu_top: aim for 2000 samples per frame
  intel_gpu_top: ignore out of range ring pointers
  intel_gpu_top: read max/current gt freq via sysfs
  intel_reg: rename RING_LEN RING_CTL
  intel_reg: add RING_CCID current context ID reg
  instdone: Add human readable names for HSW
  intel_gpu_top: account for per context statistics
  intel_gpu_top: hide absolute counter values

 lib/instdone.c        |  28 +--
 lib/intel_reg.h       |   8 +-
 tools/intel_gpu_top.c | 519 ++++++++++++++++++++++++++++++++++----------------
 3 files changed, 368 insertions(+), 187 deletions(-)

-- 
2.0.1

---------------------------------------------------------------------
Intel Corporation (UK) Limited
Registered No. 1134945 (England)
Registered Office: Pipers Way, Swindon SN3 1RJ
VAT No: 860 2173 47

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/9] intel_gpu_top: don't fclose NULL output
  2014-07-18 15:38 [PATCH 0/9] intel-gpu-top improvements Robert Bragg
@ 2014-07-18 15:38 ` Robert Bragg
  2014-07-18 15:38 ` [PATCH 2/9] intel_gpu_top: aim for 2000 samples per frame Robert Bragg
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Robert Bragg @ 2014-07-18 15:38 UTC (permalink / raw)
  To: intel-gfx

---
 tools/intel_gpu_top.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/intel_gpu_top.c b/tools/intel_gpu_top.c
index b5cfda0..fef7f96 100644
--- a/tools/intel_gpu_top.c
+++ b/tools/intel_gpu_top.c
@@ -711,7 +711,8 @@ int main(int argc, char **argv)
 		}
 	}
 
-	fclose(output);
+        if (output)
+                fclose(output);
 
 	intel_register_access_fini();
 	return 0;
-- 
2.0.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/9] intel_gpu_top: aim for 2000 samples per frame
  2014-07-18 15:38 [PATCH 0/9] intel-gpu-top improvements Robert Bragg
  2014-07-18 15:38 ` [PATCH 1/9] intel_gpu_top: don't fclose NULL output Robert Bragg
@ 2014-07-18 15:38 ` Robert Bragg
  2014-07-18 15:38 ` [PATCH 3/9] intel_gpu_top: ignore out of range ring pointers Robert Bragg
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Robert Bragg @ 2014-07-18 15:38 UTC (permalink / raw)
  To: intel-gfx

The previous sample rate of ~167 per frame was rather low in relation to
the frequency of the events being measured, so, for example, the derived
busy status could become quite unstable at times.
---
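For reference, the arithmetic behind the old and new rates can be checked
with a minimal standalone sketch. The helper names below are hypothetical
and not part of intel_gpu_top; only the two SAMPLES_PER_SEC values and the
`1000000 / samples_per_sec` default sleep come from the tool.

```c
#include <stdio.h>

/* Values taken from the patch: the old and new SAMPLES_PER_SEC. */
#define OLD_SAMPLES_PER_SEC 10000
#define NEW_SAMPLES_PER_SEC (60 * 2000)

/* Hypothetical helper: samples that land in one frame at a given fps. */
double samples_per_frame(int samples_per_sec, int fps)
{
	return (double)samples_per_sec / fps;
}

/* Hypothetical helper: nominal gap between samples in microseconds,
 * using the same integer division as the tool's default sleep. */
unsigned long long sample_interval_us(int samples_per_sec)
{
	return 1000000ULL / samples_per_sec;
}

/* Print both rates side by side for a 60fps workload. */
void print_rates(void)
{
	printf("old: %.1f samples/frame, %llu us apart\n",
	       samples_per_frame(OLD_SAMPLES_PER_SEC, 60),
	       sample_interval_us(OLD_SAMPLES_PER_SEC));
	printf("new: %.1f samples/frame, %llu us apart\n",
	       samples_per_frame(NEW_SAMPLES_PER_SEC, 60),
	       sample_interval_us(NEW_SAMPLES_PER_SEC));
}
```

At the new rate the nominal gap between samples is only a few
microseconds, so whether the requested rate is actually reached will
depend on how long the register reads and the sleep take on a given
machine.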
 tools/intel_gpu_top.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/intel_gpu_top.c b/tools/intel_gpu_top.c
index fef7f96..f60e58b 100644
--- a/tools/intel_gpu_top.c
+++ b/tools/intel_gpu_top.c
@@ -50,7 +50,8 @@
 #define  FORCEWAKE	    0xA18C
 #define  FORCEWAKE_ACK	    0x130090
 
-#define SAMPLES_PER_SEC             10000
+/* Aim for ~2000 samples per frame @ 60fps... */
+#define SAMPLES_PER_SEC             (60 * 2000)
 #define SAMPLES_TO_PERCENT_RATIO    (SAMPLES_PER_SEC / 100)
 
 #define MAX_NUM_TOP_BITS            100
-- 
2.0.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/9] intel_gpu_top: ignore out of range ring pointers
  2014-07-18 15:38 [PATCH 0/9] intel-gpu-top improvements Robert Bragg
  2014-07-18 15:38 ` [PATCH 1/9] intel_gpu_top: don't fclose NULL output Robert Bragg
  2014-07-18 15:38 ` [PATCH 2/9] intel_gpu_top: aim for 2000 samples per frame Robert Bragg
@ 2014-07-18 15:38 ` Robert Bragg
  2014-07-18 15:38 ` [PATCH 4/9] intel_gpu_top: read max/current gt freq via sysfs Robert Bragg
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Robert Bragg @ 2014-07-18 15:38 UTC (permalink / raw)
  To: intel-gfx

---
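A minimal sketch of the check, combined with the wrap-around handling
that ring_sample() already does when computing how full the ring is.
ring_bytes_queued() is a hypothetical helper, not a function in
intel_gpu_top. As an assumption of this sketch, an offset equal to the
ring size is also treated as out of range, which is slightly stricter
than the `>` comparison in the patch.

```c
#include <stdint.h>

/* Hypothetical helper, not part of intel_gpu_top: returns the number of
 * bytes queued between head and tail, or -1 if either offset lies
 * outside the ring. Offsets are treated as valid in [0, size). */
int ring_bytes_queued(uint32_t head, uint32_t tail, uint32_t size)
{
	int64_t queued;

	if (size == 0 || head >= size || tail >= size)
		return -1;

	queued = (int64_t)tail - (int64_t)head;
	if (queued < 0)
		queued += size;	/* tail has wrapped around behind head */

	return (int)queued;
}
```

A caller could skip the sample entirely on -1 rather than counting it
as idle, which avoids biasing the busy percentage toward idle.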
 tools/intel_gpu_top.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/tools/intel_gpu_top.c b/tools/intel_gpu_top.c
index f60e58b..7574ef0 100644
--- a/tools/intel_gpu_top.c
+++ b/tools/intel_gpu_top.c
@@ -344,6 +344,15 @@ static void ring_sample(struct ring *ring)
 	ring->head = ring_read(ring, RING_HEAD) & HEAD_ADDR;
 	ring->tail = ring_read(ring, RING_TAIL) & TAIL_ADDR;
 
+	/* We sometimes read spurious, out of range pointers which
+	 * we want to ignore. We treat them as idle for now... */
+	if (ring->head > ring->size || ring->tail > ring->size)
+	{
+	    fprintf(stderr, "Ignoring spurious ring pointer\n");
+	    ring->idle++;
+	    return;
+	}
+
 	if (ring->tail == ring->head)
 		ring->idle++;
 
-- 
2.0.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 4/9] intel_gpu_top: read max/current gt freq via sysfs
  2014-07-18 15:38 [PATCH 0/9] intel-gpu-top improvements Robert Bragg
                   ` (2 preceding siblings ...)
  2014-07-18 15:38 ` [PATCH 3/9] intel_gpu_top: ignore out of range ring pointers Robert Bragg
@ 2014-07-18 15:38 ` Robert Bragg
  2014-07-18 15:38 ` [PATCH 5/9] intel_reg: rename RING_LEN RING_CTL Robert Bragg
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Robert Bragg @ 2014-07-18 15:38 UTC (permalink / raw)
  To: intel-gfx

---
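For illustration, a slightly more defensive variant of read_file_int()
is sketched below. read_file_long() is a hypothetical name and not part
of this patch. It reports failure separately from the parsed value, so
that a missing sysfs file (or one with no digits in it) cannot be
mistaken for a frequency of -1.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical variant of the patch's read_file_int(): returns 0 and
 * stores the parsed value in *value on success, -1 on any failure. */
int read_file_long(const char *path, long *value)
{
	char buf[32];
	char *end;
	ssize_t n;
	long parsed;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	n = read(fd, buf, sizeof(buf) - 1);
	close(fd);
	if (n <= 0)
		return -1;	/* read error or empty file */
	buf[n] = '\0';

	errno = 0;
	parsed = strtol(buf, &end, 0);
	if (errno != 0 || end == buf)
		return -1;	/* out of range, or no digits at all */

	*value = parsed;
	return 0;
}
```

With this shape, print_clock_info() could skip printing a clock when
the file cannot be read instead of displaying a bogus value.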
 tools/intel_gpu_top.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/tools/intel_gpu_top.c b/tools/intel_gpu_top.c
index 7574ef0..3115b5e 100644
--- a/tools/intel_gpu_top.c
+++ b/tools/intel_gpu_top.c
@@ -39,6 +39,9 @@
 #include <sys/time.h>
 #include <sys/wait.h>
 #include <string.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
 #ifdef HAVE_TERMIOS_H
 #include <termios.h>
 #endif
@@ -127,6 +130,24 @@ gettime(void)
 }
 
 static int
+read_file_int(const char *file)
+{
+	char buf[32];
+	int fd, n;
+
+	fd = open(file, 0);
+	if (fd < 0)
+	    return -1;
+	n = read(fd, buf, sizeof (buf) - 1);
+	close(fd);
+	if (n < 0)
+	    return -1;
+
+	buf[n] = '\0';
+	return strtol(buf, 0, 0);
+}
+
+static int
 top_bits_sort(const void *a, const void *b)
 {
 	struct top_bit * const *bit_a = a;
@@ -280,6 +301,16 @@ print_clock_info(struct pci_device *pci_dev)
 		print_clock("render", render_clock);
 		printf("  ");
 		print_clock("display", display_clock);
+	} else {
+	    int max_render_clock;
+	    int cur_render_clock;
+
+	    max_render_clock = read_file_int("/sys/class/drm/card0/gt_max_freq_mhz");
+	    cur_render_clock = read_file_int("/sys/class/drm/card0/gt_cur_freq_mhz");
+
+	    print_clock("max render", max_render_clock);
+	    printf("  ");
+	    print_clock("current render", cur_render_clock);
 	}
 
 
-- 
2.0.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 5/9] intel_reg: rename RING_LEN RING_CTL
  2014-07-18 15:38 [PATCH 0/9] intel-gpu-top improvements Robert Bragg
                   ` (3 preceding siblings ...)
  2014-07-18 15:38 ` [PATCH 4/9] intel_gpu_top: read max/current gt freq via sysfs Robert Bragg
@ 2014-07-18 15:38 ` Robert Bragg
  2014-07-18 15:38 ` [PATCH 6/9] intel_reg: add RING_CCID current context ID reg Robert Bragg
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Robert Bragg @ 2014-07-18 15:38 UTC (permalink / raw)
  To: intel-gfx

This register holds more than just the ring length. This also renames the
LSB mask from RING_VALID_MASK to RING_ENABLED_MASK and drops the
RING_VALID and RING_INVALID values.
---
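To show why the old name was misleading, here is a minimal sketch that
decodes two fields from a RING_CTL value. The masks are the ones defined
in lib/intel_reg.h and the size formula is the one used by ring_init();
the two helper functions are hypothetical and exist only for this
illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Masks as defined in lib/intel_reg.h after this patch. */
#define RING_NR_PAGES     0x001FF000
#define RING_ENABLED_MASK 0x00000001

/* Hypothetical helper: ring buffer size in bytes encoded in a RING_CTL
 * value, using the same formula as ring_init() in intel_gpu_top. */
uint32_t ring_ctl_size_bytes(uint32_t ctl)
{
	return (((ctl & RING_NR_PAGES) >> 12) + 1) * 4096;
}

/* Hypothetical helper: true if the enable bit (the LSB) is set. */
bool ring_ctl_enabled(uint32_t ctl)
{
	return (ctl & RING_ENABLED_MASK) != 0;
}
```

Note that the size field is stored as the page count minus one, so a
register value of zero still decodes to a single 4096 byte page.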
 lib/intel_reg.h       | 7 ++-----
 tools/intel_gpu_top.c | 2 +-
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/lib/intel_reg.h b/lib/intel_reg.h
index 56459ea..8a6e3f1 100644
--- a/lib/intel_reg.h
+++ b/lib/intel_reg.h
@@ -662,17 +662,14 @@ SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 #define START_ADDR          0x03FFFFF8
 #define I830_RING_START_MASK	0xFFFFF000
 
-#define RING_LEN       0x0C
+#define RING_CTL       0x0C
 #define RING_NR_PAGES       0x001FF000 
 #define I830_RING_NR_PAGES	0x001FF000
 #define RING_REPORT_MASK    0x00000006
 #define RING_REPORT_64K     0x00000002
 #define RING_REPORT_128K    0x00000004
 #define RING_NO_REPORT      0x00000000
-#define RING_VALID_MASK     0x00000001
-#define RING_VALID          0x00000001
-#define RING_INVALID        0x00000000
-
+#define RING_ENABLED_MASK   0x00000001
 
 
 /* BitBlt Instructions
diff --git a/tools/intel_gpu_top.c b/tools/intel_gpu_top.c
index 3115b5e..6e494b1 100644
--- a/tools/intel_gpu_top.c
+++ b/tools/intel_gpu_top.c
@@ -357,7 +357,7 @@ static uint32_t ring_read(struct ring *ring, uint32_t reg)
 
 static void ring_init(struct ring *ring)
 {
-	ring->size = (((ring_read(ring, RING_LEN) & RING_NR_PAGES) >> 12) + 1) * 4096;
+	ring->size = (((ring_read(ring, RING_CTL) & RING_NR_PAGES) >> 12) + 1) * 4096;
 }
 
 static void ring_reset(struct ring *ring)
-- 
2.0.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 6/9] intel_reg: add RING_CCID current context ID reg
  2014-07-18 15:38 [PATCH 0/9] intel-gpu-top improvements Robert Bragg
                   ` (4 preceding siblings ...)
  2014-07-18 15:38 ` [PATCH 5/9] intel_reg: rename RING_LEN RING_CTL Robert Bragg
@ 2014-07-18 15:38 ` Robert Bragg
  2014-07-18 15:38 ` [PATCH 7/9] instdone: Add human readable names for HSW Robert Bragg
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Robert Bragg @ 2014-07-18 15:38 UTC (permalink / raw)
  To: intel-gfx

---
 lib/intel_reg.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/intel_reg.h b/lib/intel_reg.h
index 8a6e3f1..51430f4 100644
--- a/lib/intel_reg.h
+++ b/lib/intel_reg.h
@@ -671,6 +671,9 @@ SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 #define RING_NO_REPORT      0x00000000
 #define RING_ENABLED_MASK   0x00000001
 
+#define RING_CCID 0x150
+#define CCID_ADDR_MASK      0xFFFFF000
+
 
 /* BitBlt Instructions
  *
-- 
2.0.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 7/9] instdone: Add human readable names for HSW
  2014-07-18 15:38 [PATCH 0/9] intel-gpu-top improvements Robert Bragg
                   ` (5 preceding siblings ...)
  2014-07-18 15:38 ` [PATCH 6/9] intel_reg: add RING_CCID current context ID reg Robert Bragg
@ 2014-07-18 15:38 ` Robert Bragg
  2014-07-18 15:38 ` [PATCH 8/9] intel_gpu_top: account for per context statistics Robert Bragg
  2014-07-18 15:38 ` [PATCH 9/9] intel_gpu_top: hide absolute counter values Robert Bragg
  8 siblings, 0 replies; 10+ messages in thread
From: Robert Bragg @ 2014-07-18 15:38 UTC (permalink / raw)
  To: intel-gfx

---
 lib/instdone.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/lib/instdone.c b/lib/instdone.c
index 99857e2..57b1635 100644
--- a/lib/instdone.c
+++ b/lib/instdone.c
@@ -381,23 +381,23 @@ init_g4x_instdone1(void)
 static void
 init_gen7_instdone(void)
 {
-	gen6_instdone1_bit(1 << 19, "GAM");
+	gen6_instdone1_bit(1 << 19, "Memory Arbiter (GAM)");
 	gen6_instdone1_bit(1 << 18, "GAFM");
-	gen6_instdone1_bit(1 << 17, "TSG");
-	gen6_instdone1_bit(1 << 16, "VFE");
+	gen6_instdone1_bit(1 << 17, "Thread Spawner (TSG)");
+	gen6_instdone1_bit(1 << 16, "Video Front-End (VFE)");
 	gen6_instdone1_bit(1 << 15, "GAFS");
 	gen6_instdone1_bit(1 << 14, "SVG");
-	gen6_instdone1_bit(1 << 13, "URBM");
-	gen6_instdone1_bit(1 << 12, "TDG");
-	gen6_instdone1_bit(1 << 9, "SF");
-	gen6_instdone1_bit(1 << 8, "CL");
-	gen6_instdone1_bit(1 << 7, "SOL");
-	gen6_instdone1_bit(1 << 6, "GS");
-	gen6_instdone1_bit(1 << 5, "DS");
-	gen6_instdone1_bit(1 << 4, "TE");
-	gen6_instdone1_bit(1 << 3, "HS");
-	gen6_instdone1_bit(1 << 2, "VS");
-	gen6_instdone1_bit(1 << 1, "VF");
+	gen6_instdone1_bit(1 << 13, "Uni. Ret. Buf. Mgr. (URBM)");
+	gen6_instdone1_bit(1 << 12, "Thread Dispatcher (TDG)");
+	gen6_instdone1_bit(1 << 9, "FF Strips & Fans (SF)");
+	gen6_instdone1_bit(1 << 8, "FF Clip Unit (CL)");
+	gen6_instdone1_bit(1 << 7, "FF Stream Output Logic (SOL)");
+	gen6_instdone1_bit(1 << 6, "FF Geometry Shader (GS)");
+	gen6_instdone1_bit(1 << 5, "FF Domain Shader (DS)");
+	gen6_instdone1_bit(1 << 4, "FF Tessellation Engine (TE)");
+	gen6_instdone1_bit(1 << 3, "FF Hull Shader (HS)");
+	gen6_instdone1_bit(1 << 2, "FF Vertex Shader (VS)");
+	gen6_instdone1_bit(1 << 1, "FF Vertex Fetch (VF)");
 }
 
 static void
-- 
2.0.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 8/9] intel_gpu_top: account for per context statistics
  2014-07-18 15:38 [PATCH 0/9] intel-gpu-top improvements Robert Bragg
                   ` (6 preceding siblings ...)
  2014-07-18 15:38 ` [PATCH 7/9] instdone: Add human readable names for HSW Robert Bragg
@ 2014-07-18 15:38 ` Robert Bragg
  2014-07-18 15:38 ` [PATCH 9/9] intel_gpu_top: hide absolute counter values Robert Bragg
  8 siblings, 0 replies; 10+ messages in thread
From: Robert Bragg @ 2014-07-18 15:38 UTC (permalink / raw)
  To: intel-gfx

The pipeline statistics counters represent per-context values, so we
can't assume that a snapshot taken once per second will correspond to the
same context as the last snapshot.

We now read the statistics counters for every sample, and before and
after each sample we read the current context ID to give us a way
of detecting whether the context changed.

There is now a slightly clearer separation between collecting sample
data into a fixed-size array and then handling analytics once per
second before printing/outputting the results.

Signed-off-by: Robert Bragg <robert.bragg@intel.com>
---
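The two checks that make a sample trustworthy can be sketched in
isolation as follows. This is a hedged illustration rather than the code
in the patch: reg_read_fn, read_counter64(), sample_context_stable() and
the simulated fake_read_reg() are all hypothetical names, introduced so
the logic can run without hardware. The tool itself reads registers with
INREG().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register reader so the sketch can run without hardware. */
typedef uint32_t (*reg_read_fn)(uint32_t reg);

/* Read a 64-bit counter exposed as two 32-bit registers (low word at
 * reg, high word at reg + 4). Re-read until the high word is the same
 * before and after the low word, so a carry between the reads cannot
 * produce a torn value. */
uint64_t read_counter64(reg_read_fn read_reg, uint32_t reg)
{
	uint32_t high, low, high_again;

	do {
		high = read_reg(reg + 4);
		low = read_reg(reg);
		high_again = read_reg(reg + 4);
	} while (high != high_again);

	return (uint64_t)high << 32 | low;
}

/* A sample's per-context counters are only trusted if the context ID
 * read before the counters matches the one read after them. */
bool sample_context_stable(uint32_t ccid_start, uint32_t ccid_end)
{
	return ccid_start == ccid_end;
}

/* Simulated hardware for demonstration only: a counter at offset 0
 * that carries into its high word just before the second read. */
static uint64_t fake_counter = 0x00000000FFFFFFFFull;
static int fake_reads;

uint32_t fake_read_reg(uint32_t reg)
{
	if (++fake_reads == 2)
		fake_counter += 1;

	return reg == 4 ? (uint32_t)(fake_counter >> 32)
			: (uint32_t)fake_counter;
}
```

With the simulated counter, a single high/low read pair would combine a
stale high word with the new low word. The retry loop notices that the
high word changed and reads again. The context check works the same way
one level up: a sample whose context ID differs before and after the
counter reads is simply left out of the per-context statistics.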
 tools/intel_gpu_top.c | 490 ++++++++++++++++++++++++++++++++------------------
 1 file changed, 315 insertions(+), 175 deletions(-)

diff --git a/tools/intel_gpu_top.c b/tools/intel_gpu_top.c
index 6e494b1..e5582fd 100644
--- a/tools/intel_gpu_top.c
+++ b/tools/intel_gpu_top.c
@@ -24,6 +24,7 @@
  * Authors:
  *    Eric Anholt <eric@anholt.net>
  *    Eugeni Dodonov <eugeni.dodonov@intel.com>
+ *    Robert Bragg <robert.bragg@intel.com>
  *
  */
 
@@ -67,8 +68,6 @@ struct top_bit {
 } top_bits[MAX_NUM_TOP_BITS];
 struct top_bit *top_bits_sorted[MAX_NUM_TOP_BITS];
 
-static uint32_t instdone, instdone1;
-
 static const char *bars[] = {
 	" ",
 	"▏",
@@ -118,8 +117,57 @@ const char *stats_reg_names[STATS_COUNT] = {
 	"PS depth pass",
 };
 
-uint64_t stats[STATS_COUNT];
-uint64_t last_stats[STATS_COUNT];
+struct pipeline_stat {
+	uint64_t start;
+	uint64_t end;
+	unsigned warped:1;
+};
+
+#define MAX_CONTEXTS 10
+struct context {
+	uint32_t id;
+	struct pipeline_stat stats[STATS_COUNT];
+	int n_samples;
+} contexts[MAX_CONTEXTS];
+struct context *contexts_sorted[MAX_CONTEXTS];
+
+enum rings {
+        RING_RENDER,
+        RING_BSD,
+        RING_BSD6,
+        RING_BLIT,
+        MAX_RINGS
+};
+
+static struct ring {
+	const char *name;
+	uint32_t mmio;
+	int size;
+
+	/* used for analytics... */
+	uint64_t full;
+	uint64_t idle;
+	int n_samples;
+} rings[MAX_RINGS] = {
+	{ .name = "render",    .mmio = 0x2030 },
+	{ .name = "bitstream", .mmio = 0x4030 },
+	{ .name = "bitstream", .mmio = 0x12030 },
+	{ .name = "blitter",   .mmio = 0x22030 }
+};
+
+struct ring_sample {
+        uint32_t ccid_start;
+        uint32_t ccid_end;
+        uint32_t head, tail;
+};
+
+struct sample {
+	uint64_t timestamp;
+	struct ring_sample ring_samples[MAX_RINGS];
+	uint32_t instdone;
+	uint32_t instdone1;
+	uint64_t stats[STATS_COUNT];
+};
 
 static unsigned long
 gettime(void)
@@ -163,18 +211,20 @@ top_bits_sort(const void *a, const void *b)
 		return -1;
 }
 
-static void
-update_idle_bit(struct top_bit *top_bit)
+static int
+contexts_sort(const void *a, const void *b)
 {
-	uint32_t reg_val;
+	struct context * const * context_a = a;
+	struct context * const * context_b = b;
+	int a_samples = (*context_a)->n_samples;
+	int b_samples = (*context_b)->n_samples;
 
-	if (top_bit->bit->reg == INSTDONE_1)
-		reg_val = instdone1;
+	if (a_samples < b_samples)
+		return 1;
+	else if (a_samples == b_samples)
+		return 0;
 	else
-		reg_val = instdone;
-
-	if ((reg_val & top_bit->bit->bit) == 0)
-		top_bit->count++;
+		return -1;
 }
 
 static void
@@ -342,14 +392,6 @@ print_percentage_bar(float percent, int cur_line_len)
 	printf("%*s", PERCENTAGE_BAR_END - cur_line_len, "");
 }
 
-struct ring {
-	const char *name;
-	uint32_t mmio;
-	int head, tail, size;
-	uint64_t full;
-	int idle;
-};
-
 static uint32_t ring_read(struct ring *ring, uint32_t reg)
 {
 	return INREG(ring->mmio + reg);
@@ -360,72 +402,36 @@ static void ring_init(struct ring *ring)
 	ring->size = (((ring_read(ring, RING_CTL) & RING_NR_PAGES) >> 12) + 1) * 4096;
 }
 
-static void ring_reset(struct ring *ring)
-{
-	ring->idle = ring->full = 0;
-}
-
-static void ring_sample(struct ring *ring)
-{
-	int full;
-
-	if (!ring->size)
-		return;
-
-	ring->head = ring_read(ring, RING_HEAD) & HEAD_ADDR;
-	ring->tail = ring_read(ring, RING_TAIL) & TAIL_ADDR;
-
-	/* We sometimes read spurious, out of range pointers which
-	 * we want to ignore. We treat them as idle for now... */
-	if (ring->head > ring->size || ring->tail > ring->size)
-	{
-	    fprintf(stderr, "Ignoring spurious ring pointer\n");
-	    ring->idle++;
-	    return;
-	}
-
-	if (ring->tail == ring->head)
-		ring->idle++;
-
-	full = ring->tail - ring->head;
-	if (full < 0)
-		full += ring->size;
-	ring->full += full;
-}
-
 static void ring_print_header(FILE *out, struct ring *ring)
 {
-    fprintf(out, "%.6s%%\tops\t",
-            ring->name
-          );
+	fprintf(out, " %9s%% %6s", ring->name, "ops");
 }
 
-static void ring_print(struct ring *ring, unsigned long samples_per_sec)
+static void ring_print(struct ring *ring)
 {
 	int percent_busy, len;
 
 	if (!ring->size)
 		return;
 
-	percent_busy = 100 - 100 * ring->idle / samples_per_sec;
+	percent_busy = 100 - 100 * ring->idle / ring->n_samples;
 
 	len = printf("%25s busy: %3d%%: ", ring->name, percent_busy);
 	print_percentage_bar (percent_busy, len);
 	printf("%24s space: %d/%d\n",
-		   ring->name,
-		   (int)(ring->full / samples_per_sec),
-		   ring->size);
+	       ring->name,
+	       (int)(ring->full / ring->n_samples),
+	       ring->size);
 }
 
-static void ring_log(struct ring *ring, unsigned long samples_per_sec,
-		FILE *output)
+static void ring_log(struct ring *ring, FILE *output)
 {
 	if (ring->size)
-		fprintf(output, "%3d\t%d\t",
-			(int)(100 - 100 * ring->idle / samples_per_sec),
-			(int)(ring->full / samples_per_sec));
+		fprintf(output, " %10d %6d",
+			(int)(100 - 100 * ring->idle / ring->n_samples),
+			(int)(ring->full / ring->n_samples));
 	else
-		fprintf(output, "-1\t-1\t");
+		fprintf(output, " %10d %6d", -1, -1);
 }
 
 static void
@@ -448,23 +454,109 @@ usage(const char *appname)
 	return;
 }
 
+static int analyse_samples(uint32_t devid, struct sample *samples, int n_samples)
+{
+        int n_contexts = 0;
+        int i;
+
+        for (i = 0; i < MAX_CONTEXTS; i++)
+                contexts[i].n_samples = 0;
+
+        for (i = 0; i < n_samples; i++) {
+                struct sample *sample = samples + i;
+                uint32_t ccid = sample->ring_samples[RING_RENDER].ccid_start;
+                struct context *context = NULL;
+                int bad_render_ring_sample = 0;
+
+                for (int j = 0; j < num_instdone_bits; j++) {
+                        struct top_bit *top_bit = top_bits + j;
+                        uint32_t reg_val;
+
+                        if (top_bit->bit->reg == INSTDONE_1)
+                                reg_val = sample->instdone1;
+                        else
+                                reg_val = sample->instdone;
+
+                        if ((reg_val & top_bit->bit->bit) == 0)
+                                top_bit->count++;
+                }
+
+                for (int j = 0; j < MAX_RINGS; j++) {
+                        struct ring_sample *rs = sample->ring_samples + j;
+
+                        if (!rings[j].size)
+                                continue;
+
+                        /* We sometimes read spurious, out of range
+                         * pointers which we want to ignore... */
+                        if (rs->head < rings[j].size &&
+                            rs->tail < rings[j].size)
+                        {
+                                int32_t full = rs->tail - rs->head;
+
+                                full = rs->tail - rs->head;
+                                if (full < 0)
+                                        full += rings[j].size;
+                                rings[j].full += full;
+
+                                if (!full)
+                                        rings[j].idle++;
+
+                                rings[j].n_samples++;
+                        } else if (j == RING_RENDER)
+                                bad_render_ring_sample = 1;
+                }
+
+                /* Some of the stats are per render context so we
+                 * have bad data if the context changed while
+                 * sampling... */
+                if (bad_render_ring_sample ||
+                    ccid != sample->ring_samples[RING_RENDER].ccid_end)
+                        continue;
+
+                for (int j = 0; j < n_contexts; j++) {
+                        context = contexts + j;
+                        if (context->id == ccid)
+                                break;
+                }
+                if (n_contexts && context->id == ccid) {
+                        context->n_samples++;
+
+                        if (!HAS_STATS_REGS(devid))
+                                continue;
+
+                        for (int j = 0; j < STATS_COUNT; j++) {
+                                if (sample->stats[j] >= context->stats[j].end)
+                                        context->stats[j].end = sample->stats[j];
+                                else
+                                        context->stats[j].warped = 1;
+                        }
+                } else {
+                        if (n_contexts == MAX_CONTEXTS)
+                                continue;
+
+                        context = &contexts[n_contexts++];
+                        context->id = ccid;
+                        context->n_samples = 1;
+
+                        if (!HAS_STATS_REGS(devid))
+                                continue;
+
+                        for (int j = 0; j < STATS_COUNT; j++) {
+                                context->stats[j].start = sample->stats[j];
+                                context->stats[j].end = sample->stats[j];
+                                context->stats[j].warped = 0;
+                        }
+                }
+        }
+
+        return n_contexts;
+}
+
 int main(int argc, char **argv)
 {
 	uint32_t devid;
 	struct pci_device *pci_dev;
-	struct ring render_ring = {
-		.name = "render",
-		.mmio = 0x2030,
-	}, bsd_ring = {
-		.name = "bitstream",
-		.mmio = 0x4030,
-	}, bsd6_ring = {
-		.name = "bitstream",
-		.mmio = 0x12030,
-	}, blt_ring = {
-		.name = "blitter",
-		.mmio = 0x22030,
-	};
 	int i, ch;
 	int samples_per_sec = SAMPLES_PER_SEC;
 	FILE *output = NULL;
@@ -474,6 +566,7 @@ int main(int argc, char **argv)
 	int child_stat;
 	char *cmd=NULL;
 	int interactive=1;
+	struct sample *samples;
 
 	/* Parse options? */
 	while ((ch = getopt(argc, argv, "s:o:e:h")) != -1) {
@@ -512,6 +605,8 @@ int main(int argc, char **argv)
 		}
 	}
 
+	samples = malloc(sizeof(struct sample) * samples_per_sec);
+
 	pci_dev = intel_get_pci_device();
 	devid = pci_dev->device_id;
 	intel_mmio_use_pci_bar(pci_dev);
@@ -551,35 +646,21 @@ int main(int argc, char **argv)
 		top_bits_sorted[i] = &top_bits[i];
 	}
 
+	for (i = 0; i < MAX_CONTEXTS; i++)
+		contexts_sorted[i] = &contexts[i];
+
 	/* Grab access to the registers */
 	intel_register_access_init(pci_dev, 0);
 
-	ring_init(&render_ring);
+	ring_init(&rings[RING_RENDER]);
 	if (IS_GEN4(devid) || IS_GEN5(devid))
-		ring_init(&bsd_ring);
+		ring_init(&rings[RING_BSD]);
 	if (IS_GEN6(devid) || IS_GEN7(devid)) {
-		ring_init(&bsd6_ring);
-		ring_init(&blt_ring);
-	}
-
-	/* Initialize GPU stats */
-	if (HAS_STATS_REGS(devid)) {
-		for (i = 0; i < STATS_COUNT; i++) {
-			uint32_t stats_high, stats_low, stats_high_2;
-
-			do {
-				stats_high = INREG(stats_regs[i] + 4);
-				stats_low = INREG(stats_regs[i]);
-				stats_high_2 = INREG(stats_regs[i] + 4);
-			} while (stats_high != stats_high_2);
-
-			last_stats[i] = (uint64_t)stats_high << 32 |
-				stats_low;
-		}
+		ring_init(&rings[RING_BSD6]);
+		ring_init(&rings[RING_BLIT]);
 	}
 
 	for (;;) {
-		int j;
 		unsigned long long t1, ti, tf, t2;
 		unsigned long long def_sleep = 1000000 / samples_per_sec;
 		unsigned long long last_samples_per_sec = samples_per_sec;
@@ -588,32 +669,63 @@ int main(int argc, char **argv)
 		char clear_screen[] = {0x1b, '[', 'H',
 				       0x1b, '[', 'J',
 				       0x0};
-		int percent;
 		int len;
+		int n_contexts;
 
 		t1 = gettime();
 
-		ring_reset(&render_ring);
-		ring_reset(&bsd_ring);
-		ring_reset(&bsd6_ring);
-		ring_reset(&blt_ring);
-
 		for (i = 0; i < samples_per_sec; i++) {
+			struct sample *sample = samples + i;
 			long long interval;
 			ti = gettime();
+
+			sample->timestamp = t1;
+
 			if (IS_965(devid)) {
-				instdone = INREG(INSTDONE_I965);
-				instdone1 = INREG(INSTDONE_1);
+				sample->instdone = INREG(INSTDONE_I965);
+				sample->instdone1 = INREG(INSTDONE_1);
 			} else
-				instdone = INREG(INSTDONE);
+				sample->instdone = INREG(INSTDONE);
+
+			for (int j = 0; j < MAX_RINGS; j++) {
+				struct ring_sample *rs;
+
+                                if (!rings[j].size)
+                                        continue;
 
-			for (j = 0; j < num_instdone_bits; j++)
-				update_idle_bit(&top_bits[j]);
+                                rs = sample->ring_samples + j;
+				rs->ccid_start =
+				    ring_read(rings + j, RING_CCID) & CCID_ADDR_MASK;
+				rs->head =
+				    ring_read(rings + j, RING_HEAD) & HEAD_ADDR;
+				rs->tail =
+				    ring_read(rings + j, RING_TAIL) & TAIL_ADDR;
+			}
+
+			if (HAS_STATS_REGS(devid)) {
+				for (int j = 0; j < STATS_COUNT; j++) {
+					uint32_t stats_high, stats_low, stats_high_2;
+
+					do {
+					    stats_high = INREG(stats_regs[j] + 4);
+					    stats_low = INREG(stats_regs[j]);
+					    stats_high_2 = INREG(stats_regs[j] + 4);
+					} while (stats_high != stats_high_2);
 
-			ring_sample(&render_ring);
-			ring_sample(&bsd_ring);
-			ring_sample(&bsd6_ring);
-			ring_sample(&blt_ring);
+					sample->stats[j] = (uint64_t)stats_high << 32 |
+					    stats_low;
+				}
+			}
+
+			for (int j = 0; j < MAX_RINGS; j++) {
+				struct ring_sample *rs = sample->ring_samples + j;
+
+                                if (!rings[j].size)
+                                        continue;
+
+				rs->ccid_end =
+				    ring_read(rings + j, RING_CCID) & CCID_ADDR_MASK;
+			}
 
 			tf = gettime();
 			if (tf - t1 >= 1000000) {
@@ -626,47 +738,58 @@ int main(int argc, char **argv)
 				usleep(interval);
 		}
 
-		if (HAS_STATS_REGS(devid)) {
-			for (i = 0; i < STATS_COUNT; i++) {
-				uint32_t stats_high, stats_low, stats_high_2;
+		for (i = 0; i < MAX_RINGS; i++) {
+			struct ring *ring = rings + i;
 
-				do {
-					stats_high = INREG(stats_regs[i] + 4);
-					stats_low = INREG(stats_regs[i]);
-					stats_high_2 = INREG(stats_regs[i] + 4);
-				} while (stats_high != stats_high_2);
+                        if (!ring->size)
+                                continue;
 
-				stats[i] = (uint64_t)stats_high << 32 |
-					stats_low;
-			}
+			ring->full = 0;
+			ring->idle = 0;
+			ring->n_samples = 0;
 		}
+		for (i = 0; i < num_instdone_bits; i++)
+			top_bits[i].count = 0;
+
+		n_contexts = analyse_samples(devid, samples, last_samples_per_sec);
+		if (!n_contexts) {
+			fprintf(stderr, "Not able to distinguish even one "
+				"context in samples!\n");
+			exit(1);
+		}
 
 		qsort(top_bits_sorted, num_instdone_bits,
 		      sizeof(struct top_bit *), top_bits_sort);
+		qsort(contexts_sorted, MAX_CONTEXTS,
+		      sizeof(struct context *), contexts_sort);
 
 		/* Limit the number of lines printed to the terminal height so the
 		 * most important info (at the top) will stay on screen. */
 		max_lines = -1;
 		if (ioctl(0, TIOCGWINSZ, &ws) != -1)
 			max_lines = ws.ws_row - 6; /* exclude header lines */
-		if (max_lines >= num_instdone_bits)
-			max_lines = num_instdone_bits;
 
 		t2 = gettime();
 		elapsed_time += (t2 - t1) / 1000000.0;
 
 		if (interactive) {
+			int ctx_i = 0;
+			int stat_i = -1; /* account for context header */
+			int percent;
+
 			printf("%s", clear_screen);
 			print_clock_info(pci_dev);
 
-			ring_print(&render_ring, last_samples_per_sec);
-			ring_print(&bsd_ring, last_samples_per_sec);
-			ring_print(&bsd6_ring, last_samples_per_sec);
-			ring_print(&blt_ring, last_samples_per_sec);
+			for (i = 0; i < MAX_RINGS; i++) {
+				if (!rings[i].size)
+					continue;
+				ring_print(rings + i);
+			}
 
 			printf("\n%30s  %s\n", "task", "percent busy");
 			for (i = 0; i < max_lines; i++) {
-				if (top_bits_sorted[i]->count > 0) {
+				if (i < num_instdone_bits &&
+				    top_bits_sorted[i]->count > 0) {
 					percent = (top_bits_sorted[i]->count * 100) /
 						last_samples_per_sec;
 					len = printf("%30s: %3d%%: ",
@@ -677,14 +800,32 @@ int main(int argc, char **argv)
 					printf("%*s", PERCENTAGE_BAR_END, "");
 				}
 
-				if (i < STATS_COUNT && HAS_STATS_REGS(devid)) {
-					printf("%13s: %llu (%lld/sec)",
-						   stats_reg_names[i],
-						   (long long)stats[i],
-						   (long long)(stats[i] - last_stats[i]));
-					last_stats[i] = stats[i];
+				if (ctx_i < n_contexts && HAS_STATS_REGS(devid)) {
+					struct context *context = contexts_sorted[ctx_i];
+
+					if (stat_i == -1) {
+						percent = (context->n_samples * 100) /
+							last_samples_per_sec;
+						printf("context = %" PRIx32 " : %d%% active",
+						       context->id, percent);
+					} else if (!context->stats[stat_i].warped) {
+						printf("   %-15s: %" PRIu64 " (%" PRIu64 "/sec)",
+						       stats_reg_names[stat_i],
+						       context->stats[stat_i].end,
+						       (context->stats[stat_i].end -
+							context->stats[stat_i].start));
+					} else {
+						printf("   %-15s: %" PRIu64 " (Time Warp Error)",
+						       stats_reg_names[stat_i],
+						       context->stats[stat_i].end);
+					}
+					if (++stat_i == STATS_COUNT) {
+					    ctx_i++;
+					    stat_i = -1;
+					}
 				} else {
-					if (!top_bits_sorted[i]->count)
+					if (i >= num_instdone_bits ||
+					    !top_bits_sorted[i]->count)
 						break;
 				}
 				printf("\n");
@@ -693,51 +834,50 @@ int main(int argc, char **argv)
 		if (output) {
 			/* Print headers for columns at first run */
 			if (print_headers) {
-				fprintf(output, "# time\t");
-				ring_print_header(output, &render_ring);
-				ring_print_header(output, &bsd_ring);
-				ring_print_header(output, &bsd6_ring);
-				ring_print_header(output, &blt_ring);
-				for (i = 0; i < MAX_NUM_TOP_BITS; i++) {
-					if (i < STATS_COUNT && HAS_STATS_REGS(devid)) {
-						fprintf(output, "%.6s\t",
-							   stats_reg_names[i]
-							   );
-					}
-					if (!top_bits[i].count)
-						continue;
+				fprintf(output, "#%15s %10s", "time", "context");
+				for (i = 0; i < MAX_RINGS; i++) {
+					if (!rings[i].size)
+						continue;
+					ring_print_header(output, &rings[i]);
+				}
+				for (i = 0; i < STATS_COUNT; i++) {
+					fprintf(output, " %15s", stats_reg_names[i]);
 				}
 				fprintf(output, "\n");
 				print_headers = 0;
 			}
 
 			/* Print statistics */
-			fprintf(output, "%.2f\t", elapsed_time);
-			ring_log(&render_ring, last_samples_per_sec, output);
-			ring_log(&bsd_ring, last_samples_per_sec, output);
-			ring_log(&bsd6_ring, last_samples_per_sec, output);
-			ring_log(&blt_ring, last_samples_per_sec, output);
-
-			for (i = 0; i < MAX_NUM_TOP_BITS; i++) {
-				if (i < STATS_COUNT && HAS_STATS_REGS(devid)) {
-					fprintf(output, "%lu\t",
-						   stats[i] - last_stats[i]);
-					last_stats[i] = stats[i];
+			fprintf(output, " %15.2f %10s", elapsed_time, "");
+
+			for (i = 0; i < MAX_RINGS; i++) {
+				if (!rings[i].size)
+					continue;
+				ring_log(&rings[i], output);
+			}
+			fprintf(output, "\n");
+			for (i = 0; i < n_contexts; i++) {
+				struct context *context = contexts_sorted[i];
+
+				fprintf(output, " %15s %10" PRIx32, "", context->id);
+
+				for (int j = 0; j < MAX_RINGS; j++) {
+					if (!rings[j].size)
+						continue;
+					fprintf(output, " %10s %6s", "", "");
+				}
+
+				for (int j = 0; j < STATS_COUNT; j++) {
+					fprintf(output, " %15" PRIu64,
+						(context->stats[j].end -
+						 context->stats[j].start));
 				}
-					if (!top_bits[i].count)
-						continue;
+				fprintf(output, "\n");
 			}
 			fprintf(output, "\n");
 			fflush(output);
 		}
 
-		for (i = 0; i < num_instdone_bits; i++) {
-			top_bits_sorted[i]->count = 0;
-
-			if (i < STATS_COUNT)
-				last_stats[i] = stats[i];
-		}
-
 		/* Check if child has gone */
 		if (child_pid > 0) {
 			int res;
-- 
2.0.1
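
The statistics loop in the patch above reads each 64-bit counter as two
32-bit halves and retries until the high half is stable across the low
read. A minimal, standalone sketch of that pattern follows. The names
read32_fn, read_counter64, fake_counter and fake_read32 are hypothetical
and exist only for illustration; in the tool the reads go through
INREG(). The fake register advances on every access purely to force a
carry between the two halves.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit register read such as INREG(). */
typedef uint32_t (*read32_fn)(uint32_t offset);

/*
 * Read a 64-bit counter exposed as two 32-bit registers (low half at
 * reg, high half at reg + 4). If the high half changed while the low
 * half was being read, a carry happened in between, so read again.
 */
uint64_t read_counter64(read32_fn read32, uint32_t reg)
{
	uint32_t high, low, high_again;

	do {
		high = read32(reg + 4);
		low = read32(reg);
		high_again = read32(reg + 4);
	} while (high != high_again);

	return (uint64_t)high << 32 | low;
}

/* Simulated counter, starting just below a 32-bit carry. */
uint64_t fake_counter = 0x00000001fffffffeULL;

/* Simulated register: offset 4 is the high half, anything else the low. */
uint32_t fake_read32(uint32_t offset)
{
	uint32_t value = (offset == 4) ? (uint32_t)(fake_counter >> 32)
				       : (uint32_t)fake_counter;

	fake_counter++;	/* the counter keeps running between reads */
	return value;
}
```

Calling read_counter64(fake_read32, 0) makes the first pass see the high
half change between its two reads, so the loop retries and returns a
value from the second pass whose halves belong together.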

* [PATCH 9/9] intel_gpu_top: hide absolute counter values
  2014-07-18 15:38 [PATCH 0/9] intel-gpu-top improvements Robert Bragg
                   ` (7 preceding siblings ...)
  2014-07-18 15:38 ` [PATCH 8/9] intel_gpu_top: account for per context statistics Robert Bragg
@ 2014-07-18 15:38 ` Robert Bragg
  8 siblings, 0 replies; 10+ messages in thread
From: Robert Bragg @ 2014-07-18 15:38 UTC (permalink / raw)
  To: intel-gfx

The absolute values of the pipeline statistic counters are more
distracting than they are useful.
---
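
For reference, the figure that stays on screen is the difference between
a counter's value at the end and at the start of the roughly one-second
sampling window. A minimal sketch of that calculation is below. The
struct stat_window and the helper stat_delta are hypothetical names, and
treating end < start as a warp is an assumption made for illustration;
the check that actually sets the warped flag is not part of this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the per-context start/end/warped fields. */
struct stat_window {
	uint64_t start;	/* counter value at the start of the window */
	uint64_t end;	/* counter value at the end of the window */
	bool warped;	/* set when the samples were inconsistent */
};

/*
 * Store the counter delta for one window in *delta and return true.
 * Return false, leaving *delta untouched, if the window is flagged as
 * warped or (assumption) the counter appears to have gone backwards.
 */
bool stat_delta(const struct stat_window *w, uint64_t *delta)
{
	if (w->warped || w->end < w->start)
		return false;

	*delta = w->end - w->start;
	return true;
}
```

A caller would print the delta when stat_delta() returns true and fall
back to an error string otherwise, matching the two printf branches in
the hunk below.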
 tools/intel_gpu_top.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tools/intel_gpu_top.c b/tools/intel_gpu_top.c
index e5582fd..0ef26f8 100644
--- a/tools/intel_gpu_top.c
+++ b/tools/intel_gpu_top.c
@@ -809,9 +809,8 @@ int main(int argc, char **argv)
 						printf("context = %" PRIx32 " : %d%% active",
 						       context->id, percent);
 					} else if (!context->stats[stat_i].warped) {
-						printf("   %-15s: %" PRIu64 " (%" PRIu64 "/sec)",
+						printf("   %-15s: (%" PRIu64 "/sec)",
 						       stats_reg_names[stat_i],
-						       context->stats[stat_i].end,
 						       (context->stats[stat_i].end -
 							context->stats[stat_i].start));
 					} else {
-- 
2.0.1

Thread overview: 10+ messages
2014-07-18 15:38 [PATCH 0/9] intel-gpu-top improvements Robert Bragg
2014-07-18 15:38 ` [PATCH 1/9] intel_gpu_top: don't fclose NULL output Robert Bragg
2014-07-18 15:38 ` [PATCH 2/9] intel_gpu_top: aim for 2000 samples per frame Robert Bragg
2014-07-18 15:38 ` [PATCH 3/9] intel_gpu_top: ignore out of range ring pointers Robert Bragg
2014-07-18 15:38 ` [PATCH 4/9] intel_gpu_top: read max/current gt freq via sysfs Robert Bragg
2014-07-18 15:38 ` [PATCH 5/9] intel_reg: rename RING_LEN RING_CTL Robert Bragg
2014-07-18 15:38 ` [PATCH 6/9] intel_reg: add RING_CCID current context ID reg Robert Bragg
2014-07-18 15:38 ` [PATCH 7/9] instdone: Add human readable names for HSW Robert Bragg
2014-07-18 15:38 ` [PATCH 8/9] intel_gpu_top: account for per context statistics Robert Bragg
2014-07-18 15:38 ` [PATCH 9/9] intel_gpu_top: hide absolute counter values Robert Bragg