linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] ARM64: Accessing perf counters from userspace
@ 2014-11-03 15:04 Yogesh Tillu
  2014-11-03 15:04 ` [RFC PATCH 1/5] Application: reads perf cycle counter using perf_event_open syscall Yogesh Tillu
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Yogesh Tillu @ 2014-11-03 15:04 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: tillu.yogesh, linux-perf-users, linaro-networking, jean.pihet,
	arnd, Andrew.Pinski, mike.holmes, ola.liljedahl, magnus.karlsson,
	Prasun.Kapoor, Yogesh Tillu

We have tried to implement some changes to allow perf counters to be accessed
from user space. Benchmarking so far has show that these are 100s of times 
faster than using syscall(perf_event_open). This would be useful for many use
cases like networking(critical to fast path code), benchmark executionpath with
low budget of cpu cycles etc.

Benchmark figures on ArmV8, "reading perf cycle counter" with below approaches
1) Reading perf cycle counter through perf_event_open syscall 
Result[cpu cycles]: 2000 (For Armv7[Arndale] 5407)
2) Direct access of perf counters from userspace through asm
Result[cpu cycles]: 2 (For Armv7[Arndale] 16)
3) Reading perf cycle counter through vDSO path
Result[cpu cycles]: ~20


Could you please let me know your comments/review. Below are the details about
setup and patchset. 

** Setup details **
Architecture: ArmV8
Board       : Juno Board
Linux kernel: 3.16.0+
Kernel Repo : git://git.linaro.org/kernel/linux-linaro-tracking.git
(Branch:linux-linaro)
Rootfs      : Linaro Ubuntu rootfs
Toolchain   : gcc version 4.9.1 20140529 (prerelease)
(crosstool-NG linaro-1.13.1-4.9-2014.06-02 - Linaro GCC 4.9-2014.06)

1) Reading perf cycle counter through perf_event_open syscall 
*Application to read counter using perf_event_open syscall.
[PATCH] Application reads perf cycle counter using perf_event_open
syscall, and prints Benchmark results.

Signed-off-by: Yogesh Tillu <yogesh.tillu@linaro.org>
---
 app_readcounter.c |   83 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 83 insertions(+)
 create mode 100644 app_readcounter.c


2) Direct access of perf counters from userspace using asm
This setup contains kernel module + header file with implemented asm to access
perf counters + Application uses api provided in header file to access counter.

* Kernel Module: To enable access of counters from userspace
Yogesh Tillu (1):
  Kernel module to Enable userspace access to PMU counters for
    ArmV8

 ARMv8_Module/Makefile         |    8 ++++
 ARMv8_Module/README           |    1 +
 ARMv8_Module/enable_arm_pmu.c |   96 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 105 insertions(+)
 create mode 100644 ARMv8_Module/Makefile
 create mode 100644 ARMv8_Module/README
 create mode 100644 ARMv8_Module/enable_arm_pmu.c

* Application:
[PATCH] Added test for Direct access of perf counter from userspace
 using asm.

Signed-off-by: Yogesh Tillu <yogesh.tillu@linaro.org>
---
 README.directaccess |    8 ++++
 direct_access.c     |   65 ++++++++++++++++++++++++++++
 direct_access.h     |  117 +++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 190 insertions(+)
 create mode 100644 README.directaccess
 create mode 100644 direct_access.c
 create mode 100644 direct_access.h

3) Reading perf cycle counter through vDSO path
* Kernel Module: To enable access of counters from userspace ( Same as setup (2) )
* Kernel vDSO implementation: vDSO implementation for reading of perf cycle counter
[PATCH] provide open/read function through vDSO for PMU counters
Yogesh Tillu (1):
  To read PMU cycle counter through vDSO Path

 arch/arm64/kernel/vdso/Makefile     |    6 +++---
 arch/arm64/kernel/vdso/vdso.lds.S   |    5 +++++
 arch/arm64/kernel/vdso/vdso_perfc.c |   20 ++++++++++++++++++++
 3 files changed, 28 insertions(+), 3 deletions(-)
 create mode 100644 arch/arm64/kernel/vdso/vdso_perfc.c

* application  : To read perf counter through api(implemented through vDSO)
[PATCH] Test Application: access PMU counter through vDSO
Yogesh Tillu (1):
  Test application to read PMU counter through vdso     

 vdso_userspace_perf.c |   58 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)
 create mode 100644 vdso_userspace_perf.c

NOTE: This codebase mainly for POC of "Access perf counters from userspace",
not much concentration towards api standard forms.

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [RFC PATCH 1/5] Application: reads perf cycle counter using perf_event_open syscall
  2014-11-03 15:04 [RFC] ARM64: Accessing perf counters from userspace Yogesh Tillu
@ 2014-11-03 15:04 ` Yogesh Tillu
  2014-11-03 15:04 ` [RFC PATCH 2/5] Kernel module: to Enable userspace access to PMU counters for ArmV8 Yogesh Tillu
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Yogesh Tillu @ 2014-11-03 15:04 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: tillu.yogesh, linux-perf-users, linaro-networking, jean.pihet,
	arnd, Andrew.Pinski, mike.holmes, ola.liljedahl, magnus.karlsson,
	Prasun.Kapoor, Yogesh Tillu

This patchset is for application reading perf cycle counter using syscall
perf_event_open. 

Signed-off-by: Yogesh Tillu <yogesh.tillu@linaro.org>
---
 app_readcounter.c |   83 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 83 insertions(+)
 create mode 100644 app_readcounter.c

diff --git a/app_readcounter.c b/app_readcounter.c
new file mode 100644
index 0000000..5363dd4
--- /dev/null
+++ b/app_readcounter.c
@@ -0,0 +1,83 @@
+/*
+Application to Read perf cycle counter using perf_event_open syscall
+
+To Run: pass randon number to create busy loop
+e.g.: $./app_readcounter 64
+
+*/
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/syscall.h>
+#include <linux/perf_event.h>
+
+static int fddev = -1;
+__attribute__((constructor)) static void
+init(void)
+{
+	static struct perf_event_attr attr;
+	attr.type = PERF_TYPE_HARDWARE;
+	attr.config = PERF_COUNT_HW_CPU_CYCLES;
+	fddev = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
+}
+
+__attribute__((destructor)) static void
+fini(void)
+{
+	close(fddev);
+}
+
+static inline long long
+cpucycles(void)
+{
+	long long result = 0;
+	if (read(fddev, &result, sizeof(result)) < sizeof(result)) return 0;
+	return result;
+}
+
+/* Simple loop body to keep things interested. Make sure it gets inlined. */
+static inline int
+loop(int* __restrict__ a, int* __restrict__ b, int n)
+{
+	unsigned sum = 0;
+	int i=0;
+	for ( i = 0; i < n; ++i)
+		if(a[i] > b[i])
+			sum += a[i] + 5;
+	return sum;
+}
+
+int
+main(int ac, char **av)
+{
+        long long time_start = 0;
+        long long time_end   = 0;
+
+        int *a  = NULL;
+        int *b  = NULL;
+        int len = 0;
+	int i,sum = 0;
+        if (ac != 2) return -1;
+        len = atoi(av[1]);
+	printf("%s: len = %d\n", av[0], len);
+
+        a = malloc(len*sizeof(*a));
+        b = malloc(len*sizeof(*b));
+
+        for (i = 0; i < len; ++i) {
+                a[i] = i+128;
+                b[i] = i+64;
+        }
+
+        printf("%s: beginning loop\n", av[0]);
+        time_start = cpucycles();
+        sum = loop(a, b, len);
+        time_end   = cpucycles();
+        printf("%s: done. sum = %d; time delta = %llu\n", av[0], sum, time_end - time_start);
+
+        free(a); free(b);
+        return 0;
+}
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC PATCH 2/5] Kernel module: to Enable userspace access to PMU counters for ArmV8
  2014-11-03 15:04 [RFC] ARM64: Accessing perf counters from userspace Yogesh Tillu
  2014-11-03 15:04 ` [RFC PATCH 1/5] Application: reads perf cycle counter using perf_event_open syscall Yogesh Tillu
@ 2014-11-03 15:04 ` Yogesh Tillu
  2014-11-03 15:22   ` Måns Rullgård
  2014-11-03 16:16   ` Mark Rutland
  2014-11-03 15:04 ` [RFC PATCH 3/5] Application: Added test for Direct access of perf counter from userspace using asm Yogesh Tillu
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 12+ messages in thread
From: Yogesh Tillu @ 2014-11-03 15:04 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: tillu.yogesh, linux-perf-users, linaro-networking, jean.pihet,
	arnd, Andrew.Pinski, mike.holmes, ola.liljedahl, magnus.karlsson,
	Prasun.Kapoor, Yogesh Tillu

This Patchset is for Kernel Module to Enable userspace access to PMU counters(ArmV8)

Signed-off-by: Yogesh Tillu <yogesh.tillu@linaro.org>
---
 ARMv8_Module/Makefile         |    8 ++++
 ARMv8_Module/README           |    1 +
 ARMv8_Module/enable_arm_pmu.c |   96 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 105 insertions(+)
 create mode 100644 ARMv8_Module/Makefile
 create mode 100644 ARMv8_Module/README
 create mode 100644 ARMv8_Module/enable_arm_pmu.c

diff --git a/ARMv8_Module/Makefile b/ARMv8_Module/Makefile
new file mode 100644
index 0000000..19a31ea
--- /dev/null
+++ b/ARMv8_Module/Makefile
@@ -0,0 +1,8 @@
+obj-m	:= enable_arm_pmu.o
+KDIR	:= /lib/modules/$(shell uname -r)/build
+PWD	:= $(shell pwd)
+
+all:
+	$(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules
+clean:
+	$(MAKE) -C $(KDIR) SUBDIRS=$(PWD) clean
diff --git a/ARMv8_Module/README b/ARMv8_Module/README
new file mode 100644
index 0000000..648456b
--- /dev/null
+++ b/ARMv8_Module/README
@@ -0,0 +1 @@
+make ARCH=arm64 clean;make ARCH=arm64 CROSS_COMPILE=~/arm64-tc-14.06/bin/aarch64-linux-gnu- -C ~/work/lava_ci/juno/linux-linaro/workspace/builddir-3.16.0-linaro-juno/ SUBDIRS=`pwd`
diff --git a/ARMv8_Module/enable_arm_pmu.c b/ARMv8_Module/enable_arm_pmu.c
new file mode 100644
index 0000000..5c87b08
--- /dev/null
+++ b/ARMv8_Module/enable_arm_pmu.c
@@ -0,0 +1,96 @@
+/*
+ * Enable user-mode ARM performance counter access.
+ */
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/smp.h>
+/** -- Configuration stuff ------------------------------------------------- */
+
+#define DRVR_NAME "enable_arm_pmu"
+
+#if !defined(__aarch64__)
+	#error Module can only be compiled on ARM 64 machines.
+#endif
+
+/** -- Initialization & boilerplate ---------------------------------------- */
+
+#define PERF_DEF_OPTS 		(1 | 16)
+#define PERF_OPT_RESET_CYCLES 	(2 | 4)
+#define PERF_OPT_DIV64 		(8)
+#define ARMV8_PMCR_MASK         0x3f
+#define ARMV8_PMCR_E            (1 << 0) /* Enable all counters */
+#define ARMV8_PMCR_P            (1 << 1) /* Reset all counters */
+#define ARMV8_PMCR_C            (1 << 2) /* Cycle counter reset */
+#define ARMV8_PMCR_D            (1 << 3) /* CCNT counts every 64th cpu cycle */
+#define ARMV8_PMCR_X            (1 << 4) /* Export to ETM */
+#define ARMV8_PMCR_DP           (1 << 5) /* Disable CCNT if non-invasive debug*/
+#define ARMV8_PMCR_N_SHIFT      11       /* Number of counters supported */
+#define ARMV8_PMCR_N_MASK       0x1f
+
+#define ARMV8_PMUSERENR_EN_EL0  (1 << 0) /* EL0 access enable */
+#define ARMV8_PMUSERENR_CR      (1 << 2) /* Cycle counter read enable */
+#define ARMV8_PMUSERENR_ER      (1 << 3) /* Event counter read enable */
+
+static inline u32 armv8pmu_pmcr_read(void)
+{
+        u64 val=0;
+        asm volatile("mrs %0, pmcr_el0" : "=r" (val));
+        return (u32)val;
+}
+static inline void armv8pmu_pmcr_write(u32 val)
+{
+        val &= ARMV8_PMCR_MASK;
+        isb();
+        asm volatile("msr pmcr_el0, %0" : : "r" ((u64)val));
+}
+ 
+static void
+enable_cpu_counters(void* data)
+{
+	u32 val=0;
+/* Enable user-mode access to counters. */
+	asm volatile("msr pmuserenr_el0, %0" : : "r"((u64)ARMV8_PMUSERENR_EN_EL0|ARMV8_PMUSERENR_ER|ARMV8_PMUSERENR_CR));
+/* Initialize & Reset PMNC: C and P bits. */
+	armv8pmu_pmcr_write(ARMV8_PMCR_P | ARMV8_PMCR_C); 
+/*G4.4.11
+PMINTENSET, Performance Monitors Interrupt Enable Set register */
+/*cycle counter overflow interrupt request is disabled */
+	asm volatile("msr pmintenset_el1, %0" : : "r" ((u64)(0 << 31)));
+/*start*/
+	armv8pmu_pmcr_write(armv8pmu_pmcr_read() | ARMV8_PMCR_E);
+}
+
+static void
+disable_cpu_counters(void* data)
+{
+	u32 val=0;
+	printk(KERN_INFO "\n [" DRVR_NAME "] disabling user-mode PMU access on CPU #%d",
+	smp_processor_id());
+
+	/* Program PMU and disable all counters */
+	armv8pmu_pmcr_write(armv8pmu_pmcr_read() |~ARMV8_PMCR_E);
+	/* disable user-mode access to counters. */
+	asm volatile("msr pmuserenr_el0, %0" : : "r"((u64)0));
+
+}
+
+static int __init
+init(void)
+{
+	on_each_cpu(enable_cpu_counters, NULL, 1);
+	printk(KERN_INFO "[" DRVR_NAME "] initialized");
+	return 0;
+}
+
+static void __exit
+fini(void)
+{
+	on_each_cpu(disable_cpu_counters, NULL, 1);
+	printk(KERN_INFO "[" DRVR_NAME "] unloaded");
+}
+
+MODULE_AUTHOR("Yogesh Tillu ");
+MODULE_DESCRIPTION("Enables user-mode access to ARMv8 PMU counters");
+MODULE_VERSION("0:0.1-dev");
+module_init(init);
+module_exit(fini);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC PATCH 3/5] Application: Added test for Direct access of perf counter from userspace using asm.
  2014-11-03 15:04 [RFC] ARM64: Accessing perf counters from userspace Yogesh Tillu
  2014-11-03 15:04 ` [RFC PATCH 1/5] Application: reads perf cycle counter using perf_event_open syscall Yogesh Tillu
  2014-11-03 15:04 ` [RFC PATCH 2/5] Kernel module: to Enable userspace access to PMU counters for ArmV8 Yogesh Tillu
@ 2014-11-03 15:04 ` Yogesh Tillu
  2014-11-03 15:04 ` [RFC PATCH 4/5]ARM64: Kernel: To read PMU cycle counter through vDSO Path Yogesh Tillu
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Yogesh Tillu @ 2014-11-03 15:04 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: tillu.yogesh, linux-perf-users, linaro-networking, jean.pihet,
	arnd, Andrew.Pinski, mike.holmes, ola.liljedahl, magnus.karlsson,
	Prasun.Kapoor, Yogesh Tillu

This patchset contains Test application for accessing perf cycle counter from
userspace using asm.

Signed-off-by: Yogesh Tillu <yogesh.tillu@linaro.org>
---
 README.directaccess |    8 ++++
 direct_access.c     |   65 ++++++++++++++++++++++++++++
 direct_access.h     |  117 +++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 190 insertions(+)
 create mode 100644 README.directaccess
 create mode 100644 direct_access.c
 create mode 100644 direct_access.h

diff --git a/README.directaccess b/README.directaccess
new file mode 100644
index 0000000..99e929c
--- /dev/null
+++ b/README.directaccess
@@ -0,0 +1,8 @@
+To Cross-compile application:
+ ~/arm64-tc-14.06/bin/aarch64-linux-gnu-gcc -std=gnu99 -O3 direct_access.c
+ -o direct_access
+
+Run:
+1) Insert kernel module to enable userspace access of perf cycle counter
+2) $./direct_access 64 [ Pass same random number as with test 
+			of perf_event_open ]
diff --git a/direct_access.c b/direct_access.c
new file mode 100644
index 0000000..7a9e9b2
--- /dev/null
+++ b/direct_access.c
@@ -0,0 +1,65 @@
+#include <stdio.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include "direct_access.h"
+/* Simple loop body to keep things interested. Make sure it gets inlined. */
+static inline int
+loop(int* __restrict__ a, int* __restrict__ b, int n)
+{
+	unsigned sum = 0;
+	for (int i = 0; i < n; ++i)
+		if(a[i] > b[i])
+			sum += a[i] + 5;
+	return sum;
+}
+
+int
+main(int ac, char **av)
+{
+	uint32_t time_start = 0;
+	uint32_t time_end   = 0;
+	int result=0;
+	int *a  = NULL;
+	int *b  = NULL;
+	int len = 0;
+	int sum = 0;
+	int i;
+
+	if (ac != 2) return -1;
+	len = atoi(av[1]);
+
+	a = malloc(len*sizeof(*a));
+	b = malloc(len*sizeof(*b));
+
+	for (int i = 0; i < len; ++i) {
+		a[i] = i+128;
+		b[i] = i+64;
+	}
+/* Open Counter */
+	if(odph_perf_open_counter()!=0)
+		{
+			printf("Error in perf_open_counter\n");
+			goto cleanup;
+		}		
+	printf("\nbeginning busy loop for %s len=%d \n", av[0],len);
+/* Read Counter  with Busy loop */
+	time_start = odph_perf_read_counter();
+	sum = loop(a, b, len);
+	time_end   = odph_perf_read_counter();
+	printf("**********************************************************************\n");
+	printf("Busyloop sum = %d\nTime delta Including Busyloop = %lu [clockcycle]\n", sum, time_end - time_start);
+
+/* Read Counter with profiling read_counter */
+	time_start = odph_perf_read_counter();
+	odph_perf_read_counter();
+	time_end   = odph_perf_read_counter();
+	printf("\nTime delta Without Busyloop   = %lu   [clockcycle]\n", time_end - time_start);
+	printf("**********************************************************************\n");
+
+/* Close Counter */
+	odph_perf_close_counter();
+	free(a); free(b);
+	return 0;
+cleanup: 
+	return -1;
+}
diff --git a/direct_access.h b/direct_access.h
new file mode 100644
index 0000000..f7ac20b
--- /dev/null
+++ b/direct_access.h
@@ -0,0 +1,117 @@
+/* Copyright (c) 2014, Linaro Limited
+ * All rights reserved.
+ *
+ * SPDX-License-Identifier:     BSD-3-Clause
+ */
+
+
+/**
+ * @file
+ *
+ * Performance Counter Direct access Header
+ */
+
+#ifndef DIRECT_ACCESS_H_
+#define DIRECT_ACCESS_H_
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#if __aarch64__ /**< Check for ArmV8 */
+#define ARMV8_PMCNTENSET_EL0_ENABLE (1<<31) /**< Enable Perf count reg */
+#endif
+ 
+/**
+ * Open Performance counter 
+ *
+ * @note api to enable performance counters in system, this function does
+ * enable sequence for respective arm versions
+ * 
+ * @param void
+ *
+ * @return 0 if open successfully, otherwise -1
+ */
+static inline int odph_perf_open_counter(void)
+{
+
+#if __aarch64__
+/*  Performance Monitors Count Enable Set register bit 31:0 disable, 1 enable */
+	asm volatile("msr pmcntenset_el0, %0" : : "r" (ARMV8_PMCNTENSET_EL0_ENABLE));
+	return 0;
+#elif defined(__ARM_ARCH_7A__)
+	return 0;
+#else
+	#error Unsupported Architecture
+	return -1;
+#endif
+}
+
+/**
+ * Read Performance counter 
+ *
+ * @note api to read performance cycle counters in system
+ * 
+ * @param void
+ *
+ * @return cycle counter value if read successfully, otherwise -1
+ */
+static inline uint64_t
+odph_perf_read_counter(void)
+{
+uint64_t ret = 0;
+#if  defined __aarch64__     
+	asm volatile("mrs %0, pmccntr_el0" : "=r" (ret)); 
+	return ret;
+#elif defined(__ARM_ARCH_7A__)
+	asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(ret));
+	return ret;
+#else
+	#error Unsupported architecture/compiler!
+	return -1;
+#endif
+}
+
+/**
+ * Write Performance counter 
+ *
+ * @note api to write value to Performance counter, 
+ * NA for now 
+ * 
+ * @param void
+ *
+ * @return 0 if written successfully, otherwise -1
+ */
+static inline int odph_perf_write_counter(void)
+{
+/* Stub */
+}
+
+/**
+ * Close Performance counter 
+ *
+ * @note api to perform close sequnce for cycle counters in system
+ * 
+ * @param void
+ *
+ * @return 0 if close successfully, otherwise -1
+ */
+static inline int odph_perf_close_counter(void)
+{
+#if  defined __aarch64__     
+	/* Performance Monitors Count Enable Set register bit 31:0 disable, 1 enable */
+	asm volatile("msr pmcntenset_el0, %0" : : "r" (0<<31));
+	/* Note above statement does not really clearing register...refer to doc */
+	return 0;
+#elif defined(__ARM_ARCH_7A__)
+	return 0;
+#else
+	#error Unsupported architecture/compiler!
+	return -1;
+#endif
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* DIRECT_ACCESS_H_ */
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC PATCH 4/5]ARM64: Kernel: To read PMU cycle counter through vDSO Path
  2014-11-03 15:04 [RFC] ARM64: Accessing perf counters from userspace Yogesh Tillu
                   ` (2 preceding siblings ...)
  2014-11-03 15:04 ` [RFC PATCH 3/5] Application: Added test for Direct access of perf counter from userspace using asm Yogesh Tillu
@ 2014-11-03 15:04 ` Yogesh Tillu
  2014-11-03 16:13   ` Mark Rutland
  2014-11-03 15:04 ` [RFC PATCH 5/5] Application: to read PMU counter through vdso Yogesh Tillu
  2014-11-03 15:40 ` [RFC] ARM64: Accessing perf counters from userspace Mark Rutland
  5 siblings, 1 reply; 12+ messages in thread
From: Yogesh Tillu @ 2014-11-03 15:04 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: tillu.yogesh, linux-perf-users, linaro-networking, jean.pihet,
	arnd, Andrew.Pinski, mike.holmes, ola.liljedahl, magnus.karlsson,
	Prasun.Kapoor, Yogesh Tillu

Kernel patchset to enable vDSO path for reading PMU cycle counter.

Signed-off-by: Yogesh Tillu <yogesh.tillu@linaro.org>
---
 arch/arm64/kernel/vdso/Makefile     |    6 +++---
 arch/arm64/kernel/vdso/vdso.lds.S   |    5 +++++
 arch/arm64/kernel/vdso/vdso_perfc.c |   20 ++++++++++++++++++++
 3 files changed, 28 insertions(+), 3 deletions(-)
 create mode 100644 arch/arm64/kernel/vdso/vdso_perfc.c

diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile
index 6d20b7d..4fde490 100644
--- a/arch/arm64/kernel/vdso/Makefile
+++ b/arch/arm64/kernel/vdso/Makefile
@@ -5,7 +5,7 @@
 # Heavily based on the vDSO Makefiles for other archs.
 #
 
-obj-vdso := gettimeofday.o note.o sigreturn.o
+obj-vdso := gettimeofday.o note.o sigreturn.o armpmu.o
 
 # Build rules
 targets := $(obj-vdso) vdso.so vdso.so.dbg
@@ -43,8 +43,8 @@ $(obj)/vdso-offsets.h: $(obj)/vdso.so.dbg FORCE
 	$(call if_changed,vdsosym)
 
 # Assembly rules for the .S files
-$(obj-vdso): %.o: %.S
-	$(call if_changed_dep,vdsoas)
+#$(obj-vdso): %.o: %.S
+#	$(call if_changed_dep,vdsoas)
 
 # Actual build commands
 quiet_cmd_vdsold = VDSOL $@
diff --git a/arch/arm64/kernel/vdso/vdso.lds.S b/arch/arm64/kernel/vdso/vdso.lds.S
index 8154b8d..8cb56e0 100644
--- a/arch/arm64/kernel/vdso/vdso.lds.S
+++ b/arch/arm64/kernel/vdso/vdso.lds.S
@@ -90,6 +90,11 @@ VERSION
 		__kernel_gettimeofday;
 		__kernel_clock_gettime;
 		__kernel_clock_getres;
+		 /* ADD YOUR VDSO STUFF HERE */
+		perf_read_counter;
+		__vdso_perf_read_counter;
+		perf_open_counter;
+		__vdso_perf_open_counter;
 	local: *;
 	};
 }
diff --git a/arch/arm64/kernel/vdso/vdso_perfc.c b/arch/arm64/kernel/vdso/vdso_perfc.c
new file mode 100644
index 0000000..c363d64
--- /dev/null
+++ b/arch/arm64/kernel/vdso/vdso_perfc.c
@@ -0,0 +1,20 @@
+#include <linux/compiler.h>
+
+int perf_read_counter(void)
+    __attribute__((weak, alias("__vdso__perf_read_counter")));
+int perf_open_counter(void)
+    __attribute__((weak, alias("__vdso__perf_open_counter")));
+
+#define ARMV8_PMCNTENSET_EL0_ENABLE (1<<31) /**< Enable Perf count reg */
+
+__attribute__((no_instrument_function)) int __vdso__perf_read_counter(void)
+{
+int ret = 0;
+asm volatile("mrs %0, pmccntr_el0" : "=r" (ret));
+return ret;
+}
+
+__attribute__((no_instrument_function)) void __vdso__perf_open_counter(void)
+{
+asm volatile("msr pmcntenset_el0, %0" : : "r" (ARMV8_PMCNTENSET_EL0_ENABLE));
+}
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC PATCH 5/5] Application: to read PMU counter through vdso
  2014-11-03 15:04 [RFC] ARM64: Accessing perf counters from userspace Yogesh Tillu
                   ` (3 preceding siblings ...)
  2014-11-03 15:04 ` [RFC PATCH 4/5]ARM64: Kernel: To read PMU cycle counter through vDSO Path Yogesh Tillu
@ 2014-11-03 15:04 ` Yogesh Tillu
  2014-11-03 15:40 ` [RFC] ARM64: Accessing perf counters from userspace Mark Rutland
  5 siblings, 0 replies; 12+ messages in thread
From: Yogesh Tillu @ 2014-11-03 15:04 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: tillu.yogesh, linux-perf-users, linaro-networking, jean.pihet,
	arnd, Andrew.Pinski, mike.holmes, ola.liljedahl, magnus.karlsson,
	Prasun.Kapoor, Yogesh Tillu

Test application to read PMU cycle counter through vDSO path. 

Signed-off-by: Yogesh Tillu <yogesh.tillu@linaro.org>

---
 vdso_userspace_perf.c |   58 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)
 create mode 100644 vdso_userspace_perf.c

diff --git a/vdso_userspace_perf.c b/vdso_userspace_perf.c
new file mode 100644
index 0000000..42d2682
--- /dev/null
+++ b/vdso_userspace_perf.c
@@ -0,0 +1,58 @@
+#include <stdio.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <sys/time.h>
+/* Simple loop body to keep things interested. Make sure it gets inlined. */
+static inline int
+loop(int* __restrict__ a, int* __restrict__ b, int n)
+{
+	unsigned sum = 0;
+	for (int i = 0; i < n; ++i)
+		if(a[i] > b[i])
+			sum += a[i] + 5;
+	return sum;
+}
+
+int
+main(int ac, char **av)
+{
+	uint32_t time_start = 0;
+	uint32_t time_end   = 0;
+	int result=0;
+	int *a  = NULL;
+	int *b  = NULL;
+	int len = 0;
+	int sum = 0;
+	int i;
+	struct timeval tv;
+
+	if (ac != 2) return -1;
+	len = atoi(av[1]);
+
+	a = malloc(len*sizeof(*a));
+	b = malloc(len*sizeof(*b));
+
+	for (int i = 0; i < len; ++i) {
+		a[i] = i+128;
+		b[i] = i+64;
+	}
+/* Read Counter  with Busy loop */
+	perf_open_counter();
+	time_start = perf_read_counter();
+	sum = loop(a, b, len);
+	time_end   = perf_read_counter();
+	printf("**********************************************************************\n");
+	printf("Busyloop sum = %d\nTime delta Including Busyloop = %lu [clockcycle]\n", sum, time_end - time_start);
+
+/* Read Counter with profiling read_counter */
+	time_start = perf_read_counter();
+	perf_read_counter();
+	time_end   = perf_read_counter();
+	printf("\nTime delta Without Busyloop   = %lu   [clockcycle]\n", time_end - time_start);
+	printf("**********************************************************************\n");
+
+	free(a); free(b);
+	return 0;
+cleanup: 
+	return -1;
+}
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH 2/5] Kernel module: to Enable userspace access to PMU counters for ArmV8
  2014-11-03 15:04 ` [RFC PATCH 2/5] Kernel module: to Enable userspace access to PMU counters for ArmV8 Yogesh Tillu
@ 2014-11-03 15:22   ` Måns Rullgård
  2014-11-03 16:16   ` Mark Rutland
  1 sibling, 0 replies; 12+ messages in thread
From: Måns Rullgård @ 2014-11-03 15:22 UTC (permalink / raw)
  To: Yogesh Tillu
  Cc: magnus.karlsson, tillu.yogesh, Prasun.Kapoor, linux-perf-users,
	Andrew.Pinski, mike.holmes, ola.liljedahl, linaro-networking,
	jean.pihet, arnd, linux-arm-kernel

Yogesh Tillu <yogesh.tillu@linaro.org> writes:

> This Patchset is for Kernel Module to Enable userspace access to PMU
> counters(ArmV8)

This would make a lot more sense as a sysfs control.

-- 
Måns Rullgård
mans@mansr.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] ARM64: Accessing perf counters from userspace
  2014-11-03 15:04 [RFC] ARM64: Accessing perf counters from userspace Yogesh Tillu
                   ` (4 preceding siblings ...)
  2014-11-03 15:04 ` [RFC PATCH 5/5] Application: to read PMU counter through vdso Yogesh Tillu
@ 2014-11-03 15:40 ` Mark Rutland
  2014-11-04 18:32   ` Yogesh Tillu
  5 siblings, 1 reply; 12+ messages in thread
From: Mark Rutland @ 2014-11-03 15:40 UTC (permalink / raw)
  To: Yogesh Tillu
  Cc: linux-arm-kernel@lists.infradead.org,
	magnus.karlsson@avagotech.com, tillu.yogesh@gmail.com,
	Prasun.Kapoor@caviumnetworks.com,
	linux-perf-users@vger.kernel.org,
	Andrew.Pinski@caviumnetworks.com, mike.holmes@linaro.org,
	ola.liljedahl@linaro.org, linaro-networking@linaro.org,
	jean.pihet@linaro.org, arnd@linaro.org

Hi,

On Mon, Nov 03, 2014 at 03:04:00PM +0000, Yogesh Tillu wrote:
> We have tried to implement some changes to allow perf counters to be accessed
> from user space. Benchmarking so far has show that these are 100s of times 
> faster than using syscall(perf_event_open). This would be useful for many use
> cases like networking(critical to fast path code), benchmark executionpath with
> low budget of cpu cycles etc.
> 
> Benchmark figures on ArmV8, "reading perf cycle counter" with below approaches
> 1) Reading perf cycle counter through perf_event_open syscall 
> Result[cpu cycles]: 2000 (For Armv7[Arndale] 5407)
> 2) Direct access of perf counters from userspace through asm
> Result[cpu cycles]: 2 (For Armv7[Arndale] 16)
> 3) Reading perf cycle counter through vDSO path
> Result[cpu cycles]: ~20
> 
> 
> Could you please let me know your comments/review. Below are the details about
> setup and patchset. 

For there to be any meaningful review of this, it needs to be based on a
kernel tree, and implemented within the existing perf framework; it
cannot be a module on the side. This is impossible to review, because it
looks nothing like what a real solution will have to.

Please base this on a kernel tree, and integrate with the existing
frameworks.

It would also be helpful if you could describe a use case for which the
current mechanisms are too expensive. It will certainly be cheaper to
read the registers directly, but there is additional work userspace will
need to do in addition to simply reading the registers. That can impact
the use-case.

It's unclear to me why you cannot amortize the cost of the reads over a
number of iterations. A specific (non-trivial) example would help.

Thanks,
Mark.

> 
> ** Setup details **
> Architecture: ArmV8
> Board       : Juno Board
> Linux kernel: 3.16.0+
> Kernel Repo : git://git.linaro.org/kernel/linux-linaro-tracking.git
> (Branch:linux-linaro)
> Rootfs      : Linaro Ubuntu rootfs
> Toolchain   : gcc version 4.9.1 20140529 (prerelease)
> (crosstool-NG linaro-1.13.1-4.9-2014.06-02 - Linaro GCC 4.9-2014.06)
> 
> 1) Reading perf cycle counter through perf_event_open syscall 
> *Application to read counter using perf_event_open syscall.
> [PATCH] Application reads perf cycle counter using perf_event_open
> syscall, and prints Benchmark results.
> 
> Signed-off-by: Yogesh Tillu <yogesh.tillu@linaro.org>
> ---
>  app_readcounter.c |   83 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 83 insertions(+)
>  create mode 100644 app_readcounter.c
> 
> 
> 2) Direct access of perf counters from userspace using asm
> This setup contains kernel module + header file with implemented asm to access
> perf counters + Application uses api provided in header file to access counter.
> 
> * Kernel Module: To enable access of counters from userspace
> Yogesh Tillu (1):
>   Kernel module to Enable userspace access to PMU counters for
>     ArmV8
> 
>  ARMv8_Module/Makefile         |    8 ++++
>  ARMv8_Module/README           |    1 +
>  ARMv8_Module/enable_arm_pmu.c |   96 +++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 105 insertions(+)
>  create mode 100644 ARMv8_Module/Makefile
>  create mode 100644 ARMv8_Module/README
>  create mode 100644 ARMv8_Module/enable_arm_pmu.c
> 
> * Application:
> [PATCH] Added test for Direct access of perf counter from userspace
>  using asm.
> 
> Signed-off-by: Yogesh Tillu <yogesh.tillu@linaro.org>
> ---
>  README.directaccess |    8 ++++
>  direct_access.c     |   65 ++++++++++++++++++++++++++++
>  direct_access.h     |  117 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 190 insertions(+)
>  create mode 100644 README.directaccess
>  create mode 100644 direct_access.c
>  create mode 100644 direct_access.h
> 
> 3) Reading perf cycle counter through vDSO path
> * Kernel Module: To enable access of counters from userspace ( Same as setup (2) )
> * Kernel vDSO implementation: vDSO implementation for reading of perf cycle counter
> [PATCH] provide open/read function through vDSO for PMU counters
> Yogesh Tillu (1):
>   To read PMU cycle counter through vDSO Path
> 
>  arch/arm64/kernel/vdso/Makefile     |    6 +++---
>  arch/arm64/kernel/vdso/vdso.lds.S   |    5 +++++
>  arch/arm64/kernel/vdso/vdso_perfc.c |   20 ++++++++++++++++++++
>  3 files changed, 28 insertions(+), 3 deletions(-)
>  create mode 100644 arch/arm64/kernel/vdso/vdso_perfc.c
> 
> * application  : To read perf counter through api(implemented through vDSO)
> [PATCH] Test Application: access PMU counter through vDSO
> Yogesh Tillu (1):
>   Test application to read PMU counter through vdso     
> 
>  vdso_userspace_perf.c |   58 +++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 58 insertions(+)
>  create mode 100644 vdso_userspace_perf.c
> 
> NOTE: This codebase mainly for POC of "Access perf counters from userspace",
> not much concentration towards api standard forms.
> 
> -- 
> 1.7.9.5
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH 4/5]ARM64: Kernel: To read PMU cycle counter through vDSO Path
  2014-11-03 15:04 ` [RFC PATCH 4/5]ARM64: Kernel: To read PMU cycle counter through vDSO Path Yogesh Tillu
@ 2014-11-03 16:13   ` Mark Rutland
  0 siblings, 0 replies; 12+ messages in thread
From: Mark Rutland @ 2014-11-03 16:13 UTC (permalink / raw)
  To: Yogesh Tillu
  Cc: linux-arm-kernel@lists.infradead.org,
	magnus.karlsson@avagotech.com, tillu.yogesh@gmail.com,
	Prasun.Kapoor@caviumnetworks.com,
	linux-perf-users@vger.kernel.org,
	Andrew.Pinski@caviumnetworks.com, mike.holmes@linaro.org,
	ola.liljedahl@linaro.org, linaro-networking@linaro.org,
	jean.pihet@linaro.org, arnd@linaro.org

On Mon, Nov 03, 2014 at 03:04:04PM +0000, Yogesh Tillu wrote:
> Kernel patchset to enable vDSO path for reading PMU cycle counter.
> 
> Signed-off-by: Yogesh Tillu <yogesh.tillu@linaro.org>
> ---
>  arch/arm64/kernel/vdso/Makefile     |    6 +++---
>  arch/arm64/kernel/vdso/vdso.lds.S   |    5 +++++
>  arch/arm64/kernel/vdso/vdso_perfc.c |   20 ++++++++++++++++++++
>  3 files changed, 28 insertions(+), 3 deletions(-)
>  create mode 100644 arch/arm64/kernel/vdso/vdso_perfc.c
> 
> diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile
> index 6d20b7d..4fde490 100644
> --- a/arch/arm64/kernel/vdso/Makefile
> +++ b/arch/arm64/kernel/vdso/Makefile
> @@ -5,7 +5,7 @@
>  # Heavily based on the vDSO Makefiles for other archs.
>  #
>  
> -obj-vdso := gettimeofday.o note.o sigreturn.o
> +obj-vdso := gettimeofday.o note.o sigreturn.o armpmu.o
>  
>  # Build rules
>  targets := $(obj-vdso) vdso.so vdso.so.dbg
> @@ -43,8 +43,8 @@ $(obj)/vdso-offsets.h: $(obj)/vdso.so.dbg FORCE
>  	$(call if_changed,vdsosym)
>  
>  # Assembly rules for the .S files
> -$(obj-vdso): %.o: %.S
> -	$(call if_changed_dep,vdsoas)
> +#$(obj-vdso): %.o: %.S
> +#	$(call if_changed_dep,vdsoas)

Either this is unnecessary, and it goes, or it is necessary, and it
stays. Do not half-remove code in this fashion.

>  # Actual build commands
>  quiet_cmd_vdsold = VDSOL $@
> diff --git a/arch/arm64/kernel/vdso/vdso.lds.S b/arch/arm64/kernel/vdso/vdso.lds.S
> index 8154b8d..8cb56e0 100644
> --- a/arch/arm64/kernel/vdso/vdso.lds.S
> +++ b/arch/arm64/kernel/vdso/vdso.lds.S
> @@ -90,6 +90,11 @@ VERSION
>  		__kernel_gettimeofday;
>  		__kernel_clock_gettime;
>  		__kernel_clock_getres;
> +		 /* ADD YOUR VDSO STUFF HERE */

This comment adds no value.

> +		perf_read_counter;
> +		__vdso_perf_read_counter;
> +		perf_open_counter;
> +		__vdso_perf_open_counter;

I believe we need a new version label for these symbols.

Why are we exposing internal names outside of the VDSO? That would seem
to defeat the point of this linker script.

>  	local: *;
>  	};
>  }
> diff --git a/arch/arm64/kernel/vdso/vdso_perfc.c b/arch/arm64/kernel/vdso/vdso_perfc.c
> new file mode 100644
> index 0000000..c363d64
> --- /dev/null
> +++ b/arch/arm64/kernel/vdso/vdso_perfc.c
> @@ -0,0 +1,20 @@
> +#include <linux/compiler.h>
> +
> +int perf_read_counter(void)
> +    __attribute__((weak, alias("__vdso__perf_read_counter")));
> +int perf_open_counter(void)
> +    __attribute__((weak, alias("__vdso__perf_open_counter")));

Why is this not in plain assembly like the rest of the VDSO functions?

There is absolutely no reason for a C wrapper for an asm block,
especially given the additional changes required to make that work at
all.

> +
> +#define ARMV8_PMCNTENSET_EL0_ENABLE (1<<31) /**< Enable Perf count reg */
> +
> +__attribute__((no_instrument_function)) int __vdso__perf_read_counter(void)
> +{
> +int ret = 0;
> +asm volatile("mrs %0, pmccntr_el0" : "=r" (ret));
> +return ret;
> +}
> +
> +__attribute__((no_instrument_function)) void __vdso__perf_open_counter(void)
> +{
> +asm volatile("msr pmcntenset_el0, %0" : : "r" (ARMV8_PMCNTENSET_EL0_ENABLE));
> +}

Huh?

This function is completely misnamed, as it enables the cycle counter in
hardware -- it does not 'open' any counter in the traditional perf
meaning. It does so without notifying the kernel, and no code has been
added to context switch the counter.

This will not work as-is for all but the most trivial of test cases:

* The application's view of the cycle counter will jump up arbitrarily
  when the kernel/hypervisor/firmware takes control of the hardware
  (e.g. to handle interrupts).

* The application's view of the cycle counter can change arbitrarily as
  it gets migrated across CPUs. The counter value can change, and its
  configuration can also change (e.g. when moving from a CPU where it is
  enabled to one where it is not).

* If another application is profiling system-wide, it will cause the
  cycle counter to be reset occasionally on overflow.

* With cpuidle, the hardware context (including the cycle counter
  configuration) can be lost in low power states. The cycle counter may
  suddenly stop ticking, and stay at an arbitrary reset value.

If you want to expose the counters directly to userspace for reading,
then you need to modify the existing framework to work with that. It is
simply not possible to hack this onto the side. Otherwise the numbers
you read are effectively meaningless.

There are additional complications for big.LITTLE when using anything
other than the cycle counter (which even then is meaningless when
summed).

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH 2/5] Kernel module: to Enable userspace access to PMU counters for ArmV8
  2014-11-03 15:04 ` [RFC PATCH 2/5] Kernel module: to Enable userspace access to PMU counters for ArmV8 Yogesh Tillu
  2014-11-03 15:22   ` Måns Rullgård
@ 2014-11-03 16:16   ` Mark Rutland
  1 sibling, 0 replies; 12+ messages in thread
From: Mark Rutland @ 2014-11-03 16:16 UTC (permalink / raw)
  To: Yogesh Tillu
  Cc: linux-arm-kernel@lists.infradead.org,
	magnus.karlsson@avagotech.com, tillu.yogesh@gmail.com,
	Prasun.Kapoor@caviumnetworks.com,
	linux-perf-users@vger.kernel.org,
	Andrew.Pinski@caviumnetworks.com, mike.holmes@linaro.org,
	ola.liljedahl@linaro.org, linaro-networking@linaro.org,
	jean.pihet@linaro.org, arnd@linaro.org

On Mon, Nov 03, 2014 at 03:04:02PM +0000, Yogesh Tillu wrote:
> This Patchset is for Kernel Module to Enable userspace access to PMU counters(ArmV8)

NAK.

This _must_ be built within the existing framework.

Thanks,
Mark.

> 
> Signed-off-by: Yogesh Tillu <yogesh.tillu@linaro.org>
> ---
>  ARMv8_Module/Makefile         |    8 ++++
>  ARMv8_Module/README           |    1 +
>  ARMv8_Module/enable_arm_pmu.c |   96 +++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 105 insertions(+)
>  create mode 100644 ARMv8_Module/Makefile
>  create mode 100644 ARMv8_Module/README
>  create mode 100644 ARMv8_Module/enable_arm_pmu.c
> 
> diff --git a/ARMv8_Module/Makefile b/ARMv8_Module/Makefile
> new file mode 100644
> index 0000000..19a31ea
> --- /dev/null
> +++ b/ARMv8_Module/Makefile
> @@ -0,0 +1,8 @@
> +obj-m	:= enable_arm_pmu.o
> +KDIR	:= /lib/modules/$(shell uname -r)/build
> +PWD	:= $(shell pwd)
> +
> +all:
> +	$(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules
> +clean:
> +	$(MAKE) -C $(KDIR) SUBDIRS=$(PWD) clean
> diff --git a/ARMv8_Module/README b/ARMv8_Module/README
> new file mode 100644
> index 0000000..648456b
> --- /dev/null
> +++ b/ARMv8_Module/README
> @@ -0,0 +1 @@
> +make ARCH=arm64 clean;make ARCH=arm64 CROSS_COMPILE=~/arm64-tc-14.06/bin/aarch64-linux-gnu- -C ~/work/lava_ci/juno/linux-linaro/workspace/builddir-3.16.0-linaro-juno/ SUBDIRS=`pwd`
> diff --git a/ARMv8_Module/enable_arm_pmu.c b/ARMv8_Module/enable_arm_pmu.c
> new file mode 100644
> index 0000000..5c87b08
> --- /dev/null
> +++ b/ARMv8_Module/enable_arm_pmu.c
> @@ -0,0 +1,96 @@
> +/*
> + * Enable user-mode ARM performance counter access.
> + */
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/smp.h>
> +/** -- Configuration stuff ------------------------------------------------- */
> +
> +#define DRVR_NAME "enable_arm_pmu"
> +
> +#if !defined(__aarch64__)
> +	#error Module can only be compiled on ARM 64 machines.
> +#endif
> +
> +/** -- Initialization & boilerplate ---------------------------------------- */
> +
> +#define PERF_DEF_OPTS 		(1 | 16)
> +#define PERF_OPT_RESET_CYCLES 	(2 | 4)
> +#define PERF_OPT_DIV64 		(8)
> +#define ARMV8_PMCR_MASK         0x3f
> +#define ARMV8_PMCR_E            (1 << 0) /* Enable all counters */
> +#define ARMV8_PMCR_P            (1 << 1) /* Reset all counters */
> +#define ARMV8_PMCR_C            (1 << 2) /* Cycle counter reset */
> +#define ARMV8_PMCR_D            (1 << 3) /* CCNT counts every 64th cpu cycle */
> +#define ARMV8_PMCR_X            (1 << 4) /* Export to ETM */
> +#define ARMV8_PMCR_DP           (1 << 5) /* Disable CCNT if non-invasive debug*/
> +#define ARMV8_PMCR_N_SHIFT      11       /* Number of counters supported */
> +#define ARMV8_PMCR_N_MASK       0x1f
> +
> +#define ARMV8_PMUSERENR_EN_EL0  (1 << 0) /* EL0 access enable */
> +#define ARMV8_PMUSERENR_CR      (1 << 2) /* Cycle counter read enable */
> +#define ARMV8_PMUSERENR_ER      (1 << 3) /* Event counter read enable */
> +
> +static inline u32 armv8pmu_pmcr_read(void)
> +{
> +        u64 val=0;
> +        asm volatile("mrs %0, pmcr_el0" : "=r" (val));
> +        return (u32)val;
> +}
> +static inline void armv8pmu_pmcr_write(u32 val)
> +{
> +        val &= ARMV8_PMCR_MASK;
> +        isb();
> +        asm volatile("msr pmcr_el0, %0" : : "r" ((u64)val));
> +}
> + 
> +static void
> +enable_cpu_counters(void* data)
> +{
> +	u32 val=0;
> +/* Enable user-mode access to counters. */
> +	asm volatile("msr pmuserenr_el0, %0" : : "r"((u64)ARMV8_PMUSERENR_EN_EL0|ARMV8_PMUSERENR_ER|ARMV8_PMUSERENR_CR));
> +/* Initialize & Reset PMNC: C and P bits. */
> +	armv8pmu_pmcr_write(ARMV8_PMCR_P | ARMV8_PMCR_C); 
> +/*G4.4.11
> +PMINTENSET, Performance Monitors Interrupt Enable Set register */
> +/*cycle counter overflow interrupt request is disabled */
> +	asm volatile("msr pmintenset_el1, %0" : : "r" ((u64)(0 << 31)));
> +/*start*/
> +	armv8pmu_pmcr_write(armv8pmu_pmcr_read() | ARMV8_PMCR_E);
> +}
> +
> +static void
> +disable_cpu_counters(void* data)
> +{
> +	u32 val=0;
> +	printk(KERN_INFO "\n [" DRVR_NAME "] disabling user-mode PMU access on CPU #%d",
> +	smp_processor_id());
> +
> +	/* Program PMU and disable all counters */
> +	armv8pmu_pmcr_write(armv8pmu_pmcr_read() |~ARMV8_PMCR_E);
> +	/* disable user-mode access to counters. */
> +	asm volatile("msr pmuserenr_el0, %0" : : "r"((u64)0));
> +
> +}
> +
> +static int __init
> +init(void)
> +{
> +	on_each_cpu(enable_cpu_counters, NULL, 1);
> +	printk(KERN_INFO "[" DRVR_NAME "] initialized");
> +	return 0;
> +}
> +
> +static void __exit
> +fini(void)
> +{
> +	on_each_cpu(disable_cpu_counters, NULL, 1);
> +	printk(KERN_INFO "[" DRVR_NAME "] unloaded");
> +}
> +
> +MODULE_AUTHOR("Yogesh Tillu ");
> +MODULE_DESCRIPTION("Enables user-mode access to ARMv8 PMU counters");
> +MODULE_VERSION("0:0.1-dev");
> +module_init(init);
> +module_exit(fini);
> -- 
> 1.7.9.5
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] ARM64: Accessing perf counters from userspace
  2014-11-03 15:40 ` [RFC] ARM64: Accessing perf counters from userspace Mark Rutland
@ 2014-11-04 18:32   ` Yogesh Tillu
       [not found]     ` <CAPiYAf4ZbEP20AEaEyaDJ9ov-w-F-UNDu9OmurL_3YQx+9p5bA@mail.gmail.com>
  0 siblings, 1 reply; 12+ messages in thread
From: Yogesh Tillu @ 2014-11-04 18:32 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-arm-kernel@lists.infradead.org,
	magnus.karlsson@avagotech.com, tillu.yogesh@gmail.com,
	Prasun.Kapoor@caviumnetworks.com,
	linux-perf-users@vger.kernel.org,
	Andrew.Pinski@caviumnetworks.com, mike.holmes@linaro.org,
	ola.liljedahl@linaro.org, linaro-networking@linaro.org,
	jean.pihet@linaro.org, arnd@linaro.org

Hi,
   Please find my reply inline.
On 3 November 2014 21:10, Mark Rutland <mark.rutland@arm.com> wrote:
>
> Hi,
>
> On Mon, Nov 03, 2014 at 03:04:00PM +0000, Yogesh Tillu wrote:
> > We have tried to implement some changes to allow perf counters to be accessed
> > from user space. Benchmarking so far has show that these are 100s of times
> > faster than using syscall(perf_event_open). This would be useful for many use
> > cases like networking(critical to fast path code), benchmark executionpath with
> > low budget of cpu cycles etc.
> >
> > Benchmark figures on ArmV8, "reading perf cycle counter" with below approaches
> > 1) Reading perf cycle counter through perf_event_open syscall
> > Result[cpu cycles]: 2000 (For Armv7[Arndale] 5407)
> > 2) Direct access of perf counters from userspace through asm
> > Result[cpu cycles]: 2 (For Armv7[Arndale] 16)
> > 3) Reading perf cycle counter through vDSO path
> > Result[cpu cycles]: ~20
> >
> >
> > Could you please let me know your comments/review. Below are the details about
> > setup and patchset.
>
> For there to be any meaningful review of this, it needs to be based on a
> kernel tree, and implemented within the existing perf framework; it
> cannot be a module on the side. This is impossible to review, because it
> looks nothing like what a real solution will have to.
Agree, I will resend patchset based on kernel tree.
I will rework on Module implementation and try to reimplement it with
CONFIG_ based design to co-exist with kernel perf framework
(as in armv8pmu_reset it Disable access to counters from userspace).

>
> Please base this on a kernel tree, and integrate with the existing
> frameworks.
>
> It would also be helpful if you could describe a use case for which the
> current mechanisms are too expensive. It will certainly be cheaper to
> read the registers directly, but there is additional work userspace will
> need to do in addition to simply reading the registers. That can impact
> the use-case.
With Current mechanism, it takes lot of cpu cycles where "only read of
perf counter" operations are interested. For example, To Benchmark
networking fastpath code like control plane where we have very limited
budget for reading value of counters.

>
> It's unclear to me why you cannot amortize the cost of the reads over a
> number of iterations. A specific (non-trivial) example would help.
Agree, I will try to modify tests with number of iterations.

Thanks,
Yogesh

> Thanks,
> Mark.
>
> >
> > ** Setup details **
> > Architecture: ArmV8
> > Board       : Juno Board
> > Linux kernel: 3.16.0+
> > Kernel Repo : git://git.linaro.org/kernel/linux-linaro-tracking.git
> > (Branch:linux-linaro)
> > Rootfs      : Linaro Ubuntu rootfs
> > Toolchain   : gcc version 4.9.1 20140529 (prerelease)
> > (crosstool-NG linaro-1.13.1-4.9-2014.06-02 - Linaro GCC 4.9-2014.06)
> >
> > 1) Reading perf cycle counter through perf_event_open syscall
> > *Application to read counter using perf_event_open syscall.
> > [PATCH] Application reads perf cycle counter using perf_event_open
> > syscall, and prints Benchmark results.
> >
> > Signed-off-by: Yogesh Tillu <yogesh.tillu@linaro.org>
> > ---
> >  app_readcounter.c |   83 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 83 insertions(+)
> >  create mode 100644 app_readcounter.c
> >
> >
> > 2) Direct access of perf counters from userspace using asm
> > This setup contains kernel module + header file with implemented asm to access
> > perf counters + Application uses api provided in header file to access counter.
> >
> > * Kernel Module: To enable access of counters from userspace
> > Yogesh Tillu (1):
> >   Kernel module to Enable userspace access to PMU counters for
> >     ArmV8
> >
> >  ARMv8_Module/Makefile         |    8 ++++
> >  ARMv8_Module/README           |    1 +
> >  ARMv8_Module/enable_arm_pmu.c |   96 +++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 105 insertions(+)
> >  create mode 100644 ARMv8_Module/Makefile
> >  create mode 100644 ARMv8_Module/README
> >  create mode 100644 ARMv8_Module/enable_arm_pmu.c
> >
> > * Application:
> > [PATCH] Added test for Direct access of perf counter from userspace
> >  using asm.
> >
> > Signed-off-by: Yogesh Tillu <yogesh.tillu@linaro.org>
> > ---
> >  README.directaccess |    8 ++++
> >  direct_access.c     |   65 ++++++++++++++++++++++++++++
> >  direct_access.h     |  117 +++++++++++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 190 insertions(+)
> >  create mode 100644 README.directaccess
> >  create mode 100644 direct_access.c
> >  create mode 100644 direct_access.h
> >
> > 3) Reading perf cycle counter through vDSO path
> > * Kernel Module: To enable access of counters from userspace ( Same as setup (2) )
> > * Kernel vDSO implementation: vDSO implementation for reading of perf cycle counter
> > [PATCH] provide open/read function through vDSO for PMU counters
> > Yogesh Tillu (1):
> >   To read PMU cycle counter through vDSO Path
> >
> >  arch/arm64/kernel/vdso/Makefile     |    6 +++---
> >  arch/arm64/kernel/vdso/vdso.lds.S   |    5 +++++
> >  arch/arm64/kernel/vdso/vdso_perfc.c |   20 ++++++++++++++++++++
> >  3 files changed, 28 insertions(+), 3 deletions(-)
> >  create mode 100644 arch/arm64/kernel/vdso/vdso_perfc.c
> >
> > * application  : To read perf counter through api(implemented through vDSO)
> > [PATCH] Test Application: access PMU counter through vDSO
> > Yogesh Tillu (1):
> >   Test application to read PMU counter through vdso
> >
> >  vdso_userspace_perf.c |   58 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 58 insertions(+)
> >  create mode 100644 vdso_userspace_perf.c
> >
> > NOTE: This codebase mainly for POC of "Access perf counters from userspace",
> > not much concentration towards api standard forms.
> >
> > --
> > 1.7.9.5
> >
> >
> > _______________________________________________
> > linux-arm-kernel mailing list
> > linux-arm-kernel@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> >

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] ARM64: Accessing perf counters from userspace
       [not found]     ` <CAPiYAf4ZbEP20AEaEyaDJ9ov-w-F-UNDu9OmurL_3YQx+9p5bA@mail.gmail.com>
@ 2014-11-05 17:39       ` Mark Rutland
  0 siblings, 0 replies; 12+ messages in thread
From: Mark Rutland @ 2014-11-05 17:39 UTC (permalink / raw)
  To: Ola Liljedahl
  Cc: Yogesh Tillu, linux-arm-kernel@lists.infradead.org,
	magnus.karlsson@avagotech.com, tillu.yogesh@gmail.com,
	Prasun.Kapoor@caviumnetworks.com,
	linux-perf-users@vger.kernel.org,
	Andrew.Pinski@caviumnetworks.com, mike.holmes@linaro.org,
	linaro-networking@linaro.org, jean.pihet@linaro.org,
	arnd@linaro.org

Hi Ola,

On Wed, Nov 05, 2014 at 04:21:52PM +0000, Ola Liljedahl wrote:
>    Re the use case.
>    We would like to profile e.g. number of cycles or caches misses or
>    mispredicted branches for the rather short code path from when a packet is
>    dequeued for processing until this processing stage is complete and the
>    packet is enqueued. This could be as few as 500 or 1000 instructions. No
>    system calls are allowed in this code path (indeed it is unlikely that the
>    networking dataplane application will doing be any system calls at all
>    after initialization). We also don't want to (or can't) average overhead
>    over many iterations just in order to amortize the perf syscall overhead.
>    Re the implementation.
>    I think enabling user space PMU counter access should be done
>    automatically by the kernel when an application requires exclusive access
>    to a PMU counter. This would be a standard feature in the kernel, probably
>    requiring a new flag to perf_even_open() or maybe a new ioctl (reserve PMU
>    counter and return which actual counter was reserved).

There's already a framework used on x86 that we should re-use. No-one
has yet attempted to reuse it, nor does anyone seem to have done the due
diligence to discover it already exists.

There are many problems with giving userspace control over the counters,
and at best we might be able to safely provide userspace with read-only
access. Counter reservation won't fit the existing framework, and the
existing userspace counter access framework doesn't take this approach.

If we can safely expose read-only access in the same manner as x86, and
userspace takes into account the various caveats (e.g. that events can
be rotated across counters), I am not opposed to that. There are a
number of issues that need to be investigated and addressed to make that
possible beyond flipping a bit in a control register.

There's also the problem of big.LITTLE. I don't see how it's possible to
expose access to the counters in any heterogeneous system in a way that
isn't guaranteed to be broken. I suspect that we can't provide raw
counter access on such systems.

Thanks,
Mark.

>    On 4 November 2014 19:32, Yogesh Tillu <[1]yogesh.tillu@linaro.org> wrote:
> 
>      Hi,
>         Please find my reply inline.
>      On 3 November 2014 21:10, Mark Rutland <[2]mark.rutland@arm.com> wrote:
>      >
>      > Hi,
>      >
>      > On Mon, Nov 03, 2014 at 03:04:00PM +0000, Yogesh Tillu wrote:
>      > > We have tried to implement some changes to allow perf counters to be
>      accessed
>      > > from user space. Benchmarking so far has show that these are 100s of
>      times
>      > > faster than using syscall(perf_event_open). This would be useful for
>      many use
>      > > cases like networking(critical to fast path code), benchmark
>      executionpath with
>      > > low budget of cpu cycles etc.
>      > >
>      > > Benchmark figures on ArmV8, "reading perf cycle counter" with below
>      approaches
>      > > 1) Reading perf cycle counter through perf_event_open syscall
>      > > Result[cpu cycles]: 2000 (For Armv7[Arndale] 5407)
>      > > 2) Direct access of perf counters from userspace through asm
>      > > Result[cpu cycles]: 2 (For Armv7[Arndale] 16)
>      > > 3) Reading perf cycle counter through vDSO path
>      > > Result[cpu cycles]: ~20
>      > >
>      > >
>      > > Could you please let me know your comments/review. Below are the
>      details about
>      > > setup and patchset.
>      >
>      > For there to be any meaningful review of this, it needs to be based on
>      a
>      > kernel tree, and implemented within the existing perf framework; it
>      > cannot be a module on the side. This is impossible to review, because
>      it
>      > looks nothing like what a real solution will have to.
>      Agree, I will resend patchset based on kernel tree.
>      I will rework on Module implementation and try to reimplement it with
>      CONFIG_ based design to co-exist with kernel perf framework
>      (as in armv8pmu_reset it Disable access to counters from userspace).
>      >
>      > Please base this on a kernel tree, and integrate with the existing
>      > frameworks.
>      >
>      > It would also be helpful if you could describe a use case for which
>      the
>      > current mechanisms are too expensive. It will certainly be cheaper to
>      > read the registers directly, but there is additional work userspace
>      will
>      > need to do in addition to simply reading the registers. That can
>      impact
>      > the use-case.
>      With Current mechanism, it takes lot of cpu cycles where "only read of
>      perf counter" operations are interested. For example, To Benchmark
>      networking fastpath code like control plane where we have very limited
>      budget for reading value of counters.
>      >
>      > It's unclear to me why you cannot amortize the cost of the reads over
>      a
>      > number of iterations. A specific (non-trivial) example would help.
>      Agree, I will try to modify tests with number of iterations.
> 
>      Thanks,
>      Yogesh
>      > Thanks,
>      > Mark.
>      >
>      > >
>      > > ** Setup details **
>      > > Architecture: ArmV8
>      > > Board       : Juno Board
>      > > Linux kernel: 3.16.0+
>      > > Kernel Repo :
>      git://[3]git.linaro.org/kernel/linux-linaro-tracking.git
>      > > (Branch:linux-linaro)
>      > > Rootfs      : Linaro Ubuntu rootfs
>      > > Toolchain   : gcc version 4.9.1 20140529 (prerelease)
>      > > (crosstool-NG linaro-1.13.1-4.9-2014.06-02 - Linaro GCC 4.9-2014.06)
>      > >
>      > > 1) Reading perf cycle counter through perf_event_open syscall
>      > > *Application to read counter using perf_event_open syscall.
>      > > [PATCH] Application reads perf cycle counter using perf_event_open
>      > > syscall, and prints Benchmark results.
>      > >
>      > > Signed-off-by: Yogesh Tillu <[4]yogesh.tillu@linaro.org>
>      > > ---
>      > >  app_readcounter.c |   83
>      +++++++++++++++++++++++++++++++++++++++++++++++++++++
>      > >  1 file changed, 83 insertions(+)
>      > >  create mode 100644 app_readcounter.c
>      > >
>      > >
>      > > 2) Direct access of perf counters from userspace using asm
>      > > This setup contains kernel module + header file with implemented asm
>      to access
>      > > perf counters + Application uses api provided in header file to
>      access counter.
>      > >
>      > > * Kernel Module: To enable access of counters from userspace
>      > > Yogesh Tillu (1):
>      > >   Kernel module to Enable userspace access to PMU counters for
>      > >     ArmV8
>      > >
>      > >  ARMv8_Module/Makefile         |    8 ++++
>      > >  ARMv8_Module/README           |    1 +
>      > >  ARMv8_Module/enable_arm_pmu.c |   96
>      +++++++++++++++++++++++++++++++++++++++++
>      > >  3 files changed, 105 insertions(+)
>      > >  create mode 100644 ARMv8_Module/Makefile
>      > >  create mode 100644 ARMv8_Module/README
>      > >  create mode 100644 ARMv8_Module/enable_arm_pmu.c
>      > >
>      > > * Application:
>      > > [PATCH] Added test for Direct access of perf counter from userspace
>      > >  using asm.
>      > >
>      > > Signed-off-by: Yogesh Tillu <[5]yogesh.tillu@linaro.org>
>      > > ---
>      > >  README.directaccess |    8 ++++
>      > >  direct_access.c     |   65 ++++++++++++++++++++++++++++
>      > >  direct_access.h     |  117
>      +++++++++++++++++++++++++++++++++++++++++++++++++++
>      > >  3 files changed, 190 insertions(+)
>      > >  create mode 100644 README.directaccess
>      > >  create mode 100644 direct_access.c
>      > >  create mode 100644 direct_access.h
>      > >
>      > > 3) Reading perf cycle counter through vDSO path
>      > > * Kernel Module: To enable access of counters from userspace ( Same
>      as setup (2) )
>      > > * Kernel vDSO implementation: vDSO implementation for reading of
>      perf cycle counter
>      > > [PATCH] provide open/read function through vDSO for PMU counters
>      > > Yogesh Tillu (1):
>      > >   To read PMU cycle counter through vDSO Path
>      > >
>      > >  arch/arm64/kernel/vdso/Makefile     |    6 +++---
>      > >  arch/arm64/kernel/vdso/vdso.lds.S   |    5 +++++
>      > >  arch/arm64/kernel/vdso/vdso_perfc.c |   20 ++++++++++++++++++++
>      > >  3 files changed, 28 insertions(+), 3 deletions(-)
>      > >  create mode 100644 arch/arm64/kernel/vdso/vdso_perfc.c
>      > >
>      > > * application  : To read perf counter through api(implemented
>      through vDSO)
>      > > [PATCH] Test Application: access PMU counter through vDSO
>      > > Yogesh Tillu (1):
>      > >   Test application to read PMU counter through vdso
>      > >
>      > >  vdso_userspace_perf.c |   58
>      +++++++++++++++++++++++++++++++++++++++++++++++++
>      > >  1 file changed, 58 insertions(+)
>      > >  create mode 100644 vdso_userspace_perf.c
>      > >
>      > > NOTE: This codebase mainly for POC of "Access perf counters from
>      userspace",
>      > > not much concentration towards api standard forms.
>      > >
>      > > --
>      > > 1.7.9.5
>      > >
>      > >
>      > > _______________________________________________
>      > > linux-arm-kernel mailing list
>      > > [6]linux-arm-kernel@lists.infradead.org
>      > > [7]http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>      > >
> 
> References
> 
>    Visible links
>    1. mailto:yogesh.tillu@linaro.org
>    2. mailto:mark.rutland@arm.com
>    3. http://git.linaro.org/kernel/linux-linaro-tracking.git
>    4. mailto:yogesh.tillu@linaro.org
>    5. mailto:yogesh.tillu@linaro.org
>    6. mailto:linux-arm-kernel@lists.infradead.org
>    7. http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2014-11-05 17:42 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-11-03 15:04 [RFC] ARM64: Accessing perf counters from userspace Yogesh Tillu
2014-11-03 15:04 ` [RFC PATCH 1/5] Application: reads perf cycle counter using perf_event_open syscall Yogesh Tillu
2014-11-03 15:04 ` [RFC PATCH 2/5] Kernel module: to Enable userspace access to PMU counters for ArmV8 Yogesh Tillu
2014-11-03 15:22   ` Måns Rullgård
2014-11-03 16:16   ` Mark Rutland
2014-11-03 15:04 ` [RFC PATCH 3/5] Application: Added test for Direct access of perf counter from userspace using asm Yogesh Tillu
2014-11-03 15:04 ` [RFC PATCH 4/5]ARM64: Kernel: To read PMU cycle counter through vDSO Path Yogesh Tillu
2014-11-03 16:13   ` Mark Rutland
2014-11-03 15:04 ` [RFC PATCH 5/5] Application: to read PMU counter through vdso Yogesh Tillu
2014-11-03 15:40 ` [RFC] ARM64: Accessing perf counters from userspace Mark Rutland
2014-11-04 18:32   ` Yogesh Tillu
     [not found]     ` <CAPiYAf4ZbEP20AEaEyaDJ9ov-w-F-UNDu9OmurL_3YQx+9p5bA@mail.gmail.com>
2014-11-05 17:39       ` Mark Rutland

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).