* [PATCH v4 0/2] perf: riscv: Preliminary Perf Event Support on RISC-V @ 2018-04-18 2:12 Alan Kao 2018-04-18 2:12 ` [PATCH v4 1/2] perf: riscv: preliminary RISC-V support Alan Kao 2018-04-18 2:12 ` [PATCH v4 2/2] perf: riscv: Add Document for Future Porting Guide Alan Kao 0 siblings, 2 replies; 5+ messages in thread From: Alan Kao @ 2018-04-18 2:12 UTC (permalink / raw) To: Palmer Dabbelt, Albert Ou, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Alex Solomatnikov, Jonathan Corbet, linux-riscv, linux-doc, linux-kernel Cc: Alan Kao, Greentime Hu, Nick Hu This implements the baseline PMU for RISC-V platforms. To ease future PMU portings, a guide is also written, containing perf concepts, arch porting practices and some hints. Changes in v4: - Fix several compilation errors. Sorry for that. - Raise a warning in the write_counter body. Changes in v3: - Fix typos in the document. - Change the initialization routine from statically assigning PMU to device-tree-based methods, and set default to the PMU proposed in this patch. Changes in v2: - Fix the bug reported by Alex, which was caused by not sufficient initialization. Check https://lkml.org/lkml/2018/3/31/251 for the discussion. Alan Kao (2): perf: riscv: preliminary RISC-V support perf: riscv: Add Document for Future Porting Guide Documentation/riscv/pmu.txt | 249 ++++++++++++++ arch/riscv/Kconfig | 13 + arch/riscv/include/asm/perf_event.h | 79 ++++- arch/riscv/kernel/Makefile | 1 + arch/riscv/kernel/perf_event.c | 482 ++++++++++++++++++++++++++++ 5 files changed, 820 insertions(+), 4 deletions(-) create mode 100644 Documentation/riscv/pmu.txt create mode 100644 arch/riscv/kernel/perf_event.c -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v4 1/2] perf: riscv: preliminary RISC-V support 2018-04-18 2:12 [PATCH v4 0/2] perf: riscv: Preliminary Perf Event Support on RISC-V Alan Kao @ 2018-04-18 2:12 ` Alan Kao 2018-04-19 19:46 ` Atish Patra 2018-04-18 2:12 ` [PATCH v4 2/2] perf: riscv: Add Document for Future Porting Guide Alan Kao 1 sibling, 1 reply; 5+ messages in thread From: Alan Kao @ 2018-04-18 2:12 UTC (permalink / raw) To: Palmer Dabbelt, Albert Ou, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Alex Solomatnikov, Jonathan Corbet, linux-riscv, linux-doc, linux-kernel Cc: Alan Kao, Nick Hu, Greentime Hu This patch provide a basic PMU, riscv_base_pmu, which supports two general hardware event, instructions and cycles. Furthermore, this PMU serves as a reference implementation to ease the portings in the future. riscv_base_pmu should be able to run on any RISC-V machine that conforms to the Priv-Spec. Note that the latest qemu model hasn't fully support a proper behavior of Priv-Spec 1.10 yet, but work around should be easy with very small fixes. Please check https://github.com/riscv/riscv-qemu/pull/115 for future updates. Cc: Nick Hu <nickhu@andestech.com> Cc: Greentime Hu <greentime@andestech.com> Signed-off-by: Alan Kao <alankao@andestech.com> --- arch/riscv/Kconfig | 13 + arch/riscv/include/asm/perf_event.h | 79 ++++- arch/riscv/kernel/Makefile | 1 + arch/riscv/kernel/perf_event.c | 482 ++++++++++++++++++++++++++++ 4 files changed, 571 insertions(+), 4 deletions(-) create mode 100644 arch/riscv/kernel/perf_event.c diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index c22ebe08e902..90d9c8e50377 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -203,6 +203,19 @@ config RISCV_ISA_C config RISCV_ISA_A def_bool y +menu "supported PMU type" + depends on PERF_EVENTS + +config RISCV_BASE_PMU + bool "Base Performance Monitoring Unit" + def_bool y + help + A base PMU that serves as a reference implementation and has limited + feature of perf. It can run on any RISC-V machines so serves as the + fallback, but this option can also be disable to reduce kernel size. + +endmenu + endmenu menu "Kernel type" diff --git a/arch/riscv/include/asm/perf_event.h b/arch/riscv/include/asm/perf_event.h index e13d2ff29e83..0e638a0c3feb 100644 --- a/arch/riscv/include/asm/perf_event.h +++ b/arch/riscv/include/asm/perf_event.h @@ -1,13 +1,84 @@ +/* SPDX-License-Identifier: GPL-2.0 */ /* * Copyright (C) 2018 SiFive + * Copyright (C) 2018 Andes Technology Corporation * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public Licence - * as published by the Free Software Foundation; either version - * 2 of the Licence, or (at your option) any later version. */ #ifndef _ASM_RISCV_PERF_EVENT_H #define _ASM_RISCV_PERF_EVENT_H +#include <linux/perf_event.h> +#include <linux/ptrace.h> + +#define RISCV_BASE_COUNTERS 2 + +/* + * The RISCV_MAX_COUNTERS parameter should be specified. + */ + +#ifdef CONFIG_RISCV_BASE_PMU +#define RISCV_MAX_COUNTERS 2 +#endif + +#ifndef RISCV_MAX_COUNTERS +#error "Please provide a valid RISCV_MAX_COUNTERS for the PMU." +#endif + +/* + * These are the indexes of bits in counteren register *minus* 1, + * except for cycle. It would be coherent if it can directly mapped + * to counteren bit definition, but there is a *time* register at + * counteren[1]. Per-cpu structure is scarce resource here. + * + * According to the spec, an implementation can support counter up to + * mhpmcounter31, but many high-end processors has at most 6 general + * PMCs, we give the definition to MHPMCOUNTER8 here. + */ +#define RISCV_PMU_CYCLE 0 +#define RISCV_PMU_INSTRET 1 +#define RISCV_PMU_MHPMCOUNTER3 2 +#define RISCV_PMU_MHPMCOUNTER4 3 +#define RISCV_PMU_MHPMCOUNTER5 4 +#define RISCV_PMU_MHPMCOUNTER6 5 +#define RISCV_PMU_MHPMCOUNTER7 6 +#define RISCV_PMU_MHPMCOUNTER8 7 + +#define RISCV_OP_UNSUPP (-EOPNOTSUPP) + +struct cpu_hw_events { + /* # currently enabled events*/ + int n_events; + /* currently enabled events */ + struct perf_event *events[RISCV_MAX_COUNTERS]; + /* vendor-defined PMU data */ + void *platform; +}; + +struct riscv_pmu { + struct pmu *pmu; + + /* generic hw/cache events table */ + const int *hw_events; + const int (*cache_events)[PERF_COUNT_HW_CACHE_MAX] + [PERF_COUNT_HW_CACHE_OP_MAX] + [PERF_COUNT_HW_CACHE_RESULT_MAX]; + /* method used to map hw/cache events */ + int (*map_hw_event)(u64 config); + int (*map_cache_event)(u64 config); + + /* max generic hw events in map */ + int max_events; + /* number total counters, 2(base) + x(general) */ + int num_counters; + /* the width of the counter */ + int counter_width; + + /* vendor-defined PMU features */ + void *platform; + + irqreturn_t (*handle_irq)(int irq_num, void *dev); + int irq; +}; + #endif /* _ASM_RISCV_PERF_EVENT_H */ diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile index ffa439d4a364..f50d19816757 100644 --- a/arch/riscv/kernel/Makefile +++ b/arch/riscv/kernel/Makefile @@ -39,5 +39,6 @@ obj-$(CONFIG_MODULE_SECTIONS) += module-sections.o obj-$(CONFIG_FUNCTION_TRACER) += mcount.o obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += ftrace.o +obj-$(CONFIG_PERF_EVENTS) += perf_event.o clean: diff --git a/arch/riscv/kernel/perf_event.c b/arch/riscv/kernel/perf_event.c new file mode 100644 index 000000000000..ba3192afc470 --- /dev/null +++ b/arch/riscv/kernel/perf_event.c @@ -0,0 +1,482 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2008 Thomas Gleixner <tglx@linutronix.de> + * Copyright (C) 2008-2009 Red Hat, Inc., Ingo Molnar + * Copyright (C) 2009 Jaswinder Singh Rajput + * Copyright (C) 2009 Advanced Micro Devices, Inc., Robert Richter + * Copyright (C) 2008-2009 Red Hat, Inc., Peter Zijlstra + * Copyright (C) 2009 Intel Corporation, <markus.t.metzger@intel.com> + * Copyright (C) 2009 Google, Inc., Stephane Eranian + * Copyright 2014 Tilera Corporation. All Rights Reserved. + * Copyright (C) 2018 Andes Technology Corporation + * + * Perf_events support for RISC-V platforms. + * + * Since the spec. (as of now, Priv-Spec 1.10) does not provide enough + * functionality for perf event to fully work, this file provides + * the very basic framework only. + * + * For platform portings, please check Documentations/riscv/pmu.txt. + * + * The Copyright line includes x86 and tile ones. + */ + +#include <linux/kprobes.h> +#include <linux/kernel.h> +#include <linux/kdebug.h> +#include <linux/mutex.h> +#include <linux/bitmap.h> +#include <linux/irq.h> +#include <linux/interrupt.h> +#include <linux/perf_event.h> +#include <linux/atomic.h> +#include <linux/of.h> +#include <asm/perf_event.h> + +static const struct riscv_pmu *riscv_pmu __read_mostly; +static DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events); + +/* + * Hardware & cache maps and their methods + */ + +static const int riscv_hw_event_map[] = { + [PERF_COUNT_HW_CPU_CYCLES] = RISCV_PMU_CYCLE, + [PERF_COUNT_HW_INSTRUCTIONS] = RISCV_PMU_INSTRET, + [PERF_COUNT_HW_CACHE_REFERENCES] = RISCV_OP_UNSUPP, + [PERF_COUNT_HW_CACHE_MISSES] = RISCV_OP_UNSUPP, + [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = RISCV_OP_UNSUPP, + [PERF_COUNT_HW_BRANCH_MISSES] = RISCV_OP_UNSUPP, + [PERF_COUNT_HW_BUS_CYCLES] = RISCV_OP_UNSUPP, +}; + +#define C(x) PERF_COUNT_HW_CACHE_##x +static const int riscv_cache_event_map[PERF_COUNT_HW_CACHE_MAX] +[PERF_COUNT_HW_CACHE_OP_MAX] +[PERF_COUNT_HW_CACHE_RESULT_MAX] = { + [C(L1D)] = { + [C(OP_READ)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + [C(OP_WRITE)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + [C(OP_PREFETCH)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + }, + [C(L1I)] = { + [C(OP_READ)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + [C(OP_WRITE)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + [C(OP_PREFETCH)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + }, + [C(LL)] = { + [C(OP_READ)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + [C(OP_WRITE)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + [C(OP_PREFETCH)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + }, + [C(DTLB)] = { + [C(OP_READ)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + [C(OP_WRITE)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + [C(OP_PREFETCH)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + }, + [C(ITLB)] = { + [C(OP_READ)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + [C(OP_WRITE)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + [C(OP_PREFETCH)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + }, + [C(BPU)] = { + [C(OP_READ)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + [C(OP_WRITE)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + [C(OP_PREFETCH)] = { + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, + }, + }, +}; + +static int riscv_map_hw_event(u64 config) +{ + if (config >= riscv_pmu->max_events) + return -EINVAL; + + return riscv_pmu->hw_events[config]; +} + +int riscv_map_cache_decode(u64 config, unsigned int *type, + unsigned int *op, unsigned int *result) +{ + return -ENOENT; +} + +static int riscv_map_cache_event(u64 config) +{ + unsigned int type, op, result; + int err = -ENOENT; + int code; + + err = riscv_map_cache_decode(config, &type, &op, &result); + if (!riscv_pmu->cache_events || err) + return err; + + if (type >= PERF_COUNT_HW_CACHE_MAX || + op >= PERF_COUNT_HW_CACHE_OP_MAX || + result >= PERF_COUNT_HW_CACHE_RESULT_MAX) + return -EINVAL; + + code = (*riscv_pmu->cache_events)[type][op][result]; + if (code == RISCV_OP_UNSUPP) + return -EINVAL; + + return code; +} + +/* + * Low-level functions: reading/writing counters + */ + +static inline u64 read_counter(int idx) +{ + u64 val = 0; + + switch (idx) { + case RISCV_PMU_CYCLE: + val = csr_read(cycle); + break; + case RISCV_PMU_INSTRET: + val = csr_read(instret); + break; + default: + WARN_ON_ONCE(idx < 0 || idx > RISCV_MAX_COUNTERS); + return -EINVAL; + } + + return val; +} + +static inline void write_counter(int idx, u64 value) +{ + /* currently not supported */ + WARN_ON_ONCE(1); +} + +/* + * pmu->read: read and update the counter + * + * Other architectures' implementation often have a xxx_perf_event_update + * routine, which can return counter values when called in the IRQ, but + * return void when being called by the pmu->read method. + */ +static void riscv_pmu_read(struct perf_event *event) +{ + struct hw_perf_event *hwc = &event->hw; + u64 prev_raw_count, new_raw_count; + u64 oldval; + int idx = hwc->idx; + u64 delta; + + do { + prev_raw_count = local64_read(&hwc->prev_count); + new_raw_count = read_counter(idx); + + oldval = local64_cmpxchg(&hwc->prev_count, prev_raw_count, + new_raw_count); + } while (oldval != prev_raw_count); + + /* + * delta is the value to update the counter we maintain in the kernel. + */ + delta = (new_raw_count - prev_raw_count) & + ((1ULL << riscv_pmu->counter_width) - 1); + local64_add(delta, &event->count); + /* + * Something like local64_sub(delta, &hwc->period_left) here is + * needed if there is an interrupt for perf. + */ +} + +/* + * State transition functions: + * + * stop()/start() & add()/del() + */ + +/* + * pmu->stop: stop the counter + */ +static void riscv_pmu_stop(struct perf_event *event, int flags) +{ + struct hw_perf_event *hwc = &event->hw; + + WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED); + hwc->state |= PERF_HES_STOPPED; + + if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) { + riscv_pmu->pmu->read(event); + hwc->state |= PERF_HES_UPTODATE; + } +} + +/* + * pmu->start: start the event. + */ +static void riscv_pmu_start(struct perf_event *event, int flags) +{ + struct hw_perf_event *hwc = &event->hw; + + if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED))) + return; + + if (flags & PERF_EF_RELOAD) { + WARN_ON_ONCE(!(event->hw.state & PERF_HES_UPTODATE)); + + /* + * Set the counter to the period to the next interrupt here, + * if you have any. + */ + } + + hwc->state = 0; + perf_event_update_userpage(event); + + /* + * Since we cannot write to counters, this serves as an initialization + * to the delta-mechanism in pmu->read(); otherwise, the delta would be + * wrong when pmu->read is called for the first time. + */ + local64_set(&hwc->prev_count, read_counter(hwc->idx)); +} + +/* + * pmu->add: add the event to PMU. + */ +static int riscv_pmu_add(struct perf_event *event, int flags) +{ + struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); + struct hw_perf_event *hwc = &event->hw; + + if (cpuc->n_events == riscv_pmu->num_counters) + return -ENOSPC; + + /* + * We don't have general conunters, so no binding-event-to-counter + * process here. + * + * Indexing using hwc->config generally not works, since config may + * contain extra information, but here the only info we have in + * hwc->config is the event index. + */ + hwc->idx = hwc->config; + cpuc->events[hwc->idx] = event; + cpuc->n_events++; + + hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED; + + if (flags & PERF_EF_START) + riscv_pmu->pmu->start(event, PERF_EF_RELOAD); + + return 0; +} + +/* + * pmu->del: delete the event from PMU. + */ +static void riscv_pmu_del(struct perf_event *event, int flags) +{ + struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); + struct hw_perf_event *hwc = &event->hw; + + cpuc->events[hwc->idx] = NULL; + cpuc->n_events--; + riscv_pmu->pmu->stop(event, PERF_EF_UPDATE); + perf_event_update_userpage(event); +} + +/* + * Interrupt: a skeletion for reference. + */ + +static DEFINE_MUTEX(pmc_reserve_mutex); + +irqreturn_t riscv_base_pmu_handle_irq(int irq_num, void *dev) +{ + return IRQ_NONE; +} + +static int reserve_pmc_hardware(void) +{ + int err = 0; + + mutex_lock(&pmc_reserve_mutex); + if (riscv_pmu->irq >=0 && riscv_pmu->handle_irq) { + err = request_irq(riscv_pmu->irq, riscv_pmu->handle_irq, + IRQF_PERCPU, "riscv-base-perf", NULL); + } + mutex_unlock(&pmc_reserve_mutex); + + return err; +} + +void release_pmc_hardware(void) +{ + mutex_lock(&pmc_reserve_mutex); + if (riscv_pmu->irq >=0) { + free_irq(riscv_pmu->irq, NULL); + } + mutex_unlock(&pmc_reserve_mutex); +} + +/* + * Event Initialization/Finalization + */ + +static atomic_t riscv_active_events = ATOMIC_INIT(0); + +static void riscv_event_destroy(struct perf_event *event) +{ + if (atomic_dec_return(&riscv_active_events) == 0) + release_pmc_hardware(); +} + +static int riscv_event_init(struct perf_event *event) +{ + struct perf_event_attr *attr = &event->attr; + struct hw_perf_event *hwc = &event->hw; + int err; + int code; + + if (atomic_inc_return(&riscv_active_events) == 1) { + err = reserve_pmc_hardware(); + + if (err) { + pr_warn("PMC hardware not available\n"); + atomic_dec(&riscv_active_events); + return -EBUSY; + } + } + + switch (event->attr.type) { + case PERF_TYPE_HARDWARE: + code = riscv_pmu->map_hw_event(attr->config); + break; + case PERF_TYPE_HW_CACHE: + code = riscv_pmu->map_cache_event(attr->config); + break; + case PERF_TYPE_RAW: + return -EOPNOTSUPP; + default: + return -ENOENT; + } + + event->destroy = riscv_event_destroy; + if (code < 0) { + event->destroy(event); + return code; + } + + /* + * idx is set to -1 because the index of a general event should not be + * decided until binding to some counter in pmu->add(). + * + * But since we don't have such support, later in pmu->add(), we just + * use hwc->config as the index instead. + */ + hwc->config = code; + hwc->idx = -1; + + return 0; +} + +/* + * Initialization + */ + +static struct pmu min_pmu = { + .name = "riscv-base", + .event_init = riscv_event_init, + .add = riscv_pmu_add, + .del = riscv_pmu_del, + .start = riscv_pmu_start, + .stop = riscv_pmu_stop, + .read = riscv_pmu_read, +}; + +static const struct riscv_pmu riscv_base_pmu = { + .pmu = &min_pmu, + .max_events = ARRAY_SIZE(riscv_hw_event_map), + .map_hw_event = riscv_map_hw_event, + .hw_events = riscv_hw_event_map, + .map_cache_event = riscv_map_cache_event, + .cache_events = &riscv_cache_event_map, + .counter_width = 63, + .num_counters = RISCV_BASE_COUNTERS + 0, + .handle_irq = &riscv_base_pmu_handle_irq, + + /* This means this PMU has no IRQ. */ + .irq = -1, +}; + +static const struct of_device_id riscv_pmu_of_ids[] = { + {.compatible = "riscv,base-pmu", .data = &riscv_base_pmu}, + { /* sentinel value */ } +}; + +int __init init_hw_perf_events(void) +{ + struct device_node *node = of_find_node_by_type(NULL, "pmu"); + const struct of_device_id *of_id; + + if (node && (of_id = of_match_node(riscv_pmu_of_ids, node))) + riscv_pmu = of_id->data; + else + riscv_pmu = &riscv_base_pmu; + + perf_pmu_register(riscv_pmu->pmu, "cpu", PERF_TYPE_RAW); + return 0; +} +arch_initcall(init_hw_perf_events); -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v4 1/2] perf: riscv: preliminary RISC-V support 2018-04-18 2:12 ` [PATCH v4 1/2] perf: riscv: preliminary RISC-V support Alan Kao @ 2018-04-19 19:46 ` Atish Patra 2018-04-19 23:24 ` Alan Kao 0 siblings, 1 reply; 5+ messages in thread From: Atish Patra @ 2018-04-19 19:46 UTC (permalink / raw) To: Alan Kao, Palmer Dabbelt, Albert Ou, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Alex Solomatnikov, Jonathan Corbet, linux-riscv@lists.infradead.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Nick Hu, Greentime Hu On 4/17/18 7:13 PM, Alan Kao wrote: > This patch provide a basic PMU, riscv_base_pmu, which supports two > general hardware event, instructions and cycles. Furthermore, this > PMU serves as a reference implementation to ease the portings in > the future. > > riscv_base_pmu should be able to run on any RISC-V machine that > conforms to the Priv-Spec. Note that the latest qemu model hasn't > fully support a proper behavior of Priv-Spec 1.10 yet, but work > around should be easy with very small fixes. Please check > https://github.com/riscv/riscv-qemu/pull/115 for future updates. > > Cc: Nick Hu <nickhu@andestech.com> > Cc: Greentime Hu <greentime@andestech.com> > Signed-off-by: Alan Kao <alankao@andestech.com> > --- > arch/riscv/Kconfig | 13 + > arch/riscv/include/asm/perf_event.h | 79 ++++- > arch/riscv/kernel/Makefile | 1 + > arch/riscv/kernel/perf_event.c | 482 ++++++++++++++++++++++++++++ > 4 files changed, 571 insertions(+), 4 deletions(-) > create mode 100644 arch/riscv/kernel/perf_event.c > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig > index c22ebe08e902..90d9c8e50377 100644 > --- a/arch/riscv/Kconfig > +++ b/arch/riscv/Kconfig > @@ -203,6 +203,19 @@ config RISCV_ISA_C > config RISCV_ISA_A > def_bool y > > +menu "supported PMU type" > + depends on PERF_EVENTS > + > +config RISCV_BASE_PMU > + bool "Base Performance Monitoring Unit" > + def_bool y > + help > + A base PMU that serves as a reference implementation and has limited > + feature of perf. It can run on any RISC-V machines so serves as the > + fallback, but this option can also be disable to reduce kernel size. > + > +endmenu > + > endmenu > > menu "Kernel type" > diff --git a/arch/riscv/include/asm/perf_event.h b/arch/riscv/include/asm/perf_event.h > index e13d2ff29e83..0e638a0c3feb 100644 > --- a/arch/riscv/include/asm/perf_event.h > +++ b/arch/riscv/include/asm/perf_event.h > @@ -1,13 +1,84 @@ > +/* SPDX-License-Identifier: GPL-2.0 */ > /* > * Copyright (C) 2018 SiFive > + * Copyright (C) 2018 Andes Technology Corporation > * > - * This program is free software; you can redistribute it and/or > - * modify it under the terms of the GNU General Public Licence > - * as published by the Free Software Foundation; either version > - * 2 of the Licence, or (at your option) any later version. > */ > > #ifndef _ASM_RISCV_PERF_EVENT_H > #define _ASM_RISCV_PERF_EVENT_H > > +#include <linux/perf_event.h> > +#include <linux/ptrace.h> > + > +#define RISCV_BASE_COUNTERS 2 > + > +/* > + * The RISCV_MAX_COUNTERS parameter should be specified. > + */ > + > +#ifdef CONFIG_RISCV_BASE_PMU > +#define RISCV_MAX_COUNTERS 2 > +#endif > + > +#ifndef RISCV_MAX_COUNTERS > +#error "Please provide a valid RISCV_MAX_COUNTERS for the PMU." > +#endif > + > +/* > + * These are the indexes of bits in counteren register *minus* 1, > + * except for cycle. It would be coherent if it can directly mapped > + * to counteren bit definition, but there is a *time* register at > + * counteren[1]. Per-cpu structure is scarce resource here. > + * > + * According to the spec, an implementation can support counter up to > + * mhpmcounter31, but many high-end processors has at most 6 general > + * PMCs, we give the definition to MHPMCOUNTER8 here. > + */ > +#define RISCV_PMU_CYCLE 0 > +#define RISCV_PMU_INSTRET 1 > +#define RISCV_PMU_MHPMCOUNTER3 2 > +#define RISCV_PMU_MHPMCOUNTER4 3 > +#define RISCV_PMU_MHPMCOUNTER5 4 > +#define RISCV_PMU_MHPMCOUNTER6 5 > +#define RISCV_PMU_MHPMCOUNTER7 6 > +#define RISCV_PMU_MHPMCOUNTER8 7 > + > +#define RISCV_OP_UNSUPP (-EOPNOTSUPP) > + > +struct cpu_hw_events { > + /* # currently enabled events*/ > + int n_events; > + /* currently enabled events */ > + struct perf_event *events[RISCV_MAX_COUNTERS]; > + /* vendor-defined PMU data */ > + void *platform; > +}; > + > +struct riscv_pmu { > + struct pmu *pmu; > + > + /* generic hw/cache events table */ > + const int *hw_events; > + const int (*cache_events)[PERF_COUNT_HW_CACHE_MAX] > + [PERF_COUNT_HW_CACHE_OP_MAX] > + [PERF_COUNT_HW_CACHE_RESULT_MAX]; > + /* method used to map hw/cache events */ > + int (*map_hw_event)(u64 config); > + int (*map_cache_event)(u64 config); > + > + /* max generic hw events in map */ > + int max_events; > + /* number total counters, 2(base) + x(general) */ > + int num_counters; > + /* the width of the counter */ > + int counter_width; > + > + /* vendor-defined PMU features */ > + void *platform; > + > + irqreturn_t (*handle_irq)(int irq_num, void *dev); > + int irq; > +}; > + > #endif /* _ASM_RISCV_PERF_EVENT_H */ > diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile > index ffa439d4a364..f50d19816757 100644 > --- a/arch/riscv/kernel/Makefile > +++ b/arch/riscv/kernel/Makefile > @@ -39,5 +39,6 @@ obj-$(CONFIG_MODULE_SECTIONS) += module-sections.o > obj-$(CONFIG_FUNCTION_TRACER) += mcount.o > obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o > obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += ftrace.o > +obj-$(CONFIG_PERF_EVENTS) += perf_event.o > > clean: > diff --git a/arch/riscv/kernel/perf_event.c b/arch/riscv/kernel/perf_event.c > new file mode 100644 > index 000000000000..ba3192afc470 > --- /dev/null > +++ b/arch/riscv/kernel/perf_event.c > @@ -0,0 +1,482 @@ > +/* SPDX-License-Identifier: GPL-2.0 */ > +/* > + * Copyright (C) 2008 Thomas Gleixner <tglx@linutronix.de> > + * Copyright (C) 2008-2009 Red Hat, Inc., Ingo Molnar > + * Copyright (C) 2009 Jaswinder Singh Rajput > + * Copyright (C) 2009 Advanced Micro Devices, Inc., Robert Richter > + * Copyright (C) 2008-2009 Red Hat, Inc., Peter Zijlstra > + * Copyright (C) 2009 Intel Corporation, <markus.t.metzger@intel.com> > + * Copyright (C) 2009 Google, Inc., Stephane Eranian > + * Copyright 2014 Tilera Corporation. All Rights Reserved. > + * Copyright (C) 2018 Andes Technology Corporation > + * > + * Perf_events support for RISC-V platforms. > + * > + * Since the spec. (as of now, Priv-Spec 1.10) does not provide enough > + * functionality for perf event to fully work, this file provides > + * the very basic framework only. > + * > + * For platform portings, please check Documentations/riscv/pmu.txt. > + * > + * The Copyright line includes x86 and tile ones. > + */ > + > +#include <linux/kprobes.h> > +#include <linux/kernel.h> > +#include <linux/kdebug.h> > +#include <linux/mutex.h> > +#include <linux/bitmap.h> > +#include <linux/irq.h> > +#include <linux/interrupt.h> > +#include <linux/perf_event.h> > +#include <linux/atomic.h> > +#include <linux/of.h> > +#include <asm/perf_event.h> > + > +static const struct riscv_pmu *riscv_pmu __read_mostly; > +static DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events); > + > +/* > + * Hardware & cache maps and their methods > + */ > + > +static const int riscv_hw_event_map[] = { > + [PERF_COUNT_HW_CPU_CYCLES] = RISCV_PMU_CYCLE, > + [PERF_COUNT_HW_INSTRUCTIONS] = RISCV_PMU_INSTRET, > + [PERF_COUNT_HW_CACHE_REFERENCES] = RISCV_OP_UNSUPP, > + [PERF_COUNT_HW_CACHE_MISSES] = RISCV_OP_UNSUPP, > + [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = RISCV_OP_UNSUPP, > + [PERF_COUNT_HW_BRANCH_MISSES] = RISCV_OP_UNSUPP, > + [PERF_COUNT_HW_BUS_CYCLES] = RISCV_OP_UNSUPP, > +}; > + > +#define C(x) PERF_COUNT_HW_CACHE_##x > +static const int riscv_cache_event_map[PERF_COUNT_HW_CACHE_MAX] > +[PERF_COUNT_HW_CACHE_OP_MAX] > +[PERF_COUNT_HW_CACHE_RESULT_MAX] = { > + [C(L1D)] = { > + [C(OP_READ)] = { > + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, > + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, > + }, > + [C(OP_WRITE)] = { > + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, > + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, > + }, > + [C(OP_PREFETCH)] = { > + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, > + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, > + }, > + }, > + [C(L1I)] = { > + [C(OP_READ)] = { > + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, > + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, > + }, > + [C(OP_WRITE)] = { > + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, > + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, > + }, > + [C(OP_PREFETCH)] = { > + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, > + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, > + }, > + }, > + [C(LL)] = { > + [C(OP_READ)] = { > + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, > + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, > + }, > + [C(OP_WRITE)] = { > + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, > + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, > + }, > + [C(OP_PREFETCH)] = { > + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, > + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, > + }, > + }, > + [C(DTLB)] = { > + [C(OP_READ)] = { > + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, > + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, > + }, > + [C(OP_WRITE)] = { > + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, > + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, > + }, > + [C(OP_PREFETCH)] = { > + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, > + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, > + }, > + }, > + [C(ITLB)] = { > + [C(OP_READ)] = { > + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, > + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, > + }, > + [C(OP_WRITE)] = { > + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, > + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, > + }, > + [C(OP_PREFETCH)] = { > + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, > + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, > + }, > + }, > + [C(BPU)] = { > + [C(OP_READ)] = { > + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, > + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, > + }, > + [C(OP_WRITE)] = { > + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, > + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, > + }, > + [C(OP_PREFETCH)] = { > + [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP, > + [C(RESULT_MISS)] = RISCV_OP_UNSUPP, > + }, > + }, > +}; > + > +static int riscv_map_hw_event(u64 config) > +{ > + if (config >= riscv_pmu->max_events) > + return -EINVAL; > + > + return riscv_pmu->hw_events[config]; > +} > + > +int riscv_map_cache_decode(u64 config, unsigned int *type, > + unsigned int *op, unsigned int *result) > +{ > + return -ENOENT; > +} > + > +static int riscv_map_cache_event(u64 config) > +{ > + unsigned int type, op, result; > + int err = -ENOENT; > + int code; > + > + err = riscv_map_cache_decode(config, &type, &op, &result); > + if (!riscv_pmu->cache_events || err) > + return err; > + > + if (type >= PERF_COUNT_HW_CACHE_MAX || > + op >= PERF_COUNT_HW_CACHE_OP_MAX || > + result >= PERF_COUNT_HW_CACHE_RESULT_MAX) > + return -EINVAL; > + > + code = (*riscv_pmu->cache_events)[type][op][result]; > + if (code == RISCV_OP_UNSUPP) > + return -EINVAL; > + > + return code; > +} > + > +/* > + * Low-level functions: reading/writing counters > + */ > + > +static inline u64 read_counter(int idx) > +{ > + u64 val = 0; > + > + switch (idx) { > + case RISCV_PMU_CYCLE: > + val = csr_read(cycle); > + break; > + case RISCV_PMU_INSTRET: > + val = csr_read(instret); > + break; > + default: > + WARN_ON_ONCE(idx < 0 || idx > RISCV_MAX_COUNTERS); > + return -EINVAL; > + } > + > + return val; > +} > + > +static inline void write_counter(int idx, u64 value) > +{ > + /* currently not supported */ > + WARN_ON_ONCE(1); > +} > + > +/* > + * pmu->read: read and update the counter > + * > + * Other architectures' implementation often have a xxx_perf_event_update > + * routine, which can return counter values when called in the IRQ, but > + * return void when being called by the pmu->read method. > + */ > +static void riscv_pmu_read(struct perf_event *event) > +{ > + struct hw_perf_event *hwc = &event->hw; > + u64 prev_raw_count, new_raw_count; > + u64 oldval; > + int idx = hwc->idx; > + u64 delta; > + > + do { > + prev_raw_count = local64_read(&hwc->prev_count); > + new_raw_count = read_counter(idx); > + > + oldval = local64_cmpxchg(&hwc->prev_count, prev_raw_count, > + new_raw_count); > + } while (oldval != prev_raw_count); > + > + /* > + * delta is the value to update the counter we maintain in the kernel. > + */ > + delta = (new_raw_count - prev_raw_count) & > + ((1ULL << riscv_pmu->counter_width) - 1); > + local64_add(delta, &event->count); > + /* > + * Something like local64_sub(delta, &hwc->period_left) here is > + * needed if there is an interrupt for perf. > + */ > +} > + > +/* > + * State transition functions: > + * > + * stop()/start() & add()/del() > + */ > + > +/* > + * pmu->stop: stop the counter > + */ > +static void riscv_pmu_stop(struct perf_event *event, int flags) > +{ > + struct hw_perf_event *hwc = &event->hw; > + > + WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED); > + hwc->state |= PERF_HES_STOPPED; > + > + if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) { > + riscv_pmu->pmu->read(event); > + hwc->state |= PERF_HES_UPTODATE; > + } > +} > + > +/* > + * pmu->start: start the event. > + */ > +static void riscv_pmu_start(struct perf_event *event, int flags) > +{ > + struct hw_perf_event *hwc = &event->hw; > + > + if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED))) > + return; > + > + if (flags & PERF_EF_RELOAD) { > + WARN_ON_ONCE(!(event->hw.state & PERF_HES_UPTODATE)); > + > + /* > + * Set the counter to the period to the next interrupt here, > + * if you have any. > + */ > + } > + > + hwc->state = 0; > + perf_event_update_userpage(event); > + > + /* > + * Since we cannot write to counters, this serves as an initialization > + * to the delta-mechanism in pmu->read(); otherwise, the delta would be > + * wrong when pmu->read is called for the first time. > + */ > + local64_set(&hwc->prev_count, read_counter(hwc->idx)); > +} > + > +/* > + * pmu->add: add the event to PMU. > + */ > +static int riscv_pmu_add(struct perf_event *event, int flags) > +{ > + struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); > + struct hw_perf_event *hwc = &event->hw; > + > + if (cpuc->n_events == riscv_pmu->num_counters) > + return -ENOSPC; > + > + /* > + * We don't have general conunters, so no binding-event-to-counter > + * process here. > + * > + * Indexing using hwc->config generally not works, since config may > + * contain extra information, but here the only info we have in > + * hwc->config is the event index. > + */ > + hwc->idx = hwc->config; > + cpuc->events[hwc->idx] = event; > + cpuc->n_events++; > + > + hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED; > + > + if (flags & PERF_EF_START) > + riscv_pmu->pmu->start(event, PERF_EF_RELOAD); > + > + return 0; > +} > + > +/* > + * pmu->del: delete the event from PMU. > + */ > +static void riscv_pmu_del(struct perf_event *event, int flags) > +{ > + struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); > + struct hw_perf_event *hwc = &event->hw; > + > + cpuc->events[hwc->idx] = NULL; > + cpuc->n_events--; > + riscv_pmu->pmu->stop(event, PERF_EF_UPDATE); > + perf_event_update_userpage(event); > +} > + > +/* > + * Interrupt: a skeletion for reference. > + */ > + > +static DEFINE_MUTEX(pmc_reserve_mutex); > + > +irqreturn_t riscv_base_pmu_handle_irq(int irq_num, void *dev) > +{ > + return IRQ_NONE; > +} > + > +static int reserve_pmc_hardware(void) > +{ > + int err = 0; > + > + mutex_lock(&pmc_reserve_mutex); > + if (riscv_pmu->irq >=0 && riscv_pmu->handle_irq) { > + err = request_irq(riscv_pmu->irq, riscv_pmu->handle_irq, > + IRQF_PERCPU, "riscv-base-perf", NULL); > + } > + mutex_unlock(&pmc_reserve_mutex); > + > + return err; > +} > + > +void release_pmc_hardware(void) > +{ > + mutex_lock(&pmc_reserve_mutex); > + if (riscv_pmu->irq >=0) { > + free_irq(riscv_pmu->irq, NULL); > + } > + mutex_unlock(&pmc_reserve_mutex); > +} > + > +/* > + * Event Initialization/Finalization > + */ > + > +static atomic_t riscv_active_events = ATOMIC_INIT(0); > + > +static void riscv_event_destroy(struct perf_event *event) > +{ > + if (atomic_dec_return(&riscv_active_events) == 0) > + release_pmc_hardware(); > +} > + > +static int riscv_event_init(struct perf_event *event) > +{ > + struct perf_event_attr *attr = &event->attr; > + struct hw_perf_event *hwc = &event->hw; > + int err; > + int code; > + > + if (atomic_inc_return(&riscv_active_events) == 1) { > + err = reserve_pmc_hardware(); > + > + if (err) { > + pr_warn("PMC hardware not available\n"); > + atomic_dec(&riscv_active_events); > + return -EBUSY; > + } > + } > + > + switch (event->attr.type) { > + case PERF_TYPE_HARDWARE: > + code = riscv_pmu->map_hw_event(attr->config); > + break; > + case PERF_TYPE_HW_CACHE: > + code = riscv_pmu->map_cache_event(attr->config); > + break; > + case PERF_TYPE_RAW: > + return -EOPNOTSUPP; > + default: > + return -ENOENT; > + } > + > + event->destroy = riscv_event_destroy; > + if (code < 0) { > + event->destroy(event); > + return code; > + } > + > + /* > + * idx is set to -1 because the index of a general event should not be > + * decided until binding to some counter in pmu->add(). > + * > + * But since we don't have such support, later in pmu->add(), we just > + * use hwc->config as the index instead. > + */ > + hwc->config = code; > + hwc->idx = -1; > + > + return 0; > +} > + > +/* > + * Initialization > + */ > + > +static struct pmu min_pmu = { > + .name = "riscv-base", > + .event_init = riscv_event_init, > + .add = riscv_pmu_add, > + .del = riscv_pmu_del, > + .start = riscv_pmu_start, > + .stop = riscv_pmu_stop, > + .read = riscv_pmu_read, > +}; > + > +static const struct riscv_pmu riscv_base_pmu = { > + .pmu = &min_pmu, > + .max_events = ARRAY_SIZE(riscv_hw_event_map), > + .map_hw_event = riscv_map_hw_event, > + .hw_events = riscv_hw_event_map, > + .map_cache_event = riscv_map_cache_event, > + .cache_events = &riscv_cache_event_map, > + .counter_width = 63, > + .num_counters = RISCV_BASE_COUNTERS + 0, > + .handle_irq = &riscv_base_pmu_handle_irq, > + > + /* This means this PMU has no IRQ. */ > + .irq = -1, > +}; > + > +static const struct of_device_id riscv_pmu_of_ids[] = { > + {.compatible = "riscv,base-pmu", .data = &riscv_base_pmu}, > + { /* sentinel value */ } > +}; > + > +int __init init_hw_perf_events(void) > +{ > + struct device_node *node = of_find_node_by_type(NULL, "pmu"); > + const struct of_device_id *of_id; > + > + if (node && (of_id = of_match_node(riscv_pmu_of_ids, node))) > + riscv_pmu = of_id->data; > + else > + riscv_pmu = &riscv_base_pmu; > + > + perf_pmu_register(riscv_pmu->pmu, "cpu", PERF_TYPE_RAW); > + return 0; > +} > +arch_initcall(init_hw_perf_events); > Some check patch errors. ERROR: spaces required around that '>=' (ctx:WxV) #517: FILE: arch/riscv/kernel/perf_event.c:356: + if (riscv_pmu->irq >=0 && riscv_pmu->handle_irq) { ^ ERROR: spaces required around that '>=' (ctx:WxV) #529: FILE: arch/riscv/kernel/perf_event.c:368: + if (riscv_pmu->irq >=0) { ^ WARNING: braces {} are not necessary for single statement blocks #529: FILE: arch/riscv/kernel/perf_event.c:368: + if (riscv_pmu->irq >=0) { + free_irq(riscv_pmu->irq, NULL); + } WARNING: DT compatible string "riscv,base-pmu" appears un-documented -- check ./Documentation/devicetree/bindings/ #626: FILE: arch/riscv/kernel/perf_event.c:465: + {.compatible = "riscv,base-pmu", .data = &riscv_base_pmu}, ERROR: trailing whitespace #634: FILE: arch/riscv/kernel/perf_event.c:473: +^I$ ERROR: do not use assignment in if condition #635: FILE: arch/riscv/kernel/perf_event.c:474: + if (node && (of_id = of_match_node(riscv_pmu_of_ids, node))) total: 4 errors, 3 warnings, 595 lines checked Regards, Atish -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v4 1/2] perf: riscv: preliminary RISC-V support 2018-04-19 19:46 ` Atish Patra @ 2018-04-19 23:24 ` Alan Kao 0 siblings, 0 replies; 5+ messages in thread From: Alan Kao @ 2018-04-19 23:24 UTC (permalink / raw) To: Atish Patra Cc: Palmer Dabbelt, Albert Ou, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Alex Solomatnikov, Jonathan Corbet, linux-riscv@lists.infradead.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, Nick Hu, Greentime Hu On Thu, Apr 19, 2018 at 12:46:24PM -0700, Atish Patra wrote: > On 4/17/18 7:13 PM, Alan Kao wrote: > >This patch provide a basic PMU, riscv_base_pmu, which supports two > >general hardware event, instructions and cycles. Furthermore, this > >PMU serves as a reference implementation to ease the portings in > >the future. > > > >riscv_base_pmu should be able to run on any RISC-V machine that > >conforms to the Priv-Spec. Note that the latest qemu model hasn't > >fully support a proper behavior of Priv-Spec 1.10 yet, but work > >around should be easy with very small fixes. Please check > >https://github.com/riscv/riscv-qemu/pull/115 for future updates. > > > >Cc: Nick Hu <nickhu@andestech.com> > >Cc: Greentime Hu <greentime@andestech.com> > >Signed-off-by: Alan Kao <alankao@andestech.com> > >--- > > arch/riscv/Kconfig | 13 + > > arch/riscv/include/asm/perf_event.h | 79 ++++- > > arch/riscv/kernel/Makefile | 1 + > > arch/riscv/kernel/perf_event.c | 482 ++++++++++++++++++++++++++++ > > 4 files changed, 571 insertions(+), 4 deletions(-) > > create mode 100644 arch/riscv/kernel/perf_event.c > > > >diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig > >index c22ebe08e902..90d9c8e50377 100644 > >--- a/arch/riscv/Kconfig > >+++ b/arch/riscv/Kconfig > Some check patch errors. > > ERROR: spaces required around that '>=' (ctx:WxV) > #517: FILE: arch/riscv/kernel/perf_event.c:356: > + if (riscv_pmu->irq >=0 && riscv_pmu->handle_irq) { > ^ > > ERROR: spaces required around that '>=' (ctx:WxV) > #529: FILE: arch/riscv/kernel/perf_event.c:368: > + if (riscv_pmu->irq >=0) { > ^ > > WARNING: braces {} are not necessary for single statement blocks > #529: FILE: arch/riscv/kernel/perf_event.c:368: > + if (riscv_pmu->irq >=0) { > + free_irq(riscv_pmu->irq, NULL); > + } > > WARNING: DT compatible string "riscv,base-pmu" appears un-documented -- > check ./Documentation/devicetree/bindings/ > #626: FILE: arch/riscv/kernel/perf_event.c:465: > + {.compatible = "riscv,base-pmu", .data = &riscv_base_pmu}, > > ERROR: trailing whitespace > #634: FILE: arch/riscv/kernel/perf_event.c:473: > +^I$ > > ERROR: do not use assignment in if condition > #635: FILE: arch/riscv/kernel/perf_event.c:474: > + if (node && (of_id = of_match_node(riscv_pmu_of_ids, node))) > > total: 4 errors, 3 warnings, 595 lines checked > > > Regards, > Atish Thanks for pointing this out. I happened to develop this patchset on a machine without the post-commit settings. A new version is ready. -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v4 2/2] perf: riscv: Add Document for Future Porting Guide 2018-04-18 2:12 [PATCH v4 0/2] perf: riscv: Preliminary Perf Event Support on RISC-V Alan Kao 2018-04-18 2:12 ` [PATCH v4 1/2] perf: riscv: preliminary RISC-V support Alan Kao @ 2018-04-18 2:12 ` Alan Kao 1 sibling, 0 replies; 5+ messages in thread From: Alan Kao @ 2018-04-18 2:12 UTC (permalink / raw) To: Palmer Dabbelt, Albert Ou, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Alex Solomatnikov, Jonathan Corbet, linux-riscv, linux-doc, linux-kernel Cc: Alan Kao, Nick Hu, Greentime Hu Reviewed-by: Alex Solomatnikov <sols@sifive.com> Cc: Nick Hu <nickhu@andestech.com> Cc: Greentime Hu <greentime@andestech.com> Signed-off-by: Alan Kao <alankao@andestech.com> --- Documentation/riscv/pmu.txt | 249 ++++++++++++++++++++++++++++++++++++ 1 file changed, 249 insertions(+) create mode 100644 Documentation/riscv/pmu.txt diff --git a/Documentation/riscv/pmu.txt b/Documentation/riscv/pmu.txt new file mode 100644 index 000000000000..b29f03a6d82f --- /dev/null +++ b/Documentation/riscv/pmu.txt @@ -0,0 +1,249 @@ +Supporting PMUs on RISC-V platforms +========================================== +Alan Kao <alankao@andestech.com>, Mar 2018 + +Introduction +------------ + +As of this writing, perf_event-related features mentioned in The RISC-V ISA +Privileged Version 1.10 are as follows: +(please check the manual for more details) + +* [m|s]counteren +* mcycle[h], cycle[h] +* minstret[h], instret[h] +* mhpeventx, mhpcounterx[h] + +With such function set only, porting perf would require a lot of work, due to +the lack of the following general architectural performance monitoring features: + +* Enabling/Disabling counters + Counters are just free-running all the time in our case. +* Interrupt caused by counter overflow + No such feature in the spec. +* Interrupt indicator + It is not possible to have many interrupt ports for all counters, so an + interrupt indicator is required for software to tell which counter has + just overflowed. +* Writing to counters + There will be an SBI to support this since the kernel cannot modify the + counters [1]. Alternatively, some vendor considers to implement + hardware-extension for M-S-U model machines to write counters directly. + +This document aims to provide developers a quick guide on supporting their +PMUs in the kernel. The following sections briefly explain perf' mechanism +and todos. + +You may check previous discussions here [1][2]. Also, it might be helpful +to check the appendix for related kernel structures. + + +1. Initialization +----------------- + +*riscv_pmu* is a global pointer of type *struct riscv_pmu*, which contains +various methods according to perf's internal convention and PMU-specific +parameters. One should declare such instance to represent the PMU. By default, +*riscv_pmu* points to a constant structure *riscv_base_pmu*, which has very +basic support to a baseline QEMU model. + +Then he/she can either assign the instance's pointer to *riscv_pmu* so that +the minimal and already-implemented logic can be leveraged, or invent his/her +own *riscv_init_platform_pmu* implementation. + +In other words, existing sources of *riscv_base_pmu* merely provide a +reference implementation. Developers can flexibly decide how many parts they +can leverage, and in the most extreme case, they can customize every function +according to their needs. + + +2. Event Initialization +----------------------- + +When a user launches a perf command to monitor some events, it is first +interpreted by the userspace perf tool into multiple *perf_event_open* +system calls, and then each of them calls to the body of *event_init* +member function that was assigned in the previous step. In *riscv_base_pmu*'s +case, it is *riscv_event_init*. + +The main purpose of this function is to translate the event provided by user +into bitmap, so that HW-related control registers or counters can directly be +manipulated. The translation is based on the mappings and methods provided in +*riscv_pmu*. + +Note that some features can be done in this stage as well: + +(1) interrupt setting, which is stated in the next section; +(2) privilege level setting (user space only, kernel space only, both); +(3) destructor setting. Normally it is sufficient to apply *riscv_destroy_event*; +(4) tweaks for non-sampling events, which will be utilized by functions such as +*perf_adjust_period*, usually something like the follows: + +if (!is_sampling_event(event)) { + hwc->sample_period = x86_pmu.max_period; + hwc->last_period = hwc->sample_period; + local64_set(&hwc->period_left, hwc->sample_period); +} + +In the case of *riscv_base_pmu*, only (3) is provided for now. + + +3. Interrupt +------------ + +3.1. Interrupt Initialization + +This often occurs at the beginning of the *event_init* method. In common +practice, this should be a code segment like + +int x86_reserve_hardware(void) +{ + int err = 0; + + if (!atomic_inc_not_zero(&pmc_refcount)) { + mutex_lock(&pmc_reserve_mutex); + if (atomic_read(&pmc_refcount) == 0) { + if (!reserve_pmc_hardware()) + err = -EBUSY; + else + reserve_ds_buffers(); + } + if (!err) + atomic_inc(&pmc_refcount); + mutex_unlock(&pmc_reserve_mutex); + } + + return err; +} + +And the magic is in *reserve_pmc_hardware*, which usually does atomic +operations to make implemented IRQ accessible from some global function pointer. +*release_pmc_hardware* serves the opposite purpose, and it is used in event +destructors mentioned in previous section. + +(Note: From the implementations in all the architectures, the *reserve/release* +pair are always IRQ settings, so the *pmc_hardware* seems somehow misleading. +It does NOT deal with the binding between an event and a physical counter, +which will be introduced in the next section.) + +3.2. IRQ Structure + +Basically, a IRQ runs the following pseudo code: + +for each hardware counter that triggered this overflow + + get the event of this counter + + // following two steps are defined as *read()*, + // check the section Reading/Writing Counters for details. + count the delta value since previous interrupt + update the event->count (# event occurs) by adding delta, and + event->hw.period_left by subtracting delta + + if the event overflows + sample data + set the counter appropriately for the next overflow + + if the event overflows again + too frequently, throttle this event + fi + fi + +end for + +However as of this writing, none of the RISC-V implementations have designed an +interrupt for perf, so the details are to be completed in the future. + +4. Reading/Writing Counters +--------------------------- + +They seem symmetric but perf treats them quite differently. For reading, there +is a *read* interface in *struct pmu*, but it serves more than just reading. +According to the context, the *read* function not only reads the content of the +counter (event->count), but also updates the left period to the next interrupt +(event->hw.period_left). + +But the core of perf does not need direct write to counters. Writing counters +is hidden behind the abstraction of 1) *pmu->start*, literally start counting so one +has to set the counter to a good value for the next interrupt; 2) inside the IRQ +it should set the counter to the same resonable value. + +Reading is not a problem in RISC-V but writing would need some effort, since +counters are not allowed to be written by S-mode. + + +5. add()/del()/start()/stop() +----------------------------- + +Basic idea: add()/del() adds/deletes events to/from a PMU, and start()/stop() +starts/stop the counter of some event in the PMU. All of them take the same +arguments: *struct perf_event *event* and *int flag*. + +Consider perf as a state machine, then you will find that these functions serve +as the state transition process between those states. +Three states (event->hw.state) are defined: + +* PERF_HES_STOPPED: the counter is stopped +* PERF_HES_UPTODATE: the event->count is up-to-date +* PERF_HES_ARCH: arch-dependent usage ... we don't need this for now + +A normal flow of these state transitions are as follows: + +* A user launches a perf event, resulting in calling to *event_init*. +* When being context-switched in, *add* is called by the perf core, with a flag + PERF_EF_START, which means that the event should be started after it is added. + At this stage, a general event is bound to a physical counter, if any. + The state changes to PERF_HES_STOPPED and PERF_HES_UPTODATE, because it is now + stopped, and the (software) event count does not need updating. +** *start* is then called, and the counter is enabled. + With flag PERF_EF_RELOAD, it writes an appropriate value to the counter (check + previous section for detail). + Nothing is written if the flag does not contain PERF_EF_RELOAD. + The state now is reset to none, because it is neither stopped nor updated + (the counting already started) +* When being context-switched out, *del* is called. It then checks out all the + events in the PMU and calls *stop* to update their counts. +** *stop* is called by *del* + and the perf core with flag PERF_EF_UPDATE, and it often shares the same + subroutine as *read* with the same logic. + The state changes to PERF_HES_STOPPED and PERF_HES_UPTODATE, again. + +** Life cycle of these two pairs: *add* and *del* are called repeatedly as + tasks switch in-and-out; *start* and *stop* is also called when the perf core + needs a quick stop-and-start, for instance, when the interrupt period is being + adjusted. + +Current implementation is sufficient for now and can be easily extended to +features in the future. + +A. Related Structures +--------------------- + +* struct pmu: include/linux/perf_event.h +* struct riscv_pmu: arch/riscv/include/asm/perf_event.h + + Both structures are designed to be read-only. + + *struct pmu* defines some function pointer interfaces, and most of them take +*struct perf_event* as a main argument, dealing with perf events according to +perf's internal state machine (check kernel/events/core.c for details). + + *struct riscv_pmu* defines PMU-specific parameters. The naming follows the +convention of all other architectures. + +* struct perf_event: include/linux/perf_event.h +* struct hw_perf_event + + The generic structure that represents perf events, and the hardware-related +details. + +* struct riscv_hw_events: arch/riscv/include/asm/perf_event.h + + The structure that holds the status of events, has two fixed members: +the number of events and the array of the events. + +References +---------- + +[1] https://github.com/riscv/riscv-linux/pull/124 +[2] https://groups.google.com/a/groups.riscv.org/forum/#!topic/sw-dev/f19TmCNP6yA -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2018-04-19 23:25 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-04-18 2:12 [PATCH v4 0/2] perf: riscv: Preliminary Perf Event Support on RISC-V Alan Kao 2018-04-18 2:12 ` [PATCH v4 1/2] perf: riscv: preliminary RISC-V support Alan Kao 2018-04-19 19:46 ` Atish Patra 2018-04-19 23:24 ` Alan Kao 2018-04-18 2:12 ` [PATCH v4 2/2] perf: riscv: Add Document for Future Porting Guide Alan Kao
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).