linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/7] perf: implement AMD IBS (v2)
@ 2010-05-19 21:20 Robert Richter
  2010-05-19 21:20 ` [PATCH 1/7] perf: introduce raw_type attribute to specify the type of a raw sample Robert Richter
                   ` (6 more replies)
  0 siblings, 7 replies; 27+ messages in thread
From: Robert Richter @ 2010-05-19 21:20 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Stephane Eranian, LKML

This is an updated version of my patch set that introduces AMD IBS for
perf.

Changes made:

* rebased to latest tip/perf/core
* dropped the approach using a model_spec flag
* introduced an attribute to specify a raw hardware event type
* renamed *_SIZE macros to *_REG_COUNT
* introduced AMD_IBS_EVENT_CONSTRAINT() macro
* simplified the ibs initialization code
* made the code cpu hotplug capable (using hotplug hooks)
* introduced ibs_map[] to better describe an ibs event, which
  simplifies the interfaces of the ibs functions
* implemented support for setting up events using the sample_period
  parameter
* fixed the irq statistics counter
* adjusted the raw sample size so that the buffer ends at a 64 bit
  boundary

See also the diff below.

-Robert

diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
index 9e70f20..73d680c 100644
--- a/arch/arm/kernel/perf_event.c
+++ b/arch/arm/kernel/perf_event.c
@@ -388,6 +388,11 @@ __hw_perf_event_init(struct perf_event *event)
 	} else if (PERF_TYPE_HW_CACHE == event->attr.type) {
 		mapping = armpmu_map_cache_event(event->attr.config);
 	} else if (PERF_TYPE_RAW == event->attr.type) {
+		if (event->attr.raw_type) {
+			pr_debug("invalid raw type %x\n",
+				 event->attr.raw_type);
+			return -EINVAL;
+		}
 		mapping = armpmu->raw_event(event->attr.config);
 	} else {
 		pr_debug("event type %x not supported\n", event->attr.type);
diff --git a/arch/powerpc/kernel/perf_event.c b/arch/powerpc/kernel/perf_event.c
index 0b1f0f2..c8fb3cf 100644
--- a/arch/powerpc/kernel/perf_event.c
+++ b/arch/powerpc/kernel/perf_event.c
@@ -1036,6 +1036,8 @@ const struct pmu *hw_perf_event_init(struct perf_event *event)
 			return ERR_PTR(err);
 		break;
 	case PERF_TYPE_RAW:
+		if (event->attr.raw_type)
+			return ERR_PTR(-EINVAL);
 		ev = event->attr.config;
 		break;
 	default:
@@ -1044,9 +1046,6 @@ const struct pmu *hw_perf_event_init(struct perf_event *event)
 	event->hw.config_base = ev;
 	event->hw.idx = 0;
 
-	if (attr->model_spec)
-		return ERR_PTR(-EOPNOTSUPP);
-
 	/*
 	 * If we are not running on a hypervisor, force the
 	 * exclude_hv bit to 0 so that we don't care what
diff --git a/arch/powerpc/kernel/perf_event_fsl_emb.c b/arch/powerpc/kernel/perf_event_fsl_emb.c
index 369872f..7547e96 100644
--- a/arch/powerpc/kernel/perf_event_fsl_emb.c
+++ b/arch/powerpc/kernel/perf_event_fsl_emb.c
@@ -452,6 +452,8 @@ const struct pmu *hw_perf_event_init(struct perf_event *event)
 		break;
 
 	case PERF_TYPE_RAW:
+		if (event->attr.raw_type)
+			return ERR_PTR(-EINVAL);
 		ev = event->attr.config;
 		break;
 
diff --git a/arch/sh/kernel/perf_event.c b/arch/sh/kernel/perf_event.c
index eef545a..482cf48 100644
--- a/arch/sh/kernel/perf_event.c
+++ b/arch/sh/kernel/perf_event.c
@@ -109,9 +109,6 @@ static int __hw_perf_event_init(struct perf_event *event)
 	if (!sh_pmu_initialized())
 		return -ENODEV;
 
-	if (attr->model_spec)
-		return -EOPNOTSUPP;
-
 	/*
 	 * All of the on-chip counters are "limited", in that they have
 	 * no interrupts, and are therefore unable to do sampling without
@@ -145,6 +142,8 @@ static int __hw_perf_event_init(struct perf_event *event)
 
 	switch (attr->type) {
 	case PERF_TYPE_RAW:
+		if (attr->raw_type)
+			return -EINVAL;
 		config = attr->config & sh_pmu->raw_event_mask;
 		break;
 	case PERF_TYPE_HW_CACHE:
diff --git a/arch/sparc/kernel/perf_event.c b/arch/sparc/kernel/perf_event.c
index b3ae28e..cf4ce26 100644
--- a/arch/sparc/kernel/perf_event.c
+++ b/arch/sparc/kernel/perf_event.c
@@ -1047,9 +1047,6 @@ static int __hw_perf_event_init(struct perf_event *event)
 	} else
 		return -EOPNOTSUPP;
 
-	if (attr->model_spec)
-		return -EOPNOTSUPP;
-
 	/* We save the enable bits in the config_base.  */
 	hwc->config_base = sparc_pmu->irq_bit;
 	if (!attr->exclude_user)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index a7e4aa5..8b9929f 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -113,7 +113,7 @@
 #define MSR_AMD64_IBSFETCHCTL		0xc0011030
 #define MSR_AMD64_IBSFETCHLINAD		0xc0011031
 #define MSR_AMD64_IBSFETCHPHYSAD	0xc0011032
-#define MSR_AMD64_IBSFETCH_SIZE		3
+#define MSR_AMD64_IBSFETCH_REG_COUNT	3
 #define MSR_AMD64_IBSOPCTL		0xc0011033
 #define MSR_AMD64_IBSOPRIP		0xc0011034
 #define MSR_AMD64_IBSOPDATA		0xc0011035
@@ -121,9 +121,9 @@
 #define MSR_AMD64_IBSOPDATA3		0xc0011037
 #define MSR_AMD64_IBSDCLINAD		0xc0011038
 #define MSR_AMD64_IBSDCPHYSAD		0xc0011039
-#define MSR_AMD64_IBSOP_SIZE		7
+#define MSR_AMD64_IBSOP_REG_COUNT	7
 #define MSR_AMD64_IBSCTL		0xc001103a
-#define MSR_AMD64_IBS_SIZE_MAX		MSR_AMD64_IBSOP_SIZE
+#define MSR_AMD64_IBS_REG_COUNT_MAX	MSR_AMD64_IBSOP_REG_COUNT
 
 /* Fam 10h MSRs */
 #define MSR_FAM10H_MMIO_CONF_BASE	0xc0010058
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index e787d01..dace4e2 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -44,14 +44,15 @@
 #define AMD64_RAW_EVENT_MASK		\
 	(X86_RAW_EVENT_MASK          |  \
 	 AMD64_EVENTSEL_EVENT)
+#define AMD64_NUM_COUNTERS				4
 
-#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_SEL		      0x3c
+#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_SEL		0x3c
 #define ARCH_PERFMON_UNHALTED_CORE_CYCLES_UMASK		(0x00 << 8)
-#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX			 0
+#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX		0
 #define ARCH_PERFMON_UNHALTED_CORE_CYCLES_PRESENT \
 		(1 << (ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX))
 
-#define ARCH_PERFMON_BRANCH_MISSES_RETIRED			 6
+#define ARCH_PERFMON_BRANCH_MISSES_RETIRED		6
 
 /*
  * Intel "Architectural Performance Monitoring" CPUID
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 3f3f0ed..fe7ba91 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -148,6 +148,9 @@ struct cpu_hw_events {
 #define INTEL_EVENT_CONSTRAINT(c, n)	\
 	EVENT_CONSTRAINT(c, n, ARCH_PERFMON_EVENTSEL_EVENT)
 
+#define AMD_IBS_EVENT_CONSTRAINT(idx)	\
+	__EVENT_CONSTRAINT(0, 1ULL << (idx), 0, AMD64_NUM_COUNTERS + 1)
+
 /*
  * Constraint on the Event code + UMask + fixed-mask
  *
@@ -186,24 +189,12 @@ union perf_capabilities {
 };
 
 /*
- * Model specific hardware events
- *
- * With the attr.model_spec bit set we can setup hardware events
- * others than generic performance counters. A special PMU 64 bit
- * config value can be passed through the perf_event interface. The
- * concept of PMU model-specific arguments was practiced already in
- * Perfmon2. The type of event (8 bits) is determinded from the config
- * value too, bit 32-39 are reserved for this.
+ * Raw hardware event types
  */
-#define MODEL_SPEC_TYPE_IBS_FETCH	0
-#define MODEL_SPEC_TYPE_IBS_OP		1
-
-#define MODEL_SPEC_TYPE_MASK		(0xFFULL << 32)
+#define PERF_RAW_IBS_FETCH	1
+#define PERF_RAW_IBS_OP		2
 
-static inline int get_model_spec_type(u64 config)
-{
-	return (config & MODEL_SPEC_TYPE_MASK) >> 32;
-}
+#define PERF_RAW_IBS_BASE	PERF_RAW_IBS_FETCH
 
 /*
  * struct x86_pmu - generic x86 pmu
@@ -405,15 +396,12 @@ static void release_pmc_hardware(void) {}
 
 static int reserve_ds_buffers(void);
 static void release_ds_buffers(void);
-static int reserve_ibs_hardware(void);
-static void release_ibs_hardware(void);
 
 static void hw_perf_event_destroy(struct perf_event *event)
 {
 	if (atomic_dec_and_mutex_lock(&active_events, &pmc_reserve_mutex)) {
 		release_pmc_hardware();
 		release_ds_buffers();
-		release_ibs_hardware();
 		mutex_unlock(&pmc_reserve_mutex);
 	}
 }
@@ -462,9 +450,6 @@ static int x86_setup_perfctr(struct perf_event *event)
 	struct hw_perf_event *hwc = &event->hw;
 	u64 config;
 
-	if (attr->model_spec)
-		return -EOPNOTSUPP;
-
 	if (!hwc->sample_period) {
 		hwc->sample_period = x86_pmu.max_period;
 		hwc->last_period = hwc->sample_period;
@@ -480,8 +465,11 @@ static int x86_setup_perfctr(struct perf_event *event)
 			return -EOPNOTSUPP;
 	}
 
-	if (attr->type == PERF_TYPE_RAW)
+	if (attr->type == PERF_TYPE_RAW) {
+		if (attr->raw_type)
+			return -EINVAL;
 		return 0;
+	}
 
 	if (attr->type == PERF_TYPE_HW_CACHE)
 		return set_ext_hw_attr(hwc, attr);
@@ -556,6 +544,8 @@ static int x86_pmu_hw_config(struct perf_event *event)
 	return x86_setup_perfctr(event);
 }
 
+static inline void init_ibs_nmi(void);
+
 /*
  * Setup the hardware configuration for a given attr_type
  */
@@ -577,13 +567,8 @@ static int __hw_perf_event_init(struct perf_event *event)
 				if (err)
 					release_pmc_hardware();
 			}
-			if (!err) {
-				err = reserve_ibs_hardware();
-				if (err) {
-					release_ds_buffers();
-					release_pmc_hardware();
-				}
-			}
+			if (!err)
+				init_ibs_nmi();
 		}
 		if (!err)
 			atomic_inc(&active_events);
@@ -1324,6 +1309,7 @@ static void __init pmu_check_apic(void)
 		return;
 
 	x86_pmu.apic = 0;
+	x86_pmu.ibs = 0;
 	pr_info("no APIC, boot with the \"lapic\" boot parameter to force-enable it.\n");
 	pr_info("no hardware sampling interrupt available.\n");
 }
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index 78b0b34..a083174 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -2,9 +2,42 @@
 
 #include <linux/pci.h>
 
+#define IBS_FETCH_MAP_IDX	(PERF_RAW_IBS_FETCH - PERF_RAW_IBS_BASE)
+#define IBS_OP_MAP_IDX		(PERF_RAW_IBS_OP - PERF_RAW_IBS_BASE)
+
 #define IBS_FETCH_CONFIG_MASK	(IBS_FETCH_RAND_EN | IBS_FETCH_MAX_CNT)
-#define IBS_OP_CONFIG_MASK	(IBS_OP_CNT_CTL | IBS_OP_MAX_CNT)
-#define AMD64_NUM_COUNTERS	4
+#define IBS_OP_CONFIG_MASK	IBS_OP_MAX_CNT
+
+struct ibs_map {
+	int idx;
+	u64 cnt_mask;
+	u64 sample_valid;
+	u64 enable;
+	u64 valid_mask;
+	unsigned int msr;
+	int reg_count;
+};
+
+static struct ibs_map ibs_map[] = {
+	[IBS_FETCH_MAP_IDX] = {
+		.idx		= X86_PMC_IDX_SPECIAL_IBS_FETCH,
+		.cnt_mask	= IBS_FETCH_MAX_CNT,
+		.sample_valid	= IBS_FETCH_VAL,
+		.enable		= IBS_FETCH_ENABLE,
+		.valid_mask	= IBS_FETCH_CONFIG_MASK,
+		.msr		= MSR_AMD64_IBSFETCHCTL,
+		.reg_count	= MSR_AMD64_IBSFETCH_REG_COUNT,
+	},
+	[IBS_OP_MAP_IDX] = {
+		.idx		= X86_PMC_IDX_SPECIAL_IBS_OP,
+		.cnt_mask	= IBS_OP_MAX_CNT,
+		.sample_valid	= IBS_OP_VAL,
+		.enable		= IBS_OP_ENABLE,
+		.valid_mask	= IBS_OP_CONFIG_MASK,
+		.msr		= MSR_AMD64_IBSOPCTL,
+		.reg_count	= MSR_AMD64_IBSOP_REG_COUNT,
+	},
+};
 
 static DEFINE_RAW_SPINLOCK(amd_nb_lock);
 
@@ -116,28 +149,22 @@ static const u64 amd_perfmon_event_map[] =
 
 /* IBS - apic initialization, taken from oprofile, should be unified */
 
-static u8 ibs_eilvt_off;
-
-static inline void apic_init_ibs_nmi_per_cpu(void *arg)
-{
-	ibs_eilvt_off = setup_APIC_eilvt_ibs(0, APIC_EILVT_MSG_NMI, 0);
-}
-
-static inline void apic_clear_ibs_nmi_per_cpu(void *arg)
-{
-	setup_APIC_eilvt_ibs(0, APIC_EILVT_MSG_FIX, 1);
-}
+/*
+ * Currently there is no early pci ecs access implemented, so this
+ * can't be put into amd_pmu_init(). For now we initialize it in
+ * __hw_perf_event_init().
+ */
 
-static int init_ibs_nmi(void)
+static int __init_ibs_nmi(void)
 {
 #define IBSCTL_LVTOFFSETVAL		(1 << 8)
 #define IBSCTL				0x1cc
 	struct pci_dev *cpu_cfg;
 	int nodes;
 	u32 value = 0;
+	u8 ibs_eilvt_off;
 
-	/* per CPU setup */
-	on_each_cpu(apic_init_ibs_nmi_per_cpu, NULL, 1);
+	ibs_eilvt_off = setup_APIC_eilvt_ibs(0, APIC_EILVT_MSG_FIX, 1);
 
 	nodes = 0;
 	cpu_cfg = NULL;
@@ -167,36 +194,36 @@ static int init_ibs_nmi(void)
 	return 0;
 }
 
-/* uninitialize the APIC for the IBS interrupts if needed */
-static void clear_ibs_nmi(void)
-{
-	on_each_cpu(apic_clear_ibs_nmi_per_cpu, NULL, 1);
-}
-
-#else
-
-static inline int init_ibs_nmi(void) { return 1; }
-static inline void clear_ibs_nmi(void) { }
-
-#endif
-
-static int reserve_ibs_hardware(void)
+static inline void init_ibs_nmi(void)
 {
 	if (!x86_pmu.ibs)
-		return 0;
-	if (init_ibs_nmi())
+		return;
+
+	if (__init_ibs_nmi())
 		/* something went wrong, disable ibs */
 		x86_pmu.ibs = 0;
-	return 0;
 }
 
-static void release_ibs_hardware(void)
+static inline void apic_init_ibs(void)
 {
-	if (!x86_pmu.ibs)
-		return;
-	clear_ibs_nmi();
+	if (x86_pmu.ibs)
+		setup_APIC_eilvt_ibs(0, APIC_EILVT_MSG_NMI, 0);
 }
 
+static inline void apic_clear_ibs(void)
+{
+	if (x86_pmu.ibs)
+		setup_APIC_eilvt_ibs(0, APIC_EILVT_MSG_FIX, 1);
+}
+
+#else
+
+static inline void init_ibs_nmi(void) { }
+static inline void apic_init_ibs(void) { }
+static inline void apic_clear_ibs(void) { }
+
+#endif
+
 static inline void amd_pmu_disable_ibs(void)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
@@ -233,45 +260,56 @@ static inline void amd_pmu_enable_ibs(void)
 
 static int amd_pmu_ibs_config(struct perf_event *event)
 {
-	int type;
+	int map_idx;
+	u64 max_cnt, config;
+	struct ibs_map *map;
 
 	if (!x86_pmu.ibs)
 		return -ENODEV;
 
-	if (event->hw.sample_period)
-		/*
-		 * The usage of the sample period attribute to
-		 * calculate the IBS max count value is not yet
-		 * supported, the max count must be in the raw config
-		 * value.
-		 */
-		return -ENOSYS;
-
 	if (event->attr.type != PERF_TYPE_RAW)
 		/* only raw sample types are supported */
 		return -EINVAL;
 
-	type = get_model_spec_type(event->attr.config);
-	switch (type) {
-	case MODEL_SPEC_TYPE_IBS_FETCH:
-		event->hw.config = IBS_FETCH_CONFIG_MASK & event->attr.config;
-		event->hw.idx = X86_PMC_IDX_SPECIAL_IBS_FETCH;
-		/*
-		 * dirty hack, needed for __x86_pmu_enable_event(), we
-		 * should better change event->hw.config_base into
-		 * event->hw.config_msr that already includes the index
-		 */
-		event->hw.config_base = MSR_AMD64_IBSFETCHCTL - event->hw.idx;
-		break;
-	case MODEL_SPEC_TYPE_IBS_OP:
-		event->hw.config = IBS_OP_CONFIG_MASK & event->attr.config;
-		event->hw.idx = X86_PMC_IDX_SPECIAL_IBS_OP;
-		event->hw.config_base = MSR_AMD64_IBSOPCTL - event->hw.idx;
-		break;
-	default:
+	if (event->attr.raw_type < PERF_RAW_IBS_BASE)
+		return -ENODEV;
+	map_idx = event->attr.raw_type - PERF_RAW_IBS_BASE;
+	if (map_idx >= ARRAY_SIZE(ibs_map))
 		return -ENODEV;
+
+	map = &ibs_map[map_idx];
+	config = event->attr.config;
+	if (event->hw.sample_period) {
+		if (config & map->cnt_mask)
+			/* raw max_cnt may not be set */
+			return -EINVAL;
+		if (event->hw.sample_period & 0x0f)
+			/* lower 4 bits can not be set in ibs max cnt */
+			return -EINVAL;
+		max_cnt = event->hw.sample_period >> 4;
+		if (max_cnt & ~map->cnt_mask)
+			/* out of range */
+			return -EINVAL;
+		config |= max_cnt;
+	} else {
+		max_cnt = event->attr.config & map->cnt_mask;
 	}
 
+	if (!max_cnt)
+		return -EINVAL;
+
+	if (config & ~map->valid_mask)
+		return -EINVAL;
+
+	event->hw.config = config;
+	event->hw.idx = map->idx;
+	/*
+	 * dirty hack, needed for __x86_pmu_enable_event(), we
+	 * should better change event->hw.config_base into
+	 * event->hw.config_msr that already includes the index
+	 */
+	event->hw.config_base = map->msr - event->hw.idx;
+
 	return 0;
 }
 
@@ -283,30 +321,33 @@ static inline void __amd_pmu_enable_ibs_event(struct hw_perf_event *hwc)
 		__x86_pmu_enable_event(hwc, IBS_OP_ENABLE);
 }
 
-static int amd_pmu_check_ibs(int idx, unsigned int msr, u64 valid,
-			     u64 reenable, int size, struct pt_regs *iregs)
+static int amd_pmu_check_ibs(struct pt_regs *iregs, int map_idx)
 {
+	struct ibs_map *map = &ibs_map[map_idx];
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
-	struct perf_event *event = cpuc->events[idx];
+	struct perf_event *event = cpuc->events[map->idx];
 	struct perf_sample_data data;
 	struct perf_raw_record raw;
 	struct pt_regs regs;
-	u64 buffer[MSR_AMD64_IBS_SIZE_MAX];
-	u64 *buf = buffer;
+	u64 buffer[MSR_AMD64_IBS_REG_COUNT_MAX];
 	int i;
+	unsigned int msr;
+	u64 *buf;
 
-	if (!test_bit(idx, cpuc->active_mask))
+	if (!test_bit(map->idx, cpuc->active_mask))
 		return 0;
 
+	msr = map->msr;
+	buf = buffer;
 	rdmsrl(msr++, *buf);
-	if (!(*buf++ & valid))
+	if (!(*buf++ & map->sample_valid))
 		return 0;
 
 	perf_sample_data_init(&data, 0);
 	if (event->attr.sample_type & PERF_SAMPLE_RAW) {
-		for (i = 1; i < size; i++)
+		for (i = 1; i < map->reg_count; i++)
 			rdmsrl(msr++, *buf++);
-		raw.size = sizeof(u64) * size;
+		raw.size = sizeof(u32) + sizeof(u64) * map->reg_count;
 		raw.data = buffer;
 		data.raw = &raw;
 	}
@@ -316,7 +357,7 @@ static int amd_pmu_check_ibs(int idx, unsigned int msr, u64 valid,
 	if (perf_event_overflow(event, 1, &data, &regs))
 		x86_pmu_stop(event);
 	else
-		__x86_pmu_enable_event(&event->hw, reenable);
+		__x86_pmu_enable_event(&event->hw, map->enable);
 
 	return 1;
 }
@@ -331,16 +372,9 @@ static int amd_pmu_handle_irq(struct pt_regs *regs)
 		return handled;
 
 	handled2 = 0;
-	handled2 += amd_pmu_check_ibs(X86_PMC_IDX_SPECIAL_IBS_FETCH,
-				      MSR_AMD64_IBSFETCHCTL, IBS_FETCH_VAL,
-				      IBS_FETCH_ENABLE, MSR_AMD64_IBSFETCH_SIZE,
-				      regs);
-	handled2 += amd_pmu_check_ibs(X86_PMC_IDX_SPECIAL_IBS_OP,
-				      MSR_AMD64_IBSOPCTL, IBS_OP_VAL,
-				      IBS_OP_ENABLE, MSR_AMD64_IBSOP_SIZE,
-				      regs);
-
-	if (handled2)
+	handled2 += amd_pmu_check_ibs(regs, IBS_FETCH_MAP_IDX);
+	handled2 += amd_pmu_check_ibs(regs, IBS_OP_MAP_IDX);
+	if (!handled && handled2)
 		inc_irq_stat(apic_perf_irqs);
 
 	return (handled || handled2);
@@ -381,7 +415,7 @@ static int amd_pmu_hw_config(struct perf_event *event)
 {
 	int ret;
 
-	if (event->attr.model_spec)
+	if (event->attr.raw_type)
 		return amd_pmu_ibs_config(event);
 
 	ret = x86_pmu_hw_config(event);
@@ -392,6 +426,9 @@ static int amd_pmu_hw_config(struct perf_event *event)
 	if (event->attr.type != PERF_TYPE_RAW)
 		return 0;
 
+	if (event->attr.raw_type)
+		return -EINVAL;
+
 	event->hw.config |= event->attr.config & AMD64_RAW_EVENT_MASK;
 
 	return 0;
@@ -407,10 +444,8 @@ static struct event_constraint amd_event_constraints[] =
 	 * than in the unconstrainted case to process ibs after the
 	 * generic counters in x86_schedule_events().
 	 */
-	__EVENT_CONSTRAINT(0, 1ULL << X86_PMC_IDX_SPECIAL_IBS_FETCH, 0,
-			   AMD64_NUM_COUNTERS + 1),
-	__EVENT_CONSTRAINT(0, 1ULL << X86_PMC_IDX_SPECIAL_IBS_OP, 0,
-			   AMD64_NUM_COUNTERS + 1),
+	AMD_IBS_EVENT_CONSTRAINT(X86_PMC_IDX_SPECIAL_IBS_FETCH),
+	AMD_IBS_EVENT_CONSTRAINT(X86_PMC_IDX_SPECIAL_IBS_OP),
 	EVENT_CONSTRAINT_END
 };
 
@@ -644,6 +679,8 @@ static void amd_pmu_cpu_starting(int cpu)
 	cpuc->amd_nb->refcnt++;
 
 	raw_spin_unlock(&amd_nb_lock);
+
+	apic_init_ibs();
 }
 
 static void amd_pmu_cpu_dead(int cpu)
@@ -667,6 +704,8 @@ static void amd_pmu_cpu_dead(int cpu)
 	}
 
 	raw_spin_unlock(&amd_nb_lock);
+
+	apic_clear_ibs();
 }
 
 static __initconst const struct x86_pmu amd_pmu = {
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 8a16205..dfbbe69 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -770,6 +770,9 @@ static int intel_pmu_hw_config(struct perf_event *event)
 	if (event->attr.type != PERF_TYPE_RAW)
 		return 0;
 
+	if (event->attr.raw_type)
+		return -EINVAL;
+
 	if (!(event->attr.config & ARCH_PERFMON_EVENTSEL_ANY))
 		return 0;
 
diff --git a/arch/x86/kernel/cpu/perf_event_p4.c b/arch/x86/kernel/cpu/perf_event_p4.c
index 87e1803..1001892 100644
--- a/arch/x86/kernel/cpu/perf_event_p4.c
+++ b/arch/x86/kernel/cpu/perf_event_p4.c
@@ -437,6 +437,11 @@ static int p4_hw_config(struct perf_event *event)
 		event->hw.config = p4_set_ht_bit(event->hw.config);
 
 	if (event->attr.type == PERF_TYPE_RAW) {
+		/* only raw perfctr config supported */
+		if (event->attr.raw_type) {
+			rc = -EINVAL;
+			goto out;
+		}
 
 		/* user data may have out-of-bound event index */
 		evnt = p4_config_unpack_event(event->attr.config);
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index b50f4cf..f9d2d5e 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -214,16 +214,18 @@ struct perf_event_attr {
 				 *  See also PERF_RECORD_MISC_EXACT_IP
 				 */
 				precise_ip     :  2, /* skid constraint       */
-				model_spec     :  1, /* model specific hw event */
 
-				__reserved_1   : 46;
+				__reserved_1   : 47;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
 		__u32		wakeup_watermark; /* bytes before wakeup   */
 	};
 
-	__u32			bp_type;
+	union {
+		__u32		bp_type;
+		__u32		raw_type;
+	};
 	__u64			bp_addr;
 	__u64			bp_len;
 };





* [PATCH 1/7] perf: introduce raw_type attribute to specify the type of a raw sample
  2010-05-19 21:20 [PATCH 0/7] perf: implement AMD IBS (v2) Robert Richter
@ 2010-05-19 21:20 ` Robert Richter
  2010-05-19 22:02   ` Corey Ashford
  2010-05-20  8:10   ` Stephane Eranian
  2010-05-19 21:20 ` [PATCH 2/7] perf, x86: introduce bit range for special pmu events Robert Richter
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 27+ messages in thread
From: Robert Richter @ 2010-05-19 21:20 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Stephane Eranian, LKML, Robert Richter

This patch introduces a method to specify the type of a raw sample.
This can be used to set up hardware events other than generic
performance counters by passing special config data to the pmu. The
config data can then be interpreted differently than for generic
events and thus be used for other purposes.

The raw_type attribute is an extension of the ABI. It reuses the
otherwise unused bp_type space for this. Generic performance counters
are set up by leaving the raw_type attribute null; special raw events
must thus have a type other than null.

Raw types can be defined as needed for cpu models or architectures.
To keep backward compatibility, all architectures must return an
error for an event with an unsupported raw_type other than null.

E.g., raw_type can be used to set up IBS on an AMD cpu. IBS has no
counterpart among the pmu features of other vendors or architectures;
the pmu must be set up with a special config value, and sample data
is passed back to userland in an IBS-specific format. An IBS event is
created by setting up a raw event and encoding the IBS type in
raw_type. The pmu then handles this raw event and passes raw sample
data back.

Raw type could be architecture specific, e.g. for x86:

enum perf_raw_type {
        PERF_RAW_PERFCTR                        = 0,
        PERF_RAW_IBS_FETCH                      = 1,
        PERF_RAW_IBS_OP                         = 2,

        PERF_RAW_MAX,
};

Null is the architecture's default, meaning for x86 a perfctr.

Maybe the raw type definition could also be part of the ABI with one
definition for all architectures.

To use raw events with perf, the raw event syntax could be suffixed by
the type (as for breakpoints):

   -e rNNN[:TYPE]

Example:

 perf record -e r186A:1          # ... meaning IBS fetch, cycle count 100000
 perf record -e r0:1 -c 100000   # ... the same

Or with named types:

 perf record -e r186A:IBS_FETCH ...
 perf record -e r0:IBS_FETCH -c 100000 ...

This solution has a number of advantages: a raw event type may be
specified without encoding the type in the config value, the attr
flags are not 'polluted', and we can follow the already existing
breakpoint concept in syntax and encoding.

Signed-off-by: Robert Richter <robert.richter@amd.com>
---
 arch/arm/kernel/perf_event.c             |    5 +++++
 arch/powerpc/kernel/perf_event.c         |    2 ++
 arch/powerpc/kernel/perf_event_fsl_emb.c |    2 ++
 arch/sh/kernel/perf_event.c              |    2 ++
 arch/x86/kernel/cpu/perf_event.c         |    5 ++++-
 arch/x86/kernel/cpu/perf_event_amd.c     |    3 +++
 arch/x86/kernel/cpu/perf_event_intel.c   |    3 +++
 arch/x86/kernel/cpu/perf_event_p4.c      |    5 +++++
 include/linux/perf_event.h               |    5 ++++-
 9 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
index 9e70f20..73d680c 100644
--- a/arch/arm/kernel/perf_event.c
+++ b/arch/arm/kernel/perf_event.c
@@ -388,6 +388,11 @@ __hw_perf_event_init(struct perf_event *event)
 	} else if (PERF_TYPE_HW_CACHE == event->attr.type) {
 		mapping = armpmu_map_cache_event(event->attr.config);
 	} else if (PERF_TYPE_RAW == event->attr.type) {
+		if (event->attr.raw_type) {
+			pr_debug("invalid raw type %x\n",
+				 event->attr.raw_type);
+			return -EINVAL;
+		}
 		mapping = armpmu->raw_event(event->attr.config);
 	} else {
 		pr_debug("event type %x not supported\n", event->attr.type);
diff --git a/arch/powerpc/kernel/perf_event.c b/arch/powerpc/kernel/perf_event.c
index 43b83c3..c8fb3cf 100644
--- a/arch/powerpc/kernel/perf_event.c
+++ b/arch/powerpc/kernel/perf_event.c
@@ -1036,6 +1036,8 @@ const struct pmu *hw_perf_event_init(struct perf_event *event)
 			return ERR_PTR(err);
 		break;
 	case PERF_TYPE_RAW:
+		if (event->attr.raw_type)
+			return ERR_PTR(-EINVAL);
 		ev = event->attr.config;
 		break;
 	default:
diff --git a/arch/powerpc/kernel/perf_event_fsl_emb.c b/arch/powerpc/kernel/perf_event_fsl_emb.c
index 369872f..7547e96 100644
--- a/arch/powerpc/kernel/perf_event_fsl_emb.c
+++ b/arch/powerpc/kernel/perf_event_fsl_emb.c
@@ -452,6 +452,8 @@ const struct pmu *hw_perf_event_init(struct perf_event *event)
 		break;
 
 	case PERF_TYPE_RAW:
+		if (event->attr.raw_type)
+			return ERR_PTR(-EINVAL);
 		ev = event->attr.config;
 		break;
 
diff --git a/arch/sh/kernel/perf_event.c b/arch/sh/kernel/perf_event.c
index 81b6de4..482cf48 100644
--- a/arch/sh/kernel/perf_event.c
+++ b/arch/sh/kernel/perf_event.c
@@ -142,6 +142,8 @@ static int __hw_perf_event_init(struct perf_event *event)
 
 	switch (attr->type) {
 	case PERF_TYPE_RAW:
+		if (attr->raw_type)
+			return -EINVAL;
 		config = attr->config & sh_pmu->raw_event_mask;
 		break;
 	case PERF_TYPE_HW_CACHE:
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index fd4db0d..3539b53 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -449,8 +449,11 @@ static int x86_setup_perfctr(struct perf_event *event)
 			return -EOPNOTSUPP;
 	}
 
-	if (attr->type == PERF_TYPE_RAW)
+	if (attr->type == PERF_TYPE_RAW) {
+		if (attr->raw_type)
+			return -EINVAL;
 		return 0;
+	}
 
 	if (attr->type == PERF_TYPE_HW_CACHE)
 		return set_ext_hw_attr(hwc, attr);
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index 611df11..87e5ae4 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -121,6 +121,9 @@ static int amd_pmu_hw_config(struct perf_event *event)
 	if (event->attr.type != PERF_TYPE_RAW)
 		return 0;
 
+	if (event->attr.raw_type)
+		return -EINVAL;
+
 	event->hw.config |= event->attr.config & AMD64_RAW_EVENT_MASK;
 
 	return 0;
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index fdbc652..d15faf5 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -770,6 +770,9 @@ static int intel_pmu_hw_config(struct perf_event *event)
 	if (event->attr.type != PERF_TYPE_RAW)
 		return 0;
 
+	if (event->attr.raw_type)
+		return -EINVAL;
+
 	if (!(event->attr.config & ARCH_PERFMON_EVENTSEL_ANY))
 		return 0;
 
diff --git a/arch/x86/kernel/cpu/perf_event_p4.c b/arch/x86/kernel/cpu/perf_event_p4.c
index 87e1803..1001892 100644
--- a/arch/x86/kernel/cpu/perf_event_p4.c
+++ b/arch/x86/kernel/cpu/perf_event_p4.c
@@ -437,6 +437,11 @@ static int p4_hw_config(struct perf_event *event)
 		event->hw.config = p4_set_ht_bit(event->hw.config);
 
 	if (event->attr.type == PERF_TYPE_RAW) {
+		/* only raw perfctr config supported */
+		if (event->attr.raw_type) {
+			rc = -EINVAL;
+			goto out;
+		}
 
 		/* user data may have out-of-bound event index */
 		evnt = p4_config_unpack_event(event->attr.config);
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index fe50347..f9d2d5e 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -222,7 +222,10 @@ struct perf_event_attr {
 		__u32		wakeup_watermark; /* bytes before wakeup   */
 	};
 
-	__u32			bp_type;
+	union {
+		__u32		bp_type;
+		__u32		raw_type;
+	};
 	__u64			bp_addr;
 	__u64			bp_len;
 };
-- 
1.7.1




* [PATCH 2/7] perf, x86: introduce bit range for special pmu events
  2010-05-19 21:20 [PATCH 0/7] perf: implement AMD IBS (v2) Robert Richter
  2010-05-19 21:20 ` [PATCH 1/7] perf: introduce raw_type attribute to specify the type of a raw sample Robert Richter
@ 2010-05-19 21:20 ` Robert Richter
  2010-05-19 21:20 ` [PATCH 3/7] perf, x86: modify some code to allow the introduction of ibs events Robert Richter
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 27+ messages in thread
From: Robert Richter @ 2010-05-19 21:20 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Stephane Eranian, LKML, Robert Richter

There are some pmu events, such as Intel BTS or AMD IBS, that do not
fit in the generic or fixed performance counter scheme. The upper
bits of the 64 bit counter mask, starting at bit 48, are reserved for
such events and can be used to handle them. The events can be
identified by their index in the bit mask.

Signed-off-by: Robert Richter <robert.richter@amd.com>
---
 arch/x86/include/asm/perf_event.h         |    3 ++-
 arch/x86/kernel/cpu/perf_event.c          |    6 +++---
 arch/x86/kernel/cpu/perf_event_intel.c    |   10 +++++-----
 arch/x86/kernel/cpu/perf_event_intel_ds.c |    4 ++--
 4 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 254883d..7e51c75 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -10,6 +10,7 @@
 
 #define X86_PMC_IDX_GENERIC				        0
 #define X86_PMC_IDX_FIXED				       32
+#define X86_PMC_IDX_SPECIAL				       48
 #define X86_PMC_IDX_MAX					       64
 
 #define MSR_ARCH_PERFMON_PERFCTR0			      0xc1
@@ -107,7 +108,7 @@ union cpuid10_edx {
  * values are used by actual fixed events and higher values are used
  * to indicate other overflow conditions in the PERF_GLOBAL_STATUS msr.
  */
-#define X86_PMC_IDX_FIXED_BTS				(X86_PMC_IDX_FIXED + 16)
+#define X86_PMC_IDX_SPECIAL_BTS				(X86_PMC_IDX_SPECIAL + 0)
 
 /* IbsFetchCtl bits/masks */
 #define IBS_FETCH_RAND_EN		(1ULL<<57)
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 3539b53..75c0a44 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -283,7 +283,7 @@ x86_perf_event_update(struct perf_event *event)
 	int idx = hwc->idx;
 	s64 delta;
 
-	if (idx == X86_PMC_IDX_FIXED_BTS)
+	if (idx == X86_PMC_IDX_SPECIAL_BTS)
 		return 0;
 
 	/*
@@ -775,7 +775,7 @@ static inline void x86_assign_hw_event(struct perf_event *event,
 	hwc->last_cpu = smp_processor_id();
 	hwc->last_tag = ++cpuc->tags[i];
 
-	if (hwc->idx == X86_PMC_IDX_FIXED_BTS) {
+	if (hwc->idx == X86_PMC_IDX_SPECIAL_BTS) {
 		hwc->config_base = 0;
 		hwc->event_base	= 0;
 	} else if (hwc->idx >= X86_PMC_IDX_FIXED) {
@@ -891,7 +891,7 @@ x86_perf_event_set_period(struct perf_event *event)
 	s64 period = hwc->sample_period;
 	int ret = 0, idx = hwc->idx;
 
-	if (idx == X86_PMC_IDX_FIXED_BTS)
+	if (idx == X86_PMC_IDX_SPECIAL_BTS)
 		return 0;
 
 	/*
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index d15faf5..dfbbe69 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -458,7 +458,7 @@ static void intel_pmu_disable_all(void)
 
 	wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
 
-	if (test_bit(X86_PMC_IDX_FIXED_BTS, cpuc->active_mask))
+	if (test_bit(X86_PMC_IDX_SPECIAL_BTS, cpuc->active_mask))
 		intel_pmu_disable_bts();
 
 	intel_pmu_pebs_disable_all();
@@ -473,9 +473,9 @@ static void intel_pmu_enable_all(int added)
 	intel_pmu_lbr_enable_all();
 	wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, x86_pmu.intel_ctrl);
 
-	if (test_bit(X86_PMC_IDX_FIXED_BTS, cpuc->active_mask)) {
+	if (test_bit(X86_PMC_IDX_SPECIAL_BTS, cpuc->active_mask)) {
 		struct perf_event *event =
-			cpuc->events[X86_PMC_IDX_FIXED_BTS];
+			cpuc->events[X86_PMC_IDX_SPECIAL_BTS];
 
 		if (WARN_ON_ONCE(!event))
 			return;
@@ -550,7 +550,7 @@ static void intel_pmu_disable_event(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
 
-	if (unlikely(hwc->idx == X86_PMC_IDX_FIXED_BTS)) {
+	if (unlikely(hwc->idx == X86_PMC_IDX_SPECIAL_BTS)) {
 		intel_pmu_disable_bts();
 		intel_pmu_drain_bts_buffer();
 		return;
@@ -602,7 +602,7 @@ static void intel_pmu_enable_event(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
 
-	if (unlikely(hwc->idx == X86_PMC_IDX_FIXED_BTS)) {
+	if (unlikely(hwc->idx == X86_PMC_IDX_SPECIAL_BTS)) {
 		if (!__get_cpu_var(cpu_hw_events).enabled)
 			return;
 
diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 18018d1..bd46bbd 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -176,7 +176,7 @@ static int reserve_ds_buffers(void)
  */
 
 static struct event_constraint bts_constraint =
-	EVENT_CONSTRAINT(0, 1ULL << X86_PMC_IDX_FIXED_BTS, 0);
+	EVENT_CONSTRAINT(0, 1ULL << X86_PMC_IDX_SPECIAL_BTS, 0);
 
 static void intel_pmu_enable_bts(u64 config)
 {
@@ -223,7 +223,7 @@ static void intel_pmu_drain_bts_buffer(void)
 		u64	to;
 		u64	flags;
 	};
-	struct perf_event *event = cpuc->events[X86_PMC_IDX_FIXED_BTS];
+	struct perf_event *event = cpuc->events[X86_PMC_IDX_SPECIAL_BTS];
 	struct bts_record *at, *top;
 	struct perf_output_handle handle;
 	struct perf_event_header header;
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 3/7] perf, x86: modify some code to allow the introduction of ibs events
  2010-05-19 21:20 [PATCH 0/7] perf: implement AMD IBS (v2) Robert Richter
  2010-05-19 21:20 ` [PATCH 1/7] perf: introduce raw_type attribute to specify the type of a raw sample Robert Richter
  2010-05-19 21:20 ` [PATCH 2/7] perf, x86: introduce bit range for special pmu events Robert Richter
@ 2010-05-19 21:20 ` Robert Richter
  2010-05-19 21:20 ` [PATCH 4/7] perf, x86: implement IBS feature detection Robert Richter
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 27+ messages in thread
From: Robert Richter @ 2010-05-19 21:20 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Stephane Eranian, LKML, Robert Richter

These changes are needed to introduce IBS events, which are
implemented as special x86 events.

Signed-off-by: Robert Richter <robert.richter@amd.com>
---
 arch/x86/kernel/cpu/perf_event.c |   18 +++++++++---------
 1 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 75c0a44..e64502c 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -283,7 +283,7 @@ x86_perf_event_update(struct perf_event *event)
 	int idx = hwc->idx;
 	s64 delta;
 
-	if (idx == X86_PMC_IDX_SPECIAL_BTS)
+	if (idx >= X86_PMC_IDX_SPECIAL)
 		return 0;
 
 	/*
@@ -775,10 +775,10 @@ static inline void x86_assign_hw_event(struct perf_event *event,
 	hwc->last_cpu = smp_processor_id();
 	hwc->last_tag = ++cpuc->tags[i];
 
-	if (hwc->idx == X86_PMC_IDX_SPECIAL_BTS) {
-		hwc->config_base = 0;
-		hwc->event_base	= 0;
-	} else if (hwc->idx >= X86_PMC_IDX_FIXED) {
+	if (hwc->idx < X86_PMC_IDX_FIXED) {
+		hwc->config_base = x86_pmu.eventsel;
+		hwc->event_base  = x86_pmu.perfctr;
+	} else if (hwc->idx < X86_PMC_IDX_SPECIAL) {
 		hwc->config_base = MSR_ARCH_PERFMON_FIXED_CTR_CTRL;
 		/*
 		 * We set it so that event_base + idx in wrmsr/rdmsr maps to
@@ -786,9 +786,9 @@ static inline void x86_assign_hw_event(struct perf_event *event,
 		 */
 		hwc->event_base =
 			MSR_ARCH_PERFMON_FIXED_CTR0 - X86_PMC_IDX_FIXED;
-	} else {
-		hwc->config_base = x86_pmu.eventsel;
-		hwc->event_base  = x86_pmu.perfctr;
+	} else if (hwc->idx == X86_PMC_IDX_SPECIAL_BTS) {
+		hwc->config_base = 0;
+		hwc->event_base	= 0;
 	}
 }
 
@@ -891,7 +891,7 @@ x86_perf_event_set_period(struct perf_event *event)
 	s64 period = hwc->sample_period;
 	int ret = 0, idx = hwc->idx;
 
-	if (idx == X86_PMC_IDX_SPECIAL_BTS)
+	if (idx >= X86_PMC_IDX_SPECIAL_BTS)
 		return 0;
 
 	/*
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 4/7] perf, x86: implement IBS feature detection
  2010-05-19 21:20 [PATCH 0/7] perf: implement AMD IBS (v2) Robert Richter
                   ` (2 preceding siblings ...)
  2010-05-19 21:20 ` [PATCH 3/7] perf, x86: modify some code to allow the introduction of ibs events Robert Richter
@ 2010-05-19 21:20 ` Robert Richter
  2010-05-19 21:20 ` [PATCH 5/7] perf, x86: setup NMI handler for IBS Robert Richter
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 27+ messages in thread
From: Robert Richter @ 2010-05-19 21:20 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Stephane Eranian, LKML, Robert Richter

The new code checks whether IBS is available on the CPU. It implements
only basic detection, which should later be extended to read the IBS
CPUID feature flags.

Signed-off-by: Robert Richter <robert.richter@amd.com>
---
 arch/x86/kernel/cpu/perf_event.c     |    5 +++++
 arch/x86/kernel/cpu/perf_event_amd.c |    2 ++
 2 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index e64502c..e186d3b 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -244,6 +244,11 @@ struct x86_pmu {
 	 */
 	unsigned long	lbr_tos, lbr_from, lbr_to; /* MSR base regs       */
 	int		lbr_nr;			   /* hardware stack size */
+
+	/*
+	 * AMD IBS
+	 */
+	int		ibs;			/* cpuid flags */
 };
 
 static struct x86_pmu x86_pmu __read_mostly;
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index 87e5ae4..cda8475 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -405,6 +405,8 @@ static __init int amd_pmu_init(void)
 		return -ENODEV;
 
 	x86_pmu = amd_pmu;
+	if (boot_cpu_has(X86_FEATURE_IBS))
+		x86_pmu.ibs = 1;
 
 	/* Events are common for all AMDs */
 	memcpy(hw_cache_event_ids, amd_hw_cache_event_ids,
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 5/7] perf, x86: setup NMI handler for IBS
  2010-05-19 21:20 [PATCH 0/7] perf: implement AMD IBS (v2) Robert Richter
                   ` (3 preceding siblings ...)
  2010-05-19 21:20 ` [PATCH 4/7] perf, x86: implement IBS feature detection Robert Richter
@ 2010-05-19 21:20 ` Robert Richter
  2010-05-19 21:20 ` [PATCH 6/7] perf, x86: implement AMD IBS event configuration Robert Richter
  2010-05-19 21:20 ` [PATCH 7/7] perf, x86: implement the ibs interrupt handler Robert Richter
  6 siblings, 0 replies; 27+ messages in thread
From: Robert Richter @ 2010-05-19 21:20 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Stephane Eranian, LKML, Robert Richter

This implements the perf NMI handler for IBS interrupts. The code was
copied from oprofile and should eventually be unified with it.

Signed-off-by: Robert Richter <robert.richter@amd.com>
---
 arch/x86/kernel/cpu/perf_event.c     |    5 ++
 arch/x86/kernel/cpu/perf_event_amd.c |   85 ++++++++++++++++++++++++++++++++++
 2 files changed, 90 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index e186d3b..91c48b2 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -533,6 +533,8 @@ static int x86_pmu_hw_config(struct perf_event *event)
 	return x86_setup_perfctr(event);
 }
 
+static inline void init_ibs_nmi(void);
+
 /*
  * Setup the hardware configuration for a given attr_type
  */
@@ -554,6 +556,8 @@ static int __hw_perf_event_init(struct perf_event *event)
 				if (err)
 					release_pmc_hardware();
 			}
+			if (!err)
+				init_ibs_nmi();
 		}
 		if (!err)
 			atomic_inc(&active_events);
@@ -1294,6 +1298,7 @@ static void __init pmu_check_apic(void)
 		return;
 
 	x86_pmu.apic = 0;
+	x86_pmu.ibs = 0;
 	pr_info("no APIC, boot with the \"lapic\" boot parameter to force-enable it.\n");
 	pr_info("no hardware sampling interrupt available.\n");
 }
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index cda8475..4f6a73a 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -1,5 +1,7 @@
 #ifdef CONFIG_CPU_SUP_AMD
 
+#include <linux/pci.h>
+
 static DEFINE_RAW_SPINLOCK(amd_nb_lock);
 
 static __initconst const u64 amd_hw_cache_event_ids
@@ -106,6 +108,85 @@ static const u64 amd_perfmon_event_map[] =
   [PERF_COUNT_HW_BRANCH_MISSES]		= 0x00c5,
 };
 
+#ifdef CONFIG_X86_LOCAL_APIC
+
+/* IBS - apic initialization, taken from oprofile, should be unified */
+
+/*
+ * Currently there is no early pci ecs access implemented, so this
+ * can't be put into amd_pmu_init(). For now we initialize it in
+ * __hw_perf_event_init().
+ */
+
+static int __init_ibs_nmi(void)
+{
+#define IBSCTL_LVTOFFSETVAL		(1 << 8)
+#define IBSCTL				0x1cc
+	struct pci_dev *cpu_cfg;
+	int nodes;
+	u32 value = 0;
+	u8 ibs_eilvt_off;
+
+	ibs_eilvt_off = setup_APIC_eilvt_ibs(0, APIC_EILVT_MSG_FIX, 1);
+
+	nodes = 0;
+	cpu_cfg = NULL;
+	do {
+		cpu_cfg = pci_get_device(PCI_VENDOR_ID_AMD,
+					 PCI_DEVICE_ID_AMD_10H_NB_MISC,
+					 cpu_cfg);
+		if (!cpu_cfg)
+			break;
+		++nodes;
+		pci_write_config_dword(cpu_cfg, IBSCTL, ibs_eilvt_off
+				       | IBSCTL_LVTOFFSETVAL);
+		pci_read_config_dword(cpu_cfg, IBSCTL, &value);
+		if (value != (ibs_eilvt_off | IBSCTL_LVTOFFSETVAL)) {
+			pci_dev_put(cpu_cfg);
+			printk(KERN_DEBUG "Failed to setup IBS LVT offset, "
+				"IBSCTL = 0x%08x", value);
+			return 1;
+		}
+	} while (1);
+
+	if (!nodes) {
+		printk(KERN_DEBUG "No CPU node configured for IBS");
+		return 1;
+	}
+
+	return 0;
+}
+
+static inline void init_ibs_nmi(void)
+{
+	if (!x86_pmu.ibs)
+		return;
+
+	if (__init_ibs_nmi())
+		/* something went wrong, disable ibs */
+		x86_pmu.ibs = 0;
+}
+
+static inline void apic_init_ibs(void)
+{
+	if (x86_pmu.ibs)
+		setup_APIC_eilvt_ibs(0, APIC_EILVT_MSG_NMI, 0);
+}
+
+static inline void apic_clear_ibs(void)
+{
+	if (x86_pmu.ibs)
+		setup_APIC_eilvt_ibs(0, APIC_EILVT_MSG_FIX, 1);
+}
+
+#else
+
+static inline void init_ibs_nmi(void) { }
+static inline void apic_init_ibs(void) { }
+static inline void apic_clear_ibs(void) { }
+
+#endif
+
 static u64 amd_pmu_event_map(int hw_event)
 {
 	return amd_perfmon_event_map[hw_event];
@@ -346,6 +427,8 @@ static void amd_pmu_cpu_starting(int cpu)
 	cpuc->amd_nb->refcnt++;
 
 	raw_spin_unlock(&amd_nb_lock);
+
+	apic_init_ibs();
 }
 
 static void amd_pmu_cpu_dead(int cpu)
@@ -369,6 +452,8 @@ static void amd_pmu_cpu_dead(int cpu)
 	}
 
 	raw_spin_unlock(&amd_nb_lock);
+
+	apic_clear_ibs();
 }
 
 static __initconst const struct x86_pmu amd_pmu = {
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 6/7] perf, x86: implement AMD IBS event configuration
  2010-05-19 21:20 [PATCH 0/7] perf: implement AMD IBS (v2) Robert Richter
                   ` (4 preceding siblings ...)
  2010-05-19 21:20 ` [PATCH 5/7] perf, x86: setup NMI handler for IBS Robert Richter
@ 2010-05-19 21:20 ` Robert Richter
  2010-05-19 21:20 ` [PATCH 7/7] perf, x86: implement the ibs interrupt handler Robert Richter
  6 siblings, 0 replies; 27+ messages in thread
From: Robert Richter @ 2010-05-19 21:20 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Stephane Eranian, LKML, Robert Richter

This patch implements IBS for perf_event. It extends the AMD PMU to
handle IBS and implements special raw events for it.

An IBS hardware event is selected by setting the event's raw type.
There are two types of IBS events: instruction fetch (IBS_FETCH) and
instruction execution (IBS_OP). Both are implemented as special x86
events. Event allocation uses special event constraints for IBS, so
the generic event scheduler can be used to allocate IBS events.

Except for the sample period, IBS can only be set up with raw config
values and raw data samples. The event attributes for the syscall
should be programmed like this (IBS_FETCH):

        memset(&attr, 0, sizeof(attr));
        attr.type        = PERF_TYPE_RAW;
        attr.raw_type    = PERF_RAW_IBS_FETCH;
        attr.sample_type = PERF_SAMPLE_CPU | PERF_SAMPLE_RAW;
        attr.config      = IBS_FETCH_CONFIG_DEFAULT;

Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
---
 arch/x86/include/asm/perf_event.h    |   11 ++-
 arch/x86/kernel/cpu/perf_event.c     |   11 ++
 arch/x86/kernel/cpu/perf_event_amd.c |  194 +++++++++++++++++++++++++++++++++-
 3 files changed, 207 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 7e51c75..dace4e2 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -44,14 +44,15 @@
 #define AMD64_RAW_EVENT_MASK		\
 	(X86_RAW_EVENT_MASK          |  \
 	 AMD64_EVENTSEL_EVENT)
+#define AMD64_NUM_COUNTERS				4
 
-#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_SEL		      0x3c
+#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_SEL		0x3c
 #define ARCH_PERFMON_UNHALTED_CORE_CYCLES_UMASK		(0x00 << 8)
-#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX			 0
+#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX		0
 #define ARCH_PERFMON_UNHALTED_CORE_CYCLES_PRESENT \
 		(1 << (ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX))
 
-#define ARCH_PERFMON_BRANCH_MISSES_RETIRED			 6
+#define ARCH_PERFMON_BRANCH_MISSES_RETIRED		6
 
 /*
  * Intel "Architectural Performance Monitoring" CPUID
@@ -102,13 +103,15 @@ union cpuid10_edx {
 #define X86_PMC_IDX_FIXED_BUS_CYCLES			(X86_PMC_IDX_FIXED + 2)
 
 /*
- * We model BTS tracing as another fixed-mode PMC.
+ * Masks for special PMU features
  *
  * We choose a value in the middle of the fixed event range, since lower
  * values are used by actual fixed events and higher values are used
  * to indicate other overflow conditions in the PERF_GLOBAL_STATUS msr.
  */
 #define X86_PMC_IDX_SPECIAL_BTS				(X86_PMC_IDX_SPECIAL + 0)
+#define X86_PMC_IDX_SPECIAL_IBS_FETCH			(X86_PMC_IDX_SPECIAL + 1)
+#define X86_PMC_IDX_SPECIAL_IBS_OP			(X86_PMC_IDX_SPECIAL + 2)
 
 /* IbsFetchCtl bits/masks */
 #define IBS_FETCH_RAND_EN		(1ULL<<57)
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 91c48b2..fe7ba91 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -148,6 +148,9 @@ struct cpu_hw_events {
 #define INTEL_EVENT_CONSTRAINT(c, n)	\
 	EVENT_CONSTRAINT(c, n, ARCH_PERFMON_EVENTSEL_EVENT)
 
+#define AMD_IBS_EVENT_CONSTRAINT(idx)	\
+	__EVENT_CONSTRAINT(0, 1ULL << (idx), 0, AMD64_NUM_COUNTERS + 1)
+
 /*
  * Constraint on the Event code + UMask + fixed-mask
  *
@@ -186,6 +189,14 @@ union perf_capabilities {
 };
 
 /*
+ * Raw hardware event types
+ */
+#define PERF_RAW_IBS_FETCH	1
+#define PERF_RAW_IBS_OP		2
+
+#define PERF_RAW_IBS_BASE	PERF_RAW_IBS_FETCH
+
+/*
  * struct x86_pmu - generic x86 pmu
  */
 struct x86_pmu {
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index 4f6a73a..5161745 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -2,6 +2,34 @@
 
 #include <linux/pci.h>
 
+#define IBS_FETCH_MAP_IDX	(PERF_RAW_IBS_FETCH - PERF_RAW_IBS_BASE)
+#define IBS_OP_MAP_IDX		(PERF_RAW_IBS_OP - PERF_RAW_IBS_BASE)
+
+#define IBS_FETCH_CONFIG_MASK	(IBS_FETCH_RAND_EN | IBS_FETCH_MAX_CNT)
+#define IBS_OP_CONFIG_MASK	IBS_OP_MAX_CNT
+
+struct ibs_map {
+	int idx;
+	u64 cnt_mask;
+	u64 valid_mask;
+	unsigned int msr;
+};
+
+static struct ibs_map ibs_map[] = {
+	[IBS_FETCH_MAP_IDX] = {
+		.idx		= X86_PMC_IDX_SPECIAL_IBS_FETCH,
+		.cnt_mask	= IBS_FETCH_MAX_CNT,
+		.valid_mask	= IBS_FETCH_CONFIG_MASK,
+		.msr		= MSR_AMD64_IBSFETCHCTL,
+	},
+	[IBS_OP_MAP_IDX] = {
+		.idx		= X86_PMC_IDX_SPECIAL_IBS_OP,
+		.cnt_mask	= IBS_OP_MAX_CNT,
+		.valid_mask	= IBS_OP_CONFIG_MASK,
+		.msr		= MSR_AMD64_IBSOPCTL,
+	},
+};
+
 static DEFINE_RAW_SPINLOCK(amd_nb_lock);
 
 static __initconst const u64 amd_hw_cache_event_ids
@@ -187,6 +215,129 @@ static inline void apic_clear_ibs(void) { }
 
 #endif
 
+static inline void amd_pmu_disable_ibs(void)
+{
+	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+	u64 val;
+
+	if (test_bit(X86_PMC_IDX_SPECIAL_IBS_FETCH, cpuc->active_mask)) {
+		rdmsrl(MSR_AMD64_IBSFETCHCTL, val);
+		val &= ~IBS_FETCH_ENABLE;
+		wrmsrl(MSR_AMD64_IBSFETCHCTL, val);
+	}
+	if (test_bit(X86_PMC_IDX_SPECIAL_IBS_OP, cpuc->active_mask)) {
+		rdmsrl(MSR_AMD64_IBSOPCTL, val);
+		val &= ~IBS_OP_ENABLE;
+		wrmsrl(MSR_AMD64_IBSOPCTL, val);
+	}
+}
+
+static inline void amd_pmu_enable_ibs(void)
+{
+	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+	u64 val;
+
+	if (test_bit(X86_PMC_IDX_SPECIAL_IBS_FETCH, cpuc->active_mask)) {
+		rdmsrl(MSR_AMD64_IBSFETCHCTL, val);
+		val |= IBS_FETCH_ENABLE;
+		wrmsrl(MSR_AMD64_IBSFETCHCTL, val);
+	}
+	if (test_bit(X86_PMC_IDX_SPECIAL_IBS_OP, cpuc->active_mask)) {
+		rdmsrl(MSR_AMD64_IBSOPCTL, val);
+		val |= IBS_OP_ENABLE;
+		wrmsrl(MSR_AMD64_IBSOPCTL, val);
+	}
+}
+
+static int amd_pmu_ibs_config(struct perf_event *event)
+{
+	int map_idx;
+	u64 max_cnt, config;
+	struct ibs_map *map;
+
+	if (!x86_pmu.ibs)
+		return -ENODEV;
+
+	if (event->attr.type != PERF_TYPE_RAW)
+		/* only raw sample types are supported */
+		return -EINVAL;
+
+	if (event->attr.raw_type < PERF_RAW_IBS_BASE)
+		return -ENODEV;
+	map_idx = event->attr.raw_type - PERF_RAW_IBS_BASE;
+	if (map_idx >= ARRAY_SIZE(ibs_map))
+		return -ENODEV;
+
+	map = &ibs_map[map_idx];
+	config = event->attr.config;
+	if (event->hw.sample_period) {
+		if (config & map->cnt_mask)
+			/* raw max_cnt may not be set */
+			return -EINVAL;
+		if (event->hw.sample_period & 0x0f)
+			/* lower 4 bits can not be set in ibs max cnt */
+			return -EINVAL;
+		max_cnt = event->hw.sample_period >> 4;
+		if (max_cnt & ~map->cnt_mask)
+			/* out of range */
+			return -EINVAL;
+		config |= max_cnt;
+	} else {
+		max_cnt = event->attr.config & map->cnt_mask;
+	}
+
+	if (!max_cnt)
+		return -EINVAL;
+
+	if (config & ~map->valid_mask)
+		return -EINVAL;
+
+	event->hw.config = config;
+	event->hw.idx = map->idx;
+	/*
+	 * dirty hack, needed for __x86_pmu_enable_event(), we
+	 * should better change event->hw.config_base into
+	 * event->hw.config_msr that already includes the index
+	 */
+	event->hw.config_base = map->msr - event->hw.idx;
+
+	return 0;
+}
+
+static inline void __amd_pmu_enable_ibs_event(struct hw_perf_event *hwc)
+{
+	if (hwc->idx == X86_PMC_IDX_SPECIAL_IBS_FETCH)
+		__x86_pmu_enable_event(hwc, IBS_FETCH_ENABLE);
+	else if (hwc->idx == X86_PMC_IDX_SPECIAL_IBS_OP)
+		__x86_pmu_enable_event(hwc, IBS_OP_ENABLE);
+}
+
+static void amd_pmu_disable_all(void)
+{
+	x86_pmu_disable_all();
+	amd_pmu_disable_ibs();
+}
+
+static void amd_pmu_enable_all(int added)
+{
+	x86_pmu_enable_all(added);
+	amd_pmu_enable_ibs();
+}
+
+static void amd_pmu_enable_event(struct perf_event *event)
+{
+	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+	struct hw_perf_event *hwc = &event->hw;
+
+	if (!cpuc->enabled)
+		return;
+
+	if (hwc->idx < X86_PMC_IDX_SPECIAL)
+		__x86_pmu_enable_event(hwc, ARCH_PERFMON_EVENTSEL_ENABLE);
+	else
+		__amd_pmu_enable_ibs_event(hwc);
+}
+
 static u64 amd_pmu_event_map(int hw_event)
 {
 	return amd_perfmon_event_map[hw_event];
@@ -194,7 +345,12 @@ static u64 amd_pmu_event_map(int hw_event)
 
 static int amd_pmu_hw_config(struct perf_event *event)
 {
-	int ret = x86_pmu_hw_config(event);
+	int ret;
+
+	if (event->attr.raw_type)
+		return amd_pmu_ibs_config(event);
+
+	ret = x86_pmu_hw_config(event);
 
 	if (ret)
 		return ret;
@@ -211,6 +367,21 @@ static int amd_pmu_hw_config(struct perf_event *event)
 }
 
 /*
+ * AMD64 events - list of special events (IBS)
+ */
+static struct event_constraint amd_event_constraints[] =
+{
+	/*
+	 * The value for the weight of these constraints is higher
+	 * than in the unconstrainted case to process ibs after the
+	 * generic counters in x86_schedule_events().
+	 */
+	AMD_IBS_EVENT_CONSTRAINT(X86_PMC_IDX_SPECIAL_IBS_FETCH),
+	AMD_IBS_EVENT_CONSTRAINT(X86_PMC_IDX_SPECIAL_IBS_OP),
+	EVENT_CONSTRAINT_END
+};
+
+/*
  * AMD64 events are detected based on their event codes.
  */
 static inline int amd_is_nb_event(struct hw_perf_event *hwc)
@@ -296,10 +467,23 @@ amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
 	struct hw_perf_event *hwc = &event->hw;
 	struct amd_nb *nb = cpuc->amd_nb;
 	struct perf_event *old = NULL;
+	struct event_constraint *c;
 	int max = x86_pmu.num_counters;
 	int i, j, k = -1;
 
 	/*
+	 * The index of special events must be set in
+	 * hw_perf_event_init(). The constraints are used for resource
+	 * allocation by the event scheduler.
+	 */
+	if (hwc->idx >= X86_PMC_IDX_SPECIAL) {
+		for_each_event_constraint(c, amd_event_constraints) {
+			if (test_bit(hwc->idx, c->idxmsk))
+				return c;
+		}
+	}
+
+	/*
 	 * if not NB event or no NB, then no constraints
 	 */
 	if (!(amd_has_nb(cpuc) && amd_is_nb_event(hwc)))
@@ -459,9 +643,9 @@ static void amd_pmu_cpu_dead(int cpu)
 static __initconst const struct x86_pmu amd_pmu = {
 	.name			= "AMD",
 	.handle_irq		= x86_pmu_handle_irq,
-	.disable_all		= x86_pmu_disable_all,
-	.enable_all		= x86_pmu_enable_all,
-	.enable			= x86_pmu_enable_event,
+	.disable_all		= amd_pmu_disable_all,
+	.enable_all		= amd_pmu_enable_all,
+	.enable			= amd_pmu_enable_event,
 	.disable		= x86_pmu_disable_event,
 	.hw_config		= amd_pmu_hw_config,
 	.schedule_events	= x86_schedule_events,
@@ -469,7 +653,7 @@ static __initconst const struct x86_pmu amd_pmu = {
 	.perfctr		= MSR_K7_PERFCTR0,
 	.event_map		= amd_pmu_event_map,
 	.max_events		= ARRAY_SIZE(amd_perfmon_event_map),
-	.num_counters		= 4,
+	.num_counters		= AMD64_NUM_COUNTERS,
 	.cntval_bits		= 48,
 	.cntval_mask		= (1ULL << 48) - 1,
 	.apic			= 1,
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 7/7] perf, x86: implement the ibs interrupt handler
  2010-05-19 21:20 [PATCH 0/7] perf: implement AMD IBS (v2) Robert Richter
                   ` (5 preceding siblings ...)
  2010-05-19 21:20 ` [PATCH 6/7] perf, x86: implement AMD IBS event configuration Robert Richter
@ 2010-05-19 21:20 ` Robert Richter
  6 siblings, 0 replies; 27+ messages in thread
From: Robert Richter @ 2010-05-19 21:20 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Stephane Eranian, LKML, Robert Richter

This patch implements code to handle IBS interrupts. If IBS data is
available, a raw perf_event data sample is created and sent back to
userspace. This patch only implements the storage of IBS data in the
raw sample; a later patch could extend this by generating generic
event data, such as the rip, from the IBS sampling data.

Signed-off-by: Robert Richter <robert.richter@amd.com>
---
 arch/x86/include/asm/msr-index.h     |    3 +
 arch/x86/kernel/cpu/perf_event_amd.c |   70 +++++++++++++++++++++++++++++++++-
 2 files changed, 72 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index bc473ac..8b9929f 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -113,6 +113,7 @@
 #define MSR_AMD64_IBSFETCHCTL		0xc0011030
 #define MSR_AMD64_IBSFETCHLINAD		0xc0011031
 #define MSR_AMD64_IBSFETCHPHYSAD	0xc0011032
+#define MSR_AMD64_IBSFETCH_REG_COUNT	3
 #define MSR_AMD64_IBSOPCTL		0xc0011033
 #define MSR_AMD64_IBSOPRIP		0xc0011034
 #define MSR_AMD64_IBSOPDATA		0xc0011035
@@ -120,7 +121,9 @@
 #define MSR_AMD64_IBSOPDATA3		0xc0011037
 #define MSR_AMD64_IBSDCLINAD		0xc0011038
 #define MSR_AMD64_IBSDCPHYSAD		0xc0011039
+#define MSR_AMD64_IBSOP_REG_COUNT	7
 #define MSR_AMD64_IBSCTL		0xc001103a
+#define MSR_AMD64_IBS_REG_COUNT_MAX	MSR_AMD64_IBSOP_REG_COUNT
 
 /* Fam 10h MSRs */
 #define MSR_FAM10H_MMIO_CONF_BASE	0xc0010058
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index 5161745..a083174 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -11,22 +11,31 @@
 struct ibs_map {
 	int idx;
 	u64 cnt_mask;
+	u64 sample_valid;
+	u64 enable;
 	u64 valid_mask;
 	unsigned int msr;
+	int reg_count;
 };
 
 static struct ibs_map ibs_map[] = {
 	[IBS_FETCH_MAP_IDX] = {
 		.idx		= X86_PMC_IDX_SPECIAL_IBS_FETCH,
 		.cnt_mask	= IBS_FETCH_MAX_CNT,
+		.sample_valid	= IBS_FETCH_VAL,
+		.enable		= IBS_FETCH_ENABLE,
 		.valid_mask	= IBS_FETCH_CONFIG_MASK,
 		.msr		= MSR_AMD64_IBSFETCHCTL,
+		.reg_count	= MSR_AMD64_IBSFETCH_REG_COUNT,
 	},
 	[IBS_OP_MAP_IDX] = {
 		.idx		= X86_PMC_IDX_SPECIAL_IBS_OP,
 		.cnt_mask	= IBS_OP_MAX_CNT,
+		.sample_valid	= IBS_OP_VAL,
+		.enable		= IBS_OP_ENABLE,
 		.valid_mask	= IBS_OP_CONFIG_MASK,
 		.msr		= MSR_AMD64_IBSOPCTL,
+		.reg_count	= MSR_AMD64_IBSOP_REG_COUNT,
 	},
 };
 
@@ -312,6 +321,65 @@ static inline void __amd_pmu_enable_ibs_event(struct hw_perf_event *hwc)
 		__x86_pmu_enable_event(hwc, IBS_OP_ENABLE);
 }
 
+static int amd_pmu_check_ibs(struct pt_regs *iregs, int map_idx)
+{
+	struct ibs_map *map = &ibs_map[map_idx];
+	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+	struct perf_event *event = cpuc->events[map->idx];
+	struct perf_sample_data data;
+	struct perf_raw_record raw;
+	struct pt_regs regs;
+	u64 buffer[MSR_AMD64_IBS_REG_COUNT_MAX];
+	int i;
+	unsigned int msr;
+	u64 *buf;
+
+	if (!test_bit(map->idx, cpuc->active_mask))
+		return 0;
+
+	msr = map->msr;
+	buf = buffer;
+	rdmsrl(msr++, *buf);
+	if (!(*buf++ & map->sample_valid))
+		return 0;
+
+	perf_sample_data_init(&data, 0);
+	if (event->attr.sample_type & PERF_SAMPLE_RAW) {
+		for (i = 1; i < map->reg_count; i++)
+			rdmsrl(msr++, *buf++);
+		raw.size = sizeof(u32) + sizeof(u64) * map->reg_count;
+		raw.data = buffer;
+		data.raw = &raw;
+	}
+
+	regs = *iregs; /* later: update ip from ibs sample */
+
+	if (perf_event_overflow(event, 1, &data, &regs))
+		x86_pmu_stop(event);
+	else
+		__x86_pmu_enable_event(&event->hw, map->enable);
+
+	return 1;
+}
+
+static int amd_pmu_handle_irq(struct pt_regs *regs)
+{
+	int handled, handled2;
+
+	handled = x86_pmu_handle_irq(regs);
+
+	if (!x86_pmu.ibs)
+		return handled;
+
+	handled2 = 0;
+	handled2 += amd_pmu_check_ibs(regs, IBS_FETCH_MAP_IDX);
+	handled2 += amd_pmu_check_ibs(regs, IBS_OP_MAP_IDX);
+	if (!handled && handled2)
+		inc_irq_stat(apic_perf_irqs);
+
+	return (handled || handled2);
+}
+
 static void amd_pmu_disable_all(void)
 {
 	x86_pmu_disable_all();
@@ -642,7 +710,7 @@ static void amd_pmu_cpu_dead(int cpu)
 
 static __initconst const struct x86_pmu amd_pmu = {
 	.name			= "AMD",
-	.handle_irq		= x86_pmu_handle_irq,
+	.handle_irq		= amd_pmu_handle_irq,
 	.disable_all		= amd_pmu_disable_all,
 	.enable_all		= amd_pmu_enable_all,
 	.enable			= amd_pmu_enable_event,
-- 
1.7.1



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/7] perf: introduce raw_type attribute to specify the type of a raw sample
  2010-05-19 21:20 ` [PATCH 1/7] perf: introduce raw_type attribute to specify the type of a raw sample Robert Richter
@ 2010-05-19 22:02   ` Corey Ashford
  2010-05-20  6:51     ` Ingo Molnar
  2010-05-20 22:46     ` Robert Richter
  2010-05-20  8:10   ` Stephane Eranian
  1 sibling, 2 replies; 27+ messages in thread
From: Corey Ashford @ 2010-05-19 22:02 UTC (permalink / raw)
  To: Robert Richter
  Cc: Peter Zijlstra, Ingo Molnar, Stephane Eranian, LKML, Lin Ming

Hi Robert,

On 5/19/2010 2:20 PM, Robert Richter wrote:
> This patch introduces a method to specify the type of a raw sample.
> This can be used to setup hardware events other than generic
> performance counters by passing special config data to the pmu. The
> config data can be interpreted differently from generic events and thus
> can be used for other purposes.
> 
> The raw_type attribute is an extension of the ABI. It reuses the
> unused bp_type space for this. Generic performance counters can be
> setup by setting the raw_type attribute to null. Thus special raw
> events must have a type other than null.
> 
> Raw types can be defined as needed for cpu models or architectures.
> To keep backward compatibility all architectures must return an error
> for an event with a raw_type other than null that is not supported.
> 
> E.g., raw_type can be used to setup IBS on an AMD cpu. IBS is not
> common to pmu features from other vendors or architectures. The pmu
> must be setup with a special config value. Sample data is returned in
> a certain format back to the userland. An IBS event is created by
> setting a raw event and encoding the IBS type in raw_type. The pmu
> handles this raw event then and passes raw sample data back.
> 
> Raw type could be architecture specific, e.g. for x86:
> 
> enum perf_raw_type {
>         PERF_RAW_PERFCTR                        = 0,
>         PERF_RAW_IBS_FETCH                      = 1,
>         PERF_RAW_IBS_OP                         = 2,
> 
>         PERF_RAW_MAX,
> };
> 
> Null is the architecture's default, meaning for x86 a perfctr.
> 
> Maybe the raw type definition could also be part of the ABI with one
> definition for all architectures.
> 
> To use raw events with perf, the raw event syntax could be suffixed by
> the type (as for breakpoints):
> 
>    -e rNNN[:TYPE]
> 
> Example:
> 
>  perf record -e r186A:1          # ... meaning IBS fetch, cycle count 100000
>  perf record -e r0:1 -c 100000   # ... the same
> 
> Or with named types:
> 
>  perf record -e r186A:IBS_FETCH ...
>  perf record -e r0:IBS_FETCH -c 100000 ...

Should this raw value have been 186A0 instead of 186A?

Where is the named type translation coming from?  Is this something that needs to be hard-coded into perf?

Have you looked at Lin Ming's patch series?  I think it offers another way to support IBS and other arch-specific and off-chip PMUs in a more general way, though it's not quite fully-baked yet.

- Corey


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/7] perf: introduce raw_type attribute to specify the type of a raw sample
  2010-05-19 22:02   ` Corey Ashford
@ 2010-05-20  6:51     ` Ingo Molnar
  2010-05-20 23:06       ` Robert Richter
  2010-05-20 22:46     ` Robert Richter
  1 sibling, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2010-05-20  6:51 UTC (permalink / raw)
  To: Corey Ashford
  Cc: Robert Richter, Peter Zijlstra, Stephane Eranian, LKML, Lin Ming


* Corey Ashford <cjashfor@linux.vnet.ibm.com> wrote:

> [...]
> 
> Have you looked at Lin Ming's patch series?  I think it 
> offers another way to support IBS and other 
> arch-specific and off-chip PMUs in a more general way, 
> though it's not quite fully-baked yet.

Robert, check out this thread on lkml:

  [RFC][PATCH v2 06/11] perf: core, export pmus via sysfs

Here is Peter's post that describes the high level design:

  http://groups.google.com/group/linux.kernel/msg/ab9aa075016c639e

And please help out Lin Ming with this work - we don't
want to add new, special-purpose ABI extensions for
enumeration purposes; it should be introduced using a
new event_source node via the sysfs enumeration.

Whether IBS fits that category is an open question - 
ideally it should have similar high level usage as the 
PEBS/LBR bits on Intel CPUs. (at least as far as tooling 
goes)

Thanks,

	Ingo


* Re: [PATCH 1/7] perf: introduce raw_type attribute to specify the  type of a raw sample
  2010-05-19 21:20 ` [PATCH 1/7] perf: introduce raw_type attribute to specify the type of a raw sample Robert Richter
  2010-05-19 22:02   ` Corey Ashford
@ 2010-05-20  8:10   ` Stephane Eranian
  2010-05-20  9:23     ` Peter Zijlstra
  1 sibling, 1 reply; 27+ messages in thread
From: Stephane Eranian @ 2010-05-20  8:10 UTC (permalink / raw)
  To: Robert Richter; +Cc: Peter Zijlstra, Ingo Molnar, LKML

Robert,

I still don't understand why you need all of this to encode IBS.
I still believe that with attr.config there are plenty of bits to choose
from. I do understand the need for PERF_SAMPLE_RAW. I think
there is no other way.

You simply need to pick an encoding to mark the config as IBS. You
need two bits for this: 00 regular counters, 01 IBS Fetch, 10 IBS op.
Regular counters use 43 bits, IBS fetch uses 58, IBS op uses 52.
So you could use bits 62-63 for instance. You don't need to encode
the sampling period in attr.config for either IBS. You can use
attr.sample_period, so you free up 16 bits.

I understand that IBS may evolve and thus may use more bits. But
you still have at least 16 bits of margin.

Users and tools would rely on a library to provide the event encoding.
No need to come up with some raw hex number on the cmdline.

On Wed, May 19, 2010 at 11:20 PM, Robert Richter <robert.richter@amd.com> wrote:
> This patch introduces a method to specify the type of a raw sample.
> This can be used to set up hardware events other than generic
> performance counters by passing special config data to the pmu. The
> config data can be interpreted differently from that of generic
> events and thus can be used for other purposes.
>
> The raw_type attribute is an extension of the ABI. It reuses the
> unused bp_type space for this. Generic performance counters can be
> set up by setting the raw_type attribute to null. Thus special raw
> events must have a type other than null.
>
> Raw types can be defined as needed for cpu models or architectures.
> To keep backward compatibility all architectures must return an error
> for an event with a raw_type other than null that is not supported.
>
> E.g., raw_type can be used to set up IBS on an AMD cpu. IBS is not
> common to pmu features from other vendors or architectures. The pmu
> must be set up with a special config value. Sample data is returned
> back to userland in a certain format. An IBS event is created by
> setting up a raw event and encoding the IBS type in raw_type. The pmu
> then handles this raw event and passes raw sample data back.
>
> Raw type could be architecture specific, e.g. for x86:
>
> enum perf_raw_type {
>        PERF_RAW_PERFCTR                        = 0,
>        PERF_RAW_IBS_FETCH                      = 1,
>        PERF_RAW_IBS_OP                         = 2,
>
>        PERF_RAW_MAX,
> };
>
> Null is the architecture's default, meaning for x86 a perfctr.
>
> Maybe the raw type definition could also be part of the ABI with one
> definition for all architectures.
>
> To use raw events with perf, the raw event syntax could be suffixed by
> the type (as for breakpoints):
>
>   -e rNNN[:TYPE]
>
> Example:
>
>  perf record -e r186A:1          # ... meaning IBS fetch, cycle count 100000
>  perf record -e r0:1 -c 100000   # ... the same
>
> Or with named types:
>
>  perf record -e r186A:IBS_FETCH ...
>  perf record -e r0:IBS_FETCH -c 100000 ...
>
> This solution has a number of advantages: A raw event type may be
> specified without encoding the type in the config value. The attr
> flags are not 'polluted'. We can follow the already existing
> breakpoint concept in syntax and encoding.
>
> Signed-off-by: Robert Richter <robert.richter@amd.com>
> ---
>  arch/arm/kernel/perf_event.c             |    5 +++++
>  arch/powerpc/kernel/perf_event.c         |    2 ++
>  arch/powerpc/kernel/perf_event_fsl_emb.c |    2 ++
>  arch/sh/kernel/perf_event.c              |    2 ++
>  arch/x86/kernel/cpu/perf_event.c         |    5 ++++-
>  arch/x86/kernel/cpu/perf_event_amd.c     |    3 +++
>  arch/x86/kernel/cpu/perf_event_intel.c   |    3 +++
>  arch/x86/kernel/cpu/perf_event_p4.c      |    5 +++++
>  include/linux/perf_event.h               |    5 ++++-
>  9 files changed, 30 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
> index 9e70f20..73d680c 100644
> --- a/arch/arm/kernel/perf_event.c
> +++ b/arch/arm/kernel/perf_event.c
> @@ -388,6 +388,11 @@ __hw_perf_event_init(struct perf_event *event)
>        } else if (PERF_TYPE_HW_CACHE == event->attr.type) {
>                mapping = armpmu_map_cache_event(event->attr.config);
>        } else if (PERF_TYPE_RAW == event->attr.type) {
> +               if (event->attr.raw_type) {
> +                       pr_debug("invalid raw type %x\n",
> +                                event->attr.raw_type);
> +                       return -EINVAL;
> +               }
>                mapping = armpmu->raw_event(event->attr.config);
>        } else {
>                pr_debug("event type %x not supported\n", event->attr.type);
> diff --git a/arch/powerpc/kernel/perf_event.c b/arch/powerpc/kernel/perf_event.c
> index 43b83c3..c8fb3cf 100644
> --- a/arch/powerpc/kernel/perf_event.c
> +++ b/arch/powerpc/kernel/perf_event.c
> @@ -1036,6 +1036,8 @@ const struct pmu *hw_perf_event_init(struct perf_event *event)
>                        return ERR_PTR(err);
>                break;
>        case PERF_TYPE_RAW:
> +               if (event->attr.raw_type)
> +                       return ERR_PTR(-EINVAL);
>                ev = event->attr.config;
>                break;
>        default:
> diff --git a/arch/powerpc/kernel/perf_event_fsl_emb.c b/arch/powerpc/kernel/perf_event_fsl_emb.c
> index 369872f..7547e96 100644
> --- a/arch/powerpc/kernel/perf_event_fsl_emb.c
> +++ b/arch/powerpc/kernel/perf_event_fsl_emb.c
> @@ -452,6 +452,8 @@ const struct pmu *hw_perf_event_init(struct perf_event *event)
>                break;
>
>        case PERF_TYPE_RAW:
> +               if (event->attr.raw_type)
> +                       return ERR_PTR(-EINVAL);
>                ev = event->attr.config;
>                break;
>
> diff --git a/arch/sh/kernel/perf_event.c b/arch/sh/kernel/perf_event.c
> index 81b6de4..482cf48 100644
> --- a/arch/sh/kernel/perf_event.c
> +++ b/arch/sh/kernel/perf_event.c
> @@ -142,6 +142,8 @@ static int __hw_perf_event_init(struct perf_event *event)
>
>        switch (attr->type) {
>        case PERF_TYPE_RAW:
> +               if (attr->raw_type)
> +                       return -EINVAL;
>                config = attr->config & sh_pmu->raw_event_mask;
>                break;
>        case PERF_TYPE_HW_CACHE:
> diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
> index fd4db0d..3539b53 100644
> --- a/arch/x86/kernel/cpu/perf_event.c
> +++ b/arch/x86/kernel/cpu/perf_event.c
> @@ -449,8 +449,11 @@ static int x86_setup_perfctr(struct perf_event *event)
>                        return -EOPNOTSUPP;
>        }
>
> -       if (attr->type == PERF_TYPE_RAW)
> +       if (attr->type == PERF_TYPE_RAW) {
> +               if (attr->raw_type)
> +                       return -EINVAL;
>                return 0;
> +       }
>
>        if (attr->type == PERF_TYPE_HW_CACHE)
>                return set_ext_hw_attr(hwc, attr);
> diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
> index 611df11..87e5ae4 100644
> --- a/arch/x86/kernel/cpu/perf_event_amd.c
> +++ b/arch/x86/kernel/cpu/perf_event_amd.c
> @@ -121,6 +121,9 @@ static int amd_pmu_hw_config(struct perf_event *event)
>        if (event->attr.type != PERF_TYPE_RAW)
>                return 0;
>
> +       if (event->attr.raw_type)
> +               return -EINVAL;
> +
>        event->hw.config |= event->attr.config & AMD64_RAW_EVENT_MASK;
>
>        return 0;
> diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
> index fdbc652..d15faf5 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel.c
> @@ -770,6 +770,9 @@ static int intel_pmu_hw_config(struct perf_event *event)
>        if (event->attr.type != PERF_TYPE_RAW)
>                return 0;
>
> +       if (event->attr.raw_type)
> +               return -EINVAL;
> +
>        if (!(event->attr.config & ARCH_PERFMON_EVENTSEL_ANY))
>                return 0;
>
> diff --git a/arch/x86/kernel/cpu/perf_event_p4.c b/arch/x86/kernel/cpu/perf_event_p4.c
> index 87e1803..1001892 100644
> --- a/arch/x86/kernel/cpu/perf_event_p4.c
> +++ b/arch/x86/kernel/cpu/perf_event_p4.c
> @@ -437,6 +437,11 @@ static int p4_hw_config(struct perf_event *event)
>                event->hw.config = p4_set_ht_bit(event->hw.config);
>
>        if (event->attr.type == PERF_TYPE_RAW) {
> +               /* only raw perfctr config supported */
> +               if (event->attr.raw_type) {
> +                       rc = -EINVAL;
> +                       goto out;
> +               }
>
>                /* user data may have out-of-bound event index */
>                evnt = p4_config_unpack_event(event->attr.config);
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index fe50347..f9d2d5e 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -222,7 +222,10 @@ struct perf_event_attr {
>                __u32           wakeup_watermark; /* bytes before wakeup   */
>        };
>
> -       __u32                   bp_type;
> +       union {
> +               __u32           bp_type;
> +               __u32           raw_type;
> +       };
>        __u64                   bp_addr;
>        __u64                   bp_len;
>  };
> --
> 1.7.1
>
>
>


* Re: [PATCH 1/7] perf: introduce raw_type attribute to specify the  type of a raw sample
  2010-05-20  8:10   ` Stephane Eranian
@ 2010-05-20  9:23     ` Peter Zijlstra
  2010-05-20  9:42       ` Stephane Eranian
  2010-05-20 13:58       ` Robert Richter
  0 siblings, 2 replies; 27+ messages in thread
From: Peter Zijlstra @ 2010-05-20  9:23 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Robert Richter, Ingo Molnar, LKML, Paul Mackerras, David Miller,
	Will Deacon, Paul Mundt

On Thu, 2010-05-20 at 10:10 +0200, Stephane Eranian wrote:
> I still don't understand why you need all of this to encode IBS.
> I still believe that with attr.config there is plenty of bits to choose
> from. I do understand the need for PERF_SAMPLE_RAW. I think
> there is no other way.
> 
> You simply need to pick an encoding to mark the config as IBS. You
> need two bits for this: 00 regular counters, 01 IBS Fetch, 10 IBS op.
> Regular counters use 43 bits, IBS fetch uses 58, IBS op uses 52.
> So you could use bits 62-63 for instance. You don't need to encode
> the sampling period in attr.config for either IBS. You can use
> attr.sample_period, so you free up 16 bits.
> 
> I understand that IBS may evolve and thus may use more bits. But
> you still have at least 16 bits of margin.
> 
> Users and tools would rely on an library to provide the event encoding.
> No need to come up with some raw hex number on the cmdline. 

No need for any of that afaict, how about:

For Instruction-Fetch:

  0:15 sample-period (r/w)
 16:31 cnt           (r/w)
 32:47 latency       (r/w)
    48 enable        (r/w)
    49 valid         (r/w)
 50:56               (ro)
    57 randomized    (r/w)


For Instruction-Execution:

  0:15 sample-period (r/w)
    17 enable        (r/w)
    18 valid         (r/w)

So if we add perf_event_attr::latency (can also be used for
PEBS-load-latency, can Sparc/PowerPC/ARM/SH use this too?), we can
encode these IBS things as:

  0x87 Instruction Fetch Stall -- Ins-Fetch 
  0xC0 Retired Instructions    -- Ins-Exec
  
When we set perf_event_attr::precise > 0

The Ins-Exec will have to re-construct the actual event->count by adding
sample-period on each interrupt, as it seems we lack an actual counter
in hardware.

Furthermore, these counters will have to deal with sample-period > 2^16
by 'ignoring' interrupts until we get ->period_left down to 0.

The extra data could possibly be exposed through attaching non-sampling
group events and using SAMPLE_READ, like L1-misses, although
reconstructing the count from just one bit seems 'interesting'. 

The IbsFetchLinAd/IbsOpRip would go straight into PERF_SAMPLE_IP by
replacing pt_regs->ip I guess.

IbsDcLinAd goes into PERF_SAMPLE_ADDR



* Re: [PATCH 1/7] perf: introduce raw_type attribute to specify the  type of a raw sample
  2010-05-20  9:23     ` Peter Zijlstra
@ 2010-05-20  9:42       ` Stephane Eranian
  2010-05-20 10:37         ` Peter Zijlstra
  2010-05-20 13:58       ` Robert Richter
  1 sibling, 1 reply; 27+ messages in thread
From: Stephane Eranian @ 2010-05-20  9:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Robert Richter, Ingo Molnar, LKML, Paul Mackerras, David Miller,
	Will Deacon, Paul Mundt

On Thu, May 20, 2010 at 11:23 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, 2010-05-20 at 10:10 +0200, Stephane Eranian wrote:
>> I still don't understand why you need all of this to encode IBS.
>> I still believe that with attr.config there is plenty of bits to choose
>> from. I do understand the need for PERF_SAMPLE_RAW. I think
>> there is no other way.
>>
>> You simply need to pick an encoding to mark the config as IBS. You
>> need two bits for this: 00 regular counters, 01 IBS Fetch, 10 IBS op.
>> Regular counters use 43 bits, IBS fetch uses 58, IBS op uses 52.
>> So you could use bits 62-63 for instance. You don't need to encode
>> the sampling period in attr.config for either IBS. You can use
>> attr.sample_period, so you free up 16 bits.
>>
>> I understand that IBS may evolve and thus may use more bits. But
>> you still have at least 16 bits of margin.
>>
>> Users and tools would rely on an library to provide the event encoding.
>> No need to come up with some raw hex number on the cmdline.
>
> No need for any of that afaict, how about:
>
> For Instruction-Fetch:
>
>  0:15 sample-period (r/w)
>  16:31 cnt           (r/w)
>  32:47 latency       (r/w)
>    48 enable        (r/w)
>    49 valid         (r/w)
>  50:56               (ro)
>    57 randomized    (r/w)
>
You are mixing output and input parameters.

The only input parameters you have are:
- sample-period, enable, random
The rest is output only.

So I would say: you don't need to encode anything
sample-period -> attr.sample_period
random           -> attr.random_width (once we add that)

>
> For Instruction-Execution:
>
>  0:15 sample-period (r/w)
>    17 enable        (r/w)
>    18 valid         (r/w)
>
Same thing here: sample-period is the only input parameter,
so sample-period -> attr.sample_period.

Both IBSFETCHCTL and IBSOPCTL would also be included in
the PERF_SAMPLE_RAW, because they have the valid
bits and latency.


> So if we add perf_event_attr::latency (can also be used for
> PEBS-load-latency, can Sparc/PowerPC/ARM/SH use this too?), we can

I suspect you can encode the latency with the event code used for PEBS
load-latency (PEBS-LL). It can only be used with one event.

> encode these IBS things as:
>
>  0x87 Instruction Fetch Stall -- Ins-Fetch
>  0xC0 Retired Instructions    -- Ins-Exec
>
I think those events do not map to the behavior of IBS. We have
had that discussion before.

> When we set perf_event_attr::precise > 0
>
Well, then you'd have to document that when you do that you use IBS and
that the events do not count the same things.

> The Ins-Exec will have to re-construct the actual event->count by adding
> sample-period on each interrupt, as it seems we lack an actual counter
> in hardware.
>
For what? counting mode?

> Furthermore, these counters will have to deal with sample-period > 2^16
> by 'ignoring' interrupts until we get ->period_left down to 0.
>
Well, it's not 2^16, it's 2^20, but the bottom 4 bits must be zero.
What about simply failing perf_event_open() if sample_period does not
fit the constraint?

> The extra data could possibly be exposed through attaching non-sampling
> group events and using SAMPLE_READ, like L1-misses, although
> reconstructing the count from just one bit seems 'interesting'.
>
> The IbsFetchLinAd/IbsOpRip would go straight into PERF_SAMPLE_IP by
> replacing pt_regs->ip I guess.
>
> IbsDcLinAd goes into PERF_SAMPLE_ADDR
>
What about the rest, the TLB, alignment, data sources?


* Re: [PATCH 1/7] perf: introduce raw_type attribute to specify the  type of a raw sample
  2010-05-20  9:42       ` Stephane Eranian
@ 2010-05-20 10:37         ` Peter Zijlstra
  2010-05-20 12:13           ` Stephane Eranian
  2010-05-20 14:08           ` Robert Richter
  0 siblings, 2 replies; 27+ messages in thread
From: Peter Zijlstra @ 2010-05-20 10:37 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Robert Richter, Ingo Molnar, LKML, Paul Mackerras, David Miller,
	Will Deacon, Paul Mundt

On Thu, 2010-05-20 at 11:42 +0200, Stephane Eranian wrote:

> > For Instruction-Fetch:

> >  32:47 latency       (r/w)

> You are mixing output and input parameters.
> 
> The only input parameters you have are:
> - sample-period, enable, random
> The rest is output only.

Ah, my bad, I thought it was a r/w field.

> > encode these IBS things as:
> >
> >  0x87 Instruction Fetch Stall -- Ins-Fetch
> >  0xC0 Retired Instructions    -- Ins-Exec
> >
> I think those events do not map to the behavior of IBS. We have
> had that discussion before.

Hrm, so there are no regular events that count the same thing as the
IBS things? That really sucks.

So yeah, you might as well expose it as a whole separate PMU using Lin's
stuff.

> > The Ins-Exec will have to re-construct the actual event->count by adding
> > sample-period on each interrupt, as it seems we lack an actual counter
> > in hardware.
> >
> For what? counting mode?

Yeah, events are supposed to count.

> > Furthermore, these counters will have to deal with sample-period > 2^16
> > by 'ignoring' interrupts until we get ->period_left down to 0.
> >
> Well, it's not 2^16, it's 2^20 but bottom 4 bits must be zero.
>> What about simply failing perf_event_open() if sample_period does not fit the
> constraint?

Why, it's simple enough to ignore a few interrupts, we do the same for
all other implementations.

> > The extra data could possibly be exposed through attaching non-sampling
> > group events and using SAMPLE_READ, like L1-misses, although
> > reconstructing the count from just one bit seems 'interesting'.
> >
> > The IbsFetchLinAd/IbsOpRip would go straight into PERF_SAMPLE_IP by
> > replacing pt_regs->ip I guess.
> >
> > IbsDcLinAd goes into PERF_SAMPLE_ADDR
> >
> What about the rest, the TLB, alignment, data sources?

Dunno, reconstruct sensible counters? Surely the software that uses IBS
does something useful with that data? What does libpfm do with the IBS
data?


* Re: [PATCH 1/7] perf: introduce raw_type attribute to specify the  type of a raw sample
  2010-05-20 10:37         ` Peter Zijlstra
@ 2010-05-20 12:13           ` Stephane Eranian
  2010-05-20 15:22             ` Robert Richter
  2010-05-20 14:08           ` Robert Richter
  1 sibling, 1 reply; 27+ messages in thread
From: Stephane Eranian @ 2010-05-20 12:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Robert Richter, Ingo Molnar, LKML, Paul Mackerras, David Miller,
	Will Deacon, Paul Mundt

On Thu, May 20, 2010 at 12:37 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, 2010-05-20 at 11:42 +0200, Stephane Eranian wrote:
>
>> > For Instruction-Fetch:
>
>> >  32:47 latency       (r/w)
>
>> You are mixing output and input parameters.
>>
>> The only input parameters you have are:
>> - sample-period, enable, random
>> The rest is output only.
>
> Ah, my bad, I thought it was a r/w field.
>
>> > encode these IBS things as:
>> >
>> >  0x87 Instruction Fetch Stall -- Ins-Fetch
>> >  0xC0 Retired Instructions    -- Ins-Exec
>> >
>> I think those events do not map to the behavior of IBS. We have
>> had that discussion before.
>
> Hrm, so there are no regular events that count the same thing as the
> IBS things? That really sucks.
>
> So yeah, you might as well expose it as a whole separate PMU using Lin's
> stuff.

What's wrong with creating pseudo-events for IBS? We'd have to pick
two unused event codes. That would have the advantage of making it
explicit you're using IBS. I think we could still use the precise_ip field
if people are only interested in the IP. They would use PERF_SAMPLE_RAW
if they need more.

>
>> > The Ins-Exec will have to re-construct the actual event->count by adding
>> > sample-period on each interrupt, as it seems we lack an actual counter
>> > in hardware.
>> >
>> For what? counting mode?
>
> Yeah, events are supposed to count.
>
IBS is a sampling-only feature. I suspect it would be okay to return 0 here
or do as you said, count the number of IBS interrupts and multiply by the
sampling period.

>> > Furthermore, these counters will have to deal with sample-period > 2^16
>> > by 'ignoring' interrupts until we get ->period_left down to 0.
>> >
>> Well, it's not 2^16, it's 2^20 but bottom 4 bits must be zero.
>> What about simply failing perf_event_open() if sample_period does not fit the
>> constraint?
>
> Why, it's simple enough to ignore a few interrupts, we do the same for
> all other implementations.
>
>> > The extra data could possibly be exposed through attaching non-sampling
>> > group events and using SAMPLE_READ, like L1-misses, although
>> > reconstructing the count from just one bit seems 'interesting'.
>> >
>> > The IbsFetchLinAd/IbsOpRip would go straight into PERF_SAMPLE_IP by
>> > replacing pt_regs->ip I guess.
>> >
>> > IbsDcLinAd goes into PERF_SAMPLE_ADDR
>> >
>> What about the rest, the TLB, alignment, data sources?
>
> Dunno, reconstruct sensible counters? Surely the software that uses IBS
> does something useful with that data? What does libpfm do with the IBS
> data?
>
The common usage model is that you gather the IBS op data (all of it), you save
samples into a file and then you have scripts that extract whatever fields they
need to compute the metric you want. For instance, if you want data cache misses
you extract [IP, data address, data source, miss latency], if you care about
instruction latencies, you extract [IP, tag2ret, comp2ret], and so on.

Libpfm does not handle IBS output. Its goal is to help applications set
up the events/counters. With perf_events, it does the mapping from symbolic event
names+attributes -> struct perf_event_attr. It does not make any perf_event
syscalls for you.


* Re: [PATCH 1/7] perf: introduce raw_type attribute to specify the type of a raw sample
  2010-05-20  9:23     ` Peter Zijlstra
  2010-05-20  9:42       ` Stephane Eranian
@ 2010-05-20 13:58       ` Robert Richter
  2010-05-20 14:14         ` Stephane Eranian
  1 sibling, 1 reply; 27+ messages in thread
From: Robert Richter @ 2010-05-20 13:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Stephane Eranian, Ingo Molnar, LKML, Paul Mackerras, David Miller,
	Will Deacon, Paul Mundt

On 20.05.10 05:23:33, Peter Zijlstra wrote:
> On Thu, 2010-05-20 at 10:10 +0200, Stephane Eranian wrote:
> > I still don't understand why you need all of this to encode IBS.
> > I still believe that with attr.config there is plenty of bits to choose
> > from. I do understand the need for PERF_SAMPLE_RAW. I think
> > there is no other way.
> > 
> > You simply need to pick an encoding to mark the config as IBS. You
> > need two bits for this: 00 regular counters, 01 IBS Fetch, 10 IBS op.
> > Regular counters use 43 bits, IBS fetch uses 58, IBS op uses 52.
> > So you could use bits 62-63 for instance. You don't need to encode
> > the sampling period in attr.config for either IBS. You can use
> > attr.sample_period, so you free up 16 bits.
> > 
> > I understand that IBS may evolve and thus may use more bits. But
> > you still have at least 16 bits of margin.

We have some bits in the msrs that are reserved now, for perfctr and
also for ibs. We could start reusing them to mark a special sample type
and so on. So far, ok. But at some point in the future there will be a
hw extension and these bits will no longer be reserved; what then?

In case the register layout changes, you start working around this by
shifting and masking bits back and forth. This will end in a mess. We
don't know what will change and when, but *reserved* means it could be
subject to change. So, actually you have to assume arbitrary 64 bit
config values for perfctrs and ibs. Thus, you cannot use reserved bits
for encoding, doing this would be a hack.

The approach is also not extendable to new pmu features. The question
of how to encode it will then arise again.

So encoding this information in some other attribute like raw_type or
any other field is a much cleaner and a more stable solution, easier
to program and to maintain. And there is enough space in the attribute
structure.

> > Users and tools would rely on an library to provide the event encoding.
> > No need to come up with some raw hex number on the cmdline. 

We can pack everything in tools and libraries and hide direct access
from the user. But now suppose someone reads the IBS spec and wants to
start using it. He will have to learn (by reading the source or
documentation) how to pass parameters through the tool or library to
set certain bits in the registers. This adds an additional i/f layer
that needs to be defined, documented, implemented and maintained. Why not keep
things simple and let the user pass a register value to the pmu? Only
the config has to be validated in the kernel and that's it. The hw
spec is then the reference for using the interface. We have already
raw config values for counters and there was a reason for it.

> For Instruction-Fetch:
> 
>   0:15 sample-period (r/w)
>     57 randomized    (r/w)

> For Instruction-Execution:
> 
>   0:15 sample-period (r/w)

>   0x87 Instruction Fetch Stall -- Ins-Fetch 
>   0xC0 Retired Instructions    -- Ins-Exec

I agree, where possible we should reuse attributes or existing events
for configuration. But since the pmu behavior differs, there is the
risk of mixing different things together. "precise" would be different
on Intel and AMD. This will be hidden from the user, leading to wrong
assumptions and results. And, what about new options that don't have
equivalent attribute flags?

But the question is not how to pass the config; the question is how
to mark the event configuration as a certain IBS event.

> Furthermore, these counters will have to deal with sample-period > 2^16
> by 'ignoring' interrupts until we get ->period_left down to 0.

No question here, ibs counters should be extended to 64-bit counters,
but this is not addressed by this patch set; I have kept this for
later work.

> The extra data could possibly be exposed through attaching non-sampling
> group events and using SAMPLE_READ, like L1-misses, although
> reconstructing the count from just one bit seems 'interesting'. 
> 
> The IbsFetchLinAd/IbsOpRip would go straight into PERF_SAMPLE_IP by
> replacing pt_regs->ip I guess.
> 
> IbsDcLinAd goes into PERF_SAMPLE_ADDR

It's the same for sampling data as for config data. We can spread it
all over various data structures, but a user will then have to recollect
all of it again, needing to know where the information can be found.
And it will never be the same kind of information if the pmus are
different.

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center
email: robert.richter@amd.com



* Re: [PATCH 1/7] perf: introduce raw_type attribute to specify the type of a raw sample
  2010-05-20 10:37         ` Peter Zijlstra
  2010-05-20 12:13           ` Stephane Eranian
@ 2010-05-20 14:08           ` Robert Richter
  2010-05-20 16:55             ` Ingo Molnar
  1 sibling, 1 reply; 27+ messages in thread
From: Robert Richter @ 2010-05-20 14:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Stephane Eranian, Ingo Molnar, LKML, Paul Mackerras, David Miller,
	Will Deacon, Paul Mundt

On 20.05.10 06:37:39, Peter Zijlstra wrote:

> So yeah, you might as well expose it as a whole separate PMU using Lin's
> stuff.

This seems to be the most promising alternative to raw_type. But it's
also much more bloated, requiring sysfs and additional file
descriptors.

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center
email: robert.richter@amd.com



* Re: [PATCH 1/7] perf: introduce raw_type attribute to specify the  type of a raw sample
  2010-05-20 13:58       ` Robert Richter
@ 2010-05-20 14:14         ` Stephane Eranian
  2010-05-20 14:30           ` Stephane Eranian
  2010-05-20 15:48           ` Robert Richter
  0 siblings, 2 replies; 27+ messages in thread
From: Stephane Eranian @ 2010-05-20 14:14 UTC (permalink / raw)
  To: Robert Richter
  Cc: Peter Zijlstra, Ingo Molnar, LKML, Paul Mackerras, David Miller,
	Will Deacon, Paul Mundt

On Thu, May 20, 2010 at 3:58 PM, Robert Richter <robert.richter@amd.com> wrote:
> On 20.05.10 05:23:33, Peter Zijlstra wrote:
>> On Thu, 2010-05-20 at 10:10 +0200, Stephane Eranian wrote:
>> > I still don't understand why you need all of this to encode IBS.
>> > I still believe that with attr.config there is plenty of bits to choose
>> > from. I do understand the need for PERF_SAMPLE_RAW. I think
>> > there is no other way.
>> >
>> > You simply need to pick an encoding to mark the config as IBS. You
>> > need two bits for this: 00 regular counters, 01 IBS Fetch, 10 IBS op.
>> > Regular counters use 43 bits, IBS fetch uses 58, IBS op uses 52.
>> > So you could use bits 62-63 for instance. You don't need to encode
>> > the sampling period in attr.config for either IBS. You can use
>> > attr.sample_period, so you free up 16 bits.
>> >
>> > I understand that IBS may evolve and thus may use more bits. But
>> > you still have at least 16 bits of margin.
>
> We have some bits in the msrs that are reserved now, for perfctr and
> also for ibs. We could start reusing them to mark a special sample type
> and so on. So far, ok. But at some point in the future there will be a
> hw extension and these bits will no longer be reserved; what then?
>
You will always have the sampling period bits. Those will always be
handled by attr->sample_period. And they correspond to the lower
bits which are also used to encode the event select on regular counters.
So you could identify IBS with special event selects in the lower 8 bits
for instance.

IBS is a very model-specific feature; as such, the method you use to
encode it may change.



> In case the register layout changes, you start working around this by
> shifting and masking bits back and forth. This will end in a mess. We
> don't know what will change and when, but *reserved* means it could be
> subject to change. So, actually you have to assume arbitrary 64 bit
> config values for perfctrs and ibs. Thus, you cannot use reserved bits
> for encoding, doing this would be a hack.
>
In my new proposal I am not using reserved bits; I am using bits that
are stored elsewhere in the attr structure.

> The approach is also not extendable to new pmu features. The question
> of how to encode it will then arise again.
>
There is no guarantee your scheme will help there either. Nobody knows
the interface to those new features. It may be that you will need more than
64 bits.

> So encoding this information in some other attribute like raw_type or
> any other field is a much cleaner and a more stable solution, easier
> to program and to maintain. And there is enough space in the attribute
> structure.
>
What I am missing with raw_type is how it plays out with attr->type which
already has RAW? Seems like you have a second level of raw.

>> > Users and tools would rely on an library to provide the event encoding.
>> > No need to come up with some raw hex number on the cmdline.
>
> We can pack everything in tools and libraries and hide direct access
> to the user. Now, one reads the IBS spec and wants to start using
> it. He will have to learn (by reading the source or documentation)
> now, how to pass parameters using the tool or library to set certain
> bits in the registers. This adds an additional i/f layer what needs to
> be defined, documented, implemented and maintained. Why not keep
> things simple and let the user pass a register value to the pmu? Only
> the config has to be validated in the kernel and that's it. The hw
> spec is then the reference for using the interface. We have already
> raw config values for counters and there was a reason for it.
>
>> For Instruction-Fetch:
>>
>>   0:15 sample-period (r/w)
>>     57 randomized    (r/w)
>
>> For Instruction-Execution:
>>
>>   0:15 sample-period (r/w)
>
>>   0x87 Instruction Fetch Stall -- Ins-Fetch
>>   0xC0 Retired Instructions    -- Ins-Exec
>
> I agree, where possible we should reuse attributes or existing events
> for configuration. But since the pmu behavior differs, there is the
> risk of mixing different things together. "precise" would be different
> on Intel and AMD. This will be hidden to the user leading to wrong
> assumptions and results. And, what about new options that don't have
> equivalent attribute flags?
>
I agree.

> But it's not the question how to pass the config, the question is how
> to mark the event configuration as a certain IBS event.
>
Yes, we agree on that.

>> Furthermore, these counters will have to deal with sample-period > 2^16
>> by 'ignoring' interrupts until we get ->period_left down to 0.
>
> No question here, ibs counters should be extended to 64 bit counters,
> but this is not addressed with this patch set, I have kept this for
> later work.
>
That's fine with me.

>> The extra data could possibly be exposed through attaching non-sampling
>> group events and using SAMPLE_READ, like L1-misses, although
>> reconstructing the count from just one bit seems 'interesting'.
>>
>> The IbsFetchLinAd/IbsOpRip would go straight into PERF_SAMPLE_IP by
>> replacing pt_regs->ip I guess.
>>
>> IbsDcLinAd goes into PERF_SAMPLE_ADDR
>
> It's the same for sampling data as with config data. We can spread it
> all over its data structures, but a user will then recollect all of it
> again needing to know where the information can be found. And, it will
> never be the same kind of information if the pmus are different.
>
Well, I don't think you want to define an attribute for each and every
bit of info returned by IBS. I think you need to return the IBSDATA regs
as-is and let the user pick and choose. Of course, the information can
change, but IBS is model-specific; it returns something richer than just
a count.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/7] perf: introduce raw_type attribute to specify the  type of a raw sample
  2010-05-20 14:14         ` Stephane Eranian
@ 2010-05-20 14:30           ` Stephane Eranian
  2010-05-20 15:48           ` Robert Richter
  1 sibling, 0 replies; 27+ messages in thread
From: Stephane Eranian @ 2010-05-20 14:30 UTC (permalink / raw)
  To: Robert Richter
  Cc: Peter Zijlstra, Ingo Molnar, LKML, Paul Mackerras, David Miller,
	Will Deacon, Paul Mundt

>> But it's not the question how to pass the config, the question is how
>> to mark the event configuration as a certain IBS event.
>>
> Yes, we agree on that.
>
IBS uses at most 58 out of 64 bits today. We know we can move the
sampling period and randomization bit out to other fields in attr. In
fact, all we would need is two bits. But to distinguish IBS from
regular events, I think we could reuse the 8-bit event select field.
So you would be using 8 out of 64. IBS could grow from 58 to 64 bits,
i.e. 6 more bits. Those could be encoded in the config field, where
you have more than 50 bits left.


* Re: [PATCH 1/7] perf: introduce raw_type attribute to specify the type of a raw sample
  2010-05-20 12:13           ` Stephane Eranian
@ 2010-05-20 15:22             ` Robert Richter
  2012-11-23 12:00               ` Robert Richter
  0 siblings, 1 reply; 27+ messages in thread
From: Robert Richter @ 2010-05-20 15:22 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Peter Zijlstra, Ingo Molnar, LKML, Paul Mackerras, David Miller,
	Will Deacon, Paul Mundt

On 20.05.10 08:13:43, Stephane Eranian wrote:
> What's wrong with creating pseudo-events for IBS? We'd have to pick
> two unused event codes. That would have the advantage of making it
> explicit you're using IBS. I think we could still use the precise_ip field
> if people are only interested in the IP. They would use PERF_SAMPLE_RAW
> if they need more.

For some key events this is fine and useful. But if this has to be
used to program all IBS features, it will introduce too much
implementation and maintenance effort. What you get then will
probably be of less interest, since users/tools want the raw sample
data for further processing. But maybe we already agree on this
point: if there is demand, we should provide pseudo-events for
important or often-used events.

> >> > The Ins-Exec will have to re-construct the actual event->count by adding
> >> > sample-period on each interrupt, as it seems we lack an actual counter
> >> > in hardware.
> >> >
> >> For what? counting mode?
> >
> > Yeah, events are supposed to count.
> >
> IBS is a sampling only feature. I suspect it would be okay to return 0 here
> or do as you said, count the number of IBS interrupts and multiply by the
> sampling period.

True, ibs can be used as an additional counter by using only the
interrupt and dropping the sampling data. I was already thinking of
somehow adding it to the list of available counters. But this is also
later work, as is its 64 bit extension.

> > Dunno, reconstruct sensible counters? Surely the software that uses IBS
> > does something useful with that data? What does libpfm do with the IBS
> > data?

Here are some use cases described:

 http://developer.amd.com/assets/AMD_IBS_paper_EN.pdf
 http://developer.amd.com/cpu/CodeAnalyst/assets/ISPASS2010_IBS_CA_abstract.pdf

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center
email: robert.richter@amd.com



* Re: [PATCH 1/7] perf: introduce raw_type attribute to specify the type of a raw sample
  2010-05-20 14:14         ` Stephane Eranian
  2010-05-20 14:30           ` Stephane Eranian
@ 2010-05-20 15:48           ` Robert Richter
  1 sibling, 0 replies; 27+ messages in thread
From: Robert Richter @ 2010-05-20 15:48 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Peter Zijlstra, Ingo Molnar, LKML, Paul Mackerras, David Miller,
	Will Deacon, Paul Mundt

On 20.05.10 10:14:46, Stephane Eranian wrote:

> > We have some bits in the msrs that are reserved now, for perfctr and
> > also for ibs. We could start reusing it to mark a special sample type
> > and so on. So far, ok. Somewhen in the future there is a hw extensions
> > and these bits are not reserved anymore, what now?
> >
> You will always have the sampling period bits. Those will always be
> handled by attr->sample_period. And they correspond to the lower
> bits which are also used to encode the event select on regular counters.
> So you could identify IBS with special event selects in the lower 8 bits
> for instance.
> 
> IBS is a very model specific feature, as such the method you use to
> encode it may change.

> In my new proposal, I am not using reserved bits, I am using bits which
> are stored elsewhere in the attr structure.

Ok, maybe technically it is possible for ibs and perfctrs. You move
the sample count from the raw config out to the sample_period
attribute, then write the type into the freed-up space in the raw
config. In the kernel you do all of this again in reverse order. This
makes programming events really difficult and confusing. I don't see
an advantage compared to the use of a raw_type.

> There is no guarantee your scheme will help there either. Nobody knows
> the interface to those new features. It may be that you will need more than
> 64 bits.

Right, for more than 64 bits we would need another extension.

> What I am missing with raw_type is how it plays out with attr->type which
> already has RAW? Seems like you have a second level of raw.

"raw_type" is only valid together with (attr->type == PERF_TYPE_RAW).
Currently it is null (bp_type is unused and zeroed). So (raw_type ==
0) will leave everything as it is and uses the archtectures default
raw config. If we set raw_type to a value other than zero, we have a
special configuration for a special pmu feature. It looks sane to me.

> Well, I don't think you want to define an attribute for each and every
> bit of info returned by IBS. I think you need to return the IBSDATA regs
> as is and let the user pick and choose. Of course, the information can
> change, but IBS is model specific, it returns something richer than just
> a count.

I think we all agree the ibs sample should also be sent back to
userland as a raw sample.

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center
email: robert.richter@amd.com



* Re: [PATCH 1/7] perf: introduce raw_type attribute to specify the type of a raw sample
  2010-05-20 14:08           ` Robert Richter
@ 2010-05-20 16:55             ` Ingo Molnar
  2010-05-20 17:07               ` Robert Richter
  0 siblings, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2010-05-20 16:55 UTC (permalink / raw)
  To: Robert Richter
  Cc: Peter Zijlstra, Stephane Eranian, LKML, Paul Mackerras,
	David Miller, Will Deacon, Paul Mundt


* Robert Richter <robert.richter@amd.com> wrote:

> On 20.05.10 06:37:39, Peter Zijlstra wrote:
> 
> > So yeah, you might as well expose it as a whole separate PMU using Lin's 
> > stuff.
> 
> This seems to be the most promising alternative to raw_type. But it's also 
> much more bloated requiring sysfs and additional file descriptors.

Not really, fds are really cheap on Linux, and in any case it's only needed 
when the event is created - can be closed afterwards.

	Ingo


* Re: [PATCH 1/7] perf: introduce raw_type attribute to specify the type of a raw sample
  2010-05-20 16:55             ` Ingo Molnar
@ 2010-05-20 17:07               ` Robert Richter
  2010-05-20 17:16                 ` Peter Zijlstra
  0 siblings, 1 reply; 27+ messages in thread
From: Robert Richter @ 2010-05-20 17:07 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Stephane Eranian, LKML, Paul Mackerras,
	David Miller, Will Deacon, Paul Mundt

On 20.05.10 12:55:01, Ingo Molnar wrote:
> Not really, fds are really cheap on Linux, and in any case it's only needed 
> when the event is created - can be closed afterwards.

I was wondering if we could get the perf event fd directly with the
open("/sys/...") call? This would be really easy to set up.

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center
email: robert.richter@amd.com



* Re: [PATCH 1/7] perf: introduce raw_type attribute to specify the type of a raw sample
  2010-05-20 17:07               ` Robert Richter
@ 2010-05-20 17:16                 ` Peter Zijlstra
  0 siblings, 0 replies; 27+ messages in thread
From: Peter Zijlstra @ 2010-05-20 17:16 UTC (permalink / raw)
  To: Robert Richter
  Cc: Ingo Molnar, Stephane Eranian, LKML, Paul Mackerras, David Miller,
	Will Deacon, Paul Mundt

On Thu, 2010-05-20 at 19:07 +0200, Robert Richter wrote:
> On 20.05.10 12:55:01, Ingo Molnar wrote:
> > Not really, fds are really cheap on Linux, and in any case it's only needed 
> > when the event is created - can be closed afterwards.
> 
> I was wondering if we could get the perf event fd directly with the
> open("/sys/...") command? This would be really easy to set up.

I guess we could sorta do that, but then we'd need an ioctl to change
the perf_event_attr.


* Re: [PATCH 1/7] perf: introduce raw_type attribute to specify the type of a raw sample
  2010-05-19 22:02   ` Corey Ashford
  2010-05-20  6:51     ` Ingo Molnar
@ 2010-05-20 22:46     ` Robert Richter
  1 sibling, 0 replies; 27+ messages in thread
From: Robert Richter @ 2010-05-20 22:46 UTC (permalink / raw)
  To: Corey Ashford
  Cc: Peter Zijlstra, Ingo Molnar, Stephane Eranian, LKML, Lin Ming

On 19.05.10 18:02:40, Corey Ashford wrote:

> >  perf record -e r186A:IBS_FETCH ...
> >  perf record -e r0:IBS_FETCH -c 100000 ...
> 
> Should this raw value have been 186A0 instead of 186A?

This is the 20-bit value of the cycle count, but only bits 19:4 are
encoded in bits 15:0 of the raw ibs register. The lower 4 bits of the
cycle count must be zero and are therefore not written to the
register. The raw register setup is correct.

> Where is the named type translation coming from?  Is this something
> that needs to be hard-coded into perf?

I was thinking of an enum or macro definition for the values in the
kernel. A name mapping table could be implemented at least in
userspace, and if needed maybe also in the kernel.

> Have you looked at Lin Ming's patch series?  I think it offers
> another way to support IBS and other arch-specific and off-chip PMUs
> in a more general way, though it's not quite fully-baked yet.

Yes, this could be an option too. The proposal was posted some days
ago and was a little hidden by its subject, so I missed it at first.
But the concept of registering a pmu feature looks good and could be
an alternative to the raw_type approach.

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center
email: robert.richter@amd.com



* Re: [PATCH 1/7] perf: introduce raw_type attribute to specify the type of a raw sample
  2010-05-20  6:51     ` Ingo Molnar
@ 2010-05-20 23:06       ` Robert Richter
  0 siblings, 0 replies; 27+ messages in thread
From: Robert Richter @ 2010-05-20 23:06 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Corey Ashford, Peter Zijlstra, Stephane Eranian, LKML, Lin Ming

On 20.05.10 02:51:43, Ingo Molnar wrote:

> Whether IBS fits that category is an open question - 
> ideally it should have similar high level usage as the 
> PEBS/LBR bits on Intel CPUs. (at least as far as tooling 
> goes)

Ingo,

High level usage is from the perspective of the perf tools. IBS
delivers highly detailed diagnostic data that is also processed by
userland tools other than perf. If we introduced another interface
layer, it would become much harder for such tools to use the pmu
through that layer. The raw data is of much more interest to users
here. If we think of perf_event as the subsystem controlling the pmu
as a single instance, it should provide access to set and get raw
pmu registers. I see this as a necessary feature.

Nevertheless I agree that key features should also be exposed with a
high level interface and, if possible, in an abstracted,
non-model-specific way.

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center
email: robert.richter@amd.com



* Re: [PATCH 1/7] perf: introduce raw_type attribute to specify the type of a raw sample
  2010-05-20 15:22             ` Robert Richter
@ 2012-11-23 12:00               ` Robert Richter
  0 siblings, 0 replies; 27+ messages in thread
From: Robert Richter @ 2012-11-23 12:00 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: LKML

On 20.05.10 17:22:21, Robert Richter wrote:
> > > Dunno, reconstruct sensible counters? Surely the software that uses IBS
> > > does something useful with that data? What does libpfm do with the IBS
> > > data?
> 
> Here are some use cases described:
> 
>  http://developer.amd.com/assets/AMD_IBS_paper_EN.pdf
>  http://developer.amd.com/cpu/CodeAnalyst/assets/ISPASS2010_IBS_CA_abstract.pdf

A copy of this document can be found here:

 http://www.kernel.org/pub/linux/kernel/people/rric/ISPASS2010_IBS_CA_abstract.pdf

-Robert



Thread overview: 27+ messages
2010-05-19 21:20 [PATCH 0/7] perf: implement AMD IBS (v2) Robert Richter
2010-05-19 21:20 ` [PATCH 1/7] perf: introduce raw_type attribute to specify the type of a raw sample Robert Richter
2010-05-19 22:02   ` Corey Ashford
2010-05-20  6:51     ` Ingo Molnar
2010-05-20 23:06       ` Robert Richter
2010-05-20 22:46     ` Robert Richter
2010-05-20  8:10   ` Stephane Eranian
2010-05-20  9:23     ` Peter Zijlstra
2010-05-20  9:42       ` Stephane Eranian
2010-05-20 10:37         ` Peter Zijlstra
2010-05-20 12:13           ` Stephane Eranian
2010-05-20 15:22             ` Robert Richter
2012-11-23 12:00               ` Robert Richter
2010-05-20 14:08           ` Robert Richter
2010-05-20 16:55             ` Ingo Molnar
2010-05-20 17:07               ` Robert Richter
2010-05-20 17:16                 ` Peter Zijlstra
2010-05-20 13:58       ` Robert Richter
2010-05-20 14:14         ` Stephane Eranian
2010-05-20 14:30           ` Stephane Eranian
2010-05-20 15:48           ` Robert Richter
2010-05-19 21:20 ` [PATCH 2/7] perf, x86: introduce bit range for special pmu events Robert Richter
2010-05-19 21:20 ` [PATCH 3/7] perf, x86: modify some code to allow the introduction of ibs events Robert Richter
2010-05-19 21:20 ` [PATCH 4/7] perf, x86: implement IBS feature detection Robert Richter
2010-05-19 21:20 ` [PATCH 5/7] perf, x86: setup NMI handler for IBS Robert Richter
2010-05-19 21:20 ` [PATCH 6/7] perf, x86: implement AMD IBS event configuration Robert Richter
2010-05-19 21:20 ` [PATCH 7/7] perf, x86: implement the ibs interrupt handler Robert Richter
