* [PATCH 0/6] Add sample_flags to improve the perf_sample_data struct
@ 2022-08-31 14:55 kan.liang
2022-08-31 14:55 ` [PATCH 1/6] perf: Add sample_flags to indicate the PMU-filled sample data kan.liang
` (6 more replies)
0 siblings, 7 replies; 12+ messages in thread
From: kan.liang @ 2022-08-31 14:55 UTC
To: peterz, acme, mingo, eranian, mpe, linux-kernel
Cc: ak, andreas.kogler.0x, atrajeev, Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
This patch series fixes the overwritten PEBS timestamps and improves the
perf_sample_data struct. The detailed discussion can be found at
https://lore.kernel.org/lkml/YwXvGe4%2FQdgGYOKJ@worktop.programming.kicks-ass.net/
The patch series differs from the suggestions in the above discussion in
two ways.
- Only perf_prepare_sample() clears the sample flags that the PMU driver
has already handled.
__perf_event_header__init_id() is shared between perf_prepare_sample()
(used by PERF_RECORD_SAMPLE) and perf_event_header__init_id() (used by
the other PERF_RECORD_* event types). The sample data is only available
for PERF_RECORD_SAMPLE.
- The CALLCHAIN_EARLY hack is still required for BPF, especially
perf_event_set_bpf_handler(). The sample data is not available when
the function is invoked.
Kan Liang (6):
perf: Add sample_flags to indicate the PMU-filled sample data
perf/x86/intel/pebs: Fix PEBS timestamps overwritten
perf: Use sample_flags for branch stack
perf: Use sample_flags for weight
perf: Use sample_flags for data_src
perf: Use sample_flags for txn
arch/powerpc/perf/core-book3s.c | 10 ++++++---
arch/x86/events/core.c | 4 +++-
arch/x86/events/intel/core.c | 4 +++-
arch/x86/events/intel/ds.c | 39 ++++++++++++++++++++++++---------
include/linux/perf_event.h | 15 ++++++-------
kernel/events/core.c | 33 +++++++++++++++++++---------
6 files changed, 72 insertions(+), 33 deletions(-)
--
2.35.1
* [PATCH 1/6] perf: Add sample_flags to indicate the PMU-filled sample data
2022-08-31 14:55 [PATCH 0/6] Add sample_flags to improve the perf_sample_data struct kan.liang
@ 2022-08-31 14:55 ` kan.liang
2022-08-31 14:55 ` [PATCH 2/6] perf/x86/intel/pebs: Fix PEBS timestamps overwritten kan.liang
` (5 subsequent siblings)
6 siblings, 0 replies; 12+ messages in thread
From: kan.liang @ 2022-08-31 14:55 UTC
To: peterz, acme, mingo, eranian, mpe, linux-kernel
Cc: ak, andreas.kogler.0x, atrajeev, Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
On some platforms, some sample data, e.g., timestamps, can be retrieved
from the PMU driver. Usually, the data from the PMU driver is more
accurate. The perf core should output the PMU-filled sample data if
it's available.
To check the availability of the PMU-filled sample data, the perf core
currently initializes the related fields in perf_sample_data_init().
When outputting a sample, perf checks whether the field was updated by
the PMU driver. If yes, the updated value is output. If not, perf
calculates the value in software, or just outputs the initialized value
if no software method is available.
With more and more data being provided by the PMU driver, more fields
have to be initialized in perf_sample_data_init(). That increases the
number of cache lines touched in perf_sample_data_init() and hurts
performance.
Add a new "sample_flags" field to indicate the PMU-filled sample data.
The PMU driver should set the corresponding PERF_SAMPLE_ flag when it
updates a field, so the field no longer needs to be initialized. The
following patches make use of it and remove the corresponding fields
from perf_sample_data_init(), which further minimizes the number of
cache lines touched.
In perf_prepare_sample(), which handles PERF_RECORD_SAMPLE, clear from
the sample type only the flags for fields that the PMU driver has
already filled. For the other PERF_RECORD_ event types, the sample data
is not available.
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
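Note (not part of the commit): the filtering described above can be
illustrated with a minimal user-space sketch. It assumes simplified
stand-in types; every demo_* name and DEMO_* flag value below is
hypothetical and is not the kernel API. The sample path masks out the
bits the PMU driver has already set, while a non-sample record would
pass the full sample type.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits for this sketch; the kernel uses PERF_SAMPLE_* values. */
#define DEMO_SAMPLE_TIME (1ULL << 0)
#define DEMO_SAMPLE_CPU  (1ULL << 1)

/* Simplified stand-in for struct perf_sample_data. */
struct demo_sample_data {
	uint64_t sample_flags;	/* bits for fields the PMU driver already filled */
	uint64_t time;
	uint32_t cpu;
};

/* Plays the role of perf_sample_data_init(): only the flag word is cleared. */
void demo_sample_data_init(struct demo_sample_data *data)
{
	assert(data);
	data->sample_flags = 0;
}

/* Generic fill: writes only the fields whose bit is set in sample_type. */
void demo_init_id(struct demo_sample_data *data, uint64_t sample_type,
		  uint64_t sw_time, uint32_t sw_cpu)
{
	if (sample_type & DEMO_SAMPLE_TIME)
		data->time = sw_time;
	if (sample_type & DEMO_SAMPLE_CPU)
		data->cpu = sw_cpu;
}

/* Sample path: drop the bits the PMU driver has already handled. */
void demo_prepare_sample(struct demo_sample_data *data, uint64_t sample_type,
			 uint64_t sw_time, uint32_t sw_cpu)
{
	uint64_t filtered_sample_type = sample_type & ~data->sample_flags;

	demo_init_id(data, filtered_sample_type, sw_time, sw_cpu);
}
```

In this sketch a driver that writes data->time and sets DEMO_SAMPLE_TIME
keeps its value, while the CPU field is still filled by the generic code.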
include/linux/perf_event.h | 2 ++
kernel/events/core.c | 17 +++++++++++------
2 files changed, 13 insertions(+), 6 deletions(-)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index ee8b9ecdc03b..b0ebbb1377b9 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1007,6 +1007,7 @@ struct perf_sample_data {
* Fields set by perf_sample_data_init(), group so as to
* minimize the cachelines touched.
*/
+ u64 sample_flags;
u64 addr;
struct perf_raw_record *raw;
struct perf_branch_stack *br_stack;
@@ -1056,6 +1057,7 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
u64 addr, u64 period)
{
/* remaining struct members initialized in perf_prepare_sample() */
+ data->sample_flags = 0;
data->addr = addr;
data->raw = NULL;
data->br_stack = NULL;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2621fd24ad26..c9b9cb79231a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6794,11 +6794,10 @@ static void perf_aux_sample_output(struct perf_event *event,
static void __perf_event_header__init_id(struct perf_event_header *header,
struct perf_sample_data *data,
- struct perf_event *event)
+ struct perf_event *event,
+ u64 sample_type)
{
- u64 sample_type = event->attr.sample_type;
-
- data->type = sample_type;
+ data->type = event->attr.sample_type;
header->size += event->id_header_size;
if (sample_type & PERF_SAMPLE_TID) {
@@ -6827,7 +6826,7 @@ void perf_event_header__init_id(struct perf_event_header *header,
struct perf_event *event)
{
if (event->attr.sample_id_all)
- __perf_event_header__init_id(header, data, event);
+ __perf_event_header__init_id(header, data, event, event->attr.sample_type);
}
static void __perf_event__output_id_sample(struct perf_output_handle *handle,
@@ -7303,6 +7302,7 @@ void perf_prepare_sample(struct perf_event_header *header,
struct pt_regs *regs)
{
u64 sample_type = event->attr.sample_type;
+ u64 filtered_sample_type;
header->type = PERF_RECORD_SAMPLE;
header->size = sizeof(*header) + event->header_size;
@@ -7310,7 +7310,12 @@ void perf_prepare_sample(struct perf_event_header *header,
header->misc = 0;
header->misc |= perf_misc_flags(regs);
- __perf_event_header__init_id(header, data, event);
+ /*
+ * Clear the sample flags that have already been done by the
+ * PMU driver.
+ */
+ filtered_sample_type = sample_type & ~data->sample_flags;
+ __perf_event_header__init_id(header, data, event, filtered_sample_type);
if (sample_type & (PERF_SAMPLE_IP | PERF_SAMPLE_CODE_PAGE_SIZE))
data->ip = perf_instruction_pointer(regs);
--
2.35.1
* [PATCH 2/6] perf/x86/intel/pebs: Fix PEBS timestamps overwritten
2022-08-31 14:55 [PATCH 0/6] Add sample_flags to improve the perf_sample_data struct kan.liang
2022-08-31 14:55 ` [PATCH 1/6] perf: Add sample_flags to indicate the PMU-filled sample data kan.liang
@ 2022-08-31 14:55 ` kan.liang
2022-08-31 14:55 ` [PATCH 3/6] perf: Use sample_flags for branch stack kan.liang
` (4 subsequent siblings)
6 siblings, 0 replies; 12+ messages in thread
From: kan.liang @ 2022-08-31 14:55 UTC
To: peterz, acme, mingo, eranian, mpe, linux-kernel
Cc: ak, andreas.kogler.0x, atrajeev, Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
The PEBS TSC-based timestamps do not appear correctly in the final
perf.data output file from perf record.
The data->time field set up by PEBS in setup_pebs_fixed_sample_data()
is later overwritten by the perf_events generic code in
perf_prepare_sample(). There is an ordering problem.
Set the sample flags when data->time is updated by PEBS, so that the
data->time field is no longer overwritten.
Reported-by: Andreas Kogler <andreas.kogler.0x@gmail.com>
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
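Note (not part of the commit): a minimal user-space sketch of the
ordering problem, assuming simplified stand-in types. The demo_* names
and the DEMO_SAMPLE_TIME value are hypothetical. The "before" helper
models the generic code overwriting the hardware timestamp, and the
"after" helper models the flag check that preserves it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bit; the kernel uses PERF_SAMPLE_TIME. */
#define DEMO_SAMPLE_TIME (1ULL << 0)

struct demo_event {
	uint64_t sample_type;	/* fields requested by the user */
	int use_clockid;	/* nonzero: a non-default clock was selected */
};

struct demo_data {
	uint64_t sample_flags;
	uint64_t time;
};

/* PMU side: the hardware timestamp is only used for the default clock. */
void demo_pebs_setup(const struct demo_event *ev, struct demo_data *d,
		     uint64_t hw_time)
{
	assert(ev && d);
	d->sample_flags = 0;
	if (ev->use_clockid == 0) {
		d->time = hw_time;
		d->sample_flags |= DEMO_SAMPLE_TIME;
	}
}

/* Generic side before the fix: the software time always wins. */
void demo_prepare_before_fix(const struct demo_event *ev, struct demo_data *d,
			     uint64_t sw_time)
{
	if (ev->sample_type & DEMO_SAMPLE_TIME)
		d->time = sw_time;
}

/* Generic side after the fix: skip fields the PMU driver already filled. */
void demo_prepare_after_fix(const struct demo_event *ev, struct demo_data *d,
			    uint64_t sw_time)
{
	uint64_t filtered = ev->sample_type & ~d->sample_flags;

	if (filtered & DEMO_SAMPLE_TIME)
		d->time = sw_time;
}
```

When a non-default clock is selected, the flag is never set, so the
software timestamp is still used in the sketch.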
arch/x86/events/intel/ds.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 6ce73b4ae2f3..3af24c4891fb 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1640,8 +1640,10 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event,
* We can only do this for the default trace clock.
*/
if (x86_pmu.intel_cap.pebs_format >= 3 &&
- event->attr.use_clockid == 0)
+ event->attr.use_clockid == 0) {
data->time = native_sched_clock_from_tsc(pebs->tsc);
+ data->sample_flags |= PERF_SAMPLE_TIME;
+ }
if (has_branch_stack(event))
data->br_stack = &cpuc->lbr_stack;
@@ -1702,8 +1704,10 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
perf_sample_data_init(data, 0, event->hw.last_period);
data->period = event->hw.last_period;
- if (event->attr.use_clockid == 0)
+ if (event->attr.use_clockid == 0) {
data->time = native_sched_clock_from_tsc(basic->tsc);
+ data->sample_flags |= PERF_SAMPLE_TIME;
+ }
/*
* We must however always use iregs for the unwinder to stay sane; the
--
2.35.1
* [PATCH 3/6] perf: Use sample_flags for branch stack
2022-08-31 14:55 [PATCH 0/6] Add sample_flags to improve the perf_sample_data struct kan.liang
2022-08-31 14:55 ` [PATCH 1/6] perf: Add sample_flags to indicate the PMU-filled sample data kan.liang
2022-08-31 14:55 ` [PATCH 2/6] perf/x86/intel/pebs: Fix PEBS timestamps overwritten kan.liang
@ 2022-08-31 14:55 ` kan.liang
2022-08-31 22:47 ` Namhyung Kim
2022-08-31 14:55 ` [PATCH 4/6] perf: Use sample_flags for weight kan.liang
` (3 subsequent siblings)
6 siblings, 1 reply; 12+ messages in thread
From: kan.liang @ 2022-08-31 14:55 UTC
To: peterz, acme, mingo, eranian, mpe, linux-kernel
Cc: ak, andreas.kogler.0x, atrajeev, Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
Use the new sample_flags to indicate whether the branch stack is filled
by the PMU driver.
Remove the br_stack initialization from perf_sample_data_init() to
minimize the number of cache lines touched.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
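Note (not part of the commit): a minimal user-space sketch of why the
flag replaces the NULL check, assuming simplified stand-in types. The
demo_* names are hypothetical, and the hardware-index word handled by
the real code is left out. Once br_stack is no longer initialized, the
pointer may hold a stale value, so only the flag says whether it is
safe to dereference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit; the kernel uses PERF_SAMPLE_BRANCH_STACK. */
#define DEMO_SAMPLE_BRANCH_STACK (1ULL << 0)

struct demo_branch_entry {
	uint64_t from;
	uint64_t to;
	uint64_t flags;
};

struct demo_branch_stack {
	uint64_t nr;				/* number of valid entries */
	struct demo_branch_entry entries[4];	/* fixed size for the sketch */
};

struct demo_data {
	uint64_t sample_flags;
	struct demo_branch_stack *br_stack;	/* not initialized up front */
};

/*
 * Size of the branch-stack payload: the nr word is always emitted, the
 * entries only when the PMU driver flagged the stack as filled.
 */
size_t demo_branch_stack_size(const struct demo_data *d)
{
	size_t size = sizeof(uint64_t);	/* nr */

	if (d->sample_flags & DEMO_SAMPLE_BRANCH_STACK) {
		/* The driver must set the pointer before setting the flag. */
		assert(d->br_stack);
		size += d->br_stack->nr * sizeof(struct demo_branch_entry);
	}
	return size;
}
```

The design point is that a driver has to set the pointer and the flag
together, which is what each hunk in this patch does.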
arch/powerpc/perf/core-book3s.c | 1 +
arch/x86/events/core.c | 4 +++-
arch/x86/events/intel/core.c | 4 +++-
arch/x86/events/intel/ds.c | 5 ++++-
include/linux/perf_event.h | 4 ++--
kernel/events/core.c | 4 ++--
6 files changed, 15 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 13919eb96931..1ad1efdb33f9 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2297,6 +2297,7 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
cpuhw = this_cpu_ptr(&cpu_hw_events);
power_pmu_bhrb_read(event, cpuhw);
data.br_stack = &cpuhw->bhrb_stack;
+ data.sample_flags |= PERF_SAMPLE_BRANCH_STACK;
}
if (event->attr.sample_type & PERF_SAMPLE_DATA_SRC &&
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index f969410d0c90..bb34a28fa71b 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1714,8 +1714,10 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
perf_sample_data_init(&data, 0, event->hw.last_period);
- if (has_branch_stack(event))
+ if (has_branch_stack(event)) {
data.br_stack = &cpuc->lbr_stack;
+ data.sample_flags |= PERF_SAMPLE_BRANCH_STACK;
+ }
if (perf_event_overflow(event, &data, regs))
x86_pmu_stop(event, 0);
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 4fce2bdbbf87..36f95894dd1c 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3004,8 +3004,10 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
perf_sample_data_init(&data, 0, event->hw.last_period);
- if (has_branch_stack(event))
+ if (has_branch_stack(event)) {
data.br_stack = &cpuc->lbr_stack;
+ data.sample_flags |= PERF_SAMPLE_BRANCH_STACK;
+ }
if (perf_event_overflow(event, &data, regs))
x86_pmu_stop(event, 0);
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 3af24c4891fb..d5f3007af59d 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1645,8 +1645,10 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event,
data->sample_flags |= PERF_SAMPLE_TIME;
}
- if (has_branch_stack(event))
+ if (has_branch_stack(event)) {
data->br_stack = &cpuc->lbr_stack;
+ data->sample_flags |= PERF_SAMPLE_BRANCH_STACK;
+ }
}
static void adaptive_pebs_save_regs(struct pt_regs *regs,
@@ -1796,6 +1798,7 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
if (has_branch_stack(event)) {
intel_pmu_store_pebs_lbrs(lbr);
data->br_stack = &cpuc->lbr_stack;
+ data->sample_flags |= PERF_SAMPLE_BRANCH_STACK;
}
}
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index b0ebbb1377b9..2aec1765b3d5 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1010,7 +1010,6 @@ struct perf_sample_data {
u64 sample_flags;
u64 addr;
struct perf_raw_record *raw;
- struct perf_branch_stack *br_stack;
u64 period;
union perf_sample_weight weight;
u64 txn;
@@ -1020,6 +1019,8 @@ struct perf_sample_data {
* The other fields, optionally {set,used} by
* perf_{prepare,output}_sample().
*/
+ struct perf_branch_stack *br_stack;
+
u64 type;
u64 ip;
struct {
@@ -1060,7 +1061,6 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
data->sample_flags = 0;
data->addr = addr;
data->raw = NULL;
- data->br_stack = NULL;
data->period = period;
data->weight.full = 0;
data->data_src.val = PERF_MEM_NA;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index c9b9cb79231a..104c0c9f4e6f 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7052,7 +7052,7 @@ void perf_output_sample(struct perf_output_handle *handle,
}
if (sample_type & PERF_SAMPLE_BRANCH_STACK) {
- if (data->br_stack) {
+ if (data->sample_flags & PERF_SAMPLE_BRANCH_STACK) {
size_t size;
size = data->br_stack->nr
@@ -7358,7 +7358,7 @@ void perf_prepare_sample(struct perf_event_header *header,
if (sample_type & PERF_SAMPLE_BRANCH_STACK) {
int size = sizeof(u64); /* nr */
- if (data->br_stack) {
+ if (data->sample_flags & PERF_SAMPLE_BRANCH_STACK) {
if (perf_sample_save_hw_index(event))
size += sizeof(u64);
--
2.35.1
* [PATCH 4/6] perf: Use sample_flags for weight
2022-08-31 14:55 [PATCH 0/6] Add sample_flags to improve the perf_sample_data struct kan.liang
` (2 preceding siblings ...)
2022-08-31 14:55 ` [PATCH 3/6] perf: Use sample_flags for branch stack kan.liang
@ 2022-08-31 14:55 ` kan.liang
2022-08-31 14:55 ` [PATCH 5/6] perf: Use sample_flags for data_src kan.liang
` (2 subsequent siblings)
6 siblings, 0 replies; 12+ messages in thread
From: kan.liang @ 2022-08-31 14:55 UTC
To: peterz, acme, mingo, eranian, mpe, linux-kernel
Cc: ak, andreas.kogler.0x, atrajeev, Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
Use the new sample_flags to indicate whether the weight field is filled
by the PMU driver.
Remove the weight field initialization from perf_sample_data_init() to
minimize the number of cache lines touched.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
arch/powerpc/perf/core-book3s.c | 5 +++--
arch/x86/events/intel/ds.c | 10 +++++++---
include/linux/perf_event.h | 3 +--
kernel/events/core.c | 3 +++
4 files changed, 14 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 1ad1efdb33f9..a5c95a2006ea 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2305,9 +2305,10 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
ppmu->get_mem_data_src(&data.data_src, ppmu->flags, regs);
if (event->attr.sample_type & PERF_SAMPLE_WEIGHT_TYPE &&
- ppmu->get_mem_weight)
+ ppmu->get_mem_weight) {
ppmu->get_mem_weight(&data.weight.full, event->attr.sample_type);
-
+ data.sample_flags |= PERF_SAMPLE_WEIGHT_TYPE;
+ }
if (perf_event_overflow(event, &data, regs))
power_pmu_stop(event, 0);
} else if (period) {
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index d5f3007af59d..e80632a575d1 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1532,8 +1532,10 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event,
/*
* Use latency for weight (only avail with PEBS-LL)
*/
- if (fll && (sample_type & PERF_SAMPLE_WEIGHT_TYPE))
+ if (fll && (sample_type & PERF_SAMPLE_WEIGHT_TYPE)) {
data->weight.full = pebs->lat;
+ data->sample_flags |= PERF_SAMPLE_WEIGHT_TYPE;
+ }
/*
* data.data_src encodes the data source
@@ -1625,9 +1627,10 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event,
if (x86_pmu.intel_cap.pebs_format >= 2) {
/* Only set the TSX weight when no memory weight. */
- if ((sample_type & PERF_SAMPLE_WEIGHT_TYPE) && !fll)
+ if ((sample_type & PERF_SAMPLE_WEIGHT_TYPE) && !fll) {
data->weight.full = intel_get_tsx_weight(pebs->tsx_tuning);
-
+ data->sample_flags |= PERF_SAMPLE_WEIGHT_TYPE;
+ }
if (sample_type & PERF_SAMPLE_TRANSACTION)
data->txn = intel_get_tsx_transaction(pebs->tsx_tuning,
pebs->ax);
@@ -1769,6 +1772,7 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
data->weight.var1_dw = (u32)(weight & PEBS_LATENCY_MASK) ?:
intel_get_tsx_weight(meminfo->tsx_tuning);
}
+ data->sample_flags |= PERF_SAMPLE_WEIGHT_TYPE;
}
if (sample_type & PERF_SAMPLE_DATA_SRC)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 2aec1765b3d5..c030d1d1c675 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1011,7 +1011,6 @@ struct perf_sample_data {
u64 addr;
struct perf_raw_record *raw;
u64 period;
- union perf_sample_weight weight;
u64 txn;
union perf_mem_data_src data_src;
@@ -1020,6 +1019,7 @@ struct perf_sample_data {
* perf_{prepare,output}_sample().
*/
struct perf_branch_stack *br_stack;
+ union perf_sample_weight weight;
u64 type;
u64 ip;
@@ -1062,7 +1062,6 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
data->addr = addr;
data->raw = NULL;
data->period = period;
- data->weight.full = 0;
data->data_src.val = PERF_MEM_NA;
data->txn = 0;
}
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 104c0c9f4e6f..f0af45db02b3 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7408,6 +7408,9 @@ void perf_prepare_sample(struct perf_event_header *header,
header->size += size;
}
+ if (filtered_sample_type & PERF_SAMPLE_WEIGHT_TYPE)
+ data->weight.full = 0;
+
if (sample_type & PERF_SAMPLE_REGS_INTR) {
/* regs dump ABI info */
int size = sizeof(u64);
--
2.35.1
* [PATCH 5/6] perf: Use sample_flags for data_src
2022-08-31 14:55 [PATCH 0/6] Add sample_flags to improve the perf_sample_data struct kan.liang
` (3 preceding siblings ...)
2022-08-31 14:55 ` [PATCH 4/6] perf: Use sample_flags for weight kan.liang
@ 2022-08-31 14:55 ` kan.liang
2022-08-31 14:55 ` [PATCH 6/6] perf: Use sample_flags for txn kan.liang
2022-08-31 22:42 ` [PATCH 0/6] Add sample_flags to improve the perf_sample_data struct Namhyung Kim
6 siblings, 0 replies; 12+ messages in thread
From: kan.liang @ 2022-08-31 14:55 UTC
To: peterz, acme, mingo, eranian, mpe, linux-kernel
Cc: ak, andreas.kogler.0x, atrajeev, Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
Use the new sample_flags to indicate whether the data_src field is
filled by the PMU driver.
Remove the data_src field initialization from perf_sample_data_init() to
minimize the number of cache lines touched.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
arch/powerpc/perf/core-book3s.c | 4 +++-
arch/x86/events/intel/ds.c | 8 ++++++--
include/linux/perf_event.h | 3 +--
kernel/events/core.c | 3 +++
4 files changed, 13 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index a5c95a2006ea..6ec7069e6482 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2301,8 +2301,10 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
}
if (event->attr.sample_type & PERF_SAMPLE_DATA_SRC &&
- ppmu->get_mem_data_src)
+ ppmu->get_mem_data_src) {
ppmu->get_mem_data_src(&data.data_src, ppmu->flags, regs);
+ data.sample_flags |= PERF_SAMPLE_DATA_SRC;
+ }
if (event->attr.sample_type & PERF_SAMPLE_WEIGHT_TYPE &&
ppmu->get_mem_weight) {
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index e80632a575d1..9a10457ff32a 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1540,8 +1540,10 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event,
/*
* data.data_src encodes the data source
*/
- if (sample_type & PERF_SAMPLE_DATA_SRC)
+ if (sample_type & PERF_SAMPLE_DATA_SRC) {
data->data_src.val = get_data_src(event, pebs->dse);
+ data->sample_flags |= PERF_SAMPLE_DATA_SRC;
+ }
/*
* We must however always use iregs for the unwinder to stay sane; the
@@ -1775,8 +1777,10 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
data->sample_flags |= PERF_SAMPLE_WEIGHT_TYPE;
}
- if (sample_type & PERF_SAMPLE_DATA_SRC)
+ if (sample_type & PERF_SAMPLE_DATA_SRC) {
data->data_src.val = get_data_src(event, meminfo->aux);
+ data->sample_flags |= PERF_SAMPLE_DATA_SRC;
+ }
if (sample_type & PERF_SAMPLE_ADDR_TYPE)
data->addr = meminfo->address;
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index c030d1d1c675..79b44084c15d 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1012,7 +1012,6 @@ struct perf_sample_data {
struct perf_raw_record *raw;
u64 period;
u64 txn;
- union perf_mem_data_src data_src;
/*
* The other fields, optionally {set,used} by
@@ -1020,6 +1019,7 @@ struct perf_sample_data {
*/
struct perf_branch_stack *br_stack;
union perf_sample_weight weight;
+ union perf_mem_data_src data_src;
u64 type;
u64 ip;
@@ -1062,7 +1062,6 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
data->addr = addr;
data->raw = NULL;
data->period = period;
- data->data_src.val = PERF_MEM_NA;
data->txn = 0;
}
diff --git a/kernel/events/core.c b/kernel/events/core.c
index f0af45db02b3..163e2f478e61 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7411,6 +7411,9 @@ void perf_prepare_sample(struct perf_event_header *header,
if (filtered_sample_type & PERF_SAMPLE_WEIGHT_TYPE)
data->weight.full = 0;
+ if (filtered_sample_type & PERF_SAMPLE_DATA_SRC)
+ data->data_src.val = PERF_MEM_NA;
+
if (sample_type & PERF_SAMPLE_REGS_INTR) {
/* regs dump ABI info */
int size = sizeof(u64);
--
2.35.1
* [PATCH 6/6] perf: Use sample_flags for txn
2022-08-31 14:55 [PATCH 0/6] Add sample_flags to improve the perf_sample_data struct kan.liang
` (4 preceding siblings ...)
2022-08-31 14:55 ` [PATCH 5/6] perf: Use sample_flags for data_src kan.liang
@ 2022-08-31 14:55 ` kan.liang
2022-08-31 22:42 ` [PATCH 0/6] Add sample_flags to improve the perf_sample_data struct Namhyung Kim
6 siblings, 0 replies; 12+ messages in thread
From: kan.liang @ 2022-08-31 14:55 UTC
To: peterz, acme, mingo, eranian, mpe, linux-kernel
Cc: ak, andreas.kogler.0x, atrajeev, Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
Use the new sample_flags to indicate whether the txn field is filled by
the PMU driver.
Remove the txn field initialization from perf_sample_data_init() to
minimize the number of cache lines touched.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
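Note (not part of the commit): a minimal user-space sketch of the lazy
defaults that the weight, data_src and txn patches move into the
prepare step, assuming simplified stand-in types. The demo_* names,
the DEMO_* flag values and DEMO_MEM_NA are hypothetical placeholders.
A default is written only when the field was requested and the PMU
driver did not fill it; fields that were not requested are never
touched.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits and "not available" marker for this sketch. */
#define DEMO_SAMPLE_WEIGHT	(1ULL << 0)
#define DEMO_SAMPLE_DATA_SRC	(1ULL << 1)
#define DEMO_SAMPLE_TRANSACTION	(1ULL << 2)
#define DEMO_MEM_NA		0xffULL

struct demo_data {
	uint64_t sample_flags;	/* set by the PMU driver per filled field */
	uint64_t weight;
	uint64_t data_src;
	uint64_t txn;
};

/* Write defaults only for requested fields the driver left unfilled. */
void demo_apply_defaults(struct demo_data *d, uint64_t sample_type)
{
	uint64_t filtered;

	assert(d);
	filtered = sample_type & ~d->sample_flags;

	if (filtered & DEMO_SAMPLE_WEIGHT)
		d->weight = 0;
	if (filtered & DEMO_SAMPLE_DATA_SRC)
		d->data_src = DEMO_MEM_NA;
	if (filtered & DEMO_SAMPLE_TRANSACTION)
		d->txn = 0;
}
```

Compared with unconditional initialization, an event that requests none
of these fields performs no stores to them in this sketch.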
arch/x86/events/intel/ds.c | 8 ++++++--
include/linux/perf_event.h | 3 +--
kernel/events/core.c | 3 +++
3 files changed, 10 insertions(+), 4 deletions(-)
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 9a10457ff32a..3c6a68d7fe42 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1633,9 +1633,11 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event,
data->weight.full = intel_get_tsx_weight(pebs->tsx_tuning);
data->sample_flags |= PERF_SAMPLE_WEIGHT_TYPE;
}
- if (sample_type & PERF_SAMPLE_TRANSACTION)
+ if (sample_type & PERF_SAMPLE_TRANSACTION) {
data->txn = intel_get_tsx_transaction(pebs->tsx_tuning,
pebs->ax);
+ data->sample_flags |= PERF_SAMPLE_TRANSACTION;
+ }
}
/*
@@ -1785,9 +1787,11 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
if (sample_type & PERF_SAMPLE_ADDR_TYPE)
data->addr = meminfo->address;
- if (sample_type & PERF_SAMPLE_TRANSACTION)
+ if (sample_type & PERF_SAMPLE_TRANSACTION) {
data->txn = intel_get_tsx_transaction(meminfo->tsx_tuning,
gprs ? gprs->ax : 0);
+ data->sample_flags |= PERF_SAMPLE_TRANSACTION;
+ }
}
if (format_size & PEBS_DATACFG_XMMS) {
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 79b44084c15d..d7c9fdd82bc3 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1011,7 +1011,6 @@ struct perf_sample_data {
u64 addr;
struct perf_raw_record *raw;
u64 period;
- u64 txn;
/*
* The other fields, optionally {set,used} by
@@ -1020,6 +1019,7 @@ struct perf_sample_data {
struct perf_branch_stack *br_stack;
union perf_sample_weight weight;
union perf_mem_data_src data_src;
+ u64 txn;
u64 type;
u64 ip;
@@ -1062,7 +1062,6 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
data->addr = addr;
data->raw = NULL;
data->period = period;
- data->txn = 0;
}
/*
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 163e2f478e61..15d27b14c827 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7414,6 +7414,9 @@ void perf_prepare_sample(struct perf_event_header *header,
if (filtered_sample_type & PERF_SAMPLE_DATA_SRC)
data->data_src.val = PERF_MEM_NA;
+ if (filtered_sample_type & PERF_SAMPLE_TRANSACTION)
+ data->txn = 0;
+
if (sample_type & PERF_SAMPLE_REGS_INTR) {
/* regs dump ABI info */
int size = sizeof(u64);
--
2.35.1
* Re: [PATCH 0/6] Add sample_flags to improve the perf_sample_data struct
2022-08-31 14:55 [PATCH 0/6] Add sample_flags to improve the perf_sample_data struct kan.liang
` (5 preceding siblings ...)
2022-08-31 14:55 ` [PATCH 6/6] perf: Use sample_flags for txn kan.liang
@ 2022-08-31 22:42 ` Namhyung Kim
2022-09-01 12:47 ` Liang, Kan
6 siblings, 1 reply; 12+ messages in thread
From: Namhyung Kim @ 2022-08-31 22:42 UTC
To: kan.liang
Cc: peterz, acme, mingo, eranian, mpe, linux-kernel, ak,
andreas.kogler.0x, atrajeev, ravi.bangoria
(Adding Ravi to CC)
On Wed, Aug 31, 2022 at 07:55:08AM -0700, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
>
> The patch series is to fix PEBS timestamps overwritten and improve the
> perf_sample_data struct. The detailed discussion can be found at
> https://lore.kernel.org/lkml/YwXvGe4%2FQdgGYOKJ@worktop.programming.kicks-ass.net/
>
> The patch series has two changes compared with the suggestions in the
> above discussion.
> - Only clear the sample flags for the perf_prepare_sample().
> The __perf_event_header__init_id is shared between perf_prepare_sample()
> (used by PERF_RECORD_SAMPLE) and perf_event_header__init_id() (used by
> other PERF_RECORD_* event type). The sample data is only available
> for the PERF_RECORD_SAMPLE.
> - The CALLCHAIN_EARLY hack is still required for the BPF, especially
> perf_event_set_bpf_handler(). The sample data is not available when
> the function is invoked.
In general, looks good! I'd like to work on the BPF side so that it can
get the sample data for filtering. The previous discussion was at
https://lore.kernel.org/all/CAM9d7cjj0X90=NsvdwaLMGCDVkMJBLAGF_q-+Eqj6b44OAnzoQ@mail.gmail.com/
Thanks,
Namhyung
>
> Kan Liang (6):
> perf: Add sample_flags to indicate the PMU-filled sample data
> perf/x86/intel/pebs: Fix PEBS timestamps overwritten
> perf: Use sample_flags for branch stack
> perf: Use sample_flags for weight
> perf: Use sample_flags for data_src
> perf: Use sample_flags for txn
>
> arch/powerpc/perf/core-book3s.c | 10 ++++++---
> arch/x86/events/core.c | 4 +++-
> arch/x86/events/intel/core.c | 4 +++-
> arch/x86/events/intel/ds.c | 39 ++++++++++++++++++++++++---------
> include/linux/perf_event.h | 15 ++++++-------
> kernel/events/core.c | 33 +++++++++++++++++++---------
> 6 files changed, 72 insertions(+), 33 deletions(-)
>
> --
> 2.35.1
>
>
* Re: [PATCH 3/6] perf: Use sample_flags for branch stack
2022-08-31 14:55 ` [PATCH 3/6] perf: Use sample_flags for branch stack kan.liang
@ 2022-08-31 22:47 ` Namhyung Kim
2022-09-01 12:42 ` Liang, Kan
0 siblings, 1 reply; 12+ messages in thread
From: Namhyung Kim @ 2022-08-31 22:47 UTC
To: kan.liang
Cc: peterz, acme, mingo, eranian, mpe, linux-kernel, ak,
andreas.kogler.0x, atrajeev, ravi.bangoria
On Wed, Aug 31, 2022 at 07:55:11AM -0700, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
>
> Use the new sample_flags to indicate whether the branch stack is filled
> by the PMU driver.
>
> Remove the br_stack from the perf_sample_data_init() to minimize the number
> of cache lines touched.
>
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> ---
> arch/powerpc/perf/core-book3s.c | 1 +
> arch/x86/events/core.c | 4 +++-
> arch/x86/events/intel/core.c | 4 +++-
> arch/x86/events/intel/ds.c | 5 ++++-
> include/linux/perf_event.h | 4 ++--
> kernel/events/core.c | 4 ++--
> 6 files changed, 15 insertions(+), 7 deletions(-)
Looks like you need to update AMD LBR code in amd_pmu_v2_handle_irq().
Thanks,
Namhyung
>
> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
> index 13919eb96931..1ad1efdb33f9 100644
> --- a/arch/powerpc/perf/core-book3s.c
> +++ b/arch/powerpc/perf/core-book3s.c
> @@ -2297,6 +2297,7 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
> cpuhw = this_cpu_ptr(&cpu_hw_events);
> power_pmu_bhrb_read(event, cpuhw);
> data.br_stack = &cpuhw->bhrb_stack;
> + data.sample_flags |= PERF_SAMPLE_BRANCH_STACK;
> }
>
> if (event->attr.sample_type & PERF_SAMPLE_DATA_SRC &&
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index f969410d0c90..bb34a28fa71b 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -1714,8 +1714,10 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
>
> perf_sample_data_init(&data, 0, event->hw.last_period);
>
> - if (has_branch_stack(event))
> + if (has_branch_stack(event)) {
> data.br_stack = &cpuc->lbr_stack;
> + data.sample_flags |= PERF_SAMPLE_BRANCH_STACK;
> + }
>
> if (perf_event_overflow(event, &data, regs))
> x86_pmu_stop(event, 0);
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 4fce2bdbbf87..36f95894dd1c 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -3004,8 +3004,10 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
>
> perf_sample_data_init(&data, 0, event->hw.last_period);
>
> - if (has_branch_stack(event))
> + if (has_branch_stack(event)) {
> data.br_stack = &cpuc->lbr_stack;
> + data.sample_flags |= PERF_SAMPLE_BRANCH_STACK;
> + }
>
> if (perf_event_overflow(event, &data, regs))
> x86_pmu_stop(event, 0);
> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
> index 3af24c4891fb..d5f3007af59d 100644
> --- a/arch/x86/events/intel/ds.c
> +++ b/arch/x86/events/intel/ds.c
> @@ -1645,8 +1645,10 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event,
> data->sample_flags |= PERF_SAMPLE_TIME;
> }
>
> - if (has_branch_stack(event))
> + if (has_branch_stack(event)) {
> data->br_stack = &cpuc->lbr_stack;
> + data->sample_flags |= PERF_SAMPLE_BRANCH_STACK;
> + }
> }
>
> static void adaptive_pebs_save_regs(struct pt_regs *regs,
> @@ -1796,6 +1798,7 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
> if (has_branch_stack(event)) {
> intel_pmu_store_pebs_lbrs(lbr);
> data->br_stack = &cpuc->lbr_stack;
> + data->sample_flags |= PERF_SAMPLE_BRANCH_STACK;
> }
> }
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index b0ebbb1377b9..2aec1765b3d5 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -1010,7 +1010,6 @@ struct perf_sample_data {
> u64 sample_flags;
> u64 addr;
> struct perf_raw_record *raw;
> - struct perf_branch_stack *br_stack;
> u64 period;
> union perf_sample_weight weight;
> u64 txn;
> @@ -1020,6 +1019,8 @@ struct perf_sample_data {
> * The other fields, optionally {set,used} by
> * perf_{prepare,output}_sample().
> */
> + struct perf_branch_stack *br_stack;
> +
> u64 type;
> u64 ip;
> struct {
> @@ -1060,7 +1061,6 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
> data->sample_flags = 0;
> data->addr = addr;
> data->raw = NULL;
> - data->br_stack = NULL;
> data->period = period;
> data->weight.full = 0;
> data->data_src.val = PERF_MEM_NA;
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index c9b9cb79231a..104c0c9f4e6f 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -7052,7 +7052,7 @@ void perf_output_sample(struct perf_output_handle *handle,
> }
>
> if (sample_type & PERF_SAMPLE_BRANCH_STACK) {
> - if (data->br_stack) {
> + if (data->sample_flags & PERF_SAMPLE_BRANCH_STACK) {
> size_t size;
>
> size = data->br_stack->nr
> @@ -7358,7 +7358,7 @@ void perf_prepare_sample(struct perf_event_header *header,
>
> if (sample_type & PERF_SAMPLE_BRANCH_STACK) {
> int size = sizeof(u64); /* nr */
> - if (data->br_stack) {
> + if (data->sample_flags & PERF_SAMPLE_BRANCH_STACK) {
> if (perf_sample_save_hw_index(event))
> size += sizeof(u64);
>
> --
> 2.35.1
>
>
* Re: [PATCH 3/6] perf: Use sample_flags for branch stack
2022-08-31 22:47 ` Namhyung Kim
@ 2022-09-01 12:42 ` Liang, Kan
0 siblings, 0 replies; 12+ messages in thread
From: Liang, Kan @ 2022-09-01 12:42 UTC (permalink / raw)
To: Namhyung Kim
Cc: peterz, acme, mingo, eranian, mpe, linux-kernel, ak,
andreas.kogler.0x, atrajeev, ravi.bangoria
On 2022-08-31 6:47 p.m., Namhyung Kim wrote:
> On Wed, Aug 31, 2022 at 07:55:11AM -0700, kan.liang@linux.intel.com wrote:
>> From: Kan Liang <kan.liang@linux.intel.com>
>>
>> Use the new sample_flags to indicate whether the branch stack is filled
>> by the PMU driver.
>>
>> Remove the br_stack from the perf_sample_data_init() to minimize the number
>> of cache lines touched.
>>
>> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
>> ---
>> arch/powerpc/perf/core-book3s.c | 1 +
>> arch/x86/events/core.c | 4 +++-
>> arch/x86/events/intel/core.c | 4 +++-
>> arch/x86/events/intel/ds.c | 5 ++++-
>> include/linux/perf_event.h | 4 ++--
>> kernel/events/core.c | 4 ++--
>> 6 files changed, 15 insertions(+), 7 deletions(-)
>
> Looks like you need to update AMD LBR code in amd_pmu_v2_handle_irq().
Right, this patch is based on 6.0-rc, which doesn't include the AMD LBR
code yet. I will rebase on Peter's perf/core.
Thanks,
Kan
>
> Thanks,
> Namhyung
>
>>
>> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
>> index 13919eb96931..1ad1efdb33f9 100644
>> --- a/arch/powerpc/perf/core-book3s.c
>> +++ b/arch/powerpc/perf/core-book3s.c
>> @@ -2297,6 +2297,7 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
>> cpuhw = this_cpu_ptr(&cpu_hw_events);
>> power_pmu_bhrb_read(event, cpuhw);
>> data.br_stack = &cpuhw->bhrb_stack;
>> + data.sample_flags |= PERF_SAMPLE_BRANCH_STACK;
>> }
>>
>> if (event->attr.sample_type & PERF_SAMPLE_DATA_SRC &&
>> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
>> index f969410d0c90..bb34a28fa71b 100644
>> --- a/arch/x86/events/core.c
>> +++ b/arch/x86/events/core.c
>> @@ -1714,8 +1714,10 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
>>
>> perf_sample_data_init(&data, 0, event->hw.last_period);
>>
>> - if (has_branch_stack(event))
>> + if (has_branch_stack(event)) {
>> data.br_stack = &cpuc->lbr_stack;
>> + data.sample_flags |= PERF_SAMPLE_BRANCH_STACK;
>> + }
>>
>> if (perf_event_overflow(event, &data, regs))
>> x86_pmu_stop(event, 0);
>> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
>> index 4fce2bdbbf87..36f95894dd1c 100644
>> --- a/arch/x86/events/intel/core.c
>> +++ b/arch/x86/events/intel/core.c
>> @@ -3004,8 +3004,10 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
>>
>> perf_sample_data_init(&data, 0, event->hw.last_period);
>>
>> - if (has_branch_stack(event))
>> + if (has_branch_stack(event)) {
>> data.br_stack = &cpuc->lbr_stack;
>> + data.sample_flags |= PERF_SAMPLE_BRANCH_STACK;
>> + }
>>
>> if (perf_event_overflow(event, &data, regs))
>> x86_pmu_stop(event, 0);
>> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
>> index 3af24c4891fb..d5f3007af59d 100644
>> --- a/arch/x86/events/intel/ds.c
>> +++ b/arch/x86/events/intel/ds.c
>> @@ -1645,8 +1645,10 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event,
>> data->sample_flags |= PERF_SAMPLE_TIME;
>> }
>>
>> - if (has_branch_stack(event))
>> + if (has_branch_stack(event)) {
>> data->br_stack = &cpuc->lbr_stack;
>> + data->sample_flags |= PERF_SAMPLE_BRANCH_STACK;
>> + }
>> }
>>
>> static void adaptive_pebs_save_regs(struct pt_regs *regs,
>> @@ -1796,6 +1798,7 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
>> if (has_branch_stack(event)) {
>> intel_pmu_store_pebs_lbrs(lbr);
>> data->br_stack = &cpuc->lbr_stack;
>> + data->sample_flags |= PERF_SAMPLE_BRANCH_STACK;
>> }
>> }
>>
>> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>> index b0ebbb1377b9..2aec1765b3d5 100644
>> --- a/include/linux/perf_event.h
>> +++ b/include/linux/perf_event.h
>> @@ -1010,7 +1010,6 @@ struct perf_sample_data {
>> u64 sample_flags;
>> u64 addr;
>> struct perf_raw_record *raw;
>> - struct perf_branch_stack *br_stack;
>> u64 period;
>> union perf_sample_weight weight;
>> u64 txn;
>> @@ -1020,6 +1019,8 @@ struct perf_sample_data {
>> * The other fields, optionally {set,used} by
>> * perf_{prepare,output}_sample().
>> */
>> + struct perf_branch_stack *br_stack;
>> +
>> u64 type;
>> u64 ip;
>> struct {
>> @@ -1060,7 +1061,6 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
>> data->sample_flags = 0;
>> data->addr = addr;
>> data->raw = NULL;
>> - data->br_stack = NULL;
>> data->period = period;
>> data->weight.full = 0;
>> data->data_src.val = PERF_MEM_NA;
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index c9b9cb79231a..104c0c9f4e6f 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -7052,7 +7052,7 @@ void perf_output_sample(struct perf_output_handle *handle,
>> }
>>
>> if (sample_type & PERF_SAMPLE_BRANCH_STACK) {
>> - if (data->br_stack) {
>> + if (data->sample_flags & PERF_SAMPLE_BRANCH_STACK) {
>> size_t size;
>>
>> size = data->br_stack->nr
>> @@ -7358,7 +7358,7 @@ void perf_prepare_sample(struct perf_event_header *header,
>>
>> if (sample_type & PERF_SAMPLE_BRANCH_STACK) {
>> int size = sizeof(u64); /* nr */
>> - if (data->br_stack) {
>> + if (data->sample_flags & PERF_SAMPLE_BRANCH_STACK) {
>> if (perf_sample_save_hw_index(event))
>> size += sizeof(u64);
>>
>> --
>> 2.35.1
>>
>>
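For readers following the thread, the pattern this patch adopts can be
sketched in standalone C. This is a minimal illustration, not the kernel
code: every mock_* name and the MOCK_SAMPLE_BRANCH_STACK value are
assumptions made up for the example. It shows why the consumer has to
test the flag once the init function stops clearing the br_stack pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit standing in for PERF_SAMPLE_BRANCH_STACK. */
#define MOCK_SAMPLE_BRANCH_STACK (1ULL << 11)

struct mock_branch_stack {
	uint64_t nr;	/* number of valid branch entries */
};

struct mock_sample_data {
	uint64_t sample_flags;			/* reset for every sample */
	struct mock_branch_stack *br_stack;	/* meaningful only if the flag is set */
};

/* Init clears only the flags word; br_stack is deliberately left untouched. */
static void mock_sample_data_init(struct mock_sample_data *data)
{
	data->sample_flags = 0;
}

/* What a PMU driver does when it has a branch stack for this sample. */
static void mock_pmu_fill_branch_stack(struct mock_sample_data *data,
				       struct mock_branch_stack *stack)
{
	data->br_stack = stack;
	data->sample_flags |= MOCK_SAMPLE_BRANCH_STACK;
}

/*
 * Consumer side: returns the number of entries to emit. It checks the
 * flag, never the raw pointer, because the pointer may hold a stale
 * value left over from an earlier sample.
 */
static uint64_t mock_branch_entries(const struct mock_sample_data *data)
{
	if (!(data->sample_flags & MOCK_SAMPLE_BRANCH_STACK))
		return 0;

	assert(data->br_stack != NULL);
	return data->br_stack->nr;
}
```

With this layout the init path writes one fewer field, and a stale
br_stack pointer is harmless as long as every reader goes through the
flag check.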
* Re: [PATCH 0/6] Add sample_flags to improve the perf_sample_data struct
2022-08-31 22:42 ` [PATCH 0/6] Add sample_flags to improve the perf_sample_data struct Namhyung Kim
@ 2022-09-01 12:47 ` Liang, Kan
2022-09-02 5:24 ` Namhyung Kim
0 siblings, 1 reply; 12+ messages in thread
From: Liang, Kan @ 2022-09-01 12:47 UTC (permalink / raw)
To: Namhyung Kim
Cc: peterz, acme, mingo, eranian, mpe, linux-kernel, ak,
andreas.kogler.0x, atrajeev, ravi.bangoria
On 2022-08-31 6:42 p.m., Namhyung Kim wrote:
> (Adding Ravi to CC)
>
> On Wed, Aug 31, 2022 at 07:55:08AM -0700, kan.liang@linux.intel.com wrote:
>> From: Kan Liang <kan.liang@linux.intel.com>
>>
>> The patch series is to fix PEBS timestamps overwritten and improve the
>> perf_sample_data struct. The detailed discussion can be found at
>> https://lore.kernel.org/lkml/YwXvGe4%2FQdgGYOKJ@worktop.programming.kicks-ass.net/
>>
>> The patch series has two changes compared with the suggestions in the
>> above discussion.
>> - Only clear the sample flags for the perf_prepare_sample().
>> The __perf_event_header__init_id is shared between perf_prepare_sample()
>> (used by PERF_RECORD_SAMPLE) and perf_event_header__init_id() (used by
>> other PERF_RECORD_* event type). The sample data is only available
>> for the PERF_RECORD_SAMPLE.
>> - The CALLCHAIN_EARLY hack is still required for the BPF, especially
>> perf_event_set_bpf_handler(). The sample data is not available when
>> the function is invoked.
>
> In general, looks good! I'd like to work on the BPF side so that it can
> get the sample data for filtering.
Thanks, Namhyung. I will send out V2 shortly. I think the BPF work
can go on top of it.
Thanks,
Kan
> The previous discussion was at
>
> https://lore.kernel.org/all/CAM9d7cjj0X90=NsvdwaLMGCDVkMJBLAGF_q-+Eqj6b44OAnzoQ@mail.gmail.com/
>
> Thanks,
> Namhyung
>
>>
>> Kan Liang (6):
>> perf: Add sample_flags to indicate the PMU-filled sample data
>> perf/x86/intel/pebs: Fix PEBS timestamps overwritten
>> perf: Use sample_flags for branch stack
>> perf: Use sample_flags for weight
>> perf: Use sample_flags for data_src
>> perf: Use sample_flags for txn
>>
>> arch/powerpc/perf/core-book3s.c | 10 ++++++---
>> arch/x86/events/core.c | 4 +++-
>> arch/x86/events/intel/core.c | 4 +++-
>> arch/x86/events/intel/ds.c | 39 ++++++++++++++++++++++++---------
>> include/linux/perf_event.h | 15 ++++++-------
>> kernel/events/core.c | 33 +++++++++++++++++++---------
>> 6 files changed, 72 insertions(+), 33 deletions(-)
>>
>> --
>> 2.35.1
>>
>>
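The first point of the cover letter quoted above (clear the sample flags
only for perf_prepare_sample()) can be sketched as follows. This is a
hedged illustration under assumptions, not the code from patch 1: the
mock_* names, the flag values, and the exact masking expression are
invented here to show the idea that a shared ID-init helper must not
overwrite fields the PMU driver already filled, while the non-sample
path has no PMU-filled data and passes the full sample type.

```c
#include <stdint.h>

/* Hypothetical flag bits standing in for PERF_SAMPLE_TIME / PERF_SAMPLE_CPU. */
#define MOCK_SAMPLE_TIME (1ULL << 2)
#define MOCK_SAMPLE_CPU  (1ULL << 7)

struct mock_id_data {
	uint64_t sample_flags;	/* bits already filled by the PMU driver */
	uint64_t time;
	uint32_t cpu;
};

/* Shared helper: fills exactly the fields named in sample_type. */
static void mock_init_id(struct mock_id_data *data, uint64_t sample_type,
			 uint64_t now, uint32_t cpu)
{
	if (sample_type & MOCK_SAMPLE_TIME)
		data->time = now;
	if (sample_type & MOCK_SAMPLE_CPU)
		data->cpu = cpu;
}

/*
 * Sample path: mask out the bits the PMU driver has already filled so
 * the helper does not overwrite them (for example a hardware timestamp).
 */
static void mock_prepare_sample(struct mock_id_data *data, uint64_t sample_type,
				uint64_t now, uint32_t cpu)
{
	uint64_t filtered = sample_type & ~data->sample_flags;

	mock_init_id(data, filtered, now, cpu);
}

/* Non-sample records: nothing was filled by a PMU, so use the full type. */
static void mock_header_init_id(struct mock_id_data *data, uint64_t sample_type,
				uint64_t now, uint32_t cpu)
{
	mock_init_id(data, sample_type, now, cpu);
}
```

In this sketch a timestamp recorded by the hardware survives
mock_prepare_sample(), while mock_header_init_id() always stamps the
current time.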
* Re: [PATCH 0/6] Add sample_flags to improve the perf_sample_data struct
2022-09-01 12:47 ` Liang, Kan
@ 2022-09-02 5:24 ` Namhyung Kim
0 siblings, 0 replies; 12+ messages in thread
From: Namhyung Kim @ 2022-09-02 5:24 UTC (permalink / raw)
To: Liang, Kan
Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Ingo Molnar,
Stephane Eranian, Michael Ellerman, linux-kernel, Andi Kleen,
andreas.kogler.0x, Athira Rajeev, Ravi Bangoria
On Thu, Sep 1, 2022 at 5:47 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
>
>
> On 2022-08-31 6:42 p.m., Namhyung Kim wrote:
> > (Adding Ravi to CC)
> >
> > On Wed, Aug 31, 2022 at 07:55:08AM -0700, kan.liang@linux.intel.com wrote:
> >> From: Kan Liang <kan.liang@linux.intel.com>
> >>
> >> The patch series is to fix PEBS timestamps overwritten and improve the
> >> perf_sample_data struct. The detailed discussion can be found at
> >> https://lore.kernel.org/lkml/YwXvGe4%2FQdgGYOKJ@worktop.programming.kicks-ass.net/
> >>
> >> The patch series has two changes compared with the suggestions in the
> >> above discussion.
> >> - Only clear the sample flags for the perf_prepare_sample().
> >> The __perf_event_header__init_id is shared between perf_prepare_sample()
> >> (used by PERF_RECORD_SAMPLE) and perf_event_header__init_id() (used by
> >> other PERF_RECORD_* event type). The sample data is only available
> >> for the PERF_RECORD_SAMPLE.
> >> - The CALLCHAIN_EARLY hack is still required for the BPF, especially
> >> perf_event_set_bpf_handler(). The sample data is not available when
> >> the function is invoked.
> >
> > In general, looks good! I'd like to work on the BPF side so that it can
> > get the sample data for filtering.
>
> Thanks, Namhyung. I will send out V2 shortly. I think the BPF work
> can go on top of it.
Yep, I'll work on it once this patchset is merged.
Thanks,
Namhyung
Thread overview: 12+ messages
2022-08-31 14:55 [PATCH 0/6] Add sample_flags to improve the perf_sample_data struct kan.liang
2022-08-31 14:55 ` [PATCH 1/6] perf: Add sample_flags to indicate the PMU-filled sample data kan.liang
2022-08-31 14:55 ` [PATCH 2/6] perf/x86/intel/pebs: Fix PEBS timestamps overwritten kan.liang
2022-08-31 14:55 ` [PATCH 3/6] perf: Use sample_flags for branch stack kan.liang
2022-08-31 22:47 ` Namhyung Kim
2022-09-01 12:42 ` Liang, Kan
2022-08-31 14:55 ` [PATCH 4/6] perf: Use sample_flags for weight kan.liang
2022-08-31 14:55 ` [PATCH 5/6] perf: Use sample_flags for data_src kan.liang
2022-08-31 14:55 ` [PATCH 6/6] perf: Use sample_flags for txn kan.liang
2022-08-31 22:42 ` [PATCH 0/6] Add sample_flags to improve the perf_sample_data struct Namhyung Kim
2022-09-01 12:47 ` Liang, Kan
2022-09-02 5:24 ` Namhyung Kim