* [PATCH 1/2] perf: Add sched_task callback during ctx reschedule
@ 2023-03-28 22:27 kan.liang
2023-03-28 22:27 ` [PATCH 2/2] perf/x86/intel/ds: Use the size from each PEBS record kan.liang
2023-04-06 13:10 ` [PATCH 1/2] perf: Add sched_task callback during ctx reschedule Peter Zijlstra
0 siblings, 2 replies; 7+ messages in thread
From: kan.liang @ 2023-03-28 22:27 UTC (permalink / raw)
To: peterz, mingo, linux-kernel; +Cc: ak, eranian, Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
Several similar kernel warnings can be triggered,
[56605.607840] CPU0 PEBS record size 0, expected 32, config 0
cpuc->record_size=208
when the below commands are running in parallel for a while on SPR.
while true; do perf record --no-buildid -a --intr-regs=AX -e
cpu/event=0xd0,umask=0x81/pp -c 10003 -o /dev/null ./triad; done &
while true; do perf record -o /tmp/out -W -d -e
'{ld_blocks.store_forward:period=1000000,
MEM_TRANS_RETIRED.LOAD_LATENCY:u:precise=2:ldlat=4}'
-c 1037 ./triad; done
*The triad program just generates loads and stores.
The warnings are triggered when an unexpected PEBS record (with a
different config and size) is found.
In a context switch, different events may be applied to the old task and
the new task. The sched_task callback is used to switch the PMU-specific
data. For PEBS, the callback flushes the DS area and parses the PEBS
records from the old task at sched-out, so the new task never sees
stale PEBS records.
However, exec() doesn't have similar support, so the new command may
see PEBS records from the old command. In perf_event_exec(),
ctx_resched() is invoked to reschedule the context. It may update the
active events, which may change the global PEBS configuration. A PEBS
record from the old command may then have a different config and size
from the new command's, which triggers the warning.
The sched_task callback should be invoked in all the places where the
context can be changed. Add the sched_task callback in ctx_resched()
as well.
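The ordering the patch adds can be sketched as a standalone userspace model. This is a hypothetical simplification for illustration only: the names (ctx_resched, sched_task_cb, struct pebs_buf) mirror the kernel code but are not the actual kernel API, and the real ctx_resched() does far more between the two callback invocations.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-CPU DS buffer state. */
struct pebs_buf {
	int nr_records;            /* stale PEBS records from the old context */
};

static int flush_calls;

/* PMU-specific callback: drain the DS area on sched-out. */
static void sched_task_cb(struct pebs_buf *buf, int sched_in)
{
	if (!sched_in) {
		buf->nr_records = 0;
		flush_calls++;
	}
}

/* Sketch of ctx_resched() with the callback invocations this patch adds. */
static void ctx_resched(struct pebs_buf *buf)
{
	sched_task_cb(buf, 0);     /* new: flush before scheduling the ctx out */
	/* ... task_ctx_sched_out(), event reprogramming, sched-in ... */
	sched_task_cb(buf, 1);     /* new: notify after scheduling the ctx in */
}
```

With this ordering, any records buffered for the old context are drained before the events are reprogrammed, so the rescheduled context starts from an empty buffer.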
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
kernel/events/core.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index f79fd8b87f75..0c49183656fd 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2642,6 +2642,8 @@ static void perf_event_sched_in(struct perf_cpu_context *cpuctx,
ctx_sched_in(ctx, EVENT_FLEXIBLE);
}
+static void perf_ctx_sched_task_cb(struct perf_event_context *ctx, bool sched_in);
+
/*
* We want to maintain the following priority of scheduling:
* - CPU pinned (EVENT_CPU | EVENT_PINNED)
@@ -2680,6 +2682,7 @@ static void ctx_resched(struct perf_cpu_context *cpuctx,
perf_ctx_disable(&cpuctx->ctx);
if (task_ctx) {
perf_ctx_disable(task_ctx);
+ perf_ctx_sched_task_cb(task_ctx, false);
task_ctx_sched_out(task_ctx, event_type);
}
@@ -2698,8 +2701,10 @@ static void ctx_resched(struct perf_cpu_context *cpuctx,
perf_event_sched_in(cpuctx, task_ctx);
perf_ctx_enable(&cpuctx->ctx);
- if (task_ctx)
+ if (task_ctx) {
+ perf_ctx_sched_task_cb(task_ctx, true);
perf_ctx_enable(task_ctx);
+ }
}
void perf_pmu_resched(struct pmu *pmu)
--
2.35.1
* [PATCH 2/2] perf/x86/intel/ds: Use the size from each PEBS record
2023-03-28 22:27 [PATCH 1/2] perf: Add sched_task callback during ctx reschedule kan.liang
@ 2023-03-28 22:27 ` kan.liang
2023-03-29 0:34 ` kernel test robot
` (2 more replies)
2023-04-06 13:10 ` [PATCH 1/2] perf: Add sched_task callback during ctx reschedule Peter Zijlstra
1 sibling, 3 replies; 7+ messages in thread
From: kan.liang @ 2023-03-28 22:27 UTC (permalink / raw)
To: peterz, mingo, linux-kernel; +Cc: ak, eranian, Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
The kernel warning for the unexpected PEBS record can also be observed
during a context switch, when the below commands are running in parallel
for a while on SPR.
while true; do perf record --no-buildid -a --intr-regs=AX -e
cpu/event=0xd0,umask=0x81/pp -c 10003 -o /dev/null ./triad; done &
while true; do perf record -o /tmp/out -W -d -e
'{ld_blocks.store_forward:period=1000000,
MEM_TRANS_RETIRED.LOAD_LATENCY:u:precise=2:ldlat=4}'
-c 1037 ./triad; done
*The triad program just generates loads and stores.
The current PEBS code assumes that all the PEBS records in the DS buffer
have the same size, i.e., cpuc->pebs_record_size. That is true in most
cases, since the DS buffer is flushed on every context switch.
However, there is a corner case that breaks the assumption.
A system-wide PEBS event with the large PEBS config may be enabled
during a context switch. Some PEBS records for the system-wide event may
be generated while the old task has been scheduled out but the new one
hasn't been scheduled in yet. When the new task is scheduled in,
cpuc->pebs_record_size may be updated for the per-task PEBS events. The
existing system-wide PEBS records then have a different size from the
later PEBS records.
Two methods were considered to fix the issue.
One is to flush the DS buffer for the system-wide PEBS right before the
new task is scheduled in. It would have to be done in the generic code
via the sched_task() callback. However, sched_task() is shared among
different architectures, and moving it may impact the others; e.g., AMD
BRS requires that sched_task() be called after the PMU has started on a
ctxswin. This method was dropped.
The other method is implemented here. It no longer assumes that all the
PEBS records have the same size. The size carried in each PEBS record
is used to parse that record. For older platforms (PEBS format < 4),
which don't support adaptive PEBS, nothing changes.
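The parsing change can be modeled in userspace: each adaptive PEBS record begins with a format_size word whose top 16 bits give that record's size (as in struct pebs_basic), so the walk can advance by the per-record size instead of a cached global. This is a hedged sketch, not the kernel code; the helper names are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PEBS_SIZE_SHIFT 48  /* adaptive PEBS keeps the record size in bits 63:48 */

/* Read the size from a record's leading format_size word, as
 * struct pebs_basic does; userspace model, not kernel code. */
static uint64_t pebs_size(const uint8_t *rec)
{
	uint64_t format_size;

	memcpy(&format_size, rec, sizeof(format_size));
	return format_size >> PEBS_SIZE_SHIFT;
}

/* Walk the buffer using the size carried by each record instead of a
 * single cached cpuc->pebs_record_size; returns the record count. */
static int count_records(const uint8_t *base, const uint8_t *top)
{
	int n = 0;
	const uint8_t *at;

	for (at = base; at < top; at += pebs_size(at))
		n++;
	return n;
}
```

A buffer holding a 16-byte record followed by a 24-byte record is parsed correctly even though the two sizes differ, which is exactly the mixed-size situation the commit message describes.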
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
arch/x86/events/intel/ds.c | 31 ++++++++++++++++++++++++++-----
arch/x86/include/asm/perf_event.h | 6 ++++++
2 files changed, 32 insertions(+), 5 deletions(-)
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index a2e566e53076..905135a8b99f 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1546,6 +1546,15 @@ static inline u64 get_pebs_status(void *n)
return ((struct pebs_basic *)n)->applicable_counters;
}
+static inline u64 get_pebs_size(void *n)
+{
+ struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+
+ if (x86_pmu.intel_cap.pebs_format < 4)
+ return cpuc->pebs_record_size;
+ return intel_adaptive_pebs_size(((struct pebs_basic *)n)->format_size);
+}
+
#define PERF_X86_EVENT_PEBS_HSW_PREC \
(PERF_X86_EVENT_PEBS_ST_HSW | \
PERF_X86_EVENT_PEBS_LD_HSW | \
@@ -1903,9 +1912,9 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
}
}
- WARN_ONCE(next_record != __pebs + (format_size >> 48),
+ WARN_ONCE(next_record != __pebs + intel_adaptive_pebs_size(format_size),
"PEBS record size %llu, expected %llu, config %llx\n",
- format_size >> 48,
+ intel_adaptive_pebs_size(format_size),
(u64)(next_record - __pebs),
basic->format_size);
}
@@ -1927,7 +1936,7 @@ get_next_pebs_record_by_bit(void *base, void *top, int bit)
if (base == NULL)
return NULL;
- for (at = base; at < top; at += cpuc->pebs_record_size) {
+ for (at = base; at < top; at += get_pebs_size(at)) {
unsigned long status = get_pebs_status(at);
if (test_bit(bit, (unsigned long *)&status)) {
@@ -2054,7 +2063,7 @@ __intel_pmu_pebs_event(struct perf_event *event,
while (count > 1) {
setup_sample(event, iregs, at, data, regs);
perf_event_output(event, data, regs);
- at += cpuc->pebs_record_size;
+ at += get_pebs_size(at);
at = get_next_pebs_record_by_bit(at, top, bit);
count--;
}
@@ -2278,7 +2287,19 @@ static void intel_pmu_drain_pebs_icl(struct pt_regs *iregs, struct perf_sample_d
return;
}
- for (at = base; at < top; at += cpuc->pebs_record_size) {
+ /*
+ * The cpuc->pebs_record_size may be different from the
+ * size of each PEBS record. For example, a system-wide
+ * PEBS event with the large PEBS config may be enabled
+ * during a context switch. Some PEBS records for the
+ * system-wide PEBS may be generated while the old task
+ * is sched out but the new one isn't sched in. When the
+ * new task is sched in, the cpuc->pebs_record_size may
+ * be updated for the per-task PEBS events. So the
+ * existing system-wide PEBS records have a different
+ * size from the later PEBS records.
+ */
+ for (at = base; at < top; at += get_pebs_size(at)) {
u64 pebs_status;
pebs_status = get_pebs_status(at) & cpuc->pebs_enabled;
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 8fc15ed5e60b..ad5655bb90f6 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -386,6 +386,12 @@ static inline bool is_topdown_idx(int idx)
/*
* Adaptive PEBS v4
*/
+#define INTEL_ADAPTIVE_PEBS_SIZE_OFF 48
+
+static inline u64 intel_adaptive_pebs_size(u64 format_size)
+{
+ return (format_size >> INTEL_ADAPTIVE_PEBS_SIZE_OFF);
+}
struct pebs_basic {
u64 format_size;
--
2.35.1
* Re: [PATCH 2/2] perf/x86/intel/ds: Use the size from each PEBS record
2023-03-28 22:27 ` [PATCH 2/2] perf/x86/intel/ds: Use the size from each PEBS record kan.liang
@ 2023-03-29 0:34 ` kernel test robot
2023-03-29 2:58 ` kernel test robot
2023-04-06 13:13 ` Peter Zijlstra
2 siblings, 0 replies; 7+ messages in thread
From: kernel test robot @ 2023-03-29 0:34 UTC (permalink / raw)
To: kan.liang, mingo, linux-kernel; +Cc: oe-kbuild-all, ak, eranian, Kan Liang
Hi,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on tip/perf/core]
[also build test WARNING on acme/perf/core linus/master v6.3-rc4 next-20230328]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/kan-liang-linux-intel-com/perf-x86-intel-ds-Use-the-size-from-each-PEBS-record/20230329-064258
patch link: https://lore.kernel.org/r/20230328222735.1367829-2-kan.liang%40linux.intel.com
patch subject: [PATCH 2/2] perf/x86/intel/ds: Use the size from each PEBS record
config: x86_64-allyesconfig (https://download.01.org/0day-ci/archive/20230329/202303290854.CQhdHZDG-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce (this is a W=1 build):
# https://github.com/intel-lab-lkp/linux/commit/a5988003cfa30fb0c88507d2d124eb551d42e1a6
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review kan-liang-linux-intel-com/perf-x86-intel-ds-Use-the-size-from-each-PEBS-record/20230329-064258
git checkout a5988003cfa30fb0c88507d2d124eb551d42e1a6
# save the config file
mkdir build_dir && cp config build_dir/.config
make W=1 O=build_dir ARCH=x86_64 olddefconfig
make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash arch/x86/
If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303290854.CQhdHZDG-lkp@intel.com/
All warnings (new ones prefixed by >>):
arch/x86/events/intel/ds.c: In function '__intel_pmu_pebs_event':
>> arch/x86/events/intel/ds.c:2042:31: warning: unused variable 'cpuc' [-Wunused-variable]
2042 | struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
| ^~~~
vim +/cpuc +2042 arch/x86/events/intel/ds.c
d31fc13fdcb20e arch/x86/events/intel/ds.c Kan Liang 2018-02-12 2029
9dfa9a5c9bae34 arch/x86/events/intel/ds.c Peter Zijlstra 2020-10-30 2030 static __always_inline void
9dfa9a5c9bae34 arch/x86/events/intel/ds.c Peter Zijlstra 2020-10-30 2031 __intel_pmu_pebs_event(struct perf_event *event,
21509084f999d7 arch/x86/kernel/cpu/perf_event_intel_ds.c Yan, Zheng 2015-05-06 2032 struct pt_regs *iregs,
9dfa9a5c9bae34 arch/x86/events/intel/ds.c Peter Zijlstra 2020-10-30 2033 struct perf_sample_data *data,
21509084f999d7 arch/x86/kernel/cpu/perf_event_intel_ds.c Yan, Zheng 2015-05-06 2034 void *base, void *top,
c22497f5838c23 arch/x86/events/intel/ds.c Kan Liang 2019-04-02 2035 int bit, int count,
c22497f5838c23 arch/x86/events/intel/ds.c Kan Liang 2019-04-02 2036 void (*setup_sample)(struct perf_event *,
c22497f5838c23 arch/x86/events/intel/ds.c Kan Liang 2019-04-02 2037 struct pt_regs *,
c22497f5838c23 arch/x86/events/intel/ds.c Kan Liang 2019-04-02 2038 void *,
c22497f5838c23 arch/x86/events/intel/ds.c Kan Liang 2019-04-02 2039 struct perf_sample_data *,
c22497f5838c23 arch/x86/events/intel/ds.c Kan Liang 2019-04-02 2040 struct pt_regs *))
43cf76312faefe arch/x86/kernel/cpu/perf_event_intel_ds.c Yan, Zheng 2015-05-06 2041 {
c22497f5838c23 arch/x86/events/intel/ds.c Kan Liang 2019-04-02 @2042 struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
d31fc13fdcb20e arch/x86/events/intel/ds.c Kan Liang 2018-02-12 2043 struct hw_perf_event *hwc = &event->hw;
c22497f5838c23 arch/x86/events/intel/ds.c Kan Liang 2019-04-02 2044 struct x86_perf_regs perf_regs;
c22497f5838c23 arch/x86/events/intel/ds.c Kan Liang 2019-04-02 2045 struct pt_regs *regs = &perf_regs.regs;
21509084f999d7 arch/x86/kernel/cpu/perf_event_intel_ds.c Yan, Zheng 2015-05-06 2046 void *at = get_next_pebs_record_by_bit(base, top, bit);
e506d1dac0edb2 arch/x86/events/intel/ds.c Peter Zijlstra 2020-10-30 2047 static struct pt_regs dummy_iregs;
43cf76312faefe arch/x86/kernel/cpu/perf_event_intel_ds.c Yan, Zheng 2015-05-06 2048
d31fc13fdcb20e arch/x86/events/intel/ds.c Kan Liang 2018-02-12 2049 if (hwc->flags & PERF_X86_EVENT_AUTO_RELOAD) {
d31fc13fdcb20e arch/x86/events/intel/ds.c Kan Liang 2018-02-12 2050 /*
d31fc13fdcb20e arch/x86/events/intel/ds.c Kan Liang 2018-02-12 2051 * Now, auto-reload is only enabled in fixed period mode.
d31fc13fdcb20e arch/x86/events/intel/ds.c Kan Liang 2018-02-12 2052 * The reload value is always hwc->sample_period.
d31fc13fdcb20e arch/x86/events/intel/ds.c Kan Liang 2018-02-12 2053 * May need to change it, if auto-reload is enabled in
d31fc13fdcb20e arch/x86/events/intel/ds.c Kan Liang 2018-02-12 2054 * freq mode later.
d31fc13fdcb20e arch/x86/events/intel/ds.c Kan Liang 2018-02-12 2055 */
d31fc13fdcb20e arch/x86/events/intel/ds.c Kan Liang 2018-02-12 2056 intel_pmu_save_and_restart_reload(event, count);
d31fc13fdcb20e arch/x86/events/intel/ds.c Kan Liang 2018-02-12 2057 } else if (!intel_pmu_save_and_restart(event))
43cf76312faefe arch/x86/kernel/cpu/perf_event_intel_ds.c Yan, Zheng 2015-05-06 2058 return;
43cf76312faefe arch/x86/kernel/cpu/perf_event_intel_ds.c Yan, Zheng 2015-05-06 2059
35d1ce6bec1336 arch/x86/events/intel/ds.c Kan Liang 2020-09-02 2060 if (!iregs)
35d1ce6bec1336 arch/x86/events/intel/ds.c Kan Liang 2020-09-02 2061 iregs = &dummy_iregs;
35d1ce6bec1336 arch/x86/events/intel/ds.c Kan Liang 2020-09-02 2062
a3d86542de8850 arch/x86/kernel/cpu/perf_event_intel_ds.c Peter Zijlstra 2015-05-12 2063 while (count > 1) {
9dfa9a5c9bae34 arch/x86/events/intel/ds.c Peter Zijlstra 2020-10-30 2064 setup_sample(event, iregs, at, data, regs);
9dfa9a5c9bae34 arch/x86/events/intel/ds.c Peter Zijlstra 2020-10-30 2065 perf_event_output(event, data, regs);
a5988003cfa30f arch/x86/events/intel/ds.c Kan Liang 2023-03-28 2066 at += get_pebs_size(at);
21509084f999d7 arch/x86/kernel/cpu/perf_event_intel_ds.c Yan, Zheng 2015-05-06 2067 at = get_next_pebs_record_by_bit(at, top, bit);
a3d86542de8850 arch/x86/kernel/cpu/perf_event_intel_ds.c Peter Zijlstra 2015-05-12 2068 count--;
21509084f999d7 arch/x86/kernel/cpu/perf_event_intel_ds.c Yan, Zheng 2015-05-06 2069 }
21509084f999d7 arch/x86/kernel/cpu/perf_event_intel_ds.c Yan, Zheng 2015-05-06 2070
9dfa9a5c9bae34 arch/x86/events/intel/ds.c Peter Zijlstra 2020-10-30 2071 setup_sample(event, iregs, at, data, regs);
35d1ce6bec1336 arch/x86/events/intel/ds.c Kan Liang 2020-09-02 2072 if (iregs == &dummy_iregs) {
35d1ce6bec1336 arch/x86/events/intel/ds.c Kan Liang 2020-09-02 2073 /*
35d1ce6bec1336 arch/x86/events/intel/ds.c Kan Liang 2020-09-02 2074 * The PEBS records may be drained in the non-overflow context,
35d1ce6bec1336 arch/x86/events/intel/ds.c Kan Liang 2020-09-02 2075 * e.g., large PEBS + context switch. Perf should treat the
35d1ce6bec1336 arch/x86/events/intel/ds.c Kan Liang 2020-09-02 2076 * last record the same as other PEBS records, and doesn't
35d1ce6bec1336 arch/x86/events/intel/ds.c Kan Liang 2020-09-02 2077 * invoke the generic overflow handler.
35d1ce6bec1336 arch/x86/events/intel/ds.c Kan Liang 2020-09-02 2078 */
9dfa9a5c9bae34 arch/x86/events/intel/ds.c Peter Zijlstra 2020-10-30 2079 perf_event_output(event, data, regs);
35d1ce6bec1336 arch/x86/events/intel/ds.c Kan Liang 2020-09-02 2080 } else {
21509084f999d7 arch/x86/kernel/cpu/perf_event_intel_ds.c Yan, Zheng 2015-05-06 2081 /*
21509084f999d7 arch/x86/kernel/cpu/perf_event_intel_ds.c Yan, Zheng 2015-05-06 2082 * All but the last records are processed.
21509084f999d7 arch/x86/kernel/cpu/perf_event_intel_ds.c Yan, Zheng 2015-05-06 2083 * The last one is left to be able to call the overflow handler.
21509084f999d7 arch/x86/kernel/cpu/perf_event_intel_ds.c Yan, Zheng 2015-05-06 2084 */
9dfa9a5c9bae34 arch/x86/events/intel/ds.c Peter Zijlstra 2020-10-30 2085 if (perf_event_overflow(event, data, regs))
a4eaf7f14675cb arch/x86/kernel/cpu/perf_event_intel_ds.c Peter Zijlstra 2010-06-16 2086 x86_pmu_stop(event, 0);
21509084f999d7 arch/x86/kernel/cpu/perf_event_intel_ds.c Yan, Zheng 2015-05-06 2087 }
2b0b5c6fe9b383 arch/x86/kernel/cpu/perf_event_intel_ds.c Peter Zijlstra 2010-04-08 2088 }
2b0b5c6fe9b383 arch/x86/kernel/cpu/perf_event_intel_ds.c Peter Zijlstra 2010-04-08 2089
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests
* Re: [PATCH 2/2] perf/x86/intel/ds: Use the size from each PEBS record
2023-03-28 22:27 ` [PATCH 2/2] perf/x86/intel/ds: Use the size from each PEBS record kan.liang
2023-03-29 0:34 ` kernel test robot
@ 2023-03-29 2:58 ` kernel test robot
2023-04-06 13:13 ` Peter Zijlstra
2 siblings, 0 replies; 7+ messages in thread
From: kernel test robot @ 2023-03-29 2:58 UTC (permalink / raw)
To: kan.liang, mingo, linux-kernel
Cc: llvm, oe-kbuild-all, ak, eranian, Kan Liang
Hi,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on tip/perf/core]
[also build test WARNING on acme/perf/core linus/master v6.3-rc4 next-20230328]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/kan-liang-linux-intel-com/perf-x86-intel-ds-Use-the-size-from-each-PEBS-record/20230329-064258
patch link: https://lore.kernel.org/r/20230328222735.1367829-2-kan.liang%40linux.intel.com
patch subject: [PATCH 2/2] perf/x86/intel/ds: Use the size from each PEBS record
config: i386-randconfig-a013-20230327 (https://download.01.org/0day-ci/archive/20230329/202303291028.3Xe9Gdlp-lkp@intel.com/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/a5988003cfa30fb0c88507d2d124eb551d42e1a6
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review kan-liang-linux-intel-com/perf-x86-intel-ds-Use-the-size-from-each-PEBS-record/20230329-064258
git checkout a5988003cfa30fb0c88507d2d124eb551d42e1a6
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash arch/x86/events/intel/
If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303291028.3Xe9Gdlp-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> arch/x86/events/intel/ds.c:2042:24: warning: unused variable 'cpuc' [-Wunused-variable]
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
^
1 warning generated.
vim +/cpuc +2042 arch/x86/events/intel/ds.c
d31fc13fdcb20e arch/x86/events/intel/ds.c Kan Liang 2018-02-12 2029
9dfa9a5c9bae34 arch/x86/events/intel/ds.c Peter Zijlstra 2020-10-30 2030 static __always_inline void
9dfa9a5c9bae34 arch/x86/events/intel/ds.c Peter Zijlstra 2020-10-30 2031 __intel_pmu_pebs_event(struct perf_event *event,
21509084f999d7 arch/x86/kernel/cpu/perf_event_intel_ds.c Yan, Zheng 2015-05-06 2032 struct pt_regs *iregs,
9dfa9a5c9bae34 arch/x86/events/intel/ds.c Peter Zijlstra 2020-10-30 2033 struct perf_sample_data *data,
21509084f999d7 arch/x86/kernel/cpu/perf_event_intel_ds.c Yan, Zheng 2015-05-06 2034 void *base, void *top,
c22497f5838c23 arch/x86/events/intel/ds.c Kan Liang 2019-04-02 2035 int bit, int count,
c22497f5838c23 arch/x86/events/intel/ds.c Kan Liang 2019-04-02 2036 void (*setup_sample)(struct perf_event *,
c22497f5838c23 arch/x86/events/intel/ds.c Kan Liang 2019-04-02 2037 struct pt_regs *,
c22497f5838c23 arch/x86/events/intel/ds.c Kan Liang 2019-04-02 2038 void *,
c22497f5838c23 arch/x86/events/intel/ds.c Kan Liang 2019-04-02 2039 struct perf_sample_data *,
c22497f5838c23 arch/x86/events/intel/ds.c Kan Liang 2019-04-02 2040 struct pt_regs *))
43cf76312faefe arch/x86/kernel/cpu/perf_event_intel_ds.c Yan, Zheng 2015-05-06 2041 {
c22497f5838c23 arch/x86/events/intel/ds.c Kan Liang 2019-04-02 @2042 struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
d31fc13fdcb20e arch/x86/events/intel/ds.c Kan Liang 2018-02-12 2043 struct hw_perf_event *hwc = &event->hw;
c22497f5838c23 arch/x86/events/intel/ds.c Kan Liang 2019-04-02 2044 struct x86_perf_regs perf_regs;
c22497f5838c23 arch/x86/events/intel/ds.c Kan Liang 2019-04-02 2045 struct pt_regs *regs = &perf_regs.regs;
21509084f999d7 arch/x86/kernel/cpu/perf_event_intel_ds.c Yan, Zheng 2015-05-06 2046 void *at = get_next_pebs_record_by_bit(base, top, bit);
e506d1dac0edb2 arch/x86/events/intel/ds.c Peter Zijlstra 2020-10-30 2047 static struct pt_regs dummy_iregs;
43cf76312faefe arch/x86/kernel/cpu/perf_event_intel_ds.c Yan, Zheng 2015-05-06 2048
d31fc13fdcb20e arch/x86/events/intel/ds.c Kan Liang 2018-02-12 2049 if (hwc->flags & PERF_X86_EVENT_AUTO_RELOAD) {
d31fc13fdcb20e arch/x86/events/intel/ds.c Kan Liang 2018-02-12 2050 /*
d31fc13fdcb20e arch/x86/events/intel/ds.c Kan Liang 2018-02-12 2051 * Now, auto-reload is only enabled in fixed period mode.
d31fc13fdcb20e arch/x86/events/intel/ds.c Kan Liang 2018-02-12 2052 * The reload value is always hwc->sample_period.
d31fc13fdcb20e arch/x86/events/intel/ds.c Kan Liang 2018-02-12 2053 * May need to change it, if auto-reload is enabled in
d31fc13fdcb20e arch/x86/events/intel/ds.c Kan Liang 2018-02-12 2054 * freq mode later.
d31fc13fdcb20e arch/x86/events/intel/ds.c Kan Liang 2018-02-12 2055 */
d31fc13fdcb20e arch/x86/events/intel/ds.c Kan Liang 2018-02-12 2056 intel_pmu_save_and_restart_reload(event, count);
d31fc13fdcb20e arch/x86/events/intel/ds.c Kan Liang 2018-02-12 2057 } else if (!intel_pmu_save_and_restart(event))
43cf76312faefe arch/x86/kernel/cpu/perf_event_intel_ds.c Yan, Zheng 2015-05-06 2058 return;
43cf76312faefe arch/x86/kernel/cpu/perf_event_intel_ds.c Yan, Zheng 2015-05-06 2059
35d1ce6bec1336 arch/x86/events/intel/ds.c Kan Liang 2020-09-02 2060 if (!iregs)
35d1ce6bec1336 arch/x86/events/intel/ds.c Kan Liang 2020-09-02 2061 iregs = &dummy_iregs;
35d1ce6bec1336 arch/x86/events/intel/ds.c Kan Liang 2020-09-02 2062
a3d86542de8850 arch/x86/kernel/cpu/perf_event_intel_ds.c Peter Zijlstra 2015-05-12 2063 while (count > 1) {
9dfa9a5c9bae34 arch/x86/events/intel/ds.c Peter Zijlstra 2020-10-30 2064 setup_sample(event, iregs, at, data, regs);
9dfa9a5c9bae34 arch/x86/events/intel/ds.c Peter Zijlstra 2020-10-30 2065 perf_event_output(event, data, regs);
a5988003cfa30f arch/x86/events/intel/ds.c Kan Liang 2023-03-28 2066 at += get_pebs_size(at);
21509084f999d7 arch/x86/kernel/cpu/perf_event_intel_ds.c Yan, Zheng 2015-05-06 2067 at = get_next_pebs_record_by_bit(at, top, bit);
a3d86542de8850 arch/x86/kernel/cpu/perf_event_intel_ds.c Peter Zijlstra 2015-05-12 2068 count--;
21509084f999d7 arch/x86/kernel/cpu/perf_event_intel_ds.c Yan, Zheng 2015-05-06 2069 }
21509084f999d7 arch/x86/kernel/cpu/perf_event_intel_ds.c Yan, Zheng 2015-05-06 2070
9dfa9a5c9bae34 arch/x86/events/intel/ds.c Peter Zijlstra 2020-10-30 2071 setup_sample(event, iregs, at, data, regs);
35d1ce6bec1336 arch/x86/events/intel/ds.c Kan Liang 2020-09-02 2072 if (iregs == &dummy_iregs) {
35d1ce6bec1336 arch/x86/events/intel/ds.c Kan Liang 2020-09-02 2073 /*
35d1ce6bec1336 arch/x86/events/intel/ds.c Kan Liang 2020-09-02 2074 * The PEBS records may be drained in the non-overflow context,
35d1ce6bec1336 arch/x86/events/intel/ds.c Kan Liang 2020-09-02 2075 * e.g., large PEBS + context switch. Perf should treat the
35d1ce6bec1336 arch/x86/events/intel/ds.c Kan Liang 2020-09-02 2076 * last record the same as other PEBS records, and doesn't
35d1ce6bec1336 arch/x86/events/intel/ds.c Kan Liang 2020-09-02 2077 * invoke the generic overflow handler.
35d1ce6bec1336 arch/x86/events/intel/ds.c Kan Liang 2020-09-02 2078 */
9dfa9a5c9bae34 arch/x86/events/intel/ds.c Peter Zijlstra 2020-10-30 2079 perf_event_output(event, data, regs);
35d1ce6bec1336 arch/x86/events/intel/ds.c Kan Liang 2020-09-02 2080 } else {
21509084f999d7 arch/x86/kernel/cpu/perf_event_intel_ds.c Yan, Zheng 2015-05-06 2081 /*
21509084f999d7 arch/x86/kernel/cpu/perf_event_intel_ds.c Yan, Zheng 2015-05-06 2082 * All but the last records are processed.
21509084f999d7 arch/x86/kernel/cpu/perf_event_intel_ds.c Yan, Zheng 2015-05-06 2083 * The last one is left to be able to call the overflow handler.
21509084f999d7 arch/x86/kernel/cpu/perf_event_intel_ds.c Yan, Zheng 2015-05-06 2084 */
9dfa9a5c9bae34 arch/x86/events/intel/ds.c Peter Zijlstra 2020-10-30 2085 if (perf_event_overflow(event, data, regs))
a4eaf7f14675cb arch/x86/kernel/cpu/perf_event_intel_ds.c Peter Zijlstra 2010-06-16 2086 x86_pmu_stop(event, 0);
21509084f999d7 arch/x86/kernel/cpu/perf_event_intel_ds.c Yan, Zheng 2015-05-06 2087 }
2b0b5c6fe9b383 arch/x86/kernel/cpu/perf_event_intel_ds.c Peter Zijlstra 2010-04-08 2088 }
2b0b5c6fe9b383 arch/x86/kernel/cpu/perf_event_intel_ds.c Peter Zijlstra 2010-04-08 2089
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests
* Re: [PATCH 1/2] perf: Add sched_task callback during ctx reschedule
2023-03-28 22:27 [PATCH 1/2] perf: Add sched_task callback during ctx reschedule kan.liang
2023-03-28 22:27 ` [PATCH 2/2] perf/x86/intel/ds: Use the size from each PEBS record kan.liang
@ 2023-04-06 13:10 ` Peter Zijlstra
1 sibling, 0 replies; 7+ messages in thread
From: Peter Zijlstra @ 2023-04-06 13:10 UTC (permalink / raw)
To: kan.liang; +Cc: mingo, linux-kernel, ak, eranian
On Tue, Mar 28, 2023 at 03:27:34PM -0700, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
>
> Several similar kernel warnings can be triggered,
>
> [56605.607840] CPU0 PEBS record size 0, expected 32, config 0
> cpuc->record_size=208
>
> when the below commands are running in parallel for a while on SPR.
>
> while true; do perf record --no-buildid -a --intr-regs=AX -e
> cpu/event=0xd0,umask=0x81/pp -c 10003 -o /dev/null ./triad; done &
>
> while true; do perf record -o /tmp/out -W -d -e
> '{ld_blocks.store_forward:period=1000000,
> MEM_TRANS_RETIRED.LOAD_LATENCY:u:precise=2:ldlat=4}'
> -c 1037 ./triad; done
> *The triad program is just the generation of loads/stores.
>
> The warnings are triggered when an unexpected PEBS record (with a
> different config and size) is found.
>
> In a context switch, different events may be applied to the old task and
> the new task. The sched_task callback is used to switch the PMU-specific
> data. For the PEBS, the callback flushes the DS area and parses the PEBS
> records from the old task when schedule out. The new task never sees the
> stale PEBS records.
>
> However, the exec() doesn't have similar support. The new command may
> see the PEBS records from the old command. In the perf_event_exec(), the
> ctx_resched() is invoked to reschedule the context. It may updates the
> active events, which may change the global PEBS configuration. The PEBS
> record from the old command may have a different config and size from
> the new command. The warning is triggered.
>
> The sched_task callback should be invoked in all the places where the
> context can be changed. Add the sched_task callback in the ctx_resched()
> as well.
>
> Reported-by: Stephane Eranian <eranian@google.com>
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> ---
> kernel/events/core.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index f79fd8b87f75..0c49183656fd 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -2642,6 +2642,8 @@ static void perf_event_sched_in(struct perf_cpu_context *cpuctx,
> ctx_sched_in(ctx, EVENT_FLEXIBLE);
> }
>
> +static void perf_ctx_sched_task_cb(struct perf_event_context *ctx, bool sched_in);
> +
> /*
> * We want to maintain the following priority of scheduling:
> * - CPU pinned (EVENT_CPU | EVENT_PINNED)
> @@ -2680,6 +2682,7 @@ static void ctx_resched(struct perf_cpu_context *cpuctx,
> perf_ctx_disable(&cpuctx->ctx);
> if (task_ctx) {
> perf_ctx_disable(task_ctx);
> + perf_ctx_sched_task_cb(task_ctx, false);
> task_ctx_sched_out(task_ctx, event_type);
> }
>
> @@ -2698,8 +2701,10 @@ static void ctx_resched(struct perf_cpu_context *cpuctx,
> perf_event_sched_in(cpuctx, task_ctx);
>
> perf_ctx_enable(&cpuctx->ctx);
> - if (task_ctx)
> + if (task_ctx) {
> + perf_ctx_sched_task_cb(task_ctx, true);
> perf_ctx_enable(task_ctx);
> + }
> }
>
Urgh... yuck.. Also I think you missed perf_rotate_context() which has
an open coded resched.
But I think this is a very fragile approach. Why can't the x86 pmu/pebs
driver flush when its programming changes -- it, as no other, knows
when this happens.
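The alternative being suggested here, flushing from the driver itself at the moment the PEBS configuration is reprogrammed, could look roughly like the sketch below. All names are illustrative assumptions, not the actual kernel API.

```c
#include <assert.h>

/* Hypothetical driver-side state; names are illustrative only. */
static int record_size = 32;   /* record size currently programmed */
static int nr_buffered;        /* records sitting in the DS buffer */
static int drained;            /* records drained so far */

static void drain_buf(void)
{
	drained += nr_buffered;
	nr_buffered = 0;
}

/*
 * Flush-on-reprogram: the driver is the one place that knows the
 * record size is about to change, so it drains the old-size records
 * first and a mixed-size buffer can never exist.
 */
static void reprogram_pebs(int new_size)
{
	if (new_size != record_size)
		drain_buf();
	record_size = new_size;
}
```

Since the drain happens only when the size actually changes, the common path (reprogramming to the same size) stays cheap.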
* Re: [PATCH 2/2] perf/x86/intel/ds: Use the size from each PEBS record
2023-03-28 22:27 ` [PATCH 2/2] perf/x86/intel/ds: Use the size from each PEBS record kan.liang
2023-03-29 0:34 ` kernel test robot
2023-03-29 2:58 ` kernel test robot
@ 2023-04-06 13:13 ` Peter Zijlstra
2023-04-06 15:36 ` Liang, Kan
2 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2023-04-06 13:13 UTC (permalink / raw)
To: kan.liang; +Cc: mingo, linux-kernel, ak, eranian
On Tue, Mar 28, 2023 at 03:27:35PM -0700, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
>
> The kernel warning for the unexpected PEBS record can also be observed
> during a context switch, when the below commands are running in parallel
> for a while on SPR.
>
> while true; do perf record --no-buildid -a --intr-regs=AX -e
> cpu/event=0xd0,umask=0x81/pp -c 10003 -o /dev/null ./triad; done &
>
> while true; do perf record -o /tmp/out -W -d -e
> '{ld_blocks.store_forward:period=1000000,
> MEM_TRANS_RETIRED.LOAD_LATENCY:u:precise=2:ldlat=4}'
> -c 1037 ./triad; done
> *The triad program is just the generation of loads/stores.
>
> The current PEBS code assumes that all the PEBS records in the DS buffer
> have the same size, aka cpuc->pebs_record_size. It's true for the most
> cases, since the DS buffer is always flushed in every context switch.
>
> However, there is a corner case that breaks the assumption.
> A system-wide PEBS event with the large PEBS config may be enabled
> during a context switch. Some PEBS records for the system-wide PEBS may
> be generated while the old task is sched out but the new one hasn't been
> sched in yet. When the new task is sched in, the cpuc->pebs_record_size
> may be updated for the per-task PEBS events. So the existing system-wide
> PEBS records have a different size from the later PEBS records.
>
> Two methods were considered to fix the issue.
> One is to flush the DS buffer for the system-wide PEBS right before the
> new task sched in. It has to be done in the generic code via the
> sched_task() callback. However, sched_task() is shared among
> different architectures, and moving the flush there may impact the
> others; e.g., AMD BRS requires that sched_task() be called after the
> PMU has been started on a ctxswin. That method was dropped.
>
> The other method is implemented here. It no longer assumes that all
> PEBS records have the same size. Instead, the size carried in each
> PEBS record is used to parse that record. For previous platforms
> (PEBS format < 4), which don't support adaptive PEBS, nothing changes.
Same as with the other; why can't we flush the buffer when we reprogram
the hardware?
* Re: [PATCH 2/2] perf/x86/intel/ds: Use the size from each PEBS record
2023-04-06 13:13 ` Peter Zijlstra
@ 2023-04-06 15:36 ` Liang, Kan
0 siblings, 0 replies; 7+ messages in thread
From: Liang, Kan @ 2023-04-06 15:36 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: mingo, linux-kernel, ak, eranian
On 2023-04-06 9:13 a.m., Peter Zijlstra wrote:
> On Tue, Mar 28, 2023 at 03:27:35PM -0700, kan.liang@linux.intel.com wrote:
>> From: Kan Liang <kan.liang@linux.intel.com>
>>
>> The kernel warning for the unexpected PEBS record can also be observed
>> during a context switch, when the below commands are running in parallel
>> for a while on SPR.
>>
>> while true; do perf record --no-buildid -a --intr-regs=AX -e
>> cpu/event=0xd0,umask=0x81/pp -c 10003 -o /dev/null ./triad; done &
>>
>> while true; do perf record -o /tmp/out -W -d -e
>> '{ld_blocks.store_forward:period=1000000,
>> MEM_TRANS_RETIRED.LOAD_LATENCY:u:precise=2:ldlat=4}'
>> -c 1037 ./triad; done
>> *The triad program simply generates loads and stores.
>>
>> The current PEBS code assumes that all the PEBS records in the DS buffer
>> have the same size, aka cpuc->pebs_record_size. That is true in most
>> cases, since the DS buffer is flushed on every context switch.
>>
>> However, there is a corner case that breaks the assumption.
>> A system-wide PEBS event with the large PEBS config may be enabled
>> during a context switch. Some PEBS records for the system-wide PEBS may
>> be generated while the old task is sched out but the new one hasn't been
>> sched in yet. When the new task is sched in, the cpuc->pebs_record_size
>> may be updated for the per-task PEBS events. So the existing system-wide
>> PEBS records have a different size from the later PEBS records.
>>
>> Two methods were considered to fix the issue.
>> One is to flush the DS buffer for the system-wide PEBS right before the
>> new task sched in. It has to be done in the generic code via the
>> sched_task() callback. However, sched_task() is shared among
>> different architectures, and moving the flush there may impact the
>> others; e.g., AMD BRS requires that sched_task() be called after the
>> PMU has been started on a ctxswin. That method was dropped.
>>
>> The other method is implemented here. It no longer assumes that all
>> PEBS records have the same size. Instead, the size carried in each
>> PEBS record is used to parse that record. For previous platforms
>> (PEBS format < 4), which don't support adaptive PEBS, nothing changes.
>
> Same as with the other; why can't we flush the buffer when we reprogram
> the hardware?
In the current code, pebs_record_size has already been updated elsewhere
by the time we reprogram the hardware.
But I think it's possible to move the update of pebs_record_size to
right before the hardware is reprogrammed, so we can flush the buffer
before anything is updated. Let me try that approach.
Thanks,
Kan
Thread overview: 7+ messages
2023-03-28 22:27 [PATCH 1/2] perf: Add sched_task callback during ctx reschedule kan.liang
2023-03-28 22:27 ` [PATCH 2/2] perf/x86/intel/ds: Use the size from each PEBS record kan.liang
2023-03-29 0:34 ` kernel test robot
2023-03-29 2:58 ` kernel test robot
2023-04-06 13:13 ` Peter Zijlstra
2023-04-06 15:36 ` Liang, Kan
2023-04-06 13:10 ` [PATCH 1/2] perf: Add sched_task callback during ctx reschedule Peter Zijlstra