linux-perf-users.vger.kernel.org archive mirror
* [PATCH v4 00/22] Python generated Intel metrics
@ 2024-09-26 17:50 Ian Rogers
  2024-09-26 17:50 ` [PATCH v4 01/22] perf jevents: Add RAPL metrics for all Intel models Ian Rogers
                   ` (22 more replies)
  0 siblings, 23 replies; 42+ messages in thread
From: Ian Rogers @ 2024-09-26 17:50 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, linux-perf-users,
	linux-kernel, Perry Taylor, Samantha Alt, Caleb Biggers,
	Weilin Wang, Edward Baker

Generate twenty sets of additional metrics for Intel. The RAPL and idle
metrics aren't specific to Intel but are placed here for convenience. The smi
and tsx metrics are added so that the duplicated versions can be dropped from
the per-model json files. There are four uncore sets of metrics and eleven
core metrics. Add a CheckPmu function to metric.py to simplify detecting the
presence of hybrid PMUs among the loaded events. Metrics built from
experimental events are flagged as experimental in their description.
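
All of the generated groups follow roughly the same pattern: optional events
are probed with try/except around Event(), rates are formed with d_ratio
against duration_time, and the results are wrapped in nested MetricGroups. A
minimal sketch of that pattern (an editor illustration only, using the
metric.py helpers from this tree; the optional event name is a placeholder):

  from metric import Event, Metric, MetricGroup, d_ratio

  interval_sec = Event("duration_time")

  def ExampleGroup() -> MetricGroup:
    ins = Event("instructions")
    try:
      opt = Event("SOME_MODEL_SPECIFIC.EVENT")  # placeholder, model dependent
    except:
      opt = None
    metrics = [Metric("example_insn_rate", "Instructions per second",
                      d_ratio(ins, interval_sec), "insn/s")]
    if opt:
      metrics.append(Metric("example_opt_rate", "Optional event rate",
                            d_ratio(opt, interval_sec), "events/s"))
    return MetricGroup("example", metrics)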

The patches should be applied on top of:
https://lore.kernel.org/lkml/20240926174101.406874-1-irogers@google.com/

v4. Experimental metric descriptions. Add mesh bandwidth metric. Rebase.
v3. Swap tsx and CheckPMU patches that were in the wrong order. Some
    minor code cleanup changes. Drop reference to merged fix for
    umasks/occ_sel in PCU events and for cstate metrics.
v2. Drop the cycles breakdown in favor of having it as a common
    metric, spelling and other improvements suggested by Kan Liang
    <kan.liang@linux.intel.com>.

Ian Rogers (22):
  perf jevents: Add RAPL metrics for all Intel models
  perf jevents: Add idle metric for Intel models
  perf jevents: Add smi metric group for Intel models
  perf jevents: Add CheckPmu to see if a PMU is in loaded json events
  perf jevents: Mark metrics with experimental events as experimental
  perf jevents: Add tsx metric group for Intel models
  perf jevents: Add br metric group for branch statistics on Intel
  perf jevents: Add software prefetch (swpf) metric group for Intel
  perf jevents: Add ports metric group giving utilization on Intel
  perf jevents: Add L2 metrics for Intel
  perf jevents: Add load store breakdown metrics ldst for Intel
  perf jevents: Add ILP metrics for Intel
  perf jevents: Add context switch metrics for Intel
  perf jevents: Add FPU metrics for Intel
  perf jevents: Add Miss Level Parallelism (MLP) metric for Intel
  perf jevents: Add mem_bw metric for Intel
  perf jevents: Add local/remote "mem" breakdown metrics for Intel
  perf jevents: Add dir breakdown metrics for Intel
  perf jevents: Add C-State metrics from the PCU PMU for Intel
  perf jevents: Add local/remote miss latency metrics for Intel
  perf jevents: Add upi_bw metric for Intel
  perf jevents: Add mesh bandwidth saturation metric for Intel

 tools/perf/pmu-events/intel_metrics.py | 1046 +++++++++++++++++++++++-
 tools/perf/pmu-events/metric.py        |   52 ++
 2 files changed, 1095 insertions(+), 3 deletions(-)

-- 
2.46.1.824.gd892dcdcdd-goog



* [PATCH v4 01/22] perf jevents: Add RAPL metrics for all Intel models
  2024-09-26 17:50 [PATCH v4 00/22] Python generated Intel metrics Ian Rogers
@ 2024-09-26 17:50 ` Ian Rogers
  2024-09-26 17:50 ` [PATCH v4 02/22] perf jevents: Add idle metric for " Ian Rogers
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 42+ messages in thread
From: Ian Rogers @ 2024-09-26 17:50 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, linux-perf-users,
	linux-kernel, Perry Taylor, Samantha Alt, Caleb Biggers,
	Weilin Wang, Edward Baker

Add a 'cpu_power' metric group that computes the power consumption
from RAPL events if they are present.
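
As a rough illustration (not part of the patch): the RAPL energy events are
scaled by 2^-32, i.e. each counter increment is 2^-32 Joules, so dividing the
scaled count by the elapsed time gives Watts. A small sketch with made-up
numbers:

  # Editor sketch: convert a raw energy-pkg count over an interval to Watts.
  JOULES_PER_COUNT = 2.0 ** -32  # == 2.3283064365386962890625e-10

  def rapl_watts(energy_count: int, elapsed_sec: float) -> float:
    return energy_count * JOULES_PER_COUNT / elapsed_sec

  print(rapl_watts(150_000_000_000, 1.0))  # ~34.9 Watts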

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/pmu-events/intel_metrics.py | 45 ++++++++++++++++++++++++--
 1 file changed, 42 insertions(+), 3 deletions(-)

diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
index 04a19d05c6c1..58e23eb48312 100755
--- a/tools/perf/pmu-events/intel_metrics.py
+++ b/tools/perf/pmu-events/intel_metrics.py
@@ -1,13 +1,49 @@
 #!/usr/bin/env python3
 # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
-from metric import (JsonEncodeMetric, JsonEncodeMetricGroupDescriptions, LoadEvents,
-                    MetricGroup)
+from metric import (d_ratio, has_event, Event, JsonEncodeMetric, JsonEncodeMetricGroupDescriptions,
+                    LoadEvents, Metric, MetricGroup, Select)
 import argparse
 import json
+import math
 import os
 
 # Global command line arguments.
 _args = None
+interval_sec = Event("duration_time")
+
+def Rapl() -> MetricGroup:
+  """Processor power consumption estimate.
+
+  Use events from the running average power limit (RAPL) driver.
+  """
+  # Watts = joules/second
+  pkg = Event("power/energy\-pkg/")
+  cond_pkg = Select(pkg, has_event(pkg), math.nan)
+  cores = Event("power/energy\-cores/")
+  cond_cores = Select(cores, has_event(cores), math.nan)
+  ram = Event("power/energy\-ram/")
+  cond_ram = Select(ram, has_event(ram), math.nan)
+  gpu = Event("power/energy\-gpu/")
+  cond_gpu = Select(gpu, has_event(gpu), math.nan)
+  psys = Event("power/energy\-psys/")
+  cond_psys = Select(psys, has_event(psys), math.nan)
+  scale = 2.3283064365386962890625e-10
+  metrics = [
+      Metric("cpu_power_pkg", "",
+             d_ratio(cond_pkg * scale, interval_sec), "Watts"),
+      Metric("cpu_power_cores", "",
+             d_ratio(cond_cores * scale, interval_sec), "Watts"),
+      Metric("cpu_power_ram", "",
+             d_ratio(cond_ram * scale, interval_sec), "Watts"),
+      Metric("cpu_power_gpu", "",
+             d_ratio(cond_gpu * scale, interval_sec), "Watts"),
+      Metric("cpu_power_psys", "",
+             d_ratio(cond_psys * scale, interval_sec), "Watts"),
+  ]
+
+  return MetricGroup("cpu_power", metrics,
+                     description="Running Average Power Limit (RAPL) power consumption estimates")
+
 
 def main() -> None:
   global _args
@@ -31,7 +67,10 @@ def main() -> None:
   directory = f"{_args.events_path}/x86/{_args.model}/"
   LoadEvents(directory)
 
-  all_metrics = MetricGroup("",[])
+  all_metrics = MetricGroup("", [
+      Rapl(),
+  ])
+
 
   if _args.metricgroups:
     print(JsonEncodeMetricGroupDescriptions(all_metrics))
-- 
2.46.1.824.gd892dcdcdd-goog



* [PATCH v4 02/22] perf jevents: Add idle metric for Intel models
  2024-09-26 17:50 [PATCH v4 00/22] Python generated Intel metrics Ian Rogers
  2024-09-26 17:50 ` [PATCH v4 01/22] perf jevents: Add RAPL metrics for all Intel models Ian Rogers
@ 2024-09-26 17:50 ` Ian Rogers
  2024-11-06 17:01   ` Liang, Kan
  2024-09-26 17:50 ` [PATCH v4 03/22] perf jevents: Add smi metric group " Ian Rogers
                   ` (20 subsequent siblings)
  22 siblings, 1 reply; 42+ messages in thread
From: Ian Rogers @ 2024-09-26 17:50 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, linux-perf-users,
	linux-kernel, Perry Taylor, Samantha Alt, Caleb Biggers,
	Weilin Wang, Edward Baker

Using the msr PMU, compute the percentage of wallclock cycles where the
CPUs are in a low power state.
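
For context (an editor sketch, not part of the patch): MPERF only advances
while a CPU is in C0 whereas TSC always advances, so the share of TSC cycles
not matched by MPERF approximates the time spent in C1 or deeper states:

  # Editor sketch of the idle calculation with made-up counter values.
  def idle_percent(tsc: int, mperf: int) -> float:
    low_power = max(tsc - mperf, 0)
    return 100.0 * low_power / tsc

  print(idle_percent(tsc=3_000_000_000, mperf=1_200_000_000))  # 60.0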

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/pmu-events/intel_metrics.py | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
index 58e23eb48312..f875eb844c78 100755
--- a/tools/perf/pmu-events/intel_metrics.py
+++ b/tools/perf/pmu-events/intel_metrics.py
@@ -1,7 +1,8 @@
 #!/usr/bin/env python3
 # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
-from metric import (d_ratio, has_event, Event, JsonEncodeMetric, JsonEncodeMetricGroupDescriptions,
-                    LoadEvents, Metric, MetricGroup, Select)
+from metric import (d_ratio, has_event, max, Event, JsonEncodeMetric,
+                    JsonEncodeMetricGroupDescriptions, LoadEvents, Metric,
+                    MetricGroup, Select)
 import argparse
 import json
 import math
@@ -11,6 +12,16 @@ import os
 _args = None
 interval_sec = Event("duration_time")
 
+def Idle() -> Metric:
+  cyc = Event("msr/mperf/")
+  tsc = Event("msr/tsc/")
+  low = max(tsc - cyc, 0)
+  return Metric(
+      "idle",
+      "Percentage of total wallclock cycles where CPUs are in low power state (C1 or deeper sleep state)",
+      d_ratio(low, tsc), "100%")
+
+
 def Rapl() -> MetricGroup:
   """Processor power consumption estimate.
 
@@ -68,6 +79,7 @@ def main() -> None:
   LoadEvents(directory)
 
   all_metrics = MetricGroup("", [
+      Idle(),
       Rapl(),
   ])
 
-- 
2.46.1.824.gd892dcdcdd-goog



* [PATCH v4 03/22] perf jevents: Add smi metric group for Intel models
  2024-09-26 17:50 [PATCH v4 00/22] Python generated Intel metrics Ian Rogers
  2024-09-26 17:50 ` [PATCH v4 01/22] perf jevents: Add RAPL metrics for all Intel models Ian Rogers
  2024-09-26 17:50 ` [PATCH v4 02/22] perf jevents: Add idle metric for " Ian Rogers
@ 2024-09-26 17:50 ` Ian Rogers
  2024-11-06 17:32   ` Liang, Kan
  2024-09-26 17:50 ` [PATCH v4 04/22] perf jevents: Add CheckPmu to see if a PMU is in loaded json events Ian Rogers
                   ` (19 subsequent siblings)
  22 siblings, 1 reply; 42+ messages in thread
From: Ian Rogers @ 2024-09-26 17:50 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, linux-perf-users,
	linux-kernel, Perry Taylor, Samantha Alt, Caleb Biggers,
	Weilin Wang, Edward Baker

Allow the duplicated smi metrics to be dropped from the per-model json files.
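
For context (an editor sketch, not part of the patch): with
/sys/devices/cpu/freeze_on_smi set to 1 the core cycles counter is frozen
while an SMI is handled, but the aperf MSR keeps counting, so the gap between
the two approximates the cycles lost to SMIs:

  # Editor sketch of the smi_cycles calculation with made-up values.
  def smi_cycles_percent(aperf: int, cycles: int, smi_count: int) -> float:
    if smi_count == 0:
      return 0.0
    return 100.0 * (aperf - cycles) / aperf

  print(smi_cycles_percent(aperf=1_000_000, cycles=990_000, smi_count=2))  # 1.0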

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/pmu-events/intel_metrics.py | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
index f875eb844c78..f34b4230a4ee 100755
--- a/tools/perf/pmu-events/intel_metrics.py
+++ b/tools/perf/pmu-events/intel_metrics.py
@@ -2,7 +2,7 @@
 # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
 from metric import (d_ratio, has_event, max, Event, JsonEncodeMetric,
                     JsonEncodeMetricGroupDescriptions, LoadEvents, Metric,
-                    MetricGroup, Select)
+                    MetricGroup, MetricRef, Select)
 import argparse
 import json
 import math
@@ -56,6 +56,24 @@ def Rapl() -> MetricGroup:
                      description="Running Average Power Limit (RAPL) power consumption estimates")
 
 
+def Smi() -> MetricGroup:
+    aperf = Event('msr/aperf/')
+    cycles = Event('cycles')
+    smi_num = Event('msr/smi/')
+    smi_cycles = Select(Select((aperf - cycles) / aperf, smi_num > 0, 0),
+                        has_event(aperf),
+                        0)
+    return MetricGroup('smi', [
+        Metric('smi_num', 'Number of SMI interrupts.',
+               Select(smi_num, has_event(smi_num), 0), 'SMI#'),
+        # Note, the threshold below uses a MetricRef to refer to the
+        # smi_cycles metric itself.
+        Metric('smi_cycles',
+               'Percentage of cycles spent in System Management Interrupts. '
+               'Requires /sys/devices/cpu/freeze_on_smi to be 1.',
+               smi_cycles, '100%', threshold=(MetricRef('smi_cycles') > 0.10))
+    ], description = 'System Management Interrupt metrics')
+
+
 def main() -> None:
   global _args
 
@@ -81,6 +99,7 @@ def main() -> None:
   all_metrics = MetricGroup("", [
       Idle(),
       Rapl(),
+      Smi(),
   ])
 
 
-- 
2.46.1.824.gd892dcdcdd-goog



* [PATCH v4 04/22] perf jevents: Add CheckPmu to see if a PMU is in loaded json events
  2024-09-26 17:50 [PATCH v4 00/22] Python generated Intel metrics Ian Rogers
                   ` (2 preceding siblings ...)
  2024-09-26 17:50 ` [PATCH v4 03/22] perf jevents: Add smi metric group " Ian Rogers
@ 2024-09-26 17:50 ` Ian Rogers
  2024-09-26 17:50 ` [PATCH v4 05/22] perf jevents: Mark metrics with experimental events as experimental Ian Rogers
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 42+ messages in thread
From: Ian Rogers @ 2024-09-26 17:50 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, linux-perf-users,
	linux-kernel, Perry Taylor, Samantha Alt, Caleb Biggers,
	Weilin Wang, Edward Baker

CheckPmu can be used to determine if hybrid events are present, allowing
hybrid-conditional metrics/events/PMUs to be based on the json files rather
than hard-coded tables.
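
As used later in the series, the check lets metric code pick the right core
PMU name on hybrid systems. A minimal sketch (the events path here is a
placeholder):

  from metric import CheckPmu, LoadEvents

  # CheckPmu consults the "Unit" fields gathered while loading the json.
  LoadEvents("tools/perf/pmu-events/arch/x86/alderlake")
  pmu = "cpu_core" if CheckPmu("cpu_core") else "cpu"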

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/pmu-events/metric.py | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/tools/perf/pmu-events/metric.py b/tools/perf/pmu-events/metric.py
index 03312cd6d491..e1847cccfdb0 100644
--- a/tools/perf/pmu-events/metric.py
+++ b/tools/perf/pmu-events/metric.py
@@ -8,10 +8,12 @@ import re
 from enum import Enum
 from typing import Dict, List, Optional, Set, Tuple, Union
 
+all_pmus = set()
 all_events = set()
 
 def LoadEvents(directory: str) -> None:
   """Populate a global set of all known events for the purpose of validating Event names"""
+  global all_pmus
   global all_events
   all_events = {
       "context\-switches",
@@ -24,12 +26,18 @@ def LoadEvents(directory: str) -> None:
     filename = os.fsdecode(file)
     if filename.endswith(".json"):
       for x in json.load(open(f"{directory}/{filename}")):
+        if "Unit" in x:
+          all_pmus.add(x["Unit"])
         if "EventName" in x:
           all_events.add(x["EventName"])
         elif "ArchStdEvent" in x:
           all_events.add(x["ArchStdEvent"])
 
 
+def CheckPmu(name: str) -> bool:
+  return name in all_pmus
+
+
 def CheckEvent(name: str) -> bool:
   """Check the event name exists in the set of all loaded events"""
   global all_events
-- 
2.46.1.824.gd892dcdcdd-goog



* [PATCH v4 05/22] perf jevents: Mark metrics with experimental events as experimental
  2024-09-26 17:50 [PATCH v4 00/22] Python generated Intel metrics Ian Rogers
                   ` (3 preceding siblings ...)
  2024-09-26 17:50 ` [PATCH v4 04/22] perf jevents: Add CheckPmu to see if a PMU is in loaded json events Ian Rogers
@ 2024-09-26 17:50 ` Ian Rogers
  2024-09-26 17:50 ` [PATCH v4 06/22] perf jevents: Add tsx metric group for Intel models Ian Rogers
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 42+ messages in thread
From: Ian Rogers @ 2024-09-26 17:50 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, linux-perf-users,
	linux-kernel, Perry Taylor, Samantha Alt, Caleb Biggers,
	Weilin Wang, Edward Baker

When metrics are made with experimental events it is desirable that the
metric description also carries this information, in case of metric
inaccuracies.

Suggested-by: Perry Taylor <perry.taylor@intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/pmu-events/metric.py | 44 +++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/tools/perf/pmu-events/metric.py b/tools/perf/pmu-events/metric.py
index e1847cccfdb0..5a5b149dd286 100644
--- a/tools/perf/pmu-events/metric.py
+++ b/tools/perf/pmu-events/metric.py
@@ -10,11 +10,13 @@ from typing import Dict, List, Optional, Set, Tuple, Union
 
 all_pmus = set()
 all_events = set()
+experimental_events = set()
 
 def LoadEvents(directory: str) -> None:
   """Populate a global set of all known events for the purpose of validating Event names"""
   global all_pmus
   global all_events
+  global experimental_events
   all_events = {
       "context\-switches",
       "cycles",
@@ -30,6 +32,8 @@ def LoadEvents(directory: str) -> None:
           all_pmus.add(x["Unit"])
         if "EventName" in x:
           all_events.add(x["EventName"])
+          if "Experimental" in x and x["Experimental"] == "1":
+            experimental_events.add(x["EventName"])
         elif "ArchStdEvent" in x:
           all_events.add(x["ArchStdEvent"])
 
@@ -55,6 +59,18 @@ def CheckEvent(name: str) -> bool:
   return name in all_events
 
 
+def IsExperimentalEvent(name: str) -> bool:
+  global experimental_events
+  if ':' in name:
+    # Remove trailing modifier.
+    name = name[:name.find(':')]
+  elif '/' in name:
+    # Name could begin with a PMU or an event, for now assume it is not experimental.
+    return False
+
+  return name in experimental_events
+
+
 class MetricConstraint(Enum):
   GROUPED_EVENTS = 0
   NO_GROUP_EVENTS = 1
@@ -76,6 +92,10 @@ class Expression:
     """Returns a simplified version of self."""
     raise NotImplementedError()
 
+  def HasExperimentalEvents(self) -> bool:
+    """Are experimental events used in the expression?"""
+    raise NotImplementedError()
+
   def Equals(self, other) -> bool:
     """Returns true when two expressions are the same."""
     raise NotImplementedError()
@@ -243,6 +263,9 @@ class Operator(Expression):
 
     return Operator(self.operator, lhs, rhs)
 
+  def HasExperimentalEvents(self) -> bool:
+    return self.lhs.HasExperimentalEvents() or self.rhs.HasExperimentalEvents()
+
   def Equals(self, other: Expression) -> bool:
     if isinstance(other, Operator):
       return self.operator == other.operator and self.lhs.Equals(
@@ -291,6 +314,10 @@ class Select(Expression):
 
     return Select(true_val, cond, false_val)
 
+  def HasExperimentalEvents(self) -> bool:
+    return (self.cond.HasExperimentalEvents() or self.true_val.HasExperimentalEvents() or
+            self.false_val.HasExperimentalEvents())
+
   def Equals(self, other: Expression) -> bool:
     if isinstance(other, Select):
       return self.cond.Equals(other.cond) and self.false_val.Equals(
@@ -339,6 +366,9 @@ class Function(Expression):
 
     return Function(self.fn, lhs, rhs)
 
+  def HasExperimentalEvents(self) -> bool:
+    return self.lhs.HasExperimentalEvents() or (self.rhs and self.rhs.HasExperimentalEvents())
+
   def Equals(self, other: Expression) -> bool:
     if isinstance(other, Function):
       result = self.fn == other.fn and self.lhs.Equals(other.lhs)
@@ -378,6 +408,9 @@ class Event(Expression):
     global all_events
     raise Exception(f"No event {error} in:\n{all_events}")
 
+  def HasExperimentalEvents(self) -> bool:
+    return IsExperimentalEvent(self.name)
+
   def ToPerfJson(self):
     result = re.sub('/', '@', self.name)
     return result
@@ -410,6 +443,9 @@ class MetricRef(Expression):
   def Simplify(self) -> Expression:
     return self
 
+  def HasExperimentalEvents(self) -> bool:
+    return False
+
   def Equals(self, other: Expression) -> bool:
     return isinstance(other, MetricRef) and self.name == other.name
 
@@ -437,6 +473,9 @@ class Constant(Expression):
   def Simplify(self) -> Expression:
     return self
 
+  def HasExperimentalEvents(self) -> bool:
+    return False
+
   def Equals(self, other: Expression) -> bool:
     return isinstance(other, Constant) and self.value == other.value
 
@@ -459,6 +498,9 @@ class Literal(Expression):
   def Simplify(self) -> Expression:
     return self
 
+  def HasExperimentalEvents(self) -> bool:
+    return False
+
   def Equals(self, other: Expression) -> bool:
     return isinstance(other, Literal) and self.value == other.value
 
@@ -521,6 +563,8 @@ class Metric:
     self.name = name
     self.description = description
     self.expr = expr.Simplify()
+    if self.expr.HasExperimentalEvents():
+      self.description += " (metric should be considered experimental as it contains experimental events)."
     # Workraound valid_only_metric hiding certain metrics based on unit.
     scale_unit = scale_unit.replace('/sec', ' per sec')
     if scale_unit[0].isdigit():
-- 
2.46.1.824.gd892dcdcdd-goog



* [PATCH v4 06/22] perf jevents: Add tsx metric group for Intel models
  2024-09-26 17:50 [PATCH v4 00/22] Python generated Intel metrics Ian Rogers
                   ` (4 preceding siblings ...)
  2024-09-26 17:50 ` [PATCH v4 05/22] perf jevents: Mark metrics with experimental events as experimental Ian Rogers
@ 2024-09-26 17:50 ` Ian Rogers
  2024-11-06 17:52   ` Liang, Kan
  2024-09-26 17:50 ` [PATCH v4 07/22] perf jevents: Add br metric group for branch statistics on Intel Ian Rogers
                   ` (16 subsequent siblings)
  22 siblings, 1 reply; 42+ messages in thread
From: Ian Rogers @ 2024-09-26 17:50 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, linux-perf-users,
	linux-kernel, Perry Taylor, Samantha Alt, Caleb Biggers,
	Weilin Wang, Edward Baker

Allow the duplicated metrics to be dropped from the per-model json files.
Detect whether TSX is supported by a model using the json events, but use
the sysfs events at runtime as hypervisors, etc. may disable TSX.

Use CheckPmu to determine which PMUs have been associated with the loaded
events and so whether the hybrid cpu_core PMU name is needed.
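
The probe pattern this uses (an editor sketch of the idiom, with Event from
metric.py and the pmu string picked via CheckPmu): constructing the Event
with its json name first proves the model has TSX when the tables are
generated, then rebinding to the sysfs-style name lets the generated metric
resolve the event at runtime, where it may have been disabled:

  try:
    transaction_start = Event("RTM_RETIRED.START")   # model supports TSX?
    transaction_start = Event(f'{pmu}/tx\-start/')   # prefer the sysfs name
  except:
    transaction_start = None                         # skip the tsx metrics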

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/pmu-events/intel_metrics.py | 52 +++++++++++++++++++++++++-
 1 file changed, 51 insertions(+), 1 deletion(-)

diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
index f34b4230a4ee..58e243695f0a 100755
--- a/tools/perf/pmu-events/intel_metrics.py
+++ b/tools/perf/pmu-events/intel_metrics.py
@@ -1,12 +1,13 @@
 #!/usr/bin/env python3
 # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
-from metric import (d_ratio, has_event, max, Event, JsonEncodeMetric,
+from metric import (d_ratio, has_event, max, CheckPmu, Event, JsonEncodeMetric,
                     JsonEncodeMetricGroupDescriptions, LoadEvents, Metric,
                     MetricGroup, MetricRef, Select)
 import argparse
 import json
 import math
 import os
+from typing import Optional
 
 # Global command line arguments.
 _args = None
@@ -74,6 +75,54 @@ def Smi() -> MetricGroup:
     ], description = 'System Management Interrupt metrics')
 
 
+def Tsx() -> Optional[MetricGroup]:
+  pmu = "cpu_core" if CheckPmu("cpu_core") else "cpu"
+  cycles = Event('cycles')
+  cycles_in_tx = Event(f'{pmu}/cycles\-t/')
+  cycles_in_tx_cp = Event(f'{pmu}/cycles\-ct/')
+  try:
+    # Test if the tsx event is present in the json, prefer the
+    # sysfs version so that we can detect its presence at runtime.
+    transaction_start = Event("RTM_RETIRED.START")
+    transaction_start = Event(f'{pmu}/tx\-start/')
+  except:
+    return None
+
+  elision_start = None
+  try:
+    # Elision start isn't supported by all models, but we'll not
+    # generate the tsx_cycles_per_elision metric in that
+    # case. Again, prefer the sysfs encoding of the event.
+    elision_start = Event("HLE_RETIRED.START")
+    elision_start = Event(f'{pmu}/el\-start/')
+  except:
+    pass
+
+  return MetricGroup('transaction', [
+      Metric('tsx_transactional_cycles',
+             'Percentage of cycles within a transaction region.',
+             Select(cycles_in_tx / cycles, has_event(cycles_in_tx), 0),
+             '100%'),
+      Metric('tsx_aborted_cycles', 'Percentage of cycles in aborted transactions.',
+             Select(max(cycles_in_tx - cycles_in_tx_cp, 0) / cycles,
+                    has_event(cycles_in_tx),
+                    0),
+             '100%'),
+      Metric('tsx_cycles_per_transaction',
+             'Number of cycles within a transaction divided by the number of transactions.',
+             Select(cycles_in_tx / transaction_start,
+                    has_event(cycles_in_tx),
+                    0),
+             "cycles / transaction"),
+      Metric('tsx_cycles_per_elision',
+             'Number of cycles within a transaction divided by the number of elisions.',
+             Select(cycles_in_tx / elision_start,
+                    has_event(elision_start),
+                    0),
+             "cycles / elision") if elision_start else None,
+  ], description="Breakdown of transactional memory statistics")
+
+
 def main() -> None:
   global _args
 
@@ -100,6 +149,7 @@ def main() -> None:
       Idle(),
       Rapl(),
       Smi(),
+      Tsx(),
   ])
 
 
-- 
2.46.1.824.gd892dcdcdd-goog



* [PATCH v4 07/22] perf jevents: Add br metric group for branch statistics on Intel
  2024-09-26 17:50 [PATCH v4 00/22] Python generated Intel metrics Ian Rogers
                   ` (5 preceding siblings ...)
  2024-09-26 17:50 ` [PATCH v4 06/22] perf jevents: Add tsx metric group for Intel models Ian Rogers
@ 2024-09-26 17:50 ` Ian Rogers
  2024-11-07 14:35   ` Liang, Kan
  2024-09-26 17:50 ` [PATCH v4 08/22] perf jevents: Add software prefetch (swpf) metric group for Intel Ian Rogers
                   ` (15 subsequent siblings)
  22 siblings, 1 reply; 42+ messages in thread
From: Ian Rogers @ 2024-09-26 17:50 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, linux-perf-users,
	linux-kernel, Perry Taylor, Samantha Alt, Caleb Biggers,
	Weilin Wang, Edward Baker

The br metric group for branches comprises sub-groups for total, taken,
conditional and far branches, built from json events. The conditional
not-taken events are specific to Icelake and later generations, so the
presence of the event is used to determine whether the corresponding
metrics should exist.
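
One idiom doing a lot of the work here (editor note): Event() accepts a list
of candidate names and uses the first one found in the loaded json, so a
single metric definition can cover models that spell an event differently.
A minimal sketch:

  br_all = Event("BR_INST_RETIRED.ALL_BRANCHES",  # most models
                 "BR_INST_RETIRED.ANY")           # older models
  br_rate = d_ratio(br_all, interval_sec)         # branches retired per second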

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/pmu-events/intel_metrics.py | 138 +++++++++++++++++++++++++
 1 file changed, 138 insertions(+)

diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
index 58e243695f0a..09f7b7159e7c 100755
--- a/tools/perf/pmu-events/intel_metrics.py
+++ b/tools/perf/pmu-events/intel_metrics.py
@@ -123,6 +123,143 @@ def Tsx() -> Optional[MetricGroup]:
   ], description="Breakdown of transactional memory statistics")
 
 
+def IntelBr():
+  ins = Event("instructions")
+
+  def Total() -> MetricGroup:
+    br_all = Event("BR_INST_RETIRED.ALL_BRANCHES", "BR_INST_RETIRED.ANY")
+    br_m_all = Event("BR_MISP_RETIRED.ALL_BRANCHES",
+                     "BR_INST_RETIRED.MISPRED",
+                     "BR_MISP_EXEC.ANY")
+    br_clr = None
+    try:
+      br_clr = Event("BACLEARS.ANY", "BACLEARS.ALL")
+    except:
+      pass
+
+    br_r = d_ratio(br_all, interval_sec)
+    ins_r = d_ratio(ins, br_all)
+    misp_r = d_ratio(br_m_all, br_all)
+    clr_r = d_ratio(br_clr, interval_sec) if br_clr else None
+
+    return MetricGroup("br_total", [
+        Metric("br_total_retired",
+               "The number of branch instructions retired per second.", br_r,
+               "insn/s"),
+        Metric(
+            "br_total_mispred",
+            "The number of branch instructions retired, of any type, that were "
+            "not correctly predicted as a percentage of all branch instrucions.",
+            misp_r, "100%"),
+        Metric("br_total_insn_between_branches",
+               "The number of instructions divided by the number of branches.",
+               ins_r, "insn"),
+        Metric("br_total_insn_fe_resteers",
+               "The number of resync branches per second.", clr_r, "req/s"
+               ) if clr_r else None
+    ])
+
+  def Taken() -> MetricGroup:
+    br_all = Event("BR_INST_RETIRED.ALL_BRANCHES", "BR_INST_RETIRED.ANY")
+    br_m_tk = None
+    try:
+      br_m_tk = Event("BR_MISP_RETIRED.NEAR_TAKEN",
+                      "BR_MISP_RETIRED.TAKEN_JCC",
+                      "BR_INST_RETIRED.MISPRED_TAKEN")
+    except:
+      pass
+    br_r = d_ratio(br_all, interval_sec)
+    ins_r = d_ratio(ins, br_all)
+    misp_r = d_ratio(br_m_tk, br_all) if br_m_tk else None
+    return MetricGroup("br_taken", [
+        Metric("br_taken_retired",
+               "The number of taken branches that were retired per second.",
+               br_r, "insn/s"),
+        Metric(
+            "br_taken_mispred",
+            "The number of retired taken branch instructions that were "
+            "mispredicted as a percentage of all taken branches.", misp_r,
+            "100%") if misp_r else None,
+        Metric(
+            "br_taken_insn_between_branches",
+            "The number of instructions divided by the number of taken branches.",
+            ins_r, "insn"),
+    ])
+
+  def Conditional() -> Optional[MetricGroup]:
+    try:
+      br_cond = Event("BR_INST_RETIRED.COND",
+                      "BR_INST_RETIRED.CONDITIONAL",
+                      "BR_INST_RETIRED.TAKEN_JCC")
+      br_m_cond = Event("BR_MISP_RETIRED.COND",
+                        "BR_MISP_RETIRED.CONDITIONAL",
+                        "BR_MISP_RETIRED.TAKEN_JCC")
+    except:
+      return None
+
+    br_cond_nt = None
+    br_m_cond_nt = None
+    try:
+      br_cond_nt = Event("BR_INST_RETIRED.COND_NTAKEN")
+      br_m_cond_nt = Event("BR_MISP_RETIRED.COND_NTAKEN")
+    except:
+      pass
+    br_r = d_ratio(br_cond, interval_sec)
+    ins_r = d_ratio(ins, br_cond)
+    misp_r = d_ratio(br_m_cond, br_cond)
+    taken_metrics = [
+        Metric("br_cond_retired", "Retired conditional branch instructions.",
+               br_r, "insn/s"),
+        Metric("br_cond_insn_between_branches",
+               "The number of instructions divided by the number of conditional "
+               "branches.", ins_r, "insn"),
+        Metric("br_cond_mispred",
+               "Retired conditional branch instructions mispredicted as a "
+               "percentage of all conditional branches.", misp_r, "100%"),
+    ]
+    if not br_m_cond_nt:
+      return MetricGroup("br_cond", taken_metrics)
+
+    br_r = d_ratio(br_cond_nt, interval_sec)
+    ins_r = d_ratio(ins, br_cond_nt)
+    misp_r = d_ratio(br_m_cond_nt, br_cond_nt)
+
+    not_taken_metrics = [
+        Metric("br_cond_retired", "Retired conditional not taken branch instructions.",
+               br_r, "insn/s"),
+        Metric("br_cond_insn_between_branches",
+               "The number of instructions divided by the number of not taken conditional "
+               "branches.", ins_r, "insn"),
+        Metric("br_cond_mispred",
+               "Retired not taken conditional branch instructions mispredicted as a "
+               "percentage of all not taken conditional branches.", misp_r, "100%"),
+    ]
+    return MetricGroup("br_cond", [
+        MetricGroup("br_cond_nt", not_taken_metrics),
+        MetricGroup("br_cond_tkn", taken_metrics),
+    ])
+
+  def Far() -> Optional[MetricGroup]:
+    try:
+      br_far = Event("BR_INST_RETIRED.FAR_BRANCH")
+    except:
+      return None
+
+    br_r = d_ratio(br_far, interval_sec)
+    ins_r = d_ratio(ins, br_far)
+    return MetricGroup("br_far", [
+        Metric("br_far_retired", "Retired far control transfers per second.",
+               br_r, "insn/s"),
+        Metric(
+            "br_far_insn_between_branches",
+            "The number of instructions divided by the number of far branches.",
+            ins_r, "insn"),
+    ])
+
+  return MetricGroup("br", [Total(), Taken(), Conditional(), Far()],
+                     description="breakdown of retired branch instructions")
+
+
 def main() -> None:
   global _args
 
@@ -150,6 +287,7 @@ def main() -> None:
       Rapl(),
       Smi(),
       Tsx(),
+      IntelBr(),
   ])
 
 
-- 
2.46.1.824.gd892dcdcdd-goog



* [PATCH v4 08/22] perf jevents: Add software prefetch (swpf) metric group for Intel
  2024-09-26 17:50 [PATCH v4 00/22] Python generated Intel metrics Ian Rogers
                   ` (6 preceding siblings ...)
  2024-09-26 17:50 ` [PATCH v4 07/22] perf jevents: Add br metric group for branch statistics on Intel Ian Rogers
@ 2024-09-26 17:50 ` Ian Rogers
  2024-09-26 17:50 ` [PATCH v4 09/22] perf jevents: Add ports metric group giving utilization on Intel Ian Rogers
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 42+ messages in thread
From: Ian Rogers @ 2024-09-26 17:50 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, linux-perf-users,
	linux-kernel, Perry Taylor, Samantha Alt, Caleb Biggers,
	Weilin Wang, Edward Baker

Add metrics that break down software prefetch instruction use.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/pmu-events/intel_metrics.py | 65 ++++++++++++++++++++++++++
 1 file changed, 65 insertions(+)

diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
index 09f7b7159e7c..f4707e964f75 100755
--- a/tools/perf/pmu-events/intel_metrics.py
+++ b/tools/perf/pmu-events/intel_metrics.py
@@ -260,6 +260,70 @@ def IntelBr():
                      description="breakdown of retired branch instructions")
 
 
+def IntelSwpf() -> Optional[MetricGroup]:
+  ins = Event("instructions")
+  try:
+    s_ld = Event("MEM_INST_RETIRED.ALL_LOADS", "MEM_UOPS_RETIRED.ALL_LOADS")
+    s_nta = Event("SW_PREFETCH_ACCESS.NTA")
+    s_t0 = Event("SW_PREFETCH_ACCESS.T0")
+    s_t1 = Event("SW_PREFETCH_ACCESS.T1_T2")
+    s_w = Event("SW_PREFETCH_ACCESS.PREFETCHW")
+  except:
+    return None
+
+  all_sw = s_nta + s_t0 + s_t1 + s_w
+  swp_r = d_ratio(all_sw, interval_sec)
+  ins_r = d_ratio(ins, all_sw)
+  ld_r = d_ratio(s_ld, all_sw)
+
+  return MetricGroup("swpf", [
+      MetricGroup("swpf_totals", [
+          Metric("swpf_totals_exec", "Software prefetch instructions per second",
+                swp_r, "swpf/s"),
+          Metric("swpf_totals_insn_per_pf",
+                 "Average number of instructions between software prefetches",
+                 ins_r, "insn/swpf"),
+          Metric("swpf_totals_loads_per_pf",
+                 "Average number of loads between software prefetches",
+                 ld_r, "loads/swpf"),
+      ]),
+      MetricGroup("swpf_bkdwn", [
+          MetricGroup("swpf_bkdwn_nta", [
+              Metric("swpf_bkdwn_nta_per_swpf",
+                     "Software prefetch NTA instructions as a percent of all prefetch instructions",
+                     d_ratio(s_nta, all_sw), "100%"),
+              Metric("swpf_bkdwn_nta_rate",
+                     "Software prefetch NTA instructions per second",
+                     d_ratio(s_nta, interval_sec), "insn/s"),
+          ]),
+          MetricGroup("swpf_bkdwn_t0", [
+              Metric("swpf_bkdwn_t0_per_swpf",
+                     "Software prefetch T0 instructions as a percent of all prefetch instructions",
+                     d_ratio(s_t0, all_sw), "100%"),
+              Metric("swpf_bkdwn_t0_rate",
+                     "Software prefetch T0 instructions per second",
+                     d_ratio(s_t0, interval_sec), "insn/s"),
+          ]),
+          MetricGroup("swpf_bkdwn_t1_t2", [
+              Metric("swpf_bkdwn_t1_t2_per_swpf",
+                     "Software prefetch T1 or T2 instructions as a percent of all prefetch instructions",
+                     d_ratio(s_t1, all_sw), "100%"),
+              Metric("swpf_bkdwn_t1_t2_rate",
+                     "Software prefetch T1 or T2 instructions per second",
+                     d_ratio(s_t1, interval_sec), "insn/s"),
+          ]),
+          MetricGroup("swpf_bkdwn_w", [
+              Metric("swpf_bkdwn_w_per_swpf",
+                     "Software prefetch W instructions as a percent of all prefetch instructions",
+                     d_ratio(s_w, all_sw), "100%"),
+              Metric("swpf_bkdwn_w_rate",
+                     "Software prefetch W instructions per second",
+                     d_ratio(s_w, interval_sec), "insn/s"),
+          ]),
+      ]),
+  ], description="Software prefetch instruction breakdown")
+
+
 def main() -> None:
   global _args
 
@@ -288,6 +352,7 @@ def main() -> None:
       Smi(),
       Tsx(),
       IntelBr(),
+      IntelSwpf(),
   ])
 
 
-- 
2.46.1.824.gd892dcdcdd-goog



* [PATCH v4 09/22] perf jevents: Add ports metric group giving utilization on Intel
  2024-09-26 17:50 [PATCH v4 00/22] Python generated Intel metrics Ian Rogers
                   ` (7 preceding siblings ...)
  2024-09-26 17:50 ` [PATCH v4 08/22] perf jevents: Add software prefetch (swpf) metric group for Intel Ian Rogers
@ 2024-09-26 17:50 ` Ian Rogers
  2024-11-07 15:00   ` Liang, Kan
  2024-09-26 17:50 ` [PATCH v4 10/22] perf jevents: Add L2 metrics for Intel Ian Rogers
                   ` (13 subsequent siblings)
  22 siblings, 1 reply; 42+ messages in thread
From: Ian Rogers @ 2024-09-26 17:50 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, linux-perf-users,
	linux-kernel, Perry Taylor, Samantha Alt, Caleb Biggers,
	Weilin Wang, Edward Baker

The ports metric group contains a metric for each port giving its
utilization as a ratio of cycles. The metrics are created by looking
for UOPS_DISPATCHED.PORT events.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/pmu-events/intel_metrics.py | 33 ++++++++++++++++++++++++--
 1 file changed, 31 insertions(+), 2 deletions(-)

diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
index f4707e964f75..3ef4eb868580 100755
--- a/tools/perf/pmu-events/intel_metrics.py
+++ b/tools/perf/pmu-events/intel_metrics.py
@@ -1,12 +1,13 @@
 #!/usr/bin/env python3
 # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
 from metric import (d_ratio, has_event, max, CheckPmu, Event, JsonEncodeMetric,
-                    JsonEncodeMetricGroupDescriptions, LoadEvents, Metric,
-                    MetricGroup, MetricRef, Select)
+                    JsonEncodeMetricGroupDescriptions, Literal, LoadEvents,
+                    Metric, MetricGroup, MetricRef, Select)
 import argparse
 import json
 import math
 import os
+import re
 from typing import Optional
 
 # Global command line arguments.
@@ -260,6 +261,33 @@ def IntelBr():
                      description="breakdown of retired branch instructions")
 
 
+def IntelPorts() -> Optional[MetricGroup]:
+  pipeline_events = json.load(open(f"{_args.events_path}/x86/{_args.model}/pipeline.json"))
+
+  core_cycles = Event("CPU_CLK_UNHALTED.THREAD_P_ANY",
+                      "CPU_CLK_UNHALTED.DISTRIBUTED",
+                      "cycles")
+  # Number of CPU cycles scaled for SMT.
+  smt_cycles = Select(core_cycles / 2, Literal("#smt_on"), core_cycles)
+
+  metrics = []
+  for x in pipeline_events:
+    if "EventName" in x and re.search("^UOPS_DISPATCHED.PORT", x["EventName"]):
+      name = x["EventName"]
+      port = re.search(r"(PORT_[0-9].*)", name).group(0).lower()
+      if name.endswith("_CORE"):
+        cyc = core_cycles
+      else:
+        cyc = smt_cycles
+      metrics.append(Metric(port, f"{port} utilization (higher is better)",
+                            d_ratio(Event(name), cyc), "100%"))
+  if len(metrics) == 0:
+    return None
+
+  return MetricGroup("ports", metrics, "functional unit (port) utilization -- "
+                     "fraction of cycles each port is utilized (higher is better)")
+
+
 def IntelSwpf() -> Optional[MetricGroup]:
   ins = Event("instructions")
   try:
@@ -352,6 +380,7 @@ def main() -> None:
       Smi(),
       Tsx(),
       IntelBr(),
+      IntelPorts(),
       IntelSwpf(),
   ])
 
-- 
2.46.1.824.gd892dcdcdd-goog



* [PATCH v4 10/22] perf jevents: Add L2 metrics for Intel
  2024-09-26 17:50 [PATCH v4 00/22] Python generated Intel metrics Ian Rogers
                   ` (8 preceding siblings ...)
  2024-09-26 17:50 ` [PATCH v4 09/22] perf jevents: Add ports metric group giving utilization on Intel Ian Rogers
@ 2024-09-26 17:50 ` Ian Rogers
  2024-09-26 17:50 ` [PATCH v4 11/22] perf jevents: Add load store breakdown metrics ldst " Ian Rogers
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 42+ messages in thread
From: Ian Rogers @ 2024-09-26 17:50 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, linux-perf-users,
	linux-kernel, Perry Taylor, Samantha Alt, Caleb Biggers,
	Weilin Wang, Edward Baker

Give a breakdown of various L2 counters as metrics, including totals,
reads, hardware prefetcher, RFO, code and evictions.
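
One detail worth spelling out (editor note): on models without a separate
L2_RQSTS.DEMAND_DATA_RD_MISS event, the demand read misses are derived from
the all-demand-reads count minus the hits, which is what the fallback path
does:

  # Editor sketch of the fallback arithmetic with made-up counts.
  def l2_demand_rd_misses(all_demand_rd: int, hits: int) -> int:
    return all_demand_rd - hits

  print(l2_demand_rd_misses(all_demand_rd=1_000_000, hits=940_000))  # 60000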

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/pmu-events/intel_metrics.py | 158 +++++++++++++++++++++++++
 1 file changed, 158 insertions(+)

diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
index 3ef4eb868580..4ddc68006b10 100755
--- a/tools/perf/pmu-events/intel_metrics.py
+++ b/tools/perf/pmu-events/intel_metrics.py
@@ -261,6 +261,163 @@ def IntelBr():
                      description="breakdown of retired branch instructions")
 
 
+def IntelL2() -> Optional[MetricGroup]:
+  try:
+    DC_HIT = Event("L2_RQSTS.DEMAND_DATA_RD_HIT")
+  except:
+    return None
+  try:
+    DC_MISS = Event("L2_RQSTS.DEMAND_DATA_RD_MISS")
+    l2_dmnd_miss = DC_MISS
+    l2_dmnd_rd_all = DC_MISS + DC_HIT
+  except:
+    DC_ALL = Event("L2_RQSTS.ALL_DEMAND_DATA_RD")
+    l2_dmnd_miss = DC_ALL - DC_HIT
+    l2_dmnd_rd_all = DC_ALL
+  l2_dmnd_mrate = d_ratio(l2_dmnd_miss, interval_sec)
+  l2_dmnd_rrate = d_ratio(l2_dmnd_rd_all, interval_sec)
+
+  DC_PFH = None
+  DC_PFM = None
+  l2_pf_all = None
+  l2_pf_mrate = None
+  l2_pf_rrate = None
+  try:
+    DC_PFH = Event("L2_RQSTS.PF_HIT")
+    DC_PFM = Event("L2_RQSTS.PF_MISS")
+    l2_pf_all = DC_PFH + DC_PFM
+    l2_pf_mrate = d_ratio(DC_PFM, interval_sec)
+    l2_pf_rrate = d_ratio(l2_pf_all, interval_sec)
+  except:
+    pass
+
+  DC_RFOH = Event("L2_RQSTS.RFO_HIT")
+  DC_RFOM = Event("L2_RQSTS.RFO_MISS")
+  l2_rfo_all = DC_RFOH + DC_RFOM
+  l2_rfo_mrate  = d_ratio(DC_RFOM, interval_sec)
+  l2_rfo_rrate  = d_ratio(l2_rfo_all, interval_sec)
+
+  DC_CH = Event("L2_RQSTS.CODE_RD_HIT")
+  DC_CM = Event("L2_RQSTS.CODE_RD_MISS")
+  DC_IN = Event("L2_LINES_IN.ALL")
+  DC_WB_U = None
+  DC_WB_D = None
+  wbu = None
+  wbd = None
+  try:
+    # Probe the writeback upgrade/downgrade events first, they are needed
+    # below to split the non-silent evictions.
+    DC_WB_U = Event("IDI_MISC.WB_UPGRADE")
+    DC_WB_D = Event("IDI_MISC.WB_DOWNGRADE")
+    wbu = d_ratio(DC_WB_U, interval_sec)
+    wbd = d_ratio(DC_WB_D, interval_sec)
+  except:
+    pass
+  DC_OUT_NS = None
+  DC_OUT_S = None
+  l2_lines_out = None
+  l2_out_rate = None
+  wbn = None
+  isd = None
+  try:
+    DC_OUT_NS = Event("L2_LINES_OUT.NON_SILENT",
+                      "L2_LINES_OUT.DEMAND_DIRTY",
+                      "L2_LINES_IN.S")
+    DC_OUT_S = Event("L2_LINES_OUT.SILENT",
+                     "L2_LINES_OUT.DEMAND_CLEAN",
+                     "L2_LINES_IN.I")
+    if DC_OUT_S.name == "L2_LINES_OUT.SILENT" and (
+        _args.model.startswith("skylake") or
+        _args.model == "cascadelakex"):
+      DC_OUT_S.name = "L2_LINES_OUT.SILENT/any/"
+    # bring it back to per-CPU
+    l2_s  = Select(DC_OUT_S / 2, Literal("#smt_on"), DC_OUT_S)
+    l2_ns = DC_OUT_NS
+    l2_lines_out = l2_s + l2_ns
+    l2_out_rate = d_ratio(l2_lines_out, interval_sec)
+    isd = d_ratio(l2_s, interval_sec)
+    if DC_WB_U and DC_WB_D:
+      nlr = max(l2_ns - DC_WB_U - DC_WB_D, 0)
+      wbn = d_ratio(nlr, interval_sec)
+  except:
+    pass
+  DC_OUT_U = None
+  l2_pf_useless = None
+  l2_useless_rate = None
+  try:
+    DC_OUT_U = Event("L2_LINES_OUT.USELESS_HWPF")
+    l2_pf_useless = DC_OUT_U
+    l2_useless_rate = d_ratio(l2_pf_useless, interval_sec)
+  except:
+    pass
+
+  l2_lines_in = DC_IN
+  l2_code_all = DC_CH + DC_CM
+  l2_code_rate = d_ratio(l2_code_all, interval_sec)
+  l2_code_miss_rate = d_ratio(DC_CM, interval_sec)
+  l2_in_rate = d_ratio(l2_lines_in, interval_sec)
+
+  return MetricGroup("l2", [
+    MetricGroup("l2_totals", [
+      Metric("l2_totals_in", "L2 cache total in per second",
+             l2_in_rate, "In/s"),
+      Metric("l2_totals_out", "L2 cache total out per second",
+             l2_out_rate, "Out/s") if l2_out_rate else None,
+    ]),
+    MetricGroup("l2_rd", [
+      Metric("l2_rd_hits", "L2 cache data read hits",
+             d_ratio(DC_HIT, l2_dmnd_rd_all), "100%"),
+      Metric("l2_rd_hits", "L2 cache data read hits",
+             d_ratio(l2_dmnd_miss, l2_dmnd_rd_all), "100%"),
+      Metric("l2_rd_requests", "L2 cache data read requests per second",
+             l2_dmnd_rrate, "requests/s"),
+      Metric("l2_rd_misses", "L2 cache data read misses per second",
+             l2_dmnd_mrate, "misses/s"),
+    ]),
+    MetricGroup("l2_hwpf", [
+      Metric("l2_hwpf_hits", "L2 cache hardware prefetcher hits",
+             d_ratio(DC_PFH, l2_pf_all), "100%"),
+      Metric("l2_hwpf_misses", "L2 cache hardware prefetcher misses",
+             d_ratio(DC_PFM, l2_pf_all), "100%"),
+      Metric("l2_hwpf_useless", "L2 cache hardware prefetcher useless prefetches per second",
+             l2_useless_rate, "100%") if l2_useless_rate else None,
+      Metric("l2_hwpf_requests", "L2 cache hardware prefetcher requests per second",
+             l2_pf_rrate, "100%"),
+      Metric("l2_hwpf_misses", "L2 cache hardware prefetcher misses per second",
+             l2_pf_mrate, "100%"),
+    ]) if DC_PFH else None,
+    MetricGroup("l2_rfo", [
+      Metric("l2_rfo_hits", "L2 cache request for ownership (RFO) hits",
+             d_ratio(DC_RFOH, l2_rfo_all), "100%"),
+      Metric("l2_rfo_misses", "L2 cache request for ownership (RFO) misses",
+             d_ratio(DC_RFOM, l2_rfo_all), "100%"),
+      Metric("l2_rfo_requests", "L2 cache request for ownership (RFO) requests per second",
+             l2_rfo_rrate, "requests/s"),
+      Metric("l2_rfo_misses", "L2 cache request for ownership (RFO) misses per second",
+             l2_rfo_mrate, "misses/s"),
+    ]),
+    MetricGroup("l2_code", [
+      Metric("l2_code_hits", "L2 cache code hits",
+             d_ratio(DC_CH, l2_code_all), "100%"),
+      Metric("l2_code_misses", "L2 cache code misses",
+             d_ratio(DC_CM, l2_code_all), "100%"),
+      Metric("l2_code_requests", "L2 cache code requests per second",
+             l2_code_rate, "requests/s"),
+      Metric("l2_code_misses", "L2 cache code misses per second",
+             l2_code_miss_rate, "misses/s"),
+    ]),
+    MetricGroup("l2_evict", [
+      MetricGroup("l2_evict_mef_lines", [
+        Metric("l2_evict_mef_lines_l3_hot_lru", "L2 evictions M/E/F lines L3 hot LRU per second",
+               wbu, "HotLRU/s") if wbu else None,
+        Metric("l2_evict_mef_lines_l3_norm_lru", "L2 evictions M/E/F lines L3 normal LRU per second",
+               wbn, "NormLRU/s") if wbn else None,
+        Metric("l2_evict_mef_lines_dropped", "L2 evictions M/E/F lines dropped per second",
+               wbd, "dropped/s") if wbd else None,
+        Metric("l2_evict_is_lines_dropped", "L2 evictions I/S lines dropped per second",
+               isd, "dropped/s") if isd else None,
+      ]),
+    ]),
+  ], description = "L2 data cache analysis")
+
+
 def IntelPorts() -> Optional[MetricGroup]:
   pipeline_events = json.load(open(f"{_args.events_path}/x86/{_args.model}/pipeline.json"))
 
@@ -380,6 +537,7 @@ def main() -> None:
       Smi(),
       Tsx(),
       IntelBr(),
+      IntelL2(),
       IntelPorts(),
       IntelSwpf(),
   ])
-- 
2.46.1.824.gd892dcdcdd-goog



* [PATCH v4 11/22] perf jevents: Add load store breakdown metrics ldst for Intel
  2024-09-26 17:50 [PATCH v4 00/22] Python generated Intel metrics Ian Rogers
                   ` (9 preceding siblings ...)
  2024-09-26 17:50 ` [PATCH v4 10/22] perf jevents: Add L2 metrics for Intel Ian Rogers
@ 2024-09-26 17:50 ` Ian Rogers
  2024-09-26 17:50 ` [PATCH v4 12/22] perf jevents: Add ILP metrics " Ian Rogers
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 42+ messages in thread
From: Ian Rogers @ 2024-09-26 17:50 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, linux-perf-users,
	linux-kernel, Perry Taylor, Samantha Alt, Caleb Biggers,
	Weilin Wang, Edward Baker

Give a breakdown of the number of load and store instructions. Use the
counter mask (cmask) to show the number of cycles taken to retire them.
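
For context (an editor sketch, not part of the patch): an event with
/cmask=N/ counts the cycles in which at least N of that event occurred, so
subtracting neighbouring cmask counts isolates the cycles in which exactly N
loads (or stores) retired:

  # Editor sketch of the cmask differencing behind the ldst_ret_* metrics.
  def cycles_with_exactly_n(cmask_counts: dict, n: int) -> int:
    # cmask_counts[k] = cycles in which >= k loads retired (made-up numbers).
    return max(cmask_counts[n] - cmask_counts.get(n + 1, 0), 0)

  counts = {1: 500_000, 2: 300_000, 3: 100_000}
  print(cycles_with_exactly_n(counts, 1))  # 200000 cycles retired exactly 1 load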

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/pmu-events/intel_metrics.py | 86 +++++++++++++++++++++++++-
 1 file changed, 85 insertions(+), 1 deletion(-)

diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
index 4ddc68006b10..d528b97e8822 100755
--- a/tools/perf/pmu-events/intel_metrics.py
+++ b/tools/perf/pmu-events/intel_metrics.py
@@ -2,7 +2,7 @@
 # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
 from metric import (d_ratio, has_event, max, CheckPmu, Event, JsonEncodeMetric,
                     JsonEncodeMetricGroupDescriptions, Literal, LoadEvents,
-                    Metric, MetricGroup, MetricRef, Select)
+                    Metric, MetricConstraint, MetricGroup, MetricRef, Select)
 import argparse
 import json
 import math
@@ -509,6 +509,89 @@ def IntelSwpf() -> Optional[MetricGroup]:
   ], description="Software prefetch instruction breakdown")
 
 
+def IntelLdSt() -> Optional[MetricGroup]:
+  if _args.model in [
+      "bonnell",
+      "nehalemep",
+      "nehalemex",
+      "westmereep-dp",
+      "westmereep-sp",
+      "westmereex",
+  ]:
+    return None
+  LDST_LD = Event("MEM_INST_RETIRED.ALL_LOADS", "MEM_UOPS_RETIRED.ALL_LOADS")
+  LDST_ST = Event("MEM_INST_RETIRED.ALL_STORES", "MEM_UOPS_RETIRED.ALL_STORES")
+  LDST_LDC1 = Event(f"{LDST_LD.name}/cmask=1/")
+  LDST_STC1 = Event(f"{LDST_ST.name}/cmask=1/")
+  LDST_LDC2 = Event(f"{LDST_LD.name}/cmask=2/")
+  LDST_STC2 = Event(f"{LDST_ST.name}/cmask=2/")
+  LDST_LDC3 = Event(f"{LDST_LD.name}/cmask=3/")
+  LDST_STC3 = Event(f"{LDST_ST.name}/cmask=3/")
+  ins = Event("instructions")
+  LDST_CYC = Event("CPU_CLK_UNHALTED.THREAD",
+                   "CPU_CLK_UNHALTED.CORE_P",
+                   "CPU_CLK_UNHALTED.THREAD_P")
+  LDST_PRE = None
+  try:
+    LDST_PRE = Event("LOAD_HIT_PREFETCH.SWPF", "LOAD_HIT_PRE.SW_PF")
+  except:
+    pass
+  LDST_AT = None
+  try:
+    LDST_AT = Event("MEM_INST_RETIRED.LOCK_LOADS")
+  except:
+    pass
+  cyc  = LDST_CYC
+
+  ld_rate = d_ratio(LDST_LD, interval_sec)
+  st_rate = d_ratio(LDST_ST, interval_sec)
+  pf_rate = d_ratio(LDST_PRE, interval_sec) if LDST_PRE else None
+  at_rate = d_ratio(LDST_AT, interval_sec) if LDST_AT else None
+
+  ldst_ret_constraint = MetricConstraint.GROUPED_EVENTS
+  if LDST_LD.name == "MEM_UOPS_RETIRED.ALL_LOADS":
+    ldst_ret_constraint = MetricConstraint.NO_GROUP_EVENTS_NMI
+
+  return MetricGroup("ldst", [
+      MetricGroup("ldst_total", [
+          Metric("ldst_total_loads", "Load/store instructions total loads",
+                 ld_rate, "loads"),
+          Metric("ldst_total_stores", "Load/store instructions total stores",
+                 st_rate, "stores"),
+      ]),
+      MetricGroup("ldst_prcnt", [
+          Metric("ldst_prcnt_loads", "Percent of all instructions that are loads",
+                 d_ratio(LDST_LD, ins), "100%"),
+          Metric("ldst_prcnt_stores", "Percent of all instructions that are stores",
+                 d_ratio(LDST_ST, ins), "100%"),
+      ]),
+      MetricGroup("ldst_ret_lds", [
+          Metric("ldst_ret_lds_1", "Retired loads in 1 cycle",
+                 d_ratio(max(LDST_LDC1 - LDST_LDC2, 0), cyc), "100%",
+                 constraint = ldst_ret_constraint),
+          Metric("ldst_ret_lds_2", "Retired loads in 2 cycles",
+                 d_ratio(max(LDST_LDC2 - LDST_LDC3, 0), cyc), "100%",
+                 constraint = ldst_ret_constraint),
+          Metric("ldst_ret_lds_3", "Retired loads in 3 or more cycles",
+                 d_ratio(LDST_LDC3, cyc), "100%"),
+      ]),
+      MetricGroup("ldst_ret_sts", [
+          Metric("ldst_ret_sts_1", "Retired stores in 1 cycle",
+                 d_ratio(max(LDST_STC1 - LDST_STC2, 0), cyc), "100%",
+                 constraint = ldst_ret_constraint),
+          Metric("ldst_ret_sts_2", "Retired stores in 2 cycles",
+                 d_ratio(max(LDST_STC2 - LDST_STC3, 0), cyc), "100%",
+                 constraint = ldst_ret_constraint),
+          Metric("ldst_ret_sts_3", "Retired stores in 3 more cycles",
+                 d_ratio(LDST_STC3, cyc), "100%"),
+      ]),
+      Metric("ldst_ld_hit_swpf", "Load hit software prefetches per second",
+             pf_rate, "swpf/s") if pf_rate else None,
+      Metric("ldst_atomic_lds", "Atomic loads per second",
+             at_rate, "loads/s") if at_rate else None,
+  ], description = "Breakdown of load/store instructions")
+
+
 def main() -> None:
   global _args
 
@@ -538,6 +621,7 @@ def main() -> None:
       Tsx(),
       IntelBr(),
       IntelL2(),
+      IntelLdSt(),
       IntelPorts(),
       IntelSwpf(),
   ])
-- 
2.46.1.824.gd892dcdcdd-goog



* [PATCH v4 12/22] perf jevents: Add ILP metrics for Intel
  2024-09-26 17:50 [PATCH v4 00/22] Python generated Intel metrics Ian Rogers
                   ` (10 preceding siblings ...)
  2024-09-26 17:50 ` [PATCH v4 11/22] perf jevents: Add load store breakdown metrics ldst " Ian Rogers
@ 2024-09-26 17:50 ` Ian Rogers
  2024-09-26 17:50 ` [PATCH v4 13/22] perf jevents: Add context switch " Ian Rogers
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 42+ messages in thread
From: Ian Rogers @ 2024-09-26 17:50 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, linux-perf-users,
	linux-kernel, Perry Taylor, Samantha Alt, Caleb Biggers,
	Weilin Wang, Edward Baker

Use the counter mask (cmask) to see how many instructions retire in each
cycle. Present this as a set of ILP metrics.
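
A sketch of the arithmetic (editor illustration with made-up numbers):
INST_RETIRED.ANY_P/cmask=N/ counts the cycles in which at least N
instructions retired, so differencing adjacent counts gives the per-cycle
retirement distribution and whatever remains is the share of cycles that
retired nothing:

  core_cycles = 1_000_000
  cmask = [700_000, 400_000, 200_000, 80_000, 20_000]  # cycles with >= 1..5 retired

  ilp = [(cmask[i] - cmask[i + 1]) / core_cycles for i in range(4)]
  ilp.append(cmask[4] / core_cycles)                   # >= 5 retired
  ilp0 = 1 - sum(ilp)                                  # nothing retired

  print([round(x, 2) for x in ilp], round(ilp0, 2))    # [0.3, 0.2, 0.12, 0.06, 0.02] 0.3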

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/pmu-events/intel_metrics.py | 33 ++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
index d528b97e8822..1d886e416e7f 100755
--- a/tools/perf/pmu-events/intel_metrics.py
+++ b/tools/perf/pmu-events/intel_metrics.py
@@ -261,6 +261,38 @@ def IntelBr():
                      description="breakdown of retired branch instructions")
 
 
+def IntelIlp() -> MetricGroup:
+  tsc = Event("msr/tsc/")
+  c0 = Event("msr/mperf/")
+  low = tsc - c0
+  inst_ret = Event("INST_RETIRED.ANY_P")
+  inst_ret_c = [Event(f"{inst_ret.name}/cmask={x}/") for x in range(1, 6)]
+  core_cycles = Event("CPU_CLK_UNHALTED.THREAD_P_ANY",
+                      "CPU_CLK_UNHALTED.DISTRIBUTED",
+                      "cycles")
+  ilp = [d_ratio(max(inst_ret_c[x] - inst_ret_c[x + 1], 0), core_cycles) for x in range(0, 4)]
+  ilp.append(d_ratio(inst_ret_c[4], core_cycles))
+  ilp0 = 1
+  for x in ilp:
+    ilp0 -= x
+  return MetricGroup("ilp", [
+      Metric("ilp_idle", "Lower power cycles as a percentage of all cycles",
+             d_ratio(low, tsc), "100%"),
+      Metric("ilp_inst_ret_0", "Instructions retired in 0 cycles as a percentage of all cycles",
+             ilp0, "100%"),
+      Metric("ilp_inst_ret_1", "Instructions retired in 1 cycles as a percentage of all cycles",
+             ilp[0], "100%"),
+      Metric("ilp_inst_ret_2", "Instructions retired in 2 cycles as a percentage of all cycles",
+             ilp[1], "100%"),
+      Metric("ilp_inst_ret_3", "Instructions retired in 3 cycles as a percentage of all cycles",
+             ilp[2], "100%"),
+      Metric("ilp_inst_ret_4", "Instructions retired in 4 cycles as a percentage of all cycles",
+             ilp[3], "100%"),
+      Metric("ilp_inst_ret_5", "Instructions retired in 5 or more cycles as a percentage of all cycles",
+             ilp[4], "100%"),
+  ])
+
+
 def IntelL2() -> Optional[MetricGroup]:
   try:
     DC_HIT = Event("L2_RQSTS.DEMAND_DATA_RD_HIT")
@@ -620,6 +652,7 @@ def main() -> None:
       Smi(),
       Tsx(),
       IntelBr(),
+      IntelIlp(),
       IntelL2(),
       IntelLdSt(),
       IntelPorts(),
-- 
2.46.1.824.gd892dcdcdd-goog


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v4 13/22] perf jevents: Add context switch metrics for Intel
  2024-09-26 17:50 [PATCH v4 00/22] Python generated Intel metrics Ian Rogers
                   ` (11 preceding siblings ...)
  2024-09-26 17:50 ` [PATCH v4 12/22] perf jevents: Add ILP metrics " Ian Rogers
@ 2024-09-26 17:50 ` Ian Rogers
  2024-09-26 17:50 ` [PATCH v4 14/22] perf jevents: Add FPU " Ian Rogers
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 42+ messages in thread
From: Ian Rogers @ 2024-09-26 17:50 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, linux-perf-users,
	linux-kernel, Perry Taylor, Samantha Alt, Caleb Biggers,
	Weilin Wang, Edward Baker

Add metrics that break down what happens between context switches:
instructions, loads, stores, taken branches and L2 misses per context
switch.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/pmu-events/intel_metrics.py | 55 ++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
index 1d886e416e7f..7cd933a28cfd 100755
--- a/tools/perf/pmu-events/intel_metrics.py
+++ b/tools/perf/pmu-events/intel_metrics.py
@@ -261,6 +261,60 @@ def IntelBr():
                      description="breakdown of retired branch instructions")
 
 
+def IntelCtxSw() -> MetricGroup:
+  cs = Event("context\-switches")
+  metrics = [
+      Metric("cs_rate", "Context switches per second", d_ratio(cs, interval_sec), "ctxsw/s")
+  ]
+
+  ev = Event("instructions")
+  metrics.append(Metric("cs_instr", "Instructions per context switch",
+                        d_ratio(ev, cs), "instr/cs"))
+
+  ev = Event("cycles")
+  metrics.append(Metric("cs_cycles", "Cycles per context switch",
+                        d_ratio(ev, cs), "cycles/cs"))
+
+  try:
+    ev = Event("MEM_INST_RETIRED.ALL_LOADS", "MEM_UOPS_RETIRED.ALL_LOADS")
+    metrics.append(Metric("cs_loads", "Loads per context switch",
+                          d_ratio(ev, cs), "loads/cs"))
+  except:
+    pass
+
+  try:
+    ev = Event("MEM_INST_RETIRED.ALL_STORES", "MEM_UOPS_RETIRED.ALL_STORES")
+    metrics.append(Metric("cs_stores", "Stores per context switch",
+                          d_ratio(ev, cs), "stores/cs"))
+  except:
+    pass
+
+  try:
+    ev = Event("BR_INST_RETIRED.NEAR_TAKEN", "BR_INST_RETIRED.TAKEN_JCC")
+    metrics.append(Metric("cs_br_taken", "Branches taken per context switch",
+                          d_ratio(ev, cs), "br_taken/cs"))
+  except:
+    pass
+
+  try:
+    l2_misses = (Event("L2_RQSTS.DEMAND_DATA_RD_MISS") +
+                 Event("L2_RQSTS.RFO_MISS") +
+                 Event("L2_RQSTS.CODE_RD_MISS"))
+    try:
+      l2_misses += Event("L2_RQSTS.HWPF_MISS", "L2_RQSTS.L2_PF_MISS", "L2_RQSTS.PF_MISS")
+    except:
+      pass
+
+    metrics.append(Metric("cs_l2_misses", "L2 misses per context switch",
+                          d_ratio(l2_misses, cs), "l2_misses/cs"))
+  except:
+    pass
+
+  return MetricGroup("cs", metrics,
+                     description = ("Number of context switches per second, instructions "
+                                    "retired & core cycles between context switches"))
+
+
 def IntelIlp() -> MetricGroup:
   tsc = Event("msr/tsc/")
   c0 = Event("msr/mperf/")
@@ -652,6 +706,7 @@ def main() -> None:
       Smi(),
       Tsx(),
       IntelBr(),
+      IntelCtxSw(),
       IntelIlp(),
       IntelL2(),
       IntelLdSt(),
-- 
2.46.1.824.gd892dcdcdd-goog


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v4 14/22] perf jevents: Add FPU metrics for Intel
  2024-09-26 17:50 [PATCH v4 00/22] Python generated Intel metrics Ian Rogers
                   ` (12 preceding siblings ...)
  2024-09-26 17:50 ` [PATCH v4 13/22] perf jevents: Add context switch " Ian Rogers
@ 2024-09-26 17:50 ` Ian Rogers
  2024-09-26 17:50 ` [PATCH v4 15/22] perf jevents: Add Miss Level Parallelism (MLP) metric " Ian Rogers
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 42+ messages in thread
From: Ian Rogers @ 2024-09-26 17:50 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, linux-perf-users,
	linux-kernel, Perry Taylor, Samantha Alt, Caleb Biggers,
	Weilin Wang, Edward Baker

Add metrics giving a breakdown of floating point operations.
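
As an illustration (not part of the change), a standalone sketch in
plain Python with hypothetical event counts of the FLOP weighting used
below, where each packed event is weighted by its vector width divided
by the element width:

  counts = {"scalar_single": 1000, "scalar_double": 500,
            "128b_packed_single": 200, "128b_packed_double": 100,
            "256b_packed_single": 50, "256b_packed_double": 25}
  weights = {"scalar_single": 1, "scalar_double": 1,
             "128b_packed_single": 4, "128b_packed_double": 2,
             "256b_packed_single": 8, "256b_packed_double": 4}
  interval_sec = 1.0
  flop = sum(counts[e] * weights[e] for e in counts)
  print(f"{flop / interval_sec:.0f} flops/s")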

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/pmu-events/intel_metrics.py | 90 ++++++++++++++++++++++++++
 1 file changed, 90 insertions(+)

diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
index 7cd933a28cfd..dc14fff7abc3 100755
--- a/tools/perf/pmu-events/intel_metrics.py
+++ b/tools/perf/pmu-events/intel_metrics.py
@@ -315,6 +315,95 @@ def IntelCtxSw() -> MetricGroup:
                                     "retired & core cycles between context switches"))
 
 
+def IntelFpu() -> Optional[MetricGroup]:
+  cyc = Event("cycles")
+  try:
+    s_64 = Event("FP_ARITH_INST_RETIRED.SCALAR_SINGLE",
+                 "SIMD_INST_RETIRED.SCALAR_SINGLE")
+  except:
+    return None
+  d_64 = Event("FP_ARITH_INST_RETIRED.SCALAR_DOUBLE",
+               "SIMD_INST_RETIRED.SCALAR_DOUBLE")
+  s_128 = Event("FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE",
+                "SIMD_INST_RETIRED.PACKED_SINGLE")
+
+  flop = s_64 + d_64 + 4 * s_128
+
+  d_128 = None
+  s_256 = None
+  d_256 = None
+  s_512 = None
+  d_512 = None
+  try:
+    d_128 = Event("FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE")
+    flop += 2 * d_128
+    s_256 = Event("FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE")
+    flop += 8 * s_256
+    d_256 = Event("FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE")
+    flop += 4 * d_256
+    s_512 = Event("FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE")
+    flop += 16 * s_512
+    d_512 = Event("FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE")
+    flop += 8 * d_512
+  except:
+    pass
+
+  f_assist = Event("ASSISTS.FP", "FP_ASSIST.ANY", "FP_ASSIST.S")
+  if f_assist.name in [
+      "ASSISTS.FP",
+      "FP_ASSIST.S",
+  ]:
+    f_assist.name += "/cmask=1/"
+
+  flop_r = d_ratio(flop, interval_sec)
+  flop_c = d_ratio(flop, cyc)
+  nmi_constraint = MetricConstraint.GROUPED_EVENTS
+  if f_assist.name.startswith("ASSISTS.FP"): # Icelake+
+    nmi_constraint = MetricConstraint.NO_GROUP_EVENTS_NMI
+  def FpuMetrics(group: str, fl: Optional[Event], mult: int, desc: str) -> Optional[MetricGroup]:
+    if not fl:
+      return None
+
+    f = fl * mult
+    fl_r = d_ratio(f, interval_sec)
+    r_s = d_ratio(fl, interval_sec)
+    return MetricGroup(group, [
+        Metric(f"{group}_of_total", desc + " floating point operations per second",
+               d_ratio(f, flop), "100%"),
+        Metric(f"{group}_flops", desc + " floating point operations per second",
+               fl_r, "flops/s"),
+        Metric(f"{group}_ops", desc + " operations per second",
+               r_s, "ops/s"),
+    ])
+
+  return MetricGroup("fpu", [
+      MetricGroup("fpu_total", [
+          Metric("fpu_total_flops", "Floating point operations per second",
+                 flop_r, "flops/s"),
+          Metric("fpu_total_flopc", "Floating point operations per cycle",
+                 flop_c, "flops/cycle", constraint=nmi_constraint),
+      ]),
+      MetricGroup("fpu_64", [
+          FpuMetrics("fpu_64_single", s_64, 1, "64-bit single"),
+          FpuMetrics("fpu_64_double", d_64, 1, "64-bit double"),
+      ]),
+      MetricGroup("fpu_128", [
+          FpuMetrics("fpu_128_single", s_128, 4, "128-bit packed single"),
+          FpuMetrics("fpu_128_double", d_128, 2, "128-bit packed double"),
+      ]),
+      MetricGroup("fpu_256", [
+          FpuMetrics("fpu_256_single", s_256, 8, "128-bit packed single"),
+          FpuMetrics("fpu_256_double", d_256, 4, "128-bit packed double"),
+      ]),
+      MetricGroup("fpu_512", [
+          FpuMetrics("fpu_512_single", s_512, 16, "128-bit packed single"),
+          FpuMetrics("fpu_512_double", d_512, 8, "128-bit packed double"),
+      ]),
+      Metric("fpu_assists", "FP assists as a percentage of cycles",
+             d_ratio(f_assist, cyc), "100%"),
+  ])
+
+
 def IntelIlp() -> MetricGroup:
   tsc = Event("msr/tsc/")
   c0 = Event("msr/mperf/")
@@ -707,6 +796,7 @@ def main() -> None:
       Tsx(),
       IntelBr(),
       IntelCtxSw(),
+      IntelFpu(),
       IntelIlp(),
       IntelL2(),
       IntelLdSt(),
-- 
2.46.1.824.gd892dcdcdd-goog


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v4 15/22] perf jevents: Add Miss Level Parallelism (MLP) metric for Intel
  2024-09-26 17:50 [PATCH v4 00/22] Python generated Intel metrics Ian Rogers
                   ` (13 preceding siblings ...)
  2024-09-26 17:50 ` [PATCH v4 14/22] perf jevents: Add FPU " Ian Rogers
@ 2024-09-26 17:50 ` Ian Rogers
  2024-09-26 17:50 ` [PATCH v4 16/22] perf jevents: Add mem_bw " Ian Rogers
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 42+ messages in thread
From: Ian Rogers @ 2024-09-26 17:50 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, linux-perf-users,
	linux-kernel, Perry Taylor, Samantha Alt, Caleb Biggers,
	Weilin Wang, Edward Baker

Add a metric giving the number of outstanding load misses per cycle
(miss level parallelism).
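
As an illustration (not part of the change), a standalone sketch in
plain Python with hypothetical counts; the metric divides
L1D_PEND_MISS.PENDING by L1D_PEND_MISS.PENDING_CYCLES, halving the
latter when SMT is on, as the change below does:

  pending = 3_000_000        # L1D_PEND_MISS.PENDING (hypothetical)
  pending_cycles = 500_000   # L1D_PEND_MISS.PENDING_CYCLES (hypothetical)
  smt_on = True
  if smt_on:
      pending_cycles /= 2
  mlp = pending / pending_cycles
  print(f"{mlp:.1f} load_miss_pending/cycle")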

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/pmu-events/intel_metrics.py | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
index dc14fff7abc3..8c6be9e1883f 100755
--- a/tools/perf/pmu-events/intel_metrics.py
+++ b/tools/perf/pmu-events/intel_metrics.py
@@ -593,6 +593,20 @@ def IntelL2() -> Optional[MetricGroup]:
   ], description = "L2 data cache analysis")
 
 
+def IntelMlp() -> Optional[Metric]:
+  try:
+    l1d = Event("L1D_PEND_MISS.PENDING")
+    l1dc = Event("L1D_PEND_MISS.PENDING_CYCLES")
+  except:
+    return None
+
+  l1dc = Select(l1dc / 2, Literal("#smt_on"), l1dc)
+  ml = d_ratio(l1d, l1dc)
+  return Metric("mlp",
+                "Miss level parallelism - number of outstanding load misses per cycle (higher is better)",
+                ml, "load_miss_pending/cycle")
+
+
 def IntelPorts() -> Optional[MetricGroup]:
   pipeline_events = json.load(open(f"{_args.events_path}/x86/{_args.model}/pipeline.json"))
 
@@ -800,6 +814,7 @@ def main() -> None:
       IntelIlp(),
       IntelL2(),
       IntelLdSt(),
+      IntelMlp(),
       IntelPorts(),
       IntelSwpf(),
   ])
-- 
2.46.1.824.gd892dcdcdd-goog


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v4 16/22] perf jevents: Add mem_bw metric for Intel
  2024-09-26 17:50 [PATCH v4 00/22] Python generated Intel metrics Ian Rogers
                   ` (14 preceding siblings ...)
  2024-09-26 17:50 ` [PATCH v4 15/22] perf jevents: Add Miss Level Parallelism (MLP) metric " Ian Rogers
@ 2024-09-26 17:50 ` Ian Rogers
  2024-09-26 17:50 ` [PATCH v4 17/22] perf jevents: Add local/remote "mem" breakdown metrics " Ian Rogers
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 42+ messages in thread
From: Ian Rogers @ 2024-09-26 17:50 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, linux-perf-users,
	linux-kernel, Perry Taylor, Samantha Alt, Caleb Biggers,
	Weilin Wang, Edward Baker

Break down memory bandwidth using uncore counters. For many models
this matches the memory_bandwidth_* metrics, but these metrics aren't
made available on all models. Add support for free running counters.
Query the event json when determining which events/counters are
available.
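
As an illustration (not part of the change), a standalone sketch in
plain Python with hypothetical counts; each CAS or free-running
read/write count corresponds to one 64-byte cache line, which is what
the "{scale}MB/s" unit below expresses:

  ddr_rds = 20_000_000    # e.g. UNC_M_CAS_COUNT.RD summed over channels
  ddr_wrs = 10_000_000    # e.g. UNC_M_CAS_COUNT.WR
  interval_sec = 1.0
  scale = 64 / 1_000_000  # cache lines to MB
  print(f"read  {ddr_rds * scale / interval_sec:.0f} MB/s")
  print(f"write {ddr_wrs * scale / interval_sec:.0f} MB/s")
  print(f"total {(ddr_rds + ddr_wrs) * scale / interval_sec:.0f} MB/s")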

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/pmu-events/intel_metrics.py | 62 ++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
index 8c6be9e1883f..05e803286f29 100755
--- a/tools/perf/pmu-events/intel_metrics.py
+++ b/tools/perf/pmu-events/intel_metrics.py
@@ -781,6 +781,67 @@ def IntelLdSt() -> Optional[MetricGroup]:
   ], description = "Breakdown of load/store instructions")
 
 
+def UncoreMemBw() -> Optional[MetricGroup]:
+  mem_events = []
+  try:
+    mem_events = json.load(open(f"{os.path.dirname(os.path.realpath(__file__))}"
+                                f"/arch/x86/{_args.model}/uncore-memory.json"))
+  except:
+    pass
+
+  ddr_rds = 0
+  ddr_wrs = 0
+  ddr_total = 0
+  for x in mem_events:
+    if "EventName" in x:
+      name = x["EventName"]
+      if re.search("^UNC_MC[0-9]+_RDCAS_COUNT_FREERUN", name):
+        ddr_rds += Event(name)
+      elif re.search("^UNC_MC[0-9]+_WRCAS_COUNT_FREERUN", name):
+        ddr_wrs += Event(name)
+      #elif re.search("^UNC_MC[0-9]+_TOTAL_REQCOUNT_FREERUN", name):
+      #  ddr_total += Event(name)
+
+  if ddr_rds == 0:
+    try:
+      ddr_rds = Event("UNC_M_CAS_COUNT.RD")
+      ddr_wrs = Event("UNC_M_CAS_COUNT.WR")
+    except:
+      return None
+
+  ddr_total = ddr_rds + ddr_wrs
+
+  pmm_rds = 0
+  pmm_wrs = 0
+  try:
+    pmm_rds = Event("UNC_M_PMM_RPQ_INSERTS")
+    pmm_wrs = Event("UNC_M_PMM_WPQ_INSERTS")
+  except:
+    pass
+
+  pmm_total = pmm_rds + pmm_wrs
+
+  scale = 64 / 1_000_000
+  return MetricGroup("mem_bw", [
+      MetricGroup("mem_bw_ddr", [
+          Metric("mem_bw_ddr_read", "DDR memory read bandwidth",
+                 d_ratio(ddr_rds, interval_sec), f"{scale}MB/s"),
+          Metric("mem_bw_ddr_write", "DDR memory write bandwidth",
+                 d_ratio(ddr_wrs, interval_sec), f"{scale}MB/s"),
+          Metric("mem_bw_ddr_total", "DDR memory write bandwidth",
+                 d_ratio(ddr_total, interval_sec), f"{scale}MB/s"),
+      ], description = "DDR Memory Bandwidth"),
+      MetricGroup("mem_bw_pmm", [
+          Metric("mem_bw_pmm_read", "PMM memory read bandwidth",
+                 d_ratio(pmm_rds, interval_sec), f"{scale}MB/s"),
+          Metric("mem_bw_pmm_write", "PMM memory write bandwidth",
+                 d_ratio(pmm_wrs, interval_sec), f"{scale}MB/s"),
+          Metric("mem_bw_pmm_total", "PMM memory write bandwidth",
+                 d_ratio(pmm_total, interval_sec), f"{scale}MB/s"),
+      ], description = "PMM Memory Bandwidth") if pmm_rds != 0 else None,
+  ], description = "Memory Bandwidth")
+
+
 def main() -> None:
   global _args
 
@@ -817,6 +878,7 @@ def main() -> None:
       IntelMlp(),
       IntelPorts(),
       IntelSwpf(),
+      UncoreMemBw(),
   ])
 
 
-- 
2.46.1.824.gd892dcdcdd-goog


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v4 17/22] perf jevents: Add local/remote "mem" breakdown metrics for Intel
  2024-09-26 17:50 [PATCH v4 00/22] Python generated Intel metrics Ian Rogers
                   ` (15 preceding siblings ...)
  2024-09-26 17:50 ` [PATCH v4 16/22] perf jevents: Add mem_bw " Ian Rogers
@ 2024-09-26 17:50 ` Ian Rogers
  2024-09-26 17:50 ` [PATCH v4 18/22] perf jevents: Add dir " Ian Rogers
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 42+ messages in thread
From: Ian Rogers @ 2024-09-26 17:50 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, linux-perf-users,
	linux-kernel, Perry Taylor, Samantha Alt, Caleb Biggers,
	Weilin Wang, Edward Baker

Break down local and remote memory bandwidth, reads and writes. The
implementation uses the HA and CHA PMUs present in server models
broadwellde, broadwellx, cascadelakex, emeraldrapids, haswellx,
icelakex, ivytown, sapphirerapids and skylakex.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/pmu-events/intel_metrics.py | 27 ++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
index 05e803286f29..62d504036ba0 100755
--- a/tools/perf/pmu-events/intel_metrics.py
+++ b/tools/perf/pmu-events/intel_metrics.py
@@ -781,6 +781,32 @@ def IntelLdSt() -> Optional[MetricGroup]:
   ], description = "Breakdown of load/store instructions")
 
 
+def UncoreMem() -> Optional[MetricGroup]:
+  try:
+    loc_rds = Event("UNC_CHA_REQUESTS.READS_LOCAL", "UNC_H_REQUESTS.READS_LOCAL")
+    rem_rds = Event("UNC_CHA_REQUESTS.READS_REMOTE", "UNC_H_REQUESTS.READS_REMOTE")
+    loc_wrs = Event("UNC_CHA_REQUESTS.WRITES_LOCAL", "UNC_H_REQUESTS.WRITES_LOCAL")
+    rem_wrs = Event("UNC_CHA_REQUESTS.WRITES_REMOTE", "UNC_H_REQUESTS.WRITES_REMOTE")
+  except:
+    return None
+
+  scale = 64 / 1_000_000
+  return MetricGroup("mem", [
+      MetricGroup("mem_local", [
+          Metric("mem_local_read", "Local memory read bandwidth not including directory updates",
+                 d_ratio(loc_rds, interval_sec), f"{scale}MB/s"),
+          Metric("mem_local_write", "Local memory write bandwidth not including directory updates",
+                 d_ratio(loc_wrs, interval_sec), f"{scale}MB/s"),
+      ]),
+      MetricGroup("mem_remote", [
+          Metric("mem_remote_read", "Remote memory read bandwidth not including directory updates",
+                 d_ratio(rem_rds, interval_sec), f"{scale}MB/s"),
+          Metric("mem_remote_write", "Remote memory write bandwidth not including directory updates",
+                 d_ratio(rem_wrs, interval_sec), f"{scale}MB/s"),
+      ]),
+  ], description = "Memory Bandwidth breakdown local vs. remote (remote requests in). directory updates not included")
+
+
 def UncoreMemBw() -> Optional[MetricGroup]:
   mem_events = []
   try:
@@ -878,6 +904,7 @@ def main() -> None:
       IntelMlp(),
       IntelPorts(),
       IntelSwpf(),
+      UncoreMem(),
       UncoreMemBw(),
   ])
 
-- 
2.46.1.824.gd892dcdcdd-goog


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v4 18/22] perf jevents: Add dir breakdown metrics for Intel
  2024-09-26 17:50 [PATCH v4 00/22] Python generated Intel metrics Ian Rogers
                   ` (16 preceding siblings ...)
  2024-09-26 17:50 ` [PATCH v4 17/22] perf jevents: Add local/remote "mem" breakdown metrics " Ian Rogers
@ 2024-09-26 17:50 ` Ian Rogers
  2024-09-26 17:50 ` [PATCH v4 19/22] perf jevents: Add C-State metrics from the PCU PMU " Ian Rogers
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 42+ messages in thread
From: Ian Rogers @ 2024-09-26 17:50 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, linux-perf-users,
	linux-kernel, Perry Taylor, Samantha Alt, Caleb Biggers,
	Weilin Wang, Edward Baker

Break down directory hits, misses and update requests. The
implementation uses the M2M and CHA PMUs present in server models
broadwellde, broadwellx, cascadelakex, emeraldrapids, icelakex,
sapphirerapids and skylakex.
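
As an illustration (not part of the change), a standalone sketch in
plain Python with hypothetical counts; lookups are the sum of directory
hits and misses, and each directory update writes back one 64-byte
cache line:

  m2m_hits, m2m_miss = 8_000_000, 2_000_000  # DIRECTORY_HIT/MISS (ANY)
  m2m_upd, cha_upd = 1_500_000, 500_000      # directory updates
  interval_sec = 1.0
  lookups = m2m_hits + m2m_miss
  print(f"hit rate  {m2m_hits / lookups:.0%}")
  print(f"update bw {(m2m_upd + cha_upd) * 64 / 1e6 / interval_sec:.0f} MB/s")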

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/pmu-events/intel_metrics.py | 36 ++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
index 62d504036ba0..77ac048c5451 100755
--- a/tools/perf/pmu-events/intel_metrics.py
+++ b/tools/perf/pmu-events/intel_metrics.py
@@ -781,6 +781,41 @@ def IntelLdSt() -> Optional[MetricGroup]:
   ], description = "Breakdown of load/store instructions")
 
 
+def UncoreDir() -> Optional[MetricGroup]:
+  try:
+    m2m_upd = Event("UNC_M2M_DIRECTORY_UPDATE.ANY")
+    m2m_hits = Event("UNC_M2M_DIRECTORY_HIT.DIRTY_I")
+    # Turn the umask into an ANY rather than DIRTY_I filter.
+    m2m_hits.name += "/umask=0xFF,name=UNC_M2M_DIRECTORY_HIT.ANY/"
+    m2m_miss = Event("UNC_M2M_DIRECTORY_MISS.DIRTY_I")
+    # Turn the umask into an ANY rather than DIRTY_I filter.
+    m2m_miss.name += "/umask=0xFF,name=UNC_M2M_DIRECTORY_MISS.ANY/"
+    cha_upd = Event("UNC_CHA_DIR_UPDATE.HA")
+    # Turn the umask into an ANY rather than HA filter.
+    cha_upd.name += "/umask=3,name=UNC_CHA_DIR_UPDATE.ANY/"
+  except:
+    return None
+
+  m2m_total = m2m_hits + m2m_miss
+  upd = m2m_upd + cha_upd # in cache lines
+  upd_r = upd / interval_sec
+  look_r = m2m_total / interval_sec
+
+  scale = 64 / 1_000_000 # Cache lines to MB
+  return MetricGroup("dir", [
+      Metric("dir_lookup_rate", "",
+             d_ratio(m2m_total, interval_sec), "requests/s"),
+      Metric("dir_lookup_hits", "",
+             d_ratio(m2m_hits, m2m_total), "100%"),
+      Metric("dir_lookup_misses", "",
+             d_ratio(m2m_miss, m2m_total), "100%"),
+      Metric("dir_update_requests", "",
+             d_ratio(m2m_upd + cha_upd, interval_sec), "requests/s"),
+      Metric("dir_update_bw", "",
+             d_ratio(m2m_upd + cha_upd, interval_sec), f"{scale}MB/s"),
+  ])
+
+
 def UncoreMem() -> Optional[MetricGroup]:
   try:
     loc_rds = Event("UNC_CHA_REQUESTS.READS_LOCAL", "UNC_H_REQUESTS.READS_LOCAL")
@@ -904,6 +939,7 @@ def main() -> None:
       IntelMlp(),
       IntelPorts(),
       IntelSwpf(),
+      UncoreDir(),
       UncoreMem(),
       UncoreMemBw(),
   ])
-- 
2.46.1.824.gd892dcdcdd-goog


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v4 19/22] perf jevents: Add C-State metrics from the PCU PMU for Intel
  2024-09-26 17:50 [PATCH v4 00/22] Python generated Intel metrics Ian Rogers
                   ` (17 preceding siblings ...)
  2024-09-26 17:50 ` [PATCH v4 18/22] perf jevents: Add dir " Ian Rogers
@ 2024-09-26 17:50 ` Ian Rogers
  2024-09-26 17:50 ` [PATCH v4 20/22] perf jevents: Add local/remote miss latency metrics " Ian Rogers
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 42+ messages in thread
From: Ian Rogers @ 2024-09-26 17:50 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, linux-perf-users,
	linux-kernel, Perry Taylor, Samantha Alt, Caleb Biggers,
	Weilin Wang, Edward Baker

Use occupancy events fixed in:
https://lore.kernel.org/lkml/20240226201517.3540187-1-irogers@google.com/

Metrics are at the socket level referring to cores, not hyperthreads.
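
As an illustration (not part of the change), a standalone sketch in
plain Python with hypothetical counts; the occupancy events accumulate
the number of cores in a state on every PCU clock, so dividing by the
PCU clockticks gives the average number of cores in that state:

  pcu_ticks = 1_000_000_000   # UNC_P_CLOCKTICKS (hypothetical)
  c0 = 12_000_000_000         # UNC_P_POWER_STATE_OCCUPANCY.CORES_C0
  c6 = 4_000_000_000          # UNC_P_POWER_STATE_OCCUPANCY.CORES_C6
  print(f"cores in C0/C1: {c0 / pcu_ticks:.1f}")
  print(f"cores in C6/C7: {c6 / pcu_ticks:.1f}")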

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/pmu-events/intel_metrics.py | 27 ++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
index 77ac048c5451..5668128273b3 100755
--- a/tools/perf/pmu-events/intel_metrics.py
+++ b/tools/perf/pmu-events/intel_metrics.py
@@ -781,6 +781,32 @@ def IntelLdSt() -> Optional[MetricGroup]:
   ], description = "Breakdown of load/store instructions")
 
 
+def UncoreCState() -> Optional[MetricGroup]:
+  try:
+    pcu_ticks = Event("UNC_P_CLOCKTICKS")
+    c0 = Event("UNC_P_POWER_STATE_OCCUPANCY.CORES_C0")
+    c3 = Event("UNC_P_POWER_STATE_OCCUPANCY.CORES_C3")
+    c6 = Event("UNC_P_POWER_STATE_OCCUPANCY.CORES_C6")
+  except:
+    return None
+
+  num_cores = Literal("#num_cores") / Literal("#num_packages")
+
+  max_cycles   = pcu_ticks * num_cores
+  total_cycles = c0 + c3 + c6
+
+  # remove fused-off cores which show up in C6/C7.
+  c6 = Select(max(c6 - (total_cycles - max_cycles), 0),
+              total_cycles > max_cycles,
+              c6)
+
+  return MetricGroup("cstate", [
+      Metric("cstate_c0", "C-State cores in C0/C1", d_ratio(c0, pcu_ticks), "cores"),
+      Metric("cstate_c3", "C-State cores in C3", d_ratio(c3, pcu_ticks), "cores"),
+      Metric("cstate_c6", "C-State cores in C6/C7", d_ratio(c6, pcu_ticks), "cores"),
+  ])
+
+
 def UncoreDir() -> Optional[MetricGroup]:
   try:
     m2m_upd = Event("UNC_M2M_DIRECTORY_UPDATE.ANY")
@@ -939,6 +965,7 @@ def main() -> None:
       IntelMlp(),
       IntelPorts(),
       IntelSwpf(),
+      UncoreCState(),
       UncoreDir(),
       UncoreMem(),
       UncoreMemBw(),
-- 
2.46.1.824.gd892dcdcdd-goog


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v4 20/22] perf jevents: Add local/remote miss latency metrics for Intel
  2024-09-26 17:50 [PATCH v4 00/22] Python generated Intel metrics Ian Rogers
                   ` (18 preceding siblings ...)
  2024-09-26 17:50 ` [PATCH v4 19/22] perf jevents: Add C-State metrics from the PCU PMU " Ian Rogers
@ 2024-09-26 17:50 ` Ian Rogers
  2024-09-26 17:50 ` [PATCH v4 21/22] perf jevents: Add upi_bw metric " Ian Rogers
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 42+ messages in thread
From: Ian Rogers @ 2024-09-26 17:50 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, linux-perf-users,
	linux-kernel, Perry Taylor, Samantha Alt, Caleb Biggers,
	Weilin Wang, Edward Baker

Derive the average miss latency from CBOX/CHA TOR occupancy and insert
events, as described in Intel's uncore performance monitoring
reference.
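
As an illustration (not part of the change), a standalone sketch in
plain Python with hypothetical counts; TOR occupancy divided by TOR
inserts gives the average latency in CHA clocks, and the per-CHA clock
period converts that to nanoseconds:

  interval_sec = 1.0
  num_cha = 10                  # source_count() of the inserts event
  ticks = 20_000_000_000        # UNC_CHA_CLOCKTICKS summed over CHAs
  occupancy = 3_000_000_000     # TOR occupancy for local DRd misses
  inserts = 30_000_000          # TOR inserts for local DRd misses

  ticks_per_cha = ticks / num_cha
  lat_ns = interval_sec * 1e9 * occupancy / (ticks_per_cha * inserts)
  print(f"local miss latency: {lat_ns:.0f} ns")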

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/pmu-events/intel_metrics.py | 68 ++++++++++++++++++++++++--
 1 file changed, 65 insertions(+), 3 deletions(-)

diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
index 5668128273b3..ec15653e2cb6 100755
--- a/tools/perf/pmu-events/intel_metrics.py
+++ b/tools/perf/pmu-events/intel_metrics.py
@@ -1,8 +1,9 @@
 #!/usr/bin/env python3
 # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
-from metric import (d_ratio, has_event, max, CheckPmu, Event, JsonEncodeMetric,
-                    JsonEncodeMetricGroupDescriptions, Literal, LoadEvents,
-                    Metric, MetricConstraint, MetricGroup, MetricRef, Select)
+from metric import (d_ratio, has_event, max, source_count, CheckPmu, Event,
+                    JsonEncodeMetric, JsonEncodeMetricGroupDescriptions,
+                    Literal, LoadEvents, Metric, MetricConstraint, MetricGroup,
+                    MetricRef, Select)
 import argparse
 import json
 import math
@@ -593,6 +594,66 @@ def IntelL2() -> Optional[MetricGroup]:
   ], description = "L2 data cache analysis")
 
 
+def IntelMissLat() -> Optional[MetricGroup]:
+  try:
+    ticks = Event("UNC_CHA_CLOCKTICKS", "UNC_C_CLOCKTICKS")
+    data_rd_loc_occ = Event("UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_LOCAL",
+                            "UNC_CHA_TOR_OCCUPANCY.IA_MISS",
+                            "UNC_C_TOR_OCCUPANCY.MISS_LOCAL_OPCODE",
+                            "UNC_C_TOR_OCCUPANCY.MISS_OPCODE")
+    data_rd_loc_ins = Event("UNC_CHA_TOR_INSERTS.IA_MISS_DRD_LOCAL",
+                            "UNC_CHA_TOR_INSERTS.IA_MISS",
+                            "UNC_C_TOR_INSERTS.MISS_LOCAL_OPCODE",
+                            "UNC_C_TOR_INSERTS.MISS_OPCODE")
+    data_rd_rem_occ = Event("UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE",
+                            "UNC_CHA_TOR_OCCUPANCY.IA_MISS",
+                            "UNC_C_TOR_OCCUPANCY.MISS_REMOTE_OPCODE",
+                            "UNC_C_TOR_OCCUPANCY.NID_MISS_OPCODE")
+    data_rd_rem_ins = Event("UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE",
+                            "UNC_CHA_TOR_INSERTS.IA_MISS",
+                            "UNC_C_TOR_INSERTS.MISS_REMOTE_OPCODE",
+                            "UNC_C_TOR_INSERTS.NID_MISS_OPCODE")
+  except:
+    return None
+
+  if (data_rd_loc_occ.name == "UNC_C_TOR_OCCUPANCY.MISS_LOCAL_OPCODE" or
+      data_rd_loc_occ.name == "UNC_C_TOR_OCCUPANCY.MISS_OPCODE"):
+    data_rd = 0x182
+    for e in [data_rd_loc_occ, data_rd_loc_ins, data_rd_rem_occ, data_rd_rem_ins]:
+      e.name += f"/filter_opc={hex(data_rd)}/"
+  elif data_rd_loc_occ.name == "UNC_CHA_TOR_OCCUPANCY.IA_MISS":
+    # Demand Data Read - Full cache-line read requests from core for
+    # lines to be cached in S or E, typically for data
+    demand_data_rd = 0x202
+    #  LLC Prefetch Data - Uncore will first look up the line in the
+    #  LLC; for a cache hit, the LRU will be updated, on a miss, the
+    #  DRd will be initiated
+    llc_prefetch_data = 0x25a
+    local_filter = (f"/filter_opc0={hex(demand_data_rd)},"
+                    f"filter_opc1={hex(llc_prefetch_data)},"
+                    "filter_loc,filter_nm,filter_not_nm/")
+    remote_filter = (f"/filter_opc0={hex(demand_data_rd)},"
+                     f"filter_opc1={hex(llc_prefetch_data)},"
+                     "filter_rem,filter_nm,filter_not_nm/")
+    for e in [data_rd_loc_occ, data_rd_loc_ins]:
+      e.name += local_filter
+    for e in [data_rd_rem_occ, data_rd_rem_ins]:
+      e.name += remote_filter
+  else:
+    assert data_rd_loc_occ.name == "UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_LOCAL", data_rd_loc_occ
+
+  ticks_per_cha = ticks / source_count(data_rd_loc_ins)
+  loc_lat = interval_sec * 1e9 * data_rd_loc_occ / (ticks_per_cha * data_rd_loc_ins)
+  ticks_per_cha = ticks / source_count(data_rd_rem_ins)
+  rem_lat = interval_sec * 1e9 * data_rd_rem_occ / (ticks_per_cha * data_rd_rem_ins)
+  return MetricGroup("miss_lat", [
+      Metric("miss_lat_loc", "Local to a socket miss latency in nanoseconds",
+             loc_lat, "ns"),
+      Metric("miss_lat_rem", "Remote to a socket miss latency in nanoseconds",
+             rem_lat, "ns"),
+  ])
+
+
 def IntelMlp() -> Optional[Metric]:
   try:
     l1d = Event("L1D_PEND_MISS.PENDING")
@@ -962,6 +1023,7 @@ def main() -> None:
       IntelIlp(),
       IntelL2(),
       IntelLdSt(),
+      IntelMissLat(),
       IntelMlp(),
       IntelPorts(),
       IntelSwpf(),
-- 
2.46.1.824.gd892dcdcdd-goog


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v4 21/22] perf jevents: Add upi_bw metric for Intel
  2024-09-26 17:50 [PATCH v4 00/22] Python generated Intel metrics Ian Rogers
                   ` (19 preceding siblings ...)
  2024-09-26 17:50 ` [PATCH v4 20/22] perf jevents: Add local/remote miss latency metrics " Ian Rogers
@ 2024-09-26 17:50 ` Ian Rogers
  2024-09-26 17:50 ` [PATCH v4 22/22] perf jevents: Add mesh bandwidth saturation " Ian Rogers
  2024-09-27 18:33 ` [PATCH v4 00/22] Python generated Intel metrics Liang, Kan
  22 siblings, 0 replies; 42+ messages in thread
From: Ian Rogers @ 2024-09-26 17:50 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, linux-perf-users,
	linux-kernel, Perry Taylor, Samantha Alt, Caleb Biggers,
	Weilin Wang, Edward Baker

Break down UPI read and write bandwidth using uncore_upi counters.
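
As an illustration (not part of the change), a standalone sketch in
plain Python with a hypothetical flit count of the flit-to-bytes
conversion used below:

  rx_flits = 90_000_000          # UNC_UPI_RxL_FLITS.ALL_DATA
  interval_sec = 1.0
  scale = (64 / 9) / 1_000_000   # 9 data flits carry one 64B cache line
  print(f"UPI read bandwidth: {rx_flits * scale / interval_sec:.0f} MB/s")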

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/pmu-events/intel_metrics.py | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
index ec15653e2cb6..8e1c0bc17b8a 100755
--- a/tools/perf/pmu-events/intel_metrics.py
+++ b/tools/perf/pmu-events/intel_metrics.py
@@ -990,6 +990,27 @@ def UncoreMemBw() -> Optional[MetricGroup]:
   ], description = "Memory Bandwidth")
 
 
+def UncoreUpiBw() -> Optional[MetricGroup]:
+  try:
+    upi_rds = Event("UNC_UPI_RxL_FLITS.ALL_DATA")
+    upi_wrs = Event("UNC_UPI_TxL_FLITS.ALL_DATA")
+  except:
+    return None
+
+  upi_total = upi_rds + upi_wrs
+
+  # From "Uncore Performance Monitoring": When measuring the amount of
+  # bandwidth consumed by transmission of the data (i.e. NOT including
+  # the header), it should be .ALL_DATA / 9 * 64B.
+  scale = (64 / 9) / 1_000_000
+  return MetricGroup("upi_bw", [
+      Metric("upi_bw_read", "UPI read bandwidth",
+             d_ratio(upi_rds, interval_sec), f"{scale}MB/s"),
+      Metric("upi_bw_write", "DDR memory write bandwidth",
+             d_ratio(upi_wrs, interval_sec), f"{scale}MB/s"),
+  ], description = "UPI Bandwidth")
+
+
 def main() -> None:
   global _args
 
@@ -1031,6 +1052,7 @@ def main() -> None:
       UncoreDir(),
       UncoreMem(),
       UncoreMemBw(),
+      UncoreUpiBw(),
   ])
 
 
-- 
2.46.1.824.gd892dcdcdd-goog


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v4 22/22] perf jevents: Add mesh bandwidth saturation metric for Intel
  2024-09-26 17:50 [PATCH v4 00/22] Python generated Intel metrics Ian Rogers
                   ` (20 preceding siblings ...)
  2024-09-26 17:50 ` [PATCH v4 21/22] perf jevents: Add upi_bw metric " Ian Rogers
@ 2024-09-26 17:50 ` Ian Rogers
  2024-09-27 18:33 ` [PATCH v4 00/22] Python generated Intel metrics Liang, Kan
  22 siblings, 0 replies; 42+ messages in thread
From: Ian Rogers @ 2024-09-26 17:50 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, linux-perf-users,
	linux-kernel, Perry Taylor, Samantha Alt, Caleb Biggers,
	Weilin Wang, Edward Baker

Add a mesh bandwidth saturation metric computed from CBOX/CHA events
present in broadwellde, broadwellx, cascadelakex, haswellx, icelakex,
skylakex and snowridgex.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/pmu-events/intel_metrics.py | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
index 8e1c0bc17b8a..a3a317d13841 100755
--- a/tools/perf/pmu-events/intel_metrics.py
+++ b/tools/perf/pmu-events/intel_metrics.py
@@ -990,6 +990,22 @@ def UncoreMemBw() -> Optional[MetricGroup]:
   ], description = "Memory Bandwidth")
 
 
+def UncoreMemSat() -> Optional[Metric]:
+  try:
+    clocks = Event("UNC_CHA_CLOCKTICKS", "UNC_C_CLOCKTICKS")
+    sat = Event("UNC_CHA_DISTRESS_ASSERTED.VERT", "UNC_CHA_FAST_ASSERTED.VERT",
+                "UNC_C_FAST_ASSERTED")
+  except:
+    return None
+
+  desc = ("Mesh Bandwidth saturation (% CBOX cycles with FAST signal asserted, "
+          "include QPI bandwidth saturation), lower is better")
+  if "UNC_CHA_" in sat.name:
+    desc = ("Mesh Bandwidth saturation (% CHA cycles with FAST signal asserted, "
+            "include UPI bandwidth saturation), lower is better")
+  return Metric("mem_sat", desc, d_ratio(sat, clocks), "100%")
+
+
 def UncoreUpiBw() -> Optional[MetricGroup]:
   try:
     upi_rds = Event("UNC_UPI_RxL_FLITS.ALL_DATA")
@@ -1052,6 +1068,7 @@ def main() -> None:
       UncoreDir(),
       UncoreMem(),
       UncoreMemBw(),
+      UncoreMemSat(),
       UncoreUpiBw(),
   ])
 
-- 
2.46.1.824.gd892dcdcdd-goog


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 00/22] Python generated Intel metrics
  2024-09-26 17:50 [PATCH v4 00/22] Python generated Intel metrics Ian Rogers
                   ` (21 preceding siblings ...)
  2024-09-26 17:50 ` [PATCH v4 22/22] perf jevents: Add mesh bandwidth saturation " Ian Rogers
@ 2024-09-27 18:33 ` Liang, Kan
  2024-10-09 16:02   ` Ian Rogers
  22 siblings, 1 reply; 42+ messages in thread
From: Liang, Kan @ 2024-09-27 18:33 UTC (permalink / raw)
  To: Ian Rogers, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, linux-perf-users, linux-kernel, Perry Taylor,
	Samantha Alt, Caleb Biggers, Weilin Wang, Edward Baker



On 2024-09-26 1:50 p.m., Ian Rogers wrote:
> Generate twenty sets of additional metrics for Intel. Rapl and Idle
> metrics aren't specific to Intel but are placed here for ease and
> convenience. Smi and tsx metrics are added so they can be dropped from
> the per model json files. 

Are Smi and tsx metrics the only two whose duplicate metrics in the
json files will be dropped?

It sounds like there will be many duplicate metrics in perf list, right?

Also, is it an attempt to define some architectural metrics for perf?
How do you decide which metrics should be added here?

Thanks,
Kan

> There are four uncore sets of metrics and
> eleven core metrics. Add a CheckPmu function to metric to simplify
> detecting the presence of hybrid PMUs in events. Metrics with
> experimental events are flagged as experimental in their description.
> 
> The patches should be applied on top of:
> https://lore.kernel.org/lkml/20240926174101.406874-1-irogers@google.com/
> 
> v4. Experimental metric descriptions. Add mesh bandwidth metric. Rebase.
> v3. Swap tsx and CheckPMU patches that were in the wrong order. Some
>     minor code cleanup changes. Drop reference to merged fix for
>     umasks/occ_sel in PCU events and for cstate metrics.
> v2. Drop the cycles breakdown in favor of having it as a common
>     metric, spelling and other improvements suggested by Kan Liang
>     <kan.liang@linux.intel.com>.
> 
> Ian Rogers (22):
>   perf jevents: Add RAPL metrics for all Intel models
>   perf jevents: Add idle metric for Intel models
>   perf jevents: Add smi metric group for Intel models
>   perf jevents: Add CheckPmu to see if a PMU is in loaded json events
>   perf jevents: Mark metrics with experimental events as experimental
>   perf jevents: Add tsx metric group for Intel models
>   perf jevents: Add br metric group for branch statistics on Intel
>   perf jevents: Add software prefetch (swpf) metric group for Intel
>   perf jevents: Add ports metric group giving utilization on Intel
>   perf jevents: Add L2 metrics for Intel
>   perf jevents: Add load store breakdown metrics ldst for Intel
>   perf jevents: Add ILP metrics for Intel
>   perf jevents: Add context switch metrics for Intel
>   perf jevents: Add FPU metrics for Intel
>   perf jevents: Add Miss Level Parallelism (MLP) metric for Intel
>   perf jevents: Add mem_bw metric for Intel
>   perf jevents: Add local/remote "mem" breakdown metrics for Intel
>   perf jevents: Add dir breakdown metrics for Intel
>   perf jevents: Add C-State metrics from the PCU PMU for Intel
>   perf jevents: Add local/remote miss latency metrics for Intel
>   perf jevents: Add upi_bw metric for Intel
>   perf jevents: Add mesh bandwidth saturation metric for Intel
> 
>  tools/perf/pmu-events/intel_metrics.py | 1046 +++++++++++++++++++++++-
>  tools/perf/pmu-events/metric.py        |   52 ++
>  2 files changed, 1095 insertions(+), 3 deletions(-)
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 00/22] Python generated Intel metrics
  2024-09-27 18:33 ` [PATCH v4 00/22] Python generated Intel metrics Liang, Kan
@ 2024-10-09 16:02   ` Ian Rogers
  2024-11-06 16:46     ` Liang, Kan
  0 siblings, 1 reply; 42+ messages in thread
From: Ian Rogers @ 2024-10-09 16:02 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, linux-perf-users, linux-kernel, Perry Taylor,
	Samantha Alt, Caleb Biggers, Weilin Wang, Edward Baker

On Fri, Sep 27, 2024 at 11:34 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
>
>
> On 2024-09-26 1:50 p.m., Ian Rogers wrote:
> > Generate twenty sets of additional metrics for Intel. Rapl and Idle
> > metrics aren't specific to Intel but are placed here for ease and
> > convenience. Smi and tsx metrics are added so they can be dropped from
> > the per model json files.
>
> Are Smi and tsx metrics the only two whose duplicate metrics in the
> json files will be dropped?

Yes. These metrics with their runtime detection and use of sysfs event
names I feel more naturally fit here rather than in the Intel perfmon
github converter script.

> It sounds like there will be many duplicate metrics in perf list, right?

That's not the goal. There may be memory bandwidth computed in
different ways, like TMA and using uncore, but that seems okay as the
metrics are using different counters so may say different things. I
think there is an action to always watch the metrics and ensure
duplicates don't occur, but some duplication can be beneficial.

> Also, is it an attempt to define some architectural metrics for perf?

There are many advantages of using python to generate the metric json,
a few are:
1) we verify the metrics use events from the event json,
2) the error prone escaping of commas and slashes is handled by the python,
3) metric expressions can be spread over multiple lines and have comments.
It is also an advantage that we can avoid copy-pasting one metric from
one architectural metric json to another. This helps propagate fixes.
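
For example (a hypothetical metric, not one from this series), writing

  Metric("l2_miss_rate", "L2 misses per second",
         d_ratio(Event("L2_RQSTS.MISS"), interval_sec), "misses/s")

in Python lets the generator check at build time that L2_RQSTS.MISS
exists in the model's event json, and the escaping of the expression
into the json string is handled automatically rather than being
hand-written and copy-pasted into each model's metric file.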

So, it's not so much a goal to have architectural metrics but it's nice
that we avoid copy-paste. Somewhere where I've tried to set up common
events across all architectures is with making tool have its own PMU.
Rather than have the tool PMU describe events using custom code it
just reuses the existing PMU json support:
https://github.com/googleprodkernel/linux-perf/blob/google_tools_master/tools/perf/pmu-events/arch/common/common/tool.json

> How do you decide which metrics should be added here?

The goal is to try to make open source metrics that Google has
internally. I've set up a git repo for this here:
https://github.com/googleprodkernel/linux-perf
Often the source of the metric is Intel's documentation on things like
uncore events, it's just such metrics aren't part of the perfmon
process and so we're adding them here. Were all these metrics on the
Intel github it'd be reasonable to remove them from here. If Intel
would like to work on or contribute some metrics here, that's also
fine. I think the main thing is to be giving users useful metrics.

Thanks,
Ian

> > There are four uncore sets of metrics and
> > eleven core metrics. Add a CheckPmu function to metric to simplify
> > detecting the presence of hybrid PMUs in events. Metrics with
> > experimental events are flagged as experimental in their description.
> >
> > The patches should be applied on top of:
> > https://lore.kernel.org/lkml/20240926174101.406874-1-irogers@google.com/
> >
> > v4. Experimental metric descriptions. Add mesh bandwidth metric. Rebase.
> > v3. Swap tsx and CheckPMU patches that were in the wrong order. Some
> >     minor code cleanup changes. Drop reference to merged fix for
> >     umasks/occ_sel in PCU events and for cstate metrics.
> > v2. Drop the cycles breakdown in favor of having it as a common
> >     metric, spelling and other improvements suggested by Kan Liang
> >     <kan.liang@linux.intel.com>.
> >
> > Ian Rogers (22):
> >   perf jevents: Add RAPL metrics for all Intel models
> >   perf jevents: Add idle metric for Intel models
> >   perf jevents: Add smi metric group for Intel models
> >   perf jevents: Add CheckPmu to see if a PMU is in loaded json events
> >   perf jevents: Mark metrics with experimental events as experimental
> >   perf jevents: Add tsx metric group for Intel models
> >   perf jevents: Add br metric group for branch statistics on Intel
> >   perf jevents: Add software prefetch (swpf) metric group for Intel
> >   perf jevents: Add ports metric group giving utilization on Intel
> >   perf jevents: Add L2 metrics for Intel
> >   perf jevents: Add load store breakdown metrics ldst for Intel
> >   perf jevents: Add ILP metrics for Intel
> >   perf jevents: Add context switch metrics for Intel
> >   perf jevents: Add FPU metrics for Intel
> >   perf jevents: Add Miss Level Parallelism (MLP) metric for Intel
> >   perf jevents: Add mem_bw metric for Intel
> >   perf jevents: Add local/remote "mem" breakdown metrics for Intel
> >   perf jevents: Add dir breakdown metrics for Intel
> >   perf jevents: Add C-State metrics from the PCU PMU for Intel
> >   perf jevents: Add local/remote miss latency metrics for Intel
> >   perf jevents: Add upi_bw metric for Intel
> >   perf jevents: Add mesh bandwidth saturation metric for Intel
> >
> >  tools/perf/pmu-events/intel_metrics.py | 1046 +++++++++++++++++++++++-
> >  tools/perf/pmu-events/metric.py        |   52 ++
> >  2 files changed, 1095 insertions(+), 3 deletions(-)
> >

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 00/22] Python generated Intel metrics
  2024-10-09 16:02   ` Ian Rogers
@ 2024-11-06 16:46     ` Liang, Kan
  2024-11-13 23:40       ` Ian Rogers
  0 siblings, 1 reply; 42+ messages in thread
From: Liang, Kan @ 2024-11-06 16:46 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, linux-perf-users, linux-kernel, Perry Taylor,
	Samantha Alt, Caleb Biggers, Weilin Wang, Edward Baker



On 2024-10-09 12:02 p.m., Ian Rogers wrote:
> On Fri, Sep 27, 2024 at 11:34 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>>
>>
>>
>> On 2024-09-26 1:50 p.m., Ian Rogers wrote:
>>> Generate twenty sets of additional metrics for Intel. Rapl and Idle
>>> metrics aren't specific to Intel but are placed here for ease and
>>> convenience. Smi and tsx metrics are added so they can be dropped from
>>> the per model json files.
>>
>> Are Smi and tsx metrics the only two whose duplicate metrics in the
>> json files will be dropped?
> 
> Yes. These metrics with their runtime detection and use of sysfs event
> names I feel more naturally fit here rather than in the Intel perfmon
> github converter script.
> 
>> It sounds like there will be many duplicate metrics in perf list, right?
> 
> That's not the goal. There may be memory bandwidth computed in
> different ways, like TMA and using uncore, but that seems okay as the
> metrics are using different counters so may say different things. I
> think there is an action to always watch the metrics and ensure
> duplicates don't occur, but some duplication can be beneficial.


Can we give a common prefix for all the automatically generated metrics,
e.g., general_ or std_?
As you said, there may be different metrics to calculate the same thing.

With a common prefix, we can clearly understand where the metrics come
from. In case any issues are found later for some metrics, I can
tell the end user to use either the TMA metrics or the automatically
generated metrics.
If they count the same thing, the main body of the metric name should be
the same.

Thanks,
Kan

> 
>> Also, is it an attempt to define some architectural metrics for perf?
> 
> There are many advantages of using python to generate the metric json,
> a few are:
> 1) we verify the metrics use events from the event json,
> 2) the error prone escaping of commas and slashes is handled by the python,
> 3) metric expressions can be spread over multiple lines and have comments.
> It is also an advantage that we can avoid copy-pasting one metric from
> one architectural metric json to another. This helps propagate fixes.
> 
> So, it's not so much a goal to have architectural metrics but it's nice
> that we avoid copy-paste. Somewhere where I've tried to set up common
> events across all architectures is with making tool have its own PMU.
> Rather than have the tool PMU describe events using custom code it
> just reuses the existing PMU json support:
> https://github.com/googleprodkernel/linux-perf/blob/google_tools_master/tools/perf/pmu-events/arch/common/common/tool.json
> 
>> How do you decide which metrics should be added here?
> 
> The goal is to try to make open source metrics that Google has
> internally. I've set up a git repo for this here:
> https://github.com/googleprodkernel/linux-perf
> Often the source of the metric is Intel's documentation on things like
> uncore events, it's just such metrics aren't part of the perfmon
> process and so we're adding them here. Were all these metrics on the
> Intel github it'd be reasonable to remove them from here. If Intel
> would like to work on or contribute some metrics here, that's also
> fine. I think the main thing is to be giving users useful metrics.
> 
> Thanks,
> Ian
> 
>>> There are four uncore sets of metrics and
>>> eleven core metrics. Add a CheckPmu function to metric to simplify
>>> detecting the presence of hybrid PMUs in events. Metrics with
>>> experimental events are flagged as experimental in their description.
>>>
>>> The patches should be applied on top of:
>>> https://lore.kernel.org/lkml/20240926174101.406874-1-irogers@google.com/
>>>
>>> v4. Experimental metric descriptions. Add mesh bandwidth metric. Rebase.
>>> v3. Swap tsx and CheckPMU patches that were in the wrong order. Some
>>>     minor code cleanup changes. Drop reference to merged fix for
>>>     umasks/occ_sel in PCU events and for cstate metrics.
>>> v2. Drop the cycles breakdown in favor of having it as a common
>>>     metric, spelling and other improvements suggested by Kan Liang
>>>     <kan.liang@linux.intel.com>.
>>>
>>> Ian Rogers (22):
>>>   perf jevents: Add RAPL metrics for all Intel models
>>>   perf jevents: Add idle metric for Intel models
>>>   perf jevents: Add smi metric group for Intel models
>>>   perf jevents: Add CheckPmu to see if a PMU is in loaded json events
>>>   perf jevents: Mark metrics with experimental events as experimental
>>>   perf jevents: Add tsx metric group for Intel models
>>>   perf jevents: Add br metric group for branch statistics on Intel
>>>   perf jevents: Add software prefetch (swpf) metric group for Intel
>>>   perf jevents: Add ports metric group giving utilization on Intel
>>>   perf jevents: Add L2 metrics for Intel
>>>   perf jevents: Add load store breakdown metrics ldst for Intel
>>>   perf jevents: Add ILP metrics for Intel
>>>   perf jevents: Add context switch metrics for Intel
>>>   perf jevents: Add FPU metrics for Intel
>>>   perf jevents: Add Miss Level Parallelism (MLP) metric for Intel
>>>   perf jevents: Add mem_bw metric for Intel
>>>   perf jevents: Add local/remote "mem" breakdown metrics for Intel
>>>   perf jevents: Add dir breakdown metrics for Intel
>>>   perf jevents: Add C-State metrics from the PCU PMU for Intel
>>>   perf jevents: Add local/remote miss latency metrics for Intel
>>>   perf jevents: Add upi_bw metric for Intel
>>>   perf jevents: Add mesh bandwidth saturation metric for Intel
>>>
>>>  tools/perf/pmu-events/intel_metrics.py | 1046 +++++++++++++++++++++++-
>>>  tools/perf/pmu-events/metric.py        |   52 ++
>>>  2 files changed, 1095 insertions(+), 3 deletions(-)
>>>


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 02/22] perf jevents: Add idle metric for Intel models
  2024-09-26 17:50 ` [PATCH v4 02/22] perf jevents: Add idle metric for " Ian Rogers
@ 2024-11-06 17:01   ` Liang, Kan
  2024-11-06 17:08     ` Liang, Kan
  0 siblings, 1 reply; 42+ messages in thread
From: Liang, Kan @ 2024-11-06 17:01 UTC (permalink / raw)
  To: Ian Rogers, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, linux-perf-users, linux-kernel, Perry Taylor,
	Samantha Alt, Caleb Biggers, Weilin Wang, Edward Baker



On 2024-09-26 1:50 p.m., Ian Rogers wrote:
> Compute using the msr PMU the percentage of wallclock cycles where the
> CPUs are in a low power state.
> 
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/perf/pmu-events/intel_metrics.py | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
> index 58e23eb48312..f875eb844c78 100755
> --- a/tools/perf/pmu-events/intel_metrics.py
> +++ b/tools/perf/pmu-events/intel_metrics.py
> @@ -1,7 +1,8 @@
>  #!/usr/bin/env python3
>  # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
> -from metric import (d_ratio, has_event, Event, JsonEncodeMetric, JsonEncodeMetricGroupDescriptions,
> -                    LoadEvents, Metric, MetricGroup, Select)
> +from metric import (d_ratio, has_event, max, Event, JsonEncodeMetric,
> +                    JsonEncodeMetricGroupDescriptions, LoadEvents, Metric,
> +                    MetricGroup, Select)
>  import argparse
>  import json
>  import math
> @@ -11,6 +12,16 @@ import os
>  _args = None
>  interval_sec = Event("duration_time")
>  
> +def Idle() -> Metric:
> +  cyc = Event("msr/mperf/")
> +  tsc = Event("msr/tsc/")
> +  low = max(tsc - cyc, 0)
> +  return Metric(
> +      "idle",
> +      "Percentage of total wallclock cycles where CPUs are in low power state (C1 or deeper sleep state)",
> +      d_ratio(low, tsc), "100%")

I'm not sure if the metric is correct, especially considering that mperf
is a R/W register. If someone clears mperf, the results will be wrong.

Thanks,
Kan

> +
> +
>  def Rapl() -> MetricGroup:
>    """Processor power consumption estimate.
>  
> @@ -68,6 +79,7 @@ def main() -> None:
>    LoadEvents(directory)
>  
>    all_metrics = MetricGroup("", [
> +      Idle(),
>        Rapl(),
>    ])
>  


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 02/22] perf jevents: Add idle metric for Intel models
  2024-11-06 17:01   ` Liang, Kan
@ 2024-11-06 17:08     ` Liang, Kan
  0 siblings, 0 replies; 42+ messages in thread
From: Liang, Kan @ 2024-11-06 17:08 UTC (permalink / raw)
  To: Ian Rogers, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, linux-perf-users, linux-kernel, Perry Taylor,
	Samantha Alt, Caleb Biggers, Weilin Wang, Edward Baker



On 2024-11-06 12:01 p.m., Liang, Kan wrote:
> 
> 
> On 2024-09-26 1:50 p.m., Ian Rogers wrote:
>> Compute using the msr PMU the percentage of wallclock cycles where the
>> CPUs are in a low power state.
>>
>> Signed-off-by: Ian Rogers <irogers@google.com>
>> ---
>>  tools/perf/pmu-events/intel_metrics.py | 16 ++++++++++++++--
>>  1 file changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
>> index 58e23eb48312..f875eb844c78 100755
>> --- a/tools/perf/pmu-events/intel_metrics.py
>> +++ b/tools/perf/pmu-events/intel_metrics.py
>> @@ -1,7 +1,8 @@
>>  #!/usr/bin/env python3
>>  # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
>> -from metric import (d_ratio, has_event, Event, JsonEncodeMetric, JsonEncodeMetricGroupDescriptions,
>> -                    LoadEvents, Metric, MetricGroup, Select)
>> +from metric import (d_ratio, has_event, max, Event, JsonEncodeMetric,
>> +                    JsonEncodeMetricGroupDescriptions, LoadEvents, Metric,
>> +                    MetricGroup, Select)
>>  import argparse
>>  import json
>>  import math
>> @@ -11,6 +12,16 @@ import os
>>  _args = None
>>  interval_sec = Event("duration_time")
>>  
>> +def Idle() -> Metric:
>> +  cyc = Event("msr/mperf/")
>> +  tsc = Event("msr/tsc/")
>> +  low = max(tsc - cyc, 0)
>> +  return Metric(
>> +      "idle",
>> +      "Percentage of total wallclock cycles where CPUs are in low power state (C1 or deeper sleep state)",
>> +      d_ratio(low, tsc), "100%")
> 

The mperf and aperf support can be enumerated. It's better to check for
the event before using it, e.g., has_event(mperf).

Thanks,
Kan

> I'm not sure if the metric is correct, especially considering that mperf
> is a R/W register. If someone clears mperf, the results will be wrong.
> 
> Thanks,
> Kan
> 
>> +
>> +
>>  def Rapl() -> MetricGroup:
>>    """Processor power consumption estimate.
>>  
>> @@ -68,6 +79,7 @@ def main() -> None:
>>    LoadEvents(directory)
>>  
>>    all_metrics = MetricGroup("", [
>> +      Idle(),
>>        Rapl(),
>>    ])
>>  
> 
> 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 03/22] perf jevents: Add smi metric group for Intel models
  2024-09-26 17:50 ` [PATCH v4 03/22] perf jevents: Add smi metric group " Ian Rogers
@ 2024-11-06 17:32   ` Liang, Kan
  2024-11-06 17:42     ` Ian Rogers
  0 siblings, 1 reply; 42+ messages in thread
From: Liang, Kan @ 2024-11-06 17:32 UTC (permalink / raw)
  To: Ian Rogers, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, linux-perf-users, linux-kernel, Perry Taylor,
	Samantha Alt, Caleb Biggers, Weilin Wang, Edward Baker



On 2024-09-26 1:50 p.m., Ian Rogers wrote:
> Allow duplicated metric to be dropped from json files.
> 
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/perf/pmu-events/intel_metrics.py | 21 ++++++++++++++++++++-
>  1 file changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
> index f875eb844c78..f34b4230a4ee 100755
> --- a/tools/perf/pmu-events/intel_metrics.py
> +++ b/tools/perf/pmu-events/intel_metrics.py
> @@ -2,7 +2,7 @@
>  # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
>  from metric import (d_ratio, has_event, max, Event, JsonEncodeMetric,
>                      JsonEncodeMetricGroupDescriptions, LoadEvents, Metric,
> -                    MetricGroup, Select)
> +                    MetricGroup, MetricRef, Select)
>  import argparse
>  import json
>  import math
> @@ -56,6 +56,24 @@ def Rapl() -> MetricGroup:
>                       description="Running Average Power Limit (RAPL) power consumption estimates")
>  
>  
> +def Smi() -> MetricGroup:
> +    aperf = Event('msr/aperf/')
> +    cycles = Event('cycles')
> +    smi_num = Event('msr/smi/')
> +    smi_cycles = Select(Select((aperf - cycles) / aperf, smi_num > 0, 0),
> +                        has_event(aperf),
> +                        0)
> +    return MetricGroup('smi', [
> +        Metric('smi_num', 'Number of SMI interrupts.',
> +               Select(smi_num, has_event(smi_num), 0), 'SMI#'),
> +        # Note, the smi_cycles "Event" is really a reference to the metric.
> +        Metric('smi_cycles',
> +               'Percentage of cycles spent in System Management Interrupts. '
> +               'Requires /sys/devices/cpu/freeze_on_smi to be 1.',

It seems this doesn't work for hybrid?

Thanks,
Kan
> +               smi_cycles, '100%', threshold=(MetricRef('smi_cycles') > 0.10))
> +    ], description = 'System Management Interrupt metrics')
> +
> +
>  def main() -> None:
>    global _args
>  
> @@ -81,6 +99,7 @@ def main() -> None:
>    all_metrics = MetricGroup("", [
>        Idle(),
>        Rapl(),
> +      Smi(),
>    ])
>  
>  


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 03/22] perf jevents: Add smi metric group for Intel models
  2024-11-06 17:32   ` Liang, Kan
@ 2024-11-06 17:42     ` Ian Rogers
  2024-11-06 18:29       ` Liang, Kan
  0 siblings, 1 reply; 42+ messages in thread
From: Ian Rogers @ 2024-11-06 17:42 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, linux-perf-users, linux-kernel, Perry Taylor,
	Samantha Alt, Caleb Biggers, Weilin Wang, Edward Baker

On Wed, Nov 6, 2024 at 9:32 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
>
>
> On 2024-09-26 1:50 p.m., Ian Rogers wrote:
> > Allow duplicated metric to be dropped from json files.
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> >  tools/perf/pmu-events/intel_metrics.py | 21 ++++++++++++++++++++-
> >  1 file changed, 20 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
> > index f875eb844c78..f34b4230a4ee 100755
> > --- a/tools/perf/pmu-events/intel_metrics.py
> > +++ b/tools/perf/pmu-events/intel_metrics.py
> > @@ -2,7 +2,7 @@
> >  # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
> >  from metric import (d_ratio, has_event, max, Event, JsonEncodeMetric,
> >                      JsonEncodeMetricGroupDescriptions, LoadEvents, Metric,
> > -                    MetricGroup, Select)
> > +                    MetricGroup, MetricRef, Select)
> >  import argparse
> >  import json
> >  import math
> > @@ -56,6 +56,24 @@ def Rapl() -> MetricGroup:
> >                       description="Running Average Power Limit (RAPL) power consumption estimates")
> >
> >
> > +def Smi() -> MetricGroup:
> > +    aperf = Event('msr/aperf/')
> > +    cycles = Event('cycles')
> > +    smi_num = Event('msr/smi/')
> > +    smi_cycles = Select(Select((aperf - cycles) / aperf, smi_num > 0, 0),
> > +                        has_event(aperf),
> > +                        0)
> > +    return MetricGroup('smi', [
> > +        Metric('smi_num', 'Number of SMI interrupts.',
> > +               Select(smi_num, has_event(smi_num), 0), 'SMI#'),
> > +        # Note, the smi_cycles "Event" is really a reference to the metric.
> > +        Metric('smi_cycles',
> > +               'Percentage of cycles spent in System Management Interrupts. '
> > +               'Requires /sys/devices/cpu/freeze_on_smi to be 1.',
>
> It seems this doesn't work for hybrid?

Thanks. The code is a migration of existing metrics that exist for hybrid:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json?h=perf-tools-next#n74
I still lack an easy way to test on hybrid, but I think fixing that
case can be follow on work.

Thanks,
Ian

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 06/22] perf jevents: Add tsx metric group for Intel models
  2024-09-26 17:50 ` [PATCH v4 06/22] perf jevents: Add tsx metric group for Intel models Ian Rogers
@ 2024-11-06 17:52   ` Liang, Kan
  2024-11-06 18:15     ` Ian Rogers
  0 siblings, 1 reply; 42+ messages in thread
From: Liang, Kan @ 2024-11-06 17:52 UTC (permalink / raw)
  To: Ian Rogers, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, linux-perf-users, linux-kernel, Perry Taylor,
	Samantha Alt, Caleb Biggers, Weilin Wang, Edward Baker



On 2024-09-26 1:50 p.m., Ian Rogers wrote:
> Allow duplicated metric to be dropped from json files. Detect when TSX
> is supported by a model by using the json events, but use sysfs events at
> runtime as hypervisors, etc. may disable TSX.
> 
> Add CheckPmu to metric to determine which PMUs have been associated
> with the loaded events.
> 
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/perf/pmu-events/intel_metrics.py | 52 +++++++++++++++++++++++++-
>  1 file changed, 51 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
> index f34b4230a4ee..58e243695f0a 100755
> --- a/tools/perf/pmu-events/intel_metrics.py
> +++ b/tools/perf/pmu-events/intel_metrics.py
> @@ -1,12 +1,13 @@
>  #!/usr/bin/env python3
>  # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
> -from metric import (d_ratio, has_event, max, Event, JsonEncodeMetric,
> +from metric import (d_ratio, has_event, max, CheckPmu, Event, JsonEncodeMetric,
>                      JsonEncodeMetricGroupDescriptions, LoadEvents, Metric,
>                      MetricGroup, MetricRef, Select)
>  import argparse
>  import json
>  import math
>  import os
> +from typing import Optional
>  
>  # Global command line arguments.
>  _args = None
> @@ -74,6 +75,54 @@ def Smi() -> MetricGroup:
>      ], description = 'System Management Interrupt metrics')
>  
>  
> +def Tsx() -> Optional[MetricGroup]:
> +  pmu = "cpu_core" if CheckPmu("cpu_core") else "cpu"
> +  cycles = Event('cycles')

Isn't the pmu prefix required for cycles as well?

> +  cycles_in_tx = Event(f'{pmu}/cycles\-t/')
> +  cycles_in_tx_cp = Event(f'{pmu}/cycles\-ct/')
> +  try:
> +    # Test if the tsx event is present in the json, prefer the
> +    # sysfs version so that we can detect its presence at runtime.
> +    transaction_start = Event("RTM_RETIRED.START")
> +    transaction_start = Event(f'{pmu}/tx\-start/')

What's the difference between this check and the later has_event() check?

All the tsx related events are model-specific events. We should check
them all before using them.

Thanks,
Kan
> +  except:
> +    return None
> +
> +  elision_start = None
> +  try:
> +    # Elision start isn't supported by all models, but we'll not
> +    # generate the tsx_cycles_per_elision metric in that
> +    # case. Again, prefer the sysfs encoding of the event.
> +    elision_start = Event("HLE_RETIRED.START")
> +    elision_start = Event(f'{pmu}/el\-start/')
> +  except:
> +    pass
> +
> +  return MetricGroup('transaction', [
> +      Metric('tsx_transactional_cycles',
> +             'Percentage of cycles within a transaction region.',
> +             Select(cycles_in_tx / cycles, has_event(cycles_in_tx), 0),
> +             '100%'),
> +      Metric('tsx_aborted_cycles', 'Percentage of cycles in aborted transactions.',
> +             Select(max(cycles_in_tx - cycles_in_tx_cp, 0) / cycles,
> +                    has_event(cycles_in_tx),
> +                    0),
> +             '100%'),
> +      Metric('tsx_cycles_per_transaction',
> +             'Number of cycles within a transaction divided by the number of transactions.',
> +             Select(cycles_in_tx / transaction_start,
> +                    has_event(cycles_in_tx),
> +                    0),
> +             "cycles / transaction"),
> +      Metric('tsx_cycles_per_elision',
> +             'Number of cycles within a transaction divided by the number of elisions.',
> +             Select(cycles_in_tx / elision_start,
> +                    has_event(elision_start),
> +                    0),
> +             "cycles / elision") if elision_start else None,
> +  ], description="Breakdown of transactional memory statistics")
> +
> +
>  def main() -> None:
>    global _args
>  
> @@ -100,6 +149,7 @@ def main() -> None:
>        Idle(),
>        Rapl(),
>        Smi(),
> +      Tsx(),
>    ])
>  
>  


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 06/22] perf jevents: Add tsx metric group for Intel models
  2024-11-06 17:52   ` Liang, Kan
@ 2024-11-06 18:15     ` Ian Rogers
  2024-11-06 18:48       ` Liang, Kan
  0 siblings, 1 reply; 42+ messages in thread
From: Ian Rogers @ 2024-11-06 18:15 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, linux-perf-users, linux-kernel, Perry Taylor,
	Samantha Alt, Caleb Biggers, Weilin Wang, Edward Baker

On Wed, Nov 6, 2024 at 9:53 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
>
>
> On 2024-09-26 1:50 p.m., Ian Rogers wrote:
> > Allow duplicated metric to be dropped from json files. Detect when TSX
> > is supported by a model by using the json events, but use sysfs events at
> > runtime as hypervisors, etc. may disable TSX.
> >
> > Add CheckPmu to metric to determine which PMUs have been associated
> > with the loaded events.
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> >  tools/perf/pmu-events/intel_metrics.py | 52 +++++++++++++++++++++++++-
> >  1 file changed, 51 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
> > index f34b4230a4ee..58e243695f0a 100755
> > --- a/tools/perf/pmu-events/intel_metrics.py
> > +++ b/tools/perf/pmu-events/intel_metrics.py
> > @@ -1,12 +1,13 @@
> >  #!/usr/bin/env python3
> >  # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
> > -from metric import (d_ratio, has_event, max, Event, JsonEncodeMetric,
> > +from metric import (d_ratio, has_event, max, CheckPmu, Event, JsonEncodeMetric,
> >                      JsonEncodeMetricGroupDescriptions, LoadEvents, Metric,
> >                      MetricGroup, MetricRef, Select)
> >  import argparse
> >  import json
> >  import math
> >  import os
> > +from typing import Optional
> >
> >  # Global command line arguments.
> >  _args = None
> > @@ -74,6 +75,54 @@ def Smi() -> MetricGroup:
> >      ], description = 'System Management Interrupt metrics')
> >
> >
> > +def Tsx() -> Optional[MetricGroup]:
> > +  pmu = "cpu_core" if CheckPmu("cpu_core") else "cpu"
> > +  cycles = Event('cycles')
>
> Isn't the pmu prefix required for cycles as well?

Makes sense.

> > +  cycles_in_tx = Event(f'{pmu}/cycles\-t/')
> > +  cycles_in_tx_cp = Event(f'{pmu}/cycles\-ct/')
> > +  try:
> > +    # Test if the tsx event is present in the json, prefer the
> > +    # sysfs version so that we can detect its presence at runtime.
> > +    transaction_start = Event("RTM_RETIRED.START")
> > +    transaction_start = Event(f'{pmu}/tx\-start/')
>
> What's the difference between this check and the later has_event() check?
>
> All the tsx related events are model-specific events. We should check
> them all before using them.

So if there is PMU in the Event name then the Event logic assumes you
are using sysfs and doesn't check that the event exists in the json. As you
say, I needed a way to detect whether this model supports TSX. I wanted to
avoid a model lookup table, so I used the existence of
RTM_RETIRED.START for a model as the way to determine if the model
supports TSX. Once we know we have a model supporting TSX then we use
the sysfs event name and has_event check, so that if the TSX and the
event have been disabled the metric doesn't fail parsing.

So, the first check is a compile-time check of "does this model have
TSX?". The "has_event" check is a runtime check where we want to see
if the event exists in sysfs, in case TSX was disabled, say, in the
BIOS.
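
To restate that pattern concretely, here is a condensed, illustrative
sketch: the TsxSketch name is hypothetical, the helper imports come from
metric.py as used in this series, and the event names are taken from the
quoted patch.

from typing import Optional
from metric import CheckPmu, Event, Metric, MetricGroup, Select, has_event

def TsxSketch() -> Optional[MetricGroup]:
  pmu = "cpu_core" if CheckPmu("cpu_core") else "cpu"
  cycles = Event(f"{pmu}/cycles/")  # hybrid wants the PMU prefix here too
  try:
    # Compile-time check: is the TSX event in this model's json at all?
    tx_start = Event("RTM_RETIRED.START")
    # Prefer the sysfs name so presence can also be tested at runtime.
    tx_start = Event(f"{pmu}/tx\\-start/")
  except:
    return None  # the model never supported TSX, generate no metrics
  cycles_in_tx = Event(f"{pmu}/cycles\\-t/")
  return MetricGroup("transaction", [
      Metric("tsx_transactional_cycles",
             "Percentage of cycles within a transaction region.",
             # Runtime check: TSX may be disabled by a hypervisor or the BIOS.
             Select(cycles_in_tx / cycles, has_event(cycles_in_tx), 0),
             "100%"),
      Metric("tsx_cycles_per_transaction",
             "Number of cycles within a transaction divided by the number of transactions.",
             Select(cycles_in_tx / tx_start, has_event(cycles_in_tx), 0),
             "cycles / transaction"),
  ], description="Breakdown of transactional memory statistics")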

Thanks,
Ian

>
> Thanks,
> Kan
> > +  except:
> > +    return None
> > +
> > +  elision_start = None
> > +  try:
> > +    # Elision start isn't supported by all models, but we'll not
> > +    # generate the tsx_cycles_per_elision metric in that
> > +    # case. Again, prefer the sysfs encoding of the event.
> > +    elision_start = Event("HLE_RETIRED.START")
> > +    elision_start = Event(f'{pmu}/el\-start/')
> > +  except:
> > +    pass
> > +
> > +  return MetricGroup('transaction', [
> > +      Metric('tsx_transactional_cycles',
> > +             'Percentage of cycles within a transaction region.',
> > +             Select(cycles_in_tx / cycles, has_event(cycles_in_tx), 0),
> > +             '100%'),
> > +      Metric('tsx_aborted_cycles', 'Percentage of cycles in aborted transactions.',
> > +             Select(max(cycles_in_tx - cycles_in_tx_cp, 0) / cycles,
> > +                    has_event(cycles_in_tx),
> > +                    0),
> > +             '100%'),
> > +      Metric('tsx_cycles_per_transaction',
> > +             'Number of cycles within a transaction divided by the number of transactions.',
> > +             Select(cycles_in_tx / transaction_start,
> > +                    has_event(cycles_in_tx),
> > +                    0),
> > +             "cycles / transaction"),
> > +      Metric('tsx_cycles_per_elision',
> > +             'Number of cycles within a transaction divided by the number of elisions.',
> > +             Select(cycles_in_tx / elision_start,
> > +                    has_event(elision_start),
> > +                    0),
> > +             "cycles / elision") if elision_start else None,
> > +  ], description="Breakdown of transactional memory statistics")
> > +
> > +
> >  def main() -> None:
> >    global _args
> >
> > @@ -100,6 +149,7 @@ def main() -> None:
> >        Idle(),
> >        Rapl(),
> >        Smi(),
> > +      Tsx(),
> >    ])
> >
> >
>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 03/22] perf jevents: Add smi metric group for Intel models
  2024-11-06 17:42     ` Ian Rogers
@ 2024-11-06 18:29       ` Liang, Kan
  0 siblings, 0 replies; 42+ messages in thread
From: Liang, Kan @ 2024-11-06 18:29 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, linux-perf-users, linux-kernel, Perry Taylor,
	Samantha Alt, Caleb Biggers, Weilin Wang, Edward Baker



On 2024-11-06 12:42 p.m., Ian Rogers wrote:
> On Wed, Nov 6, 2024 at 9:32 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>>
>>
>>
>> On 2024-09-26 1:50 p.m., Ian Rogers wrote:
>>> Allow duplicated metric to be dropped from json files.
>>>
>>> Signed-off-by: Ian Rogers <irogers@google.com>
>>> ---
>>>  tools/perf/pmu-events/intel_metrics.py | 21 ++++++++++++++++++++-
>>>  1 file changed, 20 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
>>> index f875eb844c78..f34b4230a4ee 100755
>>> --- a/tools/perf/pmu-events/intel_metrics.py
>>> +++ b/tools/perf/pmu-events/intel_metrics.py
>>> @@ -2,7 +2,7 @@
>>>  # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
>>>  from metric import (d_ratio, has_event, max, Event, JsonEncodeMetric,
>>>                      JsonEncodeMetricGroupDescriptions, LoadEvents, Metric,
>>> -                    MetricGroup, Select)
>>> +                    MetricGroup, MetricRef, Select)
>>>  import argparse
>>>  import json
>>>  import math
>>> @@ -56,6 +56,24 @@ def Rapl() -> MetricGroup:
>>>                       description="Running Average Power Limit (RAPL) power consumption estimates")
>>>
>>>
>>> +def Smi() -> MetricGroup:
>>> +    aperf = Event('msr/aperf/')
>>> +    cycles = Event('cycles')
>>> +    smi_num = Event('msr/smi/')
>>> +    smi_cycles = Select(Select((aperf - cycles) / aperf, smi_num > 0, 0),
>>> +                        has_event(aperf),
>>> +                        0)
>>> +    return MetricGroup('smi', [
>>> +        Metric('smi_num', 'Number of SMI interrupts.',
>>> +               Select(smi_num, has_event(smi_num), 0), 'SMI#'),
>>> +        # Note, the smi_cycles "Event" is really a reference to the metric.
>>> +        Metric('smi_cycles',
>>> +               'Percentage of cycles spent in System Management Interrupts. '
>>> +               'Requires /sys/devices/cpu/freeze_on_smi to be 1.',
>>
>> It seems this doesn't work for hybrid?
> 
> Thanks. The code is a migration of existing metrics that exist for hybrid:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json?h=perf-tools-next#n74
> I still lack an easy way to test on hybrid, but I think fixing that
> case can be follow on work.

The metric itself works on hybrid, but the description doesn't.
For hybrid, the locations of the knob are
/sys/devices/cpu_atom/freeze_on_smi
and
/sys/devices/cpu_core/freeze_on_smi

Maybe change it as below?
'Requires /sys/devices/cpu*/freeze_on_smi to be 1.'
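
Applied to the quoted hunk, that would presumably read (a sketch of the
suggested wording only, everything else unchanged from the patch):

        Metric('smi_cycles',
               'Percentage of cycles spent in System Management Interrupts. '
               'Requires /sys/devices/cpu*/freeze_on_smi to be 1.',
               smi_cycles, '100%', threshold=(MetricRef('smi_cycles') > 0.10))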

Thanks,
Kan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 06/22] perf jevents: Add tsx metric group for Intel models
  2024-11-06 18:15     ` Ian Rogers
@ 2024-11-06 18:48       ` Liang, Kan
  0 siblings, 0 replies; 42+ messages in thread
From: Liang, Kan @ 2024-11-06 18:48 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, linux-perf-users, linux-kernel, Perry Taylor,
	Samantha Alt, Caleb Biggers, Weilin Wang, Edward Baker



On 2024-11-06 1:15 p.m., Ian Rogers wrote:
> On Wed, Nov 6, 2024 at 9:53 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>>
>>
>>
>> On 2024-09-26 1:50 p.m., Ian Rogers wrote:
>>> Allow duplicated metric to be dropped from json files. Detect when TSX
>>> is supported by a model by using the json events, but use sysfs events at
>>> runtime as hypervisors, etc. may disable TSX.
>>>
>>> Add CheckPmu to metric to determine which PMUs have been associated
>>> with the loaded events.
>>>
>>> Signed-off-by: Ian Rogers <irogers@google.com>
>>> ---
>>>  tools/perf/pmu-events/intel_metrics.py | 52 +++++++++++++++++++++++++-
>>>  1 file changed, 51 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
>>> index f34b4230a4ee..58e243695f0a 100755
>>> --- a/tools/perf/pmu-events/intel_metrics.py
>>> +++ b/tools/perf/pmu-events/intel_metrics.py
>>> @@ -1,12 +1,13 @@
>>>  #!/usr/bin/env python3
>>>  # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
>>> -from metric import (d_ratio, has_event, max, Event, JsonEncodeMetric,
>>> +from metric import (d_ratio, has_event, max, CheckPmu, Event, JsonEncodeMetric,
>>>                      JsonEncodeMetricGroupDescriptions, LoadEvents, Metric,
>>>                      MetricGroup, MetricRef, Select)
>>>  import argparse
>>>  import json
>>>  import math
>>>  import os
>>> +from typing import Optional
>>>
>>>  # Global command line arguments.
>>>  _args = None
>>> @@ -74,6 +75,54 @@ def Smi() -> MetricGroup:
>>>      ], description = 'System Management Interrupt metrics')
>>>
>>>
>>> +def Tsx() -> Optional[MetricGroup]:
>>> +  pmu = "cpu_core" if CheckPmu("cpu_core") else "cpu"
>>> +  cycles = Event('cycles')
>>
>> Isn't the pmu prefix required for cycles as well?
> 
> Makes sense.
> 
>>> +  cycles_in_tx = Event(f'{pmu}/cycles\-t/')
>>> +  cycles_in_tx_cp = Event(f'{pmu}/cycles\-ct/')
>>> +  try:
>>> +    # Test if the tsx event is present in the json, prefer the
>>> +    # sysfs version so that we can detect its presence at runtime.
>>> +    transaction_start = Event("RTM_RETIRED.START")
>>> +    transaction_start = Event(f'{pmu}/tx\-start/')
>>
>> What's the difference between this check and the later has_event() check?
>>
>> All the tsx related events are model-specific events. We should check
>> them all before using them.
> 
> So if there is PMU in the Event name then the Event logic assumes you
> are using sysfs and doesn't check that the event exists in the json. As you
> say, I needed a way to detect whether this model supports TSX. I wanted to
> avoid a model lookup table, so I used the existence of
> RTM_RETIRED.START for a model as the way to determine if the model
> supports TSX. Once we know we have a model supporting TSX then we use
> the sysfs event name and has_event check, so that if the TSX and the
> event have been disabled the metric doesn't fail parsing.
> 
> So, the first check is a compile-time check of "does this model have
> TSX?". The "has_event" check is a runtime check where we want to see
> if the event exists in sysfs, in case TSX was disabled, say, in the
> BIOS.
> 

Yes, that's sufficient.
But the "has_event" check seems very random.

For example,
>>>> +      Metric('tsx_cycles_per_transaction',
>>>> +             'Number of cycles within a transaction divided by the number of transactions.',
>>>> +             Select(cycles_in_tx / transaction_start,
>>>> +                    has_event(cycles_in_tx),
>>>> +                    0),
>>>> +             "cycles / transaction"),

I think both cycles_in_tx and transaction_start should be checked.

>>>> +      Metric('tsx_cycles_per_elision',
>>>> +             'Number of cycles within a transaction divided by the number of elisions.',
>>>> +             Select(cycles_in_tx / elision_start,
>>>> +                    has_event(elision_start),
>>>> +                    0),

This one only checks the elision_start event.
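
For example, one way to guard on both events with just the existing
Select/has_event helpers, as a sketch against the quoted Tsx() code
(reusing its cycles_in_tx and transaction_start handles), not
necessarily how the series will resolve it:

      # Nest the Selects so the metric falls back to 0 unless both the
      # numerator and denominator events are present at runtime.
      Metric("tsx_cycles_per_transaction",
             "Number of cycles within a transaction divided by the number of transactions.",
             Select(Select(cycles_in_tx / transaction_start,
                           has_event(transaction_start),
                           0),
                    has_event(cycles_in_tx),
                    0),
             "cycles / transaction"),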

Thanks,
Kan
> Thanks,
> Ian
> 
>>
>> Thanks,
>> Kan
>>> +  except:
>>> +    return None
>>> +
>>> +  elision_start = None
>>> +  try:
>>> +    # Elision start isn't supported by all models, but we'll not
>>> +    # generate the tsx_cycles_per_elision metric in that
>>> +    # case. Again, prefer the sysfs encoding of the event.
>>> +    elision_start = Event("HLE_RETIRED.START")
>>> +    elision_start = Event(f'{pmu}/el\-start/')
>>> +  except:
>>> +    pass
>>> +
>>> +  return MetricGroup('transaction', [
>>> +      Metric('tsx_transactional_cycles',
>>> +             'Percentage of cycles within a transaction region.',
>>> +             Select(cycles_in_tx / cycles, has_event(cycles_in_tx), 0),
>>> +             '100%'),
>>> +      Metric('tsx_aborted_cycles', 'Percentage of cycles in aborted transactions.',
>>> +             Select(max(cycles_in_tx - cycles_in_tx_cp, 0) / cycles,
>>> +                    has_event(cycles_in_tx),
>>> +                    0),
>>> +             '100%'),
>>> +      Metric('tsx_cycles_per_transaction',
>>> +             'Number of cycles within a transaction divided by the number of transactions.',
>>> +             Select(cycles_in_tx / transaction_start,
>>> +                    has_event(cycles_in_tx),
>>> +                    0),
>>> +             "cycles / transaction"),
>>> +      Metric('tsx_cycles_per_elision',
>>> +             'Number of cycles within a transaction divided by the number of elisions.',
>>> +             Select(cycles_in_tx / elision_start,
>>> +                    has_event(elision_start),
>>> +                    0),
>>> +             "cycles / elision") if elision_start else None,
>>> +  ], description="Breakdown of transactional memory statistics")
>>> +
>>> +
>>>  def main() -> None:
>>>    global _args
>>>
>>> @@ -100,6 +149,7 @@ def main() -> None:
>>>        Idle(),
>>>        Rapl(),
>>>        Smi(),
>>> +      Tsx(),
>>>    ])
>>>
>>>
>>
> 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 07/22] perf jevents: Add br metric group for branch statistics on Intel
  2024-09-26 17:50 ` [PATCH v4 07/22] perf jevents: Add br metric group for branch statistics on Intel Ian Rogers
@ 2024-11-07 14:35   ` Liang, Kan
  2024-11-07 17:19     ` Ian Rogers
  0 siblings, 1 reply; 42+ messages in thread
From: Liang, Kan @ 2024-11-07 14:35 UTC (permalink / raw)
  To: Ian Rogers, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, linux-perf-users, linux-kernel, Perry Taylor,
	Samantha Alt, Caleb Biggers, Weilin Wang, Edward Baker



On 2024-09-26 1:50 p.m., Ian Rogers wrote:
> The br metric group for branches itself comprises metric groups for
> total, taken, conditional, fused and far branches, built using json
> events. Conditional taken and not-taken metrics are specific to
> Icelake and later generations, so the presence of the event is used to
> determine whether the metric should exist.
>
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/perf/pmu-events/intel_metrics.py | 138 +++++++++++++++++++++++++
>  1 file changed, 138 insertions(+)
> 
> diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
> index 58e243695f0a..09f7b7159e7c 100755
> --- a/tools/perf/pmu-events/intel_metrics.py
> +++ b/tools/perf/pmu-events/intel_metrics.py
> @@ -123,6 +123,143 @@ def Tsx() -> Optional[MetricGroup]:
>    ], description="Breakdown of transactional memory statistics")
>  
>  
> +def IntelBr():
> +  ins = Event("instructions")
> +
> +  def Total() -> MetricGroup:
> +    br_all = Event ("BR_INST_RETIRED.ALL_BRANCHES", "BR_INST_RETIRED.ANY")
> +    br_m_all = Event("BR_MISP_RETIRED.ALL_BRANCHES",
> +                     "BR_INST_RETIRED.MISPRED",
> +                     "BR_MISP_EXEC.ANY")
> +    br_clr = None
> +    try:
> +      br_clr = Event("BACLEARS.ANY", "BACLEARS.ALL")
> +    except:
> +      pass

There is no guarantee about the event names; they could be changed later.
Renaming has already occurred several times, even for architectural events.

I think we should test all events' presence, not just a few of them.

There could be some effort in the future to sync with the event list for
each new generation and to check whether any events have been renamed.

Thanks,
Kan
> +
> +    br_r = d_ratio(br_all, interval_sec)
> +    ins_r = d_ratio(ins, br_all)
> +    misp_r = d_ratio(br_m_all, br_all)
> +    clr_r = d_ratio(br_clr, interval_sec) if br_clr else None
> +
> +    return MetricGroup("br_total", [
> +        Metric("br_total_retired",
> +               "The number of branch instructions retired per second.", br_r,
> +               "insn/s"),
> +        Metric(
> +            "br_total_mispred",
> +            "The number of branch instructions retired, of any type, that were "
> +            "not correctly predicted as a percentage of all branch instrucions.",
> +            misp_r, "100%"),
> +        Metric("br_total_insn_between_branches",
> +               "The number of instructions divided by the number of branches.",
> +               ins_r, "insn"),
> +        Metric("br_total_insn_fe_resteers",
> +               "The number of resync branches per second.", clr_r, "req/s"
> +               ) if clr_r else None
> +    ])
> +
> +  def Taken() -> MetricGroup:
> +    br_all = Event("BR_INST_RETIRED.ALL_BRANCHES", "BR_INST_RETIRED.ANY")
> +    br_m_tk = None
> +    try:
> +      br_m_tk = Event("BR_MISP_RETIRED.NEAR_TAKEN",
> +                      "BR_MISP_RETIRED.TAKEN_JCC",
> +                      "BR_INST_RETIRED.MISPRED_TAKEN")
> +    except:
> +      pass
> +    br_r = d_ratio(br_all, interval_sec)
> +    ins_r = d_ratio(ins, br_all)
> +    misp_r = d_ratio(br_m_tk, br_all) if br_m_tk else None
> +    return MetricGroup("br_taken", [
> +        Metric("br_taken_retired",
> +               "The number of taken branches that were retired per second.",
> +               br_r, "insn/s"),
> +        Metric(
> +            "br_taken_mispred",
> +            "The number of retired taken branch instructions that were "
> +            "mispredicted as a percentage of all taken branches.", misp_r,
> +            "100%") if misp_r else None,
> +        Metric(
> +            "br_taken_insn_between_branches",
> +            "The number of instructions divided by the number of taken branches.",
> +            ins_r, "insn"),
> +    ])
> +
> +  def Conditional() -> Optional[MetricGroup]:
> +    try:
> +      br_cond = Event("BR_INST_RETIRED.COND",
> +                      "BR_INST_RETIRED.CONDITIONAL",
> +                      "BR_INST_RETIRED.TAKEN_JCC")
> +      br_m_cond = Event("BR_MISP_RETIRED.COND",
> +                        "BR_MISP_RETIRED.CONDITIONAL",
> +                        "BR_MISP_RETIRED.TAKEN_JCC")
> +    except:
> +      return None
> +
> +    br_cond_nt = None
> +    br_m_cond_nt = None
> +    try:
> +      br_cond_nt = Event("BR_INST_RETIRED.COND_NTAKEN")
> +      br_m_cond_nt = Event("BR_MISP_RETIRED.COND_NTAKEN")
> +    except:
> +      pass
> +    br_r = d_ratio(br_cond, interval_sec)
> +    ins_r = d_ratio(ins, br_cond)
> +    misp_r = d_ratio(br_m_cond, br_cond)
> +    taken_metrics = [
> +        Metric("br_cond_retired", "Retired conditional branch instructions.",
> +               br_r, "insn/s"),
> +        Metric("br_cond_insn_between_branches",
> +               "The number of instructions divided by the number of conditional "
> +               "branches.", ins_r, "insn"),
> +        Metric("br_cond_mispred",
> +               "Retired conditional branch instructions mispredicted as a "
> +               "percentage of all conditional branches.", misp_r, "100%"),
> +    ]
> +    if not br_m_cond_nt:
> +      return MetricGroup("br_cond", taken_metrics)
> +
> +    br_r = d_ratio(br_cond_nt, interval_sec)
> +    ins_r = d_ratio(ins, br_cond_nt)
> +    misp_r = d_ratio(br_m_cond_nt, br_cond_nt)
> +
> +    not_taken_metrics = [
> +        Metric("br_cond_retired", "Retired conditional not taken branch instructions.",
> +               br_r, "insn/s"),
> +        Metric("br_cond_insn_between_branches",
> +               "The number of instructions divided by the number of not taken conditional "
> +               "branches.", ins_r, "insn"),
> +        Metric("br_cond_mispred",
> +               "Retired not taken conditional branch instructions mispredicted as a "
> +               "percentage of all not taken conditional branches.", misp_r, "100%"),
> +    ]
> +    return MetricGroup("br_cond", [
> +        MetricGroup("br_cond_nt", not_taken_metrics),
> +        MetricGroup("br_cond_tkn", taken_metrics),
> +    ])
> +
> +  def Far() -> Optional[MetricGroup]:
> +    try:
> +      br_far = Event("BR_INST_RETIRED.FAR_BRANCH")
> +    except:
> +      return None
> +
> +    br_r = d_ratio(br_far, interval_sec)
> +    ins_r = d_ratio(ins, br_far)
> +    return MetricGroup("br_far", [
> +        Metric("br_far_retired", "Retired far control transfers per second.",
> +               br_r, "insn/s"),
> +        Metric(
> +            "br_far_insn_between_branches",
> +            "The number of instructions divided by the number of far branches.",
> +            ins_r, "insn"),
> +    ])
> +
> +  return MetricGroup("br", [Total(), Taken(), Conditional(), Far()],
> +                     description="breakdown of retired branch instructions")
> +
> +
>  def main() -> None:
>    global _args
>  
> @@ -150,6 +287,7 @@ def main() -> None:
>        Rapl(),
>        Smi(),
>        Tsx(),
> +      IntelBr(),
>    ])
>  
>  


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 09/22] perf jevents: Add ports metric group giving utilization on Intel
  2024-09-26 17:50 ` [PATCH v4 09/22] perf jevents: Add ports metric group giving utilization on Intel Ian Rogers
@ 2024-11-07 15:00   ` Liang, Kan
  2024-11-07 17:12     ` Ian Rogers
  0 siblings, 1 reply; 42+ messages in thread
From: Liang, Kan @ 2024-11-07 15:00 UTC (permalink / raw)
  To: Ian Rogers, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, linux-perf-users, linux-kernel, Perry Taylor,
	Samantha Alt, Caleb Biggers, Weilin Wang, Edward Baker



On 2024-09-26 1:50 p.m., Ian Rogers wrote:
> The ports metric group contains a metric for each port giving its
> utilization as a ratio of cycles. The metrics are created by looking
> for UOPS_DISPATCHED.PORT events.
> 
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/perf/pmu-events/intel_metrics.py | 33 ++++++++++++++++++++++++--
>  1 file changed, 31 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
> index f4707e964f75..3ef4eb868580 100755
> --- a/tools/perf/pmu-events/intel_metrics.py
> +++ b/tools/perf/pmu-events/intel_metrics.py
> @@ -1,12 +1,13 @@
>  #!/usr/bin/env python3
>  # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
>  from metric import (d_ratio, has_event, max, CheckPmu, Event, JsonEncodeMetric,
> -                    JsonEncodeMetricGroupDescriptions, LoadEvents, Metric,
> -                    MetricGroup, MetricRef, Select)
> +                    JsonEncodeMetricGroupDescriptions, Literal, LoadEvents,
> +                    Metric, MetricGroup, MetricRef, Select)
>  import argparse
>  import json
>  import math
>  import os
> +import re
>  from typing import Optional
>  
>  # Global command line arguments.
> @@ -260,6 +261,33 @@ def IntelBr():
>                       description="breakdown of retired branch instructions")
>  
>  
> +def IntelPorts() -> Optional[MetricGroup]:
> +  pipeline_events = json.load(open(f"{_args.events_path}/x86/{_args.model}/pipeline.json"))
> +
> +  core_cycles = Event("CPU_CLK_UNHALTED.THREAD_P_ANY",
> +                      "CPU_CLK_UNHALTED.DISTRIBUTED",
> +                      "cycles")
> +  # Number of CPU cycles scaled for SMT.
> +  smt_cycles = Select(core_cycles / 2, Literal("#smt_on"), core_cycles)
> +
> +  metrics = []
> +  for x in pipeline_events:
> +    if "EventName" in x and re.search("^UOPS_DISPATCHED.PORT", x["EventName"]):
> +      name = x["EventName"]
> +      port = re.search(r"(PORT_[0-9].*)", name).group(0).lower()
> +      if name.endswith("_CORE"):
> +        cyc = core_cycles
> +      else:
> +        cyc = smt_cycles
> +      metrics.append(Metric(port, f"{port} utilization (higher is better)",
> +                            d_ratio(Event(name), cyc), "100%"))


The generated metric highly depends on the event name, which is very
fragile. We will probably have the same event in a new generation, but
with a different name. Long-term maintenance could be a problem.
Is there an idea regarding how to sync the event names for new generations?

Maybe we should improve the event generation script and do an automatic
check to tell which metrics are missing. Then we may decide whether to
update to the new event name, drop the metric, or add a different metric.
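
As a rough illustration of such a check (entirely hypothetical, not part
of this series; it assumes the generated output is a json list of
objects keyed by "MetricName", and the expected-name set is made up):

import json

# Hypothetical: metrics we expect every generated Intel model file to keep.
EXPECTED_METRICS = {"smi_cycles", "br_total_retired", "br_total_mispred"}

def missing_metrics(generated_metrics_json: str) -> set:
  """Return expected metric names absent from a generated metrics file."""
  present = {m["MetricName"] for m in json.loads(generated_metrics_json)}
  return EXPECTED_METRICS - present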

Thanks,
Kan

> +  if len(metrics) == 0:
> +    return None
> +
> +  return MetricGroup("ports", metrics, "functional unit (port) utilization -- "
> +                     "fraction of cycles each port is utilized (higher is better)")
> +
> +
>  def IntelSwpf() -> Optional[MetricGroup]:
>    ins = Event("instructions")
>    try:
> @@ -352,6 +380,7 @@ def main() -> None:
>        Smi(),
>        Tsx(),
>        IntelBr(),
> +      IntelPorts(),
>        IntelSwpf(),
>    ])
>  


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 09/22] perf jevents: Add ports metric group giving utilization on Intel
  2024-11-07 15:00   ` Liang, Kan
@ 2024-11-07 17:12     ` Ian Rogers
  2024-11-07 19:36       ` Liang, Kan
  0 siblings, 1 reply; 42+ messages in thread
From: Ian Rogers @ 2024-11-07 17:12 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, linux-perf-users, linux-kernel, Perry Taylor,
	Samantha Alt, Caleb Biggers, Weilin Wang, Edward Baker

On Thu, Nov 7, 2024 at 7:00 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
>
>
> On 2024-09-26 1:50 p.m., Ian Rogers wrote:
> > The ports metric group contains a metric for each port giving its
> > utilization as a ratio of cycles. The metrics are created by looking
> > for UOPS_DISPATCHED.PORT events.
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> >  tools/perf/pmu-events/intel_metrics.py | 33 ++++++++++++++++++++++++--
> >  1 file changed, 31 insertions(+), 2 deletions(-)
> >
> > diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
> > index f4707e964f75..3ef4eb868580 100755
> > --- a/tools/perf/pmu-events/intel_metrics.py
> > +++ b/tools/perf/pmu-events/intel_metrics.py
> > @@ -1,12 +1,13 @@
> >  #!/usr/bin/env python3
> >  # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
> >  from metric import (d_ratio, has_event, max, CheckPmu, Event, JsonEncodeMetric,
> > -                    JsonEncodeMetricGroupDescriptions, LoadEvents, Metric,
> > -                    MetricGroup, MetricRef, Select)
> > +                    JsonEncodeMetricGroupDescriptions, Literal, LoadEvents,
> > +                    Metric, MetricGroup, MetricRef, Select)
> >  import argparse
> >  import json
> >  import math
> >  import os
> > +import re
> >  from typing import Optional
> >
> >  # Global command line arguments.
> > @@ -260,6 +261,33 @@ def IntelBr():
> >                       description="breakdown of retired branch instructions")
> >
> >
> > +def IntelPorts() -> Optional[MetricGroup]:
> > +  pipeline_events = json.load(open(f"{_args.events_path}/x86/{_args.model}/pipeline.json"))
> > +
> > +  core_cycles = Event("CPU_CLK_UNHALTED.THREAD_P_ANY",
> > +                      "CPU_CLK_UNHALTED.DISTRIBUTED",
> > +                      "cycles")
> > +  # Number of CPU cycles scaled for SMT.
> > +  smt_cycles = Select(core_cycles / 2, Literal("#smt_on"), core_cycles)
> > +
> > +  metrics = []
> > +  for x in pipeline_events:
> > +    if "EventName" in x and re.search("^UOPS_DISPATCHED.PORT", x["EventName"]):
> > +      name = x["EventName"]
> > +      port = re.search(r"(PORT_[0-9].*)", name).group(0).lower()
> > +      if name.endswith("_CORE"):
> > +        cyc = core_cycles
> > +      else:
> > +        cyc = smt_cycles
> > +      metrics.append(Metric(port, f"{port} utilization (higher is better)",
> > +                            d_ratio(Event(name), cyc), "100%"))
>
>
> The generated metric highly depends on the event name, which is very
> fragile. We will probably have the same event in a new generation, but
> with a different name. Long-term maintenance could be a problem.
> Is there an idea regarding how to sync the event names for new generations?

I agree with the idea that it is fragile, but it is also strangely
robust: as you say, new generations will gain support if they follow the
same naming convention. We have tests that load-bearing metrics exist on
our platforms so maybe the appropriate place to test for existence is
in Weilin's metrics test.


> Maybe we should improve the event generation script and do an automatic
> check to tell which metrics are missed. Then we may decide if updating
> the new event name, dropping the metric or adding a different metric.

So I'm not sure it is a bug to not have the metric, if it were we
could just throw rather than return None. We're going to run the
script for every model including old models like nehalem, so I've
generally kept it as None. I think doing future work on testing is
probably best. It would also indicate use of the metric if people
notice it missing (not that the script aims for that :-) ).

Thanks,
Ian

> Thanks,
> Kan
>
> > +  if len(metrics) == 0:
> > +    return None
> > +
> > +  return MetricGroup("ports", metrics, "functional unit (port) utilization -- "
> > +                     "fraction of cycles each port is utilized (higher is better)")
> > +
> > +
> >  def IntelSwpf() -> Optional[MetricGroup]:
> >    ins = Event("instructions")
> >    try:
> > @@ -352,6 +380,7 @@ def main() -> None:
> >        Smi(),
> >        Tsx(),
> >        IntelBr(),
> > +      IntelPorts(),
> >        IntelSwpf(),
> >    ])
> >
>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 07/22] perf jevents: Add br metric group for branch statistics on Intel
  2024-11-07 14:35   ` Liang, Kan
@ 2024-11-07 17:19     ` Ian Rogers
  0 siblings, 0 replies; 42+ messages in thread
From: Ian Rogers @ 2024-11-07 17:19 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, linux-perf-users, linux-kernel, Perry Taylor,
	Samantha Alt, Caleb Biggers, Weilin Wang, Edward Baker

On Thu, Nov 7, 2024 at 6:35 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
>
>
> On 2024-09-26 1:50 p.m., Ian Rogers wrote:
> > The br metric group for branches itself comprises metric groups for
> > total, taken, conditional, fused and far branches, built using json
> > events. Conditional taken and not-taken metrics are specific to
> > Icelake and later generations, so the presence of the event is used to
> > determine whether the metric should exist.
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> >  tools/perf/pmu-events/intel_metrics.py | 138 +++++++++++++++++++++++++
> >  1 file changed, 138 insertions(+)
> >
> > diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
> > index 58e243695f0a..09f7b7159e7c 100755
> > --- a/tools/perf/pmu-events/intel_metrics.py
> > +++ b/tools/perf/pmu-events/intel_metrics.py
> > @@ -123,6 +123,143 @@ def Tsx() -> Optional[MetricGroup]:
> >    ], description="Breakdown of transactional memory statistics")
> >
> >
> > +def IntelBr():
> > +  ins = Event("instructions")
> > +
> > +  def Total() -> MetricGroup:
> > +    br_all = Event ("BR_INST_RETIRED.ALL_BRANCHES", "BR_INST_RETIRED.ANY")
> > +    br_m_all = Event("BR_MISP_RETIRED.ALL_BRANCHES",
> > +                     "BR_INST_RETIRED.MISPRED",
> > +                     "BR_MISP_EXEC.ANY")
> > +    br_clr = None
> > +    try:
> > +      br_clr = Event("BACLEARS.ANY", "BACLEARS.ALL")
> > +    except:
> > +      pass
>
> There is no guarantee about the event names; they could be changed later.
> Renaming has already occurred several times, even for architectural events.
>
> I think we should test all events' presence, not just a few of them.

So the idea with the Event names is that we look for each name in turn
in the json, stopping with the first one we find. If none are found
then an exception is thrown. This means a typo can cause issues so we
also check that the event name exists in some json somewhere:
https://github.com/googleprodkernel/linux-perf/blob/google_tools_master/tools/perf/pmu-events/metric.py#L89
I agree it isn't perfect; we're dealing with strings, so things can
be fragile.
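
Restated as a sketch (the names come from the quoted patch; the
"first name found in the model json wins, otherwise raise" behaviour is
as described above):

from metric import Event

# Event() resolves to the first alternative name present in this model's
# json (names are also checked against the full event list to catch
# typos) and raises if none of them match.
try:
  br_clr = Event("BACLEARS.ANY",   # newer naming
                 "BACLEARS.ALL")   # older naming
except:
  br_clr = None  # event absent on this model: the metric is simply skipped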

> There could be some effort in the future to sync with the event list for
> each new generation and to check whether any events have been renamed.

So the worst case is that a metric isn't present in a new generation;
then we update the script for the new event name and things will be
fixed. I think testing can keep this in check.

Thanks,
Ian

> Thanks,
> Kan
> > +
> > +    br_r = d_ratio(br_all, interval_sec)
> > +    ins_r = d_ratio(ins, br_all)
> > +    misp_r = d_ratio(br_m_all, br_all)
> > +    clr_r = d_ratio(br_clr, interval_sec) if br_clr else None
> > +
> > +    return MetricGroup("br_total", [
> > +        Metric("br_total_retired",
> > +               "The number of branch instructions retired per second.", br_r,
> > +               "insn/s"),
> > +        Metric(
> > +            "br_total_mispred",
> > +            "The number of branch instructions retired, of any type, that were "
> > +            "not correctly predicted as a percentage of all branch instrucions.",
> > +            misp_r, "100%"),
> > +        Metric("br_total_insn_between_branches",
> > +               "The number of instructions divided by the number of branches.",
> > +               ins_r, "insn"),
> > +        Metric("br_total_insn_fe_resteers",
> > +               "The number of resync branches per second.", clr_r, "req/s"
> > +               ) if clr_r else None
> > +    ])
> > +
> > +  def Taken() -> MetricGroup:
> > +    br_all = Event("BR_INST_RETIRED.ALL_BRANCHES", "BR_INST_RETIRED.ANY")
> > +    br_m_tk = None
> > +    try:
> > +      br_m_tk = Event("BR_MISP_RETIRED.NEAR_TAKEN",
> > +                      "BR_MISP_RETIRED.TAKEN_JCC",
> > +                      "BR_INST_RETIRED.MISPRED_TAKEN")
> > +    except:
> > +      pass
> > +    br_r = d_ratio(br_all, interval_sec)
> > +    ins_r = d_ratio(ins, br_all)
> > +    misp_r = d_ratio(br_m_tk, br_all) if br_m_tk else None
> > +    return MetricGroup("br_taken", [
> > +        Metric("br_taken_retired",
> > +               "The number of taken branches that were retired per second.",
> > +               br_r, "insn/s"),
> > +        Metric(
> > +            "br_taken_mispred",
> > +            "The number of retired taken branch instructions that were "
> > +            "mispredicted as a percentage of all taken branches.", misp_r,
> > +            "100%") if misp_r else None,
> > +        Metric(
> > +            "br_taken_insn_between_branches",
> > +            "The number of instructions divided by the number of taken branches.",
> > +            ins_r, "insn"),
> > +    ])
> > +
> > +  def Conditional() -> Optional[MetricGroup]:
> > +    try:
> > +      br_cond = Event("BR_INST_RETIRED.COND",
> > +                      "BR_INST_RETIRED.CONDITIONAL",
> > +                      "BR_INST_RETIRED.TAKEN_JCC")
> > +      br_m_cond = Event("BR_MISP_RETIRED.COND",
> > +                        "BR_MISP_RETIRED.CONDITIONAL",
> > +                        "BR_MISP_RETIRED.TAKEN_JCC")
> > +    except:
> > +      return None
> > +
> > +    br_cond_nt = None
> > +    br_m_cond_nt = None
> > +    try:
> > +      br_cond_nt = Event("BR_INST_RETIRED.COND_NTAKEN")
> > +      br_m_cond_nt = Event("BR_MISP_RETIRED.COND_NTAKEN")
> > +    except:
> > +      pass
> > +    br_r = d_ratio(br_cond, interval_sec)
> > +    ins_r = d_ratio(ins, br_cond)
> > +    misp_r = d_ratio(br_m_cond, br_cond)
> > +    taken_metrics = [
> > +        Metric("br_cond_retired", "Retired conditional branch instructions.",
> > +               br_r, "insn/s"),
> > +        Metric("br_cond_insn_between_branches",
> > +               "The number of instructions divided by the number of conditional "
> > +               "branches.", ins_r, "insn"),
> > +        Metric("br_cond_mispred",
> > +               "Retired conditional branch instructions mispredicted as a "
> > +               "percentage of all conditional branches.", misp_r, "100%"),
> > +    ]
> > +    if not br_m_cond_nt:
> > +      return MetricGroup("br_cond", taken_metrics)
> > +
> > +    br_r = d_ratio(br_cond_nt, interval_sec)
> > +    ins_r = d_ratio(ins, br_cond_nt)
> > +    misp_r = d_ratio(br_m_cond_nt, br_cond_nt)
> > +
> > +    not_taken_metrics = [
> > +        Metric("br_cond_retired", "Retired conditional not taken branch instructions.",
> > +               br_r, "insn/s"),
> > +        Metric("br_cond_insn_between_branches",
> > +               "The number of instructions divided by the number of not taken conditional "
> > +               "branches.", ins_r, "insn"),
> > +        Metric("br_cond_mispred",
> > +               "Retired not taken conditional branch instructions mispredicted as a "
> > +               "percentage of all not taken conditional branches.", misp_r, "100%"),
> > +    ]
> > +    return MetricGroup("br_cond", [
> > +        MetricGroup("br_cond_nt", not_taken_metrics),
> > +        MetricGroup("br_cond_tkn", taken_metrics),
> > +    ])
> > +
> > +  def Far() -> Optional[MetricGroup]:
> > +    try:
> > +      br_far = Event("BR_INST_RETIRED.FAR_BRANCH")
> > +    except:
> > +      return None
> > +
> > +    br_r = d_ratio(br_far, interval_sec)
> > +    ins_r = d_ratio(ins, br_far)
> > +    return MetricGroup("br_far", [
> > +        Metric("br_far_retired", "Retired far control transfers per second.",
> > +               br_r, "insn/s"),
> > +        Metric(
> > +            "br_far_insn_between_branches",
> > +            "The number of instructions divided by the number of far branches.",
> > +            ins_r, "insn"),
> > +    ])
> > +
> > +  return MetricGroup("br", [Total(), Taken(), Conditional(), Far()],
> > +                     description="breakdown of retired branch instructions")
> > +
> > +
> >  def main() -> None:
> >    global _args
> >
> > @@ -150,6 +287,7 @@ def main() -> None:
> >        Rapl(),
> >        Smi(),
> >        Tsx(),
> > +      IntelBr(),
> >    ])
> >
> >
>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 09/22] perf jevents: Add ports metric group giving utilization on Intel
  2024-11-07 17:12     ` Ian Rogers
@ 2024-11-07 19:36       ` Liang, Kan
  2024-11-07 21:00         ` Ian Rogers
  0 siblings, 1 reply; 42+ messages in thread
From: Liang, Kan @ 2024-11-07 19:36 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, linux-perf-users, linux-kernel, Perry Taylor,
	Samantha Alt, Caleb Biggers, Weilin Wang, Edward Baker



On 2024-11-07 12:12 p.m., Ian Rogers wrote:
> On Thu, Nov 7, 2024 at 7:00 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>>
>>
>> On 2024-09-26 1:50 p.m., Ian Rogers wrote:
>>> The ports metric group contains a metric for each port giving its
>>> utilization as a ratio of cycles. The metrics are created by looking
>>> for UOPS_DISPATCHED.PORT events.
>>>
>>> Signed-off-by: Ian Rogers <irogers@google.com>
>>> ---
>>>  tools/perf/pmu-events/intel_metrics.py | 33 ++++++++++++++++++++++++--
>>>  1 file changed, 31 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
>>> index f4707e964f75..3ef4eb868580 100755
>>> --- a/tools/perf/pmu-events/intel_metrics.py
>>> +++ b/tools/perf/pmu-events/intel_metrics.py
>>> @@ -1,12 +1,13 @@
>>>  #!/usr/bin/env python3
>>>  # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
>>>  from metric import (d_ratio, has_event, max, CheckPmu, Event, JsonEncodeMetric,
>>> -                    JsonEncodeMetricGroupDescriptions, LoadEvents, Metric,
>>> -                    MetricGroup, MetricRef, Select)
>>> +                    JsonEncodeMetricGroupDescriptions, Literal, LoadEvents,
>>> +                    Metric, MetricGroup, MetricRef, Select)
>>>  import argparse
>>>  import json
>>>  import math
>>>  import os
>>> +import re
>>>  from typing import Optional
>>>
>>>  # Global command line arguments.
>>> @@ -260,6 +261,33 @@ def IntelBr():
>>>                       description="breakdown of retired branch instructions")
>>>
>>>
>>> +def IntelPorts() -> Optional[MetricGroup]:
>>> +  pipeline_events = json.load(open(f"{_args.events_path}/x86/{_args.model}/pipeline.json"))
>>> +
>>> +  core_cycles = Event("CPU_CLK_UNHALTED.THREAD_P_ANY",
>>> +                      "CPU_CLK_UNHALTED.DISTRIBUTED",
>>> +                      "cycles")
>>> +  # Number of CPU cycles scaled for SMT.
>>> +  smt_cycles = Select(core_cycles / 2, Literal("#smt_on"), core_cycles)
>>> +
>>> +  metrics = []
>>> +  for x in pipeline_events:
>>> +    if "EventName" in x and re.search("^UOPS_DISPATCHED.PORT", x["EventName"]):
>>> +      name = x["EventName"]
>>> +      port = re.search(r"(PORT_[0-9].*)", name).group(0).lower()
>>> +      if name.endswith("_CORE"):
>>> +        cyc = core_cycles
>>> +      else:
>>> +        cyc = smt_cycles
>>> +      metrics.append(Metric(port, f"{port} utilization (higher is better)",
>>> +                            d_ratio(Event(name), cyc), "100%"))
>>
>> The generated metric highly depends on the event name, which is very
>> fragile. We will probably have the same event in a new generation, but
>> with a different name. Long-term maintenance could be a problem.
>> Is there an idea regarding how to sync the event names for new generations?
> I agree with the idea that it is fragile, but it is also strangely
> robust: as you say, new generations will gain support if they follow the
> same naming convention. We have tests that load-bearing metrics exist on
> our platforms so maybe the appropriate place to test for existence is
> in Weilin's metrics test.
> 
> 
>> Maybe we should improve the event generation script and do an automatic
>> check to tell which metrics are missing. Then we may decide whether to
>> update to the new event name, drop the metric, or add a different metric.
> So I'm not sure it is a bug to not have the metric, if it were we
> could just throw rather than return None. We're going to run the
> script for every model including old models like nehalem, so I've
> generally kept it as None. I think doing future work on testing is
> probably best. It would also indicate use of the metric if people
> notice it missing (not that the script aims for that 🙂 ).

The maintenance is still a concern, even if we have a way to test it
out. There is already an "official" set of metrics published on GitHub, which
is maintained by Intel. To be honest, I don't think there is the energy to also
maintain these "non-official" metrics.

I don't think a missing metric should be treated as a bug, so it's very
likely that the issue will not be addressed right away. If we cannot
keep these metrics updated for future platforms, I can't see a
reason to have them.

Thanks,
Kan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 09/22] perf jevents: Add ports metric group giving utilization on Intel
  2024-11-07 19:36       ` Liang, Kan
@ 2024-11-07 21:00         ` Ian Rogers
  2024-11-08 16:45           ` Liang, Kan
  0 siblings, 1 reply; 42+ messages in thread
From: Ian Rogers @ 2024-11-07 21:00 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, linux-perf-users, linux-kernel, Perry Taylor,
	Samantha Alt, Caleb Biggers, Weilin Wang, Edward Baker

On Thu, Nov 7, 2024 at 11:36 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
> On 2024-11-07 12:12 p.m., Ian Rogers wrote:
> > On Thu, Nov 7, 2024 at 7:00 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
> >>
> >>
> >> On 2024-09-26 1:50 p.m., Ian Rogers wrote:
> >>> The ports metric group contains a metric for each port giving its
> >>> utilization as a ratio of cycles. The metrics are created by looking
> >>> for UOPS_DISPATCHED.PORT events.
> >>>
> >>> Signed-off-by: Ian Rogers <irogers@google.com>
> >>> ---
> >>>  tools/perf/pmu-events/intel_metrics.py | 33 ++++++++++++++++++++++++--
> >>>  1 file changed, 31 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
> >>> index f4707e964f75..3ef4eb868580 100755
> >>> --- a/tools/perf/pmu-events/intel_metrics.py
> >>> +++ b/tools/perf/pmu-events/intel_metrics.py
> >>> @@ -1,12 +1,13 @@
> >>>  #!/usr/bin/env python3
> >>>  # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
> >>>  from metric import (d_ratio, has_event, max, CheckPmu, Event, JsonEncodeMetric,
> >>> -                    JsonEncodeMetricGroupDescriptions, LoadEvents, Metric,
> >>> -                    MetricGroup, MetricRef, Select)
> >>> +                    JsonEncodeMetricGroupDescriptions, Literal, LoadEvents,
> >>> +                    Metric, MetricGroup, MetricRef, Select)
> >>>  import argparse
> >>>  import json
> >>>  import math
> >>>  import os
> >>> +import re
> >>>  from typing import Optional
> >>>
> >>>  # Global command line arguments.
> >>> @@ -260,6 +261,33 @@ def IntelBr():
> >>>                       description="breakdown of retired branch instructions")
> >>>
> >>>
> >>> +def IntelPorts() -> Optional[MetricGroup]:
> >>> +  pipeline_events = json.load(open(f"{_args.events_path}/x86/{_args.model}/pipeline.json"))
> >>> +
> >>> +  core_cycles = Event("CPU_CLK_UNHALTED.THREAD_P_ANY",
> >>> +                      "CPU_CLK_UNHALTED.DISTRIBUTED",
> >>> +                      "cycles")
> >>> +  # Number of CPU cycles scaled for SMT.
> >>> +  smt_cycles = Select(core_cycles / 2, Literal("#smt_on"), core_cycles)
> >>> +
> >>> +  metrics = []
> >>> +  for x in pipeline_events:
> >>> +    if "EventName" in x and re.search("^UOPS_DISPATCHED.PORT", x["EventName"]):
> >>> +      name = x["EventName"]
> >>> +      port = re.search(r"(PORT_[0-9].*)", name).group(0).lower()
> >>> +      if name.endswith("_CORE"):
> >>> +        cyc = core_cycles
> >>> +      else:
> >>> +        cyc = smt_cycles
> >>> +      metrics.append(Metric(port, f"{port} utilization (higher is better)",
> >>> +                            d_ratio(Event(name), cyc), "100%"))
> >>
> >> The generated metrics depend heavily on the event names, which is very
> >> fragile. We will probably have the same event in a new generation, but
> >> with a different name. Long-term maintenance could be a problem.
> >> Is there an idea for how to sync the event names for new generations?
> > I agree that it is fragile, but it is also strangely robust: as you
> > say, new generations will gain support if they follow the same naming
> > convention. We have tests that load-bearing metrics exist on our
> > platforms so maybe the appropriate place to test for existence is in
> > Weilin's metrics test.
> >
> >
> >> Maybe we should improve the event generation script and do an
> >> automatic check to tell which metrics are missing. Then we can decide
> >> whether to update to the new event name, drop the metric or add a
> >> different metric.
> > So I'm not sure it is a bug to not have the metric; if it were, we
> > could just throw rather than return None. We're going to run the
> > script for every model, including old models like Nehalem, so I've
> > generally kept it as None. I think doing future work on testing is
> > probably best. People noticing a metric is missing would also
> > indicate that it is used (not that the script aims for that 🙂 ).
>
> The maintenance is still a concern, even if we have a way to test it
> out. There is already a set of "official" metrics published on GitHub,
> which is maintained by Intel. To be honest, I don't think there is
> enough energy to also maintain these "non-official" metrics.
>
> I don't think missing these metrics should be treated as a bug, so
> it's very likely that such issues will not be addressed right away. If
> we cannot keep these metrics updated for future platforms, I can't see
> a reason to have them.

So I think there are a few things:
1) I'd like there to be a non-json infrastructure for events that can
handle multiple models. Some failings of json are its inability to
validate events, long lines, lack of comments, metric expression
strings that aren't inherently sound, etc. I'd like to make it so we
can have json metrics for everything, i.e. remove the hardcoded
metrics that play badly with event sharing, etc. Doing this by
updating every json would be tedious and excessively noisy.
2) There are "official" metrics from Intel and I've worked toward
establishing them. That doesn't mean every Intel metric is in the
official metrics. Servers are better served than client machines. Core
TMA metrics are well served but uncore metrics less so.
3) Are perf metrics perfect, with some kind of warranty? Well, no: as
your reviews in this thread have shown, the SMI cost metric on hybrid
is likely broken. We don't intentionally ship broken metrics, and we
fix them as soon as they come up. GPLv2 has an explicit "no warranty"
section. Now that Intel has experimental and non-experimental events,
we update the descriptions of metrics that use them to highlight that
the underlying events are experimental:
https://github.com/googleprodkernel/linux-perf/blob/google_tools_master/tools/perf/pmu-events/metric.py#L598
If there are bugs in the metrics then open source, sharing and fixing
benefit everyone. A rough sketch of the flagging idea follows after
point 4.
4) Am I looking for energy from Intel to maintain these metrics? No.
I'm trying to stop carrying the patches only inside Google's tree.
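
To illustrate the flagging in (3), a minimal sketch, assuming the
event json entries carry an "Experimental" marker; the field name and
the FlagExperimental helper are illustrative assumptions, not the
metric.py code behind the link above:

def FlagExperimental(description: str, event_jsons: list) -> str:
  # Hypothetical helper: append a note to a metric's description when
  # any of the events it uses is marked experimental in its json
  # entry. "Experimental" is an assumed field name for this sketch.
  if any(e.get("Experimental") == "1" for e in event_jsons):
    return description + " (this metric relies on experimental events)"
  return description

Call sites would pass the json entries of the events the metric
references, so the caveat shows up alongside the metric description.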

Thanks,
Ian

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 09/22] perf jevents: Add ports metric group giving utilization on Intel
  2024-11-07 21:00         ` Ian Rogers
@ 2024-11-08 16:45           ` Liang, Kan
  0 siblings, 0 replies; 42+ messages in thread
From: Liang, Kan @ 2024-11-08 16:45 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, linux-perf-users, linux-kernel, Perry Taylor,
	Samantha Alt, Caleb Biggers, Weilin Wang, Edward Baker



On 2024-11-07 4:00 p.m., Ian Rogers wrote:
> On Thu, Nov 7, 2024 at 11:36 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>> On 2024-11-07 12:12 p.m., Ian Rogers wrote:
>>> On Thu, Nov 7, 2024 at 7:00 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>>>>
>>>> On 2024-09-26 1:50 p.m., Ian Rogers wrote:
>>>>> The ports metric group contains a metric for each port giving its
>>>>> utilization as a ratio of cycles. The metrics are created by looking
>>>>> for UOPS_DISPATCHED.PORT events.
>>>>>
>>>>> Signed-off-by: Ian Rogers <irogers@google.com>
>>>>> ---
>>>>>  tools/perf/pmu-events/intel_metrics.py | 33 ++++++++++++++++++++++++--
>>>>>  1 file changed, 31 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
>>>>> index f4707e964f75..3ef4eb868580 100755
>>>>> --- a/tools/perf/pmu-events/intel_metrics.py
>>>>> +++ b/tools/perf/pmu-events/intel_metrics.py
>>>>> @@ -1,12 +1,13 @@
>>>>>  #!/usr/bin/env python3
>>>>>  # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
>>>>>  from metric import (d_ratio, has_event, max, CheckPmu, Event, JsonEncodeMetric,
>>>>> -                    JsonEncodeMetricGroupDescriptions, LoadEvents, Metric,
>>>>> -                    MetricGroup, MetricRef, Select)
>>>>> +                    JsonEncodeMetricGroupDescriptions, Literal, LoadEvents,
>>>>> +                    Metric, MetricGroup, MetricRef, Select)
>>>>>  import argparse
>>>>>  import json
>>>>>  import math
>>>>>  import os
>>>>> +import re
>>>>>  from typing import Optional
>>>>>
>>>>>  # Global command line arguments.
>>>>> @@ -260,6 +261,33 @@ def IntelBr():
>>>>>                       description="breakdown of retired branch instructions")
>>>>>
>>>>>
>>>>> +def IntelPorts() -> Optional[MetricGroup]:
>>>>> +  pipeline_events = json.load(open(f"{_args.events_path}/x86/{_args.model}/pipeline.json"))
>>>>> +
>>>>> +  core_cycles = Event("CPU_CLK_UNHALTED.THREAD_P_ANY",
>>>>> +                      "CPU_CLK_UNHALTED.DISTRIBUTED",
>>>>> +                      "cycles")
>>>>> +  # Number of CPU cycles scaled for SMT.
>>>>> +  smt_cycles = Select(core_cycles / 2, Literal("#smt_on"), core_cycles)
>>>>> +
>>>>> +  metrics = []
>>>>> +  for x in pipeline_events:
>>>>> +    if "EventName" in x and re.search("^UOPS_DISPATCHED.PORT", x["EventName"]):
>>>>> +      name = x["EventName"]
>>>>> +      port = re.search(r"(PORT_[0-9].*)", name).group(0).lower()
>>>>> +      if name.endswith("_CORE"):
>>>>> +        cyc = core_cycles
>>>>> +      else:
>>>>> +        cyc = smt_cycles
>>>>> +      metrics.append(Metric(port, f"{port} utilization (higher is better)",
>>>>> +                            d_ratio(Event(name), cyc), "100%"))
>>>> The generated metrics depend heavily on the event names, which is very
>>>> fragile. We will probably have the same event in a new generation, but
>>>> with a different name. Long-term maintenance could be a problem.
>>>> Is there an idea for how to sync the event names for new generations?
>>> I agree that it is fragile, but it is also strangely robust: as you
>>> say, new generations will gain support if they follow the same naming
>>> convention. We have tests that load-bearing metrics exist on our
>>> platforms so maybe the appropriate place to test for existence is in
>>> Weilin's metrics test.
>>>
>>>
>>>> Maybe we should improve the event generation script and do an
>>>> automatic check to tell which metrics are missing. Then we can decide
>>>> whether to update to the new event name, drop the metric or add a
>>>> different metric.
>>> So I'm not sure it is a bug to not have the metric; if it were, we
>>> could just throw rather than return None. We're going to run the
>>> script for every model, including old models like Nehalem, so I've
>>> generally kept it as None. I think doing future work on testing is
>>> probably best. People noticing a metric is missing would also
>>> indicate that it is used (not that the script aims for that 🙂 ).
>> The maintenance is still a concern, even if we have a way to test it
>> out. There is already a set of "official" metrics published on GitHub,
>> which is maintained by Intel. To be honest, I don't think there is
>> enough energy to also maintain these "non-official" metrics.
>>
>> I don't think missing these metrics should be treated as a bug, so
>> it's very likely that such issues will not be addressed right away. If
>> we cannot keep these metrics updated for future platforms, I can't see
>> a reason to have them.
> So I think there are a few things:
> 1) I'd like there to be a non-json infrastructure for events that can
> handle multiple models. Some failings of json are its inability to
> validate events, long lines, lack of comments, metric expression
> strings that aren't inherently sound, etc. I'd like to make it so we
> can have json metrics for everything, i.e. remove the hardcoded
> metrics that play badly with event sharing, etc. Doing this by
> updating every json would be tedious and excessively noisy.
> 2) There are "official" metrics from Intel and I've worked toward
> establishing them. That doesn't mean every Intel metric is in the
> official metrics. Servers are better served than client machines. Core
> TMA metrics are well served but uncore metrics less so.
> 3) Are perf metrics perfect, with some kind of warranty? Well, no: as
> your reviews in this thread have shown, the SMI cost metric on hybrid
> is likely broken. We don't intentionally ship broken metrics, and we
> fix them as soon as they come up. GPLv2 has an explicit "no warranty"
> section. Now that Intel has experimental and non-experimental events,
> we update the descriptions of metrics that use them to highlight that
> the underlying events are experimental:
> https://github.com/googleprodkernel/linux-perf/blob/google_tools_master/tools/perf/pmu-events/metric.py#L598
> If there are bugs in the metrics then open source, sharing and fixing
> benefit everyone. A rough sketch of the flagging idea follows after
> point 4.
> 4) Am I looking for energy from Intel to maintain these metrics? No.
> I'm trying to stop carrying the patches only inside Google's tree.

Got it. Thanks for the clarification.

IIUC, the generated metrics are based on the contributors' best
knowledge. The initial source is Google's tree, and it should be the
contributors' responsibility to maintain and update the metrics.
If so, I agree that they should be a good supplement.

I have some general comments:
- I think we need a way to distinguish these metrics, e.g., with a
dedicated prefix. I will not be surprised if some customers bring the
metrics to Intel's or another vendor's customer service and ask why
they don't work on some platforms. I don't think they will get any
useful information there; the best way is to report any issues here,
so we can fix and update the metric.
- All events, no matter whether they come from the JSON files or are
exposed by the kernel, have to be checked before showing the metrics,
because we cannot guarantee the availability of an event, even for
architectural events. We may introduce a wrapper for Metric() that
checks all the involved events, so we don't need to add the try/except
handling in each patch; a sketch of such a wrapper follows after this
list.
- In perf test, it's better to ignore errors from these metrics so
they don't block things, but we can show a warning for them with
-vvv. Issues can still be found and fixed.
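
As a minimal sketch of that wrapper, assuming Metric(), Select() and
has_event() behave as in the diff above; the GuardedMetric name, the
gen_ prefix and the fallback to 0 when an event is missing are
illustrative assumptions, not code from the series:

from metric import Event, Metric, Select, has_event

def GuardedMetric(name: str, description: str, expr, scale_unit: str,
                  *events: Event, prefix: str = "gen_") -> Metric:
  # Gate the expression on every involved event being present, so a
  # missing event degrades to 0 rather than an error, and prepend a
  # common prefix so generated metrics are easy to tell apart from the
  # "official" ones.
  guarded = expr
  for e in events:
    guarded = Select(guarded, has_event(e), 0)
  return Metric(prefix + name, description, guarded, scale_unit)

Call sites would then pass the events a metric references, for
example the UOPS_DISPATCHED.PORT event in the ports group, instead of
wrapping each construction in try/except.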

Thanks,
Kan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 00/22] Python generated Intel metrics
  2024-11-06 16:46     ` Liang, Kan
@ 2024-11-13 23:40       ` Ian Rogers
  0 siblings, 0 replies; 42+ messages in thread
From: Ian Rogers @ 2024-11-13 23:40 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, linux-perf-users, linux-kernel, Perry Taylor,
	Samantha Alt, Caleb Biggers, Weilin Wang, Edward Baker

On Wed, Nov 6, 2024 at 8:47 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
>
>
> On 2024-10-09 12:02 p.m., Ian Rogers wrote:
> > On Fri, Sep 27, 2024 at 11:34 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
> >>
> >>
> >>
> >> On 2024-09-26 1:50 p.m., Ian Rogers wrote:
> >>> Generate twenty sets of additional metrics for Intel. Rapl and Idle
> >>> metrics aren't specific to Intel but are placed here for ease and
> >>> convenience. Smi and tsx metrics are added so they can be dropped from
> >>> the per model json files.
> >>
> >> Are Smi and tsx metrics the only two metrics whose duplicates in the
> >> json files will be dropped?
> >
> > Yes. I feel these metrics, with their runtime detection and use of
> > sysfs event names, fit more naturally here than in the Intel perfmon
> > github converter script.
> >
> >> It sounds like there will be many duplicate metrics in perf list, right?
> >
> > That's not the goal. Memory bandwidth may be computed in different
> > ways, for example via TMA and via the uncore, but that seems okay as
> > the metrics use different counters and so may say different things. I
> > think there is an ongoing action to watch the metrics and ensure
> > duplicates don't occur, but some duplication can be beneficial.
>
>
> Can we give all the automatically generated metrics a common prefix,
> e.g., general_ or std_?
> As you said, there may be different metrics that calculate the same
> thing.
>
> With a common prefix, we can clearly tell where a metric comes from.
> In case any issues are found later for some metrics, I can tell the
> end user to use either the TMA metrics or the automatically generated
> metrics.
> If they count the same thing, the main body of the metric name should
> be the same.

I'm reminded of the default events, where some of the set fail on AMD,
and of AMD calling their topdown-like metrics PipelineL1 and PipelineL2
rather than the Intel names TopdownL1 and TopdownL2. Like you, I have a
desire for consistent naming; it just seems we always get pulled away
from it.

I'm going to post a v5 of these changes, which we carry in:
https://github.com/googleprodkernel/linux-perf
but I'll not vary the naming for now.

Thanks,
Ian

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2024-11-13 23:40 UTC | newest]

Thread overview: 42+ messages
2024-09-26 17:50 [PATCH v4 00/22] Python generated Intel metrics Ian Rogers
2024-09-26 17:50 ` [PATCH v4 01/22] perf jevents: Add RAPL metrics for all Intel models Ian Rogers
2024-09-26 17:50 ` [PATCH v4 02/22] perf jevents: Add idle metric for " Ian Rogers
2024-11-06 17:01   ` Liang, Kan
2024-11-06 17:08     ` Liang, Kan
2024-09-26 17:50 ` [PATCH v4 03/22] perf jevents: Add smi metric group " Ian Rogers
2024-11-06 17:32   ` Liang, Kan
2024-11-06 17:42     ` Ian Rogers
2024-11-06 18:29       ` Liang, Kan
2024-09-26 17:50 ` [PATCH v4 04/22] perf jevents: Add CheckPmu to see if a PMU is in loaded json events Ian Rogers
2024-09-26 17:50 ` [PATCH v4 05/22] perf jevents: Mark metrics with experimental events as experimental Ian Rogers
2024-09-26 17:50 ` [PATCH v4 06/22] perf jevents: Add tsx metric group for Intel models Ian Rogers
2024-11-06 17:52   ` Liang, Kan
2024-11-06 18:15     ` Ian Rogers
2024-11-06 18:48       ` Liang, Kan
2024-09-26 17:50 ` [PATCH v4 07/22] perf jevents: Add br metric group for branch statistics on Intel Ian Rogers
2024-11-07 14:35   ` Liang, Kan
2024-11-07 17:19     ` Ian Rogers
2024-09-26 17:50 ` [PATCH v4 08/22] perf jevents: Add software prefetch (swpf) metric group for Intel Ian Rogers
2024-09-26 17:50 ` [PATCH v4 09/22] perf jevents: Add ports metric group giving utilization on Intel Ian Rogers
2024-11-07 15:00   ` Liang, Kan
2024-11-07 17:12     ` Ian Rogers
2024-11-07 19:36       ` Liang, Kan
2024-11-07 21:00         ` Ian Rogers
2024-11-08 16:45           ` Liang, Kan
2024-09-26 17:50 ` [PATCH v4 10/22] perf jevents: Add L2 metrics for Intel Ian Rogers
2024-09-26 17:50 ` [PATCH v4 11/22] perf jevents: Add load store breakdown metrics ldst " Ian Rogers
2024-09-26 17:50 ` [PATCH v4 12/22] perf jevents: Add ILP metrics " Ian Rogers
2024-09-26 17:50 ` [PATCH v4 13/22] perf jevents: Add context switch " Ian Rogers
2024-09-26 17:50 ` [PATCH v4 14/22] perf jevents: Add FPU " Ian Rogers
2024-09-26 17:50 ` [PATCH v4 15/22] perf jevents: Add Miss Level Parallelism (MLP) metric " Ian Rogers
2024-09-26 17:50 ` [PATCH v4 16/22] perf jevents: Add mem_bw " Ian Rogers
2024-09-26 17:50 ` [PATCH v4 17/22] perf jevents: Add local/remote "mem" breakdown metrics " Ian Rogers
2024-09-26 17:50 ` [PATCH v4 18/22] perf jevents: Add dir " Ian Rogers
2024-09-26 17:50 ` [PATCH v4 19/22] perf jevents: Add C-State metrics from the PCU PMU " Ian Rogers
2024-09-26 17:50 ` [PATCH v4 20/22] perf jevents: Add local/remote miss latency metrics " Ian Rogers
2024-09-26 17:50 ` [PATCH v4 21/22] perf jevents: Add upi_bw metric " Ian Rogers
2024-09-26 17:50 ` [PATCH v4 22/22] perf jevents: Add mesh bandwidth saturation " Ian Rogers
2024-09-27 18:33 ` [PATCH v4 00/22] Python generated Intel metrics Liang, Kan
2024-10-09 16:02   ` Ian Rogers
2024-11-06 16:46     ` Liang, Kan
2024-11-13 23:40       ` Ian Rogers
