From: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
To: sj@kernel.org, damon@lists.linux.dev, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org
Cc: akpm@linux-foundation.org, corbet@lwn.net, bijan311@gmail.com,
ajayjoshi@micron.com, honggyu.kim@sk.com, yunjeong.mun@sk.com,
ravis.opensrc@gmail.com, bharata@amd.com
Subject: [RFC PATCH 7/7] mm/damon/damon_ibs: add AMD IBS-based access sampling backend
Date: Sat, 16 May 2026 15:34:32 -0700
Message-ID: <20260516223439.4033-8-ravis.opensrc@gmail.com>
In-Reply-To: <20260516223439.4033-1-ravis.opensrc@gmail.com>

Add paddr_ibs operations using AMD IBS Op sampling via
perf_event_create_kernel_counter(). IBS delivers physical-address-
keyed access reports to DAMON's shared-layer ring buffer
(damon_report_access()), without dependency on PTE Accessed-bit
scanning or page faults.

Per-CPU IBS events are created and torn down via cpuhp notifiers
(CPUHP_AP_ONLINE_DYN). Routing of access reports through the ring-
drain path is bound to ops.id == DAMON_OPS_PADDR_IBS at the dispatch
site (see "mm/damon: add sysfs binding and dispatch hookup for
paddr_ibs operations"), so .init does not need to flip a per-context
flag.

Sample-time discipline:
- PERF_SAMPLE_PHYS_ADDR is requested in attr.sample_type, but the
  IBS perf driver only fills data->phys_addr when
  IBS_OP_DATA3.dc_phy_addr_valid is set. Skip stale-PA samples by
  inspecting data->sample_flags rather than testing phys_addr for
  zero (which would also drop legitimate page 0).
- PERF_SAMPLE_DATA_SRC is requested so the perf driver decodes
  IBS_OP_DATA3.{ld_op,st_op} into data->data_src.mem_op; the
  backend reports load vs store accordingly via
  damon_access_report.is_write.

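For reviewers, the sample-time discipline above can be modeled in
userspace roughly as follows. This is a minimal sketch, not the kernel
code: struct sample, struct report, classify_sample() and the MODEL_*
constants are hypothetical stand-ins for perf_sample_data,
damon_access_report and the PERF_SAMPLE_PHYS_ADDR / PERF_MEM_OP_STORE
bits, and a 4 KiB page size is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the perf bits named above; values are illustrative. */
#define MODEL_SAMPLE_PHYS_ADDR (1u << 0)
#define MODEL_MEM_OP_STORE     (1u << 1)
#define MODEL_PAGE_SIZE        4096ull	/* assumption: 4 KiB pages */
#define MODEL_PAGE_MASK        (~(MODEL_PAGE_SIZE - 1))

/* Simplified model of the fields the overflow handler reads. */
struct sample {
	uint32_t sample_flags;	/* which optional fields the driver filled */
	uint64_t phys_addr;	/* meaningful only if the PHYS_ADDR flag is set */
	uint32_t mem_op;	/* decoded load/store bits */
};

/* Simplified model of the access report handed to DAMON. */
struct report {
	uint64_t paddr;
	uint64_t size;
	bool is_write;
};

/*
 * Return true and fill *out when the sample carries a valid physical
 * address; return false (sample filtered) otherwise. Validity comes
 * from the flag, never from phys_addr being non-zero, so an access to
 * physical page 0 is still reported.
 */
static bool classify_sample(const struct sample *s, struct report *out)
{
	if (!(s->sample_flags & MODEL_SAMPLE_PHYS_ADDR))
		return false;

	out->paddr = s->phys_addr & MODEL_PAGE_MASK;	/* page-align */
	out->size = MODEL_PAGE_SIZE;
	out->is_write = !!(s->mem_op & MODEL_MEM_OP_STORE);
	return true;
}
```

In this model, a sample with the flag set and phys_addr == 0 is kept,
while a sample with a non-zero but unflagged phys_addr is dropped,
which is the distinction the first bullet describes.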
Module parameters:
- max_cnt: IBS Op MaxCnt (writable; writes call
  damon_ibs_set_sample_rate() to restart sampling at the new rate)
- samples_total / samples_filtered: per-CPU-aggregated counters
  (read-only)

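As a rough guide to choosing max_cnt: since MaxCnt is the number of ops
between samples, the per-core sample rate is approximately the op rate
divided by max_cnt. The sketch below is illustrative only;
approx_samples_per_sec() is a hypothetical helper and the op rate
passed to it is an assumption that depends on the workload and CPU.

```c
#include <stdint.h>

/*
 * Approximate samples per second per core for a given MaxCnt.
 * ops_per_sec is an assumed, workload-dependent op rate; it is not
 * something the module measures.
 */
static uint64_t approx_samples_per_sec(uint64_t ops_per_sec, uint32_t max_cnt)
{
	if (!max_cnt)
		return 0;	/* the module rejects max_cnt == 0 with -EINVAL */
	return ops_per_sec / max_cnt;
}
```

Assuming about 10^9 ops/sec on a core, the default max_cnt of 262144
yields roughly 3814 samples/sec, in line with the "~4K samples/sec/core"
note in damon_ibs.c; doubling max_cnt to 524288 halves that to about
1907.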
Source file is mm/damon/damon_ibs.c (renamed from mm/damon/ibs.c) so
the resulting module is loaded as damon_ibs.ko, avoiding the generic
"ibs" namespace.

The IBS sampling approach is derived from Bharata B Rao's pghot RFC v5
series; the attribution header is in damon_ibs.c.

Suggested-by: Bharata B Rao <bharata@amd.com>
Link: https://lore.kernel.org/linux-mm/20260129144043.231636-1-bharata@amd.com/
Signed-off-by: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
---
mm/damon/Kconfig | 10 ++
mm/damon/Makefile | 1 +
mm/damon/damon_ibs.c | 369 +++++++++++++++++++++++++++++++++++++++++++
3 files changed, 380 insertions(+)
create mode 100644 mm/damon/damon_ibs.c
diff --git a/mm/damon/Kconfig b/mm/damon/Kconfig
index ad629f0f31d8d..bb698d2717f34 100644
--- a/mm/damon/Kconfig
+++ b/mm/damon/Kconfig
@@ -131,4 +131,14 @@ config DAMON_ACMA
min/max memory for the system and maximum memory pressure stall time
ratio.
+config DAMON_IBS
+ tristate "AMD IBS-based access sampling for DAMON"
+ depends on DAMON_PADDR && CPU_SUP_AMD && X86_64 && PERF_EVENTS
+ help
+ Uses AMD IBS (Instruction-Based Sampling) hardware to deliver
+ physical-address-keyed access reports to DAMON's shared-layer
+ ring buffer, without relying on PTE Accessed-bit scanning or
+ page faults. Registers as the "paddr_ibs" operations set.
+ Requires AMD processors with IBS Op support.
+
endmenu
diff --git a/mm/damon/Makefile b/mm/damon/Makefile
index 22494754f41e8..109d0fb1db97d 100644
--- a/mm/damon/Makefile
+++ b/mm/damon/Makefile
@@ -9,3 +9,4 @@ obj-$(CONFIG_DAMON_RECLAIM) += modules-common.o reclaim.o
obj-$(CONFIG_DAMON_LRU_SORT) += modules-common.o lru_sort.o
obj-$(CONFIG_DAMON_STAT) += modules-common.o stat.o
obj-$(CONFIG_DAMON_ACMA) += modules-common.o acma.o
+obj-$(CONFIG_DAMON_IBS) += damon_ibs.o
diff --git a/mm/damon/damon_ibs.c b/mm/damon/damon_ibs.c
new file mode 100644
index 0000000000000..1dd99e91c3928
--- /dev/null
+++ b/mm/damon/damon_ibs.c
@@ -0,0 +1,369 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * DAMON IBS (Instruction-Based Sampling) backend for AMD processors.
+ *
+ * Uses AMD IBS Op sampling via the perf kernel counter infrastructure to
+ * deliver PA-keyed access reports to DAMON's shared-layer ring buffer
+ * (see damon_report_access()). This enables physical-address hot-page
+ * detection without relying on page-table Accessed bits or page faults.
+ *
+ * The IBS sampling approach in this file derives from concepts in
+ * Bharata B Rao's pghot RFC v5 series for hot page tracking.
+ * See: https://lore.kernel.org/linux-mm/20260129144043.231636-1-bharata@amd.com/
+ *
+ * Author: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
+ */
+
+#include <linux/cpu.h>
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/percpu.h>
+#include <linux/perf_event.h>
+#include <linux/slab.h>
+#include <linux/smp.h>
+
+#include <linux/damon.h>
+#include "ops-common.h"
+
+#define DAMON_IBS_DEFAULT_MAX_CNT 262144 /* ~4K samples/sec/core */
+#define IBS_OP_PMU_TYPE_PATH "/sys/bus/event_source/devices/ibs_op/type"
+
+static unsigned int damon_ibs_max_cnt = DAMON_IBS_DEFAULT_MAX_CNT;
+
+static int damon_ibs_set_sample_rate(unsigned int max_cnt);
+
+static int max_cnt_set(const char *val, const struct kernel_param *kp)
+{
+ unsigned int new_cnt;
+ int ret;
+
+ ret = kstrtouint(val, 0, &new_cnt);
+ if (ret)
+ return ret;
+ if (!new_cnt)
+ return -EINVAL;
+ return damon_ibs_set_sample_rate(new_cnt);
+}
+static const struct kernel_param_ops max_cnt_ops = {
+ .set = max_cnt_set,
+ .get = param_get_uint,
+};
+module_param_cb(max_cnt, &max_cnt_ops, &damon_ibs_max_cnt, 0644);
+MODULE_PARM_DESC(max_cnt,
+ "IBS MaxCnt (ops between samples). Writes restart sampling.");
+
+static DEFINE_MUTEX(damon_ibs_lock);
+static bool damon_ibs_enabled;
+static enum cpuhp_state damon_ibs_cpuhp_state;
+static unsigned int ibs_pmu_type; /* discovered at init */
+
+static DEFINE_PER_CPU(struct perf_event *, damon_ibs_event);
+
+/*
+ * Diagnostic counters. Incremented from NMI context, so use per-CPU
+ * counters and sum them on read.
+ */
+static DEFINE_PER_CPU(unsigned long, ibs_samples_total_pcpu);
+static DEFINE_PER_CPU(unsigned long, ibs_samples_filtered_pcpu);
+
+static unsigned long damon_ibs_sum_pcpu(unsigned long __percpu *var)
+{
+ unsigned long sum = 0;
+ int cpu;
+
+ for_each_possible_cpu(cpu)
+ sum += per_cpu(*var, cpu);
+ return sum;
+}
+
+static int samples_total_get(char *buffer, const struct kernel_param *kp)
+{
+ return sysfs_emit(buffer, "%lu\n",
+ damon_ibs_sum_pcpu(&ibs_samples_total_pcpu));
+}
+
+static int samples_filtered_get(char *buffer, const struct kernel_param *kp)
+{
+ return sysfs_emit(buffer, "%lu\n",
+ damon_ibs_sum_pcpu(&ibs_samples_filtered_pcpu));
+}
+
+static const struct kernel_param_ops samples_total_ops = {
+ .get = samples_total_get,
+};
+static const struct kernel_param_ops samples_filtered_ops = {
+ .get = samples_filtered_get,
+};
+
+module_param_cb(samples_total, &samples_total_ops, NULL, 0444);
+MODULE_PARM_DESC(samples_total, "Total IBS samples delivered (read-only)");
+module_param_cb(samples_filtered, &samples_filtered_ops, NULL, 0444);
+MODULE_PARM_DESC(samples_filtered, "IBS samples filtered out (read-only)");
+
+/**
+ * damon_ibs_overflow_handler() - IBS overflow callback.
+ *
+ * Called when an IBS Op counter overflows. The IBS perf driver fills
+ * data->phys_addr from IBSDCPHYSAD when dc_phy_addr_valid is set.
+ *
+ * Context: NMI — no sleeping, no mutex, no kmalloc.
+ */
+static void damon_ibs_overflow_handler(struct perf_event *event,
+ struct perf_sample_data *data,
+ struct pt_regs *regs)
+{
+ struct damon_access_report report;
+ unsigned long phys_addr;
+
+ if (!data)
+ return;
+
+ /*
+ * PERF_SAMPLE_PHYS_ADDR was requested in attr.sample_type, but
+ * the IBS perf driver only populates data->phys_addr when
+ * IBS_OP_DATA3.dc_phy_addr_valid is set. Skip stale-PA samples
+ * by checking the sample_flags rather than testing phys_addr
+ * for zero (which would also drop legitimate page 0).
+ */
+ if (!(data->sample_flags & PERF_SAMPLE_PHYS_ADDR)) {
+ this_cpu_inc(ibs_samples_filtered_pcpu);
+ return;
+ }
+ phys_addr = data->phys_addr;
+
+ report = (struct damon_access_report){
+ .paddr = phys_addr & PAGE_MASK,
+ .size = PAGE_SIZE,
+ .cpu = smp_processor_id(),
+ .is_write = !!(data->data_src.mem_op & PERF_MEM_OP_STORE),
+ };
+ damon_report_access(&report);
+ this_cpu_inc(ibs_samples_total_pcpu);
+}
+
+static int damon_ibs_create_event(int cpu)
+{
+ struct perf_event_attr attr = {
+ .type = ibs_pmu_type,
+ .size = sizeof(attr),
+ /* config=0: IBS perf driver uses sample_period as MaxCnt. */
+ .config = 0,
+ .sample_period = damon_ibs_max_cnt,
+ .sample_type = PERF_SAMPLE_PHYS_ADDR | PERF_SAMPLE_DATA_SRC,
+ .pinned = 1,
+ };
+ struct perf_event *event;
+
+ event = perf_event_create_kernel_counter(&attr, cpu, NULL,
+ damon_ibs_overflow_handler,
+ NULL);
+ if (IS_ERR(event))
+ return PTR_ERR(event);
+
+ /*
+ * perf_event_create_kernel_counter() returns the event already
+ * enabled; no perf_event_enable() needed here.
+ */
+ per_cpu(damon_ibs_event, cpu) = event;
+ return 0;
+}
+
+static void damon_ibs_destroy_event(int cpu)
+{
+ struct perf_event *event = per_cpu(damon_ibs_event, cpu);
+
+ if (!event)
+ return;
+
+ perf_event_disable(event);
+ perf_event_release_kernel(event);
+ per_cpu(damon_ibs_event, cpu) = NULL;
+}
+
+static int damon_ibs_cpu_online(unsigned int cpu)
+{
+ int ret = damon_ibs_create_event(cpu);
+
+ if (ret)
+ pr_warn_ratelimited(
+ "damon-ibs: failed to create perf_event on cpu %u (err %d); "
+ "this cpu will not contribute samples\n", cpu, ret);
+ return 0; /* never block CPU online */
+}
+
+static int damon_ibs_cpu_offline(unsigned int cpu)
+{
+ damon_ibs_destroy_event(cpu);
+ return 0;
+}
+
+/* Caller must hold damon_ibs_lock. */
+static int __damon_ibs_start(void)
+{
+ int ret;
+
+ if (damon_ibs_enabled)
+ return -EBUSY;
+
+ ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "damon/ibs:online",
+ damon_ibs_cpu_online, damon_ibs_cpu_offline);
+ if (ret < 0)
+ return ret;
+ damon_ibs_cpuhp_state = ret;
+
+ damon_ibs_enabled = true;
+ pr_info_once("damon-ibs: first start (max_cnt=%u, pmu_type=%u)\n",
+ damon_ibs_max_cnt, ibs_pmu_type);
+ return 0;
+}
+
+/* Caller must hold damon_ibs_lock. */
+static void __damon_ibs_stop(void)
+{
+ if (!damon_ibs_enabled)
+ return;
+
+ cpuhp_remove_state(damon_ibs_cpuhp_state);
+ damon_ibs_enabled = false;
+}
+
+static int damon_ibs_start(void)
+{
+ int ret;
+
+ mutex_lock(&damon_ibs_lock);
+ ret = __damon_ibs_start();
+ mutex_unlock(&damon_ibs_lock);
+ return ret;
+}
+
+static void damon_ibs_stop(void)
+{
+ mutex_lock(&damon_ibs_lock);
+ __damon_ibs_stop();
+ mutex_unlock(&damon_ibs_lock);
+}
+
+/**
+ * damon_ibs_set_sample_rate() - Set IBS sampling interval.
+ * @max_cnt: IBS Op MaxCnt value (ops between samples).
+ * Higher = fewer samples/sec.
+ *
+ * If IBS is already running, restart it with the new rate.
+ *
+ * Return: 0 on success, or a negative error code if a required
+ * restart failed. The error is propagated so that callers (e.g.
+ * the max_cnt module-param .set callback) surface it to userspace
+ * instead of silently leaving sampling stopped.
+ */
+static int damon_ibs_set_sample_rate(unsigned int max_cnt)
+{
+ int ret = 0;
+
+ mutex_lock(&damon_ibs_lock);
+ damon_ibs_max_cnt = max_cnt ? max_cnt : DAMON_IBS_DEFAULT_MAX_CNT;
+
+ if (damon_ibs_enabled) {
+ __damon_ibs_stop();
+ ret = __damon_ibs_start();
+ if (ret)
+ pr_warn("damon-ibs: restart failed: %d\n", ret);
+ }
+ mutex_unlock(&damon_ibs_lock);
+ return ret;
+}
+
+
+static void damon_ibs_init_ctx(struct damon_ctx *ctx)
+{
+ int ret;
+
+ /* IBS is the access-detection source for this ctx. */
+ ctx->sample_control.primitives_enabled.page_table = false;
+
+ ret = damon_ibs_start();
+ if (ret && ret != -EBUSY)
+ pr_warn("damon-ibs: failed to start IBS sampling: %d\n", ret);
+}
+
+/**
+ * damon_ibs_discover_pmu_type() - Discover IBS Op PMU type from sysfs.
+ *
+ * Reads /sys/bus/event_source/devices/ibs_op/type to get the PMU type
+ * identifier needed for perf_event_attr.type.
+ *
+ * TODO: replace sysfs-read with a PMU lookup API when one becomes
+ * available.
+ *
+ * Return: 0 on success, negative error code otherwise.
+ */
+static int damon_ibs_discover_pmu_type(void)
+{
+ struct file *f;
+ char buf[16];
+ loff_t pos = 0;
+ ssize_t len;
+ int ret;
+
+ f = filp_open(IBS_OP_PMU_TYPE_PATH, O_RDONLY, 0);
+ if (IS_ERR(f))
+ return PTR_ERR(f);
+
+ len = kernel_read(f, buf, sizeof(buf) - 1, &pos);
+ filp_close(f, NULL);
+ if (len <= 0)
+ return -EIO;
+
+ buf[len] = '\0';
+ ret = kstrtouint(strim(buf), 10, &ibs_pmu_type);
+ if (ret)
+ return ret;
+
+ pr_info("damon-ibs: discovered ibs_op PMU type=%u\n", ibs_pmu_type);
+ return 0;
+}
+
+static int __init damon_ibs_init(void)
+{
+ struct damon_operations ops = {
+ .id = DAMON_OPS_PADDR_IBS,
+ .owner = THIS_MODULE,
+ .init = damon_ibs_init_ctx,
+ .prepare_access_checks = damon_pa_prepare_access_checks,
+ .check_accesses = damon_pa_check_accesses,
+ .apply_probes = damon_pa_apply_probes,
+ .apply_scheme = damon_pa_apply_scheme,
+ .get_scheme_score = damon_pa_scheme_score,
+ };
+ int err;
+
+ if (!boot_cpu_has(X86_FEATURE_IBS))
+ return -ENODEV;
+
+ err = damon_ibs_discover_pmu_type();
+ if (err) {
+ pr_err("damon-ibs: failed to discover IBS PMU type: %d\n", err);
+ return err;
+ }
+
+ err = damon_register_ops(&ops);
+ if (err)
+ return err;
+
+ pr_info("damon-ibs: AMD IBS backend registered (max_cnt=%u, pmu_type=%u)\n",
+ damon_ibs_max_cnt, ibs_pmu_type);
+ return 0;
+}
+
+static void __exit damon_ibs_exit(void)
+{
+ damon_ibs_stop();
+ damon_unregister_ops(DAMON_OPS_PADDR_IBS);
+}
+
+module_init(damon_ibs_init);
+module_exit(damon_ibs_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Ravi Jonnalagadda <ravis.opensrc@gmail.com>");
+MODULE_DESCRIPTION("AMD IBS-based access sampling backend for DAMON");
--
2.43.0