From: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
To: sj@kernel.org, damon@lists.linux.dev, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org
Cc: akpm@linux-foundation.org, corbet@lwn.net, bijan311@gmail.com,
	ajayjoshi@micron.com, honggyu.kim@sk.com, yunjeong.mun@sk.com,
	ravis.opensrc@gmail.com, bharata@amd.com
Subject: [RFC PATCH 7/7] mm/damon/damon_ibs: add AMD IBS-based access sampling backend
Date: Sat, 16 May 2026 15:34:32 -0700	[thread overview]
Message-ID: <20260516223439.4033-8-ravis.opensrc@gmail.com> (raw)
In-Reply-To: <20260516223439.4033-1-ravis.opensrc@gmail.com>

Add paddr_ibs operations using AMD IBS Op sampling via
perf_event_create_kernel_counter().  IBS delivers physical-address-
keyed access reports to DAMON's shared-layer ring buffer
(damon_report_access()) without depending on PTE Accessed-bit
scanning or page faults.

Per-CPU IBS events are created and torn down via cpuhp notifiers
(CPUHP_AP_ONLINE_DYN).  Routing of access reports through the ring-
drain path is bound to ops.id == DAMON_OPS_PADDR_IBS at the dispatch
site (see "mm/damon: add sysfs binding and dispatch hookup for
paddr_ibs operations"), so .init does not need to flip a per-context
flag.

Sample-time discipline:
  - PERF_SAMPLE_PHYS_ADDR is requested in attr.sample_type, but the
    IBS perf driver only fills data->phys_addr when
    IBS_OP_DATA3.dc_phy_addr_valid is set.  Skip stale-PA samples by
    inspecting data->sample_flags rather than testing phys_addr for
    zero (which would also drop legitimate page 0).
  - PERF_SAMPLE_DATA_SRC is requested so the perf driver decodes
    IBS_OP_DATA3.{ld_op,st_op} into data->data_src.mem_op; the
    backend reports load vs store accordingly via
    damon_access_report.is_write.
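
On IBS-capable hardware, the same gating can be observed from
userspace with perf (commands illustrative; exact event and field
syntax depends on the perf build and kernel version):

```shell
# Record IBS Op samples system-wide, requesting physical addresses.
# Samples taken while IBS_OP_DATA3.dc_phy_addr_valid was clear carry
# no phys_addr, mirroring the sample_flags check in the handler below.
perf record -e ibs_op// --phys-data -a -- sleep 1
perf script -F ip,phys_addr | head
```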

Module parameters:
  - max_cnt: IBS Op MaxCnt (writable; writes call
    damon_ibs_set_sample_rate() to restart sampling at the new rate)
  - samples_total / samples_filtered: per-CPU-aggregated counters
    (read-only)
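
As a rough guide to the max_cnt/sample-rate relationship (the
arithmetic below assumes ~1e9 sampled ops per second per core, an
illustrative figure; actual rates are workload-dependent):

```shell
# Approximate IBS samples/sec for a given MaxCnt at an assumed
# 1e9 ops/sec/core.
max_cnt=262144            # module default
ops_per_sec=1000000000
echo $(( ops_per_sec / max_cnt ))
```

which prints 3814, i.e. the "~4K samples/sec/core" quoted for the
default MaxCnt.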

Source file is mm/damon/damon_ibs.c (renamed from mm/damon/ibs.c) so
the resulting module is loaded as damon_ibs.ko, avoiding the generic
"ibs" namespace.

The IBS sampling approach is derived from Bharata B Rao's pghot RFC v5
series; the attribution header is in damon_ibs.c.

Suggested-by: Bharata B Rao <bharata@amd.com>
Link: https://lore.kernel.org/linux-mm/20260129144043.231636-1-bharata@amd.com/
Signed-off-by: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
---
 mm/damon/Kconfig     |  10 ++
 mm/damon/Makefile    |   1 +
 mm/damon/damon_ibs.c | 369 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 380 insertions(+)
 create mode 100644 mm/damon/damon_ibs.c

diff --git a/mm/damon/Kconfig b/mm/damon/Kconfig
index ad629f0f31d8d..bb698d2717f34 100644
--- a/mm/damon/Kconfig
+++ b/mm/damon/Kconfig
@@ -131,4 +131,14 @@ config DAMON_ACMA
 	  min/max memory for the system and maximum memory pressure stall time
 	  ratio.
 
+config DAMON_IBS
+	tristate "AMD IBS-based access sampling for DAMON"
+	depends on DAMON_PADDR && CPU_SUP_AMD && X86_64 && PERF_EVENTS
+	help
+	  Uses AMD IBS (Instruction-Based Sampling) hardware to deliver
+	  physical-address-keyed access reports to DAMON's shared-layer
+	  ring buffer, without relying on PTE Accessed-bit scanning or
+	  page faults.  Registers as the "paddr_ibs" operations set.
+	  Requires AMD processors with IBS Op support.
+
 endmenu
diff --git a/mm/damon/Makefile b/mm/damon/Makefile
index 22494754f41e8..109d0fb1db97d 100644
--- a/mm/damon/Makefile
+++ b/mm/damon/Makefile
@@ -9,3 +9,4 @@ obj-$(CONFIG_DAMON_RECLAIM)	+= modules-common.o reclaim.o
 obj-$(CONFIG_DAMON_LRU_SORT)	+= modules-common.o lru_sort.o
 obj-$(CONFIG_DAMON_STAT)	+= modules-common.o stat.o
 obj-$(CONFIG_DAMON_ACMA)	+= modules-common.o acma.o
+obj-$(CONFIG_DAMON_IBS)		+= damon_ibs.o
diff --git a/mm/damon/damon_ibs.c b/mm/damon/damon_ibs.c
new file mode 100644
index 0000000000000..1dd99e91c3928
--- /dev/null
+++ b/mm/damon/damon_ibs.c
@@ -0,0 +1,369 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * DAMON IBS (Instruction-Based Sampling) backend for AMD processors.
+ *
+ * Uses AMD IBS Op sampling via the perf kernel counter infrastructure to
+ * deliver PA-keyed access reports to DAMON's shared-layer ring buffer
+ * (see damon_report_access()).  This enables physical-address hot-page
+ * detection without relying on page-table Accessed bits or page faults.
+ *
+ * The IBS sampling approach in this file derives from concepts in
+ * Bharata B Rao's pghot RFC v5 series for hot page tracking.
+ * See: https://lore.kernel.org/linux-mm/20260129144043.231636-1-bharata@amd.com/
+ *
+ * Author: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
+ */
+
+#include <linux/cpu.h>
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/percpu.h>
+#include <linux/perf_event.h>
+#include <linux/slab.h>
+#include <linux/smp.h>
+
+#include <linux/damon.h>
+#include "ops-common.h"
+
+#define DAMON_IBS_DEFAULT_MAX_CNT	262144	/* ~4K samples/s/core at ~1G ops/s */
+#define IBS_OP_PMU_TYPE_PATH	"/sys/bus/event_source/devices/ibs_op/type"
+
+static unsigned int damon_ibs_max_cnt = DAMON_IBS_DEFAULT_MAX_CNT;
+
+static int damon_ibs_set_sample_rate(unsigned int max_cnt);
+
+static int max_cnt_set(const char *val, const struct kernel_param *kp)
+{
+	unsigned int new_cnt;
+	int ret;
+
+	ret = kstrtouint(val, 0, &new_cnt);
+	if (ret)
+		return ret;
+	if (!new_cnt)
+		return -EINVAL;
+	return damon_ibs_set_sample_rate(new_cnt);
+}
+static const struct kernel_param_ops max_cnt_ops = {
+	.set = max_cnt_set,
+	.get = param_get_uint,
+};
+module_param_cb(max_cnt, &max_cnt_ops, &damon_ibs_max_cnt, 0644);
+MODULE_PARM_DESC(max_cnt,
+	"IBS MaxCnt (ops between samples). Writes restart sampling.");
+
+static DEFINE_MUTEX(damon_ibs_lock);
+static bool damon_ibs_enabled;
+static enum cpuhp_state damon_ibs_cpuhp_state;
+static unsigned int ibs_pmu_type;	/* discovered at init */
+
+static DEFINE_PER_CPU(struct perf_event *, damon_ibs_event);
+
+/*
+ * Diagnostic counters.  Incremented from NMI context, so use per-CPU
+ * counters and sum them on read.
+ */
+static DEFINE_PER_CPU(unsigned long, ibs_samples_total_pcpu);
+static DEFINE_PER_CPU(unsigned long, ibs_samples_filtered_pcpu);
+
+static unsigned long damon_ibs_sum_pcpu(unsigned long __percpu *var)
+{
+	unsigned long sum = 0;
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		sum += *per_cpu_ptr(var, cpu);
+	return sum;
+}
+
+static int samples_total_get(char *buffer, const struct kernel_param *kp)
+{
+	return sysfs_emit(buffer, "%lu\n",
+			damon_ibs_sum_pcpu(&ibs_samples_total_pcpu));
+}
+
+static int samples_filtered_get(char *buffer, const struct kernel_param *kp)
+{
+	return sysfs_emit(buffer, "%lu\n",
+			damon_ibs_sum_pcpu(&ibs_samples_filtered_pcpu));
+}
+
+static const struct kernel_param_ops samples_total_ops = {
+	.get = samples_total_get,
+};
+static const struct kernel_param_ops samples_filtered_ops = {
+	.get = samples_filtered_get,
+};
+
+module_param_cb(samples_total, &samples_total_ops, NULL, 0444);
+MODULE_PARM_DESC(samples_total, "Total IBS samples delivered (read-only)");
+module_param_cb(samples_filtered, &samples_filtered_ops, NULL, 0444);
+MODULE_PARM_DESC(samples_filtered, "IBS samples filtered out (read-only)");
+
+/**
+ * damon_ibs_overflow_handler() - IBS overflow callback.
+ * @event: the overflowing IBS Op perf event (unused here).
+ * @data: sample data; the IBS perf driver fills @data->phys_addr from
+ *        IBSDCPHYSAD only when IBS_OP_DATA3.dc_phy_addr_valid is set.
+ * @regs: interrupted register state (unused here).
+ * Context: NMI; no sleeping, no mutexes, no allocation.
+ */
+static void damon_ibs_overflow_handler(struct perf_event *event,
+				       struct perf_sample_data *data,
+				       struct pt_regs *regs)
+{
+	struct damon_access_report report;
+	unsigned long phys_addr;
+
+	if (!data)
+		return;
+
+	/*
+	 * PERF_SAMPLE_PHYS_ADDR was requested in attr.sample_type, but
+	 * the IBS perf driver only populates data->phys_addr when
+	 * IBS_OP_DATA3.dc_phy_addr_valid is set.  Skip stale-PA samples
+	 * by checking the sample_flags rather than testing phys_addr
+	 * for zero (which would also drop legitimate page 0).
+	 */
+	if (!(data->sample_flags & PERF_SAMPLE_PHYS_ADDR)) {
+		this_cpu_inc(ibs_samples_filtered_pcpu);
+		return;
+	}
+	phys_addr = data->phys_addr;
+
+	report = (struct damon_access_report){
+		.paddr = phys_addr & PAGE_MASK,
+		.size = PAGE_SIZE,
+		.cpu = smp_processor_id(),
+		.is_write = !!(data->data_src.mem_op & PERF_MEM_OP_STORE),
+	};
+	damon_report_access(&report);
+	this_cpu_inc(ibs_samples_total_pcpu);
+}
+
+static int damon_ibs_create_event(int cpu)
+{
+	struct perf_event_attr attr = {
+		.type = ibs_pmu_type,
+		.size = sizeof(attr),
+		/* config=0: IBS perf driver uses sample_period as MaxCnt. */
+		.config = 0,
+		.sample_period = damon_ibs_max_cnt,
+		.sample_type = PERF_SAMPLE_PHYS_ADDR | PERF_SAMPLE_DATA_SRC,
+		.pinned = 1,
+	};
+	struct perf_event *event;
+
+	event = perf_event_create_kernel_counter(&attr, cpu, NULL,
+						 damon_ibs_overflow_handler,
+						 NULL);
+	if (IS_ERR(event))
+		return PTR_ERR(event);
+
+	/*
+	 * perf_event_create_kernel_counter() returns the event already
+	 * enabled; no perf_event_enable() needed here.
+	 */
+	per_cpu(damon_ibs_event, cpu) = event;
+	return 0;
+}
+
+static void damon_ibs_destroy_event(int cpu)
+{
+	struct perf_event *event = per_cpu(damon_ibs_event, cpu);
+
+	if (!event)
+		return;
+
+	perf_event_disable(event);
+	perf_event_release_kernel(event);
+	per_cpu(damon_ibs_event, cpu) = NULL;
+}
+
+static int damon_ibs_cpu_online(unsigned int cpu)
+{
+	int ret = damon_ibs_create_event(cpu);
+
+	if (ret)
+		pr_warn_ratelimited(
+			"damon-ibs: failed to create perf_event on cpu %u (err %d); "
+			"this cpu will not contribute samples\n", cpu, ret);
+	return 0;	/* never block CPU online */
+}
+
+static int damon_ibs_cpu_offline(unsigned int cpu)
+{
+	damon_ibs_destroy_event(cpu);
+	return 0;
+}
+
+/* Caller must hold damon_ibs_lock. */
+static int __damon_ibs_start(void)
+{
+	int ret;
+
+	if (damon_ibs_enabled)
+		return -EBUSY;
+
+	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "damon/ibs:online",
+				damon_ibs_cpu_online, damon_ibs_cpu_offline);
+	if (ret < 0)
+		return ret;
+	damon_ibs_cpuhp_state = ret;
+
+	damon_ibs_enabled = true;
+	pr_info_once("damon-ibs: first start (max_cnt=%u, pmu_type=%u)\n",
+		     damon_ibs_max_cnt, ibs_pmu_type);
+	return 0;
+}
+
+/* Caller must hold damon_ibs_lock. */
+static void __damon_ibs_stop(void)
+{
+	if (!damon_ibs_enabled)
+		return;
+
+	cpuhp_remove_state(damon_ibs_cpuhp_state);
+	damon_ibs_enabled = false;
+}
+
+static int damon_ibs_start(void)
+{
+	int ret;
+
+	mutex_lock(&damon_ibs_lock);
+	ret = __damon_ibs_start();
+	mutex_unlock(&damon_ibs_lock);
+	return ret;
+}
+
+static void damon_ibs_stop(void)
+{
+	mutex_lock(&damon_ibs_lock);
+	__damon_ibs_stop();
+	mutex_unlock(&damon_ibs_lock);
+}
+
+/**
+ * damon_ibs_set_sample_rate() - Set IBS sampling interval.
+ * @max_cnt: IBS Op MaxCnt value (ops between samples).
+ *           Higher = fewer samples/sec.
+ *
+ * If IBS is already running, restart it with the new rate.
+ *
+ * Return: 0 on success; if a restart was required and failed,
+ * propagate the error so callers (e.g. the max_cnt module-param
+ * .set callback) surface it to userspace instead of silently
+ * leaving sampling stopped.
+ */
+static int damon_ibs_set_sample_rate(unsigned int max_cnt)
+{
+	int ret = 0;
+
+	mutex_lock(&damon_ibs_lock);
+	damon_ibs_max_cnt = max_cnt ? max_cnt : DAMON_IBS_DEFAULT_MAX_CNT;
+
+	if (damon_ibs_enabled) {
+		__damon_ibs_stop();
+		ret = __damon_ibs_start();
+		if (ret)
+			pr_warn("damon-ibs: restart failed: %d\n", ret);
+	}
+	mutex_unlock(&damon_ibs_lock);
+	return ret;
+}
+
+
+static void damon_ibs_init_ctx(struct damon_ctx *ctx)
+{
+	int ret;
+
+	/* IBS is the access-detection source for this ctx. */
+	ctx->sample_control.primitives_enabled.page_table = false;
+
+	ret = damon_ibs_start();
+	if (ret && ret != -EBUSY)
+		pr_warn("damon-ibs: failed to start IBS sampling: %d\n", ret);
+}
+
+/**
+ * damon_ibs_discover_pmu_type() - Discover IBS Op PMU type from sysfs.
+ *
+ * Reads /sys/bus/event_source/devices/ibs_op/type to get the PMU type
+ * identifier needed for perf_event_attr.type.
+ *
+ * TODO: replace sysfs-read with a PMU lookup API when one becomes
+ * available.
+ *
+ * Return: 0 on success, negative error code otherwise.
+ */
+static int damon_ibs_discover_pmu_type(void)
+{
+	struct file *f;
+	char buf[16];
+	loff_t pos = 0;
+	ssize_t len;
+	int ret;
+
+	f = filp_open(IBS_OP_PMU_TYPE_PATH, O_RDONLY, 0);
+	if (IS_ERR(f))
+		return PTR_ERR(f);
+
+	len = kernel_read(f, buf, sizeof(buf) - 1, &pos);
+	filp_close(f, NULL);
+	if (len <= 0)
+		return -EIO;
+
+	buf[len] = '\0';
+	ret = kstrtouint(strim(buf), 10, &ibs_pmu_type);
+	if (ret)
+		return ret;
+
+	pr_info("damon-ibs: discovered ibs_op PMU type=%u\n", ibs_pmu_type);
+	return 0;
+}
+
+static int __init damon_ibs_init(void)
+{
+	struct damon_operations ops = {
+		.id = DAMON_OPS_PADDR_IBS,
+		.owner = THIS_MODULE,
+		.init = damon_ibs_init_ctx,
+		.prepare_access_checks = damon_pa_prepare_access_checks,
+		.check_accesses = damon_pa_check_accesses,
+		.apply_probes = damon_pa_apply_probes,
+		.apply_scheme = damon_pa_apply_scheme,
+		.get_scheme_score = damon_pa_scheme_score,
+	};
+	int err;
+
+	if (!boot_cpu_has(X86_FEATURE_IBS))
+		return -ENODEV;
+
+	err = damon_ibs_discover_pmu_type();
+	if (err) {
+		pr_err("damon-ibs: failed to discover IBS PMU type: %d\n", err);
+		return err;
+	}
+
+	err = damon_register_ops(&ops);
+	if (err)
+		return err;
+
+	pr_info("damon-ibs: AMD IBS backend registered (max_cnt=%u, pmu_type=%u)\n",
+		damon_ibs_max_cnt, ibs_pmu_type);
+	return 0;
+}
+
+static void __exit damon_ibs_exit(void)
+{
+	damon_ibs_stop();
+	damon_unregister_ops(DAMON_OPS_PADDR_IBS);
+}
+
+module_init(damon_ibs_init);
+module_exit(damon_ibs_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Ravi Jonnalagadda <ravis.opensrc@gmail.com>");
+MODULE_DESCRIPTION("AMD IBS-based access sampling backend for DAMON");
-- 
2.43.0


Thread overview: 8+ messages
2026-05-16 22:34 [RFC PATCH 0/7] mm/damon: hardware-sampled access reports + AMD IBS Op example Ravi Jonnalagadda
2026-05-16 22:34 ` [RFC PATCH 1/7] mm/damon/core: refcount ops owner module to prevent rmmod UAF Ravi Jonnalagadda
2026-05-16 22:34 ` [RFC PATCH 2/7] mm/damon/paddr: export damon_pa_* ops for IBS module Ravi Jonnalagadda
2026-05-16 22:34 ` [RFC PATCH 3/7] mm/damon/core: replace mutex-protected report buffer with per-CPU lockless ring Ravi Jonnalagadda
2026-05-16 22:34 ` [RFC PATCH 4/7] mm/damon/core: flat-array snapshot + bsearch in ring-drain loop Ravi Jonnalagadda
2026-05-16 22:34 ` [RFC PATCH 5/7] mm/damon: add sysfs binding and dispatch hookup for paddr_ibs operations Ravi Jonnalagadda
2026-05-16 22:34 ` [RFC PATCH 6/7] mm/damon/core: accept paddr_ibs in node_eligible_mem_bp ops check Ravi Jonnalagadda
2026-05-16 22:34 ` Ravi Jonnalagadda [this message]
