Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 6.12 374/666] perf evsel: Add alternate_hw_config and use in evsel__match
From: Greg Kroah-Hartman @ 2026-05-20 16:19 UTC (permalink / raw)
  To: stable
  Cc: Greg Kroah-Hartman, patches, Namhyung Kim, Ian Rogers, Kan Liang,
	James Clark, Yang Jihong, Dominique Martinet, Colin Ian King,
	Howard Chu, Yunseong Kim, Ze Gao, Yicong Yang, Weilin Wang,
	Will Deacon, Mike Leach, Jing Zhang, Yang Li, Leo Yan, ak,
	Athira Rajeev, linux-arm-kernel, Sun Haiyong, John Garry,
	Sasha Levin
In-Reply-To: <20260520162111.222830634@linuxfoundation.org>

6.12-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ian Rogers <irogers@google.com>

[ Upstream commit 22a4db3c36034e2b034c5b88414680857fc59cf4 ]

There are cases where we want to match events like instructions and
cycles with legacy hardware values, in particular in stat-shadow's
hard coded metrics. An evsel's name isn't a good point of reference as
it gets altered, strstr would be too imprecise and re-parsing the
event from its name is silly. Instead, hold the legacy hardware event
name, determined during parsing, in the evsel for this matching case.

Inline evsel__match2 that is only used in builtin-diff.

Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Yunseong Kim <yskelg@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: ak@linux.intel.com
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240926144851.245903-2-james.clark@linaro.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Stable-dep-of: c9ef786c0970 ("perf cgroup: Update metric leader in evlist__expand_cgroup")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 tools/perf/builtin-diff.c      |  6 ++--
 tools/perf/util/evsel.c        | 21 ++++++++++++
 tools/perf/util/evsel.h        | 19 ++---------
 tools/perf/util/parse-events.c | 59 +++++++++++++++++++++-------------
 tools/perf/util/parse-events.h |  8 ++++-
 tools/perf/util/parse-events.y |  2 +-
 tools/perf/util/pmu.c          |  6 +++-
 tools/perf/util/pmu.h          |  2 +-
 8 files changed, 77 insertions(+), 46 deletions(-)

diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 23326dd203339..82fb7773e03e6 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -469,13 +469,13 @@ static int diff__process_sample_event(const struct perf_tool *tool,
 
 static struct perf_diff pdiff;
 
-static struct evsel *evsel_match(struct evsel *evsel,
-				      struct evlist *evlist)
+static struct evsel *evsel_match(struct evsel *evsel, struct evlist *evlist)
 {
 	struct evsel *e;
 
 	evlist__for_each_entry(evlist, e) {
-		if (evsel__match2(evsel, e))
+		if ((evsel->core.attr.type == e->core.attr.type) &&
+		    (evsel->core.attr.config == e->core.attr.config))
 			return e;
 	}
 
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index dda107b12b8c6..6e8d70ec05bad 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -299,6 +299,7 @@ void evsel__init(struct evsel *evsel,
 	evsel->pmu_name      = NULL;
 	evsel->group_pmu_name = NULL;
 	evsel->skippable     = false;
+	evsel->alternate_hw_config = PERF_COUNT_HW_MAX;
 }
 
 struct evsel *evsel__new_idx(struct perf_event_attr *attr, int idx)
@@ -445,6 +446,8 @@ struct evsel *evsel__clone(struct evsel *orig)
 	if (evsel__copy_config_terms(evsel, orig) < 0)
 		goto out_err;
 
+	evsel->alternate_hw_config = orig->alternate_hw_config;
+
 	return evsel;
 
 out_err:
@@ -1856,6 +1859,24 @@ static int evsel__read_tool(struct evsel *evsel, int cpu_map_idx, int thread)
 	return 0;
 }
 
+bool __evsel__match(const struct evsel *evsel, u32 type, u64 config)
+{
+
+	u32 e_type = evsel->core.attr.type;
+	u64 e_config = evsel->core.attr.config;
+
+	if (e_type != type) {
+		return type == PERF_TYPE_HARDWARE && evsel->pmu && evsel->pmu->is_core &&
+			evsel->alternate_hw_config == config;
+	}
+
+	if ((type == PERF_TYPE_HARDWARE || type == PERF_TYPE_HW_CACHE) &&
+	    perf_pmus__supports_extended_type())
+		e_config &= PERF_HW_EVENT_MASK;
+
+	return e_config == config;
+}
+
 int evsel__read_counter(struct evsel *evsel, int cpu_map_idx, int thread)
 {
 	if (evsel__is_tool(evsel))
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 26574a33a7250..dc0d300776f16 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -102,6 +102,7 @@ struct evsel {
 		int			bpf_fd;
 		struct bpf_object	*bpf_obj;
 		struct list_head	config_terms;
+		u64			alternate_hw_config;
 	};
 
 	/*
@@ -395,26 +396,10 @@ u64 format_field__intval(struct tep_format_field *field, struct perf_sample *sam
 struct tep_format_field *evsel__field(struct evsel *evsel, const char *name);
 struct tep_format_field *evsel__common_field(struct evsel *evsel, const char *name);
 
-static inline bool __evsel__match(const struct evsel *evsel, u32 type, u64 config)
-{
-	if (evsel->core.attr.type != type)
-		return false;
-
-	if ((type == PERF_TYPE_HARDWARE || type == PERF_TYPE_HW_CACHE)  &&
-	    perf_pmus__supports_extended_type())
-		return (evsel->core.attr.config & PERF_HW_EVENT_MASK) == config;
-
-	return evsel->core.attr.config == config;
-}
+bool __evsel__match(const struct evsel *evsel, u32 type, u64 config);
 
 #define evsel__match(evsel, t, c) __evsel__match(evsel, PERF_TYPE_##t, PERF_COUNT_##c)
 
-static inline bool evsel__match2(struct evsel *e1, struct evsel *e2)
-{
-	return (e1->core.attr.type == e2->core.attr.type) &&
-	       (e1->core.attr.config == e2->core.attr.config);
-}
-
 int evsel__read_counter(struct evsel *evsel, int cpu_map_idx, int thread);
 
 int __evsel__read_on_cpu(struct evsel *evsel, int cpu_map_idx, int thread, bool scale);
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 9a8be1e46d674..fcc4dab618bee 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -228,7 +228,7 @@ __add_event(struct list_head *list, int *idx,
 	    bool init_attr,
 	    const char *name, const char *metric_id, struct perf_pmu *pmu,
 	    struct list_head *config_terms, bool auto_merge_stats,
-	    struct perf_cpu_map *cpu_list)
+	    struct perf_cpu_map *cpu_list, u64 alternate_hw_config)
 {
 	struct evsel *evsel;
 	struct perf_cpu_map *cpus = perf_cpu_map__is_empty(cpu_list) && pmu ? pmu->cpus : cpu_list;
@@ -264,6 +264,7 @@ __add_event(struct list_head *list, int *idx,
 	evsel->auto_merge_stats = auto_merge_stats;
 	evsel->pmu = pmu;
 	evsel->pmu_name = pmu ? strdup(pmu->name) : NULL;
+	evsel->alternate_hw_config = alternate_hw_config;
 
 	if (name)
 		evsel->name = strdup(name);
@@ -286,16 +287,19 @@ struct evsel *parse_events__add_event(int idx, struct perf_event_attr *attr,
 {
 	return __add_event(/*list=*/NULL, &idx, attr, /*init_attr=*/false, name,
 			   metric_id, pmu, /*config_terms=*/NULL,
-			   /*auto_merge_stats=*/false, /*cpu_list=*/NULL);
+			   /*auto_merge_stats=*/false, /*cpu_list=*/NULL,
+			   /*alternate_hw_config=*/PERF_COUNT_HW_MAX);
 }
 
 static int add_event(struct list_head *list, int *idx,
 		     struct perf_event_attr *attr, const char *name,
-		     const char *metric_id, struct list_head *config_terms)
+		     const char *metric_id, struct list_head *config_terms,
+		     u64 alternate_hw_config)
 {
 	return __add_event(list, idx, attr, /*init_attr*/true, name, metric_id,
 			   /*pmu=*/NULL, config_terms,
-			   /*auto_merge_stats=*/false, /*cpu_list=*/NULL) ? 0 : -ENOMEM;
+			   /*auto_merge_stats=*/false, /*cpu_list=*/NULL,
+			   alternate_hw_config) ? 0 : -ENOMEM;
 }
 
 static int add_event_tool(struct list_head *list, int *idx,
@@ -315,7 +319,8 @@ static int add_event_tool(struct list_head *list, int *idx,
 	evsel = __add_event(list, idx, &attr, /*init_attr=*/true, /*name=*/NULL,
 			    /*metric_id=*/NULL, /*pmu=*/NULL,
 			    /*config_terms=*/NULL, /*auto_merge_stats=*/false,
-			    cpu_list);
+			    cpu_list,
+			    /*alternate_hw_config=*/PERF_COUNT_HW_MAX);
 	perf_cpu_map__put(cpu_list);
 	if (!evsel)
 		return -ENOMEM;
@@ -450,7 +455,7 @@ bool parse_events__filter_pmu(const struct parse_events_state *parse_state,
 static int parse_events_add_pmu(struct parse_events_state *parse_state,
 				struct list_head *list, struct perf_pmu *pmu,
 				const struct parse_events_terms *const_parsed_terms,
-				bool auto_merge_stats);
+				bool auto_merge_stats, u64 alternate_hw_config);
 
 int parse_events_add_cache(struct list_head *list, int *idx, const char *name,
 			   struct parse_events_state *parse_state,
@@ -476,7 +481,8 @@ int parse_events_add_cache(struct list_head *list, int *idx, const char *name,
 			 */
 			ret = parse_events_add_pmu(parse_state, list, pmu,
 						   parsed_terms,
-						   perf_pmu__auto_merge_stats(pmu));
+						   perf_pmu__auto_merge_stats(pmu),
+						   /*alternate_hw_config=*/PERF_COUNT_HW_MAX);
 			if (ret)
 				return ret;
 			continue;
@@ -507,7 +513,8 @@ int parse_events_add_cache(struct list_head *list, int *idx, const char *name,
 
 		if (__add_event(list, idx, &attr, /*init_attr*/true, config_name ?: name,
 				metric_id, pmu, &config_terms, /*auto_merge_stats=*/false,
-				/*cpu_list=*/NULL) == NULL)
+				/*cpu_list=*/NULL,
+				/*alternate_hw_config=*/PERF_COUNT_HW_MAX) == NULL)
 			return -ENOMEM;
 
 		free_config_terms(&config_terms);
@@ -772,7 +779,7 @@ int parse_events_add_breakpoint(struct parse_events_state *parse_state,
 	name = get_config_name(head_config);
 
 	return add_event(list, &parse_state->idx, &attr, name, /*mertic_id=*/NULL,
-			 &config_terms);
+			&config_terms, /*alternate_hw_config=*/PERF_COUNT_HW_MAX);
 }
 
 static int check_type_val(struct parse_events_term *term,
@@ -1072,6 +1079,7 @@ static int config_term_pmu(struct perf_event_attr *attr,
 		if (perf_pmu__have_event(pmu, term->config)) {
 			term->type_term = PARSE_EVENTS__TERM_TYPE_USER;
 			term->no_value = true;
+			term->alternate_hw_config = true;
 		} else {
 			attr->type = PERF_TYPE_HARDWARE;
 			attr->config = term->val.num;
@@ -1384,8 +1392,9 @@ static int __parse_events_add_numeric(struct parse_events_state *parse_state,
 	name = get_config_name(head_config);
 	metric_id = get_config_metric_id(head_config);
 	ret = __add_event(list, &parse_state->idx, &attr, /*init_attr*/true, name,
-			metric_id, pmu, &config_terms, /*auto_merge_stats=*/false,
-			/*cpu_list=*/NULL) ? 0 : -ENOMEM;
+			  metric_id, pmu, &config_terms, /*auto_merge_stats=*/false,
+			  /*cpu_list=*/NULL, /*alternate_hw_config=*/PERF_COUNT_HW_MAX
+		) == NULL ? -ENOMEM : 0;
 	free_config_terms(&config_terms);
 	return ret;
 }
@@ -1443,7 +1452,7 @@ static bool config_term_percore(struct list_head *config_terms)
 static int parse_events_add_pmu(struct parse_events_state *parse_state,
 				struct list_head *list, struct perf_pmu *pmu,
 				const struct parse_events_terms *const_parsed_terms,
-				bool auto_merge_stats)
+				bool auto_merge_stats, u64 alternate_hw_config)
 {
 	struct perf_event_attr attr;
 	struct perf_pmu_info info;
@@ -1480,7 +1489,7 @@ static int parse_events_add_pmu(struct parse_events_state *parse_state,
 				    /*init_attr=*/true, /*name=*/NULL,
 				    /*metric_id=*/NULL, pmu,
 				    /*config_terms=*/NULL, auto_merge_stats,
-				    /*cpu_list=*/NULL);
+				    /*cpu_list=*/NULL, alternate_hw_config);
 		return evsel ? 0 : -ENOMEM;
 	}
 
@@ -1501,7 +1510,8 @@ static int parse_events_add_pmu(struct parse_events_state *parse_state,
 
 	/* Look for event names in the terms and rewrite into format based terms. */
 	if (perf_pmu__check_alias(pmu, &parsed_terms,
-				  &info, &alias_rewrote_terms, err)) {
+				  &info, &alias_rewrote_terms,
+				  &alternate_hw_config, err)) {
 		parse_events_terms__exit(&parsed_terms);
 		return -EINVAL;
 	}
@@ -1546,7 +1556,8 @@ static int parse_events_add_pmu(struct parse_events_state *parse_state,
 	evsel = __add_event(list, &parse_state->idx, &attr, /*init_attr=*/true,
 			    get_config_name(&parsed_terms),
 			    get_config_metric_id(&parsed_terms), pmu,
-			    &config_terms, auto_merge_stats, /*cpu_list=*/NULL);
+			    &config_terms, auto_merge_stats, /*cpu_list=*/NULL,
+			    alternate_hw_config);
 	if (!evsel) {
 		parse_events_terms__exit(&parsed_terms);
 		return -ENOMEM;
@@ -1567,7 +1578,7 @@ static int parse_events_add_pmu(struct parse_events_state *parse_state,
 }
 
 int parse_events_multi_pmu_add(struct parse_events_state *parse_state,
-			       const char *event_name,
+			       const char *event_name, u64 hw_config,
 			       const struct parse_events_terms *const_parsed_terms,
 			       struct list_head **listp, void *loc_)
 {
@@ -1620,7 +1631,7 @@ int parse_events_multi_pmu_add(struct parse_events_state *parse_state,
 
 		auto_merge_stats = perf_pmu__auto_merge_stats(pmu);
 		if (!parse_events_add_pmu(parse_state, list, pmu,
-					  &parsed_terms, auto_merge_stats)) {
+					  &parsed_terms, auto_merge_stats, hw_config)) {
 			struct strbuf sb;
 
 			strbuf_init(&sb, /*hint=*/ 0);
@@ -1633,7 +1644,7 @@ int parse_events_multi_pmu_add(struct parse_events_state *parse_state,
 
 	if (parse_state->fake_pmu) {
 		if (!parse_events_add_pmu(parse_state, list, perf_pmus__fake_pmu(), &parsed_terms,
-					  /*auto_merge_stats=*/true)) {
+					  /*auto_merge_stats=*/true, hw_config)) {
 			struct strbuf sb;
 
 			strbuf_init(&sb, /*hint=*/ 0);
@@ -1674,13 +1685,15 @@ int parse_events_multi_pmu_add_or_add_pmu(struct parse_events_state *parse_state
 	/* Attempt to add to list assuming event_or_pmu is a PMU name. */
 	pmu = perf_pmus__find(event_or_pmu);
 	if (pmu && !parse_events_add_pmu(parse_state, *listp, pmu, const_parsed_terms,
-					/*auto_merge_stats=*/false))
+					 /*auto_merge_stats=*/false,
+					 /*alternate_hw_config=*/PERF_COUNT_HW_MAX))
 		return 0;
 
 	if (parse_state->fake_pmu) {
 		if (!parse_events_add_pmu(parse_state, *listp, perf_pmus__fake_pmu(),
 					  const_parsed_terms,
-					  /*auto_merge_stats=*/false))
+					  /*auto_merge_stats=*/false,
+					  /*alternate_hw_config=*/PERF_COUNT_HW_MAX))
 			return 0;
 	}
 
@@ -1693,7 +1706,8 @@ int parse_events_multi_pmu_add_or_add_pmu(struct parse_events_state *parse_state
 
 			if (!parse_events_add_pmu(parse_state, *listp, pmu,
 						  const_parsed_terms,
-						  auto_merge_stats)) {
+						  auto_merge_stats,
+						  /*alternate_hw_config=*/PERF_COUNT_HW_MAX)) {
 				ok++;
 				parse_state->wild_card_pmus = true;
 			}
@@ -1704,7 +1718,8 @@ int parse_events_multi_pmu_add_or_add_pmu(struct parse_events_state *parse_state
 
 	/* Failure to add, assume event_or_pmu is an event name. */
 	zfree(listp);
-	if (!parse_events_multi_pmu_add(parse_state, event_or_pmu, const_parsed_terms, listp, loc))
+	if (!parse_events_multi_pmu_add(parse_state, event_or_pmu, PERF_COUNT_HW_MAX,
+					const_parsed_terms, listp, loc))
 		return 0;
 
 	if (asprintf(&help, "Unable to find PMU or event on a PMU of '%s'", event_or_pmu) < 0)
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index 10cc9c433116d..2b52f8d6aa29a 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -127,6 +127,12 @@ struct parse_events_term {
 	 * value is assumed to be 1. An event name also has no value.
 	 */
 	bool no_value;
+	/**
+	 * @alternate_hw_config: config is the event name but num is an
+	 * alternate PERF_TYPE_HARDWARE config value which is often nice for the
+	 * sake of quick matching.
+	 */
+	bool alternate_hw_config;
 };
 
 struct parse_events_error {
@@ -238,7 +244,7 @@ struct evsel *parse_events__add_event(int idx, struct perf_event_attr *attr,
 				      struct perf_pmu *pmu);
 
 int parse_events_multi_pmu_add(struct parse_events_state *parse_state,
-			       const char *event_name,
+			       const char *event_name, u64 hw_config,
 			       const struct parse_events_terms *const_parsed_terms,
 			       struct list_head **listp, void *loc);
 
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index b3c51f06cbdc4..dcf47fabdfdd7 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -292,7 +292,7 @@ PE_NAME sep_dc
 	struct list_head *list;
 	int err;
 
-	err = parse_events_multi_pmu_add(_parse_state, $1, NULL, &list, &@1);
+	err = parse_events_multi_pmu_add(_parse_state, $1, PERF_COUNT_HW_MAX, NULL, &list, &@1);
 	if (err < 0) {
 		struct parse_events_state *parse_state = _parse_state;
 		struct parse_events_error *error = parse_state->error;
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 8b4e346808b4c..8885998c19530 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -1606,7 +1606,7 @@ static int check_info_data(struct perf_pmu *pmu,
  */
 int perf_pmu__check_alias(struct perf_pmu *pmu, struct parse_events_terms *head_terms,
 			  struct perf_pmu_info *info, bool *rewrote_terms,
-			  struct parse_events_error *err)
+			  u64 *alternate_hw_config, struct parse_events_error *err)
 {
 	struct parse_events_term *term, *h;
 	struct perf_pmu_alias *alias;
@@ -1638,6 +1638,7 @@ int perf_pmu__check_alias(struct perf_pmu *pmu, struct parse_events_terms *head_
 						NULL);
 			return ret;
 		}
+
 		*rewrote_terms = true;
 		ret = check_info_data(pmu, alias, info, err, term->err_term);
 		if (ret)
@@ -1646,6 +1647,9 @@ int perf_pmu__check_alias(struct perf_pmu *pmu, struct parse_events_terms *head_
 		if (alias->per_pkg)
 			info->per_pkg = true;
 
+		if (term->alternate_hw_config)
+			*alternate_hw_config = term->val.num;
+
 		list_del_init(&term->list);
 		parse_events_term__delete(term);
 	}
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index bcd278b9b546f..0222124b86b92 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -220,7 +220,7 @@ __u64 perf_pmu__format_bits(struct perf_pmu *pmu, const char *name);
 int perf_pmu__format_type(struct perf_pmu *pmu, const char *name);
 int perf_pmu__check_alias(struct perf_pmu *pmu, struct parse_events_terms *head_terms,
 			  struct perf_pmu_info *info, bool *rewrote_terms,
-			  struct parse_events_error *err);
+			  u64 *alternate_hw_config, struct parse_events_error *err);
 int perf_pmu__find_event(struct perf_pmu *pmu, const char *event, void *state, pmu_event_callback cb);
 
 void perf_pmu_format__set_value(void *format, int config, unsigned long *bits);
-- 
2.53.0





^ permalink raw reply related

* Re: [PATCH v4 11/24] iommu: Add iommu_report_device_broken() to quarantine a broken device
From: Nicolin Chen @ 2026-05-20 18:13 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Will Deacon, Robin Murphy, Joerg Roedel, Bjorn Helgaas,
	Rafael J . Wysocki, Len Brown, Pranjal Shrivastava, Mostafa Saleh,
	Lu Baolu, Kevin Tian, linux-arm-kernel, iommu, linux-kernel,
	linux-acpi, linux-pci, vsethi, Shuai Xue
In-Reply-To: <20260520175123.GZ3602937@nvidia.com>

On Wed, May 20, 2026 at 02:51:23PM -0300, Jason Gunthorpe wrote:
> On Wed, May 20, 2026 at 12:20:25AM -0700, Nicolin Chen wrote:
> > > > I see you suggest to treat the entire batch as ATS-broken. Just to
> > > > confirm: without per-SID retry, that might falsely block a healthy
> > > > device in the ATC batch, right? The driver now batches all ATC_INV
> > > > commands via arm_smmu_invs_end_batch().
> > > 
> > > Yes, it is not good, but a giant complex series is not reviewable. So
> > > I'd start with trashing all the devices, then come with a narrowing.
> > 
> > I can take that path for now and leave a FIXME.
> > 
> > Another option is to not batch multiple devices, until we support
> > retry (which shouldn't be hard to add since we've already done the
> > coding)?
> 
> That's an interesting idea, it undoes some of the meaningful
> optimization we have recently done though :\

I remember you didn't like it. That's why we had the retry(), which
I feel we should keep it..

> > > We cannot eliminate parallel ATS invalidation. Two threads could be
> > > concurrently processing the invs list. So it has handle it, the driver
> > > is going to have to tolerate a number of redundant error events.
> > 
> > OK. That sounds like we still need a flag or locking so that at
> > least pci_disable_ats() would not be called again. I will see
> > what I can do.
> 
> I think we can call pci_disable_ats() as many times as we want

That triggers WARN_ON(!dev->ats_enabled) in pci_disable_ats :-(

> we
> mostly need the driver to merge multiple error notifications for the
> same event.

Yes.

> > > But I wasn't thinking we can rely on existing AER events here, yes
> > > probably there will be AERs associated with the device exploding so
> > > badly it cannot do ATS, but also maybe not..
> > 
> > So, should I put the AER injection on hold for a future work? To
> > be honest, I am still not very clear how AER injection could help
> > here; or is it for a case where ATC times out while device isn't
> > aware of any AER fault?
> 
> Right, if we don't get an AER fault then we should ensure the ATC is
> surfaced, but you have a reasonable point that it isn't so likely the
> get an ATC invalidation timeout without a corresponding related AER..
> 
> Still, I'd feel better if it is was definititive and we didn't rely on
> this. This further points that the driver has to merge multiple error
> notifications if it gets some AERs and a new "ATC ERROR" all for the
> same key event.

I feel some race here... Part of the complexity of this v4 is to deal
with concurrent device reset during the async report() between IOMMU
core and driver. Now, we add AER that could compete on the device side
as well...

I will see what I can do here, yet likely would defer it to a followup
series, given the direction is to shrink the size of the series.

Thanks
Nicolin


^ permalink raw reply

* Re: [PATCH 5/8] dt-bindings: arm: ras: Introduce bindings for ARM AEST
From: Umang Chheda @ 2026-05-20 18:13 UTC (permalink / raw)
  To: Rob Herring
  Cc: Ruidong Tian, Tony Luck, Borislav Petkov, Krzysztof Kozlowski,
	Conor Dooley, Bjorn Andersson, Konrad Dybcio, catalin.marinas,
	will, lpieralisi, rafael, mark.rutland, Sudeep Holla,
	linux-arm-msm, linux-acpi, linux-arm-kernel, linux-edac,
	linux-kernel, devicetree
In-Reply-To: <20260513175823.GA1471517-robh@kernel.org>

Hello Rob,

Thanks for helping reviewing the code!

On 5/13/2026 11:28 PM, Rob Herring wrote:
> On Tue, May 05, 2026 at 05:53:49PM +0530, Umang Chheda wrote:
>> The Arm Error Source Table (AEST) specification describes how firmware
>> exposes RAS error source topology to the operating system. On ACPI
>> systems this information is provided via the AEST ACPI table.
>>
>> Introduce Device Tree bindings that provide an equivalent description
>> of AEST error sources for DT-based platforms.
>>
>> Signed-off-by: Umang Chheda <umang.chheda@oss.qualcomm.com>
>> ---
>>  .../devicetree/bindings/arm/arm,aest.yaml          | 406 +++++++++++++++++++++
>>  include/dt-bindings/arm/aest.h                     |  43 +++
>>  2 files changed, 449 insertions(+)
>>
>> diff --git a/Documentation/devicetree/bindings/arm/arm,aest.yaml b/Documentation/devicetree/bindings/arm/arm,aest.yaml
>> new file mode 100644
>> index 000000000000..7809a0d38270
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/arm/arm,aest.yaml
>> @@ -0,0 +1,406 @@
>> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
>> +%YAML 1.2
>> +---
>> +$id: http://devicetree.org/schemas/arm/arm,aest.yaml#
>> +$schema: http://devicetree.org/meta-schemas/core.yaml#
>> +
>> +title: Arm Error Source Table (AEST)
>> +
>> +maintainers:
>> +  - Umang Chheda <umang.chheda@oss.qualcomm.com>
>> +
>> +description:
>> +  The Arm Error Source Table (AEST) describes RAS error sources and their
>> +  register interfaces. Each error source exposes one or more error records
>> +  through either system registers or a memory-mapped register window, and
>> +  may signal errors via interrupts. The top-level node acts as a container
>> +  for one or more child nodes, each describing a single AEST error source.
>> +  Refer to the Arm AEST specification (DEN0085 / DDI 0587B) for details.
>> +  Flag bit constants for use in DT source files are defined in
>> +  <dt-bindings/arm/aest.h>.
>> +
>> +properties:
>> +  compatible:
>> +    const: arm,aest
>> +
>> +  "#address-cells":
>> +    const: 2
>> +
>> +  "#size-cells":
>> +    const: 2
>> +
>> +  ranges: true
>> +
>> +required:
>> +  - compatible
>> +
>> +additionalProperties: false
>> +
>> +patternProperties:
>> +  "^aest-[a-z0-9-]+(@[0-9a-f]+)?$":
>> +    type: object
>> +    description:
>> +      An AEST error source node describing one error source defined by
>> +      the Arm AEST specification.
>> +
>> +    properties:
>> +      compatible:
>> +        description:
>> +          Identifies the type of AEST error source. Each value corresponds to
>> +          a distinct error source class defined by the Arm AEST specification.
>> +          arm,aest-proxy represents a proxy error source that forwards errors
>> +          from another error source.
>> +        enum:
>> +          - arm,aest-processor
>> +          - arm,aest-memory
>> +          - arm,aest-smmu
>> +          - arm,aest-gic
>> +          - arm,aest-pcie
>> +          - arm,aest-vendor
>> +          - arm,aest-proxy
> 
> This is a fundamental difference how DT and ACPI get structured. ACPI 
> defines new table for some feature and puts everything in that table. 
> For DT, these all belong in the node for the corresponding h/w. For 
> example, if the GIC supports AEST, then that belongs in the GIC node.

Thanks for the feedback. To clarify your suggestion — should the AEST
RAS properties be added directly as properties of the hardware node
(e.g. arm,ras-num-records inside the cpu@0 node itself), or as a child
node under the hardware node (e.g. a ras-error-source {} child under cpu@0)?


> 
>> +
>> +      reg:
>> +        description:
>> +          Register ranges for the error source. Absence of reg implies
>> +          system-register access (interface type 0). A single range implies
>> +          memory-mapped access (interface type 1). Two ranges imply
>> +          single-record memory-mapped access (interface type 2).
>> +        minItems: 1
>> +        maxItems: 4
>> +
>> +      reg-names:
>> +        description:
>> +          Names for the register ranges. The base error-record window is
>> +          unnamed (or first entry). Optional named ranges provide access to
>> +          the fault-injection, error-group, and interrupt-config register
>> +          windows defined by the AEST specification.
>> +        minItems: 1
>> +        maxItems: 4
>> +        items:
>> +          enum:
>> +            - fault-inject
>> +            - err-group
>> +            - irq-config
>> +
>> +      interrupts:
>> +        description: Interrupts associated with the error source.
>> +        minItems: 1
>> +        maxItems: 2
>> +
>> +      interrupt-names:
>> +        description: Names of the interrupts associated with the error source.
>> +        minItems: 1
>> +        maxItems: 2
>> +        items:
>> +          enum:
>> +            - fhi
>> +            - eri
>> +
>> +      arm,fhi-flags:
>> +        description:
>> +          Bitmask of flags for the fault-handling interrupt (FHI), as defined
>> +          in the AEST node interrupt structure flags field. Constants are
>> +          defined in <dt-bindings/arm/aest.h> - AEST_IRQ_MODE_LEVEL (0),
>> +          AEST_IRQ_MODE_EDGE (1).
> 
> DT already has a way to define interrupt flags. Why invent something 
> new?

Ack, this flag is not needed for DT based systems, will remove this.

> 
> Rob

Thanks,
Umang




^ permalink raw reply

* [PATCH] clk: moxart: fix refcount leak
From: Alexander A. Klimov @ 2026-05-20 17:55 UTC (permalink / raw)
  To: Krzysztof Kozlowski, Michael Turquette, Stephen Boyd,
	Brian Masney, Jonas Jensen, Mike Turquette,
	moderated list:ARM/MOXA ART SOC, open list:COMMON CLK FRAMEWORK,
	open list
  Cc: Alexander A. Klimov

Every value returned from of_clk_get() is supposed to be cleaned up
via clk_put() once not needed anymore.
The values here are used only for error checking,
but weren't cleaned up until now.

Fixes: c7bb4fc16ead ("clk: add MOXA ART SoCs clock driver")
Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
---
 drivers/clk/clk-moxart.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/clk/clk-moxart.c b/drivers/clk/clk-moxart.c
index 3786a0153ad1..7e191b1481bb 100644
--- a/drivers/clk/clk-moxart.c
+++ b/drivers/clk/clk-moxart.c
@@ -39,6 +39,7 @@ static void __init moxart_of_pll_clk_init(struct device_node *node)
 		pr_err("%pOF: of_clk_get failed\n", node);
 		return;
 	}
+	clk_put(ref_clk);
 
 	hw = clk_hw_register_fixed_factor(NULL, name, parent_name, 0, mul, 1);
 	if (IS_ERR(hw)) {
@@ -83,6 +84,7 @@ static void __init moxart_of_apb_clk_init(struct device_node *node)
 		pr_err("%pOF: of_clk_get failed\n", node);
 		return;
 	}
+	clk_put(pll_clk);
 
 	hw = clk_hw_register_fixed_factor(NULL, name, parent_name, 0, 1, div);
 	if (IS_ERR(hw)) {
-- 
2.54.0



^ permalink raw reply related

* Re: [PATCH v4 1/3] PCI: Allow ATS to be always on for CXL.cache capable devices
From: Jason Gunthorpe @ 2026-05-20 17:56 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Nicolin Chen, will, robin.murphy, bhelgaas, joro, praan, baolu.lu,
	kevin.tian, miko.lenczewski, linux-arm-kernel, iommu,
	linux-kernel, linux-pci, dan.j.williams, jonathan.cameron, vsethi,
	linux-cxl, nirmoyd
In-Reply-To: <20260520174758.GA66039@bhelgaas>

On Wed, May 20, 2026 at 12:47:58PM -0500, Bjorn Helgaas wrote:

> I don't know enough about CXL to know what's behind the ATS
> requirement.  It sounds like it's more than a simple performance
> optimization.  If you happen to know the reason, it might be worth
> a short comment about that too.

At the core of this is underlying physical interconnect protocols that
only work with translated addresses.

Ie CXL.cache only has a definition for translated physical in its
protocol spec. The use of true physical only is due to the cache
coherence shootdown protocol..

It is why I suggested 'pci_translated_required()' earlier, there are a
few more than CXl.cache why a device might need translated physical
addresses only.

ATS is the only way for a device to get those addresses, so
ats_required is fine too, but it sort of glosses over what is driving
it.

Jason


^ permalink raw reply

* Re: [PATCH v4 2/3] PCI: Allow ATS to be always on for pre-CXL devices
From: Jason Gunthorpe @ 2026-05-20 17:53 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Nicolin Chen, will, robin.murphy, bhelgaas, joro, praan, baolu.lu,
	kevin.tian, miko.lenczewski, linux-arm-kernel, iommu,
	linux-kernel, linux-pci, dan.j.williams, jonathan.cameron, vsethi,
	linux-cxl, nirmoyd
In-Reply-To: <20260520175026.GA67027@bhelgaas>

On Wed, May 20, 2026 at 12:50:26PM -0500, Bjorn Helgaas wrote:
> On Sun, Apr 26, 2026 at 10:54:01PM -0700, Nicolin Chen wrote:
> > Some NVIDIA GPU/NIC devices, though they don't implement CXL config space,
> > have many CXL-like properties. Call this kind "pre-CXL".
> > 
> > Similar to CXL.cache capability, these pre-CXL devices also require the ATS
> > function even when their RIDs are IOMMU bypassed, i.e. keep ATS "always on"
> > v.s. "on demand" when a non-zero PASID line gets enabled in SVA use cases.
> > ...
> 
> > +/* Some pre-CXL devices require ATS when it is IOMMU-bypassed */
> 
> I guess these devices are purely PCIe, with no actual CXL
> transactions, so a hint here about what leads to the ATS requirement
> would be useful. 

Not quite, they are not actually "real" PCIe devices, these are
devices that present to the system as PCIe, can issue CXL like .cache
operations and don't use PCIe electrical.

Jason


^ permalink raw reply

* Re: [PATCH v4 11/24] iommu: Add iommu_report_device_broken() to quarantine a broken device
From: Jason Gunthorpe @ 2026-05-20 17:51 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Will Deacon, Robin Murphy, Joerg Roedel, Bjorn Helgaas,
	Rafael J . Wysocki, Len Brown, Pranjal Shrivastava, Mostafa Saleh,
	Lu Baolu, Kevin Tian, linux-arm-kernel, iommu, linux-kernel,
	linux-acpi, linux-pci, vsethi, Shuai Xue
In-Reply-To: <ag1guf1fHg4fE7Lw@Asurada-Nvidia>

On Wed, May 20, 2026 at 12:20:25AM -0700, Nicolin Chen wrote:
> > > I see you suggest to treat the entire batch as ATS-broken. Just to
> > > confirm: without per-SID retry, that might falsely block a healthy
> > > device in the ATC batch, right? The driver now batches all ATC_INV
> > > commands via arm_smmu_invs_end_batch().
> > 
> > Yes, it is not good, but a giant complex series is not reviewable. So
> > I'd start with trashing all the devices, then come with a narrowing.
> 
> I can take that path for now and leave a FIXME.
> 
> Another option is to not batch multiple devices, until we support
> retry (which shouldn't be hard to add since we've already done the
> coding)?

That's an interesting idea, it undoes some of the meaningful
optimization we have recently done though :\

> > We cannot eliminate parallel ATS invalidation. Two threads could be
> > concurrently processing the invs list. So it has handle it, the driver
> > is going to have to tolerate a number of redundant error events.
> 
> OK. That sounds like we still need a flag or locking so that at
> least pci_disable_ats() would not be called again. I will see
> what I can do.

I think we can call pci_disable_ats() as many times as we want, we
mostly need the driver to merge multiple error notifications for the
same event.

> > It depends on the driver, mlx5 has a FLR RAS flow for instance.
> 
> I assume a driver like that would trigger FLR flow on its own?

Yes
 
> > A driver with a device that can blow up ATS should implement the FLR
> > flow if it wants automatic RAS. It requires driver co-ordination.
> 
> Or FLR via sysfs, which I have been doing...

Yes
 
> > But I wasn't thinking we can rely on existing AER events here, yes
> > probably there will be AERs associated with the device exploding so
> > badly it cannot do ATS, but also maybe not..
> 
> So, should I put the AER injection on hold for a future work? To
> be honest, I am still not very clear how AER injection could help
> here; or is it for a case where ATC times out while device isn't
> aware of any AER fault?

Right, if we don't get an AER fault then we should ensure the ATC is
surfaced, but you have a reasonable point that it isn't so likely the
get an ATC invalidation timeout without a corresponding related AER..

Still, I'd feel better if it is was definititive and we didn't rely on
this. This further points that the driver has to merge multiple error
notifications if it gets some AERs and a new "ATC ERROR" all for the
same key event.

Jason


^ permalink raw reply

* Re: [PATCH v22 08/13] mfd: core: Add firmware-node support to MFD cells
From: Shivendra Pratap @ 2026-05-20 17:51 UTC (permalink / raw)
  To: Bartosz Golaszewski, Lee Jones
  Cc: linux-pm, linux-kernel, linux-arm-msm, linux-arm-kernel,
	devicetree, Florian Fainelli, Krzysztof Kozlowski,
	Dmitry Baryshkov, Mukesh Ojha, Andre Draszik, Greg Kroah-Hartman,
	Kathiravan Thirumoorthy, Srinivas Kandagatla, Bartosz Golaszewski,
	Sebastian Reichel, Mark Rutland, Lorenzo Pieralisi,
	Rafael J. Wysocki, Daniel Lezcano, Christian Loehle, Ulf Hansson,
	Rob Herring, Krzysztof Kozlowski, Conor Dooley, Bjorn Andersson,
	Konrad Dybcio, Arnd Bergmann, Souvik Chakravarty, Andy Yan,
	Matthias Brugger, John Stultz, Moritz Fischer, Sudeep Holla
In-Reply-To: <f604833d-b333-4514-91fa-3cf95f99f9e7@oss.qualcomm.com>



On 18-05-2026 22:11, Shivendra Pratap wrote:
> 
> 
> On 18-05-2026 14:27, Bartosz Golaszewski wrote:
>> On Thu, 14 May 2026 16:25:49 +0200, Shivendra Pratap
>> <shivendra.pratap@oss.qualcomm.com> said:
>>> MFD core has no way to register a child device using an explicit 
>>> firmware
>>> node. This prevents drivers from registering child nodes when those 
>>> nodes
>>> do not define a compatible string. One such example is the PSCI
>>> "reboot-mode" node, which omits a compatible string as it describes
>>> boot-states provided by the underlying firmware.
>>>
>>> Extend struct mfd_cell with a callback that allows drivers to provide an
>>> explicit firmware node. The node is added to the MFD child device during
>>> registration when none is assigned by device tree, ACPI, or software
>>> matching.
>>>
>>> Suggested-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
>>> Signed-off-by: Shivendra Pratap <shivendra.pratap@oss.qualcomm.com>
>>> ---
>>>   drivers/mfd/mfd-core.c   | 30 ++++++++++++++++++++++++++++++
>>>   include/linux/mfd/core.h | 14 ++++++++++++++
>>>   2 files changed, 44 insertions(+)
>>>
>>> diff --git a/drivers/mfd/mfd-core.c b/drivers/mfd/mfd-core.c
>>> index 
>>> 7aa32b90cf1eb7fa0a05bf3dc506e60a262c9850..cc2a2a924d6d3044e29a9f864b536ee325ed797b 100644
>>> --- a/drivers/mfd/mfd-core.c
>>> +++ b/drivers/mfd/mfd-core.c
>>> @@ -10,6 +10,7 @@
>>>   #include <linux/kernel.h>
>>>   #include <linux/platform_device.h>
>>>   #include <linux/acpi.h>
>>> +#include <linux/fwnode.h>
>>>   #include <linux/list.h>
>>>   #include <linux/property.h>
>>>   #include <linux/mfd/core.h>
>>> @@ -148,6 +149,11 @@ static int mfd_match_of_node_to_dev(struct 
>>> platform_device *pdev,
>>>       return 0;
>>>   }
>>>
>>> +static void mfd_child_fwnode_put(void *data)
>>> +{
>>> +    fwnode_handle_put(data);
>>> +}
>>
>> Ah, this seems to answer my previous question, but...
>>
>>> +
>>>   static int mfd_add_device(struct device *parent, int id,
>>>                 const struct mfd_cell *cell,
>>>                 struct resource *mem_base,
>>> @@ -156,6 +162,7 @@ static int mfd_add_device(struct device *parent, 
>>> int id,
>>>       struct resource *res;
>>>       struct platform_device *pdev;
>>>       struct mfd_of_node_entry *of_entry, *tmp;
>>> +    struct fwnode_handle *fwnode;
>>>       bool disabled = false;
>>>       int ret = -ENOMEM;
>>>       int platform_id;
>>> @@ -224,6 +231,29 @@ static int mfd_add_device(struct device *parent, 
>>> int id,
>>>
>>>       mfd_acpi_add_device(cell, pdev);
>>>
>>> +    if (!pdev->dev.fwnode && cell->get_child_fwnode) {
>>> +        fwnode = cell->get_child_fwnode(parent);
>>> +        if (fwnode) {
>>> +            device_set_node(&pdev->dev, fwnode);
>>> +
>>> +            /*
>>> +             * platform_device_release() drops only of_node refs.
>>
>> Which is a separate problem we're discussing elsewhere. It should 
>> probably drop
>> the fwnode reference it holds, not the one of of_node.
>>
>>> +             * Track non-OF fwnodes explicitly so they are put on
>>> +             * all teardown paths.
>>> +             */
>>> +            if (!to_of_node(fwnode)) {
>>> +                ret = devm_add_action(&pdev->dev,
>>> +                              mfd_child_fwnode_put,
>>> +                              fwnode);
>>
>> What if the device never gets bound to the driver? The release will 
>> never be
>> called, this is why it's wrong to schedule devres actions for unbound 
>> devices
>> and one of the reasons for patch 1 in this series.
>>
>> What I suggest for now is: in tear-down path: see if the cell has the
>> get_child_fwnode() callback and - if so - drop the reference. Add a 
>> big, fat
>> comment saying that this must be removed if we decide to switch to 
>> dropping the
>> device's fwnode reference in platform driver core which may happen soon.
> 
> Ack. sure. lets me work it out.

Hi Lee,

While planning to address this for the next spin, it would be helpful if 
you could review and share any additional comments that should be taken 
care in next spin.

thanks,
Shivendra


^ permalink raw reply

* Re: [PATCH v4 2/3] PCI: Allow ATS to be always on for pre-CXL devices
From: Bjorn Helgaas @ 2026-05-20 17:50 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: jgg, will, robin.murphy, bhelgaas, joro, praan, baolu.lu,
	kevin.tian, miko.lenczewski, linux-arm-kernel, iommu,
	linux-kernel, linux-pci, dan.j.williams, jonathan.cameron, vsethi,
	linux-cxl, nirmoyd
In-Reply-To: <1a8cf5e88051ab5c10417edb94df598ecbc810cf.1777269009.git.nicolinc@nvidia.com>

On Sun, Apr 26, 2026 at 10:54:01PM -0700, Nicolin Chen wrote:
> Some NVIDIA GPU/NIC devices, though they don't implement CXL config space,
> have many CXL-like properties. Call this kind "pre-CXL".
> 
> Similar to CXL.cache capability, these pre-CXL devices also require the ATS
> function even when their RIDs are IOMMU bypassed, i.e. keep ATS "always on"
> v.s. "on demand" when a non-zero PASID line gets enabled in SVA use cases.
> ...

> +/* Some pre-CXL devices require ATS when it is IOMMU-bypassed */

I guess these devices are purely PCIe, with no actual CXL
transactions, so a hint here about what leads to the ATS requirement
would be useful.  It sounds like an actual functional requirement, not
just a performance optimization.

> +bool pci_dev_specific_ats_always_on(struct pci_dev *pdev)
> +{
> +	const struct pci_dev_ats_always_on *i;
> +
> +	for (i = pci_dev_ats_always_on; i->vendor; i++) {
> +		if (i->vendor != pdev->vendor)
> +			continue;
> +		if (i->ats_always_on && i->ats_always_on(pdev))
> +			return true;
> +		if (!i->ats_always_on && i->device == pdev->device)
> +			return true;
> +	}
> +
> +	return false;
> +}
>  #endif /* CONFIG_PCI_ATS */
>  
>  /* Freescale PCIe doesn't support MSI in RC mode */
> -- 
> 2.43.0
> 


^ permalink raw reply

* Re: [PATCH v4 1/3] PCI: Allow ATS to be always on for CXL.cache capable devices
From: Bjorn Helgaas @ 2026-05-20 17:47 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Jason Gunthorpe, will, robin.murphy, bhelgaas, joro, praan,
	baolu.lu, kevin.tian, miko.lenczewski, linux-arm-kernel, iommu,
	linux-kernel, linux-pci, dan.j.williams, jonathan.cameron, vsethi,
	linux-cxl, nirmoyd
In-Reply-To: <ag3vb4SWIfuw++vp@Asurada-Nvidia>

On Wed, May 20, 2026 at 10:29:19AM -0700, Nicolin Chen wrote:
> On Wed, May 20, 2026 at 11:20:43AM -0300, Jason Gunthorpe wrote:
> > On Tue, May 19, 2026 at 06:04:18PM -0700, Nicolin Chen wrote:
> > 
> > > > Yeah, that's fair, so let's rename it to 
> > > > 
> > > > pci_translated_required()
> > > > 
> > > > ie the device requires translated requests to function. This is what
> > > > CXL.cache implies (IIRC I was told the spec specifically says this)
> > > > 
> > > > Requiring translated requests implies you have to enable ATS in the
> > > > system.
> > > 
> > > Perhaps we could let IOMMU drivers check:
> > >   pci_cxl_is_cache_capable() || pci_dev_specific_is_pre_cxl()
> > > directly?
> > 
> > I'd rather have a single function.
> 
> OK. Can we use pci_ats_required()?
> 
> CXL spec explicitly used "ATS" when stating the requirement of
> CXL.cache). And it'd fit into the existing pci_ats_ functions.

OK by me.

You already have a comment in the code about the CXL.cache
requirement; thanks for that.

I don't know enough about CXL to know what's behind the ATS
requirement.  It sounds like it's more than a simple performance
optimization.  If you happen to know the reason, it might be worth
a short comment about that too.

Please add a one-line comment in the code about why we check
Cache_Capable instead of Cache_Enable, i.e., even if CXL.cache is not
enabled now, it may be enabled later.

Bjorn


^ permalink raw reply

* Re: [PATCH] Documentation: KVM: Document guest-visible compatibility expectations
From: Oliver Upton @ 2026-05-20 17:47 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Paolo Bonzini, Marc Zyngier, Will Deacon, Jonathan Corbet,
	Shuah Khan, kvm, Linux Doc Mailing List,
	Kernel Mailing List, Linux, Sean Christopherson, Jim Mattson,
	Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Raghavendra Rao Ananta, Eric Auger, Kees Cook, Arnd Bergmann,
	Nathan Chancellor, linux-arm-kernel, kvmarm, linux-kselftest
In-Reply-To: <add71b6f61edc6357e1fddad83273b2cba697d10.camel@infradead.org>

On Wed, May 20, 2026 at 12:33:52AM +0100, David Woodhouse wrote:
> On Tue, 2026-05-19 at 15:57 -0700, Oliver Upton wrote:
> > What ifs and maybes do not meet the bar, in my opinion, for preserving
> > bug emulation in KVM. Of course there could be a little flexibility with
> > that but we need to have some way of discriminating between bug fixes
> > and genuine guest expectations around the behavior of virtual hardware.
> 
> I believe you have this completely backwards.

No, I really don't.

Leaving every bugfix that could _possibly_ have a guest-visible impact
subject to drive-by scrutiny many years after the dust has settled is
not an acceptable working dynamic. Especially since it would appear
that the rest of the ecosystem has long since moved on from this
particular issue.

If this matters to you so deeply then please, be part of the solution
instead. You may find that reviewing patches leads to better outcomes
than getting belligerent with the arm64 folks every time you guys
decide to rebase your kernel. Hell, hypotheticals actually have a lot
more weight in the context of a review. And if your testing is extensive
enough to catch these sort of subtleties, don't you think it's better
done against mainline?

Maybe it's just me but I am left feeling disappointed that we all
haven't found a productive way of working together. I've tried to bridge
the gap here; we obviously need to do something that at least fixes the
UAPI breakage. Although apparently we don't even care to meet that low
of bar.

> A stable and mature platform doesn't get to play in its ivory tower and
> randomly inflict breakage on guests because they "deserve it".

Really? Aren't you asking for us to emulate something completely broken
for you?

Thanks,
Oliver


^ permalink raw reply

* Re: [PATCH v3 1/2] media: nxp: imx8-isi: Fix potential out-of-bounds issues
From: Laurent Pinchart @ 2026-05-20 17:41 UTC (permalink / raw)
  To: Guoniu Zhou
  Cc: Mauro Carvalho Chehab, Shawn Guo, Sascha Hauer,
	Pengutronix Kernel Team, Fabio Estevam, Stefan Riedmueller,
	Jacopo Mondi, Christian Hemp, Frank Li, Dong Aisheng, linux-media,
	imx, linux-arm-kernel, linux-kernel, Guoniu Zhou, stable
In-Reply-To: <20260323-isi-v3-1-8df53b24e622@oss.nxp.com>

On Mon, Mar 23, 2026 at 04:33:30PM +0800, Guoniu Zhou wrote:
> From: Guoniu Zhou <guoniu.zhou@nxp.com>
> 
> The maximum downscaling factor supported by ISI can be up to 16. Add
> minimum value constraint before applying the setting to hardware.
> Otherwise, the process will not respond even when Ctrl+C is executed.
> 
> Fixes: cf21f328fcaf ("media: nxp: Add i.MX8 ISI driver")
> Cc: stable@vger.kernel.org
> Reviewed-by: Frank Li <Frank.Li@nxp.com>
> Signed-off-by: Guoniu Zhou <guoniu.zhou@nxp.com>
> ---
> Changes in v3:
> - Replace CLAMP_DOWNSCALE_16 macro with inline function
> - Adjust downscale threshold from 0x4000 to 0x2000
> - Clarify downscaling limit in comment
> ---
>  drivers/media/platform/nxp/imx8-isi/imx8-isi-core.h | 16 ++++++++++++++++
>  drivers/media/platform/nxp/imx8-isi/imx8-isi-hw.c   |  2 +-
>  drivers/media/platform/nxp/imx8-isi/imx8-isi-m2m.c  | 11 ++++++++---
>  drivers/media/platform/nxp/imx8-isi/imx8-isi-pipe.c | 13 ++++++++-----
>  4 files changed, 33 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/media/platform/nxp/imx8-isi/imx8-isi-core.h b/drivers/media/platform/nxp/imx8-isi/imx8-isi-core.h
> index 3cbd35305af0..822466445b72 100644
> --- a/drivers/media/platform/nxp/imx8-isi/imx8-isi-core.h
> +++ b/drivers/media/platform/nxp/imx8-isi/imx8-isi-core.h
> @@ -11,6 +11,7 @@
>  #define __MXC_ISI_CORE_H__
>  
>  #include <linux/list.h>
> +#include <linux/math.h>
>  #include <linux/mutex.h>
>  #include <linux/spinlock.h>
>  #include <linux/types.h>
> @@ -413,4 +414,19 @@ static inline void mxc_isi_debug_cleanup(struct mxc_isi_dev *isi)
>  }
>  #endif
>  
> +/*
> + * ISI scaling engine works in two parts: it performs pre-decimation of
> + * the image followed by bilinear filtering to achieve the desired
> + * downscaling factor.
> + *
> + * The decimation filter provides a maximum downscaling factor of 8, and
> + * the subsequent bilinear filter provides a maximum downscaling factor
> + * of 2. Combined, the maximum scaling factor can be up to 16.
> + */
> +static inline unsigned int
> +mxc_isi_clamp_downscale_16(unsigned int val, unsigned int max_val)
> +{
> +	return clamp(val, max(1U, DIV_ROUND_UP(max_val, 16)), max_val);
> +}
> +
>  #endif /* __MXC_ISI_CORE_H__ */
> diff --git a/drivers/media/platform/nxp/imx8-isi/imx8-isi-hw.c b/drivers/media/platform/nxp/imx8-isi/imx8-isi-hw.c
> index 9225a7ac1c3e..37e59d687ed7 100644
> --- a/drivers/media/platform/nxp/imx8-isi/imx8-isi-hw.c
> +++ b/drivers/media/platform/nxp/imx8-isi/imx8-isi-hw.c
> @@ -11,7 +11,7 @@
>  #include "imx8-isi-core.h"
>  #include "imx8-isi-regs.h"
>  
> -#define	ISI_DOWNSCALE_THRESHOLD		0x4000
> +#define	ISI_DOWNSCALE_THRESHOLD		0x2000

This should be split to a separate patch as it's a separate fix.

>  static inline u32 mxc_isi_read(struct mxc_isi_pipe *pipe, u32 reg)
>  {
> diff --git a/drivers/media/platform/nxp/imx8-isi/imx8-isi-m2m.c b/drivers/media/platform/nxp/imx8-isi/imx8-isi-m2m.c
> index a39ad7a1ab18..a0e2061f4344 100644
> --- a/drivers/media/platform/nxp/imx8-isi/imx8-isi-m2m.c
> +++ b/drivers/media/platform/nxp/imx8-isi/imx8-isi-m2m.c
> @@ -508,10 +508,15 @@ __mxc_isi_m2m_try_fmt_vid(struct mxc_isi_m2m_ctx *ctx,
>  			  struct v4l2_pix_format_mplane *pix,
>  			  const enum mxc_isi_video_type type)
>  {
> +	const struct v4l2_pix_format_mplane *format =
> +		&ctx->queues.out.format;

This can go in the 'if' below.

Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>

I'll take this patch without the ISI_DOWNSCALE_THRESHOLD change in my
tree, and will send a separate patch for ISI_DOWNSCALE_THRESHOLD.

> +
>  	if (type == MXC_ISI_VIDEO_M2M_CAP) {
> -		/* Downscaling only  */
> -		pix->width = min(pix->width, ctx->queues.out.format.width);
> -		pix->height = min(pix->height, ctx->queues.out.format.height);
> +		/* Downscaling only, by up to 16. */
> +		pix->width = mxc_isi_clamp_downscale_16(pix->width,
> +							format->width);
> +		pix->height = mxc_isi_clamp_downscale_16(pix->height,
> +							 format->height);
>  	}
>  
>  	return mxc_isi_format_try(ctx->m2m->pipe, pix, type);
> diff --git a/drivers/media/platform/nxp/imx8-isi/imx8-isi-pipe.c b/drivers/media/platform/nxp/imx8-isi/imx8-isi-pipe.c
> index a41c51dd9ce0..b290821d03d2 100644
> --- a/drivers/media/platform/nxp/imx8-isi/imx8-isi-pipe.c
> +++ b/drivers/media/platform/nxp/imx8-isi/imx8-isi-pipe.c
> @@ -641,16 +641,19 @@ static int mxc_isi_pipe_set_selection(struct v4l2_subdev *sd,
>  			/* Composing is supported on the sink only. */
>  			return -EINVAL;
>  
> -		/* The sink crop is bound by the sink format downscaling only). */
> +		/*
> +		 * The ISI supports downscaling only, with a factor up to 16.
> +		 * Clamp the compose rectangle size accordingly.
> +		 */
>  		format = mxc_isi_pipe_get_pad_format(pipe, state,
>  						     MXC_ISI_PIPE_PAD_SINK);
>  
>  		sel->r.left = 0;
>  		sel->r.top = 0;
> -		sel->r.width = clamp(sel->r.width, MXC_ISI_MIN_WIDTH,
> -				     format->width);
> -		sel->r.height = clamp(sel->r.height, MXC_ISI_MIN_HEIGHT,
> -				      format->height);
> +		sel->r.width = mxc_isi_clamp_downscale_16(sel->r.width,
> +							  format->width);
> +		sel->r.height = mxc_isi_clamp_downscale_16(sel->r.height,
> +							   format->height);
>  
>  		rect = mxc_isi_pipe_get_pad_compose(pipe, state,
>  						    MXC_ISI_PIPE_PAD_SINK);
> 

-- 
Regards,

Laurent Pinchart


^ permalink raw reply

* Re: [PATCH v4 1/3] PCI: Allow ATS to be always on for CXL.cache capable devices
From: Nicolin Chen @ 2026-05-20 17:29 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Bjorn Helgaas, will, robin.murphy, bhelgaas, joro, praan,
	baolu.lu, kevin.tian, miko.lenczewski, linux-arm-kernel, iommu,
	linux-kernel, linux-pci, dan.j.williams, jonathan.cameron, vsethi,
	linux-cxl, nirmoyd
In-Reply-To: <20260520142043.GT3602937@nvidia.com>

On Wed, May 20, 2026 at 11:20:43AM -0300, Jason Gunthorpe wrote:
> On Tue, May 19, 2026 at 06:04:18PM -0700, Nicolin Chen wrote:
> 
> > > Yeah, that's fair, so let's rename it to 
> > > 
> > > pci_translated_required()
> > > 
> > > ie the device requires translated requests to function. This is what
> > > CXL.cache implies (IIRC I was told the spec specifically says this)
> > > 
> > > Requiring translated requests implies you have to enable ATS in the
> > > system.
> > 
> > Perhaps we could let IOMMU drivers check:
> >   pci_cxl_is_cache_capable() || pci_dev_specific_is_pre_cxl()
> > directly?
> 
> I'd rather have a single function.

OK. Can we use pci_ats_required()?

CXL spec explicitly used "ATS" when stating the requirement of
CXL.cache). And it'd fit into the existing pci_ats_ functions.

Nicolin


^ permalink raw reply

* Re: [PATCH v3 2/2] media: nxp: imx8-isi: Prioritize pending buffers over discard buffers
From: Laurent Pinchart @ 2026-05-20 17:18 UTC (permalink / raw)
  To: Guoniu Zhou
  Cc: Mauro Carvalho Chehab, Frank Li, Sascha Hauer,
	Pengutronix Kernel Team, Fabio Estevam, Stefan Riedmueller,
	Jacopo Mondi, Christian Hemp, linux-media, imx, linux-arm-kernel,
	linux-kernel, Alexi Birlinger, Dong Aisheng, Guoniu Zhou
In-Reply-To: <20260520171026.GA10336@killaraus.ideasonboard.com>

On Wed, May 20, 2026 at 07:10:27PM +0200, Laurent Pinchart wrote:
> Hello Guoniu,
> 
> Thank you for the patch.
> 
> On Fri, Mar 20, 2026 at 02:42:02PM +0800, Guoniu Zhou wrote:
> > From: Guoniu Zhou <guoniu.zhou@nxp.com>
> > 
> > The number of times to use the discard buffer is determined by the
> > out_pending list size:
> > 
> >   discard = list_empty(&video->out_pending) ? 2
> >             : list_is_singular(&video->out_pending) ? 1
> > 	    : 0;
> > 
> > In the current buffer selection logic, when both discard and pending
> > buffers are available, the driver fills hardware slots with discard
> > buffers first which results in an unnecessary frame drop even though
> > a user buffer was queued and ready.
> > 
> > Change the buffer selection logic to use pending buffers first (up to
> > the number available), and only use discard buffers to fill remaining
> > slots when insufficient pending buffers are queued.
> > 
> > This improves behavior by:
> > - Reducing discarded frames at stream start when user buffers are ready
> > - Decreasing latency in delivering captured frames to user-space
> > - Ensuring user buffers are utilized as soon as they are queued
> > - Improving overall buffer utilization efficiency
> 
> There's a bit of repeat here, but that's OK.
> 
> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
> 
> > Signed-off-by: Guoniu Zhou <guoniu.zhou@nxp.com>
> > ---
> > Changes in v3:
> > - Expanded commit message to explain the problem in current driver and the
> >   benefits gained from this change
> > - No code changes
> > 
> > Changes in v2:
> > - Replace "This ensures" with "ensure"
> > - Put example from commit message to comment in driver suggested by Frank
> >   https://lore.kernel.org/linux-media/20260311-isi_min_buffers-v1-0-c9299d6e8ae6@nxp.com/T/#m2774912ed31553ef1fdcc840bd6eae53a03ecccd
> > ---
> >  drivers/media/platform/nxp/imx8-isi/imx8-isi-video.c | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/media/platform/nxp/imx8-isi/imx8-isi-video.c b/drivers/media/platform/nxp/imx8-isi/imx8-isi-video.c
> > index 1be3a728f32f..77ebff03323a 100644
> > --- a/drivers/media/platform/nxp/imx8-isi/imx8-isi-video.c
> > +++ b/drivers/media/platform/nxp/imx8-isi/imx8-isi-video.c
> > @@ -792,7 +792,14 @@ static void mxc_isi_video_queue_first_buffers(struct mxc_isi_video *video)
> >  		struct mxc_isi_buffer *buf;
> >  		struct list_head *list;
> >  
> > -		list = i < discard ? &video->out_discard : &video->out_pending;
> > +		/*
> > +		 * Queue buffers: prioritize pending buffers, then discard buffers
> > +		 * For example:
> > +		 * - 2 pending buffers: both slots use pending buffers
> > +		 * - 1 pending buffer: first slot uses pending, second uses discard
> > +		 * - 0 pending buffers: both slots use discard buffers
> > +		 */

Having a second look, I think this is a bit too verbose. Unless you
object, I'll write

		 * Queue buffers: prioritize pending buffers, then discard
		 * buffers.

No need to submit a new version.

> > +		list = (i < 2 - discard) ? &video->out_pending : &video->out_discard;
> >  		buf = list_first_entry(list, struct mxc_isi_buffer, list);
> >  
> >  		mxc_isi_channel_set_outbuf(video->pipe, buf->dma_addrs, buf_id);

-- 
Regards,

Laurent Pinchart


^ permalink raw reply

* Re: [PATCH v3 2/2] media: nxp: imx8-isi: Prioritize pending buffers over discard buffers
From: Laurent Pinchart @ 2026-05-20 17:10 UTC (permalink / raw)
  To: Guoniu Zhou
  Cc: Mauro Carvalho Chehab, Frank Li, Sascha Hauer,
	Pengutronix Kernel Team, Fabio Estevam, Stefan Riedmueller,
	Jacopo Mondi, Christian Hemp, linux-media, imx, linux-arm-kernel,
	linux-kernel, Alexi Birlinger, Dong Aisheng, Guoniu Zhou
In-Reply-To: <20260320-isi_min_buffers-v3-2-66e0fabccca3@oss.nxp.com>

Hello Guoniu,

Thank you for the patch.

On Fri, Mar 20, 2026 at 02:42:02PM +0800, Guoniu Zhou wrote:
> From: Guoniu Zhou <guoniu.zhou@nxp.com>
> 
> The number of times to use the discard buffer is determined by the
> out_pending list size:
> 
>   discard = list_empty(&video->out_pending) ? 2
>             : list_is_singular(&video->out_pending) ? 1
> 	    : 0;
> 
> In the current buffer selection logic, when both discard and pending
> buffers are available, the driver fills hardware slots with discard
> buffers first which results in an unnecessary frame drop even though
> a user buffer was queued and ready.
> 
> Change the buffer selection logic to use pending buffers first (up to
> the number available), and only use discard buffers to fill remaining
> slots when insufficient pending buffers are queued.
> 
> This improves behavior by:
> - Reducing discarded frames at stream start when user buffers are ready
> - Decreasing latency in delivering captured frames to user-space
> - Ensuring user buffers are utilized as soon as they are queued
> - Improving overall buffer utilization efficiency

There's a bit of repeat here, but that's OK.

Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>

> Signed-off-by: Guoniu Zhou <guoniu.zhou@nxp.com>
> ---
> Changes in v3:
> - Expanded commit message to explain the problem in current driver and the
>   benefits gained from this change
> - No code changes
> 
> Changes in v2:
> - Replace "This ensures" with "ensure"
> - Put example from commit message to comment in driver suggested by Frank
>   https://lore.kernel.org/linux-media/20260311-isi_min_buffers-v1-0-c9299d6e8ae6@nxp.com/T/#m2774912ed31553ef1fdcc840bd6eae53a03ecccd
> ---
>  drivers/media/platform/nxp/imx8-isi/imx8-isi-video.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/media/platform/nxp/imx8-isi/imx8-isi-video.c b/drivers/media/platform/nxp/imx8-isi/imx8-isi-video.c
> index 1be3a728f32f..77ebff03323a 100644
> --- a/drivers/media/platform/nxp/imx8-isi/imx8-isi-video.c
> +++ b/drivers/media/platform/nxp/imx8-isi/imx8-isi-video.c
> @@ -792,7 +792,14 @@ static void mxc_isi_video_queue_first_buffers(struct mxc_isi_video *video)
>  		struct mxc_isi_buffer *buf;
>  		struct list_head *list;
>  
> -		list = i < discard ? &video->out_discard : &video->out_pending;
> +		/*
> +		 * Queue buffers: prioritize pending buffers, then discard buffers
> +		 * For example:
> +		 * - 2 pending buffers: both slots use pending buffers
> +		 * - 1 pending buffer: first slot uses pending, second uses discard
> +		 * - 0 pending buffers: both slots use discard buffers
> +		 */
> +		list = (i < 2 - discard) ? &video->out_pending : &video->out_discard;
>  		buf = list_first_entry(list, struct mxc_isi_buffer, list);
>  
>  		mxc_isi_channel_set_outbuf(video->pipe, buf->dma_addrs, buf_id);

-- 
Regards,

Laurent Pinchart


^ permalink raw reply

* Re: [PATCH v2] phy: exynos5-usbdrd: Remove error print for devm_add_action_or_reset()
From: Tudor Ambarus @ 2026-05-20 17:06 UTC (permalink / raw)
  To: Waqar Hameed, Vinod Koul, Kishon Vijay Abraham I,
	Krzysztof Kozlowski, Alim Akhtar
  Cc: kernel, linux-phy, linux-arm-kernel, linux-samsung-soc,
	linux-kernel, Peter Griffin, André Draszik, Juan Yescas
In-Reply-To: <pndjysynn55.a.out@axis.com>



On 5/20/26 5:58 PM, Waqar Hameed wrote:
> On Fri, Oct 10, 2025 at 15:39 +0200 Waqar Hameed <waqar.hameed@axis.com> wrote:
> 
>> On Tue, Aug 05, 2025 at 11:33 +0200 Waqar Hameed <waqar.hameed@axis.com> wrote:
> 
> [...]
> 
>> Friendly ping incoming!
> 
> Friendly ping 2 incoming!
> 
> This will be the last ping for the original patch [1]. I'll interpret
> the silence as a NACK (though it would preferable to get an explicit

The patch is fine, I think it just slipped through the cracks:

Reviewed-by: Tudor Ambarus <tudor.ambarus@linaro.org>

> one...). All other patches in the previous patch series has been merged
> [2].
> 
> [1] https://lore.kernel.org/all/pnd34a6m7tc.a.out@axis.com/
> [2] https://lore.kernel.org/all/pnd7c0s6ji2.fsf@axis.com/
> 



^ permalink raw reply

* [PATCH rc v6 7/7] iommu/arm-smmu-v3: Detect ARM_SMMU_OPT_KDUMP_ADOPT in probe()
From: Nicolin Chen @ 2026-05-20 17:03 UTC (permalink / raw)
  To: will, robin.murphy, jgg
  Cc: joro, praan, kees, baolu.lu, kevin.tian, miko.lenczewski,
	smostafa, linux-arm-kernel, iommu, linux-kernel, stable, jamien
In-Reply-To: <cover.1779265413.git.nicolinc@nvidia.com>

arm_smmu_device_hw_probe() runs before arm_smmu_init_structures(), so it's
natural to decide whether the kdump kernel must adopt the crashed kernel's
stream table.

Given that memremap is used to adopt the old stream table, set this option
only on a coherent SMMU.

And make sure SMMU isn't in Service Failure Mode.

Fixes: b63b3439b856 ("iommu/arm-smmu-v3: Abort all transactions if SMMU is enabled in kdump kernel")
Cc: stable@vger.kernel.org # v6.12+
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 31 +++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 851bcebfdb3d4..fb34c3ffee9fe 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -5353,6 +5353,33 @@ static void arm_smmu_get_httu(struct arm_smmu_device *smmu, u32 reg)
 			  hw_features, fw_features);
 }
 
+static void arm_smmu_device_hw_probe_kdump(struct arm_smmu_device *smmu)
+{
+	u32 gerror, gerrorn, active;
+
+	/* No adoption if SMMU is disabled (i.e., there is no in-flight DMA) */
+	if (!(readl_relaxed(smmu->base + ARM_SMMU_CR0) & CR0_SMMUEN))
+		return;
+
+	/* For now, only support a coherent SMMU that works with MEMREMAP_WB */
+	if (!(smmu->features & ARM_SMMU_FEAT_COHERENCY)) {
+		dev_warn(smmu->dev,
+			 "kdump: non-coherent SMMU unsupported; reset to block all DMAs\n");
+		return;
+	}
+
+	gerror = readl_relaxed(smmu->base + ARM_SMMU_GERROR);
+	gerrorn = readl_relaxed(smmu->base + ARM_SMMU_GERRORN);
+	active = gerror ^ gerrorn;
+	if (active & GERROR_SFM_ERR) {
+		dev_warn(smmu->dev,
+			 "kdump: SMMU in Service Failure Mode, must reset\n");
+		return;
+	}
+
+	smmu->options |= ARM_SMMU_OPT_KDUMP_ADOPT;
+}
+
 static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 {
 	u32 reg;
@@ -5567,6 +5594,10 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 
 	dev_info(smmu->dev, "oas %lu-bit (features 0x%08x)\n",
 		 smmu->oas, smmu->features);
+
+	if (is_kdump_kernel())
+		arm_smmu_device_hw_probe_kdump(smmu);
+
 	return 0;
 }
 
-- 
2.43.0



^ permalink raw reply related

* [PATCH rc v6 3/7] iommu/arm-smmu-v3: Do not enable EVTQ/PRIQ interrupts in kdump kernel
From: Nicolin Chen @ 2026-05-20 17:03 UTC (permalink / raw)
  To: will, robin.murphy, jgg
  Cc: joro, praan, kees, baolu.lu, kevin.tian, miko.lenczewski,
	smostafa, linux-arm-kernel, iommu, linux-kernel, stable, jamien
In-Reply-To: <cover.1779265413.git.nicolinc@nvidia.com>

In kdump cases, the crashed kernel's CDs and page tables can be corrupted,
which could trigger event spamming. Also, we cannot serve page requests.

Skip the IRQ setup for EVTQ/PRIQ in arm_smmu_setup_irqs().

Skip their IRQ handler registration in unique-IRQ and combined-IRQ cases.

Fixes: b63b3439b856 ("iommu/arm-smmu-v3: Abort all transactions if SMMU is enabled in kdump kernel")
Cc: stable@vger.kernel.org # v6.12+
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 58 ++++++++++++++-------
 1 file changed, 39 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 2d7eb42449eaf..e00b28e36f9c4 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2464,7 +2464,11 @@ static irqreturn_t arm_smmu_combined_irq_thread(int irq, void *dev)
 
 static irqreturn_t arm_smmu_combined_irq_handler(int irq, void *dev)
 {
-	arm_smmu_gerror_handler(irq, dev);
+	irqreturn_t ret = arm_smmu_gerror_handler(irq, dev);
+
+	/* In kdump, EVTQ/PRIQ are disabled and there is no thread to wake */
+	if (is_kdump_kernel())
+		return ret;
 	return IRQ_WAKE_THREAD;
 }
 
@@ -4963,6 +4967,21 @@ static void arm_smmu_setup_unique_irqs(struct arm_smmu_device *smmu)
 	arm_smmu_setup_msis(smmu);
 
 	/* Request interrupt lines */
+	irq = smmu->gerr_irq;
+	if (irq) {
+		ret = devm_request_irq(smmu->dev, irq, arm_smmu_gerror_handler,
+				       0, "arm-smmu-v3-gerror", smmu);
+		if (ret < 0)
+			dev_warn(smmu->dev, "failed to enable gerror irq\n");
+	} else {
+		dev_warn(smmu->dev,
+			 "no gerr irq - errors will not be reported!\n");
+	}
+
+	/* No EVTQ/PRIQ interrupts in kdump -- queues are disabled */
+	if (is_kdump_kernel())
+		return;
+
 	irq = smmu->evtq.q.irq;
 	if (irq) {
 		ret = devm_request_threaded_irq(smmu->dev, irq, NULL,
@@ -4975,16 +4994,6 @@ static void arm_smmu_setup_unique_irqs(struct arm_smmu_device *smmu)
 		dev_warn(smmu->dev, "no evtq irq - events will not be reported!\n");
 	}
 
-	irq = smmu->gerr_irq;
-	if (irq) {
-		ret = devm_request_irq(smmu->dev, irq, arm_smmu_gerror_handler,
-				       0, "arm-smmu-v3-gerror", smmu);
-		if (ret < 0)
-			dev_warn(smmu->dev, "failed to enable gerror irq\n");
-	} else {
-		dev_warn(smmu->dev, "no gerr irq - errors will not be reported!\n");
-	}
-
 	if (smmu->features & ARM_SMMU_FEAT_PRI) {
 		irq = smmu->priq.q.irq;
 		if (irq) {
@@ -5005,7 +5014,7 @@ static void arm_smmu_setup_unique_irqs(struct arm_smmu_device *smmu)
 static int arm_smmu_setup_irqs(struct arm_smmu_device *smmu)
 {
 	int ret, irq;
-	u32 irqen_flags = IRQ_CTRL_EVTQ_IRQEN | IRQ_CTRL_GERROR_IRQEN;
+	u32 irqen_flags = IRQ_CTRL_GERROR_IRQEN;
 
 	/* Disable IRQs first */
 	ret = arm_smmu_write_reg_sync(smmu, 0, ARM_SMMU_IRQ_CTRL,
@@ -5020,19 +5029,30 @@ static int arm_smmu_setup_irqs(struct arm_smmu_device *smmu)
 		/*
 		 * Cavium ThunderX2 implementation doesn't support unique irq
 		 * lines. Use a single irq line for all the SMMUv3 interrupts.
+		 *
+		 * In kdump, EVTQ/PRIQ are disabled, so no threaded handling.
 		 */
-		ret = devm_request_threaded_irq(smmu->dev, irq,
-					arm_smmu_combined_irq_handler,
-					arm_smmu_combined_irq_thread,
-					IRQF_ONESHOT,
-					"arm-smmu-v3-combined-irq", smmu);
+		if (is_kdump_kernel())
+			ret = devm_request_irq(smmu->dev, irq,
+					       arm_smmu_combined_irq_handler, 0,
+					       "arm-smmu-v3-combined-irq",
+					       smmu);
+		else
+			ret = devm_request_threaded_irq(
+				smmu->dev, irq, arm_smmu_combined_irq_handler,
+				arm_smmu_combined_irq_thread, IRQF_ONESHOT,
+				"arm-smmu-v3-combined-irq", smmu);
 		if (ret < 0)
 			dev_warn(smmu->dev, "failed to enable combined irq\n");
 	} else
 		arm_smmu_setup_unique_irqs(smmu);
 
-	if (smmu->features & ARM_SMMU_FEAT_PRI)
-		irqen_flags |= IRQ_CTRL_PRIQ_IRQEN;
+	/* No EVTQ/PRIQ IRQ generation in kdump -- queues are disabled */
+	if (!is_kdump_kernel()) {
+		irqen_flags |= IRQ_CTRL_EVTQ_IRQEN;
+		if (smmu->features & ARM_SMMU_FEAT_PRI)
+			irqen_flags |= IRQ_CTRL_PRIQ_IRQEN;
+	}
 
 	/* Enable interrupt generation on the SMMU */
 	ret = arm_smmu_write_reg_sync(smmu, irqen_flags,
-- 
2.43.0



^ permalink raw reply related

* [PATCH rc v6 6/7] iommu/arm-smmu-v3: Skip RMR bypass for kdump adoption
From: Nicolin Chen @ 2026-05-20 17:03 UTC (permalink / raw)
  To: will, robin.murphy, jgg
  Cc: joro, praan, kees, baolu.lu, kevin.tian, miko.lenczewski,
	smostafa, linux-arm-kernel, iommu, linux-kernel, stable, jamien
In-Reply-To: <cover.1779265413.git.nicolinc@nvidia.com>

RMR bypass STEs are installed during SMMUv3 probe for StreamIDs listed by
IORT RMR nodes. A normal boot switches the driver to a fresh stream table
whose initial STEs abort, so those RMR SIDs need bypass entries before it
becomes live. This preserves firmware/guest-owned traffic, including vSMMU
guest MSI cases built around RMR-described SIDs.

ARM_SMMU_OPT_KDUMP_ADOPT is the opposite case: the driver keeps SMMUEN set
and adopts the crashed kernel's stream table, so RMR SIDs already have the
only translation state known to be safe for active in-flight DMA. Replacing
an adopted STE with bypass can turn translated DMA into physical DMA, then
point it at the wrong memory.

arm_smmu_make_bypass_ste() also rewrites the STE in place after clearing it
first. While the table is live, a concurrent hardware STE fetch can observe
V=0 or mixed old/new state.

Leaving the adopted STE unmodified keeps the kdump kernel using the crashed
kernel's translation. That gives the endpoint driver a chance to probe and
quiesce the device.

If the old STE was already abort or invalid, installing bypass would create
new DMA permission; leaving it alone is a safer failure mode. Later domain
setup still gets the RMR direct mappings through the reserved-region path.

Fixes: b63b3439b856 ("iommu/arm-smmu-v3: Abort all transactions if SMMU is enabled in kdump kernel")
Cc: stable@vger.kernel.org # v6.12+
Assisted-by: Codex:gpt-5.5
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index f9220c007ad25..851bcebfdb3d4 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -5731,6 +5731,14 @@ static void arm_smmu_rmr_install_bypass_ste(struct arm_smmu_device *smmu)
 	struct list_head rmr_list;
 	struct iommu_resv_region *e;
 
+	/*
+	 * Kdump adoption keeps the crashed kernel's table live. Rewriting the
+	 * adopted STE here could expose an in-flight fetch to a transient V=0
+	 * entry, or change Cfg=translate to Cfg=bypass. Must skip here.
+	 */
+	if (smmu->options & ARM_SMMU_OPT_KDUMP_ADOPT)
+		return;
+
 	INIT_LIST_HEAD(&rmr_list);
 	iort_get_rmr_sids(dev_fwnode(smmu->dev), &rmr_list);
 
@@ -5747,10 +5755,7 @@ static void arm_smmu_rmr_install_bypass_ste(struct arm_smmu_device *smmu)
 				continue;
 			}
 
-			/*
-			 * STE table is not programmed to HW, see
-			 * arm_smmu_initial_bypass_stes()
-			 */
+			/* The fresh stream table is not yet live. */
 			arm_smmu_make_bypass_ste(smmu,
 				arm_smmu_get_step_for_sid(smmu, rmr->sids[i]));
 		}
-- 
2.43.0



^ permalink raw reply related

* [PATCH rc v6 5/7] iommu/arm-smmu-v3: Retain CR0_SMMUEN during kdump device reset
From: Nicolin Chen @ 2026-05-20 17:03 UTC (permalink / raw)
  To: will, robin.murphy, jgg
  Cc: joro, praan, kees, baolu.lu, kevin.tian, miko.lenczewski,
	smostafa, linux-arm-kernel, iommu, linux-kernel, stable, jamien
In-Reply-To: <cover.1779265413.git.nicolinc@nvidia.com>

When ARM_SMMU_OPT_KDUMP_ADOPT is detected, do not disable SMMUEN and skip
the CR1/CR2/STRTAB_BASE update sequence in arm_smmu_device_reset(). Those
register writes are all CONSTRAINED UNPREDICTABLE while CR0_SMMUEN==1, so
leaving them intact lets in-flight DMAs continue to be translated by the
adopted stream table.

Initialize 'enables' to 0 so it can carry CR0_SMMUEN in kdump case. Then,
preserve that when enabling the command queue.

Clear latched gerror bits if necessary.

Fixes: b63b3439b856 ("iommu/arm-smmu-v3: Abort all transactions if SMMU is enabled in kdump kernel")
Cc: stable@vger.kernel.org # v6.12+
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 47 +++++++++++++++++++--
 1 file changed, 44 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 3f22949391c82..f9220c007ad25 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -5101,11 +5101,28 @@ static void arm_smmu_write_strtab(struct arm_smmu_device *smmu)
 static int arm_smmu_device_reset(struct arm_smmu_device *smmu)
 {
 	int ret;
-	u32 reg, enables;
+	u32 reg, enables = 0;
 	struct arm_smmu_cmdq_ent cmd;
 
-	/* Clear CR0 and sync (disables SMMU and queue processing) */
 	reg = readl_relaxed(smmu->base + ARM_SMMU_CR0);
+
+	/*
+	 * In a kdump case (set when CR0_SMMUEN=1 and !GERROR_SFM_ERR), retain
+	 * CR0_SMMUEN to avoid aborting in-flight DMA, and CR0_ATSCHK to carry
+	 * on the ATS-check policy.
+	 *
+	 * According to spec, updating STRTAB_BASE/CR1/CR2 when CR0_SMMUEN=1 is
+	 * CONSTRAINED UNPREDICTABLE. So, skip those register updates and rely
+	 * on the adopted stream table from the crashed kernel.
+	 */
+	if (smmu->options & ARM_SMMU_OPT_KDUMP_ADOPT) {
+		dev_info(smmu->dev,
+			 "kdump: retaining SMMUEN for in-flight DMA\n");
+		enables = reg & (CR0_SMMUEN | CR0_ATSCHK);
+		goto reset_queues;
+	}
+
+	/* Clear CR0 and sync (disables SMMU and queue processing) */
 	if (reg & CR0_SMMUEN) {
 		dev_warn(smmu->dev, "SMMU currently enabled! Resetting...\n");
 		arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
@@ -5135,12 +5152,36 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu)
 	/* Stream table */
 	arm_smmu_write_strtab(smmu);
 
+reset_queues:
+	if (smmu->options & ARM_SMMU_OPT_KDUMP_ADOPT) {
+		/* Disable queues since arm_smmu_device_disable() was skipped */
+		ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0,
+					      ARM_SMMU_CR0ACK);
+		if (ret) {
+			dev_err(smmu->dev, "failed to disable queues\n");
+			return ret;
+		}
+	}
+
+	/*
+	 * GERROR bits are latched. Read after queue disabling so that unhandled
+	 * errors would be visible. Ack everything prior to re-enabling the CMDQ
+	 * as a stale CMDQ_ERR would halt the CMDQ and new command will timeout.
+	 */
+	if (is_kdump_kernel()) {
+		u32 gerror = readl_relaxed(smmu->base + ARM_SMMU_GERROR);
+		u32 gerrorn = readl_relaxed(smmu->base + ARM_SMMU_GERRORN);
+
+		if ((gerror ^ gerrorn) & GERROR_ERR_MASK)
+			writel(gerror, smmu->base + ARM_SMMU_GERRORN);
+	}
+
 	/* Command queue */
 	writeq_relaxed(smmu->cmdq.q.q_base, smmu->base + ARM_SMMU_CMDQ_BASE);
 	writel_relaxed(smmu->cmdq.q.llq.prod, smmu->base + ARM_SMMU_CMDQ_PROD);
 	writel_relaxed(smmu->cmdq.q.llq.cons, smmu->base + ARM_SMMU_CMDQ_CONS);
 
-	enables = CR0_CMDQEN;
+	enables |= CR0_CMDQEN;
 	ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0,
 				      ARM_SMMU_CR0ACK);
 	if (ret) {
-- 
2.43.0



^ permalink raw reply related

* [PATCH rc v6 4/7] iommu/arm-smmu-v3: Skip EVTQ/PRIQ setup in kdump kernel
From: Nicolin Chen @ 2026-05-20 17:03 UTC (permalink / raw)
  To: will, robin.murphy, jgg
  Cc: joro, praan, kees, baolu.lu, kevin.tian, miko.lenczewski,
	smostafa, linux-arm-kernel, iommu, linux-kernel, stable, jamien
In-Reply-To: <cover.1779265413.git.nicolinc@nvidia.com>

In kdump cases, the crashed kernel's CDs and page tables can be corrupted,
which could trigger event spamming. Also, we cannot serve page requests.

Skip the EVTQ/PRIQ setup entirely rather than enabling then disabling them.

Also add some inline comments explaining that.

Fixes: b63b3439b856 ("iommu/arm-smmu-v3: Abort all transactions if SMMU is enabled in kdump kernel")
Cc: stable@vger.kernel.org # v6.12+
Suggested-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 43 +++++++++++++--------
 1 file changed, 27 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index e00b28e36f9c4..3f22949391c82 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -5161,21 +5161,35 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu)
 	cmd.opcode = CMDQ_OP_TLBI_NSNH_ALL;
 	arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd);
 
-	/* Event queue */
-	writeq_relaxed(smmu->evtq.q.q_base, smmu->base + ARM_SMMU_EVTQ_BASE);
-	writel_relaxed(smmu->evtq.q.llq.prod, smmu->page1 + ARM_SMMU_EVTQ_PROD);
-	writel_relaxed(smmu->evtq.q.llq.cons, smmu->page1 + ARM_SMMU_EVTQ_CONS);
-
-	enables |= CR0_EVTQEN;
-	ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0,
-				      ARM_SMMU_CR0ACK);
-	if (ret) {
-		dev_err(smmu->dev, "failed to enable event queue\n");
-		return ret;
+	/*
+	 * Event queue
+	 *
+	 * Do not enable in a kdump case, as the crashed kernel's CDs and page
+	 * tables might be corrupted, triggering event spamming.
+	 */
+	if (!is_kdump_kernel()) {
+		writeq_relaxed(smmu->evtq.q.q_base,
+			       smmu->base + ARM_SMMU_EVTQ_BASE);
+		writel_relaxed(smmu->evtq.q.llq.prod,
+			       smmu->page1 + ARM_SMMU_EVTQ_PROD);
+		writel_relaxed(smmu->evtq.q.llq.cons,
+			       smmu->page1 + ARM_SMMU_EVTQ_CONS);
+
+		enables |= CR0_EVTQEN;
+		ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0,
+					      ARM_SMMU_CR0ACK);
+		if (ret) {
+			dev_err(smmu->dev, "failed to enable event queue\n");
+			return ret;
+		}
 	}
 
-	/* PRI queue */
-	if (smmu->features & ARM_SMMU_FEAT_PRI) {
+	/*
+	 * PRI queue
+	 *
+	 * Do not enable in a kdump case, as we cannot serve page requests.
+	 */
+	if (!is_kdump_kernel() && (smmu->features & ARM_SMMU_FEAT_PRI)) {
 		writeq_relaxed(smmu->priq.q.q_base,
 			       smmu->base + ARM_SMMU_PRIQ_BASE);
 		writel_relaxed(smmu->priq.q.llq.prod,
@@ -5208,9 +5222,6 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu)
 		return ret;
 	}
 
-	if (is_kdump_kernel())
-		enables &= ~(CR0_EVTQEN | CR0_PRIQEN);
-
 	/* Enable the SMMU interface */
 	enables |= CR0_SMMUEN;
 	ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0,
-- 
2.43.0



^ permalink raw reply related

* [PATCH rc v6 0/7] iommu/arm-smmu-v3: Fix device crash on kdump kernel
From: Nicolin Chen @ 2026-05-20 17:03 UTC (permalink / raw)
  To: will, robin.murphy, jgg
  Cc: joro, praan, kees, baolu.lu, kevin.tian, miko.lenczewski,
	smostafa, linux-arm-kernel, iommu, linux-kernel, stable, jamien

When transitioning to a kdump kernel, the primary kernel might have crashed
while endpoint devices were actively bus-mastering DMA. Currently, the SMMU
driver aggressively resets the hardware during probe by clearing CR0_SMMUEN
and setting the Global Bypass Attribute (GBPA) to ABORT.

In a kdump scenario, this aggressive reset is highly destructive:
a) If GBPA is set to ABORT, in-flight DMA will be aborted, generating fatal
   PCIe AER or SErrors that may panic the kdump kernel
b) If GBPA is set to BYPASS, in-flight DMA targeting some IOVAs will bypass
   the SMMU and corrupt the physical memory at those 1:1 mapped IOVAs.

To safely absorb in-flight DMA, the kdump kernel must leave SMMUEN=1 intact
and avoid modifying STRTAB_BASE. This allows HW to continue translating in-
flight DMA using the crashed kernel's page tables until the endpoint device
drivers probe and quiesce their respective hardware.

However, the ARM SMMUv3 architecture specification states that updating the
SMMU_STRTAB_BASE register while SMMUEN == 1 is UNPREDICTABLE or ignored.

This leaves a kdump kernel no choice but to adopt the stream table from the
crashed kernel.

In this series:
 - Introduce an ARM_SMMU_OPT_KDUMP_ADOPT
 - Skip SMMUEN and STRTAB_BASE resets in arm_smmu_device_reset()
 - Skip EVENTQ/PRIQ setup including interrupts and their handlers
 - Memremap the crashed kernel's stream tables into the kdump kernel [*]
 - Defer any default domain attachment to retain STEs until device drivers
   explicitly request it.

[*] For verification reasons, this series only fixes coherent SMMUs.

For non-ARM_SMMU_OPT_KDUMP_ADOPT cases, keep a status quo since the commit
3f54c447df34f ("iommu/arm-smmu-v3: Don't disable SMMU in kdump kernel"):
full reset followed by driver-initiated reattach, potentially rejecting any
in-flight DMA.

Note that the series requires Jason's work that was merged in v6.12: commit
85196f54743d ("iommu/arm-smmu-v3: Reorganize struct arm_smmu_strtab_cfg").
I have a backported version that is verified with a v6.8 kernel. I can send
if we see a strong need after this version is accepted.

This is on Github:
https://github.com/nicolinc/iommufd/commits/smmuv3_kdump-v6

Changelog
v6
 * Rebase v7.1-rc3
 * Add Reviewed-by from Jason
 * Replace dma_addr_t with phys_addr_t
 * Drop arm_smmu_kdump_phys_is_corrupted()
 * Skip threaded IRQ handlers for EVTQ and PRIQ
 * Bypass arm_smmu_rmr_install_bypass_ste() in kdump case
 * Drop devm_ for adopt-time allocations; set up cleanup function via
   devm_add_action_or_reset()
v5
 https://lore.kernel.org/all/cover.1778416609.git.nicolinc@nvidia.com/
 * Add Reviewed-by from Kevin
 * Drop READ_ONCE on lazy-attach L1 read
 * Split "Skip EVTQ/PRIQ setup" into two patches
 * Tighten kdump probe comment and dev_warn message
 * Use MEM + BUSY in arm_smmu_kdump_phys_is_corrupted
v4
 https://lore.kernel.org/all/cover.1777446969.git.nicolinc@nvidia.com/
 * Rebase v7.1-rc1
 * s/arm_smmu_adopt/arm_smmu_kdump_adopt
 * Revert alloc/memremap/fmt on fallback
 * Reorder patches to avoid bisect regression
 * Use IRQ_NONE for spurious evtq/priq entries
 * Cap linear log2size by kdump's allocation bound
 * Defer clearing FEAT_2_LVL_STRTAB on linear adopt
 * Add arm_smmu_kdump_phys_is_corrupted() validation
 * Defer l2 stream table memremap till master inserts
 * Re-validate L1 desc on master insert with READ_ONCE
v3
 https://lore.kernel.org/all/cover.1777150307.git.nicolinc@nvidia.com/
 * s/OPT_KDUMP/OPT_KDUMP_ADOPT
 * Do not adopt if GERROR_SFM_ERR
 * Retain CR0_ATSCHK beside CR0_SMMUEN
 * Clear latched GERROR bits (e.g. CMDQ_ERR)
 * Assert ARM_SMMU_FEAT_COHERENCY in adopt functions
 * Add STE.Cfg check in arm_smmu_is_attach_deferred()
 * Fix validations on return codes from devm_memremap()
 * Sanitize crashed kernel register values in adopt functions
 * Drop unnecessary l2ptrs guard in arm_smmu_is_attach_deferred()
 * Don't enable PRIQ/EVTQ irqs and guard the irq functions for combined
   irq cases
v2
 https://lore.kernel.org/all/cover.1776286352.git.nicolinc@nvidia.com/
 * Add warning in non-coherent SMMU cases
 * Keep eventq/priq disabled vs. enabling-and-disabling-later
 * Check KDUMP option in the beginning of arm_smmu_device_reset()
 * Validate STRTAB format matches HW capability instead of forcing flags
v1:
 https://lore.kernel.org/all/cover.1775763475.git.nicolinc@nvidia.com/

Nicolin Chen (7):
  iommu/arm-smmu-v3: Add arm_smmu_kdump_adopt_strtab() for kdump
  iommu/arm-smmu-v3: Implement is_attach_deferred() for kdump
  iommu/arm-smmu-v3: Do not enable EVTQ/PRIQ interrupts in kdump kernel
  iommu/arm-smmu-v3: Skip EVTQ/PRIQ setup in kdump kernel
  iommu/arm-smmu-v3: Retain CR0_SMMUEN during kdump device reset
  iommu/arm-smmu-v3: Skip RMR bypass for kdump adoption
  iommu/arm-smmu-v3: Detect ARM_SMMU_OPT_KDUMP_ADOPT in probe()

 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |   1 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 470 ++++++++++++++++++--
 2 files changed, 426 insertions(+), 45 deletions(-)

-- 
2.43.0



^ permalink raw reply

* [PATCH rc v6 2/7] iommu/arm-smmu-v3: Implement is_attach_deferred() for kdump
From: Nicolin Chen @ 2026-05-20 17:03 UTC (permalink / raw)
  To: will, robin.murphy, jgg
  Cc: joro, praan, kees, baolu.lu, kevin.tian, miko.lenczewski,
	smostafa, linux-arm-kernel, iommu, linux-kernel, stable, jamien
In-Reply-To: <cover.1779265413.git.nicolinc@nvidia.com>

Though the kdump kernel adopts the crashed kernel's stream table, the iommu
core will still try to attach each probed device to a default domain, which
overwrites the adopted STE and breaks in-flight DMA from that device.

Implement an is_attach_deferred() callback to prevent this. For each device
that has STE.V=1 and STE.Cfg!=Abort in the adopted table, defer the default
domain attachment, until the device driver explicitly requests it.

Fixes: b63b3439b856 ("iommu/arm-smmu-v3: Abort all transactions if SMMU is enabled in kdump kernel")
Cc: stable@vger.kernel.org # v6.12+
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 24 +++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index aa6837a5daa88..2d7eb42449eaf 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -4268,6 +4268,29 @@ static void arm_smmu_remove_master(struct arm_smmu_master *master)
 	kfree(master->build_invs);
 }
 
+static bool arm_smmu_is_attach_deferred(struct device *dev)
+{
+	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+	struct arm_smmu_device *smmu = master->smmu;
+	int i;
+
+	if (!(smmu->options & ARM_SMMU_OPT_KDUMP_ADOPT))
+		return false;
+
+	for (i = 0; i < master->num_streams; i++) {
+		struct arm_smmu_ste *ste =
+			arm_smmu_get_step_for_sid(smmu, master->streams[i].id);
+		u64 ent0 = le64_to_cpu(ste->data[0]);
+
+		/* Defer only when there might be in-flight DMAs */
+		if ((ent0 & STRTAB_STE_0_V) &&
+		    FIELD_GET(STRTAB_STE_0_CFG, ent0) != STRTAB_STE_0_CFG_ABORT)
+			return true;
+	}
+
+	return false;
+}
+
 static struct iommu_device *arm_smmu_probe_device(struct device *dev)
 {
 	int ret;
@@ -4430,6 +4453,7 @@ static const struct iommu_ops arm_smmu_ops = {
 	.hw_info		= arm_smmu_hw_info,
 	.domain_alloc_sva       = arm_smmu_sva_domain_alloc,
 	.domain_alloc_paging_flags = arm_smmu_domain_alloc_paging_flags,
+	.is_attach_deferred	= arm_smmu_is_attach_deferred,
 	.probe_device		= arm_smmu_probe_device,
 	.release_device		= arm_smmu_release_device,
 	.device_group		= arm_smmu_device_group,
-- 
2.43.0



^ permalink raw reply related

* [PATCH rc v6 1/7] iommu/arm-smmu-v3: Add arm_smmu_kdump_adopt_strtab() for kdump
From: Nicolin Chen @ 2026-05-20 17:03 UTC (permalink / raw)
  To: will, robin.murphy, jgg
  Cc: joro, praan, kees, baolu.lu, kevin.tian, miko.lenczewski,
	smostafa, linux-arm-kernel, iommu, linux-kernel, stable, jamien
In-Reply-To: <cover.1779265413.git.nicolinc@nvidia.com>

When transitioning to a kdump kernel, the primary kernel might have crashed
while endpoint devices were actively bus-mastering DMA. Currently, the SMMU
driver aggressively resets the hardware during probe by clearing CR0_SMMUEN
and setting the Global Bypass Attribute (GBPA) to ABORT.

In a kdump scenario, this aggressive reset is highly destructive:
a) If GBPA is set to ABORT, in-flight DMA will be aborted, generating fatal
   PCIe AER or SErrors that may panic the kdump kernel
b) If GBPA is set to BYPASS, in-flight DMA targeting some IOVAs will bypass
   the SMMU and corrupt the physical memory at those 1:1 mapped IOVAs.

To safely absorb in-flight DMAs, a kdump kernel will have to leave SMMUEN=1
intact and avoid modifying STRTAB_BASE, allowing HW to continue translating
in-flight DMAs reusing the crashed kernel's page tables until the endpoint
device drivers probe and quiesce their respective hardware.

However, the ARM SMMUv3 architecture specification states that updating the
SMMU_STRTAB_BASE register while SMMUEN == 1 is UNPREDICTABLE or ignored.

This leaves a kdump kernel no choice but to adopt the stream table from the
crashed kernel.

Introduce ARM_SMMU_OPT_KDUMP_ADOPT and adopt functions memremapping all the
stream tables extracted from STRTAB_BASE and STRTAB_BASE_CFG.

Note that the adoption of the crashed kernel's stream table follows certain
strict rules, since the old stream table might be compromised. Thus, apply
some basic validations against the values read from the registers. If tests
fail, it means the stream table cannot be trusted, so toss it entirely. To
avoid OOM due to a potentially corrupted stream table, the memremap for l2
tables is done on the kdump kernel's demand.

The new option will be set in a following change.

Fixes: b63b3439b856 ("iommu/arm-smmu-v3: Abort all transactions if SMMU is enabled in kdump kernel")
Cc: stable@vger.kernel.org # v6.12+
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |   1 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 254 +++++++++++++++++++-
 2 files changed, 252 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index ef42df4753ec4..cd60b692c3901 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -861,6 +861,7 @@ struct arm_smmu_device {
 #define ARM_SMMU_OPT_MSIPOLL		(1 << 2)
 #define ARM_SMMU_OPT_CMDQ_FORCE_SYNC	(1 << 3)
 #define ARM_SMMU_OPT_TEGRA241_CMDQV	(1 << 4)
+#define ARM_SMMU_OPT_KDUMP_ADOPT	(1 << 5)
 	u32				options;
 
 	struct arm_smmu_cmdq		cmdq;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index e8d7dbe495f03..aa6837a5daa88 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2040,16 +2040,70 @@ static void arm_smmu_init_initial_stes(struct arm_smmu_ste *strtab,
 	}
 }
 
+static int arm_smmu_kdump_adopt_l2_strtab(struct arm_smmu_device *smmu, u32 sid,
+					  phys_addr_t base, u32 span,
+					  struct arm_smmu_strtab_l2 **l2table)
+{
+	struct arm_smmu_strtab_l2 *table;
+	size_t size;
+
+	/*
+	 * Only a coherent SMMU is supported at this moment. For a non-coherent
+	 * SMMU that wants to support ARM_SMMU_OPT_KDUMP_ADOPT, try MEMREMAP_WC.
+	 */
+	if (WARN_ON(!(smmu->features & ARM_SMMU_FEAT_COHERENCY)))
+		return -EOPNOTSUPP;
+
+	/*
+	 * Retest the span in case the L1 descriptor has been overwritten since
+	 * the adopt. Reject this master's insert; panic or SMMU-disable would
+	 * either lose the vmcore or cascade aborts. Do not try to fix it, as it
+	 * would break all other SIDs in the same bus (PCI case). The corruption
+	 * blast radius is already bounded to that bus range.
+	 */
+	if (span != STRTAB_SPLIT + 1) {
+		dev_err(smmu->dev,
+			"kdump: L1[%u] span %u changed since adopt (was %u)\n",
+			arm_smmu_strtab_l1_idx(sid), span, STRTAB_SPLIT + 1);
+		return -EINVAL;
+	}
+
+	size = (1UL << (span - 1)) * sizeof(struct arm_smmu_ste);
+
+	table = devm_memremap(smmu->dev, base, size, MEMREMAP_WB);
+	if (IS_ERR(table)) {
+		dev_err(smmu->dev,
+			"kdump: failed to adopt l2 stream table for SID %u\n",
+			sid);
+		return PTR_ERR(table);
+	}
+
+	*l2table = table;
+	return 0;
+}
+
 static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
 {
 	dma_addr_t l2ptr_dma;
 	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
 	struct arm_smmu_strtab_l2 **l2table;
+	u32 l1_idx = arm_smmu_strtab_l1_idx(sid);
 
-	l2table = &cfg->l2.l2ptrs[arm_smmu_strtab_l1_idx(sid)];
+	l2table = &cfg->l2.l2ptrs[l1_idx];
 	if (*l2table)
 		return 0;
 
+	/* Deferred adoption of the crashed kernel's L2 table */
+	if (smmu->options & ARM_SMMU_OPT_KDUMP_ADOPT) {
+		u64 l2ptr = le64_to_cpu(cfg->l2.l1tab[l1_idx].l2ptr);
+		phys_addr_t base = l2ptr & STRTAB_L1_DESC_L2PTR_MASK;
+		u32 span = FIELD_GET(STRTAB_L1_DESC_SPAN, l2ptr);
+
+		if (span && base)
+			return arm_smmu_kdump_adopt_l2_strtab(smmu, sid, base,
+							      span, l2table);
+	}
+
 	*l2table = dmam_alloc_coherent(smmu->dev, sizeof(**l2table),
 				       &l2ptr_dma, GFP_KERNEL);
 	if (!*l2table) {
@@ -2061,8 +2115,7 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
 
 	arm_smmu_init_initial_stes((*l2table)->stes,
 				   ARRAY_SIZE((*l2table)->stes));
-	arm_smmu_write_strtab_l1_desc(&cfg->l2.l1tab[arm_smmu_strtab_l1_idx(sid)],
-				      l2ptr_dma);
+	arm_smmu_write_strtab_l1_desc(&cfg->l2.l1tab[l1_idx], l2ptr_dma);
 	return 0;
 }
 
@@ -4556,10 +4609,204 @@ static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
 	return 0;
 }
 
+static int arm_smmu_kdump_adopt_strtab_2lvl(struct arm_smmu_device *smmu,
+					    u32 cfg_reg, phys_addr_t base)
+{
+	u32 log2size = FIELD_GET(STRTAB_BASE_CFG_LOG2SIZE, cfg_reg);
+	u32 split = FIELD_GET(STRTAB_BASE_CFG_SPLIT, cfg_reg);
+	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
+	u32 num_l1_ents;
+	size_t size;
+	int i;
+
+	/*
+	 * Only a coherent SMMU is supported at this moment. For a non-coherent
+	 * SMMU that wants to support ARM_SMMU_OPT_KDUMP_ADOPT, try MEMREMAP_WC.
+	 */
+	if (WARN_ON(!(smmu->features & ARM_SMMU_FEAT_COHERENCY)))
+		return -EOPNOTSUPP;
+
+	if (log2size < split || log2size > smmu->sid_bits) {
+		dev_err(smmu->dev, "kdump: log2size %u out of range [%u, %u]\n",
+			log2size, split, smmu->sid_bits);
+		return -EINVAL;
+	}
+	if (split != STRTAB_SPLIT) {
+		dev_err(smmu->dev,
+			"kdump: unsupported STRTAB_SPLIT %u (expected %u)\n",
+			split, STRTAB_SPLIT);
+		return -EINVAL;
+	}
+
+	num_l1_ents = 1U << (log2size - split);
+	if (num_l1_ents > STRTAB_MAX_L1_ENTRIES) {
+		dev_err(smmu->dev, "kdump: l1 entries %u exceeds max %u\n",
+			num_l1_ents, STRTAB_MAX_L1_ENTRIES);
+		return -EINVAL;
+	}
+
+	cfg->l2.num_l1_ents = num_l1_ents;
+
+	size = num_l1_ents * sizeof(struct arm_smmu_strtab_l1);
+	cfg->l2.l1tab = memremap(base, size, MEMREMAP_WB);
+	if (!cfg->l2.l1tab)
+		return -ENOMEM;
+
+	cfg->l2.l2ptrs =
+		kcalloc(num_l1_ents, sizeof(*cfg->l2.l2ptrs), GFP_KERNEL);
+	if (!cfg->l2.l2ptrs)
+		return -ENOMEM;
+
+	for (i = 0; i < num_l1_ents; i++) {
+		u64 l2ptr = le64_to_cpu(cfg->l2.l1tab[i].l2ptr);
+		phys_addr_t l2_base = l2ptr & STRTAB_L1_DESC_L2PTR_MASK;
+		u32 span = FIELD_GET(STRTAB_L1_DESC_SPAN, l2ptr);
+
+		if (!span || !l2_base)
+			continue;
+
+		if (span != STRTAB_SPLIT + 1) {
+			dev_err(smmu->dev,
+				"kdump: L1[%u] unsupported span %u (vs %u)\n",
+				i, span, STRTAB_SPLIT + 1);
+			return -EINVAL;
+		}
+
+		/*
+		 * If the crashed kernel's l1 descriptors are deeply corrupted,
+		 * blindly memremapping every l2 table here could lead to OOM.
+		 *
+		 * Defer the l2 memremap to arm_smmu_init_l2_strtab(), so peak
+		 * memory is bounded by the kdump kernel's actual demand.
+		 */
+	}
+
+	return 0;
+}
+
+static int arm_smmu_kdump_adopt_strtab_linear(struct arm_smmu_device *smmu,
+					      u32 cfg_reg, phys_addr_t base)
+{
+	u32 log2size = FIELD_GET(STRTAB_BASE_CFG_LOG2SIZE, cfg_reg);
+	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
+	unsigned int max_log2size;
+	size_t size;
+
+	/*
+	 * Only a coherent SMMU is supported at this moment. For a non-coherent
+	 * SMMU that wants to support ARM_SMMU_OPT_KDUMP_ADOPT, try MEMREMAP_WC.
+	 */
+	if (WARN_ON(!(smmu->features & ARM_SMMU_FEAT_COHERENCY)))
+		return -EOPNOTSUPP;
+
+	/* Cap the size at what the kdump kernel itself would have allocated */
+	if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB)
+		max_log2size =
+			ilog2(STRTAB_MAX_L1_ENTRIES * STRTAB_NUM_L2_STES);
+	else
+		max_log2size = smmu->sid_bits;
+
+	/* cfg->linear.num_ents is unsigned int, so cap log2size at 31 */
+	max_log2size = min(max_log2size, 31U);
+	if (log2size > max_log2size) {
+		dev_err(smmu->dev, "kdump: unsupported log2size %u (> %u)\n",
+			log2size, max_log2size);
+		return -EINVAL;
+	}
+
+	/*
+	 * We might end up with a num_ents != sid_bits, which is fine. In the
+	 * ARM_SMMU_OPT_KDUMP_ADOPT case, arm_smmu_write_strtab() is bypassed.
+	 */
+	cfg->linear.num_ents = 1U << log2size;
+
+	size = cfg->linear.num_ents * sizeof(struct arm_smmu_ste);
+	cfg->linear.table = memremap(base, size, MEMREMAP_WB);
+	if (!cfg->linear.table)
+		return -ENOMEM;
+	return 0;
+}
+
+static void arm_smmu_kdump_adopt_cleanup(void *data)
+{
+	struct arm_smmu_device *smmu = data;
+	u32 cfg_reg = readl_relaxed(smmu->base + ARM_SMMU_STRTAB_BASE_CFG);
+	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
+	u32 fmt = FIELD_GET(STRTAB_BASE_CFG_FMT, cfg_reg);
+
+	if (fmt == STRTAB_BASE_CFG_FMT_2LVL) {
+		kfree(cfg->l2.l2ptrs);
+		if (cfg->l2.l1tab)
+			memunmap(cfg->l2.l1tab);
+	} else if (fmt == STRTAB_BASE_CFG_FMT_LINEAR) {
+		if (cfg->linear.table)
+			memunmap(cfg->linear.table);
+	}
+}
+
+static int arm_smmu_kdump_adopt_strtab(struct arm_smmu_device *smmu)
+{
+	u32 cfg_reg = readl_relaxed(smmu->base + ARM_SMMU_STRTAB_BASE_CFG);
+	u64 base_reg = readq_relaxed(smmu->base + ARM_SMMU_STRTAB_BASE);
+	u32 fmt = FIELD_GET(STRTAB_BASE_CFG_FMT, cfg_reg);
+	phys_addr_t base = base_reg & STRTAB_BASE_ADDR_MASK;
+	int ret;
+
+	dev_info(smmu->dev, "kdump: adopting crashed kernel's stream table\n");
+
+	if (fmt == STRTAB_BASE_CFG_FMT_2LVL) {
+		/*
+		 * Both kernels run on the same hardware, so it's impossible for
+		 * kdump kernel to see the support for linear stream table only.
+		 */
+		if (WARN_ON(!(smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB)))
+			ret = -EINVAL;
+		else
+			ret = arm_smmu_kdump_adopt_strtab_2lvl(smmu, cfg_reg,
+							       base);
+	} else if (fmt == STRTAB_BASE_CFG_FMT_LINEAR) {
+		/*
+		 * In case that the old kernel for some reason used the linear
+		 * format, enforce the same format to match the adopted table.
+		 */
+		ret = arm_smmu_kdump_adopt_strtab_linear(smmu, cfg_reg, base);
+		if (!ret)
+			smmu->features &= ~ARM_SMMU_FEAT_2_LVL_STRTAB;
+	} else {
+		dev_err(smmu->dev, "kdump: invalid STRTAB format %u\n", fmt);
+		ret = -EINVAL;
+	}
+
+	if (ret) {
+		arm_smmu_kdump_adopt_cleanup(smmu);
+		goto err;
+	}
+
+	ret = devm_add_action_or_reset(smmu->dev, arm_smmu_kdump_adopt_cleanup,
+				       smmu);
+	/* devm_add_action_or_reset ran the cleanup upon failure */
+	if (ret) {
+		dev_warn(smmu->dev, "kdump: failed to set up cleanup action\n");
+		goto err;
+	}
+
+	return 0;
+
+err:
+	dev_warn(smmu->dev, "kdump: falling back to full reset\n");
+	memset(&smmu->strtab_cfg, 0, sizeof(smmu->strtab_cfg));
+	smmu->options &= ~ARM_SMMU_OPT_KDUMP_ADOPT;
+	return ret;
+}
+
 static int arm_smmu_init_strtab(struct arm_smmu_device *smmu)
 {
 	int ret;
 
+	if ((smmu->options & ARM_SMMU_OPT_KDUMP_ADOPT) &&
+	    !arm_smmu_kdump_adopt_strtab(smmu))
+		goto out;
+
 	if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB)
 		ret = arm_smmu_init_strtab_2lvl(smmu);
 	else
@@ -4567,6 +4814,7 @@ static int arm_smmu_init_strtab(struct arm_smmu_device *smmu)
 	if (ret)
 		return ret;
 
+out:
 	ida_init(&smmu->vmid_map);
 
 	return 0;
-- 
2.43.0



^ permalink raw reply related

* Re: [PATCH] arm64: Add user and kernel page-fault tracepoints
From: Justinien Bouron @ 2026-05-20 16:45 UTC (permalink / raw)
  To: Leo Yan
  Cc: Mark Rutland, Gunnar Kudrjavets, Ryan Roberts, Quentin Perret,
	Catalin Marinas, Kevin Brodsky, linux-kernel, David Hildenbrand,
	Lorenzo Stoakes, Will Deacon, linux-arm-kernel
In-Reply-To: <20260520073606.GA101133@e132581.arm.com>

On Wed, May 20, 2026 at 08:36:06AM +0100, Leo Yan wrote:
> On Tue, May 19, 2026 at 09:55:24PM -0700, Justinien Bouron wrote:
> 
> [...]
> 
> > @@ -606,6 +609,11 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
> >  	int si_code;
> >  	int pkey = -1;
> >  
> > +	if (user_mode(regs))
> > +		trace_page_fault_user(addr, regs, esr);
> > +	else
> > +		trace_page_fault_kernel(addr, regs, esr);
> 
> Based on the discussion [1], Arm64 has already supported perf sw event
> for page-faults:
> 
>   perf record -e page-faults ...
> 
> Seems there have a plan to consolidate perf event and tracepoints but I
> have no idea how it is going.
This refactor/consolidation is not specific to arm64 right? I see that both x86
and riscv also have both the tracepoints and the perf event, presumably they
will need to be refactored as well?

> I would leave this to maintainers.
Agreed.

> 
> > +
> >  	if (kprobe_page_fault(regs, esr))
> >  		return 0;
> 
> tracepoints should be after kprobe_page_fault(), as explained [2] by Mark.
Interesting. The reason I put them before the kprobe_page_fault is because this
is how x86 is doing it. x86 calls kprobe_page_fault from do_kern_addr_fault /
do_user_addr_fault which are called _after_ the trace_page_fault_{user,kernel}.
Is there a reason why this is allowed in x86 but not arm64?

Also I did not realize that there already was an attempt to add the page-fault
tracepoints in the past!

Best,
Justinien
> 
> Thanks,
> Leo
> 
> [1] https://lore.kernel.org/all/20250520140453.GA18711@willie-the-truck/
> [2] https://lore.kernel.org/all/aCtZfiU8bgkSAgLh@J2N7QTR9R3/


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox